MKL vs LAPACK

We strongly recommend using Intel MKL instead of LAPACK linked against the reference ("vanilla") BLAS: MKL is optimized for the CPU architecture and is considerably faster (see the results below).

The reference BLAS is not parallel: it uses only one core, whereas MKL is multithreaded.

As an example, we use the LAPACK routine dgesv, which solves A x = b by LU factorization with partial pivoting, to solve a random linear system.

randomsys.f90
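program randomsys
    implicit none
    real(kind=8), dimension(:), allocatable :: x, b
    real(kind=8), dimension(:,:), allocatable :: a
    real(kind=8) :: err
    integer :: info, lda, ldb, nrhs, n
    integer, dimension(:), allocatable :: ipiv

    ! initialize random number generator seed
    ! if you remove this, the same numbers may be generated each
    ! time you run this code (this is compiler dependent).
    call init_random_seed()

    print *, "Input n ... "
    read *, n

    allocate(a(n,n))
    allocate(b(n))
    allocate(x(n))
    allocate(ipiv(n))

    call random_number(a)
    call random_number(x)
    b = matmul(a,x) ! compute RHS

    nrhs = 1 ! number of right hand sides in b
    lda = n  ! leading dimension of a
    ldb = n  ! leading dimension of b

    call dgesv(n, nrhs, a, lda, ipiv, b, ldb, info)

    ! Note: the solution is returned in b,
    ! and a has been overwritten by its LU factors.
    if (info /= 0) then
        print *, "dgesv failed, info = ", info
        stop 1
    end if

    ! compare computed solution to original x; only the last entry is
    ! printed, because printing all n entries is impractical for large n
    print *, "         x          computed       rel. error"
    err = abs(x(n) - b(n)) / abs(x(n))
    print '(3d16.6)', x(n), b(n), err

    ! largest relative error over all entries
    err = maxval(abs(x - b) / abs(x))
    print '(a, d16.6)', " max rel. error:", err

    deallocate(a, b, x, ipiv)

contains

    ! simple clock-based seeding of the intrinsic random number generator
    subroutine init_random_seed()
        integer :: i, nseed, clock
        integer, dimension(:), allocatable :: seed

        call random_seed(size=nseed)
        allocate(seed(nseed))
        call system_clock(count=clock)
        seed = clock + 37 * (/ (i - 1, i = 1, nseed) /)
        call random_seed(put=seed)
        deallocate(seed)
    end subroutine init_random_seed

end program randomsys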

Compile with LAPACK (and BLAS)

$ module load gcc
$ gfortran -O2 randomsys.f90 -o randomsys_lapack.exe -llapack -lblas

Compile with Intel MKL

$ module load intel/14.0.2
$ ifort -O2 -mkl randomsys.f90 -o randomsys_mkl.exe  
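To confirm which BLAS/LAPACK libraries each executable actually picked up, you can inspect its shared-library dependencies with ldd. The helper below is a minimal sketch; the function name blas_libs is hypothetical, and it only sees dynamically linked libraries (a statically linked BLAS or MKL will not show up).

```shell
#!/bin/sh
# blas_libs EXE: print the shared BLAS/LAPACK/MKL libraries that EXE links against.
# Hypothetical helper; ldd only reports dynamically linked libraries.
blas_libs() {
    ldd "$1" 2>/dev/null | grep -Ei 'blas|lapack|mkl' ||
        echo "no shared BLAS/LAPACK/MKL library found"
}
```

For example, blas_libs ./randomsys_lapack.exe should list libblas and liblapack, while blas_libs ./randomsys_mkl.exe should list libmkl_* entries if MKL was linked dynamically.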

Results

Run times for randomsys (n is the matrix dimension):

n        LAPACK     MKL (1 core)    MKL (16 cores)
4000     14s        6s              2s
6000     41s        12s             4s
8000     1m27s      25s             5s
10000    2m46s      45s             8s

For MKL, set the OMP_NUM_THREADS environment variable to select the number of threads (e.g. export OMP_NUM_THREADS=16 in bash).
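A timing sweep like the one in the results table can be scripted by running the MKL executable once per thread count and feeding n on stdin. This is a sketch under assumptions: the function name time_threads is hypothetical, timing uses date +%s and so has whole-second resolution, and MKL_NUM_THREADS is set alongside OMP_NUM_THREADS because MKL gives MKL_NUM_THREADS precedence when both are present.

```shell
#!/bin/sh
# time_threads EXE N T1 [T2 ...]: run EXE once per thread count, feeding the
# matrix size N on stdin, and report the wall-clock time in whole seconds.
# Hypothetical helper; sets both OMP_NUM_THREADS and MKL_NUM_THREADS.
time_threads() {
    exe=$1
    n=$2
    shift 2
    for t in "$@"; do
        start=$(date +%s)
        echo "$n" | OMP_NUM_THREADS=$t MKL_NUM_THREADS=$t "$exe" > /dev/null
        end=$(date +%s)
        echo "threads=$t time=$((end - start))s"
    done
}
```

For example, time_threads ./randomsys_mkl.exe 8000 1 4 16 runs the n = 8000 case with 1, 4 and 16 threads.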