MKL vs LAPACK
We strongly encourage using Intel MKL instead of LAPACK linked against the reference ("vanilla") BLAS: MKL's BLAS and LAPACK routines are tuned for Intel processors and can run multithreaded.
The default reference BLAS is not parallel: it uses only one core.
Benchmarks
Let's use dgesv to solve a random linear system.
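dgesv computes an LU factorization of A with partial pivoting (via dgetrf) and then solves with the triangular factors (via dgetrs). The standard textbook operation counts below are estimates, not measurements. They explain why the factorization dominates the cost and why the runtime should grow roughly like n^3 for large n.

```latex
\begin{align*}
\text{LU factorization (dgetrf):}\quad & \approx \tfrac{2}{3}\,n^{3} \ \text{flops} \\
\text{triangular solves (dgetrs):}\quad & \approx 2\,n^{2}\cdot \mathrm{nrhs} \ \text{flops} \\
n = 10000:\quad & \tfrac{2}{3}\,(10^{4})^{3} \approx 6.7\times 10^{11} \ \text{flops} \\
\frac{T(2n)}{T(n)} \;\approx\; & \frac{(2n)^{3}}{n^{3}} = 8
\end{align*}
```

So, as long as the factorization dominates the runtime, doubling n should make the solve roughly eight times slower.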
- randomsys.f90
program randomsys
  implicit none
  real(kind=8), dimension(:), allocatable :: x, b
  real(kind=8), dimension(:,:), allocatable :: a
  real(kind=8) :: err
  integer :: i, info, lda, ldb, nrhs, n
  integer, dimension(:), allocatable :: ipiv

  ! Initialize the random number generator seed.
  ! With no arguments, the intrinsic random_seed picks a processor-dependent
  ! seed; depending on the compiler, successive runs may or may not differ.
  call random_seed()

  print *, "Input n ... "
  read *, n

  allocate(a(n,n))
  allocate(b(n))
  allocate(x(n))
  allocate(ipiv(n))

  call random_number(a)
  call random_number(x)
  b = matmul(a, x)   ! compute RHS

  nrhs = 1   ! number of right hand sides in b
  lda = n    ! leading dimension of a
  ldb = n    ! leading dimension of b

  call dgesv(n, nrhs, a, lda, ipiv, b, ldb, info)
  ! Note: the solution is returned in b
  ! and a has been overwritten by its LU factors.
  if (info /= 0) then
     print *, "dgesv failed, info = ", info
     stop 1
  end if

  ! Compare computed solution to original x.
  ! Only the last component is printed; uncomment the loop to print all n.
  print *, " x computed rel. error"
  i = n
  ! do i = 1, n
     err = abs(x(i) - b(i)) / abs(x(i))
     print '(3d16.6)', x(i), b(i), err
  ! end do

  deallocate(a, b, x, ipiv)
end program randomsys
Compile with LAPACK (and BLAS)
$ module load gcc
$ gfortran -O2 randomsys.f90 -o randomsys_lapack.exe -llapack -lblas
Compile with Intel MKL
$ module load intel/14.0.2
$ ifort -O2 -mkl randomsys.f90 -o randomsys_mkl.exe
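After building, it can be worth confirming which libraries each executable actually picked up. The sketch below is a hypothetical helper (the name linked_math_libs is ours) that filters ldd output for BLAS, LAPACK, MKL and Intel OpenMP (libiomp5) entries. It only helps for dynamically linked builds, and the exact library names vary from system to system.

```shell
# Hypothetical helper: list the BLAS/LAPACK/MKL-related shared libraries
# that an executable is dynamically linked against.
# libiomp5 is Intel's OpenMP runtime, used by threaded MKL.
linked_math_libs() {
  ldd "$1" | grep -Ei 'blas|lapack|mkl|iomp'
}
```

For example, run linked_math_libs ./randomsys_lapack.exe and linked_math_libs ./randomsys_mkl.exe. The first would typically show libblas and liblapack entries and the second libmkl_* entries.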
Results
| N | LAPACK | MKL (1 core) | MKL (16 cores) |
| 4000 | 14s | 6s | 2s |
| 6000 | 41s | 12s | 4s |
| 8000 | 1m27s | 25s | 5s |
| 10000 | 2m46s | 45s | 8s |
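The speedups implied by the table can be computed directly. This sketch hard-codes the timings from the table, converted to seconds (1m27s = 87 s, 2m46s = 166 s), and uses awk to print the ratios. The function name speedups is only illustrative.

```shell
#!/usr/bin/env bash
# Timings in seconds, copied from the table above.
# Columns: N, LAPACK, MKL (1 core), MKL (16 cores)
TIMINGS='4000 14 6 2
6000 41 12 4
8000 87 25 5
10000 166 45 8'

# Print, for each N, how many times faster each MKL run is than the
# reference, i.e. (slower time) / (faster time).
speedups() {
  printf '%s\n' "$TIMINGS" | awk '{
    printf "N=%-6d MKL 1 core vs LAPACK: %5.2fx  MKL 16 cores vs LAPACK: %6.2fx  16 vs 1 core: %5.2fx\n",
           $1, $2 / $3, $2 / $4, $3 / $4
  }'
}

speedups
```

For N = 10000, this gives about 3.7x for single-core MKL over LAPACK and 20.75x for 16-core MKL over LAPACK. Going from 1 to 16 MKL cores yields about 5.6x rather than 16x.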
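For MKL, use the OMP_NUM_THREADS environment variable to select the number of threads. MKL also reads MKL_NUM_THREADS, which takes precedence over OMP_NUM_THREADS when both are set.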
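To compare thread counts on the same problem size, the environment variable can be set per run. Below is a minimal sketch, assuming the MKL build from above. The helper run_bench and the EXE variable are hypothetical names introduced here.

```shell
# Hypothetical helper: run the MKL benchmark for matrix size $1 with $2 threads.
# EXE defaults to the executable built above and can be overridden.
EXE=${EXE:-./randomsys_mkl.exe}

run_bench() {
  local n=$1 threads=$2
  # randomsys reads n from standard input. OMP_NUM_THREADS is set only for
  # this command, so the rest of the shell session keeps its own setting.
  echo "$n" | OMP_NUM_THREADS="$threads" "$EXE"
}
```

For example, for t in 1 4 16; do time run_bench 8000 "$t"; done times the same solve with three thread counts.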