Le mésocentre

We strongly encourage using Intel MKL instead of LAPACK with the reference ("vanilla") BLAS. MKL's BLAS and LAPACK routines are optimized for Intel CPU architectures and are multithreaded, so they run considerably faster.

The default BLAS is not parallel: it uses only one core.

Let's use the LAPACK routine dgesv to solve a random linear system a·x = b.
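For reference, the LAPACK documentation describes dgesv as an LU factorization with partial pivoting followed by two triangular solves. The summary below is the standard textbook form, with P a permutation matrix, L unit lower triangular and U upper triangular:

```latex
A = P\,L\,U, \qquad
L\,y = P^{\mathsf{T}} b \quad (\text{forward substitution}), \qquad
U\,x = y \quad (\text{back substitution})

\text{cost} \approx \tfrac{2}{3}\,n^{3} \;(\text{factorization}) \;+\; 2\,n^{2} \;(\text{per right-hand side})
```

The factorization dominates: for n = 10000 it amounts to about (2/3)·10^12 ≈ 6.7 × 10^11 floating-point operations, which is why an optimized, multithreaded BLAS makes such a difference.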

randomsys.f90

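program randomsys
    implicit none
    real(kind=8), dimension(:), allocatable :: x, b
    real(kind=8), dimension(:,:), allocatable :: a
    real(kind=8) :: err
    integer :: info, lda, ldb, nrhs, n
    integer, dimension(:), allocatable :: ipiv

    ! initialize random number generator seed
    ! if you remove this, the same numbers will be generated each
    ! time you run this code (e.g. with gfortran).
    call init_random_seed()

    print *, "Input n ... "
    read *, n

    allocate(a(n,n))
    allocate(b(n))
    allocate(x(n))
    allocate(ipiv(n))

    call random_number(a)
    call random_number(x)
    b = matmul(a,x) ! compute RHS

    nrhs = 1 ! number of right hand sides in b
    lda = n  ! leading dimension of a
    ldb = n  ! leading dimension of b

    ! solve a*x = b (LU factorization with partial pivoting)
    call dgesv(n, nrhs, a, lda, ipiv, b, ldb, info)

    ! Note: the solution is returned in b,
    ! and a is overwritten by its LU factors.
    if (info /= 0) then
        print *, "dgesv failed, info = ", info
        stop 1
    end if

    ! compare computed solution to original x
    ! (last component only, to keep the output short for large n):
    print *, "         x          computed       rel. error"
    err = abs(x(n)-b(n))/abs(x(n))
    print '(3d16.6)', x(n), b(n), err

    ! maximum relative error over all components
    print '(a,d16.6)', " max rel. error: ", maxval(abs(x-b)/abs(x))

    deallocate(a, b, x, ipiv)

contains

    ! seed the generator from the system clock
    subroutine init_random_seed()
        integer :: i, nseed, clock
        integer, dimension(:), allocatable :: seed
        call random_seed(size = nseed)
        allocate(seed(nseed))
        call system_clock(count = clock)
        seed = clock + 37 * (/ (i - 1, i = 1, nseed) /)
        call random_seed(put = seed)
        deallocate(seed)
    end subroutine init_random_seed

end program randomsys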

Compile with LAPACK (and BLAS)

$ module load gcc
$ gfortran -O2 randomsys.f90 -o randomsys_lapack.exe -llapack -lblas

Compile with Intel MKL

$ module load intel/14.0.2
$ ifort -O2 -mkl randomsys.f90 -o randomsys_mkl.exe

Results

n	LAPACK	MKL (1 core)	MKL (16 cores)
4000	14s	6s	2s
6000	41s	12s	4s
8000	1m27s	25s	5s
10000	2m46s	45s	8s

For MKL, set the OMP_NUM_THREADS environment variable to select the number of threads to use (MKL_NUM_THREADS, if defined, takes precedence for MKL routines).
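A minimal bash sketch of how the 1-core and 16-core MKL runs could be reproduced. It assumes randomsys_mkl.exe is in the current directory and the node has at least 16 cores; the helper name bench is hypothetical. Since the program reads n from standard input, the size is piped in with echo, and both OMP_NUM_THREADS and MKL_NUM_THREADS are set to the same value so that neither overrides the other.

```shell
#!/usr/bin/env bash
# Sketch: time randomsys_mkl.exe with different numbers of threads.
# Assumptions: the executable is in the current directory and the node
# has at least 16 cores. "bench" is a hypothetical helper name.
set -eu

# bench EXE N THREADS
# Runs EXE once with THREADS threads, feeding the matrix size N on stdin
# (randomsys reads n with "read *, n"); bash's time keyword reports the
# elapsed time on stderr.
bench() {
    local exe=$1 n=$2 threads=$3
    echo "n=$n threads=$threads"
    # MKL honours OMP_NUM_THREADS; MKL_NUM_THREADS, if defined, takes
    # precedence, so both are set to the same value here.
    time (echo "$n" | OMP_NUM_THREADS="$threads" MKL_NUM_THREADS="$threads" "$exe")
}

if [ -x ./randomsys_mkl.exe ]; then
    for n in 4000 8000; do
        bench ./randomsys_mkl.exe "$n" 1
        bench ./randomsys_mkl.exe "$n" 16
    done
else
    echo "randomsys_mkl.exe not found in $(pwd)" >&2
fi
```

Setting the variables on the command line rather than exporting them keeps each run's thread count independent of the surrounding shell environment.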
