Matlab Parallel Computing Toolbox: PCT
Matlab Parallel Computing Toolbox (PCT) is now available at the Mesocentre as a part of Matlab r2011b.
We currently support only the local parallel mode, i.e., running on a single hardware node. The recommended practice is to run on the cluster, either interactively or through a program built with the Matlab compiler.
Features
- Parallel for-Loops (parfor): Run loop iterations in parallel on a MATLAB® pool using the parfor language construct
- Distributed Arrays and SPMD: Partition arrays across multiple MATLAB® workers for data-parallel computing and simultaneous execution
- Batch Processing: Offload execution of a function or script to run in a cluster or desktop background
- GPU Computing: Transfer data between MATLAB® and a graphics processing unit (GPU); run code on a GPU
Usage examples
Parallel for loop example
parfor is a parallel for loop. The loop iterations are automatically split into chunks and distributed to the workers of the matlabpool.
Let's start with a simple program that uses a parfor loop to compute the values of a vector.
- myParforTest.m
    function myParforTest
    % we first open 4 workers
    matlabpool open 4
    tic
    parfor i = 1:1024
        A(i) = sin(i*2*pi/1024);
    end
    toc
    % we close the worker pool
    matlabpool close
    toc
    return
The parfor loop placed between matlabpool open 4 and matlabpool close runs in parallel.
Note that matlabpool open 4 starts a pool of 4 workers. At most 12 workers can be created, one per core (8 in older versions). matlabpool close shuts them down.
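As a complement to myParforTest, here is a minimal sketch that preallocates the output vector and checks the parfor result against a serial computation. The function name parforCheck and its argument are hypothetical and not part of the original example. If no pool is open, the loop runs serially on the client in R2011b (more recent releases may start a pool automatically), so the check works either way.

```matlab
function maxErr = parforCheck(n)
% Hypothetical helper: fill a vector with parfor and compare it with
% the same values computed serially.
A = zeros(1, n);              % preallocate the sliced output variable
parfor i = 1:n
    A(i) = sin(i*2*pi/n);     % each iteration writes only its own element
end
B = sin((1:n)*2*pi/n);        % same values, computed serially (vectorized)
maxErr = max(abs(A - B));     % largest difference between both results
end
```

Calling parforCheck(1024) between matlabpool open 4 and matlabpool close should return a value at rounding-error level, since every iteration is independent of the others.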
SPMD example
spmd executes code in parallel on the MATLAB pool. Inside the body of the spmd statement, each MATLAB worker has a unique value of labindex, while numlabs denotes the total number of workers executing the block in parallel.
Example: compute the arithmetic series
- spmdExample.m
    function spmdExample
    matlabpool open 4
    % This program computes the arithmetic series s = 1 + 2 + ... + n
    n = 10;
    % compute the parallel range beginning and end indices
    % (prange is a helper function that is not listed on this page)
    [nl,nb,ne] = prange(1, n, matlabpool('size'));
    spmd
        s = 0;   % initialize local sum
        for i = nb(labindex):ne(labindex)
            s = s + i;   % local sum on each worker
        end
        disp(['Local sum on worker ' num2str(labindex) ' is ' num2str(s)])
        s = gplus(s);   % compute global sum with gplus
    end   % spmd
    disp(['Global sum is ' num2str(s{1})])
    matlabpool close
Executing result:
    >> spmdExample
    Lab 1: Local sum on worker 1 is 6
    Lab 2: Local sum on worker 2 is 15
    Lab 3: Local sum on worker 3 is 15
    Lab 4: Local sum on worker 4 is 19
    Global sum is 55
The index range handled by each worker (nl elements, from index nb to index ne) is:
Worker | nl | nb | ne |
---|---|---|---|
1 | 3 | 1 | 3 |
2 | 3 | 4 | 6 |
3 | 2 | 7 | 8 |
4 | 2 | 9 | 10 |
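The prange helper called in spmdExample is not listed on this page. The sketch below is a hypothetical reconstruction that reproduces the table above for n = 10 and 4 workers; the real helper may be written differently. It assumes the range is cut into contiguous blocks whose sizes differ by at most one element, with the larger blocks given to the first workers.

```matlab
function [nl, nb, ne] = prange(n1, n2, nworkers)
% Hypothetical reconstruction: split the index range n1:n2 into
% nworkers contiguous blocks whose sizes differ by at most one.
n  = n2 - n1 + 1;              % total number of indices
q  = floor(n / nworkers);      % minimum block size
r  = mod(n, nworkers);         % the first r workers get one extra index
nl = q * ones(1, nworkers);
nl(1:r) = nl(1:r) + 1;         % block size for each worker
ne = n1 - 1 + cumsum(nl);      % last index of each block
nb = ne - nl + 1;              % first index of each block
end
```

With this definition, [nl,nb,ne] = prange(1, 10, 4) gives nl = [3 3 2 2], nb = [1 4 7 9] and ne = [3 6 8 10], which are the values in the table.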
Using GPU example
GPU Background
Originally used to accelerate graphics rendering, GPUs are now increasingly applied to scientific calculations. Unlike a traditional CPU, which includes no more than a handful of cores, a GPU has a massively parallel array of integer and floating-point processors, as well as dedicated, high-speed memory. A typical GPU comprises hundreds of these smaller processors. These processors can be used to greatly speed up particular types of applications.
A good rule of thumb is that your problem may be a good fit for the GPU if it is:
- Massively parallel: The computations can be broken down into hundreds or thousands of independent units of work. You will see the best performance when all of the cores are kept busy, exploiting the inherent parallel nature of the GPU. Seemingly simple, vectorized MATLAB calculations on arrays with hundreds of thousands of elements often can fit into this category.
- Computationally intensive: The time spent on computation significantly exceeds the time spent on transferring data to and from GPU memory. Because a GPU is attached to the host CPU via the PCI Express bus, the memory access is slower than with a traditional CPU. This means that your overall computational speedup is limited by the amount of data transfer that occurs in your algorithm.
With that background, we can now start working with the GPU in MATLAB.
Let's create a GPUArray and perform an FFT using the GPU. However, let's first do this on the CPU so that we can see the difference in code and performance.
A1 = rand(3000,3000); tic; B1 = fft(A1); toc;
To perform the same operation on the GPU, we first use gpuArray to transfer data from the MATLAB workspace to device memory. Then we can run fft, which is one of the overloaded functions on that data:
A2 = gpuArray(A1); tic; B2 = fft(A2); toc;
To bring the data back to the CPU, we run gather.
B2 = gather(B2);
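The three snippets above can be gathered into one function, sketched below. The name fftBenchSketch and its return values are hypothetical. The function skips the GPU part when gpuDeviceCount reports no device, and it also compares the GPU result with the CPU result. Note that timing GPU code with tic/toc alone can underestimate the cost when GPU calls return before the computation has finished; the sketch keeps tic/toc only to mirror the commands above.

```matlab
function [tCpu, tGpu, relErr] = fftBenchSketch(n)
% Hypothetical benchmark: time fft on the CPU and, if a GPU is
% available, on the GPU, then compare both results.
A1 = rand(n, n);
tic; B1 = fft(A1); tCpu = toc;     % CPU timing

tGpu   = NaN;                      % stays NaN when no GPU is present
relErr = NaN;
if gpuDeviceCount > 0
    A2 = gpuArray(A1);             % host to device transfer, not timed
    tic; B2 = fft(A2); tGpu = toc; % GPU timing, without data transfer
    B2 = gather(B2);               % device to host transfer
    relErr = norm(B1(:) - B2(:)) / norm(B1(:));
end
end
```

For example, [tCpu, tGpu, relErr] = fftBenchSketch(3000) uses the same matrix size as above; the ratio tCpu/tGpu is then the speedup without data transfer.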
Let's test our program on the tesla device.
We first need to connect to the tesla machine:
$ qlogin -q tesla -l h_vmem=2G
and then load the required modules, cuda and matlab:
$tesla% module load cuda matlab
    $ matlab -nodesktop -nodisplay
    >> fft_bench
    Elapsed time is 0.074823 seconds.   % CPU
    Elapsed time is 0.022858 seconds.   % GPU without data transfer
Speedup = time1/time2 = 0.074823/0.022858 ≈ 3.27
How to execute programs in parallel?
We test the parfor program myParforTest on mesoshared.
    $ module load matlab/r2011b
    $ matlab -nodesktop -nodisplay
    < M A T L A B (R) >
    Copyright 1984-2011 The MathWorks, Inc.
    R2011b (7.13.0.564) 64-bit (glnxa64)
    August 13, 2011
    To get started, type one of these: helpwin, helpdesk, or demo.
    For product information, visit www.mathworks.com.
    >> myParforTest
    Starting matlabpool using the 'local' configuration ... connected to 4 labs.
    Sending a stop signal to all the labs ... stopped.
    Elapsed time is 9.964229 seconds.
    >>
Using Matlab PCT on the Lumière cluster
Let's return to the parfor loop example, myParforTest.m.
We compile the function using the Matlab compiler (mcc).
$ module load matlab
$ mcc -m myParforTest.m
The compilation may take a few minutes.
The result is an executable program that does not depend on the Matlab license file, so we can run it as many times as needed.
This program uses 4 cores for the workers plus 1 core for the master program. To run it under SGE, we just need to request a parallel environment, as we do for OpenMP programs.
Here is the SGE example script:
    #!/bin/bash -l
    #$ -V
    #$ -N test_matlab_PTC
    #$ -cwd
    #$ -o $JOB_NAME.$JOB_ID.out
    #$ -pe openmp 5
    module load matlab/r2011b   # We need to load matlab to define the $MATLAB_HOME variable
    ./run_myParforTest.sh $MATLAB_HOME
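The example hard-codes 4 workers in the program and 5 slots in the job script, so the two numbers have to be kept in sync by hand. One possible alternative, sketched below, derives the pool size from the NSLOTS environment variable that SGE sets for jobs running in a parallel environment. The function name poolSizeFromSGE and the choice to reserve one slot for the master program are assumptions made for illustration; they are not part of the original example.

```matlab
function n = poolSizeFromSGE(defaultSize)
% Hypothetical helper: derive the number of workers from the SGE slot
% count, keeping one slot for the master program.
slots = str2double(getenv('NSLOTS'));   % NaN when NSLOTS is not set
if isnan(slots) || slots < 2
    n = defaultSize;                    % e.g. interactive run outside SGE
else
    n = slots - 1;                      % workers = slots minus the master
end
end
```

With -pe openmp 5 this returns 4, the pool size used in myParforTest. The result could then be passed to the function form of matlabpool, for instance matlabpool('open', poolSizeFromSGE(4)), before compiling the program with mcc.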
Using GPU Matlab with SGE
We use the Matlab Compiler to generate an executable program which will be executed on the tesla GPU.
Here is an SGE example:
    #!/bin/bash -l
    #$ -V
    #$ -N test_matlab_PTC
    #$ -cwd
    #$ -o $JOB_NAME.$JOB_ID.out
    ## request tesla
    #$ -q tesla.q
    #$ -l h_vmem=10G
    module load gpu/cuda/8.0
    module load matlab/r2015a
    ## first we compile the program
    mcc -m myGPUTest.m
    ## then we run the program
    ./run_myGPUTest.sh $MATLAB_HOME