HPCToolkit
HPCToolkit is a suite of tools for tracing, profiling and analyzing parallel programs. It accurately measures a program's work and resource consumption and supports user-defined derived metrics such as FLOPS inefficiency and (lack of) scaling behavior. These metrics can then be correlated with the source code to pinpoint hotspots.
Installed versions
- 5.4.2 with GCC and OpenMPI and PAPI
- 5.4.2 with Intel and OpenMPI and PAPI
Run the following to show all installed versions:
$ module avail perf/hpctoolkit
Then load the needed version, for example:
$ module load perf/hpctoolkit/gcc/5.4.2
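As a minimal sketch, the loading step can be wrapped in a shell function that also checks that the HPCToolkit commands used on this page ended up on the PATH. The function name load_hpctoolkit is hypothetical, and it assumes the module command (Environment Modules or Lmod) is available in your shell; the default module name is the example above.

```shell
#!/bin/sh
# Hypothetical helper (not part of HPCToolkit): load an HPCToolkit module
# and verify that the tools used on this page can be found afterwards.
# Usage: load_hpctoolkit [MODULE_NAME]
load_hpctoolkit() {
    # Default to the example module name from this page.
    modname="${1:-perf/hpctoolkit/gcc/5.4.2}"
    module load "$modname" || return 1
    for tool in hpcstruct hpcrun hpcprof hpctraceviewer; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "load_hpctoolkit: $tool not found after loading $modname" >&2
            return 1
        fi
    done
    return 0
}
```

Calling load_hpctoolkit with no argument loads the example module; pass another name shown by module avail to pick a different build.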
HPCToolkit usage and displaying tools
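Three steps are needed before profile and trace data can be displayed: recovering the program structure (hpcstruct), measuring the execution (hpcrun) and analyzing the measurements (hpcprof).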
Dynamically linked applications
As HPCToolkit is based on sampling, there is no need for manual source code instrumentation. Compilation remains mostly unchanged, but it is highly recommended to compile the target program with debugging information (-g) and optimization turned on:
$ mpicc -g -O3 cpi.c -o cpi
Recover static program structure
Next, we must recover the static program structure from the linked binary with a tool named hpcstruct, typically launched with no extra arguments:
$ hpcstruct ./cpi
This builds a representation of the program's structure (e.g. loop nesting, inlining) in cpi.hpcstruct. It is used later, when analyzing the measurements, so that performance metrics are associated with the correct code construct, be it a loop or a procedure.
Measure execution
Execution differs only in that hpcrun is used to launch the executable, placed between mpirun and the program:
$ mpirun -np 8 hpcrun <hpcrun-args> ./cpi
The arguments passed to hpcrun define which events are measured and how often they are sampled. By default, HPCToolkit comes with a handful of events; the list of available events may be obtained with the following command:
$ hpcrun -L ./cpi
Events are passed as arguments of the form --event E_I@P_I, where:
- E_I: event identifier (WALLCLOCK, MEMLEAK, etc.);
- P_I: sampling period, in units meaningful to the event: microseconds for WALLCLOCK, cycles for PAPI_TOT_CYC, L2 data cache accesses for PAPI_L2_DCA, etc.
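When several counters are sampled at once, the E_I@P_I arguments can also be assembled by a small helper. The sketch below is hypothetical (event_args is not an HPCToolkit command); it only checks that each period is a positive integer, not that the event exists, which is what hpcrun -L is for.

```shell
#!/bin/sh
# Hypothetical helper: turn "EVENT PERIOD" pairs into hpcrun --event arguments.
# Usage: event_args EVENT1 PERIOD1 [EVENT2 PERIOD2 ...]
event_args() {
    # Require a non-empty, even number of arguments.
    if [ $# -eq 0 ] || [ $(( $# % 2 )) -ne 0 ]; then
        echo "event_args: expected EVENT PERIOD pairs" >&2
        return 1
    fi
    out=""
    while [ $# -gt 0 ]; do
        ev="$1"; period="$2"; shift 2
        # Reject empty or non-numeric periods.
        case "$period" in
            ''|*[!0-9]*)
                echo "event_args: bad period '$period' for $ev" >&2
                return 1 ;;
        esac
        # Reject a zero period.
        if [ "$period" -eq 0 ]; then
            echo "event_args: period for $ev must be positive" >&2
            return 1
        fi
        out="$out --event $ev@$period"
    done
    # Drop the leading space before printing.
    printf '%s\n' "${out# }"
}
```

For instance, event_args PAPI_L2_TCM 10000 PAPI_L2_DCA 10000 prints --event PAPI_L2_TCM@10000 --event PAPI_L2_DCA@10000, which can be spliced into the launch line as hpcrun $(event_args ...) ./cpi. The substitution is deliberately left unquoted so that it splits into separate arguments, which is safe because event names contain no spaces.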
For example:
$ mpirun -np 16 hpcrun --event PAPI_L2_TCM@10000 --event PAPI_L2_DCA@10000 ./cpi
This creates a folder named hpctoolkit-cpi-measurements, with entries for every rank used during the run.
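Before analyzing, it can be worth checking that every rank wrote a profile. A sketch, assuming (as in HPCToolkit 5.x) that each profiled process or thread leaves one file ending in .hpcrun in the measurement folder; threads started by the MPI library may add extra files, so the check is "at least one per rank". check_measurements is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: verify that a measurement folder contains at least
# NRANKS *.hpcrun profile files (one or more per MPI rank is expected).
# Usage: check_measurements DIR NRANKS
check_measurements() {
    dir="$1"; nranks="$2"
    if [ ! -d "$dir" ]; then
        echo "check_measurements: no such directory: $dir" >&2
        return 1
    fi
    count=0
    for f in "$dir"/*.hpcrun; do
        # An unmatched glob stays literal, so test that the file exists.
        if [ -e "$f" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count profile file(s) in $dir"
    # Succeed only if every rank can have produced at least one profile.
    [ "$count" -ge "$nranks" ]
}
```

For the 16-rank run above, this would be called as check_measurements hpctoolkit-cpi-measurements 16.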
Analyze measurements
Finally, we combine the measurements with the program structure using hpcprof, obtaining the final profiling database:
$ hpcprof -S cpi.hpcstruct -I ./'*' hpctoolkit-cpi-measurements
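The parameter -I should point to the folder containing the program's source code; the trailing * requests a recursive search and is quoted to protect it from the shell. The resulting database is written to hpctoolkit-cpi-database.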
Display results
The profile database can be browsed with hpcviewer:
$ hpcviewer hpctoolkit-cpi-database
Tracing
To get tracing information, pass the flag -t to hpcrun.
After running hpcprof on the traced measurements as described above, point hpctraceviewer to the database folder:
$ hpctraceviewer hpctoolkit-cpi-database
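The whole traced pipeline can be strung together in one function. This is a sketch under stated assumptions rather than an official script: trace_workflow is a hypothetical name, it expects the HPCToolkit module to be loaded and the binary to be built with -g -O3, and it relies on the default output names used on this page (hpctoolkit-<exe>-measurements and hpctoolkit-<exe>-database). If your installation appends a suffix such as a job ID to these folders, adjust the paths.

```shell
#!/bin/sh
# Hypothetical helper: structure recovery, traced measurement and analysis,
# following the steps on this page. Assumes HPCToolkit's default output
# names (hpctoolkit-<exe>-measurements, hpctoolkit-<exe>-database).
# Usage: trace_workflow EXE NPROCS [hpcrun event args...]
trace_workflow() {
    if [ $# -lt 2 ]; then
        echo "usage: trace_workflow EXE NPROCS [hpcrun args...]" >&2
        return 1
    fi
    exe="$1"; nprocs="$2"; shift 2
    name=$(basename "$exe")
    meas="hpctoolkit-$name-measurements"
    db="hpctoolkit-$name-database"

    # 1. Recover the static program structure into <exe>.hpcstruct
    hpcstruct "$exe" || return 1
    # 2. Measure with tracing enabled (-t); remaining args select events
    mpirun -np "$nprocs" hpcrun -t "$@" "$exe" || return 1
    # 3. Combine measurements and structure into a database
    hpcprof -S "$name.hpcstruct" -I ./'*' "$meas" || return 1

    # Print the database folder to pass to hpctraceviewer
    echo "$db"
}
```

For example, trace_workflow ./cpi 16 --event WALLCLOCK@5000 (the period is only an illustration) runs the three steps and prints hpctoolkit-cpi-database, the folder to hand to hpctraceviewer.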