HPCToolkit

HPCToolkit is a suite of tools for tracing, profiling and analyzing parallel programs. It can accurately measure a program's amount of work and resource consumption, as well as user-defined derived metrics such as FLOPS inefficiency and (lack of) scaling behavior. These metrics can then be correlated with source code to pinpoint hotspots.

The following versions are available:

  • 5.4.2 with GCC, OpenMPI and PAPI
  • 5.4.2 with Intel, OpenMPI and PAPI

To list all installed versions, use:

$ module avail perf/hpctoolkit

Then load the version you need, for example:

$ module load perf/hpctoolkit/gcc/5.4.2
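Before starting a long job, it can help to confirm that loading the module actually put the HPCToolkit commands on your PATH. The following is a minimal sketch. The helper name check_tools is hypothetical, and the list of commands is simply the set used on this page.

```shell
#!/bin/sh
# Sketch (hypothetical helper name): check that the HPCToolkit commands
# used on this page are on PATH after "module load perf/hpctoolkit/...".
# Prints "missing: NAME" for each command not found; returns 1 if any is missing.
check_tools() {
    local missing tool
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf 'missing: %s\n' "$tool"
            missing=1
        fi
    done
    return "$missing"
}
```

After sourcing the function (for example at the top of a job script), run:

$ module load perf/hpctoolkit/gcc/5.4.2
$ check_tools hpcstruct hpcrun hpcprof hpcviewer hpctraceviewer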

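Analyzing an application with HPCToolkit involves several steps. First, compile it with debugging information. Then recover its static structure (hpcstruct), run it under hpcrun to collect measurements, and combine the measurements into a database (hpcprof). Finally, display the profile and trace data (hpcviewer, hpctraceviewer).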

Dynamically linked applications

Because HPCToolkit is based on sampling, the source code does not need to be instrumented manually, and the build stays essentially unchanged. It is, however, highly recommended to compile the target program with debugging information (-g) and with optimization turned on (e.g. -O3). This way the measurements reflect the optimized code and can still be mapped back to source lines:

$ mpicc -g -O3 cpi.c -o cpi 

Recover static program structure

Next, we must recover the static program structure from the linked binary, for which there is a tool named hpcstruct, typically launched with no extra arguments:

$ hpcstruct ./cpi 

This writes a description of the program's structure (e.g. loop nesting, inlined code) to cpi.hpcstruct. hpcprof later uses this file during analysis, so that performance metrics can be attributed to the correct code construct, be it a loop or a procedure.
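Since cpi.hpcstruct describes one particular binary, it should be regenerated whenever the program is recompiled. The minimal sketch below reruns hpcstruct only when the structure file is missing or older than the executable. The helper names are hypothetical, and it assumes the default output name seen above (./cpi produces cpi.hpcstruct in the current directory).

```shell
#!/bin/sh
# Sketch (hypothetical helper names): an .hpcstruct file describes one
# particular binary, so it should be regenerated after every rebuild.

# struct_is_stale BINARY STRUCTFILE
# Succeeds if STRUCTFILE does not exist or is older than BINARY.
struct_is_stale() {
    [ ! -e "$2" ] || [ "$1" -nt "$2" ]
}

# refresh_struct BINARY
# Reruns hpcstruct only when needed. Assumes the default output name
# <basename>.hpcstruct in the current directory (cpi -> cpi.hpcstruct).
refresh_struct() {
    local structfile
    structfile="$(basename "$1").hpcstruct"
    if struct_is_stale "$1" "$structfile"; then
        hpcstruct "$1"
    fi
}
```

Usage, after rebuilding the program:

$ refresh_struct ./cpi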

Measure performance

Run the application as usual, but launch the executable through hpcrun (in addition to mpirun):

$ mpirun -np 8 hpcrun <hpcrun-args> ./cpi

The arguments passed to hpcrun define which events are sampled and how often. To list the events available on the current system, use:

$ hpcrun -L ./cpi

Events are passed as arguments of the form --event E@P, where:

  • E: event name (WALLCLOCK, MEMLEAK, PAPI_TOT_CYC, etc.);
  • P: sampling period, in units meaningful to the event: microseconds for WALLCLOCK, cycles for PAPI_TOT_CYC, L2 data cache accesses for PAPI_L2_DCA, etc.

To sample several events in one run, repeat the --event option.

For example:

$ mpirun -np 16 hpcrun --event PAPI_L2_TCM@10000 --event PAPI_L2_DCA@10000 ./cpi

This creates a directory named hpctoolkit-cpi-measurements that holds the measurement data of every MPI rank.
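When several events are sampled (for instance from a job script), the --event E@P syntax described above is easy to get wrong. The following sketch uses a hypothetical helper, event_args, to assemble the options from a list of EVENT@PERIOD specs and to reject malformed ones before the job starts. It checks syntax only. Whether an event actually exists is still best checked with hpcrun -L. It also requires an explicit numeric period for every event.

```shell
#!/bin/sh
# Sketch (hypothetical helper name): turn EVENT@PERIOD specs into a string
# of repeated "--event EVENT@PERIOD" options for hpcrun. Only the syntax is
# checked; whether an event exists is best checked with "hpcrun -L".
event_args() {
    local spec name period args
    args=""
    for spec in "$@"; do
        case "$spec" in
            *@*) ;;
            *) printf 'event_args: %s lacks @PERIOD\n' "$spec" >&2; return 1 ;;
        esac
        name=${spec%@*}      # text before the last @
        period=${spec##*@}   # text after the last @
        case "$name" in
            ''|*[!A-Za-z0-9_:.-]*)
                printf 'event_args: bad event name in %s\n' "$spec" >&2; return 1 ;;
        esac
        case "$period" in
            ''|*[!0-9]*)
                printf 'event_args: period in %s is not a number\n' "$spec" >&2; return 1 ;;
        esac
        args="$args --event $spec"
    done
    printf '%s\n' "${args# }"
}
```

In a job script, the result can be used unquoted so that it splits into separate arguments. This is safe here because validated specs contain no whitespace or glob characters:

EVENTS=$(event_args PAPI_L2_TCM@10000 PAPI_L2_DCA@10000) || exit 1
mpirun -np 16 hpcrun $EVENTS ./cpi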

Analyze measurements

Next, hpcprof combines the measurements with the program structure to produce the final performance database:

$ hpcprof -S cpi.hpcstruct -I ./'*' hpctoolkit-cpi-measurements

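The -I option should point to the directory containing the program's source code. The trailing '*' is quoted so that the shell does not expand it, and it makes hpcprof search that directory recursively. The resulting database is written to hpctoolkit-cpi-database.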
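hpcprof can only analyze what hpcrun wrote. As a quick sanity check on a run, you can count the profile files in the measurements directory. The sketch below assumes that profile files end in .hpcrun, with at least one file per MPI process (additional threads can add more). The helper name count_profiles is hypothetical.

```shell
#!/bin/sh
# Sketch (hypothetical helper name): count the *.hpcrun profile files in
# an HPCToolkit measurements directory. Assumes the .hpcrun extension;
# expect at least one file per MPI process (threads can add more).
count_profiles() {
    local n f
    n=0
    for f in "$1"/*.hpcrun; do
        # An unmatched glob stays literal, so check that the file exists.
        if [ -e "$f" ]; then
            n=$((n + 1))
        fi
    done
    printf '%s\n' "$n"
}
```

Usage, for the 16-rank example above (a result below 16 suggests that some ranks did not write a profile):

$ count_profiles hpctoolkit-cpi-measurements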

Display results

Display results using hpcviewer:

$ hpcviewer hpctoolkit-cpi-database

Tracing

To collect trace data as well, pass the -t flag to hpcrun, then run hpcprof on the new measurements as described above. Finally, point hpctraceviewer to the resulting database folder:

$ hpctraceviewer hpctoolkit-cpi-database
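The command-line part of the workflow on this page (hpcstruct, hpcrun with optional -t, hpcprof) can be collected into one function for job scripts. The following is a minimal sketch, not an official HPCToolkit script. The names hpctk_profile, run, DRY_RUN and TRACE are hypothetical. The sketch also assumes that hpcrun uses the default measurements directory name seen above (hpctoolkit-cpi-measurements for ./cpi); adjust it if your run produced a different name. The interactive viewers are left out.

```shell
#!/bin/sh
# Sketch (hypothetical helper names): the command-line workflow from this
# page as one function. Viewers are interactive and therefore left out.
#   DRY_RUN=1  print the commands instead of executing them
#   TRACE=1    pass -t to hpcrun so that trace data is recorded too

# Execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Usage: hpctk_profile NPROCS EXECUTABLE SRCDIR [EVENT@PERIOD ...]
hpctk_profile() {
    if [ $# -lt 3 ]; then
        echo "usage: hpctk_profile NPROCS EXECUTABLE SRCDIR [EVENT@PERIOD ...]" >&2
        return 2
    fi
    local nprocs exe src name spec
    nprocs=$1
    exe=$2
    src=$3
    shift 3
    name=$(basename "$exe")

    # Turn each EVENT@PERIOD into "--event EVENT@PERIOD" (the for list is
    # expanded once, so rewriting "$@" inside the loop is safe).
    for spec in "$@"; do
        set -- "$@" --event "$spec"
        shift
    done
    if [ "${TRACE:-0}" = 1 ]; then
        set -- -t "$@"
    fi

    run hpcstruct "$exe" || return 1
    run mpirun -np "$nprocs" hpcrun "$@" "$exe" || return 1
    # Assumes hpcrun used the default directory name seen on this page;
    # the quoted '*' is passed to hpcprof literally.
    run hpcprof -S "$name.hpcstruct" -I "$src/*" "hpctoolkit-$name-measurements"
}
```

For example, DRY_RUN=1 TRACE=1 hpctk_profile 8 ./cpi . PAPI_TOT_CYC@1000000 prints the three commands without running them. Without DRY_RUN, the function executes them, after which hpcviewer or hpctraceviewer can be opened on hpctoolkit-cpi-database.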

Links