perf - Performance analysis tools for Linux
Performance counters for Linux are a new kernel-based subsystem that provides a framework for all things performance analysis. It covers hardware-level features (the CPU's PMU, or Performance Monitoring Unit) as well as software features (software counters, tracepoints).
Listing Events
$ perf list
List of pre-defined events (to be used in -e):

  cpu-cycles OR cycles                       [Hardware event]
  instructions                               [Hardware event]
  cache-references                           [Hardware event]
  cache-misses                               [Hardware event]
  branch-instructions OR branches            [Hardware event]
  branch-misses                              [Hardware event]
  ...
  ...
Counting Events
'perf stat' runs a command and collects Linux performance statistics while that command executes.
Example: CPU counter statistics for the specified command:
$ perf stat ./ser_matmul

 Performance counter stats for './ser_matmul':

      10617,167685 task-clock                #    1,000 CPUs utilized
                54 context-switches          #    0,005 K/sec
                27 CPU-migrations            #    0,003 K/sec
             6 306 page-faults               #    0,594 K/sec
    28 119 617 371 cycles                    #    2,649 GHz                      [83,34%]
    23 887 379 283 stalled-cycles-frontend   #   84,95% frontend cycles idle     [83,33%]
    16 806 041 279 stalled-cycles-backend    #   59,77% backend cycles idle      [66,65%]
     7 586 969 293 instructions              #    0,27  insns per cycle
                                             #    3,15  stalled cycles per insn  [83,33%]
     1 085 642 258 branches                  #  102,253 M/sec                    [83,34%]
         1 188 913 branch-misses             #    0,11% of all branches          [83,34%]

      10,620474819 seconds time elapsed
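The annotations after the '#' are derived from the raw counts, and it can help to recompute them by hand. The following is a minimal sketch, not part of perf itself: the `ratio` helper and the variable names are made up for illustration, and the counter values are copied from the sample output above. Dividing instructions by cycles should give the "insns per cycle" figure, and dividing the frontend stall count by instructions appears to reproduce the "stalled cycles per insn" figure for this run.

```shell
# Hypothetical helper (not a perf command): divide two raw counter values
# and print the quotient with two decimals. LC_ALL=C forces a "." decimal
# point, whereas the sample output above was produced in a "," locale.
ratio() {
    LC_ALL=C awk -v n="$1" -v d="$2" \
        'BEGIN { if (d == 0) { print "n/a"; exit }; printf "%.2f\n", n / d }'
}

# Raw counts copied from the sample run, digit grouping removed.
cycles=28119617371
instructions=7586969293
stalled_frontend=23887379283

ratio "$instructions" "$cycles"            # instructions per cycle
ratio "$stalled_frontend" "$instructions"  # stalled cycles per instruction
```

With the sample numbers this prints 0.27 and 3.15, which match the annotations above. A value this far below 1 instruction per cycle, together with the high share of idle frontend cycles, suggests the program spends most of its time stalled rather than retiring instructions.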
Various CPU level 1 data cache statistics for the specified command:
$ perf stat -e L1-dcache-loads,L1-dcache-load-misses,L1-dcache-stores ./ser_matmul

 Performance counter stats for './ser_matmul':

     3 237 999 611 L1-dcache-loads
     1 639 446 360 L1-dcache-misses          #   50,63% of all L1-dcache hits
        12 950 108 L1-dcache-stores

      10,537918325 seconds time elapsed
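Despite its wording ("of all L1-dcache hits"), the percentage in this output matches the load-miss count divided by the load count. The sketch below checks that arithmetic on the counts as printed, including their digit grouping. The helper names `digits` and `miss_pct` are hypothetical and only serve this illustration.

```shell
# Hypothetical helper: drop everything except digits, so a count printed
# with locale-specific grouping ("3 237 999 611") becomes a plain integer.
digits() {
    printf '%s' "$1" | tr -cd '0-9'
}

# Hypothetical helper: print misses as a percentage of loads, two decimals.
# LC_ALL=C forces a "." decimal point in the result.
miss_pct() {
    LC_ALL=C awk -v m="$(digits "$1")" -v l="$(digits "$2")" \
        'BEGIN { if (l == 0) { print "n/a"; exit }; printf "%.2f\n", 100 * m / l }'
}

# Counts copied verbatim from the sample output: misses first, then loads.
miss_pct "1 639 446 360" "3 237 999 611"
```

For the sample run this prints 50.63, meaning roughly every second L1 data cache load missed.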