perf - Performance analysis tools for Linux

Performance Counters for Linux (perf) is a kernel-based subsystem that provides a framework for performance analysis. It covers hardware-level features (the CPU's PMU, Performance Monitoring Unit) as well as software features (software counters, tracepoints).

'perf list' lists the events that can be selected with the -e option.

$ perf list
List of pre-defined events (to be used in -e):
  cpu-cycles OR cycles                               [Hardware event]
  instructions                                       [Hardware event]
  cache-references                                   [Hardware event]
  cache-misses                                       [Hardware event]
  branch-instructions OR branches                    [Hardware event]
  branch-misses                                      [Hardware event]
  ...

'perf stat' runs a command and collects performance counter statistics while that command executes.

Example: CPU counter statistics for the specified command:

$ perf stat ./ser_matmul
 
Performance counter stats for './ser_matmul':
 
      10617,167685 task-clock                #    1,000 CPUs utilized          
                54 context-switches          #    0,005 K/sec                  
                27 CPU-migrations            #    0,003 K/sec                  
             6 306 page-faults               #    0,594 K/sec                  
    28 119 617 371 cycles                    #    2,649 GHz                     [83,34%]
    23 887 379 283 stalled-cycles-frontend   #   84,95% frontend cycles idle    [83,33%]
    16 806 041 279 stalled-cycles-backend    #   59,77% backend  cycles idle    [66,65%]
     7 586 969 293 instructions              #    0,27  insns per cycle        
                                             #    3,15  stalled cycles per insn [83,33%]
     1 085 642 258 branches                  #  102,253 M/sec                   [83,34%]
         1 188 913 branch-misses             #    0,11% of all branches         [83,34%]
 
      10,620474819 seconds time elapsed
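
The annotations after the '#' in the output above are ratios of the raw counters. The following Python sketch recomputes a few of them from the numbers shown, mainly to make clear which counter is divided by which. The helper parse_count is a hypothetical name introduced here, and it assumes the locale used in this output (spaces as thousands separators, a comma as the decimal mark). The formula for "stalled cycles per insn" is an assumption that happens to agree with the printed value.

```python
def parse_count(text: str) -> float:
    """Parse a number printed with space thousands separators and a decimal comma.

    Hypothetical helper; assumes the locale seen in the perf output above.
    """
    return float(text.replace(" ", "").replace(",", "."))


# Raw values copied from the 'perf stat ./ser_matmul' output above.
task_clock_ms = parse_count("10617,167685")
cycles = parse_count("28 119 617 371")
stalled_frontend = parse_count("23 887 379 283")
stalled_backend = parse_count("16 806 041 279")
instructions = parse_count("7 586 969 293")
branches = parse_count("1 085 642 258")
branch_misses = parse_count("1 188 913")

# cycles per millisecond, divided by 1e6, gives GHz.
ghz = cycles / (task_clock_ms * 1e6)

# Instructions retired per cycle ("insns per cycle").
ipc = instructions / cycles

# Share of cycles in which the frontend or backend was idle.
frontend_idle_pct = 100.0 * stalled_frontend / cycles
backend_idle_pct = 100.0 * stalled_backend / cycles

# Assumption: the larger of the two stall counters divided by instructions.
stalled_per_insn = max(stalled_frontend, stalled_backend) / instructions

# Share of branches that were mispredicted.
branch_miss_pct = 100.0 * branch_misses / branches

print(f"clock rate:              {ghz:.3f} GHz")
print(f"insns per cycle:         {ipc:.2f}")
print(f"frontend cycles idle:    {frontend_idle_pct:.2f}%")
print(f"backend cycles idle:     {backend_idle_pct:.2f}%")
print(f"stalled cycles per insn: {stalled_per_insn:.2f}")
print(f"branch misses:           {branch_miss_pct:.2f}% of all branches")
```

Rounded to the precision perf prints, these values agree with the annotations in the output. Roughly 0.27 instructions per cycle together with about 85% idle frontend cycles indicates that ./ser_matmul spends most of its time stalled rather than retiring instructions.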

Various CPU level 1 data cache statistics for the specified command:

$ perf stat -e L1-dcache-loads,L1-dcache-load-misses,L1-dcache-stores  ./ser_matmul
 
Performance counter stats for './ser_matmul':
 
     3 237 999 611 L1-dcache-loads
     1 639 446 360 L1-dcache-misses          #   50,63% of all L1-dcache hits
        12 950 108 L1-dcache-stores
 
      10,537918325 seconds time elapsed
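
As a quick check of the percentage in this output, the short Python sketch below divides the L1 data cache load misses by the L1 data cache loads. The function name miss_ratio_percent is hypothetical and used only for illustration; the counter values are copied from the output above.

```python
def miss_ratio_percent(misses: int, accesses: int) -> float:
    """Return misses as a percentage of accesses (0.0 if nothing was counted)."""
    if accesses == 0:
        return 0.0
    return 100.0 * misses / accesses


# Counter values copied from the perf stat output above.
l1_dcache_loads = 3_237_999_611
l1_dcache_load_misses = 1_639_446_360

l1_miss_pct = miss_ratio_percent(l1_dcache_load_misses, l1_dcache_loads)
print(f"L1 data cache load miss ratio: {l1_miss_pct:.2f}%")
```

Judging by these numbers, the figure perf labels "of all L1-dcache hits" uses the L1-dcache-loads count as its denominator: about half of all L1 data cache loads in ./ser_matmul miss the cache.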

Links