Performance counters exist on many modern microprocessors. They can
count hardware performance events, such as cache misses and floating
point operations, while the program executes on the processor. The
Performance Application Programming Interface (PAPI)
package provides a uniform interface to access these performance
counters.
To use these counters, first find out which PAPI events your system supports by running papi_avail:
    %> papi_avail
    Available events and hardware information.
    -------------------------------------------------------------------------
    Vendor string and code   : AuthenticAMD (2)
    Model string and code    : AMD K8 Revision C (15)
    CPU Revision             : 2.000000
    CPU Megahertz            : 2592.695068
    CPU's in this Node       : 4
    Nodes in this System     : 1
    Total CPU's              : 4
    Number Hardware Counters : 4
    Max Multiplex Counters   : 32
    -------------------------------------------------------------------------
    The following correspond to fields in the PAPI_event_info_t structure.

    Name         Code        Avail  Deriv  Description (Note)
    PAPI_L1_DCM  0x80000000  Yes    Yes    Level 1 data cache misses
    PAPI_L1_ICM  0x80000001  Yes    Yes    Level 1 instruction cache misses
    ...
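When the event list is long, it can help to reduce it to just the names of the events that are available. The following is a minimal sketch, not part of PAPI: the function name available_events is hypothetical, and it assumes the five-column layout shown above (Name, Code, Avail, Deriv, Description), which other PAPI versions may not use.

```shell
#!/bin/sh
# Print the names of preset events whose "Avail" column reads "Yes".
# Assumption: the column order is Name Code Avail Deriv Description,
# as in the listing above; other PAPI versions may differ.
available_events() {
    awk '$1 ~ /^PAPI_/ && $3 == "Yes" { print $1 }'
}

# Demonstration on rows copied from the listing above.
available_events <<'EOF'
Name         Code        Avail  Deriv  Description (Note)
PAPI_L1_DCM  0x80000000  Yes    Yes    Level 1 data cache misses
PAPI_L1_ICM  0x80000001  Yes    Yes    Level 1 instruction cache misses
EOF
```

On a system with PAPI installed, the same function could be fed directly from the tool, for example with papi_avail | available_events.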
Next, check that the metrics you want PAPI to count are compatible with
one another by using papi_event_chooser:
    papi/utils> papi_event_chooser PAPI_LD_INS PAPI_SR_INS PAPI_L1_DCM
    Test case eventChooser: Available events which can be added with given events.
    -------------------------------------------
    Vendor string and code   : GenuineIntel (1)
    Model string and code    : Itanium 2 (2)
    CPU Revision             : 1.000000
    CPU Megahertz            : 1500.000000
    CPU's in this Node       : 16
    Nodes in this System     : 1
    Total CPU's              : 16
    Number Hardware Counters : 4
    Max Multiplex Counters   : 32
    -------------------------------------------
    Event PAPI_L1_DCM can't be counted with others
Here the event chooser tells us that these three metrics (PAPI_LD_INS, PAPI_SR_INS, and PAPI_L1_DCM) are incompatible: PAPI_L1_DCM cannot be counted together with the other two. Let's try again, this time removing PAPI_L1_DCM:
    papi/utils> papi_event_chooser PAPI_LD_INS PAPI_SR_INS
    Test case eventChooser: Available events which can be added with given events.
    -------------------------------------------
    Vendor string and code   : GenuineIntel (1)
    Model string and code    : Itanium 2 (2)
    CPU Revision             : 1.000000
    CPU Megahertz            : 1500.000000
    CPU's in this Node       : 16
    Nodes in this System     : 1
    Total CPU's              : 16
    Number Hardware Counters : 4
    Max Multiplex Counters   : 32
    -------------------------------------------
    Usage: eventChooser NATIVE|PRESET evt1 evet2 ...
The event chooser reports no conflict this time, so PAPI_LD_INS and PAPI_SR_INS are compatible metrics.
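When many candidate metrics are being tried, the conflict line can be picked out of the event chooser output automatically. The sketch below is an illustration only: the function name conflicting_events is hypothetical, and it assumes the message wording "Event ... can't be counted with others" seen in the sample output above, which may differ between PAPI versions.

```shell
#!/bin/sh
# Print the event named in each "can't be counted with others" line.
# Assumption: the message wording matches the sample output above.
# The apostrophe is matched with "." because the awk program is
# enclosed in single quotes.
conflicting_events() {
    awk '/^Event .* can.t be counted with others/ { print $2 }'
}

# Demonstration on lines copied from the first event chooser run above.
conflicting_events <<'EOF'
Number Hardware Counters : 4
Max Multiplex Counters   : 32
-------------------------------------------
Event PAPI_L1_DCM can't be counted with others
EOF
```

With PAPI installed, the event chooser output could be piped into the function, for example papi_event_chooser PAPI_LD_INS PAPI_SR_INS PAPI_L1_DCM | conflicting_events. An empty result would then suggest that no conflict was reported.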
Next, make sure that you are using a makefile with papi in its
name. Then set the environment variable TAU_METRICS to a
colon-delimited list of the PAPI metrics you would like to use.
setenv TAU_METRICS PAPI_FP_OPS\:PAPI_L1_DCM
In addition to PAPI counters, we support TIME (via Unix gettimeofday). On Linux and CrayCNL systems we provide the high-resolution LINUXTIMERS metric, and on BGL/BGP systems we provide BGLTIMERS and BGPTIMERS.
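The setenv command above is csh syntax. For sh or bash users, the following sketch builds the same kind of colon-delimited list and exports it. The helper name join_metrics is hypothetical, and listing TIME alongside the PAPI events is an assumption based on the paragraph above.

```shell
#!/bin/sh
# Join the given metric names with colons, the separator used in TAU_METRICS.
# "$*" joins the arguments with the first character of IFS; the subshell
# keeps the IFS change from leaking into the rest of the script.
join_metrics() {
    ( IFS=:; printf '%s\n' "$*" )
}

# Assumption: TIME may be combined with PAPI events in one list.
TAU_METRICS=$(join_metrics TIME PAPI_FP_OPS PAPI_L1_DCM)
export TAU_METRICS
echo "$TAU_METRICS"   # prints TIME:PAPI_FP_OPS:PAPI_L1_DCM
```

Building the list from separate arguments avoids typing the colons by hand and makes it easy to drop a metric that the event chooser reports as incompatible.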