Performance counters exist on many modern microprocessors. They can
count hardware performance events such as cache misses, floating point
operations, etc. while the program executes on the processor. The
Performance Data Standard and API (PAPI)
package provides a uniform interface to access these performance
counters.
To use these counters, first find out which PAPI events your system supports by running papi_avail:
%> papi_avail
Available events and hardware information.
-------------------------------------------------------------------------
Vendor string and code   : AuthenticAMD (2)
Model string and code    : AMD K8 Revision C (15)
CPU Revision             : 2.000000
CPU Megahertz            : 2592.695068
CPU's in this Node       : 4
Nodes in this System     : 1
Total CPU's              : 4
Number Hardware Counters : 4
Max Multiplex Counters   : 32
-------------------------------------------------------------------------
The following correspond to fields in the PAPI_event_info_t structure.

Name         Code        Avail  Deriv  Description (Note)
PAPI_L1_DCM  0x80000000  Yes    Yes    Level 1 data cache misses
PAPI_L1_ICM  0x80000001  Yes    Yes    Level 1 instruction cache misses
...
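To narrow the listing to the events the platform can actually count, the output can be filtered on the Avail column. The following is a minimal sketch that assumes the column layout shown above (Name, Code, Avail, Deriv, Description). The function name available_events is hypothetical, and other PAPI versions may print different columns.

```shell
# Print the names of preset events whose "Avail" column reads "Yes".
# Assumes the column order Name, Code, Avail, Deriv, Description;
# adjust the field number if your papi_avail prints other columns.
available_events() {
    awk '$1 ~ /^PAPI_/ && $3 == "Yes" { print $1 }'
}

# Typical use (requires papi_avail on the PATH):
#   papi_avail | available_events
```

The function reads from standard input, so it can also be run on a saved copy of the papi_avail output.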
Next, to check whether the metrics you want PAPI to count can be measured together,
use papi_event_chooser:
papi/utils> papi_event_chooser PAPI_LD_INS PAPI_SR_INS PAPI_L1_DCM
Test case eventChooser: Available events which can be added with given events.
-------------------------------------------
Vendor string and code   : GenuineIntel (1)
Model string and code    : Itanium 2 (2)
CPU Revision             : 1.000000
CPU Megahertz            : 1500.000000
CPU's in this Node       : 16
Nodes in this System     : 1
Total CPU's              : 16
Number Hardware Counters : 4
Max Multiplex Counters   : 32
-------------------------------------------
Event PAPI_L1_DCM can't be counted with others
Here the event chooser reports that PAPI_L1_DCM cannot be counted together with PAPI_LD_INS and PAPI_SR_INS, so the three metrics are incompatible. Let's try again, this time removing PAPI_L1_DCM:
papi/utils> papi_event_chooser PAPI_LD_INS PAPI_SR_INS
Test case eventChooser: Available events which can be added with given events.
-------------------------------------------
Vendor string and code   : GenuineIntel (1)
Model string and code    : Itanium 2 (2)
CPU Revision             : 1.000000
CPU Megahertz            : 1500.000000
CPU's in this Node       : 16
Nodes in this System     : 1
Total CPU's              : 16
Number Hardware Counters : 4
Max Multiplex Counters   : 32
-------------------------------------------
Usage: eventChooser NATIVE|PRESET evt1 evet2 ...
This time no conflict is reported, which verifies that PAPI_LD_INS and PAPI_SR_INS are compatible metrics.
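When several metric sets need to be checked, the test for the conflict message can be scripted. The sketch below assumes that an incompatible set is always reported with the "can't be counted" wording seen in the example output; that wording may differ between PAPI versions. The function name events_compatible is hypothetical.

```shell
# Succeed (exit status 0) when the chooser output read from standard
# input contains no "can't be counted" message, and fail otherwise.
# Assumption: conflicts are always reported with exactly this wording.
events_compatible() {
    ! grep -q "can't be counted"
}

# Typical use (requires papi_event_chooser on the PATH):
#   papi_event_chooser PAPI_LD_INS PAPI_SR_INS | events_compatible \
#       && echo "metrics can be counted together"
```

The check only looks for the conflict message, so other failures, such as a misspelled event name, would still need to be read from the tool's output.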
Next, make sure that you are using a makefile with papi in its name. Then set the environment variable TAU_METRICS to a colon-delimited list of the PAPI metrics you would like to use.
setenv TAU_METRICS PAPI_FP_OPS\:PAPI_L1_DCM
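The setenv line above uses csh/tcsh syntax. In Bourne-style shells such as sh or bash, the variable would be set with export instead. The sketch below also builds the colon-delimited value from a list of event names. The helper name join_metrics is hypothetical and not part of TAU.

```shell
# Join the given event names with ':' to form a TAU_METRICS value.
# "$*" joins the arguments using the first character of IFS; the
# subshell keeps the IFS change from leaking into the caller.
join_metrics() {
    ( IFS=:; printf '%s\n' "$*" )
}

TAU_METRICS=$(join_metrics PAPI_FP_OPS PAPI_L1_DCM)
export TAU_METRICS
echo "$TAU_METRICS"   # prints PAPI_FP_OPS:PAPI_L1_DCM
```

No backslash is needed before the colon here, since the colon has no special meaning in an assignment in these shells.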
In addition to PAPI counters, we support TIME (via Unix gettimeofday). On Linux and CrayCNL systems, we provide the high-resolution LINUXTIMERS metric, and on BGL/BGP systems we provide BGLTIMERS and BGPTIMERS.