Measuring performance in a high-performance scientific environment requires a component that can both interact with the system's hardware and time events of interest. Our performance measurement system uses the TAU component, which is built on the TAU measurement library [18,19]. The TAU component is accessed through a MeasurementPort, which defines interfaces for timing, event management, timer control, and measurement query. The timing interface provides a means to create, name, start, stop, and group timers. Performance data is associated with a code region by bracketing that region with start and stop calls.
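The bracketing idea can be illustrated with a minimal sketch. The `Timer` class below is hypothetical and is not the MeasurementPort or TAU API; it only assumes that a timer has a name, a group, and an accumulated wall-clock time that grows each time a start/stop pair brackets a code region.

```python
import time


class Timer:
    """Hypothetical timer (not the TAU API): a name, a group, accumulated time."""

    def __init__(self, name, group="default"):
        self.name = name
        self.group = group
        self.calls = 0          # number of completed start/stop pairs
        self.elapsed = 0.0      # accumulated wall-clock seconds
        self._started_at = None

    def start(self):
        if self._started_at is not None:
            raise RuntimeError(f"timer {self.name!r} is already running")
        self._started_at = time.perf_counter()

    def stop(self):
        if self._started_at is None:
            raise RuntimeError(f"timer {self.name!r} was not started")
        self.elapsed += time.perf_counter() - self._started_at
        self._started_at = None
        self.calls += 1


# Bracket a code region with start and stop calls.
timer = Timer("solve", group="compute")
for _ in range(3):
    timer.start()
    sum(i * i for i in range(10_000))   # stand-in for the measured region
    timer.stop()

print(f"{timer.name} [{timer.group}]: {timer.calls} calls, {timer.elapsed:.6f} s")
```

Only wall-clock time is accumulated here; a real measurement library would also distinguish inclusive from exclusive time for nested timers.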
The TAU implementation of this generic performance component interface supports both profiling and tracing measurement options. Profiling records aggregate inclusive and exclusive wall-clock time, process virtual time, and hardware performance metrics such as data cache misses and floating-point instructions executed, or a combination of several such metrics. TAU relies on an external library such as PAPI or PCL to access low-level, processor-specific hardware performance metrics and low-latency timers. The event interface tracks atomic events at the application and runtime-system level. For each event of a given name, the minimum, maximum, mean, standard deviation, and number of entries are recorded. Timer control is achieved through the control interface, which can enable and disable all timers of a given group at runtime. For example, a user can enable or disable all MPI timers via their group identifier. The query interface provides a means for the program to access a collection of performance metrics. In our performance system, the query interface is used to obtain the current values of the metrics being measured. The TAU library also writes summary profile files at program termination.
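The per-event statistics and the group-based timer control described above can be sketched as follows. This is an illustration under stated assumptions, not TAU's implementation: the class names `AtomicEvent` and `GroupControl`, the `trigger` and `summary` methods, and the choice of the population (rather than sample) standard deviation are all hypothetical. The running mean and variance use Welford's online algorithm so that no individual samples need to be stored.

```python
import math


class AtomicEvent:
    """Hypothetical atomic event: keeps running statistics, not raw samples."""

    def __init__(self, name):
        self.name = name
        self.count = 0
        self.minimum = math.inf
        self.maximum = -math.inf
        self._mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the running mean

    def trigger(self, value):
        # Welford's online update of the mean and squared deviations.
        self.count += 1
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)
        delta = value - self._mean
        self._mean += delta / self.count
        self._m2 += delta * (value - self._mean)

    def summary(self):
        """Query-style snapshot of the current statistics."""
        if self.count == 0:
            raise ValueError(f"event {self.name!r} has no entries")
        return {
            "count": self.count,
            "min": self.minimum,
            "max": self.maximum,
            "mean": self._mean,
            # Assumption: population standard deviation (divide by n).
            "std_dev": math.sqrt(self._m2 / self.count),
        }


class GroupControl:
    """Hypothetical control interface: toggle whole timer groups at runtime."""

    def __init__(self):
        self._disabled = set()

    def disable(self, group):
        self._disabled.add(group)

    def enable(self, group):
        self._disabled.discard(group)

    def is_enabled(self, group):
        return group not in self._disabled


event = AtomicEvent("message size")
for value in (2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0):
    event.trigger(value)
stats = event.summary()

control = GroupControl()
control.disable("MPI")   # e.g. switch off every timer in the MPI group
```

In this sketch a measurement layer would consult `is_enabled(group)` before starting a timer, so disabling a group costs only a set lookup per call.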