The TAU performance system [13] provides robust technology for performance instrumentation, measurement, and analysis for complex parallel systems. It targets a general computation model consisting of shared-memory nodes where contexts reside, each providing a virtual address space shared by multiple threads of execution. The model is general enough to apply to many high-performance scalable parallel systems and programming paradigms. Because TAU enables performance information to be captured at the node/context/thread levels, this information can be mapped to the particular parallel software and system execution platform under consideration.
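The node/context/thread hierarchy can be pictured as a three-part label attached to every measurement. The following is a minimal sketch of that idea, not TAU's actual data layout: the `Sample` type, the `ANY_LEVEL` wildcard, and `total_seconds` are hypothetical names introduced here only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical wildcard: match every member of a level. */
enum { ANY_LEVEL = -1 };

/* One measurement, labeled with its position in the
 * node/context/thread hierarchy (illustrative layout only). */
typedef struct {
    int    node;     /* shared-memory node */
    int    context;  /* address space within the node */
    int    thread;   /* thread of execution within the context */
    double seconds;  /* measured time for this sample */
} Sample;

static int level_matches(int wanted, int actual) {
    return wanted == ANY_LEVEL || wanted == actual;
}

/* Sum the time of all samples that match the requested levels.
 * Passing ANY_LEVEL for a level rolls the data up over that level. */
double total_seconds(const Sample *samples, size_t count,
                     int node, int context, int thread) {
    assert(samples != NULL || count == 0);
    double sum = 0.0;
    for (size_t i = 0; i < count; i++) {
        const Sample *s = &samples[i];
        if (level_matches(node, s->node) &&
            level_matches(context, s->context) &&
            level_matches(thread, s->thread)) {
            sum += s->seconds;
        }
    }
    return sum;
}
```

Under these assumptions, a per-thread view fixes all three levels, while a per-node view passes `ANY_LEVEL` for the context and thread so that the data of every thread on that node is combined.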
TAU supports a flexible instrumentation model that allows access to a measurement API at several stages of program compilation and execution. The instrumentation identifies code segments, maps low-level execution events to high-level computation entities, and works with multi-threaded and message-passing parallel execution models. It interfaces with the TAU measurement model, which can capture data for function, method, basic block, and statement execution. TAU provides two measurement choices: profiling and tracing. Performance experiments can be composed from different measurement modules, including ones that access hardware performance monitors. The TAU data analysis and presentation utilities offer text-based and graphical tools to visualize the performance data, as well as bridges to third-party software such as Vampir [14] for sophisticated trace analysis and visualization.
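The difference between the two measurement choices can be illustrated with a small sketch: profiling keeps a running summary per event, whereas tracing keeps a time-stamped record of every enter and exit. The code below is an illustration under simplifying assumptions, not TAU's measurement layer. All names (`Measurement`, `record_enter`, `record_exit`) are hypothetical, timestamps are passed in explicitly, events are assumed not to nest recursively, and the data is not thread-safe.

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_EVENTS = 4, MAX_TRACE = 64 };

/* Profiling: one summary entry per event. */
typedef struct {
    double inclusive;  /* accumulated time between enter and exit */
    long   calls;      /* number of completed enter/exit pairs */
} ProfileEntry;

/* Tracing: one record per enter or exit. */
typedef struct {
    int    event_id;
    int    is_enter;   /* 1 for enter, 0 for exit */
    double time;
} TraceRecord;

typedef struct {
    ProfileEntry profile[MAX_EVENTS];
    double       entered[MAX_EVENTS];  /* pending enter timestamps */
    TraceRecord  trace[MAX_TRACE];
    size_t       trace_len;
} Measurement;

static void append_trace(Measurement *m, int id, int is_enter, double now) {
    if (m->trace_len < MAX_TRACE) {    /* drop records once the buffer is full */
        TraceRecord rec = { id, is_enter, now };
        m->trace[m->trace_len++] = rec;
    }
}

void record_enter(Measurement *m, int id, double now) {
    assert(m != NULL && id >= 0 && id < MAX_EVENTS);
    m->entered[id] = now;
    append_trace(m, id, 1, now);
}

void record_exit(Measurement *m, int id, double now) {
    assert(m != NULL && id >= 0 && id < MAX_EVENTS);
    m->profile[id].inclusive += now - m->entered[id];
    m->profile[id].calls++;
    append_trace(m, id, 0, now);
}
```

In this sketch the profile stays the same size however often an event occurs, while the trace grows with every enter and exit. That reflects the usual trade-off between the two: a compact summary versus a complete event history.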
As with EXPERT, TAU implements the OpenMP performance API in a library that captures the OpenMP events and uses TAU's performance measurement facility to record performance data. For example, the TAU implementation of the same pomp functions shown in Section 4.1 would look like the following:
TAU_GLOBAL_TIMER(tfor, "for enter/exit", "[OpenMP]", OpenMP);

void pomp_for_enter(OMPRegDescr* r) {
#ifdef TAU_AGGREGATE_OPENMP_TIMINGS
  /* construct-based view: one global timer for all "for" constructs */
  TAU_GLOBAL_TIMER_START(tfor);
#endif
#ifdef TAU_OPENMP_REGION_VIEW
  /* region-based view: timer selected through the region descriptor */
  TauStartOpenMPRegionTimer(r);
#endif
}

void pomp_for_exit(OMPRegDescr* r) {
#ifdef TAU_AGGREGATE_OPENMP_TIMINGS
  TAU_GLOBAL_TIMER_STOP();
#endif
#ifdef TAU_OPENMP_REGION_VIEW
  TauStopOpenMPRegionTimer(r);
#endif
}
TAU supports construct-based as well as region-based performance measurement. Construct-based measurement uses globally accessible timers to aggregate construct-specific performance cost over all regions. Region-based measurement, as in EXPERT, uses the region descriptor to select the specific performance data for that context. With this instrumentation approach, all of TAU's functionality is accessible to the user, including the ability to select profiling or tracing, enable hardware performance monitoring, and add MPI instrumentation for performance measurement of hybrid applications.
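To make the distinction between the two views concrete, the following sketch keeps one global timer for the "for" construct and one timer inside each region descriptor, and updates both on every enter and exit. It is a simplified illustration, not TAU's implementation: `Timer`, `RegionDescr`, `for_enter`, and `for_exit` are hypothetical stand-ins, timestamps are passed in explicitly, and the data is kept single-threaded, whereas a real OpenMP measurement library would need per-thread storage.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical timer: accumulates time over start/stop pairs. */
typedef struct {
    double total;    /* accumulated time */
    double started;  /* timestamp of the pending start */
    long   calls;    /* number of completed start/stop pairs */
} Timer;

/* Hypothetical region descriptor carrying per-region data. */
typedef struct {
    const char *name;   /* label for the region, e.g. its source location */
    Timer       timer;  /* region-based view: data for this region only */
} RegionDescr;

/* Construct-based view: a single timer shared by all "for" regions. */
Timer for_construct_timer;

static void timer_start(Timer *t, double now) {
    t->started = now;
}

static void timer_stop(Timer *t, double now) {
    t->total += now - t->started;
    t->calls++;
}

void for_enter(RegionDescr *r, double now) {
    assert(r != NULL);
    timer_start(&for_construct_timer, now);  /* aggregated over all regions */
    timer_start(&r->timer, now);             /* selected by the descriptor */
}

void for_exit(RegionDescr *r, double now) {
    assert(r != NULL);
    timer_stop(&for_construct_timer, now);
    timer_stop(&r->timer, now);
}
```

With two regions measured this way, the global timer reports the total cost of the construct across the program, while each descriptor reports only the cost of its own loop. This is the same split that the two preprocessor switches select in the TAU code above.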