The TAU performance system provides robust technology for performance instrumentation, measurement, and analysis for complex parallel systems. It targets a general computation model initially proposed by the HPC++ consortium. This model consists of shared-memory nodes where contexts reside, each providing a virtual address space shared by multiple threads of execution. The model is general enough to apply to many high-performance scalable parallel systems and programming paradigms. Because TAU enables performance information to be captured at the node/context/thread levels, this information can be flexibly mapped to the particular parallel software and system execution platform under consideration.
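To make the node/context/thread hierarchy concrete, the following minimal sketch keys per-thread measurements by a (node, context, thread) triple and rolls them up to a coarser level. It is an illustration only and does not use TAU's API: the `aggregate` function, the `samples` dictionary, and the timing values are hypothetical names and data introduced here.

```python
from collections import defaultdict
from typing import Dict, Tuple

# Hypothetical key for one measurement: (node, context, thread).
Key = Tuple[int, int, int]


def aggregate(samples: Dict[Key, float], level: str) -> Dict[tuple, float]:
    """Sum per-thread times up to the requested level of the hierarchy."""
    depth = {"node": 1, "context": 2, "thread": 3}[level]
    totals = defaultdict(float)
    for key, seconds in samples.items():
        # Truncating the key drops the finer-grained levels.
        totals[key[:depth]] += seconds
    return dict(totals)


# Illustrative data: two nodes, node 0 holds two contexts.
samples = {
    (0, 0, 0): 1.5,
    (0, 0, 1): 2.0,
    (0, 1, 0): 0.5,
    (1, 0, 0): 3.0,
}

by_node = aggregate(samples, "node")
by_context = aggregate(samples, "context")
```

Because every measurement carries the full triple, the same data can be viewed per thread, per context, or per node without recollecting it, which is the kind of flexible mapping the model is meant to allow.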
TAU supports a flexible instrumentation model that allows access to a measurement API at several stages of program compilation and execution. The instrumentation identifies code segments, provides for mapping of low-level execution events to high-level computation entities, and works with multi-threaded and message-passing parallel execution models. It interfaces with the TAU measurement model, which can capture data for function, method, basic block, and statement execution. TAU provides two measurement options: profiling and tracing. Performance experiments can be composed from different measurement modules, including ones that access hardware performance monitors. The TAU data analysis and presentation utilities are open; they offer text-based and graphical tools to visualize the performance data, as well as bridges to third-party software such as Vampir [9,12] for sophisticated trace analysis and visualization.
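The difference between the two measurement options can be sketched as follows. This is a simplified illustration under stated assumptions, not TAU's measurement API: the `instrument` decorator, the `profile` and `trace` containers, and the `inner`/`outer` functions are hypothetical. Profiling keeps aggregated statistics per instrumented function, while tracing keeps a time-stamped log of individual entry and exit events.

```python
import time
from collections import defaultdict
from functools import wraps

# Profile: aggregated statistics per function (hypothetical layout).
profile = defaultdict(lambda: {"calls": 0, "inclusive": 0.0})
# Trace: ordered list of (event, function name, timestamp) records.
trace = []


def instrument(func):
    """Record both a profile entry and trace events for each call."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        trace.append(("enter", func.__name__, start))
        try:
            return func(*args, **kwargs)
        finally:
            stop = time.perf_counter()
            trace.append(("exit", func.__name__, stop))
            entry = profile[func.__name__]
            entry["calls"] += 1
            # Inclusive time covers the function and everything it calls.
            entry["inclusive"] += stop - start
    return wrapper


@instrument
def inner():
    return sum(range(1000))


@instrument
def outer():
    return inner() + inner()


outer()
```

After the run, `profile` holds one summary record per function regardless of how many calls occurred, whereas `trace` grows with every event and preserves their order. That trade-off between compact summaries and detailed event histories is what distinguishes the two modes.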