The diversity of component implementations demands a robust parallel performance measurement system. The TAU project has developed performance technology for complex parallel and distributed systems, built on a general computation model and a modular framework for performance observation and analysis. The model consists of shared-memory computing nodes on which contexts reside, each context providing a virtual address space shared by multiple threads of execution. It is general enough to apply to many high-performance scalable parallel systems and programming paradigms. Because TAU captures performance information at the node, context, and thread levels, this information can be mapped to the particular parallel software and system execution platform under consideration.
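To make the node/context/thread hierarchy concrete, the following minimal sketch stores timings per (node, context, thread) location and rolls them up to a coarser level. It is an illustration only: `ProfileStore`, `record`, and `rollup` are hypothetical names and do not reflect TAU's actual data structures or file formats.

```python
from collections import defaultdict
from typing import Dict, Tuple

# Hypothetical location key: (node, context, thread). Not TAU's data format.
Location = Tuple[int, int, int]


class ProfileStore:
    """Illustrative container for per-thread event timings."""

    def __init__(self) -> None:
        # location -> event name -> accumulated seconds
        self._data: Dict[Location, Dict[str, float]] = defaultdict(
            lambda: defaultdict(float)
        )

    def record(self, node: int, context: int, thread: int,
               event: str, seconds: float) -> None:
        """Accumulate time for an event at one thread-level location."""
        self._data[(node, context, thread)][event] += seconds

    def rollup(self, level: str) -> Dict[tuple, Dict[str, float]]:
        """Sum thread-level data up to 'thread', 'context', or 'node' level."""
        depth = {"node": 1, "context": 2, "thread": 3}[level]
        out: Dict[tuple, Dict[str, float]] = {}
        for loc, events in self._data.items():
            # Truncating the key merges all locations that share a prefix.
            bucket = out.setdefault(loc[:depth], {})
            for name, secs in events.items():
                bucket[name] = bucket.get(name, 0.0) + secs
        return out


store = ProfileStore()
store.record(0, 0, 0, "solve", 1.5)  # node 0, context 0, thread 0
store.record(0, 0, 1, "solve", 2.0)  # same context, second thread
store.record(0, 1, 0, "solve", 0.5)  # second context on node 0
store.record(1, 0, 0, "solve", 4.0)  # a different node

by_node = store.rollup("node")
by_context = store.rollup("context")
```

Keying every measurement by its full location and aggregating afterwards is one way to let the same data be viewed per thread, per process-like context, or per node, depending on how the target platform maps onto the model.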
The TAU performance system supports a flexible instrumentation model that applies at different stages of program compilation and execution. The instrumentation targets multiple code points, maps low-level execution events to higher-level performance abstractions, and works with multi-threaded and message-passing parallel computation models. Instrumentation code makes calls to TAU's measurement API. The measurement library implements profiling and tracing support for performance events occurring at the function, method, basic block, and statement levels during execution. Performance experiments can be composed from different measurement modules (e.g., hardware performance monitors), and measurements can be collected with respect to user-defined performance groups. The TAU data analysis and presentation utilities offer text-based and graphical tools to visualize the performance data, as well as bridges to third-party software, such as Vampir, for sophisticated trace analysis and visualization.
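The relationship between instrumentation code, a measurement API, and user-defined performance groups can be sketched as follows. This is a simplified stand-in, not TAU's API: the `Measurement` class, its `instrument` decorator, and the group names `compute` and `io` are all assumptions introduced for illustration.

```python
import time
from collections import defaultdict
from functools import wraps


class Measurement:
    """Hypothetical stand-in for a measurement library; not TAU's API."""

    def __init__(self, enabled_groups=None, clock=time.perf_counter):
        # None means every group is measured.
        self.enabled_groups = enabled_groups
        self.clock = clock
        self.calls = defaultdict(int)        # event name -> call count
        self.inclusive = defaultdict(float)  # event name -> inclusive seconds

    def enabled(self, group):
        return self.enabled_groups is None or group in self.enabled_groups

    def instrument(self, name, group="default"):
        """Wrap a function so each call is timed if its group is enabled."""
        def decorate(fn):
            @wraps(fn)
            def wrapper(*args, **kwargs):
                if not self.enabled(group):
                    # Disabled groups skip measurement entirely.
                    return fn(*args, **kwargs)
                start = self.clock()
                try:
                    return fn(*args, **kwargs)
                finally:
                    self.calls[name] += 1
                    self.inclusive[name] += self.clock() - start
            return wrapper
        return decorate


# Only events in the (hypothetical) "compute" group are measured.
m = Measurement(enabled_groups={"compute"})


@m.instrument("dot", group="compute")
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


@m.instrument("log", group="io")
def log(msg):
    return len(msg)


result = dot([1, 2, 3], [4, 5, 6])
log("ignored by the profiler")
```

Here the decorator plays the role of inserted instrumentation, and the group check shows how an experiment might restrict measurement to selected parts of a program. A real system would also need per-thread storage and exclusive-time accounting, which this sketch omits.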