The computation model above is general enough to apply to many high-performance architectures as well as to different parallel programming paradigms. Particular instances of the model, and how they are programmed, define the requirements for performance tool technology. For any performance problem, a performance framework to address the problem should incorporate:
The performance framework and the models therein must be realized by tools implemented in the particular computational environment where the performance problem solving will be done. We have developed the TAU performance framework as an integrated toolkit for performance instrumentation, measurement, and analysis of parallel, multithreaded programs. It attempts to target the general complex system computation model while allowing flexible customization for system-specific needs.
Figure: Architecture of TAU Performance System
The TAU performance framework is shown in Figure 3.1. It is composed of instrumentation, measurement, and analysis and visualization phases. TAU implements a flexible instrumentation model that allows the user to insert performance instrumentation, which calls the TAU measurement API, at several stages of program compilation and execution. The instrumentation identifies code segments, provides mapping abstractions, and supports multithreaded and message-passing parallel execution models. Instrumentation can be inserted manually, or automatically with a source-to-source translation tool such as the one built on the Program Database Toolkit (PDT) program analysis facility. When source code is unavailable, TAU can use wrapper libraries to perform instrumentation, and it uses existing wrapper capabilities when possible, as in the case of MPI's profiling interface. Instrumentation can also be inserted at runtime, prior to execution, using the dynamic instrumentation system DyninstAPI [3,11], or at the virtual machine level, using language-supplied interfaces such as the Java Virtual Machine Profiler Interface [19,20]. When the instrumented application is compiled and executed, profiles or event traces are produced.
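To make the distinction between manual and wrapper-based instrumentation concrete, the following is a minimal Python sketch. It is not TAU's API: the names `instrument`, `wrap_library`, `TRACE`, `solve`, and `vendor_lib` are hypothetical and exist only for illustration. The sketch assumes that instrumentation amounts to recording an entry and an exit event around a routine, and that a wrapper library rebinds a routine's name to an instrumented version without touching its source.

```python
import functools
import time
import types

# Hypothetical in-memory trace buffer: (event kind, routine name, timestamp).
TRACE = []


def instrument(func, clock=time.perf_counter):
    """Return a wrapper that logs entry and exit events around func."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        TRACE.append(("enter", func.__name__, clock()))
        try:
            return func(*args, **kwargs)
        finally:
            # Recorded even if func raises, so every enter has a matching exit.
            TRACE.append(("exit", func.__name__, clock()))
    return wrapper


# Manual (source-level) instrumentation: the developer edits the source.
@instrument
def solve(n):
    return sum(i * i for i in range(n))


# Stand-in for a library routine whose source is unavailable (hypothetical).
def send(payload):
    return len(payload)


vendor_lib = types.SimpleNamespace(send=send)


def wrap_library(namespace, name):
    """Wrapper-style instrumentation: rebind a name to an instrumented version."""
    setattr(namespace, name, instrument(getattr(namespace, name)))


# The caller keeps using vendor_lib.send, but calls now pass through the wrapper.
wrap_library(vendor_lib, "send")
```

After calling `solve(4)` and `vendor_lib.send("abc")`, `TRACE` holds one enter/exit pair per call in execution order, which is the raw material for an event trace. The same wrapper could instead update running totals to produce a profile.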
The instrumentation model interfaces with the measurement model. TAU's measurement model is subdivided into a high-level performance model that determines how events are processed, and a low-level measurement model that determines what system attributes are measured. The measurement captures data for functions, methods, basic blocks, and statement execution. Profiling and tracing are the two measurement choices that TAU allows. The measurement API lets measurement groups be defined for organizing and controlling instrumentation. The measurement library also supports the mapping of low-level execution measurements to high-level execution entities (e.g., data parallel statements) so that performance data can be properly assigned. Performance experiments can be composed from different measurement modules, including ones that measure wall-clock time, CPU time, or processor-specific activity using the non-intrusive hardware performance monitors available on most modern processors; TAU can access both the Performance Counter Library (PCL) and the Performance API (PAPI) portable hardware counter interfaces. Based on the composition of modules, an experiment could easily be configured to produce a profile that shows the inclusive and exclusive counts of secondary data cache misses associated with code segments such as routines or groups of statements. Because TAU provides a flexible measurement infrastructure, a user can experiment with different attributes of the system and iteratively refine the performance characterization of a parallel application.
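The relationship between inclusive and exclusive values, measurement groups, and interchangeable measurement sources can be illustrated with a short Python sketch. It is an assumption-laden illustration rather than TAU's implementation: the `Profiler` class, its `timer` method, and the `enabled_groups` parameter are hypothetical names. The `metric` callable stands in for a measurement module and may return wall-clock time, CPU time, or any monotonically increasing counter reading. The sketch ignores recursion, where a routine would otherwise be counted as its own child.

```python
import time
from contextlib import contextmanager


class Profiler:
    """Minimal profiling sketch; hypothetical, not TAU's measurement library."""

    def __init__(self, metric=time.perf_counter, enabled_groups=None):
        self.metric = metric                  # pluggable measurement source
        self.enabled_groups = enabled_groups  # None means every group is enabled
        self.stack = []                       # active frames: [start, child_total]
        self.profile = {}                     # name -> calls, inclusive, exclusive

    @contextmanager
    def timer(self, name, group="default"):
        if self.enabled_groups is not None and group not in self.enabled_groups:
            yield  # group disabled: run the code without measuring it
            return
        frame = [self.metric(), 0.0]  # [start value, total measured in children]
        self.stack.append(frame)
        try:
            yield
        finally:
            self.stack.pop()
            inclusive = self.metric() - frame[0]
            entry = self.profile.setdefault(
                name, {"calls": 0, "inclusive": 0.0, "exclusive": 0.0}
            )
            entry["calls"] += 1
            entry["inclusive"] += inclusive
            # Exclusive = inclusive minus what was measured in nested timers.
            entry["exclusive"] += inclusive - frame[1]
            if self.stack:
                # Charge this region's inclusive value to the enclosing region.
                self.stack[-1][1] += inclusive
```

Constructing the profiler as `Profiler(metric=time.process_time)` would switch the experiment from wall-clock time to CPU time without changing any instrumented code, and passing `enabled_groups={"compute"}` would leave timers in every other group unmeasured.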
The TAU data analysis and presentation models are open. Although TAU comes with both text-based and graphical tools to visualize the performance data collected, it also provides bridges to third-party tools (e.g., Vampir) for more sophisticated analysis and visualization. The performance data format is documented, and TAU provides tools that illustrate how this data can be converted to other formats.
An important aspect of the performance model a tool presents is how its integration model supports the composition and integration of its components. The modules must provide well-defined interfaces that are easy to extend. The nature and extent of cooperation between modules, which may be vertically and horizontally integrated in the distinct layers, defines the degree of flexibility of the measurement system. The integration support in TAU has enabled the performance system to be ported to a diverse set of machine platforms, languages, runtime systems, thread and communication libraries, and application frameworks. It has also allowed TAU to incorporate the performance technology of other groups, either to add capabilities (e.g., using the DyninstAPI for dynamic instrumentation) or to gain access to performance events that can be merged with TAU's mechanisms (e.g., using PAPI to access hardware performance data and high-resolution timing data). The configuration of available TAU capabilities is the final integration aspect to emphasize. Applied performance investigation depends on creating experiments that capture the type and amount of performance data needed for analysis during performance problem solving. The TAU performance system offers configuration and selection throughout, and this will continue to be important in its evolution and future application.