Given the diversity of performance problems, evaluation methods, and types of events and metrics, the instrumentation and measurement mechanisms needed to support performance observation must be flexible, to give maximum opportunity for configuring performance experiments, and portable, to allow consistent cross-platform performance problem solving. The TAU performance system [1,4] is composed of instrumentation, measurement, and analysis parts. It supports both profiling and tracing forms of measurement. TAU implements a flexible instrumentation model that permits a user to insert performance instrumentation hooks into the application at several levels of program compilation and execution. The C, C++, and Fortran languages are supported, as well as standard message-passing (e.g., MPI) and multithreading (e.g., Pthreads) libraries.
For instrumentation, we recommend a dual instrumentation approach. Source code is instrumented automatically by a source-to-source translation tool, tau_instrumentor, which acts as a preprocessor before compilation. The MPI library is instrumented through TAU's wrapper interposition library, which intercepts calls to MPI routines and invokes TAU timing calls before and after each one. The TAU source instrumentor can also take a selective instrumentation file that lists the names of routines or files to be included in or excluded from instrumentation. The instrumented source code is then compiled and linked with the TAU MPI wrapper interposition library to produce an executable.
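The wrapper-interposition idea can be illustrated with a minimal C sketch. It assumes the MPI standard's profiling interface, under which every MPI routine is also callable with a PMPI_ prefix, so a wrapper library can define MPI_Send itself, start a timer, forward to PMPI_Send, and stop the timer. Because a self-contained example cannot redefine a real MPI library, the names pmpi_send_stub, mpi_send_stub, timer_start, timer_stop, and event_timer are hypothetical stand-ins and are not TAU's API.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical per-event timer record (not TAU's real data structure). */
typedef struct {
    const char *name;
    long calls;
    double total_sec;   /* accumulated CPU time, measured with clock() */
    clock_t started;
    int active;
} event_timer;

/* Hook invoked before the wrapped call. */
static void timer_start(event_timer *t) {
    assert(!t->active && "timer already running");
    t->active = 1;
    t->started = clock();
}

/* Hook invoked after the wrapped call returns. */
static void timer_stop(event_timer *t) {
    assert(t->active && "timer_stop without matching timer_start");
    t->total_sec += (double)(clock() - t->started) / CLOCKS_PER_SEC;
    t->calls++;
    t->active = 0;
}

/* Stand-in for the library's own entry point (PMPI_Send in a real MPI). */
static int pmpi_send_stub(const void *buf, int count) {
    (void)buf;
    return count >= 0 ? 0 : -1;   /* 0 signals success in this sketch */
}

static event_timer send_timer = {"MPI_Send", 0, 0.0, 0, 0};

/* The wrapper the application calls: it brackets the real call with
   timing hooks, as an interposition library would do for MPI_Send. */
static int mpi_send_stub(const void *buf, int count) {
    timer_start(&send_timer);
    int rc = pmpi_send_stub(buf, count);
    timer_stop(&send_timer);
    return rc;
}
```

Calling mpi_send_stub twice leaves send_timer.calls at 2. In a real build the wrapper would carry the name MPI_Send, so the link step resolves the application's calls to the wrapper, which then reaches the underlying library through PMPI_Send without any change to the application source.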
TAU provides a variety of measurement options that are chosen when TAU is installed. Each configuration of TAU is represented by a set of measurement libraries and a stub makefile for use in the user application's makefile. Profiling and tracing are the two performance evaluation techniques that TAU supports. Profiling presents aggregate statistics of performance metrics for different events, whereas tracing captures performance information in timestamped event logs for later analysis. With tracing, one can observe along a global timeline when events take place in different processes. Events tracked by both profiling and tracing include entry to and exit from routines, interprocess message communication events, and other user-defined atomic events. Tracing has the advantage of capturing temporal relationships between event records, but at the expense of generating large trace files. Choosing to profile trades the loss of temporal information for gains in profile data efficiency.
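The profiling/tracing trade-off can be sketched in C by feeding the same entry and exit events into both kinds of store. This is a minimal illustration under stated assumptions: all type and function names are hypothetical, timestamps are passed in explicitly rather than read from a clock, events are assumed not to recurse, and both stores are updated together purely for comparison.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical event ids for this sketch: 0 = a compute routine, 1 = a send. */
enum { NUM_EVENTS = 2 };

/* Profile: one fixed-size aggregate per event; ordering is discarded. */
typedef struct {
    long calls;
    double inclusive;   /* summed time from entry to exit */
    double entered_at;  /* entry time of the open call (no recursion assumed) */
} profile_entry;

/* Trace: one timestamped record per entry or exit, kept in arrival order. */
typedef struct {
    double timestamp;
    int rank;     /* process that generated the event */
    int event;
    int is_exit;  /* 0 = entry, 1 = exit */
} trace_record;

typedef struct {
    profile_entry profile[NUM_EVENTS];
    trace_record *trace;
    size_t trace_len, trace_cap;
} measurement;

/* Append to the growable trace buffer, doubling its capacity when full. */
static void append_record(measurement *m, trace_record r) {
    if (m->trace_len == m->trace_cap) {
        size_t cap = m->trace_cap ? 2 * m->trace_cap : 16;
        trace_record *p = realloc(m->trace, cap * sizeof *p);
        if (p == NULL) abort();
        m->trace = p;
        m->trace_cap = cap;
    }
    m->trace[m->trace_len++] = r;
}

static void record_enter(measurement *m, int rank, int ev, double t) {
    assert(ev >= 0 && ev < NUM_EVENTS);
    m->profile[ev].entered_at = t;
    append_record(m, (trace_record){t, rank, ev, 0});
}

static void record_exit(measurement *m, int rank, int ev, double t) {
    assert(ev >= 0 && ev < NUM_EVENTS);
    m->profile[ev].calls++;
    m->profile[ev].inclusive += t - m->profile[ev].entered_at;
    append_record(m, (trace_record){t, rank, ev, 1});
}
```

If event 0 runs once for 2 time units and later for 4, its profile entry ends with calls = 2 and inclusive = 6 but no longer records when either call happened, while the trace keeps every entry and exit with its timestamp and rank. The profile stays at NUM_EVENTS entries however long the run lasts, whereas the trace grows by two records per call, which is the source of large trace files.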