
Integration into EXPERT

 

The EXPERT tool environment [17,18] is aimed at automatically uncovering performance problems in event traces of MPI, OpenMP, or hybrid applications running on complex, large SMP clusters. The work on EXPERT is carried out within the KOJAK project [11] and is a part of the ESPRIT working group APART [1].

EXPERT analyzes the performance behavior along three dimensions: performance problem category, dynamic call tree position, and location. Each dimension is organized in a hierarchy. Performance problems range from general ones (``There is an MPI-related problem'') to very specific ones (``Messages sent in wrong order''). The dynamic call tree is a natural hierarchy reflecting calling-stack relationships. Finally, the location dimension represents the hierarchical hardware and software architecture of SMP clusters, consisting of the levels machine, node, process, and thread.
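Conceptually, each analysis result can thus be viewed as a value indexed by these three dimensions. The following sketch is purely illustrative (the type and field names are ours, not EXPERT's actual data structures):

  /* Illustration only: not EXPERT's real data structure, but it conveys how
     a single finding is indexed by the three analysis dimensions. */
  struct Finding {
    const char* property;   /* e.g. "MPI" -> "Messages sent in wrong order" */
    const char* call_path;  /* position in the dynamic call tree            */
    const char* location;   /* machine / node / process / thread            */
    double      severity;   /* impact, e.g. fraction of execution time      */
  };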

The range of performance problems known to EXPERT is not hard-coded into the tool but is provided as a collection of performance property specifications. This makes EXPERT extensible and very flexible. A performance property specification consists of

Performance property specifications are abstractions beyond simple performance metrics, allowing EXPERT to explain performance problems in terms of the underlying programming model(s). Specifications are written in the event trace analysis language EARL [16], an extension of the Python scripting language. EARL provides efficient access to an event trace at the level of the abstractions of the parallel programming models (e.g., region stack, message queue, or collective operation), making it easy to write such specifications.

EXPERT's analysis process relies on event traces as its performance data source. Event traces preserve the temporal and spatial relationships among individual events, and they are necessary to prove certain interesting performance properties. Traces are recorded in the newly designed EPILOG format, which, in contrast to traditional trace data formats, is suitable for representing executions of MPI, OpenMP, or hybrid parallel applications distributed across one or more (possibly large) clusters of SMP nodes. The format supports the storage of all necessary source code and call site information, of hardware performance counter values, and of markers for collectively executed MPI and OpenMP operations. The EPILOG implementation is thread safe, a necessary feature that many traditional tools lack.
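To make these requirements concrete, the following sketch shows the kind of information such an event record has to carry. It is purely illustrative: the type, field names, and layout are ours and do not reflect the actual EPILOG record definition.

  #include <stdint.h>

  /* Illustrative sketch only -- not the real EPILOG record layout. */
  typedef struct {
    double   time;             /* timestamp of the event                    */
    uint32_t machine, node;    /* position in the hardware hierarchy        */
    uint32_t process, thread;  /* position in the software hierarchy        */
    uint8_t  type;             /* enter, exit, send, recv, fork, join, ...  */
    uint32_t region;           /* region / call-site identifier             */
    uint64_t counter[2];       /* hardware performance counter values       */
    uint32_t collective;       /* links events of one collective operation  */
  } SketchEvent;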

Traces can be generated for C, C++, and Fortran applications simply by linking to the EPILOG tracing library. To intercept user function calls and returns, we use the internal profiling interface of the PGI compiler suite [15] installed on our Linux SMP cluster testbed. To capture OpenMP events, we implemented the pomp library functions in terms of EPILOG tracing calls and then use OPARI to instrument the user application. For example, the pomp_for_enter() and pomp_for_exit() interface implementations used to instrument the C/C++ #pragma omp parallel for directive look like the following in EPILOG:

  void pomp_for_enter(OMPRegDescr* r) {
    struct ElgRegion* e;
    /* Lazily initialize the EPILOG region descriptor the first time this
       construct is entered; it is kept in the OPARI region descriptor. */
    if (! (e = (struct ElgRegion*)(r->data[0])))
      e = ElgRegion_Init(r);
    elg_enter(e->rid);        /* record enter event for this region */
  }

  void pomp_for_exit(OMPRegDescr* r) {
    elg_omp_collexit();       /* record collective exit of the construct */
  }

Note how the region descriptor is used to collect performance data per OpenMP construct. For hybrid applications that use both OpenMP and MPI, MPI-specific events can also be generated by an appropriate wrapper library built on the MPI standard profiling interface.
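For illustration, such a wrapper for MPI_Send might look roughly as follows. The interception technique itself (redefining MPI_Send and forwarding to PMPI_Send) is standard; however, the region identifier ELG_MPI_SEND_RID, the prototypes shown for the elg_* calls, and the name elg_exit() are placeholders for whatever the EPILOG library actually provides, and a real wrapper would additionally record the message parameters so that matching receive events can be identified:

  #include <mpi.h>

  /* EPILOG event recording calls; the exact prototypes come from the
     EPILOG headers -- these declarations are assumptions for this sketch. */
  void elg_enter(int rid);
  void elg_exit(void);

  extern int ELG_MPI_SEND_RID;   /* placeholder: region id for MPI_Send */

  int MPI_Send(void* buf, int count, MPI_Datatype type,
               int dest, int tag, MPI_Comm comm)
  {
    int result;
    elg_enter(ELG_MPI_SEND_RID);   /* record entry into the MPI_Send region */
    result = PMPI_Send(buf, count, type, dest, tag, comm);
    elg_exit();                    /* placeholder name for the exit call */
    return result;
  }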





