 
    
    
         
The EXPERT tool environment [17,18] is aimed at automatically uncovering performance problems in event traces of MPI, OpenMP, or hybrid applications running on complex, large SMP clusters. The work on EXPERT is carried out as a part of the KOJAK project [11] and is embedded in the ESPRIT working group APART [1].
EXPERT analyzes the performance behavior along three dimensions: performance problem category, dynamic call tree position, and location. Each of the analyzed dimensions is organized in a hierarchy. Performance problems are organized from more general ("There is an MPI-related problem") to very specific ones ("Messages sent in wrong order"). The dynamic call tree is a natural hierarchy showing calling stack relationships. Finally, the location dimension represents the hierarchical hardware and software architecture of SMP clusters consisting of the levels machine, node, process, and thread.
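As an illustration of the location dimension, the following is a minimal sketch of a tree with the four levels named above. The type `LocNode`, the functions `loc_add_child()` and `loc_inclusive_time()`, and the idea of summing a per-location time value up the tree are assumptions made for this example; they are not taken from EXPERT's implementation.

```c
#include <assert.h>
#include <stddef.h>

/* The four levels of the location hierarchy named in the text. */
enum LocLevel { LOC_MACHINE, LOC_NODE, LOC_PROCESS, LOC_THREAD };

#define MAX_CHILDREN 8

/* Hypothetical tree node; not EXPERT's actual data structure. */
struct LocNode {
    enum LocLevel level;
    double own_time;                        /* time attributed directly to this location */
    size_t nchildren;
    struct LocNode *children[MAX_CHILDREN];
};

/* Attach child to parent; returns 0 on success, -1 if the parent is full. */
int loc_add_child(struct LocNode *parent, struct LocNode *child) {
    assert(parent != NULL && child != NULL);
    if (parent->nchildren >= MAX_CHILDREN)
        return -1;
    parent->children[parent->nchildren++] = child;
    return 0;
}

/* Inclusive time: the node's own time plus that of all its descendants. */
double loc_inclusive_time(const struct LocNode *n) {
    double total = n->own_time;
    for (size_t i = 0; i < n->nchildren; i++)
        total += loc_inclusive_time(n->children[i]);
    return total;
}
```

With such a structure, a value measured per thread can be inspected at the process, node, or machine level by calling `loc_inclusive_time()` on the corresponding node.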
The range of performance problems known to EXPERT is not hard-coded into the tool but is provided as a collection of performance property specifications. This makes EXPERT extensible and very flexible. A performance property specification consists of

Performance property specifications are written at a very high level of abstraction that goes beyond simple performance metrics and allows EXPERT to explain performance problems in terms of the underlying programming model(s). Specifications are written in the event trace analysis language EARL [16], an extension of the Python scripting language. EARL provides efficient access to an event trace at the level of the abstractions of the parallel programming models (e.g., region stack, message queue, or collective operation), which makes it easy to write performance property specifications.
EXPERT's analysis process relies on event traces as its performance data source, because event traces preserve the temporal and spatial relationships among individual events, which are necessary to prove many interesting performance properties. Event traces are recorded in the newly designed EPILOG format which, in contrast to traditional trace data formats, is suitable for representing the execution of MPI, OpenMP, or hybrid parallel applications distributed across one or more (possibly large) clusters of SMP nodes. It supports storage of all necessary source code and call site information, hardware performance counter values, and marking of collectively executed operations for both MPI and OpenMP. The implementation of EPILOG is thread safe, a necessary feature that is not always present in traditional tools.
Traces can be generated for C, C++, and Fortran applications simply by linking against the EPILOG tracing library. To intercept user function calls and returns, we use the internal profiling interface of the PGI compiler suite [15] installed on our Linux SMP cluster testbed. For capturing OpenMP events, we implemented the pomp library functions in terms of EPILOG tracing calls, and then use OPARI to instrument the user application. For example, the pomp_for_enter() and pomp_for_exit() interface implementation for instrumentation of the #pragma omp parallel for directive for C/C++ would look like the following in EPILOG:
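  void pomp_for_enter(OMPRegDescr* r) {
    struct ElgRegion* e;
    /* Look up the EPILOG region cached in the descriptor; create it on first use. */
    if (! (e = (struct ElgRegion*)(r->data[0])))
      e = ElgRegion_Init(r);
    /* Record the enter event under the region's id. */
    elg_enter(e->rid);
  }
  void pomp_for_exit(OMPRegDescr* r) {
    /* Record the exit from the collectively executed construct. */
    elg_omp_collexit();
  }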
What is important to notice is how the region descriptor is used to
collect performance data per OpenMP construct.  For hybrid applications
using OpenMP and MPI, MPI-specific events can also be generated by an
appropriate wrapper function library that uses the MPI standard profiling
interface.
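The way pomp_for_enter() uses the region descriptor can be tried in isolation with stubs. The sketch below is only an illustration: the layouts of `OMPRegDescr` and `struct ElgRegion`, the bodies of `ElgRegion_Init()` and `elg_enter()`, and the counters `next_rid`, `enter_count`, and `last_rid` are hypothetical stand-ins, not the actual OPARI or EPILOG definitions. It assumes that the region record is stored in the descriptor when it is first created, so that each construct is registered only once.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the OPARI region descriptor. */
typedef struct OMPRegDescr {
    const char *name;
    void *data[1];          /* tool-private slot, used here to cache the region */
} OMPRegDescr;

/* Hypothetical stand-in for the EPILOG region record. */
struct ElgRegion { int rid; };

int next_rid = 0;           /* next unused region id */
int enter_count = 0;        /* number of enter events recorded so far */
int last_rid = -1;          /* region id of the most recent enter event */

/* Stub: allocate a region record and cache it in the descriptor. */
static struct ElgRegion *ElgRegion_Init(OMPRegDescr *r) {
    struct ElgRegion *e = malloc(sizeof *e);
    assert(e != NULL);
    e->rid = next_rid++;
    r->data[0] = e;
    return e;
}

/* Stub: a real tracing library would write an event record here. */
static void elg_enter(int rid) {
    last_rid = rid;
    enter_count++;
}

/* Same pattern as in the text: register the region on first use, then trace. */
void pomp_for_enter(OMPRegDescr *r) {
    struct ElgRegion *e = (struct ElgRegion *)r->data[0];
    if (e == NULL)
        e = ElgRegion_Init(r);
    elg_enter(e->rid);
}
```

Calling pomp_for_enter() twice on the same descriptor records two enter events but registers only one region, which is what allows performance data to be attributed to each OpenMP construct.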
 
 
    
   