LLNL/LANL ASCI Level 3 – Statement of Work (SOW) 2001-2002


Dr. Allen D. Malony (PI)

University of Oregon


Instrumentation / Measurement

Goal: Integrate the TAU performance system with the dynamic instrumentation capabilities offered by DyninstAPI.  Enable TAU performance measurement on the Compaq Alpha Cluster.  Improve the PDT program analysis system for Fortran 90 instrumentation.


1.       INSTR-1: Develop dynamic TAU performance measurement mechanisms for MPI using DyninstAPI.

Status: This task was completed and demonstrated with the SIMPLE Hydrodynamics benchmark in our PDPTA'01 paper. TAU v2.11 ships with support for DyninstAPI and MPI.

2.       INSTR-2: Port the TAU performance measurement system to Compaq Alpha Cluster and demonstrate with MPI applications.

Status: This task is completed. TAU supports the Compaq (cxx, f90) and KAI (KCC, KAP/Pro) compilers under Tru64, and also supports Compaq Linux clusters. This has been demonstrated with the SAMRAI [Andy Wissink, LLNL] and SAGE [Jack Horner, LANL] projects.

3.       INSTR-3: Complete PDT F90 implementation.

4.       INSTR-4: Develop tool for automatic source-level F90 instrumentation and demonstrate on F90 application code.

Status: TAU's PDT-based tau_instrumentor supports F90, C99, and C++. TAU's F90 instrumentation capabilities are demonstrated in the Caltech ASCI/ASAP VTF project [Julian Cummings].


·         INSTR-1 is specifically for MPI only, not in conjunction with threads.

·         INSTR-2 only provides TAU's performance measurement capabilities on the Compaq Alpha Cluster and depends on access to Compaq Tru64 platforms.  Future work may include integration with DyninstAPI, PAPI, Fortran 90 compilers, and multi-threaded runtime systems.

·         The automatic F90 instrumentation tool in INSTR-4 depends on DUCTAPE extensions to be implemented by Bernd Mohr, ZAM, Germany.  The application code will be determined by LANL.


Unified Parallel Software (UPS)

Goal: Apply TAU’s capabilities for portable, multi-language, multithreaded performance measurement and multi-level software mapping of performance data to UPS.


1.       UPS-1: Work with UPS developers to integrate TAU performance system for instrumentation, measurement, and analysis in the UPS programming environment.  In particular, this includes:

·         Generating TAU event traces for analysis and visualization using Vampir.

·         Using PCL or PAPI for hardware performance profiling.

·         Developing a wrapper instrumentation scheme for UPS and system libraries.

·         Identifying user-defined events of interest and opportunities for event mapping.

2.       UPS-2: Validate UPS/TAU performance measurement system on UPS-targeted ASCI platforms using UPS validation benchmarks.


·         UPS-2 applies only to platforms to which TAU and UPS are ported during the project timeframe.

Status: TAU has been integrated with UPS. This was demonstrated to Richard Barrett and Federico Bassetti at the LACSI'01 conference.
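As an illustration of the wrapper instrumentation idea listed under UPS-1, the following is a minimal conceptual sketch in Python. It is not TAU or UPS code. The routine `ups_gather`, the `instrument` helper, and the `profile` table are hypothetical names introduced only for this example. The sketch shows a wrapper that keeps the original call interface while recording a call count and inclusive time for each wrapped routine.

```python
import time
from collections import defaultdict
from functools import wraps

# Hypothetical per-event profile: routine name -> [call count, inclusive seconds].
profile = defaultdict(lambda: [0, 0.0])


def instrument(func):
    """Return a wrapper with the same interface as func that records timing."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            # Forward the call unchanged to the original routine.
            return func(*args, **kwargs)
        finally:
            # Update the profile even if the routine raises an exception.
            entry = profile[func.__name__]
            entry[0] += 1
            entry[1] += time.perf_counter() - start
    return wrapper


# Hypothetical stand-in for a library routine (assumption, not a real UPS call).
def ups_gather(values):
    return sum(values)


# Interpose the wrapper: callers keep using the same name.
ups_gather = instrument(ups_gather)

result = ups_gather([1, 2, 3])
```

Because the wrapper is bound to the original name, calling code does not change. This mirrors the general appeal of a wrapper scheme for libraries whose source is not instrumented directly.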

Multithreading and Hybrid Parallelism

Goal: Apply TAU in multithreaded C++ and OpenMP programming environments and develop enhancements for hybrid (“mixed-mode”) parallel execution based on MPI.


1.       APP-1: Demonstrate TAU's ability to profile and trace example application codes developed with the Overture framework.

Status: TAU is integrated with the Overture and AMRSim frameworks [Brian Miller, CASC, LLNL].

2.       APP-2: Port TAU to multithreaded OpenMP environments, targeting the KAI KAP/Pro OpenMP compiler in particular, and interact with OpenMP application developers in its use.

Status: TAU supports OpenMP environments. This has been demonstrated with the ocean circulation modeling code from SDSC [EWOMP'01]. TAU supports the KAI KAP/Pro, SGI, IBM, Compaq, and PGI OpenMP compiler suites.

3.       APP-3: Specify OpenMP runtime system “hooks” that OpenMP compiler vendors might provide that could be used effectively by TAU for performance measurement.

Status: The POMP interface (jointly developed by U. Oregon and FZJ, Germany) addresses this. It was demonstrated with the TAU and Expert tools in our LACSI'01 paper.

4.       APP-4: Enhance TAU for use in C++/MPI and OpenMP/MPI (OpenMPI) hybrid parallel execution environments and demonstrate on selected applications.

Status: This task was completed. TAU now supports hybrid (mixed-mode) OpenMP/MPI applications using multi-level instrumentation [Shende PhD] based on PDT (for C, C++, and F90), an MPI wrapper library, and Opari, which rewrites OpenMP directives for the POMP interface.

5.       APP-5: Support POOMA 2.4 development team in the use of TAU for performance instrumentation, measurement, and analysis.

Status: This task was completed. On the recommendation of Jeffrey Oldham [CodeSourcery, LLC], TAU's PDT-based instrumentor was extended to support selective instrumentation. A -noinline option was added to suppress instrumentation of inlined procedures. TAU and PDT are available for download from the POOMA webpage.
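The selection logic described in the APP-5 status can be sketched conceptually as follows. This is a minimal Python illustration, not the tau_instrumentor implementation. The `Routine` record, the `select_routines` function, the wildcard exclude patterns, and the sample routine names are all assumptions made for this example. It shows two filters applied before instrumentation: skipping inlined routines when a no-inline switch is set, and skipping routines that match a user-supplied exclude list.

```python
from fnmatch import fnmatch
from typing import NamedTuple


class Routine(NamedTuple):
    """Hypothetical record for a routine found by program analysis."""
    name: str
    inlined: bool


def select_routines(routines, exclude_patterns=(), no_inline=False):
    """Return the names of routines that should be instrumented."""
    selected = []
    for routine in routines:
        # Skip inlined routines when the no-inline switch is on.
        if no_inline and routine.inlined:
            continue
        # Skip routines matching any user-supplied exclude pattern.
        if any(fnmatch(routine.name, pat) for pat in exclude_patterns):
            continue
        selected.append(routine.name)
    return selected


# Hypothetical routine list, for illustration only.
routines = [
    Routine("main", False),
    Routine("Vector::size", True),
    Routine("Field::assign", False),
    Routine("Engine::read", False),
]

chosen = select_routines(routines, exclude_patterns=["Engine::*"], no_inline=True)
```

Filtering before instrumentation, rather than discarding measurements afterward, avoids measurement overhead in small, frequently called routines, which is the usual motivation for selective instrumentation in template-heavy C++ code.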


·         APP-2 depends on the level of access to OpenMP runtime system events.

·         APP-3 is merely a specification, not a standardization effort.  The task may include a monthly teleconference with the OpenMP discussion group.



The work described will be performed by the following primary personnel:


·         Dr. Allen D. Malony      : Associate Professor

·         Sameer Shende             : Post-Doctorate Research Associate

·         Robert Ansell-Bell         : Research Associate