
Parallel Program Analysis Framework
for the ACTS Toolkit

Project Status Report, January 2000

Allen D. Malony      Janice E. Cuny
Computational Science Institute
Department of Computer and Information Science
University of Oregon

Project Homepage

http://www.cs.uoregon.edu/research/paraducks/proj/doe2000/index.html

Project Overview

The DOE 2000 project at the University of Oregon (UO) is creating technology to help programmers who use the DOE ACTS toolkit analyze and interact with their software. Our work targets four areas: performance instrumentation, measurement, and analysis; source code analysis; tool interoperation; and runtime tool interaction.

The first target area has as its goal the implementation of performance instrumentation, measurement, and analysis capabilities that can be applied throughout the ACTS programming layers. These capabilities include profiling and tracing support, instrumentation extensions for high-level libraries, and tools to analyze and visualize measured performance data at different levels of abstraction. The goal of the second target area is to build a robust code analysis system that supports source-based operations, including instrumentation, browsing, and building of program interfaces. The system will target multiple languages, including C, C++, and Fortran 90. The third target area is concerned with building frameworks and mechanisms that let analysis tools work together, which provides opportunities to create more sophisticated analysis environments. Lastly, it is becoming increasingly important that program analysis tools operate during program execution. The fourth target area therefore addresses runtime tool interaction, which forms the basis for more sophisticated runtime application functionality (e.g., computational steering or model coupling) as well as for more dynamic program monitoring. This work builds upon all the other areas.

Technical Approach

Our approach to building a robust program analysis capability for ACTS is guided by a strong need for portability between languages and across machines. The ACTS software includes components written in C, C++, and Fortran, and is intended to run on many different parallel and distributed platforms. To address these concerns, we are presently working on three projects to:
  1. design a robust performance measurement and analysis system for a general parallel program execution model based on HPC++;
  2. build a sophisticated program analysis system based on state-of-the-art parsers for C, C++, and Fortran; and
  3. create a monitoring framework for runtime performance analysis.
These projects are described below.

TAU - Tuning and Analysis Utilities

TAU is a program and performance analysis tool framework for high-performance parallel and distributed computing applications written in C, C++, Fortran 77/90, HPF, and Java. TAU offers facilities for performance profiling, tracing, and monitoring. The goal of the TAU project is to develop program and performance analysis technology that meets both the challenges of evolving scalable parallel computing systems and the needs of programming methodologies used for next-generation scientific applications. The technology should be able to target the diversity of computing paradigms and machines while offering a framework of portable and reconfigurable measurement and analysis components that can be optimized and extended. While the tools and techniques implemented may address specific needs of a language or execution environment, they should be coherent, based on a unified analysis model, and able to interoperate with other framework components. The TAU system is shown in the figure below.

[TAU]


TAU is separated into three parts: instrumentation, measurement, and performance analysis. TAU supports several modes of instrumentation: source code, library, statically and dynamically linked, and runtime. The measurement library supports both profiling and tracing, and a configurable set of runtime data capture and analysis modules. Generated performance data can then be analyzed and visualized by a variety of tools.
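
The split between instrumentation and measurement can be illustrated with a minimal C++ sketch. It is not TAU's API: `ScopedTimer`, `ProfileEntry`, and `profile_table` are hypothetical names, and a single global, non-thread-safe table is assumed for brevity. The timer object placed in a routine is the instrumentation; the table it updates plays the role of the measurement library's profile data.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>
#include <utility>

// Hypothetical per-routine profile record (not TAU's actual data layout).
struct ProfileEntry {
    long calls = 0;             // number of completed invocations
    double inclusive_us = 0.0;  // accumulated wall-clock time in microseconds
};

// Hypothetical global profile table keyed by routine name.
// Assumption: single-threaded use; a real library would keep per-thread data.
inline std::map<std::string, ProfileEntry>& profile_table() {
    static std::map<std::string, ProfileEntry> table;
    return table;
}

// Measurement: a scoped timer that charges its lifetime to one routine.
class ScopedTimer {
public:
    explicit ScopedTimer(std::string name)
        : name_(std::move(name)), start_(Clock::now()) {}

    ~ScopedTimer() {
        std::chrono::duration<double, std::micro> elapsed = Clock::now() - start_;
        ProfileEntry& entry = profile_table()[name_];
        entry.calls += 1;
        entry.inclusive_us += elapsed.count();
    }

    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    using Clock = std::chrono::steady_clock;
    std::string name_;
    Clock::time_point start_;
};

// Instrumentation: one timer object at the top of the routine of interest.
double dot(const double* a, const double* b, int n) {
    assert(n >= 0);
    ScopedTimer timer("dot");
    double sum = 0.0;
    for (int i = 0; i < n; ++i) sum += a[i] * b[i];
    return sum;
}
```

After a run, an analysis tool would read `profile_table()` and report call counts and inclusive times per routine. Tracing would instead append a timestamped event at each entry and exit.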

PDT - Program Database Toolkit

PDT is a tool framework for static and dynamic analysis of object-oriented software. The toolkit consists of the IL (Intermediate Language) Analyzer, and DUCTAPE (C++ program Database Utilities and Conversion Tools APplication Environment). The Edison Design Group (EDG) C++ Front End first parses a source file, and produces an intermediate language file. The IL Analyzer processes this IL file, and creates a "program database" (PDB) file consisting of the high-level interface of the original source. Use of the DUCTAPE library then makes the contents of the PDB file accessible to applications. The PDT system is shown in the figure below.

[PDT]

TAU Monitoring Framework

To extend the usability of TAU performance analysis to runtime, we have implemented the TAU monitoring framework to support access to distributed TAU performance data during execution. Our framework model regards each application context as a performance data server. An additional server (monitor) thread is created within each context to enable any number of clients to attach and to respond to their requests for performance data. Careful attention must be paid to data access, as the server thread must synchronize with other context threads to guarantee data consistency. The client runs in a separate context from the application and can interact with multiple context monitors. An architectural diagram of the TAU monitoring framework is shown in the figure below.

[MONITOR]
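
The consistency requirement described above can be sketched in standalone C++. This is an assumption-laden illustration, not the TAU monitor: `PerformanceContext`, `Entry`, and `run_monitored` are hypothetical names, and a client request is modeled simply as the monitor thread taking a snapshot. Every event adds exactly 2.0 microseconds, so a consistent snapshot always satisfies `total_us == 2.0 * calls`.

```cpp
#include <atomic>
#include <cassert>
#include <map>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

struct Entry {
    long calls = 0;
    double total_us = 0.0;
};

// One application context acting as a performance data server (hypothetical).
class PerformanceContext {
public:
    // Called by application threads when a measured event completes.
    void record(const std::string& name, double us) {
        std::lock_guard<std::mutex> lock(mu_);
        Entry& e = data_[name];
        e.calls += 1;
        e.total_us += us;
    }

    // Called by the monitor thread to answer a client request.
    // The copy is made under the lock, so both fields of an entry agree.
    std::map<std::string, Entry> snapshot() const {
        std::lock_guard<std::mutex> lock(mu_);
        return data_;
    }

private:
    mutable std::mutex mu_;
    std::map<std::string, Entry> data_;
};

// Runs application threads next to one monitor thread and reports whether
// every snapshot the monitor took was internally consistent.
bool run_monitored(PerformanceContext& ctx, int workers, int events_per_worker) {
    assert(workers > 0 && events_per_worker >= 0);
    std::atomic<bool> done{false};
    bool consistent = true;  // written only by the monitor, read after join

    std::thread monitor([&] {
        while (!done.load()) {
            for (const auto& kv : ctx.snapshot()) {
                if (kv.second.total_us != 2.0 * kv.second.calls) consistent = false;
            }
        }
    });

    std::vector<std::thread> app;
    for (int w = 0; w < workers; ++w) {
        app.emplace_back([&ctx, events_per_worker] {
            for (int i = 0; i < events_per_worker; ++i) ctx.record("solve", 2.0);
        });
    }
    for (auto& t : app) t.join();
    done.store(true);
    monitor.join();
    return consistent;
}
```

A single mutex keeps the sketch short. It also makes the monitor contend with the application threads, which is the kind of perturbation a production monitor would try to limit.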

Accomplishments

There were three primary accomplishments for the second year's work on the project, each discussed below. In conjunction with these three development goals, we have also worked to expand the use of the project's results.

TAU

  TAU Mapping API
    TAU applies the concept of mapping at levels within a programming hierarchy to build analysis abstractions that capture the important behavioral and semantic characteristics of the software. The mapping concept extends to languages where compile-time code manipulation can take place. TAU's support for analysis mapping is found in careful implementation of techniques consistent with the software level where they are applied.

    Example: Profiling asynchronous execution in POOMA-2 and SMARTS.

    The TAU mapping API was integrated with the POOMA-2 application framework. POOMA is a C++ framework that includes data-parallel array and particle classes. The original POOMA implemented parallelism in a lock-step fashion using message passing. POOMA-2 includes thread-based evaluation and the ability to use the Scalable Multithreaded Asynchronous RunTime System (SMARTS). POOMA-2 and SMARTS present several problems to a performance analysis system. In particular, because POOMA is a class library with data-parallel semantics, POOMA-level expressions are mapped to parallel computations, either an SPMD code with message passing or a multithreaded asynchronous code. The performance system has to be able to track this mapping and associate performance data with the framework-level abstraction. TAU does this through its mapping API and its support for tracking asynchronous execution. As a result, TAU is able to produce performance profiles of application objects, such as expressions, instead of only routine profiles of object methods.
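
The mapping idea can be sketched as follows. This is an illustrative C++ model, not the TAU mapping API: `Iterate`, `Scheduler`, and `MappedProfile` are hypothetical names, and an integer cost returned by each work item stands in for measured time so that the result is deterministic. Each deferred work item carries the identity of the expression that created it, so cost incurred later, in whatever order the runtime chooses, is still charged to that expression.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical profile keyed by framework-level object (here, an expression).
struct MappedProfile {
    std::map<std::string, long> work_units;  // cost attributed to each expression
};

// A deferred unit of work that remembers which expression produced it.
struct Iterate {
    std::string expression;      // the mapping: low-level work -> high-level object
    std::function<long()> body;  // returns a cost (stand-in for measured time)
};

// Stand-in for an asynchronous runtime that executes iterates later.
class Scheduler {
public:
    void enqueue(std::string expression, std::function<long()> body) {
        assert(static_cast<bool>(body));
        queue_.push_back({std::move(expression), std::move(body)});
    }

    // Executes queued iterates in reverse order to show that attribution
    // does not depend on when or in what order the work actually runs.
    void run(MappedProfile& profile) {
        while (!queue_.empty()) {
            Iterate it = std::move(queue_.back());
            queue_.pop_back();
            long cost = it.body();
            profile.work_units[it.expression] += cost;
        }
    }

private:
    std::deque<Iterate> queue_;
};
```

A routine-level profile of the same run would show only the scheduler's execution routine. Keying the profile on the expression is what recovers the application-level view.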

  Integration with DyninstAPI
    An important feature of the TAU system is its ability to interface with software at compile time and at runtime. In particular, TAU supports different modes of instrumentation: source code, library, statically and dynamically linked, and runtime. For runtime instrumentation, it can use DyninstAPI for runtime code generation. This allows TAU to insert its instrumentation into an application while it is executing, without recompiling it. DyninstAPI provides TAU with a machine-independent interface for inserting snippets of instrumentation code.

  Integration with MPI Profiling Interface
    TAU provides an instrumented wrapper library for MPI (Message Passing Interface). It uses the MPI Profiling Interface, which provides a general mechanism for intercepting calls to MPI routines independent of the vendor MPI implementation. Users instrument their MPI applications by relinking with this wrapper library; no changes to the application or MPI library source code are required.
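
In the MPI standard's profiling interface, every routine is also reachable under a name-shifted `PMPI_` entry point, so a wrapper library can define `MPI_Send` itself, record measurements, and forward to `PMPI_Send`. The sketch below shows only that interposition pattern in standalone C++. The names `msg_send`, `pmsg_send`, and `SendStats` are hypothetical stand-ins chosen so that the example builds without an MPI installation; it does not reproduce TAU's wrapper library.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for the vendor library's name-shifted entry point
// (the role PMPI_Send plays in real MPI). Hypothetical: it only
// pretends to send and reports success as 0.
int pmsg_send(const void* buf, std::size_t bytes, int dest) {
    (void)buf;
    (void)bytes;
    (void)dest;
    return 0;
}

// Hypothetical counters maintained by the wrapper.
struct SendStats {
    long calls = 0;
    std::size_t bytes = 0;
};

inline SendStats& send_stats() {
    static SendStats stats;
    return stats;
}

// Wrapper carrying the name the application calls
// (the role MPI_Send plays in real MPI). It measures, then forwards.
int msg_send(const void* buf, std::size_t bytes, int dest) {
    assert(buf != nullptr || bytes == 0);
    SendStats& stats = send_stats();
    stats.calls += 1;
    stats.bytes += bytes;
    return pmsg_send(buf, bytes, dest);
}
```

Because the application keeps calling the same name, relinking against the wrapper is enough to enable measurement.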

  Integration with Java Virtual Machine Profiling Interface (JVMPI)
    TAU can instrument Java applications without requiring any modifications to the Java application source code, the bytecode, or the virtual machine. It uses the profiling hooks in the Java virtual machine (JVMPI) by loading a TAU dynamic shared object into the virtual machine at runtime. It can then track events such as dynamic loading of classes, thread creation and destruction, and method entry and exit, and interface with the TAU API for performance measurement of Java applications.

  Integration with Performance Counter Library (PCL)
    TAU interfaces with PCL to access the hardware performance counters that are available on most modern CPUs. PCL is a library that provides a uniform, low-overhead interface to these counters. This allows users to assess the performance of routines, basic blocks, and statements in terms of cache misses, instructions issued, floating point operations, and other counters. It currently supports access to hardware performance counters on Compaq Alpha 21164/21264 under Tru64 and Cray Unicos, SGI MIPS R10000/R12000 under IRIX, Sun UltraSPARC I/II under Solaris, IBM PowerPC 604e under AIX, and Pentium MMX/II/III under Linux.

  Support for new languages and platforms
    TAU supports an integrated, extensible analysis framework through modular component design, published data formats, standardized interfaces, and programs to interface to third-party tools. This has made it possible for TAU to be retargeted to new language, runtime, and system contexts and extended with new analysis functionality. The TAU profiling and tracing environment is highly robust and works in the following cases:

PDT

  Released Version 1.1
    Version 1.1 of the Program Database Toolkit for C++ has been released. The distribution includes the C++ IL Analyzer, the DUCTAPE library, and the EDG C++ Front End. Various PDT processing tools (pdbmerge, pdbconv, pdbtree, and pdbhtml) are also available for use with PDT 1.1. This release of PDT was upgraded to version 2.41.2 of the Edison Design Group (EDG) C++ Front End.

  New features
    PDT 1.1 provides new and enhanced features. These include position information for routines, classes, templates, and namespaces; calls for constructors/destructors/new/delete; routine default arguments; optional template text strings; and optional reporting of unneeded entities. Implementing some of these features proved challenging, since the EDG Front End was developed to generate code for compiler back ends, not to support static analysis by the IL Analyzer. Shell scripts for user configuration and execution of PDT were also improved.

  Robust Header Files
    The inclusion of standard C++ system header files from the Kuck and Associates, Inc. KCC 3.4c compiler has significantly enhanced the robustness of PDT's parsing and analysis in this release, while simplifying configuration and increasing the scope of supported platforms. Porting PDT to a number of new platforms enabled TAU's automatic instrumentation to be available on those platforms as well.

  Fortran 90 IL Analyzer
    Implementation of the Fortran 90 IL Analyzer, based on the Fortran 90 Front End developed by Mutek, is progressing well. Mapping Fortran 90 language features to analogous C++ constructs was required first. The global structure of the Fortran 90 IL Analyzer is in place, and details for specific language constructs are being worked out. For example, Mutek's Front End, which is based on the EDG Fortran 77 Front End, eliminated the memory management scheme, which necessitated changes in the handling of routine calls. Appropriate modifications to the structure of the program database, and therefore DUCTAPE, are necessary to accommodate Fortran 90's modules, interfaces, derived types, array features, etc.

  PDT Applications
    In addition to the tools released with PDT 1.1, three applications have been developed that use the Program Database Toolkit. For very large and complex libraries, such as POOMA, source instrumentation for profiling and tracing can be time consuming if done manually. TAU uses PDT to access information needed for automatic instrumentation. This information includes function and method signatures and parameter type information. Similarly, we have implemented a coverage analysis tool that automatically instruments the program using information from PDT to determine possible and impossible calling paths and reachable routines. Perhaps the most extensive and sophisticated PDT application is for SILOON (Scripting Interface Languages for Object-Oriented Numerics). PDT enables SILOON to generate the glue and skeleton code needed to provide scripting-language access to scientific libraries. In using PDT, source code is first parsed by an EDG-based compiler front end. The appropriate IL Analyzer then walks the intermediate language tree, extracting the high-level interface and outputting item descriptions to a program database. These descriptions characterize the program's functions and classes, its types, source files, namespaces, templates and their instantiations, and macros. The figure below shows PDT's use with SILOON. In all the PDT applications, the DUCTAPE library provides the applications access to the program database.

    [SILOON]
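
The reachable-routine question that the coverage analysis tool answers can be illustrated with a short C++ sketch. It assumes the static call information has already been extracted into a simple map; `CallGraph` and `reachable_from` are hypothetical names and do not correspond to the DUCTAPE interface. A depth-first traversal from the entry routine collects every routine that some calling path can reach. Anything outside that set cannot be called in any execution.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical static call graph: routine name -> routines it may call.
using CallGraph = std::map<std::string, std::vector<std::string>>;

// Returns every routine reachable from `entry`, including `entry` itself.
std::set<std::string> reachable_from(const CallGraph& graph, const std::string& entry) {
    assert(!entry.empty());
    std::set<std::string> seen;
    std::vector<std::string> stack{entry};
    while (!stack.empty()) {
        std::string routine = stack.back();
        stack.pop_back();
        if (!seen.insert(routine).second) continue;  // already visited
        auto it = graph.find(routine);
        if (it == graph.end()) continue;             // leaf or external routine
        for (const auto& callee : it->second) stack.push_back(callee);
    }
    return seen;
}
```

Recursive routines are handled by the visited set, so the traversal terminates on cyclic call graphs. Indirect calls through function pointers or virtual dispatch would need a conservative over-approximation that this sketch does not attempt.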

TAU Monitoring Framework

During the last year, we have implemented a TAU runtime monitor based on the monitoring framework described above. In fact, we have implemented two versions of the TAU monitoring framework: one with HPC++ and one with Java. Our intention in the HPC++ implementation was to leverage the HPC++ library to build the middleware support required by the monitor and to create a server interface using HPC++ distributed object semantics.

The Java implementation grew out of our interest in providing a more portable and robust software development environment for clients, particularly for the creation of graphical displays, and in building a more flexible and programmable server interface and monitor middleware system. As shown in the figure below, the Java-based TAU monitor server utilizes a Java Virtual Machine spawned from the profiled application. Communication and data transfer with the client are implemented with Java RMI.

[MONITOR-JAVA]

Cluster Performance Monitoring Tools

An effort is underway to build tools for monitoring general system-level performance metrics on Linux-based clusters. The TAU Monitoring Framework discussed above is one specific tool of this sort, monitoring user-level profiling data. Another example is the Supermon project at Los Alamos National Laboratory, with which we collaborated on the performance client. To build these specific monitors in addition to a general monitor of arbitrary performance metrics, we are building two important components: a middleware toolkit for accessing and transporting data, and a set of clients for presenting and manipulating arbitrary metrics. Two middleware toolkits are under consideration for the data access and transport level of the monitor: a Java RMI-based framework (discussed above) and the recently released open-source SGI Performance Co-Pilot. Our investigation is focusing on the performance and side effects (undesirable perturbations) of each, in addition to the ease of adding features and clients in the future. Additional work has been done examining High Performance C++ (HPC++) as a middleware solution.

Future Plans - FY 2000

During the third year of the TAU project, we will focus on four main development activities:

Tool Availability

The latest TAU profiling and tracing toolkit (version 2.7) and Program Database Toolkit (version 1.1) are available as part of the LANL ACL Fall 1999 CD-ROM distributed at SC'99. This edition of the CD-ROM can be downloaded by users from:

http://www.acl.lanl.gov/software

TAU can be independently downloaded from its homepage at:

http://www.acl.lanl.gov/tau

PDT can be independently downloaded from its homepage at:

http://www.acl.lanl.gov/pdtoolkit/

Both of these URLs will continue to be updated with future versions of the software.
