Report Outline
Project Homepage
http://www.cs.uoregon.edu/research/paraducks/proj/doe2000/index.html
Project Overview
The DOE 2000 project at the University of Oregon (UO) is creating
technology to assist programmers using the DOE ACTS toolkit in the
analysis of and interaction with their software.
Our work is specifically targeted to four areas:
- performance analysis for scalable parallel and distributed systems
- program code analysis for multiple languages
- analysis tool integration and interoperation
- runtime tool interaction
The first target area has as its goal the implementation of performance
instrumentation, measurement, and analysis capabilities that can be applied
throughout the ACTS programming layers.
These capabilities include profiling and tracing support, instrumentation
extensions for high-level libraries, and tools to analyze and visualize
measured performance data at different levels of abstraction.
The goal for the second target area is to build a robust code analysis
system that supports source-based operations including instrumentation,
browsing, and building of program interfaces.
Multiple languages, including C, C++, and Fortran 90, will be targeted by
the system.
Building frameworks and mechanisms that allow analysis tools to work together
provides opportunities to create more sophisticated analysis environments.
The third target area is concerned with these issues.
Lastly, it is becoming increasingly important that program analysis tools
operate during program execution.
Tool interaction forms the basis for more sophisticated runtime application
functionality (e.g., computational steering or model coupling) as well as
for more dynamic program monitoring.
This work builds upon all the other areas.
Technical Approach
Our approach to building a robust program analysis capability for ACTS
is guided by a strong need for portability between languages and across
machines.
The ACTS software includes components written in C, C++, and Fortran,
and is intended to run on many different parallel and distributed platforms.
To address these concerns, we are presently working on three projects to:
- design a robust performance measurement and analysis system
for a general parallel program execution model based on HPC++;
- build a sophisticated program analysis system based on
state-of-the-art parsers for C, C++, and Fortran; and
- create a monitoring framework for runtime performance analysis.
These projects are described below.
TAU - Tuning and Analysis Utilities
TAU is a program and performance analysis tool framework for
high-performance parallel and distributed computing applications
written in C, C++, Fortran 77/90, HPF, and Java.
TAU offers a performance profiling, tracing and monitoring facility.
The goal of the TAU project is to develop program and performance analysis
technology that meets both the challenges of evolving scalable parallel
computing systems and the needs of programming methodologies used for
next-generation scientific applications.
The technology should be able to target the diversity of computing
paradigms and machines while offering a framework of portable and
reconfigurable measurement and analysis components that can be optimized
and extended.
While the tools and techniques implemented may address specific needs of a
language or execution environment, they should be coherent, based on a
unified analysis model and able to interoperate with other framework
components.
The TAU system is shown in the figure below.
TAU is separated into three parts: instrumentation, measurement, and
performance analysis.
TAU supports several modes of instrumentation: source code, library,
statically and dynamically linked, and runtime.
The measurement library supports both profiling and tracing, and a
configurable set of runtime data capture and analysis modules.
Generated performance data can then be analyzed and visualized by a variety
of tools.
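The difference between the two measurement modes can be illustrated with a minimal sketch. It is written in Python for brevity and does not use TAU's API; the names `instrument`, `profile`, and `trace` are hypothetical. Profiling aggregates per-routine totals, while tracing keeps a time-ordered event log.

```python
import time
from collections import defaultdict
from functools import wraps

# Hypothetical names throughout; this is not TAU's API.
profile = defaultdict(lambda: {"calls": 0, "inclusive_s": 0.0})  # aggregated per routine
trace = []  # time-ordered (timestamp, event, routine) records


def instrument(func):
    """Wrap func so every call updates the profile and appends to the trace."""
    name = func.__name__

    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        trace.append((start, "enter", name))
        try:
            return func(*args, **kwargs)
        finally:
            stop = time.perf_counter()
            trace.append((stop, "exit", name))
            entry = profile[name]
            entry["calls"] += 1
            entry["inclusive_s"] += stop - start

    return wrapper


@instrument
def inner(n):
    return sum(range(n))


@instrument
def outer():
    return inner(1000) + inner(2000)


result = outer()
for name, entry in profile.items():
    print(f"{name}: {entry['calls']} call(s), {entry['inclusive_s']:.6f} s inclusive")
```

The profile stays small no matter how long the program runs, whereas the trace grows with every event but preserves ordering, which is what timeline-style visualization needs.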
PDT - Program Database Toolkit
PDT is a tool framework for static and dynamic analysis of
object-oriented software.
The toolkit consists of the IL (Intermediate Language) Analyzer, and DUCTAPE
(C++ program Database Utilities and Conversion Tools APplication
Environment).
The Edison Design Group (EDG) C++ Front End first parses a source file, and
produces an intermediate language file.
The IL Analyzer processes this IL file, and creates a "program database"
(PDB) file consisting of the high-level interface of the original source.
Use of the DUCTAPE library then makes the contents of the PDB file
accessible to applications.
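The role of the program database can be sketched as follows. The textual format, the `Routine` record, and `load_database` are all made up for illustration; the real PDB format and the DUCTAPE interface differ. The point is only that tools query a compact description of the program's interface instead of re-parsing source code.

```python
from dataclasses import dataclass, field

# A made-up, simplified textual "program database"; the real PDB format differs.
PDB_TEXT = """\
routine main file=app.cpp line=10
call main -> Field::norm
routine Field::norm file=field.cpp line=42
call Field::norm -> sqrt_sum
routine sqrt_sum file=field.cpp line=7
"""


@dataclass
class Routine:
    name: str
    file: str
    line: int
    callees: list = field(default_factory=list)


def load_database(text):
    """Parse the toy format into a dict of Routine records keyed by name."""
    routines, calls = {}, []
    for raw in text.splitlines():
        parts = raw.split()
        if not parts:
            continue
        if parts[0] == "routine":
            attrs = dict(p.split("=", 1) for p in parts[2:])
            routines[parts[1]] = Routine(parts[1], attrs["file"], int(attrs["line"]))
        elif parts[0] == "call":
            calls.append((parts[1], parts[3]))
    # Attach call edges after all routines are known, so record order does not matter.
    for caller, callee in calls:
        routines[caller].callees.append(callee)
    return routines


db = load_database(PDB_TEXT)
for r in db.values():
    print(f"{r.name} ({r.file}:{r.line}) calls {r.callees}")
```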
The PDT system is shown in the figure below.
TAU Monitoring Framework
To extend the usability of TAU performance analysis to runtime, we have
implemented the TAU monitoring framework to support access to distributed
TAU performance data during execution.
Our framework model regards each application context as a performance data
server.
An additional server (monitor) thread is created within each context to
enable any number of clients to attach and to respond to their requests for
performance data.
Careful attention must be paid to data access, as the server thread must
synchronize with other context threads to guarantee data consistency.
The client runs in a separate context from the application and can interact
with multiple context monitors.
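A minimal sketch of this server model, assuming a single process and using an in-process queue as a stand-in for the network transport, might look like the following. The `Context` class and its methods are hypothetical and are not part of TAU. The lock shows why the monitor thread must synchronize with application threads before copying data.

```python
import copy
import queue
import threading


class Context:
    """Hypothetical application context that doubles as a performance data server."""

    def __init__(self):
        self._lock = threading.Lock()
        self._profile = {}            # routine name -> call count
        self._requests = queue.Queue()
        self._monitor = threading.Thread(target=self._serve, daemon=True)
        self._monitor.start()

    def record_call(self, routine):
        """Called by application threads as they execute instrumented code."""
        with self._lock:
            self._profile[routine] = self._profile.get(routine, 0) + 1

    def _serve(self):
        """Monitor thread: answer each client request with a consistent snapshot."""
        while True:
            reply = self._requests.get()
            if reply is None:         # shutdown sentinel
                break
            with self._lock:          # synchronize with application threads
                snapshot = copy.deepcopy(self._profile)
            reply.put(snapshot)

    def request_profile(self):
        """Client side: post a request and wait for the monitor's reply."""
        reply = queue.Queue(maxsize=1)
        self._requests.put(reply)
        return reply.get(timeout=5)

    def shutdown(self):
        self._requests.put(None)
        self._monitor.join(timeout=5)


def worker(ctx, routine, n):
    for _ in range(n):
        ctx.record_call(routine)


ctx = Context()
threads = [threading.Thread(target=worker, args=(ctx, name, 1000))
           for name in ("solve", "exchange", "solve")]
for t in threads:
    t.start()
for t in threads:
    t.join()
final = ctx.request_profile()
ctx.shutdown()
print(final)
```

Copying the data under the lock keeps the time that application threads are blocked short; the client then works on its own snapshot without further synchronization.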
An architectural diagram of the TAU monitoring framework is shown in the
figure below.
Accomplishments
There were three primary accomplishments
for the second year's work on the project:
- Enhanced the robustness and flexibility of the TAU instrumentation
and measurement framework, defined the TAU mapping API, and extended
TAU's analysis and visualization capabilities
with commercially available tools.
- Improved the PDT system by increasing the information in the program
database, adding a Fortran 90 parser, and creating new tools for
code wrapping and scripting that use the DUCTAPE interface.
- Developed a distributed monitoring system that can access TAU
performance data during execution.
Our accomplishments are discussed below.
In conjunction with these three development goals, we have also worked to
expand the use of the project's results.
TAU
- TAU Mapping API
TAU applies the concept of mapping at levels within a programming
hierarchy to build analysis abstractions that capture the important
behavioral and semantic characteristics of the software.
The mapping concept extends to languages where compile-time code
manipulation can take place.
TAU's support for analysis mapping is found in careful implementation of
techniques consistent with the software level where they are applied.
Example: Profiling asynchronous execution in POOMA-2 and SMARTS.
The TAU mapping API was integrated with the POOMA-2 application framework.
POOMA is a C++ framework including data-parallel array
and particle classes.
The original POOMA implemented parallelism in a
lock-step fashion using message passing.
POOMA-2 includes thread-based evaluation and the ability to use the
Scalable Multithreaded Asynchronous RunTime System (SMARTS).
POOMA-2 and SMARTS present several problems to a performance analysis
system.
First, because POOMA is a class library with data-parallel semantics, POOMA-level
expressions are mapped to parallel computations, either an SPMD
code with message passing or a multithreaded asynchronous code.
The performance system has to be able to track this mapping and associate
performance data with the framework-level abstraction.
TAU does this through its mapping API and its support for tracking
asynchronous execution.
TAU is able to produce performance profiles of application objects, such
as expressions, instead of only routine profiles of object methods.
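The idea of mapping can be illustrated with a small sketch. It is not the TAU mapping API; `make_tasks`, `run_task`, and the two result tables are hypothetical. Every task runs through the same generic executor, so a routine-level profile shows only one entry, while the mapped view attributes time to the high-level expression that created each task.

```python
import time
from collections import defaultdict, deque

# Hypothetical sketch: names here are illustrative and are not the TAU mapping API.
by_routine = defaultdict(float)      # conventional view: time per low-level routine
by_expression = defaultdict(float)   # mapped view: time per high-level expression
tasks_per_expression = defaultdict(int)


def make_tasks(expression, chunks):
    """Split one data-parallel expression into tasks tagged with their origin."""
    return [(expression, n) for n in chunks]


def run_task(task):
    """Generic executor: every task passes through here, whatever created it."""
    expression, n = task
    start = time.perf_counter()
    sum(i * i for i in range(n))     # stand-in for the real computation
    elapsed = time.perf_counter() - start
    by_routine["run_task"] += elapsed
    by_expression[expression] += elapsed   # the mapping step
    tasks_per_expression[expression] += 1


# Tasks from two expressions are interleaved, as an asynchronous scheduler might do.
work = deque()
a = make_tasks("A = B + C", [2000, 2000, 2000])
b = make_tasks("D = A * 2", [500, 500])
while a or b:
    if a:
        work.append(a.pop(0))
    if b:
        work.append(b.pop(0))

while work:
    run_task(work.popleft())

for expression, seconds in by_expression.items():
    print(f"{expression}: {tasks_per_expression[expression]} tasks, {seconds:.6f} s")
```

The tag carried by each task is what survives the asynchronous reordering; without it, the measurement layer would see only `run_task`.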
- Integration with DyninstAPI
An important feature of the TAU system is its ability to interface with
software at compile time and at runtime.
In particular, TAU supports different modes of instrumentation: source code,
library, statically and dynamically linked, and runtime.
It can use DyninstAPI for runtime code generation.
This allows TAU to insert its instrumentation into the application while it
is executing, without recompiling it.
DyninstAPI provides TAU with a machine-independent interface for
inserting snippets of instrumentation code.
- Integration with MPI Profiling Interface
TAU provides an instrumented wrapper library for
MPI (Message Passing Interface).
It uses the MPI Profiling Interface, which provides a general mechanism
for intercepting calls to MPI routines independent of the vendor MPI
implementation.
Users instrument their MPI applications by relinking with this wrapper
library; no changes to their application or MPI library source code are
required.
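The interception mechanism can be sketched by analogy. In MPI, each routine `MPI_X` is also reachable under the name `PMPI_X`, so a wrapper library can define `MPI_X` itself, take measurements, and forward to `PMPI_X`. The Python classes below are hypothetical stand-ins (no MPI library is used) that mimic this name shifting.

```python
import time
from collections import defaultdict


# Stand-in for a vendor library; in MPI, each routine MPI_X is also reachable as PMPI_X.
class FakeMessagingLibrary:
    def __init__(self):
        self.delivered = []

    def PMPI_Send(self, payload, dest):
        self.delivered.append((dest, payload))
        return len(payload)

    # Without a profiling layer, the public name simply forwards to the same code.
    def MPI_Send(self, payload, dest):
        return self.PMPI_Send(payload, dest)


class ProfilingWrapper(FakeMessagingLibrary):
    """Hypothetical wrapper layer: redefines MPI_Send, measures, then forwards."""

    def __init__(self):
        super().__init__()
        self.stats = defaultdict(lambda: {"calls": 0, "bytes": 0, "seconds": 0.0})

    def MPI_Send(self, payload, dest):
        start = time.perf_counter()
        try:
            return self.PMPI_Send(payload, dest)
        finally:
            entry = self.stats["MPI_Send"]
            entry["calls"] += 1
            entry["bytes"] += len(payload)
            entry["seconds"] += time.perf_counter() - start


def application(lib):
    """Unmodified application code: it only ever calls the public MPI_Send name."""
    lib.MPI_Send(b"hello", dest=1)
    lib.MPI_Send(b"world!", dest=2)


lib = ProfilingWrapper()   # "relinking": swap in the wrapper, leave application() untouched
application(lib)
print(dict(lib.stats["MPI_Send"]))
```

Swapping `ProfilingWrapper` for `FakeMessagingLibrary` plays the role of relinking: the application source is identical in both cases.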
- Integration with Java Virtual Machine Profiling Interface (JVMPI)
TAU can instrument Java applications without requiring any modifications to
the Java application source code, the bytecode or the virtual machine.
It uses the profiling hooks in the Java virtual machine (JVMPI)
by loading a TAU dynamic shared object in the virtual machine at runtime.
It can then track events such as dynamic loading of classes, thread
creation and destruction, and method entry and exit, and interface with the
TAU API for performance measurement of Java applications.
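As a loose analogy only (this is Python, not Java, and it does not use JVMPI), the sketch below registers a callback with the interpreter through `sys.setprofile` and counts function entries. The application function `fib` is not modified, which mirrors the idea of observing events through hooks provided by the virtual machine.

```python
import sys
from collections import Counter


def profile_calls(target, *args):
    """Run target(*args) with an interpreter-level hook counting Python function entries."""
    entries = Counter()

    def hook(frame, event, arg):
        if event == "call":                     # a Python function is being entered
            entries[frame.f_code.co_name] += 1

    sys.setprofile(hook)
    try:
        result = target(*args)
    finally:
        sys.setprofile(None)                    # always remove the hook again
    return result, entries


def fib(n):                                     # unmodified "application" code
    return n if n < 2 else fib(n - 1) + fib(n - 2)


value, entries = profile_calls(fib, 5)
print(value, entries["fib"])
```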
- Integration with Performance Counter Library (PCL)
TAU interfaces with PCL to access hardware performance counters that are
available on most modern CPUs for performance measurement.
PCL is a library that provides a uniform interface to access hardware
performance counters with low overhead.
This allows users to assess the performance of routines, basic blocks,
and statements in terms of cache misses, instructions issued, floating
point operations, and other counters.
It currently supports access to hardware performance counters on Compaq
Alpha 21164/21264 under Tru64 and Cray Unicos, SGI MIPS R10000/R12000
under IRIX, Sun UltraSPARC I/II under Solaris, IBM PowerPC 604e under AIX,
and Pentium MMX/II/III under Linux.
- Support for new languages and platforms
TAU supports an integrated, extensible analysis framework through
modular component design, published data formats, standardized
interfaces, and programs to interface to third-party tools.
This has made it possible for TAU to be retargeted to new language,
runtime, and system contexts and extended with new analysis functionality.
The TAU profiling and tracing environment is highly robust and works in
the following cases:
- platforms: SGI Power Challenge and Origin 2000+, IBM SP, Intel Teraflop,
Cray T3E, HP 9000, Sun, Windows 95/98/NT, Compaq Alpha Linux cluster, Intel
Linux cluster
- languages: C, C++, Fortran 77/90, HPF, HPC++, Java
- thread packages: pthreads, Tulip threads, SMARTS threads, Java threads,
Windows threads
- communications libraries: MPI, Nexus, Tulip, ACLMPL
- compilers: KAI, PGI, GNU, Fujitsu, Sun, Microsoft, SGI, and Cray
PDT
- Released Version 1.1
Version 1.1 of the Program Database Toolkit for C++ has been released.
The distribution includes the C++ IL Analyzer, the DUCTAPE library,
and the EDG C++ Front End.
Various PDT processing tools (pdbmerge, pdbconv, pdbtree, and pdbhtml) are
also available for use with PDT 1.1.
This release of PDT was upgraded to version 2.41.2 of the
Edison Design Group (EDG) C++ Front End.
- New features
PDT 1.1 provides new and enhanced features.
Some of these features include: position information for routines, classes,
templates, namespaces; calls for constructors/destructors/new/delete;
routine default arguments; optional template text strings; and optional
reporting of unneeded entities.
Implementation of some of these features proved to be challenging, since
the EDG Front End was designed to support code generation by compiler back
ends, not static analysis by the IL Analyzer.
Shell scripts for user configuration and execution of PDT were also
improved.
- Robust Header Files
The inclusion of standard C++ system header files from the
Kuck and Associates, Inc. KCC 3.4c compiler has significantly enhanced the
robustness of PDT's parsing and analysis in this release, while simplifying
configuration and increasing the scope of supported platforms.
Porting PDT to a number of new platforms enabled TAU's automatic
instrumentation to be available on those platforms as well.
- Fortran 90 IL Analyzer
Implementation of the Fortran 90 IL Analyzer, based on the Fortran 90
Front End developed by Mutek, is
progressing well. Mapping Fortran 90 language features to analogous
C++ constructs was required first. The global structure of the
Fortran 90 IL Analyzer is in place, and details for specific language
constructs are being worked out. Mutek's elimination of the memory
management scheme for their Front End that is based on the EDG Fortran
77 Front End necessitated changes in the handling of routine calls,
for example. Appropriate modifications to the structure of the
program database, and therefore DUCTAPE, are necessary to accommodate
Fortran 90's modules, interfaces, derived types, array features, etc.
- PDT Applications
In addition to the tools released with PDT 1.1, three applications
have been developed that utilize the Program Database Toolkit.
For very large and complex libraries, such as POOMA, source instrumentation
for profiling and tracing can be time consuming if done manually.
TAU uses PDT to access information needed for automatic instrumentation.
This information includes function and method signatures and parameter type
information.
Similarly, we have implemented a coverage analysis tool that automatically
instruments the program using information from PDT to determine possible
and impossible calling paths and reachable routines.
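The reachability part of such an analysis can be sketched with a breadth-first search over a static call graph. The graph below is invented for illustration; in practice it would be derived from the call information stored in the program database.

```python
from collections import deque

# Hypothetical static call graph, as might be derived from a program database.
CALL_GRAPH = {
    "main": ["init", "solve"],
    "init": ["read_input"],
    "solve": ["step", "report"],
    "step": ["step_kernel"],
    "step_kernel": [],
    "read_input": [],
    "report": [],
    "debug_dump": ["report"],   # never called from main
}


def reachable_from(graph, root):
    """Breadth-first search over call edges; returns routines reachable from root."""
    seen = {root}
    pending = deque([root])
    while pending:
        caller = pending.popleft()
        for callee in graph.get(caller, []):
            if callee not in seen:
                seen.add(callee)
                pending.append(callee)
    return seen


reachable = reachable_from(CALL_GRAPH, "main")
unreachable = sorted(set(CALL_GRAPH) - reachable)
print("unreachable routines:", unreachable)
```

A coverage tool can then compare the routines observed at runtime against the reachable set, and ignore routines that no calling path can reach.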
Perhaps the most extensive and sophisticated PDT application is for
SILOON (Scripting Interface Languages for Object-Oriented Numerics).
PDT enables SILOON to generate the glue and skeleton code needed to
provide scripting-language access to scientific libraries.
In using PDT, source code is first parsed by an EDG-based compiler front
end.
The appropriate IL Analyzer then walks the intermediate language tree,
extracting the high-level interface and outputting item
descriptions to a program database.
These descriptions characterize the program's functions and classes, its
types, source files, namespaces, templates and their instantiations, and
macros.
The figure below shows PDT's use with SILOON.
In all the PDT applications, the DUCTAPE library provides the applications
access to the program database.
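To give a rough feel for generating glue code from interface descriptions, the sketch below turns a few routine signatures into C-style forwarding stubs. The `INTERFACE` records, the `glue_` naming scheme, and the stub layout are assumptions made for this example and do not reflect what SILOON actually generates.

```python
# Hypothetical interface records, standing in for descriptions read from a program database.
INTERFACE = [
    {"name": "norm", "returns": "double",
     "params": [("const double*", "v"), ("int", "n")]},
    {"name": "scale", "returns": "void",
     "params": [("double*", "v"), ("int", "n"), ("double", "a")]},
]


def skeleton(routine):
    """Emit a C-style forwarding stub for one routine; the naming scheme is made up."""
    params = ", ".join(f"{ctype} {pname}" for ctype, pname in routine["params"])
    args = ", ".join(pname for _, pname in routine["params"])
    call = f"{routine['name']}({args});"
    body = call if routine["returns"] == "void" else f"return {call}"
    return f"{routine['returns']} glue_{routine['name']}({params}) {{ {body} }}"


stubs = [skeleton(r) for r in INTERFACE]
print("\n".join(stubs))
```

Because the stubs are produced from recorded signatures, they can be regenerated whenever the library interface changes, without hand editing.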
TAU Monitoring Framework
During the last year, we have implemented a TAU runtime monitor based on
the monitoring framework described above.
In fact, we have implemented two versions of the TAU monitoring framework:
one with HPC++ and one with Java.
Our intention in the HPC++ implementation was to leverage the HPC++ library
to build the middleware support required by the monitor and to create a
server interface using HPC++ distributed object semantics.
The Java implementation grew out of our interest in providing a more
portable and robust software development environment for clients,
particularly for the creation of graphical displays, and in building a more
flexible and programmable server interface and monitor middleware system.
As shown in the figure below, the Java-based TAU monitor server utilizes a
Java Virtual Machine spawned from the profiled application.
Communication and data transfer with the client are implemented with Java
RMI.
Cluster Performance Monitoring Tools
An effort is underway to build tools for monitoring general system-level
performance metrics on Linux-based clusters.
The TAU Monitoring Framework discussed above is one tool of
this kind; it monitors user-level profiling data.
Another example is the Supermon project at Los Alamos National
Laboratory, with which we collaborated on the performance client.
To build these specific monitors in addition to a general monitor of
arbitrary performance metrics, we are building two important components: a
middleware toolkit for accessing and transporting data, and a set of
clients for presenting and manipulating arbitrary metrics.
Two middleware toolkits are under consideration for the data access and
transport level of the monitor: a Java RMI-based framework (discussed
above) and the recently released open-source SGI Performance Co-Pilot.
Our investigation is focusing on the performance and side effects
(undesirable perturbations) of each, in addition to the ease of adding
features and clients in the future.
Additional work has been done examining High Performance C++ (HPC++)
as a middleware solution.
Future Plans - FY 2000
During the third year of the TAU project, we will focus on
five main development activities:
- Create a suite of static and dynamic program and performance
display tools that interface with information produced by TAU and
PDT. These tools will include browsers of program information,
such as function, class, method, and template browsers.
- Improve the PDT system by completing the Fortran 90 parser,
building new static program analyses, and developing new program
code interface wrappers, specifically for aiding in data
marshalling and translation.
- Build robust versions of the TAU monitoring framework and cluster
performance monitor. Use component technology offered by Java and
its various application libraries (e.g., JavaBeans, Java 2D/3D,
Swing) to develop custom performance monitoring clients for
data retrieval, analysis, and presentation.
- Investigate performance measurement in the Linux kernel.
- Conduct targeted performance studies with ACTS developers and ASCI
applications.
Tool Availability
The latest TAU profiling and tracing toolkit (version 2.7)
and Program Database Toolkit (version 1.1)
are available as part of the LANL ACL Fall 1999 CD-ROM distributed at SC'99.
This edition of the CD-ROM can be downloaded by users from:
http://www.acl.lanl.gov/software
TAU can be independently downloaded from its homepage at:
http://www.acl.lanl.gov/tau
PDT can be independently downloaded from its homepage at:
http://www.acl.lanl.gov/pdtoolkit/
Both of these URLs will continue to be updated with future versions of the
software.
References
TAU
- Advanced Computing Laboratory, Los Alamos National Laboratory: TAU Portable Profiling. URL: http://www.acl.lanl.gov/tau
- S. Shende, A. D. Malony, J. Cuny, K. Lindlan,
P. Beckman, S. Karmesin: Portable Profiling and Tracing for Parallel,
Scientific Applications using C++, Proc. 2nd SIGMETRICS Symposium on
Parallel and Distributed Tools, pp. 134-145, August 1998.
- NERSC: ACTS Toolkit: TAU Information Page, URL: http://acts.nersc.gov/tau/, 1998.
- Advanced Computing Laboratory, Los Alamos National Laboratory: TAU: Tuning and Analysis Utilities, Supercomputing '99 flyer, Los Alamos National Laboratory Publication LALP-99-205, November 1999.
- S. Vajracharya, S. Karmesin, P. Beckman, J. Crotinger, A. Malony,
S. Shende, R. Oldehoeft, S. Smith: SMARTS: Exploiting Temporal Locality and
Parallelism through Vertical Execution, International Conference on
Supercomputing (ICS'99), pp. 302-310, June 1999.
- S. Shende: Profiling and Tracing in Linux,
Extreme Linux Workshop, USENIX, June 1999.
- S. Shende, A. D. Malony, J. Cuny, K. Lindlan: Tuning and Analysis Utilities, SC'99 presentation, November 1999.
- University of Oregon: TAU: Tuning and Analysis Utilities. URL:
http://www.cs.uoregon.edu/research/paracomp/tau/
PDT
- Advanced Computing Laboratory, Los Alamos National Laboratory:
Program Database Toolkit. URL: http://www.acl.lanl.gov/pdtoolkit
- K. Lindlan, J. Cuny, A. D. Malony, S. Shende,
P. Beckman: An IL Converter and Program Database for Analysis Tools,
Proc. 2nd SIGMETRICS Symposium on Parallel and Distributed Tools,
p. 153, August 1998.
- Advanced Computing Laboratory, Los Alamos National Laboratory: PDT: Program Database Toolkit, Supercomputing '99 flyer, Los Alamos National Laboratory Publication LALP-99-204, November 1999.
- B. Mohr: DUCTAPE. Poster, International Symposium on Computing in
Object-Oriented Parallel Environments (ISCOPE'98), December 1998.
- K. Lindlan, J. Cuny, A. Malony, S. Shende, B. Mohr:
A Tool Framework for Static and Dynamic Analysis of
Object-Oriented Software. Submitted to International Conference on
Supercomputing (ICS'00).
- Advanced Computing Laboratory, Los Alamos National Laboratory:
Taming Complexity in High-Performance Computing, White Paper, November 1999.
- K. Lindlan, A. Malony, J. Cuny, S. Shende, B. Mohr:
Program Database Toolkit, SC'99 poster, November 1999.
- K. Lindlan, A. Malony, J. Cuny, S. Shende, B. Mohr:
Program Database Toolkit, SC'99 presentation, November 1999.
- Advanced Computing Laboratory, Los Alamos National Laboratory:
SILOON (Scripting Interface Languages for Object-Oriented Numerics).
URL: http://www.acl.lanl.gov/siloon
TAU Monitoring Framework
- T. Sheehan, A. Malony, S. Shende:
A Runtime Monitoring Framework for the TAU Profiling System,
International Symposium on Computing in
Object-Oriented Parallel Environments (ISCOPE'99), pp. 170-181,
December 1999.
- University of Oregon: TAU Monitoring Framework. URL: http://www.cs.uoregon.edu/research/paracomp/tau/monitor/