Publications: |
-
Kevin A. Huck, Allen D. Malony, Sameer Shende and Alan Morris
Knowledge Support and Automation for Performance Analysis with PerfExplorer 2.0.
Large-Scale Programming Tools and Environments, special issue of Scientific Programming. (to appear, email for copies)
Abstract:
The integration of scalable performance analysis in parallel development tools
is difficult. The potential size of data sets and the need to compare results
from multiple experiments present a challenge to manage and process the
information. Simply to characterize the performance of parallel applications
running on potentially hundreds of thousands of processor cores requires new
scalable analysis techniques. Furthermore, many exploratory analysis processes
are repeatable and could be automated, but are now implemented as manual
procedures. In this paper, we will discuss the current version of
PerfExplorer, a performance analysis framework which provides dimension
reduction, clustering and correlation analysis of individual trials of large
dimensions, and can perform relative performance analysis between multiple
application executions. PerfExplorer analysis processes can be captured in the
form of Python scripts, automating what would otherwise be time-consuming
tasks. We will give examples of large-scale analysis results, and discuss the
future development of the framework, including the encoding and processing of
expert performance rules, and the increasing use of performance metadata.
-
Kevin A. Huck, Allen D. Malony, Sameer Shende and Alan Morris
Scalable, Automated Performance Analysis with TAU and PerfExplorer.
Proceedings of Parallel Computing 2007, Aachen, Germany, 2007.
Abstract:
Scalable performance analysis is a challenge for parallel development tools.
The potential size of data sets and the need to compare results from multiple
experiments present a challenge to manage and process the information, and to
characterize the performance of parallel applications running on potentially
hundreds of thousands of processor cores. In addition, many exploratory
analysis processes represent potentially repeatable processes which can and
should be automated.
In this paper, we will discuss the current version of PerfExplorer, a
performance analysis framework which provides dimension reduction, clustering
and correlation analysis of individual trials of large dimensions, and can
perform relative performance analysis between multiple application executions.
PerfExplorer analysis processes can be captured in the form of Python scripts,
automating what would otherwise be time-consuming tasks. We will give examples
of large-scale analysis results, and discuss the future development of the
framework, including the encoding and processing of expert performance rules,
and the increasing use of performance metadata.
-
D. Gunter, K. Huck, K. Karavanic, J. May, A. Malony, K. Mohror, S. Moore, A. Morris, S. Shende, V. Taylor, X. Wu, and Y. Zhang.
Performance database technology for SciDAC applications.
Journal of Physics: Conference Series, Vol. 78, 24-28 June 2007, Boston, Massachusetts, USA.
Abstract:
As part of the Performance Engineering Research Institute (PERI) effort, the
Performance Database Working Group, which involves PERI researchers as well as
outside researchers at the University of Oregon, Portland State University, and Texas
A&M University, has developed technology for storing performance data collected by a
number of performance measurement and analysis tools, including TAU, PerfTrack,
Prophesy, and SvPablo. In addition to the performance data, metadata capturing the
experimental setup and conditions (e.g., source code version; input data; platform,
compiler, library, and operating system versions and configurations; runtime
environment) are exported to a common metadata schema, along with some basic
performance information. The exported information can be viewed from a common web
interface, and a link or contact information is provided for accessing the original
performance data in its home database. Analysis tools provided by the individual
databases support tasks such as parallel profile browsing and analysis, cross-experiment
analysis, and scalability studies. Performance data are currently being collected and
analyzed for the GTC and MILC SciDAC applications. The tools are being installed on
machines used by SciDAC researchers so that they can easily collect data and upload it to
an associated performance database. Work on a deeper level of interoperability that will
allow exchange of actual performance data between databases is underway.
-
Y. Zhang, R. Fowler, K. Huck, A. Malony, A. Porterfield, D. Reed, S. Shende, V. Taylor, and X. Wu.
US QCD Computational Performance Studies with PERI.
Journal of Physics: Conference Series, Vol. 78, 24-28 June 2007, Boston, Massachusetts, USA.
Abstract:
We report on some of the interactions between two SciDAC projects: The National Computational Infrastructure for Lattice Gauge Theory (USQCD), and the Performance Engineering Research Institute (PERI). Many modern scientific programs consistently report the need for faster computational resources to maintain global competitiveness. However, as the size and complexity of emerging high end computing (HEC) systems continue to rise, achieving good performance on such systems is becoming ever more challenging. In order to take full advantage of the resources, it is crucial to understand the characteristics of relevant scientific applications and the systems these applications are running on. Using tools developed under PERI and by other performance measurement researchers, we studied the performance of two applications, MILC and Chroma, on several high performance computing systems at DOE laboratories. In the case of Chroma, we discuss how the use of C++ and modern software engineering and programming methods is driving the evolution of performance tools.
-
Kevin A. Huck, Allen D. Malony, Sameer Shende and Alan Morris.
TAUg: Runtime Global Performance Data Access Using MPI.
EuroPVM/MPI, pp. 313-321, Bonn, Germany, 2006.
Abstract:
To enable a scalable parallel application to view its global performance state,
we designed and developed TAUg, a portable runtime framework layered on the TAU
parallel performance system. TAUg leverages the MPI library to communicate
between application processes, creating an abstraction of a global performance
space from which profile views can be retrieved. We describe the TAUg design
and implementation and show its use on two test benchmarks up to 512
processors. Overhead evaluation for the use of TAUg is included in our
analysis. Future directions for improvement are discussed.
-
Li Li, Allen D. Malony and Kevin Huck.
Model-Based Relative Performance Diagnosis of Wavefront Parallel Computations.
Euro-Par 2006 Parallel Processing Conference, September 2006 (LNCS 4128), pp. 35-46.
Abstract:
Parallel performance diagnosis can be improved with the use of performance
knowledge about parallel computation models. The Hercule diagnosis system
applies model-based methods to automate performance diagnosis processes and
explain performance problems from high-level computation semantics. However,
Hercule is limited by a single experiment view. Here we introduce the concept
of relative performance diagnosis and show how it can be integrated in a
model-based diagnosis framework. The paper demonstrates the effectiveness of
Hercule's approach to relative diagnosis of the well-known Sweep3D application
based on a Wavefront model. Relative diagnoses of Sweep3D performance anomalies
in strong and weak scaling cases are given.
-
Kevin Huck and Allen D. Malony.
PerfExplorer: A Performance Data Mining Framework For Large-Scale Parallel Computing.
SC '05: Proceedings of the 2005 ACM/IEEE conference on Supercomputing, 2005. Seattle, Washington, USA.
Abstract:
Parallel applications running on high-end computer systems manifest a
complexity of performance phenomena. Tools to observe parallel performance
attempt to capture these phenomena in measurement datasets rich with
information relating multiple performance metrics to execution dynamics and
parameters specific to the application-system experiment. However, the
potential size of datasets and the need to assimilate results from multiple
experiments make it a daunting challenge to not only process the information,
but discover and understand performance insights. In this paper, we present
PerfExplorer, a framework for parallel performance data mining and knowledge
discovery. The framework architecture enables the development and integration
of data mining operations that will be applied to large-scale parallel
performance profiles. PerfExplorer operates as a client-server system and is
built on a robust parallel performance database (PerfDMF) to access the
parallel profiles and save its analysis results. Examples are given
demonstrating these techniques for performance analysis of ASCI applications.
-
Karen L. Karavanic, John May, Kathryn Mohror, Brian Miller, Kevin Huck, Rashawn Knapp, Brian Pugh.
Integrating Database Technology with Comparison-based Parallel Performance
Diagnosis: The PerfTrack Performance Experiment Management Tool.
SC '05: Proceedings of the 2005 ACM/IEEE conference on Supercomputing, 2005. Seattle, Washington, USA.
Abstract:
PerfTrack is a data store and interface for managing performance data from
large-scale parallel applications. Data collected in different locations and
formats can be compared and viewed in a single performance analysis session.
The underlying data store used in PerfTrack is implemented with a database
management system (DBMS). PerfTrack includes interfaces to the data store and
scripts for automatically collecting data describing each experiment, such as
build and platform details. We have implemented a prototype of PerfTrack that
can use Oracle or PostgreSQL for the data store. We demonstrate the prototype's
functionality with three case studies: one is a comparative study of an ASC
Purple benchmark on high-end Linux and AIX platforms; the second is a parameter
study conducted at Lawrence Livermore National Laboratory (LLNL) on two
high-end platforms, a 128-node cluster of IBM Power 4 processors and BlueGene/L; the
third demonstrates incorporating performance data from the Paradyn Parallel
Performance Tool into an existing PerfTrack data store.
-
P Worley, J Candy, L Carrington, K Huck, T Kaiser, G Mahinthakumar, A Malony, S Moore, D Reed, P Roth, H Shan, S Shende, A Snavely, S Sreepathi, F Wolf, Y Zhang.
Performance Analysis of GYRO: a tool evaluation.
Journal of Physics: Conference Series, vol. 16, pp. 551-555, 2005.
Abstract:
The performance of the Eulerian gyrokinetic-Maxwell solver code GYRO is
analyzed on five high performance computing systems. First, a manual approach
is taken, using custom scripts to analyze the output of embedded wallclock
timers, floating point operation counts collected using hardware performance
counters, and traces of user and communication events collected using the
profiling interface to Message Passing Interface (MPI) libraries. Parts of the
analysis are then repeated or extended using a number of sophisticated
performance analysis tools: IPM, KOJAK, SvPablo, TAU, and the PMaC modeling
tool suite. The paper briefly discusses what has been discovered via this
manual analysis process, what performance analyses are inconvenient or
infeasible to attempt manually, and to what extent the tools show promise in
accelerating or significantly extending the manual performance analyses.
-
Kevin Huck, Allen D. Malony, Robert Bell and Alan Morris.
Design and Implementation of a Parallel Performance Data Management Framework.
(Winner: The Chuan-lin Wu Best Paper Award),
Proceedings of the 2005 International Conference on Parallel Processing.
June 14-17, 2005. Oslo, Norway.
Abstract:
Empirical performance evaluation of parallel systems and applications can
generate significant amounts of performance data and analysis results from
multiple experiments as performance is investigated and problems diagnosed.
Hence, the management of performance information is a core component of
performance analysis tools. To better support tool integration, portability,
and reuse, there is a strong motivation to develop performance data management
technology that can provide a common foundation for performance data storage,
access, merging, and analysis. This paper presents the design and
implementation of the Performance Data Management Framework (PerfDMF). PerfDMF
addresses objectives of performance tool integration, interoperation, and reuse
by providing common data storage, access, and analysis infrastructure for
parallel performance profiles. PerfDMF includes an extensible parallel profile
data schema and relational database schema, a profile query and analysis
programming interface, and an extendible toolkit for profile import/export and
standard analysis. We describe the PerfDMF objectives and architecture, give
detailed explanation of the major components, and show examples of PerfDMF
application.
|
Posters: |
-
D. Gunter, K. Huck, K. Karavanic, J. May, A. Malony, K. Mohror, S. Moore, A. Morris, S. Shende, V. Taylor, X. Wu, and Y. Zhang.
Performance Database Technology for SciDAC Applications.
Poster, SciDAC. June, 2007.
Abstract:
As part of the Performance Engineering Research Institute (PERI) effort, the Performance Database Working Group, which involves PERI researchers as well as outside researchers at the University of Oregon, Portland State University, and Texas A&M University, has developed technology for storing performance data collected by a number of performance measurement and analysis tools, including TAU, PerfTrack, Prophesy, and SvPablo. In addition to the performance data, metadata capturing the experimental setup and conditions (e.g., source code version; input data; platform, compiler, library, and operating system versions and configurations; runtime environment) are exported to a common metadata schema, along with some basic performance information. The exported information can be viewed from a common web interface, and a link or contact information is provided for accessing the original performance data in its home database. Analysis tools provided by the individual databases support tasks such as parallel profile browsing and analysis, cross-experiment analysis, and scalability studies. Performance data are currently being collected and analyzed for the GTC and MILC SciDAC applications. The tools are being installed on machines used by SciDAC researchers so that they can easily collect data and upload it to an associated performance database.
-
R. Fowler, Y. Zhang, A. Porterfield, D. Reed, J. Mellor-Crummey, N. Tallent, K. Huck, A. Malony, S. Shende, V. Taylor, and X. Wu.
PERI and USQCD Computational Performance Studies.
Poster, SciDAC. June, 2007.
Abstract:
USQCD encompasses a SciDAC collaboration of US scientists developing and using large-scale computers for calculations in lattice quantum chromodynamics. Software Emphasis: improved scientific productivity through modular, reusable, cross-platform, high-performance libraries. PERI is a SciDAC Institute focused on delivering petascale performance to complex scientific applications running on Leadership Class computing systems. Emphasis: improved productivity through automation of measurement, analysis, and tuning of HPC applications.
-
Kevin Huck, Kathryn Mohror, John May, Brian Miller, Karen Karavanic.
PerfTrack: Performance Database & Analysis Tool.
Poster, Lawrence Livermore National Laboratory, UCRL-POST-205871. September, 2004.
Introduction:
Our goal is to create a tool which will help scientific programmers answer
difficult questions about application performance as the source code, build
parameters, runtime environment and hardware vary over time. We are developing
PerfTrack to explore technologies in parallel performance measurement,
modeling, analysis and prediction. We are storing performance data and the
associated environment data in a relational database. This database provides a
foundation to build analysis tools, scalable to large numbers of threads (over
1024) and capable of comparing multiple executions. The tools we develop will
be automated to gather, store and analyze data, in order to encourage their use
in the software development cycle.
|