Kevin A. Huck - Publications


Publications:
  1. Kevin A. Huck, Allen D. Malony, Sameer Shende and Alan Morris
    Knowledge Support and Automation for Performance Analysis with PerfExplorer 2.0.
    Large-Scale Programming Tools and Environments, special issue of Scientific Programming. (to appear, email for copies)
    Abstract:
    The integration of scalable performance analysis in parallel development tools is difficult. The potential size of data sets and the need to compare results from multiple experiments presents a challenge to manage and process the information. Simply to characterize the performance of parallel applications running on potentially hundreds of thousands of processor cores requires new scalable analysis techniques. Furthermore, many exploratory analysis processes are repeatable and could be automated, but are now implemented as manual procedures. In this paper, we will discuss the current version of PerfExplorer, a performance analysis framework which provides dimension reduction, clustering and correlation analysis of individual trials of large dimensions, and can perform relative performance analysis between multiple application executions. PerfExplorer analysis processes can be captured in the form of Python scripts, automating what would otherwise be time-consuming tasks. We will give examples of large-scale analysis results, and discuss the future development of the framework, including the encoding and processing of expert performance rules, and the increasing use of performance metadata.
  2. Kevin A. Huck, Allen D. Malony, Sameer Shende and Alan Morris
    Scalable, Automated Performance Analysis with TAU and PerfExplorer.
    Proceedings of Parallel Computing 2007, Aachen, Germany, 2007.
    Abstract:
    Scalable performance analysis is a challenge for parallel development tools. The potential size of data sets and the need to compare results from multiple experiments presents a challenge to manage and process the information, and to characterize the performance of parallel applications running on potentially hundreds of thousands of processor cores. In addition, many exploratory analysis processes represent potentially repeatable processes which can and should be automated. In this paper, we will discuss the current version of PerfExplorer, a performance analysis framework which provides dimension reduction, clustering and correlation analysis of individual trials of large dimensions, and can perform relative performance analysis between multiple application executions. PerfExplorer analysis processes can be captured in the form of Python scripts, automating what would otherwise be time-consuming tasks. We will give examples of large-scale analysis results, and discuss the future development of the framework, including the encoding and processing of expert performance rules, and the increasing use of performance metadata.
  3. D. Gunter, K. Huck, K. Karavanic, J. May, A. Malony, K. Mohror, S. Moore, A. Morris, S. Shende, V. Taylor, X. Wu, and Y. Zhang.
    Performance database technology for SciDAC applications.
    Journal of Physics: Conference Series, Vol. 78, 24--28 June 2007, Boston, Massachusetts, USA.
    Abstract:
    As part of the Performance Engineering Research Institute (PERI) effort, the Performance Database Working Group, which involves PERI researchers as well as outside researchers at the University of Oregon, Portland State University, and Texas A&M University, has developed technology for storing performance data collected by a number of performance measurement and analysis tools, including TAU, PerfTrack, Prophesy, and SvPablo. In addition to the performance data, metadata capturing the experimental setup and conditions (e.g., source code version; input data; platform, compiler, library, and operating system versions and configurations; runtime environment) are exported to a common metadata schema, along with some basic performance information. The exported information can be viewed from a common web interface, and a link or contact information is provided for accessing the original performance data in its home database. Analysis tools provided by the individual databases support tasks such as parallel profile browsing and analysis, cross-experiment analysis, and scalability studies. Performance data are currently being collected and analyzed for the GTC and MILC SciDAC applications. The tools are being installed on machines used by SciDAC researchers so that they can easily collect data and upload it to an associated performance database. Work on a deeper level of interoperability that will allow exchange of actual performance data between databases is underway.
  4. Y. Zhang, R. Fowler, K. Huck, A. Malony, A. Porterfield, D. Reed, S. Shende, V. Taylor, and X. Wu.
    US QCD Computational Performance Studies with PERI.
    Journal of Physics: Conference Series, Vol. 78, 24--28 June 2007, Boston, Massachusetts, USA.
    Abstract:
    We report on some of the interactions between two SciDAC projects: The National Computational Infrastructure for Lattice Gauge Theory (USQCD), and the Performance Engineering Research Institute (PERI). Many modern scientific programs consistently report the need for faster computational resources to maintain global competitiveness. However, as the size and complexity of emerging high end computing (HEC) systems continue to rise, achieving good performance on such systems is becoming ever more challenging. In order to take full advantage of the resources, it is crucial to understand the characteristics of relevant scientific applications and the systems these applications are running on. Using tools developed under PERI and by other performance measurement researchers, we studied the performance of two applications, MILC and Chroma, on several high performance computing systems at DOE laboratories. In the case of Chroma, we discuss how the use of C++ and modern software engineering and programming methods are driving the evolution of performance tools.
  5. Kevin A. Huck, Allen D. Malony, Sameer Shende and Alan Morris.
    TAUg: Runtime Global Performance Data Access Using MPI.
    EuroPVM/MPI, pp. 313-321, Bonn, Germany, 2006.
    Abstract:
    To enable a scalable parallel application to view its global performance state, we designed and developed TAUg, a portable runtime framework layered on the TAU parallel performance system. TAUg leverages the MPI library to communicate between application processes, creating an abstraction of a global performance space from which profile views can be retrieved. We describe the TAUg design and implementation and show its use on two test benchmarks up to 512 processors. Overhead evaluation for the use of TAUg is included in our analysis. Future directions for improvement are discussed.
  6. Li Li, Allen D. Malony and Kevin Huck.
    Model-Based Relative Performance Diagnosis of Wavefront Parallel Computations.
    Euro-Par 2006 Parallel Processing Conference, September 2006 (LNCS 4128). Pages 35-46.
    Abstract:
    Parallel performance diagnosis can be improved with the use of performance knowledge about parallel computation models. The Hercule diagnosis system applies model-based methods to automate performance diagnosis processes and explain performance problems from high-level computation semantics. However, Hercule is limited by a single experiment view. Here we introduce the concept of relative performance diagnosis and show how it can be integrated in a model-based diagnosis framework. The paper demonstrates the effectiveness of Hercule's approach to relative diagnosis of the well-known Sweep3D application based on a Wavefront model. Relative diagnoses of Sweep3D performance anomalies in strong and weak scaling cases are given.
  7. Kevin Huck and Allen D. Malony.
    PerfExplorer: A Performance Data Mining Framework For Large-Scale Parallel Computing.
    SC '05: Proceedings of the 2005 ACM/IEEE conference on Supercomputing, 2005. Seattle, Washington, USA.
    Abstract:
    Parallel applications running on high-end computer systems manifest a complexity of performance phenomena. Tools to observe parallel performance attempt to capture these phenomena in measurement datasets rich with information relating multiple performance metrics to execution dynamics and parameters specific to the application-system experiment. However, the potential size of datasets and the need to assimilate results from multiple experiments makes it a daunting challenge to not only process the information, but discover and understand performance insights. In this paper, we present PerfExplorer, a framework for parallel performance data mining and knowledge discovery. The framework architecture enables the development and integration of data mining operations that will be applied to large-scale parallel performance profiles. PerfExplorer operates as a client-server system and is built on a robust parallel performance database (PerfDMF) to access the parallel profiles and save its analysis results. Examples are given demonstrating these techniques for performance analysis of ASCI applications.
  8. Karen L. Karavanic, John May, Kathryn Mohror, Brian Miller, Kevin Huck, Rashawn Knapp, Brian Pugh.
    Integrating Database Technology with Comparison-based Parallel Performance Diagnosis: The PerfTrack Performance Experiment Management Tool.
    SC '05: Proceedings of the 2005 ACM/IEEE conference on Supercomputing, 2005. Seattle, Washington, USA.
    Abstract:
    PerfTrack is a data store and interface for managing performance data from large-scale parallel applications. Data collected in different locations and formats can be compared and viewed in a single performance analysis session. The underlying data store used in PerfTrack is implemented with a database management system (DBMS). PerfTrack includes interfaces to the data store and scripts for automatically collecting data describing each experiment, such as build and platform details. We have implemented a prototype of PerfTrack that can use Oracle or PostgreSQL for the data store. We demonstrate the prototype's functionality with three case studies: one is a comparative study of an ASC Purple benchmark on high-end Linux and AIX platforms; the second is a parameter study conducted at Lawrence Livermore National Laboratory (LLNL) on two high end platforms, a 128 node cluster of IBM Power 4 processors and BlueGene/L; the third demonstrates incorporating performance data from the Paradyn Parallel Performance Tool into an existing PerfTrack data store.
  9. P Worley, J Candy, L Carrington, K Huck, T Kaiser, G Mahinthakumar, A Malony, S Moore, D Reed, P Roth, H Shan, S Shende, A Snavely, S Sreepathi, F Wolf, Y Zhang
    Performance Analysis of GYRO: a tool evaluation.
    Journal of Physics: Conference Series, vol. 16, pp. 551-555, 2005.
    Abstract:
    The performance of the Eulerian gyrokinetic-Maxwell solver code GYRO is analyzed on five high performance computing systems. First, a manual approach is taken, using custom scripts to analyze the output of embedded wallclock timers, floating point operation counts collected using hardware performance counters, and traces of user and communication events collected using the profiling interface to Message Passing Interface (MPI) libraries. Parts of the analysis are then repeated or extended using a number of sophisticated performance analysis tools: IPM, KOJAK, SvPablo, TAU, and the PMaC modeling tool suite. The paper briefly discusses what has been discovered via this manual analysis process, what performance analyses are inconvenient or infeasible to attempt manually, and to what extent the tools show promise in accelerating or significantly extending the manual performance analyses.
  10. Kevin Huck, Allen D. Malony, Robert Bell and Alan Morris.
    Design and Implementation of a Parallel Performance Data Management Framework.
    (Winner: The Chuan-lin Wu Best Paper Award), Proceedings of the 2005 International Conference on Parallel Processing. June 14-17, 2005. Oslo, Norway.
    Abstract:
    Empirical performance evaluation of parallel systems and applications can generate significant amounts of performance data and analysis results from multiple experiments as performance is investigated and problems diagnosed. Hence, the management of performance information is a core component of performance analysis tools. To better support tool integration, portability, and reuse, there is a strong motivation to develop performance data management technology that can provide a common foundation for performance data storage, access, merging, and analysis. This paper presents the design and implementation of the Performance Data Management Framework (PerfDMF). PerfDMF addresses objectives of performance tool integration, interoperation, and reuse by providing common data storage, access, and analysis infrastructure for parallel performance profiles. PerfDMF includes an extensible parallel profile data schema and relational database schema, a profile query and analysis programming interface, and an extendible toolkit for profile import/export and standard analysis. We describe the PerfDMF objectives and architecture, give detailed explanation of the major components, and show examples of PerfDMF application.

Posters:
  1. D. Gunter, K. Huck, K. Karavanic, J. May, A. Malony, K. Mohror, S. Moore, A. Morris, S. Shende, V. Taylor, X. Wu, and Y. Zhang.
    Performance Database Technology for SciDAC Applications.
    Poster, SciDAC. June, 2007.
    Abstract:
    As part of the Performance Engineering Research Institute (PERI) effort, the Performance Database Working Group, which involves PERI researchers as well as outside researchers at the University of Oregon, Portland State University, and Texas A&M University, has developed technology for storing performance data collected by a number of performance measurement and analysis tools, including TAU, PerfTrack, Prophesy, and SvPablo. In addition to the performance data, metadata capturing the experimental setup and conditions (e.g., source code version; input data; platform, compiler, library, and operating system versions and configurations; runtime environment) are exported to a common metadata schema, along with some basic performance information. The exported information can be viewed from a common web interface, and a link or contact information is provided for accessing the original performance data in its home database. Analysis tools provided by the individual databases support tasks such as parallel profile browsing and analysis, cross-experiment analysis, and scalability studies. Performance data are currently being collected and analyzed for the GTC and MILC SciDAC applications. The tools are being installed on machines used by SciDAC researchers so that they can easily collect data and upload it to an associated performance database.
  2. R. Fowler, Y. Zhang, A. Porterfield, D. Reed, J. Mellor-Crummey, N. Tallent, K. Huck, A. Malony, S. Shende, V. Taylor, and X. Wu.
    PERI and USQCD Computational Performance Studies.
    Poster, SciDAC. June, 2007.
    Abstract:
    USQCD encompasses a SciDAC collaboration of US scientists developing and using large-scale computers for calculations in lattice quantum chromodynamics. Software Emphasis: improved scientific productivity through modular, reusable, cross-platform, high-performance libraries. PERI is a SciDAC Institute focused on delivering petascale performance to complex scientific applications running on Leadership Class computing systems. Emphasis: improved productivity through automation of measurement, analysis, and tuning of HPC applications.
  3. Kevin Huck, Kathryn Mohror, John May, Brian Miller, Karen Karavanic.
    PerfTrack: Performance Database & Analysis Tool.
    Poster, Lawrence Livermore National Laboratory, UCRL-POST-205871. September, 2004.
    Introduction:
    Our goal is to create a tool which will help scientific programmers answer difficult questions about application performance as the source code, build parameters, runtime environment and hardware vary over time. We are developing PerfTrack to explore technologies in parallel performance measurement, modeling, analysis and prediction. We are storing performance data and the associated environment data in a relational database. This database provides a foundation to build analysis tools, scalable to large numbers of threads (over 1024) and capable of comparing multiple executions. The tools we develop will be automated to gather, store and analyze data, in order to encourage their use in the software development cycle.

Presentations:
  1. Using the TAU Performance Analysis System on the Blue Gene/P
    ALCF INCITE Workshop. May 7-8, 2008. Argonne National Laboratory, Argonne, IL. PDF.
  2. PERI Database Working Group: Status Report
    PERI Semi-annual meeting. Feb. 25-26, 2008, UCSD, San Diego, CA.
  3. Scalable, Automated Performance Analysis with TAU and PerfExplorer
    Parallel Computing 2007. Sept. 4-7, 2007. Aachen and Jülich, Germany. PDF.
  4. Scalable Performance Analysis with TAU, PerfDMF and PerfExplorer
    Forschungszentrum Jülich, 2007. Aug. 28, 2007. Jülich, Germany. PDF.
  5. Knowledge Support for Parallel Performance Data Mining
    Code Instrumentation and Modeling for Parallel Performance Analysis, Dagstuhl Seminar, 2007. Aug. 19-24, 2007. Dagstuhl, Germany. PDF.
  6. PerfDMF: Performance Data Management Framework
    Open Source Performance Analysis Tools (OSPAT) BOF Session, Conference on High Performance Networking and Computing (SC|05). November 18, 2005. Seattle, Washington. PDF.
  7. PerfExplorer: A Performance Data Mining Framework For Large-Scale Parallel Computing.
    Conference on High Performance Networking and Computing (SC|05). November 18, 2005. Seattle, Washington. PDF.
  8. Design and Implementation of a Parallel Performance Data Management Framework.
    2005 International Conference on Parallel Processing. June 17, 2005. Oslo, Norway. PDF.
  9. Scalable Parallel Performance Analysis.
    IBM Petascale Tools Strategy Workshop, May 4, 2005. PDF.
  10. PerfExplorer: Parallel Performance Analysis using Data Mining Techniques.
    Directed Research Project, University of Oregon. December 15, 2004. PDF.
