Knowledge Support for Parallel Performance Data Mining

Kevin A. Huck

Parallel applications running on high-end computer systems manifest a complex combination of performance phenomena, such as communication patterns, work distributions, and computational inefficiencies. Current performance tools compute results that help describe performance behavior and explain how performance problems arise. However, parallel performance tool research has contributed little to large-scale performance data management and analysis, automated performance investigation, and knowledge-based reasoning about performance problems.

This dissertation discusses the design of a performance analysis methodology and framework that integrates scalable data management; dimension reduction, clustering, classification, and correlation analysis of individual high-dimensional trials; and comparative analysis across multiple application executions.

Analysis process workflows can be captured, automating tasks that would otherwise be time-consuming and possibly error-prone. More importantly, process automation provides an extensible interface to the analysis process. The methods also integrate context metadata and a rule-based system to capture expert performance analysis knowledge about known anomalous behavior patterns. Applying this knowledge to performance analysis results and their associated metadata provides a mechanism for diagnosing the causes of performance problems, rather than just summarizing results. Our prototype implementations of the data mining framework, PerfExplorer, and the data management framework, PerfDMF, are applied in large-scale performance studies to demonstrate each thesis contribution. The dissertation concludes with a discussion of future research directions.