Model-Based Automatic Performance Diagnosis of Parallel Computations
Li Li
Committee: Allen Malony (chair), Stephen Fickas, Virginia Lo, Daniel Steck, Xian-He Sun
Dissertation Defense (Feb 2007)

Scientific parallel programs often undergo significant performance tuning before they meet their performance expectations. Performance tuning naturally involves a diagnosis process: locating performance bugs that make a program inefficient and explaining them in terms of high-level program design. Important performance measurement and analysis tools have been developed to support performance analysis, providing facilities for running experiments on parallel computers and generating measurement data to evaluate performance. However, current performance analysis technology does not yet allow performance problems, once found, to be associated with causes at a high level of program abstraction. Nor does it support the performance diagnosis process in a well-automated manner.

We present a systematic method to guide the performance diagnosis process and support it with minimal user intervention. The motivating observation is that performance diagnosis can be greatly improved with the use of performance knowledge about parallel computation models. We therefore propose an approach to generating performance knowledge for automatically diagnosing parallel programs. Our approach exploits the program execution abstraction and parallelism found in computational models to search for and explain performance bugs. We identify the categories of knowledge required for performance diagnosis and describe how to derive that knowledge from computational models. We represent the extracted knowledge so that performance inferencing can be carried out automatically.

We have developed Hercule, an automatic performance diagnosis system that implements the model-based diagnosis strategy. In this dissertation, we present how Hercule integrates the performance knowledge into a performance analysis tool and demonstrate the effectiveness of our performance knowledge engineering approach through Hercule experiments on a variety of parallel computational models. We also investigate compositional programs that combine two or more models. We extend performance knowledge engineering to capture the interplay of multiple models in an integrated state, and improve Hercule's capabilities to support compositional performance diagnosis. We have applied Hercule to two representative scientific applications, both of which are implemented with compositional models. The experimental results show that, while requiring minimal user intervention, model-based performance analysis is vital and effective in discovering and interpreting performance bugs at a high level of program abstraction.