Empirical Performance Analysis of HPC Applications with Portable Hardware Counter Metrics

Brian J Gravelle

In this dissertation, we demonstrate that it is possible to develop methods of empirical hardware-counter-based performance analysis for scientific applications running on diverse types of CPUs. Although hardware counters have been used in performance analysis for at least 30 years, existing methods remain limited to particular CPU vendors or even to particular generations of CPUs from the same vendor. Our motivating hypothesis is that hardware-counter-based measurements can be developed to provide consistent performance information across diverse CPU types. This dissertation confirms the hypothesis by demonstrating one such set of metrics.

We begin with an introduction motivating empirical performance analysis on CPUs, followed by background on the topic. This background includes the Roofline Performance Model, which is widely used to visualize the performance of scientific applications relative to the potential performance of the system in use. The Roofline Model uses metrics that are easily portable to different CPU architectures, so it is a useful starting point for our efforts to develop portable hardware counter metrics. We contribute to the existing Roofline literature in two ways: by presenting a method that uses hardware counters to measure the required application data on two different CPUs, and by presenting two benchmarks that produce the Roofline Model of a CPU. These contributions are complementary, since the benchmarks can also be used to validate the hardware counters used to measure the application data.
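For readers unfamiliar with the Roofline Model, the core calculation is small: an application's arithmetic intensity (floating-point operations per byte of data moved) is placed against two "roofs", the CPU's peak compute rate and its peak memory bandwidth, and the attainable performance is the lower of the two bounds. The sketch below shows only that standard bound under stated assumptions. It is not the dissertation's counter-measurement method or its benchmarks. The function names and all numeric values are hypothetical placeholders, not measurements from this work.

```python
def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """Return FLOPs performed per byte of data moved (FLOP/byte)."""
    if bytes_moved <= 0:
        raise ValueError("bytes_moved must be positive")
    return flops / bytes_moved


def roofline_bound(ai: float, peak_gflops: float, peak_bw_gbs: float) -> float:
    """Attainable GFLOP/s under the Roofline Model.

    The memory roof is ai * bandwidth; the compute roof is the peak FLOP rate.
    Performance is bounded by whichever roof is lower.
    """
    return min(peak_gflops, ai * peak_bw_gbs)


# Hypothetical inputs for illustration only. In practice the FLOP count and the
# bytes moved would be derived from hardware counters, and the peaks from benchmarks.
flops = 2.0e9        # total floating-point operations (hypothetical)
bytes_moved = 8.0e9  # total memory traffic in bytes (hypothetical)
ai = arithmetic_intensity(flops, bytes_moved)
bound = roofline_bound(ai, peak_gflops=1000.0, peak_bw_gbs=100.0)
print(f"AI = {ai:.2f} FLOP/byte, attainable <= {bound:.1f} GFLOP/s")
```

With these placeholder peaks, the "ridge point" where the two roofs meet sits at 1000 / 100 = 10 FLOP/byte. An application to the left of the ridge point, like the 0.25 FLOP/byte example here, is memory-bound. One to the right of it is compute-bound.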

Building on this work, we present a set of additional performance metrics derived from Hardware Performance Monitors that we have been able to replicate on CPUs from two separate vendors. We developed these metrics to focus on information that can inform developers about the performance of the algorithms and data structures in their applications. This approach contrasts with other hardware counter methods, which target particular microarchitectural features. These metrics allow users to understand the performance of an application from the same perspective on multiple CPUs.

We use a series of case studies to explore the usefulness of our new metrics and to validate that the measured values provide the expected information about each application on both of our test systems. The first set of case studies examines a series of benchmarks and mini-applications. These computational kernels exhibit a variety of performance characteristics, which we explore using the new hardware counter metrics. Finally, we study the performance of several versions of a scientific application using a combination of the Roofline Model and the new metrics. These case studies show that our performance metrics provide useful performance information on two different CPU types, proving our hypothesis by example.