Profiling and tracing are the two main approaches for empirical parallel performance analysis. Parallel tracing is most often implemented as a measurement-based technique: the application code is instrumented to observe events, which are recorded in a trace buffer on each occurrence during execution [5,28]. In contrast, performance profiling can be implemented in one of two ways: 1) in vivo, with measurement code inserted in the program (e.g., see [6,9,21,23,24]), or 2) ex vivo, by periodically interrupting the program to assign performance metrics to code regions identified by the halted program counter (e.g., see [10,13,14,22,26]). The first technique is commonly referred to as measurement-based profiling (or simply measured profiling) and is an active technique. The second technique is called sample-based profiling (also known as statistical profiling) and is a passive technique, since it requires little or no modification to the program.
There are significant differences of opinion among performance tool researchers with regard to the merits of measured versus statistical profiling. The issues debated include instrumentation, robustness, portability, compiler optimizations, and intrusion. Ultimately, the profiling methods must be accurate, else the differences of opinion really do not matter. Herein lies an interesting performance analysis conundrum. How do we evaluate analysis accuracy when the ``true'' performance is unknown? Some technique must be used to observe performance, and profiling tools will always have limitations on what performance phenomena can and cannot be observed. Is a tool inaccurate if it does not provide information about particular performance behavior at a sufficient level of detail? Furthermore, no tool is entirely passive, and any degree of execution intrusion can result in performance perturbation. Should not all profiling tools be considered inaccurate in this case? Parallel performance analysis is no different from experimental methods in other sciences. ``Truth'' always lies just beyond the reach of observation. As long as this is the case, accuracy will be a relative assessment.
Until there is a systematic basis for judging the accuracy of profiling tools, it is more productive to focus on those challenges that a profiling method faces to improve its accuracy. Our work advocates measured profiling as a method of choice for performance analysis. Unfortunately, measured profiling suffers from direct intrusion on program execution. This intrusion is often reported as a percentage slowdown of total execution time, but the intrusion effects will be distributed throughout the profile results. The question we pose in this paper is whether it is possible to compensate for these effects by quantifying and removing the overhead from profile measurements.
Section 2 describes the problem of profiling intrusion and outlines our compensation objectives. The algorithms for overhead removal are described in Section 3. We tested these on a series of case studies using the NAS Parallel Benchmarks. In Section 4, we report our findings with an emphasis on evaluating relative accuracy. Our approach can reduce intrusion errors, but it does not fully solve the overhead compensation problem. Section 5 discusses the thorny issues that remain. Conclusions and future work are given in Section 6.