To observe the performance of a parallel program, the general technique of performance profiling is often used. Performance profiling (9) can be implemented by direct (in vivo) instrumentation of the program with measurement code (e.g., see (19,20,21,5,23)), or by sampling (ex vivo) the program periodically (via interrupts or passive monitoring) to assign performance metrics to code regions identified by the sampled program counter (e.g., see (7,8,11,6,18)). The first technique is commonly referred to as measurement-based profiling (or simply measured profiling) and is an active technique: the program is instrumented to observe specific events of interest, such as the entry and exit of a routine. The second technique is called sample-based profiling (also known as statistical profiling) and is a passive technique, since it requires little or no modification to the program. The ``events'' observed in this case are either those inferred from the program ``state'' at the time of sampling (e.g., the current routine on the call stack), those that triggered the sampling action (e.g., overflow of a hardware counter), or a combination of both.
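The distinction can be illustrated with a minimal sketch (our own illustration, not any tool cited above): a wrapper that records routine entry and exit events in vivo, alongside a background thread that periodically samples the executing code ex vivo. All names (`instrumented`, `sampler`, `busy`) are hypothetical.

```python
import sys
import threading
import time
from collections import Counter

# --- Measured (in vivo) profiling: code inserted at routine entry/exit ---
profile = {"calls": Counter(), "time": Counter()}

def instrumented(fn):
    """Wrap a routine so that entry and exit are observed directly."""
    def wrapper(*args, **kwargs):
        t0 = time.perf_counter()                     # event: routine entry
        try:
            return fn(*args, **kwargs)
        finally:                                     # event: routine exit
            profile["calls"][fn.__name__] += 1
            profile["time"][fn.__name__] += time.perf_counter() - t0
    return wrapper

# --- Statistical (ex vivo) profiling: periodic samples of the running code ---
samples = Counter()

def sampler(target_ident, stop, period=0.001):
    """Attribute each sample to whatever routine the target thread is in."""
    while not stop.is_set():
        frame = sys._current_frames().get(target_ident)
        if frame is not None:
            samples[frame.f_code.co_name] += 1       # inferred program ``state''
        time.sleep(period)

@instrumented
def busy(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

stop = threading.Event()
t = threading.Thread(target=sampler, args=(threading.main_thread().ident, stop))
t.start()
busy(2_000_000)
stop.set()
t.join()
print(profile["calls"]["busy"], samples.most_common(3))
```

Note that the measured profile yields an exact call count and elapsed time for `busy`, while the statistical profile yields only a sample distribution whose accuracy depends on the sampling period and the run length.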
In both statistical and measured profiling, performance measurements are made during program execution. There is an overhead associated with performance measurement, since extra code is executed and hardware resources (processor, memory, network) are consumed. When performance overhead affects the program execution, we speak of performance (measurement) intrusion. Performance intrusion, no matter how small, can result in performance perturbation (13), where the program's measured performance behavior is ``different'' from its unmeasured performance. Whereas performance perturbation is difficult to assess, performance intrusion can be quantified by different metrics, the most important of which is dilation in program execution time. This type of intrusion is often reported as a percentage slowdown of total execution time, but the intrusion effects themselves will be distributed throughout the profile results.
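As a concrete (and entirely hypothetical) illustration of the dilation metric, percentage slowdown is computed against an uninstrumented baseline run:

```python
def percent_slowdown(t_measured, t_base):
    """Dilation of total execution time, as a percentage of the
    uninstrumented (baseline) run time."""
    return 100.0 * (t_measured - t_base) / t_base

# e.g., a 40.0 s uninstrumented run that takes 46.0 s when profiled
# (made-up numbers) exhibits 15% intrusion:
print(percent_slowdown(46.0, 40.0))  # -> 15.0
```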
Advocates of statistical profiling argue against measured profiling with regard to time overhead and its impact on intrusion. The insertion of measurement code into the program introduces direct overhead for every event occurrence, at the point in the program's execution when and where the event occurs. In contrast, statistical profiling measures the program's performance only at the time of sampling, whenever and wherever that occurs. Hence, it is argued, statistical profiling can result in less overhead and, consequently, less intrusion than measured profiling. Measured profiling advocates retort that, while this may be true in some cases, there are certain classes of common events and execution actions that cannot be observed well by a statistical approach alone, or at all. Even Unix gprof-style profiling (6), typically implemented via program counter sampling, requires direct code measurement to keep track of the number of times an event (i.e., routine entry) occurs during execution.
Any performance profiling technique, short of completely passive monitoring, will encounter measurement overhead and will also have limitations on what performance phenomena can and cannot be observed (13). Until there is a systematic basis for judging the validity and verifiability of differing profiling techniques, it is more productive to focus on those challenges that a profiling method faces to improve the accuracy of its measurement. In this regard, we pose the question of whether it is possible to compensate for measurement overhead in performance profiling. What we mean by this is to quantify measurement overhead and remove the overhead from profile calculations. (It is important to note that we are not suggesting that by doing so we are necessarily ``correcting'' the effects of overhead on intrusion and perturbation.) Because performance overhead occurs in both measured and statistical profiling, overhead compensation is an important topic of study.
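The basic quantify-and-subtract idea can be sketched for a flat profile: estimate a per-event measurement cost and deduct it from each routine's measured time. This is only an illustration of the concept under simplifying assumptions (uniform per-event cost, no interprocess effects), not the compensation model of (16); the profile numbers and names are made up.

```python
def compensate(profile, overhead_per_event):
    """profile: {routine: (calls, measured_inclusive_time_in_seconds)}.
    Each instrumented call pays roughly one entry/exit measurement cost,
    so remove calls * overhead_per_event from the measured time."""
    compensated = {}
    for routine, (calls, t_measured) in profile.items():
        t = t_measured - calls * overhead_per_event
        compensated[routine] = max(t, 0.0)   # never report negative time
    return compensated

# Hypothetical flat profile: a frequently called routine accumulates
# proportionally more measurement overhead.
raw = {"solve": (10, 5.2), "exchange": (1000, 3.0)}
print(compensate(raw, overhead_per_event=0.001))
# 'exchange' sheds 1000 * 0.001 = 1.0 s of estimated measurement overhead
```

Even this simple sketch shows why per-process subtraction alone is insufficient in parallel execution: removing overhead from one process's profile says nothing about how that overhead delayed messages to, or synchronization with, other processes.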
In (16), we presented overhead compensation techniques that were implemented in the TAU performance system (23) and demonstrated with the NAS parallel benchmarks (1) for both flat and callpath profile analysis. While our results showed improvement in NAS profiling accuracy, as measured by the error in total execution time compared to a non-instrumented run, the compensation models were deficient for parallel execution due to their inability to account for interprocess interactions and dependencies. The contribution of this paper is the modeling of performance overhead compensation in parallel profiling and the design of on-the-fly algorithms that can be implemented in practical profiling tools, such as the TAU system.
Section §2 briefly describes the basic models from (16) and shows how they fail on a simple master-worker example. We discuss the difficult issues that arise with overhead interdependency in parallel execution. In Section §3 we follow a strategy to model parallel overhead compensation for message-based parallel programs based on the notion of rational reconstruction of compensation solutions for specific parallel case studies. From the rationally reconstructed models we generate, we derive an on-the-fly algorithm for overhead analysis and compensation and argue for its general application. The simple master-worker example is then reconsidered in Section §4 with respect to the application of the algorithm. Conclusions and future work are given in Section §5.