Profiling associates performance metrics with aspects of a program's execution. Normally, it is the program's components being profiled, such as its routines and code blocks, and profile results show how performance data are distributed across these program parts. Execution time is the most common metric, but any source of performance data is valid (e.g., hardware counters [4,6,23,30]). Flat profiles show performance data distributed onto the static program structure, while path profiles [11] map performance data to dynamic program behavior, most often represented as program execution paths (e.g., routine calling paths [7,11]). Profiling both measures performance data and calculates performance statistics at runtime.
Measured profiling requires direct instrumentation of a program with code to obtain requisite performance data and compute performance statistics. Generally, we are interested in observing events that have entry / exit semantics. Instrumentation is done both at the event ``entry'' point (e.g., routine begin) and the event ``exit'' point (e.g., routine return). Profile statistics report performance metrics observed between event entry and exit. We typically speak of inclusive performance, which includes the performance of other descendant events that occur between the entry and exit of a particular event, as well as of exclusive performance, which does not include descendant performance.
Unfortunately, the measurement code alters the program's execution, primarily in execution time dilation, but it also affects hardware operation. Intrusion will be reflected in the profile results unless techniques can compensate for it. Other research work has sought to characterize measurement overhead as a way to bound intrusion effects or to control the degree of intrusion during execution. For instance, the work by Kranzlmüller [15] quantifies the overhead of MPI monitors using the benchmarking suite SKaMPI, and Fagot's work [8] assesses systematically the overhead of parallel program tracing. The work by Hollingsworth and Miller [12] demonstrates the use of measurement cost models, both predicted and observed, to control intrusion at runtime via measurement throttling and instrumentation disabling.
Our interest is to compensate for measurement intrusion. Thus, we
need to understand the intrusion effects are manifested in performance
profiles. Let us first consider inclusive execution time profiles,
using routines as our generic events. Two timing measurements are
necessary to determine inclusive time for a routine every time it is
called: one to get the current clock value at time of entry and one to
get the clock value at exit. The difference between the clock samples
is the inclusive time spent in this call. Let represent
the overhead to measure the inclusive time. If the routine
is
executed
times, its inclusive time is increased by
measurement overhead.
However, 's inclusive time is also increased by the overhead
incurred in the inclusive time measurement of descendant routines
invoked after
is called and before it exits. If
routines are
called while
is active (
covers all calls to
), the
inclusive time is increased additionally by
because each
descendant routine profiled will incur inclusive overhead.
To calculate 's exclusive time, the inclusive time spent in each
direct descendant is subtracted from
's inclusive time. Since this
occurs at
's exit, the inclusive times for all direct descendant
calls must be summed. If
is the measurement overhead to do
the summation upon each direct descendant exit, the total overhead
that gets reflected in
's exclusive time is
for
direct descendant calls. (The subtraction from
's inclusive time is
small and will be ignored for sake of discussion.) It is important to
observe that all overhead accumulated in the inclusive times of
's
direct descendants gets cancelled in the exclusive time computation.
Unfortunately, if we calculate the inclusive and exclusive times
simultaneously, as is normally the case, the
overhead
must be added to
's inclusive time.
To summarize, the total performance profile of routine will show an
increase in inclusive time due to measurement overhead
of
,
where
is the total number of times
is called,
is the number of times descendant routines of
are called, and
is the number of times
's direct descendant routines are called. The
performance profile of routine
will also show an increase in exclusive
time of
due to overhead.
Standard profiling practice does not report the measurement overheads included in per event inclusive and exclusive profiles. Typically, overhead is reported as a percetage slowdown in total execution time, with the implicit assumption that measurement overhead is equally distributed across all measured events. Clearly, this is not the case. Events with a large number of descendant event occurrences will assume a greater proportion of measurement overhead. If we cannot compensate for this overhead, the performance profile will be distorted as a result. Furthermore, if processes of a parallel application exhibit different execution behavior, the parallel profiles will show skewed performance results.