We validate our parallel performance intrusion compensation model using a
prototype implemented within the TAU performance system. To illustrate the
problem, we examine a parallel MPI application that computes the value of
$\pi$ using the Monte-Carlo integration algorithm. The program calculates
the area under the function curve $f(x) = 4/(1+x^2)$ from $0$ to $1$.
The program consists of a master (or server) task that generates
work packets with a set of random numbers. The master task waits for a
request from any worker and sends the chunk of randomly generated numbers
to it. For each pair of numbers it receives, a worker determines whether
the point with those Cartesian coordinates lies below or above the
function curve. Collectively, the workers then refine the estimate of
$\pi$ iteratively until it falls within a given error
range. This simple example highlights how instrumentation overheads
accumulated at the worker tasks are communicated to the master task. We
execute the application in four modes: without TAU instrumentation, with
instrumentation but no compensation, with local perturbation
compensation, and with parallel perturbation compensation. Table 1 lists
these experiments as distinct columns and reports the minimum time spent
in the worker and master tasks. The timer
overhead associated with a TAU timer was 480 nanoseconds on an
Intel® Itanium 2 Linux machine running at 1.5 GHz. The
accuracy of compensation improves when we use high-resolution timers, such
as those provided by PAPI (4).
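The master/worker scheme described above can be sketched in a few lines. This is a single-process illustration only, not the MPI program used in the experiments: the integrand $4/(1+x^2)$ on $[0, 1]$, the bounding box $y \in [0, 4]$, the packet size, the tolerance, and the function names `make_packet`, `count_below`, and `estimate_pi` are all assumptions made for the example, and the stopping test compares against the known value of $\pi$ for simplicity.

```python
import math
import random


def make_packet(rng, size):
    """Master side: build one work packet of random (x, y) pairs.

    x is drawn from [0, 1] and y from [0, 4], the bounding box of the
    assumed integrand 4 / (1 + x^2) on that interval.
    """
    return [(rng.random(), 4.0 * rng.random()) for _ in range(size)]


def count_below(packet):
    """Worker side: count the points that fall on or below the curve."""
    return sum(1 for x, y in packet if y <= 4.0 / (1.0 + x * x))


def estimate_pi(tolerance=1e-2, packet_size=1_000, max_packets=1_000, seed=0):
    """Serve packets until the estimate is within `tolerance` of pi.

    Returns the final estimate and the number of packets consumed.
    """
    rng = random.Random(seed)
    below = total = 0
    estimate = 0.0
    for packets in range(1, max_packets + 1):
        packet = make_packet(rng, packet_size)  # master generates work
        below += count_below(packet)            # a worker processes it
        total += packet_size
        estimate = 4.0 * below / total          # box area (4) times hit ratio
        if abs(estimate - math.pi) < tolerance:
            return estimate, packets
    return estimate, max_packets
```

In the MPI version, `make_packet` would run on the master and `count_below` on whichever worker requested work, so every packet implies a request/reply exchange between the two. That exchange is where worker-side timer overhead can delay the master.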
The results in Figure 13 and Table 1 show that local compensation reduces the overhead in the worker tasks but not in the master, which still absorbs the overhead accumulated by the workers. The parallel compensation scheme correctly reduces the overhead in both the master and the worker tasks.
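A toy calculation may help show why local compensation falls short in the master. It is a simplified model and not TAU's implementation: it assumes the master's measured time consists entirely of waiting for one worker, and that under parallel compensation the worker's accumulated overhead is made known to the master. The timer counts and the uninstrumented time are hypothetical; only the 480 ns per-timer overhead comes from the measurements above.

```python
TIMER_OVERHEAD_NS = 480  # per-timer overhead reported for the test machine


def local_compensation(measured_ns, own_timer_calls):
    """Remove only the overhead this task generated itself."""
    return measured_ns - own_timer_calls * TIMER_OVERHEAD_NS


def parallel_compensation(measured_ns, own_timer_calls, received_overhead_ns):
    """Also remove overhead that arrived as extra waiting on a sender."""
    return local_compensation(measured_ns, own_timer_calls) - received_overhead_ns


# Hypothetical numbers, chosen only for illustration.
worker_true_ns = 1_000_000  # uninstrumented worker compute time
worker_timer_calls = 500    # timer invocations inside the worker
master_timer_calls = 2      # timer invocations inside the master

worker_overhead_ns = worker_timer_calls * TIMER_OVERHEAD_NS
worker_measured_ns = worker_true_ns + worker_overhead_ns

# The master blocks until the worker replies, so its measured time
# includes the worker's overhead in addition to its own.
master_measured_ns = worker_measured_ns + master_timer_calls * TIMER_OVERHEAD_NS

worker_local = local_compensation(worker_measured_ns, worker_timer_calls)
master_local = local_compensation(master_measured_ns, master_timer_calls)
master_parallel = parallel_compensation(
    master_measured_ns, master_timer_calls, worker_overhead_ns
)
```

With these numbers, `worker_local` recovers the uninstrumented 1,000,000 ns, while `master_local` remains at 1,240,000 ns because the 240,000 ns of worker overhead is invisible to the master's local bookkeeping. `master_parallel` recovers 1,000,000 ns once that overhead is accounted for.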