The experimental methodology we use to evaluate overhead compensation characterizes an application's profile measurements with respect to the level of instrumentation and to sequential versus parallel execution. For the experiments reported here, we used three levels of instrumentation. The main only (MO) instrumentation measures only the total execution time of the ``main'' routine. Because it uses as little instrumentation as possible, it serves as our standard estimate of overall performance. The profile all (PA) instrumentation generates profile measurements for every source-level routine of the program. The callpath all (CA) instrumentation uses TAU's callpath profiling capabilities to generate profile measurements for the routine callpaths of the program. Since a routine can appear in many distinct callpaths, the CA instrumentation generates significantly more measurements than PA and further stresses overhead compensation.
Five experiments are run for each application using the three levels of instrumentation. The MO experiment gives us a measure of total execution time. For parallel SPMD applications, we profile the ``main'' routine of each process and take the maximum as the program's total execution time. The per-process times can also be used for evaluation, assuming the program is well-behaved (i.e., all processes behave similarly). The PA experiment returns profile measurements without compensation, and PA-comp denotes a PA-instrumented run with compensation enabled. Similarly, a CA experiment returns callpath profile measurements without compensation, and a CA-comp experiment returns callpath profile results after overhead compensation.
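As a minimal sketch of the SPMD reduction described above, the following Python assumes the ``main'' time of each process has already been extracted from its profile into a plain list of seconds. The function names and data layout are hypothetical and are not part of TAU. The total execution time is the maximum per-process time, and a max-to-mean ratio close to 1.0 is one simple check of the well-behaved assumption under which per-process times are used.

```python
from statistics import mean


def program_total_time(main_times: list[float]) -> float:
    """Total execution time of an SPMD run: the slowest process's "main" time."""
    if not main_times:
        raise ValueError("need at least one per-process main time")
    return max(main_times)


def load_imbalance(main_times: list[float]) -> float:
    """Ratio of the slowest process's main time to the mean main time.

    A value close to 1.0 suggests the processes behave similarly, which is the
    assumption under which per-process times can be compared directly.
    """
    return program_total_time(main_times) / mean(main_times)


# Hypothetical per-process main times (seconds) from a 4-process MO run.
mo_main = [10.2, 10.0, 10.4, 10.1]
print(program_total_time(mo_main))  # 10.4
print(round(load_imbalance(mo_main), 3))
```

Taking the maximum rather than the mean reflects that an SPMD program finishes only when its slowest process does.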
We can compare the ``main'' profile values from the PA, PA-comp, CA, and CA-comp runs to the MO run to evaluate the benefit of overhead compensation. We can also look at indirect evidence of compensation effectiveness. Assuming the PA-comp run delivers accurate profile results, we can compare the corresponding statistics in the CA-comp profile to see how closely they match. This can also be done for the PA and PA-comp runs with different levels of instrumentation. Per-process values can be used for comparison in all parallel cases under SPMD assumptions.
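One way to carry out these comparisons is sketched below in Python under stated assumptions: each profile is reduced to a dictionary mapping an event name to its exclusive time, and callpath event names join routines with the separator `" => "` (an assumption about the naming, exposed as a parameter). Summing the exclusive times of all callpaths that end in the same routine collapses a CA or CA-comp profile to per-routine values comparable with a PA or PA-comp profile. The relative error against a reference value (the MO ``main'' time, or a PA-comp routine time) then quantifies how closely two runs agree. All helper names are hypothetical.

```python
from collections import defaultdict


def relative_error(measured: float, reference: float) -> float:
    """Signed relative difference of a measurement from its reference value."""
    return (measured - reference) / reference


def flatten_callpaths(callpath_excl: dict[str, float], sep: str = " => ") -> dict[str, float]:
    """Sum exclusive times over all callpaths that end in the same routine."""
    flat: dict[str, float] = defaultdict(float)
    for path, excl in callpath_excl.items():
        routine = path.split(sep)[-1]  # last element of the path is the callee
        flat[routine] += excl
    return dict(flat)


def compare_profiles(reference: dict[str, float], other: dict[str, float]) -> dict[str, float]:
    """Relative error of each routine in `other` against `reference`.

    Only routines present in both profiles with a nonzero reference are compared.
    """
    return {
        r: relative_error(other[r], ref)
        for r, ref in reference.items()
        if r in other and ref != 0
    }


# Hypothetical exclusive times (seconds) from a PA-comp and a CA-comp run.
pa_comp = {"main": 1.0, "a": 5.5, "b": 4.0}
ca_comp = {"main": 1.0, "main => a": 2.0, "main => b => a": 3.0, "main => b": 4.0}
print(compare_profiles(pa_comp, flatten_callpaths(ca_comp)))
```

Exclusive rather than inclusive times are summed because inclusive times of nested callpaths overlap and would be double-counted.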
Ten trials are executed for each experiment. In the evaluation, we can use either the profile results with the minimum ``main'' value or the average ``main'' values. We prefer the profiles reporting minimums, because these runs are likely to contain fewer artifacts in the execution (i.e., anomalies not directly attributable to the program) and thus represent ``best case'' performance. On the other hand, an argument can be made for using the average profile values, since artifacts may be related to the instrumentation. We report both values in our results below. Note, however, that average profiles may not be reliable for programs that do not behave deterministically.
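The two ways of selecting profiles described above can be sketched in Python as follows, assuming each trial's profile has been reduced to a dictionary mapping routine names to times and containing a ``main'' entry; the function names are hypothetical. Selecting the minimum keeps one complete, real profile, whereas averaging combines the trials routine by routine. Here averaging is restricted to routines that appear in every trial, which illustrates one reason average profiles can be unreliable for nondeterministic programs: a routine may not be observed in every run.

```python
from statistics import mean


def min_main_trial(trials: list[dict[str, float]]) -> dict[str, float]:
    """Return the trial profile with the smallest "main" time ("best case" run)."""
    return min(trials, key=lambda profile: profile["main"])


def average_profile(trials: list[dict[str, float]]) -> dict[str, float]:
    """Average each routine's time over the trials.

    Only routines present in every trial are kept; a routine missing from
    some runs has no well-defined per-trial average.
    """
    if not trials:
        raise ValueError("need at least one trial")
    common = set(trials[0]).intersection(*trials[1:])
    return {routine: mean(p[routine] for p in trials) for routine in common}


# Hypothetical profiles (seconds) from three trials of one experiment.
trials = [
    {"main": 10.0, "a": 4.0},
    {"main": 9.5, "a": 3.9, "b": 0.1},
    {"main": 10.5, "a": 4.2},
]
best = min_main_trial(trials)  # the second trial, whose main time is 9.5
avg = average_profile(trials)  # "b" is dropped: it appears in only one trial
print(best["main"], avg["main"])  # 9.5 10.0
```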
Following the experimental methodology above, we tested overhead compensation on all of the NAS Parallel Benchmark applications. Because the application codes vary in their structure and number of events, we expected the effectiveness of compensation to differ among them. We ran the experiments for each application, ten trials each, both sequentially and on 16 processors. Problem sizes were chosen mainly to achieve runtimes of reasonable duration. The parallel system used in our study was a Dell Linux cluster. In the following sections, we report on six of the NAS benchmarks: SP, BT, LU, CG, IS, and FT.