Table 1 shows the total sequential execution time of ``main'' in microseconds from the different profiles for the different applications. The minimum and mean values are reported. We also calculate the percentage error (using minimum and mean values) in approximating the MO time for ``main.'' The dataset size (A or W) used in the experiments is indicated.
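The error formula is not spelled out in the text. Assuming the conventional relative-error definition, with $T_{\mathrm{MO}}$ denoting the MO time for ``main'' and $T_{X}$ the time reported for ``main'' by profile $X$ (both symbols are introduced here only for illustration), the percentage error would be computed once with the minimum values and once with the mean values as
\[
  \mathrm{error}_{X} \;=\; 100 \times \frac{\left| T_{X} - T_{\mathrm{MO}} \right|}{T_{\mathrm{MO}}} .
\]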
An important observation is that the TAU measurement overhead per event is already very small, on the order of 500 nanoseconds for flat profiling on a 2.8 GHz Pentium Xeon processor. This can be easily seen in the TAU profile results (not shown) where the overhead estimation is given as an event in the profile. Of course, the slowdown seen in the PA and CA runs depends on the benchmark and the number of events instrumented and generated during execution. Because more events are created for callpath profiling, we expect to see more slowdown for the CA runs.
The results show that compensated profiles approximate the total execution time more closely than uncompensated ones, both for flat profiles and for callpath profiles. This is generally true for all of the NAS benchmarks we tested. In the case of IS-A, the flat profile compensation (PA-comp) shows remarkable improvement, from a 193% error in the PA measurement to within 2.1% of the ``main'' execution time. The improvement in the compensated callpath profiles for SP-W, to less than 1% error, is also impressive.
To be clear, we are instrumenting every routine in the program as well as every depth of callpath. If, as a result, we instrument a small routine that gets called many times, overheads can accumulate significantly. When callpath profiling instruments such a small routine, its overhead is effectively multiplied by the number of callpaths containing the routine. This is what is happening in IS-A. Flat profile compensation can deal with the error, but callpath compensation cannot. Interestingly, the reason can be attributed to small differences in the overhead unit estimation, which ranges in this case from 957 nanoseconds (minimum) to 1045 nanoseconds (maximum). This seemingly minor difference of roughly 90 nanoseconds is enough in IS-A callpath profiling to cause major compensation errors. Certainly, the proper course of action is to remove the small routine from instrumentation.
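The sensitivity described above can be illustrated with a deliberately simplified model in which compensation subtracts one overhead unit per event from the measured time. This is only a sketch, not TAU's actual compensation algorithm. The two overhead unit estimates are the ones quoted above; the event count, the uninstrumented time, and the actual per-event cost are hypothetical values chosen for illustration.

```python
def compensate(measured_s: float, n_events: int, overhead_unit_s: float) -> float:
    """Simplified model: remove one overhead unit per instrumented event."""
    return measured_s - n_events * overhead_unit_s


def percent_error(estimate_s: float, reference_s: float) -> float:
    """Relative error of an estimate against a reference time, in percent."""
    return 100.0 * abs(estimate_s - reference_s) / reference_s


# Overhead unit estimates quoted in the text, converted to seconds.
OVERHEAD_MIN_S = 957e-9
OVERHEAD_MAX_S = 1045e-9

# Hypothetical values, assumed only for this illustration.
N_EVENTS = 10_000_000          # events from a small, frequently called routine
TRUE_TIME_S = 2.0              # uninstrumented time of "main"
ACTUAL_OVERHEAD_S = 1000e-9    # what each event really costs

# Measured time is the true time plus the accumulated per-event overhead.
measured = TRUE_TIME_S + N_EVENTS * ACTUAL_OVERHEAD_S

# Compensate using each end of the estimated overhead range.
high = compensate(measured, N_EVENTS, OVERHEAD_MIN_S)  # under-compensated
low = compensate(measured, N_EVENTS, OVERHEAD_MAX_S)   # over-compensated
spread = high - low

print(f"measured time:          {measured:.2f} s")
print(f"compensated (957 ns):   {high:.2f} s, error {percent_error(high, TRUE_TIME_S):.1f}%")
print(f"compensated (1045 ns):  {low:.2f} s, error {percent_error(low, TRUE_TIME_S):.1f}%")
print(f"spread from 88 ns gap:  {spread:.2f} s")
```

In this model the spread between the two compensated values equals the event count times the 88-nanosecond gap between the estimates, so it grows linearly with the number of events. A routine that does little work but generates many events therefore turns a small uncertainty in the overhead unit into a large error relative to the true time.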