
Sequential Experiments

Table 1 shows the total sequential execution time of "main", in microseconds, obtained from each profile type for each application. We report the minimum and mean values, together with the percentage error of each profile in approximating the MO time for "main", computed from both the minimum and the mean values. The letter after each benchmark name (A or W) indicates the dataset size used in the experiment.


Table 1: Overhead Compensation Results for NAS Benchmarks on Linux Cluster - Sequential
All times in microseconds (µs); %error is relative to MO, given as min : mean.

Benchmark  Statistic          MO             PA             PA-comp        CA             CA-comp
SP-A       min                387588657      397602281      392833924      405226516      399405895
           mean               388540699      398360423      394245841      407233889      401650317
           %error (min:mean)  -              2.5 : 2.5      1.3 : 1.4      4.5 : 4.8      3.0 : 3.3
SP-W       min                65427051       67942093       66404006       71812623       65517453
           mean               66178471       69254426       67104562       73659688       66687843
           %error (min:mean)  -              3.8 : 4.6      1.4 : 1.3      9.7 : 11.3     0.1 : 0.7
BT-A       min                522765488      549063282      542479898      553178345      532736660
           mean               524248915      552617635      545409236      555959945      536680190
           %error (min:mean)  -              4.6 : 5.2      3.4 : 3.8      5.8 : 6.0      1.9 : 2.3
LU-W       min                297366632      300993317      302786082      306287598      303405699
           mean               299395075      302941264      305796049      307849925      306172285
           %error (min:mean)  -              1.4 : 3.3      0.0 : -0.6     10.2 : 8.9     3.4 : 2.6
CG-A       min                5368659        5733951        5740469        6824800        6536302
           mean               5560969        5758157        5764569        6916842        6628535
           %error (min:mean)  -              6.8 : 3.5      6.9 : 3.6      27.1 : 24.3    21.7 : 19.1
IS-A       min                5967910        17540614       6094620        35457776       2632054
           mean               5987002        17667114       6215288        36008102       4441510
           %error (min:mean)  -              193.9 : 195.0  2.1 : 3.8      494.1 : 501.4  -55.8 : -25.8
FT-A       min                24593893       25418103       25296244       29104159       28754736
           mean               25215853       25549141       25557557       29470907       28918045
           %error (min:mean)  -              3.3 : 1.3      2.8 : 1.3      18.3 : 16.9    16.9 : 14.6
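The %error rows can be recomputed from the timing rows. The sketch below assumes the error is the signed relative difference from the MO time, (T - T_MO) / T_MO × 100, and that values are truncated (not rounded) to one decimal place. Both are inferences rather than stated conventions, but together they reproduce, for example, the IS-A minimum row. The function names are hypothetical.

```python
import math


def percent_error(profile_us: float, mo_us: float) -> float:
    """Signed percentage error of a profile's "main" time relative to MO.

    Positive values mean the profile overestimates the MO time;
    negative values mean it underestimates it (e.g. overcompensation).
    """
    return (profile_us - mo_us) / mo_us * 100.0


def truncate_1dp(x: float) -> float:
    """Truncate toward zero to one decimal place (assumed table convention)."""
    return math.trunc(x * 10) / 10


# Minimum "main" times for IS-A from Table 1, in microseconds.
MO_MIN_US = 5967910
IS_A_MIN_US = {"PA": 17540614, "PA-comp": 6094620, "CA": 35457776, "CA-comp": 2632054}

errors = {name: truncate_1dp(percent_error(t, MO_MIN_US)) for name, t in IS_A_MIN_US.items()}
print(errors)  # {'PA': 193.9, 'PA-comp': 2.1, 'CA': 494.1, 'CA-comp': -55.8}
```

The negative CA-comp value shows that a compensated profile can undershoot the MO time when too much overhead is subtracted.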


An important observation is that TAU's per-event measurement overhead is already very small: about 500 nanoseconds for flat profiling on a 2.8 GHz Pentium Xeon processor. This is easy to see in the TAU profile results (not shown), where the overhead estimate is recorded as an event in the profile. The slowdown of the PA and CA runs nevertheless depends on the benchmark and on the number of events instrumented and generated during execution. Because callpath profiling creates more events, we expect the CA runs to slow down more, and Table 1 confirms that CA is slower than PA for every benchmark.
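As a rough illustration of why slowdown scales with event count, the sketch below applies a first-order model in which each event adds a fixed cost. It assumes the ~500 ns flat-profiling figure from the text applies uniformly, and that callpath events cost the same per event, which is a simplification. The event counts and the function name are hypothetical.

```python
def estimated_overhead_us(num_events: int, overhead_per_event_ns: float = 500.0) -> float:
    """First-order estimate of the time a profile adds to a run, in microseconds.

    Assumes every instrumented event costs the same fixed overhead; the 500 ns
    default is the approximate flat-profiling cost quoted in the text.
    """
    return num_events * overhead_per_event_ns / 1000.0


# Hypothetical event counts for one run, for illustration only.
flat_events = 2_000_000       # events generated under flat profiling (PA-style run)
callpath_events = 6_000_000   # callpath profiling generates more events (CA-style run)

print(estimated_overhead_us(flat_events))      # 1000000.0 (about 1 s added)
print(estimated_overhead_us(callpath_events))  # 3000000.0 (about 3 s added)
```

Under this model the added time grows linearly with the number of events, so tripling the event count triples the slowdown even though each event stays cheap.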

The results show that the compensated profiles approximate the total execution time of "main" more closely than the uncompensated ones, for both flat and callpath profiles. This holds for most of the NAS benchmarks we tested; for CG-A and FT-A, flat compensation changes the error only marginally. In the case of IS-A, flat profile compensation (PA-comp) shows a remarkable improvement, from a 193.9% error in the PA measurement to within 2.1% of the MO "main" execution time. Callpath compensation for SP-W is also impressive, reducing the error from 9.7% to 0.1% (minimum) and from 11.3% to 0.7% (mean).

To be clear, we instrument every routine in the program and profile callpaths at every depth. As a result, if a small routine is called many times, its overhead can accumulate significantly. With callpath profiling, the overhead of such a small routine is effectively multiplied by the number of callpaths that contain it. This is what happens in IS-A: flat profile compensation can correct the error, but callpath compensation cannot. Interestingly, the failure can be attributed to small differences in the estimated overhead unit, which in this case range from 957 nanoseconds (minimum) to 1045 nanoseconds (maximum). This seemingly minor difference of 88 nanoseconds is enough to cause large compensation errors in IS-A callpath profiling (-55.8% minimum and -25.8% mean, versus 2.1% and 3.8% for flat compensation). The proper course of action is to remove the small routine from instrumentation.
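To see why an 88 ns spread matters, consider a deliberately simplified model, not TAU's actual compensation algorithm: the measured time is the true time plus N events times the true per-event cost, and compensation subtracts N times the estimated cost. Any error in the estimate is then multiplied by N. The true time, event count, and true per-event cost below are hypothetical; only the 957 ns and 1045 ns bounds come from the text.

```python
def compensated_time_us(measured_us: float, num_events: int, overhead_estimate_ns: float) -> float:
    """Simplified compensation: subtract the estimated overhead of every event."""
    return measured_us - num_events * overhead_estimate_ns / 1000.0


TRUE_US = 6_000_000.0      # hypothetical uninstrumented "main" time (6 s)
NUM_EVENTS = 10_000_000    # hypothetical number of instrumented events
TRUE_OVERHEAD_NS = 1000.0  # hypothetical actual cost per event during the run

# Measured time = true time + accumulated per-event overhead.
measured_us = TRUE_US + NUM_EVENTS * TRUE_OVERHEAD_NS / 1000.0

results = {}
for estimate_ns in (957.0, 1045.0):  # min and max overhead-unit estimates from the text
    comp = compensated_time_us(measured_us, NUM_EVENTS, estimate_ns)
    results[estimate_ns] = comp
    error_pct = (comp - TRUE_US) / TRUE_US * 100.0
    print(f"estimate {estimate_ns:.0f} ns -> compensated {comp:.0f} us, error {error_pct:+.1f}%")

# The 88 ns spread between the estimates, multiplied by the event count.
spread_us = NUM_EVENTS * (1045.0 - 957.0) / 1000.0
print(spread_us)  # 880000.0
```

With 10 million events, the 88 ns spread alone shifts the compensated time by 880,000 µs, about 15% of the 6 s true time, and the sign of the residual error flips depending on which bound is used (+7.2% with 957 ns, -7.5% with 1045 ns).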


Sameer Shende 2004-06-08