
Performance Callstacks

A callstack at a point in execution shows the current execution location(s) and the sequence of procedure calls that led to it [4]. A performance callstack is a view of a program's performance execution profile at runtime. The execution profile shows where time is being spent in a program's code (mainly with respect to routines) for each thread of execution. A performance callstack profile at a point in time is defined as the profile of the application, restricted to the active profiled blocks on the callstack, as it would have been reported had the application terminated at that time.

Selective profiling allows a user to profile a group of profiled blocks. A profiled block is said to be active if its instrumentation is enabled at runtime. If a profiled block is not active, it does not appear on the callstack and no statistics are maintained for it. A profiled block can be a routine (function) or a user-defined statement-level timer that has both inclusive and exclusive profiled quantities associated with it. A profiled quantity could be time or the value of a hardware performance counter such as the number of secondary data cache misses.
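The effect of selective profiling on the callstack can be sketched as follows. This is a minimal illustration and not TAU's implementation: the class `SelectiveProfiler`, its methods, and the group names "io" and "default" are hypothetical names introduced here. It only shows that a block whose group is not enabled never reaches the callstack and accumulates no statistics.

```python
from dataclasses import dataclass, field


@dataclass
class SelectiveProfiler:
    """Toy profiler (hypothetical): only blocks from enabled groups are active."""
    enabled_groups: set = field(default_factory=set)
    callstack: list = field(default_factory=list)
    calls: dict = field(default_factory=dict)

    def enter(self, name, group):
        # An inactive block is skipped entirely: it is never pushed on the
        # callstack and no statistics are kept for it.
        if group not in self.enabled_groups:
            return False
        self.callstack.append(name)
        self.calls[name] = self.calls.get(name, 0) + 1
        return True

    def exit(self, entered):
        # Pop only if the matching enter() actually pushed a frame.
        if entered:
            self.callstack.pop()


prof = SelectiveProfiler(enabled_groups={"io"})
in_main = prof.enter("main", "default")     # inactive: group not enabled
in_read = prof.enter("read_input", "io")    # active: pushed on the callstack
stack_at_sample = list(prof.callstack)      # what a sample taken here would see
prof.exit(in_read)
prof.exit(in_main)
```

A sample taken inside `read_input` sees only that block, even though `main` is its caller, because `main` belongs to a group that was not enabled.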

The performance callstack view is defined only for those routines in the calling stack of a thread at the point where the callstack is sampled. Looking at a single sample, the performance callstack shows, for each profiled block in the calling stack, its current execution profile statistics. These statistics include the number of invocations or calls, the number of profiled subroutines called by it, the aggregate exclusive and inclusive time spent in the routine, and the instance exclusive and inclusive time spent in the routine since the start of each instance of the profiled block on the callstack.
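The relation between the aggregate and the instance quantities can be made concrete with a small sketch. The names `BlockTotals`, `OpenInstance`, and `sample_stats` are hypothetical, and the sketch assumes a block has at most one open instance on the callstack (no recursion). The aggregate values are the totals of the instances that have already returned plus the contribution of the instance that is still open at the sample time.

```python
from dataclasses import dataclass


@dataclass
class OpenInstance:
    start: float       # entry time (usec) of the instance still on the callstack
    child_time: float  # time spent so far in profiled blocks it called


@dataclass
class BlockTotals:
    calls: int         # invocations so far, including the open one
    subrs: int         # profiled blocks called by this block
    done_excl: float   # exclusive time of instances that already returned
    done_incl: float   # inclusive time of instances that already returned


def sample_stats(totals, inst, now):
    """Return the six statistics for one block on the callstack at time `now`."""
    instance_incl = now - inst.start
    instance_excl = instance_incl - inst.child_time
    return {
        "calls": totals.calls,
        "subrs": totals.subrs,
        "excl": totals.done_excl + instance_excl,
        "incl": totals.done_incl + instance_incl,
        "instance_excl": instance_excl,
        "instance_incl": instance_incl,
    }


# A block called twice: the first call ran for 5 usec and returned; the second
# was entered at t=20 usec and is still open when the sample is taken at t=25.
stats = sample_stats(BlockTotals(calls=2, subrs=0, done_excl=5, done_incl=5),
                     OpenInstance(start=20, child_time=0), now=25)
```

Here the instance times are 5 usec each, while the aggregate times are 10 usec because they also include the first, completed call.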

To illustrate how these statistics are calculated, consider the example shown below. In this pseudocode, the wallclock times of routine entry and exit are marked by open and close braces, respectively. A slice of the program's execution profile is taken when the TAU_MONITOR() call is reached. At that instant, the performance callstack for that thread of execution is computed, and the resulting metrics represent the execution profile as it would have been had the application terminated at that instant. Table 1 lists the performance callstack statistics described in this section for this example.

Example program showing routine entry and exit times

ROUTINE                          TIME (usec)
main() {                             0
       foo() {                       5
             }                      10

       bar() {                      15
             foo() {                20
                 TAU_MONITOR()      25

  


Table 1: Callstack statistics for the example

Routine on Callstack  Calls  Subrs  Excl (usec)  Incl (usec)  Instance Excl (usec)  Instance Incl (usec)
main()                    1      2           10           25                    10                    25
bar()                     1      1            5           10                     5                    10
foo()                     2      0           10           10                     5                     5
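The calculation can be sketched by replaying the entry and exit events of the example and then closing every routine still on the callstack at the sample time, as if the program had terminated there. This is an illustrative sketch, not TAU's implementation: `EVENTS`, `SAMPLE_TIME`, and `callstack_profile` are hypothetical names, the event list is transcribed by hand from the listing, and recursion is assumed not to occur. Exclusive time is taken to be inclusive time minus the time spent in profiled routines called directly.

```python
# (time_usec, action, routine) events transcribed from the example listing.
EVENTS = [
    (0, "enter", "main"),
    (5, "enter", "foo"),
    (10, "exit", "foo"),
    (15, "enter", "bar"),
    (20, "enter", "foo"),
]
SAMPLE_TIME = 25  # time at which TAU_MONITOR() is reached


def callstack_profile(events, sample_time):
    """Replay events; return one row of statistics per routine on the callstack."""
    totals = {}  # routine -> aggregate calls, subrs, excl, incl
    stack = []   # open frames, outermost first

    def leave(now):
        # Close the innermost frame and charge its time to its caller.
        frame = stack.pop()
        incl = now - frame["start"]
        totals[frame["name"]]["incl"] += incl
        totals[frame["name"]]["excl"] += incl - frame["child"]
        if stack:
            stack[-1]["child"] += incl
        return frame, incl

    for now, action, name in events:
        if action == "enter":
            t = totals.setdefault(
                name, {"calls": 0, "subrs": 0, "excl": 0, "incl": 0})
            t["calls"] += 1
            if stack:
                totals[stack[-1]["name"]]["subrs"] += 1
            stack.append({"name": name, "start": now, "child": 0})
        else:
            leave(now)

    # Close every open frame as if the program had terminated at sample_time.
    rows = []
    while stack:
        frame, incl = leave(sample_time)
        t = totals[frame["name"]]
        rows.append((frame["name"], t["calls"], t["subrs"], t["excl"],
                     t["incl"], incl - frame["child"], incl))
    rows.reverse()  # outermost routine first
    return rows


rows = callstack_profile(EVENTS, SAMPLE_TIME)
for row in rows:
    print("%-8s %5d %5d %5d %5d %5d %5d" % row)
```

For main(), the 25 usec of inclusive time contains 5 usec in the first call to foo() and 10 usec in bar(), leaving 10 usec of exclusive time, which matches the intervals 0 to 5 and 10 to 15 usec during which main() is the innermost routine.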

In this way, callstack profiling helps the user understand the performance profile metrics at a single point in time for only those routines that are active and on the callstack. Multiple samples taken over time can be displayed in the form of a callstack trace history showing how performance behavior changes. Depending on where the callstack is sampled, performance views for different subsets of a program's routines can result.






Sameer Suresh Shende
Thu Aug 6 13:08:10 PDT 1998