TAU integrated with PETSc



TAU uses the Program Database Toolkit (PDT) to instrument PETSc source code through source-to-source translation. TAU's MPI wrapper library provides library-level instrumentation that gathers information about MPI calls. A performance case study (for ex19) is available here.


To use TAU with PETSc v2.1.3, replace the bmake/common/rules file with the replacement rules file and edit the location of the TAU stub makefile in it. Configure TAU with the -pdt=<dir>, -mpiinc=<dir> and -mpilib=<dir> options. Execute the application to generate profile and/or trace data. NOTE: You need TAU v2.11.16 or later for PETSc.
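The configure step above can be sketched as a small Python helper that assembles the command line. This is only an illustration: it assumes TAU's configure script is invoked as ./configure from the TAU source directory, the function name is hypothetical, and the three directory paths are placeholders. The command is printed, not executed.

```python
import shlex


def tau_configure_command(pdt_dir, mpi_inc_dir, mpi_lib_dir):
    """Return the argument list for configuring TAU with PDT and MPI.

    Only the three options named in the text are used; all paths are
    supplied by the caller.
    """
    return [
        "./configure",
        f"-pdt={pdt_dir}",
        f"-mpiinc={mpi_inc_dir}",
        f"-mpilib={mpi_lib_dir}",
    ]


# Placeholder paths: substitute the real PDT and MPI locations.
argv = tau_configure_command("/opt/pdt", "/opt/mpi/include", "/opt/mpi/lib")

# Print the command for inspection. To run it, one could pass argv to
# subprocess.run(argv, check=True) from the TAU source directory.
print(shlex.join(argv))
```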

Profiling PETSc applications with TAU

The above display (jracy) shows the profile of the ex2 example (built using make BOPT=g) from the petsc/src/sles/examples/tutorials directory, run with -m 200 -n 200. It was executed on four processors (nodes 0 through 3 in the main display). n,c,t 1,0,0 stands for node 1, context 0, thread 0. Color-coded routines are seen in the Ledger window below.

By clicking the middle mouse button over "mean" in the main racy window, we see the text profile averaged over all nodes. The routines can be sorted in different ways. Here, the routines are sorted according to exclusive time. The VecMAXPY_Seq routine takes the most exclusive time, followed by VecMDot_Seq.
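The idea behind the "mean" profile and the sort by exclusive time can be sketched in Python. This is a minimal illustration, not necessarily how the profile browser computes its values: the function names are hypothetical, the timing numbers are made up, and a routine that is absent on a node is assumed to contribute zero time there.

```python
from collections import defaultdict


def mean_exclusive(profiles):
    """Average exclusive time per routine over all nodes.

    profiles: one dict per node, mapping routine name -> exclusive
    time in microseconds. A routine missing on a node counts as zero.
    """
    totals = defaultdict(float)
    for node_profile in profiles:
        for routine, usec in node_profile.items():
            totals[routine] += usec
    node_count = len(profiles)
    return {routine: total / node_count for routine, total in totals.items()}


def sort_by_exclusive(mean_profile):
    """Return (routine, mean time) pairs, largest time first."""
    return sorted(mean_profile.items(), key=lambda item: item[1], reverse=True)


# Made-up numbers for two nodes, for illustration only.
profiles = [
    {"VecMAXPY_Seq": 400.0, "VecMDot_Seq": 300.0, "Other_Routine": 100.0},
    {"VecMAXPY_Seq": 600.0, "VecMDot_Seq": 500.0},
]

ranked = sort_by_exclusive(mean_exclusive(profiles))
for routine, usec in ranked:
    print(f"{routine:15s} {usec:10.1f} usec")
```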

Clicking on a routine name or its color shows the breakdown of that routine over all nodes. The inclusive or exclusive time spent in the routine can be displayed in microseconds, milliseconds (msecs) or seconds (secs), or as a percentage of the total time on that node.
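The two ways of presenting a time value, in a chosen unit or as a percentage of the node total, amount to simple arithmetic. The sketch below assumes times are stored in microseconds and that the available units are microseconds, milliseconds and seconds; the function names and the sample values are hypothetical.

```python
# Number of microseconds in each display unit.
USEC_PER_UNIT = {"usec": 1.0, "msec": 1e3, "sec": 1e6}


def convert_time(usec, unit="usec"):
    """Convert a time in microseconds to the requested display unit."""
    return usec / USEC_PER_UNIT[unit]


def percent_of_total(usec, node_total_usec):
    """Express a routine's time as a percentage of the node's total time."""
    if node_total_usec <= 0:
        raise ValueError("node total time must be positive")
    return 100.0 * usec / node_total_usec


# Made-up values: a routine taking 250 usec on a node that ran for 1000 usec.
routine_usec = 250.0
node_total_usec = 1000.0
print(convert_time(routine_usec, "msec"), "msec")
print(percent_of_total(routine_usec, node_total_usec), "% of node total")
```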

Clicking the first mouse button on a particular node (mean in this case) in the main racy window shows the breakdown of time spent in routines on that node. The Options menu shows various sorting and display options.

TAU's older profile browser "racy" also shows user-defined events. In this case, we see the message communication information. The sizes of messages sent from each node are shown in the window above.

By clicking on a particular node (node 0 in the figure above), we see the sizes of messages sent to all nodes and to specific nodes. TAU's user-defined events can be used for tracking any application-level entity.


When TAU is configured with the -TRACE option, event traces are generated. After merging (tau_merge *.trc app.trc) traces from all nodes, these can be converted (tau_convert -pv app.trc tau.edf app.pv) to the Vampir trace file format. Tracing shows the temporal variation of performance.
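The merge and convert steps can be sketched as a Python helper that builds the two command lines in the order described above. The tau_merge and tau_convert invocations follow the text; the helper names and the per-node trace file names are hypothetical, and the commands are printed, not executed.

```python
import glob
import os
import shlex


def find_node_traces(directory=".", merged="app.trc"):
    """List per-node .trc files, skipping a merged file left by an earlier run."""
    paths = glob.glob(os.path.join(directory, "*.trc"))
    return sorted(p for p in paths if os.path.basename(p) != merged)


def trace_commands(trace_files, merged="app.trc", edf="tau.edf", out="app.pv"):
    """Return the merge and convert argument lists, in execution order."""
    merge = ["tau_merge", *trace_files, merged]
    convert = ["tau_convert", "-pv", merged, edf, out]
    return merge, convert


# Hypothetical per-node trace file names; find_node_traces() could supply
# the real ones from the run directory.
merge, convert = trace_commands(["node0.trc", "node1.trc"])

# Print the commands for inspection. To run them, one could call
# subprocess.run(argv, check=True) on each list in turn.
for argv in (merge, convert):
    print(shlex.join(argv))
```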

In the above figure, we see a timeline display. Inter-process communication is shown as line segments between processes. Vampir allows the user to zoom into a segment of the trace to examine the events on each node.

The above figure shows the communication matrix in Vampir. The extent of inter-process message communication is seen here.
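A communication matrix of this kind can be sketched as a sender-by-receiver table of total bytes. The Python below is only an illustration of the concept, not Vampir's implementation; the message records are made up and the function name is hypothetical.

```python
def communication_matrix(messages, num_nodes):
    """Sum message sizes into a sender-by-receiver matrix.

    messages: iterable of (sender, receiver, size_in_bytes) tuples.
    Returns a list of rows; matrix[s][r] is the total number of bytes
    sent from node s to node r.
    """
    matrix = [[0] * num_nodes for _ in range(num_nodes)]
    for sender, receiver, size in messages:
        matrix[sender][receiver] += size
    return matrix


# Made-up message records for four nodes, for illustration only.
messages = [
    (0, 1, 1600),
    (1, 0, 1600),
    (1, 2, 800),
    (2, 3, 800),
    (0, 1, 400),
]

matrix = communication_matrix(messages, num_nodes=4)
for sender, row in enumerate(matrix):
    print(f"node {sender}: {row}")
```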

Tracing also preserves the dynamic calltree of a process. The calltree view on node 0 shows the time spent in VecMAXPY_Seq (7.3 secs), the number of times the routine was called (393) and the calling order.
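How a call tree with call counts and inclusive times can be recovered from a trace is sketched below in Python. It assumes a simplified, hypothetical event format of (timestamp, "enter" or "exit", routine) tuples that are properly nested; the real TAU and Vampir trace formats differ, and the timestamps here are made up.

```python
def new_node(name):
    """Create an empty call tree node."""
    return {"name": name, "calls": 0, "inclusive": 0.0, "children": {}}


def build_calltree(events):
    """Build a call tree from properly nested enter/exit events.

    Each routine is recorded under the routine that called it, with the
    number of calls and the inclusive time summed over those calls.
    """
    root = new_node("<root>")
    stack = [(root, None)]  # (node, timestamp at entry)
    for timestamp, kind, routine in events:
        if kind == "enter":
            parent = stack[-1][0]
            node = parent["children"].setdefault(routine, new_node(routine))
            node["calls"] += 1
            stack.append((node, timestamp))
        elif kind == "exit":
            if len(stack) == 1 or stack[-1][0]["name"] != routine:
                raise ValueError(f"unmatched exit for {routine}")
            node, entered = stack.pop()
            node["inclusive"] += timestamp - entered
        else:
            raise ValueError(f"unknown event kind: {kind}")
    return root


def print_tree(node, depth=0):
    """Print each routine indented under its caller."""
    for child in node["children"].values():
        print(f"{'  ' * depth}{child['name']}: "
              f"{child['calls']} calls, {child['inclusive']} time units")
        print_tree(child, depth + 1)


# Made-up events for one process; "main" and "solve" are hypothetical names.
events = [
    (0.0, "enter", "main"),
    (1.0, "enter", "solve"),
    (2.0, "enter", "VecMAXPY_Seq"),
    (3.0, "exit", "VecMAXPY_Seq"),
    (4.0, "enter", "VecMAXPY_Seq"),
    (6.0, "exit", "VecMAXPY_Seq"),
    (7.0, "exit", "solve"),
    (8.0, "exit", "main"),
]

tree = build_calltree(events)
print_tree(tree)
```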


For any assistance with PETSc and TAU, please contact <tau-team@cs.uoregon.edu>