Next: Triggers for Trace Dumping Up: Enabling Online Trace Analysis Previous: Enabling Online Trace Analysis


Overview

Figure 2 depicts the overall architecture of our online trace analysis framework. On the far left is a running application instrumented by TAU. The inserted probes call the TAU measurement library, which is responsible for trace event generation. Periodically, depending on external or internal conditions (see Section 4.2 for details), each application context dumps its event trace data. In our present implementation, the data is written to disk via NFS over a dedicated, high-speed network separate from the MPI message-passing infrastructure. This independent trace-writing network is a useful feature of our cluster because it reduces the intrusion of tracing on the application.
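The dump step can be illustrated with a minimal sketch. It assumes a purely internal condition (the buffer reaching a fixed event count) and a JSON-lines file per context. The class `TraceBuffer`, its methods, the `max_events` threshold, and the file naming are all hypothetical and are not part of the TAU measurement library.

```python
import json
import os
import time


class TraceBuffer:
    """Hypothetical per-context event buffer; not TAU's actual data structure."""

    def __init__(self, context_id, dump_dir, max_events=1000):
        self.context_id = context_id
        self.dump_dir = dump_dir
        self.max_events = max_events  # assumed internal dump condition
        self.events = []
        self.dump_count = 0

    def record(self, name, kind, timestamp=None):
        """Called by an instrumentation probe, e.g. on function entry or exit."""
        if timestamp is None:
            timestamp = time.time()
        self.events.append(
            {"ts": timestamp, "ctx": self.context_id, "name": name, "kind": kind}
        )
        # Dump as soon as the buffer reaches the assumed threshold.
        if len(self.events) >= self.max_events:
            self.dump()

    def dump(self):
        """Append buffered events to this context's trace file, then clear them."""
        if not self.events:
            return None
        path = os.path.join(self.dump_dir, f"trace.{self.context_id}.jsonl")
        with open(path, "a", encoding="utf-8") as f:
            for event in self.events:
                f.write(json.dumps(event) + "\n")
        self.events.clear()
        self.dump_count += 1
        return path
```

In this sketch each context appends to its own file, so contexts never contend for a shared file; pointing `dump_dir` at a shared file system would make the files visible to a separate merging process.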

Concurrently, an independent TAU process runs on a dedicated node and merges the parallel trace streams. Its responsibility is to produce a single, globally consistent trace with synchronized timestamps. This stream is then passed to the VNG analysis server, which is intended to run on a small subset of dedicated cluster nodes. From there, pre-calculated performance profiles and event timelines are sent to the visualization client running on the user's local computer. This multi-layer approach allows event data to be processed online and in parallel, close to its point of origin on the cluster. Furthermore, interactive access to the data, coupled with runtime instrumentation and measurement control, allows detailed analysis of long production runs.
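The merging step can be sketched as a k-way merge over per-context streams. The sketch assumes that each stream is already sorted by its local timestamp and that a constant clock offset per context is known. Both are simplifying assumptions, since the text does not describe how timestamps are actually synchronized. The function `merge_traces` and its data layout are hypothetical.

```python
import heapq


def merge_traces(streams, offsets):
    """Merge per-context event streams into one list ordered by global time.

    streams: dict mapping context id -> list of (local_ts, event_name),
             each list already sorted by local_ts.
    offsets: dict mapping context id -> clock offset (assumed known) that is
             added to local timestamps to obtain global time.
    """
    def corrected(ctx):
        # Shift each local timestamp onto the assumed global time base.
        for local_ts, name in streams[ctx]:
            yield (local_ts + offsets[ctx], ctx, name)

    # heapq.merge performs a lazy k-way merge of already-sorted inputs;
    # ties on the timestamp are broken by context id.
    return list(heapq.merge(*(corrected(ctx) for ctx in sorted(streams))))


if __name__ == "__main__":
    streams = {0: [(1.0, "send"), (3.0, "recv")], 1: [(0.5, "recv"), (2.0, "send")]}
    offsets = {0: 0.0, 1: 1.0}
    for event in merge_traces(streams, offsets):
        print(event)
```

Because the merge is lazy over sorted inputs, the same idea extends to streams that arrive incrementally, although a real merger would also have to wait for slower contexts before emitting events.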


Sameer Suresh Shende 2003-09-12