Next: Triggers for Trace Dumping
Up: Enabling Online Trace Analysis
Previous: Enabling Online Trace Analysis
Overview
Figure 2 depicts the overall architecture
of our online trace analysis framework. On the far left we see a running
application instrumented by TAU. The inserted probes call the TAU measurement
library which is responsible for the trace event generation. Periodically,
depending on external or internal conditions (see Section
4.2 for details), each application context will
dump its event trace data. In our present implementation, this is done to
disk via NFS on a dedicated, high-speed network separate from the MPI message
passing infrastructure. This independent trace-writing network is a
feature of our cluster that reduces the intrusion of tracing on the application.
Concurrently, an independent TAU process runs on a dedicated node and merges
the parallel trace streams. Its responsibility is to produce a single
globally-consistent trace with synchronized timestamps. This stream is then
passed to the VNG analysis server which is intended to run on a small subset
of dedicated cluster nodes. From there, pre-calculated performance profiles
and event timelines are sent to the visualization client running on the user's
local computer. This multi-layer approach allows event data to be
processed online and in parallel, close to its point of origin on the cluster.
Furthermore, the interactive access to the data, coupled with runtime
instrumentation and measurement control, allows the detailed analysis of long
production runs.
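The merging step described above can be illustrated with a minimal sketch. It is not TAU's actual merger, whose internals are not described here. It assumes that each context's trace is already sorted by its local clock and that a single constant clock offset per context is enough to synchronize timestamps. The names Event, apply_offset and merge_traces are hypothetical and chosen only for illustration.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple, Sequence


class Event(NamedTuple):
    """One trace record (hypothetical layout, not TAU's trace format)."""
    timestamp: float  # time according to the emitting context's local clock
    context: int      # id of the application context that produced the event
    name: str         # event name, e.g. a routine entry or a message send


def apply_offset(stream: Iterable[Event], offset: float) -> Iterator[Event]:
    """Shift every timestamp in one stream by that context's clock offset."""
    for ev in stream:
        yield ev._replace(timestamp=ev.timestamp + offset)


def merge_traces(streams: Sequence[Iterable[Event]],
                 offsets: Sequence[float]) -> Iterator[Event]:
    """Lazily merge per-context streams into one time-ordered stream.

    Assumes each input stream is already sorted by local timestamp and
    that offsets[i] is the constant clock correction for streams[i].
    """
    corrected = [apply_offset(s, off) for s, off in zip(streams, offsets)]
    # heapq.merge performs a k-way merge without loading whole traces.
    return heapq.merge(*corrected, key=lambda ev: ev.timestamp)


if __name__ == "__main__":
    ctx0 = [Event(0.0, 0, "main:enter"), Event(2.0, 0, "main:exit")]
    ctx1 = [Event(0.5, 1, "send"), Event(1.0, 1, "recv")]
    for ev in merge_traces([ctx0, ctx1], [0.0, 0.25]):
        print(ev)
```

Because the merge is lazy, events can be forwarded downstream as soon as they are known to be next in global time order, which matches the streaming role the merger plays between the application contexts and the analysis server.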
Sameer Suresh Shende
2003-09-12