Next: Conclusion Up: Enabling Online Trace Analysis Previous: Remote Steering and Visualization

System Partitioning

Figure 3 depicts how our cluster system might be partitioned for online trace analysis.
Figure 3: System Partitioning
\includegraphics[bb=66 110 715 472,width=\columnwidth,clip=]{fig_online-analysis_partitioning}
Together with the master node, we dedicate four additional processors (29-32) to trace-data processing; parallel applications run on the rest of the machine (processors 01-28). The cluster nodes are interconnected by two redundant gigabit Ethernet networks, each with its own switch. The two networks allow us to keep message-passing communication separate from NFS traffic, although our approach does not strictly require this separation.
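The partition described above can be sketched as follows. This is a purely illustrative layout table, not part of TAU or VNG; the processor numbering follows the text and Figure 3, while all names (`APP_PROCS`, `partition_of`, etc.) are our own.

```python
# Hypothetical sketch: the 32-processor cluster split into an application
# partition and a trace-analysis partition, as in Figure 3.
APP_PROCS = list(range(1, 29))        # processors 01-28: parallel application
MERGER_PROC = 32                      # processor 32: TAU merger process
VNG_WORKER_PROCS = [29, 30, 31]       # processors 29-31: VNG analysis workers

def partition_of(proc: int) -> str:
    """Return which partition a given processor belongs to."""
    if proc in APP_PROCS:
        return "application"
    if proc == MERGER_PROC:
        return "tau-merger"
    if proc in VNG_WORKER_PROCS:
        return "vng-worker"
    raise ValueError(f"unknown processor {proc}")
```

Such a static assignment keeps the analysis processors free of application load, so trace processing never competes with the measured computation.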

During an application run, every worker process is equipped with an instance of the TAU runtime library, which collects trace data and writes it to a temporary file, one per worker process. The TAU merger process on processor 32 continuously collects the independent streams from the temporary files and merges them into a unified representation of the trace data. At this point, the VNG analysis process takes over: whenever a remote user client connects to the server on the master node, the VNG worker processes on processors 29-31 come into action to update their data or to serve new analysis requests. Throughout, the trace data stays put in the cluster file system, which has tremendous benefits for GUI responsiveness at the client. Furthermore, the VNG analyzer communicates with the remote visualization client in a highly optimized fashion, guaranteeing fast response times even over long-distance connections with low bandwidth and high latency (see Results).
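The merge step above can be illustrated with a minimal sketch. TAU's actual trace format and merger are considerably more involved; here we only assume that each worker's temporary file yields a stream of `(timestamp, rank, event)` records already sorted by timestamp, so the merger reduces to an n-way merge.

```python
import heapq

def merge_trace_streams(streams):
    """Merge per-process event streams (each sorted by timestamp)
    into a single timestamp-ordered stream.

    Illustrative only: records are (timestamp, rank, event) tuples,
    not TAU's real trace format."""
    # heapq.merge performs an n-way merge of already-sorted inputs
    # without loading all streams into memory at once.
    return list(heapq.merge(*streams, key=lambda ev: ev[0]))

# Example: streams from two worker processes.
p0 = [(1, 0, "enter main"), (5, 0, "send"), (9, 0, "exit main")]
p1 = [(2, 1, "enter main"), (4, 1, "recv"), (8, 1, "exit main")]
merged = merge_trace_streams([p0, p1])
```

Because each input stream is consumed incrementally, a merger structured this way can run continuously alongside the application, picking up new records as the workers append them.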


Sameer Suresh Shende 2003-09-12