There is growing interest in monitoring parallel applications and interacting with the running program as the computation proceeds. Some efforts focus on computational steering and support the creation of sensors to observe execution dynamics and actuators to change program behavior. Consistent with these directions, our attention is directed toward online performance evaluation, using tracing as the measurement approach. The goal is to offer the user the same rich functionality as off-line trace analysis, but without the overhead of storing and managing large volumes of trace data.
However, developing a general-purpose online trace analysis system is difficult, especially if it is to be portable and scalable. The work presented here is a first step toward this goal. Combining the strengths of the TAU and VNG tools, we have demonstrated a working end-to-end system that supports interactive trace generation, merging, analysis, and visualization. In its present form, the system is portable across parallel platforms because it builds on existing portable tools and relies only on standard file system and inter-process communication interfaces.
Our next step in this research is to conduct scalability tests. We expect that the file-system-based trace merging approach will become a bottleneck at some level of parallelism. To address this problem at larger scales, we are considering using MRNet, a recently developed multicast/reduction network infrastructure for tools, to implement a tree-based parallel version of the TAU merger.
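The idea behind a tree-based merger can be illustrated with a minimal Python sketch. It is not TAU's merger and does not use MRNet's API; the event record layout `(timestamp, rank, event_name)` and the functions `merge_pair` and `tree_merge` are hypothetical. The sketch assumes each process's trace is already sorted by globally consistent timestamps and shows that P per-process streams are combined in ceil(log2 P) rounds of pairwise merges. In an actual reduction tree, the merges within one round would run concurrently on different internal nodes instead of sequentially through a single merging process.

```python
import heapq
from typing import Iterable, List, Tuple

# Hypothetical trace event record: (timestamp, rank, event_name).
Event = Tuple[float, int, str]


def merge_pair(a: Iterable[Event], b: Iterable[Event]) -> List[Event]:
    """Merge two timestamp-sorted event streams into one sorted stream."""
    return list(heapq.merge(a, b, key=lambda e: e[0]))


def tree_merge(streams: List[List[Event]]) -> Tuple[List[Event], int]:
    """Merge per-process streams pairwise, level by level, the way a
    binary reduction tree would. Returns the merged trace and the number
    of tree levels (merge rounds) that were needed."""
    if not streams:
        return [], 0
    current = streams
    levels = 0
    while len(current) > 1:
        next_level = []
        # Each pair at this level is independent; a tree node could merge it.
        for i in range(0, len(current), 2):
            if i + 1 < len(current):
                next_level.append(merge_pair(current[i], current[i + 1]))
            else:
                # An odd stream is passed up to the next level unchanged.
                next_level.append(current[i])
        current = next_level
        levels += 1
    return current[0], levels


if __name__ == "__main__":
    traces = [
        [(0.0, 0, "enter"), (4.0, 0, "exit")],
        [(1.0, 1, "enter"), (5.0, 1, "exit")],
        [(2.0, 2, "enter"), (6.0, 2, "exit")],
        [(3.0, 3, "enter"), (7.0, 3, "exit")],
    ]
    merged, depth = tree_merge(traces)
    print(depth, [e[0] for e in merged])
```

With four streams the merge completes in two rounds, compared with a single flat merger that must read all four streams itself; the benefit of the tree grows with the number of processes.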