
Large-Scale Performance Monitoring and Steering

Parallel performance tools offer the program developer insights into the execution behavior of an application. However, most tools do not work well with large-scale parallel applications, where the performance data come from thousands of processes. Not only can the data be difficult to manage and the analysis complex, but existing performance display tools are mostly restricted to two dimensions and lack the customization and interactivity needed for full data investigation. In addition, it is increasingly important that performance tools be able to function online, making it possible to control and adapt long-running applications based on performance feedback. Again, large-scale parallelism complicates the online access and management of performance data. It may be desirable to use existing computational steering systems for this purpose, but this requires performance analysis and visualization to be integrated with those tools.

As a result of our work with the University of Utah [16], we found ourselves in a position to design and prototype a system architecture for coupling advanced three-dimensional visualization with online performance data access, analysis, and visualization in a large-scale parallel environment. The architecture, shown in Figure 3, consists of four components. The ``performance data integrator'' component is responsible for interfacing with a performance monitoring system to merge parallel performance samples into a synchronous data stream for analysis. The ``performance data reader'' component reads the external performance data into the internal data structures of the analysis and visualization system. The ``performance analyzer'' component provides the analysis developer a programmable framework for constructing analysis modules that can be linked together for different functionality. The ``performance visualizer'' component can likewise be programmed to create different display modules.
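To make the data flow concrete, the four components can be viewed as stages of a pipeline, each consuming the previous stage's output. The following C++ sketch is purely illustrative; the type and function names (ProfileSample, integrate, analyzeMax) are our own, not actual TAU or SCIRun interfaces:

```cpp
#include <vector>

// Illustrative sketch of the four-stage pipeline in Figure 3.
// Names are hypothetical, not actual TAU/SCIRun interfaces.

struct ProfileSample {
    long seq = 0;                 // sequence number of the merged sample
    std::vector<double> values;   // merged per-process metric values
};

// Stage 1 (integrator): merge per-thread profile data into one
// synchronous, sequenced sample.
ProfileSample integrate(const std::vector<std::vector<double>>& perThread,
                        long seq) {
    ProfileSample s;
    s.seq = seq;
    for (const auto& t : perThread)   // flatten per-thread data
        s.values.insert(s.values.end(), t.begin(), t.end());
    return s;
}

// Stage 2 (reader): bring the merged sample into the analysis
// system's internal structures (identity here).
ProfileSample readSample(const ProfileSample& s) { return s; }

// Stage 3 (analyzer): e.g. find the maximum metric value in a sample;
// stage 4 (visualizer) would render the analyzer's output.
double analyzeMax(const ProfileSample& s) {
    double mx = 0.0;
    for (double v : s.values) if (v > mx) mx = v;
    return mx;
}
```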

Figure 3: Online performance analysis and visualization architecture.

Our prototype is based on the TAU performance system, the Uintah computational framework [15], and the SCIRun [13] computational steering and visualization system. Parallel profile data from a Uintah simulation are sampled and written to profile files during execution. The performance data integrator reads the performance profile files, generated per thread for each profile sample, and merges them into a single, synchronized profile sample dataset. Each profile sample file is assigned a sequence number, and the whole dataset is sequenced and timestamped. A socket-based protocol is maintained with the performance data reader to inform it of the availability of new profile samples and to coordinate dataset transfer. The performance profile reader, implemented as a SCIRun module, inputs the merged profile sample dataset sent by the data integrator and stores it in an internal C++ object structure. A profile sample dataset is organized in a tree-like manner according to the TAU profile hierarchy:
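As a sketch, the merging and sequencing step might look like the following. The MergedSample type, the summing merge policy, and the notification message format are assumptions for illustration only; the actual wire format of the socket protocol is not shown here:

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical sketch of the integrator's sample sequencing and the
// "new sample available" notification it sends to the reader.

struct MergedSample {
    long   seq;                          // sequence number of this sample
    double timestamp;                    // time at which the sample was merged
    std::map<std::string, double> data;  // event name -> metric value
};

// Merge one per-thread profile's {event, value} pairs into the sample,
// summing values for events that appear on multiple threads
// (an assumed policy; the real merge semantics may differ).
void mergeThreadProfile(MergedSample& s,
                        const std::map<std::string, double>& threadData) {
    for (const auto& kv : threadData)
        s.data[kv.first] += kv.second;
}

// Format the notification message sent over the socket (assumed format).
std::string notifyMessage(const MergedSample& s) {
    std::ostringstream os;
    os << "SAMPLE " << s.seq << " " << s.timestamp;
    return os.str();
}
```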
node $\rightarrow$ context $\rightarrow$ thread $\rightarrow$ profile data
Each object in the profile tree has a set of attribute access methods and a set of offspring access methods. Using the access methods on the profile tree objects, all performance profile data, including cross-sample data, are available for analysis.

SCIRun [13] provides a programmable system for building and linking the analysis and visualization components. A library of performance analysis modules can be developed, some simple and others more sophisticated. We have implemented two generic profile analysis modules: Gen2DField and Gen3DField. The modules provide user controls that allow them to be customized with respect to events, data values, number of samples, and filter options. Ultimately, the output of the analysis modules must be in a form that can be visualized. The Gen2DField and Gen3DField modules are so named because they produce 2D and 3D Field data, respectively. SCIRun has different geometric meshes available for Fields; we use an ImageMesh for 2D fields and a PointCloudMesh for 3D fields.

The role of the performance visualizer component is to read the Field objects generated by performance analysis and show graphical representations of performance results. We have built three visualization modules to demonstrate the display of 2D and 3D data fields. The Terrain visualizer shows ImageMesh data as a surface; the user can select the resolution of the X and Y dimensions in the Terrain control panel. A TerrainDenotator module was developed to mark interesting points in the visualization. A different display of 2D field data is produced by the KiviatTube visualizer. Here a ``tube'' surface is created where the distance of points from the tube's center axis is determined by metric values and the tube's length corresponds to the sample sequence. The visualization of PointCloudMesh data is accomplished by the PointCloud visualizer module.
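The node $\rightarrow$ context $\rightarrow$ thread $\rightarrow$ profile data hierarchy and its attribute and offspring access methods could be sketched as below. The class and method names are hypothetical, not the reader's actual C++ classes:

```cpp
#include <string>
#include <vector>

// Hypothetical sketch of the TAU profile hierarchy held by the reader:
// node -> context -> thread -> profile data.

struct ProfileData {
    std::string event;     // event name, e.g. "MPI_Recv"
    double exclusiveTime;  // an attribute: exclusive time metric
};

struct Thread {
    int id;
    std::vector<ProfileData> profiles;
    const std::vector<ProfileData>& offspring() const { return profiles; }
};

struct Context {
    int id;
    std::vector<Thread> threads;
    const std::vector<Thread>& offspring() const { return threads; }
};

struct Node {
    int id;
    std::vector<Context> contexts;
    const std::vector<Context>& offspring() const { return contexts; }
};

// An analysis module can walk the whole tree via the offspring
// methods, e.g. to total one event's exclusive time across threads.
double totalExclusive(const std::vector<Node>& nodes, const std::string& ev) {
    double sum = 0.0;
    for (const auto& n : nodes)
        for (const auto& c : n.offspring())
            for (const auto& t : c.offspring())
                for (const auto& p : t.offspring())
                    if (p.event == ev) sum += p.exclusiveTime;
    return sum;
}
```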
The SCIRun program graph in Figure 4 shows how the data reader, analyzer, and visualizer modules are connected to process parallel profile samples from a Uintah application. The visualization is for a 500-processor run and shows the entire parallel profile measurement. The performance events lie along the left-right axis, the processors along the in-out axis, and the performance metric (in this case, exclusive execution time) along the up-down axis. Denotators identify the performance events in the legend with the largest metric values. This full performance view enables the user to quickly identify the major performance contributors.
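The axis mapping just described (events left-right, processors in-out, metric up-down) amounts to turning a processors-by-events metric matrix into a cloud of 3D points. A minimal sketch, with hypothetical names; the real modules build SCIRun Field and Mesh objects rather than raw point lists:

```cpp
#include <vector>

// Illustrative axis mapping: event index on x (left-right),
// processor index on y (in-out), metric value on z (up-down).

struct Point3 { double x, y, z; };

// metric[p][e] is the metric value for processor p, event e.
std::vector<Point3> toPointCloud(const std::vector<std::vector<double>>& metric) {
    std::vector<Point3> pts;
    for (std::size_t p = 0; p < metric.size(); ++p)
        for (std::size_t e = 0; e < metric[p].size(); ++e)
            pts.push_back({double(e), double(p), metric[p][e]});
    return pts;
}
```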


Figure 4: Performance profile visualization of 500 Uintah processes.

Although this work is in its early stages, it demonstrates the significant tool advances possible through technology integration. As the Utah C-SAFE ASCI project moves toward Uintah computations with adaptive-mesh refinement capabilities, we expect online performance analysis to grow in importance. We are developing new performance visualization modules and extending the performance profile data to accommodate hardware counter statistics. Since SCIRun is being positioned as a computational steering system for Uintah, implementing the online performance tool in SCIRun positions it well for use as a customizable performance steering tool.
Sameer Suresh Shende 2003-02-21