Hackstadt's Master's Thesis

CHAPTER III
RELATED WORK

Research results and discussions of visualization tools have appeared frequently in the literature in recent years. This chapter summarizes the key results that have contributed to this research effort.

In [21], Kraemer and Stasko summarize the current state of the art in visualization development for parallel systems in general (i.e., parallel debugging, program visualization, and performance evaluation).

Tools such as ParaGraph [9,10], Pablo [33], and Seeplex [4] have become popular because of their portability, scalable displays, and complete performance visualization environments. The displays offered by these products can be very effective for general parallel performance evaluation. Typically, however, the creation of application-specific displays either requires considerable programming (e.g., X Window programming) or is simply unavailable. With regard to ParaGraph, Heath and Etheridge [10] admit:

Unfortunately, writing the necessary routines to support an application-specific display is a decidedly nontrivial task that requires a general knowledge of X Windows programming. (p. 38)

Stasko [46] explains the need for application-specific displays in the context of parallel program debugging, which he differentiates from performance evaluation by claiming that performance visualizations do not focus on the semantics of a particular program. As explained in the introduction, however, performance evaluation can be enhanced by creating visualizations that are linked to the semantics of an application. In this way, the visualization concerns of parallel program debugging and performance evaluation do intersect, and Stasko's reasons for application-specific displays become relevant to this work as well.

Many of the philosophies underlying this research are echoed in the work by Sarukkai and Gannon. In [42], they contend:

The lack of a generalized approach for the treatment of the performance data has lead to the use of ad-hoc means of developing performance visualization systems.
A truly programmable system should provide a means of easily obtaining the desired visualization and still not be tied to specific architectures or programs. To achieve this, the visualization mechanism should not be tied with the semantics of any event in the trace file. Instead it should provide a means of mapping subsets of all events in a trace file and different fields in these events to different axes of a figure and to different graphical objects such as circles, points, lines or 3-D objects.
Finally, a powerful visualization tool should provide some sophisticated graphical editing capabilities such as zooming into specific locations of windows, multiple color maps, overlaying of figures, etc. (pp. 158-159)

Data visualization software, coupled with the design process proposed in Chapter V, provides some of the capabilities identified by Sarukkai and Gannon. In general, the separation of data transformation and graphics makes visualizations independent of the trace data's semantics. The flexible data models offered by most scientific visualization packages simplify the mapping of data to graphical characteristics and rendering techniques, and these packages also support sophisticated display interaction techniques.
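The kind of semantics-free mapping described by Sarukkai and Gannon can be illustrated with a minimal Python sketch. It is not taken from [42] or from any tool discussed in this chapter; the event fields (type, time, proc, bytes), the Mapping class, and the apply_mapping function are all hypothetical names chosen for illustration. The sketch selects a subset of trace events and maps two of their fields to the axes of a figure and to a graphical object, without the mapping code knowing what the events mean.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical trace events: plain records whose meaning is unknown to the mapper.
TRACE: List[Dict] = [
    {"type": "send", "time": 0.5, "proc": 0, "bytes": 1024},
    {"type": "recv", "time": 0.9, "proc": 1, "bytes": 1024},
    {"type": "send", "time": 1.4, "proc": 1, "bytes": 256},
    {"type": "barrier", "time": 2.0, "proc": 0, "bytes": 0},
]


@dataclass
class Mapping:
    """Describes how a subset of events becomes graphical marks (assumed design)."""
    select: Callable[[Dict], bool]  # chooses the subset of events to draw
    x: str                          # event field mapped to the horizontal axis
    y: str                          # event field mapped to the vertical axis
    glyph: str                      # graphical object, e.g. "circle" or "point"


def apply_mapping(events: List[Dict], mapping: Mapping) -> List[Dict]:
    """Turn the selected events into renderer-independent marks."""
    return [
        {"glyph": mapping.glyph, "x": e[mapping.x], "y": e[mapping.y]}
        for e in events
        if mapping.select(e)
    ]


# Draw every "send" event as a circle at (time, processor).
sends_as_circles = Mapping(
    select=lambda e: e["type"] == "send", x="time", y="proc", glyph="circle"
)
marks = apply_mapping(TRACE, sends_as_circles)
```

Because the marks carry no trace semantics, a different display could be obtained by changing only the Mapping (for example, mapping bytes to the vertical axis), which is the sense in which data transformation is kept separate from graphics.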

Sarukkai and Gannon also make a case for the importance of application-specific displays and rapid prototyping to enable more effective evaluation:

While it is convenient to have predefined visualizations of programs, the problem with such tools is that it is not easy to rapidly test new visualizations.... (p. 158)

The use of prototyping tools has been established by systems such as Pablo [33] and Polka [46]. Pablo is promoted as a performance tool prototyping environment that allows and supports end-user applications. That is, the prototyping environment is the same as the one used by the end-user, but Pablo provides little support for prototyping new visualizations. Polka can be used more effectively for the rapid development of algorithmic animations but is primarily suited for sequential programs.

An essential feature of next-generation visualizations is customizability of the displays. Pancake covers this topic as it pertains to parallel debugging in [31]. As with Stasko's work, many of the concepts discussed are also relevant to parallel performance visualization. Pancake's point is that visualizations based on the user's conceptual model can be more meaningful than those which are not. Therefore, giving the user the ability to customize and control visualizations should result in more meaningful and useful displays. Clearly, this notion is applicable to both debugging and performance visualizations. In [37], Roschelle argues that meaningful visualizations are not necessarily those that are consistent with an expert's mental model. Rather, users should be able to experiment with visualizations and develop their own understanding of the data. Thus, both researchers support the importance of customizable displays.

In [38], Rover proposes a paradigm that treats performance data like any other distributed data (i.e., program and system data) in the context of the data parallel programming model. Rover states:

Visualization displays this performance data for perusal, employing the same presentation techniques in place for data visualization, such as animation, image transformations, color manipulations, statistical analyses. (p. 149)

In her conclusion, Rover states that existing scientific data visualization resources can be effectively applied to performance visualization. In a similar manner, this work treats performance data like scientific data and develops a methodology for applying scientific visualization tools, which contain the presentation techniques identified by Rover, to the problem of parallel performance visualization.

The literature shows at least one documented use of existing visualization software products for performance evaluation. In [39], Rover utilizes AVS and Matlab to generate performance displays. Her approach is similar to the design process proposed in this thesis in that performance data is collected, transformed into a format readable by the software, and then displayed using that software. The tools were used to generate simple, two-dimensional displays. This research both formalizes the approach and extends the use of such products into the development of new, more complex visualizations.
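The collect, transform, and display sequence described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Rover's procedure from [39]: the record layout (processor, timestamp, metric value), the function names transform and to_csv, and the choice of comma-separated text as the interchange format are all hypothetical. The sketch bins collected records into a processors-by-time grid and serializes the grid as text that a separate visualization package could then read.

```python
import csv
import io

# Hypothetical collected trace records: (processor, timestamp in seconds, metric value).
RECORDS = [(0, 0.2, 3.0), (0, 1.7, 1.0), (1, 0.9, 2.0), (1, 1.1, 4.0)]


def transform(records, num_procs, num_bins, t_end):
    """Sum metric values into a num_procs x num_bins grid of equal-width time bins."""
    grid = [[0.0] * num_bins for _ in range(num_procs)]
    for proc, time, value in records:
        # Clamp so that a record stamped exactly at t_end falls in the last bin.
        bin_index = min(int(time / t_end * num_bins), num_bins - 1)
        grid[proc][bin_index] += value
    return grid


def to_csv(grid):
    """Serialize the grid as comma-separated text, one row per processor."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(grid)
    return buffer.getvalue()


grid = transform(RECORDS, num_procs=2, num_bins=2, t_end=2.0)
csv_text = to_csv(grid)
```

The display step is deliberately left out: once the data is in a format the visualization software accepts, rendering is handled entirely by that software.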

Reiss and Sarkar [34] present a sequential program analysis environment in which visualizations are defined as abstractions using queries over an object-oriented database of information about the program. Roman and Cox [36] define a program visualization as a mapping from programs to graphical representations. The methodology described in this thesis combines these ideas and applies them in the context of parallel programs and performance by mapping abstract performance data to visual and graphical characteristics.

Last modified: Wed Jan 20 15:13:56 PST 1999

Steven Hackstadt / hacks@cs.uoregon.edu
