Hackstadt's Master's Thesis

CHAPTER I
INTRODUCTION

Even though we navigate daily through a perceptual world of three spatial dimensions and reason occasionally about higher dimensional arenas with mathematical ease, the world portrayed on our information displays is caught up in the two-dimensionality of the endless flatlands of paper and video screen. All communication between the readers of an image and the makers of an image must now take place on a two-dimensional surface. Escaping this flatland is the essential task of envisioning information - for all the interesting worlds (physical, biological, imaginary, human) that we seek to understand are inevitably and happily multivariate in nature. Not flatlands.
- Edward Tufte, Envisioning Information (p. 12).

The dramatic increases in the complexity and sophistication of parallel computing systems seen in recent years have outpaced the corresponding growth in the analysis environments that accompany these systems. Consequently, users of parallel computers are left with ineffective resources for analyzing and evaluating the performance of their applications. Advances in performance monitoring and tracing now enable the collection of vast amounts of detailed performance data about applications executing on parallel computers. For the user, however, such massive amounts of data are rarely useful in raw form. The analysis problem thus becomes one of extracting the key performance characteristics that will help users understand or improve their programs.

Visualization has long been recognized as an effective means of portraying large quantities of complex data by replacing a cognitive task (e.g., analyzing a large table of numbers) with a perceptual task (e.g., noticing relationships among graphical objects and their characteristics) [48]. While the traditional sciences have benefited from many advances in visualization techniques and from the development of sophisticated data visualization tools and environments, visualizations targeting parallel computer systems have remained ad hoc and have failed to take advantage of more powerful visualization techniques.

Performance visualization is the use of graphical display techniques for the visual analysis of performance data to improve the understanding of complex computer performance phenomena. In the opening chapter of his book Envisioning Information [48], Edward Tufte foreshadows the future of performance visualization. Tufte recognizes and demonstrates the necessity and effectiveness of multi-dimensional information displays. While the graphics used in current performance visualization tools are predominantly confined to the two-dimensional flatland Tufte describes, the work reported here develops new methods for rapidly prototyping a new generation of advanced, multi-dimensional performance visualizations.

As parallel programming environments continue to evolve, the need for more sophisticated, more flexible visualizations will increase. Harnessing the potential of more sophisticated visualization techniques requires a visualization methodology that is more robust than the techniques currently used in many performance visualization tools. Performance visualization has the distinction of lacking a physical model on which displays can be based. Rather, performance data represents abstract relationships between virtual objects (e.g., data elements, processors, or different components of a computing system). This distinction from traditional scientific visualization both hinders and benefits visualization in the context of parallel computing. It is a hindrance in that what a display is attempting to portray may not be immediately obvious to an unfamiliar user. However, the absence of a concrete physical model allows more flexibility in the design of visualizations. For this reason, performance visualizations are more a product of how performance parameters have been mapped into graphical representations than a product of the physical "reality" from which the parameters were collected.

The tendency for performance visualizations to be more abstract complicates the issues of usability and evaluation. This is best illustrated with a comparison to traditional scientific visualization. Consider, for example, modeling the interactions between thousands of molecules in a three-dimensional cube. Probably the most "useful" (and intuitive) display, in the majority of applications, would be a direct representation of the three-dimensional space and the molecules within it. The molecules might be represented as spheres that "float" around the cube, bouncing off the walls and colliding with one another. Such a visualization is a direct manifestation of the physical system (i.e., the physical system defines the visualization), and few scientists would find some other representation more generally useful than that suggested by the physical system.

Now, suppose someone wished to model the interactions between thousands of processors in a massively parallel computer. Should they use the interconnection topology of the machine (e.g., mesh, hypercube) as the basis for the visualization? Or should they consider the logical model imposed by a data-parallel programming language (e.g., vector, 2-D or 3-D array)? Or is it more appropriate to base the visualization on how the compiler distributes the data among processors? Each of these represents a possible foundation (i.e., context) upon which visualizations for a parallel system could be constructed, but none of them is general enough to capture all, or even many, of the possible performance visualization scenarios for the seemingly simple task of modeling the processors of a parallel computer. Thus, it is very difficult to determine which display might be the most useful. In fact, the concept of "usefulness" becomes extremely user- and application-dependent in the context of parallel performance visualization. Furthermore, because performance visualizations can be more abstract, user preference plays a critical role in evaluating the effectiveness of a particular display.

Nonetheless, researchers have identified many important principles relating to the design and use of visualization in the context of parallel program evaluation [4,10,27,29,31,40]. Some of these principles include the use of multiple views, semantic context, and user interaction. Multiple views facilitate understanding of a parallel program's operation by capturing and analyzing data from the execution across different levels of observation and from different perspectives. In addition, it is generally believed that visualization interpretation is improved if the user is provided with a semantic context for understanding the parallel program data represented. Semantic context can play a role similar to that of the physical models that support traditional scientific visualizations. Finally, user interaction allows the viewer to select alternative views, change the level of detail or the type of data, and control display parameters in order to find the best possible visualization scenario for data interpretation.

Although these principles provide constructive guidelines for visualization design, it is still a challenging undertaking to develop generally useful parallel program visualization tools. Several projects have tried to deliver general visualization solutions, leading to widely debated concerns over usability. The focus in the parallel programming tool community on the quality of visualizations offered by end-user tools has been, in part, at the expense of research into developing improved visualization techniques that better target specific end-user requirements. With the importance of semantic context in enhancing visualization interpretation, it is good practice not to restrict visualization design creativity by requiring visualizations to have broad user appeal, particularly since a meaningful visualization for one user may not be especially meaningful for another. Rather, efforts are better directed at developing visualization techniques that can be applied in building visualization tools to address specific problems that users encounter in parallel program evaluation, while following general principles and guidelines for good visualization design [16,29,48].

To summarize, the preceding arguments lead to several characteristics that a visualization environment for parallel systems should support. First, the absence of a physical model on which to base visualizations suggests the need for a more formal visualization methodology in which displays result from mapping performance data to graphical characteristics. Second, because usefulness depends so heavily on the user and the application, the ability to modify and interact with a visualization is critical. Finally, a visualization environment should be adaptable to user needs and specific application contexts by focusing on robust visualization techniques and specification methods rather than on specific visualizations.

This thesis presents a methodology and implementation for performance visualization development that addresses these issues while greatly reducing programming overhead, facilitating rapid prototyping of displays, and allowing for effective iterative design and evaluation. By applying the tools of scientific visualization to performance visualization, this work demonstrates that next-generation displays for performance visualization can be prototyped, if not implemented, with existing data visualization software products using sophisticated graphical techniques that physical scientists have used for several years now.

The remaining chapters of this thesis fall into two main parts. Chapters II-VI provide a detailed description of the visualization process, while Chapters VII-IX each contain a case study of this process applied in a specific parallel program and performance visualization context. Chapter II further motivates this research. Related research is summarized in Chapter III, and Chapter IV gives a brief description of the IBM Data Explorer visualization package, the primary visualization development platform. Chapter V then examines the methodology in detail, and Chapter VI describes the visualization development process that resulted from applying the methodology to a particular implementation environment. The first case study, in Chapter VII, explores the development of new visualization concepts. The second, in Chapter VIII, focuses on the use of visualization for the evaluation of parallel data distributions. Chapter IX documents the development of scalable visualizations for data-parallel programs. An informal evaluation of this visualization development technique in Chapter X is followed by a summary of the work and some possible directions for future research in Chapter XI.

Last modified: Wed Jan 20 15:13:47 PST 1999

Steven Hackstadt / hacks@cs.uoregon.edu
