The visualization development process that has evolved from working with Data Explorer is illustrated in Figure 2. Performance visualization starts with raw trace data or statistics. The fundamental steps of this process transform the trace data into a data object file and a visual program. Trace transformations manifest themselves physically (i.e., as some real program or function operating upon the trace data). Graphical transformations, by contrast, are in this implementation environment a mental process: analysts merge the capabilities of the visualization environment, their knowledge of the performance data, and a visualization concept - or, more formally, a view abstraction - to construct a visual program in DX that drives the creation of the desired display. It is important to note that the specification of a visualization in Data Explorer (by a visual program) does not fulfill the role of the view abstraction specification of Figure 1; that aspect of the methodology is not explicitly defined in this environment but is instead embodied in the mental transformation process. In an alternative implementation environment, or as an extension to this work, abstraction specification could play an important role in developing the mapping from performance data to graphical representation.
Figure 2. A development process based on the abstract performance visualization methodology can be realized using existing data visualization software.

The trace transformation may perform several operations and reductions on the data, but it ultimately creates a data object file that can be interpreted by the data visualization software. The content of these DX data object files maps nicely to the concept of performance objects. In fact, that is exactly what DX data files contain - data objects that have been constructed by the transformation of performance data. From this specially formatted data file, a visualization prototype is created by executing the visual program. The resulting display can be manipulated in many ways, including rotating, zooming, and travelling through or around the objects in the image by capitalizing on the capabilities available in the software.
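The trace transformation step can be made concrete with a short sketch. The record layout, field names, and plain-text output format below are hypothetical (an actual DX import file carries additional structural headers, which are not reproduced here); the sketch only illustrates the reduce-then-emit pattern just described, not the actual code used in this work.

```c
#include <stdio.h>

/* A hypothetical trace record: which processor, what kind of
 * event, and how long it lasted.  Real trace formats differ. */
typedef struct {
    int    proc;      /* processor id, 0..nprocs-1 */
    int    event;     /* e.g., 0 = busy, 1 = idle, 2 = overhead */
    double duration;  /* seconds */
} TraceRecord;

/* Reduce a stream of trace records to a per-processor total for
 * one event type -- the kind of reduction a trace transformation
 * performs before emitting a data object file. */
void reduce_trace(const TraceRecord *recs, int nrecs,
                  int event, double *totals, int nprocs)
{
    for (int p = 0; p < nprocs; p++)
        totals[p] = 0.0;
    for (int i = 0; i < nrecs; i++)
        if (recs[i].event == event &&
            recs[i].proc >= 0 && recs[i].proc < nprocs)
            totals[recs[i].proc] += recs[i].duration;
}

/* Emit the reduced data as a simple whitespace-separated table;
 * a file readable by DX's Import module would wrap this in the
 * package's own structural headers. */
void write_data_file(FILE *out, const double *totals, int nprocs)
{
    for (int p = 0; p < nprocs; p++)
        fprintf(out, "%d %g\n", p, totals[p]);
}
```

The reduction and the output format are deliberately decoupled, mirroring the separation between trace analysis and the structure of the data object file.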
Figure 2 also shows how the performance visualization prototyping environment can be integrated into an iterative design and evaluation process. Though not explicitly shown in the figure, the redesign of a visualization is accomplished by modifying the trace transformation, the graphical transformation, or both. Eventually, the visualization will be ready for production. If the software package is an insufficient environment for the final implementation (e.g., because of cost or performance), then a graphics-level implementation of the visualization is appropriate at that point. Note that this occurs after the iterative design and evaluation process is complete. Ideally, though, the prototyping environment and implementation environment are the same, in which case the visualization prototype would be in its production version immediately after evaluation of the prototype is complete.
Figure 2 indicates the interdependence between the visual program and the data object file by a dashed arrow between them. In an ideal development environment, performance objects and view objects are implemented independently of one another. However, dependencies between data objects and visual programs exist in some cases. For example, much of the trace processing required to create a visualization like Figure 12 (p. 42) lies in representing the polygonal facets of the cylinder. Essentially, the data object file is a precise specification for the graphical structure. Ideally, though, the developer should be able to rely on the visualization environment to provide that transformation. In other words, this sort of processing should really be part of the graphical transformation (i.e., the creation of the visual program in DX). In an end-user visualization tool, such processing would be inherent in the visualization development environment, not programmed externally by the user. However, in a prototyping environment where new visualizations are created for evaluation and then modified, this sort of processing is more easily managed as part of the external trace processing. When switching from prototyping to production, this additional trace processing is ported into the implementation environment.
In fact, DX accommodates the creation of user-defined modules that accept more general, abstracted trace data and perform the necessary structural transformations. A developer could write a DX module that performs a specific transformation on trace data and easily integrate it into the visualization environment. This transition would make the implementation environment more consistent with the methodology from which it came, but because the current approach already supports prototype development and evaluation, this aspect of the implementation environment has not yet been explored.
In this process, graphics programming is avoided prior to production, the developer is able to focus on the visualization rather than the code that generates it, and visualizations are quickly created for evaluation, modification, and redesign. This is possible because changes to visualizations are made more easily and have fewer implications in the prototyping environment than in current performance visualization tools. Compared to existing performance visualization techniques, this method is different in that it separates the development phase from the production phase. As the area of visualization evaluation advances, a decoupled development process will be important so that modifications may be made quickly and easily. The application of this methodology creates a process that is a step toward that goal. The following sections will explore trace and graphical transformations more deeply.
Trace Transformation: Creating the Data Object File
Trace Analysis
Graphical Transformation: Creating the Visual Program
Visual Programs
Figure 3 contains a visual program created in Data Explorer. The graphical representation of a DX function is a module with sets of "tabs" on its top and bottom, corresponding to inputs and outputs, respectively. By connecting one module's output tab to another module's input tab, the user assembles a network of modules - a visual program - that specifies and controls the visualization. Connections between modules indicate the flow of data through the network.
Figure 3. A simple visual program in Data Explorer is capable of creating many different types of visualizations, as seen in Figure 8.
Customization
Figure 4. The Import module reads a data file into the visualization environment.

Data Explorer offers other techniques for controlling module parameters that allow the user to more easily interact with and "tweak" the characteristics of a display. By connecting objects called "interactors" to input tabs, the user can create a "control panel" that allows for easy modification of any number of different parameters. An interactor appears in the visual program as a simple module (no inputs, one or more outputs) and in a control panel as a selector, a dial or slider, a text field, or some other interaction object. (Note that the visual program in Figure 3 does not contain any interactors.) Interactors are highly configurable yet easy to use, adding significant flexibility to the visualization development process. An example control panel appears in Figure 5. As can be seen, the control panel allows the user to select import data files, alter the graphical characteristics of the display, and even change the quantities being visualized. Such a flexible environment fosters the development of customizable displays and a high degree of user interaction, important properties for next-generation parallel program and performance displays.
Figure 5. Control panels are used to create simple interfaces that can manipulate many characteristics of a visualization.
Adding a third dimension to a visualization increases the representation potential for the data associated with a given display. Three-dimensional rendering techniques allow the viewer to see more of that data, and display interactions increase the access to and control of visual details and display attributes. However, extending existing performance visualization tools to three-dimensional displays requires more than just adding a projection routine. Because three-dimensional displays are so dependent on the angle from which they are viewed and the rendering techniques being used, tools offering three-dimensional displays need ways for the user to interact with the objects in the display. At a minimum, this would seem to require the ability to zoom in/out on any part of an object, to rotate the object arbitrarily, and to control graphical attributes such as color and transparency. More advanced tools include control over lighting models and the surface properties of display objects (e.g., specularity and reflectance).
As an example of the additional features provided by scientific visualization tools, Data Explorer's color map editor, shown in Figure 6, allows the visualization developer to customize a visualization's color map. In fact, multiple color maps can be used for different objects in the display. The importance of flexible color mapping has been documented [3,35]. Visualizations can be given completely new meaning simply by changing the color map(s) associated with the display. Such features are not trivially incorporated into existing performance visualization tools but are standard in many visualization packages.
Figure 6. A colormap editor can create arbitrary colormaps for a visualization, enabling the analyst to explore and highlight different features of the represented data.
For instance, Figure 7(a) shows a display from the popular ParaGraph visualization tool, while Figure 7(b) is a prototype of the ParaGraph display generated by Data Explorer. Both displays show communication between processors. ParaGraph's display is two-dimensional while Data Explorer's is (inherently) three-dimensional, though not yet to any representational advantage. When rendering the two-dimensional display, a connection between two nodes is known not to interfere with any other object in the scene, since the nodes are arranged in a circle (other connections can safely be ignored because they are just pixel-wide lines). In the three-dimensional visualization, however, the visibility of a given connection depends on the orientation of the structure as well as on the components making it up, including the other connections. The situation is worse when a truly three-dimensional display is generated. The benefit of using a tool like Data Explorer is that this display can be extended beyond the "flat" prototype into a real three-dimensional display, resulting in a much richer, information-dense image. This added graphical complexity has its consequences, however, as it necessitates a stricter method for animation in generalized data visualization software. An example of extending an existing, two-dimensional visualization is presented in Chapter VII.
In general, visualizing in three dimensions overcomes certain limitations inherent in two dimensions. For example, 3-D allows more flexibility in layout than 2-D, making the creation of scalable displays more tractable (see the case study in Chapter IX). Advanced graphics rendering also offers more options for combining global and detailed performance visualization in a single display (as in Chapter VII). While the display in Figure 7(b) may be perfectly acceptable, three-dimensional representations offer greater possibilities to the developer and must play an integral part in the next generation of parallel performance visualization tools.
Figure 7. Displays from (a) existing performance visualization tools can be prototyped, and subsequently extended, using (b) three-dimensional data visualization packages.
The DX data model centers around sets of positions and connections. A simple DX program might create a visualization that annotates positions with spheres and connections between positions with cylinders. Additional coloring might take place depending on the data being processed. Figure 3 contains a visual program that accomplishes these tasks for appropriately structured input data.
Performance visualizations that intend to illustrate interprocessor communication often manifest themselves in a visualization fitting the description given above. That is, processors can be represented by a set of spheres in space, while the communication between processors can be realized by links between the spheres (e.g., Figure 7(b)). Such a display can be extended in many ways and is certainly not limited to interprocessor communication.
To emphasize the use of reusable visual programs, the images in Figure 8 were all generated by the visual program in Figure 3. The only program parameter that was changed was the name of the data file in the Import module. All the other modules have default parameters appropriate for the data being visualized. The Data Explorer modules determine what to do with the data without the user explicitly describing it; the trace transformation process is responsible for augmenting the trace data with enough structural information that Data Explorer modules can construct the visualization from the data. Thus, the structure and content of the data file - which is the result of a trace transformation - play a key role in determining a visualization's appearance. In this way, a single visual program enables a set of displays to be generated. Practically, this is convenient, but theoretically, it violates the desired independence between performance and view objects described by the high-level methodology. Again, this is a result of the specific implementation environment and represents a design decision made so that prototyping could be better supported.
Each of the displays in Figure 8 could be used in a parallel performance setting. For instance, Figure 8(a) could represent interprocessor communication in a ring topology. Similarly, Figure 8(b) could be applied to a mesh architecture where glyph size represents communication overhead, link color represents the communication load on a particular interconnection between processors, and the mesh background shows a continuous interpolation of the discrete node data, potentially useful to observe scalability characteristics. Figure 8(c) extends the previous example into three dimensions. The interior of the solid is volume-rendered to create a "cloud" of colors which can offer insight into the possible results of a scaled-up version of the application. Finally, Figure 8(d) offers a novel visualization where processors exist in a two-dimensional grid with "sails" emanating from the glyphs. For each processor, the orientation of the sail and the height of the sail's two upper points could be controlled by a three-dimensional metric (e.g., busy, idle, and overhead percentages). While these displays are significantly different graphically, they were all created by a single, simple visual program processing different data files. In terms of the methodology, the same view objects are being combined with different performance objects.
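The metric mapping suggested for the "sail" display can be sketched as a simple normalization. The function below is purely illustrative (its name and the percentage encoding are assumptions, not part of any actual tool): it converts a processor's busy, idle, and overhead times into the three percentages that could drive a sail's orientation and the heights of its two upper points.

```c
/* For the hypothetical "sail" display of Figure 8(d): map a
 * processor's busy, idle, and overhead times to percentages of
 * its total time, yielding a three-dimensional metric per glyph. */
void metric_percentages(double busy, double idle, double overhead,
                        double pct[3])
{
    double total = busy + idle + overhead;
    if (total <= 0.0) {          /* no activity recorded */
        pct[0] = pct[1] = pct[2] = 0.0;
        return;
    }
    pct[0] = 100.0 * busy / total;
    pct[1] = 100.0 * idle / total;
    pct[2] = 100.0 * overhead / total;
}
```

In the prototyping process described here, such a computation would live in the external trace transformation, with the resulting three values written into the data object file for each processor.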
Figure 8. The visual program in Figure 3 can create a wide range of performance visualizations depending on the structure of the underlying data.

Alternatively, the same dataset can be processed by different visual programs to generate different displays, an approach common to scientific visualization. For example, a developer can apply different realization techniques to the same data by using different DX programs. One visual program may volume-render a three-dimensional structure while another creates contour surfaces within the volume. The data is the same, but the different visual programs enable different types of displays to be created.
Thus, in this methodology the developer is presented with two levels at which visualization development and modification can take place: the data object file (performance objects) and the visual program (view objects). Both can be used to control certain aspects of the visualization process, but typically one may be more appropriate than the other depending on the user's goals. If the goal is to investigate performance characteristics within a single set of performance data, then fixing the data set and changing the visual program tends to work best. On the other hand, if the goal is to compare and contrast several sets of performance data, then using a single visual program and changing the structure of the imported data can be effective. Of course, in many situations, changing both the data and the visual program generates the best results.
The strength and flexibility of a product like Data Explorer comes from both the programming behind the modules and the powerful data model it uses. The result, in terms of prototyping performance visualizations, is that displays can be created very easily and quickly. For the visualizations in this thesis, fewer than 100 lines of standard C code were necessary to implement the trace transformations. Also, the Data Explorer visual programs required to import and process the data files vary minimally across a wide range of visualizations - a testament to the self-describing capabilities of the data model and the high degree of software reusability supported by DX. In all, a new visualization - trace transformation, graphical transformation, debugging, experimentation, etc. - can be developed in less than a day; modifications to existing displays require a few minutes or less. Given that a single DX visual program may serve to create many visualizations, and of course, a single DX data file can be used in many different visual programs, the overall result is a very versatile environment for creating and redesigning performance visualizations.
To illustrate how this may benefit performance visualization developers, consider the following scenario. Suppose a certain performance visualization tool was limited to two-dimensional displays in the spirit of Figure 7(a). (Note that this supposition applies to almost all of the existing performance visualization tools mentioned throughout this thesis.) To extend such displays into three dimensions would require substantial work on the part of the tool designers and implementers. New graphics routines would have to be written, additional methods of interacting with the display would probably be necessary, and perhaps the data model would need to be extended. Using a tool such as Data Explorer, which supports three-dimensional visualizations by default, the jump from 2-D to 3-D is simply a matter of changing the data!