$next$ $up$ $previous$
Next: Our Approach: The Up: VIS'92 Conference Previous: Introduction

Our Approach: An Overview in the Context of Parallel Program Animation

Massively parallel computer systems --- in which hundreds or even thousands of interacting processes execute concurrently --- are enormously complex. In order to understand their behavior, programmers rely on informal modeling techniques: information traced during execution is filtered and abstracted to develop a model of the system's actual behavior that is compared with the programmer's conceptual model of the system's intended behavior. Visualization tools aid in this process by providing comprehensible views of program behavior.

Figure: Checkered pattern of striped and solid processes for the SOR program.

Consider, for example, a program iteratively approximating the solution to a PDE using a Successive Overrelaxation (SOR) method [6]. Processes, arranged in a square mesh repeatedly update their values as a function of the values of their neighbors. To speed convergence, their execution alternates in a checkered pattern as in Figure 1: first striped processes execute and then solid processes.

Figure 2 shows snapshots from a straightforward animation of an SOR program but it does not show the expected behavior. What has gone wrong?

Figure: Snapshots from the original animation of the SOR program.

The most obvious problem is that the programmer did not think of the individual behavior of processes but instead thought of a global pattern of activity in which all processes were ``doing the same thing at the same time.'' Here we assume that the visualization tool supports some type of abstraction, allowing the user to group program actions into abstract events. In this example, we group read and update actions into striped and solid events which we animate with two-sor-icons.ps Abstraction alone, however, is not enough.

The abstract striped and solid events are not atomic but occur over a period of time (starting with the first read and ending with the last update). Because of asynchrony, the time periods for successive events can overlap; thus, the logically sequential striped and solid events may appear as concurrent. Worse, since they are executed on the same set of processes, their animations may be superimposed making them incomprehensible. (This phenomena occurs in the Dictionary Search example below; it is seen in Figure 4b.)

Our visualization techniques produce comprehensible pictures by ordering events in logical rather than physical time. Events are ordered in logical time by

: process dependencies: processes are sequential and their actions are totally ordered by local timestamps, and
: interprocess dependencies: messages must be sent before they can be received. $gif$

For the SOR program, striped and solid steps are ordered by process dependencies and thus we animate them in successive frames as shown in Figure 3. In the figure, the abstract events of interest have been temporally separated, providing visual discrimination and allowing the user to understand his program's behavior in terms of his/her own conceptualizations.

Figure: Intended, logical sequence of striped and solid steps in the SOR computation.

The behavior in this new animation is logically equivalent to the behavior in the original picture; that is, all processors execute the same sequences of operations with the same interprocess dependencies. For many parallel computations, however, such simple orderings are not possible because of intertwined dependencies.

Consider, for example, a program that implements a dictionary search in which queries are pipelined from the host to a database of key-ordered records stored in a hypercube [14]. Queries are routed within the cube to the proper node using a binary search. In the animation of primitive events shown in Figure 4a, multiple queries are active simultaneously and it is difficult to understand whether or not each query is proceeding as intended.

To view this behavior in a more understandable format, we define an abstract event to be all of the communications generated on behalf of a single query. These new query events cannot be separated on the basis of logical time because they follow data-dependent paths through the cube, arriving in different orders at different processes. There is no single, logically consistent ordering that can be imposed by the animator.

For such cases, we introduced perspective views which enable the user to selectively ignore logical dependencies in establishing partially consistent event orderings. In this case, we can choose to ignore all dependencies other than those caused by send events on the host process. This creates the perspective views shown in Figure 4c-d in which queries are shown in the order that they were issued from the host and each query completes before the next begins.

This animation is clear and understandable. In spite of the fact that it is not logically equivalent to the actual execution trace (queries appear to arrive at some processes out of order), it enables the programmer to easily comprehend relevant aspects of program behavior. In this case, it enabled us to discover a bug that was not apparent in the original animation: in Figure 4d, a query crosses a dimension of the cube twice indicating an error in the routing of messages. Thus, ignoring some temporal orderings in the original execution sequence made it possible for the programmer to understand aspects of the program's behavior relevant to its correctness.

Figure: Snapshots of an animation of the Dictionary Search. Low-level communication events ; abstract events with several concurrent search requests ; a perspective view of abstract events showing the path taken by an individual request ; erroneous communication between p3 and p7 .

$next$ $up$ $previous$
Next: Our Approach: The Up: VIS'92 Conference Previous: Introduction

Joydip Kundu kundu@neweng.enet.dec.com