next up previous
Next: Conclusion Up: PDWorks'93 Conference Previous: Debugging Session

The Limitations of Textual Feedback

In this example, we consider a program that implements a dictionary search in which queries are pipelined from a host to a database of key-ordered records stored in a hypercube. Queries are routed within the cube to the proper node using binary search. More than one query is active at a time. The program as written contained a routing error.

We consider an 8 processor cube with processes having PIDs 0 through 7 and a host process with PID 8. We model the behavior of the program as a series of queries, each query starting at the host, traversing the cube and eventually returning to the host. The chain query uses two features we have not encountered thus far: the definition of a set of processes ( Cube) and the limitation of a communication event to a set of processes (denoted by # followed by a process set).
? Cube = {0..7}
? QueryChain = W#{8} @ R#Cube ( W#Cube @ R#Cube ) * W @ R#{8}
? Query = QueryChain ON {8}
? match Query *
Match Succeeded.
Found 2 p-chains on {8}; using 8 processes.
The match succeeds but it does not give us any information about the error. Further investigations using Ariadne did not help. We had better success in debugging this program with our previous animating debugger, Belvedere [14].

In using Belvedere, we also defined an abstract event that matched the entire set of messages associated with a query; the query itself was much more complex (Belvedere uses the EDL modeling language [3]). Initially, the animation was incomprehensible, as shown in Figure 1a because the Query events overlapped in logical time: each query follows data-dependent paths through the cube, arriving in different orders at different processes. To separate the events, we created a perspective on the animation that included only dependencies caused by Write events on the host process (this is the same functionality provided by Ariadne's WRT clauses). Two snapshots from these perspective views are shown in Figure 1b-c. They portray the same execution trace that we used above with Ariadne. Now, however, the erroneous behavior is easy to spot: in Figure 1c, a query crosses a dimension of the cube twice.

As the programmers of this code, we knew that message transmissions should follow the path of a binary search. Once half of the remaining cube is eliminated by a comparison, the search should never go back to that subtree by crossing the same dimension of the cube again. Investigations of this behavior, led us to discover a routing error in the initial calculations of the return path for a query.

  
Figure: Snapshots from the an animation of the Dictionary Search. Concurrent abstract events ; a perspective view of an abstract events showing the path taken by an individual request ; and a perspective view of a second query showing an extra communication from the front to the back plane of the cube .

The routing error was immediately apparent from the animation but we could not find it with Ariadne. It is not possible to concisely describe a query that finds this anomaly; worse, it is unlikely that the programmer would even think to ask such a query. The anomaly was detected as a deviation in a visual pattern. This example serves as an indicator that we will not be able to completely avoid graphical output. In an independent effort, we are developing scalable graphical representations of massively parallel computations and eventually, we expect to combine the two efforts.



next up previous
Next: Conclusion Up: PDWorks'93 Conference Previous: Debugging Session



Joydip Kundu kundu@neweng.enet.dec.com