pC++ is an object-parallel programming language intended for both shared and distributed memory machines. It extends the notion of data-parallel execution to aggregates; that is, structured sets of objects distributed over the processors of a parallel system. Application-specific aggregates are described in pC++ as collection class types. Parallelism is implicit and results from the concurrent application of a function to the elements of a collection. pC++ provides a shared name space; references to shared data compile to GET_ELEMENT routines that generate (at runtime) explicit sends and receives for nonlocal data.
Event traces from a pC++ program include these generated events, each tagged
with the type of operation it implements (for example, W_ASKFORDATA, R_STOREBLOCKMSG, and W_ACKREPLYMSG). Thus the trace exposes the
workings of the runtime system to the programmer, which is sometimes useful
in debugging. For the most part, however, it is not: most pC++
programmers understand their code only at the higher level of data-parallel
operations, and those operations must be accessible during debugging.
Ariadne's predefined event set can be extended dynamically, without requiring a recompilation of the debugger. This enables us to add higher-level pC++ events as predefined user events, leaving the debugger itself independent of the source language. For pC++, the predefined events include
We also allow ordinary user events, which can be inserted into
pC++ source code in standard functions, in methods of elements (invoked by each element of a
collection), or in member functions (invoked on a per-process basis). This
gives us three levels of abstraction in traced events:
runtime communication events, pC++ predefined events, and user events.
Ariadne filters unwanted events from a match, making it possible to use
events from different levels separately or in combination.
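The effect of this level-based filtering can be sketched in a few lines of Python; the trace representation and level tags below are illustrative, not Ariadne's actual encoding:

```python
# Sketch of level-based filtering over a trace of (level, name) pairs.
# The level tags are illustrative; Ariadne's actual encoding may differ.
trace = [
    ("runtime",    "W_ASKFORDATA"),
    ("predefined", "BEGINBARRIER"),
    ("user",       "USERMERGE"),
    ("predefined", "ENDBARRIER"),
    ("runtime",    "R_STOREBLOCKMSG"),
]

def filter_levels(trace, keep):
    """Keep only events whose abstraction level is in `keep`."""
    return [(lvl, name) for lvl, name in trace if lvl in keep]

# Debugging at the data-parallel level: drop runtime communication events.
high_level = filter_levels(trace, {"predefined", "user"})
```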
Example: Binary Image Compression. To illustrate the use of
Ariadne and the extended modeling language, we debug a parallel version of
an algorithm to compress black and white images stored as bintrees
[22]. Working up from the leaves, the algorithm compresses
homogeneously colored siblings into single nodes represented at their
parent as shown in Figure 1. The parallel version of the algorithm distributes entire subtrees
of the initial bintree to processes. We ran our code on images with 256 nodes.
It ran to completion but frequently produced compressed images that were too
small.
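The compression step itself can be sketched sequentially in Python, assuming the semantics described above: two like-colored sibling leaves collapse into a single leaf at their parent, and any other pair yields an unmergeable node.

```python
# Minimal sequential sketch of one compression step, assuming the
# semantics described above: two sibling leaves of the same color
# collapse into a single leaf at their parent; anything else becomes
# an unmergeable node (None).
def compress(colors):
    """One bottom-up step over a level given as a list of leaf colors."""
    nxt = []
    for i in range(0, len(colors), 2):
        left, right = colors[i], colors[i + 1]
        if left is not None and left == right:
            nxt.append(left)      # homogeneous siblings: merge
        else:
            nxt.append(None)      # heterogeneous or already failed: no merge
    return nxt

level = ["B", "B", "W", "B"]      # four leaves of a black-and-white image
level = compress(level)           # ["B", None]
```

Once a pair fails to merge, its parent is marked unmergeable, so a failure at one level rules out merges above it.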
To begin debugging this program with Ariadne, we added user-defined events
USERMERGE and USERNOMERGE to trace the decisions made about
each pair of siblings. The behavior of a single node during a step in which
it is active was then described by one of the two chains
CH MERGE = USERMERGE WRT [ ];
CH NOMERGE = USERNOMERGE WRT [ ];

where CH is a keyword that begins the definition of a chain and the
WRT clause indicates that event types not explicitly mentioned in the
pattern are to be ignored.
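The ignore semantics of the WRT clause can be sketched in Python; the reading below is an assumption (the WRT list extends the match alphabet, and with an empty list only the pattern's own event types are visible), not Ariadne's implementation:

```python
# Sketch of chain matching with assumed WRT-ignore semantics: event
# types named in neither the pattern nor the WRT list are invisible
# to the match.
def match_chain(pattern, events, wrt=()):
    """True if the visible subsequence of `events` equals `pattern`."""
    relevant = set(pattern) | set(wrt)
    visible = [e for e in events if e in relevant]
    return visible == list(pattern)

# CH MERGE = USERMERGE WRT [ ]: runtime noise around the event is ignored.
events = ["W_ASKFORDATA", "USERMERGE", "R_STOREBLOCKMSG"]
ok = match_chain(["USERMERGE"], events)   # True
```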
pC++ distributes the elements of a collection across the processors and uses
an ``owner computes'' rule. In this case, elements of the collection
correspond to the active nodes of the bintree; initially the leaves are
active. Processors iterate through the elements assigned to them. To model
the combined behavior of the nodes across a level of the tree, we first
define the successful completion of a barrier synchronization as
CH BARRIER = BEGINBARRIER ENDBARRIER WRT [ ];
and then model the complete behavior at a single level as

PCH COMPRESSLEVEL =
BARRIER* ( (+ MERGE NOMERGE +) )* BARRIER
ONSOME PROCSET;

where PCH is a keyword that begins the definition of a p-chain,
(+ A B +) denotes the regular expression A + B, * denotes the Kleene
star, and ONSOME indicates that the pattern occurs on some, but not
necessarily all, of the specified processes. Since we ran our code on
an SGI PowerChallenge with 8 processors, we used the set of process
ids PROCSET = {0..7} for this example. ONSOME was particularly useful
here: at the later stages of the algorithm, subsequent steps involve
fewer processes, so a single model specifies all steps.
The behavior of the algorithm as a whole is modeled as

MATCH BIC =
COMPRESSLEVEL* ;

where MATCH is a keyword indicating that the specified pattern (zero
or more instances of COMPRESSLEVEL) is to be matched in the execution
history graph and the resulting subgraph is to be assigned to the
specified variable, BIC. Note that the model is scalable: it is
completely insensitive to the input size, and only the definition of
PROCSET depends on the machine size. Figure 2 shows a match of this
model to an execution.
As mentioned above, matching imposes a tree structure on the matched subgraph that corresponds
to the model's structure. It is this match
tree that is displayed by AVE in the figure. For scalability, it is automatically
compressed both horizontally and vertically: horizontally by eliding sibling
nodes and vertically by pruning levels. The horizontal compression is
straightforward: omitted siblings are represented by a single slider. In our example, the
match tree included 8 instances of the COMPRESSLEVEL p-chain but
only the first and last are explicitly shown in the initial picture.
Vertical compression is achieved by pruning subtrees below each p-chain.
The pruned subtrees are partitioned into equivalence classes and represented
by an icon labeled with a pair of integers that summarizes the partitioning:
the first element is the total number of chain subtrees represented and the
second is the number of equivalence classes of those trees. In the example,
the leftmost box is labeled with 8,1, which indicates that 8 processes
initiated a COMPRESSLEVEL event and all of them matched the same
p-chain pattern. The pop-up window below (obtained by clicking on the icon)
gives a further description of this subtree: a pattern matching 5 barrier
events, followed by 16 compression events (either MERGE or
NOMERGE), followed by another barrier was matched on processes 0
through 7. This is as expected. In the first step of the compression, each
process handles 32 leaf nodes of the image, performing 16 compression
operations. In the last step, only one process is active as evidenced by
the 1,1 label on its node.
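The pair-of-integers label can be sketched as follows, assuming each pruned chain subtree reduces to a canonical event sequence and subtrees with identical sequences fall into the same equivalence class:

```python
from collections import Counter

# Sketch of AVE's vertical-compression label (assumed representation:
# each pruned chain subtree reduces to its canonical event sequence).
def prune_label(subtrees):
    """Return (total subtrees, number of equivalence classes)."""
    classes = Counter(tuple(t) for t in subtrees)
    return (len(subtrees), len(classes))

# First compression step: 5 barriers, 16 compressions (all MERGE here
# for simplicity), and a final barrier, identical on all 8 processes --
# yielding the "8,1" icon.
first_step = [("BARRIER",) * 5 + ("MERGE",) * 16 + ("BARRIER",)] * 8
label = prune_label(first_step)   # (8, 1)
```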
Intermediate nodes of the tree can be viewed as in Figure 3, which
shows the third node expanded as well. The third node again has all eight
processes active because they are still handling local subtrees. Eventually,
the local subtrees are compressed and, in the later stages of the program,
processes become inactive; had we looked at the events of the seventh and
eighth stages, for example, we would have seen that only two processes were
active in the seventh and only one in the eighth.
So far, our debugging output was as expected: the model matched the traced
behavior. Ariadne's models, however, are quite coarse, and it is often useful
to investigate a match further using its query facilities. In this case,
because we knew that the resulting compressed image was too small, we were
suspicious that erroneous, extra MERGE events were occurring.
To check this, we wanted to determine the number of elements executing MERGE events in each
COMPRESSLEVEL. First, we defined an additional attribute,
MERGELTS, for MERGE nodes using

WITH BIC FOREACH MERGE
COMPUTE MERGELTS = ELEMID(USERMERGE);

which computes the element ID attribute (ELEMID) from the
corresponding matched trace events. Our query was then

WITH BIC FOREACH COMPRESSLEVEL
COMPUTE NUMMERGES = SIZEOF(MERGELTS(MERGE));
WITH BIC SHOW NUMMERGES (COMPRESSLEVEL)
WRT COMPRESSLEVEL;

which first counts the number of elements performing MERGE events in
each COMPRESSLEVEL (that is, computes the size of the set of all
element IDs) and then displays the output as in
Figures 4(a) and 4(b). These
plots were not as expected. At each level in the bintree, only those
nodes that were successfully merged at the previous level are available for
merger;
thus, the number of
successful merges at any level should be no more than half the number from the previous
step. Figure 4(a) shows roughly the decreasing values that we
would expect but looking at the actual values in Figure 4(b),
we see a problem: 32 merge events occurred at the third level, but this is
more than half of the 63 that occurred at the second level. There were too
many MERGEs on the third level.
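The halving invariant is easy to state as a check over per-level merge counts; a Python sketch, using the two counts read off Figure 4(b) above (63 and 32):

```python
# Sketch of the halving check: successful merges at level k should
# number at most half of those at level k-1. The counts 63 and 32 are
# the values read off Figure 4(b); a violation flags the suspect level.
def halving_violations(merges_per_level):
    """Return the indices of levels that exceed half the previous count."""
    return [k for k in range(1, len(merges_per_level))
            if merges_per_level[k] > merges_per_level[k - 1] / 2]

bad = halving_violations([63, 32])   # the second entry (level three) flags
```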
Perhaps the program was incorrectly merging siblings with different color
values. To check this possibility, we replayed the execution (as
discussed in the next section), annotating the traced MERGE
events with the color values of the subtrees being merged. We then
rematched the model against the trace and asked whether MERGE events
occurred only between like-colored siblings. The answer was ``yes.'' We next
investigated the possibility that some of the mergers were invalid because
one of the siblings represented a subtree where compression had already
failed at a lower level. This would result in a NOMERGE event
being followed by a subsequent MERGE event on the same
element. To check this, we first computed

WITH BIC FOREACH NOMERGE
COMPUTE NOMERGELTS = ELEMID(USERNOMERGE);

as above and then computed the set of elements having each outcome as

WITH BIC FOREACH COMPRESSLEVEL
COMPUTE MERGESET = MERGELTS(MERGE);
WITH BIC FOREACH COMPRESSLEVEL
COMPUTE NOMERGESET = NOMERGELTS(NOMERGE);

None of the elements with MERGE events on a level should have
participated in the NOMERGE events on the previous level. In terms of
our sets, this means that each MERGESET should be disjoint from the
NOMERGESET on the previous level. We checked this with the query

WITH BIC FOREACH COMPRESSLEVEL COMPUTE
MERGEFLAG = DISJOINT(MERGESET(COMPRESSLEVEL),
NOMERGESET(LEFT(COMPRESSLEVEL)));
WITH BIC SHOW MERGEFLAG(COMPRESSLEVEL)
WRT BIC;

where LEFT refers to the immediately preceding COMPRESSLEVEL event
(that is, its left sibling in the match tree). This query produced the
output in Figure 5, showing that the sets on the third level were not
disjoint as they should have been. (We knew this because the output
value for the third level was ``0'', indicating that the DISJOINT
predicate was false.) Thus some element on level three compressed a
subtree that had not been compressed at level two.
Further investigations with Ariadne were not productive. This, however, is
typical: event-based debugging is useful in narrowing the focus of attention
during the early stages of debugging but it is not as useful in relating the
detected anomalies back to source code errors in the later stages of
debugging. As a result of our event-based analysis, we know that the code merged nodes representing trees when it should have merged only nodes representing single, fully compressed leaves. We also know that this was first
detectable on level three. We don't, however, know why it occurred. For this, we
switched to a state-based debugger.
Sameer Suresh Shende
Fri Apr 5 10:36:18 PST 1996