next up previous
Next: Delaunay Triangulation Up: Gaussian Elimination Previous: Summary of Debugging

Debugging Session.

We defined a broadcast chain as
? BroadcastChain = senders: M @ receivers: R
In this case, because the send operation was a multicast rather than a single write, the read in the pattern matches all of the Reads associated with that M, enabling us to uniformly handle single writes, multicasts, and broadcasts.

The p-chain specification created a separate broadcast event for each process. Using a for-loop, we defined an array of Broadcast p-chains:
? for (i=0; i <= P-1; i++) do Broadcast[i] = BroadcastChain ONALL {i} od
where P is a defined constant giving the number of processes in the system. Each of these p-chains will match a process performing a single write that is read by all other processes.

To ascertain that the correct number of broadcasts were performed, we attempted to match a set of P broadcasts with
? BCseries = match (Broadcast[])P
The missing index in Broadcast[] indicates that any element can be used, making it a shorthand for

(Broadcast[0] + Broadcast[1] + ... + Broadcast[255])
where + means alternation in our expressions. The match was successful, resulting in
Match succeeded.
Found 256 p-chains on {0..255}; using 256 processes.
Since BCseries matches a p-chain, it could potentially match all P occurrences on the same process, so we have to look at the output carefully. In this case, the Broadcast events occurred on 256 processes and thus, that each process must have initiated one. We then used
? owners (BCseries WRT {receivers})
256 owners: on processes {0..255}.
owners is again a function that determines the characteristics of a match. In this case, its argument is an abstract event that is modified by a WRT clause acting as a filter. WRT {receivers} makes only events of type receivers (as defined in the chain pattern) visible to the search; thus this query determines the set of processes executing receives in BCseries. It told us that every process executed a receive. We determined the total number of receives with
? count (BCseries WRT {receivers})
65,280 occurrences: on processes {0..255}.
which told us that every process read every broadcast (). We now knew that the correct number of events occurred and so we attempted to verify that the correct logical relation -- precedes -- had held between them.

The Broadcast events have already been removed from the trace because they were successfully matched above. We could check for the precedes relation in two ways. We could restore the Broadcasts to the trace and rematch them with
? restore (BCseries)
? OrderedBCseries = match < Broadcast[] * >
where the angular brackets indicate that the precedes must hold between matched events. Alternatively, we could use a predicate over abstract eventsgif
? eprecedes (BCseries)
In either case the relation precedes is checked in the matched abstract event using its search order; the result is
Precedes failed.
Broadcast[38] overlaps Broadcast[243].
It tells us that the first 135 broadcasts occurred correctly but the 135th overlapped with the 136th meaning that precedes did not hold between them. This must mean that some process -- either 38 or 243 -- read the broadcast values out of order. This observation led us to the bug: a missing synchronization between broadcasts.



next up previous
Next: Delaunay Triangulation Up: Gaussian Elimination Previous: Summary of Debugging



Joydip Kundu kundu@neweng.enet.dec.com