Chains are described by a slight extension to regular expressions.
Our expressions, over primitive event types, use operations of
concatenation (denoted by adjacency), alternation ( +), shuffle
( |
), and Kleene Star ( *).
For this example, we modeled
the activity of a process with the pattern
W R
which describes a Write followed by a Read on a single
process. For future reference, we also named the components of the
pattern and the pattern itself. The complete chain definition was thus
? PermutationChain = send: W receive: R
where send and receive name the event matched by W and
R respectively. The Ariadne prompt -- ? -- is used in this paper to distinguish between
lines of user input and debugger output. p-Chains are described by
binding a chain to a set of processes. In this case, PermutationChain
was bound by
? Permutation = PermutationChain ONALL PROCS
to the set of all processes (denoted PROCS). The keyword
ONALL indicates that a copy of the chain must originate on each
member of that set.
pt-Chains are used to describe the temporal composition of
p-chains. In this example, we are looking for a single instance of
Permutation and thus the specification is trivial. We ask
for a match with
? PermuteEvent = match Permutation
If the match had succeeded, PermuteEvent would have been set to
the resulting abstract event. In this case, however, it failed;
PermuteEvent was set to and the following feedback
was given
Match failed: Search failure.
Looking for Permutation.
Found 256 chains on {0..255}; using 256 processes.
This indicates that the search failed while looking for the
p-chain Permutation and that during the search, 256 complete
chains were recognized, one initiating on each of the processes
numbered 0 through 255. The last part of the message gives the number
of processes that had primitive events matched by the completed
chains.
This information told us that no process above 255 did both a
Write and a Read. To investigate further, we asked for
additional information about the behavior of a specific process using
the matchchain command. Matchchain attempts to recognize
a single chain on a single process and provides feedback on the reason
for a failure. In this case, it reported
? matchchain PermutationChain ON 256
Match failed: Chain failure.
Expecting an R but encountered the Right Cursor.
The Right Cursor marks the right end of the portion of the
execution history graph that we are examining. Thus this response indicated that process 256 wrote a value but did not
read one. Since this did not seem to help in locating the error, we
tried another tactic.
We broke the pattern into two simple pieces, first matching all of
the Writes and then matching all of the Reads.
The dialogue went as follows:
? PWrite = W ONALL PROCS
? WriteEvents = match PWrite
Match succeeded.
Found 512 p-chains on {0..511}; using 512 processes.
? PRead = R ONALL PROCS
? ReadEvents = match PRead
Match failed: Search failure.
Looking for PRead.
Found 256 chains on {0..255}; using 256 processes.
This is more helpful. It indicates that all Writes
occurred as expected
but no process above 255 received a value. Where
did the missing values go? We found out by asking
? destinations (WriteEvents)
This is our first example of a function returning the
characteristics of a match. Each such function takes an abstract
event as its argument and recursively searches the structure of that
event. In the case of the
destinations function, the structure is searched for WriteEvents
and the set of destinations for those events is returned.
The feedback was
Values written to 256 processes: {0..255}.
indicating that all of the values written by the 512 processes
were directed at the lower 256 processes. Clearly there was some
problem in the address calculation code. In fact, the variable
holding the destination PID was mistakenly allocated to be an 8-bit
quantity; larger values were truncated. Thus, only processes whose
identifier was less than 256 could complete correctly.
If the same address truncation had occurred in a program for
compression rather than permutation, it would have been harder to
detect. A compression is very much like a permutation except that not
all of the processes write or read values. During compression, the
nonzero elements of an array (stored one value per process) are moved
to the beginning of that array. Only processes with nonzero data
values Write and only processes at the beginning of the array
that are to receive a value Read. Thus, the chain definition
used above, W R, would not work because it is a
process-centered chain that follows the activity of a single process.
Instead, we must use a data-centered pattern that follows the
path of a communication. This is described by the pattern
? CompressionCh = send: W @
receive: R
The @
moves the context of the match -- that is it changes
the process and the location for matching -- to the receiving process.
Thus this pattern matches a Write on the initiating process and
then follows that communication edge to continue matching on the
process that is its destination.
Not all of the processes execute the CompressionCh pattern (only
those initially holding nonzero
values), so we can not determine a priori the set of
processes for binding. Instead we use
? Compression = CompressionCh ONSOME PROCS
where the keyword ONSOME indicates that a successful match
need only occur on a nonempty subset of the given process set.
Now, however, when we do the match, it succeeds despite the presence
of the truncation error:
? CompressionEvent = match Compression
Match succeeded.
Found 256 p-chains on
{1,12,24..33,37,45,47,56,58..65,78,89..112,129..137,
139,141,143..149,156,158,160..189,196,197..234,241,245..258,
267,269,276,
280..298,301,314,324,356,358,367..391,413,415,433..456,470..494
};
using 362 processes.
Note that the process set is not consecutive because chains are only
found on those processes that initially have nonzero values. This
result looks correct; there is no indication that some values were
written but never read. We could detect this sort of error only by
asking if anything remained in the trace after the match. (Remember
that a match removes the matching events from the execution history
graph.) This can
be done with
? Left = match REMAINDER
where REMAINDER is a predefined pattern that matches anything.
In our example, where the specific trace we were using had 290
processes with nonzero PIDs, the result was
Match succeeded.
Found 34 p-chains on
{15..18,79,115..126,190..194,335,495..498,501..507};
using 34 processes.
meaning that 34 ``extra'' events were found that had not been matched
by the Compression chains. What were these events? We can find
out with
? eventtypes (Left)
34 Writes
0 Multicasts
0 Reads
0 Phase Markers
where eventtypes is again a function that returns a match
characteristic. In this case, it counts the number of primitive
events of each type. The answer indicated that all of the unmatched events were
Writes. We now are in the same position that we reached in the
permutation example: we know that only processes below 256 completed
and that some of the values that were written were never read. The
destinations function would lead us to the error in the same
manner as above.