The State-Based Component




ipd [23] provides comprehensive services for state examination and modification, similar to those available in standard sequential debuggers [43][2]. For example, we could examine the call stack at the breakpoint we have established, using the query shown in Figure 17.

 

In this example, having raised the question of whether all even processes are on the same iteration, we can examine the value of the variable cnt in the function main(). The ipd display command serves this purpose:

indicating that the odd processes are stopped in their first iteration, while the even processes are stopped in their second iteration.

As the message window output showed, process 2 is waiting to receive a message from process 0, so it must already have sent a message to process 0. What happened to that message? Examining the message queue of process 0, we found an unprocessed message from process 2. Our replay-based technique has therefore uncovered an artifact of a race: in the original execution, the message from process 4 reached process 0 before the message from process 2. Replay preserved the manifestation of the race condition, even though setting the breakpoint would otherwise have removed the race during re-execution. Had we set breakpoints without the benefit of replay, stopping after each exchange would have produced the correct behavior! The breakpoint would have acted as a synchronization agent, eliminating the race condition, and the processors would have seen the messages in the correct order.

The user had used a library call that accepts any message in the message queue, regardless of its sender. An explicit message receipt that specifies the identity of the sender process would avoid the race condition: messages might still arrive out of order, but they would be received in the correct order.
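The difference between the two kinds of receipt can be illustrated with a small simulation. The Python sketch below is hypothetical: the Mailbox class and its receive_any and receive_from methods are invented for illustration and are not the message-passing library the user called (the idea is analogous to an MPI receive with MPI_ANY_SOURCE versus one naming a specific source rank). It replays the race described above, with process 4's message reaching process 0 first, and assumes the program expects process 2's message before process 4's.

```python
from collections import deque


class Mailbox:
    """Hypothetical per-process message queue; not the library the user called."""

    def __init__(self):
        self.queue = deque()  # (sender, payload) pairs in arrival order

    def deliver(self, sender, payload):
        self.queue.append((sender, payload))

    def receive_any(self):
        # Wildcard receipt: take whatever message arrived first.
        return self.queue.popleft()

    def receive_from(self, sender):
        # Explicit receipt: take the oldest message from `sender`,
        # leaving messages from other processes queued.
        for i, (src, payload) in enumerate(self.queue):
            if src == sender:
                del self.queue[i]
                return src, payload
        raise LookupError(f"no message from process {sender} is queued")


def demo():
    # Race from the text: process 4's message reaches process 0 first.
    arrivals = [(4, "msg from 4"), (2, "msg from 2")]
    expected_order = [2, 4]  # assumed order the program's logic relies on
    wildcard, explicit = Mailbox(), Mailbox()
    for src, payload in arrivals:
        wildcard.deliver(src, payload)
        explicit.deliver(src, payload)
    got_any = [wildcard.receive_any()[0] for _ in expected_order]
    got_explicit = [explicit.receive_from(s)[0] for s in expected_order]
    return got_any, got_explicit


print(demo())  # ([4, 2], [2, 4])
```

The wildcard receipt hands process 0 the messages in arrival order, while the explicit receipt recovers the intended order even though arrival order is unchanged.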

Without event-based modeling, the user would have very little idea of where to start the investigation. Admittedly, in retrospect, the user could set the same breakpoint by specifying four different local predicates on four different sets of processes. But how would the user guess that the breakpoint needs to be set just after the second iteration, and not the first? More importantly, how would the user identify the second iteration? Note that the odd-numbered processes exit the while loop after the first iteration, and the even-numbered processes execute different branches of the if statement. There is no direct way for the user to stop the computation at the global state that we identified with the help of abstract events. The user could set three different breakpoints using conditional state expressions, stopping the odd-numbered processes once they are outside the while loop and the even-numbered processes at separate branches of the if statement. This is clearly not as simple as our scheme, and, as mentioned earlier, without the benefit of replay such a scheme is useless against race conditions.

Example: Binary Image Compression (Revisited)

Even with the synchronization bug removed, the program still produced wrong results. The parallel version of the program sometimes produced compressed images that were smaller than those produced by the sequential version. In the parallel version of the algorithm, processors that receive messages tagged succ drop out of the computation, and their local images are merged with their siblings' images. Thus, the more processors receive succ messages, the smaller the compressed image becomes.

Ariadne allows users to log message contents during program tracing and to add new attributes dynamically. In the reinstrumentation phase, we logged the contents of the query messages. Each query message consists of an integer that indicates whether the sender's local image is homogeneous or non-homogeneous. The content is stored in the primitive events as an attribute named value.

The user applied the same model shown in Figure 6, which matched the program's behavior. We argued earlier that a comprehensive debugging environment needs to support alternation between state- and event-based debugging, and this is a perfect example of that methodology. No processor that sends a query message with value = 2 should receive a succ message, since its local image has more than one pixel and cannot be merged with any other portion of the image. To test this expectation, we attach to each compr_succ event the value of its query message and display these values with respect to the compr events:

with compression foreach compr_succ compute value = value(W_query);
with compression show value(compr_succ) wrt compr;
We expect two equivalence classes, and hence two colors in the Scatter Plot. However, in Figure 18 Ave displayed a Scatter Plot with three distinct colors: some processors that hold a non-homogeneous portion of the image are indeed receiving succ messages from their partners. The plot identifies the fourth iteration as containing one such processor.
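The invariant behind this scatter plot can also be phrased as a direct check over the traced exchanges. The sketch below is a hypothetical Python rendering, not Ariadne's query language: the Exchange record, its field names, and the toy trace are invented for illustration, and only the rule that value 2 marks a non-homogeneous image comes from the text. It flags every exchange in which a process that sent value = 2 was nevertheless answered with succ, and reports the iteration and the partner that chose the reply, which is the information needed to place a breakpoint.

```python
from typing import NamedTuple

NON_HOMOGENEOUS = 2  # from the text: value 2 means more than one pixel


class Exchange(NamedTuple):
    """Hypothetical record of one query/reply exchange recovered from a trace."""
    iteration: int
    sender: int   # process that sent the query and later received the reply
    partner: int  # process that processed the query and chose the reply
    value: int    # logged content of the query message
    reply: str    # "succ" or "fail"


def suspicious(exchanges):
    """Exchanges that break the invariant: a non-homogeneous sender told to merge."""
    return [ex for ex in exchanges
            if ex.value == NON_HOMOGENEOUS and ex.reply == "succ"]


# Toy trace, invented for illustration only (not data from the case study).
trace = [
    Exchange(1, sender=1, partner=0, value=1, reply="succ"),
    Exchange(2, sender=2, partner=0, value=2, reply="fail"),
    Exchange(4, sender=9, partner=8, value=2, reply="succ"),
]

# (iteration, process whose state should be examined)
print([(ex.iteration, ex.partner) for ex in suspicious(trace)])  # [(4, 8)]
```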

 

The user can now set a breakpoint after the processors receive the query messages in iteration 4, and examine the states of those that received a query message with value = 2. This can be done by viewing the match tree through the default partition, as shown in the corresponding figure, where the fourth compr event has been expanded through the default partition. Since the pattern = compr_succ partition contains only one event, the user can simply select it and set a breakpoint before that event. The reason for providing structural transformations through a variety of partitions is now clear: they provide flexibility in setting breakpoints. Note that the user actually wants the breakpoint placed so that the processor handling the message containing a 2 is stopped just as it is about to process it. This can be done by selecting the partition and setting a breakpoint after the R_query event:
with select break after R_query;
The identifier of the process to investigate can be obtained by selecting the collapsed node and requesting its partition information; this identifies process 24 as the target. Examining the state of that process revealed that the function process_query(), which decides what reply to send, did not check for the case in which the message content is 2.
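A guard of the kind that was missing might look like the following. This is a hypothetical sketch, not the original process_query(): it assumes that values 0 and 1 encode a homogeneous white or black region, that 2 encodes a non-homogeneous region (the only encoding the text confirms), and that two regions may merge only when both are homogeneous with the same color.

```python
# Hypothetical encoding of the query value; only 2 is taken from the text.
HOMOGENEOUS_WHITE = 0
HOMOGENEOUS_BLACK = 1
NON_HOMOGENEOUS = 2


def process_query(own_value: int, query_value: int) -> str:
    """Hypothetical reply logic, not the program's actual code.

    Returns "succ" (the querying process should merge and drop out)
    or "fail" (both processes keep their own portions of the image).
    """
    # The missing check: a non-homogeneous region can never be merged,
    # whichever side of the exchange holds it.
    if query_value == NON_HOMOGENEOUS or own_value == NON_HOMOGENEOUS:
        return "fail"
    # Both regions are uniform; under our assumption they merge only
    # when they share a color.
    return "succ" if own_value == query_value else "fail"


print(process_query(NON_HOMOGENEOUS, NON_HOMOGENEOUS))      # fail
print(process_query(HOMOGENEOUS_BLACK, HOMOGENEOUS_BLACK))  # succ
```

Under these assumptions, a version without the first guard would compare two non-homogeneous regions as equal and answer succ, which is one way a non-homogeneous processor could end up merged.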






Joydip Kundu