next up previous
Next: Sample Example from Up: Sample Debugging Sessions Previous: FFT on a

Gaussian Elimination

This is a shared memory version of the Gaussian elimination program featuring static data partitioning. The program consists of two phases: an elimination phase followed by a back substitution phase. The parent process spawns a number of child processes (as requested by the user). The elimination phase consists of n iterations for a nx n system of equations. At each iteration, the parent process does pivoting if necessary, which is followed by all the processes taking part in the elimination phase.

The code segment is produced below (with the location of the bug pointed out in comments).

 
Figure: Code segment showing the elimination phase of Gaussian elimination.

The program produced correct results most of the time. With a certain 10x10 matrix, it produced incorrect results only when the number of processes exceeded 8.

The back substitution phase was traced, and checked, and was found to be correct. We then proceeded to test the row elimination phase by modeling the behavior with Ariadne. Next we explain the instrumentation procedure for this shared memory program.

Data dependency in a shared memory program manifests itself through the access of the shared variables. As a result we need to trace the shared variable accesses in order to capture the causal data dependency prevalent in the program execution. In the gaussian elimination program, each process accesses a row as an aggregate (the whole row is read or written), thus we need not distinguish between accesses to different elements of the matrix as long as they belong to the same row. We instrumented our program in a way, that attaches a logical clock to each rowgif. The effect of accessing a row thus manifests itself by producing events which are timestamped by the logical clock maintained for that particular row. Since a row access in our example should be atomic, we need to produce events so as to demarcate the interval over which the access takes place: violation of atomicity can then be found by searching for overlapping access intervals. The time interval can be captured by producing an event at the beginning of the access, and another at the end of the access. So every time a process performs a shared access, it produces in its trace an event timestamped by the logical clock maintained for that row (this produces the correct timestamps since reads are blocking); the event also contains the identifier of the row, so that accesses to same shared location can be compared for checking atomicity.

The accesses that are part of the pivoting phase are tagged as ``pivot'', and the accesses that are part of the elimination phase are tagged as ``elim''.

The get_pivot function is called only by the parent process. It requires swapping the row with a zero pivot element with one that has a non-zero element in the pivot position. So this requires read accesses to two rows, followed by write accesses to the same two rows. Thus a pivoting phase can be described simply as:
R_pivot R_pivot W_pivot W_pivot

Similarly each process in the elimination of a row coefficient reads two rows and writes to a single row. This can be described in the chain level as
R_elim R_elim W_elim ;

The complete behavioral description for a 9 process system can then be described as:
? set proc = { 0..8 };
? pch pivot = R_pivot R_pivot W_pivot W_pivot onall {0};
? pch elim = (R_elim R_elim W_elim)* onall proc ;
? match Gauss = < (pivot* elim)* >;

The angular brackets in the match statement specifies that precedes should hold between pivoting and elimination of a single iteration, and also between two successive iterations themselves. The Kleene star in pivot* is to model the fact, that pivoting may or may not occur in an iteration. The Ave feedback is shown below.

 
Figure: Ave feedback showing precedes failure in Gaussian Elimination.

Note that in the first iteration, there was no pivoting. The elimination phase of iteration 1 did not precede the pivoting phase of iteration 2. There was a race between process 0 (the parent) doing pivoting on row 2 (where it swaps rows 2 and 3) on iteration 2 with process 1 still eliminating coefficient of row 3 in iteration 1. The programmer needed to insert a synchronization call at the end of each iteration (shown in the code segment with in comments at the end of the while loop).

The program worked fine when number of processes were less than 9, because in this case with the given input the parent process had to take care of more than rows in the first iteration. This gave process 1 enough time to finish working on row 3.



next up previous
Next: Sample Example from Up: Sample Debugging Sessions Previous: FFT on a



Joydip Kundu kundu@neweng.enet.dec.com