This example comes from the
Sequent's Guide to Parallel Programming (page 5-25).
The sample (supposedly ``correct'') shared memory program
uses dynamic job allocation and given a list of cities with cartesian coordinates
finds the city closest to the origin. Each process calls m_next a system
call in Sequent, which increments a global counter atomically, and returns
a queue location. The process computes the queue locations that it
should examine according to the counter value, and the value of a
variable called BITE that determines the chunk of the queue
a process should have.
It then reads the data, computes the distance between the cities
and the origin, and if the computed
distance is less than the shared variable closest, overwrites closest.
Access to the shared variable is protected with a semaphore.
We were trying out our prototype on a shared memory example, and we wrote
the following process-centric model when the system was run on a 4 process
system with 10 cities in the list. BITE was set to 1, forcing every process
to work on a single queue location at a given time.
We instrumented the program such that every time a process
reads an element from the list, an R_list event is generated and an
R_var (W_var)
event is generated everytime a process reads(writes) the variable
closest.
? set proc = {0..3};
? ch axs = R_list R_var W_var*;
? pch comp = axs* onsome proc;
? match find = comp;
proc is a set containing the identifiers of each process. At each iteration, a process reads the list of cities, computes the distance, reads the shared variable closest, and if the distance is less than closest, updates the variable, which is described by the chain axs. The total behavior of the system is zero or more axs chains matching on each process. The Ave feedback with the user asking for uncollapsing of the process set is shown below.
Figure: Ave feedback for the example program from Sequent manual.
Since we had only 10 cities in our list, the queue should not be accessed more than 10 times. However a quick look at the feedback shows that process 0 alone had accessed the queue 13 times.
Each process accesses the queue within a for loop, where the index
element of the loop, gives the location within the queue to be examined.
The lower limit of the loop corresponds to the first element ot be accessed
in the queue in any given iteration, and was set to
base = (global counter value - 1)*BITE. The lower limit is ofcourse
unchanged in the loop. We instrumented the program in such a way, that
every event also has an attribute that contains the current value of the local
variable base. Since BITE
is set to 1, the for loop should be executed only once for a given value of
base.
The new trace was then processed using the same
behavioral description. We then uncollapsed the behavior of process 1,
and found that both the iterations of axs^
2 was done with the
same value of base. Exploring the other processes revealed the same
pattern with each process accessing
two consecutive queue locations with the
same value of base.
So each process was looking at two queue locations in
each iteration instead of one. A look at the setting of upper limit revealed
that the upper limit was incorrectly set to upper = base+BITE instead
upper = base + BITE -1. This resulted in each queue element except 9 to be
examined twice, when BITE = 1.