In our experiments, we need to run the pC++ program in a number of
different configurations: at several levels of instrumentation (none,
complete, and selective), with different numbers of processors, and on
different machines.
Keeping track of all the compile and runtime flags required for each
configuration is a non-trivial task.
The cosy tool provides a very
simple set of menus for configuring and running a pC++ program.
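
To give a sense of how quickly the number of runs grows, the following is a minimal sketch that enumerates every combination of machine, processor count, and instrumentation level. It is not part of cosy and does not reflect its interface; the function name `enumerate_runs`, the machine names, and the processor counts are hypothetical and chosen only for illustration. Only the three instrumentation levels come from the text.

```python
from itertools import product

# Instrumentation levels named in the text.
INSTRUMENTATION_LEVELS = ["none", "complete", "selective"]


def enumerate_runs(machines, processor_counts, levels=INSTRUMENTATION_LEVELS):
    """Return one dict per (machine, processor count, instrumentation level)."""
    return [
        {"machine": m, "processors": p, "instrumentation": lvl}
        for m, p, lvl in product(machines, processor_counts, levels)
    ]


# Hypothetical machine names and processor counts, for illustration only.
runs = enumerate_runs(["machine_a", "machine_b"], [4, 8, 16])
print(len(runs))  # 2 machines * 3 processor counts * 3 levels = 18
```

Even this small hypothetical matrix yields 18 distinct runs, each with its own compile and runtime settings, which is the bookkeeping burden a menu-driven tool removes.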
In our case, we restricted profiling to the quick-sort routine (Qsort),
the main bitonic sort (sort), the bitonic merge (merge), the routine that
contains all the communications (grabFrom), and the barrier
synchronization code (pcxx_Barrier).
While many more small functions are executed, we found that when
we included them all, the event trace file was too large and
the execution time was severely distorted.
However, when we used this restricted set of functions, the trace files
were very compact and the impact of instrumentation on total execution time
was less than 5 percent.
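
The idea behind selective instrumentation can be sketched in a few lines. The example below is an illustration only, not the pC++ instrumentation mechanism: it assumes a hypothetical `instrument` decorator and an in-memory list standing in for the event trace file. Routines whose names are in the selected set record entry and exit events; all others are returned unwrapped, so they add nothing to the trace and incur no tracing overhead. The helper `swap` is a made-up example of a small function left out of the selection.

```python
import time

# Routines selected for profiling in the text.
SELECTED = {"Qsort", "sort", "merge", "grabFrom", "pcxx_Barrier"}

trace = []  # in-memory stand-in for an event trace file


def instrument(name, selected=SELECTED, log=trace):
    """Wrap a function so entry/exit events are recorded only if selected."""
    def decorator(fn):
        if name not in selected:
            return fn  # unselected routines run unwrapped, with no tracing
        def wrapper(*args, **kwargs):
            log.append(("enter", name, time.perf_counter()))
            try:
                return fn(*args, **kwargs)
            finally:
                log.append(("exit", name, time.perf_counter()))
        return wrapper
    return decorator


@instrument("merge")
def merge(a, b):
    # Placeholder body; the real bitonic merge is not shown in the text.
    return sorted(a + b)


@instrument("swap")  # hypothetical small helper, not in the selected set
def swap(pair):
    return (pair[1], pair[0])


merge([1, 3], [2])
swap((1, 2))
print([(kind, name) for kind, name, _ in trace])
```

Only the call to `merge` produces trace events; the call to `swap` leaves the trace unchanged, which is why restricting the set keeps both the trace size and the timing perturbation small.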