Get_Element and Get_ElementPart are the only communication
primitives in pC++ that allow processor threads to access the shared
name space and remote elements. Notice that this differs significantly
from the ``message passing'' style that is common in SPMD programming.
In the latter model, all synchronization is accomplished by matching
a send operation with a corresponding receive. In pC++, any processor
object thread may read any element of any collection, but only the owner
object thread may modify the element; this is equivalent to the ``owner
computes'' semantics found in HPF. All synchronization is in terms
of barrier operations that terminate each collection operation.
The runtime system for pC++ must manage three distinct tasks.
1. The allocation of collection classes.
This involves the interpretation of the alignment and distribution
directives to build the local collection for each processor object.
More specifically, each processor object must have a mechanism whereby
any element of the collection can be identified.
In a shared address space environment, this may be a table of pointers
or a function that computes the address of an element.
In the non-shared address space model,
this may be the name of a processor that either has the object or knows
where to find it.
Depending upon the execution model of the target, this task may also involve
the initialization of threads associated with each processor object.
2. The management of element accesses.
In particular, access management requires an effective,
efficient implementation of Get_Element and
Get_ElementPart functions.
This activity can be seen as a compiler-assisted ``local caching'' of
a remote element to the local processor thread's address space.
In a shared address space environment, alternative memory management
schemes may be used to improve memory locality.
If there is no shared address space, these functions require a ``one-sided''
communication protocol: if processor X needs an element from
processor Y, processor X must wake up an agent that has access to
the address space of Y; the agent fetches the data and returns
it to X.
3. Termination of parallel collection operations.
All parallel collection operations are barrier synchronized
before returning to the main thread.
Note that only the processor objects involved in the collection operation
need be synchronized; not every processor in the system must participate.
However, as we shall see, some implementations may require a global barrier.
The current pC++ compiler imposes some restrictions that are
important to note for the runtime system implementation.
The current version of the pC++ compiler generates SPMD code in
which the set of processor objects for each collection is the same as the
set of processors in the user's execution partition.
There is one execution thread per processor, and all local collection
objects on a given processor share this thread.
In true SPMD style, the main thread is duplicated and is run with the
single local thread on each processor.
This model of execution is consistent with all current HPF implementations,
but imposes some limitations on the programmer.
For example, it is not possible for two different collection
operations to run concurrently on different subsets of processors.
Furthermore, this prevents nested concurrency achieved by
building collections of collections.
Even with these limitations, this is still a powerful programming model.
These limitations will be removed in future versions of the compiler.
In the paragraphs that follow, we will look at the shared address
and distributed address space implementations of the current pC++
execution model.