Get_Element and Get_ElementPart are the only communication
primitives in pC++ that allow processor threads to access the shared
name space and remote elements. Notice that this differs significantly
from the ``message passing'' style that is common in SPMD programming.
In the latter model, all synchronization is accomplished by matching
a send operation with a corresponding receive. In pC++, any processor
object thread may read any element of any collection, but only the owner
object thread may modify the element; this is equivalent to the ``owner
computes'' semantics found in HPF. All synchronization is in terms
of barrier operations that terminate each collection operation.
The runtime system for pC++ must manage three distinct tasks.
1. The allocation of collection classes.
This involves the interpretation of the alignment and distribution
directives to build the local collection for each processor object.
More specifically, each processor object must have a mechanism whereby
any element of the collection can be identified.
In a shared address space environment, this may be a table of pointers
or a function that computes the address of an element.
In the non-shared address space model,
this may be the name of a processor that either has the object or knows
where to find it.
Depending upon the execution model of the target, this task may also involve
the initialization of threads associated with each processor object.
2. The management of element accesses.
In particular, access management requires an effective,
efficient implementation of Get_Element and
Get_ElementPart functions.
This activity can be seen as a compiler-assisted ``local caching'' of
a remote element to the local processor thread's address space.
In a shared address space environment, alternative memory management
schemes may be used to improve memory locality.
If there is no shared address space, these functions require a ``one-sided''
communication protocol: if processor X needs an element from
processor Y, processor X must wake up an agent that has access to
the address space of Y; the agent fetches the data and returns
it to X.
3. Termination of parallel collection operations.
All parallel collection operations are barrier synchronized
before returning to the main thread.
Note that only the processor objects involved in the collection operation
need be synchronized; not every processor in the system must participate.
However, as we shall see, some implementations may require a global barrier.
The current pC++ compiler imposes some restrictions that are
important to note for the runtime system implementation.
The current version of the pC++ compiler generates SPMD code in
which the set of processor objects for each collection is the same as the
set of processors in the user's execution partition.
There is one execution thread per processor, and all local collection
objects on a given processor share this thread.
In true SPMD style, the main thread is duplicated and is run with the
single local thread on each processor.
This model of execution is consistent with all current HPF implementations,
but imposes some limitations on the programmer.
For example, it is not possible for two different collection
operations to run concurrently on different subsets of processors.
Furthermore, this prevents nested concurrency achieved by
building collections of collections.
Even with these limitations, this is still a powerful programming model.
These limitations will be removed in future versions of the compiler.
In the paragraphs that follow, we will look at the shared address
and distributed address space implementations of the current pC++
execution model.