Distributed memory parallel systems consist of a set of processing nodes interconnected by a high-speed network. Each node consists of a processor and local memory. In the case of a non-shared, distributed memory system, each processor only has access to its local memory and a message system is used to move data across the network between processors.
One common approach to building a shared memory system on top of a non-shared, distributed memory computer is called shared virtual memory (SVM). In SVM, the message passing system is used to move pages from one processor to another in the same way a standard VM system moves pages from memory to disk. Though these systems are still experimental, they show great promise for providing a support environment for shared memory parallel programming models. Because pC++ is based on a shared name space of collection elements, we use SVM techniques to build the runtime system for the Intel Paragon and the Thinking Machines CM-5.
More specifically, our model is based on ideas from Koan. The basic idea is that each collection element has a manager and an owner. The owner of the element is the processor object that contains the element in its local collection. As with pages in an SVM, we assume that an element may be moved from one local collection to another at run time for load balancing reasons, or, in the case of dynamic collections, may be created and destroyed at run time. In other words, we assume that the distribution map may be dynamic. Although this dynamic feature of the system is not used by the current compiler, it is a design requirement for the runtime system implementation. The purpose of the element manager is to keep track of which processor object owns the element. The manager is static, and every processor thread knows how to find the manager. Elements are assigned managers by a simple cyclic distribution. The algorithm for implementing the function Get_Element(i) is given as follows:
Hence, the primary implementation issues for a given machine reduce to:
The current pC++ compiler assumes no mechanism exists for interrupting a processor thread. Instead, the compiler generates calls to a function called Poll() in the element and collection methods. By calling Poll(), a thread can check a queue of incoming messages and reply to requests in a reasonably timely manner. Unfortunately, calling Poll() periodically is not sufficient to prevent starvation. If no interruption mechanism exists on the target, it is necessary to make sure that the Barrier() function also calls Poll() while waiting for the barrier to complete.
The final issue to consider in the runtime environment is that of allocating the collection elements. In the current version, a large table is created in each processor object that stores pointers to the local collection. A second table in each processor object stores the index of the owner of each element that is managed by that processor. Because all of the elements and both tables can be allocated and initialized based on the distribution and alignment data alone, it is straightforward to parallelize this task in the distributed memory environment. (This is in contrast to the shared memory situation, where synchronization is necessary to ensure that each processor has access to pointers to all the elements.)