Processors, Threads, and Parallelism

The processor objects used to build distributions for collections represent a set of threads. Given the declaration


Processors P;

one thread of execution is created on each processor of the system that the user controls. These new processor object (PO) threads exist independently of the main program control thread. (In the future, pC++ will allow processor sets of different sizes and dimensions.) Each new PO thread may read but not modify the "global" variables, i.e., program static data or data allocated on the heap by the main control thread. Each PO thread has a private heap and stack.

Collections are built on top of a more primitive extension of C++ called a Thread Environment Class, or TEClass, which is the mechanism used by pC++ to ask the processor object threads to do something in parallel. A TEClass is declared the same as any other class with the following exceptions:

There must be a special constructor with a Processors object argument. Upon invocation of this constructor, one copy of the member field object is allocated to each PO thread described by the argument. The lifetime of these objects is determined by their lifetime in the control thread.
A TEClass object may not be allocated by a PO thread.
The control thread uses the () operator to refer to a single thread environment object.
A call to a TEClass member function by the main program control thread represents a transfer of control to a parallel action on each of the threads associated with the object. (Consequently, member functions of the TEClass can read but cannot modify global variables.) The main control thread is suspended until all the processor threads complete the execution of the function. If the member function returns a value to the main control thread, it must return the same value from each PO thread or the result is undefined.
If a TEClass member function is invoked by one of the processor object threads, it is a sequential action by that thread. (Hence, there is no way to generate nested parallelism with this mechanism.)

These issues are best illustrated by an example.



int x;         // c++ global
float y[1000]; // c++ global
                            
TEClass MyThreads{          
 public:
   int id;          // per-thread data; public so main can set T(i).id
   float d[200];    // public thread data     
   void f(){id++;}  // parallel functions
   int getX(int j){return x;}
};                                                       
                                                         
main() {
 Processors P;   // the set of processors
 MyThreads T(P); // implicit constructor
                 // one thread object/proc.
 // a serial loop
 for(int i=0; i<P.numProcs(); i++)
   T(i).id=i; // main control thread can
              // modify i-th thread env.
 T.f(); // parallel execution on each thread
 // an implicit barrier after parallel call
}

In this example, the processor set P is used as the parameter to the thread environment constructor. One copy of the object with member field id is allocated to each PO thread defined by P. The lifetime of T is defined by the main control thread in which it was created. (However, in the current implementation the storage is not automatically reclaimed.) Figure 1 illustrates the thread and memory model that the language provides.

The main control thread can access and modify the public member fields of the TEClass object. To accomplish this, one uses the () operator, which is implicitly overloaded. The reference T(i).id refers to the id field in the i-th thread environment object. Note that the value of the expression T.id within the main control thread may not be well defined, because each thread may hold a different value for id. However, the assignment T.id = 1 is valid and denotes an update to every copy of the member id.
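
The difference between T(i).id and T.id can be sketched in standard C++. This is an emulation only, not pC++ and not how the pC++ runtime is implemented: the names Env, EnvSet, setAllIds, and idIsUniform are hypothetical, and the per-thread copies are modeled as a plain vector owned by the control thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-thread environment, mirroring the id field of MyThreads.
struct Env {
    int id = 0;
};

// Hypothetical stand-in for a TEClass object as seen by the control thread:
// one Env copy per PO thread.
class EnvSet {
    std::vector<Env> envs_;
public:
    explicit EnvSet(std::size_t numProcs) : envs_(numProcs) {}

    std::size_t size() const { return envs_.size(); }

    // Counterpart of T(i): select the i-th thread environment.
    Env& operator()(std::size_t i) {
        assert(i < envs_.size());
        return envs_[i];
    }

    // Counterpart of the assignment T.id = value: update every copy.
    void setAllIds(int value) {
        for (Env& e : envs_) e.id = value;
    }
};

// Returns true when every copy holds the same id, i.e. when reading
// "T.id" without an index would have a single well-defined value.
bool idIsUniform(EnvSet& t) {
    for (std::size_t i = 1; i < t.size(); ++i)
        if (t(i).id != t(0).id) return false;
    return true;
}
```

After the serial loop of the example assigns a distinct id to each copy, idIsUniform would report false; after setAllIds(1), it would report true.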

An individual PO thread cannot modify the local fields of another PO thread, but it can access them by means of the () operator. The only other way for PO threads to communicate is by means of native system message passing, but this is not encouraged until a C++ binding for the standard message passing interface is defined.

The call T.f() indicates a branch to a parallel operation on each PO thread. After the parallel execution of the method, there is a barrier synchronization before control returns to the main control thread. When the main control thread invokes a member function with a non-void return value, such as T.getX(j), it is essential that the function return an identical value from every PO thread for that invocation.
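
The call-and-barrier behavior can be approximated with standard C++ threads. The following is a minimal sketch under stated assumptions, not the pC++ runtime: parallelCall is a hypothetical helper, joining the workers stands in for the barrier, and a mismatch in return values raises an exception here, whereas pC++ leaves that case undefined.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <thread>
#include <vector>

// Hypothetical helper: run fn(rank) on numProcs threads, wait for all of
// them, and hand the common result back to the calling (control) thread.
int parallelCall(std::size_t numProcs,
                 const std::function<int(std::size_t)>& fn) {
    assert(numProcs > 0);
    std::vector<int> results(numProcs, 0);  // results[r] is written only by thread r
    std::vector<std::thread> workers;
    workers.reserve(numProcs);
    for (std::size_t r = 0; r < numProcs; ++r)
        workers.emplace_back([r, &fn, &results] { results[r] = fn(r); });

    // The join loop plays the role of the barrier: the control thread
    // resumes only after every worker has finished.
    for (std::thread& w : workers) w.join();

    // This sketch chooses to reject non-identical results explicitly.
    for (int v : results)
        if (v != results[0])
            throw std::runtime_error("threads returned different values");
    return results[0];
}
```

A function that returns a shared read-only value, as getX does with the global x, satisfies the uniformity requirement; one that returns its own rank does not.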

Note that the TEClass mechanisms provide a simple and direct way to "wrap" message-passing, SPMD-style C++ routines inside a standard sequential main program. In this way, many of the special-purpose libraries of parallel C++ code that already exist can be easily integrated into this model [17][16].
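
The wrapping idea can be illustrated in standard C++. This sketch makes several assumptions: partialSum and parallelSum are hypothetical names, shared-memory threads replace the message passing a real SPMD library would use, and the block partitioning is only one possible choice.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// A hypothetical SPMD-style routine: every thread runs the same code and
// uses its rank to select a contiguous block of the input.
long partialSum(const std::vector<int>& data, std::size_t rank,
                std::size_t numProcs) {
    auto begin = static_cast<std::ptrdiff_t>(data.size() * rank / numProcs);
    auto end = static_cast<std::ptrdiff_t>(data.size() * (rank + 1) / numProcs);
    return std::accumulate(data.begin() + begin, data.begin() + end, 0L);
}

// The wrapper looks like an ordinary sequential function to its caller:
// it starts the SPMD routine on each thread, waits, and combines results.
long parallelSum(const std::vector<int>& data, std::size_t numProcs) {
    assert(numProcs > 0);
    std::vector<long> partial(numProcs, 0);  // partial[r] is written only by thread r
    std::vector<std::thread> workers;
    workers.reserve(numProcs);
    for (std::size_t r = 0; r < numProcs; ++r)
        workers.emplace_back([r, numProcs, &data, &partial] {
            partial[r] = partialSum(data, r, numProcs);
        });
    for (std::thread& w : workers) w.join();
    return std::accumulate(partial.begin(), partial.end(), 0L);
}
```

A sequential main program would simply call parallelSum(data, n) and never deal with the threads directly, which is the role a TEClass member function plays in pC++.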

mohr@cs.uoregon.edu
Thu Feb 24 15:47:41 PST 1994