This research is supported by DARPA under Rome Labs contract AF 30602-92-C-0135.

A companion paper, ``Implementing a Parallel C++ Runtime System for Scalable Parallel Systems'', discusses issues of pC++ runtime system design and appeared in the Proceedings of the Supercomputing '93 conference [17].

Note that the shared- and distributed-memory implementations differ only in the low-level trace data collection library and in timestamp generation; all trace instrumentation is the same.

The ports to the IBM SP-1 and to workstation clusters using PVM were completed recently. Performance results for these ports were not yet available for inclusion in this article.

Thu Feb 24 13:42:43 PST 1994