
Introduction

The introduction of a new parallel programming system should include, in addition to a description of the language principles and operational paradigm, an evaluation of the performance one can expect from the system and a detailed accounting of the performance issues that arise from its design and implementation. In practice, however, the concerns of portability, usability, and, more recently, scalability tend to outweigh the equally important performance concerns when a system is released, leaving the mysteries of performance evaluation for users to discover. The reasons for this situation are not hard to understand. Designing a language that supports a powerful parallel programming abstraction, developing a runtime system platform that is truly portable across diverse target hardware and software architectures, and implementing non-trivial applications with the system together produce a large and complex software environment. Although the performance of the language, runtime system, and target system implementations is always of concern during design and development, the time and effort needed to explore the performance ramifications of the initial versions of a parallel programming system may be difficult to justify if doing so delays the system's introduction.

However, the performance evaluation of a parallel programming system can be facilitated by integrating performance analysis support early in the system's design and development. This might occur in several ways, including:

The notion of designing for performance analysis is well-founded [23][22], but it has rarely been applied to parallel language systems until now.

The performance evaluation issues associated with the pC++ system are interesting because they span several performance levels (language, runtime system, target architecture) and require a system-integrated performance toolset to investigate fully. Hence, in concert with the pC++ system development, a performance analysis strategy has been formulated and is being implemented. As a result, the first version of the compiler, a preprocessor that generates Single Program Multiple Data (SPMD) C++ code, is being introduced with integrated performance analysis capabilities and an extensive set of performance measurements already completed. The generated code runs on the Thinking Machines CM-5, the Intel Paragon, the IBM SP-1, the BBN TC2000, the KSR KSR-1, the Sequent Symmetry, and a homogeneous cluster of UNIX workstations running PVM. These results are presented here.

The pC++ language and runtime system are very briefly described in §2. The performance measurement environment that is integrated in the pC++ system is described in §3. This environment is being used to perform a more detailed analysis of performance factors at the language, runtime system, and application levels. In §4, we describe four benchmark programs that we use to illustrate the performance issues associated with the pC++ language and runtime system implementation. Total execution time and speedup results are presented in §5. In §6, we present some of the detailed performance analysis results we have generated.





mohr@cs.uoregon.edu
Thu Feb 24 13:42:43 PST 1994