Supercomputer programming environments have fallen far behind desktop and workstation systems in the sophistication of the programming tools available for building new applications. Part of the reason is economic: the desktop marketplace is several orders of magnitude larger than the supercomputing market. Another explanation is that traditional supercomputer users are not accustomed to having good programming tools provided with high-performance hardware. This is especially true of users of massively parallel processing (MPP) systems. Indeed, most users are delighted if there is a compiler that generates good code for a single processor.
However, the market for MPP supercomputer systems is undergoing a radical transformation, from a captive of the international defense industry to a vital and energetic component of high-end computing in various commercial sectors. New users are demanding portable programming tools because they see their investment in software as far more important than the money they spend on hardware. Furthermore, because of their roots in desktop technology, they are demanding more than simple compiler environments and message-passing libraries.
Unfortunately, there are major problems that must be overcome if we are to raise the level of parallel program software design to that of desktop and workstation tools. Specifically, we must learn more about the integration of programming environment technology and parallel algorithm design. There are several important issues here, including
In this paper we examine and evaluate an experimental programming environment designed at the University of Oregon that addresses some of these issues. This system, called TAU (for Tuning and Analysis Utilities), is part of the pC++ programming system being distributed by a consortium consisting of Indiana University, the University of Oregon, and the University of Colorado. pC++ is a programming language based upon a concurrent aggregate extension to C++. The pC++ programming system has been ported to most of the major MPP platforms. It consists of a set of language preprocessors, runtime libraries, and tools for program execution, analysis, and tuning. As an integrated program and performance analysis environment, TAU consists of six special graphical tools. The five below are discussed in this paper.
We illustrate the use of these tools from the perspective of the design and evaluation of a single application in pC++: a bitonic sort module that is used as part of a large N-Body simulation of cosmological evolution. In the sections that follow we will demonstrate how the tools were used to analyze this module. Section 3 gives a brief description of the algorithm. Sections 4 and 5 show how the static analysis tools illustrate the structure of the program. In Section 6 we will describe how the dynamic analysis tools expose a potential performance bug.