Domain-Specific Metacomputing for Computational Science:
Achieving Specificity Through Abstraction
By Steven Hackstadt
The area of domain-specific metacomputing for computational science crosscuts computational science, parallel and distributed computing, and software engineering. Part of the goal of this paper is to show how select technologies from each of these areas can be used to address certain requirements of this new area. We have already informally introduced and motivated those requirements and briefly mentioned the technologies. The goal of this chapter is to state those requirements more explicitly and to describe how they relate to the technologies. This chapter thus provides the context within which later chapters will describe more formally how the area of domain-specific metacomputing for computational science is grounded in three core areas of computer science. In short, we seek to characterize the area of domain-specific metacomputing for computational science.
To improve readability, we occasionally shorten the verbose title "domain-specific metacomputing for computational science" to "domain-specific metacomputing," "metacomputing for computational science," or "computational science metacomputing." A shortened title is used only where it sufficiently suggests the aspects of the area most relevant to the discussion at that point.
The three requirements to be presented are high performance heterogeneous computing, software design and development, and domain-specificity.
High Performance Heterogeneous Computing
The first requirement of metacomputing for computational science, high performance heterogeneous computing, arises from three different perspectives: limited financial resources to address increasing performance requirements, a desire to increase the utilization of existing resources, and the need to support multiple types of parallelism within a single application.
A large number of computational science problems take the form of simulations, now a respected companion to scientific theory and experimentation [NCO96]. In general, simulations can be characterized along four dimensions: size, resolution, complexity, and length. For example, a simulation of ocean currents might consider a cubic kilometer of ocean at a time (size), where each such cube is decomposed into a large collection of small, 1-meter cubes (resolution). The simulation might consider gravitational effects and water temperature, but neglect surface temperature and wind speed (complexity). Finally, the finished simulation might portray ocean currents over a period of one hour (length).
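These dimensions compound. As a back-of-the-envelope sketch, assume (a simplification, not a claim about any particular ocean model) that work is proportional to the number of cells times the number of time steps. The 1-kilometer cube at 1-meter resolution then contains 1000³ = 10⁹ cells; halving the cell edge multiplies the cell count by 8, and a hypothetical 1-second time step over one simulated hour adds a further factor of 3600. The function names are illustrative only.

```python
def cell_count(domain_edge_m: float, cell_edge_m: float) -> int:
    """Cubic cells in a cubic domain: (size / resolution) cubed."""
    cells_per_edge = round(domain_edge_m / cell_edge_m)
    return cells_per_edge ** 3


def relative_work(cells: int, time_steps: int) -> int:
    """Hypothetical cost model: one unit of work per cell per time step."""
    return cells * time_steps


base = cell_count(1000.0, 1.0)    # 1 km cube at 1 m resolution
finer = cell_count(1000.0, 0.5)   # same cube at 0.5 m resolution

print(base)                        # 1000000000
print(finer // base)               # 8
print(relative_work(base, 3600))   # 3600000000000 (1 s steps for one hour)
```

Complexity (more physical effects computed per cell) would multiply the per-cell cost as well; it is left out of the sketch because its effect is model-specific.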
Each of these parameters has a direct effect on the performance of the simulation as a whole: increasing size, resolution, complexity, or length increases the amount of work that must be done. As described in Chapter 1, the computational requirements of these problems are, in fact, rapidly increasing. Faced with this situation, three courses of action come to mind.
The first approach is simply to make scientists wait longer and longer for the results of their simulations by making no improvements to the computational environment they use. The second is to buy new, more powerful supercomputer systems and port the applications or simulations to them. The third is to make better use of as many of the available computational resources as possible. Of course, a fourth course of action combines the last two: buying new machines while also harnessing unused, existing computational power. The ASCI program [DOE96] is following this course in a staggered manner, first investing in new machines and eventually requiring that they be used collectively.
It is safe to assume that for a large number of problems, the first option is not viable. Furthermore, repeatedly resorting to the second option results in tremendous expense and a collection of different (i.e., heterogeneous) machines, while simultaneously failing to improve the versatility of the system as a whole [KPSW93]. However, as Notkin et al. [NHSS87] point out, "heterogeneity is often unavoidable... as evolving needs and resources lead to the acquisition... of diverse hardware...." Eventually, the hardware investment and the cumulative performance potential become large enough that a more general solution is necessary, a solution that makes better and more flexible use of the available resources.
Consider, for example, that many problems exhibit more than one type of parallelism. The image understanding Grand Challenge problem described in [KPSW93] consists of coarse-grained, MIMD-like tasks at a high level with fine-grained, SIMD-like operations applied at a low level. Alternatively, different subtasks of a problem may be better suited to different machine architectures [SDA96]. Consider an application, as depicted in Figure 2, that has four distinct phases, each exhibiting a certain type of parallelism. Let us assume the entire execution time on a serial computer to be 100 units. When the code is executed on a (homogeneous) vector processor, the time required for the vector phase is drastically reduced, but the other phases see only moderate improvements. If a heterogeneous system consisting of a vector machine, a MIMD system, a SIMD computer, and a special-purpose parallel machine is available, we could potentially reduce the total execution time of the code by executing each phase on the most appropriate machine. The trade-off, as noted in the figure, is that we incur a certain amount of communication overhead as we move data between the independent systems, and the amount of that communication can vary widely. The important point is that a given homogeneous system (i.e., one or more machines of the same type) cannot, in general, address the diverse requirements of complex computational science problems.
FIGURE 2. The hypothetical execution on various systems of a code with multiple types of embedded parallelism. (Figure adapted from Khokhar et al. [KPSW93].)
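The trade-off in Figure 2 can be made concrete with a small cost model. The sketch below uses invented phase times (only the serial total of 100 units comes from the text; the per-machine numbers are not read from the figure) and a fixed, hypothetical communication cost charged whenever consecutive phases run on different machines.

```python
# Hypothetical times (arbitrary units) for each phase on each machine type.
# Only the serial total of 100 comes from the text; all other values are invented.
PHASE_TIMES = {
    "vector":  {"serial": 30, "vector": 2,  "mimd": 12, "simd": 10, "special": 15},
    "mimd":    {"serial": 30, "vector": 20, "mimd": 3,  "simd": 15, "special": 20},
    "simd":    {"serial": 20, "vector": 12, "mimd": 8,  "simd": 1,  "special": 10},
    "special": {"serial": 20, "vector": 15, "mimd": 10, "simd": 12, "special": 1},
}
PHASE_ORDER = ["vector", "mimd", "simd", "special"]


def homogeneous_time(machine: str) -> int:
    """Run every phase on the same machine type; no inter-machine transfers."""
    return sum(PHASE_TIMES[phase][machine] for phase in PHASE_ORDER)


def heterogeneous_time(transfer_cost: int) -> int:
    """Greedily run each phase on its fastest machine, paying transfer_cost
    whenever consecutive phases execute on different machines."""
    total, previous = 0, None
    for phase in PHASE_ORDER:
        times = PHASE_TIMES[phase]
        machine = min(times, key=times.get)
        total += times[machine]
        if previous is not None and machine != previous:
            total += transfer_cost
        previous = machine
    return total


print(homogeneous_time("serial"))  # 100
print(homogeneous_time("vector"))  # 49
print(heterogeneous_time(4))       # 19: 7 units of computation + 3 transfers
print(heterogeneous_time(40))      # 127: communication erases the gain
```

Choosing the fastest machine phase by phase is a greedy heuristic. As the last call shows, when transfers are expensive it can do worse than staying on a single machine, which is why the amount of communication matters so much.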
Hence, heterogeneous computing is desirable from financial, utilitarian, and technical perspectives. A depiction of a possible heterogeneous computing environment is shown in Figure 3, and Khokhar et al. [KPSW93] offer the following definition.
FIGURE 3. A heterogeneous computing environment consisting of vector and parallel processors, servers, and workstation clusters. The components are connected by a high-speed networking infrastructure.
Heterogeneous computing is the well-orchestrated and coordinated effective use of a suite of diverse high-performance machines (including parallel machines) to provide superspeed processing for computationally demanding tasks with diverse computing needs.
The solutions to supporting heterogeneous computing reside in both hardware and software. In terms of hardware, a collection of different computing systems is obviously required. In addition, most researchers agree that a high-speed network connecting the machines is also fundamental [KPSW93, SC92, BM95]. According to Khokhar et al. [KPSW93], such a network must provide bandwidth on the order of 1 gigabit per second to better match computation and communication speeds.
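To see why bandwidth matters, consider an idealized transfer-time estimate: bits moved divided by link bandwidth, ignoring latency, protocol overhead, and contention. The data volume (10⁹ double-precision values of 8 bytes each) and the 10 Mbit/s comparison link are assumptions chosen only for illustration.

```python
def transfer_seconds(num_bytes: int, bits_per_second: float) -> float:
    """Idealized transfer time; ignores latency, protocol overhead, contention."""
    return num_bytes * 8 / bits_per_second


field_bytes = 10**9 * 8  # hypothetical: 10^9 double-precision (8-byte) values

print(transfer_seconds(field_bytes, 1e9))   # 64.0 seconds at 1 Gbit/s
print(transfer_seconds(field_bytes, 10e6))  # 6400.0 seconds at 10 Mbit/s
```

Any phase whose computation takes less time than its data transfer is communication-bound on that link, regardless of how fast the machines at either end are.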
With respect to software, the needs are more diverse. Siegel, Dietz, and Antonio [SDA96] survey the software support needed for heterogeneous computing. They identify several areas requiring software solutions, including code development and generation, debugging and performance tools, operating system support for task manipulation, intermachine data transport, performance monitoring, and administration. In addition, a recent workshop [BM95] concluded that defining a "performance-oriented interface layer between application programs and target systems" could significantly advance the area. Such a layer would enable a similarly broad range of services, including dynamic scheduling, application queries about system state, prediction and measurement, and monitoring and checkpointing. Many of these topics will be explored in more detail in the next chapter. The next section, however, contains a higher-level view of the software development requirements for domain-specific metacomputing.
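One way to picture such a layer is as an abstract interface that applications program against, while each target system supplies its own implementation. The sketch below is hypothetical: the class and method names are invented for illustration and do not come from [BM95] or any existing system. It covers four of the listed services (state queries, prediction, dynamic scheduling, checkpointing), and the toy implementation uses a deliberately naive load-based prediction model.

```python
from abc import ABC, abstractmethod


class PerformanceLayer(ABC):
    """Hypothetical interface between application programs and target systems.
    Each method mirrors one kind of service listed in the text."""

    @abstractmethod
    def system_state(self) -> dict[str, float]:
        """Answer application queries about system state (here, load per host)."""

    @abstractmethod
    def predict_runtime(self, task: str, host: str) -> float:
        """Predict how long a task would take on a given host."""

    @abstractmethod
    def schedule(self, task: str) -> str:
        """Dynamically choose a host for a task."""

    @abstractmethod
    def checkpoint(self, task: str, state: bytes) -> None:
        """Record task state so that the task can be monitored or restarted."""


class ToyLayer(PerformanceLayer):
    """In-memory stand-in with a naive model: time = base cost * (1 + load)."""

    def __init__(self, loads: dict[str, float], base_cost: dict[str, float]):
        self.loads = loads
        self.base_cost = base_cost
        self.checkpoints: dict[str, bytes] = {}

    def system_state(self) -> dict[str, float]:
        return dict(self.loads)

    def predict_runtime(self, task: str, host: str) -> float:
        return self.base_cost[task] * (1.0 + self.loads[host])

    def schedule(self, task: str) -> str:
        # Pick the host with the smallest predicted runtime.
        return min(self.loads, key=lambda host: self.predict_runtime(task, host))

    def checkpoint(self, task: str, state: bytes) -> None:
        self.checkpoints[task] = state


layer = ToyLayer(loads={"vector": 0.5, "cluster": 0.1}, base_cost={"fft": 10.0})
print(layer.schedule("fft"))  # cluster
```

Because applications see only the abstract interface, a more realistic prediction model or scheduler could be substituted per target system without changing application code.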
Software Design and Development
Page through the software and tool environment sections of some recent parallel and distributed computing conference proceedings, and you will find that many of the papers have a couple of features in common. Almost inevitably, the papers include some kind of diagram of the architecture of the proposed tool or environment. Such diagrams typically consist of labelled boxes connected by lines or arrows; occasionally some text is included to explain what the symbols mean. The other common feature of these papers is a prose description of the structure or architecture of the system. Examples of such diagrams and descriptions are recreated in Figure 4.
FIGURE 4. The "architectures" of tools and environments are often depicted as box-and-line diagrams which are occasionally accompanied by some descriptive text. These examples each describe visualization tools for parallel computing environments.
Indeed, diagrams and descriptions like those in Figure 4 are common in many areas of computer science. And despite being almost completely informal, they can be surprisingly effective. But the labels assigned to the features of a diagram are typically unique to the particular system being described. Furthermore, the terms for general architectural patterns are casually defined (if at all) and are used differently by different authors. Shaw and Garlan [SG96] characterize the situation as follows:
Unfortunately, diagrams and descriptions are highly ambiguous. At best they rely on common intuitions and past experience to have any meaning at all.
As we will soon see, this observation has profound implications for providing metacomputing support for computational science.
Because the performance demands of computational science problems are so great, collaborations between scientists and expert parallel programmers are essential. As mentioned earlier, a main goal of computational science is to train "hybrid" scientists with expert knowledge in both areas. But until such computational scientists are more numerous, personal collaborations between these previously disjoint communities will be required to get the best application performance. When the task is to build entire metacomputing environments for computational science, the success of these collaborations is even more crucial.
In light of Shaw and Garlan's claim that the diagrams and descriptions so commonly used to communicate about software rely on "common intuitions and past experience to have any meaning at all," we must be concerned with the ability of such collaborations to produce robust and useful software. To what extent do physical scientists and computer scientists have the required common intuitions and past experiences to understand each other with respect to this goal?
This is not to say that these two communities share no common intuitions and past experiences. But it does strongly suggest that a common "domain of discourse" [ASWB95] is needed if research collaborations to support metacomputing for computational science are to be successful. The software engineering community has so far relied on folklore, informal models, and unproven theories about software architectures [SG96]. However, even within that community there is a perceived need for more formal definitions and representations of software architecture. Compared to the relatively new community of scientists and computer scientists faced with building domain-specific metacomputing environments together, software engineers are a tightly knit group. Accordingly, the benefits of improved software design and development techniques seem both more imperative and more promising for domain-specific metacomputing projects.
Improved software design and development support must include several features. For example, software engineers have long wanted to build systems from so-called "reusable parts," but doing so has proved very difficult [AB96, GAO95, NM95]. Garlan, Allen, and Ockerbloom [GAO95] contend this is because of "mismatches" in the "assumptions a reusable part makes about the structure of the application in which [it] is to appear." Nonetheless, instantiating different domain-specific metacomputing environments from a common collection of reusable parts remains a desirable goal. Building systems from "frameworks" is complementary to using reusable parts: a framework essentially defines a class of applications (or environments) sharing a common architecture and built from a set of generic components [NM95]. Still another goal is to facilitate a high degree of interoperability between the components of a metacomputing environment. Heiler's work [Heil95] seeks to ensure that "a common understanding of the meaning of the requested services and data" exists between the requester and provider. Manola [Mano95] argues that the use of domain-specific knowledge is central to achieving interoperability. Finally, Purtilo, Snodgrass, and Wolf [PSW91] describe a "software bus," an abstract software layer through which heterogeneous application components exchange data and control.
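The software bus idea can be sketched as a simple publish/subscribe layer: components name channels rather than each other, so one component can be replaced without changing its peers. This is a minimal, hypothetical illustration of the concept only. It is not the interface of the system described in [PSW91], and it omits concerns such as converting data representations between heterogeneous machines.

```python
from collections import defaultdict
from typing import Any, Callable

Handler = Callable[[Any], None]


class SoftwareBus:
    """Hypothetical bus: components publish to and subscribe on named
    channels, so no component holds a direct reference to another."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, channel: str, handler: Handler) -> None:
        """Register a component's handler for messages on a channel."""
        self._subscribers[channel].append(handler)

    def publish(self, channel: str, message: Any) -> int:
        """Deliver message to every subscriber (transferring control to each
        handler in turn); return the number of handlers invoked."""
        handlers = self._subscribers.get(channel, [])
        for handler in handlers:
            handler(message)
        return len(handlers)


# Usage: a simulation publishes a grid; a visualizer and a logger consume it.
bus = SoftwareBus()
frames = []
log = []
bus.subscribe("grid", frames.append)                     # visualizer stand-in
bus.subscribe("grid", lambda grid: log.append(len(grid)))  # logger stand-in
print(bus.publish("grid", [[0.0, 1.0], [1.0, 0.0]]))     # 2
print(log)                                               # [2]
```

The publishing component never learns which, or how many, components consume its data, which is the decoupling that makes substitution of parts possible.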
One of the most primitive manifestations of many of these features is the software library. But, as Rice [Rice96] explains, the software library "still requires a level of expertise beyond the background and skills of the average scientist and engineer...." This is a case of lacking common intuitions and past experiences. Furthermore, a software library is subject to the "mismatches" described by Garlan et al. [GAO95]. What is required is a higher-level, more comprehensive methodology that also facilitates collaboration between scientists and computer scientists.
The last general requirement of domain-specific metacomputing for computational science is domain-specificity itself. Some view a domain-specific approach to software as a "compromise" position [TTC95]. That is, while not purporting to be a fully general approach, it does avoid the poor software reuse that results from "point solutions" [FGNS96, TTC95] (a particularly prevalent problem in the parallel and distributed computing community). Thus, a domain-specific approach seeks to build families of related systems.
Domain-specificity means that the design and development of a software system give deep consideration to the context and type of problems to be addressed. While domain-specificity is a somewhat more qualitative requirement than the others mentioned so far, the reasons for it are no less apparent. For example, Springmeyer, Blattner, and Max [SBM92] express its importance with respect to software functionality:
Domain specific knowledge plays an important role in improving the functionality of software. A designer can apply knowledge of how domain activities are actually practiced to improve the effectiveness and usability of software tools....
Domain-specificity also offers a way to improve the collaboration between scientists and computer scientists, as expressed by Taylor, Tracz, and Coglianese [TTC95]:
The domain-specific approach affords greater opportunity for user involvement, analyst/developer cooperation, and concern for manageability, productivity, and cost-effectiveness.
Central to the requirement of a domain-specific approach is the development of a "domain model," the goal of which is to standardize the terminology of the problem domain and the descriptions of specific problems to be solved in the domain. Taylor et al. [TTC95] claim that domain models "enable effective communication between the developers of a system and those procuring it." In other words, a domain model begins to build up the common intuitions required for effective collaboration between scientists and computer scientists.
Orienting software development around particular domains has the desirable effect of limiting the scope of the software so that a general solution is neither required nor attempted. At the same time, it increases software reuse, improves software functionality, and facilitates collaboration between scientists and computer scientists.
In summary, domain-specific metacomputing for computational science has three main requirements: high performance heterogeneous computing, a software design and development methodology, and means for achieving domain-specificity. These requirements and the preceding discussion begin to characterize more formally the area of domain-specific metacomputing for computational science. But requirements are made explicit so that they may be addressed. The next section continues the characterization of the area by describing three core technologies and how they address the requirements just presented.
The purpose of this section is to identify the enabling technologies for domain-specific metacomputing for computational science and to explain how they address the requirements set forth in the previous section. Each technology primarily addresses one requirement and has a secondary correspondence to another requirement. Figure 5 depicts these relationships, which are explained in the following sections. Each technology will be described in extensive detail in later chapters; this chapter serves only to outline the technologies and their relationships to the area requirements.
FIGURE 5. The requirements and technologies for domain-specific metacomputing for computational science.
The first core technology that addresses metacomputing for computational science is metacomputing itself. The literature proposes a variety of definitions for metacomputing. The presence of multiple definitions is indicative of the technology's relative youthfulness. Consequently, the definition proposed here is a combination of those found in the literature [KPSW93, SC92, FK96]:
A metacomputing environment is a program execution environment which, primarily through software, supports and simplifies the coordinated use of heterogeneous computing systems, high-speed networks, and other computational resources, possibly located at different geographic sites and in different administrative domains.
Khokhar et al. [KPSW93] restrict metacomputing to "computations exhibiting coarse-grained heterogeneity in terms of embedded parallelism." At the other extreme, Smarr and Catlett [SC92] define it very generally as "a network of heterogeneous, computational resources linked by software in such a way that they can be used as easily as a personal computer." Somewhere between these is the definition by Foster and Kesselman [FK96], who describe metacomputers as "execution environments in which high-speed networks are used to connect supercomputers, databases, scientific instruments, and advanced display devices, perhaps located at geographically distributed sites." Certain characteristics of each of these definitions are reflected in the definition we propose, the key aspects of which are that (1) metacomputing is largely a software solution, (2) diverse, distributed computational resources are supported, and (3) the environment is well integrated and simplifies access to the available resources.
In addition to the obvious goal of improved performance, metacomputing seeks to increase accessibility to supercomputing capabilities and to enable unique computing capabilities not otherwise possible [FK96]. These two goals roughly correspond to the requirement of more cost-effective and flexible heterogeneous computing. In this sense, metacomputing primarily addresses the requirement of high performance heterogeneous computing. Metacomputing and heterogeneous computing share an emphasis on utilizing heterogeneous hardware resources such as workstations and parallel/vector supercomputing systems, but metacomputing broadens this emphasis somewhat by including other resources such as database servers, scientific instruments, and graphical displays. In this regard, metacomputing addresses the hardware requirements associated with heterogeneous computing.
But metacomputing is concerned with more than just hardware. In fact, metacomputing emphasizes software as a key, if not the central, part of the solution by focusing on the construction of robust, comprehensive, higher-level environments that support computation and tool construction. For example, Foster and Kesselman [FK96] employ a "toolkit" approach that attempts to integrate a collection of modules, each of which supports higher-level services through well-defined interfaces. Grimshaw and Wulf [GW96], on the other hand, are building an object-oriented class hierarchy and runtime environment to better facilitate customization, extension, and replacement of system functionality by users. The POOMA framework [ABCH95] uses an object-oriented approach to address portability and retargetability requirements. Finally, both POOMA and POET [AM94] use frameworks and object-oriented design to facilitate the mapping between a given physical phenomenon and the algorithms and data structures used to model it. While the best approach may not be immediately evident, such efforts are (or soon will be) addressing the software requirements for heterogeneous computing. More generally, these efforts are trying to improve the software design and development process for metacomputing capabilities.
Thus, metacomputing technology primarily applies to the requirement of high performance heterogeneous computing, but also shares a secondary focus on software design and development needs. This relationship is illustrated in Figure 5 in which the area of metacomputing is overlapped mostly by high performance heterogeneous computing and to a lesser extent by software design and development.
The area of software architecture attempts to formalize the description of the structure and topology of a software system, and to "clarify structural and semantic differences among components and interactions" [SG96]. According to Shaw and Garlan [SG96], software architecture "found its roots in diagrams and informal prose," like those discussed previously and appearing in Figure 4. They arrive at this claim after identifying three design levels for software: architecture, code, and executable. They argue that the code and executable levels are now well understood, as evidenced by the evolution of higher-level languages from machine language, symbolic assemblers, and macro processors. The architecture level, however, is currently understood mostly in an intuitive manner and lacks a uniform syntax and semantics for reasoning about the diagrams and accompanying prose. Software architecture, as an area, seeks to improve both the understanding and the precision of the architecture level of software design.
A good starting place for software architecture is to consider the variety of organizational styles that software exhibits, such as client-server, pipe-filter, object-oriented, or dataflow. Shaw and Garlan [SG96] make the following observation:
Systems often exhibit an overall style that follows a well-recognized, though informal, idiomatic pattern.... These styles differ both in the kinds of components they use and in the way those components interact with each other.
They define a common framework within which to compare the different styles. The framework consists of components (e.g., clients, servers, filters, databases), connectors (e.g., procedure calls, events, protocols, pipes), and a set of constraints that determine how they can be combined. The distinctive aspect of this model is that it treats the interactions between components (i.e., the connectors) at the same level as the components themselves. At least part of the goal of this technique is to avoid the "mismatches" between software components [GAO95]. If one abstractly views a metacomputing environment as a collection of interoperating entities (which may be objects, modules, tools, machines, or applications, depending on the design and implementation of the particular system), the potential benefits of a design methodology that models both the components of the system and the interactions between them become apparent.
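The components-connectors-constraints framework can be sketched as plain data: connectors are first-class values alongside components, and a style is a set of constraints checked against a configuration. This is an illustrative encoding under a simplifying assumption (a style constrains only which component and connector kinds are allowed); it is not Shaw and Garlan's formal notation, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    kind: str  # e.g., "filter", "client", "server", "database"


@dataclass(frozen=True)
class Connector:
    kind: str    # e.g., "pipe", "procedure_call", "event", "protocol"
    source: str  # name of the component at one end
    target: str  # name of the component at the other end


@dataclass
class Style:
    """A style as a set of constraints on component and connector kinds."""
    name: str
    component_kinds: set[str]
    connector_kinds: set[str]

    def violations(self, components: list[Component],
                   connectors: list[Connector]) -> list[str]:
        names = {c.name for c in components}
        problems = []
        for c in components:
            if c.kind not in self.component_kinds:
                problems.append(f"{c.name}: component kind '{c.kind}' not allowed")
        for k in connectors:
            if k.kind not in self.connector_kinds:
                problems.append(
                    f"{k.source}->{k.target}: connector kind '{k.kind}' not allowed")
            if k.source not in names or k.target not in names:
                problems.append(f"{k.source}->{k.target}: unknown component")
        return problems


pipe_filter = Style("pipe-filter", {"filter"}, {"pipe"})
comps = [Component("read", "filter"), Component("smooth", "filter"),
         Component("render", "filter")]
conns = [Connector("pipe", "read", "smooth"),
         Connector("procedure_call", "smooth", "render")]
print(pipe_filter.violations(comps, conns))
# ["smooth->render: connector kind 'procedure_call' not allowed"]
```

Because the procedure call is represented explicitly as a connector, the mismatch is detected at the interaction itself rather than hidden inside either component.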
The technology of software architecture has been most successfully applied in building domain-specific software architectures (DSSAs) [HPLM95, TTC95]. A DSSA provides an organizational structure for a family of applications (e.g., avionics, mobile robotics, or user interfaces) such that new applications or products can often be created very easily or even automatically [SG96]. DSSAs provide a software engineering methodology that is tailored to a particular domain. In this way, they partially address how domain-specificity can be manifested within a software system. With respect to which areas may be amenable to domain-specific architectures, Taylor et al. [TTC95] claim:
[The] existence of a large amount of code that is typically successfully scavenged is an indicator that the domain may be mature enough for attempts to regularize it and apply the DSSA approach. Similarly, well-developed and well-used libraries are indications of mature domains.
Clearly, many computational science domains (e.g., computational fluid dynamics, mechanics and structural analysis) fit these criteria, as evidenced by the large number of scientific software libraries available. Thus, in principle, DSSAs could be created for these areas. Taylor et al. [TTC95] outline a five-step process for engineering a DSSA, which will be discussed in more detail later. Suffice it to say here that the process includes a high degree of interaction between "domain experts, systems analysts, software engineers, and potential customers" [TTC95]. That interaction is largely facilitated by agreeing upon the terminology, concepts, and requirements of the domain. In other words, domain-specific architectures can aid in addressing the requirement of domain-specificity.
Surprisingly, these and related ideas have not gone completely unexplored in the parallel and distributed computing community. For example, Cuny et al. [CDHH96] propose the idea of "domain-specific environments" based on the contention that "solving a particular computational science problem... involves a combination of several technologies with a domain-specific purpose." Their approach emphasizes the collaboration between application scientists and computer scientists. With respect to technology, they seek solutions exhibiting programmability, extensibility, and interoperability to better facilitate the experimental nature of leading-edge scientific applications. In other work, Armstrong and Macfarlane's POET system [AM94] "captures the basic communication paths and the structural aspects of the computational algorithm that are necessary to implement a parallel version of particular scientific problem classes." Concern for application design and a variation of domain-specificity is evident in their approach. In addition, Anglano, Schopf, Wolski, and Berman [ASWB95] propose an abstract, graphical notation for describing heterogeneous applications. Their notation and accompanying annotations are well-defined and precise. But more importantly, this notation acts as a "domain of discourse" between scientists and computer scientists. In essence, they provide a primitive domain-specific software architecture for heterogeneous computing applications.
In summary, the technology of software architecture provides ideas and concepts that can improve the overall software design and development process. In addition, the sub-area of domain-specific software architectures offers methods that could be useful for achieving domain-specificity within a metacomputing environment for computational science applications. The few manifestations of these concepts within the parallel and distributed computing community are of particular interest and will be discussed in more detail later.
The last technology central to domain-specific metacomputing for computational science is domain-specific environments. As described earlier, domain-specific environments (DSEs) and problem-solving environments (PSEs) have a unique relationship. Whereas a PSE provides a specific computational capability (e.g., solving partial differential equations) that has potential application in many domains (e.g., structural mechanics, chemical engineering), a DSE seeks to address all of the computational requirements within a single domain. Their relationship is an orthogonal one, as illustrated in Figure 6.
FIGURE 6. Problem-solving environments provide comprehensive support for a single computational service that is applicable across many domains. Domain-specific environments pick a single domain and may provide diverse computational support within it.
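The orthogonal relationship in Figure 6 can be pictured as a grid of (domain, service) cells: a PSE occupies one column, a DSE occupies one row, and the two meet in exactly one cell. The domains and services below are examples mentioned in this chapter, not an exhaustive list, and the function names are hypothetical.

```python
DOMAINS = ["structural mechanics", "chemical engineering", "environmental modeling"]
SERVICES = ["pde solving", "data management", "visualization",
            "performance monitoring"]


def pse(service: str) -> set[tuple[str, str]]:
    """A PSE fixes one service and spans every domain: one column of the grid."""
    return {(domain, service) for domain in DOMAINS}


def dse(domain: str) -> set[tuple[str, str]]:
    """A DSE fixes one domain and spans every service: one row of the grid."""
    return {(domain, service) for service in SERVICES}


print(pse("pde solving") & dse("environmental modeling"))
# {('environmental modeling', 'pde solving')}
```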
The main problem in applying PSEs, as originally defined by Gallopoulos et al. [GHR94], to large computational science problems, particularly those designated as Grand Challenges, is that their methods and solution techniques may not yet be well understood or standardized. Similarly, experimental applications typically consist of several component phases that may not be well integrated. For example, an environment to support seismic tomography [CDHH96] consists of seven distinct steps, each with its own domain-specific requirements. Recently, however, the term "problem-solving environment" has been applied more generally to indicate a well-integrated, comprehensive computational environment. In particular, the ASCI program calls for the development of problem-solving environments that support code development, application execution, and results analysis in a "unified computing and information environment" [DOE96]. This vision of PSEs transcends the original and bears a greater resemblance to domain-specific environments than to traditional problem-solving environments.
Terminology aside, we consider the creation of domain-specific environments, as characterized below, to be the most relevant example of domain-specific high performance computing support. The GEMS system [BRRM95] seeks broad, comprehensive high performance computing support for the Grand Challenge area of environmental modeling. GEMS is clearly not a problem-solving environment in the original sense of the term; it does not provide a specific computational capability applicable to many domains. Rather, it provides a comprehensive set of capabilities targeted to the domain of environmental modeling, including data management, analysis, visualization, and performance monitoring. The GEMS system was developed through an intense collaboration between software developers and domain experts. The development team concluded that "to produce high-quality software the organizational approach must encourage coordination between the developers [computer scientists] and clients [application scientists] throughout the entire process" [BRRM95]. The work by Cuny et al. [CDHH96] has created a similar environment for seismic tomography through a similar collaborative process. Their work focuses on addressing the evolving requirements of experimental scientists as well as building systems in the face of ever-changing computational technology. They conclude that to build DSEs, researchers need to "develop methods, tools, and infrastructure that utilize capabilities of programmability, extensibility, and interoperability" [CDHH96]. These characteristics are central to the unique role that domain-specific environments play in supporting experimental science. In particular, as scientists develop new ideas about how to solve their problems, new requirements for the DSE arise. Later, we will show in more detail how programmability, extensibility, and interoperability are central to addressing this need.
Domain-specific environments provide the best technology for achieving domain-specificity in the functionality of domain-specific metacomputing environments for computational science. DSEs make two major contributions in this regard. First, they advocate an intense collaboration between computer and application scientists so that the resulting software systems not only address the scientists' needs but do so in a meaningful and useful way. Second, DSEs reveal promising ideas and methods for integrating high performance heterogeneous computing into the experimental process. While this and other sections focus primarily on how problem-solving environments relate to domain-specific environments, we in no way intend to neglect the independent successes of problem-solving environments. Their comprehensive environments, transparent access to high performance computing, and attention to good software engineering practices address aspects of each of the requirements previously identified. The topic of PSEs is raised in the remaining chapters where relevant.
In conclusion, when the three technologies of metacomputing, software architecture, and domain-specific environments are examined collectively, they fully address the requirements of high performance heterogeneous computing, software design and development, and domain-specificity. Furthermore, the unique, overlapping manner in which each pair of technologies fully addresses a particular requirement suggests the completeness of these requirements and technologies for domain-specific metacomputing for computational science and positions the area as one with many research challenges.
The focus of this chapter has been on establishing a relationship between the requirements and technologies for domain-specific metacomputing for computational science. While this paper is primarily concerned with the motivation, characterization, and foundations of this area, it also provides a "case study," if you will, of the more general problem, in science and in society, of how technology is understood, applied, integrated, and used.
As we have seen in this chapter, it is often the case that a given technology does not fully address a single requirement. Furthermore, a single technology may have varying degrees of relevance to multiple requirements. Finally, it may take the combination of multiple technologies to address a single requirement. In the case of computational science metacomputing, the collection of technologies and requirements forms an interlocking cycle. In general, one cannot expect such a neat pattern to arise, and identifying such patterns, as in this case, may take considerable thought and organization as well as a deep understanding of the requirements and technologies involved.
The next three chapters describe the technologies of metacomputing, software architecture, and domain-specific environments in more detail. Chapter 3 explores the foundations of metacomputing in parallel and distributed computing. Then, supporting research in the areas of software engineering and computational science is described in Chapters 4 and 5, respectively. We conclude in the final chapter by synthesizing domain-specific metacomputing for computational science as a research problem and speculating on how it might be pursued.