Hackstadt's Oral Comprehensive Exam Position Paper: Chapter 4

Domain-Specific Metacomputing for Computational Science:
Achieving Specificity Through Abstraction
By Steven Hackstadt

CHAPTER 4
Supportive Research In Software Engineering

Certain research topics in software engineering have particular relevance to domain-specific metacomputing. The goal of this chapter is not a comprehensive review of software engineering, which would be a major undertaking by itself, but rather to identify topics of research that may contribute to the area of domain-specific metacomputing for computational science. Earlier chapters provided some motivations as to how and why software engineering may contribute; we begin by reviewing those. This is followed by a brief look at how software engineering is currently applied in the parallel and distributed computing community. The last two sections focus on the software design and development problem from two perspectives: user and developer. For the user (i.e., the application scientist), we explore how object-orientation facilitates application development. For the developer (i.e., the builder of a metacomputing system), we discuss how software architectures, and domain-specific software architectures in particular, provide guidance for constructing more useful systems for application scientists.

The Software Crisis in Parallel Computing

The "software crisis" of the 1960s led to the first use of the term software engineering. Since then, and despite numerous advances in approaches, tools, and education, that crisis is still with us today ([Somm89], p. 3). Today's software crisis challenges the software engineer to produce high quality software in a cost-effective manner. Sommerville proposes four criteria for high quality software ([Somm89], p. 4):

Maintainability: Software changes should not result in undue costs.
Reliability: Software that fails or produces faulty results is not useful.
Efficiency: Software should not waste system resources.
Usability: Software user interfaces should accommodate their intended users.

The extent to which each criterion is individually addressed has a direct impact on the cost of software. But these criteria are interrelated, which makes minimizing overall cost and maximizing individual criteria difficult. For example, incorporating a better user interface may reduce efficiency. Subsequently attempting to improve efficiency, though, may make the software more difficult to change, thus decreasing maintainability. In general, maintainability, reliability, efficiency, usability, and other similar nonfunctional software requirements represent trade-offs for software engineers. It is achieving a balance among such criteria that makes producing high-quality software so challenging.

Software engineering is composed of many subdisciplines. Ramamoorthy and Tsai [RT96] identify seven of them: development process, requirements engineering, design, programming languages, testing, reliability and safety, and maintenance. Of these subdisciplines, four are of particular relevance to this chapter:

Development Process. As applications grow larger and their domains become more complex, the software development process must keep pace. To this end, object-oriented techniques enable modularity, abstraction, reuse, and programming-in-the-large. Software development traditionally ignores the end users' views. More effective software systems combine a developer's technical knowledge with end users' domain knowledge.
Requirements Engineering. End user needs and knowledge are recorded so they may be used during system development. Software prototypes and simulations help users and developers to understand development problems and possible solutions.
Design. Object orientation plays a key role in the area of design. Object-oriented systems are loosely coupled but highly cohesive, with design activities carried out at two levels: the high level focuses on the system architecture and decomposition, while the low level focuses on algorithm, data, and code design.
Programming Languages. The evolution from low-level machine code and assembly languages to high-level languages like Fortran, C, and object-oriented languages is characterized by increases in level of abstraction and modularity. These characteristics facilitate development, improve maintainability, and reduce costs.

Chapter 2 suggests several ways in which software engineering could help realize the goals of domain-specific metacomputing. In building "big" metacomputing software environments, like those described in Chapter 3, it is often desirable to reuse existing software (e.g., communication packages, numerical libraries) and simultaneously support a high degree of flexibility and extensibility so that unpredicted applications of metacomputing technology can be supported in the future. Conversely, the potentially diverse resources and capabilities of metacomputing environments may greatly complicate the creation of the applications which ultimately use them. Thus, from the area of software engineering, domain-specific metacomputing may potentially benefit from research in software design and development, software reuse, and interoperability.

However, in the parallel and distributed computing community, techniques from software engineering have not been widely embraced. Application and tool development has been carried out in a largely ad hoc manner, resulting in what some consider to be yet another crisis [Panc91, Panc94]. Chandy identifies four unique aspects of parallel software that have contributed to this state of affairs [Chan94].

Rate of Change. The relatively rapid emergence of parallel computing demands that a large number of parallel applications be developed in a relatively short amount of time. (Software costs)
Correctness. Nondeterminacy and multiple threads of control make formal reasoning and debugging difficult. (Reliability)
Performance. Architectural diversity and the range of factors that can affect performance (e.g., communication latency, data distribution) complicate optimization and efficiency. (Efficiency)
Absence of Standards. Inconsistent and inadequate tools hinder productivity, and the lack of hardware standards and the scarcity of software standards limit the development of CASE tools. (Maintainability)

Parallel computing's rapid rate of change results in high software costs and makes software development in this area a risky endeavor. Chandy's other observations are largely consistent with Sommerville's criteria for well-engineered software ([Somm89], p. 4). In particular, correctness has direct bearing on the reliability of software; performance and efficiency are inextricably linked; and an absence of standards certainly complicates maintainability. Usability, the fourth of Sommerville's criteria, is, unfortunately, often an afterthought in light of the other daunting challenges faced by parallel software developers [Panc91]. Each of these challenges ultimately affects the productivity of the parallel tool or application developer. Yet, to date, just two relatively simple forms of "software engineering" have been applied.

Primitive Forms of Software Engineering

The most common forms of "software engineering" employed within the parallel and distributed computing community are code scavenging and software libraries. Both of these techniques facilitate the reuse of programs and code, though at a relatively low level [TTC95].

Code scavenging occurs when pieces of source code are copied from one application for use in another. While common, code scavenging is less a software engineering "technique" and more a natural by-product of informal code development. A software library, however, is a compiled collection of specific procedures and data structures that supports some particular functionality and is accessible through a procedural interface.

Software libraries can support a wide range of functionality. One of the most prevalent types, the mathematical software library, has been around since the 1960s [Rice96]. Examples include LINPACK, LAPACK and ScaLAPACK [DW95] for linear algebra; ODEPACK [Hind83ode] for differential equations; and FFTPACK [Swar82] for performing fast Fourier transforms. Software libraries for specific scientific areas also exist. For example, CFDLIB solves a wide range of computational fluid dynamics problems [John96]. These types of libraries most commonly support the application developer by providing pre-packaged, generic, and widely applicable functionality. However, the tool developer also benefits from libraries that support tasks like constructing and managing user interfaces (e.g., XForms [ZO97]), sharing and communicating data with other processes (e.g., Nexus [FKT96]), and interacting with databases (e.g., POSTGRES [YC95]).

But, as Chandy [Chan94] describes, software libraries have several limitations with respect to parallel computing. First, libraries are language- and architecture-specific. It is not possible to develop libraries for every combination of parallel programming language and parallel architecture. Second, software libraries for parallel computing carry a risk of commitment because so few standards exist. The chance that a given parallel language and architecture may not be supported in the future is much greater than in sequential computing. Finally, composition and data mapping complicate the creation of parallel software libraries. In sequential programming, a single data space and a single type of functional composition simplify the issues of composition and data mapping. But in parallel programming, library procedures may have certain expectations for data layout (e.g., with respect to the distribution among processors) and yet also be expected to cooperate in parallel execution.

In addition, other researchers [TBSS93] expose an inherent limitation on the scalability of software libraries, pointing to a phenomenon called "feature combinatorics" as the primary cause. As software libraries achieve broader use, a broader range of features must be supported. "The implementor of the component library must laboriously enumerate every permutation of feature selections" or possibly fail to meet the needs of a particular application [TBSS93].
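As a rough illustration of why feature combinatorics limits scalability, the sketch below counts the variants a component library would have to supply if every feature could be selected independently. The feature axes and option names are hypothetical and are not taken from [TBSS93].

```python
from itertools import product

# Hypothetical feature axes for a container library (illustrative names only).
FEATURES = {
    "storage": ["array", "linked_list", "tree"],
    "ordering": ["unordered", "sorted"],
    "concurrency": ["sequential", "locked"],
    "persistence": ["in_memory", "on_disk"],
}


def enumerate_variants(features):
    """Return every combination that picks one option per feature."""
    names = list(features)
    return [dict(zip(names, choice)) for choice in product(*features.values())]


def variant_count(features):
    """Number of variants: the product of the option counts per feature."""
    count = 1
    for options in features.values():
        count *= len(options)
    return count
```

With these four axes the implementor would need 3 x 2 x 2 x 2 = 24 hand-written components, and each additional axis multiplies that number again.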

Finally, Rice [Rice96] comments on the general usability of software libraries:

Although the software library provides some form of abstraction and a facility for reusing software parts, it still requires a level of expertise beyond the background and skills of the average scientist and engineer....

Thus, for parallel and distributed computing (and ultimately, metacomputing), software libraries pose difficulties for both developers and potential users. This suggests that software libraries, as a mechanism for software reuse, may not be adequate. The remainder of this chapter presents additional software engineering techniques that may benefit software design and development of, and within, metacomputing environments.

We do not intend to single out a "best" solution, but rather we seek to identify the potential advantages and disadvantages of each approach, recognizing that for a given situation, the best approach depends upon a number of criteria, including the application, the programmer, project goals, and functional and nonfunctional requirements.

Object Orientation

Proponents of an object-oriented approach to software development claim many advantages, including data abstraction, reuse, extensibility, and flexibility ([CABD94], p. 7). Data abstraction occurs since the implementation of an object is hidden behind the interface (methods) through which other objects access functionality. Reusability is facilitated because objects encapsulate both methods and data structures into a single entity that can more easily be applied in different contexts. Extensibility is supported through object inheritance and sub-classing.
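The three properties named above can be seen in a very small class hierarchy. The following is a minimal sketch with hypothetical class names (Grid, DenseGrid, SparseGrid), not code from any system discussed in this chapter.

```python
from abc import ABC, abstractmethod


class Grid(ABC):
    """Data abstraction: callers see only these methods, never the storage."""

    @abstractmethod
    def get(self, i): ...

    @abstractmethod
    def set(self, i, value): ...

    @abstractmethod
    def size(self): ...

    def total(self):
        # Reuse: written once against the interface, inherited by every subclass.
        return sum(self.get(i) for i in range(self.size()))


class DenseGrid(Grid):
    def __init__(self, n):
        self._cells = [0.0] * n  # hidden representation

    def get(self, i):
        return self._cells[i]

    def set(self, i, value):
        self._cells[i] = value

    def size(self):
        return len(self._cells)


class SparseGrid(Grid):
    """Extensibility: a new storage scheme added by subclassing."""

    def __init__(self, n):
        self._n = n
        self._nonzero = {}  # only non-default cells are stored

    def get(self, i):
        if not 0 <= i < self._n:
            raise IndexError(i)
        return self._nonzero.get(i, 0.0)

    def set(self, i, value):
        if not 0 <= i < self._n:
            raise IndexError(i)
        self._nonzero[i] = value

    def size(self):
        return self._n
```

Code that calls total() works unchanged with either subclass, which is the sense in which the interface hides the implementation.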

While the potential benefits are numerous, object technology has not been as widely adopted as expected [Panc95]. Certainly, part of the reason for this has been the relatively steep learning curve incurred by programmers switching from other programming paradigms [Panc95, FT95]. Other factors include the requirements of new tools, languages, metrics, and software development processes [FT95]. Norton et al. [NSD95] summarize the apprehension among scientific programmers:

Although valuable progress continues, until these methods become commonplace, as demonstrated by supercomputer manufacturer support and standards committees, most developers may remain apprehensive about adopting new languages. Thus, the future of scientific programming will depend on establishing standards and recognizing educational trends in software design.

From this standpoint, we still consider object-oriented computing a "technology" that may potentially benefit metacomputing for computational science. But we must note that, compared to the other technologies examined in this chapter, object-oriented computing is more advanced, more thoroughly researched, and more widely applied.

Attempts to apply object-oriented technology within the parallel and distributed computing communities have followed three primary techniques. The most basic approach is to use an object-oriented language (typically C++) combined with a standard message-passing library (such as MPI). In this case, the application developer must manage parallelism explicitly. Among those attempts to support more general, portable, and automatic object-oriented parallel computing (i.e., object-oriented parallel languages), a common theme has emerged: parallelism is supported through specialized class hierarchies and/or language extensions that interface with complex runtime systems supporting task and/or data parallelism. Norton et al. [NSD95] identify several examples: ACT++, C**, Charm++, Compositional C++, Concert, Concurrent Aggregates, Concurrent C++, COOL, DC++, DCE++, HPC++, Mentat, Parallel C++, pC++, POOL-T, and POOMA. In this case, the application developer adopts one of these language systems (and the abstractions it supports) to implement their application. Finally, higher-level programming and problem-solving environments built on an object-oriented foundation support a range of features to assist scientists and/or application developers. The following sections briefly describe examples of each approach to integrating object-oriented technology with parallel computing.

Object-Oriented Languages And Message Passing Libraries

The most rudimentary means of parallel object-oriented computing is to use an object-oriented language such as C++ and a message passing library. An example of this technique is the work by Norton et al. [NSD95]. They describe their experience porting a Fortran 77 plasma particle in cell (PIC) simulation code to C++ and Fortran 90 (which also supports some object-oriented concepts). They seek a high-performance, cross-platform solution capable of executing on Intel Paragon, IBM SP1/SP2, and Cray T3D distributed memory parallel computers, while simultaneously taking advantage of the improved design, development, and maintenance characteristics of the object-oriented paradigm.

Two requirements are immediately evident. First, the target parallel architecture must have a C++ compiler available. Second, a message passing library such as PVM or MPI that is compatible with the compiler must also exist for the target architecture. Once these basic requirements have been met, it is up to the programmer to carry out the design and implementation of the application. This includes designing and implementing the required class hierarchy (usually from scratch), managing decomposition and concurrency, and dynamically balancing processor loads at runtime (if necessary). Of course, with respect to metacomputing, all of the shortcomings of basic message-passing libraries (e.g., PVM) as discussed in Chapter 3 also apply. In addition, though, this low-level approach to parallel object-oriented computing has other limitations.
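To give a sense of what "managing decomposition" by hand involves, here is a minimal sketch of a block decomposition. It simulates the processors inside one process instead of calling a real message-passing library; the function names are hypothetical.

```python
def block_range(n_items, n_procs, rank):
    """Half-open index range [start, stop) owned by `rank`.

    Items are split into contiguous blocks; the first (n_items % n_procs)
    ranks each receive one extra item.
    """
    base, extra = divmod(n_items, n_procs)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def local_sum(data, n_procs, rank):
    """Work done by one simulated processor on its own block."""
    start, stop = block_range(len(data), n_procs, rank)
    return sum(data[start:stop])


def simulated_global_sum(data, n_procs):
    """Stand-in for a reduction.

    In a real code each partial sum would live on a different processor
    and be combined through a message-passing call.
    """
    return sum(local_sum(data, n_procs, rank) for rank in range(n_procs))
```

Even in this toy form, the index arithmetic is unrelated to the science being computed, yet the application programmer is responsible for getting it right.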

For example, a main reason for adopting the object-oriented paradigm is reuse. But as Pancake [Panc95] notes, it is rarely known what functionality may actually be amenable to reuse:

Typically, it is only after an [object-oriented] application is complete that the developers understand which objects might have broader use. Those objects then must be restructured through a process known as generalization, that in turn may require revisiting several earlier stages of object design....

Norton et al. [NSD95] confirm this through the refinement of their particle simulation class hierarchy:

Unfortunately, hierarchies are nearly impossible to design correctly on the first attempt. Moreover, when the design is poorly organized, it is difficult to modify it without triggering something close to a complete redesign.

While they claim to have reused much of their code during the refinement process, they recognize that "if the new class hierarchy cannot be defined with clean interfaces, the best approach is to redesign it from the beginning" [NSD95].

In addition to limiting reuse, a low-level parallel object-oriented approach also limits code portability, especially where machine-specific message passing libraries and compilers are involved. Norton et al. [NSD95] use different communication libraries and compilers for each architecture.¹ This resulted in numerous problems, including five months of lost development time from compiler inconsistencies alone.

But perhaps the largest barrier preventing widespread adoption of C++ (and other object-oriented languages) in the parallel computing community is performance. Norton et al. [NSD95] report C++ execution times that are 53-110% longer than those of same-sized problems implemented in Fortran 77, depending on the architecture and message-passing library used. Even for their sequential computations, C++ is about twice as slow as Fortran 77. This loss of performance is partly due to memory overhead and data access costs. Also, whereas Fortran allows arrays to be passed directly as message-passing parameters, C++ requires the use of intermediate buffers [NSD95]. In general, object-oriented computing has overheads associated with it that decrease performance, or at least make optimization more difficult.

Similarly, efforts to develop efficient computational kernels and components for one compiler/library/machine combination do not necessarily translate into efficient execution on other platforms. For example, Norton et al. [NSD95] observe that "designing efficient and portable C++ code is difficult due to differences in compiler implementations." In addition, they designed and built a separate class that supported a virtual parallel machine. This class encapsulated all the machine- and library-specific details needed for parallel execution. While clearly a good example of how the object-oriented paradigm can be used to enhance portability, the reusability and generality of that class are subject to the problems previously described. Such a class could be applied to a broad range of other applications, but to do so might require significant modification to the original implementation.
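The idea of a class that hides machine- and library-specific details can be sketched as follows. This is not the class built by Norton et al.; the interface, the in-process backend, and the ring_shift routine are all hypothetical and only illustrate the encapsulation.

```python
from abc import ABC, abstractmethod
from collections import deque


class VirtualMachine(ABC):
    """Hypothetical interface that hides machine and library details."""

    @abstractmethod
    def n_procs(self): ...

    @abstractmethod
    def send(self, src, dest, payload): ...

    @abstractmethod
    def receive(self, dest): ...


class InProcessMachine(VirtualMachine):
    """Toy backend: one FIFO mailbox per simulated processor."""

    def __init__(self, n):
        self._mailboxes = [deque() for _ in range(n)]

    def n_procs(self):
        return len(self._mailboxes)

    def send(self, src, dest, payload):
        self._mailboxes[dest].append((src, payload))

    def receive(self, dest):
        # Returns the oldest (source, payload) pair waiting for `dest`.
        return self._mailboxes[dest].popleft()


def ring_shift(machine, values):
    """Application code: every processor passes its value to the next one.

    Written only against VirtualMachine, so a different backend could be
    substituted without touching this function.
    """
    n = machine.n_procs()
    for rank in range(n):
        machine.send(rank, (rank + 1) % n, values[rank])
    return [machine.receive(rank)[1] for rank in range(n)]
```

Porting to another platform would then mean writing one more subclass of VirtualMachine. Whether that interface is general enough for other applications is exactly the reuse question raised above.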

Norton et al. also identify the major design improvements over Fortran 77 (and similar languages). An object-oriented approach allows program classes to correspond directly to the "physical and computational constructs" of the simulation [NSD95]. They also note that in comparison to Fortran 77, object-orientation "provides a programming perspective that reflects the problem domain" [NSD95]. Thus, object-orientation holds particular promise as a means of facilitating domain-specificity. We explore this topic in more detail later.

Object-oriented languages provide a good example of the trade-offs associated with nonfunctional requirements. In terms of Sommerville's criteria for well-engineered software ([Somm89], p. 4), object-oriented languages facilitate maintainability as well as some degree of reusability, but efficiency (or more specifically, performance) suffers. In the context of parallel scientific computing, reliability and performance are typically the most critical nonfunctional requirements. Does this mean object-oriented languages have no place in computational science? Certainly not. What is needed are more compelling reasons for using object-oriented languages. That is, the nonfunctional benefits of object-oriented languages must be brought to bear on the open challenges in domain-specific metacomputing for computational science.

Parallel Object-Oriented Class Hierarchies

An eventual outgrowth of the approach taken by Norton et al. [NSD95] is a general framework within which different types of parallel particle simulation codes may be deployed. Indeed, the development of parallel object-oriented class hierarchies can result in improved reuse. Furthermore, hierarchies focused on a particular domain may provide an effective means for achieving domain-specificity. As we have shown, however, the creation of such a framework requires careful consideration in the design and implementation of the constituent classes. Thus, for the developer, all of the challenges of the lower-level approach remain since essentially the same task is being carried out. But the application developer who can adopt such a framework gets all the advantages of the end product without incurring the cost of developing it. The disadvantages to the end-user are (1) a possible lack of familiarity with the hierarchy's implementation which could hinder major changes, and (2) encountering the situation where the framework provides most of the desired functionality, but is missing one or more application-critical features or components, thus preventing adoption of the framework as a whole. A framework designed in conjunction with application scientists could avoid these problems. Essentially, this approach decouples the design and development of a class hierarchy from its actual use.

Parallel Object-Oriented Methods and Applications. The Parallel Object-Oriented Methods and Applications (POOMA) Framework attempts to support a flexible environment for scientific programs that can exploit data-parallel semantics [RHCA97]. In fact, POOMA was originally conceived to support the same domain (i.e., particle in cell simulations) as the work by Norton et al. [NSD95]. Not surprisingly, there are major similarities in the object-oriented abstractions (appropriate to PIC simulations) each system supports. But whereas Norton et al. construct a class hierarchy as a side-effect of porting an application from Fortran 77, POOMA seeks first to construct a comprehensive class hierarchy that can subsequently be used by a range of applications in the given domain.

POOMA is a comprehensive C++ class hierarchy for the data-parallel development of classes of scientific applications. POOMA actually targets several related domains, including plasma physics, molecular dynamics, computational fluid dynamics, rheological flow, vortex simulations, porous media, medical imaging, and material science [ABCH95, RHCA97]. This is accomplished through a five-layer class hierarchy and a variety of data types appropriate to the application areas. In addition, POOMA provides a portable, high-performance, serial/parallel programming model.

The POOMA system provides an example of a framework, which is discussed in more detail in Chapter . Reynders et al. [RHCA97] explain the positive implications of this approach:

Computer scientists and algorithm specialists can focus on the lower realms of the FrameWork, optimizing computational kernels and message-passing techniques without having to consider the application being constructed. Meanwhile, application scientists can construct numerical models with objects in the upper leaves of the FrameWork, without knowing their implementation details.

This separation is possible because of POOMA's object-orientation. Parallelism and application science are encapsulated within different objects, helping to prevent the interlacing of message-passing commands and computational algorithms within the application code [RHCA97]. Not only does this improve the understandability of application code, it keeps orthogonal aspects of the hierarchy code separate from one another. This, in turn, improves the generality and extensibility of the hierarchy.

This separation manifests itself in the five layers of the class hierarchy. At the highest level, the Application Layer represents "abstractions directly relevant to application domains" [RHCA97]. The Components Layer contains the building blocks from which the Application Layer is constructed, including solvers, FFTs, and particle operations. This layer represents reusable components that can be used to compose applications [ABCH95]. Next, the Global Layer defines the abstract data types (fields, particles, matrices, etc.) that are used by the Component and Application Layers to create domain-specific structures. The Parallel Abstract Layer implements the key abstractions of parallel simulation and programming, such as data layout, communication, and load balancing. Finally, the Local Layer contains node-local instances of Global Layer data structures.
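The layering can be made concrete with a toy example. The sketch below is loosely modeled on the five layers described above; none of the class or function names are POOMA's, and the "nodes" are simulated within a single process.

```python
# Local Layer: the node-local piece of a global data structure.
class LocalField:
    def __init__(self, values):
        self.values = list(values)


# Parallel Abstraction Layer: decides how global indices map to nodes.
class BlockLayout:
    def __init__(self, n_items, n_nodes):
        self.n_items = n_items
        self.n_nodes = n_nodes

    def bounds(self, node):
        base, extra = divmod(self.n_items, self.n_nodes)
        start = node * base + min(node, extra)
        return start, start + base + (1 if node < extra else 0)


# Global Layer: the abstract data type seen by the layers above.
class Field:
    def __init__(self, values, layout):
        self.layout = layout
        self.locals = [LocalField(values[slice(*layout.bounds(k))])
                       for k in range(layout.n_nodes)]

    def map(self, fn):
        # Apply fn element-wise on every node-local piece.
        for piece in self.locals:
            piece.values = [fn(v) for v in piece.values]
        return self

    def gather(self):
        return [v for piece in self.locals for v in piece.values]


# Components Layer: a reusable building block.
def scale(field, factor):
    return field.map(lambda v: v * factor)


# Application Layer: a domain-level operation composed from components.
def halve_temperatures(temperatures, n_nodes=2):
    field = Field(temperatures, BlockLayout(len(temperatures), n_nodes))
    return scale(field, 0.5).gather()
```

The application-level function never mentions nodes or index ranges; changing the layout class would alter the distribution without changing the code above it.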

POOMA is a language target for the Accelerated Strategic Computing Initiative (ASCI) described earlier [DOE96]. Thus, its operation in larger-scale, heterogeneous environments is of concern. Reynders et al. [RHCA97] indicate that they are investigating how POOMA might support more coarse-grained, task-parallelism. POOMA already provides abstract representations for key metacomputing requirements like load balancing, data distribution, and communication. It remains to be seen whether the POOMA researchers will expand these capabilities to provide a comprehensive metacomputing programming environment, or whether they will make attempts to interface with external objects, tools, and resource managers like Globus and Legion and have POOMA operate as a component within a larger environment.

In summary, object-oriented class hierarchies still require substantial programming efforts to create real applications. The reusable abstractions are available, but the science (in the form of algorithms) still has to be expressed. On the other hand, perhaps this approach strikes an effective balance between allowing a high degree of software reuse and retaining full programmability of the application.

Object-Oriented Systems For Parallel Computation

We describe an object-oriented system for parallel computation as a software system built with object-oriented technology and which supports or facilitates higher-level access to parallel computing capabilities. The object-oriented nature of the system's implementation may be present in varying degrees in the user's experience with the system, and the level of access to parallel computing capabilities is assumed to be something higher than that required when using a predefined class hierarchy. We describe only one example in this section, but other systems also fit this description. For example, the problem-solving environment PDELab [WHRC94] (Chapter 5) and the program archetypes concept [Chan94] are other examples of object-oriented systems for parallel computation.

Parallel Object-Oriented Environment and Toolkit. Armstrong and Macfarlane [AM94] describe the Parallel Object-Oriented Environment and Toolkit (POET). POET has several striking similarities to POOMA [ABCH95, RHCA97]: the researchers describe it as a framework; it is implemented in C++; it provides domain-specific abstractions useful for a class of problems; and parallelism and data movement are encapsulated within objects. Indeed, at first glance, POET appears to be yet another parallel object-oriented class hierarchy.

But Armstrong and Macfarlane are careful to distinguish their work in this regard. For example, unlike most who use the term, they define their use of the term framework: "an object-oriented style of programming where a pre-existing environment provides a top-level object or objects within which all others are nested" [AM94]. While we regard the term much more generally (e.g., we do not limit it to an object-oriented style of programming), this definition notably excludes C++ (and hence, POOMA) as a framework because "the user must provide a main program that is itself not an object" [AM94]. Even though it is written in C++, the POET toolkit provides a top-level object from which the user "will instantiate, modify, and 'frame' together objects provided by the toolkit to create a numerical application" [AM94].

In addition, whereas class hierarchies act as components within a larger calculation or simulation, POET provides a top level (application structure) and a bottom level (parallelism, data movement, and communication) between which the user code is inserted. "Put succinctly: a class library is designed to be driven by user code while a frame-based system [like POET] is designed to drive the user code" [AM94]. Thus, similar to what might be done when using a user interface construction kit, the programmer/scientist fills in "stubs" in template objects [AM94]. POET provides the main control structure (like the user interface event loop that handles keyboard and mouse events) and also manages low-level details such as communication and load-balancing.

POET "frames" are essentially domain-specific abstractions that capture "the basic communication linkages that are necessary to implement a parallel version of particular scientific problem classes..." [AM94].

A specific framed object represents a complete algorithm. The stubs within an object can be replaced by user-defined, science-specific objects or methods. (In addition to C++, POET supports callbacks to C or Fortran routines.) Then, POET uses such objects as it solves the problem. Parallelism and other artifacts of the computational environment are encapsulated within the lower levels of the framework.
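The inversion of control described here, where the frame drives the user code, can be sketched in a few lines. The frame below is hypothetical and far simpler than anything in POET: it owns the iteration loop and a simulated work distribution, and the user fills in two stubs.

```python
class RelaxationFrame:
    """Hypothetical frame: owns the control loop and the work distribution.

    The user supplies only the science-specific stubs `update` and
    `converged`.
    """

    def __init__(self, n_workers=2, max_steps=100):
        self.n_workers = n_workers
        self.max_steps = max_steps

    # Stubs to be replaced by user code.
    def update(self, value):
        raise NotImplementedError

    def converged(self, old, new):
        raise NotImplementedError

    def run(self, data):
        """Top level, provided by the frame: it drives the user code."""
        for step in range(1, self.max_steps + 1):
            # Bottom level, also provided by the frame: deal the items out
            # to workers (simulated sequentially here) and collect results.
            chunks = [data[w::self.n_workers] for w in range(self.n_workers)]
            updated = [[self.update(v) for v in chunk] for chunk in chunks]
            new = list(data)
            for w, chunk in enumerate(updated):
                new[w::self.n_workers] = chunk
            if self.converged(data, new):
                return new, step
            data = new
        return data, self.max_steps


class HalveTowardZero(RelaxationFrame):
    """User code: only the problem-specific pieces are written."""

    def update(self, value):
        return value / 2

    def converged(self, old, new):
        return max(abs(v) for v in new) < 1.0
```

The user class contains no loop and no distribution logic. Calling HalveTowardZero().run([8.0, 4.0, 2.0]) hands control to the frame, which calls back into the two stubs until the convergence test passes.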

POET is fundamentally a template-based approach, except that it supports multiple languages [AM94]. Its object-oriented basis provides the encapsulation of high-level program structure and low-level implementation details, and also allows user code to be inserted in between. Creation of domain-specific POET frames is currently done on an individual basis; the environment does not support this process. (Work in the area of domain-specific software architectures addresses this issue and is discussed later in this chapter.)

What initially appears to be a subtle change in the perspective of control (i.e., who provides the application control structure) is actually a significant paradigm shift that represents support for parallel computation at a higher level than class hierarchies. In this way, POET is similar to the HeNCE/PVM environment (discussed in Chapter 3), in which scientists expressed algorithms using a graph-based paradigm and only filled in code for an application's core procedures and functions. The program control structure (i.e., the top level) and the underlying calls to PVM (i.e., the low level) were generated automatically. The main problem with HeNCE was the potentially limited set of applications it could generate. With respect to providing multiple frames, POET is an improvement. Within a particular frame, however, POET is actually more restrictive than HeNCE. But that is the very point: by limiting programmers/scientists to providing only the algorithmic support unique to the science they are pursuing, POET can make them more productive for a class of problems. As Armstrong and Macfarlane state, "the frame-based approach is useful because it is restrictive" [AM94]. In other words, POET achieves a high degree of (domain-)specificity through (frame-based) abstraction.

Carried to an extreme, this approach may ultimately result in something very similar to a problem-solving environment (PSE), such as PDELab [WHRC94]. A problem-solving environment similarly provides the top-level structure and low-level implementation details for solving the problem, and may allow user-defined functionality to be inserted in between. In a more limited case, the user might only be allowed to specify values for parameterized objects or algorithms. PSEs go beyond a system like POET, though, in that they typically try to emulate the entire problem-solving process (e.g., from initial brainstorming to interpretation of results) [WHRC94]. As we have mentioned previously, though, PSEs transcend domain-specificity by focusing exclusively on a single problem. Nonetheless, the goal is still to facilitate high-level access to high performance computing for solving specific types of problems. As in the case of PDELab, these systems are often built on an object-oriented foundation which extends, to varying degrees, into the user's interaction with the system.

In summary, object-orientation is a major software engineering technology that can assist in supporting access to parallel and distributed computing capabilities. Furthermore, parallel object-oriented computing can take on a range of forms, from an object-oriented language combined with a message passing library to parallel class hierarchies to object-oriented systems that facilitate high-level access to parallel computing capabilities. Indeed, the Legion metacomputing project [GW96] recognizes the potential benefits of this approach as its researchers are constructing a completely object-oriented metacomputing environment. Object-orientation provides some fundamental mechanisms for achieving software reuse which, in turn, can result in increased productivity (by creating less work for the application scientist [MN96]). Furthermore, properties of the object-oriented paradigm facilitate the creation of domain-specific abstractions such as application data structures with clear relationships to the physical phenomena being modeled.

Software Architecture

In this section, we shift our attention from the application scientist's view of software development to the challenges faced by the developers of the systems used by application scientists. The application of object-orientation technology to parallel computing reveals an important theme in using high performance computing to solve computational science problems:

To increase scientists' productivity when trying to use high performance computing, they must be able to interact with the computing environment in a way, and at a level, that is meaningful to them.

For parallel class hierarchies and object-oriented systems for parallel computing, developers make efforts to tailor the systems to specific scientific or computational domains. We can observe the evolution of this trend in the systems previously described.

As we move from parallel class hierarchies (e.g., POOMA [RHCA97]), to programmatic object-oriented systems for parallel computing (e.g., POET [AM94]), to object-oriented problem-solving environments (e.g., PDELab [WHRC94]), increasing efforts to specialize and refine the end user's view of the system are made. In POOMA, for example, the focus is on creating meaningful objects and methods that the application scientist can use in developing code. POET extends this by providing computational templates which begin to limit the amount of code required from the user. Finally, PSEs like PDELab need virtually no code from the user and present an encompassing environment exclusively built to solve a single type of problem.

The construction of these systems demands an increasing amount of effort to determine, design, and implement the domain-specific concepts to be supported. Recent work in software engineering attempts to address this problem. The area of software architecture seeks to represent and reason about the structure and topology of software systems. Its motivations and goals were covered in Chapter 2. Our goal in this section is to explore how the area may apply to parallel and distributed computing as well as domain-specific metacomputing.

Software architecture is concerned with the variety of organizational styles for constructing software that have emerged over time. Systems like POOMA [ABCH95, RHCA97] and POET [AM94] clearly exhibit the style of an object-oriented system. We naturally use this and similar terms throughout our discussion of object-orientation. Indeed, as we argued earlier, software architecture has evolved from intuition, diagrams, and informal prose-a sort of folklore, if you will, that has been built up over several years. A similar term, layered system, is also prevalent in parallel and distributed systems, particularly in the context of networked communications ([SG96], p. 25). A layered system is organized in a hierarchical manner such that each layer provides services to the one above. Despite it being built with object-oriented methods, the PDELab problem-solving environment is primarily built as a "layered architecture" [WHRC94]. Hence, programming style does not necessarily imply architectural style. That POOMA and POET exhibit an object-oriented style results from the manner in which the systems are constructed, not because of the languages in which they are implemented. (Though, in this case, an object-oriented language greatly facilitates the use of an object-oriented architectural style.)
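As a rough illustration of the layered style, and of the point that architectural style is separate from programming style, the following sketch uses objects to build a strictly layered system. The three layers and their names are assumptions made for the example; they are not taken from PDELab or any other system discussed here.

```python
class TransportLayer:
    """Lowest layer (hypothetical): moves raw bytes."""

    def __init__(self) -> None:
        self.wire = []  # stands in for a network connection

    def send_bytes(self, payload: bytes) -> None:
        self.wire.append(payload)


class MessagingLayer:
    """Middle layer: offers text messages, built only on the transport layer."""

    def __init__(self, transport: TransportLayer) -> None:
        self._transport = transport

    def send_message(self, text: str) -> None:
        self._transport.send_bytes(text.encode("utf-8"))


class SolverLayer:
    """Top layer: speaks in solver terms and sees only the messaging layer."""

    def __init__(self, messaging: MessagingLayer) -> None:
        self._messaging = messaging

    def report_residual(self, step: int, residual: float) -> None:
        self._messaging.send_message(f"step={step} residual={residual:.3e}")


transport = TransportLayer()
solver = SolverLayer(MessagingLayer(transport))
solver.report_residual(1, 0.0025)
```

Each class calls only the layer directly beneath it, so the organization is layered even though every piece is an object.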

The use of architectural styles is one technique for software reuse. If common software architectures are identified and formalized, then perhaps reusable, modular, style-specific components can be developed. But software engineering has not yet advanced to this point, and the large-scale reuse of generic software is an elusive goal [Chan94, GAO95, HPLM95, SG96, TTC95]. Garlan et al. [GAO95] suggest many possible reasons for this phenomenon. Part of the problem is clearly that software designed with reuse in mind is hard to find or simply does not exist. Even among so-called reusable software, low-level interoperability issues such as programming languages, operating systems, and machine platforms often prevent pieces from fitting together. But Garlan et al. [GAO95] point to a higher-level, more pervasive problem. Components often make assumptions about the structure of the applications in which they are to appear. These assumptions lead to architectural mismatches between components and the applications trying to use them.

The assumptions concern a variety of software construction issues [GAO95]. For example, components often make assumptions about what part of the software holds the main thread of control. If two components both demand it, an architectural mismatch occurs. Similarly, components make assumptions about the format of the data on which they operate. Other assumptions involve application protocols. For instance, if one component uses an event-based model for sharing messages and another one uses procedure calls, the application attempting to integrate these is forced to resolve the difference. The consequences of these assumptions include excessive code size, poor performance, the need to modify external packages, the need to reinvent existing functionality, and an error-prone construction process [GAO95].
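The protocol mismatch mentioned above can be made concrete with a small sketch. All names here (EventBus, smooth, adapt_procedure, the topic strings) are hypothetical; the example only shows the kind of glue code an integrator is forced to write when one component assumes events and another assumes procedure calls.

```python
from typing import Any, Callable, Dict, List


class EventBus:
    """Hypothetical event-based component: communicates by publish/subscribe."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable[[Any], None]]] = {}

    def subscribe(self, topic: str, handler: Callable[[Any], None]) -> None:
        self._handlers.setdefault(topic, []).append(handler)

    def publish(self, topic: str, payload: Any) -> None:
        for handler in self._handlers.get(topic, []):
            handler(payload)


def smooth(values: List[float]) -> List[float]:
    """Hypothetical procedure-call component: expects a direct call and returns a result."""
    return [(a + b) / 2 for a, b in zip(values, values[1:])]


def adapt_procedure(bus: EventBus, in_topic: str, out_topic: str,
                    procedure: Callable[[Any], Any]) -> None:
    """Glue code: wraps a procedure so it can take part in event-based messaging."""
    bus.subscribe(in_topic, lambda payload: bus.publish(out_topic, procedure(payload)))


bus = EventBus()
received: List[List[float]] = []
bus.subscribe("smoothed", received.append)
adapt_procedure(bus, "raw", "smoothed", smooth)
bus.publish("raw", [1.0, 3.0, 5.0])
```

The adapter is extra code that neither component needed on its own, a small instance of the costs that [GAO95] attribute to architectural mismatch.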

Domain-Specific Software Architectures

Indeed, despite promising research [AB96, GAO95, NM95, SDKR95], the general problem of software reuse may not be solved for some time. Furthermore, the use of software architectures based on generic software styles only addresses one aspect of the problem revealed by the application of object-oriented technology to parallel computing. That is, software styles do very little in the way of capturing domain-specific representations and integrating them into a software system.

To this end, researchers are exploring an alternative approach: domain-specific software architectures (DSSAs). The ultimate goal of DSSAs is the same as the other technologies discussed in this section: reusability. But DSSAs do not attempt to provide general software reuse. Rather, the goal of DSSAs is to facilitate software reuse within classes of applications. By agreeing on certain domain-specific software characteristics, the conflicting assumptions of software components can be eliminated, thus avoiding the architectural mismatches that inhibit reuse. Thus, focusing on particular domains has the effect of reducing the software development problem space, allowing effective software solutions to be found more easily.

However, despite promising DSSA efforts, so far computational science has not been a primary target. Current DSSA domains include avionics, command and control, and vehicle-management ([SG96], p. 32).² For example, Hayes-Roth et al. [HPLM95] describe a DSSA for adaptive intelligent systems (e.g., robotics, monitoring systems). They identify the three components of a DSSA: a reference architecture that describes the common framework of computation for the domain of interest; a component library of reusable "chunks of domain expertise"; and an application configuration method for choosing and assembling the components required to build a specific application. A full analysis of their system is beyond the scope of this paper. Rather, we focus on a distinguishing feature of DSSAs that is also pertinent to the needs described above: domain modeling.
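To make the three DSSA components named by Hayes-Roth et al. more tangible, the sketch below reduces each to its simplest form. It does not reproduce their system; the slot names, the library entries, and the configure function are all assumptions invented for illustration.

```python
from typing import Any, Callable, Dict

# Reference architecture (hypothetical): the ordered slots every application must fill.
REFERENCE_ARCHITECTURE = ("mesh", "solver", "output")

# Component library (hypothetical): reusable pieces, grouped by the slot they fit.
COMPONENT_LIBRARY: Dict[str, Dict[str, Callable[[Any], Any]]] = {
    "mesh": {"uniform": lambda n: [i / (n - 1) for i in range(n)]},
    "solver": {"square": lambda xs: [x * x for x in xs]},
    "output": {"maximum": max, "total": sum},
}


def configure(choices: Dict[str, str]) -> Callable[[Any], Any]:
    """Application configuration method: pick one component per slot and assemble them."""
    missing = [slot for slot in REFERENCE_ARCHITECTURE if slot not in choices]
    if missing:
        raise ValueError(f"no component chosen for: {missing}")
    stages = [COMPONENT_LIBRARY[slot][choices[slot]] for slot in REFERENCE_ARCHITECTURE]

    def application(value: Any) -> Any:
        # Run the chosen components in the order the reference architecture fixes.
        for stage in stages:
            value = stage(value)
        return value

    return application


app = configure({"mesh": "uniform", "solver": "square", "output": "maximum"})
peak = app(5)
```

Because every component is written against the same agreed slots, components within a slot can be exchanged without the conflicting assumptions that cause architectural mismatch.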

Domain Modeling

With respect to parallel computing and, more generally, metacomputing, we contend that the only means of improving access to these technologies is to deliver them in domain-specific concepts. We assume that application scientists are scientists first and programmers second. Thus, allowing them to program in familiar terms improves their ability to do science and reduces the struggle with high performance computer programming concepts.

To this end, research on the development of domain models offers methodologies for collecting, organizing, and applying domain-specific concepts and knowledge during the software construction process. A domain model serves several purposes. For example, domain models provide a standard terminology for describing problems within the domain, and they reveal useful abstractions and patterns of composition. Domain models also establish high-level constraints that software must satisfy [Bato94]. According to Might [Migh95], a domain model also indicates what "functions are being performed and what data, information, and entities are flowing among those functions." The concept of a domain is relative. Domains can vary in breadth, depth, level of abstraction, form, and representation. However, the single, common characteristic among all domain models is that they "represent the functions and flows (or behavior) in the domain of interest" [Migh95].

Taylor et al. [TTC95] propose five phases of domain engineering, that is, the creation (or evolution) of domain models and their associated software architectures. In the first phase, interviews with users (or domain experts) reveal the contents of the domain and the needs of users in that domain. The second phase attempts to produce a dictionary and thesaurus of domain-specific terminology and to distinguish between essential and optional features. Design and implementation constraints are developed in the third phase. Whereas the first two phases primarily answer questions about what the domain is, the third phase begins to explore how domain knowledge will manifest itself. The fourth phase addresses the design and analysis of the domain-specific software architecture through an iterative process that is applied at successively lower levels of abstraction until the desired detail is achieved. During the fifth and final phase, the DSSA is populated with reusable software components. From the preceding description, it is clear that developing a domain model and software architecture is a complex and time-consuming process. The DSSA for adaptive intelligent systems developed by Hayes-Roth et al. [HPLM95] reportedly took several person-years to design, implement, test, debug, and document. But Hayes-Roth et al. also note that there are only marginal additional costs for developing software that, in addition to meeting its functional objectives, is reusable. Furthermore, the additional cost must be weighed against the cost savings of subsequent reuse.

Domain-Specific Software Architectures for Computational Science Problems

As stated earlier, DSSA technology has rarely been applied to computational science problems. One of the distinguishing characteristics of these problems is their experimental nature. In many cases, this nature makes it impossible to form complete specifications of desired functionality, requirements, constraints, or problem description. In fact, this proves to be one of the limiting factors in successfully applying problem-solving environments to computational science problems. What are the prospects, then, of applying DSSA technology to these same problems? This section briefly speculates on the answer to that question.

First, domain-specific software architectures and problem-solving environments are fundamentally different. DSSAs primarily address software development concerns whereas PSEs address issues of software use and usability. The connection between the two is that by applying DSSAs to computational science, someday it may be possible to build software that can address software use and usability issues for this type of problem.

Second, by definition, DSSAs (unlike PSEs) do not result in "point solutions" [TTC95]. In fact, Might [Migh95] claims that a key function of domain models is "to systematically study the impact of alternative strategies and policies." Much of the experimentation in computational science involves just that-alternative strategies and policies for simulating, parameterizing, and analyzing computational models of physical phenomena. Even though Might targets software development supporting business-oriented processes (in the healthcare industry, in particular), his comments apply equally well to computational scientists in search of a scientific result, but lacking an obvious solution for attaining it.

Third, as we describe above, domains are relative. In particular, domains vary in depth and in level of detail and abstraction. A reasonable assumption is that the quality of a domain model has direct bearing on the quality of the resultant domain-specific software architecture, not to mention the quality of applications derived from that architecture. In many cases, though, while the science being conducted is experimental, the computational problems are not. The computational problems may change and evolve over time, but they do so within a common set of terminology, concepts, assumptions, and goals. They occur within a common domain.

Thus, computational science problems appear amenable to domain-specific software architectures. Anglano, Schopf, Wolski, and Berman [ASWB95] describe a system for hierarchically describing and characterizing heterogeneous, computational science applications that execute on high performance systems. Their system, Zoom, is not a domain-specific software architecture, but it does have several similarities to domain modeling and software architecture. We briefly explore the differences and similarities below.

There are three major properties that distinguish Zoom from a DSSA for computational science metacomputing. The first major difference is that Zoom does not explicitly target computational science domains. Instead, Zoom addresses the somewhat broader and differently focused "domain" of heterogeneous applications. In this way, Zoom does not directly facilitate the integration of domain knowledge into software and tool systems. Second, given its domain, Zoom is naturally concerned primarily with performance. As a result, Zoom characterizes applications at a lower level than what might be envisioned for domain-specific metacomputing. The reason for this is that the applications targeted by Zoom do not assume the existence of a computational infrastructure that handles requirements like resource allocation, scheduling, and monitoring. Finally, Zoom is primarily a mechanism for representing and reasoning about applications. Thus, unlike a software architecture, it does not support the subsequent instantiation of those applications.

Despite these differences, Zoom has much in common with the DSSA process. At its highest level, Zoom serves as a "domain of discourse" between computer scientists and computational scientists [ASWB95], making it consistent with the first two stages of the DSSA process, as described by Taylor et al. [TTC95]. Next, Zoom identifies computational requirements, constraints, and options with respect to implementations, data format and structure conversions, and performance trade-offs. This corresponds to the third phase of the DSSA process. It also echoes Might's [Migh95] claim that domain models should support consideration of different implementation options. Just as Taylor et al. [TTC95] describe the fourth phase as an iterative process for defining a domain architecture at successively lower levels of abstraction, Zoom supports a hierarchical representation that reveals increasing amounts of application detail at each of three levels. Zoom also tries to address one of the notable shortcomings of the object-oriented systems described earlier: tool support. At its deepest level, Zoom provides information necessary for program development tools and the "accurate cost models required for optimization and scheduling" [ASWB95]. Anglano et al. hope to use Zoom as an interface to a suite of tools for developing and managing heterogeneous computing applications. DSSAs also strive to facilitate the development of tools in addition to applications [HPLM95]. Finally, Anglano et al. propose a well-defined, (mostly) graphical notation for describing applications that is similar to the block diagrams used during the DSSA process [TTC95] and in software architecture in general ([SG96], pp. 160-163).

Zoom provides a compelling glimpse into the application of DSSA technology to high performance computing and computational science. But what are the possible advantages to using a DSSA approach over object-oriented class hierarchies and object-oriented systems for parallel computation?

Conclusion

The most obvious benefit of DSSAs is that they define methodologies for integrating domain knowledge into software systems. Object-oriented systems like Norton's plasma simulation [NSD95], POOMA [ABCH95, RHCA97], and POET [AM94] use ad hoc, informal methods for incorporating domain-specific concepts into user applications or their own software. Furthermore, these systems do not address the more challenging problem of supporting computation within a metacomputing environment.

Systems like Legion [GW96] and Globus [FK96] strive to support generic metacomputing capabilities. In addition, interoperability of components within these systems is not reinforced by the careful modeling and analysis encouraged by software architecture. Together, these traits make generic metacomputing systems susceptible to the same poor reusability exhibited by other general software. With respect to domain knowledge, some of the object-oriented systems described earlier allow it to be reflected in application programming languages. However, reflecting that same knowledge in associated tools (e.g., debuggers, performance analyzers, visualizers) is difficult and not always possible using ad hoc methods. Not surprisingly, domain-specific tools are rare. A lack of incorporated domain knowledge in both languages and tools allows the relatively poor usability of parallel and distributed systems to persist.

In summary, domain-specific software architectures address problems of both usability and reusability. Domain modeling offers a well-defined process for collecting, organizing, and applying domain knowledge to the software construction process. And software architecture encourages careful consideration of how software is structured as components and connectors, supporting several nonfunctional software requirements like reusability, interoperability, and extensibility. Ultimately, applying domain-specific software architecture technology to metacomputing for computational science results in applications and tools that appear cognizant of a scientist's domain. Such systems may be similar to domain-specific environments, a topic of the next chapter.

Last modified: Wed Nov 5 08:15:11 1997

Steven Hackstadt / hacks@cs.uoregon.edu
http://www.cs.uoregon.edu/~hacks/