Domain-Specific Metacomputing for Computational Science:
Achieving Specificity Through Abstraction
By Steven Hackstadt
The most incomprehensible thing about the world is that it is comprehensible.
- Albert Einstein (1879-1955)
Science cares little about philosophical conundrums, even if proposed by one of its greatest contributors. Incomprehensible or not, science is in a relentless pursuit of comprehensive knowledge, constantly striving to characterize the unknown, understand the mysterious, and explain the obvious. Mankind's scientific knowledge, while relentlessly pursued, is far from comprehensive, even though scientists in almost every discipline have at their disposal a vast array of technology to assist them. But science and technology, while often equated, have a complex relationship, for it is often the case that science results in technology that is of essential use to other science. And while technology certainly has broad implications and applications outside of science, the self-perpetuating relationship between the two is central to mankind's relentless pursuit of the incomprehensible.
Computational Science Research: An Evolution of Computer Science
Without question, one of the greatest technological achievements of this century has been the computer. With simulation now an adjunct to experimentation and theory as paradigms of science [NCO96], scientists increasingly need to use the computational power of high performance computers to work on very large problems. To the scientist, though, computers are only as good as the problems they can solve or address. The area of computational science is primarily concerned with facilitating scientists' ability to solve or simulate large scientific problems with computers.
Recent advances in computing techniques and technology, coupled with new models for scientific phenomena, are fueling a revolution in the way science and engineering are performed. The impact of this revolution is still in its infancy, though, in part because access to high performance computers is relatively scarce and because few people (among scientists, in particular) possess the skills to use such machines [NSF93]. Thus, computational science may be described as a combination of computer science and traditional physical sciences, with the expected side-effect that scientists become more computer-savvy and computer scientists become more science-savvy. In the computer science community, two trends have been instrumental in motivating the focus on computational science research.
First, during the past century, society has benefited so significantly from scientific achievements that the nature of science is, as Rice declares, shifting from "science for science's sake, to science for society's sake" [Rice95]. That is, government is increasingly expected to ensure that the research it funds with taxpayers' dollars will generate useful results. In this atmosphere, social and government concerns for health, education, security, environment, economics, etc. are becoming the motivating factors in funding decisions.
Second, computer science is a relatively new "science." As such, it has been building its own foundations. Rice argues that those foundations are now laid, and that computer science must now become more "outward looking" and challenge itself with solving real problems [Rice95].
These trends are having a profound effect on the nature of computer science. The new focus on computational science will contribute greatly to the benefit that computer science brings to society in the next century.
Solving "Big" Problems
Computational science spans a broad range of scientific areas, including computational fluid dynamics, geology, material science, mathematics, mechanics and structural analysis, molecular modeling and quantum chemistry, and physics. Problems represented by these areas are "big," as characterized across several dimensions. Problem size, simulation complexity, and required execution time each impact the computational requirements of a given application. The following application descriptions exemplify these requirements.
Mechanics and Structural Analysis. One application of supercomputers in this area is to analyze automobile crashworthiness. Models consisting of 100,000 to 250,000 unknowns that require 20 hours of Cray C90 computer time to solve often prove too coarse to generate accurate predictions, requiring verification by actual vehicle crash tests [NSF93]. Indeed, computational requirements are often so large that resolution and consideration of important physical phenomena are omitted in exchange for faster results.
Many problems in these areas and others have been designated Grand Challenges, which are computation-intensive, "fundamental problems in science and engineering with broad economic and scientific impact whose solution can be advanced by applying [high performance computing and communications] technologies" [NCO96].
Molecular Modeling and Quantum Chemistry. Computer models are used to understand the effects of carcinogens on the structure of DNA. In one simulation, it can take a Cray Y-MP nearly six days to model the interactions of 3,500 water molecules and 16 sodium ions for just 200 trillionths of a second [NSF93]. Hence, problems are often so complex that even the briefest simulation can take an inordinate amount of time to complete.
Computational Fluid Dynamics. Numerical equations of mass, momentum, and energy simulate the evolution of fluids. Such a simulation would typically take place in three dimensions, each divided into, say, 1,000 cells. If each cell requires 100 floating point operations per time step and the simulation were to last 25,000 time steps, the entire simulation would require 2.5 quadrillion (2.5 x 10^15) floating point operations. And if a scientist requires the results in two hours, the computing system must sustain 300 billion (3 x 10^11) floating point operations per second (commonly expressed as 300 gigaflops) [NSF93]. Thus, applications with strict time requirements can rapidly drive up the performance requirements of the computing hardware.
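As a rough check of the computational fluid dynamics estimate above, the arithmetic can be reproduced in a few lines of Python. This is only a back-of-the-envelope sketch: the variable names are illustrative, and the inputs are simply the figures quoted in the example (1,000 cells per dimension, 100 operations per cell per time step, 25,000 time steps, a two-hour deadline).

```python
# Back-of-the-envelope check of the fluid dynamics estimate.
# All inputs are the figures quoted in the example; names are illustrative.
cells_per_dimension = 1_000
dimensions = 3
flops_per_cell_per_step = 100
time_steps = 25_000
deadline_hours = 2

# Total grid size: 1,000 cells along each of three dimensions.
total_cells = cells_per_dimension ** dimensions

# Total work: every cell is updated once per time step.
total_flops = total_cells * flops_per_cell_per_step * time_steps

# Sustained rate needed to finish within the deadline.
required_flops_per_second = total_flops / (deadline_hours * 3600)

print(f"total operations: {total_flops:.2e}")
print(f"sustained rate:   {required_flops_per_second / 1e9:.0f} gigaflops")
```

The total comes out to 2.5 x 10^15 operations, and the exact quotient over 7,200 seconds is about 347 gigaflops, which agrees in order of magnitude with the rounded figure of 300 gigaflops quoted above.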
A major question facing application and computer scientists is how best to apply such technologies to these problems. The computer science community has long envisioned problem-solving environments (PSEs). Gallopoulos, Houstis, and Rice [GHR94] describe a PSE as "a computer system that provides all the computational facilities necessary to solve a target class of problems... [including] advanced solution methods, automatic or semiautomatic selection of solution methods, and ways to easily incorporate novel solution methods...." While the technology of problem-solving environments holds promise for well-defined, well-understood problems with standardized solutions, it does not adequately address the leading-edge, experimental nature of most Grand Challenge problems [GHR94]. Even so, the general concept of a comprehensive computational environment is appealing as a means of improving scientists' access to high performance computing systems.
More recent work [CDHH96, BRRM95] has taken a different approach. Rather than provide a specific computational capability that potentially spans many computational domains (e.g., solving partial differential equations), domain-specific environments (DSEs) seek to provide comprehensive computational abilities within a single domain. That is, a DSE seeks to address "requirements that are unique to [a particular] application domain" through a collaboration between application and computer scientists [CDHH96]. While their approaches are orthogonal to each other, DSEs and PSEs share many of the same motivations and characteristics. Central to both technologies is achieving high performance through the use of parallel computing systems.
Pancake points out that while parallel computing has been around in various forms since the mid-1970s and current systems are "undeniably powerful," the large-scale transition to parallel computing has been slow because of a lack of software support [Panc91]. The relatively recent appearance of PSE and DSE software that supports parallel computing capabilities with the scientist in mind at first appears to corroborate this claim. But it is slightly more complicated than that because today's proponents of PSEs claim that while attempts to build PSEs in the 1960s and 1970s were thwarted because "technology could not yet support PSEs in computational science," today's "high-performance computers combined with better understanding of computing and computational science have put PSEs well within our reach" [GHR94]. Essentially, each blames the other. That is, parallel systems were not adopted because adequate software support was not available, but those software environments could not be developed, in part, because machines did not offer enough performance. We are simply considering different sides of the same coin, implying that achieving good application performance requires both robust software and high performance machines. Pancake explains the nature of that coin further [Panc91]:
The audience for high-performance computing is not the computer science community, but scientists, engineers, and other technical programmers whose computational requirements exceed the capacities of even our fastest sequential machines. They have turned to parallel processing because their problems are too big, or their time constraints too pressing, for conventional architectures. As Ken Neves of Boeing Computer Services puts it: "Nobody wants parallelism. What we want is performance."
Delivering "Big" Performance
As computer systems offer more performance potential, they become more appealing to scientists with "big" problems to solve. According to the United States Department of Energy, if current trends in high performance computing (HPC) continue, sustainable teraflop performance (that is, a trillion floating point operations per second) will be achieved by the year 2002, but machines capable of sustaining hundreds of teraflops will not appear until roughly 2025 [DOE96]. With computational science and Grand Challenge problems already rapidly approaching teraflop-level performance, the U.S. Government has launched the Accelerated Strategic Computing Initiative (ASCI) to address the "full system, full physics" problems surrounding virtual testing and prototyping capabilities for nuclear stockpile stewardship [DOE96]. This program has stimulated U.S. high performance computer manufacturers to create more powerful machines. A teraflop system is already in place at Sandia National Laboratories, and systems capable of more than three teraflops each will be sited at Lawrence Livermore and Los Alamos National Laboratories within the next two years. Sustainable performance of hundreds of teraflops is expected by 2003, bucking current trends by more than twenty years.
For many computational science problems, a machine is rarely too powerful. Being able to perform longer simulations on larger problem sizes at higher resolutions is certainly not considered a bad thing among scientists. The primary issue, though, is cost-effectiveness. Together, the first three ASCI systems will cost a total of about $250 million. Obviously, cost-effectiveness is not the primary goal of the ASCI program. But cost-effective performance is now an important goal for the business-oriented computing market, and cost-effectiveness is a motivating factor behind other trends in high performance computing.
In particular, researchers feel there are additional ways of significantly improving application performance. For example, by using existing resources more efficiently and increasing access to spare computational cycles, researchers claim that tremendous amounts of computing power can be harnessed [BP94, GNW95, SKS92, ZWZD92]. Headed in this direction is the area of heterogeneous network computing, which has as its goal the use of a collection of autonomous computers to solve one or more computational tasks concurrently ([Esha96], p. 4). Work in metacomputing extends this notion in both scale and overall system cohesiveness. These areas are discussed in considerable detail later. The most successful approach will undoubtedly be the one that can offer cost-effective increases in performance and simultaneously take the greatest advantage of existing, stand-alone, high performance systems. In fact, the ASCI program recognizes this and is ultimately looking to university research to develop the long-term solutions needed to bridge the gap between the individual teraflop systems being built today and the hundred-teraflop system expected by 2003.
Building "Big" Software
Unfortunately, raw machine performance figures are, for all intents and purposes, unattainable by real applications. The input/output, synchronization, and communication costs of parallel and distributed programs can degrade real application performance significantly. Consequently, achieving "big" performance requires more than just bigger, faster computers. Concerted efforts at improving the performance of application codes, algorithmic kernels, and system software are also required. The ASCI project, for example, expects a tenfold performance increase from improvements in system software alone. Similar improvements in the application codes themselves are also expected [DOE96].
Constructing the software systems for the ASCI platforms is challenging enough; tuning and optimizing them for performance is even harder; but uniting the three teraflop systems (and others) into a cohesive metacomputing environment eventually capable of sustaining hundreds of teraflops is no less than a monumental undertaking. Such a system must support application development, debugging, performance evaluation and tuning, archival data access, and a host of other application- and programmer-oriented services, all with performance being paramount [SDA96]. This is "big" software. And while ASCI may be at one extreme of the high performance computing spectrum, there are numerous other research labs, universities, and companies seeking more moderate increases in performance simply by improving the utilization of their existing computational resources in a similar manner.
Big software must be designed and built in a methodical, organized manner. However, among the parallel and distributed computing research community, software has largely been built on a very ad hoc basis [Panc91, Panc94]. Furthermore, the tools and environments that have been produced, with a few notable exceptions, have gone largely unused by programmers and scientists. Finally, history has shown that industry simply cannot be relied upon to produce high-quality, useful tools and environments for parallel and distributed systems [Panc94], in part because the market is just too small compared to the massive personal computer software market. Pancake claims that just one in fifty research tools and one in twenty commercial tools can be deemed "successful."
What, then, is going to happen as researchers begin extending the computational software environment to include, for instance, metacomputing? Fortunately, the groups carrying out the preliminary efforts in this area appear to have at least some understanding of the more general software design and development issues posed by these "bigger" systems [FK96, GW96]. Grimshaw recognized this point in the early days of his work in this area [GNW95]:
The issue is not whether metasystems will be developed; clearly they will. Rather, the question is whether they will come about by design and in a coherent, seamless system, or painfully and in an ad hoc manner by patching together congeries of independently developed systems, each with different objectives, design philosophies, and computation models.
True to those words, current implementations appear to be emphasizing, to varying degrees, the use of "frameworks" [AM94, DGMS93, Sund96] and "toolkits" [FK96], "object-oriented" design [GW96, ABCH95], and building software from "components" [BWFS96].
Interestingly, these terms and concepts are more often seen in the field of software engineering than high performance computing. While the techniques of software engineering are often embraced by business and industry, other areas of computer science have seemed less inclined to adopt them. Perhaps this is changing as the HPC community is increasingly faced with building "big" software like that required for metacomputing environments. Indeed, this represents an interesting research challenge.
Software engineering has a fair body of work that could benefit the HPC community in this regard. For example, software composition studies how software can be built as a collection of individual components [AB96, GAO95, NM95]. Other work explores issues of interoperability among independent pieces of software [Heil95, Mano95, PSW91, Purt94]. Object-oriented approaches have been highly touted as improving productivity [MN96]. Many of these topics are encompassed in the emerging area of software architecture [SDKR95, SG96].
According to Shaw and Garlan [SG96], software architecture "involves the description of elements from which systems are built, interactions among those elements, patterns that guide their composition, and constraints on these patterns." One area in particular, domain-specific software architectures (DSSAs), may hold particular promise in constructing metacomputing support for computational science problems [HPLM95, TTC95]. The core idea of this area is to tailor the organizational structure of the software to a particular family (or "domain") of applications. While the topic of software architecture is resumed later, it is important to note at this point that a fundamental requirement for a domain-specific software architecture is, of course, a domain. That is, DSSAs apply to classes of applications that share concepts, terminology, data types, computational structures, etc. It is not surprising, then, that DSSAs potentially have much in common with domain-specific environments. Their commonality lies in a pervasive consideration of the nature of the problems to be solved by the software environment. While DSSAs focus on an appropriate design and development methodology, DSEs tend to focus on meeting functional, nonfunctional, and implementational requirements. As we discuss later, a combined approach holds particular promise.
The "Big" Picture: A Research Challenge
The previous sections have sketched a unifying path through the areas of computational science, parallel and distributed computing, and software engineering. Along this path, a variety of "technologies" that hold promise in addressing the requirements of "big" computational science problems have been mentioned. The nature of this path and the dependencies among the technologies encountered along it are central to the focus of this paper. Figure 1 depicts these relationships.
FIGURE 1. Domain-specific metacomputing for computational science is defined as the confluence of three major areas of computer science.
Starting with the area of computational science, we discover domains like computational fluid dynamics and material science with large problems that require increased performance to be solved or simulated. In search of performance, we look to the area of parallel and distributed computing. Evolving technologies like scalable clustered computing, heterogeneous network computing, and metacomputing can potentially offer substantial, cost-effective increases in application performance. Building software for these domains and computing systems requires improvements over the existing, ad hoc methods of software design and construction used in this community. Looking to software engineering, we find emerging technologies in the area of software architecture that might aid in this endeavor. By defining the software in terms of components and interactions, we are better able to consider its overall structure and topology. But defining a completely generic model might constrain our ability to meet other nonfunctional requirements such as performance or resource utilization. As a result, restricting software development to specific domains is often advantageous, which leads us back to computational science, an area replete with problem domains in search of software solutions.
Collectively, the technologies from each of these areas constitute a new, crosscutting area which we term domain-specific metacomputing for computational science. Each technology seems to contribute to the common goal of high performance computing for computational science, though no one technology fully addresses the scope of the problem. Thus, these three foundational areas (computational science, parallel and distributed computing, and software engineering) yield a number of technologies that appear promising to the area of domain-specific metacomputing for computational science. To better understand this new area, we must identify the area's basic requirements, determine how the emerging technologies may best address those requirements, and explain how those technologies have evolved and converged to form the area of domain-specific metacomputing for computational science.