Domain-Specific Metacomputing for Computational Science:
Achieving Specificity Through Abstraction
By Steven Hackstadt

CHAPTER 3
Foundations in Parallel and Distributed Computing

The foundations of domain-specific metacomputing for computational science are in parallel and distributed computing. This chapter seeks to motivate the development of metacomputing technology, to identify its key challenges, and to evaluate the major contributions in this area. Our comments primarily focus on metacomputing from the systems and performance perspectives, though we also discuss software organization and implementation methodologies where applicable.

The Convergence of Parallel and Distributed Computing

Parallel computing and distributed computing have, in the past, been mostly distinct areas of research and development. Parallel computing, characterized by the development of closed hardware systems containing multiple processing units, the software and tools for using such systems, and efficient parallel algorithms, has had the exclusive goal of performance. Distributed computing, on the other hand, has been primarily concerned with the sharing of resources among a set of interconnected computers for purposes of performance, reliability, scalability, and security ([CDK94], p. 30). Whereas parallel computing has targeted computationally intensive, numeric applications, distributed computing has focused more on interactive, multi-user computing environments that support a broad range of computing activities.

The advent of high-speed networks and more ubiquitous parallel computing, however, is driving a convergence of parallel and distributed computing, making it increasingly difficult to distinguish between the two. For example, as network speeds increase, clusters of single-processor workstations and personal computers have become a viable platform for parallel computing. Conversely, multiprocessing is evolving from an esoteric technology for achieving high performance on expensive, high-end machines to a more widespread, mainstream technology for achieving better performance in distributed computing environments. An excellent example of this convergence is the rising popularity of shared-memory multiprocessors (SMPs) for general purpose computing. While they are clearly parallel computers, SMPs are commonly used as server machines in distributed systems ([CDK94], p. 45-46). The modest parallelism of most SMPs allows these servers to respond simultaneously to multiple service requests, reducing the likelihood that the service becomes a bottleneck. Furthermore, SMPs' support of shared memory allows the efficient implementation of interprocess communication and processor allocation on such machines. While advantageous for distributed systems, shared-memory multiprocessors also have tremendous performance potential. In fact, the 3-teraflop system solicited by the Accelerated Strategic Computing Initiative (ASCI) program will be a cluster of SMP machines [Wood96, LLNL97].

While parallel and distributed computing each maintain their own, unique research directions, their convergence gives rise to many new research challenges. One of these challenges, metacomputing, is the focus of this chapter. Foster and Kesselman [FK96] effectively describe some of the relationships between metacomputing and parallel and distributed computing:

Metacomputers have much in common with both distributed and parallel systems, yet also differ from these two architectures in important ways. Like a distributed system, a networked supercomputer must integrate resources of widely varying capabilities, connected by potentially unreliable networks and often located in different administrative domains. However, the need for high performance can require programming models and interfaces radically different from those used in distributed systems. As in parallel computing, metacomputing applications often need to schedule communications carefully to meet performance requirements. However, the heterogeneous and dynamic nature of metacomputing systems limits the applicability of parallel computing tools and techniques.

Thus, while metacomputing can certainly build on the foundations of parallel and distributed computing, significant advances in infrastructure, methods, and tools are still required. Because our agenda is somewhat broad, it is beyond the scope of this work to explore all of the research challenges posed by metacomputing. We seek to reveal and describe the important role that metacomputing can play in computational science. To this end, we briefly provide some foundations for the area and then survey representative examples of work most relevant to domain-specific metacomputing for computational science. This is followed by discussions of some specific parallel and distributed computing issues.

Heterogeneous Computing

Heterogeneous computing (HC) exists at the convergence of parallel and distributed computing and provides many of the foundations of metacomputing. As defined previously, heterogeneous computing coordinates a collection of diverse computers to execute computationally demanding tasks. HC inherits parallel computing's concern for performance, but it must also exhibit a strong consideration for certain distributed systems issues. For example, if the proper task allocation for the program in Figure 2 in Chapter 2 is not made, little or no performance gain is realized.

To better distinguish between the different types of HC, various characterizations and taxonomies have been proposed. Heterogeneity can manifest itself in many ways. Temporal heterogeneity occurs when a system can execute in different execution modes (e.g., SIMD, MIMD) at different times, while spatial heterogeneity occurs when different machines in a system execute in different modes at the same time [ETM95]. In an attempt to capture the distinction between these two types of heterogeneity, Eshaghian ([Esha96], p. 2-4) proposes the taxonomy in the upper part of Figure 7. In system heterogeneous computing (SHC), a single multiprocessor computer executes one or more tasks in both SIMD and MIMD modes. In multimode SHC, SIMD and MIMD execution can take place simultaneously, but in mixed-mode SHC, the execution mode of the whole machine is switched between SIMD and MIMD. SHC machines tend to be specialized systems (e.g., the Image Understanding Architecture ([Esha96], p. 67-97)) and are not intended for general purpose numerical computation. On the other hand, network heterogeneous computing (NHC) consists of a collection of interconnected machines that can execute one or more tasks concurrently. When all the machines in the system are identical, it is called multimachine NHC. When at least one of the machines is different, mixed-machine NHC occurs.

[Figure 7]
FIGURE 7. The relationship between Eshaghian's taxonomy and the EM3 taxonomy of heterogeneous computing.

Unfortunately, Eshaghian's classification is not complete. For example, how does one classify a heterogeneous system that consists of several identical workstations and a single multimode computer that supports simultaneous SIMD and MIMD computations? The problem is that the classifications within SHC and NHC consider different features. In particular, NHC considers machine type, but neglects the number of execution modes supported. SHC does the exact opposite.

The EM3 taxonomy [ETM95] offers an improvement over Eshaghian's approach by independently considering two dimensions of a heterogeneous computing system: execution modes and machine models. Execution mode represents the type of parallelism supported by a machine (e.g., vector, SIMD, MIMD), while machine model considers the architecture and performance of the machines in the HC system. In the spirit of Flynn's Taxonomy ([HP90], p. 572), each dimension is classified as either single or multiple, yielding the categories shown in the lower part of Figure 7. To improve the comparison with Eshaghian's taxonomy, the EM3 taxonomy has been arranged as an inverted tree (instead of a simple 2x2 grid).

In the EM3 taxonomy, SESM systems consist of a single machine architecture which supports a single execution mode. SESM systems include uniprocessors as well as parallel or distributed systems that support a single execution mode and are composed of identical processors or machines. SEMM-class systems include, for example, networks of different workstations that each support the same execution mode. Finally, MESM and MEMM systems consist of machines that individually can support multiple modes of parallelism. MESM-class systems consist of a single machine type, while MEMM-class systems are composed of different types of computers.
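To make the two dimensions of the EM3 taxonomy concrete, the following C sketch classifies a system simply by counting its distinct execution modes and machine models. It is purely illustrative and is not drawn from [ETM95]; in practice the counts would be derived from a description of the system's resources.

    #include <stdio.h>

    /* Illustrative sketch (not from [ETM95]): classify a heterogeneous
     * system in the EM3 taxonomy by the number of distinct execution
     * modes (e.g., vector, SIMD, MIMD) and machine models it contains. */
    const char *em3_class(int execution_modes, int machine_models)
    {
        if (execution_modes <= 1 && machine_models <= 1) return "SESM";
        if (execution_modes <= 1)                        return "SEMM";
        if (machine_models <= 1)                         return "MESM";
        return "MEMM";
    }

    int main(void)
    {
        printf("%s\n", em3_class(1, 1));  /* identical workstations          */
        printf("%s\n", em3_class(1, 3));  /* mixed workstations, one mode    */
        printf("%s\n", em3_class(3, 4));  /* the general metacomputing case  */
        return 0;
    }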

The relationship between the Eshaghian and EM3 taxonomies is shown in the middle of Figure 7. While Eshaghian's taxonomy does identify how multiple execution modes are manifested within a single machine (i.e., multimode or mixed-mode), the resulting classes of machines do not fully address the range of possibilities in a heterogeneous system. The Eshaghian taxonomy makes two (implicit) assumptions that limit its usefulness. First, it assumes that if an HC system is composed of a single machine, that machine must exhibit multiple execution modes. Second, it assumes that if multiple machines are used in an HC system, none of the individual machines are capable of multiple execution modes. The first assumption is at least partially valid in that without it, non-heterogeneous systems are possible (e.g., a single workstation or a shared-memory multiprocessor). However, the second assumption is clearly limiting in that it certainly omits valid heterogeneous systems. In particular, MEMM-class machines do not have an obvious classification in Eshaghian's taxonomy.

Yet, MEMM-class systems (i.e., heterogeneous systems that support multiple execution modes on multiple machine types) represent the most flexible metacomputing environment for computational science. Metacomputing is less concerned with specialized architectures capable of supporting multiple modes of parallelism. Such machines are usually research prototypes ([Esha96], p. 34-43) and require non-standard, specialized programming languages ([Esha96], p. 43-49). These characteristics are not consistent with the goals of metacomputing. Rather, metacomputing seeks to support multiple forms of embedded parallelism by allocating tasks to an appropriate machine if available. Furthermore, support of a unified programming model is simplified when the machines that make up the heterogeneous system support standard programming languages, system libraries, and tools.

Metacomputing Challenges

Earlier, metacomputing was distinguished from heterogeneous computing as providing larger-scale and more cohesive computational support. Thus, metacomputing faces all of the same challenges as heterogeneous computing, in addition to challenges of its own. While general consideration of heterogeneity dates back to 1987 [NHSS87] and specific methods (e.g., remote procedure call) for accommodating heterogeneity date back much further, it was not until the early 1990s that the potential advantages of heterogeneous computing came to light. Many of those advantages have already been discussed in earlier chapters.

But along with those advantages come several challenges. Khokhar et al. [KPSW93] identify several that are part of the process of heterogeneous computing. First is algorithm design. Heterogeneous computing opens up unique opportunities for algorithms. Designers must consider the types of machines available in a given heterogeneous system, alternate solution methods to application subproblems, and the cost of network communication. Next, code-type profiling attempts to identify the types of parallelism (e.g., vectorizable, SIMD/MIMD parallel, scalar, etc.) exhibited by the phases of an application and to estimate the execution time of each. Once profiled, analytical benchmarking indicates which machines are most appropriate to each phase of the code. Following this, the code must be partitioned and mapped onto the heterogeneous system. Matching code types to machines and dealing with code and data conversions among different machines complicate these tasks. An appropriate selection of machines must then be made. (A given application doesn't necessarily use all the available machines.) Scheduling of the application modules takes place at several levels (e.g., whole system, jobs, active sets, and processes). During execution, issues of synchronization must be addressed, including coordination between the senders and receivers of messages and controlling access to shared objects. Attention must be given to interconnection requirements so that computation and communication speeds are matched. Finally, programming environments must support portable and parallel languages, cross-parallel compilers, parallel debuggers, configuration management, and performance evaluation.
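The flavor of these steps can be suggested with a small sketch. Assuming (hypothetically) that code-type profiling has divided an application into three phases and that analytical benchmarking has produced estimated execution and data-movement costs for each phase on each machine, a simple greedy mapping assigns each phase to its cheapest machine. Real mapping and scheduling algorithms are far more sophisticated, but their inputs and outputs are of this kind.

    #include <stdio.h>
    #include <float.h>

    #define PHASES   3
    #define MACHINES 3

    /* est[p][m]: estimated execution time of phase p on machine m, the kind
     * of figure produced by code-type profiling and analytical benchmarking.
     * All numbers here are hypothetical. */
    static const double est[PHASES][MACHINES] = {
        /* vector,  SIMD,  MIMD cluster */
        {   12.0,   45.0,  30.0 },   /* phase 0: vectorizable kernel  */
        {   80.0,    9.0,  25.0 },   /* phase 1: data-parallel stage  */
        {   50.0,   60.0,  14.0 },   /* phase 2: coarse-grain MIMD    */
    };

    /* comm[m]: crude cost of moving a phase's data to machine m. */
    static const double comm[MACHINES] = { 2.0, 5.0, 3.0 };

    int main(void)
    {
        for (int p = 0; p < PHASES; p++) {
            int best = 0;
            double best_cost = DBL_MAX;
            for (int m = 0; m < MACHINES; m++) {
                double cost = est[p][m] + comm[m];   /* compute + transfer */
                if (cost < best_cost) { best_cost = cost; best = m; }
            }
            printf("phase %d -> machine %d (cost %.1f)\n", p, best, best_cost);
        }
        return 0;
    }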

Siegel et al. [SDA96] identify a similar three-stage process for building applications for a heterogeneous system. The process begins by comparing the computational requirements of the application with the machines available in the HC system: task profiling and analytical benchmarking break the application into subtasks that are homogeneous with respect to their computational requirements and quantify how each of the machines may perform on those subtasks. In the second stage, execution times for each task are derived, including consideration of potential communication costs, system load, and network traffic, and the tasks are assigned to machines according to an execution schedule. In the last stage, the application is executed; as the system is monitored, rescheduling or relocating certain tasks may be necessary.

These characterizations of heterogeneous computing are intended to act as a backdrop for the remainder of this chapter as the major contributions to metacomputing are surveyed. In addition, we revisit some of these topics when we discuss particular issues and challenges for metacomputing later in this chapter.

Metacomputing Research

The First Metacomputer: The NCSA Metacomputer

The notion of metacomputing was first popularized by Smarr and Catlett in a 1992 article [SC92]. Oddly enough, their article appeared among a collection of work presented at the ACM SIGGRAPH '92 Annual International Conference on Computer Graphics and Interactive Techniques. What, one may ask, does metacomputing have to do with computer graphics? Smarr and Catlett were primarily concerned with the use of scientific visualization in networked environments for purposes of computational steering and remote collaboration. The conference showcased a collection of projects representing the "state of the art in networked visual computational science" [Hart92]. Among those systems was the NCSA metacomputer.

System Description. Smarr and Catlett envisioned a metacomputer that could be used as easily as a personal computer (PC) and supported this vision with a simple analogy. One can think of a PC as a "minimetacomputer" consisting of a general purpose processor, a floating-point processor, an I/O "computer," audio and graphics chips, etc. "Like the metacomputer, the minimetacomputer is a heterogeneous environment of computing engines connected by communication links" [SC92]. Khokhar et al. [KPSW93] take a similar view of heterogeneous computing in general, also noting the development and subsequent widespread use of specialized processors for I/O and floating-point processing. Woodward [Wood96], too, suggests a macro view of clustered SMPs where a single SMP and its shared memory play the roles of a CPU and a CPU cache, respectively. The replication of such views is certainly suggestive of the evolution of a new level in the computing hierarchy.

Suppose that for a certain application, a user requires a desktop workstation, a remote supercomputer, a mainframe supporting mass storage, and a specialized graphics terminal. Smarr and Catlett [SC92] explain the relevance of metacomputing to such an environment:

Some users have worked in this environment for the past decade, using ad hoc, custom solutions, providing specific capabilities at best, in most cases moving data and porting applications by hand from machine to machine. The goal of building a metacomputer is elimination of the drudgery involved in carrying out a project on such a diverse collection of computer systems.

Smarr and Catlett identify three stages in building metacomputer systems. The first stage involves the interconnection of hardware resources by high-speed networks and the software integration required to make the environment seamless (e.g., distributed file systems, user access to remote resources). The next stage focuses on splitting up application codes so they can run on heterogeneous systems-again, primarily a software solution. Finally, Smarr and Catlett call for access to a "transparent national network" that addresses issues of administration, file systems, security, and accounting [SC92].

Smarr and Catlett also identify three general capabilities supported by metacomputing. As mentioned earlier, theoretical simulation allows scientists to perform numerical experiments in the artificial world of a high-performance computer system unconstrained by time and space. Metacomputing can also support access to and control of scientific instruments and sensors, enabling near real-time observation and numerical analysis. Finally, data navigation in a metacomputing environment supports remote access to large data stores (created in part by the previous two capabilities) and the computational power necessary to present and explore them interactively. Similar application areas are the targets of other metacomputing work [FK96, FT96, GW96] and distinguish metacomputing from heterogeneous computing.

Evaluation. Smarr and Catlett can be largely credited with first promoting the idea of metacomputing. They describe one of the first real metacomputing environments, which was built at the National Center for Supercomputing Applications (NCSA) [SC92]. The NCSA metacomputer was truly diverse, integrating massively parallel Thinking Machines CM-2 and CM-5 systems, vector processors like Cray-2, Cray Y-MP, and Convex systems, and superscalar IBM RS/6000 and Silicon Graphics VGX multiprocessors. The systems were interconnected by a gigabit local area network (LAN) with access to file servers, tape robots, and RAID systems. In addition, high-performance visualization and other multimedia capabilities were supported. In fact, this reveals one of metacomputing's distinguishing factors. Whereas heterogeneous computing integrates a collection of diverse computers, metacomputing environments seek more comprehensive computational support by providing access to file systems, mass storage, visualization, and other I/O devices. The NCSA metacomputer was one of the first to exemplify this notion.

Furthermore, the NCSA metacomputer was instrumental in both identifying the importance of software support for metacomputing and providing an example of such support. But NCSA built their software environment primarily from existing software technology like Parallel Virtual Machine (PVM) [DGMS93, Sund96]. PVM is discussed in more detail in the next section, but as earlier chapters predicted, the use of existing software technology proved difficult. As a result, current metacomputing projects appear to be less ambitious in this regard [FGNS96, FK96]. Some projects originally had the intention of using existing technology [GNW95], but eventually abandoned that goal altogether [GW96]. Even the original developers of PVM have found it necessary to build higher-level software to better support appropriate metacomputing abstractions [DGMS93, BDGM96, Sund96]. We explore this issue in more detail as we survey each of the projects below.

Metacomputing From Existing Technology: PVM and HeNCE

Parallel Virtual Machine, better known simply as PVM, is an important exception to the trend of unadopted tool technology in the parallel and distributed computing community [Panc94]. PVM is largely responsible for bringing network-based parallel computing to the masses. That is, PVM was one of the first software systems that allowed a heterogeneous collection of machines, including a simple collection of workstations, to work together on a single computational task. Moreover, it is a relatively small system that is easily installed by the user [DGMS93]. Originally released in 1991, PVM has been consistently updated and improved.

System Description. PVM is essentially a message-passing system. An application consists of several tasks, each of which can exploit the machine architecture best suited to its solution. (An example of this approach appeared earlier in Figure 2.) Each user creates their own virtual machine from the available computational and network resources. Logically, this virtual machine is a single, large distributed memory computer, and the tasks running on the virtual machine must communicate with each other through messages with explicitly typed data. PVM handles all data format conversions that may be required between two machines that use different data representations. In this way, PVM supports heterogeneity at three different levels: application, machine, and network [DGMS93].
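A minimal sketch suggests the flavor of this model. The single SPMD-style program below acts as master when it has no PVM parent, spawns one copy of itself as a worker, and exchanges an explicitly typed message with it; PVM performs any data format conversion between the two hosts. The executable name ("pvm_sketch") is hypothetical, and the sketch assumes a PVM 3 installation with the executable placed where the PVM daemon can find it.

    #include <stdio.h>
    #include "pvm3.h"

    int main(void)
    {
        int mytid  = pvm_mytid();            /* enroll this process in PVM   */
        int parent = pvm_parent();

        if (parent == PvmNoParent) {         /* first copy: act as master    */
            int worker;
            int data[4] = { 1, 2, 3, 4 };

            /* Spawn one worker copy of this program ("pvm_sketch" is a
             * hypothetical executable name); PVM chooses the host. */
            pvm_spawn("pvm_sketch", (char **)0, PvmTaskDefault, "", 1, &worker);

            pvm_initsend(PvmDataDefault);    /* portable (XDR) encoding      */
            pvm_pkint(data, 4, 1);           /* pack four ints, stride 1     */
            pvm_send(worker, 99);            /* arbitrary message tag        */
        } else {                             /* spawned copy: act as worker  */
            int data[4];
            pvm_recv(parent, 99);            /* blocking receive from master */
            pvm_upkint(data, 4, 1);          /* formats converted if needed  */
            printf("worker t%x received %d %d %d %d\n",
                   mytid, data[0], data[1], data[2], data[3]);
        }

        pvm_exit();                          /* leave the virtual machine    */
        return 0;
    }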

With respect to application-level heterogeneity, tasks can be assigned to computational resources in one of three ways. In the transparent mode, PVM attempts to locate tasks on the most appropriate machine. In architecture-dependent mode, the user assigns an architecture type to a task and relies on PVM to find an appropriate machine of that type to execute the task. Finally, in machine-specific mode the user actually assigns particular machines to each task. PVM routines also support process initiation and termination as well as various communication and synchronization primitives (e.g., broadcast, barrier, rendezvous).
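These three modes correspond directly to the flags accepted by pvm_spawn, as the fragment below suggests. The transparent mode already appeared in the previous sketch; the other two differ only in the flag and the "where" argument. The task name, architecture string, and host name are placeholders.

    #include "pvm3.h"

    /* Illustrative fragment: the three task-assignment modes expressed as
     * pvm_spawn calls.  "solver" and "node17.example.edu" are hypothetical. */
    void spawn_in_three_modes(void)
    {
        int tids[4];

        /* Transparent mode: PVM selects appropriate machines. */
        pvm_spawn("solver", (char **)0, PvmTaskDefault, "", 4, tids);

        /* Architecture-dependent mode: any hosts of the named architecture. */
        pvm_spawn("solver", (char **)0, PvmTaskArch, "SUN4", 4, tids);

        /* Machine-specific mode: a particular named host. */
        pvm_spawn("solver", (char **)0, PvmTaskHost, "node17.example.edu", 4, tids);
    }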

Thus, PVM provides many of the low-level mechanisms necessary for heterogeneous computing, but it falls considerably short of supporting the vision of metacomputing for computational science. The developers, of course, recognize this. In an effort to "make network computing accessible to scientists and engineers," they are constructing the Heterogeneous Network Computing Environment (HeNCE) on top of PVM.

The goal of HeNCE is to simplify the tasks of "writing, compiling, running, debugging, and analyzing programs on a heterogeneous network" [BDGM96]. HeNCE consists of several tools: compose, configuration, build, execute, and trace. Each of these is briefly described here. Using the compose tool, the user specifies the parallelism in their application as a graph, where nodes represent subroutines and arcs represent control and data flow. The configuration tool is used to create the virtual parallel machine by indicating which hosts are to participate, with what priority, and for which tasks. The build tool generates the parallel program, compiles it for each node type in the virtual machine, and installs the executables on each machine. The execute tool starts up PVM and runs the program, allocating tasks per the information given to the configuration tool. Finally, the trace tool provides some rudimentary visualizations of network status and the graph representation of the parallel program.
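Although HeNCE captures the graph through its compose tool, the underlying abstraction can be pictured as a simple dependence structure like the hypothetical one below: each node names a user-written subroutine, and its arcs record which nodes must complete first. This is not HeNCE's actual representation, only an illustration of the idea; nodes with no mutual dependence (solve_a and solve_b here) may execute in parallel.

    /* Hypothetical sketch of a HeNCE-style application graph. */
    #define MAX_DEPS 8

    struct app_node {
        const char *subroutine;        /* user-written routine to invoke     */
        int         num_deps;
        int         deps[MAX_DEPS];    /* indices of predecessor nodes       */
    };

    /* init -> { solve_a, solve_b } -> merge */
    static struct app_node graph[] = {
        { "init",    0, { 0 } },
        { "solve_a", 1, { 0 } },       /* depends on init                    */
        { "solve_b", 1, { 0 } },       /* depends on init; runs with solve_a */
        { "merge",   2, { 1, 2 } },    /* depends on both solvers            */
    };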

Obviously, the environment has fairly limited capabilities in terms of the range of parallel programs it can create, and for the most part, the successful application of HeNCE to experimental computational science problems seems unlikely. Nonetheless, the developers have done something fairly unique by creating a higher-level abstraction to parallel computing in a heterogeneous environment. The programmer (scientist?) need only write the core subroutines for a computation or simulation. The rest of the application logic is specified through the graph abstraction, hiding the underlying PVM calls and much of the program's control structure implementation from the user.

Evaluation. As mentioned earlier, PVM was among the first software enabling any form of network computing. In part, it can be credited with establishing network-based computing as a valid alternative to dedicated supercomputers. It remains a practical, immediately available solution that targets small- to medium-scale heterogeneous systems typically consisting of less than one hundred hosts. In many ways, PVM has revealed the limitations that must be addressed by larger-scale systems (like those described below). For example, PVM requires the user to manage message-passing explicitly, to know what machines are available and what their characteristics are, and to have login privileges on each of them.

PVM places an emphasis on practicality and on providing useful technology now. That spirit is shared by HeNCE, which clearly can not solve all application development problems for PVM, though it may provide an incremental improvement for a certain class of programmers. To their credit, the developers of PVM and HeNCE do not claim to provide metacomputing capabilities. Including their work in this discussion is, perhaps, to overstate their goals. Nonetheless, PVM is representative of a software framework that is successfully following the evolution of high performance computing. More recent work is extending PVM to support threads, client-server computing, remote procedure call, agent-based computing, and web-based computing [Sund96]. In general, PVM's interface has become a de facto standard for message-passing and, ironically enough, will likely be supported by future metacomputing systems [GW96].

A Metacomputing Toolkit: The Globus Project

In the discussion so far, a number of implicit issues have been raised that are important to metacomputing. Among those are resource location, resource allocation, and authentication. Resource location is the determination of what computational and network resources are available in the heterogeneous computing environment. Resource allocation is the assignment of machines to tasks. And authentication is the verification of access privileges to a given machine.

In the NCSA metacomputer, all of these were addressed directly by the user. The resource set was fixed; resource allocation was done manually for each application; and access to each resource was assumed. PVM and HeNCE provide an incremental improvement in the area of resource allocation by supporting transparent and architecture-dependent task assignments, but resource location and authentication are still assumed. The limited functionality in these areas is, in some cases, acceptable for small-scale heterogeneous systems, but for systems with more than ten or twenty hosts, keeping track of what each host is named, which hosts are operable at any given time, and the login names and passwords for each host becomes burdensome. In general, we refer to these issues as part of the configuration problem.

The issues identified by Khokhar et al. [KPSW93] and Siegel et al. [SDA96] described earlier do not necessarily dictate how such requirements should be addressed. They simply claim that such issues must be addressed. The vision of supporting metacomputing environments with access to hundreds or even thousands of machines certainly implies that the user can not be relied upon to address issues of configuration. To provide such support is the responsibility of, indeed the very reason for, a full-scale metacomputing environment.

The Globus project is one of two national-scale metacomputing projects. The other project, Legion, is discussed later. Together, these two projects provide the best examples of metacomputing environments, though both are still under active development at the time of this writing. Even so, much of the design and some implementation has been described in the literature. We present an overview of these projects as well as comments about their strengths and weaknesses. Additional observations about these projects are made as we discuss specific metacomputing issues and challenges later.

System Description. The goal of Globus is to confront directly the problems of configuration and performance optimization by first creating a "metacomputing infrastructure toolkit" that provides a set of basic capabilities and interfaces in these areas [FK96]. These components define a "metacomputing abstract machine" upon which a range of higher level services, tools, and applications may be built. By definition, their approach results in a software layer best classified as middleware, leaving the creation of higher-level services to future work. Central to Globus is consideration of the potentially unreliable nature of metacomputing environments. To this end, they view applications that run in metacomputing environments as possessing the ability to configure themselves to a given execution environment and then to adapt to subsequent changes in that environment. But clearly, the functionality to accomplish this is not the application programmer's responsibility; it must originate from within the metacomputing environment itself. Later, we discuss one component system working toward this goal.

Foster and Kesselman [FK96] make five general observations about future metacomputing systems, including Globus. Even though most experiments have been on small-scale environments with the largest consisting of machines at 17 different sites [FGNS96], environments must support both scale and selection. When widely deployed, metacomputing technology will support access to hundreds or thousands of machines; users will want to select from those machines based on criteria such as connectivity, cost, security, and reliability. Metacomputers must handle heterogeneity at multiple levels, including physical devices, network characteristics, system software, and scheduling and usage policies. Whereas traditional applications could make assumptions about machine and network characteristics, metacomputer applications must be able to handle the unpredictable structure of diverse and/or dynamically constructed metacomputing environments. Similarly, dynamic and unpredictable behavior may be caused by sharing of metacomputing environments, making guaranteed quality of service and exclusive, predictable access to resources difficult. Finally, security and authentication are complicated by metacomputing resources residing in multiple administrative domains. Foster and Kesselman point out that all of these issues have a common requirement of real-time information about the system structure and state. A metacomputing system must use that information to make configuration decisions and be notified when the information changes. Meeting these real-time system information requirements is one of the challenges we address below.

The Globus toolkit is comprised of a set of modules. Each module defines an interface through which higher-level services gain access to the module's functionality. Currently, six modules have been conceived: resource location and allocation, communications, unified resource information service, authentication interface, process creation, and data access [FK96]. Additional details about some of these modules appear below.
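The style of such a module interface can be illustrated with a hypothetical C header. These declarations are not the actual Globus interfaces, only a sketch of how a resource location and allocation module might expose its functionality, including change notification, to higher-level services.

    /* Hypothetical sketch only; NOT the Globus API. */
    typedef struct resource resource_t;

    /* Locate up to max_out resources satisfying a declarative requirement
     * string (e.g., architecture, memory, connectivity). */
    int resource_locate(const char *requirements, resource_t **out, int max_out);

    /* Allocate one of the located resources for a computation. */
    int resource_allocate(resource_t *r);

    /* Ask to be notified when the state of a resource changes, so that an
     * application can reconfigure or adapt as the environment changes. */
    int resource_on_change(resource_t *r, void (*callback)(resource_t *));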

Evaluation. It is difficult to evaluate a system that is still under active development. Consequently, our comments here relate more to the design and implementation approach being used by Globus. This is still constructive, though, since we have identified software design and development methodologies as a key contribution of metacomputing technology.

Compared to PVM, the Globus project is taking a very different approach to the metacomputing problem. PVM and HeNCE are concerned with extending an existing system originally intended for a somewhat different purpose (i.e., computing in small- to medium-scale heterogeneous environments). Globus, on the other hand, is building a "metacomputing toolkit" and addressing the metacomputing problem in a bottom-up manner by developing low-level mechanisms and support that can be used to implement higher-level services [FK96]. The Globus project uses existing technology where applicable, extends it when practical, and builds its own services when necessary. Most low-level services appear to require new implementations, though some mid- and higher-level services will likely be more compatible with existing systems.

A notable aspect of Globus that distinguishes it from other projects is the desire to let potential metacomputing applications drive the identification of, and solutions to, technical problems. Through the use of targeted testbeds (one of which is discussed later), the Globus project is hoping to improve their ability to predict the requirements of future metacomputing applications.

What remains to be seen is whether the low-level toolkit components currently being built will be able to address these yet unknown requirements. This reveals one of the potential problems with a bottom-up approach: the resulting system may not be sufficiently general. This is particularly important since Globus is trying to build just that-a general metacomputing environment. In the next section, we see how the Globus approach compares to the design of the Legion metacomputing system.

Distributed Object Metacomputing: The Legion Project

The object-oriented computing paradigm promotes such characteristics as modularity, abstraction, reuse, and programming-in-the-large [RT96]. In many cases, it can reduce the amount of work that a programmer has to do and result in increased productivity [MN96]. The object-oriented paradigm has more recently been extended to the realm of distributed applications in hopes of thwarting the so-called "software crisis" [Cox90].

The rapid increase in internetworked computers during the 1990s has resulted in challenging new requirements for information processing in the areas of interconnection and interoperability [NWM93]. In response to this challenge, distributed object systems such as CORBA [OMG95] and DCE [OG96] have emerged, and object-orientation has proven itself a desirable means of managing the complexity of large-scale distributed systems [NWM93]. However, most of these systems do not target high-performance computing and do not support parallel programming [GW96]. In an effort to bring the advantages of distributed objects to metacomputing, the Legion project is attempting to build an infrastructure capable of supporting high-performance access to millions of hosts based on a solid, object-oriented conceptual model [GW96].

System Description. The Legion vision is nothing less than grandiose, and it forces one to consider seriously how future metacomputing environments will actually manifest themselves and be used. Grimshaw and Wulf [GW96] describe Legion as follows:

Our vision of Legion is of a system consisting of millions of hosts and trillions of objects co-existing in a loose confederation tied together with high-speed links....

It is Legion's responsibility to support the abstraction presented to the user, to transparently schedule application components on processors, manage data migration, caching, transfer, and coercion, detect and manage faults, and ensure that the user's data and physical resources are adequately protected.

One may argue that the Legion vision of computing is actually a fairly simple conceptual extension to the current state of internetworking. With millions of computers connected to the global Internet, it is not a big conceptual leap to imagine those computers participating in a global pool of computational resources. But, as we have argued previously, this relatively small conceptual jump requires enormous advances in software technology. The Legion project recognizes these requirements perhaps more than any other metacomputing project, though this was not always the case. Legion started as an effort to extend the Mentat parallel programming language to a "campus-wide virtual computer" [GNW95]. Now, the researchers are designing "from the ground up so that the resulting system has a clean coherent architecture, rather than a patchwork of modifications based on a solution for a different problem" [GW96]. This realization does not bode well if the developers of PVM and HeNCE have any real intentions of supporting large-scale metacomputing.

In this spirit, the Legion object model is guided by the philosophy that the developers can not design a system that satisfies the needs of every user. To this end, the Legion team is not attempting to provide specific solutions to all of metacomputing's problems. Rather, "users should be able, whenever possible, to select both the kind and level of functionality, and make their own trade-offs between function and cost" [GW96]. The goal of Legion, then, is primarily to specify functionality, not implementation. As the developers state, "the [Legion] core will consist of extensible, replaceable components" [GW96]. Legion provides base implementations of core functionality, but users can replace them at will. Object-oriented classes and inheritance provide a natural means of supporting this goal [LG96].

As mentioned above, a key challenge for Legion is to deliver high performance. Performance has not been a primary objective for distributed object systems, and this is part of the reason that the Legion object system was (re)designed from scratch-so that performance considerations could influence the development of the system. The authors point to two methods for achieving high performance. First, load-distributing [SKS92] and load-sharing [ZWZD92] systems can be used to exploit resources throughout the metacomputing system. Second, Legion supports parallel execution. Because Legion emerged from the parallel processing community, parallelism is a major focus of the project. Four means of achieving parallelism are available: wrapping existing parallel codes in Legion object wrappers, supporting parallel method invocation and object management, exposing the Legion run-time interface to parallel language and toolkit builders, and supporting popular message-passing APIs like PVM.

Evaluation. The goal of the Legion project is to provide a cohesive middleware system for metacomputing environments that supports a wide variety of tools, languages, and computation models, while simultaneously allowing diverse security, fault-tolerance, replication, resource management, and scheduling policies. The project is very large-scale in its goals and broad in its vision of future computing environments.

The primary strengths of the Legion system follow a common theme that spans from high-level design to low-level implementation. At the highest level, Legion's greatest strength is its philosophy toward addressing the metacomputing problem. At the heart of that philosophy is the tenet that they "cannot design a system that will satisfy every user's needs" [GW96]. Adherence to this principle will result in a system that can grow with the relatively young area of metacomputing. Along with this philosophy, the group has identified several constraints that restrict how the higher-level design can be mapped to lower-level implementations of the system. These include not replacing host operating systems, not legislating changes to interconnection networks, and not requiring Legion to run with special privileges. Like the Legion philosophy, these constraints go a long way toward creating a flexible and usable system. At the next level, the Legion group has adopted a framework approach to implementing their system. A framework specifies the interfaces to core components of the system and may provide default components that can be easily replaced. Finally, Legion takes an object-oriented approach to the actual implementation of the system. Object-orientation provides excellent mechanisms for abstracting machine-specific, operating system-specific, and network-specific characteristics of the computational resources within the metacomputing system. Simultaneously, it provides a robust model on which new components and services may be built. In summary, the main strength of Legion is a thorough appreciation of the need for a flexible and adaptive system. This appreciation is present from the high-level philosophy guiding the project all the way down to the low-level implementation of its objects.

Ironically, the primary weaknesses of Legion can be derived from the very properties that give it strengths. Consider the Legion philosophy that the project cannot possibly build a system to satisfy every user's needs. The Legion builders have actually stated that "the Legion project cannot... build all of `Legion'" [Grim96]. In some ways, this is a fundamental flaw-albeit an unavoidable one perhaps-in the Legion project. That is, as they seek to maximize the generality and flexibility of the system, they simultaneously minimize what they can actually implement and even specify in the form of interfaces, which in turn complicates the user's ability to customize the system to their own needs. Similarly, in the same way that they can not anticipate every user's needs, the constraints they have identified will likely prove insufficient as the field of metacomputing evolves. While object orientation has some clear advantages, it also has some potential disadvantages. With respect to usability, much of the system is left up to other developers and/or users to implement. Are users willing to write their own security modules? Their own scheduling objects? With respect to performance, it is unclear how much overhead object representations and interactions impose on Legion applications. Similarly, delivering a high-performance global name space is a very challenging problem. Finally, it remains to be seen whether metacomputing will thrive at the scale envisioned by the developers. Do we need support literally capable of managing millions of hosts at once? Or will metacomputing practically be applied in a more modest fashion? In summary, the weaknesses of the Legion system are derived from its extremely general approach and broad scale.

In comparing Globus [FK96] and Legion, two major differences are apparent. First, with respect to determining the requirements of metacomputing, the Globus project recognizes the importance of testbeds as a means of discovering and learning about metacomputing. Legion, on the other hand, is more concerned with building a totally flexible system that maximizes its ability to adapt to future requirements, whatever those may be. Second, in design, Globus is taking a bottom-up approach and trying to use, or at least incorporate, existing technology when possible. Legion's object-oriented view of metacomputing is naturally top-down, and almost all functionality must be reimplemented for that environment.

Clearly, the decisions that the Legion team has made in the design and implementation of its system involve trade-offs. This is not surprising. Legion is attempting to tackle a very large problem, and it has already made significant contributions to the consideration, design, and implementation of metacomputing software infrastructure. It has also spurred research into several more refined and focused areas like resource discovery and scheduling.

Issues and Challenges

From the comments above, it should be clear that metacomputing is a young area. The software that is currently available to support metacomputing-like functionality (e.g., PVM and HeNCE) falls short in many respects. At the same time, those projects that are attempting to address the metacomputing problem more completely (e.g., Globus and Legion) are still under active development. Nonetheless, in examining the current state of metacomputing, we have touched upon several more specific issues and challenges that metacomputing must address. Many of those issues are being addressed by other research groups and have close ties to more general research in parallel and distributed computing. This section discusses some of these efforts. We begin with a brief description of a metacomputing testbed. This is followed by sections discussing real-time system information, application scheduling, and global name space.

A Metacomputing Testbed

Both Legion and Globus have recognized that they cannot fully predict the potential applications and requirements of future metacomputing environments. Interestingly, each group has responded in a different way. Legion is starting from scratch with a completely object-oriented approach to facilitate replacing and/or extending default implementations with customized ones. In this way, unpredicted functionality can be more easily incorporated into the system.

The Globus team of researchers has taken a very different approach. To improve their ability to predict the future, they are creating a series of large-scale testbeds. From these, they hope to learn more about the applications, technical problems, and open issues that must be addressed so that the area can continue to evolve. The first of these testbeds, the I-WAY, was created in 1995 and focused on exploring the types of applications that might be deployed in a metacomputing environment.

The I-WAY. The Information Wide Area Year, or I-WAY, was a metacomputing testbed "in which innovative high-performance and geographically distributed applications could be deployed" [FGNS96]. The organizers hoped that by focusing on applications, the I-WAY would reveal the "critical technical problems" that must be solved and allow researchers to "gain insights into the suitability of different candidate solutions" [FGNS96].

Essentially, the I-WAY was a huge experiment in metacomputing. The subjects of the experiment were the more-than-60 applications selected by competitive proposal. These applications fell into three categories: immersive virtual environments coupled with remote supercomputers, databases, or scientific instruments; multiple, geographically-distributed supercomputers coupled together; and virtual environments coupled with other virtual environments [FGNS96]. This experiment was conducted on a wide range of equipment, including high-end display and virtual reality devices, mass storage systems, specialized scientific instruments, and at least seven different types of supercomputers at 17 different sites in North America. The network connecting all of these components was also heterogeneous, using different switching and networking technologies [FGNS96].

A unique aspect of the I-WAY experiment was the use of I-WAY Point of Presence (I-POP) machines at each participating site. An I-POP machine is basically a workstation front-end to a computational resource. Because each site used the same type of machine running the same operating system, the I-POP machines provided a "uniform environment" for management and security of I-WAY resources [FGNS96]. Each I-POP machine also ran a suite of software tools called I-Soft that supported scheduling, security, parallel programming, and a distributed file system. It should be noted, however, that this support was at a relatively low level compared to that currently envisioned by the Globus project; in many cases, I-Soft was the first attempt at providing such functionality.

Discussion. The authors point to several successes of the I-WAY, including the novel idea of point-of-presence machines and preliminary steps toward an integrated software environment for metacomputing. However, the I-WAY also illustrated the acute nature of certain metacomputing problems. For example, the use of point-of-presence machines is clearly not a scalable solution. Perhaps for a small number of well-coordinated sites, this approach works. But on a larger scale, imposing specific standards on the machine and operating system required to interface with the metacomputing system is both unreasonable and difficult to enforce. We suggest that the need for I-POP machines will be obviated by advances in software technology for metacomputing.

The I-WAY environment was also constrained in many ways. For instance, primitive scheduling software limited applications to executing only on predefined, disjoint subsets of I-WAY computers. Similarly, with respect to security, each user had to have a separate account on each site to which access was required. As we have suggested, these limitations are largely due to the experimental nature of this environment. The authors explicitly recognize this and state that none of the user-oriented requirements were "fully addressed in the I-WAY software environment..." [FGNS96]. More importantly, they recognize the important role of software engineering, claiming that "system components that are typically developed in isolation must be more tightly integrated if performance, reliability, and usability goals are to be achieved" [FGNS96].

Real-Time System Information

As motivated by Foster and Kesselman [FK96], the effective use of a metacomputer is largely dependent on being able to know the state of the system resources. This information can include network activity, available network interfaces, processor characteristics, and authentication mechanisms. The information plays an important role in nearly every stage of metacomputing. For example, it indicates which machines are currently operable; it assists in determining which machines may be used to execute an application; it helps in creating a performance efficient schedule for executing the application; and it can be used to determine if rescheduling the application or relocating a task is necessary.

Providing current information only addresses part of the problem, though. Foster and Kesselman [FK96] note that high-performance applications have traditionally been developed for a specific type of system-or even a specific system-with well-known characteristics. Similarly, scheduling disciplines for distributed systems typically assume exclusive access to processors and networks [FK96]. The shared nature of metacomputing environments effectively invalidates both of these assumptions. It is not necessarily known which types of systems will be used to run a given application (or parts of an application) at a given time, and applications encounter contention for network and processor resources from other programs. Therefore, the load and availability of the resources-and hence, the performance that can be delivered to an application-varies over time [Wols96, WSP97]. In order to improve task assignment and scheduling decisions, a prediction of the future system state is also desirable.

Network Weather Service. Wolski [Wols96] describes a "distributed service that dynamically forecasts the performance various networked resources can deliver to an application." His system, the Network Weather Service (NWS), is being built to operate as a component within a metacomputing environment like Globus. NWS itself follows a modular design with the main components being the sensory, forecasting, and reporting subsystems. Together, these components must sense the performance of the metacomputing resources, forecast the future performance of each resource, and disseminate the forecast to higher-level services [Wols96, WSP97].

Network performance and CPU availability sensors are implemented as a server process that executes on each machine in the metacomputer. These servers periodically conduct communication experiments with one another to determine latency and throughput values. Similarly, each server regularly determines how much CPU time is available to applications. An internal database records these readings locally on each machine.

NWS supports a framework within which any number of forecasting methods can be used. Wolski describes eleven different predictive methods in his initial work, ranging from a simple running average to a more sophisticated autoregressive model [Wols96]. Each predictor uses a history of previously sampled data to forecast future values. However, experiments have shown that different predictors work better at different times [BWFS96]. To assist in choosing the best predictor, NWS records the error between the forecasts for each predictor and the sampled data. NWS actually tracks both the mean square prediction error and the mean percentage prediction error and for each, reports the predictor with the lowest error. Wolski notes that it is unclear which error indicator ultimately yields the best predictions and that in general, "the fitness of each forecasting technique may be application-specific" [Wols96]. Rather than dictating a single solution, NWS leaves it up to the higher-level service to decide which values to use.
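A minimal sketch conveys the idea of maintaining several forecasters and reporting the one that has erred least. Only two trivial predictors (a running mean and a last-value predictor) and a single error measure are used here, and the measurement history is invented; NWS itself supports many more methods and tracks two error measures.

    #include <stdio.h>
    #include <math.h>

    #define N 8

    static double running_mean(const double *h, int n)
    {
        double s = 0.0;
        for (int i = 0; i < n; i++) s += h[i];
        return s / n;
    }

    static double last_value(const double *h, int n)
    {
        return h[n - 1];
    }

    int main(void)
    {
        /* Hypothetical bandwidth history (Mb/s). */
        double h[N] = { 5.1, 4.8, 5.0, 3.9, 4.2, 5.3, 4.9, 5.0 };
        double err_mean = 0.0, err_last = 0.0;

        /* Forecast each measurement from the readings before it and
         * accumulate each method's squared prediction error. */
        for (int i = 1; i < N; i++) {
            err_mean += pow(running_mean(h, i) - h[i], 2);
            err_last += pow(last_value(h, i)   - h[i], 2);
        }

        /* Report the forecast of the method with the lower error so far. */
        if (err_mean <= err_last)
            printf("forecast %.2f (running mean)\n", running_mean(h, N));
        else
            printf("forecast %.2f (last value)\n",  last_value(h, N));
        return 0;
    }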

Discussion. Wolski's Network Weather Service [Wols96] has a couple of shortcomings. First, the right for an NWS server to conduct an experiment is controlled by a single token that is passed from server to server at a rate determined by an external administrative client. Controlling this rate controls the periodicity of communication measurements. While this may be advantageous as a means of minimizing the impact of NWS upon the computational environment, it does not scale to large-scale metacomputing environments. Second, an interesting question about NWS concerns the use of more computationally expensive prediction methods. One might expect a trade-off between the resource consumption of a predictor and the accuracy of the forecast. But there is no guarantee that more computation will actually yield a better prediction. NWS supports a means of dynamically choosing the best predictive method, but in order to do this, each method must be computed. So, one may pay the price of a more expensive method, but get no return on that investment. Practically, there is little NWS can do to solve this problem. At best, careful consideration of which predictors are actually computed is necessary.

In their overview of load distributing, Shivaratri, Krueger, and Singhal [SKS92] identify the main components of load-distributing algorithms. Among these is the information policy, which decides when, from where, and what state information is to be collected. While NWS is not a load-distributing system per se, it may ultimately be used as the information policy component of such a system in a metacomputing environment. Shivaratri et al. conclude that periodic policies, like NWS, often result in fruitless work when system load is high because the benefits from load distributing in general are minimal when all processors in the system are fully loaded [SKS92]. Consuming resources to collect information under these conditions can actually worsen the situation. Even though NWS is initially targeting the scheduling problem, a similar argument applies.

To counter such problems, the NetSolve system by Casanova and Dongarra [CD96] prefers to "take the risk of having a wrong estimate than to pay the cost for getting a constantly accurate one." This state-change-driven information policy [SKS92] only broadcasts information when it has "significantly changed" [CD96]. This approach avoids repeatedly reporting values that don't change, which in turn reduces the computation and communication load imposed by the monitoring system.
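The difference between the two policies is easy to sketch. In the state-change-driven version below, a load reading is broadcast only when it differs from the last reported value by more than a chosen relative tolerance; the threshold, names, and readings are hypothetical rather than taken from NetSolve.

    #include <stdio.h>
    #include <math.h>

    #define THRESHOLD 0.25            /* 25% relative change is "significant" */

    static double last_reported = -1.0;

    static void broadcast_load(double load)   /* stand-in for a real broadcast */
    {
        printf("broadcasting load %.2f\n", load);
        last_reported = load;
    }

    /* State-change-driven policy: stay silent unless the reading has moved
     * significantly, accepting the risk of a somewhat stale estimate. */
    void on_new_reading(double load)
    {
        if (last_reported < 0.0 ||
            fabs(load - last_reported) > THRESHOLD * last_reported)
            broadcast_load(load);
    }

    int main(void)
    {
        double readings[] = { 0.80, 0.85, 0.82, 1.40, 1.35, 0.60 };
        for (int i = 0; i < 6; i++)
            on_new_reading(readings[i]);     /* only three broadcasts occur */
        return 0;
    }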

Clearly, both approaches have desirable and undesirable characteristics. The work by Zhou, Wang, Zheng, and Delisle [ZWZD92] seeks to leverage the strengths of both approaches while simultaneously avoiding their weaknesses. Their load sharing system, Utopia, targets large-scale distributed systems consisting of hundreds or thousands of hosts. Zhou et al. [ZWZD92] make several observations about large-scale distributed systems. First of all, they note that such systems are typically structured as clusters of machines, each used by a particular group of people. Furthermore, the machines within such clusters usually share resources extensively. Finally, they claim the following [ZWZD92]:

It is highly unlikely that all the hosts in the system are needed for receiving tasks from far away (in terms of network distance and delay). To achieve optimal performance on a host, typically only other hosts in its local cluster and a select number of remote, powerful hosts, which we call widely-sharable hosts (be they ones with fast CPU, large memory, high I/O bandwidth, or special hardware/software), are needed.

Thus, within a cluster, Utopia employs a centralized, periodic information policy where a server on a single resource acts as a master, receiving system information from all of the other servers. This information is then used to place tasks within that cluster. However, between clusters, a different information policy is used. In this case, system information collected within each "target cluster" (i.e., a cluster possibly receiving tasks from another cluster) is sent to the "source cluster" (i.e., the cluster looking to place a collection of tasks) so that a placement decision can be made. "Virtual clusters" of the "widely-sharable hosts" mentioned above "make it possible to share load on widely dispersed hosts in a large scale system, without `information pollution' and undue overhead" [ZWZD92]. This distributed, selective policy is what gives the system desirable scalability properties.

Providing comprehensive, reliable information within clusters while avoiding flooding the network with mostly useless information appears to strike an appropriate balance for large-scale systems. It should be noted, however, that while Wolski's work could benefit in terms of scalability from these results, Utopia does not provide the kind of predictive support that Wolski's system does.

In general, these types of services for metacomputing environments illustrate the tension between providing specific, useful functionality and remaining general and flexible enough to address the dynamic nature of the environment. Even though Wolski provides a general framework capable of supporting almost any forecasting method, he still makes certain assumptions about how the system operates that may limit the system's applicability and performance. Should the NWS information policy, for example, also be a framework that is controllable by a third party? Or is it sufficient simply to improve the scalability of the information policy?

The former would certainly result in a more general tool, but would the responsibilities (i.e., the interfaces) for using the system then become too complex? And while the latter appears to address the problem of scalability, might there be other scenarios that demand yet another information policy? There are no obvious answers to these questions.2 While we generally advocate the use of frameworks to address many of the requirements of domain-specific metacomputing for computational science, the extent to which such techniques should be applied remains an open question. This relationship will be explored in more detail in Chapter .

Application Scheduling

It should be clear by now that an important client for metacomputing system state information is a scheduling tool. As Berman et al. [BWFS96] state, "despite the performance potential that distributed systems offer for resource-intensive parallel applications, actually achieving the user's performance goals can be difficult." Indeed, scheduling is central to achieving high application performance on a metacomputer.

However, performance is often user- and application-specific. Furthermore, a system's notion of performance differs from that of the application. For example, whereas a system scheduler might seek to maximize processor utilization or minimize load imbalance, an application-level scheduler might try to optimize some other measure of performance such as execution time, speedup, or cost. As Berman and Wolski [BW96] note, "distinct users will attempt to optimize their usage of [the] same metacomputing resources for different performance criteria at the same time."

Application-Level Scheduling. Despite these issues, application programmers have been creating performance-efficient schedules for their heterogeneous parallel and distributed applications for many years. These efforts have largely been "individual and unrelated" despite commonalities in the techniques used [BW96]. To assist and enhance this process in a metacomputing environment, Berman et al. propose application-level schedulers (AppLeS) [BW96, BWFS96]. Application-level schedulers represent the "generalizable structure" of the otherwise intuitive and experiential practice of custom application scheduling on heterogeneous distributed systems [BW96]. In other words, Berman et al. seek to provide a framework within which metacomputing scheduling can be supported.

The primary difference between application-level schedulers and other scheduling environments is that "everything about the system is experienced from the point of view of the application" [BWFS96]. For example, given a fixed metacomputing system, if the candidate resources for an application are lightly loaded, then the system as a whole appears lightly loaded. This can result in two applications (with different resource requirements) making completely different judgments about the state of the system on which they are going to run. Such a situation does not occur in system-level schedulers.

An AppLeS scheduler consists of several components that correspond to the scheduling process. The Resource Selector considers different combinations of machines for executing the application; the Planner develops possible schedules for each combination of machines; the Performance Estimator evaluates the predicted performance of each schedule in terms of the user's performance metric; and the Actuator executes the best schedule using the target resource management system (e.g., Globus or Legion). All of these components are managed by the Coordinator and have access to an Information Pool that consists of performance models, user preferences and constraints, an application description, and the Network Weather Service (described earlier). In the spirit of Legion, AppLeS' components are meant to be replaceable. For example, custom performance models, planning algorithms, or resource selectors could be used in place of default implementations.
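
The following declarations-only C fragment is one way such replaceable components might be expressed; it is an illustrative sketch under our own naming and is not the actual AppLeS interface. A Coordinator would drive the cycle of selecting resource sets, planning schedules for each, estimating each plan against the user's metric, and actuating the best one.

    /* Hypothetical sketch of replaceable scheduler components; all type and
       member names are assumptions, not the real AppLeS interfaces. */
    struct schedule;       /* a candidate schedule                   */
    struct resource_set;   /* a combination of candidate machines    */
    struct info_pool;      /* models, user constraints, dynamic data */

    struct apples_components {
        int    (*select_resources)(const struct info_pool *,
                                   struct resource_set *out, int max);
        int    (*plan)(const struct resource_set *,
                       struct schedule *out, int max);
        double (*estimate)(const struct schedule *, const struct info_pool *);
        int    (*actuate)(const struct schedule *); /* hand off to Globus, Legion, ... */
    };

    /* A user could install a custom performance model or planner simply by
       supplying different function pointers in this structure. */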

AppLeS essentially generates a "customized scheduler for each application" [BWFS96]. Moreover, since it is at the application-level, the scheduler is essentially an "integrated extension of the program being scheduled" [BWFS96]. That is, the customized scheduler and the application become part of the same execution instance. Berman et al. note that these techniques differ from much of the work described in the scheduling literature. In addition, the AppLeS system distinguishes between application-specific, system-specific, and dynamic information used during the scheduling process. It is in this way that Berman et al. have parameterized the general scheduling problem [BWFS96].

Discussion. Since AppLeS operates at the application level, it is able to consider and provide information more meaningful to the application programmer. In fact, Berman et al. have taken this to something of an extreme, requesting a wealth of information from the user. Some of this information, such as cost constraints and performance objectives, is critical to making appropriate scheduling decisions. But much of the information is excessively detailed, requiring the programmer to indicate, for example, how many megabytes of data are required at input and generated at output. The user must also indicate the number and size of data structures, the amount of computation and communication per data structure, the communication patterns exhibited by the application, and information about data conversions [BWFS96]. If a user is willing to collect all of that information, it is arguable that with just a little more effort they could manually create a significantly better custom schedule. Mechanisms for determining such information automatically would be far more desirable.
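
To suggest the scale of this burden, the hypothetical C structure below lists the kinds of fields a user might be asked to fill in by hand; the field names are invented for illustration and do not correspond to the actual AppLeS input format.

    /* Invented application description illustrating the level of detail
       requested from the user. */
    struct app_description {
        double input_mbytes;          /* data required at input               */
        double output_mbytes;         /* data generated at output             */
        int    num_data_structures;
        double mbytes_per_structure;
        double flops_per_structure;   /* computation per data structure       */
        double msgs_per_structure;    /* communication per data structure     */
        const char *comm_pattern;     /* e.g., "ring", "all-to-all"           */
        const char *conversion_notes; /* data conversion requirements         */
        double cost_limit;            /* user constraint                      */
        const char *perf_objective;   /* e.g., "minimize execution time"      */
    };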

Even though preliminary results are promising, to evaluate AppLeS based solely on the performance of the schedules produced by the prototype system is to miss the larger contribution the project is making to the area of metacomputing. The primary contribution that AppLeS makes is to provide a framework for scheduling on heterogeneous distributed systems. This framework will eventually support any number of scheduling algorithms, including ones provided by the user. It supports access to different types of information (e.g., application, system, and dynamic) relevant to the scheduling process. And it works with metacomputing resource managers like Globus [FK96] and Legion [GW96]. In the same way that a comparison between the Network Weather Service and a particular predictive method would be meaningless, a rigorous comparison between AppLeS and specific scheduling algorithms is also irrelevant.

But this is not to say that performance is irrelevant. What we are suggesting, though, is that the importance of performance is not necessarily absolute. In light of the increasing complexity of both applications and the systems that run them, achieving high performance is often at odds with other important criteria like portability, extensibility, ease of use, and generality.

Consider, for example, the NetSolve system by Casanova and Dongarra [CD96]. The system places great emphasis on ease of use and flexibility, supporting interfaces to programming languages like C and Fortran, analysis packages like MATLAB, an interactive command shell, and the World Wide Web. The authors also claim that NetSolve can use any scientific linear algebra package available on the platforms on which it is installed. Of course, performance is also important, but even the authors admit to using a "simple theoretical model" that is "less accurate and always optimistic..." [CD96].

Contrast this with the attempts by Mechoso, Farrara, and Spahr [MFS94] to achieve superlinear speedup running a climate model in a heterogeneous, distributed computing environment. Performance is the exclusive goal of this work, and the authors go to great lengths to achieve it, carefully optimizing the task decomposition, overlapping computation and communication as much as possible, and manually creating custom schedules for their codes.

On the other hand, problem-specific environments are often capable of achieving high performance simultaneously with usability, flexibility, and other more subjective criteria. But, by definition, these environments lack the generality that an environment such as AppLeS attempts to support. For example, Hui et al. [HCYH94] provide a vertically integrated environment for scheduling solutions to partial differential equations on networks of workstations. Their environment is interactive, portable, and scalable, yet it also provides highly-tuned, novel methods for achieving maximal performance. They are able to achieve better performance, in part, by using knowledge about the domain. At the cost of being problem-specific (i.e., the environment can only be used to solve time-dependent PDEs [HCYH94]), they have been able to achieve a desirable combination of the other criteria.

This trade-off between performance and other nonfunctional requirements (e.g., extensibility, generality, portability) is a theme we will return to later. AppLeS is a good example of a system that tries to balance these varied concerns. It does so through a frameworks-based approach that is less concerned with providing specific functionality that meets certain requirements, and more concerned with providing an infrastructure within which a range of functionality can be instantiated.

Global Name Space

A metacomputing environment made up of hundreds or thousands of hosts requires an efficient, general, and usable means for addressing the hosts and other entities (e.g., distributed objects, remote processes, files, users, etc.) in the system. This capability, generally known as naming, is a fundamental requirement of distributed systems ([CDK94], p. 254). Coulouris et al. point out that since users, programmers, and system administrators all regularly refer to the components and services of a distributed system, names must be "readable by and meaningful to humans" ([CDK94], p. 255). For example, IP addresses (e.g., 128.223.8.39) provide a low-level name space for computer hosts (or, more precisely, for network interfaces), but are not general enough to handle all of the other objects mentioned above. Furthermore, IP addresses are awkward and difficult for most humans to remember. The use of logical IP names (e.g., foo.bar.edu) that are mapped into IP addresses is an improvement in this regard, and they also provide a degree of transparency. That is, a given IP name can, over time, refer to several different machines. But, name lookup operations can have a significant impact on system performance [CM89].

A system which translates a name into the attributes of a resource or object is called a name service ([CDK94], p. 253). One of the most widely used name services today is the Internet Domain Name System (DNS). DNS is a distributed service that responds primarily to queries for translating IP names into IP addresses. The DNS database is distributed across a collection of servers, and data is widely replicated and cached to address problems of scale ([CDK94], p. 271). DNS supports a name space that is tree-structured and partitioned both organizationally (e.g., .com, .edu, and .gov suffixes) and geographically (e.g., .us, .uk, and .fr suffixes).
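
The core operation such a service performs is name-to-address translation. On a POSIX system, for example, the query can be issued in a few lines of C; the host name used here is the same placeholder name used elsewhere in this chapter and will not actually resolve.

    /* A minimal DNS lookup, assuming a POSIX system: translate an IP name
       into an IP address, the basic query a name service like DNS answers. */
    #include <stdio.h>
    #include <string.h>
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <netdb.h>

    int main(int argc, char **argv) {
        const char *name = (argc > 1) ? argv[1] : "foo.bar.edu";
        struct addrinfo hints, *res;
        memset(&hints, 0, sizeof hints);
        hints.ai_family = AF_INET;        /* IPv4 only, for simplicity */

        if (getaddrinfo(name, NULL, &hints, &res) != 0) {
            fprintf(stderr, "lookup failed for %s\n", name);
            return 1;
        }
        struct sockaddr_in *sa = (struct sockaddr_in *)res->ai_addr;
        printf("%s -> %s\n", name, inet_ntoa(sa->sin_addr));
        freeaddrinfo(res);
        return 0;
    }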

While DNS provides a good example of a robust name service, it does not fully address the needs of a metacomputing environment. In particular, DNS only assists in locating physical hosts; metacomputing environments contain many other entities that require names. Thus, support of a global name space for metacomputing environments is an important consideration, yet surprisingly little effort has gone into this area. Many projects are adopting the now-common naming system used on the World Wide Web known as Uniform Resource Locators (URLs). URLs define a name space that is more generic than DNS (although, it uses DNS in many cases). This section proceeds by describing in more detail the name space defined by URLs. This is followed by a broader discussion of global name spaces and URL-based naming schemes as used in metacomputing projects.

Uniform Resource Locators (URLs). Since 1990, Uniform Resource Locators (URLs) have been the primary means of addressing, or naming, information on the World Wide Web [BMM94]. A URL is a textual name composed of two main parts: the scheme and the scheme-specific part. These two parts are delineated by a colon. The scheme typically specifies a network protocol, the most common of which is the Hypertext Transfer Protocol (HTTP). The scheme-specific part differs for each scheme. In the case of HTTP, it typically indicates the IP name of a web server and the directory path to a specific web document on that server. Many other schemes are supported. For example, the "file" scheme typically takes a directory path to a local file; the "mailto" scheme looks for an email address in the scheme-specific part; and the "news" scheme takes the name of a USENET newsgroup [BMM94]. New schemes are easily added to this system. For example, schemes for television ("tv") and telephone ("phone") are under consideration [Zigm96].
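
Because the scheme is always delimited by the first colon, splitting a URL into its two main parts is straightforward, as the small C sketch below shows. It is not a complete URL parser, and the example URLs are placeholders of our own choosing.

    /* Split a URL at the first colon into its scheme and scheme-specific part. */
    #include <stdio.h>
    #include <string.h>

    int split_url(const char *url, char *scheme, size_t slen, const char **rest) {
        const char *colon = strchr(url, ':');
        if (!colon || (size_t)(colon - url) >= slen) return -1;
        memcpy(scheme, url, colon - url);
        scheme[colon - url] = '\0';
        *rest = colon + 1;
        return 0;
    }

    int main(void) {
        const char *examples[] = {
            "http://foo.bar.edu/pub/readme.html",
            "mailto:someone@foo.bar.edu",
            "news:comp.parallel"
        };
        char scheme[16];
        const char *rest;
        for (int i = 0; i < 3; i++)
            if (split_url(examples[i], scheme, sizeof scheme, &rest) == 0)
                printf("scheme=%-7s rest=%s\n", scheme, rest);
        return 0;
    }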

The URL system is layered upon other name services and network protocols. For many schemes (e.g., http, news, ftp, telnet), DNS must be consulted to translate an IP name into an address. Once the target host is known, the transaction is carried out according to the designated protocol. For instance, the URL ftp://foo.bar.edu/pub/readme.txt first results in a DNS lookup of "foo.bar.edu." Once the network address is known, a connection to that machine using the standard File Transfer Protocol (FTP) ensues. This is followed by an FTP request for the file named "readme.txt" located in directory "pub."

This functionality is largely consistent with the role of a name service. As Coulouris et al. ([CDK94], p. 256) point out, the ability to unify the resources of different servers and services under a single naming scheme is a major motivation for keeping the naming process separate from other services. URLs provide this capability and further introduce the notion of a "universal set of names" [Bern94]. Berners-Lee defines a member of this set as a Universal Resource Identifier (URI); a URL is then more specifically defined as a URI "which expresses an address which maps onto an access algorithm using network protocols" [Bern94]. More recently, attempts to define Uniform Resource Names (URNs) would result in a name space that is complementary to URLs. URNs would provide a name space (and resolution protocols) for persistent object names [SM94]. Whereas URLs point to specific documents on a specific web server, a URN would have to be "resolved" (much like an IP name) to determine where the desired document resides. That is, URNs would provide a layer of abstraction over relatively low-level URLs.

Discussion. Given the wide acceptance of URLs in the context of the World Wide Web and the emphasis that metacomputing places on internetworking, it is not surprising that several metacomputing projects are attempting to leverage the ubiquity of URLs. The focus of this discussion will be to compare URL-based naming schemes for metacomputing environments with other possibilities. In particular, URLs will be compared to the global name service proposed by Cheriton and Mann [CM89].

Notable among the metacomputing systems currently using URLs in some form are Globus [FK96] and Atlas [BBB96]. The Atlas metacomputing project [BBB96] uses URLs to support a global file system. Atlas integrates a runtime library for scheduling and communication with Java, a programming language that has extended the limited fetch-and-display capabilities of web browsers to include the ability to perform computations. While few details are available, Atlas supports a global file system by providing special versions of standard Unix file I/O routines like fopen() that accept URLs as file names. The URLs are then translated into either local or remote file system accesses. The preliminary work only supports a read-only file system, but a coherent read/write version is planned for the future [BBB96]. The use of Java and URLs suggests a possible future integration with a web-based environment.
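
A URL-aware fopen() wrapper in this spirit might look like the following C sketch; the function name, the dispatch on scheme prefixes, and the stubbed-out remote case are our own assumptions rather than the actual Atlas implementation.

    /* Hypothetical URL-aware fopen() wrapper in the spirit of Atlas's
       global file system. */
    #include <stdio.h>
    #include <string.h>

    FILE *url_fopen(const char *url, const char *mode) {
        if (strncmp(url, "file:", 5) == 0)
            return fopen(url + 5, mode);          /* local file system access */
        if (strncmp(url, "http://", 7) == 0 || strncmp(url, "ftp://", 6) == 0) {
            /* A real implementation would fetch the remote object (read-only
               in the preliminary Atlas work) and return a stream over a
               local copy; here the remote case is left as a stub. */
            return NULL;
        }
        return fopen(url, mode);                  /* plain path: fall through */
    }

    int main(void) {
        FILE *f = url_fopen("file:/tmp/readme.txt", "r");
        if (f) fclose(f);
        return 0;
    }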

As we have described earlier, the Globus project has identified several component modules. Currently, the communication module exists as a system called Nexus [FKT96, FGKT97]. Nexus is a multithreaded communications library that supports an abstract communication model based on nodes, contexts, threads, communication links, and remote service requests [FKT96]. Contexts and nodes roughly correspond to processes and processors, respectively, and a context consists of potentially many threads. In Nexus, communications (i.e., remote service requests) occur between contexts. Higher-level services can manipulate these abstractions through the Globus communications interface.

For two contexts (i.e., multithreaded processes) to communicate, one of them must "attach" to the other [FGKT97, FT96]. The name space of Nexus contexts is accessed through Uniform Resource Locators (URLs). Thus, a Nexus URL naming host "foo.bar.edu" and port 1234 indicates that there is a Nexus server running on that host and listening on that port. Nexus URLs define a global name space for Nexus contexts. While the translation of these names is performed by Nexus itself, standardizing on URLs to define this name space means that Nexus functionality can more easily be integrated into web-based environments. In fact, in an effort to support "ubiquitous supercomputing" similar to the "universal access" supported by the World Wide Web, Foster and Tuecke [FT96] are integrating Nexus with Java. The Atlas project mentioned earlier has a similar goal. Nexus' robust support for multiple communication methods greatly extends Java's limited capabilities in this area and creates the possibility of using Java for high-performance, heterogeneous computations [FT96].

In addition, Globus uses a separate URL scheme in its remote file and data access module [FK96]. URLs for remote file access are of the form x-rio://foo.bar.edu:5678/path/to/file, where "x-rio" is the name of the scheme, "foo.bar.edu:5678" indicates the host and port number of a remote file server, and the rest of the URL is the directory path to a file on the remote host. So, Globus uses two separate URL schemes to support one global name space for contexts and another global name space for files.

Thus, Nexus and Atlas demonstrate that a primary reason for using URLs is the prospect of increasing ubiquity by integrating metacomputing into the World Wide Web environment. But if any metacomputing project seeks ubiquity, it is the Legion project [GW96]. Yet, Legion is not embracing the URL mechanism to support its name space(s). As we have seen, Globus uses separate URL schemes for inter-context communication and for remote (parallel) file system access via the respective Globus modules; Atlas uses URLs to support a global file system and a completely separate (runtime) mechanism for interprocess communication. But Legion, with its pervasive object-orientation, is providing a significantly more unified approach in which everything, processes and files alike, is an object. Grimshaw and Wulf [GW96] note that files are simply objects that reside on a disk. To this end, the Legion project is building its own global file system that will support a high-performance, object-based interface to files.

We stated previously that a key goal of naming is unification ([CDK94], p. 256). Through its object-oriented approach, Legion essentially supports an even higher degree of unification than URLs can. That is, whereas Nexus URLs clearly distinguish the type of object being accessed (i.e., by the scheme and syntax of the URL), Legion names may not. One interacts with a Legion object in the same way (i.e., by invoking methods) whether it is bound to a file, a process, another user, or something else in the system. While the available methods may differ across object types, the syntax for naming them does not.

Legion's ability to provide a higher degree of name unification over URLs is a product of its object-orientation. Object-orientation facilitates the creation of a global name space through its properties of abstraction, inheritance, and encapsulation. URLs, as mentioned above, do not provide these capabilities, and efforts to provide more abstract name services (e.g., URNs) are underway [SM94]. The URL system has other shortcomings which can be revealed through a comparison with the naming system proposed by Cheriton and Mann [CM89].

Cheriton and Mann [CM89] propose a decentralized, global naming service that attempts to provide both performance and fault tolerance. The naming architecture consists of three levels: global, administrational, and managerial. As we will see, these levels have a desirable correspondence to the organization of metacomputing environments that span multiple administrative domains. The global level represents the organizations and groups of organizations that are covered by the naming service. Next, entries at the administrational level are owned and managed by a particular organization. Finally, managerial entries correspond to actual objects (e.g., directories, files, processes, people, etc.) and are called "object managers" [CM89]. Names in this system follow a simple, consistent syntax. Figure 8 contains an example of this syntax, labels the levels in the naming hierarchy, and shows the (approximate) corresponding URL.

[Figure 8]
FIGURE 8. A comparison between the name spaces supported by Cheriton and Mann's naming service and commonly used Uniform Resource Locators (URLs).

Immediately, an advantage of this scheme (and a shortcoming of URLs) is revealed. In particular, it provides a higher-level abstraction for naming. Whereas URLs contain the names of explicit host machines, this naming scheme abstracts away from particular machines and creates a logical, top-down approach to naming. A related advantage is that the naming syntax is consistent at all levels of the hierarchy. URLs (under the http and several other schemes) use two different formats (i.e., the hostname element is represented in dot notation while directory paths are in path notation). Finally, URLs also include a separate designation for the protocol (e.g., http, ftp, etc.). Because Cheriton and Mann's global naming system is object-oriented, a protocol designation is not necessary (making it similar to Legion in that respect); object managers support a common, method-based interface for requesting the information they manage [CM89].3

In summary, the potentially dynamic nature of a large metacomputing system requires a consistent and robust global naming service. Comparing the URL system with Cheriton and Mann's system reveals some of the potential problems that metacomputing implementers may encounter. Efforts to augment the URL naming system with more persistent, abstract names (e.g., URNs) may someday provide metacomputing with a reasonable compromise between achieving ubiquity and the quality, transparency, and consistency of more robust naming systems.

Conclusion

The convergence of parallel and distributed computing has created many new, challenging research areas. Many of these challenges exist in the area of heterogeneous computing. To understand those challenges is to understand many of the challenges of metacomputing. Perhaps more revealing, though, is a survey of past and current efforts. The NCSA metacomputing effort was clearly instrumental in defining the vision of metacomputing and actually building an early system. That system, though, lacked the extensive software support that is now widely recognized as being required. Current efforts at creating that support are varied. While the PVM and HeNCE effort is building on proven, existing technology, the Legion project is creating a completely new system from the ground up. In between these two, Globus is using a modular approach that incorporates existing technology where applicable, and creates new functionality when necessary. Regardless of the methodology, a survey of these systems reveals a number of challenges, many of which are being addressed by the metacomputing community. Building metacomputing testbeds, collecting real-time system information, scheduling applications, and supporting a global name space are just a few of the challenges metacomputing faces. These specific challenges often have firm roots in parallel and distributed computing. By comparing the efforts of the metacomputing community with other projects and related research, we gain insight into the nature of metacomputing and the extent of the challenges it faces.

