

Related Work

The three most widely used component standards (CORBA [7], COM/DCOM [12], Java Beans [6]) are ill-suited to high-performance scientific computing due to their lack of support for efficient parallel communication, insufficient scientific data abstractions (e.g., complex numbers), and/or limited language interoperability [9]. Consequently, performance metrics developed for these environments are inadequate for HPC. In the serial environment these commercial component models target, there is little reason for the design to account for details of hardware and memory-hierarchy performance, yet this is a critical requirement in HPC. These distributed frameworks and component models (e.g., DCOM, CORBA CCM) often use commodity networking to connect components, which is entirely inadequate for HPC. In a distributed environment, metrics such as round-trip time and network latency are considered useful, while quantities such as bisection bandwidth, message-passing latency, and synchronization cost, which form the basis of much of the research in HPC, are left unaddressed. This difference arises primarily from the very different platforms on which HPC and commercial component-based applications run: HPC is done almost exclusively on tightly connected clusters of MPPs (massively parallel processors) or SMPs (symmetric multiprocessors), while commercial codes often operate on LANs (local area networks) or WANs (wide area networks).

Nevertheless, despite the different semantics, several research efforts based on these standards offer viable strategies for measuring performance. A performance monitoring system for the Enterprise Java Beans standard is described in [13]. For each component to be monitored, a proxy is created with the same interface as the component. The proxy intercepts all method invocations and notifies a monitor component before forwarding each invocation to the component. The monitor handles the notifications and selects the data to present, either to a user or to another component (e.g., a visualizer component). The goal of this monitoring system is to identify hot spots or components that do not scale well.
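
The interception pattern described above can be sketched in a few lines of C++. The Solver interface, the timing of each call, and the monitor's notify method are illustrative assumptions, not details taken from [13]:

#include <chrono>
#include <iostream>
#include <string>

// Illustrative component interface; the actual monitored interfaces
// in [13] are Enterprise Java Beans, not C++ classes.
struct Solver {
    virtual ~Solver() = default;
    virtual void solve(int n) = 0;
};

// Receives notifications from proxies and records measurement data.
struct Monitor {
    void notify(const std::string& method, double seconds) {
        std::cout << method << " took " << seconds << " s\n";
    }
};

// The proxy exposes the same interface as the component, intercepts
// each invocation, times it, notifies the monitor, and forwards the
// call to the real component.
struct SolverProxy : Solver {
    SolverProxy(Solver& target, Monitor& monitor)
        : target_(target), monitor_(monitor) {}

    void solve(int n) override {
        auto start = std::chrono::steady_clock::now();
        target_.solve(n);                       // forward to the real component
        std::chrono::duration<double> dt =
            std::chrono::steady_clock::now() - start;
        monitor_.notify("Solver::solve", dt.count());
    }

private:
    Solver& target_;
    Monitor& monitor_;
};

Because the proxy implements the same interface as the component it wraps, it can be substituted wherever the component is used, so client code requires no changes.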

The Wabash tool [14,15] is designed for pre-deployment testing and monitoring of distributed CORBA systems. Because of the distributed nature of such systems, Wabash groups components into regions based on geographical location. An interceptor is created in the same address space as each server object (i.e., a component that provides services) and manages all incoming and outgoing requests to the server. A manager component is responsible for querying the interceptors for data retrieval and event management.
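
A minimal sketch of this manager/interceptor relationship appears below; the RequestRecord structure, the region keys, and the report method are hypothetical, as Wabash's actual data and event model is considerably richer:

#include <cstddef>
#include <iostream>
#include <map>
#include <string>
#include <vector>

// Hypothetical record of one intercepted request.
struct RequestRecord {
    std::string server;    // server object that handled the request
    double latency_ms;
};

// One interceptor lives in the address space of each server object
// and accumulates the requests it has managed.
struct Interceptor {
    std::vector<RequestRecord> records;
    void onRequest(const std::string& server, double latency_ms) {
        records.push_back({server, latency_ms});
    }
};

// The manager groups interceptors by geographical region and queries
// them for data retrieval.
struct Manager {
    std::map<std::string, std::vector<const Interceptor*>> regions;

    void report() const {
        for (const auto& [region, interceptors] : regions) {
            std::size_t total = 0;
            for (const Interceptor* i : interceptors)
                total += i->records.size();
            std::cout << region << ": " << total << " requests\n";
        }
    }
};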

In the work done by the Parallel Software Group at the Imperial College of Science in London [16,17], the research is focused on grid-based component computing, but performance is likewise measured through the use of proxies. Their performance system is designed to automatically select the optimal implementation of the application based on performance models and available resources. With $n$ components, each having $C_{i}$ implementations, there is a total of $\prod_{i=1}^{n} C_{i}$ implementation combinations to choose from. The performance characteristics and a performance model for each component are constructed by the component developer and stored in the component repository. Their approach uses the proxies to simulate an application in order to determine its call-path; the simulation skips the implementations of the components by using the proxies. Once the call-path is determined, a recursive composite performance model is created by examining the behavior of each method call in the call-path. To ensure that the composite model is implementation-independent, a variable is used in the model wherever there is a reference to an implementation. To evaluate the model, a specific implementation's performance model replaces the variables, and the composite model returns an estimated execution time or estimated cost (based on some hardware resource model). The implementation combination with the lowest execution time or lowest cost is then selected, and an execution plan is created for the application.
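
The selection step can be illustrated with a short C++ sketch that enumerates all $\prod_{i=1}^{n} C_{i}$ combinations and picks the cheapest. The PerfModel type and the additive composite cost are simplifying assumptions made here for brevity; the actual system evaluates recursive composite models built from the call-path rather than a simple sum:

#include <cstddef>
#include <functional>
#include <limits>
#include <vector>

// Hypothetical per-implementation performance model: returns an
// estimated cost (e.g., execution time) for a given problem size.
using PerfModel = std::function<double(std::size_t)>;

// One component with several candidate implementations.
struct Component {
    std::vector<PerfModel> implementations;
};

// Exhaustively evaluate every combination of implementations and
// return the indices of the cheapest one. Assumes the composite cost
// is additive over components, a simplification of [16,17].
std::vector<std::size_t> selectCheapest(const std::vector<Component>& comps,
                                        std::size_t problemSize) {
    std::vector<std::size_t> choice(comps.size(), 0), best;
    double bestCost = std::numeric_limits<double>::infinity();
    while (true) {
        double cost = 0.0;
        for (std::size_t i = 0; i < comps.size(); ++i)
            cost += comps[i].implementations[choice[i]](problemSize);
        if (cost < bestCost) { bestCost = cost; best = choice; }
        // Advance a mixed-radix counter over implementation indices.
        std::size_t i = 0;
        for (; i < comps.size(); ++i) {
            if (++choice[i] < comps[i].implementations.size()) break;
            choice[i] = 0;
        }
        if (i == comps.size()) break;   // every combination enumerated
    }
    return best;
}

Note that the number of combinations grows multiplicatively with each added component, which is why cheap performance models are evaluated in place of actually executing each candidate implementation.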

