

Computational Quality of Service

Quality of service is often associated with ways of implementing application priority or bandwidth reservation in networking. Here, computational quality of service (CQoS) refers to the automatic selection and configuration of components to suit a particular computational purpose. While CBSE helps to partition complexity in parallel simulations, it also presents problems of its own. For example, if data is distributed across all participating processors (Fig. 1), each component must deal with the distributed data as it is presented; it is almost never efficient to redecompose the problem optimally for each component. If the components were complete black boxes, there would be no mechanism for optimizing this decomposition across all of the components that interact with the data. If, however, metadata is provided, either as part of the static information associated with the component repository or as dynamic information computed at run time, a ``resource-controller'' component can configure its peer components by taking the global situation into consideration (see Fig. 2). This special-purpose component interprets mechanistic, performance, or dependency metadata provided by its peer components to arrive at an optimal configuration within the context of an entire application or a local container component. For more information on CCA containers, see [10].

Figure 2: CQoS component organization.
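To make the resource-controller idea concrete, the following Python sketch shows one way such a component might negotiate a single data decomposition from peer-supplied metadata. The class names, metadata fields, and negotiation rule are hypothetical illustrations, not part of the CCA specification.

class Component:
    """A peer component exposing static metadata and a configuration hook."""
    def __init__(self, name, metadata):
        self.name = name
        self.metadata = metadata

    def configure(self, settings):
        print(f"{self.name} configured with {settings}")

class ResourceController:
    """Chooses one global data decomposition acceptable to all peers."""
    def __init__(self, peers):
        self.peers = peers

    def negotiate_decomposition(self):
        # Intersect the block sizes every peer supports, then pick the
        # one with the lowest total estimated cost across all peers.
        common = set.intersection(
            *(set(p.metadata["supported_block_sizes"]) for p in self.peers))
        best = min(common,
                   key=lambda b: sum(p.metadata["cost"][b] for p in self.peers))
        for p in self.peers:
            p.configure({"block_size": best})

solver = Component("LinearSolver",
                   {"supported_block_sizes": [32, 64, 128],
                    "cost": {32: 3.0, 64: 1.5, 128: 2.0}})
integrator = Component("Integrator",
                       {"supported_block_sizes": [64, 128],
                        "cost": {64: 2.0, 128: 1.0}})
ResourceController([solver, integrator]).negotiate_decomposition()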

This approach not only addresses the decomposition problem described above but also presents new opportunities, chief among them the ability to dynamically replace poorly performing components. Component concepts help to manage complexity by providing standard building blocks; these concepts also enable a degree of automation at a high level. Here we describe how CBSE in scientific computing provides opportunities to automate scientific simulations for better performance and accuracy.

CQoS metadata may be used to compose or dynamically adapt an application. A detailed design of an infrastructure for managing CQoS-based component application execution was proposed in [11]. The CCA provides the key technologies on which CQoS depends, including component behavior metadata and component proxies for performance modeling or dynamic substitution. By associating CQoS metadata with a component's uses and provides ports, one can effectively express that component's CQoS requirements and capabilities.
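As a rough illustration of attaching metadata to ports, the sketch below records CQoS capabilities on provides ports and requirements on uses ports. The descriptor classes and attribute names are hypothetical; actual CCA ports are defined through SIDL and are considerably richer.

class PortMetadata:
    """A bag of CQoS attributes attached to one port."""
    def __init__(self, **attrs):
        self.attrs = attrs

class ComponentDescriptor:
    def __init__(self, name):
        self.name = name
        self.provides = {}     # port name -> capabilities offered
        self.uses = {}         # port name -> requirements demanded

    def add_provides(self, port, **capabilities):
        self.provides[port] = PortMetadata(**capabilities)

    def add_uses(self, port, **requirements):
        self.uses[port] = PortMetadata(**requirements)

desc = ComponentDescriptor("GMRESSolver")
desc.add_provides("LinearSolverPort", tolerance=1e-8, symmetric_only=False)
desc.add_uses("PreconditionerPort", max_setup_seconds=5.0)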

CQoS employs global information about a simulation's composition and its environment so that sound choices of component implementations and parameters can be made. Building a comprehensive CQoS infrastructure, one that spans the algorithms and parallel, decomposed data common to scientific simulations, is an enormous task but, given the need to automate the cooperation of algorithmically disparate components, a necessary one. The research reported in the remainder of this section is a first step toward this aim and thus addresses problems of immediate interest to the scientific simulation community.

Performance Measurement and Monitoring. The factors that affect component performance are many and component dependent. To evaluate component CQoS, one must have a performance system capable of measuring and reporting metrics of interest. We have developed a performance monitoring capability for the CCA that uses the TAU parallel performance system [12] to collect data for assessing a component's performance metrics, both to understand the performance space relative to those metrics and to observe the metrics during execution. After performance data have been accumulated, performance models for single components or entire applications can be constructed. An accurate performance model of the entire application can enable automated optimization of the component assembly process.
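The sketch below illustrates the measurement idea in generic Python, using wall-clock timers rather than TAU's instrumentation: each invocation of a component method is timed and recorded, keyed by its performance parameters, so that models can later be fitted to the data. All names here are illustrative.

import time
from collections import defaultdict

# (component, method, parameters) -> list of per-invocation times
measurements = defaultdict(list)

def monitored(component, method, params_of):
    """Decorator: time each invocation and file it under its parameters."""
    def wrap(fn):
        def inner(*args, **kwargs):
            t0 = time.perf_counter()
            result = fn(*args, **kwargs)
            key = (component, method, params_of(*args))
            measurements[key].append(time.perf_counter() - t0)
            return result
        return inner
    return wrap

@monitored("LinearSolver", "solve", params_of=lambda A, b: len(b))
def solve(A, b):
    return [x * 0.5 for x in b]   # placeholder for the real computation

solve(None, [1.0, 2.0, 3.0])
# measurements now maps ("LinearSolver", "solve", 3) to one timing.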

Automated Application Assembly. CCA scientific simulation codes are assemblies of components created at runtime. If multiple implementations of a component exist (i.e., implementations that can transparently replace one another), it becomes possible to construct an ``optimal'' CCA code by choosing the ``best'' implementation of each component, with added consideration for the overhead of any data transformations that the choice may necessitate. This construction requires the specification of quality attributes with which to discriminate among component implementations. In this discussion, we focus on execution time as the discriminant.
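Assuming a fitted performance model is available for each implementation, the selection step can be sketched as follows; the class, the cost numbers, and the layout-transformation cost function are all hypothetical.

from dataclasses import dataclass
from typing import Callable

@dataclass
class Implementation:
    name: str
    required_layout: str                   # data layout this version expects
    predict_time: Callable[[int], float]   # fitted model: size -> seconds

def choose_implementation(implementations, problem_size, transform_cost):
    """Pick the lowest predicted total time, charging each candidate for
    any data transformation its required layout would force."""
    return min(implementations,
               key=lambda impl: impl.predict_time(problem_size)
                                + transform_cost(impl.required_layout))

impls = [
    Implementation("dense-lu", "block", lambda n: 2e-9 * n**3),
    Implementation("sparse-gmres", "csr", lambda n: 5e-7 * n**1.5),
]
best = choose_implementation(
    impls, 10_000,
    transform_cost=lambda layout: 0.1 if layout == "csr" else 0.0)
# best.name == "sparse-gmres" at this problem size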

Performance data can be measured and recorded transparently via the proxy-based system described in [13]. Component interface invocations are recorded, resulting in a call graph for the application. The net result of a fully instrumented run is a set of data files containing performance parameters and execution times for every invocation of an instrumented component, as well as a call graph whose nodes represent components, weighted by each component's execution time.
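A minimal sketch of such a proxy follows, under the assumption that a proxy object can be interposed between a uses port and a provides port; the interfaces are illustrative rather than those of the system in [13].

import time

class CallGraph:
    """Nodes are components weighted by cumulative execution time."""
    def __init__(self):
        self.node_time = {}        # component name -> seconds
        self.edges = set()         # (caller, callee) pairs

    def record(self, caller, callee, elapsed):
        self.node_time[callee] = self.node_time.get(callee, 0.0) + elapsed
        self.edges.add((caller, callee))

class Proxy:
    """Interposes on a provides port and times every method call."""
    def __init__(self, caller, callee_name, callee, graph):
        self._caller, self._name = caller, callee_name
        self._callee, self._graph = callee, graph

    def __getattr__(self, method):
        target = getattr(self._callee, method)
        def timed(*args, **kwargs):
            t0 = time.perf_counter()
            result = target(*args, **kwargs)
            self._graph.record(self._caller, self._name,
                               time.perf_counter() - t0)
            return result
        return timed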

Performance models are created through regression analysis of the data collected by this infrastructure. The call graph is also processed to expose the ``cores,'' the components that are significant from the perspective of execution time. This processing is done by traversing the call tree and pruning branches whose execution time is an order of magnitude less than the inclusive time of the nodes where they are rooted. Because component performance models can be constructed from performance data collected from unrelated runs or from unit tests, the number of models scales, at worst, as the total number of component implementations. The final composite model for a component assembly reduces to a summation over the performance models of the components in the cores. At any point before or during the simulation, the performance models of the available component implementations can be evaluated for the problem's size to obtain the predicted execution time of any component assembly prior to choosing the optimal set. Once an optimal set of components has been identified, the performance modeling and optimization component, named Mastermind, modifies the existing component assembly through the BuilderService interface introduced in Section 2.
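The pruning rule and the composite model can be sketched directly; the tree structure below is illustrative, and the factor of ten mirrors the order-of-magnitude threshold described above.

class Node:
    """One call-tree node: a component with its inclusive execution time."""
    def __init__(self, name, inclusive_time, children=()):
        self.name = name
        self.inclusive_time = inclusive_time
        self.children = list(children)

def prune(node, factor=10.0):
    """Drop subtrees an order of magnitude cheaper than their root."""
    node.children = [c for c in node.children
                     if c.inclusive_time * factor >= node.inclusive_time]
    for c in node.children:
        prune(c, factor)
    return node

def predict_assembly_time(cores, models, problem_size):
    """Composite model: sum the per-component model predictions."""
    return sum(models[name](problem_size) for name in cores)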

Adaptive Polyalgorithmic Solvers. While application assembly is typically done once, before a scientific simulation starts, the same set of component implementations often does not satisfy CQoS requirements throughout the application's entire execution. Many fundamental problems in scientific computing have several competing solution methods, which differ in quality attributes such as computational cost, reliability, and stability. For example, the solution of large-scale, nonlinear PDE-based simulations often depends on the performance of sparse linear solvers. Many different methods and implementations exist, and each method can be viewed as reflecting a certain tradeoff among several metrics of performance and reliability. Even with a limited set of metrics, it is often neither possible nor practical to predict the ``best'' algorithm choice for a given problem. We are in the initial stages of investigating dynamic, CQoS-enhanced adaptive multimethod linear solvers, used in the context of solving a nonlinear PDE via a pseudo-transient Newton-Krylov method. Depending on the problem, the linear systems solved in the course of the nonlinear solution can have different numerical properties; thus, a single linear solution method may not be appropriate for the entire simulation. As explained in detail in [14], the adaptive scheme uses a different linear solver during each of the three phases of the pseudo-transient Newton-Krylov algorithm, leading to increased robustness and potentially better overall performance.
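In the spirit of the scheme in [14], the sketch below maps solution phases to solver choices and classifies the current phase from residual behavior. The phase boundaries, solver names, and parameters are placeholders, not the tuned values from that work.

def select_linear_solver(phase):
    """Map a pseudo-transient continuation phase to a solver configuration."""
    return {
        "initial":    ("gmres",    {"restart": 30, "pc": "bjacobi"}),
        "transition": ("bicgstab", {"pc": "ilu"}),
        "terminal":   ("gmres",    {"restart": 60, "pc": "ilu"}),
    }[phase]

def classify_phase(residual_norm, initial_norm):
    """Crude heuristic based on how far the nonlinear residual has fallen."""
    if residual_norm > 1e-1 * initial_norm:
        return "initial"
    if residual_norm > 1e-4 * initial_norm:
        return "transition"
    return "terminal"

# Inside the nonlinear iteration one might then write:
#   solver, options = select_linear_solver(classify_phase(rnorm, rnorm0))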

