Performance Measurement and Modeling

A CCA application is composed of components, and the composite performance of a component assembly is determined by the performance of the individual components as well as the efficiency of their interaction. The performance of a component must therefore be considered in a context consisting of the problem being solved (e.g., a component may have to perform two functions, one requiring sequential access and the other strided access of an array), the parameters or arguments passed to a method (e.g., the length of an array), and the interaction between the caller and the callee (e.g., whether the data storage must be transformed). If multiple implementations of a component exist (i.e., implementations that provide the same functionality), then within a given context there will be an optimal choice of implementation. Making that choice requires that performance models be available for all components and that a means of generating a composite model exist.
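To make the idea of choosing an implementation within a context concrete, the following is a minimal sketch rather than the method used in this work. It assumes that a performance model can be written as a function from a context to a predicted time, and that a composite model is simply the sum of the component models plus an interaction cost. The names `Context`, `composite_time`, `best_implementation`, the two example models, and all coefficients are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable

# Hypothetical context: the array length passed to the method and whether
# the caller's data layout must be transformed for the callee.
@dataclass(frozen=True)
class Context:
    n: int
    needs_layout_transform: bool

# A performance model maps a context to a predicted time in seconds.
Model = Callable[[Context], float]

def composite_time(component_models: Iterable[Model],
                   interaction_model: Model,
                   ctx: Context) -> float:
    """Assumed composite model: component times plus interaction cost."""
    return sum(m(ctx) for m in component_models) + interaction_model(ctx)

def best_implementation(candidates: Dict[str, Model], ctx: Context) -> str:
    """Return the name of the candidate with the lowest predicted time."""
    return min(candidates, key=lambda name: candidates[name](ctx))

# Made-up models for two implementations of the same functionality.
def impl_a(ctx: Context) -> float:
    return 1e-8 * ctx.n            # no fixed cost, higher per-element cost

def impl_b(ctx: Context) -> float:
    return 1e-3 + 5e-9 * ctx.n     # fixed setup cost, lower per-element cost

def layout_cost(ctx: Context) -> float:
    # Made-up cost of transforming the data storage between caller and callee.
    return 2e-9 * ctx.n if ctx.needs_layout_transform else 0.0

candidates = {"impl_a": impl_a, "impl_b": impl_b}
small = Context(n=1_000, needs_layout_transform=True)
large = Context(n=1_000_000, needs_layout_transform=True)
choice_small = best_implementation(candidates, small)
choice_large = best_implementation(candidates, large)
```

With these made-up coefficients the preferred implementation changes with the array length, which is the sense in which the optimal choice depends on the context.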

Most scientific components intersperse compute-intensive phases with message-passing calls, which incur costs inversely proportional to the network speed. These calls sometimes involve global reductions and barriers, resulting in additional synchronization costs. For the purposes of this paper we assume blocking communications, in which communication and computation are not overlapped, and we ignore disk I/O. To construct a performance model for a component, we therefore require the following:

1. The total execution time spent in a method call. These methods are those in the ProvidesPorts of a component.
2. The total time spent in message passing calls, as determined by the total inclusive time spent in MPI during a method invocation.
3. The difference between the above is the time spent in computation, a quantity sensitive to the cache-hit rate. We will record this quantity for the period of the method call.
4. The input parameters that affect performance. These typically involve the size of the data being passed to the component and some measure of repetitive operations that might need to be done (e.g., the number of times a smoother may be applied in a multigrid solution).
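The quantities listed above can be held in one record per method invocation. The sketch below is illustrative only: the class name `MethodMeasurement` and its field names are hypothetical, and the sample values are made up. It shows the computation time being derived as the difference between the total time and the MPI time, which is valid under the stated assumption that communication and computation do not overlap.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class MethodMeasurement:
    method: str        # name of the method on the ProvidesPort
    total_time: float  # total time spent in the method call, in seconds
    mpi_time: float    # inclusive time spent in MPI during the call, in seconds
    params: Dict[str, float] = field(default_factory=dict)  # performance-relevant inputs

    @property
    def compute_time(self) -> float:
        # Assumes blocking communication with no overlap of
        # communication and computation.
        return self.total_time - self.mpi_time

# Made-up sample: a smoother applied 3 times to an array of 1024 elements.
sample = MethodMeasurement(
    method="smooth",
    total_time=2.0,
    mpi_time=0.5,
    params={"n": 1024, "sweeps": 3},
)
```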

The first three requirements are traditional and may be obtained from publicly available tools [18]. The fourth requires some knowledge of the algorithms being implemented and is extracted by a proxy before the method invocation is forwarded to the component. We envisage that proxies will be simple and, preferably, amenable to automatic generation.
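The role of the proxy can be sketched as follows. This is not the CCA proxy itself; it is a small Python illustration in which `MeasuringProxy`, the `extractors` mapping, the `mpi_clock` callable, and the `Smoother` component are all hypothetical. The proxy extracts the performance-relevant parameters, forwards the call, and records the timings. The `mpi_clock` argument stands in for whatever measurement tool reports cumulative MPI time and defaults to zero here.

```python
import time
from typing import Any, Callable, Dict, List

class MeasuringProxy:
    """Forwards method calls to a component and records one entry per call."""

    def __init__(self, component: Any,
                 extractors: Dict[str, Callable[..., Dict[str, Any]]],
                 mpi_clock: Callable[[], float] = lambda: 0.0):
        self._component = component
        self._extractors = extractors  # method name -> parameter extractor
        self._mpi_clock = mpi_clock    # hypothetical cumulative MPI time source
        self.records: List[Dict[str, Any]] = []

    def __getattr__(self, name: str) -> Any:
        target = getattr(self._component, name)
        extractor = self._extractors.get(name)
        if extractor is None or not callable(target):
            return target  # not monitored: pass through unchanged

        def forward(*args: Any, **kwargs: Any) -> Any:
            # Extract parameters before the invocation is forwarded.
            params = extractor(*args, **kwargs)
            mpi_start = self._mpi_clock()
            start = time.perf_counter()
            try:
                return target(*args, **kwargs)
            finally:
                total = time.perf_counter() - start
                mpi = self._mpi_clock() - mpi_start
                self.records.append({
                    "method": name,
                    "params": params,
                    "total_time": total,
                    "mpi_time": mpi,
                    "compute_time": total - mpi,
                })
        return forward

# Hypothetical component: applies a damping sweep a given number of times.
class Smoother:
    def smooth(self, data: List[float], sweeps: int) -> List[float]:
        for _ in range(sweeps):
            data = [0.5 * x for x in data]
        return data

proxy = MeasuringProxy(
    Smoother(),
    {"smooth": lambda data, sweeps: {"n": len(data), "sweeps": sweeps}},
)
result = proxy.smooth([1.0, 2.0], 2)
```

Because each extractor is a one-line function of the method's arguments, a proxy of this shape is the kind of code that could plausibly be generated automatically from a port's method signatures.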

2003-11-05