A CCA application is composed of components and the composite performance of a component assembly is determined by the performance of the individual components as well as the efficiency of their interaction. Thus, the performance of a component has to be considered in a certain context consisting of the problem being solved (e.g., a component may have to do two functions, one which requires sequential access and the other strided access of an array), the parameters/arguments being passed to a method (e.g., length of an array) and the interaction between the caller and the callee (e.g., if a transformation of the data storage needs to be done). If multiple implementations of a component exist (i.e., implementations which provide the same functionality) then within a given context, there will be an optimal choice of implementation. This requires that performance models be available for all components and a means to generate a composite model exist.
Most scientific components intersperse compute intensive phases with message passing calls, which incur costs inversely proportional to the network speed. These calls sometimes involve global reductions and barriers, resulting in additional synchronization costs. For the purposes of this paper we will assume blocking communications where communications and computations are not overlapped. We will ignore disk I/O in this study. Thus, in order that a performance model for a component may be constructed, we require the following :
The first three requirements are traditional and may be obtained from publicly available tools [18]. The fourth requires some knowledge of the algorithms being implemented, and is extracted by a proxy before the method invocation is forwarded to the component. We envisage that proxies will be simple and preferably, amenable to automatic generation.