We have proposed a software infrastructure for performance measurement in HPC component environments. Our prototype implementation was used to collect performance data for a scientific simulation and to construct performance models. While the data collected is no different from what is required in traditional HPC, the measurement system must be compatible with component software development methods, and new strategies, such as proxies, must be adapted from other component-based environments. Proxies can be generated automatically from a component's header if their sole purpose is to time the component's execution. For performance modeling, however, one frequently needs to record certain inputs to the component. Proxies are the logical place to extract this information before forwarding the component invocation, but this requires that the relevant arguments be identifiable during proxy creation. We are currently investigating simple mark-up approaches for identifying the arguments and parameters that affect performance and must therefore be extracted and recorded.
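As a minimal sketch of the proxy idea described above, the following Python fragment wraps a component, times each invocation, and records only those arguments the component marks as performance-relevant. All names here (TimingProxy, Solver, the perf_args attribute used as the mark-up) are illustrative assumptions, not the actual framework's API.

```python
import time


class TimingProxy:
    """Forwards calls to a wrapped component, recording the wall-clock
    time of each invocation and any arguments the component marks as
    affecting performance."""

    def __init__(self, component):
        self._component = component
        self.records = []  # (method name, recorded args, elapsed seconds)

    def __getattr__(self, name):
        target = getattr(self._component, name)
        # Mark-up convention (an assumption): each method lists its
        # performance-relevant parameter names in a `perf_args` attribute.
        perf_args = getattr(target, "perf_args", ())

        def wrapper(*args, **kwargs):
            recorded = {k: kwargs[k] for k in perf_args if k in kwargs}
            start = time.perf_counter()
            result = target(*args, **kwargs)  # forward the invocation
            elapsed = time.perf_counter() - start
            self.records.append((name, recorded, elapsed))
            return result

        return wrapper


# Example component with one marked-up method (hypothetical).
class Solver:
    def solve(self, *, n, tol):
        return sum(i * i for i in range(n))  # stand-in for real work
    solve.perf_args = ("n",)  # only `n` affects performance, not `tol`


proxy = TimingProxy(Solver())
proxy.solve(n=10000, tol=1e-6)
method, recorded, elapsed = proxy.records[0]
```

Because the proxy exposes the same interface as the wrapped component, it can be interposed transparently, which is what makes automatic generation from the component's header feasible.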
The problem of performance modeling remains unsolved. The models derived here are valid only on a similar cluster: any significant change, such as halving the cache size, will have a large effect on the coefficients of the models, though the functional form is expected to remain unchanged. Ideally, the coefficients should be parameterized by processor speed and a cache model. We will address this in future work, employing the cache information collected during these tests.
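To make the coefficient/form distinction concrete, here is a hedged sketch of fitting machine-specific coefficients of a fixed functional form by least squares. The form t(n) = a + b·n + c·n·log(n) and the synthetic timings are illustrative only; the paper's actual models may differ.

```python
import numpy as np


def fit_model(sizes, times):
    """Least-squares fit of t(n) = a + b*n + c*n*log(n).

    The functional form is fixed; only the coefficients (a, b, c)
    change when the model is re-fitted on a different machine."""
    n = np.asarray(sizes, dtype=float)
    A = np.column_stack([np.ones_like(n), n, n * np.log(n)])
    coeffs, *_ = np.linalg.lstsq(A, np.asarray(times, dtype=float), rcond=None)
    return coeffs


# Synthetic timings generated from known coefficients a=1.0, b=2.0, c=0.5,
# standing in for measurements collected on one particular cluster.
sizes = [10, 100, 1000, 10000]
true_time = lambda n: 1.0 + 2.0 * n + 0.5 * n * np.log(n)
coeffs = fit_model(sizes, [true_time(n) for n in sizes])
```

Re-running the same fit against timings from a different machine would yield different coefficients, which is why a coefficient set is only valid on a similar cluster.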
The ultimate aim of performance modeling is to construct a composite performance model and thereby optimize a component assembly. Apart from performance models, this requires multiple implementations of a given functionality (so that there are alternates to choose from) and a call trace from which the inter-component interactions may be derived. The wiring diagram (available from the framework), together with the call trace (detected and recorded by the performance infrastructure), can be used by the Mastermind to create a composite performance model whose variables are the individual performance models of the components themselves. Figure 10 shows a schematic of how such a system may construct an abstract dual of the application, represented as a directed graph. Edge weights signify the number of invocations, and vertices are weighted by the compute and communication times predicted by the performance models of the component implementations. The caller-callee relationship is preserved so that subgraphs that are insignificant from a performance point of view can be identified. This facilitates dynamic performance optimization, which uses online performance monitoring to determine when performance expectations are not being met and new model-guided decisions about component use are needed. This work is currently underway.
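The abstract dual described above can be sketched as a small graph-building step: edges carry invocation counts from the call trace, vertices accumulate time predicted by each component's performance model, and low-weight components are flagged as insignificant for optimization purposes. The trace format, component names, and threshold below are illustrative assumptions.

```python
from collections import defaultdict


def build_call_graph(call_trace, predicted_time):
    """call_trace: (caller, callee) pairs recorded by the performance
    infrastructure; predicted_time: component -> predicted seconds per
    invocation, from that component's performance model."""
    edge_weight = defaultdict(int)      # edge -> invocation count
    vertex_weight = defaultdict(float)  # vertex -> predicted total time
    for caller, callee in call_trace:
        edge_weight[(caller, callee)] += 1
    for (caller, callee), count in edge_weight.items():
        vertex_weight[callee] += count * predicted_time[callee]
    return dict(edge_weight), dict(vertex_weight)


def insignificant(vertex_weight, threshold=0.01):
    """Components whose predicted share of total time falls below
    `threshold` — candidates to prune from optimization decisions."""
    total = sum(vertex_weight.values())
    return {v for v, w in vertex_weight.items() if w / total < threshold}


# Hypothetical trace: 50 Solver calls dominate 2 cheap IO calls.
trace = [("Driver", "Solver")] * 50 + [("Driver", "IO")] * 2
times = {"Solver": 0.2, "IO": 0.001}
edges, verts = build_call_graph(trace, times)
pruned = insignificant(verts)
```

In a full system, the edge structure would come from the framework's wiring diagram as well as the trace, and the vertex weights would be re-evaluated online as monitoring data arrives.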