The scaling of parallel computer systems and applications presents new challenges to the techniques and tools for performance observation. We use the term performance observation to mean the methods used to obtain and analyze performance information for the purpose of better understanding the performance effects and problems of parallel execution. With increasing scale, there is a concern that standard observation approaches for instrumentation, measurement, data analysis, and visualization will encounter design or implementation limits that reduce their effectiveness. This concern is driven in part by the problem of measurement intrusion: naively applying current approaches at larger scale may produce more heavily perturbed performance data. Scaling standard methods will also clearly raise issues of performance data size, the processing time required to analyze the data, and the usability of performance presentation techniques.
At the same time, there is interest in the online observation of parallel systems and applications for the purposes of dynamic assessment and control. With respect to performance observation, we regard performance monitoring as comprising online measurement and performance data access, and performance interaction as the additional infrastructure needed to affect performance behavior externally. There are several motivations for online performance observation, including controlling intrusion through dynamic instrumentation or dynamic measurement. However, the influence of scaling must again be considered when evaluating the benefits of alternative approaches.
In this paper, we consider these issues with respect to profiling and tracing methods for online performance observation. Our main interest in this work is to understand how best to scale a parallel performance measurement model and its implementation, and how to extend its functionality to offer runtime control and interaction. We present results from the development of scalable online profiling in the TAU performance system. We have also developed online tracing capabilities in TAU, but that work is discussed elsewhere.