As a starting point for understanding the influence of scaling on performance observation, it is reasonable to consider the standard methods for performance measurement and analysis: profiling and tracing. Profiling measures significant events during program execution and calculates summary statistics for performance metrics of interest. This profile analysis occurs at runtime. In contrast, tracing captures information about the significant events and stores that information in a time-stamped trace buffer. The information can include performance data such as hardware counter values, but analysis of the performance data does not occur until after the trace buffer is generated. For both profiling and tracing, the performance measurements (profile or trace) are usually generated and kept at the level of individual application threads or processes.
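The contrast can be sketched in a few lines. The following is a minimal, hypothetical illustration (not any particular tool's implementation): profiling updates summary statistics as each event completes, while tracing only appends time-stamped event records for post-mortem analysis.

```python
import time
from collections import defaultdict

# Profiling state: summary statistics, updated at runtime.
profile = defaultdict(lambda: {"calls": 0, "total_time": 0.0})
# Tracing state: time-stamped event records, analyzed after execution.
trace = []

def instrumented(fn):
    """Wrap a function so its entry/exit are significant events."""
    def wrapper(*args, **kwargs):
        t0 = time.perf_counter()
        result = fn(*args, **kwargs)
        dt = time.perf_counter() - t0
        # Profiling: compute summary statistics now.
        stats = profile[fn.__name__]
        stats["calls"] += 1
        stats["total_time"] += dt
        # Tracing: record raw events; no analysis until later.
        trace.append((t0, "enter", fn.__name__))
        trace.append((t0 + dt, "exit", fn.__name__))
        return result
    return wrapper

@instrumented
def work():
    time.sleep(0.01)

for _ in range(3):
    work()

print(profile["work"]["calls"])  # 3 calls summarized in the profile
print(len(trace))                # 6 raw event records in the trace buffer
```

Note that the profile stays a fixed size regardless of how long the program runs, while the trace buffer grows with every event that occurs.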
What happens, then, as the application scales? We consider scaling mainly in terms of the number of threads of execution. In general, one would expect that the greater the degree of parallelism, the more performance data will be produced overall. This is because performance is typically observed relative to each specific thread of execution. Thus, in the case of profiling, a new profile will be produced for each thread or process. Similarly, tracing will, in general, produce a separate event sequence (and trace buffer) for each thread or process. Certainly, these consequences of scaling have a direct impact on the management of performance data (profile or trace data) during a large-scale parallel execution. Scaling, it is expected, will also cause changes in the number, the distribution, and perhaps the types of significant events that occur during a program's run, for instance, with respect to communication. Furthermore, larger amounts of performance data will result in greater analysis time and complexity, and more difficulty in presenting performance in meaningful displays.
However, the real practical question is whether our present performance observation methods and tools are capable of dealing with these issues of scale. Most important is the concern for measurement intrusion and performance perturbation. Any performance measurement intrudes on execution performance and, more seriously, can perturb ``actual'' performance behavior. While low intrusion is preferred, it is generally accepted that some intrusion is a consequence of standard performance observation practice. Unfortunately, perturbation problems can arise even with minor intrusion and small degrees of parallelism.
How scaling affects intrusion and perturbation is an interesting question. Traditional measurement techniques tend to be localized. For instance, thread profiles are normally kept as part of the thread (process) state. This suggests that scaling would not compound globally what intrusion is occurring locally, even with larger numbers of threads (processes). On the other hand, it is reasonable to expect that the measurement of parallel interactions will be affected by intrusion, possibly resulting in a misrepresentation of performance due to performance perturbation. The bottom line is that performance measurement techniques must be used in an intelligent manner so that intrusion effects are controlled as well as possible. This requires a well-understood tradeoff between the need for performance data for solving performance problems and the ``cost'' (intrusion and possible perturbation) of obtaining that data.
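One common way to exercise this tradeoff is sampling: rather than measuring every event, measure only one in every N, accepting less data in exchange for less intrusion. The sketch below is a hypothetical illustration of the idea (the period and names are made up), not a prescription for any particular tool.

```python
import time

# Tradeoff knob: measure only 1 in every SAMPLE_PERIOD events.
# A larger period means less intrusion but coarser performance data.
SAMPLE_PERIOD = 10

event_count = 0
sampled_timestamps = []

def on_event():
    """Called at every significant event; measures only a sample of them."""
    global event_count
    event_count += 1
    if event_count % SAMPLE_PERIOD == 0:
        # Measurement cost (here, reading a timer) paid only for samples.
        sampled_timestamps.append(time.perf_counter())

for _ in range(1000):
    on_event()

print(event_count)              # 1000 events occurred
print(len(sampled_timestamps))  # only 100 were actually measured
```

The intrusion per event drops roughly by the sampling factor, but so does the resolution of the resulting data; choosing the period is exactly the kind of informed tradeoff argued for above.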
Online support for performance observation adds interactivity to the performance analysis process. Several arguments justify the use of online methods. Post-mortem analysis may be ``too late,'' such as when the status of long-running jobs needs to be determined to decide on early termination. There may also be opportunities for steering a computation to better results or better performance interactively by observing execution and performance behavior online. Some have motivated online methods as a way to implement dynamic performance observation, where both instrumentation and measurement can be controlled at runtime. In this respect, online approaches may offer a means to better manage performance data volume and measurement intrusion. Most of the arguments above assume, of course, that the online support can be implemented efficiently and results in little intrusion or perturbation of the parallel computation. This is more difficult with online methods, as they involve more directly coupled mechanisms for access and interaction. Again, one needs to understand the tradeoffs involved to make an intelligent choice of what online methods to use and how.
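The core of dynamic measurement control can be sketched simply. In this hypothetical example (the flag and counters are illustrative), an online analysis client toggles a shared flag at runtime, so measurement is active only during the interval of interest rather than for the whole run.

```python
import threading

# Toggled online, e.g. by an external analysis client, to enable
# or disable measurement while the application keeps running.
measuring = threading.Event()

profile = {"measured_events": 0, "total_events": 0}

def on_event():
    """Called at every significant event; measures only when enabled."""
    profile["total_events"] += 1
    if measuring.is_set():
        profile["measured_events"] += 1  # measurement cost paid only here

for i in range(100):
    if i == 50:
        measuring.set()  # online request: start measuring now
    on_event()

print(profile["total_events"])     # 100 events occurred in total
print(profile["measured_events"])  # only the last 50 were measured
```

Restricting measurement to runtime-selected intervals is one way online methods can limit both data volume and intrusion, at the cost of the coupling mechanisms noted above.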