A. D. Malony, S. Shende, and R. Bell, "Online Performance Observation of Large-Scale Parallel Applications," Proc. Parco 2003 Symposium, in "Parallel Computing: Software Technology, Algorithms, Architectures and Applications," (Eds. G. R. Joubert, W. E. Nagel, F. J. Peters, and W. V. Walter), Advances in Parallel Computing, Vol. 13, Elsevier B.V., pp. 761-768, 2004.

Keywords: Paraprof, Parvis, TAU, performance analysis, large-scale, parallel computing

Parallel performance tools offer insights into the execution behavior of an application and are a valuable component in the cycle of application development, deployment, and optimization. However, most tools do not work well with large-scale parallel applications, where the performance data generated comes from thousands of processes or more. As parallel computer systems increase in size, the scaling of performance observation infrastructure becomes an important concern. In this paper, we discuss the problem of scaling and performance observation, and the ramifications of adding online support. A general online performance system architecture is presented. Recent work on the TAU performance system to enable large-scale performance observation and analysis is discussed. The paper concludes with plans for future work.


