
Introduction

There has long been a tension in the field of performance tools research between the need to invent new techniques to deal with the performance complexity of next-generation parallel and distributed systems, and the need to develop tools that are robust, both in their function and in their scope of application. Perhaps this is just the nature of the field. Yet there are important issues concerning the advancement of performance tools ``research'' and the successful demonstration and use of performance ``technology.'' A cynical perspective might argue against trying something new before first getting the existing technology to just work, and work reliably, in real applications and environments. The all too commonly heard mantras ``performance tools don't work'' and ``performance tools are too hard to use'' might lend credence to this perspective, but research history does not necessarily justify such a strong cynical stance. There has been significant innovation in performance observation and analysis techniques over the last twenty years to address the new performance challenges that parallel computing environments present [10]. Current attention is certainly being paid to easing the burden of tool use through automated analysis [1]. There have also been important technology developments that add considerable value to the performance tool repertoire, such as APIs for dynamic instrumentation [4] and hardware performance counters [3].

Why, then, is there an apparent disconnect between research results and the ``reality'' of tool usage in parallel application environments? From our perspective as performance tool researchers, we take a perhaps controversial stance among our peers and argue that tool engineering is an important factor in this regard. The controversy primarily concerns the notion of ``research'' and the rewards (or lack thereof) in a research career for tool development. Our counter position is that innovation in performance tools research is best advanced by ``standing on the shoulders'' of solid technology foundations. When that foundation does not exist, it must be developed. When a technology does exist, it should be integrated, if possible, and not reinvented. Indeed, many tools do not work reliably and, as a consequence, are hard to use. Many tools are not portable across parallel systems or reusable with different programming paradigms, and, as a consequence, have limited application. These outcomes cannot be considered positive for the performance tool research community, that is, if reliability, portability, and robustness in general are considered worthy of research. We believe that they are, particularly in parallel computing. Furthermore, we contend that the future advances in performance tools research with the most direct potential effect on real applications will be those that can best leverage and amplify existing robust performance technology.

In this paper, we consider four research problems being investigated in the TAU parallel performance system [9,17] and describe the performance tools being developed to address them. These tools build on and leverage the capabilities of TAU (as well as the other technologies integrated in TAU) to provide robust, value-added solutions. While none of these solutions is necessarily ``new,'' in the sense of a new research finding, the technology being developed is novel and will directly provide new capabilities to TAU users.
After a brief description of the TAU performance system, we look at the problem of instrumentation control to reduce measurement overhead; our work here builds on TAU's rich instrumentation framework. The second problem, callpath profiling, requires a solution that maps performance measurements to dynamically occurring callpaths; here, TAU's performance mapping API is utilized. Providing online performance analysis and visualization for large-scale parallel applications is the third problem we consider. Finally, we describe our early work to develop a performance database framework that can support multi-experiment performance analysis. A small sketch of TAU instrumentation follows as a point of reference.
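To give a flavor of the instrumentation framework the following sections build on, the sketch below shows TAU's manual C++ instrumentation macros (TAU_PROFILE, TAU_PROFILE_INIT, and TAU_PROFILE_SET_NODE are drawn from the TAU API; the surrounding compute routine is a hypothetical example, not code from this paper):

  #include <TAU.h>

  // Hypothetical compute kernel instrumented with TAU's C++ macros.
  void compute(int n)
  {
    // Create a scoped timer for this routine; time spent between entry
    // and exit of this scope is attributed to it in the profile.
    TAU_PROFILE("compute", "void (int)", TAU_USER);
    for (int i = 0; i < n; i++) {
      /* ... application work ... */
    }
  }

  int main(int argc, char** argv)
  {
    TAU_PROFILE_INIT(argc, argv);   // initialize the measurement system
    TAU_PROFILE_SET_NODE(0);        // sequential example: node 0
    compute(1000000);
    return 0;
  }

Instrumentation of this kind, inserted by hand, by a source rewriter, or dynamically, is the source of the measurement overhead that the instrumentation control work discussed next seeks to reduce.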