There has long been a tension in the field of performance tools
research between the need to invent new techniques to deal with
the performance complexity of next-generation parallel and distributed
systems, and the need to develop tools that are robust, both in their
function and in their scope of application. Perhaps this is just the
nature of the field. Yet there are important issues concerning the
advancement of performance tools ``research'' and the successful
demonstration and use of performance ``technology.'' A cynical
perspective might argue against trying something new without first
getting the existing technology to just work, and work reliably in
real applications and environments. The all too commonly heard
mantras ``performance tools don't work'' and ``performance tools are
too hard to use'' might lead one to believe in this perspective, but
research history does not necessarily justify such a strong cynical
stance. There has been significant innovation in performance
observation and analysis techniques in the last twenty years to address
the new performance challenges parallel computing environments present
[10]. Current attention is certainly being paid to easing
the burden of tool use through automated analysis [1]. There
have also been important technology developments that add considerable
value to the performance tool repertoire, such as APIs for dynamic
instrumentation [4] and hardware performance counters
[3]. Why, then, is there an apparent disconnect between
research results and the ``reality'' of tool usage in parallel
application environments?
From our perspective as performance tool researchers, we take,
perhaps, a controversial stance among our peers and argue that tool
engineering is an important factor in this regard. The controversial
part primarily concerns the notion of ``research'' and the rewards (or
lack thereof) in a research career for tool development. Our counter
position is that innovation in performance tools research is best
advanced by ``standing on the shoulders'' of solid technology
foundations. When that foundation does not exist, it must be
developed. When a technology does exist, it should be integrated, if
possible, and not reinvented. Indeed, many tools do not work reliably
and, as a consequence, are hard to use. Many tools are not portable
across parallel systems or reusable with different programming
paradigms, and, as a consequence, have limited application. These
outcomes cannot be considered positive results for the performance
tool research community, that is, if reliability, portability, and
robustness in general are considered worthy of research. We believe
that they are, particularly in parallel computing. Furthermore, we
contend that future advances in performance tools research with
the most direct potential effect in real applications will be those
that can best leverage and amplify existing robust performance
technology.
In this paper, we consider four research problems being investigated
in the TAU parallel performance system [9,17] and describe the
performance tools being developed to address them. These tools
build on and leverage the capabilities in TAU (as well as the other
technologies integrated in TAU) to provide robust, value-added
solutions. While none of these solutions are necessarily ``new,'' in
the sense of a new research finding, the technology being developed is
novel and will directly provide new capabilities to TAU users. After a brief
description of the TAU performance system, we look at the problem of
instrumentation control to reduce measurement overhead. Our work here
builds on TAU's rich instrumentation framework. The second problem of
callpath profiling requires a solution that maps performance
measurements to dynamically occurring callpaths. Here, TAU's
performance mapping API is utilized. Providing online performance
analysis and visualization for large-scale parallel applications is
the third problem we consider. Finally, we describe our early work to
develop a performance database framework that can support
multi-experiment performance analysis.
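The callpath profiling problem above can be illustrated with a small conceptual sketch. This is hypothetical code, not TAU's API or implementation: the class and method names are invented for illustration. The key idea is that when a timer stops, the measurement is attributed to the entire chain of currently active timers (the dynamic callpath) rather than to the routine alone, so the same routine accumulates separate totals under different callers.

```python
from collections import defaultdict

class CallpathProfiler:
    """Toy profiler that keys measurements by the dynamic callpath.

    A real tool would read a clock at start/stop; here the caller
    supplies the elapsed time to keep the sketch deterministic.
    """

    def __init__(self):
        self.stack = []                      # names of currently active timers
        self.inclusive = defaultdict(float)  # callpath -> accumulated seconds

    def start(self, name):
        # Push the routine onto the dynamic callpath.
        self.stack.append(name)

    def stop(self, elapsed):
        # Attribute the measurement to the full active callpath,
        # e.g. "main => solve => dot", then pop the innermost timer.
        path = " => ".join(self.stack)
        self.inclusive[path] += elapsed
        self.stack.pop()
```

For example, time spent in a routine called from `solve` is recorded under `main => solve => dot`, distinct from any other path that reaches the same routine; TAU's performance mapping API serves an analogous role of associating measurements with such dynamically occurring contexts.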
Sameer Suresh Shende
2003-02-21