References
Next: Overview of the
Up: No Title
Previous: Conclusion
References
- 1
-
M. Abrams, N. Doraswamy, and A. Mathur.
Visual analysis of parallel and distributed programs in time,
event and frequency domains.
IEEE Trans. on Parallel and Distributed Systems,
3(6):1992.
- 2
-
E. Adams, and S. S. Muchnick.
Dbxtool: A window based symbolic debugger for Sun workstations.
Software Practice and Experience, 16(7):653-669, July 1986.
- 3
-
G. Alverson, W. Griswold, D. Notkin and L. Snyder.
A flexible communication abstraction for nonshared memory parallel computing.
Proceedings of Supercomputing '90, 1990.
- 4
-
F. Baiardi, N. De Fransesco and G. Vaglini.
Development of a debugger for a concurrent language.
In IEEE Transactions on Software Engineering, SE-12(4):547-553,
Apr. 1986.
- 5
-
P. C. Bates.
Debugging Programs in a Distributed System Environment.
PhD thesis, University of Massachusetts, Amherst, MA 01003, 1986.
Also COINS Technical Report 86-05.
- 6
-
B. Bruegge and P. Hibbard.
Generalized path expressions: A high level debugging mechanism.
In Proceedings of the ACM SIGSOFT/SIGPLAN Software
Engineering Symposium in High-Level Debugging, pages 34-44, 1983.
- 7
-
R. Cooper and K. Marzullo.
Consistent detection of global predicates.
In Proceedings of the ACM/ONR Workshop on Parallel and
Distributed Debugging, pages 167-174, 1991.
- 8
-
A. A. Couch.
Categories and context in scalable execution visualization.
J. of Parallel and Distributed Computing, 18, 195-204, 1993.
- 9
-
J. E. Cuny, A. Hough, and J. Kundu.
Logical
time in visualizations produced by parallel programs.
Proceedings of Visualization '92, pages 186-193 (1992).
- 10
-
J. E. Cuny, G. Forman, A. Hough, J. Kundu, C. Lin, L. Snyder, and D. Stemple.
The Ariadne Debugger: Scalable Application of Event-Based Abstraction.
SIGPLAN Notices, 28(12), pages 85-95, 1994.
- 11
-
C. J. Fidge.
Partial orders for parallel debugging.
SIGPLAN Notices, 24(1), pages 183-194, 1989.
- 12
-
R. J. Fowler, T. J. Leblanc, and J. M. Mellor-Crummey.
An integrated approach to parallel program debugging and performance
analysis on large-scale multiprocessors.
SIGPLAN Notices, 24(1), pages 163-173, 1989.
- 13
-
Gait, J.
A debugger for concurrent programs.
Software Practice and Experience, Vol. 15, No. 6,
pages 539-554, 1985.
- 14
-
A. P. Goldberg, A. Gopal, A, Lowry, and R. Strom.
Restoring consistent global states of distributed computations.
In Proceedings of the ACM/ONR Workshop on Parallel and
Distributed Debugging, pages 144-154, 1991.
- 15
-
G. S. Goldszmidt, S. Katz, and S. Yemini.
High level language for debugging concurrent programs.
ACM Transactions on Computer Systems, 8(4), pages
311-336, Nov. 1990.
- 16
-
D. Haban, and W. Weigel.
Global events and global breakpoints in distributed systems.
21st Annual Hawaii International Conference on System Sciences,
pages 166-174, January 1988.
- 17
-
P. K. Harter, D. M. Heimbigner and R. King.
IDD: an interactive distributed debugger.
In Proceedings of the 5th International Conference
on Distributed Computing Systems, pages 498-506, 1985.
- 18
-
M. Heath and J. Etheridge.
Visualizing the performance of parallel programs.
IEEE Software, 8(5):29-39, 1991.
- 19
-
D. Hembold and D. Luckham.
Debugging Ada tasking programs.
IEEE Software, 2(2), pages 47-57, Mar. 1985.
- 20
-
A. A. Hough.
Debugging Parallel Programs Using Abstract Visualizations.
PhD thesis, University of Massachusetts, Amherst, MA 01003, 1991.
Also COINS Technical Report 91-53.
- 21
-
A. A. Hough and J. E. Cuny.
Perspective views: A technique for enchancing visualizations of
parallel programs.
In 1990 International Conference on Parallel Processing, pages
II 124-132, Aug. 1990.
- 22
-
W. Hseush and G. E. Kaiser.
Modeling concurrency in parallel debugging.
In Proceedings of the Second ACM SIGPLAN Symposium on
Principles and Practice of Parallel Programming, pages 11-20, March 1990.
- 23
-
Intel Supercomputer Systems Division.
Paragon Interactive Parallel Debugger Reference Manual.
Paragon Documentation, 1994.
- 24
-
Kendall Square Research.
UDB User's Guide.
KSR1 Documentation, Feb 1992.
- 25
-
J. Kundu and J. E. Cuny.
A scalable, visual interface for debugging with event-based
behavioral abstraction.
In Proceedings of New Frontiers on Massively Parallel
Processing, February, 1995, pages 472-479
- 26
-
L. Lamport.
Time, clocks, and the ordering of events in a distributed system.
Communications of the ACM, 21(7):558-565, 1978.
- 27
-
L. Lamport.
The mutual exclusion problem: Part I-A theory of
interprocess communication.
Journal of the Association for Computing Machinery, 33(2):313-326,
April 1986.
- 28
-
R. J. LeBlanc and A. D. Robbins.
Event-driven monitoring of distributed programs.
In Proceedings of the 5th International Conference on
Distributed Computing Systems, pages 515-522, 1985.
- 29
-
T. J. LeBlanc and J. M. Mellor-Crummey.
Debugging parallel programs with instant replay.
IEEE Transactions on Computers, C-36(4):471-482, Apr.
1987.
- 30
-
T. J. LeBlanc, J. M. Mellor-Crummey, and R. J. Fowler.
Analyzing parallel program executions using multiple views.
Journal of Parallel and Distributed Computing, 9:203-217,
1990.
- 31
-
V. M. Lo, S. Rajopadhye, M. A. Mohamed, S. Gupta, B. Nitzberg, J. A. Telle, X. X. Zhong.
LaRCS: A language for describing parallel computations for the purpose of mapping.
Technical Report CIS-TR-90-16, University of Oregon Dept. of
Computer Science, 1990.
- 32
-
Y. Manabe, and M. Imase.
Global conditions in debugging distributed programs.
Journal of Parallel and Distributed Computing,
Vol. 15, pages 62-69, January 1992.
- 33
-
K. Mani Chandy, and L. Lamport.
Distributed snapshots: determining global states of distributed
systems.
ACM Transactions on Computer Systems, Vol 3, No. 1, pages 63-74,
January 1985.
- 34
-
F. Mattern.
Virtual time and global states of distributed systems.
M. Cosnard et al, eds., ``Parallel and Distributed Algorithms'',
Elsvier Science Publishers B. V. (North Holland), pages 215-226, 1989.
- 35
-
B. Mohr, D. Brown, A. Malony.
TAU: A Portable Parallel Program Analysis Environment for pC++,
Proc. of CONPAR 94 - VAPP VI, Linz, Austria, Springer Verlag, LNCS 854, pages 29-40, Sept. 1994.
- 36
-
B. Miller, and J. -D. Choi.
Breakpoints and halting in distributed programs.
Proceedings of 8th International Conference on Distributed
Computing Systems, pages 316-323, 1988.
- 37
-
B. Miller and J.-D. Choi.
A mechanism for efficient debugging of parallel programs.
SIGPLAN Notices, 24(1), pages 141-150, 1989.
- 38
-
Sarukkai, S., and Gannon, D.
SIEVE: A performance debugging environment for parallel programs.
J. of Parallel and Distributed Computing, Vol. 18,
No. 2, pages 147-168, June 1993.
- 39
-
Scheuermann, P., Yaagoub, A., and Aris-Ouksel, M.
Compression of binary images on a hypercube machine.
Journal of Parallel and Distributed Computing, Vol 23,
pages 49-59, 1994
- 40
-
Sequent Computer Systems, Inc.
DYNIX Pdbx Debugger User's Manual.
1986.
- 41
-
Snodgrass, R.
A relational approach to monitoring complex systems.
ACM Transactions on Computer Systems, Vol. 6, No. 2,
pages 157-196, May 1988.
- 42
-
L. Snyder.
The XYZ abstraction levels of Poker-like languages.
Languages and Compilers for Parallel Computing,
David Gelernter and Alexandru Nicolau and David Padua(eds.), MIT Press, pages 470-489, 1990.
- 43
-
R. M. Stallman.
GDB Manual (The GNU Source-Level Debugger).
Free Software Foundation.
- 44
-
Zave, P.
A distributed alternative to finite state machine specifications.
ACM Transactions on Programming Languages and
Systems, Vol. 7, No. 1, pages 10-36, Jan 1985.
reproducibility
Parallel programs are often non-reproducible; and even if reproducibility
can be guaranteed by using replay techniques the huge state space
parallel computations present overwhelms the users of standard
state-examination technique.
Parallel programs need to be debugged at two distinct levels -
at the level of the inter-process interactions, and at the level of
local interactions. Although local interactions within a process
are amenable to breakpoint-oriented controlled execution style
debugging, the inter-process interactions of the parallel programs
impedes universal use of the technique due to the following reasons.
- Non-determinism: Parallel programs can be non-deterministic
due to existence of race conditions. Race conditions arise in situations
where multiple threads access shared resource without synchronization.
Several executions of the same program can thus result in different
program behaviors. The absence of such uncertainty guarantees reproducibility
of sequential programs (except for reactive programs that use real-time
inputs), and makes them amenable to debugging with
the controlled execution scheme. Under this strategy, the
same program is reexecuted several times in an effort to examine
its earlier states.
- Non-transparency: Parallel program characteristics
may change drastically when executed under the control of a
debugger. This is an artifact of the non-deterministic nature
of parallel programs. Races may get resolved in a different order
when run under a debugger, and produce a different behavior. This
is also known as probe effect [13].
- Large State Space:
Parallel programs produce a huge state space, which
makes them almost impossible to handle using standard
breakpoint-oriented controlled execution-based debugging. Breakpoint-oriented
debugging provides no facility to focus attention to the offending
portion of the source code. The user has to rely on ``sorcery''
property to somehow know which states are potentially problematic, and
set breakpoints to examine those states. Note the problem with the
lack of filtering facility in breakpoint-oriented debugging can
be apparent in large sequential programs as well.
- State Consistency:
It is difficult to stop a parallel program in a meaningful
consistent [14][34] state. Although there are algorithms
available that ensures stopping the program in a consistent state
[32][14][16][34][36][33], they all rely on the
user possessing the sorcery property of knowing the internal
manifestation of the error in order to specify it in terms of
local or global predicates. These predicates are extremely hard to
identify independently, since the users never observe the errors in
terms of the predicates, but in terms of wrong outputs.
Replay-based techniques, where
the program is instrumented
to capture the order in which different processes access shared resources
in a log,
[29][28] can be used to
circumvent the first two problems.
The access log is consulted during each subsequent execution
of the program to ensure reproducibility. This technique can be used
to ensure transparency of the debugger as well .
Replay-based techniques, however, do little to avoid the last two
problems related to filtering and specification of consistent states
of a computation.
Event-based techniques,
in which user-specified models of intended program behavior are compared to
actual program behavior as captured in execution traces
[22][20][5][4], offers an elegant solution to the
problems of state consistency and the handling of a huge state space.
Replay mechanisms can be used
to avoid the problems of reproducibility and non-transparency.
More importantly the use of hierarchical abstraction facility
makes it possible to contend with large quantities of data; and the
use of logical time transformations filters out the perturbations
due to asynchrony and provides a pre-cursor to a facility that can
be used to specify consistent global states. These three properties
make event-based behavioral abstraction an attractive candidate for the
initial phases of debugging. Currently available event-based tools,
however, have their own limitations:
The problem of limited feedback vocabulary of event-based debugging
tools arises due to the inherent tension between the two basic
services that they need to provide - a modeling language
that allows the user to describe the program behavior
in sufficient detail, and a feedback mechanism that can inform the
user whether the actual behavior matched the intended model specified by
the user. The two services are inter-dependent
. Existing event-based debuggers have erred on the
side of the expressivity of the language, often limiting their feedback
vocabulary to a binary set of ``match/mismatch''. With precise models,
the user has high confidence about the correctness of the behavior if the
feedback received is ``match''. However when the observed behavior does
not match the model, a ``mismatch'' feedback hardly provides any hint
to the user as to what actually went wrong with the execution.
The output of existing event-based debuggers do not scale well for
large and complex programs. The problem is their insistence on precise
visualization that depicts individual process behavior in all its detail.
Most importantly, existing event-based debuggers do not provide
a facility to track an error manifested at the level of user-defined
abstract events down to the source level constructs. Tracking an error
back to the source line is inherently difficult, since abstraction by its
definition tries to capture the behavior of a program by grouping
source level constructs together in a single event. An abstract event
can thus span several source lines, even several different function
invocations. State-based techniques, on the other hand, allow the
user to directly examine an execution to an arbitrary level of detail and
often make it easier to relate errors to source code constructs.
No such integration has been contemplated before.
In this paper, we show a seamless integration between the event- and
state-based debugging strategy, which is based on the
the observation, that user-defined abstract events can be
used to set consistent global breakpoints.
This novel breakpointing scheme is more powerful than the standard
way of setting breakpoints through conditional state expressions,
and can be used as a vehicle of our proposed integration. In our
integrated approach users employ event-based
techniques [22][20][][5][4] at the highest level where gross
patterns of process interactions are investigated. As debugging proceeds and
the focus of attention narrows, the behavior of progressively smaller parts
of the program could be analyzed in progressively finer detail, using a
combination of event- and state-based techniques. Finally, at the lowest
level, when the error has been isolated to specific sections of sequential
code, traditional state-based techniques [40][24][23]
could be used, after a global breakpoint is set using our abstract event-oriented
breakpoint specification.
Next: Overview of the
Up: No Title
Previous: Conclusion
Joydip Kundu