References

Next: Overview of the Up: No Title Previous: Conclusion

References

1: M. Abrams, N. Doraswamy, and A. Mathur. Visual analysis of parallel and distributed programs in time, event and frequency domains. IEEE Trans. on Parallel and Distributed Systems, 3(6):1992.
2: E. Adams, and S. S. Muchnick. Dbxtool: A window based symbolic debugger for Sun workstations. Software Practice and Experience, 16(7):653-669, July 1986.
3: G. Alverson, W. Griswold, D. Notkin and L. Snyder. A flexible communication abstraction for nonshared memory parallel computing. Proceedings of Supercomputing '90, 1990.
4: F. Baiardi, N. De Fransesco and G. Vaglini. Development of a debugger for a concurrent language. In IEEE Transactions on Software Engineering, SE-12(4):547-553, Apr. 1986.
5: P. C. Bates. Debugging Programs in a Distributed System Environment. PhD thesis, University of Massachusetts, Amherst, MA 01003, 1986. Also COINS Technical Report 86-05.
6: B. Bruegge and P. Hibbard. Generalized path expressions: A high level debugging mechanism. In Proceedings of the ACM SIGSOFT/SIGPLAN Software Engineering Symposium in High-Level Debugging, pages 34-44, 1983.
7: R. Cooper and K. Marzullo. Consistent detection of global predicates. In Proceedings of the ACM/ONR Workshop on Parallel and Distributed Debugging, pages 167-174, 1991.
8: A. A. Couch. Categories and context in scalable execution visualization. J. of Parallel and Distributed Computing, 18, 195-204, 1993.
9: J. E. Cuny, A. Hough, and J. Kundu. Logical time in visualizations produced by parallel programs. Proceedings of Visualization '92, pages 186-193 (1992).
10: J. E. Cuny, G. Forman, A. Hough, J. Kundu, C. Lin, L. Snyder, and D. Stemple. The Ariadne Debugger: Scalable Application of Event-Based Abstraction. SIGPLAN Notices, 28(12), pages 85-95, 1994.
11: C. J. Fidge. Partial orders for parallel debugging. SIGPLAN Notices, 24(1), pages 183-194, 1989.
12: R. J. Fowler, T. J. Leblanc, and J. M. Mellor-Crummey. An integrated approach to parallel program debugging and performance analysis on large-scale multiprocessors. SIGPLAN Notices, 24(1), pages 163-173, 1989.
13: Gait, J. A debugger for concurrent programs. Software Practice and Experience, Vol. 15, No. 6, pages 539-554, 1985.
14: A. P. Goldberg, A. Gopal, A, Lowry, and R. Strom. Restoring consistent global states of distributed computations. In Proceedings of the ACM/ONR Workshop on Parallel and Distributed Debugging, pages 144-154, 1991.
15: G. S. Goldszmidt, S. Katz, and S. Yemini. High level language for debugging concurrent programs. ACM Transactions on Computer Systems, 8(4), pages 311-336, Nov. 1990.
16: D. Haban, and W. Weigel. Global events and global breakpoints in distributed systems. 21st Annual Hawaii International Conference on System Sciences, pages 166-174, January 1988.
17: P. K. Harter, D. M. Heimbigner and R. King. IDD: an interactive distributed debugger. In Proceedings of the 5th International Conference on Distributed Computing Systems, pages 498-506, 1985.
18: M. Heath and J. Etheridge. Visualizing the performance of parallel programs. IEEE Software, 8(5):29-39, 1991.
19: D. Hembold and D. Luckham. Debugging Ada tasking programs. IEEE Software, 2(2), pages 47-57, Mar. 1985.
20: A. A. Hough. Debugging Parallel Programs Using Abstract Visualizations. PhD thesis, University of Massachusetts, Amherst, MA 01003, 1991. Also COINS Technical Report 91-53.
21: A. A. Hough and J. E. Cuny. Perspective views: A technique for enchancing visualizations of parallel programs. In 1990 International Conference on Parallel Processing, pages II 124-132, Aug. 1990.
22: W. Hseush and G. E. Kaiser. Modeling concurrency in parallel debugging. In Proceedings of the Second ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 11-20, March 1990.
23: Intel Supercomputer Systems Division. Paragon Interactive Parallel Debugger Reference Manual. Paragon Documentation, 1994.
24: Kendall Square Research. UDB User's Guide. KSR1 Documentation, Feb 1992.
25: J. Kundu and J. E. Cuny. A scalable, visual interface for debugging with event-based behavioral abstraction. In Proceedings of New Frontiers on Massively Parallel Processing, February, 1995, pages 472-479
26: L. Lamport. Time, clocks, and the ordering of events in a distributed system. Communications of the ACM, 21(7):558-565, 1978.
27: L. Lamport. The mutual exclusion problem: Part I-A theory of interprocess communication. Journal of the Association for Computing Machinery, 33(2):313-326, April 1986.
28: R. J. LeBlanc and A. D. Robbins. Event-driven monitoring of distributed programs. In Proceedings of the 5th International Conference on Distributed Computing Systems, pages 515-522, 1985.
29: T. J. LeBlanc and J. M. Mellor-Crummey. Debugging parallel programs with instant replay. IEEE Transactions on Computers, C-36(4):471-482, Apr. 1987.
30: T. J. LeBlanc, J. M. Mellor-Crummey, and R. J. Fowler. Analyzing parallel program executions using multiple views. Journal of Parallel and Distributed Computing, 9:203-217, 1990.
31: V. M. Lo, S. Rajopadhye, M. A. Mohamed, S. Gupta, B. Nitzberg, J. A. Telle, X. X. Zhong. LaRCS: A language for describing parallel computations for the purpose of mapping. Technical Report CIS-TR-90-16, University of Oregon Dept. of Computer Science, 1990.
32: Y. Manabe, and M. Imase. Global conditions in debugging distributed programs. Journal of Parallel and Distributed Computing, Vol. 15, pages 62-69, January 1992.
33: K. Mani Chandy, and L. Lamport. Distributed snapshots: determining global states of distributed systems. ACM Transactions on Computer Systems, Vol 3, No. 1, pages 63-74, January 1985.
34: F. Mattern. Virtual time and global states of distributed systems. M. Cosnard et al, eds., ``Parallel and Distributed Algorithms'', Elsvier Science Publishers B. V. (North Holland), pages 215-226, 1989.
35: B. Mohr, D. Brown, A. Malony. TAU: A Portable Parallel Program Analysis Environment for pC++, Proc. of CONPAR 94 - VAPP VI, Linz, Austria, Springer Verlag, LNCS 854, pages 29-40, Sept. 1994.
36: B. Miller, and J. -D. Choi. Breakpoints and halting in distributed programs. Proceedings of 8th International Conference on Distributed Computing Systems, pages 316-323, 1988.
37: B. Miller and J.-D. Choi. A mechanism for efficient debugging of parallel programs. SIGPLAN Notices, 24(1), pages 141-150, 1989.
38: Sarukkai, S., and Gannon, D. SIEVE: A performance debugging environment for parallel programs. J. of Parallel and Distributed Computing, Vol. 18, No. 2, pages 147-168, June 1993.
39: Scheuermann, P., Yaagoub, A., and Aris-Ouksel, M. Compression of binary images on a hypercube machine. Journal of Parallel and Distributed Computing, Vol 23, pages 49-59, 1994
40: Sequent Computer Systems, Inc. DYNIX Pdbx Debugger User's Manual. 1986.
41: Snodgrass, R. A relational approach to monitoring complex systems. ACM Transactions on Computer Systems, Vol. 6, No. 2, pages 157-196, May 1988.
42: L. Snyder. The XYZ abstraction levels of Poker-like languages. Languages and Compilers for Parallel Computing, David Gelernter and Alexandru Nicolau and David Padua(eds.), MIT Press, pages 470-489, 1990.
43: R. M. Stallman. GDB Manual (The GNU Source-Level Debugger). Free Software Foundation.
44: Zave, P. A distributed alternative to finite state machine specifications. ACM Transactions on Programming Languages and Systems, Vol. 7, No. 1, pages 10-36, Jan 1985.

reproducibility Parallel programs are often non-reproducible; and even if reproducibility can be guaranteed by using replay techniques the huge state space parallel computations present overwhelms the users of standard state-examination technique. Parallel programs need to be debugged at two distinct levels - at the level of the inter-process interactions, and at the level of local interactions. Although local interactions within a process are amenable to breakpoint-oriented controlled execution style debugging, the inter-process interactions of the parallel programs impedes universal use of the technique due to the following reasons.

Non-determinism: Parallel programs can be non-deterministic due to existence of race conditions. Race conditions arise in situations where multiple threads access shared resource without synchronization. Several executions of the same program can thus result in different program behaviors. The absence of such uncertainty guarantees reproducibility of sequential programs (except for reactive programs that use real-time inputs), and makes them amenable to debugging with the controlled execution scheme. Under this strategy, the same program is reexecuted several times in an effort to examine its earlier states.
Non-transparency: Parallel program characteristics may change drastically when executed under the control of a debugger. This is an artifact of the non-deterministic nature of parallel programs. Races may get resolved in a different order when run under a debugger, and produce a different behavior. This is also known as probe effect [13].
Large State Space: Parallel programs produce a huge state space, which makes them almost impossible to handle using standard breakpoint-oriented controlled execution-based debugging. Breakpoint-oriented debugging provides no facility to focus attention to the offending portion of the source code. The user has to rely on ``sorcery'' property to somehow know which states are potentially problematic, and set breakpoints to examine those states. Note the problem with the lack of filtering facility in breakpoint-oriented debugging can be apparent in large sequential programs as well.
State Consistency: It is difficult to stop a parallel program in a meaningful consistent [14][34] state. Although there are algorithms available that ensures stopping the program in a consistent state [32][14][16][34][36][33], they all rely on the user possessing the sorcery property of knowing the internal manifestation of the error in order to specify it in terms of local or global predicates. These predicates are extremely hard to identify independently, since the users never observe the errors in terms of the predicates, but in terms of wrong outputs.

Replay-based techniques, where the program is instrumented to capture the order in which different processes access shared resources in a log, [29][28] can be used to circumvent the first two problems. The access log is consulted during each subsequent execution of the program to ensure reproducibility. This technique can be used to ensure transparency of the debugger as well . Replay-based techniques, however, do little to avoid the last two problems related to filtering and specification of consistent states of a computation.

Event-based techniques, in which user-specified models of intended program behavior are compared to actual program behavior as captured in execution traces [22][20][5][4], offers an elegant solution to the problems of state consistency and the handling of a huge state space. Replay mechanisms can be used to avoid the problems of reproducibility and non-transparency. More importantly the use of hierarchical abstraction facility makes it possible to contend with large quantities of data; and the use of logical time transformations filters out the perturbations due to asynchrony and provides a pre-cursor to a facility that can be used to specify consistent global states. These three properties make event-based behavioral abstraction an attractive candidate for the initial phases of debugging. Currently available event-based tools, however, have their own limitations:

The problem of limited feedback vocabulary of event-based debugging tools arises due to the inherent tension between the two basic services that they need to provide - a modeling language that allows the user to describe the program behavior in sufficient detail, and a feedback mechanism that can inform the user whether the actual behavior matched the intended model specified by the user. The two services are inter-dependent . Existing event-based debuggers have erred on the side of the expressivity of the language, often limiting their feedback vocabulary to a binary set of ``match/mismatch''. With precise models, the user has high confidence about the correctness of the behavior if the feedback received is ``match''. However when the observed behavior does not match the model, a ``mismatch'' feedback hardly provides any hint to the user as to what actually went wrong with the execution.

The output of existing event-based debuggers do not scale well for large and complex programs. The problem is their insistence on precise visualization that depicts individual process behavior in all its detail.

Most importantly, existing event-based debuggers do not provide a facility to track an error manifested at the level of user-defined abstract events down to the source level constructs. Tracking an error back to the source line is inherently difficult, since abstraction by its definition tries to capture the behavior of a program by grouping source level constructs together in a single event. An abstract event can thus span several source lines, even several different function invocations. State-based techniques, on the other hand, allow the user to directly examine an execution to an arbitrary level of detail and often make it easier to relate errors to source code constructs.

No such integration has been contemplated before. In this paper, we show a seamless integration between the event- and state-based debugging strategy, which is based on the the observation, that user-defined abstract events can be used to set consistent global breakpoints. This novel breakpointing scheme is more powerful than the standard way of setting breakpoints through conditional state expressions, and can be used as a vehicle of our proposed integration. In our integrated approach users employ event-based techniques [22][20][][5][4] at the highest level where gross patterns of process interactions are investigated. As debugging proceeds and the focus of attention narrows, the behavior of progressively smaller parts of the program could be analyzed in progressively finer detail, using a combination of event- and state-based techniques. Finally, at the lowest level, when the error has been isolated to specific sections of sequential code, traditional state-based techniques [40][24][23] could be used, after a global breakpoint is set using our abstract event-oriented breakpoint specification.

Next: Overview of the Up: No Title Previous: Conclusion

Joydip Kundu