$next$ $up$ $previous$
Next: ZEE Up: Debugging Previous: Conditional Breakpoints

Optimized Code

Even in the case of sequential programs, debugging optimized code is very difficult. In unoptimized code, the association of source code location to executable code location is straightforward. For every line of source code, there is a corresponding block of executable code. $gif$ More importantly, for every source code statement S and corresponding executable block B, none of B is executed until every instruction in a block corresponding to any source statement previous to S has been executed. In optimized code, this simple association no longer exists. Blocks of executable code may be re-ordered, merged, or completely eliminated. If these blocks contain memory stores, the corresponding variables will have unexpected values at many points in the computation.

These characteristics of optimized code raise the two most basic, and most interesting, difficulties encountered when designing a debugger to interact with optimized code: association of code locations, and data reporting. Other problems are discussed in depth in [Cop93].

In the first published paper on the subject of debugging optimized code [Hen82], Hennessy presents an algorithm for detecting non-current variables (those which have values different from what would be expected upon inspection of the source code) in optimized code, and for calculating, where possible, the ``correct'' values for those variables.

Non-current variables are partitioned into two classes:

Roll-Back Variables (RBV): -- Those variables whose expected value has been overwritten.
Roll-Forward Variables (RFV): -- Those variables whose expected value has not been computed yet.

The code is represented by an augmented computation dag, which preserves the original ordering of the code in an abstract syntax tree, as well as identifying where optimizations have changed the values in the corresponding variables. At each node representing a store, the original post-order number u (the actual node pointer) is recorded. After optimization, the nodes are re-numbered. Call the new number o. Then RBV are those where o < l < u, while RFV are those where u < l < o.

Zellweger [Zel83] did a more extensive study of the subject, and established some useful definitions:

Expected Behavior: -- A debugger provides expected behavior if it can completely hide the effects of optimization from the user.
Truthful Behavior: -- Truthful behavior informs the user of the effects of optimization.
Semantic Breakpoint: -- The semantic breakpoint of a statement occurs at the point where some ``important'' section of the executable code block generated from that statement is executed. This may be the first instruction of the block, the store of a variable, or some other point, depending on the statement in question.

Zellweger's algorithms depend on the application of only a limited number of optimizations. Copperman presents more general definitions and algorithms to handle more complex optimizations in [Cop93]. To date, however, there are no complete solutions to either the code association (breakpoint location) problem or the data reporting problem.

Brooks, Hansen, and Simmons[Bro92] present the view that it is unnecessary, undesirable, and impossible to support transparent debugging of optimized code, and that a debugger should provide truthful behavior at all times. A compiler-debugger interface (CDI) is described which enables a debugger to provide information (via highlighting and animation) to the user about the optimizations that have been performed. In the case of ZPL, we believe this approach is not useful, because many of the optimizations performed cannot be expressed in terms of the source code.

$next$ $up$ $previous$
Next: ZEE Up: Debugging Previous: Conditional Breakpoints

Steve McLaughry
Fri May 30 15:48:07 PDT 1997