Extending Dynamic Invariant Detection with Explicit Abstraction
Daniel Brian Keith
Committee: Michal Young (chair), Zena Ariola, Christopher Wilson, Edward Vogel
Dissertation Defense(May 2024)
Keywords:

Dynamic invariant detection is a software analysis technique that uses traces of function entry and exit from executing programs and infers partial specifications that characterize the observed behavior. The specifications are reported as logical precondition and postcondition expressions (invariants) that relate arguments, instance variables, and results. Detectors typically generate large collections of invariants, among which most are true but few are interesting or useful. Refining this flood of invariants into a useful subset often requires manual tuning through configuration options and modification of the program under analysis.

Our research asks whether we can improve dynamic invariant detection by enabling explicit abstractions to be declared and applied to a program under analysis and whether this is practical; this dissertation shows that it is indeed practical and useful. Given a concrete program we can synthesize a model program composed of functions and modules that are abstractions of selected concrete modules. When we execute the model program in parallel with its underlying concrete program and apply dynamic invariant detection, we obtain abstracted invariants that can reveal the behavior of the underlying concrete program.

We developed the Alembic system to support and experiment with the above technique, enabling a practical method for steering the invariant detection process and shaping the analysis to produce more refined results than obtainable via traditional means. Alembic provides a simple language for defining abstractions and managing detection experiments; the system generates the necessary instrumentation, representation classes, and functions, freeing the analyst to focus on the expression of abstractions and detection experiments.

Alembic currently leverages the invariant detection capability of Daikon, a powerful first-generation detector, to analyze synthetic traces on abstractions. However, the principles we demonstrate apply to any detector and language that observes function entry and exit. We present some applications of this technique to example problems and then evaluate Alembic on production code such as the Guava class library. Our research suggests new uses for existing detectors and enables the design and evaluation of features to inform the next generation of dynamic invariant detection systems.

This dissertation includes previously unpublished co-authored material.