Validating Models and Assumptions

Quality assurance activities in the development environment, including systematic dynamic testing, always depend on models. Static analysis depends on the fidelity of models extracted for analysis. Statistical testing for reliability estimation depends on models of program usage. Partition testing depends on the models used to divide program behaviors into classes that should be "covered." Discrepancies between these models and actual program behavior are valuable information, even when they don't result in observed program failures, because they can indicate how quality assurance activities in the development environment can be improved.

The Residue of Coverage Criteria

Since exhaustive testing is impossible, we must have some way of judging when "enough" testing has been done. Any such test adequacy criterion is a heuristic, but it can be valuable nevertheless in a negative sense; test adequacy criteria indicate, not when testing is definitely adequate, but when there is strong evidence that a set of tests is inadequate because some significant class of program behaviors has never been tested.

The family of structural coverage criteria (statement coverage, branch coverage, dataflow coverage, etc.) is based on syntactic models of program control and data flow. These syntactic models are conservative in the sense that they include not only all control and data flows that will occur in any execution, but also many infeasible paths that can never occur; it is (provably) impossible in general to determine exactly which paths are infeasible. Thus even exhaustive testing would fail to satisfy structural coverage criteria. When a software product is released without 100% coverage, testers are explicitly or implicitly assuming that the remaining test obligations (the residue) are either infeasible or occur in a vanishingly small set of possible executions.
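
As a minimal sketch of the idea, the residue can be treated as a set difference: the obligations implied by a coverage criterion minus those satisfied under test. The branch identifiers and the helper name below are hypothetical, chosen only for illustration; a real tool would derive the obligations from the program's control flow.

```python
def compute_residue(obligations, covered):
    """Return the test obligations that no test has satisfied."""
    return set(obligations) - set(covered)


# Hypothetical branch obligations for a small program.
all_branches = {"parse:then", "parse:else", "save:ok", "save:io_error"}
# Hypothetical record of branches exercised by the test suite.
covered_in_test = {"parse:then", "parse:else", "save:ok"}

remaining = compute_residue(all_branches, covered_in_test)
coverage = 1 - len(remaining) / len(all_branches)
```

Here the single uncovered branch is the residue. Releasing at this point amounts to assuming that the branch is infeasible or almost never taken.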

In critical software systems such as avionics, these assumptions may be explicit. For example, developers or testers may be asked to explain, for each block of code that has not been executed under test, why such execution is infeasible (e.g., because it is a handler for an error that should never occur). In applications with less stringent reliability requirements, the assumptions may be implicit. It is not uncommon, for instance, to set a target of less than 100% coverage. A target of 90% satisfaction of a test coverage criterion is implicitly an assumption that the remaining 10% of test obligations are either infeasible or so rarely executed that they have negligible impact on quality.
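
The two styles of assumption can be sketched side by side. This is an illustration under assumed names, not a description of any particular certification process: `unexplained` models the explicit case, where each unexecuted block needs a recorded justification, and `meets_target` models the implicit case of a percentage threshold. The block identifiers and justification text are hypothetical.

```python
def unexplained(unexecuted_blocks, justifications):
    """Explicit style: list unexecuted blocks that lack a recorded justification."""
    return sorted(
        block for block in unexecuted_blocks
        if not justifications.get(block, "").strip()
    )


def meets_target(num_covered, num_obligations, target_percent=90):
    """Implicit style: accept any residue that fits under a coverage threshold."""
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return num_covered * 100 >= target_percent * num_obligations


# Hypothetical blocks never executed under test.
unexecuted = {"save:io_error", "init:legacy_path"}
# Hypothetical justifications written by developers or testers.
reasons = {"save:io_error": "handler for an error that should never occur"}

needs_explanation = unexplained(unexecuted, reasons)
```

With these inputs only `init:legacy_path` still needs an explanation. By contrast, `meets_target(18, 20)` accepts the release without saying anything about which two obligations were left uncovered.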

We cannot completely avoid models and assumptions. What we can do is validate them. If we have implicitly or explicitly assumed that a particular path or region in code is never, or almost never, executed, then knowing that an execution of that path or region has occurred in deployed use is valuable information, even if the software performed correctly in that case. This is what residual test coverage monitoring provides.
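
One way to picture such monitoring is a probe placed only at the residue, reporting the first time a supposedly unexecuted region runs in the field. The sketch below is an assumption-laden illustration, not the mechanism of any specific tool: the `ResidueMonitor` class, the `save` function, and the obligation identifier are all hypothetical, and a real deployment would send reports back to the developers rather than keep them in a list.

```python
class ResidueMonitor:
    """Records the first deployed execution of each residual obligation."""

    def __init__(self, residue):
        self._unseen = set(residue)   # obligations never covered in testing
        self.reports = []             # obligations observed in deployed use

    def probe(self, obligation_id):
        # Report only the first hit; later hits add no new information.
        if obligation_id in self._unseen:
            self._unseen.discard(obligation_id)
            self.reports.append(obligation_id)


# Hypothetical residue: one error-handling branch never reached under test.
monitor = ResidueMonitor({"save:io_error"})


def save(data, fail=False):
    """Hypothetical deployed function with a probe on its untested branch."""
    if fail:
        monitor.probe("save:io_error")
        return "recovered"
    return "ok"


save("a")               # normal path, nothing reported
save("b", fail=True)    # residual branch executes for the first time
save("c", fail=True)    # already reported, so no duplicate
```

The report is produced even though `save` behaved correctly, which is the point: the observation contradicts the assumption made at release, independent of any failure.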


Contact: Michal Young