You are given a program. Will it crash? Is it subject to a spoofing, buffer overflow, or injection attack? Is this part of it dead code? Can I replace that code fragment with a more efficient one? Which exceptions can arise from calling this function in that context? Does this variable always contain the same value? All of these questions require program analysis: We want to know whether program code (the input to the analysis) has a specified property.
Program analysis is inherently difficult. Take the following ingredients:
The requirements look reasonable: Given an input program we want to check whether it has some desired "runtime" (behavioral) property. But they constitute an explosive mix: Any such problem is undecidableno program can solve it.1 Consequently, we need to give up at least one of these requirements to escape undecidability.
The following paper provides a powerful sound static analysis framework for C programs and nontrivial behavioral properties. The authors show that a precise compositional program analysis that is context-sensitive and path-sensitive is possible for huge code bases of central importance to our day-to-day computing infrastructure. In particular, they show that null derefence bugs, which require interprocedural data flow analysis, can be found without drowning them in false positives.
Their starting point is an abstract model of a program where infinite domains are finitely approximated and certain steps are replaced by nondeterministic steps resulting in unknown results; for example, the integers may be replaced by a single abstract value (representing all integers) and the result of an arithmetic comparison may be approximated by a nondeterministic choice of a Boolean value for the result. We can think of these steps as internal choices: the abstract model just performs the internal choices without external control. This abstraction is familiar from software model checking.
The authors contend it is important for practical precision to model the results of internal choices as unknowns: once a choice is performed the result is unknown, but every time we need the value it is the same. Their key insight is we are typically only interested in whether an abstracted function has a property for all possible sequences of internal choices in its body or for some particular sequence, or indeed both. This is captured by logical constraints for the input-output relation of a function where all the internal choices are universally, respectively existentially bound.
The paper outlines conceptually independent and generally applicable steps assembled into a solution pipeline: quantifier elimination, symbolic fixed-point computation to compute closed forms for recursive constraints, and finally constraint solving, bringing together and leveraging techniques ranging from the foundation of (Boolean) logic to state-of-the-art SAT-solving. Their execution requires care, as the authors explain aided by their selection of an illustrative fragment of C and a running example developed from source code to analysis result.
The challenge of bug hunting in large C code bases has seen the rise and application of techniques that are neither sound nor complete nor even stable: their result may depend on the machine load, other programs analyzed before or simultaneously, and other factors completely extrinsic to the program at hand. A bug is a bug, whichever way you catch it, the argument goes. This paper illustrates that the days of feasible, well-specified, sound, and sufficiently precise static program analysis for bug hunting even in huge code bases are not numbered.
1. Gordon Rice, H. Classes of recursively enumerable sets and their decision problems. Transaction of the American Mathematical Society 74 (1953), 358366.
a. A programming language is Turing-complete if it is possible to write an interpreter for Turing Machines in it.
DOI: http://doi.acm.org/10.1145/1787234.1787258
©2010 ACM 0001-0782/10/0800 $10.00
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
The Digital Library is published by the Association for Computing Machinery. Copyright © 2010 ACM, Inc.
No entries found