This article is part of a running series, which started here.
In the previous article, I invited you to run the verification on the AutoProof tutorial page dedicated to the example. AutoProof is an automated proof system for programs. This is just a matter of clicking "Verify", but more importantly, you should read the annotations added to the program text, particularly the loop invariant, which make the verification possible. (To avoid any confusion let me emphasize once more that clicking "Verify" does not run the program, and that no test cases are used; the effect is to run the verifier, which attempts to prove the correctness of the program by working solely on the program text.)
Here is the program text again, reverting for brevity to the shorter identifiers (the version on the AutoProof page has more expressive ones):
from
i := 1 ; j := n + 1
until i ≥ j or Result > 0 loop
m := i + (j - i) // 2
if t [m] < x then
i := m + 1
elseif t [m] > x then
j := m
else
Result := m
end
end
Let us now see what makes the proof possible. The key property is the loop invariant, which reads
A: 1 ≤ i ≤ j ≤ n + 1
B: 0 ≤ Result ≤ n
C: ∀ k: 1 .. i –1 | t [k] < x
D: ∀ k: j .. n | t [k] > x
E: (Result > 0) ⇒ (t [Result] = x)
The notation is slightly different on the Web page to adapt to the Eiffel language as it existed at the time it was produced; in today's Eiffel you can write the invariant almost as shown above. Long live Unicode, allowing us to use symbols such as ∀ (obtained not by typing them but by using smart completion, e.g. you start typing "forall" and you can select the ∀ symbol that pops up), ⇒ for "implies" and many others
Remember that the invariant has to be established by the loop's initialization and preserved by every iteration. The role of each of its clauses is as follows:
Why is this invariant useful? The answer is that on exit it gives us what we want from the algorithm. The exit condition, recalled above, is
i ≥ j or Result > 0
Combined with the invariant, it tells us that on exit one of the following will hold:
What AutoProof proves, mechanically, is that under the function's precondition (that the array is sorted):
Such a proof guarantees the correctness of the program if it terminates. We (and AutoProof) must prove separately that it does terminate. The technique is simple: find a "loop variant", an integer quantity v which remains non-negative throughout the loop (in other words, the loop invariant includes or implies v ≥ 0) and decreases on each iteration, so that the loop cannot continue executing forever. An obvious variant here is j - i + 1 (where the + 1 is needed because j - i may go down to -1 on the last iteration if x does not appear in the array). It reflects the informal idea of the algorithm: repeatedly decrease an interval i .. j - 1 (initially, 1 .. n) guaranteed to be such that x appears in t if and only if it appears at an index in that interval. At the end, either we already found x or the interval is empty, implying that x does not appear at all.
A great reference on variants and the techniques for proving program termination is a Communications of the ACM article of 2011: [3].
The variant gives an upper bound on the number of iterations that remain at any time. In sequential search, j - i + 1 would be our best bet; but for binary search it is easy to show that log2 (j - i + 1) is also a variant, extending the proof of correctness with a proof of performance (the key goal of binary search being to ensure a logarithmic rather than linear execution time).
This example is, I hope, enough to highlight the crucial role of loop invariants and loop variants in reasoning about loops. How did we get the invariant? It looks like I pulled it out of a hat. But in fact if we go the other way round (as advocated in classic books [1] [2]) and develop the invariant and the loop together the process unfolds itself naturally and there is nothing mysterious about the invariant.
Here I cannot resist quoting (thirty years on!) from my own book Introduction to the Theory of Programming Languages [4]. It has a chapter on axiomatic semantics (also known as Hoare logic, the basis for the ideas used in this discussion), which I just made available: see here [5]. Its exercise 9.12 is the starting point for this series of articles. Here is how the book explains how to design the program and the invariant [6]:
In the general case [of search, binary or not] we aim for a loop body of the form
m := ‘‘Some value in 1.. n such that i ≤ m < j’’;
if t [m] ≤ x then
i := m + 1
else
j := m
end
It is essential to get all the details right (and easy to get some wrong):
For binary search, m will be roughly equal to the average of i and j.
"Roughly" because we need an integer, hence the // (integer division).
In the last article of the series, I will reflect further on the lessons we can draw from this example, and the practical significance of the key concept of invariant.
Post-publication note: the announced "last article" appears here.
[1] E.W. Dijkstra: A Discipline of Programming, Prentice Hall, 1976.
[2] David Gries: The Science of Programming, Springer, 1989.
[3] Byron Cook, Andreas Podelski and Andrey Rybalchenko: Proving program termination, in Communications of the ACM, vol. 54, no. 11, May 2011, pages 88-98, available here.
[4] Bertrand Meyer, Introduction to the Theory of Programming Languages, Prentice Hall, 1990. The book is out of print but can be found used, e.g. on Amazon. See the next entry for an electronic version of two chapters.
[5] Bertrand Meyer Axiomatic semantics, chapter 9 from [3], available here. Note that the PDF was reconstructed from an old text-processing system (troff); the figures could not be recreated and are missing. (One of these days I might have the patience of scanning them from a book copy and adding them. Unless someone wants to help.) I also put online, with the same caveat, chapter 2 on notations and mathematical basis: see here.
[6] Page 383 of [4] and [5]. The text is verbatim except a slight adaptation of the programming notation and a replacement of the variables: i in the book corresponds to i - 1 here, and j to j - 1. As a matter of fact I prefer the original conventions from the book (purely as a matter of taste, since the two are rigorously equivalent), but I changed here to the conventions of the program as it appears in the AutoProof page, with the obvious advantage that you can verify it mechanically. The text extract is otherwise exactly as in the 1990 book.
Bertrand Meyer is chief technology officer of Eiffel Software (Goleta, CA), professor and provost at the Schaffhausen Institute of Technology (Switzerland), and head of the software engineering lab at Innopolis University (Russia).
No entries found