
Formal verification

John Harrison (johnh@ichips.intel.com)


Intel Corporation RA2-451,
2501 NW 229th Avenue, Hillsboro, OR 97124, USA

Abstract. These lectures are intended to give a broad overview of the most im-
portant formal verification techniques. These methods are applicable to hardware,
software, protocols etc. and even to checking proofs in pure mathematics, though
I will emphasize the applications to verification. Since other lecturers will cover
some of the topics in more detail (in particular model checking), my lectures will
cover mainly the material on deductive theorem proving, which in any case is my
special area of interest. The arrangement of material in these notes is roughly in or-
der of logical complexity, starting with methods for propositional logic and leading
up to general theorem proving, then finishing with a description of my own theorem
proving program ‘HOL Light’ and an extended case study on the verification of a
floating-point square root algorithm used by Intel.

1. Propositional logic
2. Symbolic simulation
3. Model checking
4. Automated theorem proving
5. Arithmetical theories
6. Interactive theorem proving
7. HOL Light
8. Case study of floating-point verification
The treatment of the various topics is quite superficial, with the aim being overall
perspective rather than in-depth understanding. The last lecture is somewhat more
detailed and should give a good feel for the realities of formal verification in this
field.
My book on theorem proving (Harrison 2009) covers many of these topics
in more depth, together with simple-minded example code (written in Objective
Caml) implementing the various techniques.
http://www.cl.cam.ac.uk/~jrh13/atp/index.html
Other useful references for the material here are (Bradley and Manna 2007; Clarke,
Grumberg, and Peled 1999; Kroening and Strichman 2008; Kropf 1999; MacKen-
zie 2001; Peled 2001).
0. Introduction

As most programmers know to their cost, writing programs that function correctly in all
circumstances — or even saying what that means — is difficult. Most large programs
contain ‘bugs’. In the past, hardware has been substantially simpler than software, but
this difference is eroding, and current leading-edge microprocessors are also extremely
complex and usually contain errors. It has often been noted that mere testing, even on
clever sets of test cases, is usually inadequate to guarantee correctness on all inputs, since
the number of possible inputs and internal states, while finite, is usually astronomically
large. For example (Dijkstra 1976):
As I have now said many times and written in many places: program testing can be
quite effective for showing the presence of bugs, but is hopelessly inadequate for
showing their absence.
The main alternative to testing is formal verification, where it is rigorously proved
that the system functions correctly on all possible inputs. This involves forming mathe-
matical models of the system and its intended behaviour and linking the two:

Actual requirements
          ↑
Mathematical specification
          |
Mathematical model
          ↑
Actual system

Hardware versus software

The facts that (i) getting a mathematical proof right is also difficult (DeMillo, Lipton, and
Perlis 1979), and (ii) correctness of formal models does not necessarily imply correctness
of the actual system (Fetzer 1988) caused much pointless controversy in the 70s. It is
now widely accepted that formal verification, with the proof itself checked by machine,
gives a much greater degree of confidence than traditional techniques.
The main impediment to greater use of formal verification is not these generalist
philosophical objections, but just the fact that it’s rather difficult. Only in a few iso-
lated safety-critical niches of the software industry is any kind of formal verification
widespread, e.g. in avionics. But in the hardware industry, formal verification is widely
practised, and increasingly seen as necessary. We can identify at least three reasons:
• Hardware is designed in a more modular way than most software. Constraints of
interconnect layering and timing mean that one cannot really design ‘spaghetti
hardware’.
• More proofs in the hardware domain can be largely automated, reducing the need
for intensive interaction by a human expert with the mechanical theorem-proving
system.
• The potential consequences of a hardware error are greater, since such errors often
cannot be patched or worked around, and may in extremis necessitate a hardware
replacement.
To emphasize the last point, an error in the FDIV (floating-point division) instruction
of some early Intel Pentium processors resulted in 1994/5 in a charge to Intel of
approximately $500M. Given this salutary lesson, and the size and diversity of its market,
it is understandable that Intel is particularly interested in formal verification.

The spectrum of formal verification techniques

There are a number of different formal verification techniques used in industry. Roughly
speaking, these use different logics to achieve a particular balance between generality and
automation. At one end of the scale, propositional logic has limited expressiveness, but
(amazingly) efficient decision methods are available. At the other end, higher-order logic
is very expressive but no efficient decision methods are available (in fact, higher-order
validity is not even recursively enumerable) and therefore proofs must be constructed
interactively with extensive user guidance. The lectures that follow are organized on the
same basis, with propositional logic first, followed by symbolic simulation and temporal
logic model checking, and finally general theorem proving at the end.
The particular method that is most appropriate for a given task may depend on de-
tails of that application. Needless to say, the effective combination of all methods is at-
tractive, and there has been much interest in this idea (Seger and Joyce 1991; Joyce and
Seger 1993; Rajan, Shankar, and Srivas 1995). Another ‘hybrid’ application of formal
verification is to prove correctness of some of the CAD tools used to produce hardware
designs (Aagaard and Leeser 1994) or even of the abstraction and reduction algorithms
used to model-check large or infinite-state systems (Chou and Peled 1999).
There has been a recent trend towards partial verification: not necessarily seeking
a complete proof of correctness, or even formulating what that means, but still proving
important properties using techniques from formal verification. A common approach
is to replace the actual system with an abstraction that may lose many details while still
exposing some key aspect(s) that are then more tractable to reason about. For exam-
ple, Microsoft’s Static Driver Verifier forms an abstraction of code for device drivers
and checks compliance with various properties like the proper release of allocated re-
sources, using symbolic representations and arithmetical theorem proving. The Airbus
A380 flight control software has been subjected to abstract interpretation to prove rig-
orously that, subject to natural environmental constraints, no floating-point overflow ex-
ceptions can occur.
A widespread perception is that formal verification is an impractical pipedream with
no connection to traditional methods. I prefer to see complete formal verification as the
extreme point on a spectrum that takes in various intermediate possibilities. Some static
checking of programs is already standard, if only via typechecking of the programming
language or tools like lint and its derivatives. Examples like SDV and the A380 veri-
fication show that the intermediate space is quite rich in possibilities. Thus, even if one
neglects the numerous successes of complete functional verification, verification tech-
nology should be considered a natural component of programming practice.

1. Propositional logic

I assume that everyone knows what propositional logic is. The following is mainly in-
tended to establish notation. Propositional logic deals with basic assertions that may be
either true or false and various ways of combining them into composite propositions,
without analyzing their internal structure. It is very similar to Boole’s original algebra of
logic, and for this reason the field is sometimes called Boolean algebra instead. In fact,
the Boolean notation for propositional operators is still widely used by circuit designers.

English         Standard    Boolean    Other

false           ⊥           0          F
true            ⊤           1          T
not p           ¬p          p̄          −p, ∼p
p and q         p ∧ q       pq         p & q, p · q
p or q          p ∨ q       p + q      p | q, p or q
p implies q     p ⇒ q       p ≤ q      p → q, p ⊃ q
p iff q         p ⇔ q       p = q      p ≡ q, p ∼ q
For example, p ∧ q ⇒ p ∨ q means ‘if p and q are true, then p or q is true’. We
assume that ‘or’ is interpreted inclusively, i.e. p ∨ q means ‘p or q or both’. This formula
is therefore always true, or a tautology.

Logic and circuits

At a particular time-step, we can regard each internal or external wire in a (binary) digital
computer as having a Boolean value, ‘false’ for 0 and ‘true’ for 1, and think of each cir-
cuit element as a Boolean function, operating on the values on its input wire(s) to produce
a value at its output wire. The most basic building-blocks of computers used by digital
designers, so-called logic gates, correspond closely to the usual logical connectives. For
example an ‘AND gate’ is a circuit element with two inputs and one output whose output
wire will be high (true) precisely if both the input wires are high, and so it corresponds
exactly in behaviour to the ‘and’ (‘∧’) connective. Similarly a ‘NOT gate’ (or inverter)
has one input wire and one output wire, which is high when the input is low and low
when the input is high; hence it corresponds to the ‘not’ connective (‘¬’). Thus, there is
a close correspondence between digital circuits and formulas which can be sloganized as
follows:
Digital design    Propositional logic
circuit           formula
logic gate        propositional connective
input wire        atom
internal wire     subexpression
voltage level     truth value

An important issue in circuit design is proving that two circuits have the same func-
tion, i.e. give identical results on all inputs. This arises, for instance, if a designer makes
some special optimizations to a circuit and wants to check that they are “safe”. Using
the above correspondences, we can translate such problems into checking that a number
of propositional formulas P_n ⇔ P′_n are tautologies. Slightly more elaborate problems in
circuit design (e.g. ignoring certain ‘don’t care’ possibilities) can also be translated to
tautology-checking. Thus, efficient methods for tautology checking directly yield useful
tools for hardware verification.

Tautology checking

Tautology checking is apparently a difficult problem in general: it is co-NP complete,
the dual problem ‘SAT’ of propositional satisfiability being the original NP-complete
problem (Cook 1971). However, there are a variety of algorithms that often work well
on the kinds of problems arising in circuit design. And not just circuit design. The whole
point of NP completeness is that many other apparently difficult combinatorial problems
can be reduced to tautology/satisfiability checking. Recently it’s become increasingly
clear that this is useful not just as a theoretical reduction but as a practical approach.
Surprisingly, many combinatorial problems are solved better by translating to SAT than
by customized algorithms! This probably reflects the enormous engineering effort that
has gone into SAT solvers.
The simplest method of tautology checking is, given a formula with n primitive
propositional variables, to try all 2^n possible combinations and see if the formula always
comes out true. Obviously, this may work for small n but hardly for the cases of practical
interest. More sophisticated SAT checkers, while still having runtimes exponential in n in
the worst case, do very well on practical problems that may even involve millions of vari-
ables. These still usually involve true/false case-splitting of variables, but in conjunction
with more intelligent simplification.
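The exhaustive method just described is easy to make concrete. Here is a small sketch in Python (rather than the OCaml of the book's companion code); the nested-tuple representation of formulas is just an illustrative choice:

```python
from itertools import product

# Formulas as nested tuples, e.g. ("imp", ("and", p, q), ("or", p, q)).

def atoms(fm):
    """Collect the set of atomic proposition names occurring in a formula."""
    if fm[0] == "atom":
        return {fm[1]}
    return set().union(*(atoms(sub) for sub in fm[1:]))

def holds(fm, v):
    """Truth-value of fm under valuation v, a dict from atom names to booleans."""
    op = fm[0]
    if op == "atom":
        return v[fm[1]]
    if op == "not":
        return not holds(fm[1], v)
    args = [holds(sub, v) for sub in fm[1:]]
    if op == "and":
        return args[0] and args[1]
    if op == "or":
        return args[0] or args[1]
    if op == "imp":
        return (not args[0]) or args[1]
    if op == "iff":
        return args[0] == args[1]
    raise ValueError("unknown connective " + op)

def tautology(fm):
    """Try all 2^n valuations of the n atoms of fm."""
    ats = sorted(atoms(fm))
    return all(holds(fm, dict(zip(ats, vals)))
               for vals in product([False, True], repeat=len(ats)))

p, q = ("atom", "p"), ("atom", "q")
print(tautology(("imp", ("and", p, q), ("or", p, q))))  # p ∧ q ⇒ p ∨ q: True
```

The exponential behaviour is plain: each extra atom doubles the number of valuations tried.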

The Davis-Putnam method

Most high-performance SAT checkers are based on the venerable Davis-Putnam algo-
rithm (Davis and Putnam 1960), or more accurately on the ‘DPLL’ algorithm, a later
improvement (Davis, Logemann, and Loveland 1962).
The starting-point is to put the formula to be tested for satisfiability in ‘conjunctive
normal form’. A formula is said to be in conjunctive normal form (CNF) when it is an
‘and of ors’, i.e. of the form:
C_1 ∧ C_2 ∧ · · · ∧ C_n

with each Ci in turn of the form:

l_{i1} ∨ l_{i2} ∨ · · · ∨ l_{im_i}

and all the l_{ij}’s literals, i.e. primitive propositions or their negations. The individual
conjuncts C_i of a CNF form are often called clauses. We usually consider these as sets,
since both conjunction and disjunction are associative, commutative and idempotent, so
it makes sense to talk of ⊥ as the empty clause. If Ci consists of one literal, it is called
a unit clause. Dually, disjunctive normal form (DNF) reverses the role of the ‘and’s and
‘or’s. These special forms are analogous to ‘fully factorized’ and ‘fully expanded’ in
ordinary algebra — think of (x + 1)(x + 2)(x + 3) as CNF and x^3 + 6x^2 + 11x + 6 as
DNF. Again by analogy with algebra, we can always translate a formula into CNF by
repeatedly rewriting with equivalences like:

¬(¬p) ⇔ p
¬(p ∧ q) ⇔ ¬p ∨ ¬q
¬(p ∨ q) ⇔ ¬p ∧ ¬q
p ∨ (q ∧ r) ⇔ (p ∨ q) ∧ (p ∨ r)
(p ∧ q) ∨ r ⇔ (p ∨ r) ∧ (q ∨ r)

However, this in itself can cause the formula to blow up exponentially before we
even get to the main algorithm, which is hardly a good start. One can do better by in-
troducing new variables to denote subformulas, and putting the resulting list of equiv-
alences into CNF — so-called definitional CNF. It’s not hard to see that this preserves
satisfiability. For example, we start with the formula:

(p ∨ (q ∧ ¬r)) ∧ s

introduce new variables for subformulas:

(p1 ⇔ q ∧ ¬r)∧
(p2 ⇔ p ∨ p1 )∧
(p3 ⇔ p2 ∧ s)∧
p3

then transform to CNF:

(¬p1 ∨ q) ∧ (¬p1 ∨ ¬r) ∧ (p1 ∨ ¬q ∨ r) ∧
(¬p2 ∨ p ∨ p1) ∧ (p2 ∨ ¬p) ∧ (p2 ∨ ¬p1) ∧
(¬p3 ∨ p2) ∧ (¬p3 ∨ s) ∧ (p3 ∨ ¬p2 ∨ ¬s) ∧
p3
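This definitional transformation can be sketched in a few lines of Python (not the book's OCaml), handling just the ¬/∧/∨ connectives used in the example; the clause and literal representations, and the fresh names v1, v2, ..., which are assumed not to clash with the original atoms, are illustrative choices:

```python
from itertools import count

def defcnf(fm):
    """Definitional (Tseitin-style) CNF for formulas built from ("atom", x),
    ("not", f), ("and", f, g) and ("or", f, g).  Returns a list of clauses;
    a clause is a list of (name, polarity) literals.  The result is
    satisfiability-preserving, not equivalence-preserving."""
    clauses = []
    fresh = count(1)

    def name(fm):
        # Return a literal standing for the subformula fm.
        op = fm[0]
        if op == "atom":
            return (fm[1], True)
        if op == "not":
            n, pol = name(fm[1])
            return (n, not pol)
        l, r = name(fm[1]), name(fm[2])
        v = ("v%d" % next(fresh), True)
        nv, nl, nr = (v[0], False), (l[0], not l[1]), (r[0], not r[1])
        if op == "and":   # clauses for v <=> l /\ r
            clauses.extend([[nv, l], [nv, r], [v, nl, nr]])
        else:             # op == "or": clauses for v <=> l \/ r
            clauses.extend([[nv, l, r], [v, nl], [v, nr]])
        return v

    clauses.append([name(fm)])   # assert the variable naming the whole formula
    return clauses

# The example from the text: (p \/ (q /\ ~r)) /\ s.
fm = ("and",
      ("or", ("atom", "p"), ("and", ("atom", "q"), ("not", ("atom", "r")))),
      ("atom", "s"))
print(len(defcnf(fm)))  # 10 clauses: three per introduced variable, plus the unit v3
```

Note that the blowup is now linear: each compound subformula contributes one fresh variable and three clauses.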
The DPLL algorithm is based on the following satisfiability-preserving transforma-
tions:
I The 1-literal rule: if a unit clause p appears, remove ¬p from other clauses and
remove all clauses including p.
II The affirmative-negative rule: if p occurs only negated, or only unnegated, delete
all clauses involving p.
III Case-splitting: consider the two separate problems by adding p and ¬p as new
unit clauses.
If you get the empty set of clauses, the formula is satisfiable; if you get an empty
clause, it is unsatisfiable. Since the first two rules make the problem simpler, one only
applies the case-splitting rule when no other progress is possible. In the worst case, many
case-splits are necessary and we get exponential behaviour. But in practice it works quite
well.
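The three rules translate almost directly into code. Below is a Python sketch (deliberately naive; the data-structure refinements that make real solvers fast are discussed next):

```python
def assign(clauses, lit):
    """Make literal lit true: drop clauses containing lit, delete ~lit elsewhere."""
    neg = (lit[0], not lit[1])
    return [c - {neg} for c in clauses if lit not in c]

def dpll(clauses):
    """Satisfiability test by the three DPLL rules.  Clauses are sets of
    literals; a literal is a (variable-name, polarity) pair."""
    clauses = [set(c) for c in clauses]
    if any(not c for c in clauses):     # an empty clause: this branch fails
        return False
    if not clauses:                     # no clauses left: satisfiable
        return True
    for c in clauses:                   # I. the 1-literal rule
        if len(c) == 1:
            return dpll(assign(clauses, next(iter(c))))
    lits = {l for c in clauses for l in c}
    for (n, pol) in lits:               # II. the affirmative-negative rule
        if (n, not pol) not in lits:
            return dpll([c for c in clauses if (n, pol) not in c])
    (n, _) = next(iter(lits))           # III. case-split on some variable
    return dpll(clauses + [{(n, True)}]) or dpll(clauses + [{(n, False)}])

# Both of these clause sets are unsatisfiable:
print(dpll([{("p", True)}, {("p", False)}]))                               # False
print(dpll([{("p", True)}, {("q", True)}, {("p", False), ("q", False)}]))  # False
```

To check a tautology, one negates the formula, converts to (definitional) CNF, and confirms that `dpll` reports unsatisfiability.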

Industrial-strength SAT checkers

The above simple-minded sketch of the DPLL algorithm leaves plenty of room for im-
provement. The choice of case-splitting variable is often critical, the formulas can be rep-
resented in a way that allows for efficient implementation, and the kind of backtracking
that arises from case splits can be made more efficient via ‘intelligent backjumping’ and
‘conflict clauses’. A few recent highly efficient DPLL-based theorem provers are Chaff
(Moskewicz, Madigan, Zhao, Zhang, and Malik 2001), BerkMin (Goldberg and Novikov
2002) and MiniSat (Eén and Sörensson 2003).
Another interesting technique that is used in the tools from Prover Technology
(www.prover.com), as well as the experimental system Heerhugo (Groote 2000), is
Stålmarck’s dilemma rule (Stålmarck and Säflund 1990). This involves using case-splits
in a non-nested fashion, accumulating common information from both sides of a case
split and feeding it back:


               ∆
             /   \
            /     \
      ∆ ∪ {¬p}   ∆ ∪ {p}
           |       |
       ∆ ∪ ∆0   ∆ ∪ ∆1
            \     /
             \   /
          ∆ ∪ (∆0 ∩ ∆1)
In some cases, this works out much better than the usual DPLL algorithm. For a
nice introduction, see (Sheeran and Stålmarck 2000). Note that this method is covered
by patents (Stålmarck 1994).

2. Symbolic simulation

As noted, testing is also widely used in hardware design, and until a decade or so ago
was the exclusive way of trying to establish correctness. Usually what is tested is not
the circuit itself but some formal model — the whole idea is to flush out errors before
the expensive step of building the hardware. The usual term for this kind of testing is
simulation. We’ll now consider two refinements of basic simulation, then show how they
can profitably be combined.

Ternary simulation

One important optimization of straightforward simulation is ternary simulation (Bryant
1991a), which uses values drawn from a 3-element set: as well as ‘1’ (true) and ‘0’ (false)
we have an ‘X’, denoting an ‘unknown’ or ‘undefined’ value. This is not meant to imply
that a wire in the circuit itself is capable of attaining some third truth-value, but merely
reflects the fact that we don’t know whether it is 0 or 1. The point of ternary simula-
tion is that many values in the circuit may be irrelevant to some current concern, and
we can abstract them away by setting them to ‘X’ during simulation. (For example, if
we are analyzing a small part of a large circuit, e.g. an adder within a complete micro-
processor, then only a few bits outside the present module are likely to have any effect
on its behaviour and we can set others to X without altering the results of simulation.)
We then extend the basic logical operations realized by logic gates to ternary values with
this interpretation in mind. For example, we want 0 ∧ X = 0, since whatever the value
on one input, the output will be low if the other input is. Formally, we can present the
use of X as an abstraction mapping, for which purpose it’s also convenient to add an
‘overconstrained’ value T , with the intended interpretation:

T = {}
0 = {0}
1 = {1}
X = {0, 1}

It’s useful to impose an information ordering to give this quaternary lattice:


        T
       / \
      0   1
       \ /
        X

As expected, the truth tables are monotonic with respect to this ordering:

p  q  |  p ∧ q   p ∨ q   p ⇒ q   p ⇔ q
X  X  |    X       X       X       X
X  0  |    0       X       X       X
X  1  |    X       1       1       X
0  X  |    0       X       1       X
0  0  |    0       0       1       1
0  1  |    0       1       1       0
1  X  |    X       1       X       X
1  0  |    0       1       0       0
1  1  |    1       1       1       1
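These ternary operations can be sketched in a few lines of Python, using None as the unknown value X (an arbitrary encoding choice):

```python
# Ternary values: True, False, and None playing the role of the unknown X.

def t_not(p):
    return None if p is None else not p

def t_and(p, q):
    if p is False or q is False:   # 0 /\ X = 0: one low input settles the output
        return False
    if p is None or q is None:
        return None
    return True

def t_or(p, q):
    if p is True or q is True:     # dually, 1 \/ X = 1
        return True
    if p is None or q is None:
        return None
    return False

def t_imp(p, q):
    return t_or(t_not(p), q)

def t_iff(p, q):
    return None if p is None or q is None else p == q

# A couple of rows of the truth tables:
print(t_and(None, False), t_imp(False, None), t_iff(True, None))  # False True None
```

Monotonicity with respect to the information ordering holds by construction: refining an X input to 0 or 1 can only refine, never flip, the output.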

Symbolic simulation

Since we are working in an abstract model, we can easily generalize conventional simula-
tion, which tests on particular combinations of 1s and 0s, to symbolic simulation (Carter,
Joyner, and Brand 1979), where variables are used for some or all of the inputs, and the
outputs are correspondingly symbolic expressions rather than simple truth-values. Testing
circuit equivalence then becomes testing equivalence of Boolean expressions. Thus, if we
use variables for all inputs, we essentially get back to combinational equivalence checking as we
considered before. The main differences are:
• We may use a more refined circuit model, rather than a simple Boolean switch
model
• We explicitly compute the expressions that are the “values” at the internal wires
and at the outputs as functions of the input variables, rather than just asserting
relations between them.
Instead of using general propositional formulas to represent the signals, we can use
a canonical representation, where equivalent formulas are represented by the same data
structure. Equivalence checking then becomes cheap. The most important such represen-
tation is BDDs (binary decision diagrams). Symbolic simulation often does work quite
well with BDDs and allows many circuits to be verified exhaustively (Bryant 1985).
BDDs

BDDs are essentially a representation of Boolean functions as decision trees. We can
think of them as laying out a truth table but sharing common subcomponents. Consider
the truth table for a propositional formula involving primitive propositions p1 , . . . , pn .
Rather than a 2^n-row truth table, we can consider a binary tree indicating how truth
assignments for the successive atoms yield a truth value for the whole expression. For
example, the function p ∧ q ⇒ q ∧ r is represented as follows, where the left branch out
of each node corresponds to the ‘true’ assignment of that variable and the right branch
to the ‘false’ assignment:

                    p
             ______/ \______
            q               q
           / \             / \
          r   r           r   r
         / \ / \         / \ / \
        T  F T  T       T  T T  T

However, there are many common subtrees here, and we can imagine sharing them
as much as possible to obtain a directed acyclic graph. Although this degree of saving
may be atypical, there is always some potential for sharing, at least in the lowest level of
the tree. There are after all “only” 2^(2^k) possible subtrees involving k variables, in
particular only two possible leaf nodes ‘true’ and ‘false’.
        p
       / \
      T   F

In general, a representation of a Boolean formula as a binary tree with branch nodes
labelled with the primitive atoms and leaf nodes either ‘false’ or ‘true’ is called a binary
decision tree. If all common subtrees are maximally shared to give a directed acyclic
graph, it is called a binary decision diagram (Lee 1959; Akers 1978). As part of maximal
sharing, we assume that there are no redundant branches, i.e. none where the ‘true’ and
‘false’ descendants of a given node are the same subgraph — in such a configuration
that subgraph could be replaced by one of its children. If the variable ordering is the
same along all branches, we have reduced ordered binary decision diagrams (ROBDDs)
introduced by Bryant (Bryant 1986). Nowadays, when people say ‘BDD’ they normally
mean ROBDD, though there have been experiments with non-canonical variants.
Surprisingly many interesting functions have quite a compact BDD representa-
tion. However, there are exceptions such as multipliers (Bryant 1986) and the ‘hidden
weighted bit’ function (Bryant 1991b). In these cases it is difficult to apply BDD-based
methods directly.
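A minimal ROBDD package along these lines can be sketched in Python (not the book's OCaml); the integer node numbering and the lexicographic variable order are illustrative choices, and real packages add complement edges, garbage collection and much else:

```python
class BDD:
    """Minimal reduced ordered BDD.  Nodes are integers: 0 and 1 are the
    'false' and 'true' leaves; every other node stands for a triple
    (var, low, high), where low/high are the 'false'/'true' branches.
    Variables are ordered by name."""

    def __init__(self):
        self.unique = {}              # (var, low, high) -> node index
        self.info = [None, None]      # node index -> (var, low, high)

    def mk(self, var, low, high):
        if low == high:               # no redundant tests: share the child
            return low
        key = (var, low, high)
        if key not in self.unique:    # hash-consing gives maximal sharing
            self.unique[key] = len(self.info)
            self.info.append(key)
        return self.unique[key]

    def var(self, name):
        return self.mk(name, 0, 1)

    def apply(self, op, a, b, memo=None):
        """Combine two BDDs with a binary Boolean operation op."""
        memo = {} if memo is None else memo
        if (a, b) not in memo:
            if a <= 1 and b <= 1:
                memo[(a, b)] = 1 if op(bool(a), bool(b)) else 0
            else:
                va = self.info[a][0] if a > 1 else None
                vb = self.info[b][0] if b > 1 else None
                v = min(x for x in (va, vb) if x is not None)
                a0, a1 = self.info[a][1:] if va == v else (a, a)
                b0, b1 = self.info[b][1:] if vb == v else (b, b)
                memo[(a, b)] = self.mk(v, self.apply(op, a0, b0, memo),
                                          self.apply(op, a1, b1, memo))
        return memo[(a, b)]

AND = lambda x, y: x and y
OR  = lambda x, y: x or y
IMP = lambda x, y: (not x) or y

bdd = BDD()
p, q, r = bdd.var("p"), bdd.var("q"), bdd.var("r")
fm = bdd.apply(IMP, bdd.apply(AND, p, q), bdd.apply(AND, q, r))
taut = bdd.apply(IMP, bdd.apply(AND, p, q), bdd.apply(OR, p, q))
# Canonicity makes equivalence checking a pointer comparison:
print(fm == 1, taut == 1)  # False True
```

Because `mk` hash-conses every node, equivalent formulas always reduce to the same node index, and a formula is a tautology precisely when its BDD is the ‘true’ leaf.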

Symbolic trajectory evaluation

A still more refined (and at first sight surprising) approach is to combine ternary and
symbolic simulation, using Boolean variables to parametrize ternary values. Consider the
rather artificial example of verifying that a piece of combinational circuitry with 7 inputs
and one output implements a 7-input AND gate. In conventional simulation we would
need to check all 2^7 = 128 input values. In symbolic simulation we would only need to
check one case, but that case is a symbolic expression that may be quite large. In ternary
simulation, we can verify the circuit from correct results in just 8 explicit cases:
0 X X X X X X
X 0 X X X X X
X X 0 X X X X
X X X 0 X X X
X X X X 0 X X
X X X X X 0 X
X X X X X X 0
1 1 1 1 1 1 1

In a combined approach, we can parametrize these 8 test cases using just 3 Boolean
variables (since 2^3 ≥ 8), say p, q and r. For example, we can represent the input wires as

a0 = 1 if p ∧ q ∧ r
     0 if ¬p ∧ ¬q ∧ ¬r
     X otherwise

and

a1 = 1 if p ∧ q ∧ r
     0 if ¬p ∧ ¬q ∧ r
     X otherwise

and so on until:

a6 = 1 if p ∧ q ∧ r
     0 if p ∧ q ∧ ¬r
     X otherwise
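This parametrization is easy to check mechanically. The self-contained Python sketch below encodes the 8 cases in the pattern of the equations above (reading (p, q, r) as the binary encoding of the case number, an inferred but consistent reading) and runs the ternary simulation of the 7-input AND in each case:

```python
from itertools import product

def t_and(p, q):
    """Ternary AND, with None as the unknown X."""
    if p is False or q is False:
        return False
    return None if (p is None or q is None) else True

def input_value(i, p, q, r):
    """Parametrized ternary value of input wire a_i: 0 when (p, q, r) is the
    binary encoding of i, 1 when (p, q, r) = (1, 1, 1), and X otherwise."""
    case = (p << 2) | (q << 1) | r
    if case == 7:
        return True
    if case == i:
        return False
    return None

def seven_and(inputs):
    out = True
    for a in inputs:
        out = t_and(out, a)
    return out

ok = True
for p, q, r in product([0, 1], repeat=3):
    case = (p << 2) | (q << 1) | r
    out = seven_and(input_value(i, p, q, r) for i in range(7))
    ok = ok and out == (case == 7)   # expect 1 in the all-ones case, else 0
print(ok)  # True: 8 parametrized runs cover all 2^7 concrete input vectors
```

In each of the seven ‘0’ cases the single low input forces the output low despite all the Xs, which is exactly the abstraction that makes the ternary check so economical.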

Symbolic trajectory evaluation (STE), introduced in (Seger and Bryant 1995), uses
this kind of parametrization of circuit states in terms of Boolean variables, and a special
logic for making correctness assertions, similar to but more restricted than the tempo-
ral logics we consider in the next lecture. Recently a more general form of STE known
as generalized symbolic trajectory evaluation (GSTE) has been developed (Yang 2000).
This can use STE-like methods to verify so-called ω-regular properties, optionally sub-
ject to fairness constraints, and so represents a substantial extension of its scope.
STE turns out to be very useful in partially automating the formal verification of
some circuits. Part of the appeal is that the use of the ternary model gives a relatively
easy way of abstracting out irrelevant detail. STE has been extensively used in formal
verification at Intel. For one example, see (O’Leary, Zhao, Gerth, and Seger 1999). For a
good introduction to the theory behind STE, see (Melham and Darbari 2002).
3. Reachability and model checking

We’ve mainly been concerned so far with analyzing combinational circuits, although
symbolic simulation and STE can be used to track signals over any fixed finite number
of clocks. However, it is sometimes of interest to ask questions like ‘can a circuit ever
get into a state where . . . ’ or ‘if line r (request) goes high at some point, must line a
(acknowledge) eventually do so too?’. Dealing with queries over an unbounded time
period like this requires the use of some new techniques.
We can abstract and generalize from the particular features of synchronous sequen-
tial digital circuits by considering them as particular cases of a finite state transition sys-
tem or finite state machine (FSM). Such a system consists of a finite ‘state space’ S to-
gether with a binary relation R ⊆ S × S describing the possible transitions, where R(a, b)
is true iff it is possible for the system to make a transition from state a to state b in one
time-step.¹
For example, consider a circuit with three latches v0 , v1 and v2 that is supposed to
implement a modulo-5 counter. The state of the system is described by the values of
these latches, so we can choose S = {0, 1}×{0, 1}×{0, 1}. The corresponding transition
relation can be enumerated as follows:

(0, 0, 0) → (0, 0, 1)
(0, 0, 1) → (0, 1, 0)
(0, 1, 0) → (0, 1, 1)
(0, 1, 1) → (1, 0, 0)
(1, 0, 0) → (0, 0, 0)

The transition systems arising from modelling circuits in this way have the special
feature that the transition relation is deterministic, i.e. for each state a there is at most
one state b such that R(a, b). However, it’s useful to consider state transition systems in
general without this restriction, since the same methods can then be applied to a vari-
ety of other practically interesting situations, such as the analysis of parallel or nonde-
terministic hardware, programs or protocols. For example, it is often helpful in analyz-
ing synchronization and mutual exclusion in systems with concurrent and/or interleaved
components.

Forward and backward reachability

Many interesting questions are then of the form: if we start in a state σ0 ∈ S0 , can we
ever reach a state σ1 ∈ S1 ? For example, if we start the counter in state (0, 0, 0), will it
eventually return to the same state? (Fairly obviously, the answer is yes.) In general, we
call questions like this reachability questions. In order to answer these questions, we can
simply construct the reflexive-transitive closure R∗ of the transition relation R, and see if
¹ Sometimes, one also considers a set of possible ‘initial’ and ‘final’ states to be part of the state transition
system, but we prefer to consider the question of the starting and finishing states separately.
it links any pairs of states of interest. In principle, there is no difficulty, since the state
space is finite.
Suppose that the set of starting states is S0; we will consider this the first in an
infinite sequence (Si) of sets of states where Si is the set of states reachable from S0 in
≤ i transitions. Clearly we can compute Si+1 from Si by the following recursion:

Si+1 = S0 ∪ {b | ∃a ∈ Si. R(a, b)}

for a state is reachable in ≤ i + 1 steps if either it is in S0, i.e. reachable in 0 steps, or it is
a possible successor state b to a state a that is reachable in ≤ i steps. It is immediate from
the intuitive interpretation that Si ⊆ Si+1 for all i ≥ 0. Now, since each Si ⊆ S, where S
is finite, the sets Si cannot properly increase forever, so we must eventually reach a stage
where Si+1 = Si and hence Sk = Si for all k ≥ i. It is easy to see that this Si, which we
will write S*, is then precisely the set of states reachable from S0 in any finite number of
steps.
Thus, we have an algorithm, forward reachability, for deciding whether any states
in some set P ⊆ S are reachable from S0. We simply compute S* and then decide whether
S* ∩ P ≠ ∅. In the dual approach, backward reachability, we similarly compute the set of
states P* from which a state in P is reachable, and then ask whether S0 ∩ P* ≠ ∅. Once
again we can compute the set P* by iterating an operation until a fixpoint is reached. We
can start with A0 = P and iterate:

Ai+1 = P ∪ {a | ∃b ∈ Ai. R(a, b)}

We can simply characterize Ai as the set of states from which a state in P is reachable
in ≤ i steps. As before, we have Ai ⊆ Ai+1 and we must eventually reach a fixpoint that
is the required set P*. A formally appealing variant is to use A0 = ∅ instead, and the first
iteration will give A1 = P and thereafter we will obtain the same sequence offset by 1
and the same eventual fixpoint P*. This helps to emphasize that we can consider this a
case of finding the least fixpoint of the monotone mapping on sets of states A ↦ P ∪ {a |
∃b ∈ A. R(a, b)} à la Knaster-Tarski (Knaster 1927; Tarski 1955).
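Both fixpoint iterations can be sketched directly, here in Python on the modulo-5 counter's transition relation from above; representing R as an explicit set of pairs is only practical for tiny systems, which is exactly what motivates the symbolic representations discussed below:

```python
# The modulo-5 counter's transition relation, as an explicit set of pairs.
R = {
    ((0, 0, 0), (0, 0, 1)),
    ((0, 0, 1), (0, 1, 0)),
    ((0, 1, 0), (0, 1, 1)),
    ((0, 1, 1), (1, 0, 0)),
    ((1, 0, 0), (0, 0, 0)),
}

def reachable(s0, rel):
    """Least fixpoint of the iteration S_{i+1} = S0 u {b | exists a in S_i. R(a,b)}."""
    si = set(s0)
    while True:
        sn = set(s0) | {b for (a, b) in rel if a in si}
        if sn == si:
            return si
        si = sn

def back_reachable(p_set, rel):
    """Dual iteration A_{i+1} = P u {a | exists b in A_i. R(a,b)}."""
    ai = set(p_set)
    while True:
        an = set(p_set) | {a for (a, b) in rel if b in ai}
        if an == ai:
            return ai
        ai = an

s_star = reachable({(0, 0, 0)}, R)
print(sorted(s_star))
# The five counter states; (1,0,1), (1,1,0) and (1,1,1) are unreachable.
```

Note that `reachable` answers the question posed earlier: (0, 0, 0) lies in its own reachable set, so the counter does return to its starting state.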
Is there any reason to prefer forward or backward reachability over the other?
Though they will always give the same answer if successfully carried out, one or the
other may be much more efficient, depending on the particular system and properties
concerned (Iwashita, Nakata, and Hirose 1996). We will follow the most common ap-
proach and focus on backward reachability. This is not because of efficiency concerns,
but because, as we will see shortly, it generalizes nicely to analyzing more complicated
‘future modalities’ than just reachability.

Symbolic state representation

In practice, the kind of fixed-point computation sketched in the previous section can be
impractical, because the enumeration of cases in the state set or transition relation is too
large. Instead, we can use the techniques we have already established. Instead of rep-
resenting sets of states and state transitions by explicit enumeration of cases, we can
do it symbolically, using formulas. Suppose we encode states using Boolean variables
v1, . . . , vn. Any T ⊆ S can be identified with a Boolean formula t[v1, . . . , vn] that holds
precisely when (v1, . . . , vn) ∈ T. Moreover, by introducing n additional ‘next state’ vari-
ables v′1, . . . , v′n we can represent a binary relation A (such as the transition relation R) by
a formula a[v1, . . . , vn, v′1, . . . , v′n] that is true iff A((v1, . . . , vn), (v′1, . . . , v′n)). For example,
we can represent the transition relation for the modulo-5 counter using (among other
possibilities) the formula:

(v′0 ⇔ ¬v0 ∧ ¬v2) ∧
(v′1 ⇔ ¬(v0 ⇔ v1)) ∧
(v′2 ⇔ v0 ∧ v1)

Here the parametrization in terms of Boolean variables just uses one variable for
each latch, but this choice, while the most obvious, is not obligatory.

Bounded model checking

One immediate advantage of a symbolic state representation is that we can ask questions
about k-step transition sequences without risking a huge blowup in the representation, as
might happen in a canonical BDD representation used in symbolic simulation. We can
simply duplicate the original formula k times with k sets of the n variables, and can ask
any question about the relation between the initial and final states expressed as a Boolean
formula p[v^1_1, . . . , v^1_n, v^k_1, . . . , v^k_n] (writing v^i_j for the i-th copy of variable v_j):

r[v^1_1, . . . , v^1_n, v^2_1, . . . , v^2_n] ∧ · · · ∧ r[v^{k-1}_1, . . . , v^{k-1}_n, v^k_1, . . . , v^k_n]
    ⇒ p[v^1_1, . . . , v^1_n, v^k_1, . . . , v^k_n]

This reduces the problem to tautology-checking. Of course, there is no guarantee
that this is feasible in a reasonable amount of time, but experience on many practical
problems with leading-edge tautology checkers has often showed impressive results for
this so-called bounded model checking (Biere, Cimatti, Clarke, and Zhu 1999).
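A minimal illustration of the unrolling idea, reusing the 3-bit counter above as the system under test (an assumption for illustration; a real BMC tool would build one propositional formula over k copies of the variables and call a SAT solver rather than enumerate runs explicitly):

```python
from itertools import product

# Transition relation of the 3-bit counter as a predicate on (state, state')
def r(s, t):
    v0, v1, v2 = s
    return t == (not v0 and not v2, v0 != v1, v0 and v1)

# Bounded model checking query: does every k-step run from an initial
# state satisfy property p of the final state?  Here we enumerate all
# candidate state sequences by brute force instead of calling SAT.
def bmc_holds(init, p, k):
    states = list(product([False, True], repeat=3))
    def runs(s, n):
        if n == 0:
            yield s
        else:
            for t in states:
                if r(s, t):
                    yield from runs(t, n - 1)
    return all(p(t) for s in states if init(s) for t in runs(s, k))

init = lambda s: s == (False, False, False)   # start at value 0
p = lambda s: s != (True, False, True)        # value 5 never appears
print(bmc_holds(init, p, 4))                  # True within 4 steps
```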
As presented above, bounded model checking permits one to answer only bounded reachability questions such as ‘if we start in a state σ0 ∈ S0, can we reach a state σ1 ∈ S1 in k (or ≤ k) cycles?’. Of course, every finite state transition system, precisely because the number of states is finite, has some characteristic ‘diameter’ d such that any sequence of d transitions must contain a cycle. So it is possible in principle to decide unbounded reachability questions by testing bounded reachability for all k ≤ d. In reality, d is invariably too large for this to be practical. But recently it has been shown (McMillan 2003)
how to use interpolants generated from the propositional unsatisfiability proofs of suc-
cessively longer reachability queries to find much sharper bounds, so that unbounded
reachability is accessible by this technique. This promises to be a highly significant dis-
covery, but we now turn to the more traditional technique for dealing with unbounded
reachability queries.
BDD-based reachability

In the case of unbounded queries, the fixpoint computation can easily be performed di-
rectly on the symbolic representation. Traditionally, a canonical format such as BDDs
has been used (Coudert, Berthet, and Madre 1989; Burch, Clarke, McMillan, Dill, and
Hwang 1992; Pixley 1990). This can immediately cause problems if the BDD repre-
sentation is infeasibly large, and indeed there have been recent explorations using non-
canonical representations (Bjesse 1999; Abdulla, Bjesse, and Eén 2000). On the other
hand, the fact that BDDs are a canonical representation means that if we repeat the iter-
ative steps leading through a sequence Si , we can immediately recognize when we have
reached a fixed point simply by seeing if the BDD for Si+1 is identical to that for Si ,
whereas in a non-canonical representation, we would need to perform a tautology check
at each stage.
Most of the operations used to find the fixpoints are, in the symbolic context, simply logical operations. For example, union (‘∪’) and intersection (‘∩’) are simply implemented by disjunction (‘∨’) and conjunction (‘∧’). However, we also need the operation
that maps a set P and a transition relation R to the set PreR (P) (or just Pre(P) when R is
understood) of states from which there is a 1-step transition into a state in P:

Pre(P) = {a | ∃b ∈ P. R(a, b)}

In the symbolic representation, we can characterize this as follows:

Pre(p) = ∃v′1, . . . , v′n. r[v1, . . . , vn, v′1, . . . , v′n] ∧ p[v′1, . . . , v′n]

It is not difficult to implement this procedure on the BDD representation, though it is often a significant efficiency bottleneck.
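The fixpoint computation with Pre can be sketched as follows, with explicit Python sets standing in for the BDD representation (a toy stand-in for illustration, not how a BDD package would do it; with BDDs the fixpoint test is just an identity check on the canonical form):

```python
# Backward reachability: compute the set of all states from which some
# state in "bad" is reachable, by iterating Pre to a fixed point.
def pre(r, p, states):
    return {a for a in states if any(r(a, b) for b in p)}

def backward_reach(r, bad, states):
    s = set(bad)
    while True:
        s_next = s | pre(r, s, states)
        if s_next == s:        # fixed point reached
            return s
        s = s_next

# Tiny hypothetical example: states 0..4 with transitions i -> i+1;
# which states can reach state 4?
states = range(5)
r = lambda a, b: b == a + 1
print(sorted(backward_reach(r, {4}, states)))  # all of them
```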

Temporal logic model checking

There are still many interesting questions about transition systems that we can’t answer
using just reachability, the request-acknowledge example above being one simple case.
But the material so far admits of generalization.
The backward reachability method answered a question ‘starting in a state satisfying
s, can we reach a state satisfying p?’ by computing the set of all states p∗ from which p
is reachable, and then considering whether p∗ ∧ s = ⊥. If we introduce the notation:

EF(p)

to mean ‘the state p is reachable’, or ‘there is some state on some path through the re-
lation in which p holds’, we can consider p∗ as the set of states satisfying the formula
EF(p). We can systematically extend propositional logic in this way with further ‘tem-
poral operators’ to yield a form of so-called temporal logic. The most popular logic in
practice is Computation Tree Logic (CTL), introduced and popularized by Clarke and
Emerson (Clarke and Emerson 1981).
One formal presentation of the semantics of CTL, using separate classes of ‘path
formulas’ and ‘state formulas’ is the following. (We also need some way of assigning
truth-values to primitive propositions based on a state σ, and we will write [[σ]] for this
mapping applied to σ. In many cases, of course, the properties we are interested in will
be stated precisely in terms of the variables used to encode the states, so a formula is
actually its own denotation in the symbolic representation, modulo the transformation
from CTL to pure propositional logic.) The valuation functions for an arbitrary formula
are then defined as follows for state formulas:

sval(R, σ)(⊥) = false
sval(R, σ)(⊤) = true
sval(R, σ)(p) = [[σ]](p)    for atomic propositions p
sval(R, σ)(¬p) = not(sval(R, σ)p)
· · ·
sval(R, σ)(p ⇔ q) = (sval(R, σ)p = sval(R, σ)q)
sval(R, σ)(A p) = ∀π. Path(R, σ)π ⇒ pval(R, π)p
sval(R, σ)(E p) = ∃π. Path(R, σ)π ∧ pval(R, π)p

and path formulas:

pval(R, π)(F p) = ∃t. sval(R, π(t))p
pval(R, π)(G p) = ∀t. sval(R, π(t))p
pval(R, π)(p U q) = ∃t. (∀t′. t′ < t ⇒ sval(R, π(t′))p) ∧ sval(R, π(t))q
pval(R, π)(X p) = sval(R, π(1))p

Although it was convenient to use path formulas above, the appeal of CTL is exactly
that we only really need to consider state formulas, with path formulas just an interme-
diate concept. We can eliminate the separate class of path formulas by just putting to-
gether the path quantifiers and temporal operators in combinations like ‘EF’ and ‘AG’.
A typical formula in this logic is the request-acknowledge property, asserting that whenever a request r is raised, an acknowledgement a eventually follows:

AG(r ⇒ AF(a))

The process of testing whether a transition system (defining a model) satisfies a tem-
poral formula is called model checking (Clarke and Emerson 1981; Queille and Sifakis
1982). The process can be implemented by a very straightforward generalization of the
fixpoint methods used in backwards reachability. The marriage of Clarke and Emerson’s original explicit-state model-checking algorithms with a BDD-based symbolic representation, due to McMillan (Burch, Clarke, McMillan, Dill, and Hwang 1992), gives symbolic model checking, which has led to substantially wider practical applicability. Although at Intel, STE is often more useful, there are situations, e.g. the analysis of bus and
cache protocols, where the greater generality of temporal logic model checking seems
crucial.
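As a sketch of the fixpoint idea, here is a toy explicit-state checker for two CTL operators: EF as a least fixpoint (the target states plus anything with a successor already in the set) and EG as a greatest fixpoint (states satisfying the property that keep a successor inside the set). The four-state system is hypothetical.

```python
# Explicit-state fixpoint checkers for two CTL operators.
def ef(succ, p, states):
    s = {q for q in states if p(q)}
    while True:
        t = s | {q for q in states if succ(q) & s}
        if t == s:
            return s          # least fixpoint reached
        s = t

def eg(succ, p, states):
    s = {q for q in states if p(q)}
    while True:
        t = {q for q in s if succ(q) & s}
        if t == s:
            return s          # greatest fixpoint reached
        s = t

# Hypothetical system: 0 -> 1 -> 2 -> 0 (a cycle) and 2 -> 3 -> 3.
graph = {0: {1}, 1: {2}, 2: {0, 3}, 3: {3}}
succ = lambda q: graph[q]
states = set(graph)
print(sorted(ef(succ, lambda q: q == 3, states)))  # states that can reach 3
print(sorted(eg(succ, lambda q: q != 3, states)))  # states with a 3-free path
```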
For a detailed discussion of temporal logic model checking, see (Clarke, Grumberg,
and Peled 1999). The topic is also covered in some books on logic in computer sci-
ence like (Huth and Ryan 1999) and formal verification texts like (Kropf 1999). CTL is
just one example of a temporal logic, and there are innumerable other varieties such as
LTL (Pnueli 1977), CT L∗ (Emerson and Halpern 1986) and the propositional µ-calculus
(Kozen 1983). LTL stands for linear temporal logic, indicating that it takes a different
point of view in its semantics, considering all paths and not permitting explicit ‘A’ and
‘E’ operators of CTL, but allowing temporal properties to be nested directly without
intervening path quantifiers.

4. Automated theorem proving

It was noted that one of the reasons for the greater success of formal verification in the
hardware domain is the fact that more of the verification task can be automated. How-
ever, for some more elaborate systems, we cannot rely on tools like equivalence check-
ers, symbolic simulators or model checkers. One potential obstacle — almost invariably
encountered on large industrial designs — is that the system is too large to be verified
in a feasible time using such methods. A more fundamental limitation is that some more
sophisticated properties cannot be expressed at all in the simple Boolean world. For ex-
ample, a floating-point sin function can hardly have its intended behaviour spelled out in
terms of Boolean formulas on the component bits. Instead, we need a more expressive
logical system where more or less any mathematical concepts, including real numbers
and infinite sets, can be analyzed. This is usually formulated in first-order or higher-order
logic, which, as well as allowing propositions to have internal structure, introduces the
universal and existential quantifiers:
• The formula ∀x. p, where x is an (object) variable and p any formula, means
intuitively ‘p is true for all values of x’.
• The analogous formula ∃x. p means intuitively ‘p is true for some value(s) of x’,
or in other words ‘there exists an x such that p’.
I will assume that the basic syntax and definitions of first order logic are known.

Automated theorem proving based on Herbrand’s theorem

In contrast to propositional logic, many interesting questions about first order and higher
order logic are undecidable even in principle, let alone in practice. Church (Church 1936)
and Turing (Turing 1936) showed that even pure logical validity in first order logic is
undecidable, introducing in the process many of the basic ideas of computability theory.
On the other hand, it is not too hard to see that logical validity is semidecidable — this is
certainly a direct consequence of completeness theorems for proof systems in first order
logic (Gödel 1930), and was arguably implicit in work by Skolem (Skolem 1922). This
means that we can at least program a computer to enumerate all valid first order formulas.
One simple approach is based on the following logical principle, due to Skolem and
Gödel but usually mis-named “Herbrand’s theorem”:
Let ∀x1, . . . , xn. P[x1, . . . , xn] be a first order formula with only the indicated universal quantifiers (i.e. the body P[x1, . . . , xn] is quantifier-free). Then the formula is satisfiable iff the infinite set of ‘ground instances’ P[t1^i, . . . , tn^i] that arise by replacing the variables by arbitrary variable-free terms made up from functions and constants in the original formula is propositionally satisfiable.
We can get the original formula into the special form required by some simple nor-
mal form transformations, introducing Skolem functions to replace existentially quan-
tified variables. And by compactness for propositional logic, we know that if the infi-
nite set of instances is unsatisfiable, then so will be some finite subset. In principle we
can enumerate all finite sets of ground instances, one by one, until we find one that is not propositionally satisfiable. (If the formula is satisfiable, we will never discover it by this means. By
undecidability, we know this is unavoidable.) A precise description of this procedure is
tedious, but a simple example may help. Suppose we want to prove that the following is
valid. This is often referred to as the ‘drinker’s principle’, because you can think of it as
asserting that there is some person x such that if x drinks, so does everyone.

∃x. ∀y. D(x) ⇒ D(y)

We start by negating the formula. To prove that the original is valid, we need to
prove that this is unsatisfiable:

¬(∃x. ∀y. D(x) ⇒ D(y))

We then make some transformations to a logical equivalent so that it is in ‘prenex


form’ with all quantifiers at the front.

∀x. ∃y. D(x) ∧ ¬D(y)

We then introduce a Skolem function f for the existentially quantified variable y:

∀x. D(x) ∧ ¬D( f (x))

We now consider the Herbrand universe, the set of all terms built up from con-
stants and functions in the original formula. Since here we have no nullary con-
stants, we need to add one c to get started (this effectively builds in the assump-
tion that all models have a non-empty domain). The Herbrand universe then becomes
{c, f (c), f ( f (c)), f ( f ( f (c))), . . .}. By Herbrand’s theorem, we need to test all sets of
ground instances for propositional satisfiability. Let us enumerate them in increasing
size. The first one is:
D(c) ∧ ¬D( f (c))

This is not propositionally unsatisfiable, so we consider the next:

(D(c) ∧ ¬D( f (c))) ∧ (D( f (c)) ∧ ¬D( f ( f (c))))

Now this is propositionally unsatisfiable, so we terminate with success.
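The enumeration for the drinker’s principle can be mechanized directly; in this sketch the atom D(f^i(c)) is represented simply by the index i, so checking the conjunction of the first k ground instances is a brute-force propositional satisfiability test:

```python
from itertools import product

# Ground instance i of the Skolemized negation  forall x. D(x) /\ ~D(f(x)),
# instantiating x with the Herbrand term f^i(c): the instance is
# D_i /\ ~D_{i+1}, with the atom D(f^i(c)) represented by the index i.
def instance_holds(i, valuation):
    return valuation[i] and not valuation[i + 1]

# Test the conjunction of the first k ground instances for propositional
# satisfiability by brute force over truth assignments to D_0 .. D_k.
def satisfiable(k):
    return any(all(instance_holds(i, v) for i in range(k))
               for v in product([False, True], repeat=k + 1))

print(satisfiable(1))  # True: one instance alone is satisfiable
print(satisfiable(2))  # False: two instances are jointly unsatisfiable
```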

Unification-based methods

The above idea (Robinson 1957) led directly to some early computer implementations, e.g.
by Gilmore (Gilmore 1960). Gilmore tested for propositional satisfiability by transform-
ing the successively larger sets to disjunctive normal form. A more efficient approach is
to use the Davis-Putnam algorithm — it was in this context that it was originally intro-
duced (Davis and Putnam 1960). However, as Davis (Davis 1983) admits in retrospect:
. . . effectively eliminating the truth-functional satisfiability obstacle only uncovered
the deeper problem of the combinatorial explosion inherent in unstructured search
through the Herbrand universe . . .
The next major step forward in theorem proving was a more intelligent means of
choosing substitution instances, to pick out the small set of relevant instances instead
of blindly trying all possibilities. The first hint of this idea appears in (Prawitz, Prawitz,
and Voghera 1960), and it was systematically developed by Robinson (Robinson 1965),
who gave an effective syntactic procedure called unification for deciding on appropriate
instantiations to make terms match up correctly.
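A minimal sketch of such a unification procedure follows (the term representation and the capitalized-variable convention are assumptions of this sketch, not Robinson’s notation). Terms are either variable names or tuples (function_symbol, arg1, ..., argn).

```python
# A minimal sketch of Robinson-style syntactic unification.
def is_var(t):
    # assumed convention: variables are strings starting with a capital
    return isinstance(t, str) and t[:1].isupper()

def walk(t, subst):
    while is_var(t) and t in subst:
        t = subst[t]
    return t

def occurs(v, t, subst):
    t = walk(t, subst)
    if is_var(t):
        return v == t
    return any(occurs(v, a, subst) for a in t[1:])

def unify(s, t, subst):
    s, t = walk(s, subst), walk(t, subst)
    if is_var(s):
        if s == t:
            return subst
        if occurs(s, t, subst):
            return None                    # occurs check fails
        return {**subst, s: t}
    if is_var(t):
        return unify(t, s, subst)
    if s[0] != t[0] or len(s) != len(t):
        return None                        # clash of function symbols
    for a, b in zip(s[1:], t[1:]):
        subst = unify(a, b, subst)
        if subst is None:
            return None
    return subst

# f(X, g(Y)) and f(g(Z), g(Z)) unify with X = g(Z), Y = Z
print(unify(('f', 'X', ('g', 'Y')), ('f', ('g', 'Z'), ('g', 'Z')), {}))
```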
There are many unification-based theorem proving algorithms. Probably the best-
known is resolution, in which context Robinson (Robinson 1965) introduced full unifi-
cation to automated theorem proving. Another important method quite close to resolu-
tion and developed independently at about the same time is the inverse method (Maslov
1964; Lifschitz 1986). Other popular algorithms include tableaux (Prawitz, Prawitz, and
Voghera 1960), model elimination (Loveland 1968; Loveland 1978) and the connection
method (Kowalski 1975; Bibel and Schreiber 1975; Andrews 1976). Crudely speaking:
• Tableaux = Gilmore procedure + unification
• Resolution = Davis-Putnam procedure + unification
Tableaux and resolution can be considered as classic representatives of ‘top-down’
and ‘bottom-up’ methods respectively. Roughly speaking, in top-down methods one
starts from a goal and works backwards, while in bottom-up methods one starts from the
assumptions and works forwards. This has significant implications for the very nature
of unifiable variables, since in bottom-up methods they are local (implicitly universally
quantified) whereas in top-down methods they are global, correlated in different portions
of the proof tree. This is probably the most useful way of classifying the various first-
order search procedures and has a significant impact on the problems where they perform
well.
4.1. Decidable problems

Although first order validity is undecidable in general, there are special classes of for-
mulas for which it is decidable, e.g.
• AE formulas, which involve no function symbols and when placed in prenex form
have all the universal quantifiers before the existential ones.
• Monadic formulas, involving no function symbols and only monadic (unary) re-
lation symbols.
• Purely universal formulas
The decidability of AE formulas is quite easy to see, because there are no function symbols to start with, and because of the special quantifier nesting, none are introduced in Skolemization. Therefore the Herbrand universe is finite and the enumeration
of ground instances cannot go on forever. The decidability of the monadic class can be
proved in various ways, e.g. by transforming into AE form by pushing quantifiers in-
wards (‘miniscoping’). Although neither of these classes is particularly useful in prac-
tice, it’s worth noting that the monadic formulas subsume traditional Aristotelian syllo-
gisms, at least on a straightforward interpretation of what they are supposed to mean. For
example
If all M are P, and all S are M, then all S are P
can be expressed using monadic relations as follows:

(∀x. M(x) ⇒ P(x)) ∧ (∀x. S(x) ⇒ M(x)) ⇒ (∀x. S(x) ⇒ P(x))

For purely universal formulas, we can use congruence closure (Nelson and Oppen
1980; Shostak 1978; Downey, Sethi, and Tarjan 1980). This allows us to prove that one
equation follows from others, e.g. that

∀x. f ( f ( f (x))) = x ∧ f ( f ( f ( f ( f (x))))) = x ⇒ f (x) = x

As the name implies, congruence closure involves deducing all equalities between subterms that follow from the asserted ones by using equivalence and congruence properties of equality. In our case, for example, the first equation f ( f ( f (x))) = x implies f ( f ( f ( f (x)))) = f (x) and hence f ( f ( f ( f ( f (x))))) = f ( f (x)), and then symmetry and transitivity with the second equation imply f ( f (x)) = x, and so on. It straightforwardly
extends to deciding the entire universal theory by refutation followed by DNF transfor-
mation and congruence closure on the equations in the disjuncts, seeing whether any
negated equations in the same disjunct are implied.
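For the example above, congruence closure can be sketched with a union-find structure; representing the term f^i(x) by the integer i makes the congruence rule simply ‘if i ≡ j then i+1 ≡ j+1’ (the depth bound of 10 is an assumption that happens to suffice for this problem):

```python
# Congruence closure for the example in the text, with f^i(x) as the
# integer i.  Union-find handles the equivalence properties; the
# congruence rule is propagated until no new equalities appear.
N = 11
parent = list(range(N))

def find(i):
    while parent[i] != i:
        i = parent[i]
    return i

def union(i, j):
    parent[find(i)] = find(j)

union(3, 0)   # f(f(f(x))) = x
union(5, 0)   # f(f(f(f(f(x))))) = x

changed = True
while changed:               # propagate congruences to a fixed point
    changed = False
    for i in range(N - 1):
        for j in range(N - 1):
            if find(i) == find(j) and find(i + 1) != find(j + 1):
                union(i + 1, j + 1)
                changed = True

print(find(1) == find(0))    # True: f(x) = x follows
```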
An alternative approach is to reduce the problem to SAT by introducing a propo-
sitional variable for each equation between subterms, adding constraints on these vari-
ables to reflect congruence properties. This has the possibly significant advantage that no
potentially explosive DNF transformation need be done, and exploits the power of SAT
solvers. For example if Em,n (for 0 ≤ m, n ≤ 5) represents the equation f^m(x) = f^n(x), we want to deduce E3,0 ∧ E5,0 ⇒ E1,0 assuming the equality properties ⋀n En,n (reflexivity), ⋀m,n Em,n ⇒ En,m (symmetry), ⋀m,n,p Em,n ∧ En,p ⇒ Em,p (transitivity) and ⋀m,n Em,n ⇒ Em+1,n+1 (congruence).
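Since all of these constraints are definite (Horn) clauses, the implication can in fact be checked without a general SAT solver by forward chaining to a fixed point, as this sketch does: assert E3,0 and E5,0, close under the four rules, and observe that E1,0 is forced.

```python
# Forward chaining over the equality/congruence constraints on the
# variables E[m][n] (for f^m(x) = f^n(x), 0 <= m, n <= 5), represented
# as the set of pairs (m, n) currently known to be true.  The
# implication E30 /\ E50 ==> E10 holds iff (1, 0) lands in the closure.
known = {(3, 0), (5, 0)}
known |= {(n, n) for n in range(6)}                      # reflexivity
changed = True
while changed:
    changed = False
    new = set()
    new |= {(n, m) for (m, n) in known}                  # symmetry
    new |= {(m, p) for (m, n) in known                   # transitivity
                   for (n2, p) in known if n == n2}
    new |= {(m + 1, n + 1) for (m, n) in known           # congruence
                           if m < 5 and n < 5}
    if not new <= known:
        known |= new
        changed = True

print((1, 0) in known)   # True: E_{1,0} is forced, so f(x) = x
```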
4.2. Theories

Also interesting in practice are situations where, rather than absolute logical validity, we
want to know whether statements follow from some well-accepted set of mathematical
axioms, or are true in some particular interpretation like the real numbers R. We gener-
alize the notion of logical validity, and say that p is a logical consequence of axioms A,
written A |= p, if for any valuation, every interpretation that makes all the formulas in A
true also makes p true. (This looks straightforward but unfortunately there is some incon-
sistency in standard texts. For example the above definition is found in (Enderton 1972),
whereas in (Mendelson 1987) the definition has the quantification over valuations per
formula: every interpretation in which each formula of A is true in all valuations makes
p true in all valuations. Fortunately, in most cases all the formulas in A are sentences,
formulas with no free variables, and then the two definitions coincide.) Ordinary validity
of p is the special case ∅ |= p, usually written just |= p.
By a theory, we mean a set of formulas closed under first-order validity, or in other
words, the set of logical consequences of a set of axioms. The smallest theory is the set
of consequences of the empty set of axioms, i.e. the set of logically valid formulas. Note
also that for any particular interpretation, the set of formulas true in that interpretation is
also a theory. Some particularly important characteristics of a theory are:
• Whether it is consistent, meaning that we never have both T |= p and T |= ¬p.
(Equivalently, that we do not have T |= ⊥, or that some formula does not follow
from T .)
• Whether it is complete, meaning that for any sentence p, either T |= p or T |= ¬p.
• Whether it is decidable, meaning that there is an algorithm that takes as input a
formula p and decides whether T |= p.
Note that since we have a semidecision procedure for first-order validity, any com-
plete theory based on a finite (or even semicomputable, with a slightly more careful anal-
ysis) set of axioms is automatically decidable: just search in parallel for proofs of A ⇒ p
and A ⇒ ¬p.

5. Arithmetical theories

First order formulas built up from equations and inequalities and interpreted over com-
mon number systems are often decidable. A common way of proving this is quantifier
elimination.

5.1. Quantifier elimination

We say that a theory T in a first-order language L admits quantifier elimination if for each
formula p of L, there is a quantifier-free formula q such that T |= p ⇔ q. (We assume
that the equivalent formula contains no new free variables.) For example, the well-known
criterion for a quadratic equation to have a (real) root can be considered as an example
of quantifier elimination in a suitable theory T of reals:

T |= (∃x. ax^2 + bx + c = 0) ⇔ a ≠ 0 ∧ b^2 ≥ 4ac ∨ a = 0 ∧ (b ≠ 0 ∨ c = 0)


If a theory admits quantifier elimination, then in particular any closed formula (one
with no free variables, such as ∀x. ∃y. x < y) has a T -equivalent that is ground, i.e. con-
tains no variables at all. In many cases of interest, we can quite trivially decide whether
a ground formula is true or false, since it just amounts to evaluating a Boolean combina-
tion of arithmetic operations applied to constants, e.g. 2 < 3 ⇒ 42 + 5 < 23. (One inter-
esting exception is the theory of algebraically closed fields of unspecified characteristic,
where quantifiers can be eliminated but the ground formulas cannot in general be eval-
uated without knowledge about the characteristic.) Consequently quantifier elimination
in such cases yields a decision procedure, and also shows that such a theory T is com-
plete, i.e. every closed formula can be proved or refuted from T . For a good discussion
of quantifier elimination and many explicit examples, see (Kreisel and Krivine 1971).
One of the simplest examples, the theory of ‘dense linear (total) orders without endpoints’, is based on a language containing the binary relation ‘<’ as well as equality, but no function symbols. It can be axiomatized by the following set of sentences:

∀x y. x = y ∨ x < y ∨ y < x
∀x y z. x < y ∧ y < z ⇒ x < z
∀x. ¬(x < x)
∀x y. x < y ⇒ ∃z. x < z ∧ z < y
∀x. ∃y. x < y
∀x. ∃y. y < x

5.2. Presburger arithmetic

One of the earliest theories shown to admit quantifier elimination is linear arithmetic over the natural numbers or integers, a result due to Presburger (Presburger 1930). Linear arithmetic means that we allow the usual equality and inequality relations, constants and addition, but no multiplication, except by constants. In fact, to get quantifier elimination we need to add infinitely many divisibility relations Dk for all integers k ≥ 2.
This doesn’t affect decidability because those new relations are decidable for particular
numbers.
Presburger’s original algorithm is fairly straightforward, and follows the classic
quantifier elimination pattern of dealing with the special case of an existentially quanti-
fied conjunction of literals. (We can always put the formula into disjunctive normal form
and distribute existential quantifiers over the disjuncts, then start from the innermost
quantifier.) For an in-depth discussion of Presburger’s original procedure, the reader can
consult (Enderton 1972) and (Smoryński 1980), or indeed the original article, which is
quite readable — (Stansifer 1984) gives an annotated English translation. A somewhat
more efficient algorithm more suited to computer implementation is given by Cooper
(Cooper 1972).

5.3. Complex numbers

Over the complex numbers, we can also allow multiplication and still retain quantifier
elimination. Here is a sketch of a naive algorithm for complex quantifier elimination.
By the usual quantifier elimination pre-canonization, it suffices to be able to eliminate a
single existential quantifier from a conjunction of positive and negative equations:
∃x. p1(x) = 0 ∧ · · · ∧ pn(x) = 0 ∧ q1(x) ≠ 0 ∧ · · · ∧ qm(x) ≠ 0

We’ll sketch now how this can be reduced to the m ≤ 1 and n ≤ 1 case. To reduce n
we can use one equation to reduce the powers of variables in the others by elimination,
e.g.

2x2 + 5x + 3 = 0 ∧ x2 − 1 = 0 ⇔ 5x + 5 = 0 ∧ x2 − 1 = 0
⇔ 5x + 5 = 0 ∧ 0 = 0
⇔ 5x + 5 = 0

To reduce m, we may simply multiply all the qi(x) together, since qi(x) ≠ 0 ∧ qi+1(x) ≠ 0 ⇔ qi(x) · qi+1(x) ≠ 0. Now, the problem that remains is:

∃x. p(x) = 0 ∧ q(x) ≠ 0

or equivalently ¬(∀x. p(x) = 0 ⇒ q(x) = 0). Consider the core formula:

∀x. p(x) = 0 ⇒ q(x) = 0

Assume that neither p(x) nor q(x) is the zero polynomial. Since we are working in
an algebraically closed field, we know that the polynomials p(x) and q(x) split into linear
factors whatever they may be:

p(x) = (x − a1 ) · (x − a2 ) · · · · · (x − an )
q(x) = (x − b1 ) · (x − b2 ) · · · · · (x − bm )
Now p(x) = 0 is equivalent to ⋁1≤i≤n x = ai and q(x) = 0 is equivalent to ⋁1≤j≤m x = bj. Thus, the formula ∀x. p(x) = 0 ⇒ q(x) = 0 says precisely that

∀x. (⋁1≤i≤n x = ai) ⇒ (⋁1≤j≤m x = bj)

or in other words, all the ai appear among the bj. However, since there are just n linear factors in the antecedent, a given factor (x − ai) cannot occur more than n times and thus the polynomial divisibility relation p(x) | q(x)^n holds. Conversely, if this divisibility relation holds for n ≠ 0, then clearly ∀x. p(x) = 0 ⇒ q(x) = 0 holds. Thus, the key
quantified formula can be reduced to a polynomial divisibility relation, and it’s not dif-
ficult to express this as a quantifier-free formula in the coefficients, thus eliminating the
quantification over x.
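The final divisibility test p(x) | q(x)^n can be sketched with naive coefficient-list polynomial arithmetic over the rationals (coefficients listed constant term first; the example polynomials are chosen for illustration, not taken from the text):

```python
from fractions import Fraction

# Naive polynomial arithmetic on coefficient lists (low degree first),
# enough to test the divisibility p(x) | q(x)^n derived above.
def pmul(p, q):
    r = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def pmod(p, q):
    # remainder of p on division by q, by repeated leading-term elimination
    p = [Fraction(c) for c in p]
    while len(p) >= len(q) and any(p):
        c = p[-1] / q[-1]
        for i in range(len(q)):
            p[len(p) - len(q) + i] -= c * q[i]
        p.pop()
    return p

def divides_power(p, q, n):
    qn = [Fraction(1)]
    for _ in range(n):
        qn = pmul(qn, q)
    return not any(pmod(qn, p))

# p(x) = (x - 1)^2 and q(x) = (x - 1)(x + 1): p does not divide q but
# does divide q^2, so every root of p is a root of q, i.e. the formula
# forall x. p(x) = 0 ==> q(x) = 0 holds over C.
p = [1, -2, 1]
q = [-1, 0, 1]
print(divides_power(p, q, 1), divides_power(p, q, 2))  # False True
```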

5.4. Real algebra

In the case of the real numbers, one can again use addition and multiplication arbitrarily
and it is decidable whether the formula holds in R. A simple (valid) example is a case of
the Cauchy-Schwarz inequality:

∀x1 x2 y1 y2. (x1 · y1 + x2 · y2)^2 ≤ (x1^2 + x2^2) · (y1^2 + y2^2)


This decidability result is originally due to Tarski (Tarski 1951), though Tarski’s
method has non-elementary complexity and has apparently never been implemented.
Perhaps the most efficient general algorithm currently known, and the first actually to
be implemented on a computer, is the Cylindrical Algebraic Decomposition (CAD)
method introduced by Collins (Collins 1976). (For related work around the same time
see (Łojasiewicz 1964) and the method, briefly described in (Rabin 1991), developed in
Monk’s Berkeley PhD thesis.) A simpler method, based on ideas by Cohen, was devel-
oped by Hörmander (Hörmander 1983) — see also (Bochnak, Coste, and Roy 1998) and
(Gårding 1997) — and it is this algorithm that we will sketch.
Consider first the problem of eliminating the quantifier from ∃x. P[x], where P[x]
is a Boolean combination of univariate polynomials (in x). Suppose the polynomials
involved in the body are p1 (x), . . . , pn (x). The key to the algorithm is to obtain a sign
matrix for the set of polynomials. This is a division of the real line into a (possibly empty)
ordered sequence of m points x1 < x2 < · · · < xm representing precisely the zeros of the
polynomials, with the rows of the matrix representing, in alternating fashion, the points
themselves and the intervals between adjacent pairs and the two intervals at the ends:

(−∞, x1 ), x1 , (x1 , x2 ), x2 , . . . , xm−1 , (xm−1 , xm ), xm , (xm , +∞)

and columns representing the polynomials p1 (x), . . . , pn (x), with the matrix entries giv-
ing the signs, either positive (+), negative (−) or zero (0), of each polynomial pi at the
points and on the intervals. For example, for the collection of polynomials:

p1 (x) = x2 − 3x + 2
p2 (x) = 2x − 3

the sign matrix looks like this:

Point/Interval p1 p2
(−∞, x1 ) + −
x1 0 −
(x1 , x2 ) − −
x2 − 0
(x2 , x3 ) − +
x3 0 +
(x3 , +∞) + +

Note that x1 and x3 represent the roots 1 and 2 of p1 (x) while x2 represents 1.5,
the root of p2 (x). However the sign matrix contains no numerical information about
the location of the points xi , merely specifying the order of the roots of the various
polynomials and what signs they take there and on the intervening intervals. It is easy to
see that the sign matrix for a set of univariate polynomials p1 (x), . . . , pn (x) is sufficient
to answer any question of the form ∃x. P[x] where the body P[x] is quantifier-free and all
atoms are of the form pi(x) ⋈i 0, where each ⋈i is one of the relations =, <, >, ≤, ≥ or their negations.
We simply need to check each row of the matrix (point or interval) and see if one of them
makes each atomic subformula true or false; the formula as a whole can then simply be
“evaluated” by recursion.
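The sign matrix in the example can be checked numerically by sampling one representative per point and interval (the sample points are chosen by hand from the known roots, which the algorithm itself never computes):

```python
# Confirm the example sign matrix for p1 = x^2 - 3x + 2 and p2 = 2x - 3
# by sampling one representative of each point/interval in turn.
def sign(v):
    return '+' if v > 0 else '-' if v < 0 else '0'

p1 = lambda x: x * x - 3 * x + 2
p2 = lambda x: 2 * x - 3

# representatives of (-inf,x1), x1, (x1,x2), x2, (x2,x3), x3, (x3,+inf)
samples = [0.0, 1.0, 1.25, 1.5, 1.75, 2.0, 3.0]
matrix = [(sign(p1(x)), sign(p2(x))) for x in samples]
print(matrix)  # rows match the sign matrix table above

# Answering an existential query from the matrix alone, e.g.
# exists x. p1(x) < 0 /\ p2(x) > 0, witnessed by a row (-, +):
print(any(s1 == '-' and s2 == '+' for s1, s2 in matrix))  # True
```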
In order to perform general quantifier elimination, we simply apply this basic op-
eration to all the innermost quantified subformulas first (we can consider a universally
quantified formula ∀x. P[x] as ¬(∃x. ¬P[x]) and eliminate from ∃x. ¬P[x]). This can then
be iterated until all quantifiers are eliminated. The only difficulty is that the coefficients
of a polynomial may now contain other variables as parameters. But we can quite easily
handle these by performing case-splits over the signs of coefficients and using pseudo-
division of polynomials instead of division as we present below.
So the key step is finding the sign matrix, and for this the following simple observa-
tion is key. To find the sign matrix for

p, p1 , . . . , pn

it suffices to find one for the set of polynomials

p′, p1, . . . , pn, q0, q1, . . . , qn

where p′, which we will sometimes write p0 for regularity’s sake, is the derivative of p, and qi is the remainder on dividing p by pi. For suppose we have a sign matrix for the second set of polynomials. We can proceed as follows.
First, we split the sign matrix into two equally-sized parts, one for the p′, p1, . . . , pn and one for the q0, q1, . . . , qn, but for now keeping all the points in each matrix, even if the corresponding set of polynomials has no zeros there. We can now infer the sign of p(xi) for each point xi that is a zero of one of the polynomials p′, p1, . . . , pn, as follows.
Since qk is the remainder of p after division by pk , p(x) = sk (x)pk (x) + qk (x) for some
sk (x). Therefore, since pk (xi ) = 0 we have p(xi ) = qk (xi ) and so we can derive the sign
of p at xi from that of the corresponding qk .
Now we can throw away the second sign matrix, giving signs for the q0, . . . , qn, and retain the (partial) matrix for p, p′, p1, . . . , pn. We next ‘condense’ this matrix to remove points that are not zeros of one of the p′, p1, . . . , pn, but only of one of the qi. The signs of the p′, p1, . . . , pn in an interval from which some other points have been removed can
be read off from any of the subintervals in the original subdivision — they cannot change
because there are no zeros for the relevant polynomials there.
Now we have a sign matrix with correct signs at all the points, but undetermined
signs for p on the intervals, and the possibility that there may be additional zeros of p
inside these intervals. But note that since there are certainly no zeros of p′ inside the intervals, there can be at most one additional root of p in each interval. Whether there
is one can be inferred, for an internal interval (xi , xi+1 ), by seeing whether the signs of
p(xi ) and p(xi+1 ), determined in the previous step, are both nonzero and are different.
If not, we can take the sign on the interval from whichever sign of p(xi ) and p(xi+1 ) is
nonzero (we cannot have them both zero, since then there would have to be a zero of p′ in between). Otherwise we insert a new point y between xi and xi+1 which is a zero
(only) of p, and infer the signs on the new subintervals (xi , y) and (y, xi+1 ) from the signs
at the endpoints. Other polynomials have the same signs on (xi , y), y and (y, xi+1 ) that
had been inferred for the original interval (xi , xi+1 ). For external intervals, we can use the
same reasoning if we temporarily introduce new points −∞ and +∞ and infer the sign of p(−∞) by flipping the sign of p′ on the lowest interval (−∞, x1) and the sign of p(+∞) by copying the sign of p′ on the highest interval (xm, +∞).

5.5. Word problems

Suppose K is a class of structures, e.g. all groups. The word problem for K asks whether a
set E of equations between constants implies another such equation s = t in all algebraic
structures of class K. More precisely, we may wish to distinguish:
• The uniform word problem for K: deciding given any E and s = t whether E |=M
s = t for all interpretations M in K.
• The word problem for K, E: with E fixed, deciding given any s = t whether E |=M
s = t for all interpretations M in K.
• The free word problem for K: deciding given any s = t whether |=M s = t for all
interpretations M in K.
As a consequence of general algebraic results (e.g. every integral domain has a field
of fractions, every field has an algebraic closure), there is a close relationship between
word problems and results for particular interpretations. For example, for any universal
formula in the language of rings, such as a word problem implication s1 = t1 ∧ · · · ∧ sn = tn ⇒ s = t,
the following are equivalent, and hence we can solve it using complex quantifier elimination:
• It holds in all integral domains of characteristic 0
• It holds in all fields of characteristic 0
• It holds in all algebraically closed fields of characteristic 0
• It holds in any given algebraically closed field of characteristic 0
• It holds in C
There is also a close relationship between word problems and ideal membership
questions (Scarpellini 1969; Simmons 1970), sketched in a later section. These ideal
membership questions can be solved using efficient methods like Gröbner bases (Buch-
berger 1965; Cox, Little, and O’Shea 1992).

5.6. Practical decision procedures

In many ‘practical’ applications of decision procedures, a straightforward implementation
of one of the above quantifier elimination procedures may be a poor fit, in both a
positive and negative sense:
• Fully general quantifier elimination is not needed
• The decidable theory must be combined with others
For example, since most program verification involves reasoning about integer in-
equalities (array indices etc.), one might think that an implementation of Presburger
arithmetic is appropriate. But in practice, most of the queries (a) are entirely universally
quantified, and (b) do not rely on subtle divisibility properties. A much faster algorithm
can be entirely adequate. In some cases the problems are even more radically limited.
For example, (Ball, Cook, Lahiri, and Rajamani 2004) report that the majority of their
integer inequality problems fall into a very restricted class with at most two variables per
inequality and coefficients of only ±1, for which a much more efficient decision procedure is avail-
able (Harvey and Stuckey 1997). Of course there are exceptions, with some applications
such as indexed predicate abstraction (Lahiri and Bryant 2004) demanding more general
quantifier prefixes.
It’s quite common in program verification to need a combination of a decidable
theory, or indeed more than one, with uninterpreted function symbols. For example, in
typical correctness theorems for loops, one can end up with problems like the following
(the antecedent comes from a combination of the loop invariant and the loop termination
condition):

x − 1 < n ∧ ¬(x < n) ⇒ a[x] = a[n]

In pure Presburger arithmetic one can deduce x − 1 < n ∧ ¬(x < n) ⇒ x = n, but one
needs a methodology for making the further trivial deduction x = n ⇒ a[x] = a[n]. For
non-trivial quantifier prefixes, this problem rapidly becomes undecidable, but for purely
universal formulas like the above, there are well-established methods. The Nelson-Oppen
(Nelson and Oppen 1979) approach is the most general. It exploits the fact that the only
communication between component procedures need be equations and negated equations
(s = t and s ≠ t), by virtue of a result in logic known as the Craig interpolation theo-
rem. An alternative, which can be viewed as an optimization of Nelson-Oppen for some
common cases, is due to Shostak (Shostak 1984). It has taken a remarkably long time
to reach a rigorous understanding of Shostak’s method; indeed Rueß and Shankar (Rueß
and Shankar 2001) showed that Shostak’s original algorithm and all the then known later
refinements were in fact incomplete and potentially nonterminating!
At the time of writing, there is intense interest in decision procedures for combina-
tions of (mainly, but not entirely quantifier-free) theories. The topic has become widely
known as satisfiability modulo theories (SMT), emphasizing the perspective that it is
a generalization of the standard propositional SAT problem. Indeed, most of the latest
SMT systems use methods strongly influenced by the leading SAT solvers, and are usu-
ally organized around a SAT-solving core. The idea of basing other decision procedures
around SAT appeared in several places and in several slightly different contexts, going
back at least to (Armando, Castellini, and Giunchiglia 1999). The simplest approach is
to use the SAT checker as a ‘black box’ subcomponent. Given a formula to be tested
for satisfiability, just treat each atomic formula as a propositional atom and feed the for-
mula to the SAT checker. If the formula is propositionally unsatisfiable, then it is triv-
ially unsatisfiable as a first-order formula and we are finished. If on the other hand the
SAT solver returns a satisfying assignment for the propositional formula, test whether
the implicit conjunction of literals is also satisfiable within our theory or theories. If it is
satisfiable, then we can conclude that so is the whole formula and terminate. However, if
the putative satisfying valuation is not satisfiable in our theories, we conjoin its negation
with the input formula and repeat the procedure.
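The ‘black box’ loop just described can be sketched in a few lines of OCaml. Everything here is a toy made up for illustration: `sat_solve` is a brute-force enumerator standing in for a real SAT solver, and the theory solver is passed in as a predicate on propositional valuations:

```ocaml
(* Clauses in DIMACS style: a clause is a list of nonzero integers, where
   i stands for atom i and -i for its negation. *)
let rec all_assignments n =
  if n = 0 then [fun _ -> false]
  else
    List.concat_map
      (fun a -> [(fun v -> if v = n then false else a v);
                 (fun v -> if v = n then true else a v)])
      (all_assignments (n - 1))

let satisfies a clause =
  List.exists (fun lit -> if lit > 0 then a lit else not (a (-lit))) clause

(* Stand-in for a SAT solver: find some propositional model, if any. *)
let sat_solve nvars clauses =
  List.find_opt (fun a -> List.for_all (satisfies a) clauses)
    (all_assignments nvars)

(* Clause ruling out exactly the valuation a. *)
let blocking_clause nvars a =
  List.init nvars (fun i -> let v = i + 1 in if a v then -v else v)

(* The lazy loop: true = satisfiable modulo the theory, false = unsatisfiable.
   Each rejected valuation is excluded by its blocking clause, so the loop
   terminates after at most 2^nvars iterations. *)
let rec smt_loop nvars clauses theory_sat =
  match sat_solve nvars clauses with
  | None -> false
  | Some a ->
      if theory_sat a then true
      else smt_loop nvars (blocking_clause nvars a :: clauses) theory_sat

(* Example theory check: atoms 1, 2 and 3 stand for x < y, y < z and x < z,
   and the 'theory solver' just hard-codes the transitivity constraint,
   standing in for a real decision procedure. *)
let transitivity_ok a = not (a 1 && a 2 && not (a 3))
```

Here `smt_loop 3 [[1]; [2]; [-3]] transitivity_ok` obtains the propositional model making atoms 1 and 2 true and 3 false, has it rejected by the theory check, conjoins the blocking clause, and then reports unsatisfiability.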

6. Interactive theorem proving

Even though first order validity is semi-decidable, it is seldom practical to solve inter-
esting problems using unification-based approaches to pure logic. Nor is it the case that
practical problems often fit conveniently into one of the standard decidable subsets. The
best we can hope for in most cases is that the human will have to guide the proof pro-
cess, but the machine may be able to relieve the tedium by filling in gaps, while always
ensuring that no mistakes are made. This kind of application was already envisaged by
Wang (Wang 1960):
[...] the writer believes that perhaps machines may more quickly become of practi-
cal use in mathematical research, not by proving new theorems, but by formalizing
and checking outlines of proofs, say, from textbooks to detailed formalizations more
rigorous than Principia [Mathematica], from technical papers to textbooks, or from
abstracts to technical papers.
The first notable interactive provers were the SAM (semi-automated mathematics)
series. In 1966, the fifth in the series, SAM V, was used to construct a proof of a hitherto
unproven conjecture in lattice theory (Bumcrot 1965). This was indubitably a success for
the semi-automated approach because the computer automatically proved a result now
called “SAM’s Lemma” and the mathematician recognized that it easily yielded a proof
of Bumcrot’s conjecture.
Not long after the SAM project, the AUTOMATH (de Bruijn 1970; de Bruijn 1980)
and Mizar (Trybulec 1978; Trybulec and Blair 1985) proof checking systems appeared,
and each of them in its way has been profoundly influential. Although we will refer to
these systems as ‘interactive’, we use this merely as an antonym of ‘automatic’. In fact,
both AUTOMATH and Mizar were oriented around batch usage. However, the files that
they process consist of a proof, or a proof sketch, whose correctness they check, rather
than a statement for which they attempt to find a proof automatically.
Mizar has been used to proof-check a very large body of mathematics, spanning pure
set theory, algebra, analysis, topology, category theory and various unusual applications
like mathematical puzzles and computer models. The body of mathematics formally built
up in Mizar, known as the ‘Mizar Mathematical Library’ (MML), seems unrivalled in
any other theorem proving system. The ‘articles’ (proof texts) submitted to the MML
are automatically abstracted into human-readable form and published in the Journal of
Formalized Mathematics, which is devoted entirely to Mizar formalizations.2

LCF — a programmable proof checker

The ideal proof checker should be programmable, i.e. users should be able to extend
the built-in automation as much as desired. There’s no particular difficulty in allow-
ing this. Provided the basic mechanisms of the theorem prover are straightforward and
well-documented and the source code is made available, there’s no reason why a user
shouldn’t extend or modify it. However, the difficulty comes if we want to restrict the
user to extensions that are logically sound — as presumably we might well wish to, since
unsoundness renders questionable the whole idea of machine-checking of supposedly
more fallible human proofs. Even fairly simple automated theorem proving programs are
often subtler than they appear, and the difficulty of integrating a large body of special
proof methods into a powerful interactive system without compromising soundness is
not trivial.
2 Available on the Web via http://www.mizar.org/JFM.
One influential solution to this difficulty was introduced in the Edinburgh LCF
project led by Robin Milner (Gordon, Milner, and Wadsworth 1979). Although this was
for an obscure ‘logic of computable functions’ (hence the name LCF), the key idea, as
Gordon (Gordon 1982) emphasizes, is equally applicable to more orthodox logics sup-
porting conventional mathematics, and subsequently many programmable proof check-
ers were designed using the same principles, such as Coq,3 HOL (Gordon and Melham
1993), Isabelle (Paulson 1994) and Nuprl (Constable 1986).
The key LCF idea is to use a special type (say thm) of proven theorems in the im-
plementation language, so that anything of type thm must by construction have been
proved rather than simply asserted. (In practice, the implementation language is usually
a version of ML, which was specially designed for this purpose in the LCF project.) This
is enforced by making thm an abstract type whose only constructors correspond to ap-
proved inference rules. But the user is given full access to the implementation language
and can put the primitive rules together in more complicated ways using arbitrary pro-
gramming. Because of the abstract type, any result of type thm, however it was arrived
at, must ultimately have been produced by correct application of the primitive rules. Yet
the means for arriving at it may be complex. Most obviously, we can use ML as a kind of
‘macro language’ to automate common patterns of inference. But much more sophisti-
cated derived rules can be written that, for example, prove formulas of Presburger arith-
metic while automatically decomposing to logical primitives. In many theorem-proving
tasks, more ‘ad hoc’ manipulation code can be replaced by code performing inference
without significant structural change. In other cases we can use an automated proof pro-
cedure, or even an external system like a computer algebra system (Harrison and Théry
1998), as an oracle to find a proof that is later checked inside the system. Thus, LCF gives
a combination of programmability and logical security that would probably be difficult
to assure by other means.

An LCF kernel for first-order logic

To explain the LCF idea in more concrete terms, we will show a complete LCF-style
kernel for first order logic with equality, implemented in Objective CAML.4 We start
by defining the syntax of first order logic, as found in any standard text like (Enderton
1972; Mendelson 1987). First we have an OCaml type of first order terms, where a term
is either a named variable or a named function applied to a list of arguments (constants
like 1 are regarded as nullary functions):
type term = Var of string | Fn of string * term list;;

Using this, we can now define a type of first order formulas:


type formula = False
| True
| Atom of string * term list
| Not of formula
| And of formula * formula
| Or of formula * formula
| Imp of formula * formula
| Iff of formula * formula
| Forall of string * formula
| Exists of string * formula;;

3 See the Coq Web page http://pauillac.inria.fr/coq.
4 See http://www.ocaml.org.

Before proceeding, we define some OCaml functions for useful syntax operations:
constructing an equation, checking whether a term occurs in another, and checking
whether a term occurs free in a formula:

let mk_eq s t = Atom("=",[s;t]);;

let rec occurs_in s t =
  s = t or
  match t with
    Var y -> false
  | Fn(f,args) -> exists (occurs_in s) args;;

let rec free_in t fm =
  match fm with
    False -> false
  | True -> false
  | Atom(p,args) -> exists (occurs_in t) args
  | Not(p) -> free_in t p
  | And(p,q) -> free_in t p or free_in t q
  | Or(p,q) -> free_in t p or free_in t q
  | Imp(p,q) -> free_in t p or free_in t q
  | Iff(p,q) -> free_in t p or free_in t q
  | Forall(y,p) -> not (occurs_in (Var y) t) & free_in t p
  | Exists(y,p) -> not (occurs_in (Var y) t) & free_in t p;;

There are many complete proof systems for first order logic. We will adopt a Hilbert-
style proof system close to one first suggested by Tarski (Tarski 1965), and subsequently
presented in a textbook (Monk 1976). The idea is to avoid defining relatively tricky
syntactic operations like substitution. We first define the signature for the OCaml abstract
datatype of theorems:

module type Proofsystem =
sig
type thm
val axiom_addimp : formula -> formula -> thm
val axiom_distribimp :
formula -> formula -> formula -> thm
val axiom_doubleneg : formula -> thm
val axiom_allimp : string -> formula -> formula -> thm
val axiom_impall : string -> formula -> thm
val axiom_existseq : string -> term -> thm
val axiom_eqrefl : term -> thm
val axiom_funcong : string -> term list -> term list -> thm
val axiom_predcong : string -> term list -> term list -> thm
val axiom_iffimp1 : formula -> formula -> thm
val axiom_iffimp2 : formula -> formula -> thm
val axiom_impiff : formula -> formula -> thm
val axiom_true : thm
val axiom_not : formula -> thm
val axiom_or : formula -> formula -> thm
val axiom_and : formula -> formula -> thm
val axiom_exists : string -> formula -> thm
val modusponens : thm -> thm -> thm
val gen : string -> thm -> thm
val concl : thm -> formula
end;;

and then the actual implementation of the primitive inference rules. For example,
modusponens is the traditional modus ponens inference rule allowing us to pass from
two theorems of the form ` p ⇒ q and ` p to another one ` q:

  ` p ⇒ q    ` p
  ────────────── MODUSPONENS
       ` q

In the usual LCF style, this becomes a function taking two arguments of type thm and
producing another. In fact, most of these inference rules have no theorems as input, and
can thus be considered as axiom schemes. For example, axiom addimp creates theorems
of the form ` p ⇒ (q ⇒ p) and axiom existseq creates those of the form ∃x. x = t
provided x does not appear in the term t:

module Proven : Proofsystem =
struct
type thm = formula
let axiom_addimp p q = Imp(p,Imp(q,p))
let axiom_distribimp p q r = Imp(Imp(p,Imp(q,r)),Imp(Imp(p,q),Imp(p,r)))
let axiom_doubleneg p = Imp(Imp(Imp(p,False),False),p)
let axiom_allimp x p q = Imp(Forall(x,Imp(p,q)),Imp(Forall(x,p),Forall(x,q)))
let axiom_impall x p =
if not (free_in (Var x) p) then Imp(p,Forall(x,p))
else failwith "axiom_impall"
let axiom_existseq x t =
if not (occurs_in (Var x) t) then Exists(x,mk_eq (Var x) t)
else failwith "axiom_existseq"
let axiom_eqrefl t = mk_eq t t
let axiom_funcong f lefts rights =
fold_right2 (fun s t p -> Imp(mk_eq s t,p))
lefts rights (mk_eq (Fn(f,lefts)) (Fn(f,rights)))
let axiom_predcong p lefts rights =
fold_right2 (fun s t p -> Imp(mk_eq s t,p))
lefts rights (Imp(Atom(p,lefts),Atom(p,rights)))
let axiom_iffimp1 p q = Imp(Iff(p,q),Imp(p,q))
let axiom_iffimp2 p q = Imp(Iff(p,q),Imp(q,p))
let axiom_impiff p q = Imp(Imp(p,q),Imp(Imp(q,p),Iff(p,q)))
let axiom_true = Iff(True,Imp(False,False))
let axiom_not p = Iff(Not p,Imp(p,False))
let axiom_or p q = Iff(Or(p,q),Not(And(Not(p),Not(q))))
let axiom_and p q = Iff(And(p,q),Imp(Imp(p,Imp(q,False)),False))
let axiom_exists x p = Iff(Exists(x,p),Not(Forall(x,Not p)))
let modusponens pq p =
match pq with Imp(p',q) when p = p' -> q
| _ -> failwith "modusponens"
let gen x p = Forall(x,p)
let concl c = c
end;;
Although simple, these rules are in fact complete for first-order logic with equality.
At first they are tedious to use, but using the LCF technique we can build up a set of
derived rules. The following derives p ⇒ p:

let imp_refl p =
  modusponens (modusponens (axiom_distribimp p (Imp(p,p)) p)
                           (axiom_addimp p (Imp(p,p))))
              (axiom_addimp p p);;
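To see the abstract-type discipline doing its job in isolation, here is a self-contained miniature (our own cut-down variant, with the hypothetical module name `MiniProof`) containing just the two axiom schemes and the one rule that the `imp_refl` derivation needs:

```ocaml
(* A propositional-only fragment of the kernel above. Because thm is
   abstract, the only way to obtain a thm outside the module is via the
   axiom schemes and modusponens, so every thm is provable by construction. *)
type formula = False | Atom of string | Imp of formula * formula

module MiniProof : sig
  type thm
  val axiom_addimp : formula -> formula -> thm
  val axiom_distribimp : formula -> formula -> formula -> thm
  val modusponens : thm -> thm -> thm
  val concl : thm -> formula
end = struct
  type thm = formula
  let axiom_addimp p q = Imp(p,Imp(q,p))
  let axiom_distribimp p q r =
    Imp(Imp(p,Imp(q,r)),Imp(Imp(p,q),Imp(p,r)))
  let modusponens pq p =
    match pq with
    | Imp(p',q) when p = p' -> q
    | _ -> failwith "modusponens"
  let concl c = c
end

open MiniProof

(* Same derivation of ` p ==> p as in the text. *)
let imp_refl p =
  modusponens (modusponens (axiom_distribimp p (Imp(p,p)) p)
                           (axiom_addimp p (Imp(p,p))))
              (axiom_addimp p p)
```

Evaluating `concl (imp_refl (Atom "p"))` yields `Imp (Atom "p", Atom "p")`, while applying `modusponens` to theorems that do not match simply fails, so no unprovable formula ever acquires type `thm`.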

Before long, we can reach the stage of automatic derived rules that, for example,
prove propositional tautologies automatically, perform Knuth-Bendix completion, and
prove first order formulas by standard proof search and translation into primitive infer-
ences.

Proof style

One feature of the LCF style is that proofs (being programs) tend to be highly procedu-
ral, in contrast to the more declarative proofs supported by Mizar — for more on the con-
trast see (Harrison 1996c). This can have important disadvantages in terms of readability
and maintainability. In particular, it is difficult to understand the formal proof scripts in
isolation; they need to be run in the theorem prover to understand what the intermediate
states are. Nevertheless as pointed out in (Harrison 1996b) it is possible to implement
more declarative styles of proof on top of LCF cores. For more recent experiments with
Mizar-like declarative proof styles see (Syme 1997; Wenzel 1999; Zammit 1999; Wiedijk
2001).

Theorem proving in industry

Theorem provers that have been used in real industrial applications include ACL2 (Kauf-
mann, Manolios, and Moore 2000), HOL Light (Gordon and Melham 1993; Harrison
1996a) and PVS (Owre, Rushby, and Shankar 1992). We noted earlier that formal verifi-
cation methods can be categorized according to their logical expressiveness and automa-
tion. The same kind of balance can be drawn within the general theorem proving section.
Although these theorem provers all have undecidable decision problems, it is still possi-
ble to provide quite effective partial automation by using a more restricted logic. ACL2
follows this philosophy: it uses a quantifier-free logic analogous to PRA (Primitive Re-
cursive Arithmetic) (Goodstein 1957). HOL and PVS use richer logics with higher-order
quantification; PVS’s type system is particularly expressive. Nevertheless they attempt to
provide some useful automation, and HOL in particular uses the LCF approach to ensure
soundness and programmability. This will be emphasized in the application considered
below.

7. HOL Light

In the early 80s Mike Gordon (one of the original LCF team, now working in Cam-
bridge), as well as supervising the further development of LCF, was interested in the for-
mal verification of hardware. For this purpose, classical higher order logic seemed a nat-
ural vehicle, since it allows a rather direct rendering of notions like signals as functions
from time to values. The case was first put by Hanna (Hanna and Daeche 1986) and,
after a brief experiment with an ad hoc formalism ‘LSM’ based on Milner’s Calculus of
Communicating Systems, Gordon (Gordon 1985) also became a strong advocate. Gor-
don modified Cambridge LCF to support classical higher order logic, and so HOL (for
Higher Order Logic) was born. The first major public release was HOL88 (Gordon and
Melham 1993), and it spawned a large family of later versions, of which we will concen-
trate on our own HOL Light (Harrison 1996a),5 which is designed to have a simple and
clean logical foundation:

HOL88
  ├──→ hol90 ──→ hol98 ──→ HOL 4
  │       └──→ HOL Light
  ├──→ ProofPower
  └──→ Isabelle/HOL

Following Church (Church 1940), the HOL logic is based on simply typed λ-
calculus, so all terms are either variables, constants, applications or abstractions; there is
no distinguished class of formulas, merely terms of boolean type. The main difference
from Church’s system is that parametric polymorphism is an object-level, rather than a
meta-level, notion. For example, a theorem about type (α)list can be instantiated and
used for specific instances like (int)list and ((bool)list)list. Thus, the types in
HOL are essentially like terms of first order logic:

type hol_type = Tyvar of string
              | Tyapp of string * hol_type list;;

The only primitive type constructors for the logic itself are bool (booleans) and fun
(function space):

let the_type_constants = ref ["bool",0; "fun",2];;

Later we add an infinite type ind (individuals). All other types are introduced by a
rule of type definition, to be in bijection with any nonempty subset of an existing type.
5 See http://www.cl.cam.ac.uk/users/jrh/hol-light/index.html.
(Diagram: the new type δ is put in bijection with a nonempty subset P of an existing type γ.)

HOL terms are those of simply-typed lambda calculus. In the abstract syntax, only
variables and constants are decorated with types.

type term = Var of string * hol_type
          | Const of string * hol_type
          | Comb of term * term
          | Abs of term * term;;

The usual notation for these categories is v : ty, c : ty, f x and λx. t. The abstract
type interface ensures that only well-typed terms can be constructed. This stock of terms
may appear spartan, but using defined constants and a layer of parser and prettyprinter
support, many standard syntactic conventions are broken down to λ-calculus. For exam-
ple, the universal quantifier, following Church, is simply a higher order function, but the
conventional notation ∀x. P[x] is supported, mapping down to ∀(λx. P[x]). Similarly there
is a constant LET, which is semantically the identity and is used only as a tag for the
prettyprinter, and following Landin (Landin 1966), the construct ‘let x = t in s’ is broken
down to ‘LET (λx. s) t’.
The abstract type interface also ensures that constant terms can only be constructed
for defined constants. The only primitive constant for the logic itself is equality = with
polymorphic type α → α → bool.

let the_term_constants =
ref ["=", mk_fun_ty aty (mk_fun_ty aty bool_ty)];;

Later we add the Hilbert ε : (α → bool) → α yielding the Axiom of Choice. All
other constants are introduced using a rule of constant definition. Given a term t (closed,
and with some restrictions on type variables) and an unused constant name c, we can
define c and get the new theorem:

`c=t

Both terms and type definitions give conservative extensions and so in particular
preserve logical consistency. Thus, HOL is doubly ascetic:
• All proofs are done by primitive inferences
• All new types and constants are defined not postulated.
As noted, HOL has no separate syntactic notion of formula: we just use terms of
Boolean type. HOL’s theorems are single-conclusion sequents constructed from such
formulas:

type thm = Sequent of (term list * term);;


In the usual LCF style, these are considered an abstract type and the inference rules
become CAML functions operating on type thm. For example:

let ASSUME tm =
if type_of tm = bool_ty then Sequent([tm],tm)
else failwith "ASSUME: not a proposition";;

is the rule of assumption. The complete set of primitive rules, in usual logical notation,
is as follows (some of these have side-conditions that are not shown here):

  ───────── REFL
   ` t = t

  Γ ` s = t    ∆ ` t = u
  ────────────────────── TRANS
       Γ ∪ ∆ ` s = u

  Γ ` s = t    ∆ ` u = v
  ────────────────────── MK_COMB
   Γ ∪ ∆ ` s(u) = t(v)

        Γ ` s = t
  ────────────────────── ABS
  Γ ` (λx. s) = (λx. t)

  ──────────────── BETA
  ` (λx. t) x = t

  ──────── ASSUME
  {p} ` p

  Γ ` p = q    ∆ ` p
  ────────────────── EQ_MP
      Γ ∪ ∆ ` q

  Γ ` p    ∆ ` q
  ───────────────────────────── DEDUCT_ANTISYM_RULE
  (Γ − {q}) ∪ (∆ − {p}) ` p = q

  Γ[x1, . . . , xn] ` p[x1, . . . , xn]
  ───────────────────────────────────── INST
  Γ[t1, . . . , tn] ` p[t1, . . . , tn]

  Γ[α1, . . . , αn] ` p[α1, . . . , αn]
  ───────────────────────────────────── INST_TYPE
  Γ[γ1, . . . , γn] ` p[γ1, . . . , γn]

We can create various simple derived rules in the usual LCF fashion, such as a one-
sided congruence rule:
let AP_TERM tm th =
try MK_COMB(REFL tm,th)
with Failure _ -> failwith "AP_TERM";;

and a symmetry rule to reverse equations (rator and rand return the operator and
operand of a function application f x):
let SYM th =
  let tm = concl th in
  let l,r = dest_eq tm in
  let lth = REFL l in
  (* th is Γ ` l = r. Applying AP_TERM to the equality operator
     (rator (rator tm)) gives Γ ` ((=) l) = ((=) r); MK_COMB with
     ` l = l then gives Γ ` (l = l) = (r = l), and EQ_MP with
     ` l = l yields Γ ` r = l. *)
  EQ_MP (MK_COMB(AP_TERM (rator (rator tm)) th,lth)) lth;;

Even the logical connectives themselves are defined:


⊤ = (λx. x) = (λx. x)
∧ = λp. λq. (λf. f p q) = (λf. f ⊤ ⊤)
⇒ = λp. λq. (p ∧ q) = p
∀ = λP. P = (λx. ⊤)
∃ = λP. ∀Q. (∀x. P(x) ⇒ Q) ⇒ Q
∨ = λp. λq. ∀r. (p ⇒ r) ⇒ (q ⇒ r) ⇒ r
⊥ = ∀P. P
¬ = λt. t ⇒ ⊥
∃! = λP. ∃P ∧ ∀x. ∀y. P x ∧ P y ⇒ (x = y)
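The definition of ∧ can be sanity-checked semantically outside HOL by letting f range over all sixteen functions bool → bool → bool and checking that the two λ-abstractions agree exactly when both arguments are true. The OCaml below is purely illustrative (the names `all_fs` and `conj_by_def` are our own, with OCaml booleans standing in for HOL's bool type):

```ocaml
let bools = [false; true]

(* All 16 functions of type bool -> bool -> bool, tabulated by their
   values on the four argument pairs. *)
let all_fs =
  List.concat_map (fun a ->
    List.concat_map (fun b ->
      List.concat_map (fun c ->
        List.map (fun d ->
          fun p q -> if p then (if q then a else b)
                     else (if q then c else d))
          bools)
        bools)
      bools)
    bools

(* (λf. f p q) = (λf. f ⊤ ⊤) holds iff f p q = f true true for every f. *)
let conj_by_def p q = List.for_all (fun f -> f p q = f true true) all_fs
```

Checking `conj_by_def p q` against `p && q` for all four argument pairs confirms that the higher-order definition really does denote conjunction.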
(These are not constructive type theory’s Curry-Howard definitions.) We can now
implement the usual natural deduction rules, such as conjunction introduction:
let CONJ =
let f = ‘f:bool->bool->bool‘
and p = ‘p:bool‘ and q = ‘q:bool‘ in
let pth =
let pth = ASSUME p and qth = ASSUME q in
let th1 = MK_COMB(AP_TERM f (EQT_INTRO pth),EQT_INTRO qth) in
let th2 = ABS f th1 in
let th3 = BETA_RULE (AP_THM (AP_THM AND_DEF p) q) in
EQ_MP (SYM th3) th2 in
fun th1 th2 ->
let th = INST [concl th1,p; concl th2,q] pth in
PROVE_HYP th2 (PROVE_HYP th1 th);;
Now we can automate monotone inductive definitions, using a Knaster-Tarski
derivation. The implementation is quite hard work, but it’s then easy to use. It can cope
with infinitary definitions and user-defined monotone operators.

let TC_RULES,TC_INDUCT,TC_CASES = new_inductive_definition
  ‘(!x y. R x y ==> TC R x y) /\
   (!x y z. TC R x y /\ TC R y z ==> TC R x z)‘;;

This just uses the basic logic, but at this point we introduce the usual axioms for
classical logic and mathematics:
• Choice in the form of the Hilbert ε and its characterizing axiom ` ∀x. P(x) ⇒
P(εx. P(x))
• Extensionality as an η-conversion axiom ` (λx. t x) = t
• Infinity as a new type ind and an assertion that it’s Dedekind-infinite.
Everything else is now purely definitional. The Law of the Excluded Middle is de-
duced from Choice (Diaconescu 1975) rather than being postulated separately. We then
proceed with the development of standard mathematical infrastructure:
• Basic arithmetic of the natural numbers
• Theory of wellfounded relations
• General recursive data types
• Lists
• Elementary ‘set’ theory
• Axiom of Choice variants like Zorn’s Lemma, wellordering principle etc.
• Construction of real and complex numbers
• Real analysis
Among the higher-level derived rules are:
• Simplifier for (conditional, contextual) rewriting.
• Tactic mechanism for mixed forward and backward proofs.
• Tautology checker.
• Automated theorem provers for pure logic, based on tableaux and model elimina-
tion.
• Linear arithmetic decision procedures over R, Z and N.
• Differentiator for real functions.
• Generic normalizers for rings and fields.
• General quantifier elimination over C.
• Gröbner basis algorithm over fields.
HOL Light is perhaps the purest example of the LCF methodology that is actually
useful, in that the logical core is minimal and almost all mathematical concepts are de-
fined or constructed rather than merely postulated. But thanks to the LCF methodology
and the speed of modern computers, we can use it to tackle non-trivial mathematics and
quite difficult applications.
The first sustained attempt to actually formalize a body of mathematics (concepts
and proofs) was Principia Mathematica (Whitehead and Russell 1910). This success-
fully derived a body of fundamental mathematics from a small logical system. However,
the task of doing so was extraordinarily painstaking, and indeed Russell (Russell 1968)
remarked that his own intellect ‘never quite recovered from the strain of writing it’. But
now in the computer age, we can defer much of this work to a computer program like
HOL Light.

8. Floating-point verification

In the present section we describe some work applying HOL Light to some problems
in industrial floating-point verification, namely correctness of square root algorithms for
the Intel Itanium architecture.

Square root algorithms based on fma

The centrepiece of the Intel Itanium floating-point architecture is the fma (floating-
point multiply-add or fused multiply-accumulate) family of instructions. Given three
floating-point numbers x, y and z, these can compute x · y ± z as an atomic operation, with
the final result rounded as usual according to the IEEE Standard 754 for Binary Floating-
Point Arithmetic (IEEE 1985), but without intermediate rounding of the product x · y.
Of course, one can always obtain the usual addition and multiplication operations as the
special cases x · 1 + y and x · y + 0.
The fma has many applications in typical floating-point codes, where it can often
improve accuracy and/or performance. In particular (Markstein 1990) correctly rounded
quotients and square roots can be computed by fairly short sequences of fmas, obviating
the need for dedicated instructions. Besides enabling compilers and assembly language
programmers to make special optimizations, deferring these operations to software often
yields much higher throughput than with typical hardware implementations. Moreover,
the floating-point unit becomes simpler and easier to optimize because minimal hardware
need be dedicated to these relatively infrequent operations, and scheduling does not have
to cope with their exceptionally high latency.
Itanium architecture compilers for high-level languages will typically translate
division or square root operations into appropriate sequences of machine instructions.
Which sequence is used depends (i) on the required precision and (ii) whether one wishes
to minimize latency or maximize throughput. For concreteness, we will focus on a partic-
ular algorithm for calculating square roots in double-extended precision (64-bit precision
and 15-bit exponent field):

1. y0 = frsqrta(a)
2. H0 = ½ y0          S0 = a y0
3. d0 = ½ − S0 H0
4. H1 = H0 + d0 H0    S1 = S0 + d0 S0
5. d1 = ½ − S1 H1
6. H2 = H1 + d1 H1    S2 = S1 + d1 S1
7. d2 = ½ − S2 H2     e2 = a − S2 S2
8. H3 = H2 + d2 H2    S3 = S2 + e2 H2
9. e3 = a − S3 S3
10. S = S3 + e3 H3
All operations but the last are done using the register floating-point format with
rounding to nearest and with all exceptions disabled. (This format provides the same 64-
bit precision as the target format but has a greater exponent range, allowing us to avoid
intermediate overflow or underflow.) The final operation is done in double-extended pre-
cision using whatever rounding mode is currently selected by the user.
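The numerical behaviour of the sequence can be previewed in ordinary double precision. The sketch below only approximates the real algorithm: OCaml doubles round every multiply and add separately (there is no fused multiply-add here), `frsqrta` is mimicked by a deliberately degraded reciprocal square root, and `sqrt_fma` is our own name:

```ocaml
(* Steps 1-10 of the algorithm, transcribed into double arithmetic.
   The initial guess injects a relative error of about 2^-9 to mimic
   the limited accuracy of the frsqrta table lookup. *)
let sqrt_fma a =
  let y0 = (1.0 /. sqrt a) *. (1.0 +. 2.0e-3) in
  let h0 = 0.5 *. y0 and s0 = a *. y0 in
  let d0 = 0.5 -. s0 *. h0 in
  let h1 = h0 +. d0 *. h0 and s1 = s0 +. d0 *. s0 in
  let d1 = 0.5 -. s1 *. h1 in
  let h2 = h1 +. d1 *. h1 and s2 = s1 +. d1 *. s1 in
  let d2 = 0.5 -. s2 *. h2 and e2 = a -. s2 *. s2 in
  let h3 = h2 +. d2 *. h2 and s3 = s2 +. e2 *. h2 in
  let e3 = a -. s3 *. s3 in
  s3 +. e3 *. h3
```

Despite the crude starting point, the roughly quadratic convergence of the Goldschmidt and Newton steps drives the result to within a few ulps of √a.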
This algorithm is a non-trivial example in two senses. Since it is designed for the
maximum precision supported in hardware (64 bits), greater precision cannot be ex-
ploited in intermediate calculations and so a very careful analysis is necessary to ensure
correct rounding. Moreover, it is hardly feasible to test such an algorithm exhaustively,
even if an accurate and fast reference were available, since there are about 2^80 possible
inputs. (By contrast, one could certainly verify single-precision and conceivably verify
double precision by exhaustive or quasi-exhaustive methods.)

Algorithm verification

It’s useful to divide the algorithm into three parts, and our discussion of the correctness
proof will follow this separation:
1    Form⁶ an initial approximation y0 = (1/√a)(1 + ε) with |ε| ≤ 2^−8.8.
2–8  Convert this to approximations H0 ≈ 1/(2√a) and S0 ≈ √a, then successively
     refine these to much better approximations H3 and S3 using Goldschmidt iteration
     (Goldschmidt 1964) (a Newton-Raphson variant).
9–10 Use these accurate approximations to produce the square root S correctly rounded
     according to the current rounding mode, setting IEEE flags or triggering exceptions
     as appropriate.

Initial approximation
The frsqrta instruction makes a number of initial checks for special cases that are dealt
with separately, and if necessary normalizes the input number. It then uses a simple table
lookup to provide the approximation. The algorithm and table used are precisely spec-
ified in the Itanium instruction set architecture. The formal verification is essentially
some routine algebraic manipulations for exponent scaling, then a 256-way case split fol-
lowed by numerical calculation. The following HOL theorem concerns the correctness
of the core table lookup:

|- normal a ∧ &0 <= Val a
   ⇒ abs(Val(frsqrta a) / inv(sqrt(Val a)) - &1)
     < &303 / &138050

^6 Using frsqrta, the only Itanium instruction specially intended to support square root. In the present
discussion we abstract somewhat from the actual machine instruction, and ignore exceptional cases like a = 0
where it takes special action.

Refinement
Each fma operation will incur a rounding error, but we can easily find a mathematically
convenient (though by no means optimally sharp) bound for the relative error induced
by rounding. The key principle is the ‘1 + e’ property, which states that the rounded
result involves only a small relative perturbation to the exact result. In HOL the formal
statement is as follows:

|- ¬(losing fmt rc x) ∧ ¬(precision fmt = 0)


⇒ ∃e. abs(e) <= mu rc / &2 pow (precision fmt - 1) ∧
(round fmt rc x = x * (&1 + e))

The bound on e depends on the precision of the floating-point format and the round-
ing mode; for round-to-nearest mode, mu rc is 1/2. The theorem has two side conditions,
one being a nontriviality hypothesis, and the other an assertion that the value x does
not lose precision. We will not show the formal definition (Harrison 1999) here, since
it is rather complicated. However, a simple and usually adequate sufficient condition is
that the exact result lies in the normal range (or is zero).
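The ‘1 + e’ property is easy to experiment with outside HOL. In the following Python sketch, round_nearest is an illustrative model of round-to-nearest-even at p significant bits, not HOL’s round definition; the check confirms |e| ≤ mu/2^(p-1) = 2^-p for mu = 1/2:

```python
from fractions import Fraction

def round_nearest(x: Fraction, p: int) -> Fraction:
    """Illustrative model of round-to-nearest-even at p significant bits
    (positive x only)."""
    # Find e with 2^(e-1) <= x < 2^e.
    e = x.numerator.bit_length() - x.denominator.bit_length()
    while Fraction(2) ** e <= x:
        e += 1
    while Fraction(2) ** (e - 1) > x:
        e -= 1
    ulp = Fraction(2) ** (e - p)
    q, r = divmod(x, ulp)              # q whole ulps plus remainder r
    if 2 * r > ulp or (2 * r == ulp and q % 2 == 1):
        q += 1                         # round up, or break ties to even
    return q * ulp

# The '1 + e' property with mu = 1/2: |e| <= (1/2) / 2^(p-1) = 2^-p.
p = 24
for x in [Fraction(1, 3), Fraction(10, 7), Fraction(355, 113)]:
    e = round_nearest(x, p) / x - 1
    assert abs(e) <= Fraction(1, 2 ** p)
```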
Actually applying this theorem, and then bounding the various error terms, would be
quite tedious if done by hand. We have programmed some special derived rules in HOL
to help us. First, these automatically bound absolute magnitudes of quantities, essentially
by using the triangle inequality |x + y| ≤ |x| + |y|. This usually allows us to show that no overflow
occurs. However, to apply the 1 + e theorem, we also need to exclude underflow, and
so must establish minimum (nonzero) absolute magnitudes. This is also largely done
automatically by HOL, repeatedly using theorems for the minimum nonzero magnitude
that can result from an individual operation. For example, if 2^e ≤ |a|, then either a + b · c
is exactly zero or 2^(e−2p) ≤ |a + b · c|, where p is the precision of the floating-point format
containing a, b and c.
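This minimum-magnitude fact can be sanity-checked by brute force in a toy format. The Python sketch below is illustrative only (p = 3 and the exponent range are arbitrary choices, and this exhaustive check is of course not the HOL proof):

```python
from fractions import Fraction
from itertools import product

p = 3                                  # toy precision
# All p-bit floating-point values (m/2^p) * 2^x with significand in [1/2, 1).
vals = [Fraction(s * m, 2 ** p) * Fraction(2) ** x
        for s in (1, -1)
        for m in range(2 ** (p - 1), 2 ** p)
        for x in range(-2, 3)]

e = 0                                  # check the claim for 2^e <= |a|
lo = Fraction(2) ** (e - 2 * p)        # claimed minimum nonzero |a + b*c|
for a, b, c in product(vals, repeat=3):
    if abs(a) >= Fraction(2) ** e:
        s = a + b * c                  # the exact (unrounded) fma result
        assert s == 0 or abs(s) >= lo
```

The underlying reason is that both a and b · c are integer multiples of 2^(e−2p) whenever b · c is large enough to cancel a significantly, so any nonzero sum is at least that granularity.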
It’s now quite easy, with a combination of automatic error bounding and some manual
algebraic rearrangement, to obtain quite good relative error bounds for the main computed
quantities. In fact, in the early iterations, the rounding errors incurred are insignificant
in comparison with the approximation errors in the Hi and Si. Thus, the relative
errors in these quantities are roughly in step. If we write

    Hi ≈ (1/(2√a))(1 + εi)        Si ≈ √a(1 + εi)

then

    di ≈ 1/2 − Si Hi = 1/2 − (1/2)(1 + εi)² = −(εi + εi²/2)

Consequently, correcting the current approximations in the manner indicated will
approximately square the relative error, e.g.

    Si+1 ≈ Si + di Si = Si(1 + di) ≈ √a(1 + εi)(1 − εi − εi²/2) ≈ √a(1 − (3/2)εi²)
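This quadratic contraction is easy to observe numerically. The Python sketch below runs the Goldschmidt recurrence in exact rational arithmetic, so only approximation error is present (no rounding); sqrt_hi is an illustrative helper, and the final check confirms that each relative error is close to (3/2) times the square of the previous one:

```python
from fractions import Fraction
from math import isqrt

def sqrt_hi(x: Fraction, bits: int) -> Fraction:
    """High-precision rational lower approximation to sqrt(x)."""
    n = (x.numerator << (2 * bits)) // x.denominator
    return Fraction(isqrt(n), 1 << bits)

a = Fraction(7, 4)
r = sqrt_hi(a, 400)                    # reference value for sqrt(a)

y0 = sqrt_hi(Fraction(1) / a, 20)      # initial 1/sqrt(a), about 2^-20 accurate
H, S = y0 / 2, a * y0                  # H ~ 1/(2 sqrt(a)), S ~ sqrt(a)
errs = []
for _ in range(3):
    d = Fraction(1, 2) - S * H         # d = -(eps + eps^2/2)
    S, H = S + d * S, H + d * H        # one Goldschmidt step
    errs.append(abs(S / r - 1))

# Each relative error is roughly (3/2) times the square of its predecessor.
for e0, e1 in zip(errs, errs[1:]):
    assert Fraction(3, 4) * e0 ** 2 < e1 < 3 * e0 ** 2
```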

Towards the end, the rounding errors in Si and Hi become more significantly decoupled,
and for the penultimate iteration we use a slightly different refinement for S3:

    e2 ≈ a − S2 S2 = a − a(1 + ε2)² ≈ −2aε2

and so:

    S2 + e2 H2 ≈ √a(1 + ε2) − (2aε2)(1/(2√a))(1 + ε2′) ≈ √a(1 − ε2 ε2′)

Thus, S2 + e2 H2 will be quite an accurate square root approximation. In fact the HOL
proof yields S2 + e2 H2 = √a(1 + ε) with |ε| ≤ 5579/2^79 ≈ 2^-66.5.
The above sketch elides what in the HOL proofs is a detailed bound on the rounding
error. However, this only really becomes significant when S3 is rounded; this may in
itself contribute a relative error of order 2^-64, significantly more than the error before
rounding. Nevertheless, it is important to note that if √a happens to be an exact
floating-point number (e.g. √1.5625 = 1.25), S3 will be that number. This is a consequence of
the fact that the error in S2 + e2 H2 is less than half the distance between surrounding
floating-point numbers.

Correct rounding
The final two steps of the algorithm simply repeat the previous iteration for S3 and the
basic error analysis is the same. The difficulty is in passing from a relative error before
rounding to correct rounding afterwards. Again we consider the final rounding separately,
so S is the result of rounding the exact value S∗ = S3 + e3 H3. The error analysis indicates
that S∗ = √a(1 + ε) for some |ε| ≤ 25219/2^140 ≈ 2^-125.37. The final result S will, by the
basic property of the fma operation, be the result of rounding S∗ in whatever the chosen
rounding mode may be. The desired result would be the result of rounding √a in
the same way. How can we be sure these are the same?
First we can dispose of some special cases. We noted earlier that if √a is already
exactly a floating-point number, then S3 will already be that number. In this case we will
have e3 = 0 and so S∗ = S3 . Whatever the rounding mode, rounding a number already in
the format concerned will give that number itself:

|- a IN iformat fmt ⇒ (round fmt rc a = a)



so the result will be correct. Moreover, the earlier observation extends to show that if √a
is fairly close (in a precise sense) to a floating-point number, then S3 will be that number.
It is then quite straightforward to see that the overall algorithm will be accurate without
great subtlety: we just need the fact that e3 has the right sign and roughly the correct
magnitude, so S∗ will never misround in directed rounding modes. Thus, we can also
deal immediately with what would otherwise be difficult cases for the directed rounding
modes, and concentrate our efforts on rounding to nearest.
On general grounds we note that √a cannot be exactly the mid-point between two
floating-point numbers. This is not hard to see, since the square root of a number in a
given format cannot denormalize in that format, and a non-denormal midpoint has p + 1
significant digits, so its square must have more than p.^7
^7 An analogous result holds for quotients, but here the denormal case must be dealt with specially. For example,
(2^Emin × 0.111···111)/2 is exactly a midpoint.


|- &0 <= a ∧ a IN iformat fmt ∧ b IN midpoints fmt
⇒ ¬(sqrt a = b)

This is a useful observation. We’ll never be in the tricky case where there are two
equally close floating-point numbers (resolved by the ‘round to even’ rule). So in round-
to-nearest, S∗ and √a could only round in different ways if there were a midpoint between
them, for only then could the closest floating-point numbers to them differ. For
example, in the following diagram, where long lines indicate floating-point numbers and
shorter ones represent midpoints, √a would round ‘down’ while S∗ would round ‘up’:^8

[Diagram: a number line on which √a lies just below a midpoint and S∗ just above it.]
Although analyzing this condition combinatorially would be complicated, there is a
much simpler sufficient condition. One can easily see that it would suffice to show that
for any midpoint m:

    |√a − S∗| < |√a − m|

In that case √a and S∗ couldn’t lie on opposite sides of m. Here is the formal theorem
in HOL:

|- ¬(precision fmt = 0) ∧
(∀m. m IN midpoints fmt ⇒ abs(x - y) < abs(x - m))
⇒ (round fmt Nearest x = round fmt Nearest y)
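This theorem can be checked exhaustively in a toy format. In the Python sketch below (p = 4 and the sample grid are arbitrary illustrative choices), whenever x is strictly closer to y than to every midpoint, x and y round identically under round-to-nearest-even:

```python
from fractions import Fraction
from itertools import product

p = 4                                       # toy precision
ks = range(2 ** (p - 1), 2 ** p + 1)        # significands k: values are k/2^(p-1)
floats = [Fraction(k, 2 ** (p - 1)) for k in ks]
mids = [(u + v) / 2 for u, v in zip(floats, floats[1:])]

def rnd(x: Fraction) -> Fraction:
    """Round x in [1, 2] to the nearest representable value, ties to even k."""
    k = min(ks, key=lambda k: (abs(Fraction(k, 2 ** (p - 1)) - x), k % 2))
    return Fraction(k, 2 ** (p - 1))

# If |x - y| < |x - m| for every midpoint m, then x and y round alike.
grid = [Fraction(n, 2 ** (p + 3)) for n in range(2 ** (p + 3), 2 ** (p + 4) + 1)]
for x, y in product(grid, repeat=2):
    if all(abs(x - y) < abs(x - m) for m in mids):
        assert rnd(x) == rnd(y)
```

Note that the strictness of the inequality matters: it rules out x or y being a midpoint itself, so the tie-breaking rule is never exercised.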

One can arrive at an ‘exclusion zone’ theorem giving the minimum possible
|√a − m|. However, this can be quite small, about 2^-(2p+3) relative to √a, where p is the
precision. For example, in our context with p = 64, consider the square root of the next
floating-point number below 1, whose mantissa consists entirely of 1s. Its square root is
about 2^-131 from a midpoint:

    √(1 − 2^-64) ≈ (1 − 2^-65) − 2^-131
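This near-miss can be confirmed with exact integer arithmetic; in the Python sketch below, sqrt_hi is an illustrative high-precision helper, not part of the verification:

```python
from fractions import Fraction
from math import isqrt

def sqrt_hi(x: Fraction, bits: int = 300) -> Fraction:
    """Rational approximation to sqrt(x), good to about 2^-bits relatively."""
    n = (x.numerator << (2 * bits)) // x.denominator
    return Fraction(isqrt(n), 1 << bits)

a = 1 - Fraction(1, 2 ** 64)     # mantissa of all 1s in a p = 64 format
m = 1 - Fraction(1, 2 ** 65)     # a midpoint: it needs 65 significant bits

# sqrt(a) falls only about 2^-131 short of the midpoint m.
gap = m - sqrt_hi(a)
assert Fraction(1, 2 ** 132) < gap < Fraction(1, 2 ** 130)
```

The closeness comes from the algebraic identity 1 − 2^-64 = (1 − 2^-65)² − 2^-130: the number just misses being an exact square of a midpoint.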

Therefore, our relative error in S∗ of about 2^-125.37 is far from adequate to justify
perfect rounding based on the simple ‘exclusion zone’ theorem, for which we need
something of order 2^-131. However, our relative error bounds are far from sharp, and it seems
quite plausible that the algorithm does nevertheless work correctly. What can we do?
One solution is to utilize more refined theorems (Markstein 2000), but this is complicated
and may still fail to justify several algorithms that are intuitively believed to
work correctly. An ingenious alternative developed by Cornea (Cornea-Hasegan 1998)
is to observe that there are relatively few cases like 0.111 · · · 1111 whose square roots
come close enough to render the exclusion zone theorem inapplicable, and these can be
isolated by fairly straightforward number-theoretic methods. We can therefore:

^8 Similarly, in the other rounding modes, misrounding could only occur if √a and S∗ are separated by a
floating-point number. However, as we have noted, one can deal with those cases more directly.
• Isolate the special cases a1, . . . , an that have square roots within the critical
distance of a midpoint.
• Conclude from the simple exclusion zone theorem that the algorithm will give
correct results except possibly for a1, . . . , an.
• Explicitly show that the algorithm is correct for a1, . . . , an (effectively by
running it on those inputs).
This two-part approach is perhaps a little unusual, but not unknown even in pure
mathematics.^9 For example, consider “Bertrand’s Conjecture” (first proved by Chebyshev),
stating that for any positive integer n there is a prime p with n ≤ p ≤ 2n. The
most popular proof (Erdös 1930) involves assuming n > 4000 for the main proof and
separately checking the assertion for n ≤ 4000.^10
By some straightforward mathematics described in (Cornea-Hasegan 1998) and formalized
in HOL without difficulty, one can show that the difficult cases for square roots
have mantissas m, considered as p-bit integers, such that one of the following diophantine
equations has a solution k for some integer |d| ≤ D, where D is roughly the factor by
which the guaranteed relative error is excessive:

    2^(p+2) m = k² + d        2^(p+1) m = k² + d

We consider the equations separately for each chosen |d| ≤ D. For example, we might
be interested in whether 2^(p+1) m = k² − 7 has a solution. If so, the possible value(s) of m
are added to the set of difficult cases. It’s quite easy to program HOL to enumerate all
the solutions of such diophantine equations, returning a disjunctive theorem of the form:

    ⊢ (2^(p+1) m = k² + d) ⇒ (m = n1) ∨ . . . ∨ (m = ni)
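The even/odd recursion is easy to prototype outside the theorem prover. In the Python sketch below (illustrative only: the HOL derived rule returns theorems rather than lists, and the real proofs use p = 64, not the toy p = 24 here), square roots of c modulo 2^n are enumerated by lifting one bit at a time, and solutions are converted back to candidate mantissas m:

```python
def sqrt_mod_2n(c: int, n: int) -> list:
    """All k in [0, 2^n) with k*k = c (mod 2^n), by bit-by-bit lifting.

    Each partial solution modulo 2^(i-1) lifts to at most two solutions
    modulo 2^i, mirroring the even/odd recursion described in the text.
    """
    sols = [0]
    for i in range(1, n + 1):
        sols = [k for s in sols for k in (s, s + (1 << (i - 1)))
                if k * k % (1 << i) == c % (1 << i)]
    return sols

p = 24
n = p + 1

# 2^(p+1) m = k^2 - 7 needs k^2 = 7 (mod 2^(p+1)): impossible, since
# squares are only 0, 1 or 4 modulo 8.
assert sqrt_mod_2n(7, n) == []

# 2^(p+1) m = k^2 - 1 does have solutions; recover the p-bit mantissas m.
ms = sorted({(k * k - 1) >> n for k in sqrt_mod_2n(1, n)
             if 2 ** (p - 1) <= (k * k - 1) >> n < 2 ** p})
assert ms == [2 ** (p - 1) + 1]
```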

The procedure simply uses even-odd reasoning and recursion on the power of two
(effectively so-called ‘Hensel lifting’). For example, if

    2^25 m = k² − 7

then we know k must be odd; we can write k = 2k′ + 1 and deduce:

    2^24 m = 2k′² + 2k′ − 3

By more even/odd reasoning, this has no solutions. In general, we recurse down to
an equation that is trivially unsatisfiable, as here, or immediately solvable. One equation
can split into two, but never more.

^9 A more extreme case is the 4-color theorem, whose proof relies on extensive (computer-assisted) checking
of special cases (Appel and Haken 1976).

^10 An ‘optimized’ way of checking, referred to in (Aigner and Ziegler 2001) as “Landau’s trick”, is to verify
that 3, 5, 7, 13, 23, 43, 83, 163, 317, 631, 1259, 2503 and 4001 are all prime and each is less than twice its
predecessor.

For example, we have a formally proved HOL theorem asserting that for any double-extended
number a,^11 rounding √a and √(a(1 + ε)) to double-extended precision using any of
the four IEEE rounding modes will give the same results provided |ε| < 31/2^131, with
the possible exceptions of a = 2^(2e) · m for:

m ∈ { 10074057467468575321, 10376293541461622781,
10376293541461622787, 11307741603771905196,
13812780109330227882, 14928119304823191698,
16640932189858196938, 18446744073709551611,
18446744073709551612, 18446744073709551613,
18446744073709551614, 18446744073709551615}

and a = 2^(2e+1) · m for

m ∈ { 9223372036854775809, 9223372036854775811,
11168682418930654643}

Note that while some of these numbers are obvious special cases like 2^64 − 1, the
“pattern” in others is only apparent from the kind of mathematical analysis we have
undertaken here. They aren’t likely to be exercised by random testing, or by testing of
plausible special cases.^12
Checking formally that the algorithm works on the special cases can also be automated,
by applying theorems on the uniqueness of rounding to the concrete numbers
computed. (For a formal proof, it is not sufficient to separately test the implemented
algorithm, since such a result has no formal status.) In order to avoid trying all possible
even or odd exponents for the various significands, we exploit some results on invariance
of the rounding and arithmetic involved in the algorithm under systematic scaling by 2^(2k),
doing a simple form of symbolic simulation by formal proof.

Flag settings
Correctness according to the IEEE Standard 754 not only requires the correctly rounded
result, but also the correct setting of flags or triggering of exceptions for conditions like
overflow, underflow and inexactness. Actually, almost all these properties follow directly
from the arguments leading to perfect rounding. For example, the mere fact that two real
numbers round equivalently in all rounding modes implies that one is exact iff the other
is:

|- ¬(precision fmt = 0) ∧
(∀rc. round fmt rc x = round fmt rc y)
⇒ ∀rc. (round fmt rc x = x) = (round fmt rc y = y)

The correctness of other flag settings follows in the same sort of way, with underflow
only slightly more complicated (Harrison 1999).
^11 Note that there is more subtlety required when using such a result in a mixed-precision environment. For
example, to obtain a single-precision result for a double-precision input, an algorithm that suffices for
single-precision inputs may not be adequate even though the final precision is the same.
^12 On the other hand, we can well consider the mathematical analysis as a source of good test cases.
Conclusions and related work

What do we gain from developing these proofs formally in a theorem prover, compared
with a detailed hand proof? We see two main benefits: reliability and re-usability.
Proofs of this nature, large parts of which involve intricate but routine error bounding
and the exhaustive solution of Diophantine equations, are very tedious and error-prone
to do by hand. In practice, one would do better to use some kind of machine assistance,
such as ad hoc programs to solve the Diophantine equations and check the special cases
so derived. Although this can be helpful, it can also create new dangers of incorrectly
implemented helper programs and transcription errors when passing results between ‘hand’
and ‘machine’ portions of the proof. By contrast, we perform all steps of the proof in a
painstakingly foundational system, and can be quite confident that no errors have been
introduced. The proof proceeds according to strict logical deduction, all the way from the
underlying pure mathematics up to the symbolic “execution” of the algorithm in special
cases.
Although we have only discussed one particular example, many algorithms with a
similar format have been developed for use in systems based on the Itanium architecture.
One of the benefits of implementing division and square root in software is that
different algorithms can be substituted depending on the detailed accuracy and performance
requirements of the application. Not only are different (faster) algorithms provided for
IEEE single and double precision operations, but algorithms often have two versions,
one optimized for minimum latency and one for maximal throughput. These algorithms
are all quite similar in structure and large parts of the correctness proofs use the same
ideas. By performing these proofs in a programmable theorem prover like HOL, we are
able to achieve high re-use of results, just tweaking a few details each time. Often, we
can produce a complete formal proof of a new algorithm in just a day. For an even more
rigidly stereotyped class of algorithms, one could quite practically implement a totally
automatic verification rule in HOL.
Underlying these advantages are three essential theorem prover features: soundness,
programmability and executability. HOL scores highly on these points. It is implemented
in a highly foundational style and does not rely on the correctness of very complex code.
It is freely programmable, since it is embedded in a full programming language. In
particular, one can program it to perform various kinds of computation and symbolic
execution by proof. The main disadvantage is that proofs can sometimes take a long time to
run, precisely because they always decompose to low-level primitives. This applies with
particular force to some kinds of symbolic execution, where instead of simply accepting
an equivalence like 294 + 3 = 13 · 19 · 681943 · 7941336391 · 14807473717 based, say, on
the results of a multiprecision arithmetic package, a detailed formal proof is constructed
under the surface. To some extent, this sacrifice of efficiency is a conscious choice when
we decide to adopt a highly foundational system, but it might be worth weakening this
ideology at least to include concrete arithmetic as an efficient primitive operation.
This is by no means the only work in this area. On the contrary, floating-point verification
is regarded as one of the success stories of formal verification. For related work
on square root algorithms in commercial systems see (Russinoff 1998; O’Leary, Zhao,
Gerth, and Seger 1999; Sawada and Gamboa 2002). For similar verifications of division
algorithms and transcendental functions by the present author, see (Harrison 2000b;
Harrison 2000a), while (Muller 2006) and (Markstein 2000) give a detailed discussion of
some relevant floating-point algorithms. When verifying transcendental functions, one
needs a fair amount of pure mathematics formalized just to get started. So, in a common
intellectual pattern, it is entirely possible that Mizar-like formalizations of mathematics
undertaken for purely intellectual reasons might in the end turn out to be useful in
practical applications.

References

Aagaard, M. and Leeser, M. (1994) Verifying a logic synthesis tool in Nuprl: A case study
in software verification. In v. Bochmann, G. and Probst, D. K. (eds.), Computer Aided
Verification: Proceedings of the Fourth International Workshop, CAV’92, Volume 663
of Lecture Notes in Computer Science, Montreal, Canada, pp. 69–81. Springer Verlag.
Abdulla, P. A., Bjesse, P., and Eén, N. (2000) Symbolic reachability analysis based on
SAT-solvers. In Graf, S. and Schwartzbach, M. (eds.), Tools and Algorithms for the
Construction and Analysis of Systems (TACAS’00), Volume 1785 of Lecture Notes in
Computer Science. Springer-Verlag.
Aigner, M. and Ziegler, G. M. (2001) Proofs from The Book (2nd ed.). Springer-Verlag.
Akers, S. B. (1978) Binary decision diagrams. ACM Transactions on Computers, C-27,
509–516.
Alur, R. and Peled, D. A. (eds.) (2004) Computer Aided Verification, 16th International
Conference, CAV 2004, Volume 3114 of Lecture Notes in Computer Science, Boston,
MA. Springer-Verlag.
Andrews, P. B. (1976) Theorem proving by matings. IEEE transactions on Computers,
25, 801–807.
Appel, K. and Haken, W. (1976) Every planar map is four colorable. Bulletin of the
American Mathematical Society, 82, 711–712.
Armando, A., Castellini, C., and Giunchiglia, E. (1999) SAT-based procedures for tem-
poral reasoning. In Proceedings of the 5th European conference on Planning, Lecture
Notes in Computer Science, pp. 97–108. Springer-Verlag.
Ball, T., Cook, B., Lahriri, S. K., and Rajamani, S. K. (2004) Zapato: Automatic theorem
proving for predicate abstraction refinement. See Alur and Peled (2004), pp. 457–461.
Bertot, Y., Dowek, G., Hirschowitz, A., Paulin, C., and Théry, L. (eds.) (1999) Theorem
Proving in Higher Order Logics: 12th International Conference, TPHOLs’99, Volume
1690 of Lecture Notes in Computer Science, Nice, France. Springer-Verlag.
Bibel, W. and Schreiber, J. (1975) Proof search in a Gentzen-like system of first order
logic. In Gelenbe, E. and Potier, D. (eds.), Proceedings of the International Computing
Symposium, pp. 205–212. North-Holland.
Biere, A., Cimatti, A., Clarke, E. M., and Zhu, Y. (1999) Symbolic model checking with-
out BDDs. In Proceedings of the 5th International Conference on Tools and Algo-
rithms for Construction and Analysis of Systems, Volume 1579 of Lecture Notes in
Computer Science, pp. 193–207. Springer-Verlag.
Bjesse, P. (1999) Symbolic model checking with sets of states represented as formulas.
Technical Report SC-1999-100, Department of Computer Science, Chalmers Univer-
sity of Technology.
Bochnak, J., Coste, M., and Roy, M.-F. (1998) Real Algebraic Geometry, Volume 36 of
Ergebnisse der Mathematik und ihrer Grenzgebiete. Springer-Verlag.
Bradley, A. R. and Manna, Z. (2007) The Calculus of Computation: Decision Procedures
with Applications to Verification. Springer-Verlag.
Bryant, R. E. (1985) Symbolic verification of MOS circuits. In Fuchs, H. (ed.), Proceed-
ings of the 1985 Chapel Hill Conference on VLSI, pp. 419–438. Computer Science
Press.
Bryant, R. E. (1986) Graph-based algorithms for Boolean function manipulation. IEEE
Transactions on Computers, C-35, 677–691.
Bryant, R. E. (1991a) A method for hardware verification based on logic simulation.
Journal of the ACM, 38, 299–328.
Bryant, R. E. (1991b) On the complexity of VLSI implementations and graph represen-
tations of Boolean functions with application to integer multiplication. IEEE Trans-
actions on Computers, C-40, 205–213.
Buchberger, B. (1965) Ein Algorithmus zum Auffinden der Basiselemente des Restk-
lassenringes nach einem nulldimensionalen Polynomideal. Ph. D. thesis, Mathema-
tisches Institut der Universität Innsbruck. English translation in Journal of Symbolic
Computation vol. 41 (2006), pp. 475–511.
Bumcrot, R. (1965) On lattice complements. Proceedings of the Glasgow Mathematical
Association, 7, 22–23.
Burch, J. R., Clarke, E. M., McMillan, K. L., Dill, D. L., and Hwang, L. J. (1992)
Symbolic model checking: 10^20 states and beyond. Information and Computation, 98,
142–170.
Carter, W. C., Joyner, W. H., and Brand, D. (1979) Symbolic simulation for correct ma-
chine design. In Proceedings of the 16th ACM/IEEE Design Automation Conference,
pp. 280–286. IEEE Computer Society Press.
Caviness, B. F. and Johnson, J. R. (eds.) (1998) Quantifier Elimination and Cylindrical
Algebraic Decomposition, Texts and monographs in symbolic computation. Springer-
Verlag.
Chou, C.-T. and Peled, D. (1999) Formal verification of a partial-order reduction tech-
nique for model checking. Journal of Automated Reasoning, 23, 265–298.
Church, A. (1936) An unsolvable problem of elementary number-theory. American Jour-
nal of Mathematics, 58, 345–363.
Church, A. (1940) A formulation of the Simple Theory of Types. Journal of Symbolic
Logic, 5, 56–68.
Clarke, E. M. and Emerson, E. A. (1981) Design and synthesis of synchronization skele-
tons using branching-time temporal logic. In Kozen, D. (ed.), Logics of Programs,
Volume 131 of Lecture Notes in Computer Science, Yorktown Heights, pp. 52–71.
Springer-Verlag.
Clarke, E. M., Grumberg, O., and Peled, D. (1999) Model Checking. MIT Press.
Collins, G. E. (1976) Quantifier elimination for real closed fields by cylindrical algebraic
decomposition. In Brakhage, H. (ed.), Second GI Conference on Automata Theory and
Formal Languages, Volume 33 of Lecture Notes in Computer Science, Kaiserslautern,
pp. 134–183. Springer-Verlag.
Constable, R. (1986) Implementing Mathematics with The Nuprl Proof Development Sys-
tem. Prentice-Hall.
Cook, S. A. (1971) The complexity of theorem-proving procedures. In Proceedings of
the 3rd ACM Symposium on the Theory of Computing, pp. 151–158. ACM.
Cooper, D. C. (1972) Theorem proving in arithmetic without multiplication. In Melzer,
B. and Michie, D. (eds.), Machine Intelligence 7, pp. 91–99. Elsevier.
Cornea-Hasegan, M. (1998) Proving the IEEE correctness of iterative floating-point
square root, divide and remainder algorithms. Intel Technology Journal, 1998-Q2,
1–11. Available on the Web as http://developer.intel.com/technology/itj/
q21998/articles/art_3.htm.
Coudert, O., Berthet, C., and Madre, J.-C. (1989) Verification of synchronous sequential
machines based on symbolic execution. In Sifakis, J. (ed.), Automatic Verification
Methods for Finite State Systems, Volume 407 of Lecture Notes in Computer Science,
pp. 365–373. Springer-Verlag.
Cox, D., Little, J., and O’Shea, D. (1992) Ideals, Varieties, and Algorithms. Springer-
Verlag.
Davis, M. (1983) The prehistory and early history of automated deduction. See Siekmann
and Wrightson (1983), pp. 1–28.
Davis, M., Logemann, G., and Loveland, D. (1962) A machine program for theorem
proving. Communications of the ACM, 5, 394–397.
Davis, M. and Putnam, H. (1960) A computing procedure for quantification theory. Jour-
nal of the ACM, 7, 201–215.
de Bruijn, N. G. (1970) The mathematical language AUTOMATH, its usage and some of
its extensions. In Laudet, M., Lacombe, D., Nolin, L., and Schützenberger, M. (eds.),
Symposium on Automatic Demonstration, Volume 125 of Lecture Notes in Mathemat-
ics, pp. 29–61. Springer-Verlag.
de Bruijn, N. G. (1980) A survey of the project AUTOMATH. In Seldin, J. P. and
Hindley, J. R. (eds.), To H. B. Curry: Essays in Combinatory Logic, Lambda Calculus,
and Formalism, pp. 589–606. Academic Press.
DeMillo, R., Lipton, R., and Perlis, A. (1979) Social processes and proofs of theorems
and programs. Communications of the ACM, 22, 271–280.
Diaconescu, R. (1975) Axiom of choice and complementation. Proceedings of the Amer-
ican Mathematical Society, 51, 176–178.
Dijkstra, E. W. (1976) A Discipline of Programming. Prentice-Hall.
Downey, P. J., Sethi, R., and Tarjan, R. (1980) Variations on the common subexpression
problem. Journal of the ACM, 27, 758–771.
Eén, N. and Sörensson, N. (2003) An extensible SAT-solver. In Giunchiglia, E. and Tac-
chella, A. (eds.), Theory and Applications of Satisfiability Testing: 6th International
Conference SAT 2003, Volume 2919 of Lecture Notes in Computer Science, pp. 502–
518. Springer-Verlag.
Emerson, E. A. and Halpern, J. Y. (1986) “sometimes” and “not never” revisited: on
branching time versus linear time temporal logic. Journal of the ACM, 33, 151–178.
Enderton, H. B. (1972) A Mathematical Introduction to Logic. Academic Press.
Erdös, P. (1930) Beweis eines Satzes von Tschebyshev. Acta Scientiarum Mathemati-
carum (Szeged), 5, 194–198.
Fetzer, J. H. (1988) Program verification: The very idea. Communications of the ACM,
31, 1048–1063.
Gårding, L. (1997) Some Points of Analysis and Their History, Volume 11 of University
Lecture Series. American Mathematical Society / Higher Education Press.
Gilmore, P. C. (1960) A proof method for quantification theory: Its justification and
realization. IBM Journal of Research and Development, 4, 28–35.
Gödel, K. (1930) Die Vollständigkeit der Axiome des logischen Funktionenkalküls.
Monatshefte für Mathematik und Physik, 37, 349–360. English translation ‘The com-
pleteness of the axioms of the functional calculus of logic’ in van Heijenoort (1967),
pp. 582–591.
Goldberg, E. and Novikov, Y. (2002) BerkMin: a fast and robust Sat-solver. In Kloos,
C. D. and Franca, J. D. (eds.), Design, Automation and Test in Europe Conference and
Exhibition (DATE 2002), Paris, France, pp. 142–149. IEEE Computer Society Press.
Goldschmidt, R. E. (1964) Applications of division by convergence. Master’s thesis,
Dept. of Electrical Engineering, MIT, Cambridge, Mass.
Goodstein, R. L. (1957) Recursive Number Theory. Studies in Logic and the Foundations
of Mathematics. North-Holland.
Gordon, M. (1985) Why higher-order logic is a good formalism for specifying and veri-
fying hardware. Technical Report 77, University of Cambridge Computer Laboratory,
New Museums Site, Pembroke Street, Cambridge, CB2 3QG, UK.
Gordon, M. J. C. (1982) Representing a logic in the LCF metalanguage. In Néel, D.
(ed.), Tools and notions for program construction: an advanced course, pp. 163–185.
Cambridge University Press.
Gordon, M. J. C. and Melham, T. F. (1993) Introduction to HOL: a theorem proving
environment for higher order logic. Cambridge University Press.
Gordon, M. J. C., Milner, R., and Wadsworth, C. P. (1979) Edinburgh LCF: A Mech-
anised Logic of Computation, Volume 78 of Lecture Notes in Computer Science.
Springer-Verlag.
Groote, J. F. (2000) The propositional formula checker Heerhugo. Journal of Automated
Reasoning, 24, 101–125.
Hanna, F. K. and Daeche, N. (1986) Specification and verification using higher-order
logic: A case study. In Milne, G. and Subrahmanyam, P. A. (eds.), Formal Aspects of
VLSI Design: Proceedings of the 1985 Edinburgh Workshop on VLSI, pp. 179–213.
Harrison, J. (1996a) HOL Light: A tutorial introduction. In Srivas, M. and Camilleri,
A. (eds.), Proceedings of the First International Conference on Formal Methods in
Computer-Aided Design (FMCAD’96), Volume 1166 of Lecture Notes in Computer
Science, pp. 265–269. Springer-Verlag.
Harrison, J. (1996b) A Mizar mode for HOL. In Wright, J. v., Grundy, J., and Harrison,
J. (eds.), Theorem Proving in Higher Order Logics: 9th International Conference,
TPHOLs’96, Volume 1125 of Lecture Notes in Computer Science, Turku, Finland, pp.
203–220. Springer-Verlag.
Harrison, J. (1996c) Proof style. In Giménez, E. and Paulin-Mohring, C. (eds.), Types for
Proofs and Programs: International Workshop TYPES’96, Volume 1512 of Lecture
Notes in Computer Science, Aussois, France, pp. 154–172. Springer-Verlag.
Harrison, J. (1999) A machine-checked theory of floating point arithmetic. See Bertot,
Dowek, Hirschowitz, Paulin, and Théry (1999), pp. 113–130.
Harrison, J. (2000a) Formal verification of floating point trigonometric functions. In
Hunt, W. A. and Johnson, S. D. (eds.), Formal Methods in Computer-Aided Design:
Third International Conference FMCAD 2000, Volume 1954 of Lecture Notes in Com-
puter Science, pp. 217–233. Springer-Verlag.
Harrison, J. (2000b) Formal verification of IA-64 division algorithms. In Aagaard, M.
and Harrison, J. (eds.), Theorem Proving in Higher Order Logics: 13th International
Conference, TPHOLs 2000, Volume 1869 of Lecture Notes in Computer Science, pp.
234–251. Springer-Verlag.
Harrison, J. (2009) Handbook of Practical Logic and Automated Reasoning. Cambridge
University Press.
Harrison, J. and Théry, L. (1998) A sceptic’s approach to combining HOL and Maple.
Journal of Automated Reasoning, 21, 279–294.
Harvey, W. and Stuckey, P. (1997) A unit two variable per inequality integer constraint
solver for constraint logic programming. Australian Computer Science Communica-
tions, 19, 102–111.
Hörmander, L. (1983) The Analysis of Linear Partial Differential Operators II, Volume
257 of Grundlehren der mathematischen Wissenschaften. Springer-Verlag.
Huth, M. and Ryan, M. (1999) Logic in Computer Science: Modelling and reasoning
about systems. Cambridge University Press.
IEEE (1985) Standard for binary floating point arithmetic. ANSI/IEEE Standard 754-
1985, The Institute of Electrical and Electronic Engineers, Inc., 345 East 47th Street,
New York, NY 10017, USA.
Iwashita, H., Nakata, T., and Hirose, F. (1996) CTL model checking based on forward
state traversal. In Proceedings of the IEEE/ACM conference on Computer Aided De-
sign (ICCAD ’96), pp. 82–87. Association for Computing Machinery.
Joyce, J. J. and Seger, C. (1993) The HOL-Voss system: Model-checking inside a
general-purpose theorem-prover. In Joyce, J. J. and Seger, C. (eds.), Proceedings of
the 1993 International Workshop on the HOL theorem proving system and its applica-
tions, Volume 780 of Lecture Notes in Computer Science, UBC, Vancouver, Canada,
pp. 185–198. Springer-Verlag.
Kaufmann, M., Manolios, P., and Moore, J. S. (2000) Computer-Aided Reasoning: An
Approach. Kluwer.
Knaster, B. (1927) Un théorème sur les fonctions d’ensembles. Annales de la Société
Polonaise de Mathématique, 6, 133–134. Volume published in 1928.
Kowalski, R. (1975) A proof procedure using connection graphs. Journal of the ACM,
22, 572–595.
Kozen, D. (1983) Results on the propositional µ-calculus. Theoretical Computer Science,
27, 333–354.
Kreisel, G. and Krivine, J.-L. (1971) Elements of mathematical logic: model the-
ory (Revised second ed.). Studies in Logic and the Foundations of Mathematics.
North-Holland. First edition 1967. Translation of the French ‘Eléments de logique
mathématique, théorie des modeles’ published by Dunod, Paris in 1964.
Kroening, D. and Strichman, O. (2008) Decision Procedures: An Algorithmic Point of
View. Springer-Verlag.
Kropf, T. (1999) Introduction to Formal Hardware Verification. Springer-Verlag.
Lahiri, S. K. and Bryant, R. E. (2004) Indexed predicate discovery for unbounded system
verification. See Alur and Peled (2004), pp. 135–147.
Landin, P. J. (1966) The next 700 programming languages. Communications of the ACM,
9, 157–166.
Lee, C. Y. (1959) Representation of switching circuits by binary-decision programs. Bell
System Technical Journal, 38, 985–999.
Lifschitz, V. (1986) Mechanical Theorem Proving in the USSR: the Leningrad School.
Monograph Series on Soviet Union. Delphic Associates, 7700 Leesburg Pike, #250,
Falls Church, VA 22043. See also ‘What is the inverse method?’ in the Journal of
Automated Reasoning, vol. 5, pp. 1–23, 1989.
Łojasiewicz, S. (1964) Triangulations of semi-analytic sets. Annali della Scuola Normale
Superiore di Pisa, ser. 3, 18, 449–474.
Loveland, D. W. (1968) Mechanical theorem-proving by model elimination. Journal of
the ACM, 15, 236–251.
Loveland, D. W. (1978) Automated theorem proving: a logical basis. North-Holland.
MacKenzie, D. (2001) Mechanizing Proof: Computing, Risk and Trust. MIT Press.
Markstein, P. (2000) IA-64 and Elementary Functions: Speed and Precision. Prentice-
Hall.
Markstein, P. W. (1990) Computation of elementary functions on the IBM RISC Sys-
tem/6000 processor. IBM Journal of Research and Development, 34, 111–119.
Maslov, S. J. (1964) An inverse method of establishing deducibility in classical predicate
calculus. Doklady Akademii Nauk, 159, 17–20.
McMillan, K. L. (2003) Interpolation and SAT-based model checking. In Hunt, W. A.
and Somenzi, F. (eds.), Computer Aided Verification, 15th International Conference,
CAV 2003, Volume 2725 of Lecture Notes in Computer Science, Boulder, CO, pp.
1–13. Springer-Verlag.
Melham, T. and Darbari, A. (2002) Symbolic trajectory evaluation in a nutshell. Unpub-
lished, available from the authors.
Mendelson, E. (1987) Introduction to Mathematical Logic (Third ed.). Mathematics
series. Wadsworth and Brooks Cole.
Monk, J. D. (1976) Mathematical logic, Volume 37 of Graduate Texts in Mathematics.
Springer-Verlag.
Moskewicz, M. W., Madigan, C. F., Zhao, Y., Zhang, L., and Malik, S. (2001) Chaff:
Engineering an efficient SAT solver. In Proceedings of the 38th Design Automation
Conference (DAC 2001), pp. 530–535. ACM Press.
Muller, J.-M. (2006) Elementary functions: Algorithms and Implementation (2nd ed.).
Birkhäuser.
Nelson, G. and Oppen, D. (1979) Simplification by cooperating decision procedures.
ACM Transactions on Programming Languages and Systems, 1, 245–257.
Nelson, G. and Oppen, D. (1980) Fast decision procedures based on congruence closure.
Journal of the ACM, 27, 356–364.
O’Leary, J., Zhao, X., Gerth, R., and Seger, C.-J. H. (1999) Formally verifying IEEE
compliance of floating-point hardware. Intel Technology Journal, 1999-Q1, 1–14.
Available on the Web as http://download.intel.com/technology/itj/q11999/
pdf/floating_point.pdf.
Owre, S., Rushby, J. M., and Shankar, N. (1992) PVS: A prototype verification system. In
Kapur, D. (ed.), 11th International Conference on Automated Deduction, Volume 607
of Lecture Notes in Computer Science, Saratoga, NY, pp. 748–752. Springer-Verlag.
Paulson, L. C. (1994) Isabelle: a generic theorem prover, Volume 828 of Lecture Notes
in Computer Science. Springer-Verlag. With contributions by Tobias Nipkow.
Peled, D. A. (2001) Software Reliability Methods. Springer-Verlag.
Pixley, C. (1990) A computational theory and implementation of sequential hardware
equivalence. In Proceedings of the DIMACS workshop on Computer Aided Verifica-
tion, pp. 293–320. DIMACS (technical report 90-31).
Pnueli, A. (1977) The temporal logic of programs. In Proceedings of the 18th IEEE
Symposium on Foundations of Computer Science, pp. 46–67.
Prawitz, D., Prawitz, H., and Voghera, N. (1960) A mechanical proof procedure and its
realization in an electronic computer. Journal of the ACM, 7, 102–128.
Presburger, M. (1930) Über die Vollständigkeit eines gewissen Systems der Arithmetik
ganzer Zahlen, in welchem die Addition als einzige Operation hervortritt. In Sprawozdanie
z I Kongresu matematyków słowiańskich, Warszawa 1929, pp. 92–101, 395.
Warsaw. Annotated English version by Stansifer (1984).
Queille, J. P. and Sifakis, J. (1982) Specification and verification of concurrent programs
in CESAR. In Proceedings of the 5th International Symposium on Programming,
Volume 137 of Lecture Notes in Computer Science, pp. 195–220. Springer-Verlag.
Rabin, M. O. (1991) Decidable theories. In Barwise, J. and Keisler, H. (eds.), Hand-
book of mathematical logic, Volume 90 of Studies in Logic and the Foundations of
Mathematics, pp. 595–629. North-Holland.
Rajan, S., Shankar, N., and Srivas, M. K. (1995) An integration of model-checking with
automated proof-checking. In Wolper, P. (ed.), Computer-Aided Verification: CAV
’95, Volume 939 of Lecture Notes in Computer Science, Liege, Belgium, pp. 84–97.
Springer-Verlag.
Rueß, H. and Shankar, N. (2001) Deconstructing Shostak. In Proceedings of the Sixteenth
Annual IEEE Symposium on Logic in Computer Science, pp. 19–28. IEEE Computer
Society Press.
Robinson, A. (1957) Proving a theorem (as done by man, logician, or machine). In
Summaries of Talks Presented at the Summer Institute for Symbolic Logic. Second
edition published by the Institute for Defense Analysis, 1960. Reprinted in Siekmann
and Wrightson (1983), pp. 74–76.
Robinson, J. A. (1965) A machine-oriented logic based on the resolution principle. Jour-
nal of the ACM, 12, 23–41.
Russell, B. (1968) The autobiography of Bertrand Russell. Allen & Unwin.
Russinoff, D. (1998) A mechanically checked proof of IEEE compliance of a register-
transfer-level specification of the AMD-K7 floating-point multiplication, division, and
square root instructions. LMS Journal of Computation and Mathematics, 1, 148–
200. Available on the Web at http://www.russinoff.com/papers/k7-div-sqrt.
html.
Sawada, J. and Gamboa, R. (2002) Mechanical verification of a square root algorithm
using Taylor’s theorem. In Aagaard, M. and O’Leary, J. (eds.), Formal Methods
in Computer-Aided Design: Fourth International Conference FMCAD 2002, Volume
2517 of Lecture Notes in Computer Science, pp. 274–291. Springer-Verlag.
Scarpellini, B. (1969) On the metamathematics of rings and integral domains. Transac-
tions of the American Mathematical Society, 138, 71–96.
Seger, C. and Joyce, J. J. (1991) A two-level formal verification methodology using HOL
and COSMOS. Technical Report 91-10, Department of Computer Science, University
of British Columbia, 2366 Main Mall, University of British Columbia, Vancouver,
B.C, Canada V6T 1Z4.
Seger, C.-J. H. and Bryant, R. E. (1995) Formal verification by symbolic evaluation of
partially-ordered trajectories. Formal Methods in System Design, 6, 147–189.
Sheeran, M. and Stålmarck, G. (2000) A tutorial on Stålmarck’s proof procedure for
propositional logic. Formal Methods in System Design, 16, 23–58.
Shostak, R. (1978) An algorithm for reasoning about equality. Communications of the
ACM, 21, 356–364.
Shostak, R. (1984) Deciding combinations of theories. Journal of the ACM, 31, 1–12.
Siekmann, J. and Wrightson, G. (eds.) (1983) Automation of Reasoning — Classical
Papers on Computational Logic, Vol. I (1957-1966). Springer-Verlag.
Simmons, H. (1970) The solution of a decision problem for several classes of rings.
Pacific Journal of Mathematics, 34, 547–557.
Skolem, T. (1922) Einige Bemerkungen zur axiomatischen Begründung der Mengen-
lehre. In Matematikerkongress i Helsingfors den 4–7 Juli 1922, Den femte skan-
dinaviska matematikerkongressen, Redogörelse. Akademiska Bokhandeln, Helsinki.
English translation “Some remarks on axiomatized set theory” in van Heijenoort
(1967), pp. 290–301.
Smoryński, C. (1980) Logical Number Theory I: An Introduction. Springer-Verlag.
Stålmarck, G. (1994) System for determining propositional logic theorems by applying
values and rules to triplets that are generated from Boolean formula. United States
Patent number 5,276,897; see also Swedish Patent 467 076.
Stålmarck, G. and Säflund, M. (1990) Modeling and verifying systems and software in
propositional logic. In Daniels, B. K. (ed.), Safety of Computer Control Systems, 1990
(SAFECOMP ’90), Gatwick, UK, pp. 31–36. Pergamon Press.
Stansifer, R. (1984) Presburger’s article on integer arithmetic: Remarks and translation.
Technical Report CORNELLCS:TR84-639, Cornell University Computer Science De-
partment.
Syme, D. (1997) DECLARE: A prototype declarative proof system for higher order
logic. Technical Report 416, University of Cambridge Computer Laboratory, New
Museums Site, Pembroke Street, Cambridge, CB2 3QG, UK.
Tarski, A. (1951) A Decision Method for Elementary Algebra and Geometry. University
of California Press. Previous version published as a technical report by the RAND
Corporation, 1948; prepared for publication by J. C. C. McKinsey. Reprinted in Cavi-
ness and Johnson (1998), pp. 24–84.
Tarski, A. (1955) A lattice-theoretical fixpoint theorem and its applications. Pacific Jour-
nal of Mathematics, 5, 285–309.
Tarski, A. (1965) A simplified formalization of predicate logic with identity. Archiv für
mathematische Logik und Grundlagenforschung, 7, 61–79.
Trybulec, A. (1978) The Mizar-QC/6000 logic information language. ALLC Bulletin
(Association for Literary and Linguistic Computing), 6, 136–140.
Trybulec, A. and Blair, H. A. (1985) Computer aided reasoning. In Parikh, R. (ed.),
Logics of Programs, Volume 193 of Lecture Notes in Computer Science, Brooklyn,
pp. 406–412. Springer-Verlag.
Turing, A. M. (1936) On computable numbers, with an application to the Entschei-
dungsproblem. Proceedings of the London Mathematical Society (2), 42, 230–265.
van Heijenoort, J. (ed.) (1967) From Frege to Gödel: A Source Book in Mathematical
Logic 1879–1931. Harvard University Press.
Wang, H. (1960) Toward mechanical mathematics. IBM Journal of Research and Devel-
opment, 4, 2–22.
Wenzel, M. (1999) Isar - a generic interpretative approach to readable formal proof
documents. See Bertot, Dowek, Hirschowitz, Paulin, and Théry (1999), pp. 167–183.
Whitehead, A. N. and Russell, B. (1910) Principia Mathematica (3 vols). Cambridge
University Press.
Wiedijk, F. (2001) Mizar light for HOL Light. In Boulton, R. J. and Jackson, P. B. (eds.),
14th International Conference on Theorem Proving in Higher Order Logics: TPHOLs
2001, Volume 2152 of Lecture Notes in Computer Science, pp. 378–394. Springer-
Verlag.
Yang, J. (2000) A theory for generalized symbolic trajectory evaluation. In Proceedings
of the 2000 Symposium on Symbolic Trajectory Evaluation, Chicago. Available via
http://www.intel.com/research/scl/stesympsite.htm.
Zammit, V. (1999) On the implementation of an extensible declarative proof language.
See Bertot, Dowek, Hirschowitz, Paulin, and Théry (1999), pp. 185–202.