Mark 10
Abstract. These lectures are intended to give a broad overview of the most im-
portant formal verification techniques. These methods are applicable to hardware,
software, protocols etc. and even to checking proofs in pure mathematics, though
I will emphasize the applications to verification. Since other lecturers will cover
some of the topics in more detail (in particular model checking), my lectures will
cover mainly the material on deductive theorem proving, which in any case is my
special area of interest. The arrangement of material in these notes is roughly in or-
der of logical complexity, starting with methods for propositional logic and leading
up to general theorem proving, then finishing with a description of my own theorem
proving program ‘HOL Light’ and an extended case study on the verification of a
floating-point square root algorithm used by Intel.
  1.   Propositional logic
  2.   Symbolic simulation
  3.   Model checking
  4.   Automated theorem proving
  5.   Arithmetical theories
  6.   Interactive theorem proving
  7.   HOL Light
  8.   Case study of floating-point verification
   The treatment of the various topics is quite superficial, with the aim being overall
perspective rather than in-depth understanding. The last lecture is somewhat more
detailed and should give a good feel for the realities of formal verification in this
field.
   My book on theorem proving (Harrison 2009) covers many of these topics
in more depth, together with simple-minded example code (written in Objective
Caml) implementing the various techniques.
   http://www.cl.cam.ac.uk/~jrh13/atp/index.html
Other useful references for the material here are (Bradley and Manna 2007; Clarke,
Grumberg, and Peled 1999; Kroening and Strichman 2008; Kropf 1999; MacKen-
zie 2001; Peled 2001).
0. Introduction
As most programmers know to their cost, writing programs that function correctly in all
circumstances — or even saying what that means — is difficult. Most large programs
contain ‘bugs’. In the past, hardware has been substantially simpler than software, but
this difference is eroding, and current leading-edge microprocessors are also extremely
complex and usually contain errors. It has often been noted that mere testing, even on
clever sets of test cases, is usually inadequate to guarantee correctness on all inputs, since
the number of possible inputs and internal states, while finite, is usually astronomically
large. For example (Dijkstra 1976):
    As I have now said many times and written in many places: program testing can be
    quite effective for showing the presence of bugs, but is hopelessly inadequate for
    showing their absence.
     The main alternative to testing is formal verification, where it is rigorously proved
that the system functions correctly on all possible inputs. This involves forming mathe-
matical models of the system and its intended behaviour and linking the two:
                                 Actual requirements
                                          ↑
                                 Mathematical specification

                                 Mathematical model
                                          ↑
                                 Actual system
The facts that (i) getting a mathematical proof right is also difficult (DeMillo, Lipton, and
Perlis 1979), and (ii) correctness of formal models does not necessarily imply correctness
of the actual system (Fetzer 1988) caused much pointless controversy in the 1970s and 1980s. It is
now widely accepted that formal verification, with the proof itself checked by machine,
gives a much greater degree of confidence than traditional techniques.
     The main impediment to greater use of formal verification is not these generalist
philosophical objections, but just the fact that it’s rather difficult. Only in a few iso-
lated safety-critical niches of the software industry is any kind of formal verification
widespread, e.g. in avionics. But in the hardware industry, formal verification is widely
practised, and increasingly seen as necessary. We can identify at least three reasons:
    • Hardware is designed in a more modular way than most software. Constraints of
      interconnect layering and timing mean that one cannot really design ‘spaghetti
      hardware’.
    • More proofs in the hardware domain can be largely automated, reducing the need
      for intensive interaction by a human expert with the mechanical theorem-proving
      system.
    • The potential consequences of a hardware error are greater, since such errors often
      cannot be patched or worked around, and may in extremis necessitate a hardware
      replacement.
      To emphasize the last point, an error in the FDIV (floating-point division) instruction
of some early Intel Pentium processors resulted in 1994/5 in a charge to Intel of
approximately $500M. Given this salutary lesson, and the size and diversity of its market,
it is understandable that Intel is particularly interested in formal verification.
There are a number of different formal verification techniques used in industry. Roughly
speaking, these use different logics to achieve a particular balance between generality and
automation. At one end of the scale, propositional logic has limited expressiveness, but
(amazingly) efficient decision methods are available. At the other end, higher-order logic
is very expressive but no efficient decision methods are available (in fact, higher-order
validity is not even recursively enumerable) and therefore proofs must be constructed
interactively with extensive user guidance. The lectures that follow are organized on the
same basis, with propositional logic first, followed by symbolic simulation and temporal
logic model checking, and finally general theorem proving at the end.
     The particular method that is most appropriate for a given task may depend on de-
tails of that application. Needless to say, the effective combination of all methods is at-
tractive, and there has been much interest in this idea (Seger and Joyce 1991; Joyce and
Seger 1993; Rajan, Shankar, and Srivas 1995). Another ‘hybrid’ application of formal
verification is to prove correctness of some of the CAD tools used to produce hardware
designs (Aagaard and Leeser 1994) or even of the abstraction and reduction algorithms
used to model-check large or infinite-state systems (Chou and Peled 1999).
     There has been a recent trend towards partial verification, not necessarily seek-
ing a complete proof of correctness, or even formulating what that means, but still prov-
ing important properties using techniques from formal verification. A common approach
is to replace the actual system with an abstraction that may lose many details while still
exposing some key aspect(s) that are then more tractable to reason about. For exam-
ple, Microsoft’s Static Driver Verifier forms an abstraction of code for device drivers
and checks compliance with various properties like the proper release of allocated re-
sources, using symbolic representations and arithmetical theorem proving. The Airbus
A380 flight control software has been subjected to abstract interpretation to prove rig-
orously that, subject to natural environmental constraints, no floating-point overflow ex-
ceptions can occur.
     A widespread perception is that formal verification is an impractical pipedream with
no connection to traditional methods. I prefer to see complete formal verification as the
extreme point on a spectrum that takes in various intermediate possibilities. Some static
checking of programs is already standard, if only via typechecking of the programming
language or tools like lint and its derivatives. Examples like SDV and the A380 veri-
fication show that the intermediate space is quite rich in possibilities. Thus, even if one
neglects the numerous successes of complete functional verification, verification tech-
nology should be considered a natural component of programming practice.
1. Propositional logic
I assume that everyone knows what propositional logic is. The following is mainly in-
tended to establish notation. Propositional logic deals with basic assertions that may be
either true or false and various ways of combining them into composite propositions,
without analyzing their internal structure. It is very similar to Boole’s original algebra of
logic, and for this reason the field is sometimes called Boolean algebra instead. In fact,
the Boolean notation for propositional operators is still widely used by circuit designers.
At a particular time-step, we can regard each internal or external wire in a (binary) digital
computer as having a Boolean value, ‘false’ for 0 and ‘true’ for 1, and think of each cir-
cuit element as a Boolean function, operating on the values on its input wire(s) to produce
a value at its output wire. The most basic building-blocks of computers used by digital
designers, so-called logic gates, correspond closely to the usual logical connectives. For
example an ‘AND gate’ is a circuit element with two inputs and one output whose output
wire will be high (true) precisely if both the input wires are high, and so it corresponds
exactly in behaviour to the ‘and’ (‘∧’) connective. Similarly a ‘NOT gate’ (or inverter)
has one input wire and one output wire, which is high when the input is low and low
when the input is high; hence it corresponds to the ‘not’ connective (‘¬’). Thus, there is
a close correspondence between digital circuits and formulas which can be sloganized as
follows:
                         Digital design    Propositional Logic
                         circuit           formula
                         logic gate        propositional connective
                         input wire        atom
                         internal wire     subexpression
                         voltage level     truth value
     An important issue in circuit design is proving that two circuits have the same func-
tion, i.e. give identical results on all inputs. This arises, for instance, if a designer makes
some special optimizations to a circuit and wants to check that they are “safe”. Using
the above correspondences, we can translate such problems into checking that a number
of propositional formulas Pn ⇔ Pn′ are tautologies. Slightly more elaborate problems in
circuit design (e.g. ignoring certain ‘don’t care’ possibilities) can also be translated to
tautology-checking. Thus, efficient methods for tautology checking directly yield useful
tools for hardware verification.
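In miniature, the idea can be sketched as follows: model each circuit as a Boolean function and check that the biconditional is a tautology by trying all inputs. The circuits and names here are my own illustrations, not from the text, and real checkers use SAT solvers or BDDs rather than brute-force enumeration.

```python
from itertools import product

# Hypothetical example circuits (my own): a multiplexer, and a version with
# a redundant "consensus" term b∧c added, as an optimizer might.
def original(a, b, c):
    return (a and b) or ((not a) and c)

def optimized(a, b, c):
    return (a and b) or ((not a) and c) or (b and c)

def equivalent(f, g, n):
    """Check that f ⇔ g is a tautology by trying all 2^n inputs."""
    return all(f(*vs) == g(*vs) for vs in product([False, True], repeat=n))

print(equivalent(original, optimized, 3))   # True: the extra term is absorbed
```

The extra term b∧c is absorbed by the consensus theorem, so the two circuits compute the same function.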
Tautology checking
Most high-performance SAT checkers are based on the venerable Davis-Putnam algo-
rithm (Davis and Putnam 1960), or more accurately on the ‘DPLL’ algorithm, a later
improvement (Davis, Logemann, and Loveland 1962).
     The starting-point is to put the formula to be tested for satisfiability in ‘conjunctive
normal form’. A formula is said to be in conjunctive normal form (CNF) when it is an
‘and of ors’, i.e. of the form:

                                      C1 ∧ C2 ∧ · · · ∧ Cn

where each Ci has the form:

                                      li1 ∨ li2 ∨ · · · ∨ limi

and all the lij’s are literals, i.e. primitive propositions or their negations. The individual
conjuncts Ci of a CNF form are often called clauses. We usually consider these as sets,
since both conjunction and disjunction are associative, commutative and idempotent, so
it makes sense to talk of ⊥ as the empty clause. If Ci consists of one literal, it is called
a unit clause. Dually, disjunctive normal form (DNF) reverses the role of the ‘and’s and
‘or’s. These special forms are analogous to ‘fully factorized’ and ‘fully expanded’ in
ordinary algebra — think of (x + 1)(x + 2)(x + 3) as CNF and x3 + 6x2 + 11x + 6 as
DNF. Again by analogy with algebra, we can always translate a formula into CNF by
repeatedly rewriting with equivalences like:
          ¬(¬p) ⇔ p
        ¬(p ∧ q) ⇔ ¬p ∨ ¬q
        ¬(p ∨ q) ⇔ ¬p ∧ ¬q
       p ∨ (q ∧ r) ⇔ (p ∨ q) ∧ (p ∨ r)
      (p ∧ q) ∨ r ⇔ (p ∨ r) ∧ (q ∨ r)
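The rewriting-based translation can be sketched as follows, assuming formulas are represented as nested tuples and implications have already been eliminated (the representation and function names are mine): first push negations inwards to negation normal form, then distribute ‘or’ over ‘and’.

```python
# Formulas are atoms (strings) or tuples ('not', f), ('and', f, g), ('or', f, g).

def nnf(f):
    """Push negations down to the literals using the first three rewrites."""
    if isinstance(f, str):
        return f
    if f[0] == 'not':
        g = f[1]
        if isinstance(g, str):
            return f                                   # already a literal
        if g[0] == 'not':
            return nnf(g[1])                           # ¬¬p ⇔ p
        if g[0] == 'and':                              # ¬(p∧q) ⇔ ¬p∨¬q
            return ('or', nnf(('not', g[1])), nnf(('not', g[2])))
        if g[0] == 'or':                               # ¬(p∨q) ⇔ ¬p∧¬q
            return ('and', nnf(('not', g[1])), nnf(('not', g[2])))
    return (f[0], nnf(f[1]), nnf(f[2]))

def distrib(p, q):
    """p∨(q∧r) ⇔ (p∨q)∧(p∨r), and its mirror image."""
    if not isinstance(p, str) and p[0] == 'and':
        return ('and', distrib(p[1], q), distrib(p[2], q))
    if not isinstance(q, str) and q[0] == 'and':
        return ('and', distrib(p, q[1]), distrib(p, q[2]))
    return ('or', p, q)

def cnf(f):
    f = nnf(f)
    if isinstance(f, str) or f[0] == 'not':
        return f
    if f[0] == 'and':
        return ('and', cnf(f[1]), cnf(f[2]))
    return distrib(cnf(f[1]), cnf(f[2]))

print(cnf(('or', 'p', ('and', 'q', ('not', 'r')))))
# ('and', ('or', 'p', 'q'), ('or', 'p', ('not', 'r')))
```

The `distrib` step is exactly where the exponential blow-up discussed next can occur.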
     However, this in itself can cause the formula to blow up exponentially before we
even get to the main algorithm, which is hardly a good start. One can do better by in-
troducing new variables to denote subformulas, and putting the resulting list of equiv-
alences into CNF — so-called definitional CNF. It’s not hard to see that this preserves
satisfiability. For example, starting with the formula:

                                  (p ∨ (q ∧ ¬r)) ∧ s

we introduce names for the compound subformulas and assert the definitions together
with the name of the whole formula:

                                       (p1 ⇔ q ∧ ¬r)∧
                                       (p2 ⇔ p ∨ p1 )∧
                                       (p3 ⇔ p2 ∧ s)∧
                                       p3

Each definitional equivalence then expands into a few clauses of fixed size, so the result
grows only linearly in the size of the original formula.
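Definitional CNF can be sketched in the same style: each compound subformula gets a fresh name, defined by a fixed-size set of clauses. This is a Tseitin-style sketch; the clause encodings of each ‘⇔’ are standard, but the representation (literals as strings, ‘-’ for negation) and the variable names are mine.

```python
from itertools import count

def neg(lit):
    return lit[1:] if lit.startswith('-') else '-' + lit

def deftseitin(f, defs, fresh):
    """Return a literal naming f, appending defining clauses to defs."""
    if isinstance(f, str):
        return f
    op, args = f[0], [deftseitin(a, defs, fresh) for a in f[1:]]
    v = 'p%d' % next(fresh)
    if op == 'not':
        x = args[0]                                          # v ⇔ ¬x
        defs += [[neg(v), neg(x)], [v, x]]
    elif op == 'and':
        x, y = args                                          # v ⇔ x∧y
        defs += [[neg(v), x], [neg(v), y], [v, neg(x), neg(y)]]
    elif op == 'or':
        x, y = args                                          # v ⇔ x∨y
        defs += [[neg(v), x, y], [v, neg(x)], [v, neg(y)]]
    return v

def definitional_cnf(f):
    defs, fresh = [], count(1)
    top = deftseitin(f, defs, fresh)
    return defs + [[top]]          # definitions plus the top-level name

for clause in definitional_cnf(('and', ('or', 'p', ('and', 'q', ('not', 'r'))), 's')):
    print(clause)
```

The result is satisfiable exactly when the original formula is, which is all that matters for SAT checking.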
     The DPLL procedure itself tests satisfiability of a clause set by repeatedly applying
unit propagation (a unit clause forces the truth-value of its literal) and the pure literal
rule (a variable occurring with only one sign can be assigned to satisfy all its clauses),
and otherwise case-splitting on some variable and recursing on the two simplified clause
sets; a branch fails when the empty clause is generated, and the formula is satisfiable
iff some branch eliminates all clauses. This simple-minded sketch of the DPLL algo-
rithm leaves plenty of room for improvement. The choice of case-splitting variable is
often critical, the formulas can be represented in a way that allows for efficient imple-
mentation, and the kind of backtracking that arises from case splits can be made more
efficient via ‘intelligent backjumping’ and ‘conflict clauses’. A few recent highly effi-
cient DPLL-based theorem provers are Chaff (Moskewicz, Madigan, Zhao, Zhang, and
Malik 2001), BerkMin (Goldberg and Novikov 2002) and MiniSat (Eén and Sörensson
2003).
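A minimal DPLL implementation might look like this. It is a sketch only: the pure literal rule and all the refinements just mentioned are omitted, and the clause representation (DIMACS-style integer literals, -3 meaning ¬x3) is my own choice.

```python
def assign(clauses, lit):
    """Simplify under lit := true; None signals a falsified (empty) clause."""
    out = []
    for c in clauses:
        if lit in c:
            continue                      # clause satisfied, drop it
        c2 = [l for l in c if l != -lit]  # remove the falsified literal
        if not c2:
            return None                   # empty clause: contradiction
        out.append(c2)
    return out

def dpll(clauses):
    # unit propagation: repeatedly assign literals forced by unit clauses
    while True:
        unit = next((c[0] for c in clauses if len(c) == 1), None)
        if unit is None:
            break
        clauses = assign(clauses, unit)
        if clauses is None:
            return False
    if not clauses:
        return True                       # all clauses satisfied
    lit = clauses[0][0]                   # naive case-split choice
    left = assign(clauses, lit)
    if left is not None and dpll(left):
        return True
    right = assign(clauses, -lit)
    return right is not None and dpll(right)

print(dpll([[1, 2], [-1, 2], [1, -2], [-1, -2]]))  # False: unsatisfiable
print(dpll([[1, 2], [-1], [2, 3]]))                # True: satisfiable
```
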
     Another interesting technique that is used in the tools from Prover Technology
(www.prover.com), as well as the experimental system Heerhugo (Groote 2000), is
Stålmarck’s dilemma rule (Stålmarck and Säflund 1990). This involves using case-splits
in a non-nested fashion, accumulating common information from both sides of a case
split and feeding it back:
                                    ∆
                                  /   \
                                 /     \
                          ∆ ∪ {¬p}     ∆ ∪ {p}
                              |           |
                           ∆ ∪ ∆0      ∆ ∪ ∆1
                                 \     /
                                  \   /
                              ∆ ∪ (∆0 ∩ ∆1)
    In some cases, this works out much better than the usual DPLL algorithm. For a
nice introduction, see (Sheeran and Stålmarck 2000). Note that this method is covered
by patents (Stålmarck 1994).
2. Symbolic simulation
As noted, testing is also widely used in hardware design, and until a decade or so ago
was the exclusive way of trying to establish correctness. Usually what is tested is not
the circuit itself but some formal model — the whole idea is to flush out errors before
the expensive step of building the hardware. The usual term for this kind of testing is
simulation. We’ll now consider two refinements of basic simulation, then show how they
can profitably be combined.
Ternary simulation
In ternary simulation, the Boolean values 0 and 1 are supplemented by a third value X
meaning ‘unknown’, so that irrelevant inputs can be abstracted away. Each value can be
thought of as the set of Boolean values it might stand for (a fourth, overconstrained value
⊤ completes the lattice):

       ⊤ = {}
        0 = {0}
        1 = {1}
       X = {0, 1}

The gates are then interpreted over these sets, e.g. 0 ∧ X = 0 while 1 ∧ X = X. As
expected, the truth tables are monotonic with respect to this ordering: refining an input
from X to a definite Boolean value can only refine the output, never change a definite
output into a different one.
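Under this set interpretation, a ternary gate can be computed simply by applying the Boolean gate to every combination of possible input values. This is a sketch; the encoding as Python frozensets is mine.

```python
# Ternary values as sets of possible Boolean values.
ZERO, ONE, X = frozenset({0}), frozenset({1}), frozenset({0, 1})

def t_and(a, b):
    # apply Boolean AND to all combinations of possible values
    return frozenset(x & y for x in a for y in b)

def t_not(a):
    return frozenset(1 - x for x in a)

# Monotonicity in action: a single 0 input decides the output despite the X.
print(t_and(ZERO, X) == ZERO)   # True: 0 ∧ X = 0
print(t_and(ONE, X) == X)       # True: 1 ∧ X = X
```
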
Symbolic simulation
Since we are working in an abstract model, we can easily generalize conventional simula-
tion, which tests on particular combinations of 1s and 0s, to symbolic simulation (Carter,
Joyner, and Brand 1979), where variables are used for some or all of the inputs, and the
outputs are correspondingly symbolic expressions rather than simple truth-values. Test-
ing equivalence then becomes testing of the Boolean expressions. Thus, if we use vari-
ables for all inputs, we essentially get back to combinational equivalence checking as we
considered before. The main differences are:
    • We may use a more refined circuit model, rather than a simple Boolean switch
      model
    • We explicitly compute the expressions that are the “values” at the internal wires
      and at the outputs as functions of the input variables, rather than just asserting
      relations between them.
     Instead of using general propositional formulas to represent the signals, we can use
a canonical representation, where equivalent formulas are represented by the same data
structure. Equivalence checking then becomes cheap. The most important such represen-
tation is BDDs (binary decision diagrams). Symbolic simulation often does work quite
well with BDDs and allows many circuits to be verified exhaustively (Bryant 1985).
BDDs
    [Figure: a complete binary decision tree over the variables p, q and r, with q and r
    decision nodes and the eight leaves T F T T T T T T.]
     However, there are many common subtrees here, and we can imagine sharing them
as much as possible to obtain a directed acyclic graph. Although this degree of saving
may be atypical, there is always some potential for sharing, at least in the lowest levels
of the tree. There are after all “only” 2^(2^k) possible subtrees involving k variables, in
particular only two possible leaf nodes ‘true’ and ‘false’.
    [Figure: the corresponding binary decision diagram, a directed acyclic graph sharing
    common subtrees, with a single T leaf and a single F leaf.]
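A toy BDD kernel illustrates why equivalence checking becomes cheap: with a fixed variable order and hash-consed (shared) nodes, equivalent formulas reduce to the identical node, so equivalence is a single comparison. This is a minimal sketch of Bryant's ‘apply’ construction, with my own naming, not production code.

```python
TRUE, FALSE = 'T', 'F'
_table = {}   # (var, low, high) -> node id   (hash-consing table)
_nodes = {}   # node id -> (var, low, high)

def mk(var, low, high):
    """Make (or reuse) a node testing var; smaller var index = nearer root."""
    if low == high:
        return low                       # redundant test eliminated
    key = (var, low, high)
    if key not in _table:
        _table[key] = len(_nodes)
        _nodes[_table[key]] = key
    return _table[key]

def var_of(n):
    return _nodes[n][0] if n in _nodes else float('inf')   # leaves sort last

def cofactors(n, v):
    """Restrictions of node n with variable v set to 0 and to 1."""
    if n in (TRUE, FALSE) or _nodes[n][0] != v:
        return n, n
    _, lo, hi = _nodes[n]
    return lo, hi

def bdd_apply(op, a, b, memo):
    """Combine two BDDs with a binary Boolean operation."""
    if (a, b) in memo:
        return memo[(a, b)]
    if a in (TRUE, FALSE) and b in (TRUE, FALSE):
        res = TRUE if op(a == TRUE, b == TRUE) else FALSE
    else:
        v = min(var_of(a), var_of(b))    # branch on the topmost variable
        a0, a1 = cofactors(a, v)
        b0, b1 = cofactors(b, v)
        res = mk(v, bdd_apply(op, a0, b0, memo),
                    bdd_apply(op, a1, b1, memo))
    memo[(a, b)] = res
    return res

def bddvar(i):
    return mk(i, FALSE, TRUE)

AND = lambda x, y: x and y
OR = lambda x, y: x or y

p, q = bddvar(1), bddvar(2)
print(bdd_apply(AND, p, q, {}) == bdd_apply(AND, q, p, {}))   # True
```

Building p ∧ q and q ∧ p yields literally the same node, so checking their equivalence costs one pointer comparison rather than a tautology check.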
A still more refined (and at first sight surprising) approach is to combine ternary and
symbolic simulation, using Boolean variables to parametrize ternary values. Consider the
rather artificial example of verifying that a piece of combinational circuitry with 7 inputs
and one output implements a 7-input AND gate. In conventional simulation we would
need to check all 2^7 = 128 input values. In symbolic simulation we would only need to
check one case, but that case is a symbolic expression that may be quite large. In ternary
simulation, we can verify the circuit from correct results in just 8 explicit cases:
                               0     X   X    X    X    X    X
                               X     0   X    X    X    X    X
                               X     X   0    X    X    X    X
                               X     X   X    0    X    X    X
                               X     X   X    X    0    X    X
                               X     X   X    X    X    0    X
                               X     X   X    X    X    X    0
                               1     1   1    1    1    1    1
     In a combined approach, we can parametrize these 8 test cases using just 3 Boolean
variables (since 2^3 ≥ 8), say p, q and r. For example, we can represent the input wires as

                               a0 = 1 if p ∧ q ∧ r
                                    0 if ¬p ∧ ¬q ∧ ¬r
                                    X otherwise

and

                               a1 = 1 if p ∧ q ∧ r
                                    0 if ¬p ∧ ¬q ∧ r
                                    X otherwise

and so on until:

                               a6 = 1 if p ∧ q ∧ r
                                    0 if p ∧ q ∧ ¬r
                                    X otherwise
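A sketch of how these 8 parametrized ternary runs certify the 7-input AND: the parameters p, q, r select one of the test vectors above, and for each choice the ternary simulation must yield the expected definite output. The encoding of the ai follows the cases above; the helper names are mine.

```python
from itertools import product

X, ZERO, ONE = frozenset({0, 1}), frozenset({0}), frozenset({1})

def t_and(a, b):
    return frozenset(x & y for x in a for y in b)

def inputs(p, q, r):
    """The test vectors above: all inputs 1 if p∧q∧r; otherwise input
    number 4p+2q+r is 0 and the rest are X."""
    if p and q and r:
        return [ONE] * 7
    k = 4 * p + 2 * q + r
    return [ZERO if i == k else X for i in range(7)]

ok = True
for p, q, r in product([0, 1], repeat=3):
    out = ONE                             # identity for AND
    for a in inputs(p, q, r):
        out = t_and(out, a)
    expected = ONE if (p and q and r) else ZERO
    ok = ok and out == expected
print(ok)   # True: 8 parametrized ternary runs cover all 2^7 concrete cases
```
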
     Symbolic trajectory evaluation (STE), introduced in (Seger and Bryant 1995), uses
this kind of parametrization of circuit states in terms of Boolean variables, and a special
logic for making correctness assertions, similar to but more restricted than the tempo-
ral logics we consider in the next lecture. Recently a more general form of STE known
as generalized symbolic trajectory evaluation (GSTE) has been developed (Yang 2000).
This can use STE-like methods to verify so-called ω-regular properties, optionally sub-
ject to fairness constraints, and so represents a substantial extension of its scope.
     STE turns out to be very useful in partially automating the formal verification of
some circuits. Part of the appeal is that the use of the ternary model gives a relatively
easy way of abstracting out irrelevant detail. STE has been extensively used in formal
verification at Intel. For one example, see (O’Leary, Zhao, Gerth, and Seger 1999). For a
good introduction to the theory behind STE, see (Melham and Darbari 2002).
3. Reachability and model checking
We’ve mainly been concerned so far with analyzing combinational circuits, although
symbolic simulation and STE can be used to track signals over any fixed finite number
of clocks. However, it is sometimes of interest to ask questions like ‘can a circuit ever
get into a state where . . . ’ or ‘if line r (request) goes high at some point, must line a
(acknowledge) eventually do so too?’. Dealing with queries over an unbounded time
period like this requires the use of some new techniques.
      We can abstract and generalize from the particular features of synchronous sequen-
tial digital circuits by considering them as particular cases of a finite state transition sys-
tem or finite state machine (FSM). Such a system consists of a finite ‘state space’ S to-
gether with a binary relation R ⊆ S × S describing the possible transitions, where R(a, b)
is true iff it is possible for the system to make a transition from state a to state b in one
time-step.1
      For example, consider a circuit with three latches v0 , v1 and v2 that is supposed to
implement a modulo-5 counter. The state of the system is described by the values of
these latches, so we can choose S = {0, 1}×{0, 1}×{0, 1}. The corresponding transition
relation can be enumerated as follows:
        (0, 0, 0) → (0, 0, 1)
        (0, 0, 1) → (0, 1, 0)
        (0, 1, 0) → (0, 1, 1)
        (0, 1, 1) → (1, 0, 0)
        (1, 0, 0) → (0, 0, 0)
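As an illustration (in Python, with the relation hard-coded by hand rather than derived from a circuit), the enumerated transition relation can be explored by an explicit-state forward search:

```python
# The modulo-5 counter's transition relation, state -> next state.
R = {
    (0, 0, 0): (0, 0, 1),
    (0, 0, 1): (0, 1, 0),
    (0, 1, 0): (0, 1, 1),
    (0, 1, 1): (1, 0, 0),
    (1, 0, 0): (0, 0, 0),
}

def reachable(start):
    """All states reachable from start (including start itself)."""
    seen, frontier = set(), {start}
    while frontier:
        seen |= frontier
        frontier = {R[s] for s in frontier if s in R} - seen
    return seen

print(len(reachable((0, 0, 0))))              # 5: the counter's cycle
print((0, 0, 0) in reachable((0, 0, 1)))      # True: it returns to its start
```
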
     The transition systems arising from modelling circuits in this way have the special
feature that the transition relation is deterministic, i.e. for each state a there is at most
one state b such that R(a, b). However, it’s useful to consider state transition systems in
general without this restriction, since the same methods can then be applied to a vari-
ety of other practically interesting situations, such as the analysis of parallel or nonde-
terministic hardware, programs or protocols. For example, it is often helpful in analyz-
ing synchronization and mutual exclusion in systems with concurrent and/or interleaved
components.
Many interesting questions are then of the form: if we start in a state σ0 ∈ S0 , can we
ever reach a state σ1 ∈ S1 ? For example, if we start the counter in state (0, 0, 0), will it
eventually return to the same state? (Fairly obviously, the answer is yes.) In general, we
call questions like this reachability questions. In order to answer these questions, we can
simply construct the reflexive-transitive closure R∗ of the transition relation R, and see if
it links any pairs of states of interest. In principle, there is no difficulty, since the state
space is finite.
   1 Sometimes, one also considers a set of possible ‘initial’ and ‘final’ states to be part of the state transition

system, but we prefer to consider the question of the starting and finishing states separately.
      Suppose that the set of starting states is S0 ; we will consider this the first in an
infinite sequence (Si ) of sets of states where Si is the set of states reachable from S0 in
≤ i transitions. Clearly we can compute Si+1 from Si by the following recursion:

                            Si+1 = Si ∪ {b | ∃a ∈ Si . R(a, b)}

Since Si ⊆ Si+1 and the state space is finite, this sequence must eventually reach a
fixpoint, the set of all states reachable from S0 .
     Dually, for backward reachability from a set of ‘target’ states P, we can start with
A0 = P and define Ai+1 = Ai ∪ {a | ∃b ∈ Ai . R(a, b)}. We can simply characterize Ai
as the set of states from which a state in P is reachable in ≤ i steps. As before, we have
Ai ⊆ Ai+1 and we must eventually reach a fixpoint that is the required set P∗ . A formally
appealing variant is to use A0 = ∅ instead, and the first iteration will give A1 = P and
thereafter we will obtain the same sequence offset by 1 and the same eventual fixpoint
P∗ . This helps to emphasize that we can consider this a case of finding the least fixpoint
of the monotone mapping on sets of states A ↦ P ∪ {a | ∃b ∈ A. R(a, b)} à la Knaster-
Tarski (Knaster 1927; Tarski 1955).
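The backward fixpoint can be sketched directly. This is an explicit-state toy reusing the modulo-5 counter (here the relation is stored as a set of pairs); symbolic tools perform the same iteration on formula representations.

```python
# Transition relation as a set of (state, next-state) pairs.
R = {((0, 0, 0), (0, 0, 1)), ((0, 0, 1), (0, 1, 0)), ((0, 1, 0), (0, 1, 1)),
     ((0, 1, 1), (1, 0, 0)), ((1, 0, 0), (0, 0, 0))}

def pre(A):
    """States with a 1-step transition into A."""
    return {a for (a, b) in R if b in A}

def backward_reach(P):
    """Least fixpoint of A ↦ P ∪ pre(A), starting from A = ∅."""
    A = set()
    while True:
        A2 = P | pre(A)
        if A2 == A:
            return A
        A = A2

# Every state on the counter's cycle can reach (0,0,0).
print(backward_reach({(0, 0, 0)}) ==
      {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 0, 0)})   # True
```
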
     Is there any reason to prefer forward or backward reachability over the other?
Though they will always give the same answer if successfully carried out, one or the
other may be much more efficient, depending on the particular system and properties
concerned (Iwashita, Nakata, and Hirose 1996). We will follow the most common ap-
proach and focus on backward reachability. This is not because of efficiency concerns,
but because, as we will see shortly, it generalizes nicely to analyzing more complicated
‘future modalities’ than just reachability.
In practice, the kind of fixed-point computation sketched in the previous section can be
impractical, because the enumeration of cases in the state set or transition relation is too
large. Instead, we can use the techniques we have already established. Instead of rep-
resenting sets of states and state transitions by explicit enumeration of cases, we can
do it symbolically, using formulas. Suppose we encode states using Boolean variables
v1 , . . . , vn . Any T ⊆ S can be identified with a Boolean formula t[v1 , . . . , vn ] that holds
precisely when (v1 , . . . , vn ) ∈ T . Moreover, by introducing n additional ‘next state’ vari-
ables v1′ , . . . , vn′ we can represent a binary relation A (such as the transition relation R) by
a formula a[v1 , . . . , vn , v1′ , . . . , vn′ ] that is true iff A((v1 , . . . , vn ), (v1′ , . . . , vn′ )). For example,
writing the state of the counter as (v0 , v1 , v2 ), we can represent the transition relation for
the modulo-5 counter using (among other possibilities) the formula:

           (v0′ ⇔ v1 ∧ v2 ) ∧ (v1′ ⇔ ¬v0 ∧ ¬(v1 ⇔ v2 )) ∧ (v2′ ⇔ ¬v0 ∧ ¬v2 )
    Here the parametrization in terms of Boolean variables just uses one variable for
each latch, but this choice, while the most obvious, is not obligatory.
One immediate advantage of a symbolic state representation is that we can ask questions
about k-step transition sequences without risking a huge blowup in the representation, as
might happen in a canonical BDD representation used in symbolic simulation. We can
simply duplicate the original formula k times over k sets of the n variables (writing v^i
for the i-th set v^i_1, . . . , v^i_n), and can ask any question about the relation between the
initial and final states expressed as a Boolean formula p[v^1_1, . . . , v^1_n, v^k_1, . . . , v^k_n]:

              r[v^1, v^2] ∧ r[v^2, v^3] ∧ · · · ∧ r[v^(k−1), v^k] ∧ p[v^1, v^k]
In the case of unbounded queries, the fixpoint computation can easily be performed di-
rectly on the symbolic representation. Traditionally, a canonical format such as BDDs
has been used (Coudert, Berthet, and Madre 1989; Burch, Clarke, McMillan, Dill, and
Hwang 1992; Pixley 1990). This can immediately cause problems if the BDD repre-
sentation is infeasibly large, and indeed there have been recent explorations using non-
canonical representations (Bjesse 1999; Abdulla, Bjesse, and Eén 2000). On the other
hand, the fact that BDDs are a canonical representation means that if we repeat the iter-
ative steps leading through a sequence Si , we can immediately recognize when we have
reached a fixed point simply by seeing if the BDD for Si+1 is identical to that for Si ,
whereas in a non-canonical representation, we would need to perform a tautology check
at each stage.
      Most of the operations used to find the fixpoints are, in the symbolic context, sim-
ply logical operations. For example, union (‘∪’) and intersection (‘∩’) are simply imple-
mented by disjunction (‘∨’) and conjunction (‘∧’) respectively. However, we also need
the operation that maps a set P and a transition relation R to the set PreR (P) (or just
Pre(P) when R is understood) of states from which there is a 1-step transition into a state
in P:

                          PreR (P) = {a | ∃b. R(a, b) ∧ b ∈ P}

Symbolically, this amounts to existential quantification over the next-state variables,
which is a standard operation on BDDs.
There are still many interesting questions about transition systems that we can’t answer
using just reachability, the request-acknowledge example above being one simple case.
But the material so far admits of generalization.
     The backward reachability method answered a question ‘starting in a state satisfying
s, can we reach a state satisfying p?’ by computing the set of all states p∗ from which p
is reachable, and then considering whether p∗ ∧ s = ⊥. If we introduce the notation:
EF(p)
to mean ‘a state satisfying p is reachable’, or ‘there is some state on some path through
the relation in which p holds’, we can consider p∗ as the set of states satisfying the formula
EF(p). We can systematically extend propositional logic in this way with further ‘tem-
poral operators’ to yield a form of so-called temporal logic. The most popular logic in
practice is Computation Tree Logic (CTL), introduced and popularized by Clarke and
Emerson (Clarke and Emerson 1981).
     One formal presentation of the semantics of CTL, using separate classes of ‘path
formulas’ and ‘state formulas’ is the following. (We also need some way of assigning
truth-values to primitive propositions based on a state σ, and we will write [[σ]] for this
mapping applied to σ. In many cases, of course, the properties we are interested in will
be stated precisely in terms of the variables used to encode the states, so a formula is
actually its own denotation in the symbolic representation, modulo the transformation
from CTL to pure propositional logic.) The valuation functions for an arbitrary formula
are then defined as follows for state formulas:
     Although it was convenient to use path formulas above, the appeal of CTL is exactly
that we only really need to consider state formulas, with path formulas just an interme-
diate concept. We can eliminate the separate class of path formulas by just putting to-
gether the path quantifiers and temporal operators in combinations like ‘EF’ and ‘AG’.
A typical formula in this logic is the request-acknowledge property:
AG(r ⇒ AF(a))
     The process of testing whether a transition system (defining a model) satisfies a tem-
poral formula is called model checking (Clarke and Emerson 1981; Queille and Sifakis
1982). The process can be implemented by a very straightforward generalization of the
fixpoint methods used in backwards reachability. The marriage of Clarke and Emerson’s
original explicit-state model-checking algorithms with a BDD-based symbolic represen-
tation, due to McMillan (Burch, Clarke, McMillan, Dill, and Hwang 1992), gives sym-
bolic model checking, which has led to substantially wider practical applicability. Al-
though STE is often more useful at Intel, there are situations, e.g. the analysis of bus and
cache protocols, where the greater generality of temporal logic model checking seems
crucial.
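An explicit-state sketch of this generalization: EF computed as the least fixpoint used for backward reachability, and AG obtained by the standard duality AG p = ¬EF ¬p. The three-state request/acknowledge system here is my own toy example, not from the text.

```python
# A tiny nondeterministic transition system: states are just labels.
STATES = {'idle', 'req', 'ack'}
R = {('idle', 'req'), ('req', 'req'), ('req', 'ack'), ('ack', 'idle')}

def pre(A):
    return {a for (a, b) in R if b in A}

def EF(P):
    """States from which some state in P is reachable (least fixpoint)."""
    A = set()
    while True:
        A2 = P | pre(A)
        if A2 == A:
            return A
        A = A2

def AG(P):
    """States from which P holds along every reachable future: ¬EF¬P."""
    return STATES - EF(STATES - P)

print(EF({'ack'}) == STATES)            # True: 'ack' reachable everywhere
print(AG(EF({'ack'})) == STATES)        # True: and that remains so forever
```

More operators (EG, EU, and hence the whole of CTL) are handled by similar greatest- and least-fixpoint computations.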
     For a detailed discussion of temporal logic model checking, see (Clarke, Grumberg,
and Peled 1999). The topic is also covered in some books on logic in computer sci-
ence like (Huth and Ryan 1999) and formal verification texts like (Kropf 1999). CTL is
just one example of a temporal logic, and there are innumerable other varieties such as
LTL (Pnueli 1977), CTL∗ (Emerson and Halpern 1986) and the propositional µ-calculus
(Kozen 1983). LTL stands for linear temporal logic, indicating that it takes a different
point of view in its semantics: it implicitly quantifies over all paths rather than providing
the explicit ‘A’ and ‘E’ operators of CTL, but allows temporal operators to be nested
directly without intervening path quantifiers.
It was noted that one of the reasons for the greater success of formal verification in the
hardware domain is the fact that more of the verification task can be automated. How-
ever, for some more elaborate systems, we cannot rely on tools like equivalence check-
ers, symbolic simulators or model checkers. One potential obstacle — almost invariably
encountered on large industrial designs — is that the system is too large to be verified
in a feasible time using such methods. A more fundamental limitation is that some more
sophisticated properties cannot be expressed at all in the simple Boolean world. For ex-
ample, a floating-point sin function can hardly have its intended behaviour spelled out in
terms of Boolean formulas on the component bits. Instead, we need a more expressive
logical system where more or less any mathematical concepts, including real numbers
and infinite sets, can be analyzed. This is usually formulated in first-order or higher-order
logic, which, as well as allowing propositions to have internal structure, introduces the
universal and existential quantifiers:
    • The formula ∀x. p, where x is an (object) variable and p is any formula, means
      intuitively ‘p is true for all values of x’.
    • The analogous formula ∃x. p means intuitively ‘p is true for some value(s) of x’,
      or in other words ‘there exists an x such that p’.
    I will assume that the basic syntax and definitions of first order logic are known.
In contrast to propositional logic, many interesting questions about first order and higher
order logic are undecidable even in principle, let alone in practice. Church (Church 1936)
and Turing (Turing 1936) showed that even pure logical validity in first order logic is
undecidable, introducing in the process many of the basic ideas of computability theory.
On the other hand, it is not too hard to see that logical validity is semidecidable — this is
certainly a direct consequence of completeness theorems for proof systems in first order
logic (Gödel 1930), and was arguably implicit in work by Skolem (Skolem 1922). This
means that we can at least program a computer to enumerate all valid first order formulas.
One simple approach is based on the following logical principle, due to Skolem and
Gödel but usually mis-named “Herbrand’s theorem”:
    Let ∀x1 , . . . , xn . P[x1 , . . . , xn ] be a first order formula with only the indicated universal
    quantifiers (i.e. the body P[x1 , . . . , xn ] is quantifier-free). Then the formula is satisfi-
able iff the infinite set of ‘ground instances’ P[t1 , . . . , tn ] that arise by replacing the
    variables by arbitrary variable-free terms made up from functions and constants in
    the original formula is propositionally satisfiable.
     We can get the original formula into the special form required by some simple nor-
mal form transformations, introducing Skolem functions to replace existentially quan-
tified variables. And by compactness for propositional logic, we know that if the infi-
nite set of instances is unsatisfiable, then so will be some finite subset. In principle we
can enumerate all possible sets, one by one, until we find one that is not proposition-
ally satisfiable. (If the formula is satisfiable, we will never discover it by this means. By
undecidability, we know this is unavoidable.) A precise description of this procedure is
tedious, but a simple example may help. Suppose we want to prove that the following is
valid:
                                  ∃x. ∀y. D(x) ⇒ D(y)
This is often referred to as the ‘drinker’s principle’, because you can think of it as
asserting that there is some person x such that if x drinks, so does everyone.
     We start by negating the formula and Skolemizing, introducing a function f for the
existentially quantified y in the negation. To prove that the original is valid, we need to
prove that this is unsatisfiable:
                                  ∀x. D(x) ∧ ¬D( f (x))
     We now consider the Herbrand universe, the set of all terms built up from con-
stants and functions in the original formula. Since here we have no nullary con-
stants, we need to add one c to get started (this effectively builds in the assump-
tion that all models have a non-empty domain). The Herbrand universe then becomes
{c, f (c), f ( f (c)), f ( f ( f (c))), . . .}. By Herbrand’s theorem, we need to test all sets of
ground instances for propositional satisfiability. Let us enumerate them in increasing
size. The first one is:
                                     D(c) ∧ ¬D( f (c))
This is propositionally satisfiable (e.g. make D(c) true and D( f (c)) false), so we move
on to the next set, which adds the instance for f (c):
                     D(c) ∧ ¬D( f (c)) ∧ D( f (c)) ∧ ¬D( f ( f (c)))
This is propositionally unsatisfiable, since it contains the contradictory conjuncts
D( f (c)) and ¬D( f (c)). By Herbrand’s theorem the negated formula is unsatisfiable, and
so the original ‘drinker’s principle’ is valid.
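This exhaustive enumeration is easy to mechanize. The following OCaml sketch (the integer encoding of atoms is purely our illustration) tests the conjunction of the first k ground instances of ∀x. D(x) ∧ ¬D( f (x)) for propositional satisfiability by brute force, representing the atom D( f^i(c)) by bit i of an assignment:

```ocaml
(* Is atom i (i.e. D(f^i(c))) true under the assignment encoded by v? *)
let holds v i = (v lsr i) land 1 = 1;;

(* The ground instance with x = f^i(c) asserts D(f^i(c)) /\ ~D(f^(i+1)(c)). *)
let instance_true v i = holds v i && not (holds v (i + 1));;

(* Test whether the conjunction of the first k ground instances is
   propositionally satisfiable, trying all assignments to atoms 0..k. *)
let satisfiable k =
  let all_true v = List.for_all (instance_true v) (List.init k (fun i -> i)) in
  List.exists all_true (List.init (1 lsl (k + 1)) (fun v -> v));;
```

Here `satisfiable 1` is true but `satisfiable 2` is false, matching the hand analysis: with two instances, D( f (c)) and ¬D( f (c)) clash.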
Unification-based methods
The above idea (Robinson 1957) led directly to some early computer implementations, e.g.
by Gilmore (Gilmore 1960). Gilmore tested for propositional satisfiability by transform-
ing the successively larger sets to disjunctive normal form. A more efficient approach is
to use the Davis-Putnam algorithm — it was in this context that it was originally intro-
duced (Davis and Putnam 1960). However, as Davis (Davis 1983) admits in retrospect:
    . . . effectively eliminating the truth-functional satisfiability obstacle only uncovered
    the deeper problem of the combinatorial explosion inherent in unstructured search
    through the Herbrand universe . . .
     The next major step forward in theorem proving was a more intelligent means of
choosing substitution instances, to pick out the small set of relevant instances instead
of blindly trying all possibilities. The first hint of this idea appears in (Prawitz, Prawitz,
and Voghera 1960), and it was systematically developed by Robinson (Robinson 1965),
who gave an effective syntactic procedure called unification for deciding on appropriate
instantiations to make terms match up correctly.
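As a concrete illustration, here is a minimal OCaml sketch of syntactic unification over the term representation used later in these notes; the function names and the ‘triangular’ association-list substitution are our own simplifications, not the machinery of any particular prover:

```ocaml
type term = Var of string | Fn of string * term list;;

(* Fully apply a substitution, given as an association list from variable
   names to terms, resolving chains of bindings as we go. *)
let rec tsubst env tm =
  match tm with
    Var x -> (try tsubst env (List.assoc x env) with Not_found -> tm)
  | Fn(f, args) -> Fn(f, List.map (tsubst env) args);;

(* Occurs check: does variable x occur in t once env is applied? *)
let rec occurs env x t =
  match tsubst env t with
    Var y -> x = y
  | Fn(_, args) -> List.exists (occurs env x) args;;

(* Unify a pair of terms, extending env; raises Failure if impossible. *)
let rec unify env (s, t) =
  match (tsubst env s, tsubst env t) with
    (Var x, u) | (u, Var x) ->
      if u = Var x then env
      else if occurs env x u then failwith "unify: cyclic"
      else (x, u) :: env
  | (Fn(f, fargs), Fn(g, gargs)) ->
      if f = g && List.length fargs = List.length gargs
      then List.fold_left unify env (List.combine fargs gargs)
      else failwith "unify: clash";;
```

For example, unifying f (x, g(y)) with f (g(z), x) binds x to g(z) and y to z.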
     There are many unification-based theorem proving algorithms. Probably the best-
known is resolution, in which context Robinson (Robinson 1965) introduced full unifi-
cation to automated theorem proving. Another important method quite close to resolu-
tion and developed independently at about the same time is the inverse method (Maslov
1964; Lifschitz 1986). Other popular algorithms include tableaux (Prawitz, Prawitz, and
Voghera 1960), model elimination (Loveland 1968; Loveland 1978) and the connection
method (Kowalski 1975; Bibel and Schreiber 1975; Andrews 1976). Crudely speaking:
    • Tableaux = Gilmore procedure + unification
    • Resolution = Davis-Putnam procedure + unification
     Tableaux and resolution can be considered as classic representatives of ‘top-down’
and ‘bottom-up’ methods respectively. Roughly speaking, in top-down methods one
starts from a goal and works backwards, while in bottom-up methods one starts from the
assumptions and works forwards. This has significant implications for the very nature
of unifiable variables, since in bottom-up methods they are local (implicitly universally
quantified) whereas in top-down methods they are global, correlated in different portions
of the proof tree. This is probably the most useful way of classifying the various first-
order search procedures and has a significant impact on the problems where they perform
well.
4.1. Decidable problems
Although first order validity is undecidable in general, there are special classes of for-
mulas for which it is decidable, e.g.
     • AE formulas, which involve no function symbols and when placed in prenex form
       have all the universal quantifiers before the existential ones.
     • Monadic formulas, involving no function symbols and only monadic (unary) re-
       lation symbols.
     • Purely universal formulas
     The decidability of AE formulas is quite easy to see, because there are no function
symbols to start with, and because of the special quantifier nesting, none are introduced
in Skolemization. Therefore the Herbrand universe is finite and the enumeration
of ground instances cannot go on forever. The decidability of the monadic class can be
proved in various ways, e.g. by transforming into AE form by pushing quantifiers in-
wards (‘miniscoping’). Although neither of these classes is particularly useful in prac-
tice, it’s worth noting that the monadic formulas subsume traditional Aristotelian syllo-
gisms, at least on a straightforward interpretation of what they are supposed to mean. For
example
    If all M are P, and all S are M, then all S are P
can be expressed using monadic relations as follows:
          (∀x. M(x) ⇒ P(x)) ∧ (∀x. S(x) ⇒ M(x)) ⇒ (∀x. S(x) ⇒ P(x))
    For purely universal formulas, we can use congruence closure (Nelson and Oppen
1980; Shostak 1978; Downey, Sethi, and Tarjan 1980). This allows us to prove that one
equation follows from others, e.g. that
               f ( f ( f (x))) = x ∧ f ( f ( f ( f ( f (x))))) = x ⇒ f (x) = x
        As the name implies, congruence closure involves deducing all equalities between
subterms that follow from the asserted ones by using equivalence and congruence prop-
erties of equality. In our case, for example, the first equation f ( f ( f (x))) = x implies
f ( f ( f ( f (x)))) = f (x) and hence f ( f ( f ( f ( f (x))))) = f ( f (x)), and then symmetry and
transitivity with the second equation imply f ( f (x)) = x, and so on. It straightforwardly
extends to deciding the entire universal theory by refutation followed by DNF transfor-
mation and congruence closure on the equations in the disjuncts, seeing whether any
negated equations in the same disjunct are implied.
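To make the closure process concrete, here is a toy OCaml version for this particular example. It is a deliberate simplification of the cited algorithms: the terms f^k(x) are identified with their exponents k, bounded at 10, and we iterate a naive union-find to a fixed point instead of using an efficient congruence closure:

```ocaml
(* Equivalence classes of the terms x, f(x), ..., f^10(x), identified with
   the integers 0..10, maintained by a naive union-find. *)
let bound = 10;;
let parent = Array.init (bound + 1) (fun i -> i);;
let rec find i = if parent.(i) = i then i else find parent.(i);;
let union i j =
  let ri = find i and rj = find j in
  if ri <> rj then parent.(ri) <- rj;;

let () =
  (* Assert the hypotheses f^3(x) = x and f^5(x) = x. *)
  union 3 0; union 5 0;
  (* Close under congruence: k ~ k' implies f^(k+1)(x) ~ f^(k'+1)(x).
     Iterating 'bound' times is enough to reach a fixed point here. *)
  for _round = 1 to bound do
    for i = 0 to bound - 1 do
      for j = 0 to bound - 1 do
        if find i = find j then union (i + 1) (j + 1)
      done
    done
  done;;
```

Afterwards `find 1 = find 0`, i.e. f (x) = x has been deduced, matching the chain of reasoning above.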
        An alternative approach is to reduce the problem to SAT by introducing a propo-
sitional variable for each equation between subterms, adding constraints on these vari-
ables to reflect congruence properties. This has the possibly significant advantage that no
potentially explosive DNF transformation need be done, and exploits the power of SAT
solvers. For example if Em,n (for 0 ≤ m, n ≤ 5) represents the equation f m (x) = f n (x),
we want to deduce E3,0 ∧ E5,0 ⇒ E1,0 assuming the equality properties: En,n for each n
(reflexivity), Em,n ⇒ En,m (symmetry), Em,n ∧ En,p ⇒ Em,p (transitivity), and the con-
gruence property Em,n ⇒ Em+1,n+1 , each scheme conjoined over all indices in range.
Also interesting in practice are situations where, rather than absolute logical validity, we
want to know whether statements follow from some well-accepted set of mathematical
axioms, or are true in some particular interpretation like the real numbers R. We gener-
alize the notion of logical validity, and say that p is a logical consequence of axioms A,
written A |= p, if for any valuation, every interpretation that makes all the formulas in A
true also makes p true. (This looks straightforward but unfortunately there is some incon-
sistency in standard texts. For example the above definition is found in (Enderton 1972),
whereas in (Mendelson 1987) the definition has the quantification over valuations per
formula: every interpretation in which each formula of A is true in all valuations makes
p true in all valuations. Fortunately, in most cases all the formulas in A are sentences,
formulas with no free variables, and then the two definitions coincide.) Ordinary validity
of p is the special case ∅ |= p, usually written just |= p.
     By a theory, we mean a set of formulas closed under first-order validity, or in other
words, the set of logical consequences of a set of axioms. The smallest theory is the set
of consequences of the empty set of axioms, i.e. the set of logically valid formulas. Note
also that for any particular interpretation, the set of formulas true in that interpretation is
also a theory. Some particularly important characteristics of a theory are:
    • Whether it is consistent, meaning that we never have both T |= p and T |= ¬p.
      (Equivalently, that we do not have T |= ⊥, or that some formula does not follow
      from T .)
    • Whether it is complete, meaning that for any sentence p, either T |= p or T |= ¬p.
    • Whether it is decidable, meaning that there is an algorithm that takes as input a
      formula p and decides whether T |= p.
     Note that since we have a semidecision procedure for first-order validity, any com-
plete theory based on a finite (or even semicomputable, with a slightly more careful anal-
ysis) set of axioms is automatically decidable: just search in parallel for proofs of A ⇒ p
and A ⇒ ¬p.
5. Arithmetical theories
First order formulas built up from equations and inequalities and interpreted over com-
mon number systems are often decidable. A common way of proving this is quantifier
elimination.
We say that a theory T in a first-order language L admits quantifier elimination if for each
formula p of L, there is a quantifier-free formula q such that T |= p ⇔ q. (We assume
that the equivalent formula contains no new free variables.) For example, the well-known
criterion for a quadratic equation to have a (real) root can be considered as an example
of quantifier elimination in a suitable theory T of reals:
     T |= (∃x. ax² + bx + c = 0) ⇔ a ≠ 0 ∧ b² ≥ 4ac ∨ a = 0 ∧ (b ≠ 0 ∨ c = 0)
A classic simpler example, shown by Langford to admit quantifier elimination, is the
theory of dense linear orders without endpoints, axiomatized by:
                              ∀x y. x = y ∨ x < y ∨ y < x
                              ∀x y z. x < y ∧ y < z ⇒ x < z
                              ∀x. ¬(x < x)
                              ∀x y. x < y ⇒ ∃z. x < z ∧ z < y
                              ∀x. ∃y. x < y
                              ∀x. ∃y. y < x
One of the earliest theories shown to have quantifier elimination is linear arithmetic over
the natural numbers or integers, as first shown by Presburger (Presburger 1930). Linear
arithmetic means that we are allowed the usual equality and inequality relations, constants
and addition, but no multiplication, except by constants. In fact, to get quantifier elim-
ination we need to add infinitely many divisibility relations Dk for all integers k ≥ 2.
This doesn’t affect decidability because those new relations are decidable for particular
numbers.
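For instance (our own illustrative case, not taken from Presburger's paper), the divisibility relations are exactly what is needed to eliminate a quantifier such as:

```latex
(\exists x.\; y = x + x) \;\Leftrightarrow\; D_2(y)
```

Indeed, a quantifier-free equivalent using only <, = and + would define a finite union of intervals in y, which the set of even numbers is not; hence some such extra relations are unavoidable.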
     Presburger’s original algorithm is fairly straightforward, and follows the classic
quantifier elimination pattern of dealing with the special case of an existentially quanti-
fied conjunction of literals. (We can always put the formula into disjunctive normal form
and distribute existential quantifiers over the disjuncts, then start from the innermost
quantifier.) For an in-depth discussion of Presburger’s original procedure, the reader can
consult (Enderton 1972) and (Smoryński 1980), or indeed the original article, which is
quite readable — (Stansifer 1984) gives an annotated English translation. A somewhat
more efficient algorithm more suited to computer implementation is given by Cooper
(Cooper 1972).
Over the complex numbers, we can also allow multiplication and still retain quantifier
elimination. Here is a sketch of a naive algorithm for complex quantifier elimination.
By the usual quantifier elimination pre-canonization, it suffices to be able to eliminate a
single existential quantifier from a conjunction of positive and negative equations:
                 ∃x. p1 (x) = 0 ∧ · · · ∧ pn (x) = 0 ∧ q1 (x) ≠ 0 ∧ · · · ∧ qm (x) ≠ 0
     We now sketch how this can be reduced to the case m ≤ 1 and n ≤ 1. To reduce n
we can use one equation to reduce the powers of variables in the others by elimination,
e.g.
      2x2 + 5x + 3 = 0 ∧ x2 − 1 = 0 ⇔ 5x + 5 = 0 ∧ x2 − 1 = 0
                                            ⇔ 5x + 5 = 0 ∧ 0 = 0
                                            ⇔ 5x + 5 = 0
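The elimination step in this example can be mimicked mechanically. Here is a small OCaml sketch using our own toy representation (dense coefficient lists, constant term first), assuming the divisor is monic, which repeatedly cancels the leading term exactly as above:

```ocaml
(* Remove trailing zero coefficients so the list length reflects the degree. *)
let rec trim p =
  match List.rev p with
    0 :: rest -> trim (List.rev rest)
  | _ -> p;;

(* Reduce p modulo a monic polynomial q: subtract a shifted multiple of q
   to cancel p's leading coefficient until deg p < deg q. *)
let rec reduce p q =
  let p = trim p in
  if List.length p < List.length q then p else
  let c = List.nth p (List.length p - 1) in
  let shift = List.length p - List.length q in
  let shifted = List.init shift (fun _ -> 0) @ List.map (fun a -> c * a) q in
  reduce (List.map2 (-) p shifted) q;;
```

For the example above, `reduce [3;5;2] [-1;0;1]` (i.e. 2x² + 5x + 3 reduced by x² − 1) returns `[5;5]`, representing 5x + 5.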
     To reduce m, we may simply multiply all the qi (x) together, since qi (x) ≠ 0 ∧
qi+1 (x) ≠ 0 ⇔ qi (x) · qi+1 (x) ≠ 0. Now, the problem that remains is:
                               ∃x. p(x) = 0 ∧ q(x) ≠ 0
whose negation is the universally quantified formula ∀x. p(x) = 0 ⇒ q(x) = 0.
     Assume that neither p(x) nor q(x) is the zero polynomial. Since we are working in
an algebraically closed field, we know that the polynomials p(x) and q(x) split into linear
factors whatever they may be:
       p(x) = (x − a1 ) · (x − a2 ) · · · · · (x − an )
      q(x) = (x − b1 ) · (x − b2 ) · · · · · (x − bm )
     Now p(x) = 0 is equivalent to x = a1 ∨ · · · ∨ x = an , and q(x) = 0 is equivalent to
x = b1 ∨ · · · ∨ x = bm . Thus, the formula ∀x. p(x) = 0 ⇒ q(x) = 0 says precisely that
                 ∀x. x = a1 ∨ · · · ∨ x = an ⇒ x = b1 ∨ · · · ∨ x = bm
or in other words, all the ai appear among the b j . However, since there are just n linear
factors in the antecedent, a given factor (x − ai ) cannot occur more than n times and
thus the polynomial divisibility relation p(x) | q(x)n holds. Conversely, if this divisibil-
ity relation holds for n ≠ 0, then clearly ∀x. p(x) = 0 ⇒ q(x) = 0 holds. Thus, the key
quantified formula can be reduced to a polynomial divisibility relation, and it’s not dif-
ficult to express this as a quantifier-free formula in the coefficients, thus eliminating the
quantification over x.
In the case of the real numbers, one can again use addition and multiplication arbitrarily
and it is decidable whether the formula holds in R. A simple (valid) example is a case of
the Cauchy-Schwarz inequality:
            ∀x1 x2 y1 y2 . (x1 · y1 + x2 · y2 )² ≤ (x1 ² + x2 ²) · (y1 ² + y2 ²)
One relatively simple decision procedure is based on the notion of a sign matrix for a
finite set of univariate polynomials p1 (x), . . . , pn (x): a table with rows representing the
roots x1 < x2 < · · · < xk of the polynomials together with the intervals between them,
and columns representing the polynomials p1 (x), . . . , pn (x), with the matrix entries giv-
ing the signs, either positive (+), negative (−) or zero (0), of each polynomial pi at the
points and on the intervals. For example, for the collection of polynomials:
       p1 (x) = x2 − 3x + 2
       p2 (x) = 2x − 3
                                     Point/Interval        p1      p2
                                       (−∞, x1 )           +       −
                                           x1              0       −
                                        (x1 , x2 )         −       −
                                           x2              −       0
                                        (x2 , x3 )         −       +
                                           x3              0       +
                                       (x3 , +∞)           +       +
     Note that x1 and x3 represent the roots 1 and 2 of p1 (x) while x2 represents 1.5,
the root of p2 (x). However the sign matrix contains no numerical information about
the location of the points xi , merely specifying the order of the roots of the various
polynomials and what signs they take there and on the intervening intervals. It is easy to
see that the sign matrix for a set of univariate polynomials p1 (x), . . . , pn (x) is sufficient
to answer any question of the form ∃x. P[x] where the body P[x] is quantifier-free and all
atoms are of the form pi (x) ⋈ 0, where ⋈ is any of the relations =, <, >, ≤, ≥ or their
negations.
We simply need to check each row of the matrix (point or interval) and see if one of them
makes each atomic subformula true or false; the formula as a whole can then simply be
“evaluated” by recursion.
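For example, deciding ∃x. p1 (x) < 0 ∧ p2 (x) < 0 from the sign matrix above just means scanning the rows. A minimal OCaml rendition (the encoding is our own) of that scan:

```ocaml
type sign = Pos | Neg | Zero;;

(* Rows of the example sign matrix, in order: (-oo,x1), x1, (x1,x2), x2,
   (x2,x3), x3, (x3,+oo); each row gives the signs of (p1, p2). *)
let matrix =
  [ (Pos, Neg); (Zero, Neg); (Neg, Neg); (Neg, Zero);
    (Neg, Pos); (Zero, Pos); (Pos, Pos) ];;

(* Does some point or interval make p1(x) < 0 /\ p2(x) < 0 hold? *)
let result = List.exists (fun (s1, s2) -> s1 = Neg && s2 = Neg) matrix;;
```

Here `result` is true, witnessed by the row for the interval (x1, x2), i.e. 1 < x < 1.5.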
     In order to perform general quantifier elimination, we simply apply this basic op-
eration to all the innermost quantified subformulas first (we can consider a universally
quantified formula ∀x. P[x] as ¬(∃x. ¬P[x]) and eliminate from ∃x. ¬P[x]). This can then
be iterated until all quantifiers are eliminated. The only difficulty is that the coefficients
of a polynomial may now contain other variables as parameters. But we can quite easily
handle these by performing case-splits over the signs of coefficients and using pseudo-
division of polynomials instead of division as we present below.
     So the key step is finding the sign matrix, and for this the following simple observa-
tion is crucial. To find the sign matrix for
                                   p, p1 , . . . , pn
it suffices to find the sign matrix for
                       p′ , p1 , . . . , pn , q0 , q1 , . . . , qn
where p′ , which we will sometimes write p0 for regularity’s sake, is the derivative of p,
and qi is the remainder on dividing p by pi . For suppose we have a sign matrix for the
second set of polynomials. We can proceed as follows.
     First, we split the sign matrix into two equally-sized parts, one for the p0 , p1 , . . . , pn
and one for the q0 , q1 , . . . , qn , but for now keeping all the points in each matrix, even if
the corresponding set of polynomials has no zeros there. We can now infer the sign of
p(xi ) for each point xi that is a zero of one of the polynomials p0 , p1 , . . . , pn , as follows.
Since qk is the remainder of p after division by pk , p(x) = sk (x)pk (x) + qk (x) for some
sk (x). Therefore, since pk (xi ) = 0 we have p(xi ) = qk (xi ) and so we can derive the sign
of p at xi from that of the corresponding qk .
     Now we can throw away the second sign matrix, giving signs for the q0 , . . . , qn , and
retain the (partial) matrix for p, p0 , p1 , . . . , pn . We next ‘condense’ this matrix to remove
points that are not zeros of one of the p0 , p1 , . . . , pn , but only of one of the qi . The signs
of the p0 , p1 , . . . , pn in an interval from which some other points have been removed can
be read off from any of the subintervals in the original subdivision — they cannot change
because there are no zeros for the relevant polynomials there.
     Now we have a sign matrix with correct signs at all the points, but undetermined
signs for p on the intervals, and the possibility that there may be additional zeros of p
inside these intervals. But note that since there are certainly no zeros of p0 inside the
intervals, there can be at most one additional root of p in each interval. Whether there
is one can be inferred, for an internal interval (xi , xi+1 ), by seeing whether the signs of
p(xi ) and p(xi+1 ), determined in the previous step, are both nonzero and are different.
If not, we can take the sign on the interval from whichever sign of p(xi ) and p(xi+1 ) is
nonzero (we cannot have them both zero, since then there would have to be a zero of
p0 in between). Otherwise we insert a new point y between xi and xi+1 which is a zero
(only) of p, and infer the signs on the new subintervals (xi , y) and (y, xi+1 ) from the signs
at the endpoints. Other polynomials have the same signs on (xi , y), y and (y, xi+1 ) that
had been inferred for the original interval (xi , xi+1 ). For external intervals, we can use the
same reasoning if we temporarily introduce new points −∞ and +∞ and infer the sign of
p(−∞) by flipping the sign of p0 on the lowest interval (−∞, x1 ) and the sign of p(+∞)
by copying the sign of p0 on the highest interval (xn , +∞).
Word problems
Suppose K is a class of structures, e.g. all groups. The word problem for K asks whether a
set E of equations between constants implies another such equation s = t in all algebraic
structures of class K. More precisely, we may wish to distinguish:
    • The uniform word problem for K: deciding given any E and s = t whether E |=M
      s = t for all interpretations M in K.
    • The word problem for K, E: with E fixed, deciding given any s = t whether E |=M
      s = t for all interpretations M in K.
    • The free word problem for K: deciding given any s = t whether |=M s = t for all
      interpretations M in K.
     As a consequence of general algebraic results (e.g. every integral domain has a field
of fractions, every field has an algebraic closure), there is a close relationship between
word problems and results for particular interpretations. For example, for any universal
formula in the language of rings, such as a word problem implication s1 = t1 ∧ · · · ∧
sn = tn ⇒ s = t,
the following are equivalent, and hence we can solve it using complex quantifier elimi-
nation:
    •   It holds in all integral domains of characteristic 0
    •   It holds in all fields of characteristic 0
    •   It holds in all algebraically closed fields of characteristic 0
    •   It holds in any given algebraically closed field of characteristic 0
    •   It holds in C
    There is also a close relationship between word problems and ideal membership
questions (Scarpellini 1969; Simmons 1970), sketched in a later section. These ideal
membership questions can be solved using efficient methods like Gröbner bases (Buch-
berger 1965; Cox, Little, and O’Shea 1992).
Combining decision procedures
In pure Presburger arithmetic one can deduce x − 1 < n ∧ ¬(x < n) ⇒ x = n, but one
needs a methodology for making the further trivial deduction x = n ⇒ a[x] = a[n]. For
non-trivial quantifier prefixes, this problem rapidly becomes undecidable, but for purely
universal formulas like the above, there are well-established methods. The Nelson-Oppen
(Nelson and Oppen 1979) approach is the most general. It exploits the fact that the only
communication between component procedures need be equations and negated equations
(s = t and s 6= t), by virtue of a result in logic known as the Craig interpolation theo-
rem. An alternative, which can be viewed as an optimization of Nelson-Oppen for some
common cases, is due to Shostak (Shostak 1984). It has taken a remarkably long time
to reach a rigorous understanding of Shostak’s method; indeed Rueß and Shankar (Rueß
and Shankar 2001) showed that Shostak’s original algorithm and all the then known later
refinements were in fact incomplete and potentially nonterminating!
     At the time of writing, there is intense interest in decision procedures for combina-
tions of (mainly, but not entirely quantifier-free) theories. The topic has become widely
known as satisfiability modulo theories (SMT), emphasizing the perspective that it is
a generalization of the standard propositional SAT problem. Indeed, most of the latest
SMT systems use methods strongly influenced by the leading SAT solvers, and are usu-
ally organized around a SAT-solving core. The idea of basing other decision procedures
around SAT appeared in several places and in several slightly different contexts, going
back at least to (Armando, Castellini, and Giunchiglia 1999). The simplest approach is
to use the SAT checker as a ‘black box’ subcomponent. Given a formula to be tested
for satisfiability, just treat each atomic formula as a propositional atom and feed the for-
mula to the SAT checker. If the formula is propositionally unsatisfiable, then it is triv-
ially unsatisfiable as a first-order formula and we are finished. If on the other hand the
SAT solver returns a satisfying assignment for the propositional formula, test whether
the implicit conjunction of literals is also satisfiable within our theory or theories. If it is
satisfiable, then we can conclude that so is the whole formula and terminate. However, if
the putative satisfying valuation is not satisfiable in our theories, we conjoin its negation
with the input formula and repeat the procedure.
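The loop just described can be sketched schematically in OCaml. Everything here is hypothetical scaffolding rather than a real solver API: `sat` is assumed to return `Some` assignment (a list of literal formulas) or `None`, and `th_sat` to test a conjunction of literals in the underlying theories:

```ocaml
type 'a formula =
    Atom of 'a
  | Not of 'a formula
  | And of 'a formula * 'a formula;;

(* Conjoin a nonempty list of formulas. *)
let list_conj = function
    [] -> failwith "list_conj: empty"
  | f :: fs -> List.fold_left (fun acc g -> And(acc, g)) f fs;;

(* The 'black box' refinement loop: true means satisfiable modulo the
   theory, false means unsatisfiable. Each round that fails the theory
   check conjoins the negated assignment and asks the SAT solver again. *)
let rec smt_loop sat th_sat fm =
  match sat fm with
    None -> false                          (* propositionally unsat *)
  | Some lits ->
      if th_sat lits then true             (* theory-consistent model *)
      else smt_loop sat th_sat (And(fm, Not(list_conj lits)));;
```

Termination is of course not guaranteed in general; practical systems integrate the theory solver much more tightly with the SAT search.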
6. Interactive theorem proving
Even though first order validity is semi-decidable, it is seldom practical to solve inter-
esting problems using unification-based approaches to pure logic. Nor is it the case that
practical problems often fit conveniently into one of the standard decidable subsets. The
best we can hope for in most cases is that the human will have to guide the proof pro-
cess, but the machine may be able to relieve the tedium by filling in gaps, while always
ensuring that no mistakes are made. This kind of application was already envisaged by
Wang (Wang 1960):
    [...] the writer believes that perhaps machines may more quickly become of practi-
    cal use in mathematical research, not by proving new theorems, but by formalizing
    and checking outlines of proofs, say, from textbooks to detailed formalizations more
    rigorous than Principia [Mathematica], from technical papers to textbooks, or from
    abstracts to technical papers.
     The first notable interactive provers were the SAM (semi-automated mathematics)
series. In 1966, the fifth in the series, SAM V, was used to construct a proof of a hitherto
unproven conjecture in lattice theory (Bumcrot 1965). This was indubitably a success for
the semi-automated approach because the computer automatically proved a result now
called “SAM’s Lemma” and the mathematician recognized that it easily yielded a proof
of Bumcrot’s conjecture.
     Not long after the SAM project, the AUTOMATH (de Bruijn 1970; de Bruijn 1980)
and Mizar (Trybulec 1978; Trybulec and Blair 1985) proof checking systems appeared,
and each of them in its way has been profoundly influential. Although we will refer to
these systems as ‘interactive’, we use this merely as an antonym of ‘automatic’. In fact,
both AUTOMATH and Mizar were oriented around batch usage. However, the files that
they process consist of a proof, or a proof sketch, which they check the correctness of,
rather than a statement for which they attempt to find a proof automatically.
     Mizar has been used to proof-check a very large body of mathematics, spanning pure
set theory, algebra, analysis, topology, category theory and various unusual applications
like mathematical puzzles and computer models. The body of mathematics formally built
up in Mizar, known as the ‘Mizar Mathematical Library’ (MML), seems unrivalled in
any other theorem proving system. The ‘articles’ (proof texts) submitted to the MML
are automatically abstracted into human-readable form and published in the Journal of
Formalized Mathematics, which is devoted entirely to Mizar formalizations.2
The ideal proof checker should be programmable, i.e. users should be able to extend
the built-in automation as much as desired. There’s no particular difficulty in allow-
ing this. Provided the basic mechanisms of the theorem prover are straightforward and
well-documented and the source code is made available, there’s no reason why a user
shouldn’t extend or modify it. However, the difficulty comes if we want to restrict the
user to extensions that are logically sound — as presumably we might well wish to, since
unsoundness renders questionable the whole idea of machine-checking of supposedly
more fallible human proofs. Even fairly simple automated theorem proving programs are
often subtler than they appear, and the difficulty of integrating a large body of special
proof methods into a powerful interactive system without compromising soundness is
not trivial.
  2 Available   on the Web via http://www.mizar.org/JFM.
     One influential solution to this difficulty was introduced in the Edinburgh LCF
project led by Robin Milner (Gordon, Milner, and Wadsworth 1979). Although this was
for an obscure ‘logic of computable functions’ (hence the name LCF), the key idea, as
Gordon (Gordon 1982) emphasizes, is equally applicable to more orthodox logics sup-
porting conventional mathematics, and subsequently many programmable proof check-
ers were designed using the same principles, such as Coq,3 HOL (Gordon and Melham
1993), Isabelle (Paulson 1994) and Nuprl (Constable 1986).
     The key LCF idea is to use a special type (say thm) of proven theorems in the im-
plementation language, so that anything of type thm must by construction have been
proved rather than simply asserted. (In practice, the implementation language is usually
a version of ML, which was specially designed for this purpose in the LCF project.) This
is enforced by making thm an abstract type whose only constructors correspond to ap-
proved inference rules. But the user is given full access to the implementation language
and can put the primitive rules together in more complicated ways using arbitrary pro-
gramming. Because of the abstract type, any result of type thm, however it was arrived
at, must ultimately have been produced by correct application of the primitive rules. Yet
the means for arriving at it may be complex. Most obviously, we can use ML as a kind of
‘macro language’ to automate common patterns of inference. But much more sophisti-
cated derived rules can be written that, for example, prove formulas of Presburger arith-
metic while automatically decomposing to logical primitives. In many theorem-proving
tasks, more ‘ad hoc’ manipulation code can be replaced by code performing inference
without significant structural change. In other cases we can use an automated proof pro-
cedure, or even an external system like a computer algebra system (Harrison and Théry
1998), as an oracle to find a proof that is later checked inside the system. Thus, LCF gives
a combination of programmability and logical security that would probably be difficult
to assure by other means.
To explain the LCF idea in more concrete terms, we will show a complete LCF-style
kernel for first order logic with equality, implemented in Objective CAML.4 We start
by defining the syntax of first order logic, as found in any standard text like (Enderton
1972; Mendelson 1987). First we have an OCaml type of first order terms, where a term
is either a named variable or a named function applied to a list of arguments (constants
like 1 are regarded as nullary functions):
type term = Var of string | Fn of string * term list;;
    Before proceeding, we define some OCaml functions for useful syntax operations:
constructing an equation, checking whether a term occurs in another, and checking
whether a term occurs free in a formula:
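The original definitions are not reproduced here, but along the lines of (Harrison 2009) they can be sketched as follows. We repeat the term type for self-containedness and use a simple concrete formula type (in the book the formula type is polymorphic; this flattened version is our simplification):

```ocaml
type term = Var of string | Fn of string * term list;;

type formula =
    False
  | True
  | Atom of string * term list
  | Not of formula
  | And of formula * formula
  | Or of formula * formula
  | Imp of formula * formula
  | Iff of formula * formula
  | Forall of string * formula
  | Exists of string * formula;;

(* Construct an equation s = t as an atomic formula. *)
let mk_eq s t = Atom("=", [s; t]);;

(* Does term s occur anywhere inside term t (including t itself)? *)
let rec occurs_in s t =
  s = t ||
  (match t with
     Var _ -> false
   | Fn(_, args) -> List.exists (occurs_in s) args);;

(* Does term t occur free in formula fm? *)
let rec free_in t fm =
  match fm with
    False | True -> false
  | Atom(_, args) -> List.exists (occurs_in t) args
  | Not p -> free_in t p
  | And(p,q) | Or(p,q) | Imp(p,q) | Iff(p,q) -> free_in t p || free_in t q
  | Forall(x,p) | Exists(x,p) -> not (occurs_in (Var x) t) && free_in t p;;
```

Note that free_in correctly reports that x does not occur free in ∀x. P(x), since the quantifier binds it.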
     There are many complete proof systems for first order logic. We will adopt a Hilbert-
style proof system close to one first suggested by Tarski (Tarski 1965), and subsequently
presented in a textbook (Monk 1976). The idea is to avoid defining relatively tricky
syntactic operations like substitution. We first define the signature for the OCaml abstract
datatype of theorems:
and then the actual implementation of the primitive inference rules. For example,
modusponens is the traditional modus ponens inference rule allowing us to pass from
two theorems of the form ` p ⇒ q and ` p to another one ` q:

                                ` p ⇒ q    ` p
                                ─────────────── MODUSPONENS
                                      ` q
     In the usual LCF style, this becomes a function taking two arguments of type thm and
producing another. In fact, most of these inference rules have no theorems as input, and
can thus be considered as axiom schemes. For example, axiom addimp creates theorems
of the form ` p ⇒ (q ⇒ p) and axiom existseq creates those of the form ` ∃x. x = t,
provided x does not appear in the term t:
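To make the LCF discipline concrete, here is a miniature self-contained sketch (simplified connectives and rule names, not the book's exact code). The signature keeps the type thm abstract, so the only way to create a theorem is via the exported rules; a derived rule such as ` p ⇒ p is then just an ordinary OCaml function composing them:

```ocaml
(* Simplified syntax, for illustration only. *)
type term = Var of string | Fn of string * term list;;
type formula = False
             | Atom of string * term list
             | Imp of formula * formula;;

(* The kernel signature: thm is abstract to clients. *)
module type Proofsystem = sig
  type thm
  val modusponens : thm -> thm -> thm
  val axiom_addimp : formula -> formula -> thm
  val axiom_distribimp : formula -> formula -> formula -> thm
  val concl : thm -> formula
end;;

(* The trusted implementation: internally a thm is just its conclusion,
   but clients cannot exploit that since the type is opaque. *)
module Proven : Proofsystem = struct
  type thm = formula
  let modusponens pq p =
    match pq with
      Imp(p',q) when p = p' -> q
    | _ -> failwith "modusponens"
  let axiom_addimp p q = Imp(p,Imp(q,p))
  let axiom_distribimp p q r =
    Imp(Imp(p,Imp(q,r)),Imp(Imp(p,q),Imp(p,r)))
  let concl th = th
end;;

open Proven;;

(* A derived rule: |- p ==> p, via the usual 'SKK' combination of the
   two implication axioms and modus ponens. *)
let imp_refl p =
  let th1 = axiom_distribimp p (Imp(p,p)) p in
  let th2 = axiom_addimp p (Imp(p,p)) in
  let th3 = axiom_addimp p p in
  modusponens (modusponens th1 th2) th3;;
```

Note that imp_refl lives entirely outside the kernel: however it is coded, any thm it returns must have been produced by the primitive rules.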
    Before long, we can reach the stage of automatic derived rules that, for example,
prove propositional tautologies automatically, perform Knuth-Bendix completion, and
prove first order formulas by standard proof search and translation into primitive infer-
ences.
Proof style
One feature of the LCF style is that proofs (being programs) tend to be highly procedu-
ral, in contrast to the more declarative proofs supported by Mizar — for more on the con-
trast see (Harrison 1996c). This can have important disadvantages in terms of readability
and maintainability. In particular, it is difficult to understand the formal proof scripts in
isolation; they need to be run in the theorem prover to understand what the intermediate
states are. Nevertheless, as pointed out in (Harrison 1996b), it is possible to implement
more declarative styles of proof on top of LCF cores. For more recent experiments with
Mizar-like declarative proof styles see (Syme 1997; Wenzel 1999; Zammit 1999; Wiedijk
2001).
Theorem provers that have been used in real industrial applications include ACL2 (Kauf-
mann, Manolios, and Moore 2000), HOL Light (Gordon and Melham 1993; Harrison
1996a) and PVS (Owre, Rushby, and Shankar 1992). We noted earlier that formal verifi-
cation methods can be categorized according to their logical expressiveness and automa-
tion. The same kind of balance can be drawn within the general theorem proving section.
Although the logics of all these theorem provers are undecidable, it is still possible
to provide quite effective partial automation by using a more restricted logic. ACL2
follows this philosophy: it uses a quantifier-free logic analogous to PRA (Primitive Re-
cursive Arithmetic) (Goodstein 1957). HOL and PVS use richer logics with higher-order
quantification; PVS’s type system is particularly expressive. Nevertheless they attempt to
provide some useful automation, and HOL in particular uses the LCF approach to ensure
soundness and programmability. This will be emphasized in the application considered
below.
7. HOL Light
In the early 80s Mike Gordon (one of the original LCF team, now working in Cam-
bridge), as well as supervising the further development of LCF, was interested in the for-
mal verification of hardware. For this purpose, classical higher order logic seemed a nat-
ural vehicle, since it allows a rather direct rendering of notions like signals as functions
from time to values. The case was first put by Hanna (Hanna and Daeche 1986) and,
after a brief experiment with an ad hoc formalism ‘LSM’ based on Milner’s Calculus of
Communicating Systems, Gordon (Gordon 1985) also became a strong advocate. Gor-
don modified Cambridge LCF to support classical higher order logic, and so HOL (for
Higher Order Logic) was born. The first major public release was HOL88 (Gordon and
Melham 1993), and it spawned a large family of later versions, of which we will concen-
trate on our own HOL Light (Harrison 1996a),5 which is designed to have a simple and
clean logical foundation:
                                 HOL88
                 ┌─────────────────┼─────────────────┐
               hol90          ProofPower        Isabelle/HOL
              ┌──┴──────┐
            hol98    HOL Light
              │
            HOL 4
    Following Church (Church 1940), the HOL logic is based on simply typed λ-
calculus, so all terms are either variables, constants, applications or abstractions; there is
no distinguished class of formulas, merely terms of boolean type. The main difference
from Church’s system is that parametric polymorphism is an object-level, rather than a
meta-level, notion. For example, a theorem about type (α)list can be instantiated and
used for specific instances like (int)list and ((bool)list)list. Thus, the types in
HOL are essentially like terms of first order logic:
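Concretely, the elided datatype of types is essentially as in the HOL Light sources (the example instantiations are our own illustration):

```ocaml
(* Types are either type variables or named type constructors applied
   to a list of argument types. *)
type hol_type = Tyvar of string
              | Tyapp of string * hol_type list;;

(* For example, (A)list and its instance (int)list: *)
let aty = Tyvar "A";;
let alist_ty = Tyapp("list",[aty]);;
let intlist_ty = Tyapp("list",[Tyapp("int",[])]);;
```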
    The only primitive type constructors for the logic itself are bool (booleans) and fun
(function space):
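In the kernel this amounts to no more than an association list of names and arities, along these lines:

```ocaml
(* Primitive type constructors with their arities: bool is nullary and
   fun (function space) is binary. Later entries are added by the rule
   of type definition. *)
let the_type_constants = ref ["bool",0; "fun",2];;
```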
     Later we add an infinite type ind (individuals). All other types are introduced by a
rule of type definition, to be in bijection with any nonempty subset of an existing type.
  5 See   http://www.cl.cam.ac.uk/users/jrh/hol-light/index.html.
   [diagram: the new type δ is put in bijection with a nonempty subset P of an
   existing type γ]
     HOL terms are those of simply-typed lambda calculus. In the abstract syntax, only
variables and constants are decorated with types.
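Concretely, the elided term datatype is essentially the following; the example at the end builds the term x = x by hand (our own illustration) to show how the pieces fit together:

```ocaml
(* Types, repeated so this fragment is self-contained. *)
type hol_type = Tyvar of string
              | Tyapp of string * hol_type list;;

(* Terms of simply typed lambda-calculus: only variables and constants
   carry type annotations; types of Comb and Abs are derived. *)
type term = Var of string * hol_type
          | Const of string * hol_type
          | Comb of term * term
          | Abs of term * term;;

(* The Boolean term x = x, built from the raw constructors. *)
let bool_ty = Tyapp("bool",[]);;
let x = Var("x",bool_ty);;
let eq_ty = Tyapp("fun",[bool_ty; Tyapp("fun",[bool_ty; bool_ty])]);;
let xeqx = Comb(Comb(Const("=",eq_ty),x),x);;
```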
     The usual notation for these categories is v : ty, c : ty, f x and λx. t. The abstract
type interface ensures that only well-typed terms can be constructed. This stock of terms
may appear spartan, but using defined constants and a layer of parser and prettyprinter
support, many standard syntactic conventions are broken down to λ-calculus. For exam-
ple, the universal quantifier, following Church, is simply a higher order function, but the
conventional notation ∀x. P[x] is supported, mapping down to ∀(λx. P[x]). Similarly there
is a constant LET, which is semantically the identity and is used only as a tag for the
prettyprinter, and following Landin (Landin 1966), the construct ‘let x = t in s’ is broken
down to ‘LET (λx. s) t’.
     The abstract type interface also ensures that constant terms can only be constructed
for defined constants. The only primitive constant for the logic itself is equality = with
polymorphic type α → α → bool.
let the_term_constants =
   ref ["=", mk_fun_ty aty (mk_fun_ty aty bool_ty)];;
     Later we add the Hilbert ε : (α → bool) → α yielding the Axiom of Choice. All
other constants are introduced using a rule of constant definition. Given a term t (closed,
and with some restrictions on type variables) and an unused constant name c, we can
define c and get the new theorem:
` c = t
     Both constant and type definitions give conservative extensions and so in particular
preserve logical consistency. Thus, HOL is doubly ascetic:
     • All proofs are done by primitive inferences.
     • All new types and constants are defined, not postulated.
    As noted, HOL has no separate syntactic notion of formula: we just use terms of
Boolean type. HOL’s theorems are single-conclusion sequents constructed from such
formulas:
let ASSUME tm =
  if type_of tm = bool_ty then Sequent([tm],tm)
  else failwith "ASSUME: not a proposition";;
is the rule of assumption. The complete set of primitive rules, in usual logical notation,
is as follows (some of these have side-conditions that are not shown here):
                              ─────────── REFL
                                ` t = t

                          Γ ` s = t    ∆ ` t = u
                          ────────────────────── TRANS
                               Γ ∪ ∆ ` s = u

                          Γ ` s = t    ∆ ` u = v
                          ────────────────────── MK_COMB
                            Γ ∪ ∆ ` s(u) = t(v)

                                Γ ` s = t
                          ────────────────────── ABS
                          Γ ` (λx. s) = (λx. t)

                           ──────────────── BETA
                           ` (λx. t) x = t

                              ────────── ASSUME
                               {p} ` p

                           Γ ` p = q    ∆ ` p
                           ────────────────── EQ_MP
                               Γ ∪ ∆ ` q

                               Γ ` p    ∆ ` q
                       ─────────────────────────────── DEDUCT_ANTISYM_RULE
                       (Γ − {q}) ∪ (∆ − {p}) ` p = q

                       Γ[x1 , . . . , xn ] ` p[x1 , . . . , xn ]
                       ───────────────────────────────────────── INST
                       Γ[t1 , . . . , tn ] ` p[t1 , . . . , tn ]

                       Γ[α1 , . . . , αn ] ` p[α1 , . . . , αn ]
                       ───────────────────────────────────────── INST_TYPE
                       Γ[γ1 , . . . , γn ] ` p[γ1 , . . . , γn ]
    We can create various simple derived rules in the usual LCF fashion, such as a one-
sided congruence rule:
let AP_TERM tm th =
  try MK_COMB(REFL tm,th)
  with Failure _ -> failwith "AP_TERM";;
and a symmetry rule to reverse equations (rator and rand return the operator and
operand of a function application f x):
let SYM th =
  let tm = concl th in
  let l,r = dest_eq tm in
  let lth = REFL l in
  EQ_MP (MK_COMB(AP_TERM (rator (rator tm)) th,lth)) lth;;
     This just uses the basic logic, but at this point we introduce the usual axioms for
classical logic and mathematics:
    • Choice in the form of the Hilbert ε and its characterizing axiom ` ∀x. P(x) ⇒
      P(εx. P(x))
    • Extensionality as an η-conversion axiom ` (λx. t x) = t
    • Infinity as a new type ind and an assertion that it’s Dedekind-infinite.
    Everything else is now purely definitional. The Law of the Excluded Middle is de-
duced from Choice (Diaconescu 1975) rather than being postulated separately. We then
proceed with the development of standard mathematical infrastructure:
    •   Basic arithmetic of the natural numbers
    •   Theory of wellfounded relations
    •   General recursive data types
    •   Lists
    •   Elementary ‘set’ theory
    •   Axiom of Choice variants like Zorn’s Lemma, wellordering principle etc.
    •   Construction of real and complex numbers
    •   Real analysis
Among the higher-level derived rules are:
     •   Simplifier for (conditional, contextual) rewriting.
     •   Tactic mechanism for mixed forward and backward proofs.
     •   Tautology checker.
     •   Automated theorem provers for pure logic, based on tableaux and model
         elimination.
     •   Linear arithmetic decision procedures over R, Z and N.
     •   Differentiator for real functions.
     •   Generic normalizers for rings and fields.
     •   General quantifier elimination over C.
     •   Gröbner basis algorithm over fields.
     HOL Light is perhaps the purest example of the LCF methodology that is actually
useful, in that the logical core is minimal and almost all mathematical concepts are de-
fined or constructed rather than merely postulated. But thanks to the LCF methodology
and the speed of modern computers, we can use it to tackle non-trivial mathematics and
quite difficult applications.
     The first sustained attempt to actually formalize a body of mathematics (concepts
and proofs) was Principia Mathematica (Whitehead and Russell 1910). This success-
fully derived a body of fundamental mathematics from a small logical system. However,
the task of doing so was extraordinarily painstaking, and indeed Russell (Russell 1968)
remarked that his own intellect ‘never quite recovered from the strain of writing it’. But
now in the computer age, we can defer much of this work to a computer program like
HOL Light.
8. Floating-point verification
In the present section we describe some work applying HOL Light to some problems
in industrial floating-point verification, namely correctness of square root algorithms for
the Intel Itanium architecture.
The centrepiece of the Intel Itanium floating-point architecture is the fma (floating-
point multiply-add or fused multiply-accumulate) family of instructions. Given three
floating-point numbers x, y and z, these can compute x · y ± z as an atomic operation, with
the final result rounded as usual according to the IEEE Standard 754 for Binary Floating-
Point Arithmetic (IEEE 1985), but without intermediate rounding of the product x · y.
Of course, one can always obtain the usual addition and multiplication operations as the
special cases x · 1 + y and x · y + 0.
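The absence of the intermediate rounding is observable even in ordinary IEEE double precision (p = 53). The following OCaml fragment is our own illustration, not from the source; OCaml's Float.fma computes x · y + z with a single rounding, and the values are chosen so that rounding the product first destroys the interesting bits:

```ocaml
(* With eps = 2^-30, both b = 1 + eps and c = -(1 - eps) are exact doubles,
   and the exact product is b * c = -(1 - 2^-60). Rounding that product to
   double precision gives exactly -1.0, so the unfused sum collapses to 0. *)
let eps = 2.0 ** (-30.0);;
let b = 1.0 +. eps and c = -. (1.0 -. eps);;
let naive = b *. c +. 1.0;;       (* product rounded first: result 0.0 *)

(* An fma adds 1 to the exact product and rounds only once. *)
let fused = Float.fma b c 1.0;;   (* result 2^-60 exactly *)
```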
     The fma has many applications in typical floating-point codes, where it can often
improve accuracy and/or performance. In particular (Markstein 1990) correctly rounded
quotients and square roots can be computed by fairly short sequences of fmas, obviating
the need for dedicated instructions. Besides enabling compilers and assembly language
programmers to make special optimizations, deferring these operations to software often
yields much higher throughput than with typical hardware implementations. Moreover,
the floating-point unit becomes simpler and easier to optimize because minimal hardware
need be dedicated to these relatively infrequent operations, and scheduling does not have
to cope with their exceptionally high latency.
     Itanium architecture compilers for high-level languages will typically translate
division or square root operations into appropriate sequences of machine instructions.
Which sequence is used depends (i) on the required precision and (ii) whether one wishes
to minimize latency or maximize throughput. For concreteness, we will focus on a partic-
ular algorithm for calculating square roots in double-extended precision (64-bit precision
and 15-bit exponent field):
                      1.  y0 = frsqrta(a)
                      2.  H0 = (1/2) y0          S0 = a y0
                      3.  d0 = 1/2 − S0 H0
                      4.  H1 = H0 + d0 H0        S1 = S0 + d0 S0
                      5.  d1 = 1/2 − S1 H1
                      6.  H2 = H1 + d1 H1        S2 = S1 + d1 S1
                      7.  d2 = 1/2 − S2 H2       e2 = a − S2 S2
                      8.  H3 = H2 + d2 H2        S3 = S2 + e2 H2
                      9.  e3 = a − S3 S3
                      10. S = S3 + e3 H3
     All operations but the last are done using the register floating-point format with
rounding to nearest and with all exceptions disabled. (This format provides the same 64-
bit precision as the target format but has a greater exponent range, allowing us to avoid
intermediate overflow or underflow.) The final operation is done in double-extended pre-
cision using whatever rounding mode is currently selected by the user.
     This algorithm is a non-trivial example in two senses. Since it is designed for the
maximum precision supported in hardware (64 bits), greater precision cannot be ex-
ploited in intermediate calculations and so a very careful analysis is necessary to ensure
correct rounding. Moreover, it is hardly feasible to test such an algorithm exhaustively,
even if an accurate and fast reference were available, since there are about 2^80 possible
inputs. (By contrast, one could certainly verify single precision and conceivably verify
double precision by exhaustive or quasi-exhaustive methods.)
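As a sanity check of the data flow (though not, of course, of the correctness argument, which is precisely about the rounding subtleties this ignores), the sequence can be simulated in ordinary double precision. In this sketch of ours, frsqrta is replaced by a deliberately perturbed reciprocal square root mimicking its |ε| ≤ 2^−8.8 guarantee:

```ocaml
(* Double-precision simulation of the square root sequence. Each OCaml
   operation rounds its own result, unlike a true fma, so this only
   illustrates the shape of the computation and its rapid convergence. *)
let sqrt_seq a =
  let y0 = (1.0 /. sqrt a) *. (1.0 +. 2.0 ** (-9.0)) in (* crude frsqrta *)
  let h0 = 0.5 *. y0 and s0 = a *. y0 in
  let d0 = 0.5 -. s0 *. h0 in
  let h1 = h0 +. d0 *. h0 and s1 = s0 +. d0 *. s0 in
  let d1 = 0.5 -. s1 *. h1 in
  let h2 = h1 +. d1 *. h1 and s2 = s1 +. d1 *. s1 in
  let d2 = 0.5 -. s2 *. h2 and e2 = a -. s2 *. s2 in
  let h3 = h2 +. d2 *. h2 and s3 = s2 +. e2 *. h2 in
  let e3 = a -. s3 *. s3 in
  s3 +. e3 *. h3;;
```

Despite the initial relative error of about 2^−9, the result agrees with sqrt a to within a few units in the last place.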
Algorithm verification
It’s useful to divide the algorithm into three parts, and our discussion of the correctness
proof will follow this separation:
    1. Form6 an initial approximation y0 = (1/√a)(1 + ε) with |ε| ≤ 2^−8.8.
 2–8. Convert this to approximations H0 ≈ 1/(2√a) and S0 ≈ √a, then successively re-
      fine these to much better approximations H3 and S3 using Goldschmidt iteration
      (Goldschmidt 1964) (a Newton-Raphson variant).
9–10. Use these accurate approximations to produce the square root S correctly rounded
      according to the current rounding mode, setting IEEE flags or triggering excep-
      tions as appropriate.
Initial approximation
The frsqrta instruction makes a number of initial checks for special cases that are dealt
with separately, and if necessary normalizes the input number. It then uses a simple table
lookup to provide the approximation. The algorithm and table used are precisely spec-
ified in the Itanium instruction set architecture. The formal verification is essentially
some routine algebraic manipulations for exponent scaling, then a 256-way case split fol-
lowed by numerical calculation. The following HOL theorem concerns the correctness
of the core table lookup:
Refinement
Each fma operation will incur a rounding error, but we can easily find a mathematically
convenient (though by no means optimally sharp) bound for the relative error induced
by rounding. The key principle is the ‘1 + e’ property, which states that the rounded
result involves only a small relative perturbation to the exact result. In HOL the formal
statement is as follows:
   6 Using frsqrta, the only Itanium instruction specially intended to support square root. In the present
discussion we abstract somewhat from the actual machine instruction, and ignore exceptional cases like a = 0
where it takes special action.
      The bound on e depends on the precision of the floating-point format and the round-
ing mode; for round-to-nearest mode, mu rc is 1/2. The theorem has two side condi-
tions, one being a nontriviality hypothesis, and the other an assertion that the value x does
not lose precision. We will not show the formal definition (Harrison 1999) here, since
it is rather complicated. However, a simple and usually adequate sufficient condition is
that the exact result lies in the normal range (or is zero).
      Actually applying this theorem, and then bounding the various error terms, would be
quite tedious if done by hand. We have programmed some special derived rules in HOL
to help us. First, these automatically bound absolute magnitudes of quantities, essentially
by using the triangle inequality |x + y| ≤ |x| + |y|. This usually allows us to show that no
overflow occurs. However, to apply the 1 + e theorem, we also need to exclude underflow,
and so must establish minimum (nonzero) absolute magnitudes. This is also largely done
automatically by HOL, repeatedly using theorems for the minimum nonzero magnitude
that can result from an individual operation. For example, if 2^e ≤ |a|, then either a + b · c
is exactly zero or 2^(e−2p) ≤ |a + b · c|, where p is the precision of the floating-point
format containing a, b and c.
      It’s now quite easy, with a combination of automatic error bounding and some man-
ual algebraic rearrangement, to obtain quite good relative error bounds for the main com-
puted quantities. In fact, in the early iterations, the rounding errors incurred are insignif-
icant in comparison with the approximation errors in the Hi and Si . Thus, the relative
errors in these quantities are roughly in step. If we write

                  Hi ≈ (1/(2√a))(1 + εi)          Si ≈ √a(1 + εi)

then

           di ≈ 1/2 − Si Hi = 1/2 − (1/2)(1 + εi)² = −(εi + εi²/2)

and

    Si+1 ≈ Si + di Si = Si(1 + di) ≈ √a(1 + εi)(1 − εi − εi²/2) ≈ √a(1 − (3/2)εi²)
      Towards the end, the rounding errors in Si and Hi become more significantly decou-
pled, and for the penultimate iteration we use a slightly different refinement for S3 :

                 e2 ≈ a − S2 S2 = a − (√a(1 + ε2))² ≈ −2aε2

and so:

        S2 + e2 H2 ≈ √a(1 + ε2) − 2aε2 · (1/(2√a))(1 + ε2′) ≈ √a(1 − ε2 ε2′)
      Thus, S2 + e2 H2 will be quite an accurate square root approximation. In fact the HOL
proof yields S2 + e2 H2 = √a(1 + ε) with |ε| ≤ 5579/2^79 ≈ 2^−66.5.
      The above sketch elides what in the HOL proofs is a detailed bound on the rounding
error. However this only really becomes significant when S3 is rounded; this may in
itself contribute a relative error of order 2^−64, significantly more than the error before
rounding. Nevertheless it is important to note that if √a happens to be an exact floating-
point number (e.g. √1.5625 = 1.25), S3 will be that number. This is a consequence of
the fact that the error in S2 + e2 H2 is less than half the distance between surrounding
floating-point numbers.
Correct rounding
The final two steps of the algorithm simply repeat the previous iteration for S3 and the
basic error analysis is the same. The difficulty is in passing from a relative error before
rounding to correct rounding afterwards. Again we consider the final rounding separately,
so S is the result of rounding the exact value S∗ = S3 + e3 H3 . The error analysis indicates
that S∗ = √a(1 + ε) for some |ε| ≤ 25219/2^140 ≈ 2^−125.37. The final result S will, by
the basic property of the fma operation, be the result of rounding S∗ in whatever the
chosen rounding mode may be. The desired result would be the result of rounding exactly
√a in the same way. How can we be sure these are the same?
      First we can dispose of some special cases. We noted earlier that if √a is already
exactly a floating-point number, then S3 will already be that number. In this case we will
have e3 = 0 and so S∗ = S3 . Whatever the rounding mode, rounding a number already in
the format concerned will give that number itself:
      This is a useful observation. We’ll never be in the tricky case where there are two
equally close floating-point numbers (resolved by the ‘round to even’ rule). So in round-
to-nearest, S∗ and √a could only round in different ways if there were a midpoint be-
tween them, for only then could the closest floating-point numbers to them differ. For
example, in the following diagram, where long ticks indicate floating-point numbers and
short ones represent midpoints, √a would round ‘down’ while S∗ would round ‘up’:8
   [number-line diagram: √a lies just below a midpoint and S∗ just above it, so
   the two values round to different floating-point numbers]
      Although analyzing this condition combinatorially would be complicated, there is a
much simpler sufficient condition. One can easily see that it would suffice to show that
for any midpoint m:

                               |√a − S∗| < |√a − m|

      In that case √a and S∗ couldn’t lie on opposite sides of m. Here is the formal theorem
in HOL:
   |- ¬(precision fmt = 0) ∧
      (∀m. m IN midpoints fmt ⇒ abs(x - y) < abs(x - m))
      ⇒ (round fmt Nearest x = round fmt Nearest y)
      One can arrive at an ‘exclusion zone’ theorem giving the minimum possible
|√a − m|. However, this can be quite small, about 2^−(2p+3) relative to √a, where p is the
precision. For example, in our context with p = 64, consider the square root of the next
floating-point number below 1, whose mantissa consists entirely of 1s. Its square root is
about 2^−131 from a midpoint:

                     √(1 − 2^−64) ≈ (1 − 2^−65) − 2^−131
      Therefore, our relative error in S∗ of about 2^−125.37 is far from adequate to justify
perfect rounding based on the simple ‘exclusion zone’ theorem, for which we need some-
thing of order 2^−131. However, our relative error bounds are far from sharp, and it seems
quite plausible that the algorithm does nevertheless work correctly. What can we do?
     One solution is to utilize more refined theorems (Markstein 2000), but this is com-
plicated and may still fail to justify several algorithms that are intuitively believed to
work correctly. An ingenious alternative developed by Cornea (Cornea-Hasegan 1998)
is to observe that there are relatively few cases like 0.111 · · · 1111 whose square roots
come close enough to render the exclusion zone theorem inapplicable, and these can be
isolated by fairly straightforward number-theoretic methods. We can therefore:
   8 Similarly, in the other rounding modes, misrounding could only occur if √a and S∗ are separated by a
floating-point number. However, as we have noted, one can deal with those cases more directly.
     • Isolate the special cases a1 , . . . , an that have square roots within the critical dis-
       tance of a midpoint.
     • Conclude from the simple exclusion zone theorem that the algorithm will give
       correct results except possibly for a1 , . . . , an .
      • Explicitly show that the algorithm is correct for the a1 , . . . , an (effectively by
        running it on those inputs).
     This two-part approach is perhaps a little unusual, but not unknown even in pure
mathematics.9 For example, consider “Bertrand’s Conjecture” (first proved by Cheby-
shev), stating that for any positive integer n there is a prime p with n ≤ p ≤ 2n. The
most popular proof (Erdös 1930) involves assuming n > 4000 for the main proof and
separately checking the assertion for n ≤ 4000.10
      By some straightforward mathematics described in (Cornea-Hasegan 1998) and for-
malized in HOL without difficulty, one can show that the difficult cases for square roots
have mantissas m, considered as p-bit integers, such that one of the following diophan-
tine equations has a solution k for some integer |d| ≤ D, where D is roughly the factor by
which the guaranteed relative error is excessive:

               2^(p+2) m = k² + d          2^(p+1) m = k² + d
      We consider the equations separately for each chosen |d| ≤ D. For example, we might
be interested in whether 2^(p+1) m = k² − 7 has a solution. If so, the possible value(s) of m
are added to the set of difficult cases. It’s quite easy to program HOL to enumerate all
the solutions of such diophantine equations, returning a disjunctive theorem of the form:

              ` (2^(p+1) m = k² + d) ⇒ (m = n1) ∨ . . . ∨ (m = ni)
      The procedure simply uses even-odd reasoning and recursion on the power of two
(effectively so-called ‘Hensel lifting’). For example, if 2^25 m = k² − 7 then k must be
odd, say k = 2k′ + 1, and so 2^24 m = 2k′² + 2k′ − 3; but the left-hand side is even and
the right-hand side is odd, so this equation has no solutions.
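The even/odd recursion is easy to prototype outside HOL too. Here is a small OCaml sketch of ours (a hypothetical helper, not HOL Light code) that enumerates all k modulo 2^n with k² ≡ c (mod 2^n):

```ocaml
(* Solutions of k^2 = c (mod 2^n), built by 'Hensel lifting': each
   solution mod 2^(n-1) can only lift to k or k + 2^(n-1) mod 2^n. *)
let rec sqrts_mod_2exp n c =
  if n = 0 then [0] else
  let m = 1 lsl n in
  let lift k = List.filter (fun k' -> (k' * k' - c) mod m = 0)
                           [k; k + m / 2] in
  List.sort_uniq compare
    (List.concat_map lift (sqrts_mod_2exp (n - 1) c));;
```

For instance sqrts_mod_2exp 3 7 is empty, reflecting the fact that k² ≡ 7 (mod 8) has no solutions, while sqrts_mod_2exp 6 17 yields the four residues 9, 23, 41 and 55.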
  10 It suffices to note that 3, 5, 7, 13, 23, 43, 83, 163, 317, 631, 1259, 2503 and 4001 are all prime and each
is less than twice its predecessor.
                                                                      √         √
orem asserting that for any double-extended number a,11 rounding a and a(1 + ε)
to double-extended precision using any of the four IEEE rounding modes will give the
same results provided |ε| < 31/2131 , with the possible exceptions of 22e m for:
                  m ∈ { 10074057467468575321, 10376293541461622781,
                        10376293541461622787, 11307741603771905196,
                        13812780109330227882, 14928119304823191698,
                        16640932189858196938, 18446744073709551611,
                        18446744073709551612, 18446744073709551613,
                        18446744073709551614, 18446744073709551615}
and of a = 2^(2e+1) m for:

                     m ∈ { 9223372036854775809, 9223372036854775811,
                           11168682418930654643}
      Note that while some of these numbers are obvious special cases like 2^64 − 1, the
“pattern” in others is only apparent from the kind of mathematical analysis we have un-
dertaken here. They aren’t likely to be exercised by random testing, or testing of plausible
special cases.12
     Checking formally that the algorithm works on the special cases can also be au-
tomated, by applying theorems on the uniqueness of rounding to the concrete numbers
computed. (For a formal proof, it is not sufficient to separately test the implemented al-
gorithm, since such a result has no formal status.) In order to avoid trying all possible
even or odd exponents for the various significands, we exploit some results on invariance
of the rounding and arithmetic involved in the algorithm under systematic scaling by 2^(2k),
doing a simple form of symbolic simulation by formal proof.
Flag settings
Correctness according to the IEEE Standard 754 not only requires the correctly rounded
result, but the correct setting of flags or triggering of exceptions for conditions like over-
flow, underflow and inexactness. Actually, almost all these properties follow directly
from the arguments leading to perfect rounding. For example, the mere fact that two real
numbers round equivalently in all rounding modes implies that one is exact iff the other
is:
   |- ¬(precision fmt = 0) ∧
      (∀rc. round fmt rc x = round fmt rc y)
      ⇒ ∀rc. (round fmt rc x = x) = (round fmt rc y = y)
    The correctness of other flag settings follows in the same sort of way, with underflow
only slightly more complicated (Harrison 1999).
  11 Note that there is more subtlety required when using such a result in a mixed-precision environment. For
example, to obtain a single-precision result for a double-precision input, an algorithm that suffices for single-
precision inputs may not be adequate even though the final precision is the same.
  12 On the other hand, we can well consider the mathematical analysis as a source of good test cases.
Conclusions and related work
What do we gain from developing these proofs formally in a theorem prover, compared
with a detailed hand proof? We see two main benefits: reliability and re-usability.
      Proofs of this nature, large parts of which involve intricate but routine error bounding
and the exhaustive solution of Diophantine equations, are very tedious and error-prone
to do by hand. In practice, one would do better to use some kind of machine assistance,
such as ad hoc programs to solve the Diophantine equations and check the special cases
so derived. Although this can be helpful, it can also create new dangers of incorrectly im-
plemented helper programs and transcription errors when passing results between ‘hand’
and ‘machine’ portions of the proof. By contrast, we perform all steps of the proof in a
painstakingly foundational system, and can be quite confident that no errors have been
introduced. The proof proceeds according to strict logical deduction, all the way from the
underlying pure mathematics up to the symbolic “execution” of the algorithm in special
cases.
      Although we have only discussed one particular example, many algorithms with a
similar format have been developed for use in systems based on the Itanium architec-
ture. One of the benefits of implementing division and square root in software is that dif-
ferent algorithms can be substituted depending on the detailed accuracy and performance
requirements of the application. Not only are different (faster) algorithms provided for
IEEE single and double precision operations, but algorithms often have two versions,
one optimized for minimum latency and one for maximal throughput. These algorithms
are all quite similar in structure and large parts of the correctness proofs use the same
ideas. By performing these proofs in a programmable theorem prover like HOL, we are
able to achieve high re-use of results, just tweaking a few details each time. Often, we
can produce a complete formal proof of a new algorithm in just a day. For an even more
rigidly stereotyped class of algorithms, one could quite practically implement a totally
automatic verification rule in HOL.
      Underlying these advantages are three essential theorem prover features: soundness,
programmability and executability. HOL scores highly on these points. It is implemented
in a highly foundational style and does not rely on the correctness of very complex code.
It is freely programmable, since it is embedded in a full programming language. In par-
ticular, one can program it to perform various kinds of computation and symbolic exe-
cution by proof. The main disadvantage is that proofs can sometimes take a long time to
run, precisely because they always decompose to low-level primitives. This applies with
particular force to some kinds of symbolic execution, where instead of simply accepting
an equivalence like 2^94 + 3 = 13 · 19 · 681943 · 7941336391 · 14807473717 based, say, on
the results of a multiprecision arithmetic package, a detailed formal proof is constructed
under the surface. To some extent, this sacrifice of efficiency is a conscious choice when
we decide to adopt a highly foundational system, but it might be worth weakening this
ideology at least to include concrete arithmetic as an efficient primitive operation.
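The LCF-style "computation by proof" just described can be sketched in miniature. The following OCaml fragment is purely illustrative (none of these names belong to HOL Light's actual API): theorems are an abstract type whose only constructors are the inference rules of a tiny kernel, and a derived rule proves additions of numerals by chaining those rules, so the result carries a genuine proof rather than being the output of trusted ML arithmetic.

```ocaml
(* Toy LCF-style kernel: equations over unary numerals.
   Illustrative only; not HOL Light's real interface. *)

type term = Z | S of term | Add of term * term

module Kernel : sig
  type thm                                     (* abstract: only rules build it *)
  val concl : thm -> term * term
  val refl : term -> thm                       (* |- t = t *)
  val trans : thm -> thm -> thm                (* |- s=t and |- t=u give |- s=u *)
  val add_zero : term -> thm                   (* |- Add(Z,n) = n *)
  val add_succ : term -> term -> thm           (* |- Add(S m,n) = S(Add(m,n)) *)
  val s_cong : thm -> thm                      (* |- s=t gives |- S s = S t *)
end = struct
  type thm = term * term
  let concl th = th
  let refl t = (t, t)
  let trans (s, t) (t', u) =
    if t = t' then (s, u) else failwith "trans: middle terms differ"
  let add_zero n = (Add (Z, n), n)
  let add_succ m n = (Add (S m, n), S (Add (m, n)))
  let s_cong (s, t) = (S s, S t)
end

open Kernel

(* Derived rule: prove |- Add(m,n) = k for numerals m, n by recursion,
   justifying every step via the kernel's axioms. *)
let rec add_conv m n =
  match m with
  | Z -> add_zero n
  | S m' -> trans (add_succ m' n) (s_cong (add_conv m' n))
  | _ -> failwith "add_conv: not a numeral"

(* Conversions between ML integers and numerals, for convenience. *)
let rec numeral k = if k = 0 then Z else S (numeral (k - 1))
let rec value = function Z -> 0 | S t -> 1 + value t | _ -> failwith "value"
```

Running `add_conv (numeral 2) (numeral 3)` yields a theorem whose conclusion equates `Add(S(S Z), S(S(S Z)))` with the numeral for 5, built entirely from `add_zero`, `add_succ`, `s_cong` and `trans`. The efficiency cost mentioned above is visible even here: the proof takes work linear in the first argument, whereas trusting the ML `+` would be a single machine operation.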
      This is by no means the only work in this area. On the contrary, floating-point veri-
fication is regarded as one of the success stories of formal verification. For related work
on square root algorithms in commercial systems see (Russinoff 1998; O’Leary, Zhao,
Gerth, and Seger 1999; Sawada and Gamboa 2002). For similar verifications of division
algorithms and transcendental functions by the present author, see (Harrison 2000b; Har-
rison 2000a), while (Muller 2006) and (Markstein 2000) give a detailed discussion of
some relevant floating-point algorithms. When verifying transcendental functions, one
needs a fair amount of pure mathematics formalized just to get started. So, in a familiar
intellectual pattern, it is entirely possible that Mizar-like formalizations of mathematics,
undertaken purely for their own sake, might in the end prove useful in practical
applications.
References
Aagaard, M. and Leeser, M. (1994) Verifying a logic synthesis tool in Nuprl: A case study
  in software verification. In v. Bochmann, G. and Probst, D. K. (eds.), Computer Aided
  Verification: Proceedings of the Fourth International Workshop, CAV’92, Volume 663
  of Lecture Notes in Computer Science, Montreal, Canada, pp. 69–81. Springer Verlag.
Abdulla, P. A., Bjesse, P., and Eén, N. (2000) Symbolic reachability analysis based on
  SAT-solvers. In Graf, S. and Schwartzbach, M. (eds.), Tools and Algorithms for the
  Construction and Analysis of Systems (TACAS’00), Volume 1785 of Lecture Notes in
  Computer Science. Springer-Verlag.
Aigner, M. and Ziegler, G. M. (2001) Proofs from The Book (2nd ed.). Springer-Verlag.
Akers, S. B. (1978) Binary decision diagrams. IEEE Transactions on Computers, C-27,
  509–516.
Alur, R. and Peled, D. A. (eds.) (2004) Computer Aided Verification, 16th International
  Conference, CAV 2004, Volume 3114 of Lecture Notes in Computer Science, Boston,
  MA. Springer-Verlag.
Andrews, P. B. (1976) Theorem proving by matings. IEEE Transactions on Computers,
  25, 801–807.
Appel, K. and Haken, W. (1976) Every planar map is four colorable. Bulletin of the
  American Mathematical Society, 82, 711–712.
Armando, A., Castellini, C., and Giunchiglia, E. (1999) SAT-based procedures for tem-
  poral reasoning. In Proceedings of the 5th European conference on Planning, Lecture
  Notes in Computer Science, pp. 97–108. Springer-Verlag.
Ball, T., Cook, B., Lahiri, S. K., and Rajamani, S. K. (2004) Zapato: Automatic theorem
  proving for predicate abstraction refinement. See Alur and Peled (2004), pp. 457–461.
Bertot, Y., Dowek, G., Hirschowitz, A., Paulin, C., and Théry, L. (eds.) (1999) Theorem
  Proving in Higher Order Logics: 12th International Conference, TPHOLs’99, Volume
  1690 of Lecture Notes in Computer Science, Nice, France. Springer-Verlag.
Bibel, W. and Schreiber, J. (1975) Proof search in a Gentzen-like system of first order
  logic. In Gelenbe, E. and Potier, D. (eds.), Proceedings of the International Computing
  Symposium, pp. 205–212. North-Holland.
Biere, A., Cimatti, A., Clarke, E. M., and Zhu, Y. (1999) Symbolic model checking with-
  out BDDs. In Proceedings of the 5th International Conference on Tools and Algo-
  rithms for Construction and Analysis of Systems, Volume 1579 of Lecture Notes in
  Computer Science, pp. 193–207. Springer-Verlag.
Bjesse, P. (1999) Symbolic model checking with sets of states represented as formulas.
  Technical Report SC-1999-100, Department of Computer Science, Chalmers Univer-
  sity of Technology.
Bochnak, J., Coste, M., and Roy, M.-F. (1998) Real Algebraic Geometry, Volume 36 of
  Ergebnisse der Mathematik und ihrer Grenzgebiete. Springer-Verlag.
Bradley, A. R. and Manna, Z. (2007) The Calculus of Computation: Decision Procedures
  with Applications to Verification. Springer-Verlag.
Bryant, R. E. (1985) Symbolic verification of MOS circuits. In Fuchs, H. (ed.), Proceed-
  ings of the 1985 Chapel Hill Conference on VLSI, pp. 419–438. Computer Science
  Press.
Bryant, R. E. (1986) Graph-based algorithms for Boolean function manipulation. IEEE
  Transactions on Computers, C-35, 677–691.
Bryant, R. E. (1991a) A method for hardware verification based on logic simulation.
  Journal of the ACM, 38, 299–328.
Bryant, R. E. (1991b) On the complexity of VLSI implementations and graph represen-
  tations of Boolean functions with application to integer multiplication. IEEE Trans-
  actions on Computers, C-40, 205–213.
Buchberger, B. (1965) Ein Algorithmus zum Auffinden der Basiselemente des Restk-
  lassenringes nach einem nulldimensionalen Polynomideal. Ph. D. thesis, Mathema-
  tisches Institut der Universität Innsbruck. English translation in Journal of Symbolic
  Computation vol. 41 (2006), pp. 475–511.
Bumcrot, R. (1965) On lattice complements. Proceedings of the Glasgow Mathematical
  Association, 7, 22–23.
Burch, J. R., Clarke, E. M., McMillan, K. L., Dill, D. L., and Hwang, L. J. (1992) Sym-
  bolic model checking: 10^20 states and beyond. Information and Computation, 98,
  142–170.
Carter, W. C., Joyner, W. H., and Brand, D. (1979) Symbolic simulation for correct ma-
  chine design. In Proceedings of the 16th ACM/IEEE Design Automation Conference,
  pp. 280–286. IEEE Computer Society Press.
Caviness, B. F. and Johnson, J. R. (eds.) (1998) Quantifier Elimination and Cylindrical
  Algebraic Decomposition, Texts and monographs in symbolic computation. Springer-
  Verlag.
Chou, C.-T. and Peled, D. (1999) Formal verification of a partial-order reduction tech-
  nique for model checking. Journal of Automated Reasoning, 23, 265–298.
Church, A. (1936) An unsolvable problem of elementary number-theory. American Jour-
  nal of Mathematics, 58, 345–363.
Church, A. (1940) A formulation of the Simple Theory of Types. Journal of Symbolic
  Logic, 5, 56–68.
Clarke, E. M. and Emerson, E. A. (1981) Design and synthesis of synchronization skele-
  tons using branching-time temporal logic. In Kozen, D. (ed.), Logics of Programs,
  Volume 131 of Lecture Notes in Computer Science, Yorktown Heights, pp. 52–71.
  Springer-Verlag.
Clarke, E. M., Grumberg, O., and Peled, D. (1999) Model Checking. MIT Press.
Collins, G. E. (1976) Quantifier elimination for real closed fields by cylindrical algebraic
  decomposition. In Brakhage, H. (ed.), Second GI Conference on Automata Theory and
  Formal Languages, Volume 33 of Lecture Notes in Computer Science, Kaiserslautern,
  pp. 134–183. Springer-Verlag.
Constable, R. (1986) Implementing Mathematics with The Nuprl Proof Development Sys-
  tem. Prentice-Hall.
Cook, S. A. (1971) The complexity of theorem-proving procedures. In Proceedings of
  the 3rd ACM Symposium on the Theory of Computing, pp. 151–158. ACM.
Cooper, D. C. (1972) Theorem proving in arithmetic without multiplication. In Melzer,
   B. and Michie, D. (eds.), Machine Intelligence 7, pp. 91–99. Elsevier.
Cornea-Hasegan, M. (1998) Proving the IEEE correctness of iterative floating-point
   square root, divide and remainder algorithms. Intel Technology Journal, 1998-Q2,
   1–11. Available on the Web as http://developer.intel.com/technology/itj/
   q21998/articles/art_3.htm.
Coudert, O., Berthet, C., and Madre, J.-C. (1989) Verification of synchronous sequential
   machines based on symbolic execution. In Sifakis, J. (ed.), Automatic Verification
   Methods for Finite State Systems, Volume 407 of Lecture Notes in Computer Science,
   pp. 365–373. Springer-Verlag.
Cox, D., Little, J., and O’Shea, D. (1992) Ideals, Varieties, and Algorithms. Springer-
   Verlag.
Davis, M. (1983) The prehistory and early history of automated deduction. See Siekmann
   and Wrightson (1983), pp. 1–28.
Davis, M., Logemann, G., and Loveland, D. (1962) A machine program for theorem
   proving. Communications of the ACM, 5, 394–397.
Davis, M. and Putnam, H. (1960) A computing procedure for quantification theory. Jour-
   nal of the ACM, 7, 201–215.
de Bruijn, N. G. (1970) The mathematical language AUTOMATH, its usage and some of
   its extensions. In Laudet, M., Lacombe, D., Nolin, L., and Schützenberger, M. (eds.),
   Symposium on Automatic Demonstration, Volume 125 of Lecture Notes in Mathemat-
   ics, pp. 29–61. Springer-Verlag.
de Bruijn, N. G. (1980) A survey of the project AUTOMATH. In Seldin, J. P. and
   Hindley, J. R. (eds.), To H. B. Curry: Essays in Combinatory Logic, Lambda Calculus,
   and Formalism, pp. 589–606. Academic Press.
DeMillo, R., Lipton, R., and Perlis, A. (1979) Social processes and proofs of theorems
   and programs. Communications of the ACM, 22, 271–280.
Diaconescu, R. (1975) Axiom of choice and complementation. Proceedings of the Amer-
   ican Mathematical Society, 51, 176–178.
Dijkstra, E. W. (1976) A Discipline of Programming. Prentice-Hall.
Downey, P. J., Sethi, R., and Tarjan, R. (1980) Variations on the common subexpression
   problem. Journal of the ACM, 27, 758–771.
Eén, N. and Sörensson, N. (2003) An extensible SAT-solver. In Giunchiglia, E. and Tac-
   chella, A. (eds.), Theory and Applications of Satisfiability Testing: 6th International
   Conference SAT 2003, Volume 2919 of Lecture Notes in Computer Science, pp. 502–
   518. Springer-Verlag.
Emerson, E. A. and Halpern, J. Y. (1986) “Sometimes” and “not never” revisited: on
  branching versus linear time temporal logic. Journal of the ACM, 33, 151–178.
Enderton, H. B. (1972) A Mathematical Introduction to Logic. Academic Press.
Erdös, P. (1930) Beweis eines Satzes von Tschebyshev. Acta Scientiarum Mathemati-
   carum (Szeged), 5, 194–198.
Fetzer, J. H. (1988) Program verification: The very idea. Communications of the ACM,
   31, 1048–1063.
Gårding, L. (1997) Some Points of Analysis and Their History, Volume 11 of University
   Lecture Series. American Mathematical Society / Higher Education Press.
Gilmore, P. C. (1960) A proof method for quantification theory: Its justification and
  realization. IBM Journal of Research and Development, 4, 28–35.
Gödel, K. (1930) Die Vollständigkeit der Axiome des logischen Funktionenkalküls.
  Monatshefte für Mathematik und Physik, 37, 349–360. English translation ‘The com-
  pleteness of the axioms of the functional calculus of logic’ in van Heijenoort (1967),
  pp. 582–591.
Goldberg, E. and Novikov, Y. (2002) BerkMin: a fast and robust Sat-solver. In Kloos,
  C. D. and Franca, J. D. (eds.), Design, Automation and Test in Europe Conference and
  Exhibition (DATE 2002), Paris, France, pp. 142–149. IEEE Computer Society Press.
Goldschmidt, R. E. (1964) Applications of division by convergence. Master’s thesis,
  Dept. of Electrical Engineering, MIT, Cambridge, Mass.
Goodstein, R. L. (1957) Recursive Number Theory. Studies in Logic and the Foundations
  of Mathematics. North-Holland.
Gordon, M. (1985) Why higher-order logic is a good formalism for specifying and veri-
  fying hardware. Technical Report 77, University of Cambridge Computer Laboratory,
  New Museums Site, Pembroke Street, Cambridge, CB2 3QG, UK.
Gordon, M. J. C. (1982) Representing a logic in the LCF metalanguage. In Néel, D.
  (ed.), Tools and notions for program construction: an advanced course, pp. 163–185.
  Cambridge University Press.
Gordon, M. J. C. and Melham, T. F. (1993) Introduction to HOL: a theorem proving
  environment for higher order logic. Cambridge University Press.
Gordon, M. J. C., Milner, R., and Wadsworth, C. P. (1979) Edinburgh LCF: A Mech-
  anised Logic of Computation, Volume 78 of Lecture Notes in Computer Science.
  Springer-Verlag.
Groote, J. F. (2000) The propositional formula checker Heerhugo. Journal of Automated
  Reasoning, 24, 101–125.
Hanna, F. K. and Daeche, N. (1986) Specification and verification using higher-order
  logic: A case study. In Milne, G. and Subrahmanyam, P. A. (eds.), Formal Aspects of
  VLSI Design: Proceedings of the 1985 Edinburgh Workshop on VLSI, pp. 179–213.
Harrison, J. (1996a) HOL Light: A tutorial introduction. In Srivas, M. and Camilleri,
  A. (eds.), Proceedings of the First International Conference on Formal Methods in
  Computer-Aided Design (FMCAD’96), Volume 1166 of Lecture Notes in Computer
  Science, pp. 265–269. Springer-Verlag.
Harrison, J. (1996b) A Mizar mode for HOL. In Wright, J. v., Grundy, J., and Harrison,
  J. (eds.), Theorem Proving in Higher Order Logics: 9th International Conference,
  TPHOLs’96, Volume 1125 of Lecture Notes in Computer Science, Turku, Finland, pp.
  203–220. Springer-Verlag.
Harrison, J. (1996c) Proof style. In Giménez, E. and Paulin-Mohring, C. (eds.), Types for
  Proofs and Programs: International Workshop TYPES’96, Volume 1512 of Lecture
  Notes in Computer Science, Aussois, France, pp. 154–172. Springer-Verlag.
Harrison, J. (1999) A machine-checked theory of floating point arithmetic. See Bertot,
  Dowek, Hirschowitz, Paulin, and Théry (1999), pp. 113–130.
Harrison, J. (2000a) Formal verification of floating point trigonometric functions. In
  Hunt, W. A. and Johnson, S. D. (eds.), Formal Methods in Computer-Aided Design:
  Third International Conference FMCAD 2000, Volume 1954 of Lecture Notes in Com-
  puter Science, pp. 217–233. Springer-Verlag.
Harrison, J. (2000b) Formal verification of IA-64 division algorithms. In Aagaard, M.
  and Harrison, J. (eds.), Theorem Proving in Higher Order Logics: 13th International
  Conference, TPHOLs 2000, Volume 1869 of Lecture Notes in Computer Science, pp.
  234–251. Springer-Verlag.
Harrison, J. (2009) Handbook of Practical Logic and Automated Reasoning. Cambridge
  University Press.
Harrison, J. and Théry, L. (1998) A sceptic’s approach to combining HOL and Maple.
  Journal of Automated Reasoning, 21, 279–294.
Harvey, W. and Stuckey, P. (1997) A unit two variable per inequality integer constraint
  solver for constraint logic programming. Australian Computer Science Communica-
  tions, 19, 102–111.
Hörmander, L. (1983) The Analysis of Linear Partial Differential Operators II, Volume
  257 of Grundlehren der mathematischen Wissenschaften. Springer-Verlag.
Huth, M. and Ryan, M. (1999) Logic in Computer Science: Modelling and reasoning
  about systems. Cambridge University Press.
IEEE (1985) Standard for binary floating point arithmetic. ANSI/IEEE Standard 754-
  1985, The Institute of Electrical and Electronic Engineers, Inc., 345 East 47th Street,
  New York, NY 10017, USA.
Iwashita, H., Nakata, T., and Hirose, F. (1996) CTL model checking based on forward
  state traversal. In Proceedings of the IEEE/ACM conference on Computer Aided De-
  sign (ICCAD ’96), pp. 82–87. Association for Computing Machinery.
Joyce, J. J. and Seger, C. (1993) The HOL-Voss system: Model-checking inside a
  general-purpose theorem-prover. In Joyce, J. J. and Seger, C. (eds.), Proceedings of
  the 1993 International Workshop on the HOL theorem proving system and its applica-
  tions, Volume 780 of Lecture Notes in Computer Science, UBC, Vancouver, Canada,
  pp. 185–198. Springer-Verlag.
Kaufmann, M., Manolios, P., and Moore, J. S. (2000) Computer-Aided Reasoning: An
  Approach. Kluwer.
Knaster, B. (1927) Un théorème sur les fonctions d’ensembles. Annales de la Société
  Polonaise de Mathématique, 6, 133–134. Volume published in 1928.
Kowalski, R. (1975) A proof procedure using connection graphs. Journal of the ACM,
  22, 572–595.
Kozen, D. (1983) Results on the propositional µ-calculus. Theoretical Computer Science,
  27, 333–354.
Kreisel, G. and Krivine, J.-L. (1971) Elements of mathematical logic: model the-
  ory (Revised second ed.). Studies in Logic and the Foundations of Mathematics.
  North-Holland. First edition 1967. Translation of the French ‘Eléments de logique
  mathématique, théorie des modeles’ published by Dunod, Paris in 1964.
Kroening, D. and Strichman, O. (2008) Decision Procedures: An Algorithmic Point of
  View. Springer-Verlag.
Kropf, T. (1999) Introduction to Formal Hardware Verification. Springer-Verlag.
Lahiri, S. K. and Bryant, R. E. (2004) Indexed predicate discovery for unbounded system
  verification. See Alur and Peled (2004), pp. 135–147.
Landin, P. J. (1966) The next 700 programming languages. Communications of the ACM,
  9, 157–166.
Lee, C. Y. (1959) Representation of switching circuits by binary-decision programs. Bell
  System Technical Journal, 38, 985–999.
Lifschitz, V. (1986) Mechanical Theorem Proving in the USSR: the Leningrad School.
  Monograph Series on Soviet Union. Delphic Associates, 7700 Leesburg Pike, #250,
  Falls Church, VA 22043. See also ‘What is the inverse method?’ in the Journal of
  Automated Reasoning, vol. 5, pp. 1–23, 1989.
Łojasiewicz, S. (1964) Triangulations of semi-analytic sets. Annali della Scuola Normale
  Superiore di Pisa, ser. 3, 18, 449–474.
Loveland, D. W. (1968) Mechanical theorem-proving by model elimination. Journal of
  the ACM, 15, 236–251.
Loveland, D. W. (1978) Automated theorem proving: a logical basis. North-Holland.
MacKenzie, D. (2001) Mechanizing Proof: Computing, Risk and Trust. MIT Press.
Markstein, P. (2000) IA-64 and Elementary Functions: Speed and Precision. Prentice-
  Hall.
Markstein, P. W. (1990) Computation of elementary functions on the IBM RISC Sys-
  tem/6000 processor. IBM Journal of Research and Development, 34, 111–119.
Maslov, S. J. (1964) An inverse method of establishing deducibility in classical predicate
  calculus. Doklady Akademii Nauk, 159, 17–20.
McMillan, K. L. (2003) Interpolation and SAT-based model checking. In Hunt, W. A.
  and Somenzi, F. (eds.), Computer Aided Verification, 15th International Conference,
  CAV 2003, Volume 2725 of Lecture Notes in Computer Science, Boulder, CO, pp.
  1–13. Springer-Verlag.
Melham, T. and Darbari, A. (2002) Symbolic trajectory evaluation in a nutshell. Unpub-
  lished, available from the authors.
Mendelson, E. (1987) Introduction to Mathematical Logic (Third ed.). Mathematics
  series. Wadsworth and Brooks Cole.
Monk, J. D. (1976) Mathematical logic, Volume 37 of Graduate Texts in Mathematics.
  Springer-Verlag.
Moskewicz, M. W., Madigan, C. F., Zhao, Y., Zhang, L., and Malik, S. (2001) Chaff:
  Engineering an efficient SAT solver. In Proceedings of the 38th Design Automation
  Conference (DAC 2001), pp. 530–535. ACM Press.
Muller, J.-M. (2006) Elementary functions: Algorithms and Implementation (2nd ed.).
  Birkhäuser.
Nelson, G. and Oppen, D. (1979) Simplification by cooperating decision procedures.
  ACM Transactions on Programming Languages and Systems, 1, 245–257.
Nelson, G. and Oppen, D. (1980) Fast decision procedures based on congruence closure.
  Journal of the ACM, 27, 356–364.
O’Leary, J., Zhao, X., Gerth, R., and Seger, C.-J. H. (1999) Formally verifying IEEE
  compliance of floating-point hardware. Intel Technology Journal, 1999-Q1, 1–14.
  Available on the Web as http://download.intel.com/technology/itj/q11999/
  pdf/floating_point.pdf.
Owre, S., Rushby, J. M., and Shankar, N. (1992) PVS: A prototype verification system. In
  Kapur, D. (ed.), 11th International Conference on Automated Deduction, Volume 607
  of Lecture Notes in Computer Science, Saratoga, NY, pp. 748–752. Springer-Verlag.
Paulson, L. C. (1994) Isabelle: a generic theorem prover, Volume 828 of Lecture Notes
  in Computer Science. Springer-Verlag. With contributions by Tobias Nipkow.
Peled, D. A. (2001) Software Reliability Methods. Springer-Verlag.
Pixley, C. (1990) A computational theory and implementation of sequential hardware
  equivalence. In Proceedings of the DIMACS workshop on Computer Aided Verifica-
  tion, pp. 293–320. DIMACS (technical report 90-31).
Pnueli, A. (1977) The temporal logic of programs. In Proceedings of the 18th IEEE
  Symposium on Foundations of Computer Science, pp. 46–67.
Prawitz, D., Prawitz, H., and Voghera, N. (1960) A mechanical proof procedure and its
  realization in an electronic computer. Journal of the ACM, 7, 102–128.
Presburger, M. (1930) Über die Vollständigkeit eines gewissen Systems der Arithmetik
  ganzer Zahlen, in welchem die Addition als einzige Operation hervortritt. In Spra-
  wozdanie z I Kongresu metematyków slowiańskich, Warszawa 1929, pp. 92–101, 395.
  Warsaw. Annotated English version by Stansifer (1984).
Queille, J. P. and Sifakis, J. (1982) Specification and verification of concurrent programs
  in CESAR. In Proceedings of the 5th International Symposium on Programming,
  Volume 137 of Lecture Notes in Computer Science, pp. 195–220. Springer-Verlag.
Rabin, M. O. (1991) Decidable theories. In Barwise, J. and Keisler, H. (eds.), Hand-
  book of mathematical logic, Volume 90 of Studies in Logic and the Foundations of
  Mathematics, pp. 595–629. North-Holland.
Rajan, S., Shankar, N., and Srivas, M. K. (1995) An integration of model-checking with
  automated proof-checking. In Wolper, P. (ed.), Computer-Aided Verification: CAV
  ’95, Volume 939 of Lecture Notes in Computer Science, Liege, Belgium, pp. 84–97.
  Springer-Verlag.
Rueß, H. and Shankar, N. (2001) Deconstructing Shostak. In Proceedings of the Six-
  teenth Annual IEEE Symposium on Logic in Computer Science, pp. 19–28. IEEE Com-
  puter Society Press.
Robinson, A. (1957) Proving a theorem (as done by man, logician, or machine). In
  Summaries of Talks Presented at the Summer Institute for Symbolic Logic. Second
  edition published by the Institute for Defense Analysis, 1960. Reprinted in Siekmann
  and Wrightson (1983), pp. 74–76.
Robinson, J. A. (1965) A machine-oriented logic based on the resolution principle. Jour-
  nal of the ACM, 12, 23–41.
Russell, B. (1968) The autobiography of Bertrand Russell. Allen & Unwin.
Russinoff, D. (1998) A mechanically checked proof of IEEE compliance of a register-
  transfer-level specification of the AMD-K7 floating-point multiplication, division, and
  square root instructions. LMS Journal of Computation and Mathematics, 1, 148–
  200. Available on the Web at http://www.russinoff.com/papers/k7-div-sqrt.
  html.
Sawada, J. and Gamboa, R. (2002) Mechanical verification of a square root algorithm
  using Taylor’s theorem. In Aagaard, M. and O’Leary, J. (eds.), Formal Methods
  in Computer-Aided Design: Fourth International Conference FMCAD 2002, Volume
  2517 of Lecture Notes in Computer Science, pp. 274–291. Springer-Verlag.
Scarpellini, B. (1969) On the metamathematics of rings and integral domains. Transac-
  tions of the American Mathematical Society, 138, 71–96.
Seger, C. and Joyce, J. J. (1991) A two-level formal verification methodology using HOL
  and COSMOS. Technical Report 91-10, Department of Computer Science, University
  of British Columbia, 2366 Main Mall, University of British Columbia, Vancouver,
  B.C, Canada V6T 1Z4.
Seger, C.-J. H. and Bryant, R. E. (1995) Formal verification by symbolic evaluation of
   partially-ordered trajectories. Formal Methods in System Design, 6, 147–189.
Sheeran, M. and Stålmarck, G. (2000) A tutorial on Stålmarck’s proof procedure for
   propositional logic. Formal Methods in System Design, 16, 23–58.
Shostak, R. (1978) An algorithm for reasoning about equality. Communications of the
   ACM, 21, 356–364.
Shostak, R. (1984) Deciding combinations of theories. Journal of the ACM, 31, 1–12.
Siekmann, J. and Wrightson, G. (eds.) (1983) Automation of Reasoning — Classical
   Papers on Computational Logic, Vol. I (1957-1966). Springer-Verlag.
Simmons, H. (1970) The solution of a decision problem for several classes of rings.
   Pacific Journal of Mathematics, 34, 547–557.
Skolem, T. (1922) Einige Bemerkungen zur axiomatischen Begründung der Mengen-
   lehre. In Matematikerkongress i Helsingfors den 4–7 Juli 1922, Den femte skan-
   dinaviska matematikerkongressen, Redogörelse. Akademiska Bokhandeln, Helsinki.
   English translation “Some remarks on axiomatized set theory” in van Heijenoort
   (1967), pp. 290–301.
Smoryński, C. (1980) Logical Number Theory I: An Introduction. Springer-Verlag.
Stålmarck, G. (1994) System for determining propositional logic theorems by applying
   values and rules to triplets that are generated from Boolean formula. United States
   Patent number 5,276,897; see also Swedish Patent 467 076.
Stålmarck, G. and Säflund, M. (1990) Modeling and verifying systems and software in
   propositional logic. In Daniels, B. K. (ed.), Safety of Computer Control Systems, 1990
   (SAFECOMP ’90), Gatwick, UK, pp. 31–36. Pergamon Press.
Stansifer, R. (1984) Presburger’s article on integer arithmetic: Remarks and translation.
   Technical Report CORNELLCS:TR84-639, Cornell University Computer Science De-
   partment.
Syme, D. (1997) DECLARE: A prototype declarative proof system for higher order
   logic. Technical Report 416, University of Cambridge Computer Laboratory, New
   Museums Site, Pembroke Street, Cambridge, CB2 3QG, UK.
Tarski, A. (1951) A Decision Method for Elementary Algebra and Geometry. University
   of California Press. Previous version published as a technical report by the RAND
   Corporation, 1948; prepared for publication by J. C. C. McKinsey. Reprinted in Cavi-
   ness and Johnson (1998), pp. 24–84.
Tarski, A. (1955) A lattice-theoretical fixpoint theorem and its applications. Pacific Jour-
   nal of Mathematics, 5, 285–309.
Tarski, A. (1965) A simplified formalization of predicate logic with identity. Archiv für
   mathematische Logik und Grundlagenforschung, 7, 61–79.
Trybulec, A. (1978) The Mizar-QC/6000 logic information language. ALLC Bulletin
   (Association for Literary and Linguistic Computing), 6, 136–140.
Trybulec, A. and Blair, H. A. (1985) Computer aided reasoning. In Parikh, R. (ed.),
   Logics of Programs, Volume 193 of Lecture Notes in Computer Science, Brooklyn,
   pp. 406–412. Springer-Verlag.
Turing, A. M. (1936) On computable numbers, with an application to the Entschei-
   dungsproblem. Proceedings of the London Mathematical Society (2), 42, 230–265.
van Heijenoort, J. (ed.) (1967) From Frege to Gödel: A Source Book in Mathematical
   Logic 1879–1931. Harvard University Press.
Wang, H. (1960) Toward mechanical mathematics. IBM Journal of Research and Devel-
  opment, 4, 2–22.
Wenzel, M. (1999) Isar - a generic interpretive approach to readable formal proof docu-
  ments. See Bertot, Dowek, Hirschowitz, Paulin, and Théry (1999), pp. 167–183.
Whitehead, A. N. and Russell, B. (1910) Principia Mathematica (3 vols). Cambridge
  University Press.
Wiedijk, F. (2001) Mizar light for HOL Light. In Boulton, R. J. and Jackson, P. B. (eds.),
  14th International Conference on Theorem Proving in Higher Order Logics: TPHOLs
  2001, Volume 2152 of Lecture Notes in Computer Science, pp. 378–394. Springer-
  Verlag.
Yang, J. (2000) A theory for generalized symbolic trajectory evaluation. In Proceedings
  of the 2000 Symposium on Symbolic Trajectory Evaluation, Chicago. Available via
  http://www.intel.com/research/scl/stesympsite.htm.
Zammit, V. (1999) On the implementation of an extensible declarative proof language.
  See Bertot, Dowek, Hirschowitz, Paulin, and Théry (1999), pp. 185–202.