
The Calculus of Computation
Decision Procedures with Applications to Verification

Aaron R. Bradley · Zohar Manna

With 60 Figures
Authors

Aaron R. Bradley
Zohar Manna
Gates Building, Room 481
Stanford University
Stanford, CA 94305
USA
arbrad@cs.stanford.edu
manna@cs.stanford.edu

Library of Congress Control Number: 2007932679


ACM Computing Classification (1998): B.8, D.1, D.2, E.1, F.1, F.3, F.4, G.2, I.1, I.2

ISBN 978-3-540-74112-1 Springer Berlin Heidelberg New York

This work is subject to copyright. All rights are reserved, whether the whole or part of the material
is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broad-
casting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of
this publication or parts thereof is permitted only under the provisions of the German Copyright Law
of September 9, 1965, in its current version, and permission for use must always be obtained from
Springer. Violations are liable for prosecution under the German Copyright Law.
Springer is a part of Springer Science+Business Media
springer.com
© Springer-Verlag Berlin Heidelberg 2007

The use of general descriptive names, registered names, trademarks, etc. in this publication does not
imply, even in the absence of a specific statement, that such names are exempt from the relevant pro-
tective laws and regulations and therefore free for general use.
Typesetting by the authors
Production: LE-TEX Jelonek, Schmidt & Vöckler GbR, Leipzig
Cover design: KünkelLopka Werbeagentur, Heidelberg
Printed on acid-free paper 45/3180/YL - 5 4 3 2 1 0
To my wife,

Sarah

A.R.B.

To my grandchildren,

Itai
Maya
Ori

Z.M.
Preface

Logic is the calculus of computation. Forty-five years ago, John McCarthy
predicted in A Basis for a Mathematical Theory of Computation that “the
relationship between computation and mathematical logic will be as fruitful in
the next century as that between analysis and physics in the last”. The field of
computational logic emerged over the past few decades in partial fulfillment
of that vision. Focusing on producing efficient and powerful algorithms for
deciding the satisfiability of formulae in logical theories and fragments, it
continues to push the frontiers of general computer science.
This book is about computational logic and its applications to program
verification. Program verification is the task of analyzing the correctness of a
program. It encompasses the formal specification of what a program should do
and the formal proof that the program meets this specification. The reasoning
power that computational logic offers revolutionized the field of verification.
Ongoing research will make verification standard practice in software and
hardware engineering in the next few decades. This acceptance into everyday
engineering cannot come too soon: software and hardware are becoming ever
more ubiquitous and thus ever more the source of failure.
We wrote this book with an undergraduate and beginning graduate audi-
ence in mind. However, any computer scientist or engineer who would like to
enter the field of computational logic or apply its products should find this
book useful.

Content

The book has two parts. Part I, Foundations, presents first-order logic, induc-
tion, and program verification. The methods are general. For example, Chap-
ter 2 presents a complete proof system for first-order logic, while Chapter 5
describes a relatively complete verification methodology. Part II, Algorithmic
Reasoning, focuses on specialized algorithms for reasoning about fragments of
first-order logic and for deducing facts about programs. Part II trades gener-
ality for decidability and efficiency.

The first three chapters of Part I introduce first-order logic. Chapters 1 and
2 begin our presentation with a review of propositional and predicate logic.
Much of the material will be familiar to the reader who previously studied
logic. However, Chapter 3 on first-order theories will be new to many readers.
It axiomatically defines the various first-order theories and fragments that we
study and apply throughout the rest of the book. Chapter 4 reviews induction,
introducing some forms of induction that may be new to the reader. Induction
provides the mathematical basis for analyzing program correctness.
Chapter 5 turns to the primary motivating application of computational
logic in this book, the task of verifying programs. It discusses specification, in
which the programmer formalizes in logic the (sometimes surprisingly vague)
understanding that he has about what functions should do; partial correctness,
which requires proving that a program or function meets a given specification
if it halts; and total correctness, which requires proving additionally that a pro-
gram or function always halts. The presentation uses the simple programming
language pi and is supported by the verifying compiler πVC (see The πVC
System, below, for more information on πVC). Chapter 6 suggests strategies
for applying the verification methodology.
Part II on Algorithmic Reasoning begins in Chapter 7 with quantifier-
elimination methods for limited integer and rational arithmetic. It describes
an algorithm for reducing a quantified formula in integer or rational arithmetic
to an equivalent formula without quantifiers.
Chapter 8 begins a sequence of chapters on decision procedures for
quantifier-free and other fragments of theories. These fragments of first-order
theories are interesting for three reasons. First, they are sometimes decidable
when the full theory is not (see Chapters 9, 10, and 11). Second, they are
sometimes efficiently decidable when the full theory is not (compare Chapters
7 and 8). Finally, they are often useful; for example, proving the verification
conditions that arise in the examples of Chapters 5 and 6 requires just the
fragments of theories studied in Chapters 8–11. The simplex method for linear
programming is presented in Chapter 8 as a decision procedure for deciding
satisfiability in rational and real arithmetic without multiplication.
Chapters 9 and 11 turn to decision procedures for non-arithmetical theo-
ries. Chapter 9 discusses the classic congruence closure algorithm for equality
with uninterpreted functions and extends it to reason about data structures
like lists, trees, and arrays. These decision procedures are for quantifier-free
fragments only. Chapter 11 presents decision procedures for larger fragments
of theories that formalize array-like data structures.
Decision procedures are most useful when they are combined. For example,
in program verification one must reason about arithmetic and data structures
simultaneously. Chapter 10 presents the Nelson-Oppen method for combining
decision procedures for quantifier-free fragments. The decision procedures of
Chapters 8, 9, and 11 are all combinable using the Nelson-Oppen method.
Chapter 12 presents a methodology for constructing invariant generation
procedures. These procedures reason inductively about programs to aid in

Fig. 0.1. The chapter dependency graph

verification. They relieve some of the burden on the programmer to provide
program annotations for verification purposes. For now, developing a static
analysis is one of the easiest ways of bringing formal methods into general
usage, as a typical static analysis requires little or no input from the pro-
grammer. The chapter presents a general methodology and two instances of
the method for deducing arithmetical properties of programs.
Finally, Chapter 13 suggests directions for further reading and research.

Teaching

This book can be used in various ways and taught at multiple levels. Figure
0.1 presents a dependency graph for the chapters. There are two main tracks:
the verification track, which focuses on Chapters 1–4, 5, 6, and 12; and the
decision procedures track, which focuses on Chapters 1–4 and 7–11. Within
the decision procedures track, the reader can focus on the quantifier-free de-
cision procedures track, which skips Chapters 7 and 11. The reader interested
in quickly obtaining an understanding of modern combination decision proce-
dures would prefer this final track.
We have annotated several sections with a ⋆ to indicate that they provide
additional depth that is unnecessary for understanding subsequent material.
Additionally, all proofs may be skipped without preventing a general under-
standing of the material.
Each chapter ends with a set of exercises. Some require just a mechanical
understanding of the material, while others require a conceptual understand-
ing or ask the reader to think beyond what is presented in the book. These
latter exercises are annotated with a ⋆ . For certain audiences, additional exer-
cises might include implementing decision procedures or invariant generation
procedures and exploring certain topics in greater depth (see Chapter 13).
In our courses, we assign program verification exercises from Chapters 5
and 6 throughout the term to give students time to develop this important
skill. Learning to verify programs is about as difficult for students as learning
to program in the first place. Specifying and verifying programs also strength-
ens the students’ facility with logic.

Bibliographic Remarks

Each chapter ends with a section entitled Bibliographic Remarks in which
we attempt to provide a brief account of the historical context and develop-
ment of the chapter’s material. We have undoubtedly missed some important
contributions, for which we apologize. We welcome corrections, comments,
and historical anecdotes.

The πVC System

We implemented a verifying compiler called πVC to accompany this text. It
allows users to write and verify annotated programs in the pi programming
language. The system and a set of examples, including the programs listed in
this book, are available for download from http://theory.stanford.edu/
∼arbrad/pivc. We plan to update this website regularly and welcome readers’

comments, questions, and suggestions about πVC and the text.

Acknowledgments

This material is based upon work supported by the National Science Foun-
dation under Grant Nos. CSR-0615449 and CNS-0411363 and by Navy/ONR
contract N00014-03-1-0939. Any opinions, findings, and conclusions or rec-
ommendations expressed in this material are those of the authors and do
not necessarily reflect the views of the National Science Foundation or the
Navy/ONR. The first author received additional support from a Sang Samuel
Wang Stanford Graduate Fellowship.
We thank the following people for their comments throughout the writ-
ing of this book: Miquel Bertran, Andrew Bradley, Susan Bradley, Chang-
Seo Park, Caryn Sedloff, Henny Sipma, Matteo Slanina, Sarah Solter, Fabio
Somenzi, Tomás Uribe, the students of CS156, and Alfred Hofmann and the
reviewers and editors at Springer. Their suggestions helped us to improve
the presentation substantially. Remaining errors and shortcomings are our
responsibility.

Stanford University, Aaron R. Bradley


June 2007 Zohar Manna
Contents

Part I Foundations

1 Propositional Logic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1 Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2 Semantics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3 Satisfiability and Validity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.3.1 Truth Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.3.2 Semantic Arguments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.4 Equivalence and Implication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.5 Substitution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.6 Normal Forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.7 Decision Procedures for Satisfiability . . . . . . . . . . . . . . . . . . . . . . . 21
1.7.1 Simple Decision Procedures . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.7.2 Reconsidering the Truth-Table Method . . . . . . . . . . . . . . . 22
1.7.3 Conversion to an Equisatisfiable Formula in CNF . . . . . . 24
1.7.4 The Resolution Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . 27
1.7.5 DPLL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
1.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
Bibliographic Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

2 First-Order Logic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.1 Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.2 Semantics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
2.3 Satisfiability and Validity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
2.4 Substitution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.4.1 Safe Substitution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.4.2 Schema Substitution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
2.5 Normal Forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
2.6 Decidability and Complexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
2.6.1 Satisfiability as a Formal Language . . . . . . . . . . . . . . . . . . 53

2.6.2 Decidability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
2.6.3 ⋆ Complexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
2.7 ⋆ Meta-Theorems of First-Order Logic . . . . . . . . . . . . . . . . . . . . . 56
2.7.1 Simplifying the Language of FOL . . . . . . . . . . . . . . . . . . . . 57
2.7.2 Semantic Argument Proof Rules . . . . . . . . . . . . . . . . . . . . . 58
2.7.3 Soundness and Completeness . . . . . . . . . . . . . . . . . . . . . . . 58
2.7.4 Additional Theorems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
2.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
Bibliographic Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

3 First-Order Theories . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
3.1 First-Order Theories . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
3.2 Equality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
3.3 Natural Numbers and Integers . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
3.3.1 Peano Arithmetic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
3.3.2 Presburger Arithmetic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
3.3.3 Theory of Integers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
3.4 Rationals and Reals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
3.4.1 Theory of Reals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
3.4.2 Theory of Rationals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
3.5 Recursive Data Structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
3.6 Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
3.7 ⋆ Survey of Decidability and Complexity . . . . . . . . . . . . . . . . . . . 90
3.8 Combination Theories . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
3.9 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
Bibliographic Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93

4 Induction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
4.1 Stepwise Induction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
4.2 Complete Induction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
4.3 Well-Founded Induction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
4.4 Structural Induction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
4.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
Bibliographic Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111

5 Program Correctness: Mechanics . . . . . . . . . . . . . . . . . . . . . . . . . . 113


5.1 pi: A Simple Imperative Language . . . . . . . . . . . . . . . . . . . . . . . . . 114
5.1.1 The Language . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
5.1.2 Program Annotations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
5.2 Partial Correctness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
5.2.1 Basic Paths: Loops . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
5.2.2 Basic Paths: Function Calls . . . . . . . . . . . . . . . . . . . . . . . . . 131

5.2.3 Program States . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135


5.2.4 Verification Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
5.2.5 P -Invariant and P -Inductive . . . . . . . . . . . . . . . . . . . . . . . . 142
5.3 Total Correctness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
5.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
Bibliographic Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151

6 Program Correctness: Strategies . . . . . . . . . . . . . . . . . . . . . . . . . . . 153


6.1 Developing Inductive Annotations . . . . . . . . . . . . . . . . . . . . . . . . . 153
6.1.1 Basic Facts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
6.1.2 The Precondition Method . . . . . . . . . . . . . . . . . . . . . . . . . . 156
6.1.3 A Strategy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
6.2 Extended Example: QuickSort . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
6.2.1 Partial Correctness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
6.2.2 Total Correctness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
6.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
Bibliographic Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173

Part II Algorithmic Reasoning

7 Quantified Linear Arithmetic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183


7.1 Quantifier Elimination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
7.1.1 Quantifier Elimination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
7.1.2 A Simplification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
7.2 Quantifier Elimination over Integers . . . . . . . . . . . . . . . . . . . . . . . 185
7.2.1 Augmented Theory of Integers . . . . . . . . . . . . . . . . . . . . . . 185
7.2.2 Cooper’s Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
7.2.3 A Symmetric Elimination . . . . . . . . . . . . . . . . . . . . . . . . . . 194
7.2.4 Eliminating Blocks of Quantifiers . . . . . . . . . . . . . . . . . . . . 195
7.2.5 ⋆ Solving Divides Constraints . . . . . . . . . . . . . . . . . . . . . . . 196
7.3 Quantifier Elimination over Rationals . . . . . . . . . . . . . . . . . . . . . . 200
7.3.1 Ferrante and Rackoff’s Method . . . . . . . . . . . . . . . . . . . . . . 200
7.4 ⋆ Complexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
7.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
Bibliographic Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205

8 Quantifier-Free Linear Arithmetic . . . . . . . . . . . . . . . . . . . . . . . . . 207


8.1 Decision Procedures for Quantifier-Free Fragments . . . . . . . . . . . 207
8.2 Preliminary Concepts and Notation . . . . . . . . . . . . . . . . . . . . . . . . 209
8.3 Linear Programs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
8.4 The Simplex Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218

8.4.1 From M to M0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219


8.4.2 Vertex Traversal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223
8.4.3 ⋆ Complexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
8.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
Bibliographic Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238

9 Quantifier-Free Equality and Data Structures . . . . . . . . . . . . . . 241


9.1 Theory of Equality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242
9.2 Congruence Closure Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . 244
9.2.1 Relations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 245
9.2.2 Congruence Closure Algorithm . . . . . . . . . . . . . . . . . . . . . . 247
9.3 Congruence Closure with DAGs . . . . . . . . . . . . . . . . . . . . . . . . . . . 251
9.3.1 Directed Acyclic Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . 251
9.3.2 Basic Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254
9.3.3 Congruence Closure Algorithm . . . . . . . . . . . . . . . . . . . . . . 255
9.3.4 Decision Procedure for TE -Satisfiability . . . . . . . . . . . . . . . 256
9.3.5 ⋆ Complexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 258
9.4 Recursive Data Structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259
9.5 Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263
9.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265
Bibliographic Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267

10 Combining Decision Procedures . . . . . . . . . . . . . . . . . . . . . . . . . . . 269


10.1 Combining Decision Procedures . . . . . . . . . . . . . . . . . . . . . . . . . . . 269
10.2 Nelson-Oppen Method: Nondeterministic Version . . . . . . . . . . . . 271
10.2.1 Phase 1: Variable Abstraction . . . . . . . . . . . . . . . . . . . . . . . 271
10.2.2 Phase 2: Guess and Check . . . . . . . . . . . . . . . . . . . . . . . . . . 273
10.2.3 Practical Efficiency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274
10.3 Nelson-Oppen Method: Deterministic Version . . . . . . . . . . . . . . . 276
10.3.1 Convex Theories . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 276
10.3.2 Phase 2: Equality Propagation . . . . . . . . . . . . . . . . . . . . . . 278
10.3.3 Equality Propagation: Implementation . . . . . . . . . . . . . . . 282
10.4 ⋆ Correctness of the Nelson-Oppen Method . . . . . . . . . . . . . . . . . 283
10.5 ⋆ Complexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287
10.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 288
Bibliographic Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 288
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 288

11 Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291
11.1 Arrays with Uninterpreted Indices . . . . . . . . . . . . . . . . . . . . . . . . . 292
11.1.1 Array Property Fragment . . . . . . . . . . . . . . . . . . . . . . . . . . 292
11.1.2 Decision Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294
11.2 Integer-Indexed Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 299

11.2.1 Array Property Fragment . . . . . . . . . . . . . . . . . . . . . . . . . . 300


11.2.2 Decision Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 301
11.3 Hashtables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 304
11.3.1 Hashtable Property Fragment . . . . . . . . . . . . . . . . . . . . . . . 305
11.3.2 Decision Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 306
11.4 Larger Fragments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 308
11.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 309
Bibliographic Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 310
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 310

12 Invariant Generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311


12.1 Invariant Generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311
12.1.1 Weakest Precondition and Strongest Postcondition . . . . 312
12.1.2 ⋆ General Definitions of wp and sp . . . . . . . . . . . . . . . . . . . 315
12.1.3 Static Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 316
12.1.4 Abstraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 319
12.2 Interval Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325
12.3 Karr’s Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333
12.4 ⋆ Standard Notation and Concepts . . . . . . . . . . . . . . . . . . . . . . . . . 341
12.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 344
Bibliographic Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 345
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 345

13 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 347

References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 351

Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 357
Part I

Foundations

Everything is vague to a degree you do not realize till you have tried
to make it precise.
— Bertrand Russell
Philosophy of Logical Atomism, 1918
Modern design and implementation of software and hardware systems lacks
precision. Design documents written in a natural language admit misinterpre-
tation. Informal arguments about why a system works miss crucial weaknesses.
The resulting systems are fragile. Part I of this book presents an alternative
approach to system design and implementation based on using a formal lan-
guage to specify and reason about software systems.
Chapters 1 and 2 introduce the (first-order) predicate calculus. Chapter 1
presents the propositional calculus, and Chapter 2 presents the full predicate
calculus. A central task is determining whether formulae of the calculus are
valid. Chapter 3 formalizes common data types of software in the predicate
calculus. It also introduces the concepts of decidability and complexity of
deciding validity of formulae.
The final three chapters of Part I discuss applications of the predicate cal-
culus. Chapter 4 formalizes mathematical induction in the predicate calculus,
in the process introducing several forms of induction that may be new to the
reader. Chapters 5 and 6 then apply the predicate calculus and mathemat-
ical induction to the specification and verification of software. Specification
consists of asserting facts about software. Verification applies mathematical
induction to prove that each assertion evaluates to true when program con-
trol reaches it; and to prove that program control eventually reaches specific
program locations.
Part I thus provides the mathematical foundations for precise engineering.
Part II will investigate algorithmic aspects of applying these foundations.
1
Propositional Logic

A deduction is speech in which, certain things having been supposed,
something different from the things supposed results of necessity be-
cause of their being so.
— Aristotle
Prior Analytics, 4th century BC
A calculus is a set of symbols and a system of rules for manipulating the
symbols. In an interesting calculus, the symbols and rules have meaning in
some domain that matters. For example, the differential calculus defines rules
for manipulating the integral symbol over a polynomial to compute the area
under the curve that the polynomial defines. Area has meaning outside of the
calculus; the calculus provides the tool for computing such quantities. The
domain of the differential calculus, loosely speaking, consists of real numbers
and functions over those numbers.
Computer scientists are interested in a different domain and thus require
a different calculus. The behavior of programs, or computation, is a computer
scientist’s chief concern. What is an appropriate domain for studying com-
putation? The basic entity of the domain is state: roughly, the assignment of
values (for example, Booleans, integers, or addresses) to variables. Pairs of
states comprise transitions. A computation is a sequence of states, each ad-
jacent pair of which is a transition. A program defines the form of its states,
the set of transitions between states, and the set of computations that it can
produce. A program’s set of computations characterizes the program itself as
precisely as its source code. Chapter 5 studies these ideas in depth.
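
To make these notions concrete, the following toy sketch (ours, written in Python rather than the book's pi language; the variable names are arbitrary) pictures a state as a map from variables to values and records the computation of a small counting loop as a list of such states.

def counting_computation():
    """Record the computation (sequence of states) of a tiny counting loop."""
    state = {'i': 0, 'n': 3}                 # a state assigns values to variables
    computation = [dict(state)]              # the initial state
    while state['i'] < state['n']:
        state['i'] += 1                      # one transition: a pair of adjacent states
        computation.append(dict(state))
    return computation

# [{'i': 0, 'n': 3}, {'i': 1, 'n': 3}, {'i': 2, 'n': 3}, {'i': 3, 'n': 3}]
print(counting_computation())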
With a domain in mind, a computer scientist can now ask questions. Does
this program that accepts an array of integers produce a sorted array? In
other words, does each of the program’s computations have a state in which a
sorted array is returned? Does this program ever access unallocated memory?
Does this function always halt? To answer such questions, we need a calculus
to reason about computations.

This chapter and the next introduce the calculus that will be the basis for
studying computation in this book. In this chapter, we cover propositional
logic (PL); in the next chapter, we build on the presentation to define first-
order logic (FOL). PL and FOL are also known as propositional calculus
and predicate calculus, respectively, because they are calculi for reasoning
about propositions (“the sky is blue”, “this comment references itself”) and
predicates (“x is blue”, “y references z”), respectively. Propositions are either
true or false, while predicates evaluate to true or false depending on the values
given to their parameters (x, y, and z).
Just as differential calculus has a set of symbols, a set of rules, and a
mapping to reality that provides its meaning, propositional logic has its own
symbols, rules of inference, and meaning. Sections 1.1 and 1.2 introduce the
syntax and semantics (meaning) of PL formulae. Then Section 1.3 discusses
two concepts that are fundamental throughout this book, satisfiability (Is
this formula ever true?) and validity (Is this formula always true?), and the
rules for computing whether a PL formula is satisfiable or valid. Rules for
manipulating PL formulae, some of which preserve satisfiability and validity,
are discussed in Section 1.5 and applied in Section 1.6.

1.1 Syntax

In this section, we introduce the syntax of PL. The syntax of a logical lan-
guage consists of a set of symbols and rules for combining them to form
“sentences” (in this case, formulae) of the language.
The basic elements of PL are the truth symbols ⊤ (“true”) and ⊥
(“false”) and the propositional variables, usually denoted by P , Q, R,
P1 , P2 , . . .. A countably infinite set of propositional variable symbols exists.
Logical connectives, also called Boolean connectives, provide the expres-
sive power of PL. A formula is simply ⊤, ⊥, or a propositional variable P ; or
the application of one of the following connectives to formulae F , F1 , or F2 :
• ¬F : negation, pronounced “not”;
• F1 ∧ F2 : conjunction, pronounced “and”;
• F1 ∨ F2 : disjunction, pronounced “or”;
• F1 → F2 : implication, pronounced “implies”;
• F1 ↔ F2 : iff, pronounced “if and only if”.
Each connective has an arity (the number of arguments that it takes): nega-
tion is unary (it takes one argument), while the other connectives are binary
(they take two arguments). The left and right arguments of → are called the
antecedent and consequent, respectively.
Some common terminology is useful. An atom is a truth symbol ⊤, ⊥ or
propositional variable P , Q, . . .. A literal is an atom α or its negation ¬α. A
formula is a literal or the application of a logical connective to a formula or
formulae.

Formula G is a subformula of formula F if it occurs syntactically within
F . More precisely,
• the only subformula of P is P ;
• the subformulae of ¬F are ¬F and the subformulae of F ;
• and the subformulae of F1 ∧F2 , F1 ∨F2 , F1 → F2 , F1 ↔ F2 are the formula
itself and the subformulae of F1 and F2 .
Notice that every formula is a subformula of itself. The strict subformulae
of a formula are all its subformulae except itself.

Example 1.1. Consider the formula

F : (P ∧ Q) → (P ∨ ¬Q) .

It contains two propositional variables, P and Q. Each instance of P and Q
is an atom and a literal. ¬Q is a literal, but not an atom. F has six distinct
subformulae:

F , P ∨ ¬Q , ¬Q , P ∧Q , P , Q.

Its strict subformulae are all of its subformulae except F itself. 

Parentheses are cumbersome. We define the relative precedence of the logi-
cal connectives from highest to lowest as follows: ¬, ∧, ∨, →, ↔. Additionally,
let → and ↔ associate to the right, so that P → Q → R is the same formula
as P → (Q → R).

Example 1.2. Abbreviate F of Example 1.1 as

F ′ : P ∧ Q → P ∨ ¬Q .

Also,

P1 ∧ ¬P2 ∧ ⊤ ∨ ¬P1 ∧ P2

stands for

(P1 ∧ ((¬P2 ) ∧ ⊤)) ∨ ((¬P1 ) ∧ P2 ) .

Finally,

P1 → P2 → P3

abbreviates

P1 → (P2 → P3 ) .



1.2 Semantics
So far, we have considered the syntax of PL. The semantics of a logic provides
its meaning. What exactly is meaning? In PL, meaning is given by the truth
values true and false, where true ≠ false. Our objective is to define how to
give meaning to formulae.
The first step in defining the semantics of PL is to provide a mechanism
for evaluating the propositional variables. An interpretation I assigns to
every propositional variable exactly one truth value. For example,

I : {P ↦ true, Q ↦ false, . . .}

is an interpretation assigning true to P and false to Q, where . . . elides the
(countably infinitely many) assignments that are not relevant to us. That is, I
assigns to every propositional variable available to us (and there are countably
infinitely many) a value. We usually do not write the elision. Clearly, many
interpretations exist.
Now given a PL formula F and an interpretation I, the truth value of F
can be computed. The simplest manner of computing the truth value of F is
via a truth table. Let us first examine truth tables that indicate how to eval-
uate each logical connective in terms of its arguments. First, a propositional
variable gets its truth value immediately from I. Now consider the possible
evaluations of F : it is either true or false. How is ¬F evaluated? The following
table summarizes the possibilities, where 0 corresponds to the value false, and
1 corresponds to true:

F ¬F
0 1
1 0

The other connectives can be defined similarly given values of F1 and F2 :

F1 F2 F1 ∧ F2 F1 ∨ F2 F1 → F2 F1 ↔ F2
0 0 0 0 1 1
0 1 0 1 1 0
1 0 0 1 0 0
1 1 1 1 1 1

In particular, F1 → F2 is false iff F1 is true and F2 is false. (Throughout the
book, we use the word “iff” to abbreviate the phrase “if and only if”; one can
also read it as “precisely when”.)
Example 1.3. Consider the formula

F : P ∧ Q → P ∨ ¬Q

and the interpretation



I : {P ↦ true, Q ↦ false} .

To evaluate the truth value of F under I, construct the following table:

P Q ¬Q P ∧ Q P ∨ ¬Q F
1 0 1 0 1 1

The top row is given by the subformulae of F . I provides values for the first
two columns; then the semantics of PL provide the values for the remainder
of the table. Hence, F evaluates to true under I. 

This tabular notation is convenient, but it is unsuitable for the predicate
logic of Chapter 2. Instead, we introduce an inductive definition of PL’s
semantics that will extend to Chapter 2. An inductive definition defines the
meaning of basic elements first, which in the case of PL are atoms. Then it
assumes that the meaning of a set of elements is fixed and defines a more
complex element in terms of these elements. For example, in PL, F1 ∧ F2 is a
more complex formula than either of the formulae F1 or F2 .
Recall that we want to compute whether F has value true under inter-
pretation I. We write I |= F if F evaluates to true under I and I ⊭ F if
F evaluates to false. To start our inductive definition, define the meaning of
truth symbols:

I |= ⊤
I ⊭ ⊥

Under any interpretation I, ⊤ has value true, and ⊥ has value false. Next,
define the truth value of propositional variables:

I |= P iff I[P ] = true

P has value true iff the interpretation I assigns P to have value true.
Since an interpretation assigns a truth value to every propositional vari-
able, I assigns false to P when I does not assign true to P . Thus, we can
instead define the truth values of propositional variables as follows:

I ⊭ P iff I[P ] = false

Since true ≠ false, both definitions yield the same (unique) truth values.
Having completed the base cases of our inductive definition, we turn to
the inductive step. Assume that formulae F , F1 , and F2 have truth values.
From these formulae, evaluate the semantics of more complex formulae:

I |= ¬F iff I ⊭ F
I |= F1 ∧ F2 iff I |= F1 and I |= F2
I |= F1 ∨ F2 iff I |= F1 or I |= F2
I |= F1 → F2 iff, if I |= F1 then I |= F2
I |= F1 ↔ F2 iff I |= F1 and I |= F2 , or I ⊭ F1 and I ⊭ F2

In studying these definitions, it is useful to recall the earlier definitions given
by the truth tables, which are free of English ambiguities.
For implication, consider also the equivalent formulation

I ⊭ F1 → F2 iff I |= F1 and I ⊭ F2

The formula F1 → F2 has truth value true under I when either F1 is false
or F2 is true. It is false only when F1 is true and F2 is false. Our inductive
definition of the semantics of PL is complete.

Example 1.4. Consider the formula

F : P ∧ Q → P ∨ ¬Q

and the interpretation

I : {P ↦ true, Q ↦ false} .

Compute the truth value of F as follows:

1. I |= P since I[P ] = true
2. I ⊭ Q since I[Q] = false
3. I |= ¬Q by 2 and semantics of ¬
4. I ⊭ P ∧ Q by 2 and semantics of ∧
5. I |= P ∨ ¬Q by 1 and semantics of ∨
6. I |= F by 4 and semantics of →

We considered the distinct subformulae of F according to the subformula
ordering: F1 precedes F2 if F1 is a subformula of F2 . In that order, we
computed the truth value of F from its simplest subformulae to its most
complex subformula (F itself).
The final line of the calculation deserves some explanation. According to
the semantics for implication,

I |= F1 → F2 iff, if I |= F1 then I |= F2

the implication F1 → F2 has value true when I ⊭ F1 . Thus, line 5 is unnec-
essary for establishing the truth value of F . 
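
The inductive definition translates directly into a recursive evaluator. The following sketch (ours, reusing the tuple representation from the sketch at the end of Section 1.1) computes whether I |= F when the interpretation I is given as a dictionary from variable names to truth values.

def evaluates_true(I, f):
    """Return True iff I |= f, following the inductive semantics above.
    I maps variable names to True/False; f is a formula tuple, e.g.
    ('implies', ('and', ('var', 'P'), ('var', 'Q')), ...)."""
    op = f[0]
    if op == 'top':     return True
    if op == 'bot':     return False
    if op == 'var':     return I[f[1]]
    if op == 'not':     return not evaluates_true(I, f[1])
    if op == 'and':     return evaluates_true(I, f[1]) and evaluates_true(I, f[2])
    if op == 'or':      return evaluates_true(I, f[1]) or evaluates_true(I, f[2])
    if op == 'implies': return (not evaluates_true(I, f[1])) or evaluates_true(I, f[2])
    if op == 'iff':     return evaluates_true(I, f[1]) == evaluates_true(I, f[2])
    raise ValueError('unknown connective: %r' % (op,))

# Example 1.4: F : P ∧ Q → P ∨ ¬Q under I : {P ↦ true, Q ↦ false}
F = ('implies', ('and', ('var', 'P'), ('var', 'Q')),
                ('or', ('var', 'P'), ('not', ('var', 'Q'))))
assert evaluates_true({'P': True, 'Q': False}, F)   # I |= F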

1.3 Satisfiability and Validity


We now consider a fundamental characterization of PL formulae.
A formula F is satisfiable iff there exists an interpretation I such that
I |= F . A formula F is valid iff for all interpretations I, I |= F . Determining
satisfiability and validity of formulae are important tasks in logic.
Satisfiability and validity are dual concepts, and switching from one to the
other is easy. F is valid iff ¬F is unsatisfiable. For suppose that F is valid;
then for any interpretation I, I |= F . By the semantics of negation, I ⊭ ¬F ,
so ¬F is unsatisfiable. Conversely, suppose that ¬F is unsatisfiable. For any
interpretation I, I ⊭ ¬F , so that I |= F by the semantics of negation. Thus,
F is valid.
Because of this duality between satisfiability and validity, we are free to
focus on either one or the other in the text, depending on which is more
convenient for the discussion. The reader should realize that statements about
one are also statements about the other.
In this section, we present several methods of determining validity and
satisfiability of PL formulae.

1.3.1 Truth Tables

Our first approach to checking the validity of a PL formula is the truth-table
method. We exhibit this method by example.

Example 1.5. Consider the formula

F : P ∧ Q → P ∨ ¬Q .

Is it valid? Construct a table in which the first row is a list of the subformulae
of F ordered according to the subformula ordering. Fill columns of proposi-
tional variables with all possible combinations of truth values. Then apply the
semantics of PL to fill the rest of the table:
P Q P ∧ Q ¬Q P ∨ ¬Q F
0 0 0 1 1 1
0 1 0 0 0 1
1 0 0 1 1 1
1 1 1 0 1 1

The final column, which represents the truth value of F under the possible
interpretations, is filled entirely with true. F is valid. 

Example 1.6. Consider the formula

F : P ∨Q → P ∧Q .

Construct the truth table:


P Q P ∨ Q P ∧ Q F
0 0 0 0 1
0 1 1 0 0
1 0 1 0 0
1 1 1 1 1

Because the second and third rows show that F can be false, F is invalid. 
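
The truth-table method amounts to enumerating all 2^n interpretations of the n variables occurring in a formula. Below is a sketch of such a checker (ours, reusing evaluates_true from the Section 1.2 sketch); the final assertion also illustrates the duality between validity and unsatisfiability noted at the start of this section.

from itertools import product

def variables(f):
    """The set of propositional variables occurring in formula f."""
    if f[0] == 'var':
        return {f[1]}
    vs = set()
    for arg in f[1:]:
        if isinstance(arg, tuple):
            vs |= variables(arg)
    return vs

def interpretations(f):
    """Yield every interpretation (dict) of the variables of f."""
    vs = sorted(variables(f))
    for values in product([False, True], repeat=len(vs)):
        yield dict(zip(vs, values))

def is_satisfiable(f):
    return any(evaluates_true(I, f) for I in interpretations(f))

def is_valid(f):
    return all(evaluates_true(I, f) for I in interpretations(f))

# Example 1.5 versus Example 1.6, plus the duality:
# F is valid iff ¬F is unsatisfiable.
P, Q = ('var', 'P'), ('var', 'Q')
F1 = ('implies', ('and', P, Q), ('or', P, ('not', Q)))   # valid
F2 = ('implies', ('or', P, Q), ('and', P, Q))            # invalid, but satisfiable
assert is_valid(F1) and not is_valid(F2) and is_satisfiable(F2)
assert is_valid(F1) == (not is_satisfiable(('not', F1)))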

1.3.2 Semantic Arguments

Our next approach to validity checking is the semantic argument method.


While more complicated than the truth-table method, we introduce it and
emphasize it throughout the remainder of the chapter because it is our only
method of evaluating the satisfiability and validity of formulae in Chapter 2.
A proof based on the semantic method begins by assuming that the given
formula F is invalid: hence, there is a falsifying interpretation I such that
I ⊭ F . The proof proceeds by applying the semantic definitions of the logical
connectives in the form of proof rules. A proof rule has one or more premises
(assumed facts) and one or more deductions (deduced facts). An application
of a proof rule requires matching the premises to facts already existing in the
semantic argument and then forming the deductions. The proof rules are the
following:
• According to the semantics of negation, from I |= ¬F , deduce I ⊭ F ; and
from I ⊭ ¬F , deduce I |= F :

    I |= ¬F             I ⊭ ¬F
    I ⊭ F               I |= F

• According to the semantics of conjunction, from I |= F ∧ G, deduce both
I |= F and I |= G; and from I ⊭ F ∧ G, deduce I ⊭ F or I ⊭ G. The
latter deduction results in a fork in the proof; each case must be considered
separately.

    I |= F ∧ G          I ⊭ F ∧ G
    I |= F              I ⊭ F | I ⊭ G
    I |= G

• According to the semantics of disjunction, from I |= F ∨ G, deduce I |= F
or I |= G; and from I ⊭ F ∨ G, deduce both I ⊭ F and I ⊭ G. The
former deduction requires a case analysis in the proof.

    I |= F ∨ G          I ⊭ F ∨ G
    I |= F | I |= G     I ⊭ F
                        I ⊭ G

• According to the semantics of implication, from I |= F → G, deduce
I ⊭ F or I |= G; and from I ⊭ F → G, deduce both I |= F and I ⊭ G.
The former deduction requires a case analysis in the proof.

    I |= F → G          I ⊭ F → G
    I ⊭ F | I |= G      I |= F
                        I ⊭ G

• According to the semantics of iff, from I |= F ↔ G, deduce I |= F ∧ G or
I ⊭ F ∨ G; and from I ⊭ F ↔ G, deduce I |= F ∧ ¬G or I |= ¬F ∧ G.
Both deductions require considering multiple cases.

    I |= F ↔ G                      I ⊭ F ↔ G
    I |= F ∧ G | I ⊭ F ∨ G          I |= F ∧ ¬G | I |= ¬F ∧ G

• Finally, a contradiction occurs when following the above proof rules results
in the claim that an interpretation I both satisfies a formula F and does
not satisfy F .

    I |= F
    I ⊭ F
    I |= ⊥
Before explaining proofs in more detail, let us see several examples.
Example 1.7. To prove that the formula
F : P ∧ Q → P ∨ ¬Q
is valid, assume that it is invalid and derive a contradiction. Thus, assume
that there is a falsifying interpretation I of F (such that I ⊭ F ). Then,
1. I ⊭ P ∧ Q → P ∨ ¬Q assumption
2. I |= P ∧ Q by 1 and semantics of →
3. I ⊭ P ∨ ¬Q by 1 and semantics of →
4. I |= P by 2 and semantics of ∧
5. I |= Q by 2 and semantics of ∧
6. I ⊭ P by 3 and semantics of ∨
7. I ⊭ ¬Q by 3 and semantics of ∨
8. I |= Q by 7 and semantics of ¬
Lines 4 and 6 contradict each other, so that our assumption must be wrong:
F is actually valid.
We can end the proof as soon as we have a contradiction. For example,
1. I ⊭ P ∧ Q → P ∨ ¬Q assumption
2. I |= P ∧ Q by 1 and semantics of →
3. I ⊭ P ∨ ¬Q by 1 and semantics of →
4. I |= P by 2 and semantics of ∧
5. I ⊭ P by 3 and semantics of ∨
This argument is sufficient because a contradiction already exists. In other
words, the discovered contradiction closes the one branch of the proof. We
sometimes note the contradiction explicitly in the proof:
6. I |= ⊥ 4 and 5 are contradictory


Example 1.8. To prove that the formula

F : (P → Q) ∧ (Q → R) → (P → R)

is valid, assume otherwise and derive a contradiction:

1. I ⊭ F assumption
2. I |= (P → Q) ∧ (Q → R) by 1 and semantics of →
3. I ⊭ P → R by 1 and semantics of →
4. I |= P by 3 and semantics of →
5. I ⊭ R by 3 and semantics of →
6. I |= P → Q by 2 and semantics of ∧
7. I |= Q → R by 2 and semantics of ∧

There are two cases to consider from 6. In the first case,

8a. I ⊭ P by 6 and semantics of →
9a. I |= ⊥ 4 and 8a are contradictory

In the second case,

8b. I |= Q by 6 and semantics of →

Now there are two more cases from 7. In the first case,

9ba. I ⊭ Q by 7 and semantics of →
10ba. I |= ⊥ 8b and 9ba are contradictory

In the second case,

9bb. I |= R by 7 and semantics of →
10bb. I |= ⊥ 5 and 9bb are contradictory

All three branches of the proof are closed: F is valid. 

We introduce vocabulary for discussing semantic proofs. The reader need
not memorize these terms now; just refer to them as they are used. A line
L : I |= F or L : I ⊭ F is a single statement in the proof, sometimes labeled
as in the examples. A line L is a direct descendant of a parent M if L is
directly below M in the proof. L is a descendant of M if M is L itself, if L is
a direct descendant of M , or if the parent of L is a descendant of M (in other
words, descendant is the reflexive and transitive closure of direct descendant).
M is an ancestor of L if L is a descendant of M . Several proof rules — the
second conjunction rule, the first disjunction rule, the first implication rule,
and both rules for iff — produce a fork in the argument, as the last example
shows. A proof thus evolves as a tree rather than linearly. A branch of the
tree is a sequence of lines descending from the root. A branch is closed if it
contains a contradiction, either explicitly as I |= ⊥ or implicitly as I |= G
and I ⊭ G for some formula G. Otherwise, the branch is open. A semantic
argument is finished when no more proof rules are applicable. It is a proof
of the validity of F if every branch is closed; otherwise, each open branch
describes a falsifying interpretation of F .
While the given proof rules are (theoretically) sufficient, derived proof
rules can make proofs more concise.
Example 1.9. The derived rule of modus ponens simplifies the proof of
Example 1.8. The rule is the following:
I |= F
I |= F → G
I |= G

In words, from I |= F and I |= F → G, deduce I |= G.


Using this rule, let us simplify the proof of the validity of

F : (P → Q) ∧ (Q → R) → (P → R) .

We assume that it is invalid and try to derive a contradiction.


1. I ⊭ F assumption
2. I |= (P → Q) ∧ (Q → R) by 1 and semantics of →
3. I ⊭ P → R by 1 and semantics of →
4. I |= P by 3 and semantics of →
5. I ⊭ R by 3 and semantics of →
6. I |= P → Q by 2 and semantics of ∧
7. I |= Q → R by 2 and semantics of ∧
8. I |= Q by 4, 6, and modus ponens
9. I |= R by 8, 7, and modus ponens
10. I |= ⊥ 5 and 9 are contradictory
This proof has only one branch. 
The truth-table and semantic methods can be used to check satisfiability.
For example, the truth table of Example 1.6 can be extended to show that

¬F : ¬(P ∨ Q → P ∧ Q)

is satisfiable:
P Q P ∨ Q P ∧ Q F ¬F
0 0 0 0 1 0
0 1 1 0 0 1
1 0 1 0 0 1
1 1 1 1 1 0

The second and third rows represent satisfying interpretations of ¬F . Addi-
tionally, the semantic argument in the following example shows that

G : ¬(P ∨ Q → P ∧ Q)

is satisfied by the discovered interpretation I, and thus that G is satisfiable.

Example 1.10. To prove that the formula

F : P ∨Q → P ∧Q

is valid, assume that F is invalid; then there is an interpretation I such that
I |= ¬F :

1. I ⊭ P ∨ Q → P ∧ Q assumption
2. I |= P ∨ Q by 1 and semantics of →
3. I ⊭ P ∧ Q by 1 and semantics of →

We have two choices to make. By 2 and the semantics of disjunction, either
P or Q must be true. By 3 and the semantics of conjunction, either P or Q
must be false. So there are two options: either P is true and Q is false, or P is
false and Q is true. We choose P to be true and Q to be false. Then,

4a. I |= P by 2 and semantics of ∨
5a. I ⊭ Q by 3 and semantics of ∧

The only subformulae of P and Q are themselves, so the table is complete.


Yet we did not derive a contradiction. In fact, we found the interpretation

I : {P ↦ true, Q ↦ false}

for which I |= ¬F . Therefore, F is actually invalid. The interpretation I :
{P ↦ true, Q ↦ false} is a falsifying interpretation.
If our choice had resulted in a contradiction, then we would have had to
try the other choice for P and Q, in which P is false and Q is true. In general,
we stop either when we have found an interpretation or when we have closed
every branch. 
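
The semantic argument method can itself be turned into a procedure: carry a set of judgments of the forms I |= G and I ⊭ G, apply the proof rules, and explore each fork. The sketch below is our own simplified rendering, not the book's implementation; it explores both sides of every fork and omits derived rules such as modus ponens, and it either closes every branch or returns the interpretation found on an open branch.

def _first(*branches):
    """Return the first branch that did not close (i.e., is not None)."""
    return next((b for b in branches if b is not None), None)

def falsify(goals, env):
    """Search for an interpretation meeting every judgment in goals.
    A judgment (sign, f) means I |= f when sign is True and I ⊭ f otherwise.
    env records the variable values already forced on this branch.
    Returns an interpretation (dict) if some branch stays open, else None."""
    if not goals:
        return env                                    # open branch
    (sign, f), rest = goals[0], goals[1:]
    op = f[0]
    if op == 'var':
        if f[1] in env and env[f[1]] != sign:
            return None                               # contradiction: branch closes
        return falsify(rest, {**env, f[1]: sign})
    if op == 'top':
        return falsify(rest, env) if sign else None
    if op == 'bot':
        return None if sign else falsify(rest, env)
    if op == 'not':
        return falsify([(not sign, f[1])] + rest, env)
    if op == 'and':
        if sign:                                      # I |= F ∧ G: both conjuncts hold
            return falsify([(True, f[1]), (True, f[2])] + rest, env)
        return _first(falsify([(False, f[1])] + rest, env),   # fork: I ⊭ F | I ⊭ G
                      falsify([(False, f[2])] + rest, env))
    if op == 'or':
        if sign:                                      # fork: I |= F | I |= G
            return _first(falsify([(True, f[1])] + rest, env),
                          falsify([(True, f[2])] + rest, env))
        return falsify([(False, f[1]), (False, f[2])] + rest, env)
    if op == 'implies':
        if sign:                                      # fork: I ⊭ F | I |= G
            return _first(falsify([(False, f[1])] + rest, env),
                          falsify([(True, f[2])] + rest, env))
        return falsify([(True, f[1]), (False, f[2])] + rest, env)
    if op == 'iff':                                   # fork on agreeing / disagreeing cases
        return _first(falsify([(True, f[1]), (sign, f[2])] + rest, env),
                      falsify([(False, f[1]), (not sign, f[2])] + rest, env))
    raise ValueError('unknown connective: %r' % (op,))

def prove_valid(f):
    """Return (True, None) if f is valid, else (False, falsifying interpretation)."""
    counterexample = falsify([(False, f)], {})
    return (counterexample is None, counterexample)

# Example 1.10 revisited: the search finds the same falsifying interpretation.
P, Q = ('var', 'P'), ('var', 'Q')
valid, cex = prove_valid(('implies', ('or', P, Q), ('and', P, Q)))
assert not valid and cex == {'P': True, 'Q': False}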

1.4 Equivalence and Implication


Just as satisfiability and validity are important properties of PL formulae,
equivalence and implication are important properties of pairs of formulae.
Two formulae F1 and F2 are equivalent if they evaluate to the same truth
value under all interpretations I. That is, for all interpretations I, I |= F1
iff I |= F2 . Another way to state the equivalence of F1 and F2 is to assert
the validity of the formula F1 ↔ F2 . We write F1 ⇔ F2 when F1 and F2 are
equivalent. F1 ⇔ F2 is not a formula; it simply abbreviates the statement “F1
and F2 are equivalent.”
We use the last characterization to prove that two formulae are equivalent.

Example 1.11. To prove that

P ⇔ ¬¬P ,

we prove that

P ↔ ¬¬P

is valid via a truth table:


P ¬P ¬¬P P ↔ ¬¬P
0 1 0 1
1 0 1 1

Example 1.12. To prove

P → Q ⇔ ¬P ∨ Q ,

we prove that

F : P → Q ↔ ¬P ∨ Q

is valid via a truth table:


P Q P → Q ¬P ¬P ∨ Q F
0 0 1 1 1 1
0 1 1 1 1 1
1 0 0 0 0 1
1 1 1 0 1 1

Formula F1 implies formula F2 if I |= F2 for every interpretation I such
that I |= F1 . Another way to state that F1 implies F2 is to assert the validity
of the formula F1 → F2 . We write F1 ⇒ F2 when F1 implies F2 . Do not
confuse the implication F1 ⇒ F2 , which asserts the validity of F1 → F2 , with
the PL formula F1 → F2 , which is constructed using the logical operator →.
F1 ⇒ F2 is not a formula.
As with equivalences, we use the validity characterization to prove impli-
cations.

Example 1.13. To prove that

R ∧ (¬R ∨ P ) ⇒ P ,

we prove that

F : R ∧ (¬R ∨ P ) → P

is valid via a semantic argument. Suppose F is not valid; then there exists an
interpretation I such that I ⊭ F :

1. I ⊭ F assumption
2. I |= R ∧ (¬R ∨ P ) by 1 and semantics of →
3. I ⊭ P by 1 and semantics of →
4. I |= R by 2 and semantics of ∧
5. I |= ¬R ∨ P by 2 and semantics of ∧

There are two cases to consider. In the first case,

6a. I |= ¬R by 5 and semantics of ∨


7a. I |= ⊥ 4 and 6a are contradictory

In the second case,

6b. I |= P by 5 and semantics of ∨


7b. I |= ⊥ 3 and 6b are contradictory

Thus, our assumption that I ⊭ F is wrong, and F is valid. 
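
Both characterizations reduce directly to validity checks. A short sketch (ours, reusing is_valid from the Section 1.3 sketch):

def are_equivalent(f1, f2):
    """F1 ⇔ F2 iff the formula F1 ↔ F2 is valid."""
    return is_valid(('iff', f1, f2))

def implies(f1, f2):
    """F1 ⇒ F2 iff the formula F1 → F2 is valid."""
    return is_valid(('implies', f1, f2))

# Examples 1.12 and 1.13:
P, Q, R = ('var', 'P'), ('var', 'Q'), ('var', 'R')
assert are_equivalent(('implies', P, Q), ('or', ('not', P), Q))
assert implies(('and', R, ('or', ('not', R), P)), P)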

1.5 Substitution
Substitution is a syntactic operation on formulae with significant semantic
consequences. It allows us to prove the validity of entire sets of formulae via
formula templates. It is also an essential tool for manipulating formulae
throughout the text.
A substitution σ is a mapping from formulae to formulae:

σ : {F1 ↦ G1 , . . . , Fn ↦ Gn } .

The domain of σ, domain(σ), is

domain(σ) : {F1 , . . . , Fn } ,

while the range range(σ) is

range(σ) : {G1 , . . . , Gn } .

The application of a substitution σ to a formula F , F σ, replaces each occur-
rence of a formula Fi in the domain of σ with its corresponding formula Gi in
the range of σ. Replacements occur all at once. We remove any ambiguity by
establishing that when both subformulae Fj and Fk are in the domain of σ,
and Fk is a strict subformula of Fj , then the larger subformula Fj is replaced
by the corresponding formula Gj . An example clarifies this statement.

Example 1.14. Consider formula

F : P ∧ Q → P ∨ ¬Q

and substitution

σ : {P ↦ R, P ∧ Q ↦ P → Q} .

Then

F σ : (P → Q) → R ∨ ¬Q ,

where the antecedent P ∧ Q of F is replaced by P → Q, and the P of the
consequent is replaced by R. Moreover,

F σ ≠ R ∧ Q → R ∨ ¬Q

by our convention. 
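
This convention is easy to implement by applying the substitution top-down: test the whole formula against the domain before descending into its arguments, and do not descend into a replacement. A sketch (ours, on the tuple representation used in the earlier sketches of this chapter):

def apply_subst(f, sigma):
    """Apply substitution sigma (a dict from formulae to formulae) to f.
    The check is top-down, so when both a formula and one of its strict
    subformulae are in the domain, the larger one is replaced, and the
    replacement itself is not searched again (replacements happen at once)."""
    if f in sigma:
        return sigma[f]
    if f[0] in ('top', 'bot', 'var'):
        return f
    return (f[0],) + tuple(apply_subst(arg, sigma) for arg in f[1:])

# Example 1.14: F : P ∧ Q → P ∨ ¬Q, σ : {P ↦ R, P ∧ Q ↦ P → Q}
P, Q, R = ('var', 'P'), ('var', 'Q'), ('var', 'R')
F = ('implies', ('and', P, Q), ('or', P, ('not', Q)))
sigma = {P: R, ('and', P, Q): ('implies', P, Q)}
assert apply_subst(F, sigma) == \
       ('implies', ('implies', P, Q), ('or', R, ('not', Q)))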

A variable substitution is a substitution in which the domain consists
only of propositional variables.
One notation is useful when working with substitutions. When we write
F [F1 , . . . , Fn ], we mean that formula F can have formulae Fi , i = 1, . . . , n, as
subformulae. If σ is {F1 ↦ G1 , . . . , Fn ↦ Gn }, then

F [F1 , . . . , Fn ]σ : F [G1 , . . . , Gn ] .

In the formula of Example 1.14, writing

F [P, P ∧ Q]σ : F [R, P → Q]

emphasizes that subformulae P and P ∧ Q of F are replaced by formulae R
and P → Q, respectively.
Two interesting semantic consequences can be derived from substitution.
Proposition 1.15 states that substituting subformulae Fi of F with correspond-
ing equivalent subformulae Gi results in an equivalent formula F ′ .

Proposition 1.15 (Substitution of Equivalent Formulae). Consider


substitution

σ : {F1 7→ G1 , . . . , Fn 7→ Gn }

such that for each i, Fi ⇔ Gi . Then F ⇔ F σ.

Example 1.16. Consider applying substitution

σ : {P → Q 7→ ¬P ∨ Q}

to

F : (P → Q) → R .

Since P → Q ⇔ ¬P ∨ Q, the formula

F σ : (¬P ∨ Q) → R

is equivalent to F . 

Proposition 1.17 asserts that proving the validity of a PL formula F actu-


ally proves the validity of an infinite set of formulae: those formulae that can
be derived from F via variable substitutions.

Proposition 1.17 (Valid Template). If F is valid and G = F σ for some


variable substitution σ, then G is valid.

Example 1.18. In Example 1.12, we proved that P → Q is equivalent to


¬P ∨ Q:

F : (P → Q) ↔ (¬P ∨ Q)

is valid. The validity of F implies that every formula of the form F1 → F2 is


equivalent to ¬F1 ∨ F2 , for arbitrary subformulae F1 and F2 . 

Finally, it is often useful to compute the composition of substitutions.


Given substitutions σ1 and σ2 , the idea is to compute substitution σ such that
F σ1 σ2 = F σ for any F . Compute σ1 σ2 as follows:
1. for each Fi 7→ Gi of σ1 , add Fi 7→ Gi σ2 to σ (that is, apply σ2 to each
formula of the range of σ1 );
2. if Fi of Fi 7→ Gi appears in the domain of σ2 but not in the domain of
σ1 , then add Fi 7→ Gi to σ.

Example 1.19. Compute the composition of substitutions

σ1 σ2 : {P 7→ R, P ∧ Q 7→ P → Q}{P 7→ S, S 7→ Q}

as follows:

{P 7→ Rσ2 , P ∧ Q 7→ (P → Q)σ2 , S 7→ Q}
= {P 7→ R, P ∧ Q 7→ S → Q, S 7→ Q}
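The two steps above translate directly into code. The following sketch reuses apply from the previous sketch, with substitutions kept as association lists:

    let compose sigma1 sigma2 =
      (* Step 1: apply sigma2 to the range of sigma1. *)
      let step1 = List.map (fun (fi, gi) -> (fi, apply sigma2 gi)) sigma1 in
      (* Step 2: keep the pairs of sigma2 whose domain formula does not
         already appear in the domain of sigma1. *)
      let step2 =
        List.filter (fun (fi, _) -> not (List.mem_assoc fi sigma1)) sigma2
      in
      step1 @ step2

On the two substitutions of Example 1.19, compose produces exactly the association list {P 7→ R, P ∧ Q 7→ S → Q, S 7→ Q}.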

1.6 Normal Forms


A normal form of formulae is a syntactic restriction such that for every
formula of the logic, there is an equivalent formula in the normal form. Three
normal forms are particularly important for PL.

Negation normal form (NNF) requires that ¬, ∧, and ∨ be the only


connectives and that negations appear only in literals. An equivalent formula
F ′ in NNF can be computed from any formula F recursively using the
following list of template equivalences:

¬¬F1 ⇔ F1
¬⊤ ⇔ ⊥
¬⊥ ⇔ ⊤
¬(F1 ∧ F2 ) ⇔ ¬F1 ∨ ¬F2
¬(F1 ∨ F2 ) ⇔ ¬F1 ∧ ¬F2
F1 → F2 ⇔ ¬F1 ∨ F2
F1 ↔ F2 ⇔ (F1 → F2 ) ∧ (F2 → F1 )

When implementing the transformation, the equivalences should be applied


left-to-right. The equivalences

¬(F1 ∧ F2 ) ⇔ ¬F1 ∨ ¬F2 ¬(F1 ∨ F2 ) ⇔ ¬F1 ∧ ¬F2

are known as De Morgan’s Law.


Propositions 1.15 and 1.17 justify that the result of applying the template
equivalences to a formula produces an equivalent formula. The transitivity of
equivalence justifies that this equivalence holds over any number of transfor-
mations: if F ⇔ G and G ⇔ H, then F ⇔ H.
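The left-to-right application of the template equivalences can be packaged as a single recursive function. The following OCaml sketch, reusing the formula datatype of the earlier sketches, is one reasonable implementation:

    let rec nnf f =
      match f with
      | True | False | Var _ | Not (Var _) -> f
      | Not True -> False
      | Not False -> True
      | Not (Not g) -> nnf g
      | Not (And (g, h)) -> Or (nnf (Not g), nnf (Not h))   (* De Morgan *)
      | Not (Or (g, h)) -> And (nnf (Not g), nnf (Not h))   (* De Morgan *)
      | Not (Imp (g, h)) -> nnf (Not (Or (Not g, h)))
      | Not (Iff (g, h)) -> nnf (Not (And (Imp (g, h), Imp (h, g))))
      | And (g, h) -> And (nnf g, nnf h)
      | Or (g, h) -> Or (nnf g, nnf h)
      | Imp (g, h) -> Or (nnf (Not g), nnf h)
      | Iff (g, h) -> And (nnf (Imp (g, h)), nnf (Imp (h, g)))

On the formula of Example 1.20 below, nnf returns P ∧ (P ∧ Q), the NNF formula F ′′′ computed there.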

Example 1.20. To convert the formula

F : ¬(P → ¬(P ∧ Q))

into NNF, apply the template equivalence

F1 → F2 ⇔ ¬F1 ∨ F2 (1.1)

to produce

F ′ : ¬(¬P ∨ ¬(P ∧ Q)) .

Let us understand this “application” of the template equivalence in detail.


First, apply variable substitution

σ1 : {F1 7→ P, F2 7→ ¬(P ∧ Q)}

to the valid template formula of equivalence (1.1):

(F1 → F2 ↔ ¬F1 ∨ F2 )σ1 : P → ¬(P ∧ Q) ↔ ¬P ∨ ¬(P ∧ Q) .

Proposition 1.17 implies that the result is valid. Then construct substitution

σ2 : {P → ¬(P ∧ Q) 7→ ¬P ∨ ¬(P ∧ Q)} ,



and apply Proposition 1.15 to F σ2 to yield that


F ′ : ¬(¬P ∨ ¬(P ∧ Q))
is equivalent to F . Subsequently, we shall not provide these details.
Continuing with the conversion to NNF, apply De Morgan’s law
¬(F1 ∨ F2 ) ⇔ ¬F1 ∧ ¬F2
to produce
F ′′ : ¬¬P ∧ ¬¬(P ∧ Q) .
Apply
¬¬F1 ⇔ F1
twice to produce
F ′′′ : P ∧ P ∧ Q ,
which is in NNF and equivalent to F . 
A formula is in disjunctive normal form (DNF) if it is a disjunction
of conjunctions of literals:
⋁i ⋀j ℓi,j        for literals ℓi,j .

To convert a formula F into an equivalent formula in DNF, transform F into


NNF and then use the following table of template equivalences:
(F1 ∨ F2 ) ∧ F3 ⇔ (F1 ∧ F3 ) ∨ (F2 ∧ F3 )
F1 ∧ (F2 ∨ F3 ) ⇔ (F1 ∧ F2 ) ∨ (F1 ∧ F3 )
Again, when implementing the transformation, the equivalences should be
applied left-to-right. The equivalences simply say that conjunction distributes
over disjunction.
Example 1.21. To convert
F : (Q1 ∨ ¬¬Q2 ) ∧ (¬R1 → R2 )
into DNF, first transform it into NNF
F ′ : (Q1 ∨ Q2 ) ∧ (R1 ∨ R2 ) ,
and then apply distributivity to obtain
F ′′ : (Q1 ∧ (R1 ∨ R2 )) ∨ (Q2 ∧ (R1 ∨ R2 )) ,
and then distributivity twice again to produce
F ′′′ : (Q1 ∧ R1 ) ∨ (Q1 ∧ R2 ) ∨ (Q2 ∧ R1 ) ∨ (Q2 ∧ R2 ) .
F ′′′ is in DNF and is equivalent to F . 

The dual of DNF is conjunctive normal form (CNF). A formula in


CNF is a conjunction of disjunctions of literals:
⋀i ⋁j ℓi,j        for literals ℓi,j .

Each inner block of disjunctions is called a clause. To convert a formula F


into an equivalent formula in CNF, transform F into NNF and then use the
following table of template equivalences:

(F1 ∧ F2 ) ∨ F3 ⇔ (F1 ∨ F3 ) ∧ (F2 ∨ F3 )


F1 ∨ (F2 ∧ F3 ) ⇔ (F1 ∨ F2 ) ∧ (F1 ∨ F3 )

Example 1.22. To convert

F : (Q1 ∧ ¬¬Q2 ) ∨ (¬R1 → R2 )

into CNF, first transform F into NNF:

F ′ : (Q1 ∧ Q2 ) ∨ (R1 ∨ R2 ) .

Then apply distributivity to obtain

F ′′ : (Q1 ∨ R1 ∨ R2 ) ∧ (Q2 ∨ R1 ∨ R2 ) ,

which is in CNF and equivalent to F . 
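In code, the conversion first puts the formula in NNF and then distributes disjunction over conjunction. The following sketch reuses the formula datatype and the nnf function of the earlier sketches; the DNF conversion is its dual, with the roles of And and Or exchanged.

    (* Assumes its argument is already in NNF, e.g., the result of nnf. *)
    let rec cnf f =
      match f with
      | And (g, h) -> And (cnf g, cnf h)
      | Or (g, h) -> distribute (cnf g) (cnf h)
      | _ -> f                               (* literals, True, False *)

    and distribute f g =
      match (f, g) with
      | And (f1, f2), _ -> And (distribute f1 g, distribute f2 g)
      | _, And (g1, g2) -> And (distribute f g1, distribute f g2)
      | _ -> Or (f, g)

Applying cnf to the NNF of the formula of Example 1.22 yields (Q1 ∨ R1 ∨ R2 ) ∧ (Q2 ∨ R1 ∨ R2 ).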

1.7 Decision Procedures for Satisfiability

Section 1.3 introduced the truth-table and semantic argument methods for
determining the satisfiability of PL formulae. In this section, we study al-
gorithms for deciding satisfiability (see Section 2.6 for a formal discussion of
decidability). A decision procedure for satisfiability of PL formulae reports,
after some finite amount of computation, whether a given PL formula F is
satisfiable.

1.7.1 Simple Decision Procedures

The truth-table method immediately suggests a decision procedure: construct


the full table, which has 2^n rows when F has n variables, and report whether
the final column, representing F , has value 1 in any row.
The semantic argument method also suggests a decision procedure. The
basic idea is to make sure that a proof rule is only applied to each line in
the argument at most once. Because each deduction is simpler in construction
than its premise, the constructed proof is of finite size (see Chapter 4 for

a formal approach to proving this point). When the semantic argument is


finished, report whether any branch is still open.
This simple description leaves out many details. Most importantly, when
many lines exist to which one can apply proof rules, which line should be con-
sidered next? Different implementations of this decision, called proof tactics,
result in different proof shapes and sizes. For example, one basic tactic is to
apply proof rules with only one deduction before proof rules with multiple
deductions to delay forks in the proof as long as possible.
Subsequent sections consider more sophisticated procedures that are the
basis for modern satisfiability solvers.

1.7.2 Reconsidering the Truth-Table Method

In the naive decision procedure based on the truth-table method, the entire
table is constructed. Actually, only one row need be considered at a time, mak-
ing for a space efficient procedure. This idea is implemented in the following
recursive algorithm for deciding the satisfiability of a PL formula F :

let rec sat F =


if F = ⊤ then true
else if F = ⊥ then false
else
let P = choose vars(F ) in
(sat F {P 7→ ⊤}) ∨ (sat F {P 7→ ⊥})

The notation “let rec sat F =” declares sat as a recursive function that
takes one argument, a formula F . The notation “let P = choose vars(F ) in”
means that P ’s value in the subsequent text is the variable returned by the
choose function. When applying the substitutions F {P 7→ ⊤} or F {P 7→ ⊥},
the template equivalences of Exercise 1.2 should be applied to simplify the
result. Then the comparisons F = ⊤ and F = ⊥ can be implemented as
purely syntactic operations.
At each recursive step, if F is not yet ⊤ or ⊥, a variable is chosen on which
to branch. Each possibility for P is attempted if necessary. This algorithm
returns true immediately upon finding a satisfying interpretation. Otherwise,
if F is unsatisfiable, it eventually returns false. sat may save branching on certain
variables by simplifying intermediate formulae.
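To make the pseudocode concrete, here is one way to flesh it out — a sketch only, reusing the formula datatype, vars, and apply from the earlier sketches. The simplify function applies several of the template equivalences of Exercise 1.2 so that the syntactic comparisons against ⊤ and ⊥ suffice, and choose_vars is approximated by taking the first variable of the formula.

    let rec simplify f =
      match f with
      | True | False | Var _ -> f
      | Not g ->
          (match simplify g with True -> False | False -> True | g' -> Not g')
      | And (g, h) ->
          (match (simplify g, simplify h) with
           | False, _ | _, False -> False
           | True, h' -> h'
           | g', True -> g'
           | g', h' -> And (g', h'))
      | Or (g, h) ->
          (match (simplify g, simplify h) with
           | True, _ | _, True -> True
           | False, h' -> h'
           | g', False -> g'
           | g', h' -> Or (g', h'))
      | Imp (g, h) -> simplify (Or (Not g, h))
      | Iff (g, h) -> simplify (And (Imp (g, h), Imp (h, g)))

    let rec sat f =
      let f = simplify f in
      if f = True then true
      else if f = False then false
      else
        let p = List.hd (vars f) in
        sat (apply [ (Var p, True) ] f) || sat (apply [ (Var p, False) ] f)

Its runs on the formulae of Examples 1.23 and 1.24 below mirror the ones shown there, though the branching order depends on which variable happens to be chosen.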
Example 1.23. Consider the formula

F : (P → Q) ∧ P ∧ ¬Q .

To compute sat F , choose a variable, say P , and recurse on the first case,

F {P 7→ ⊤} : (⊤ → Q) ∧ ⊤ ∧ ¬Q ,

which simplifies to

F1 : Q ∧ ¬Q .

Fig. 1.1. Visualizing runs of sat: (a) the run of Example 1.23; (b) the run of
Example 1.24. Branches are labeled with the variable assignments tried, and leaves
with the resulting simplified formulae

Now try each of

F1 {Q 7→ ⊤} and F1 {Q 7→ ⊥} .

Both simplify to ⊥, so this branch ends without finding a satisfying interpre-


tation.
Now try the other branch for P in F :

F {P 7→ ⊥} : (⊥ → Q) ∧ ⊥ ∧ ¬Q ,

which simplifies to ⊥. Thus, this branch also ends without finding a satisfying
interpretation. Thus, F is unsatisfiable.
The run of sat on F is visualized in Figure 1.1(a). 

Example 1.24. Consider the formula

F : (P → Q) ∧ ¬P .

To compute sat F , choose a variable, say P , and recurse on the first case,

F {P 7→ ⊤} : (⊤ → Q) ∧ ¬⊤ ,

which simplifies to ⊥. Therefore, try

F {P 7→ ⊥} : (⊥ → Q) ∧ ¬⊥

instead, which simplifies to ⊤. Arbitrarily assigning a value to Q produces the


following satisfying interpretation:

I : {P 7→ false, Q 7→ true} .

The run of sat on F is visualized in Figure 1.1(b). 



Fig. 1.2. Parse tree of F : P ∨ Q → ¬(P ∧ ¬R) with representatives for subformulae

1.7.3 Conversion to an Equisatisfiable Formula in CNF

The next two decision procedures operate on PL formulae in CNF. The trans-
formation suggested in Section 1.6 produces an equivalent formula that can be
exponentially larger than the original formula: consider converting a formula
in DNF into CNF. However, to decide the satisfiability of F , we need only
examine a formula F ′ such that F and F ′ are equisatisfiable. F and F ′ are
equisatisfiable when F is satisfiable iff F ′ is satisfiable.
We define a method for converting PL formula F to equisatisfiable PL
formula F ′ in CNF that is at most a constant factor larger than F . The main
idea is to introduce new propositional variables to represent the subformulae
of F . The constructed formula F ′ includes extra clauses that assert that these
new variables are equivalent to the subformulae that they represent.
Figure 1.2 visualizes the idea of the procedure. Each node of the “parse
tree” of F represents a subformula G of F . With each node G is associated a
representative propositional variable Rep(G). In the constructed formula F ′ ,
each representative Rep(G) is asserted to be equivalent to the subformula G
that it represents in such a way that the conjunction of all such assertions is
in CNF. Finally, the representative Rep(F ) of F is asserted to be true.
To obtain a small formula in CNF, each assertion of equivalence between
Rep(G) and G refers at most to the children of G in the parse tree. How is this
possible when a subformula may be arbitrarily large? The main trick is to refer
to the representatives of G’s children rather than the children themselves.
Let the “representative” function Rep : PL → V ∪{⊤, ⊥} map PL formulae
to propositional variables V, ⊤, or ⊥. In the general case, it is intended to
map a formula F to its representative propositional variable PF such that the
truth value of PF is the same as that of F . In other words, PF provides a
compact way of referring to F .
Let the “encoding” function En : PL → PL map PL formulae to PL formu-
lae. En is intended to map a PL formula F to a PL formula F ′ in CNF that
asserts that F ’s representative, PF , is equivalent to F : “Rep(F ) ↔ F ”.

As the base cases for defining Rep and En, define their behavior on ⊤, ⊥,
and propositional variables P :
Rep(⊤) = ⊤ En(⊤) = ⊤
Rep(⊥) = ⊥ En(⊥) = ⊤
Rep(P ) = P En(P ) = ⊤
The representative of ⊤ is ⊤ itself, and the representative of ⊥ is ⊥ itself.
Thus, Rep(⊤) ↔ ⊤ and Rep(⊥) ↔ ⊥ are both trivially valid, so En(⊤) and
En(⊥) are both ⊤. Finally, the representative of a propositional variable P is
P itself; and again, Rep(P ) ↔ P is trivially valid so that En(P ) is ⊤.
For the inductive case, F is a formula other than an atom, so define its
representative as a unique propositional variable PF :

Rep(F ) = PF .

En then asserts the equivalence of F and PF as a CNF formula. On conjunc-


tion, define
En(F1 ∧ F2 ) =
let P = Rep(F1 ∧ F2 ) in
(¬P ∨ Rep(F1 )) ∧ (¬P ∨ Rep(F2 )) ∧ (¬Rep(F1 ) ∨ ¬Rep(F2 ) ∨ P )
The returned formula

(¬P ∨ Rep(F1 )) ∧ (¬P ∨ Rep(F2 )) ∧ (¬Rep(F1 ) ∨ ¬Rep(F2 ) ∨ P )

is in CNF and is equivalent to

Rep(F1 ∧ F2 ) ↔ Rep(F1 ) ∧ Rep(F2 ) .

In detail, the first two clauses


(¬P ∨ Rep(F1 )) ∧ (¬P ∨ Rep(F2 ))

together assert

P → Rep(F1 ) ∧ Rep(F2 )

(since, for example, ¬P ∨ Rep(F1 ) is equivalent to P → Rep(F1 )), while the


final clause asserts

Rep(F1 ) ∧ Rep(F2 ) → P .

Notice the application of Rep to F1 and F2 . As mentioned above, it is the


trick to producing a small CNF formula.
On negation, En(¬F ) returns a formula equivalent to Rep(¬F ) ↔ ¬Rep(F ):
En(¬F ) =
let P = Rep(¬F ) in
(¬P ∨ ¬Rep(F )) ∧ (P ∨ Rep(F ))

En is defined for ∨, →, and ↔ as well:

En(F1 ∨ F2 ) =
let P = Rep(F1 ∨ F2 ) in
(¬P ∨ Rep(F1 ) ∨ Rep(F2 )) ∧ (¬Rep(F1 ) ∨ P ) ∧ (¬Rep(F2 ) ∨ P )

En(F1 → F2 ) =
let P = Rep(F1 → F2 ) in
(¬P ∨ ¬Rep(F1 ) ∨ Rep(F2 )) ∧ (Rep(F1 ) ∨ P ) ∧ (¬Rep(F2 ) ∨ P )
En(F1 ↔ F2 ) =
let P = Rep(F1 ↔ F2 ) in
(¬P ∨ ¬Rep(F1 ) ∨ Rep(F2 )) ∧ (¬P ∨ Rep(F1 ) ∨ ¬Rep(F2 ))
∧ (P ∨ ¬Rep(F1 ) ∨ ¬Rep(F2 )) ∧ (P ∨ Rep(F1 ) ∨ Rep(F2 ))
Having defined En, let us construct the full CNF formula that is equisat-
isfiable to F . If SF is the set of all subformulae of F (including F itself),
then
F ′ : Rep(F ) ∧ ⋀G∈SF En(G)

is in CNF and is equisatisfiable to F . The second main conjunct asserts the


equivalences between all subformulae of F and their corresponding represen-
tatives. Rep(F ) asserts that F ’s representative, and thus F itself (according
to the second conjunct), is true.
If F has size n, where each instance of a logical connective or a proposi-
tional variable contributes one unit of size, then F ′ has size at most 30n + 2.
The size of F ′ is thus linear in the size of F . The number of symbols in the
formula returned by En(F1 ↔ F2 ), which incurs the largest expansion, is 29.
Up to one additional conjunction is also required per symbol of F . Finally,
two extra symbols are required for asserting that Rep(F ) is true.
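The encoding translates directly into code. The following OCaml sketch produces the clauses of F ′ explicitly; it represents a literal as a signed variable name and a clause as a list of literals, names each representative by printing the subformula it stands for (any injective naming scheme would do, and the sketch assumes no original variable name starts with "P_"), and, for brevity, does not handle inputs containing ⊤ or ⊥. The formula datatype is the one from the earlier sketches.

    type literal = bool * string     (* (false, "P") encodes the literal ¬P *)
    type clause = literal list       (* a clause is a disjunction of literals *)

    (* An injective printer, used only to generate representative names. *)
    let rec str f =
      match f with
      | True -> "T" | False -> "F" | Var p -> p
      | Not g -> "!(" ^ str g ^ ")"
      | And (g, h) -> "(" ^ str g ^ "&" ^ str h ^ ")"
      | Or (g, h) -> "(" ^ str g ^ "|" ^ str h ^ ")"
      | Imp (g, h) -> "(" ^ str g ^ ">" ^ str h ^ ")"
      | Iff (g, h) -> "(" ^ str g ^ "=" ^ str h ^ ")"

    let rep f = match f with Var p -> (true, p) | _ -> (true, "P_" ^ str f)
    let neg (s, p) = (not s, p)

    (* en f: the clauses asserting Rep(G) <-> G for every subformula G of f. *)
    let rec en f =
      let p = rep f in
      match f with
      | True | False -> failwith "top and bottom are not handled in this sketch"
      | Var _ -> []
      | Not g ->
          [ [ neg p; neg (rep g) ]; [ p; rep g ] ] @ en g
      | And (g, h) ->
          [ [ neg p; rep g ]; [ neg p; rep h ];
            [ neg (rep g); neg (rep h); p ] ] @ en g @ en h
      | Or (g, h) ->
          [ [ neg p; rep g; rep h ]; [ neg (rep g); p ]; [ neg (rep h); p ] ]
          @ en g @ en h
      | Imp (g, h) ->
          [ [ neg p; neg (rep g); rep h ]; [ rep g; p ]; [ neg (rep h); p ] ]
          @ en g @ en h
      | Iff (g, h) ->
          [ [ neg p; neg (rep g); rep h ]; [ neg p; rep g; neg (rep h) ];
            [ p; neg (rep g); neg (rep h) ]; [ p; rep g; rep h ] ]
          @ en g @ en h

    (* The full equisatisfiable CNF: Rep(F) asserted as a unit clause, plus
       the defining clauses for all subformulae. *)
    let to_clauses f = [ rep f ] :: en f

Applied to the formula of Example 1.25 below, to_clauses produces exactly the clauses listed there; En of a propositional variable contributes no clauses, corresponding to the trivially valid ⊤ entries.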

Example 1.25. Consider formula

F : (Q1 ∧ Q2 ) ∨ (R1 ∧ R2 ) ,

which is in DNF. To convert it to CNF, we collect its subformulae

SF : {Q1 , Q2 , Q1 ∧ Q2 , R1 , R2 , R1 ∧ R2 , F }

and compute

En(Q1 ) = ⊤
En(Q2 ) = ⊤
En(Q1 ∧ Q2 ) = (¬P(Q1 ∧Q2 ) ∨ Q1 ) ∧ (¬P(Q1 ∧Q2 ) ∨ Q2 )
∧ (¬Q1 ∨ ¬Q2 ∨ P(Q1 ∧Q2 ) )

En(R1 ) = ⊤
En(R2 ) = ⊤
En(R1 ∧ R2 ) = (¬P(R1 ∧R2 ) ∨ R1 ) ∧ (¬P(R1 ∧R2 ) ∨ R2 )
∧ (¬R1 ∨ ¬R2 ∨ P(R1 ∧R2 ) )
En(F ) = (¬P(F ) ∨ P(Q1 ∧Q2 ) ∨ P(R1 ∧R2 ) )
∧ (¬P(Q1 ∧Q2 ) ∨ P(F ) )
∧ (¬P(R1 ∧R2 ) ∨ P(F ) )
Then
F ′ : P(F ) ∧ ⋀G∈SF En(G)

is equisatisfiable to F and is in CNF. 

1.7.4 The Resolution Procedure

The next decision procedure that we consider is based on resolution and


applies only to PL formulae in CNF. Therefore, the procedure of Section
1.7.3 must first be applied to the given PL formula if it is not already in CNF.
Resolution follows from the following observation of any PL formula F in
CNF: to satisfy clauses C1 [P ] and C2 [¬P ] that share variable P but disagree
on its value, either the rest of C1 or the rest of C2 must be satisfied. Why? If
P is true, then a literal other than ¬P in C2 must be satisfied; while if P is
false, then a literal other than P in C1 must be satisfied. Therefore, the clause
C1 [⊥] ∨ C2 [⊥], simplified according to the template equivalences of Exercise
1.2, can be added as a conjunction to F to produce an equivalent formula still
in CNF.
Clausal resolution is stated as the following proof rule:

C1 [P ] C2 [¬P ]
C1 [⊥] ∨ C2 [⊥]

From the two clauses of the premise, deduce the new clause, called the resol-
vent.
If ever ⊥ is deduced via resolution, F must be unsatisfiable since F ∧ ⊥ is
unsatisfiable. Otherwise, if every possible resolution produces a clause that is
already known, then F must be satisfiable.
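A naive implementation of this procedure fits in a few lines on the clause representation of the previous sketch; it is meant only as an illustration and recomputes all resolvents in every round. Clauses are normalized — sorted and duplicate-free — so that "already known" can be tested by equality; input clauses should be normalized with norm first.

    let norm (c : clause) : clause = List.sort_uniq compare c

    (* All resolvents obtainable from clauses c1 and c2. *)
    let resolvents c1 c2 =
      List.filter_map
        (fun l ->
          if List.mem (neg l) c2 then
            Some (norm (List.filter (fun l' -> l' <> l) c1
                        @ List.filter (fun l' -> l' <> neg l) c2))
          else None)
        c1

    (* Returns false if the empty clause is derived (unsatisfiable), and true
       once no resolution produces a new clause (satisfiable). *)
    let rec resolution clauses =
      if List.mem [] clauses then false
      else
        let fresh =
          List.concat_map
            (fun c1 -> List.concat_map (resolvents c1) clauses)
            clauses
          |> List.filter (fun c -> not (List.mem c clauses))
        in
        if fresh = [] then true
        else resolution (List.sort_uniq compare (fresh @ clauses))

On the three normalized clauses of Example 1.26 below, resolution derives Q and eventually the empty clause, returning false.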

Example 1.26. The CNF of (P → Q) ∧ P ∧ ¬Q is the following:

F : (¬P ∨ Q) ∧ P ∧ ¬Q .

From resolution

(¬P ∨ Q)    P
Q ,

construct

F1 : (¬P ∨ Q) ∧ P ∧ ¬Q ∧ Q .

From resolution

¬Q    Q
⊥ ,

deduce that F , and thus the original formula, is unsatisfiable. 

Example 1.27. Consider the formula

F : (¬P ∨ Q) ∧ ¬Q .

The one possible resolution

(¬P ∨ Q) ¬Q
¬P

yields

F1 : (¬P ∨ Q) ∧ ¬Q ∧ ¬P .

Since no further resolutions are possible, F is satisfiable. Indeed,

I : {P 7→ false, Q 7→ false}

is a satisfying interpretation. A CNF formula that does not contain the clause
⊥ and to which no more resolutions can be applied represents all possible
satisfying interpretations. 

1.7.5 DPLL

Modern satisfiability procedures for propositional logic are based on the Davis-
Putnam-Logemann-Loveland algorithm (DPLL), which combines the space-
efficient procedure of Section 1.7.2 with a restricted form of resolution. We
review in this section the basic algorithm. Much research in the past decade
has advanced the state-of-the-art considerably.
Like the resolution procedure, DPLL operates on PL formulae in CNF.
But again, as the procedure decides satisfiability, we can apply the conversion
procedure of Section 1.7.3 to produce a small equisatisfiable CNF formula.
As in the procedure sat, DPLL attempts to construct an interpretation of
F ; failing to do so, it reports that the given formula is unsatisfiable. Rather
than relying solely on enumerating possibilities, however, DPLL applies a
restricted form of resolution to gain some deductive power. The process of
applying this restricted resolution as much as possible is called Boolean con-
straint propagation (BCP).

BCP is based on unit resolution. Unit resolution operates on two clauses.


One clause, called the unit clause, consists of a single literal ℓ (ℓ = P or
ℓ = ¬P for some propositional variable P ). The second clause contains the
negation of ℓ: C[¬ℓ]. Then unit resolution is the deduction

ℓ C[¬ℓ]
.
C[⊥]

Unlike with full resolution, the literals of the resolvent are a subset of the
literals of the second clause. Hence, the resolvent replaces the second clause.

Example 1.28. In the formula

F : (P ) ∧ (¬P ∨ Q) ∧ (R ∨ ¬Q ∨ S) ,

(P ) is a unit clause. Therefore, applying unit resolution

P (¬P ∨ Q)
Q

produces

F ′ : (Q) ∧ (R ∨ ¬Q ∨ S) .

Applying unit resolution again

Q R ∨ ¬Q ∨ S
R∨S

produces

F ′′ : (R ∨ S) ,

ending this round of BCP. 
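One common way to implement BCP on the clause representation of the earlier sketches — consistent with the simplifications in Example 1.28 — is to pick a unit clause {ℓ}, drop every clause containing ℓ (it is already satisfied), delete ¬ℓ from the remaining clauses (unit resolution), and repeat:

    let rec bcp clauses =
      match List.find_opt (fun c -> List.length c = 1) clauses with
      | None -> clauses
      | Some [ l ] ->
          clauses
          |> List.filter (fun c -> not (List.mem l c))        (* satisfied by l *)
          |> List.map (List.filter (fun l' -> l' <> neg l))   (* unit resolution *)
          |> bcp
      | Some _ -> assert false    (* impossible: the clause found has length 1 *)

On the clauses of Example 1.28, bcp returns the single clause R ∨ S; if the empty clause ever appears, the caller can detect the conflict.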

The implementation of DPLL is structurally similar to sat, except that it


begins by applying BCP:

let rec dpll F =


let F ′ = bcp F in
if F ′ = ⊤ then true
else if F ′ = ⊥ then false
else
let P = choose vars(F ′ ) in
(dpll F ′ {P 7→ ⊤}) ∨ (dpll F ′ {P 7→ ⊥})

As in sat, intermediate formulae are simplified according to the template


equivalences of Exercise 1.2.

Fig. 1.3. Visualization of Example 1.30

One easy optimization is the following: if variable P appears only positively


or only negatively in F , it should not be chosen by choose vars(F ′ ). P appears
only positively when every P -literal is just P ; P appears only negatively when
every P -literal is ¬P . In both cases, F is equisatisfiable to the formula F ′
constructed by removing all clauses containing an instance of P . Therefore,
these clauses do not contribute to BCP. When only such variables remain,
the formula must be satisfiable: a full interpretation can be constructed by
setting each variable’s value based on whether it appears only positively (true)
or only negatively (false).
The values to which propositional variables are set on the path to a solution
can be recorded so that DPLL can return a satisfying interpretation if one
exists, rather than just true.
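Putting the pieces together on the clause representation gives a compact dpll — again only a sketch, reusing bcp and neg from the earlier sketches; it omits the pure-literal optimization just described and does not record the satisfying interpretation. Committing to a value for a variable amounts to adding the corresponding unit clause and letting BCP propagate it.

    let rec dpll clauses =
      let clauses = bcp clauses in
      if clauses = [] then true                (* every clause satisfied *)
      else if List.mem [] clauses then false   (* empty clause: conflict *)
      else
        (* Branch on the first literal of the first clause. *)
        let l = List.hd (List.hd clauses) in
        dpll ([ l ] :: clauses) || dpll ([ neg l ] :: clauses)

This sketch returns true on the formula of Example 1.30 below and false on, for example, the CNF of Example 1.26.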

Example 1.29. Consider the formula

F : (P ) ∧ (¬P ∨ Q) ∧ (R ∨ ¬Q ∨ S) .

On the first level of recursion, dpll recognizes the unit clause (P ) and applies
the BCP steps from Example 1.28, resulting in the formula

F ′′ : R ∨ S .

The unit resolutions correspond to the partial interpretation

{P 7→ true, Q 7→ true} .

Only positively occurring variables remain, so F is satisfiable. In particular,

{P 7→ true, Q 7→ true, R 7→ true, S 7→ true}

is a satisfying interpretation of F .
Branching was not required in this example. 

Example 1.30. Consider the formula

F : (¬P ∨ Q ∨ R) ∧ (¬Q ∨ R) ∧ (¬Q ∨ ¬R) ∧ (P ∨ ¬Q ∨ ¬R) .



On the first level of recursion, dpll must branch. Branching on Q or R will


result in unit clauses; choose Q.
Then

F {Q 7→ ⊤} : (R) ∧ (¬R) ∧ (P ∨ ¬R) .

The unit resolution


R    (¬R)
⊥

finishes this branch.


On the other branch,

F {Q 7→ ⊥} : (¬P ∨ R) .

P appears only negatively, and R appears only positively, so the formula is


satisfiable. In particular, F is satisfied by interpretation

I : {P 7→ false, Q 7→ false, R 7→ true} .

This run of dpll is visualized in Figure 1.3. 

1.8 Summary
This chapter introduces propositional logic (PL). It covers:
• Its syntax. How one constructs a PL formula. Propositional variables,
atoms, literals, logical connectives.
• Its semantics. What a PL formula means. Truth values true and false.
Interpretations. Truth-table definition, inductive definition.
• Satisfiability and validity. Whether a PL formula evaluates to true under
any or all interpretations. Duality of satisfiability and validity, truth-table
method, semantic argument method.
• Equivalence and implication. Whether two formulae always evaluate to the
same truth value under every interpretation. Whether under any interpre-
tation, if one formula evaluates to true, the other also evaluates to true.
Reduction to validity.
• Substitution, which is a tool for manipulating formulae and making general
claims. Substitution of equivalent formulae. Valid templates.
• Normal forms. A normal form is a set of syntactically restricted formulae
such that every PL formula is equivalent to some member of the set.
• Decision procedures for satisfiability. Truth-table method, sat, resolution
procedure, dpll. Transformation to equisatisfiable CNF formula.

PL is an important logic with applications in software and hardware de-


sign and analysis, knowledge representation, combinatorial optimization, and
complexity theory, to name a few. Although relatively simple, the Boolean
structure that is central to PL is often a main source of complexity in appli-
cations of the algorithmic reasoning that is the focus of Part II. Exercise 8.1
explores this point in more depth.
Besides being an important logic in its own right, PL serves to introduce
the main concepts that are important throughout the book, in particular
syntax, semantics, and satisfiability and validity. Chapter 2 presents first-
order logic by building on the concepts of this chapter.

Bibliographic Remarks

For a complete and concise presentation of propositional logic, see Smullyan’s


text First-Order Logic [87]. The semantic argument method is similar to
Smullyan’s tableau method.
The DPLL algorithm is based on work by Davis and Putnam, presented
in [26], and by Davis, Logemann, and Loveland, presented in [25].

Exercises
1.1 (PL validity & satisfiability). For each of the following PL formulae,
identify whether it is valid or not. If it is valid, prove it with a truth table or
semantic argument; otherwise, identify a falsifying interpretation. Recall our
conventions for operator precedence and associativity from Section 1.1.
(a) P ∧ Q → P → Q
(b) (P → Q) ∨ P ∧ ¬Q
(c) (P → Q → R) → P → R
(d) (P → Q ∨ R) → P → R
(e) ¬(P ∧ Q) → R → ¬R → Q
(f) P ∧ Q ∨ ¬P ∨ (¬Q → ¬P )
(g) (P → Q → R) → ¬R → ¬Q → ¬P
(h) (¬R → ¬Q → ¬P ) → P → Q → R

1.2 (Template equivalences). Use the truth table or semantic argument


method to prove the following template equivalences.
(a) ⊤ ⇔ ¬⊥
(b) ⊥ ⇔ ¬⊤
(c) ¬¬F ⇔ F
(d) F ∧ ⊤ ⇔ F
(e) F ∧ ⊥ ⇔ ⊥
(f) F ∧ F ⇔ F

(g) F ∨ ⊤ ⇔ ⊤
(h) F ∨ ⊥ ⇔ F
(i) F ∨ F ⇔ F
(j) F → ⊤ ⇔ ⊤
(k) F → ⊥ ⇔ ¬F
(l) ⊤ → F ⇔ F
(m) ⊥ → F ⇔ ⊤
(n) ⊤ ↔ F ⇔ F
(o) ⊥ ↔ F ⇔ ¬F
(p) ¬(F1 ∧ F2 ) ⇔ ¬F1 ∨ ¬F2
(q) ¬(F1 ∨ F2 ) ⇔ ¬F1 ∧ ¬F2
(r) F1 → F2 ⇔ ¬F1 ∨ F2
(s) F1 → F2 ⇔ ¬F2 → ¬F1
(t) ¬(F1 → F2 ) ⇔ F1 ∧ ¬F2
(u) (F1 ∨ F2 ) ∧ F3 ⇔ (F1 ∧ F3 ) ∨ (F2 ∧ F3 )
(v) (F1 ∧ F2 ) ∨ F3 ⇔ (F1 ∨ F3 ) ∧ (F2 ∨ F3 )
(w) (F1 → F3 ) ∧ (F2 → F3 ) ⇔ F1 ∨ F2 → F3
(x) (F1 → F2 ) ∧ (F1 → F3 ) ⇔ F1 → F2 ∧ F3
(y) F1 → F2 → F3 ⇔ F1 ∧ F2 → F3
(z) (F1 ↔ F2 ) ∧ (F2 ↔ F3 ) ⇒ (F1 ↔ F3 )

1.3 (Redundant logical connectives). Given ⊤, ∧, and ¬, prove that ⊥,


∨, →, and ↔ are redundant logical connectives. That is, show that each of ⊥,
F1 ∨ F2 , F1 → F2 , and F1 ↔ F2 is equivalent to a formula that uses only F1 ,
F2 , ⊤, ∧, and ¬.

1.4 (The nand connective). Let the logical connective ⊼ (pronounced
“nand”) be defined according to the following truth table:

F1 | F2 | F1 ⊼ F2
 0 |  0 |       1
 0 |  1 |       1
 1 |  0 |       1
 1 |  1 |       0

Show that all standard logical connectives can be defined in terms of ⊼.

1.5 (Normal forms). Convert the following PL formulae to NNF, DNF, and
CNF via the transformations of Section 1.6.
(a) ¬(P → Q)
(b) ¬(¬(P ∧ Q) → ¬R)
(c) (Q ∧ R → (P ∨ ¬Q)) ∧ (P ∨ R)
(d) ¬(Q → R) ∧ P ∧ (Q ∨ ¬(P ∧ R))

1.6 (Graph coloring). A solution to a graph coloring problem is an as-


signment of colors to vertices such that no two adjacent vertices have the same
color. Formally, a finite graph G = ⟨V, E⟩ consists of vertices V = {v1 , . . . , vn }
and edges E = {⟨vi1 , wi1 ⟩, . . . , ⟨vik , wik ⟩}. The finite set of colors is given by
C = {c1 , . . . , cm }. A problem instance is given by a graph and a set of colors:
the problem is to assign each vertex v ∈ V a color(v) ∈ C such that for every
edge ⟨v, w⟩ ∈ E, color(v) 6= color(w). Clearly, not all instances have solutions.
Show how to encode an instance of a graph coloring problem into a PL
formula F . F should be satisfiable iff a graph coloring exists.
(a) Describe a set of constraints in PL asserting that every vertex is colored.
Since the sets of vertices, edges, and colors are all finite, use notation such
as “color(v) = c” to indicate that vertex v has color c. Realize that such
an assertion is encodeable as a single propositional variable Pvc .
(b) Describe a set of constraints in PL asserting that every vertex has at most
one color.
(c) Describe a set of constraints in PL asserting that no two connected vertices
have the same color.
(d) Identify a significant optimization in this encoding. Hint: Can any con-
straints be dropped? Why?
(e) If the constraints are not already in CNF, specify them in CNF now. For
N vertices, K edges, and M colors, how many variables does the optimized
encoding require? How many clauses?
1.7 (CNF). Example 1.25 constructs a CNF formula that is equisatisfiable
to a given small formula in DNF.
(a) If distribution of disjunction over conjunction (described in Section 1.6)
were used, how many clauses would the resulting formula have?
(b) Consider the formulae
Fn : ⋁i=1..n (Qi ∧ Ri )

for positive integers n. As a function of n, how many clauses are in


(i) the formula F ′ constructed based on distribution of disjunction over
conjunction?
(ii) the formula
F ′ : Rep(Fn ) ∧ ⋀G∈SFn En(G) ?

(iii) For which n is the distribution approach better?


1.8 (DPLL). Describe the execution of DPLL on the following formulae.
(a) (P ∨ ¬Q ∨ ¬R) ∧ (Q ∨ ¬P ∨ R) ∧ (R ∨ ¬Q)
(b) (P ∨ Q ∨ R) ∧ (¬P ∨ ¬Q ∨ ¬R) ∧ (¬P ∨ Q ∨ R) ∧ (¬Q ∨ R) ∧ (Q ∨ ¬R)
2
First-Order Logic

One task we logicians are interested in is that of analyzing the notion


of “proof ” — to make it as rigorous as any other notion in mathe-
matics.
— Raymond Smullyan
The Lady or the Tiger?, 1982
This chapter extends the machinery of propositional logic to first-order
logic (FOL), also called both predicate logic and the first-order predicate
calculus. While first-order logic enjoys a degree of expressiveness that makes
it suitable for reasoning about computation, it does not admit completely
automated reasoning.
FOL extends PL with predicates, functions, and quantifiers. As in our
discussion of PL, we first introduce the syntax of FOL and its semantics. We
then build on the semantic argument method of PL to provide a method of
proving first-order validity.
Section 2.6 reviews decidability and complexity. A decidable problem has
an algorithm, which is a procedure that always finishes with a correct answer
on every instance of the problem. While validity of PL formulae is decid-
able, validity of FOL formulae is not. Complexity is the study of the intrinsic
hardness of a decidable problem.
The optional Section 2.7 proves the soundness and completeness of the
semantic argument method. It also presents two classic theorems that are
applied in Chapter 10.

2.1 Syntax
All formulae of PL evaluate to true or false. FOL is not so simple. In FOL,
terms evaluate to values other than truth values such as integers, people, or
cards of a deck. However, we are getting ahead of ourselves: just as in PL,

the syntax of FOL is independent of its meaning. The most basic terms are
variables x, y, z, x1 , x2 , . . . and constants a, b, c, a1 , a2 , . . ..
More complicated terms are constructed using functions. An n-ary func-
tion f takes n terms as arguments. Notationally, we represent generic FOL
functions by symbols f , g, h, f1 , f2 , . . .. A constant can also be viewed as a
0-ary function.

Example 2.1. The following are all terms:


• a, a constant (or 0-ary function);
• x, a variable;
• f (a), a unary function f applied to a constant;
• g(x, b), a binary function g applied to a variable x and a constant b;
• f (g(x, f (b))).


The propositional variables of PL are generalized to predicates, denoted


p, q, r, p1 , p2 , . . .. An n-ary predicate takes n terms as arguments. A FOL
propositional variable is a 0-ary predicate, which we write P , Q, R, P1 ,
P2 , . . ..
Countably infinitely many constant, function, and predicate symbols are
available.
An atom is ⊤, ⊥, or an n-ary predicate applied to n terms. A literal is
an atom or its negation.

Example 2.2. The following are all literals:


• P , a propositional variable (or 0-ary predicate);
• p(f (x), g(x, f (x))), a binary predicate applied to two terms;
• ¬p(f (x), g(x, f (x))).
The first two literals are also atoms. 

A FOL formula is a literal, the application of a logical connective ¬, ∧,


∨, →, or ↔ to a formula or formulae, or the application of a quantifier to a
formula. There are two FOL quantifiers:
• the existential quantifier ∃x. F [x], read “there exists an x such that
F [x]”;
• and the universal quantifier ∀x. F [x], read “for all x, F [x]”.
In ∀x. F [x], x is the quantified variable, and F [x] is the scope of the
quantifier ∀x. For convenience, we sometimes refer informally to the scope
of the quantifier ∀x as the scope of the quantified variable x itself. The case
is similar for ∃x. F [x]. Also, x in F [x] is bound (by the quantifier). By
convention, the period in “. F [x]” indicates that the scope of the quantified
variable x extends as far as possible. We often abbreviate ∀x. ∀y. F [x, y] by
∀x, y. F [x, y].

Example 2.3. In

∀x. p(f (x), x) → (∃y. p(f (g(x, y)), g(x, y))) ∧ q(x, f (x)) ,

the scope of the quantifier ∃y is the subformula G : p(f (g(x, y)), g(x, y)), and
the scope of ∀x is the entire body F : p(f (x), x) → (∃y. G) ∧ q(x, f (x)). That
is, the scope of x is F , and the scope of y is G. This formula is read: “for all x, if
p(f (x), x) then there exists a y such that p(f (g(x, y)), g(x, y)) and q(x, f (x))”.


A variable is free in formula F [x] if there is an occurrence of x that is


not bound by any quantifier. Denote by free(F ) the set of free variables of
a formula F . A variable is bound in formula F [x] if there is an occurrence
of x in the scope of a binding quantifier ∀x or ∃x. Denote by bound(F ) the
set of bound variables of a formula F . In general, it is possible that free(F ) ∩
bound(F ) 6= ∅, as a variable x can have both free and bound occurrences.
A formula F is closed if it does not contain any free variables.

Example 2.4. In

F : ∀x. p(f (x), y) → ∀y. p(f (x), y) ,

x only occurs bound, while y appears both free (in the antecedent) and bound
(in the consequent). Thus, free(F ) = {y} and bound(F ) = {x, y}. 
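For readers following the OCaml sketches of Chapter 1, FOL syntax can be represented in the same style. The following sketch introduces our own datatypes for terms and formulae (constants are 0-ary functions, propositional variables are 0-ary predicates) and computes the sets free(F ) and bound(F ) by recursion:

    type term =
      | V of string                    (* variable *)
      | Fn of string * term list       (* n-ary function application *)

    type fol =
      | Top | Bot
      | Pred of string * term list     (* n-ary predicate application *)
      | Neg of fol
      | Conj of fol * fol | Disj of fol * fol
      | Implies of fol * fol | Equiv of fol * fol
      | Forall of string * fol | Exists of string * fol

    module S = Set.Make (String)

    let rec term_vars t =
      match t with
      | V x -> S.singleton x
      | Fn (_, args) ->
          List.fold_left (fun s u -> S.union s (term_vars u)) S.empty args

    let rec free f =
      match f with
      | Top | Bot -> S.empty
      | Pred (_, args) ->
          List.fold_left (fun s u -> S.union s (term_vars u)) S.empty args
      | Neg g -> free g
      | Conj (g, h) | Disj (g, h) | Implies (g, h) | Equiv (g, h) ->
          S.union (free g) (free h)
      | Forall (x, g) | Exists (x, g) -> S.remove x (free g)

    let rec bound f =
      match f with
      | Top | Bot | Pred _ -> S.empty
      | Neg g -> bound g
      | Conj (g, h) | Disj (g, h) | Implies (g, h) | Equiv (g, h) ->
          S.union (bound g) (bound h)
      | Forall (x, g) | Exists (x, g) ->
          (* x is bound by this quantifier only if it actually occurs in g *)
          let b = bound g in
          if S.mem x (free g) then S.add x b else b

On the formula of Example 2.4, free returns {y} and bound returns {x, y}.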

If free(F ) = {x1 , . . . , xn }, then its universal closure is

∀x1 . . . . ∀xn . F ,

and its existential closure is

∃x1 . . . . ∃xn . F .

We usually write the universal and existential closures as ∀ ∗ . F and ∃ ∗ . F ,


respectively.
The subformulae of a FOL formula are defined according to an extension
of the PL definition of subformula:
• the only subformula of p(t1 , . . . , tn ), where the ti are terms, is p(t1 , . . . , tn );
• the subformulae of ¬F are ¬F and the subformulae of F ;
• the subformulae of F1 ∧ F2 , F1 ∨ F2 , F1 → F2 , F1 ↔ F2 are the formula
itself and the subformulae of F1 and F2 ;
• and the subformulae of ∃x. F and ∀x. F are the formula itself and the
subformulae of F .
The strict subformulae of a formula excludes the formula itself.
The subterms of a FOL term are defined as follows:
• the only subterm of constant a or variable x is a or x itself, respectively;

• and the subterms of f (t1 , . . . , tn ) are the term itself and the subterms of
t1 , . . . , tn .
The strict subterms of a term excludes the term itself.
Example 2.5. In

F : ∀x. p(f (x), y) → ∀y. p(f (x), y) ,

the subformulae of F are

F , p(f (x), y) → ∀y. p(f (x), y) , ∀y. p(f (x), y) , p(f (x), y) .

The subterms of g(f (x), f (h(f (x)))) are

g(f (x), f (h(f (x)))) , f (x) , f (h(f (x))) , h(f (x)) , x .

f (x) occurs twice in g(f (x), f (h(f (x)))). 


Example 2.6. Before discussing the formal semantics for FOL, we suggest
translations of English sentences into FOL. The names of the constants, func-
tions, and predicates are chosen to provide some intuition for the meaning of
the FOL formulae.
• Every dog has its day.

∀x. dog (x) → ∃y. day(y) ∧ itsDay(x, y)

• Some dogs have more days than others.

∃x, y. dog(x) ∧ dog(y) ∧ #days(x) > #days(y)

• All cats have more days than dogs.

∀x, y. dog(x) ∧ cat (y) → #days(y) > #days (x)

• Fido is a dog. Furrball is a cat. Fido has fewer days than does Furrball.

dog(F ido) ∧ cat (F urrball) ∧ #days(F ido) < #days(F urrball)

• The length of one side of a triangle is less than the sum of the lengths of
the other two sides.

∀x, y, z. triangle(x, y, z) → length(x) < length(y) + length(z)

• Fermat’s Last Theorem.


∀n. integer(n) ∧ n > 2
→ ∀x, y, z.
integer (x) ∧ integer (y) ∧ integer (z) ∧ x > 0 ∧ y > 0 ∧ z > 0
→ x^n + y^n 6= z^n


2.2 Semantics
Having defined the syntax of FOL, we now define its semantics. Formulae
of FOL evaluate to the truth values true and false as in PL. However, terms
of FOL formulae evaluate to values from a specified domain. We extend the
concept of interpretations to this more complex setting and then define the
semantics of FOL in terms of interpretations.
First, we define a FOL interpretation I. The domain DI of an interpre-
tation I is a nonempty set of values or objects, such as integers, real numbers,
dogs, people, or merely abstract objects. |DI | denotes the cardinality, or
size, of DI . Domains can be finite, such as the 52 cards of a deck of cards;
countably infinite, such as the integers; or uncountably infinite, such as the
reals. But all domains are nonempty.
The assignment αI of interpretation I maps constant, function, and pred-
icate symbols to elements, functions, and predicates over DI . It also maps
variables to elements of DI :
• each variable symbol x is assigned a value xI from DI ;
• each n-ary function symbol f is assigned an n-ary function

fI : DIn → DI

that maps n elements of DI to an element of DI ;


• each n-ary predicate symbol p is assigned an n-ary predicate

pI : DIn → {true, false}

that maps n elements of DI to a truth value.


In particular, each constant (0-ary function symbol) is assigned a value from
DI , and each propositional variable (0-ary predicate symbol) is assigned a
truth value.
An interpretation I : (DI , αI ) is thus a pair consisting of a domain and an
assignment.
Example 2.7. The formula

F : x+y >z → y >z−x

contains the binary function symbols + and −, the binary predicate symbol >,
and the variables x, y, and z. Again, +, −, and > are just symbols: we choose
these names to provide intuition for the intended meaning of the formulae.
We could just as easily have written

F ′ : p(f (x, y), z) → p(y, g(z, x)) .

We construct a “standard” interpretation. The domain is the integers, Z:

DI = Z = {. . . , −2, −1, 0, 1, 2, . . .} .

To + and − we assign standard addition +Z and subtraction −Z of integers,


respectively. To > we assign the standard greater-than relation >Z of integers.
Finally, to x, y, and z, we assign the values 13, 42, and 1, respectively. We
ignore the countably infinitely many other constant, function, and predicate
symbols that do not appear in F . We thus have interpretation I : (Z, αI ),
where

αI : {+ 7→ +Z , − 7→ −Z , >7→>Z , x 7→ 13, y 7→ 42, z 7→ 1, . . .} .

The elision reminds us that, as always, αI provides values for the countably
infinitely many other constant, function, and predicate symbols. Usually, we
do not write the elision. 

Given a FOL formula F and interpretation I : (DI , αI ), we want to com-


pute if F evaluates to true under interpretation I, I |= F , or if F evaluates to
false under interpretation I, I 6|= F . We define the semantics inductively as in
PL. To start, define the meaning of truth symbols:

I |= ⊤
I 6|= ⊥

Next, consider more complicated atoms. αI gives meaning αI [x], αI [c], and
αI [f ] to variables x, constants c, and functions f . Evaluate arbitrary terms
recursively:

αI [f (t1 , . . . , tn )] = αI [f ](αI [t1 ], . . . , αI [tn ]) ,

for terms t1 , . . . , tn . That is, define the value of f (t1 , . . . , tn ) under αI by


evaluating the function αI [f ] over the terms αI [t1 ], . . . , αI [tn ]. Similarly,
evaluate arbitrary atoms recursively:

αI [p(t1 , . . . , tn )] = αI [p](αI [t1 ], . . . , αI [tn ]) .

Then

I |= p(t1 , . . . , tn ) iff αI [p(t1 , . . . , tn )] = true

Having completed the base cases of our inductive definition, we turn to


the inductive step. Assume that formulae F1 and F2 have fixed truth values.
From these formulae, evaluate the semantics of more complex formulae. The
logical connectives are handled in FOL in precisely the same way as in PL:

I |= ¬F iff I 6|= F
I |= F1 ∧ F2 iff I |= F1 and I |= F2
I |= F1 ∨ F2 iff I |= F1 or I |= F2
I |= F1 → F2 iff, if I |= F1 then I |= F2
I |= F1 ↔ F2 iff I |= F1 and I |= F2 , or I 6|= F1 and I 6|= F2

Example 2.8. Recall the formula

F : x+y >z → y >z−x

of Example 2.7 and the interpretation I : (Z, αI ), where

αI : {+ 7→ +Z , − 7→ −Z , >7→>Z , x 7→ 13Z , y 7→ 42Z , z 7→ 1Z } .

Compute the truth value of F under I as follows:

1. I |= x + y > z since αI [x + y > z] = 13Z +Z 42 >Z 1Z


2. I |= y > z − x since αI [y > z − x] = 42Z >Z 1Z −Z 13Z
3. I |= F by 1, 2, and the semantics of →

For the quantifiers, let x be a variable. Define an x-variant of an inter-


pretation I : (DI , αI ) as an interpretation J : (DJ , αJ ) such that
• DI = DJ ;
• and αI [y] = αJ [y] for all constant, free variable, function, and predicate
symbols y, except possibly x.
That is, I and J agree on everything except possibly the value of variable x.
Denote by J : I ⊳ {x 7→ v} the x-variant of I in which αJ [x] = v for some
v ∈ DI . Then

I |= ∀x. F iff for all v ∈ DI , I ⊳ {x 7→ v} |= F


I |= ∃x. F iff there exists v ∈ DI such that I ⊳ {x 7→ v} |= F

In words, I is an interpretation of ∀x. F iff all x-variants of I are interpre-


tations of F . I is an interpretation of ∃x. F iff some x-variant of I is an
interpretation of F .

Example 2.9. Consider the formula

F : ∃x. f (x) = g(x)

and the interpretation I : (D : {◦, •}, αI ) in which

αI : {f (◦) 7→ ◦, f (•) 7→ •, g(◦) 7→ •, g(•) 7→ ◦} .

Compute the truth value of F under I as follows:

1. I ⊳ {x 7→ v} |6 = f (x) = g(x) for v ∈ D


2. I |6 = ∃x. f (x) = g(x) since v ∈ D is arbitrary

In the first line, basic reasoning about the interpretation I reveals that f and
g always disagree. The second line follows from the first by the semantics of
existential quantification. 
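When the domain is finite, this semantics can be executed directly. The following sketch reuses the term and fol datatypes of the previous sketch; an interpretation supplies the domain, meanings for function and predicate symbols, and a variable assignment, and an x-variant is obtained by shadowing x in that assignment. Quantifiers are decided by enumerating the domain, which is of course impossible for infinite domains such as Z.

    type 'a interp = {
      domain : 'a list;                    (* nonempty, finite domain D_I *)
      funcs : string -> 'a list -> 'a;     (* meaning of function symbols *)
      preds : string -> 'a list -> bool;   (* meaning of predicate symbols *)
      vars : (string * 'a) list;           (* variable assignment *)
    }

    let rec eval_term i t =
      match t with
      | V x -> List.assoc x i.vars
      | Fn (f, args) -> i.funcs f (List.map (eval_term i) args)

    (* holds i f computes whether I |= F. *)
    let rec holds i f =
      match f with
      | Top -> true
      | Bot -> false
      | Pred (p, args) -> i.preds p (List.map (eval_term i) args)
      | Neg g -> not (holds i g)
      | Conj (g, h) -> holds i g && holds i h
      | Disj (g, h) -> holds i g || holds i h
      | Implies (g, h) -> (not (holds i g)) || holds i h
      | Equiv (g, h) -> holds i g = holds i h
      | Forall (x, g) ->
          List.for_all
            (fun v -> holds { i with vars = (x, v) :: i.vars } g)
            i.domain
      | Exists (x, g) ->
          List.exists
            (fun v -> holds { i with vars = (x, v) :: i.vars } g)
            i.domain

The interpretation of Example 2.9 can be encoded with domain = [ 0; 1 ] (standing for ◦ and •), funcs implementing f and g, and preds giving "=" the meaning of equality; holds then returns false on ∃x. f (x) = g(x), as argued above.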

2.3 Satisfiability and Validity


A formula F is said to be satisfiable iff there exists an interpretation I
such that I |= F . A formula F is said to be valid iff for all interpretations
I, I |= F . Determining satisfiability and validity of formulae are important
tasks in FOL. Recall that satisfiability and validity are dual: F is valid iff ¬F
is unsatisfiable.
Technically, satisfiability and validity only apply to closed FOL formulae,
which do not have free variables. However, let us agree on a convention: if we
say that a formula F such that free(F ) 6= ∅ is valid, we mean that its universal
closure ∀ ∗ . F is valid; and if we say that it is satisfiable, we mean that its
existential closure ∃ ∗ . F is satisfiable. Duality still holds: a formula F with
free variables is valid (∀ ∗ . F is valid) iff its negation is unsatisfiable (∃ ∗ . ¬F
is unsatisfiable). Henceforth, we freely discuss the validity and satisfiability of
formulae with free variables.
For arguing the validity of FOL formulae, we extend the semantic argu-
ment method from PL to FOL. Most of the concepts carry over to the FOL
case without change. In addition to the rules for the logical connectives of PL
(see Section 1.3), we have the following rules for the quantifiers.
• According to the semantics of universal quantification, from I |= ∀x. F ,
deduce I ⊳ {x 7→ v} |= F for any v ∈ DI .
I |= ∀x. F for any v ∈ DI
I ⊳ {x 7→ v} |= F
In practice, we usually apply this rule using a domain element v that was
introduced earlier in the proof.
• Similarly, from the semantics of existential quantification, from I 6|= ∃x. F ,
deduce I ⊳ {x 7→ v} 6|= F for any v ∈ DI .
I 6|= ∃x. F for any v ∈ DI
I ⊳ {x 7→ v} 6|= F
Again, we usually apply this rule using a domain element v that was in-
troduced earlier in the proof.
• According to the semantics of existential quantification, from I |= ∃x. F ,
deduce I ⊳ {x 7→ v} |= F for some v ∈ DI that has not been previously
used in the proof.
I |= ∃x. F for a fresh v ∈ DI
I ⊳ {x 7→ v} |= F
• Similarly, from the semantics of universal quantification, from I 6|= ∀x. F ,
deduce I ⊳ {x 7→ v} 6|= F for some v ∈ DI that has not been previously
used in the proof.
I 6|= ∀x. F for a fresh v ∈ DI
I ⊳ {x 7→ v} 6|= F

The restriction in the latter two rules corresponds to our intuition: if all we
know is that ∃x. F , then we certainly do not know which value in particular
satisfies F . Hence, we choose a new value v that does not appear previously in
the proof: it was never introduced before by a quantification rule. Moreover,
αI does not already assign it to some constant, αI [a], or to some function
application, αI [f (t1 , . . . , tn )].
Notice the similarity between the first two and between the final two rules.
The first two rules handle a case that is universal in character. Consider the
second rule: if there does not exist an x such that F , then for all values, F
does not hold. The final two rules are existential in character.
Lastly, the contradiction rule is modified for the FOL case.
• A contradiction exists if two variants of the original interpretation I dis-
agree on the truth value of an n-ary predicate p for a given tuple of domain
values.
J : I ⊳ · · · |= p(s1 , . . . , sn )
K : I ⊳ · · · 6|= p(t1 , . . . , tn ) for i ∈ {1, . . . , n}, αJ [si ] = αK [ti ]
I |= ⊥

The intuition behind the contradiction rule is the following. The variants J
and K are constructed only through the rules for quantification. Hence, the
truth value of p on the given tuple of domain values is already established
by I. Therefore, the disagreement between J and K on the truth value of p
indicates a problem with I.
None of these rules cause branching, but several of the rules for the logical
connectives do. Thus, a proof in general is a tree. A branch is closed if it
contains a contradiction according to the (first-order) contradiction rule; it is
open otherwise. All branches are closed in a finished proof of a valid formula.
We exhibit the proof method through several examples.
Example 2.10. We prove that

F : (∀x. p(x)) → (∀y. p(y))

is valid. Suppose not; then there is an interpretation I such that I 6|= F :

1. I 6|= F assumption
2. I |= ∀x. p(x) 1 and semantics of →
3. I 6|= ∀y. p(y) 1 and semantics of →
4. I ⊳ {y 7→ v} 6|= p(y) 3 and semantics of ∀, for some v ∈ DI
5. I ⊳ {x 7→ v} |= p(x) 2 and semantics of ∀

Lines 2 and 3 state the case in which line 1 holds: the antecedent and conse-
quent of F are respectively true and false under I. Line 4 states that because
of 3, there must be a value v ∈ DI such that I ⊳ {y 7→ v} 6|= p(y). Line 5 uses
this same value v and the semantics of ∀ with 2 to derive a contradiction:
under I, p(v) is false by 4 and true by 5. Thus, F is valid. 

For concision we shorten, for example, “semantics of ∀” to “∀” in the


explanation column of our arguments.
Example 2.11. Consider the following relation between universal and exis-
tential quantification:
F : (∀x. p(x)) ↔ (¬∃x. ¬p(x)) .
Is it valid? Suppose not. Then there is an interpretation I such that I 6|= F .
Consider the forward (→) and backward (←) directions of ↔ as separate
cases. In the first case,
1. I |= ∀x. p(x) assumption
2. I 6|= ¬∃x. ¬p(x) assumption
3. I |= ∃x. ¬p(x) 2 and ¬
4. I ⊳ {x 7→ v} |= ¬p(x) 3 and ∃, for some v ∈ DI
5. I ⊳ {x 7→ v} |= p(x) 1 and ∀
Lines 4 and 5 are contradictory. In line 5, we use the value introduced in line
4 with the semantics of ∀ and line 1. We are allowed to choose this same value
v precisely because line 1 states that for all x, p(x).
For the second case,
1. I 6|= ∀x. p(x) assumption
2. I |= ¬∃x. ¬p(x) assumption
3. I ⊳ {x 7→ v} 6|= p(x) 1 and ∀, for some v ∈ DI
4. I 6|= ∃x. ¬p(x) 2 and ¬
5. I ⊳ {x 7→ v} 6|= ¬p(x) 4 and ∃
6. I ⊳ {x 7→ v} |= p(x) 5 and ¬
Lines 3 and 6 are contradictory. Line 4 says that ∃x. ¬p(x) is false under I.
Thus, by the semantics of ∃, no value w from DI is such that p(w) is true. In
particular, line 5 identifies v, introduced in line 3.
As both cases end in contradictions for arbitrary interpretation I, F is
valid. 
It is sometimes useful to reference known values, as the following simple
example illustrates.
Example 2.12. To prove that
F : p(a) → ∃x. p(x)
is valid, assume otherwise and derive a contradiction.
1. I 6|= F assumption
2. I |= p(a) 1 and →
3. I 6|= ∃x. p(x) 1 and →
4. I ⊳ {x 7→ αI [a]} 6|= p(x) 3 and ∃
5. I |= ⊥ 2, 4

In line 4, we used the value assigned to a to instantiate the quantified variable


of line 3, which has universal character. Because lines 2 and 4 are contradic-
tory, F is valid. 
To show that a formula F is invalid, it suffices to find an interpretation I
such that I |= ¬F .

Example 2.13. Consider the formula

F : (∀x. p(x, x)) → (∃x. ∀y. p(x, y)) .

To show that it is invalid, we find an interpretation I such that

I |= ¬((∀x. p(x, x)) → (∃x. ∀y. p(x, y))) ,

or, according to the semantics of →,

I |= (∀x. p(x, x)) ∧ ¬(∃x. ∀y. p(x, y)) .

Choose

DI = {0, 1}

and

pI = {(0, 0), (1, 1)} .

We use a common notation for defining relations: pI (a, b) is true iff (a, b) ∈ pI .
Here, pI (0, 0) is true, and pI (1, 0) is false.
Both ∀x. p(x, x) and ¬(∃x. ∀y. p(x, y)) evaluate to true under I, so

I |= (∀x. p(x, x)) ∧ ¬(∃x. ∀y. p(x, y)) ,

which shows that F is invalid. Interpretation I is a falsifying interpretation


of F . 

We apply the semantic argument method to more examples in Section 3.1.


Equivalence (F1 ⇔ F2 ) and implication (F1 ⇒ F2 ) extend directly from
PL to FOL. Equivalence of and implication between two formulae can be
argued using the semantic argument method. See, for example, Example 2.11.

2.4 Substitution
Substitution for FOL is more complex than substitution for PL because of
quantification. We introduce two types of substitution in this section with
the goal of generalizing Propositions 1.15 and 1.17 to the FOL setting. As in
PL, substitution allows us to consider the validity of entire sets of formulae
simultaneously.

First, we define the renaming of a quantified variable. If variable x is


quantified in F so that F has the form F [∀x. G[x]], then the renaming of x to
fresh variable x′ produces the formula F [∀x′ . G[x′ ]]. A “fresh variable” is any
variable that does not occur in F . By the semantics of universal quantification,
the original and final formulae are equivalent. The case is similar for existential
quantification. Often, we simply say that a variable is renamed to mean that
its bound occurrences are renamed to a fresh variable. Free occurrences of
variables are never renamed.
Example 2.14. Renaming the bound variable x to fresh variable x′ in
F : p(x) ∧ ∀x. q(x, y)
produces
F ′ : p(x) ∧ ∀x′ . q(x′ , y) .
Renaming y does not cause any change because y does not occur bound. 
A substitution is a map from FOL formulae to FOL formulae:
σ : {F1 7→ G1 , . . . , Fn 7→ Gn } .
As in PL, domain(σ) = {F1 , . . . , Fn } and range(σ) = {G1 , . . . , Gn }. To com-
pute the application of σ to F , F σ, replace each occurrence of Fi in F by Gi
simultaneously. When both subformulae Fj and Fk are in the domain of σ
and Fk is a strict subformula of Fj , replace occurrences of Fj by Gj .
Example 2.15. Consider formula
F : (∀x. p(x, y)) → q(f (y), x)
and substitution
σ : {x 7→ g(x), y 7→ f (x), q(f (y), x) 7→ ∃x. h(x, y)} .
Then
F σ : (∀x. p(g(x), f (x))) → ∃x. h(x, y) .
Notice how there are more bound occurrences of x in F σ than in F . 
Use care when substitutions include quantifiers in the domain. Substituting
for a quantified subformula ∀x. F requires that all of ∀x. F be replaced.
Example 2.16. Consider formula
F : ∃y. p(x, y) ∧ p(y, x)
and substitution
σ : {∃y. p(x, y) 7→ p(x, a)} ,
where a is a constant. F σ = F because the scope of the quantifier ∃y in F is
p(x, y) ∧ p(y, x), not just p(x, y). 

2.4.1 Safe Substitution

A restricted application of substitution has a useful semantic property.


Define for a substitution σ its set of free variables:
Vσ = ⋃i (free(Fi ) ∪ free(Gi )) .

Vσ consists of the free variables of all formulae Fi and Gi of the domain and
range of σ. Compute the safe substitution F σ of formula F as follows:
1. For each quantified variable x in F such that x ∈ Vσ , rename x to a fresh
variable to produce F ′ .
2. Compute F ′ σ.

Example 2.17. Consider again formula

F : (∀x. p(x, y)) → q(f (y), x)

and substitution

σ : {x 7→ g(x), y 7→ f (x), q(f (y), x) 7→ ∃x. h(x, y)} .

To compute the safe substitution F σ, first compute

Vσ = free(x) ∪ free(g(x)) ∪ free(y) ∪ free(f (x))


∪ free(q(f (y), x)) ∪ free(∃x.h(x, y))
= {x} ∪ {x} ∪ {y} ∪ {x} ∪ {x, y} ∪ {y}
= {x, y}

Then
1. x ∈ Vσ , so rename bound occurrences in F :

F ′ : (∀x′ . p(x′ , y)) → q(f (y), x) .

x also occurs free in F .


2. F ′ σ : (∀x′ . p(x′ , f (x))) → ∃x. h(x, y).


The first step of computing a safe substitution becomes trivial if each


quantified variable has a unique name.

Example 2.18. Consider formula

F : (∀z. p(z, y)) → q(f (y), x) ,

in which the quantified variable has a different name than any free variable
of F or the substitution

σ : {x 7→ g(x), y 7→ f (y), q(f (y), x) 7→ ∃w. h(w, y)} .

Compared to Example 2.17, the quantified variable z in F and the quantified


variable w of σ have different names than any other variable of F or σ. The
safe substitution is the unrestricted substitution

F σ : (∀z. p(z, f (y))) → ∃w. h(w, y) .

Proposition 2.19 (Substitution of Equivalent Formulae). Consider


substitution

σ : {F1 7→ G1 , . . . , Fn 7→ Gn }

such that for each i, Fi ⇔ Gi . Then F ⇔ F σ when F σ is computed as a safe


substitution.

The language of Propositions 2.19 and 1.15 is almost identical.

2.4.2 Schema Substitution

Example 2.11 proves an interesting relation between universal and existential


quantification (∀x. p(x) ⇔ ¬∃x. ¬p(x)), but the result is not general. We
would like to prove that for any FOL formula F ,

H : (∀x. F ) ↔ (¬∃x. ¬F )

is valid. H is a formula schema (plural: schemata). Formula schema and


schema substitutions provide the desired generality.
A formula schema H contains at least one placeholder F1 , F2 , . . .. For
example, F is a placeholder in the formula schema H above. A formula schema
can also have side conditions that specify that certain variables do not occur
free in the placeholders.
Consider a substitution σ mapping placeholders to FOL formulae. A
schema substitution is an (unrestricted) application of σ to a formula
schema. It is legal only if σ obeys the side conditions of the formula schema.

Example 2.20. Recall from Example 2.11 that

(∀x. p(x)) ↔ (¬∃x. ¬p(x))

is valid. It can act as a formula schema. First, rewrite the formula using
placeholders:

H : (∀x. F ) ↔ (¬∃x. ¬F ) .

H does not have any side conditions. Next, to prove the validity of

G : (∀x. ∃y. q(x, y)) ↔ (¬∃x. ¬∃y. q(x, y)) ,

show that G is derivable from H via a schema substitution:

σ : {F 7→ ∃y. q(x, y)} .

Then Hσ = G (Hσ is syntactically identical to G), so that by Proposition


2.25 below, G is valid. 
A formula schema H is valid if Hσ is valid for every schema substitution
σ that obeys the side conditions of H. Apply the semantic method to prove
the validity of a formula schema.
Example 2.21. To prove the validity of the formula schema

H : (∀x. F1 ∧ F2 ) ↔ (∀x. F1 ) ∧ (∀x. F2 ) ,

consider the two directions. First, assume that I 6|= (∀x. F1 ∧F2 ) → (∀x. F1 ) ∧
(∀x. F2 ):

1. I |= ∀x. F1 ∧ F2 assumption
2. I |6 = (∀x. F1 ) ∧ (∀x. F2 ) assumption
3. I |= (∃x. ¬F1 ) ∨ (∃x. ¬F2 ) 2, ¬

There are two cases to consider:


4a. I |= ∃x. ¬F1 3, ∨
5a. I ⊳ {x 7→ v} |= ¬F1 4a, ∃, for some v ∈ DI
6a. I ⊳ {x 7→ v} |= F1 ∧ F2 1, ∀
7a. I ⊳ {x 7→ v} |= F1 6a, ∧

ending in a contradiction. The second disjunctive case is similar.


For the second main case, assume that I 6|= (∀x. F1 ) ∧ (∀x. F2 ) →
(∀x. F1 ∧ F2 ):

1. I 6|= ∀x. F1 ∧ F2 assumption


2. I |= (∀x. F1 ) ∧ (∀x. F2 ) assumption
3. I |= ∃x. ¬F1 ∨ ¬F2 1, ¬
4. I ⊳ {x 7→ v} |= ¬F1 ∨ ¬F2 3, ∃, for some v ∈ DI
5. I |= ∀x. F1 2, ∧
6. I |= ∀x. F2 2, ∧
7. I ⊳ {x 7→ v} |= F1 5, ∀
8. I ⊳ {x 7→ v} |= F2 6, ∀

Again, there are two cases to consider:

9a. I ⊳ {x 7→ v} |= ¬F1 4, ∨

ending in a contradiction. The second disjunctive case is similar. Thus, H is


a valid formula schema. 

We now consider formula schemata with side conditions.

Example 2.22. Consider the formula schema with side condition

H : (∀x. F ) ↔ F provided x 6∈ free(F ) .

If we disregard the side condition, then H is an invalid formula schema as, for
example,

G1 : (∀x. p(x)) ↔ p(x) ,

obtained from H by schema substitution

σ : {F 7→ p(x)} ,

is invalid. However, σ is disallowed by the side condition that x should not


occur free in F .
H (with the side condition) is a valid formula schema. Thus, the formula

G2 : (∀x. ∃y. p(z, y)) ↔ ∃y. p(z, y)

is valid: obtain it via the schema substitution

σ : {F 7→ ∃y. p(z, y)} ,

which obeys H’s side condition. 

Reasoning about the validity of a formula schema with side conditions


usually requires invoking its side conditions during a semantic argument.

Example 2.23. To prove the validity of

H : (∀x. F ) ↔ F provided x 6∈ free(F ) ,

consider the two directions of ↔. First,

1. I |= ∀x. F assumption
2. I 6|= F assumption
3. I |= F 1, ∀, since x 6∈ free(F )
4. I |= ⊥ 2, 3

Second,

1. I 6|= ∀x. F assumption


2. I |= F assumption
3. I |= ∃x. ¬F 1 and ¬
4. I |= ¬F 3, ∃, since x 6∈ free(F )
5. I |= ⊥ 2, 4

Thus, H is a valid formula schema. 



Example 2.24. To prove the validity of


H : (∀x. F1 ∧ F2 ) ↔ (∀x. F1 ) ∧ F2 provided x 6∈ free(F2 ) ,
consider two cases. First,
1. I |= ∀x. F1 ∧ F2 assumption
2. I 6|= (∀x. F1 ) ∧ F2 assumption
3. I |= (∀x. F1 ) ∧ (∀x. F2 ) 1, valid schema from Example 2.21
4. I |= (∀x. F1 ) ∧ F2 3, Example 2.23, since x 6∈ free(F2 )
5. I |= ⊥ 2, 4
Observe the application of valid formula schemata from previous examples in
lines 3 and 4. Second,
1. I 6|= ∀x. F1 ∧ F2 assumption
2. I |= (∀x. F1 ) ∧ F2 assumption
3. I |= (∀x. F1 ) ∧ (∀x. F2 ) 2, Example 2.23, since x 6∈ free(F2 )
4. I |= ∀x. F1 ∧ F2 3, Example 2.21
5. I |= ⊥ 1, 4
Thus, H is a valid formula schema. 
Proposition 2.25 (Formula Schema). If H is a valid formula schema and
σ is a substitution obeying H’s side conditions, then Hσ is also valid.
The valid PL formula
(P → Q) ↔ (¬P ∨ Q)
can be treated as a valid formula schema:
(F1 → F2 ) ↔ (¬F1 ∨ F2 ) .
In general, valid propositional templates are valid formula schemata, so that
Proposition 2.25 generalizes Proposition 1.17.

2.5 Normal Forms


The normal forms of PL extend to FOL. A FOL formula F can be trans-
formed into negation normal form (NNF) by using the procedure of Section
1.6 augmented with these two (schema) equivalences:
¬∀x. F [x] ⇔ ∃x. ¬F [x] ¬∃x. F [x] ⇔ ∀x. ¬F [x] .
Example 2.26. We apply the procedure to find a formula in NNF that is
equivalent to
G : ∀x. (∃y. p(x, y) ∧ p(x, z)) → ∃w.p(x, w) .
Each formula below is equivalent to G and is obtained from the previous one
through an application of an equivalence.

1. ∀x. (∃y. p(x, y) ∧ p(x, z)) → ∃w. p(x, w)


2. ∀x. ¬(∃y. p(x, y) ∧ p(x, z)) ∨ ∃w. p(x, w)
3. ∀x. (∀y. ¬(p(x, y) ∧ p(x, z))) ∨ ∃w. p(x, w)
4. ∀x. (∀y. ¬p(x, y) ∨ ¬p(x, z)) ∨ ∃w. p(x, w)
Formula 2 follows from the equivalence

F1 → F2 ⇔ ¬F1 ∨ F2 .

Formula 3 arises from an application of

¬∃x. F [x] ⇔ ∀x. ¬F [x] ,

and the final formula, which is in NNF, follows from De Morgan’s Law. 
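The NNF transformation is easily mechanized. The following Python sketch applies
the two quantifier equivalences above together with the propositional equivalences
of Section 1.6 to push negations inward; the tuple-based representation of formulae
(covering only ¬, ∧, ∨, →, ∀, and ∃) and the function name nnf are choices made for
this illustration, not notation from the text.

# A sketch of NNF conversion, assuming formulae are nested tuples:
# ('atom', s), ('not', F), ('and', F, G), ('or', F, G), ('implies', F, G),
# ('forall', x, F), ('exists', x, F).

def nnf(f):
    """Return an equivalent formula in negation normal form."""
    op = f[0]
    if op == 'atom':
        return f
    if op in ('and', 'or'):
        return (op, nnf(f[1]), nnf(f[2]))
    if op == 'implies':                    # F1 -> F2  is equivalent to  ~F1 \/ F2
        return ('or', nnf(('not', f[1])), nnf(f[2]))
    if op in ('forall', 'exists'):
        return (op, f[1], nnf(f[2]))
    g = f[1]                               # op == 'not': push the negation inward
    if g[0] == 'atom':
        return f                           # a literal is already in NNF
    if g[0] == 'not':                      # ~~F  is equivalent to  F
        return nnf(g[1])
    if g[0] == 'and':                      # De Morgan's Law
        return ('or', nnf(('not', g[1])), nnf(('not', g[2])))
    if g[0] == 'or':                       # De Morgan's Law
        return ('and', nnf(('not', g[1])), nnf(('not', g[2])))
    if g[0] == 'implies':                  # ~(F1 -> F2)  is equivalent to  F1 /\ ~F2
        return ('and', nnf(g[1]), nnf(('not', g[2])))
    if g[0] == 'forall':                   # ~forall x. F  is equivalent to  exists x. ~F
        return ('exists', g[1], nnf(('not', g[2])))
    return ('forall', g[1], nnf(('not', g[2])))   # ~exists x. F

# Applied to G of Example 2.26, nnf produces formula 4 of the example (as a tuple).
G = ('forall', 'x',
     ('implies',
      ('exists', 'y', ('and', ('atom', 'p(x,y)'), ('atom', 'p(x,z)'))),
      ('exists', 'w', ('atom', 'p(x,w)'))))
print(nnf(G))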

A formula is in prenex normal form (PNF) if all of its quantifiers


appear at the beginning of the formula:

Q1 x1 . . . . Qn xn . F [x1 , . . . , xn ] ,

where Qi ∈ {∀, ∃} and F is quantifier-free. Every FOL formula F can be


transformed into an equivalent formula F ′ in PNF. To compute an equivalent
PNF F ′ of F ,
1. Convert F into NNF formula F1 .
2. When multiple quantified variables have the same name, rename them to
fresh variables, resulting in F2 .
3. Remove all quantifiers from F2 to produce quantifier-free formula F3 .
4. Add the quantifiers before F3 ,

F4 : Q1 x1 . . . . Qn xn . F3 ,

where the Qi are the quantifiers such that if Qj is in the scope of Qi in


F1 , then i < j.
F4 is equivalent to F .

Example 2.27. We apply the procedure to find a PNF equivalent of

F : ∀x. ¬(∃y. p(x, y) ∧ p(x, z)) ∨ ∃y. p(x, y) .

1. Write F in NNF:

F1 : ∀x. (∀y. ¬p(x, y) ∨ ¬p(x, z)) ∨ ∃y. p(x, y) .

2. Rename quantified variables:

F2 : ∀x. (∀y. ¬p(x, y) ∨ ¬p(x, z)) ∨ ∃w. p(x, w) .



3. Remove all quantifiers to produce quantifier-free formula

F3 : ¬p(x, y) ∨ ¬p(x, z) ∨ p(x, w) .

4. Add the quantifiers before F3 :

F4 : ∀x. ∀y. ∃w. ¬p(x, y) ∨ ¬p(x, z) ∨ p(x, w) .

Alternately, choose order

F4′ : ∀x. ∃w. ∀y. ¬p(x, y) ∨ ¬p(x, z) ∨ p(x, w) .

Both F4 and F4′ are equivalent to F . However,

G : ∀y. ∃w. ∀x. ¬p(x, y) ∨ ¬p(x, z) ∨ p(x, w)

is not equivalent to F .

A FOL formula is in CNF (DNF) if it is in PNF and its main quantifier-
free subformula is in CNF (DNF). CNF and DNF equivalents are obtained by
transforming formula F into PNF formula F ′ and then applying the relevant
procedure of Section 1.6 to the main quantifier-free subformula of F ′ .
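For example, the PNF formula ∀x. ∃y. (p(x) ∧ q(y)) ∨ r(x) has the CNF equivalent
∀x. ∃y. (p(x) ∨ r(x)) ∧ (q(y) ∨ r(x)), obtained by distributing ∨ over ∧ in the
quantifier-free subformula.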

2.6 Decidability and Complexity


We review the main concepts from decidability and complexity theory. Bibli-
ographic Remarks refers the interested reader to other texts that focus on
these topics.

2.6.1 Satisfiability as a Formal Language

Satisfiability of formulae is the primary decision problem in logic. We can


formalize satisfiability as a formal language decision problem. Consider PL.
Let LPL be the set of all satisfiable formulae. That is, the word w ∈ LPL iff
1. w is a syntactically well-formed formula: it parses according to the defi-
nition of Section 1.1;
2. and when w is viewed as a PL formula F , F is satisfiable.
Then the formal decision problem is the following: given a word w, is w ∈ LPL ?
Satisfiability of FOL formulae can be similarly formalized as a language
question. Let LFOL be the set of all FOL formulae (words that parse according
to Section 2.1) that are satisfiable. Then the formal decision problem is the
following: given a word w, is w ∈ LFOL ? In other words, is it a well-formed
FOL formula, and if so, is it satisfiable? Dually, we can define the validity
problem for PL and FOL.

2.6.2 Decidability

A formal language L is decidable if there exists a procedure that, given a


word w, (1) eventually halts and (2) answers yes if w ∈ L and no if w 6∈
L. Other terms for “decidable” are recursive and Turing-decidable. A
procedure for a decidable language is called an algorithm. Satisfiability of
PL formulae is decidable: the truth-table method is a decision procedure.
A formal language is undecidable if it is not decidable.
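To make the notion of an algorithm concrete, the following Python sketch implements
the truth-table method as a decision procedure for PL satisfiability: it enumerates
every interpretation of the variables of a formula and evaluates the formula under
each, so it always halts with yes or no. The tuple-based formula representation
(covering ¬, ∧, and ∨) is an assumption of this sketch rather than notation from the
text.

from itertools import product

# PL formulae as tuples: ('var', 'P'), ('not', F), ('and', F, G), ('or', F, G).

def variables(f):
    """Collect the propositional variables occurring in f."""
    if f[0] == 'var':
        return {f[1]}
    return set().union(*(variables(g) for g in f[1:]))

def evaluate(f, interp):
    """Truth value of f under interp, a map from variables to truth values."""
    op = f[0]
    if op == 'var':
        return interp[f[1]]
    if op == 'not':
        return not evaluate(f[1], interp)
    if op == 'and':
        return evaluate(f[1], interp) and evaluate(f[2], interp)
    return evaluate(f[1], interp) or evaluate(f[2], interp)    # op == 'or'

def satisfiable(f):
    """Truth-table method: try all 2^n interpretations; always terminates."""
    vs = sorted(variables(f))
    return any(evaluate(f, dict(zip(vs, vals)))
               for vals in product([False, True], repeat=len(vs)))

# P /\ (~P \/ Q) is satisfiable; P /\ ~P is not.
print(satisfiable(('and', ('var', 'P'), ('or', ('not', ('var', 'P')), ('var', 'Q')))))
print(satisfiable(('and', ('var', 'P'), ('not', ('var', 'P')))))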
A formal language L is semi-decidable if there exists a procedure that,
given a word w, (1) halts and answers yes iff w ∈ L, (2) halts and answers no if
w 6∈ L, or (3) does not halt if w 6∈ L. The possible outcomes (2) and (3) for the
case w 6∈ L mean that the procedure may or may not halt. Unlike a decidable
language, the procedure is only guaranteed to halt if w ∈ L. Other terms for
“semi-decidable” are partially decidable, recursively enumerable, and
Turing-recognizable.
The terms “Turing-decidable” and “Turing-recognizable” arise from Alan
Turing’s classic formalization of procedures as Turing machines. A Turing
machine consists of a finite automaton coupled with an infinite tape and tape
head. Each cell of the tape can hold one character from a finite alphabet.
The state of the automaton changes based on its control structure and the
character currently under the tape head. During a state change, the automaton
instructs the tape head to write a new character to the current cell and to
move one position left or right.
Church and Turing showed that LFOL is undecidable: there does not exist
an algorithm for deciding if a FOL formula F is satisfiable (and similarly
for validity). However, there is a procedure that halts and says yes if F is
valid, so validity is semi-decidable. We describe such a procedure based on
the semantic argument method in Section 2.7.


2.6.3 Complexity

If a language is decidable, then one considers the complexity of the decision


problem. We define several of the main complexity classes here. A language
L is polynomial-time decidable, or in PTIME (also, in P), if there exists a
procedure that, given w, answers yes when w ∈ L, answers no when w 6∈ L,
and halts with the answer in a number of steps that is at most proportionate
to some polynomial of the size of w. For example, determining if the word
w is a well-formed FOL formula is polynomial-time decidable and can be
implemented using standard parsing methods.
A language L is nondeterministic-polynomial-time decidable, or in
NPTIME (also, in NP), if there exists a nondeterministic procedure that, given
w,
1. guesses a witness W to the fact that w ∈ L that is at most proportionate
in size to some polynomial of the size of w;

2. checks in time at most proportionate to some polynomial of the size of w


that W really is a witness to w ∈ L;
3. and answers yes if the check succeeds and no otherwise.
For example, LPL is in NP, as exhibited by the following nondeterministic
procedure for deciding if a PL formula is satisfiable:
1. parse the input w as formula F (return no if w is not a well-formed PL
formula);
2. guess an interpretation I, which is linear in the size of w;
3. check that I |= F .
I is the witness to the satisfiability of F .
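For example, given F : P ∧ (¬P ∨ Q), the procedure may guess the interpretation
I : {P 7→ true, Q 7→ true}; the check that I |= F is a single bottom-up evaluation
of F and thus takes time linear in the size of F .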
A language L is in co-NP if its complement language L is in NP. For
example, unsatisfiability of PL formulae is in co-NP because satisfiability is
in NP. It is not known if unsatisfiability of PL formulae is in NP. While
a satisfiable PL formula has a polynomial size witness of its satisfiability
(a satisfying interpretation I), there is no known polynomial size witness of
unsatisfiability.
A language L is NP-hard if every instance v ∈ L′ of every other NP decidable
language L′ can be reduced to deciding an instance w^v_{L′} ∈ L. Moreover,
the size of w^v_{L′} must be at most proportionate to some polynomial of the size
of v. That is, L is NP-hard if every query v ∈ L′ of every NP language L′
can be encoded into a query w ∈ L, where w is not much larger than v. L is
NP-complete if it is in NP and is NP-hard.
LPL is NP-complete. Indeed, LPL , also called SAT, was the first language
shown to be NP-complete. We proved that LPL is in NP above by describing a
nondeterministic polynomial time procedure. The Cook-Levin theorem shows
that all NP languages L can be reduced to LPL , so that LPL is NP-hard. Cook and
Levin exhibited a polynomial time algorithm that, given L and input w, constructs
an encoding into PL of a simulation of a run of a nondeterministic Turing
machine for L on w. The encoding as PL formula F has length that is poly-
nomial in the length of w. F is satisfiable iff the Turing machine decides that
w ∈ L.
For discussing the complexity of decision problems and algorithms, we use
the standard notation. Consider a function f (n) over integers. For example,
f (n) = log n, f (n) = n2 , f (n) = 2n , or f (n) = 2^(2^n) . A function g(n) is of at
most order f (n) if there exist a scalar c ≥ 0 and an integer n0 ≥ 0 such that

∀n ≥ n0 . g(n) ≤ cf (n) .

O(f (n)) denotes the set of all functions of at most order f (n). Similarly,
Ω(f (n)) denotes the set of all functions of at least order f (n): a function g(n)
is of at least order f (n) if there exist a scalar c ≥ 0 and an integer n0 ≥ 0
such that

∀n ≥ n0 . g(n) ≥ cf (n) .

Finally, Θ(f (n)) = Ω(f (n)) ∩ O(f (n)) denotes the set of all functions of
precisely order f (n).

Example 2.28.
• 3n2 + n ∈ O(n2 )
• 3n2 + n ∈ Ω(n2 )
• 3n2 + n ∈ Θ(n2 )
• (1/99)n2 + n ∈ Ω(n2 )
• 3n2 + n ∈ O(2n )
• 3n2 + n ∈ Ω(n)
• 3n2 + n 6∈ Ω(2n )
• 3n2 + n 6∈ Θ(2n )
• 2n ∈ Ω(n3 )
• 2n 6∈ O(n3 )
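For instance, the first claim holds with c = 4 and n0 = 1: for n ≥ 1, n ≤ n · n,
so 3n2 + n ≤ 4n2 . The second claim holds with c = 1 and n0 = 0, and the third
follows from the first two.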


A decision problem has time complexity O(f (n)) if there exists a decision
algorithm P for the problem and a function g(n) ∈ O(f (n)) such that P
runs in time at most g(n) on input of size n. A decision problem has time
complexity Ω(f (n)) if there exists a function g(n) ∈ Ω(f (n)) such that all
decision algorithms P for the problem run in time at least g(n) on input of
size n. Finally, a decision problem has time complexity Θ(f (n)) if it has time
complexities Ω(f (n)) and O(f (n)).

Example 2.29. The algorithm sat for deciding PL satisfiability runs in time
Θ(2n ), where n is the number of variables in the input formula, because each
level of recursion branches. Hence, the problem of PL satisfiability has time
complexity O(2n ). 
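Whether PL satisfiability also has time complexity Ω(2n ) is not known: a
polynomial-time decision algorithm for LPL would imply that P = NP.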


2.7 Meta-Theorems of First-Order Logic
We prove that the semantic argument method for FOL is sound and, given
a proper strategy of applying the proof rules, complete. A proof method is
sound if every formula that has a proof according to the method is valid. A
proof method is complete if every formula that is valid has a proof according
to the method. That the semantic argument method is sound means that a
closed semantic argument for I 6|= F proves the validity of F ; and that the
semantic argument method is complete means that every valid formula F
of FOL has a closed semantic argument proving its validity. Because there
exists a complete proof method for FOL, FOL is a complete logic: every valid
formula of FOL has a proof of its validity.
The second half of this section is devoted to proving two classic theorems
that we apply in Chapter 10.


2.7.1 Simplifying the Language of FOL

In preparation for the proofs, we simplify the language of FOL without los-
ing expressiveness. Exercises 1.3 and 4.6 show that we have many redundant
logical connectives. We choose to use only the logical constant ⊤ and the con-
nectives ¬ and ∧, from which the others can be constructed. Additionally, we
need only one quantifier since ∃x. F is equivalent to ¬∀x. ¬F . We choose ∀.
A second simplification is more involved. The goal is to remove constant
and function symbols from the language by using predicate symbols instead.
Given a formula F , let S be the set of function symbols appearing in it.
Associate with each n-ary function symbol f of S a new (n + 1)-ary predicate
pf . Then for each occurrence of a function f in a literal L of F

L[f (t1 , . . . , tn )] ,

replace L in F with the new formula

∃x. pf (t1 , . . . , tn , x) ∧ L[x] .

After all replacements, the resulting formula G does not contain any function
symbols.
The next step ensures that the new predicate pf describes a function f : it
associates with each tuple of domain values v1 , . . . , vn precisely one value v.
For each introduced predicate pf , construct the formula

If : ∀x. ∃y. pf (x, y) ∧ (∀z. pf (x, z) → y = z) .

Then construct the formula


 
H : ( ⋀_{f ∈ S} If ) → G .

The equality predicate = is not yet defined. To make = an equivalence


relation, assert that it is reflexive, symmetric, and transitive:

(∀x. x = x)
E: ∧ (∀x, y. x = y → y = x)
∧ (∀x, y, z. x = y ∧ y = z → x = z)

Additionally, every predicate symbol p appearing in G should obey =:

Ep : ∀x, y. x = y → (p(x) ↔ p(y))

Let T be the set of predicate symbols of G. Construct the final formula


 
F ′ : ( E ∧ ⋀_{p ∈ T} Ep ) → H .

F ′ is valid iff F is valid. Moreover, F ′ does not contain any function symbols.
For the special case of constant symbols, it is simpler to replace F [a] with
F ′ : ∀x. F [x].
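For instance, consider F : q(f (a)). Replacing the occurrence of f in the literal
q(f (a)) yields G : ∃x. pf (a, x) ∧ q(x), so that H is If → G, and the function-free
formula is (E ∧ Epf ∧ Eq ) → H. The constant a is then eliminated by the simpler
replacement above: substitute a fresh variable w for a and universally quantify
over w.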
For the remainder of this section, we consider a version of FOL with only
the logical constant ⊤, the connectives ¬ and ∧, the quantifier ∀, and predicate
symbols. It is equivalent in expressive power to the richer language studied
earlier in the chapter.

2.7.2 Semantic Argument Proof Rules


In this simplified form of FOL, we need only the following seven proof rules.
• For handling negation:
I |= ¬F I 6|= ¬F
I 6|= F I |= F
• For handling conjunction:
I |= F ∧ G I 6|= F ∧ G
I |= F I 6|= F | I 6|= G
I |= G
• For handling universal quantification:
I |= ∀x. F for any v ∈ DI
I ⊳ {x 7→ v} |= F
and
I 6|= ∀x. F for a fresh v ∈ DI
I ⊳ {x 7→ v} 6|= F
• For deriving a contradiction:
J : I ⊳ · · · |= p(x1 , . . . , xn )
K : I ⊳ · · · 6|= p(y1 , . . . , yn ) for i ∈ {1, . . . , n}, αJ [xi ] = αK [yi ]
I |= ⊥
Only the second rule for conjunction requires a case analysis.

2.7.3 Soundness and Completeness


That the semantic argument method is sound is fairly obvious for the first six
rules: each follows almost directly from the semantics of FOL. The final rule
for deriving a contradiction requires some explanation. The variants J and
K are constructed only through the rules for handling quantification so that
they simply assign values to the arguments of p. Hence, the truth value of p
on the given tuple of domain values is already established by I. Furthermore,
the disagreement between J and K on the truth value of p indicates that I is
not in fact an interpretation. Therefore, we have the following theorem.


Theorem 2.30 (Sound). If every branch of a semantic argument proof of


I 6|= F closes, then F is valid.

Completeness is more complicated. We want to show that there exists a


closed semantic argument proof of I 6|= F when F is valid. Our strategy is as
follows. We define a procedure for applying the proof rules. When applying
the quantification rules, the procedure selects values from a predetermined
countably infinite domain. We then show that when some falsifying interpre-
tation I exists (such that I 6|= F ) our procedure constructs, at the limit, a
falsifying interpretation. Therefore, F must be valid if the procedures actu-
ally discovers an argument in which all branches are closed. We now proceed
according to this proof plan.
Let D be a countably infinite domain of values v1 , v2 , v3 , . . . which we
can enumerate in some fixed order. Start the semantic argument by placing
I 6|= F at the root and marking it as unused. Now assume that the procedure
has constructed a partial semantic argument and that each line is marked as
either used or unused. We describe the next iteration.
Select the earliest line L : I |= G or L : I 6|= G in the argument that
is marked unused, and choose the appropriate proof rule to apply according
to the root symbol of G’s parse tree. To apply a rule, add the appropriate
deductions at the end of every open branch that passes through line L; mark
each new deduction as unused ; and mark L as used. The application of the
negation rules and the first conjunction rule is then straightforward. Applying
the second (branching) conjunction rule introduces a fork at the end of every
open branch, doubling the number of open branches. In applying the second
quantification rule, choose the next domain element vi that does not appear
in the semantic argument so far. For the first quantification rule, assume that
G has the form ∀x. H. Choose the first value vi on which ∀x. H has not been
instantiated in any ancestor of L. Additionally, consider I |= G as a second
“deduction” of this rule (so that both I ⊳ {x 7→ vi } |= H and I |= G are
added to every branch passing through L and marked as unused ). This trick
guarantees that x of ∀x. H is instantiated on every domain element without
preventing the rest of the proof from progressing. Finally, close any branch
that has a contradiction resulting from a deduction in this iteration.
Recall that a semantic argument is finished if no further applications of
rules are possible. In our proof procedure, this situation occurs when all lines
are marked as used. Although we can never construct a finished semantic
argument with infinitely many lines in practice, we can reason about such
arguments. For example, such an argument has an infinitely long branch. For
suppose not: then every branch has finite length, so there must be an infinite
number of these finite branches. But such a situation requires a deduction
step that results in an infinite number of branches, whereas each proof rule
produces at most two branches. This result is known as König’s Lemma. We
next prove that each open branch of a finished semantic argument describes
a falsifying interpretation.

Lemma 2.31. Each open branch of a finished semantic argument for I 6|= F
defines a falsifying interpretation of F .

Proof. We apply structural induction on the formulae appearing in the branch


to conclude that each line L : I |= G or L : I 6|= G holds, including I 6|= F . In
that case, I is a falsifying interpretation of F . The technique of structural
induction is defined in Section 4.4.
For the base case, consider lines in which G is an atom. As the contradiction
rule is never applied (otherwise, the branch would be closed) no contradiction
exists. Therefore, each instance I ⊳· · · |= p(x1 , . . . , xn ) or I ⊳· · · 6|= p(x1 , . . . , xn )
defines the truth value of p on one tuple of domain elements without contra-
dicting any other definition on the branch. (For tuples of domain elements
on which p is not explicitly defined on the branch, p may take any value, say
false.)
Consider when G is formed by applying a logical connective to one or two
formulae. As the procedure applied the appropriate proof rule, the inductive
hypothesis and the semantics of the logical connectives tell us that L holds.
For example, consider the case L : I |= F1 ∧ F2 . Then I |= F1 and I |= F2
appear on the branch, and both lines hold by the inductive hypothesis. The
reader may verify the other logical connectives with similar reasoning.
Consider the case L : I 6|= ∀x. F . For L to hold, it must be the case
that for some fresh domain value v, M : I ⊳ {x 7→ v} 6|= F . But the procedure
guarantees that M is a descendant of L. Moreover, F is a subformula of ∀x. F ,
so the inductive hypothesis asserts that M holds. Then so does L.
Consider the case L : I |= ∀x. F . For L to hold, it must be the case that
for all domain values v, M : I ⊳ {x 7→ v} |= F . But the procedure guarantees
that such a line exists for every v. Moreover, F is a subformula of ∀x. F , so
the inductive hypothesis asserts that each such line holds. Hence, so does L,
finishing the proof. 

Remark 2.32. The formulae that appear on an open branch of a finished


semantic proof comprise a Hintikka set. The proof strategy that we employed
is essentially that used in proving Hintikka’s Lemma, which asserts that a
Hintikka set is satisfiable.

Remark 2.33. We defined the procedure with a fixed countably infinite do-
main in mind and then proved that an open branch of a finished semantic
argument corresponds to at least one falsifying interpretation. Therefore, we
have proved an additional fact: every satisfiable FOL formula is satisfied by an
interpretation with a countable domain. This result is Löwenheim’s Theo-
rem.

Theorem 2.34 (Complete). The semantic argument method is complete:


each valid formula F has a semantic argument proof (in which every branch
is closed).


Proof. Suppose that F is valid, yet no semantic argument proof exists. Then
a finished semantic argument constructed according to our procedure has an
open branch. By Lemma 2.31, this branch describes a falsifying interpretation
of F , a contradiction. Hence, all branches of a finished semantic argument must
in fact be closed (and thus finite). By König’s Lemma, the semantic argument
itself has finite size. 

2.7.4 Additional Theorems

Is a countable (but possibly infinite) set S of satisfiable formulae simulta-


neously satisfiable? That is, does there exist a single interpretation I that
satisfies every member of S? The Compactness Theorem relates simulta-
neous satisfiability of S to satisfiability of the conjunction of each finite subset
of S. Dually, we might consider whether the disjunction of a countable set of
formulae is valid.

Theorem 2.35 (Compactness Theorem). A countable set of first-order


formulae S is simultaneously satisfiable iff the conjunction of every finite sub-
set is satisfiable.

Proof. The forward direction is clear: if I simultaneously satisfies the members


of S, then it satisfies each finite conjunction.
For the other direction, extend the proof procedure of the previous section
as follows. Arrange the members of S in some order F1 , F2 , F3 , . . ., which is
possible because S is countable. Start the procedure with I 6|= ¬F1 . At the
end of each iteration of the procedure, choose the next formula Fi in the
sequence and append I 6|= ¬Fi to every open branch, marking it as unused.
Since each finite subset of S is satisfiable, at least one branch remains open
and the procedure does not terminate.
A finished semantic argument constructed in this manner enumerates an
interpretation that falsifies every ¬Fi and thus satisfies every Fi . Hence, S is
simultaneously satisfiable. 

Remark 2.36. This proof proves an additional fact that extends Löwenheim’s
Theorem: every simultaneously satisfiable countable set of FOL formulae is
simultaneously satisfied by an interpretation with a countable domain. This
result is the Löwenheim-Skolem Theorem.

We apply the next theorem, the Craig Interpolation Lemma, in Chap-


ter 10. It asserts that if F → G is valid, then there is a formula H (called an
interpolant) such that F → H and H → G are valid and whose predicates
and free variables occur in both F and G. The proof is constructive: it de-
scribes a procedure for extracting the interpolant from a proof of the validity
of F → G. However, proofs constructed via the proof rules of Section 2.7.2
are not directly amenable to the interpolation procedure. We describe an al-
ternate set of proof rules instead and show that any proof constructed from

the rules of 2.7.2 can be translated into a proof using the new rules. Then we
prove the Craig Interpolation Lemma using these new proof rules.
One trick that will prove convenient is the following. Associate a fresh
variable xi with each domain value vi introduced during the proof. Whenever
a variant I ⊳ {x 7→ vi } is used, rename x to the variable xi corresponding to
the value vi in both the variant interpretation and the formula. This renaming
does not affect the soundness of the proof, but it makes contradictions more
obvious.
The new rules are the following:
• For handling double negation:

I |= ¬¬F I 6|= ¬¬F
I |= F I 6|= F

• For handling conjunction:

I |= F ∧ G I 6|= F ∧ G
I |= F I 6|= F | I 6|= G
I |= G

and

I |= ¬(F ∧ G) I 6|= ¬(F ∧ G)
I |= ¬F | I |= ¬G I 6|= ¬F
I 6|= ¬G

• For handling universal quantification:

I |= ∀x. F I 6|= ¬∀x. F for any v ∈ DI


I ⊳ {x 7→ v} |= F I ⊳ {x 7→ v} 6|= ¬F

and
I 6|= ∀x. F I |= ¬∀x. F for a fresh v ∈ DI
I ⊳ {x 7→ v} 6|= F I ⊳ {x 7→ v} |= ¬F

• For deriving a contradiction (recall our trick of renaming variables to cor-


respond uniquely to domain values):

J : I ⊳ · · · |= p(x1 , . . . , xn )
K : I ⊳ · · · 6|= p(x1 , . . . , xn )
I |= ⊥

and
J : I ⊳ · · · |= p(x1 , . . . , xn ) J : I ⊳ · · · 6|= p(x1 , . . . , xn )
K : I ⊳ · · · |= ¬p(x1 , . . . , xn ) K : I ⊳ · · · 6|= ¬p(x1 , . . . , xn )
I |= ⊥ I |= ⊥


The important characteristic (for proving the interpolation lemma) of this set
of proof rules is that premises and deductions agree on the use of |= or 6|=,
except in the contradiction rules. In contrast, the negation rules of Section
2.7.2 do not have this property. We obtained this property by folding each
negation rule into every other rule.
Before proving the interpolation lemma, let us prove that the new semantic
argument proof system based on these rules is sound and complete. Soundness
is fairly obvious; for completeness, we briefly describe how to map a proof from
the system of Section 2.7.2 to a proof using these rules.

Lemma 2.37. Every proof in the proof system of Section 2.7.2 has a corre-
sponding proof in the new proof system.

Proof. In constructing the new proof, ignore any use of the negation rules
of Section 2.7.2, instead choosing from the (doubled) set of conjunction and
quantification rules depending on whether a ¬ is at the root of the parse tree
of a formula. Use the new negation rules to remove double negations when
necessary. For deriving a contradiction, one of the three cases represented by
the contradiction rules must occur when a contradiction occurs in the original
proof. 

We can now prove the theorem.

Theorem 2.38 (Craig Interpolation Lemma). If F → G is valid, then


there exists a formula H such that F → H and H → G are valid and whose
predicates and free variables occur in both F and G.

Proof. We prove the result by describing a procedure that extracts from a


(closed) semantic argument proof of the validity of F → G the interpolant H.
For convenience, let the proof itself begin with the lines

1. I |= F assumption
2. I 6|= G assumption

Notice that with the new set of proof rules, only |= rules will be applied
to deductions stemming from line 1, while only 6|= rules will be applied to
those stemming from line 2. The three contradiction rules correspond to three
possible situations: a contradiction between I |= F and I 6|= G (F → G is
valid), within I |= F itself (F is unsatisfiable), and within I 6|= G itself (G is
valid).
The procedure runs backwards through a proof. It associates with each line
L of the proof a set of positive formulae U and a set of negative formulae V .
U consists of formulae on lines from which L descends (including itself) that
are satisfied by their interpretation (lines of the form K |= F1 ). V consists of
formulae on lines from which L descends (including itself) that are falsified
by their interpretation (lines of the form K 6|= F2 ). Define L’s characteristic
formula as
⋀U → ⋁V ,

written as {U } → {V } for concision. That the branch on which L lies ends in


a contradiction implies that {U } → {V } is valid. The procedure constructs
for each line an interpolant X of {U } → {V }; that is, X is such that
⋀U ⇒ X and X ⇒ ⋁V

and the predicates and free variables of X appear in both U and V . The
interpolant of line 2 of the proof is the interpolant H that we seek.
Let us begin with the end of a branch, L : I |= ⊥. It must have been
deduced via a contradiction. If the first contradiction rule produced L, then
its characteristic formula is of the form

{U, p(x1 , . . . , xn ), ⊥} → {V, p(x1 , . . . , xn )} ,

where the variable renaming trick ensures that the arguments to p are syn-
tactically the same. Its parent has characteristic formula

{U, p(x1 , . . . , xn )} → {V, p(x1 , . . . , xn )} ,

and both have interpolant p(x1 , . . . , xn ). If the second contradiction rule pro-
duced L, then its characteristic formula is of the form

{U, p(x1 , . . . , xn ), ¬p(x1 , . . . , xn ), ⊥} → {V } ,

and its parent’s is of the form

{U, p(x1 , . . . , xn ), ¬p(x1 , . . . , xn )} → {V } .

Both have interpolant ⊥ (¬⊤ in the restricted language). Similarly, if the third
contradiction rule produced L, then the interpolant is ⊤.
Consider lines derived via the conjunction rules. Suppose L : I |= F is
deduced from I |= F ∧ G. Then the characteristic formulae of L and its parent
are

{U, F ∧ G, F } → {V } and {U, F ∧ G} → {V } ,

respectively. If L has interpolant X, then so does its parent. The case is similar
for a line L : I 6|= ¬F deduced from I 6|= ¬(F ∧ G).
For the next conjunction rule, suppose that L : I 6|= F is deduced on one
branch from I 6|= F ∧ G. Then L is at a fork in the proof and has sibling line
L′ : I 6|= G. The characteristic formulae of L, L′ , and their parent are

{U } → {V, F ∧ G, F } , {U } → {V, F ∧ G, G} , and {U } → {V, F ∧ G} ,

respectively. If L and L′ have interpolants X and Y , respectively, then their


parent has interpolant X ∧ Y .


Similarly, suppose L : I |= ¬F is deduced on one branch from I |= ¬(F ∧G)


(so that its sibling is L′ : I |= ¬G). If L and L′ have interpolants X and Y ,
respectively, then their parent has interpolant X ∨ Y (¬(¬X ∧ ¬Y ) in the
restricted language).
The interpolant X of a line L derived via a double-negation rule passes
directly to its parent, for the characteristic formula of L simply has a repetition
¬¬F of a formula F that the parent’s characteristic formula does not have.
We turn to the quantification rules. Consider a line L : I ⊳ {z 7→ v} 6|= F
derived from I 6|= ∀x. F . L and its parent M have characteristic formulae

{U } → {V, ∀x. F, F } and {U } → {V, ∀x. F } ,

respectively. Moreover, v is fresh, and thus z does not appear in either U or


V according to our trick. Hence, z cannot occur free in L’s interpolant X. It
thus follows that

∀ ∗ . ∀z. X → V ∨ ∀x.F ∨ F

is equivalent to

∀ ∗ . X → V ∨ ∀x.F ∨ ∀z. F

and thus to

∀ ∗ . X → V ∨ ∀x.F .

Therefore, X is an interpolant of M . Similarly, X is an interpolant of the


parent of L : I ⊳ {z 7→ v} |= ¬F deduced from I |= ¬∀x. F .
Consider L : I ⊳ {z 7→ v} |= F with interpolant X derived from I |= ∀x. F ,
where z is not necessarily fresh. The characteristic formulae of L and its parent
M are

{U, ∀x. F, F } → {V } and {U, ∀x. F } → {V } ,

respectively. Clearly,

U ∧ ∀x. F ∧ F ⇒ X

implies that

U ∧ ∀x. F ⇒ X .

Hence, X is an interpolant of M when z is not free in U or V (so that z is not


free in X) and when it is free in both. However, if z is free in V but not in U ,
then X is not an interpolant of M . But ∀z. X is an interpolant. In particular,
we have

U ∧ ∀x. F ⇒ ∀z. X

and
∀z. X ⇒ V because ∀z. X ⇒ X and X ⇒ V .
For the final case, suppose that L : I ⊳ {z 7→ v} 6|= ¬F is deduced from
I 6|= ¬∀x. F . The characteristic formula of L is
{U } → {V, ¬∀x. F, F } .
Then X is the interpolant of the parent M unless z is free in U but not free
in V . In the latter case, the interpolant is ∃z. X (¬∀z. ¬X in the restricted
language). The reasoning is similar to the previous case, completing the proof.


2.8 Summary
Building on the presentation of PL in Chapter 1, this chapter introduces first-
order logic (FOL). It covers:
• Its syntax. How one constructs a FOL formula. Variables, terms, function
symbols, predicate symbols, atoms, literals, logical connectives, quantifiers.
• Its semantics. What a FOL formula means. Truth values true and false.
Interpretations: domain and assignments. Difference between a function
(predicate) symbol and a function (predicate) over a domain.
• Satisfiability and validity. Whether a FOL formula evaluates to true under
any or all interpretations. Semantic argument method.
• Substitution, which is a tool for manipulating formulae and making general
claims. Safe and schema substitutions. Substitution of equivalent formulae.
Valid schemata.
• Normal forms. A normal form is a set of syntactically restricted formulae
such that every FOL formula is equivalent to some member of the set.
• A review of decidability and complexity theory, which provides the concepts
necessary for discussing decidability and complexity questions in logic.
• Meta-theorems. Semantic argument method is sound and complete. Com-
pactness Theorem. Craig Interpolation Lemma.
The results of Section 2.7 are the groundwork for our theoretical treatment
of the Nelson-Oppen combination method in Chapter 10.
FOL is the most general logic that is discussed in this book. Its applications
include software and hardware design and analysis, knowledge representation,
and complexity and decidability theory.
FOL is a complete logic: every valid FOL formula has a proof in the se-
mantic argument method. However, validity is undecidable. Many applications
benefit from complete automation, which is impossible when considering all
of FOL. Therefore, Chapter 3 introduces first-order theories, which formal-
ize interesting structures, such as integers, rationals, lists, stacks, and arrays.
Part II of this book explores algorithms for reasoning within these theories.

Bibliographic Remarks
For a complete and concise presentation of propositional and first-order logic,
see Smullyan’s text First-Order Logic [87]. The semantic argument method
is similar to Smullyan’s tableau method. Also, the proofs of completeness of
the semantic argument method, the Compactness Theorem, and the Craig
Interpolation Lemma are inspired by Smullyan’s presentation.
The history of the development of mathematical logic is rich. For an
overview, see [98] and related articles in The Stanford Encyclopedia of Phi-
losophy. We mention in particular Hilbert’s program of the 1920s — see, for
example, [38] — to find a consistent and complete axiomatization of arith-
metic. Gödel's two incompleteness theorems proved that such a goal is impos-
sible. The first incompleteness theorem, which Gödel presented in a lecture
in September, 1930, and then in [36], states that any axiomatization of arith-
metic contains theorems that are not provable within the theory. The second,
which Gödel had proved by October, 1930, states that a theory such as Peano
arithmetic cannot prove its own consistency unless it is itself inconsistent.
Earlier, Gödel proved that first-order logic is complete [35]: every theorem
has a proof. However, Church — and, independently, Turing — proved that
satisfiability in first-order logic is undecidable [13]. Thus, while every theorem
of first-order logic has a finite proof, invalid formulae need not have a finite
proof of their invalidity.
For an introduction to formal languages, decidability, and complexity the-
ory, see [85, 72, 41].

Exercises
2.1 (English and FOL). Encode the following English sentences into FOL.
(a) Some days are longer than others.
(b) In all the world, there is but one place that I call home.
(c) My mother’s mother is my grandmother.
(d) The intersection of two convex sets is convex.
2.2 (FOL validity & satisfiability). For each of the following FOL formu-
lae, identify whether it is valid or not. If it is valid, prove it with a semantic
argument; otherwise, identify a falsifying interpretation.
(a) (∀x, y. p(x, y) → p(y, x)) → ∀z. p(z, z)
(b) ∀x, y. p(x, y) → p(y, x) → ∀z. p(z, z)
(c) (∃x. p(x)) → ∀y. p(y)
(d) (∀x. p(x)) → ∃y. p(y)
(e) ∃x, y. (p(x, y) → (p(y, x) → ∀z. p(z, z)))
2.3 (Semantic argument). Use the semantic argument method to prove the
following formula schemata.

(a) ¬(∀x. F ) ⇔ ∃x. ¬F


(b) ¬(∃x. F ) ⇔ ∀x. ¬F
(c) ∀x, y. F ⇔ ∀y, x. F
(d) ∃y. ∀x. F ⇒ ∀x. ∃y. F
(e) ∃x. F ∨ G ⇔ (∃x. F ) ∨ (∃x. G)
(f) ∃x. F → G ⇔ (∀x. F ) → (∃x. G)
(g) ∃x. F ∨ G ⇔ (∃x. F ) ∨ G, provided x 6∈ free(G)
(h) ∀x. F ∨ G ⇔ (∀x. F ) ∨ G, provided x 6∈ free(G)
(i) ∃x. F ∧ G ⇔ (∃x. F ) ∧ G, provided x 6∈ free(G)
(j) ∀x. F → G ⇔ (∃x. F ) → G, provided x 6∈ free(G)

2.4 (Normal forms). Put the following formulae into prenex normal form.
(a) (∀x. ∃y. p(x, y)) → ∀x. p(x, x)
(b) ∃z. (∀x. ∃y. p(x, y)) → ∀x. p(x, z)
(c) ∀w. ¬(∃x, y. ∀z. p(x, z) → q(y, z)) ∧ ∃z. p(w, z)

2.5 (⋆ Characteristic formula). Why is the characteristic formula of a line


on a closed branch of a semantic argument valid?
3 First-Order Theories

Formalization works as “an early-warning system” when things are


getting contorted.
— Edsger W. Dijkstra
EWD764: Repaying Our Debts, 1980
When reasoning in particular application domains such as software or
hardware, one often has particular structures in mind. For example, programs
manipulate numbers, lists, and arrays. First-order theories formalize these
structures to enable reasoning about them. This chapter introduces first-order
theories in general and then focuses on theories useful in verification and re-
lated tasks. These theories include a theory of equality, of integers, of rationals
and reals, of recursive data structures, and of arrays.
There is another reason to study first-order theories. While validity in FOL
is undecidable, validity in particular theories or fragments of theories is some-
times decidable. Many of the theories studied in this chapter have important
fragments for which validity is efficiently decidable. For each theory, we iden-
tify the decidable and efficiently decidable fragments, which we summarize in
Section 3.7. Part II studies decision procedures for the decidable fragments.

3.1 First-Order Theories


A first-order theory T is defined by the following components.
1. Its signature Σ is a set of constant, function, and predicate symbols.
2. Its set of axioms A is a set of closed FOL formulae in which only constant,
function, and predicate symbols of Σ appear.
A Σ-formula is constructed from constant, function, and predicate symbols
of Σ, as well as variables, logical connectives, and quantifiers. As usual, the
symbols of Σ are just symbols without prior meaning. The axioms A provide
their meaning.

A Σ-formula F is valid in the theory T , or T -valid, if every interpre-


tation I that satisfies the axioms of T ,

I |= A for every A ∈ A , (3.1)

also satisfies F : I |= F . For this reason, we write

T |= F

to mean that F is T -valid. Formally, the theory T consists of all (closed)


formulae that are T -valid. We call an interpretation satisfying (3.1) a T -
interpretation.
A Σ-formula F is satisfiable in T , or T -satisfiable, if there is a T -
interpretation I that satisfies F .
A theory T is complete if for every closed Σ-formula F , T |= F or
T |= ¬F . A theory is consistent if there is at least one T -interpretation. In
particular, in a consistent theory T , there does not exist a Σ-formula F such
that both T |= F and T |= ¬F . (Otherwise, by the semantics of conjunction,
T |= F ∧ ¬F and thus T |= ⊥; but ⊥ is not satisfied by any interpretation.)
Concepts from general FOL validity carry over to first-order theories in
the natural way. For example, two formulae F1 and F2 are equivalent in T ,
or T -equivalent, if T |= F1 ↔ F2 : for every T -interpretation I, I |= F1 iff
I |= F2 .
A fragment of a theory is a syntactically-restricted subset of formulae
of the theory. For example, the quantifier-free fragment of a theory T is
the set of formulae without quantifiers that are valid in T . Recall our con-
vention that non-closed formula F is valid iff its universal closure is valid.
Technically, the “quantifier-free fragment” of T actually consists of valid for-
mulae in which all variables are universally quantified. However, the term
“quantifier-free fragment” is the common and accepted name for this frag-
ment. Subsequent chapters show that the quantifier-free fragments of theories
are of great practical and theoretical importance.
A theory T is decidable if T |= F is decidable for every Σ-formula F . That
is, there is an algorithm that always terminates with “yes” if F is T -valid or
with “no” if F is T -invalid. A fragment of T is decidable if T |= F is decidable
for every Σ-formula F that obeys the fragment’s syntactic restrictions.
The union T1 ∪ T2 of two theories T1 and T2 has signature Σ1 ∪ Σ2 and
axioms A1 ∪ A2 . Clearly, a (T1 ∪ T2 )-interpretation is both a T1 -interpretation
and a T2 -interpretation since it satisfies the axioms of both T1 and T2 . Hence,
a formula that is T1 -valid or T2 -valid is (T1 ∪ T2 )-valid, while a formula that
is (T1 ∪ T2 )-satisfiable is both T1 -satisfiable and T2 -satisfiable.
Because FOL (the “empty” theory, or the theory without axioms) is unde-
cidable in general, we must turn to theories and fragments of theories for the
possibility of fully automated reasoning. While many interesting theories are
undecidable, there are several important theories and fragments of theories
that are decidable. These theories and fragments are the main subject of Part

II of this book. We introduce them in the following sections. In Section 3.7,


we summarize the decidability and complexity results for these theories and
fragments.

3.2 Equality
The theory of equality TE is the simplest first-order theory. Its signature

ΣE : {=, a, b, c, . . . , f, g, h, . . . , p, q, r, . . .}

consists of
• = (equality), a binary predicate;
• and all constant, function, and predicate symbols.
Equality = is an interpreted predicate symbol: its meaning is defined via
the axioms of TE . The other constant, function, and predicate symbols are
uninterpreted except as they relate to equality. The axioms of TE are the
following:
1. ∀x. x = x (reflexivity)
2. ∀x, y. x = y → y = x (symmetry)
3. ∀x, y, z. x = y ∧ y = z → x = z (transitivity)
4. for each positive integer n and n-ary function symbol f ,

   ∀x, y. x1 = y1 ∧ · · · ∧ xn = yn → f (x) = f (y) (function congruence)

5. for each positive integer n and n-ary predicate symbol p,

   ∀x, y. x1 = y1 ∧ · · · ∧ xn = yn → (p(x) ↔ p(y)) (predicate congruence)

The notation x stands for the list of variables x1 , . . . , xn . Axioms (function con-
gruence) and (predicate congruence) are actually axiom schemata. An axiom
schema stands for a set of axioms, each an instantiation of the parameters (f
and p in (function congruence) and (predicate congruence), respectively). For
example, for binary function symbol f2 , (function congruence) instantiates to
the following axiom:

∀x1 , x2 , y1 , y2 . x1 = y1 ∧ x2 = y2 → f2 (x1 , x2 ) = f2 (y1 , y2 ) .

The first three axioms state that = is an equivalence relation: it is a


binary predicate that obeys reflexivity, symmetry, and transitivity. The final
two axiom schemata formalize our intuition for the behavior of functions and
predicates under equality. A function (predicate) always evaluates to the same

value (truth value) for a given set of argument values. They assert that = is
a congruence relation.
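For example, for a unary predicate symbol p, (predicate congruence) instantiates to
the following axiom:

∀x1 , y1 . x1 = y1 → (p(x1 ) ↔ p(y1 )) .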
TE is just as undecidable as full FOL because it allows all constant, func-
tion, and predicate symbols. In particular, any FOL formula F can be en-
coded as a ΣE -formula F ′ simply by replacing occurrences of the symbol =
with a fresh symbol. Since = does not occur in this transformed formula F ′ ,
the axioms of TE are irrelevant; hence, F ′ is TE -satisfiable iff F ′ is first-order
satisfiable.
However, the quantifier-free fragment of TE is both interesting and effi-
ciently decidable, as we show in Chapter 9.

Example 3.1. Without quantifiers, free variables and constants play the
same role. In the formula

F : a = b ∧ b = c → g(f (a), b) = g(f (c), a) ,

a, b, and c are constants, while in

F ′ : x = y ∧ y = z → g(f (x), y) = g(f (z), x) ,

x, y, and z are free variables. F is TE -valid iff F ′ is TE -valid; F is TE -satisfiable


iff F ′ is TE -satisfiable. 

It is often useful to reason about the T -satisfiability or T -validity of a Σ-


formula F in a structured but informal way. We show how to use the semantic
argument method with TE .

Example 3.2. To prove that

F : a = b ∧ b = c → g(f (a), b) = g(f (c), a)

is TE -valid, assume otherwise: there exists a TE -interpretation I such that


I 6|= F :

1. I 6|= F assumption
2. I |= a=b ∧ b=c 1, →
3. I 6|= g(f (a), b) = g(f (c), a) 1, →
4. I |= a=b 2, ∧
5. I |= b=c 2, ∧
6. I |= a=c 4, 5, (transitivity)
7. I |= f (a) = f (c) 6, (function congruence)
8. I |= b=a 4, (symmetry)
9. I |= g(f (a), b) = g(f (c), a) 7, 8 (function congruence)
10. I |= ⊥ 3, 9

Our assumption is apparently false: F is TE -valid. 



3.3 Natural Numbers and Integers

Arithmetic involving the addition and multiplication of the natural numbers


N = {0, 1, 2, . . .} is perhaps the oldest of mathematical theories. In this section
we describe three theories of arithmetic. Peano arithmetic allows addition
and multiplication over natural numbers, while Presburger arithmetic is
restricted to addition over natural numbers. The final theory, the theory of
integers, is convenient for automated reasoning but is no more expressive
than Presburger arithmetic.

3.3.1 Peano Arithmetic

The theory of Peano arithmetic TPA , or first-order arithmetic, has


signature

ΣPA : {0, 1, +, ·, =} ,

where
• 0 and 1 are constants;
• + (addition) and · (multiplication) are binary functions;
• and = (equality) is a binary predicate.
Its axioms are the following:
1. ∀x. ¬(x + 1 = 0) (zero)
2. ∀x, y. x + 1 = y + 1 → x = y (successor)
3. F [0] ∧ (∀x. F [x] → F [x + 1]) → ∀x. F [x] (induction)
4. ∀x. x + 0 = x (plus zero)
5. ∀x, y. x + (y + 1) = (x + y) + 1 (plus successor)
6. ∀x. x · 0 = 0 (times zero)
7. ∀x, y. x · (y + 1) = x · y + x (times successor)
These axioms concisely define addition, multiplication, and equality over nat-
ural numbers. Informally, axioms (zero), (plus zero), and (times zero) define
0 as we understand it: it is the minimal element of the natural numbers; it
is the identity for addition (x + 0 = x); and under multiplication, it maps
any number to 0 (x · 0 = 0). Axioms (zero), (successor), (plus zero), and (plus
successor) define addition. Axioms (times zero) and (times successor) define
multiplication: in particular, (times successor) defines multiplication in terms
of addition.
(induction) is an axiom schema: it stands for the set of axioms obtained
by substituting for F each ΣPA -formula that has precisely one free variable.
It asserts that every TPA -interpretation I obeys induction: if I satisfies F [0]
and ∀x. F [x] → F [x + 1], then I also satisfies ∀x. F [x].
For convenience, we usually do not write the “·” for multiplication. For
example, we write xy rather than x · y.

The intended interpretations of TPA have domain N and assignments


αI defining 0, 1, +, ·, and = as we understand them in everyday arithmetic.
In particular,
• αI [0] is 0N : αI maps the symbols “0” to 0N ∈ N;
• αI [1] is 1N : αI maps the symbols “1” to 1N ∈ N;
• αI [+] is +N , addition over N;
• αI [·] is ·N , multiplication over N;
• αI [=] is =N , equality over N.

Example 3.3. The formula 3x + 5 = 2y can be written using the signature


ΣPA as

x+x+x+1+1+1+1+1 =y+y

or as

(1 + 1 + 1) · x + 1 + 1 + 1 + 1 + 1 = (1 + 1) · y .

In practice, we use the abbreviated notation 3x + 5 = 2y. 

Example 3.4. Rather than augmenting TPA with axioms defining inequality
>, we can transform formulae with inequality into formulae over the restricted
signature ΣPA . Write

3x + 5 > 2y as ∃z. z 6= 0 ∧ 3x + 5 = 2y + z ,

where z 6= 0 abbreviates ¬(z = 0). The latter formula is a ΣPA -formula. Weak
inequality can be similarly transformed. Write

3x + 5 ≥ 2y as ∃z. 3x + 5 = 2y + z .

Example 3.5. The ΣPA -formula

∃x, y, z. x 6= 0 ∧ y 6= 0 ∧ z 6= 0 ∧ xx + yy = zz

is TPA -valid. It asserts that there exists a triple of positive integers fulfilling
the Pythagorean Theorem. The formula

∃x, y, z. x 6= 0 ∧ y 6= 0 ∧ z 6= 0 ∧ xxx + yyy = zzz

is the cubic analogue. For constant n, let xn represent n multiplications of x;


then every formula of the set

{∀x, y, z. x 6= 0 ∧ y 6= 0 ∧ z 6= 0 → xn + y n 6= z n : n > 2 ∧ n ∈ Z}

is TPA -valid, as claimed by Fermat’s Last Theorem and proved by Andrew


Wiles in 1994. 

Remark 3.6. Gödel’s first incompleteness theorem (see Bibliographic Re-


marks of Chapter 2) implies that Peano arithmetic TPA does not capture true
arithmetic: there exist closed ΣPA -formulae representing valid propositions of
number theory that are TPA -invalid. Gödel’s proof constructs such a formula:
it encodes the assertion that the formula itself cannot be proved. Now, either
this formula can be proved from the axioms of TPA (contradicting itself so that
TPA is inconsistent) or it cannot be proved (so that TPA is incomplete).

Satisfiability and validity in TPA are undecidable. Therefore, we turn to a


more restricted theory of arithmetic that does not allow multiplication.

3.3.2 Presburger Arithmetic

The theory of Presburger arithmetic TN has signature

ΣN : {0, 1, +, =} ,

where
• 0 and 1 are constants;
• + (addition) is a binary function;
• and = (equality) is a binary predicate.
Its axioms are a subset of the axioms of TPA :
1. ∀x. ¬(x + 1 = 0) (zero)
2. ∀x, y. x + 1 = y + 1 → x = y (successor)
3. F [0] ∧ (∀x. F [x] → F [x + 1]) → ∀x. F [x] (induction)
4. ∀x. x + 0 = x (plus zero)
5. ∀x, y. x + (y + 1) = (x + y) + 1 (plus successor)
Again, (induction) is an axiom schema standing for the set of axioms obtained
by replacing F with each ΣN -formula that has precisely one free variable.
The intended interpretations of TN have domain N and are such that
• αI [0] is 0N ∈ N;
• αI [1] is 1N ∈ N;
• αI [+] is +N , addition over N;
• αI [=] is =N , equality over N.
How does one reason about all integers, Z = {. . . , −2, −1, 0, 1, 2, . . .}? Such
formulae can be encoded as ΣN -formulae.

Example 3.7. Consider the formula

F0 : ∀w, x. ∃y, z. x + 2y − z − 13 > −3w + 5 ,

where − is meant to be interpreted as standard subtraction, and w, x, y, and


z are intended to range over Z. The formula

F1 : ∀wp , wn , xp , xn . ∃yp , yn , zp , zn .
     (xp − xn ) + 2(yp − yn ) − (zp − zn ) − 13 > −3(wp − wn ) + 5

introduces two variables, vp and vn , for each variable v of F0 . While each of vp


and vn can only range over N, vp − vn should range over the integers. But how
is − interpreted? Moving negated terms to the other side of the inequality
eliminates −:

F2 : ∀wp , wn , xp , xn . ∃yp , yn , zp , zn .
     xp + 2yp + zn + 3wp > xn + 2yn + zp + 13 + 3wn + 5 .

The final transformation eliminates constant coefficients and strict inequality:

F3 : ∀wp , wn , xp , xn . ∃yp , yn , zp , zn . ∃u.
       ¬(u = 0) ∧
       xp + yp + yp + zn + wp + wp + wp
         = xn + yn + yn + zp + wn + wn + wn + u
           + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1
           + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 .

Here the fresh variable u, constrained to be nonzero, expresses the strict
inequality of F2 as an equality, and the eighteen occurrences of 1 spell out
the constants 13 and 5. 
Presburger showed in 1929 that TN is decidable. Therefore, the “theory
of (negative and positive) integers” that we loosely constructed above is also
decidable via the syntactic rewriting of formulae into ΣN -formulae. Rather
than using this cumbersome rewriting, however, we next study a theory of
integers.
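The rewriting of Example 3.7 is entirely mechanical. The following Python sketch
performs the passage from F0 to F2 for a constraint given as a map from variables
to integer coefficients together with a right-hand-side constant; this
representation and the function name are assumptions of the illustration. The
remaining steps (introducing a nonzero u for the strict inequality and expanding
the constants into sums of 1) are equally mechanical.

# A sketch of the Example 3.7 rewriting: a strict inequality
#     c1*x1 + ... + ck*xk > d      over integer variables xi
# is encoded over the naturals by writing xi = xi_p - xi_n and moving
# negative contributions across the inequality.

def encode_over_naturals(coeffs, d):
    """Return (lhs, rhs, d) with nonnegative coefficients such that the
    original constraint holds iff lhs > rhs + d over the naturals."""
    lhs, rhs = {}, {}
    for v, c in coeffs.items():
        if c > 0:
            lhs[v + '_p'] = c     # c*(vp - vn): c*vp stays, c*vn moves right
            rhs[v + '_n'] = c
        elif c < 0:
            lhs[v + '_n'] = -c    # |c|*vn stays, |c|*vp moves right
            rhs[v + '_p'] = -c
    return lhs, rhs, d

# F0 of Example 3.7, normalized to  x + 2y - z + 3w > 18:
lhs, rhs, d = encode_over_naturals({'x': 1, 'y': 2, 'z': -1, 'w': 3}, 18)
print(lhs)      # {'x_p': 1, 'y_p': 2, 'z_n': 1, 'w_p': 3}
print(rhs, d)   # {'x_n': 1, 'y_n': 2, 'z_p': 1, 'w_n': 3} 18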

3.3.3 Theory of Integers

The theory of integers TZ has signature

ΣZ : {. . . , −2, −1, 0, 1, 2, . . . , −3·, −2·, 2·, 3·, . . . , +, −, =, >} ,

where
• . . . , −2, −1, 0, 1, 2, . . . are constants, intended to be assigned the obvious
corresponding values in the intended domain of integers Z;
• . . . , −3·, −2·, 2·, 3·, . . . are unary functions, intended to represent con-
stant coefficients (e.g., 2 · x, abbreviated 2x);
• + and − are binary functions, intended to represent the obvious corre-
sponding functions over Z;
• = and > are binary predicates, intended to represent the obvious corre-
sponding predicates over Z.
Since Example 3.7 shows that ΣZ -formulae can be reduced to ΣN -formulae, we
do not axiomatize TZ . TZ is merely a convenient representation for reasoning
about addition over all integers.

The intended interpretations of TZ have domain Z and are such that αI


assigns the obvious values, functions, and predicates to the constant, function,
and predicate symbols of ΣZ .
In Chapter 7, we discuss Cooper’s decision procedure for deciding TZ -
validity, while in Chapter 8, we discuss decision procedures for the quantifier-
free fragment of TZ . These procedures decide TN -validity as well: the following
example illustrates that ΣN -formulae can be reduced to ΣZ -formulae.

Example 3.8. To decide the TN -validity of

∀x. ∃y. x = y + 1 ,

decide the TZ -validity of

∀x. x ≥ 0 → ∃y. y ≥ 0 ∧ x = y + 1 ,

where t1 ≥ t2 expands to t1 = t2 ∨ t1 > t2 . 
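In general, the reduction relativizes each quantifier to the nonnegative integers:
∀x. F [x] of TN becomes ∀x. x ≥ 0 → F [x], and ∃x. F [x] becomes ∃x. x ≥ 0 ∧ F [x].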

We prove the validity of several ΣZ -formulae using the semantic argument


method. Our application of the semantic argument method in this context is
informal; it is intended to allow us to argue intuitively about validity until
Chapter 7.

Example 3.9. To prove that the ΣZ -formula

F : ∀x, y, z. x > z ∧ y ≥ 0 → x + y > z .

is TZ -valid, assume otherwise: there is a TZ -interpretation I such that I 6|= F :

1. I 6|= F assumption
2. I1 : I ⊳ {x 7→ v1 } ⊳ {y 7→ v2 } ⊳ {z 7→ v3 }
6|= x > z ∧ y ≥ 0 → x + y > z 1, ∀
3. I1 |= x > z ∧ y ≥ 0 2, →
4. I1 6|= x + y > z 2, →
5. I1 |= ¬(x + y > z) 4, ¬

We derive a contradiction by collecting formulae from lines 3 and 5, applying


the variant interpretation I1 , and querying the theory TZ : are there integers
v1 , v2 , v3 such that

v1 > v3 ∧ v2 ≥ 0 ∧ ¬(v1 + v2 > v3 ) ?

No, for v1 > v3 ∧ v2 ≥ 0 implies v1 + v2 > v3 . We summarize this reasoning


in TZ with the line

6. I1 |= ⊥ 3, 5, TZ

Therefore, F is TZ -valid. 

Arguing the validity of arithmetic formulae at the level of axioms is te-


dious. Therefore, unlike in Example 3.2 in which the theory-specific reasoning
is incorporated into the semantic argument method by applying and stating
specific axioms of TE , the semantic argument method for TZ handles the “log-
ical” aspects of the structured reasoning, while a separate informal argument
reasons about the theory-specific elements.

Example 3.10. To prove that the ΣZ -formula

F : ∀x, y. x > 0 ∧ (x = 2y ∨ x = 2y + 1) → x − y > 0

is TZ -valid, assume otherwise: there is a TZ -interpretation I such that I 6|= F :

1. I 6|= F assumption
2. I1 : I ⊳ {x 7→ v1 } ⊳ {y 7→ v2 }
6|= x > 0 ∧ (x = 2y ∨ x = 2y + 1) → x − y > 0
1, ∀
3. I1 |= x > 0 ∧ (x = 2y ∨ x = 2y + 1) 2, →
4. I1 |= x > 0 3, ∧
5. I1 |= x = 2y ∨ x = 2y + 1 3, ∧
6. I1 6|= x − y > 0 2, →
7. I1 |= ¬(x − y > 0) 6, ¬

There are two cases to consider. In the first case,

8a. I1 |= x = 2y 5, ∨

We collect the formulae of lines 4, 7, and 8a, apply the variant interpretation
I1 , and query the theory TZ : are there integers v1 , v2 such that

v1 > 0 ∧ v1 = 2v2 ∧ ¬(v1 − v2 > 0) ?

No, for substituting v1 = 2v2 throughout produces

2v2 > 0 ∧ ¬(2v2 − v2 > 0) ,

which simplifies to

v2 > 0 ∧ ¬(v2 > 0) ,

a contradiction. This reasoning is summarized by

9a. I1 |= ⊥ 4, 7, 8a, TZ

In the second case,

8b. I1 |= x = 2y + 1 5, ∨

Considering lines 4, 7, and 8b, are there integers v1 , v2 such that



v1 > 0 ∧ v1 = 2v2 + 1 ∧ ¬(v1 − v2 > 0) ?

No, for substituting v1 = 2v2 + 1 throughout produces

2v2 + 1 > 0 ∧ ¬(2v2 + 1 − v2 > 0) ,

which simplifies to

2v2 + 1 > 0 ∧ ¬(v2 + 1 > 0) .

The first literal holds only when v2 > −1, while the second holds only when
v2 ≤ −1, a contradiction. This reasoning is summarized by

9b. I1 |= ⊥ 4, 7, 8b, TZ

Thus, F is TZ -valid. 

3.4 Rationals and Reals


Almost as old as arithmetic on integers is arithmetic on the rational numbers
Q and (not quite as old) on the real numbers R. In this section, we describe
two theories of real arithmetic. The latter theory can also be seen as a theory
of rational arithmetic.
The first theory is the theory of reals, involving addition and multiplica-
tion over R; it is also known as elementary algebra. The term “elementary”
refers to the restriction that variables range only over domain elements (as in
all first-order theories), not over sets or functions of domain elements. Most
junior high students are familiar with elementary algebra. As a first-order
theory, elementary algebra is of course more complex since formulae are con-
structed with logical connectives and quantifiers.
The second theory is the theory of addition over R or Q. Interpretations
with domains of R are indistinguishable from interpretations with domains of
Q, as we discuss below. For this reason, we call this second theory the theory
of rationals.

Example 3.11. Let us distinguish informally between the theories of integers


(with only addition), reals (with addition and multiplication), and rationals
(with only addition).
In the theory of integers,

F : ∃x. 2x = 7

is TZ -invalid. However, assigning to x the rational number 7/2 satisfies 2x = 7,
so F should be satisfiable in the theory of rationals. Moreover, 7/2 is also a real
number, so F should also be satisfiable in the theory of reals.
The theory of reals includes multiplication, allowing a formula like

G : ∃x. x^2 = 2

to be expressed, where x^2 abbreviates x · x. G should be valid in the theory
of reals because assigning to x the real number √2 satisfies x^2 = 2. √2 is
irrational. 

3.4.1 Theory of Reals

The theory of reals TR , or elementary algebra, has signature

ΣR : {0, 1, +, −, ·, =, ≥} ,

where
• 0 and 1 are constants;
• + (addition) and · (multiplication) are binary functions;
• − (negation) is a unary function;
• and = (equality) and ≥ (weak inequality) are binary predicates.
TR has the most complex axiomatization of the theories that we study. We
group axioms by their mathematical content.
First are the axioms of an abelian group. An abelian group is a structure
with additive identity 0, associative and commutative addition +, additive
inverse −, and equality =. The qualifier “abelian” simply means that addition
is commutative. The axioms are the following:
1. ∀x, y, z. (x + y) + z = x + (y + z) (+ associativity)
2. ∀x. x + 0 = x (+ identity)
3. ∀x. x + (−x) = 0 (+ inverse)
4. ∀x, y. x + y = y + x (+ commutativity)
The first three axioms are the axioms of a group.
Second are the additional axioms of a ring. A ring is an abelian group with
a multiplicative identity 1 and associative multiplication · that distributes over
addition. For convenience, we usually shorten x · y to xy.
1. ∀x, y, z. (xy)z = x(yz) (· associativity)
2. ∀x. x1 = x (· right identity)
3. ∀x. 1x = x (· left identity)
4. ∀x, y, z. x(y + z) = xy + xz (left distributivity)
5. ∀x, y, z. (x + y)z = xz + yz (right distributivity)
Both left and right identity and distributivity axioms are required since · is
not commutative (yet). It is made so in the next set of axioms.
Third are the additional axioms of a field. In a field, · is commutative;
the additive and multiplicative identities are different; and the multiplicative
inverse of a non-0 value exists (e.g., 1/2 is the multiplicative inverse of 2).
1. ∀x, y. xy = yx (· commutativity)

2. 0 6= 1 (separate identities)
3. ∀x. x 6= 0 → ∃y. xy = 1 (· inverse)
The axiom (· commutativity) makes the (· right identity) and (right distributivity)
axioms redundant.
Fourth are the additional axioms characterizing ≥ as a total order.
1. ∀x, y. x ≥ y ∧ y ≥ x → x = y (antisymmetry)
2. ∀x, y, z. x ≥ y ∧ y ≥ z → x ≥ z (transitivity)
3. ∀x, y. x ≥ y ∨ y ≥ x (totality)
Finally are the additional axioms of a real closed field.
1. ∀x, y, z. x ≥ y → x + z ≥ y + z (+ ordered)
2. ∀x, y. x ≥ 0 ∧ y ≥ 0 → xy ≥ 0 (· ordered)
3. ∀x. ∃y. x = y 2 ∨ x = −y 2 (square-root)
4. for each odd integer n,

∀x. ∃y. y n + x1 y n−1 + · · · + xn−1 y + xn = 0 (at least one root)

We again abbreviate x1 , . . . , xn by x. By y n , we mean y multiplied by itself


n times: y · · · y. The axioms (+ ordered) and (· ordered) assert that every
TR -interpretation is an ordered field. The axiom (square-root) asserts the
existence of the square-root of every value. The final axiom schema states
that polynomials of odd degree have at least one root.
Putting all axioms together and pruning redundant axioms, we have:
1. ∀x, y. x ≥ y ∧ y ≥ x → x = y (antisymmetry)
2. ∀x, y, z. x ≥ y ∧ y ≥ z → x ≥ z (transitivity)
3. ∀x, y. x ≥ y ∨ y ≥ x (totality)

4. ∀x, y, z. (x + y) + z = x + (y + z) (+ associativity)
5. ∀x. x + 0 = x (+ identity)
6. ∀x. x + (−x) = 0 (+ inverse)
7. ∀x, y. x + y = y + x (+ commutativity)
8. ∀x, y, z. x ≥ y → x + z ≥ y + z (+ ordered)

9. ∀x, y, z. (xy)z = x(yz) (· associativity)


10. ∀x. 1x = x (· identity)
11. ∀x. x 6= 0 → ∃y. xy = 1 (· inverse)
12. ∀x, y. xy = yx (· commutativity)
13. ∀x, y. x ≥ 0 ∧ y ≥ 0 → xy ≥ 0 (· ordered)

14. ∀x, y, z. x(y + z) = xy + xz (distributivity)


15. 0 6= 1 (separate identities)

16. ∀x. ∃y. x = y 2 ∨ −x = y 2 (square-root)



17. for each odd integer n,

∀x. ∃y. y n + x1 y n−1 + · · · + xn−1 y + xn = 0 (at least one root)

Example 3.12. The method of quantifier elimination, which we study


in Chapter 7, eliminates quantifiers from a formula to produce an equivalent
quantifier-free formula. If a formula F contains free variables, then a quantifier
elimination procedure produces an equivalent quantifier-free formula F ′ such
that free(F ′ ) ⊆ free(F ). For example, when is the formula

F : ∃x. ax2 + bx + c = 0

satisfiable? That is, what are the conditions on a, b, and c such that a quadratic
polynomial has a real root? Recall that the discriminant must be nonnegative:

F ′ : b2 − 4ac ≥ 0 .

F ′ is the quantifier-free formula that is TR -equivalent to F . 
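The equivalence can be illustrated numerically. The short Python sketch below is ours, not the text's: it assumes a ≠ 0 (so it does not treat the degenerate linear cases that a full quantifier elimination procedure must handle) and simply compares the sign of the discriminant with the existence of a real root computed by the quadratic formula.

    import math

    # For a != 0, the quadratic ax^2 + bx + c has a real root
    # exactly when the discriminant b^2 - 4ac is nonnegative.
    def has_real_root(a, b, c):
        disc = b * b - 4 * a * c
        if disc < 0:
            return False
        root = (-b + math.sqrt(disc)) / (2 * a)
        assert abs(a * root * root + b * root + c) < 1e-6  # sanity check
        return True

    print(has_real_root(1, 0, -2))  # True:  x^2 - 2 = 0 has the root sqrt(2)
    print(has_real_root(1, 0, 1))   # False: x^2 + 1 = 0 has no real root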

Tarski proved that TR was decidable in the 1930s, although the Second
World War prevented his publishing the result until 1956. Collins proposed the
more efficient technique of cylindrical algebraic decomposition (CAD) in
1975. Unfortunately, even the most efficient decision procedures for TR have
prohibitively high time complexity: CAD runs in time proportionate to 2^(2^(k|F |)),
for some constant k and for |F | the length of ΣR -formula F .

3.4.2 Theory of Rationals

Given the high complexity of deciding TR -validity (and the high intellectual
complexity of Tarski’s and subsequent decision procedures for TR ), we turn to
a simpler theory without multiplication, the theory of rationals TQ . It has
signature

ΣQ : {0, 1, +, −, =, ≥} ,

where
• 0 and 1 are constants;
• + (addition) is a binary function;
• − (negation) is a unary function;
• and = (equality) and ≥ (weak inequality) are binary predicates.
Its axioms are the following:
1. ∀x, y. x ≥ y ∧ y ≥ x → x = y (antisymmetry)
2. ∀x, y, z. x ≥ y ∧ y ≥ z → x ≥ z (transitivity)
3. ∀x, y. x ≥ y ∨ y ≥ x (totality)

4. ∀x, y, z. (x + y) + z = x + (y + z) (+ associativity)
5. ∀x. x + 0 = x (+ identity)
6. ∀x. x + (−x) = 0 (+ inverse)
7. ∀x, y. x + y = y + x (+ commutativity)
8. ∀x, y, z. x ≥ y → x + z ≥ y + z (+ ordered)

9. for each positive integer n,

∀x. nx = 0 → x = 0 (torsion-free)

10. for each positive integer n,

∀x. ∃y. x = ny (divisible)

By nx we mean x added to itself n times: x + · · · + x. The first eight axioms


are a subset of the axioms of TR . They state that every TQ -interpretation is
an ordered abelian group. ≥ is a total order by the first three axioms. Identity
0, addition +, additive inverse −, and equality = comprise an abelian group
by the next four axioms. The eighth axiom asserts that the abelian group is
ordered.
The axiom schema (torsion-free) states that only 0 can be added to itself to
produce 0. The name “torsion-free” comes from the following mathematical
context. In a group, the order of an element v is the least positive integer n such that nv
is the identity element 0: nv = 0. If no such n exists, then the element v has
infinite order. A group is torsion-free if the only element with finite order is
the identity 0.
Finally, the axiom schema (divisible) asserts that all elements of the domain
DI of a TQ -interpretation I are divisible. That is, for every positive integer n,
every element v ∈ DI is the sum of n copies of some element w ∈ DI .
Thus, every TQ -interpretation is a divisible torsion-free abelian group. In
particular, the rationals and reals with +, −, =, and ≥ are divisible torsion-
free abelian groups. As TQ -interpretations, the rationals and reals are ele-
mentarily equivalent: there does not exist a ΣQ -formula that distinguishes
between a real TQ -interpretation (an interpretation with domain R) and a
rational TQ -interpretation (an interpretation with domain Q).
This characteristic makes sense, intuitively: no linear expression with only
integer coefficients can capture, say, √2 without also being satisfied by some
rational values.
rational values. When junior high students solve linear algebra problems, they
apply addition, subtraction, multiplication, and division; but they do not take
roots.
In contrast, TR is a theory of reals: the ΣR -formula x · x = 2 is only satisfied
by TR -interpretations I in which αI [x] = −√2 or αI [x] = √2.

Example 3.13. Strict inequality is simple to express in TQ . Write

∀x, y. ∃z. x + y > z



as the ΣQ -formula

∀x, y. ∃z. ¬(x + y = z) ∧ x + y ≥ z .

The situation is similar for TR . 

Example 3.14. Rational coefficients are simple to express in TQ . Write


(1/2) x + (2/3) y ≥ 4

as the ΣQ -formula 3x + 4y ≥ 24. 
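The transformation of Example 3.14 is mechanical: multiply through by the least common multiple of the coefficients' denominators. A small Python sketch of this step is given below; it is our illustration (the function name is hypothetical) and requires Python 3.9+ for math.lcm.

    from fractions import Fraction
    from math import lcm

    # Turn a rational-coefficient constraint  sum_i c_i * x_i >= d  into an
    # equivalent integer-coefficient one by scaling with the least common
    # multiple of all denominators.
    def clear_denominators(coeffs, bound):
        terms = list(coeffs) + [bound]
        scale = lcm(*(Fraction(t).denominator for t in terms))
        return ([int(Fraction(c) * scale) for c in coeffs],
                int(Fraction(bound) * scale))

    # (1/2) x + (2/3) y >= 4  becomes  3x + 4y >= 24
    print(clear_denominators([Fraction(1, 2), Fraction(2, 3)], 4))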

For convenience, we sometimes write x ≤ y for y ≥ x.


In Chapter 7, we study a procedure for eliminating quantifiers in the theory
TQ . On closed formulae, this procedure decides validity. In Chapter 8, we study
a decision procedure for the quantifier-free fragment of TQ , which is efficiently
decidable.

3.5 Recursive Data Structures


The theory of recursive data structures (RDS) describes a set of data
structures that are ubiquitous in programming. The most basic RDS is a non-
recursive structure, like C’s struct, in which a single variable has multiple
fields. Truly recursive RDSs include lists, stacks, and binary trees.
The theory of recursive data structures TRDS formalizes the reasoning
about such structures. It builds on the theory of equality TE .

Theory of Lists

We first focus on the theory of LISP-like lists, Tcons , which has signature

Σcons : {cons, car, cdr, atom, =} ,

where
• cons is a binary function, called the constructor: cons(a, b) represents the
list constructed by concatenating a to b;
• car is a unary function, called the left projector: car(cons(a, b)) = a;
• cdr is a unary function, called the right projector: cdr(cons(a, b)) = b;
• atom is a unary predicate: atom(x) is true iff x is a single-element list;
• and = (equality) is a binary predicate.
car and cdr are historical names abbreviating “contents of address register”
and “contents of decrement register”, respectively. In the intended interpre-
tations, atoms are individual elements, while lists are multiple elements as-
sembled together via cons. For example, cons(a, cons(b, c)) is a list of three

elements, while a for which atom(a) holds is an atom. car and cdr are func-
tions for accessing parts of lists. For example, car(cons(a, cons(b, c))) returns
the head a of the list; cdr(cons(a, cons(b, c))) returns the tail cons(b, c) of the
list; and cdr(cdr(cons(a, cons(b, c)))) returns c.
The axioms of Tcons are the following:
1. the axioms of (reflexivity), (symmetry), and (transitivity) of TE
2. instantiations of the (function congruence) axiom schema for cons, car, and
cdr:
∀x1 , x2 , y1 , y2 . x1 = x2 ∧ y1 = y2 → cons(x1 , y1 ) = cons(x2 , y2 )

∀x, y. x = y → car(x) = car(y)

∀x, y. x = y → cdr(x) = cdr(y)

3. an instantiation of the (predicate congruence) axiom schema for atom:

∀x, y. x = y → (atom(x) ↔ atom(y))

4. ∀x, y. car(cons(x, y)) = x (left projection)


5. ∀x, y. cdr(cons(x, y)) = y (right projection)
6. ∀x. ¬atom(x) → cons(car(x), cdr(x)) = x (construction)
7. ∀x, y. ¬atom(cons(x, y)) (atom)
The first three sets of axioms define = to be a congruence relation for cons,
car, cdr, and atom. The axioms (left projection) and (right projection) define
the behavior of car and cdr on non-atom lists: car returns the first element of
a cons structure, and cdr returns the second element. However, they do not
specify the behavior of car and cdr on atoms. The (construction) axiom states
that the cons of car(x) and cdr(x) is x itself, unless x is an atom. In other
words, cons constructs structures, and car and cdr deconstruct them. Finally,
the axiom (atom) asserts that a term with root function symbol cons is not
an atom; it is a non-atomic list.
The congruence axioms for cons, car, and cdr assert an important property
about lists: two lists are equal iff their components are equal. The forward
direction — if two lists are equal, then their components are equal — is a
consequence of the (function congruence) axioms for car and cdr. The backward
direction is a consequence of the (function congruence) axiom for cons. This
relationship between two structures and their components is sometimes called
extensionality. We see it in arrays as well.
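The intended interpretation of cons, car, cdr, and atom is easy to mimic with ordinary pairs. The following Python sketch is ours (the encoding of non-atomic lists as tuples is an assumption of the sketch, not part of the theory); it checks the (left projection), (right projection), and (atom) axioms on sample terms.

    # Model lists as nested pairs; anything that is not a pair is an atom.
    def cons(x, y): return (x, y)
    def car(p):     return p[0]
    def cdr(p):     return p[1]
    def atom(x):    return not isinstance(x, tuple)

    a, b, c = "a", "b", "c"
    lst = cons(a, cons(b, c))          # the three-element list of the text

    assert car(cons(a, b)) == a        # (left projection)
    assert cdr(cons(a, b)) == b        # (right projection)
    assert not atom(cons(a, b))        # (atom)
    assert cdr(cdr(lst)) == c
    print(car(lst), cdr(lst))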

General Theory of RDS

Tcons is an instance of the general theory of recursive data structures TRDS .


Each RDS contributes the following to the signature:
• an n-ary constructor C;

• n projection functions π1C , . . . , πnC ;


• and one atom predicate atomC .
Associated with each RDS is an instantiation of the following axiom schema:
1. the axioms of (reflexivity), (symmetry), and (transitivity) of TE ;
2. instantiations of the (function congruence) axiom schema for constructor
C and set of projectors π1C , . . . , πnC ;
3. an instantiation of the (predicate congruence) axiom schema for atomC ;
4. for each i ∈ {1, . . . , n},

∀x1 , . . . , xn . πiC (C(x1 , . . . , xn )) = xi (projection)

5. ∀x. ¬atomC (x) → C(π1C (x), . . . , πnC (x)) = x (construction)


6. ∀x1 , . . . , xn . ¬atomC (C(x1 , . . . , xn )) (atom)
The axioms of Tcons are an instantiation of this schema. We subsequently
focus on Tcons for concreteness.

Theory of Acyclic Lists

A variation on this theory in which data structures are acyclic has been stud-
ied. Acyclicity makes sense for stacks, but not necessarily for lists and other
data structures. Consider the theory of acyclic LISP-like lists, Tcons+ . Its axioms
include those of Tcons and the following axiom schema:

∀x. car(x) 6= x
∀x. cdr(x) 6= x
∀x. car(car(x)) 6= x
∀x. car(cdr(x)) 6= x
∀x. cdr(car(x)) 6= x
...
Tcons+ is decidable, but Tcons is not. However, the quantifier-free fragments of
these theories are efficiently decidable.

Theory of Lists with Specified Atoms

The axioms of Tcons leave the behavior of car and cdr on atoms unspecified.
Adding the axiom

∀x. atom(x) → atom(car(x)) ∧ atom(cdr(x))


to those of Tcons makes the decision problem for the resulting theory Tconsatom NP-complete.

Theory of Lists with Equality

In Chapter 9, we describe a decision procedure for satisfiability in the


quantifier-free fragment of Tcons . The decision procedure is actually applicable
to the quantifier-free fragment of a more expressive theory, Tcons= , which
is the combination of TE and Tcons and thus includes uninterpreted constants,
functions, and predicates. Thus, its signature is ΣE ∪ Σcons , and its axioms are
the union of the axioms of TE and Tcons . In Section 3.8 and Chapter 10, we
discuss more general combinations of theories.
Example 3.15. To prove that the Σcons= -formula

F : car(a) = car(b) ∧ cdr(a) = cdr(b) ∧ ¬atom(a) ∧ ¬atom(b) → f (a) = f (b)

is Tcons= -valid, assume otherwise: there exists a Tcons= -interpretation I such that
I 6|= F :
1. I 6|= F assumption
2. I |= car(a) = car(b) 1, →, ∧
3. I |= cdr(a) = cdr(b) 1, →, ∧
4. I |= ¬atom(a) 1, →, ∧
5. I |= ¬atom(b) 1, →, ∧
6. I 6|= f (a) = f (b) 1, →
7. I |= cons(car(a), cdr(a)) = cons(car(b), cdr(b))
2, 3, (function congruence)
8. I |= cons(car(a), cdr(a)) = a 4, (construction)
9. I |= cons(car(b), cdr(b)) = b 5, (construction)
10. I |= a=b 7, 8, 9, (transitivity)
11. I |= f (a) = f (b) 10, (function congruence)
12. I |= ⊥ 6, 11
Therefore, F is Tcons= -valid. 

3.6 Arrays
Arrays are another common data structure in programming. They are similar
to the uninterpreted functions of TE except that they can be modified. The
theory of arrays TA describes the basic characteristic of an array: if value v
is written to position i of array a, then subsequently reading from position i
of a should return v. Because logic is static, modified arrays are represented
functionally, as in functional programming.
The theory of arrays TA has signature

ΣA : {·[·], ·h· ⊳ ·i, =} ,


where

• a[i] (read) is a binary function: a[i] represents the value of array a at


position i;
• ahi ⊳ vi (write) is a ternary function: ahi ⊳ vi represents the modified array
a in which position i has value v;
• and = (equality) is a binary predicate.
·[·] and ·h·⊳·i really are binary and ternary functions, respectively, even though
we write them using a convenient notation. Writing a[i] as read(a, i) and ahi⊳ei
as write(a, i, e) emphasizes that they are functions.
Arrays are represented functionally. The term ahi ⊳ vi is an array that
is like a except that it has value v at position i. The term ahi ⊳ vi[j] (which
abbreviates (ahi⊳vi)[j]) is equal to the value of array ahi⊳vi at position j: it is
v if j = i and a[j] otherwise. ahi ⊳ vihj ⊳ wi (which abbreviates (ahi ⊳ vi)hj ⊳ wi)
is an array that is like a except that it differs at the positions i, where it has
value v, and j, where it has value w. Finally, the term ahi ⊳ vihj ⊳ wi[k] (which
abbreviates ((ahi ⊳ vi)hj ⊳ wi)[k]) has value w if k = j (even if k = i also),
value v if k = i and k 6= j, and value a[k] otherwise.
The axioms of TA are the following:
1. the axioms of (reflexivity), (symmetry), and (transitivity) of TE ;
2. ∀a, i, j. i = j → a[i] = a[j] (array congruence)
3. ∀a, v, i, j. i = j → ahi ⊳ vi[j] = v (read-over-write 1)
4. ∀a, v, i, j. i 6= j → ahi ⊳ vi[j] = a[j] (read-over-write 2)
The first set of axioms defines = as an equivalence relation. The next ax-
iom asserts that accessing an array with two equal expressions produces the
same element. The final two axioms capture the basic characteristic of arrays:
reading at an index that has been written produces the most recently written
value.
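Because arrays are represented functionally, write is easy to model as an operation that returns a new array. The sketch below is our illustration, not part of the theory: arrays are Python functions from indices to values, and the assertions mirror the two read-over-write axioms.

    # Arrays as functions from indices to values.
    def write(a, i, v):
        # a<i |> v>: a new array that agrees with a except at index i
        return lambda j: v if j == i else a(j)

    def const(v):
        return lambda j: v

    a = const(0)
    b = write(write(a, 1, 10), 2, 20)

    assert write(a, 5, 7)(5) == 7      # (read-over-write 1): i = j
    assert write(a, 5, 7)(3) == a(3)   # (read-over-write 2): i != j
    assert b(1) == 10 and b(2) == 20 and b(0) == 0
    print(b(1), b(2), b(7))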
The equality predicate = is only defined for array “elements”. For example,

F : a[i] = e → ahi ⊳ ei = a

is not TA -valid, although our intuition suggests that it should be. The problem
is that the interaction between = and the read and write functions is not
captured in the axioms of TA . In other words, equality between arrays, not
just between elements, is undefined.
Instead of F , we write

F ′ : a[i] = e → ∀j. ahi ⊳ ei[j] = a[j] ,

which is TA -valid.

Example 3.16. To prove that

F ′ : a[i] = e → ∀j. ahi ⊳ ei[j] = a[j] ,

is TA -valid, assume otherwise: there is a TA -interpretation I such that I 6|= F ′ :



1. I 6|= F′ assumption
2. I |= a[i] = e 1, →
3. I 6|= ∀j. ahi ⊳ ei[j] = a[j] 1, →
4. I1 : I ⊳ {j 7→ j} 6|= ahi ⊳ ei[j] = a[j] 3, ∀, for some j ∈ DI
5. I1 |= ahi ⊳ ei[j] 6= a[j] 4, ¬
6. I1 |= i=j 5, (read-over-write 2)
7. I1 |= a[i] = a[j] 6, (array congruence)
8. I1 |= ahi ⊳ ei[j] = e 6, (read-over-write 1)
9. I1 |= ahi ⊳ ei[j] = a[j] 2, 7, 8, (transitivity)
10. I1 |= ⊥ 4, 9

We derive line 6 from line 5 by using the contrapositive of (read-over-write


2). The contrapositive of F1 → F2 is ¬F2 → ¬F1 , and

F1 → F2 ⇔ ¬F2 → ¬F1 .

Lines 4 and 9 are contradictory, so that actually I |= F ′ . Thus, F ′ is TA -valid.




Unfortunately, TA -validity is undecidable. It is straightforward to encode


arbitrary formulae of FOL in TA by viewing functions as multi-dimensional
arrays (arrays whose elements are arrays). Therefore, a theory TA= in which
the behavior of = on arrays is axiomatized has been studied. Its quantifier-
free fragment is decidable. The signature of TA= is the same as that of TA . Its
axioms consists of those of TA and the following axiom:

∀a, b. (∀i. a[i] = b[i]) ↔ a = b (extensionality)

Example 3.17. To prove that

F : a[i] = e → ahi ⊳ ei = a

is TA= -valid, assume otherwise: there is a TA= -interpretation I such that I 6|=
F:
1. I 6|= F assumption
2. I |= a[i] = e 1, →
3. I 6|= ahi ⊳ ei = a 1, →
4. I |= ahi ⊳ ei 6= a 3, ¬
5. I |= ¬(∀j. ahi ⊳ ei[j] = a[j]) 4, (extensionality)
6. I 6|= ∀j. ahi ⊳ ei[j] = a[j] 5, ¬

The rest of the proof then proceeds as in Example 3.16. 

We present a decision procedure for the quantifier-free fragment of TA in


Chapter 9. In Chapter 11, we present a decision procedure for satisfiability in a
fragment of TA that is more expressive than even the quantifier-free fragment
of TA= .

Table 3.1. Decidability of theories and quantifier-free fragments


Theory Description Full QFF
TE equality no yes
TPA Peano arithmetic no no
TN Presburger arithmetic yes yes
TZ linear integers yes yes
TR reals (with ·) yes yes
TQ rationals (without ·) yes yes
TRDS recursive data structures no yes
TRDS+ acyclic recursive data structures yes yes
TA arrays no yes
TA= arrays with extensionality no yes

Table 3.2. Complexities for decidable theories


Theory     Complexity
PL         NP-complete
TN , TZ    Ω(2^(2^n)), O(2^(2^(2^(kn))))
TR         O(2^(2^(kn)))
TQ         Ω(2^n), O(2^(2^(kn)))
TRDS+      not elementary recursive


3.7 Survey of Decidability and Complexity

We survey the known decidability and complexity results of the theories of


this chapter.
Table 3.1 summarizes the decidability results for the first-order theories.
The quantifier-free fragment of each theory that we study in Part II of this
book is decidable.
Table 3.2 summarizes the complexity results for satisfiability in PL and the
decidable first-order theories. For all complexities, n is the size of the input
formula, and k is some positive integer. A decision problem is not elementary
recursive if its running time cannot be bounded by a fixed-height stack of
exponentials. Only decision procedures for satisfiability in PL scale well to
large problems.
Table 3.3 summarizes the complexity results for the quantifier-free frag-
ments. As satisfiability in PL is already NP-complete, we consider only con-
junctive formulae, which are just conjunctions of literals. For example, sat-
isfiability of propositional conjunctive formulae is decidable in linear time: if
both P and ¬P appear in F , for some propositional variable P , then F is un-
satisfiable; otherwise, F is satisfiable. For quantifier-free (but not conjunctive)
formulae, all complexities except that for TR are NP-complete. Satisfiability in
the quantifier-free fragments of TE , TQ , TRDS , and TRDS+ is efficiently decidable.

Table 3.3. Complexities for quantifier-free, conjunctive fragments of theories


Theory     Complexity        Theory     Complexity
PL         Θ(n)              TE         O(n log n)
TN , TZ    NP-complete       TR         O(2^(2^(kn)))
TQ         PTIME             TRDS+      Θ(n)
TRDS       O(n log n)        TA         NP-complete

3.8 Combination Theories


In practice, the formulae that we want to check for satisfiability or validity
span multiple theories. For example, in program verification, one might want
to prove a property about an array of integers or a list of reals. We will see
many such examples in Chapter 5. Thus, decision procedures for fragments of
first-order theories are essentially useless unless they can be combined.
What does every signature of every theory presented so far have in com-
mon? They all have equality, =. Nelson and Oppen made equality the focal
predicate in their general method for combining quantifier-free fragments of
first-order theories (with some restrictions). Given two theories T1 and T2 such
that Σ1 ∩ Σ2 = {=} — only = is shared — the combined theory T1 ∪ T2 has
signature Σ1 ∪ Σ2 and axioms A1 ∪ A2 . Nelson and Oppen showed that if
• satisfiability in the quantifier-free fragment of T1 is decidable;
• satisfiability in the quantifier-free fragment of T2 is decidable;
• and certain technical requirements are met,
then satisfiability in the quantifier-free fragment of T1 ∪ T2 is decidable. Fur-
thermore, if the decision procedures for T1 and T2 are in P (in NP), then the
combined decision procedure for T1 ∪ T2 is in P (in NP).
Chapter 10 studies the Nelson-Oppen combination of decision procedures.

Example 3.18. To prove that the (ΣA= ∪ ΣZ )-formula

F : a = b → a[i] ≥ b[i]

is (TA= ∪ TZ )-valid we assume otherwise: there is a (TA= ∪ TZ )-interpretation I


such that I 6|= F :

1. I 6|= F assumption
2. I |= a=b 1, →
3. I 6|= a[i] ≥ b[i] 1, →
4. I |= ¬(a[i] ≥ b[i]) 3, ¬
5. I |= a[i] = b[i] 2, TA= (extensionality)
6. I |= ⊥ 4, 5, TA= ∪ TZ

Line 6 summarizes the argument that it is impossible for a TZ -interpretation


to satisfy both a[i] = b[i] and ¬(a[i] ≥ b[i]). 

Example 3.19. The (ΣE ∪ ΣZ )-formula

1 ≤ x ∧ x ≤ 2 ∧ f (x) 6= f (1) ∧ f (x) 6= f (2)

is (TE ∪ TZ )-unsatisfiable, for x cannot be either 1 or 2 without violating


(function congruence). Seen as a (ΣE ∪ ΣQ )-formula, it is (TE ∪ TQ )-satisfiable:
choose x = 3/2 .
The (ΣE ∪ ΣQ )-formula

f (f (x) − f (y)) 6= f (z) ∧ x ≤ y ∧ y + z ≤ x ∧ 0 ≤ z

is (TE ∪TQ )-unsatisfiable. In particular, the final three literals imply that z = 0
and x = y, so that f (x) = f (y). But then from the first literal, f (0) 6= f (0)
since both f (x) − f (y) and z equal 0.
Finally, the (ΣE ∪ ΣZ )-formula

1 ≤ x ∧ x ≤ 3 ∧ f (x) 6= f (1) ∧ f (x) 6= f (3) ∧ f (1) 6= f (2)

is (TE ∪TZ )-satisfiable since x can be 2 without violating (function congruence).




3.9 Summary
Important data types in software and hardware models include integers; ratio-
nals; recursive data structures like records, lists, stacks, and trees; and arrays.
This chapter introduces first-order theories that formalize these data types.
It covers:
• First-order theories. Formalizations of structures and operations into
first-order logic: signatures, axioms. Fragments of theories, in particular
quantifier-free fragments. Interpretations, satisfiability, validity.
• Specific theories:
– Equality defines the binary predicate = as a congruence relation. Sat-
isfiability in the quantifier-free fragment is efficiently decidable, and
the decision procedure is the basis for decision procedures for data
structures (see Chapter 9).
– Integer arithmetic. Satisfiability in integer arithmetic without multipli-
cation is decidable.
– Rational and real arithmetic. Satisfiability in real arithmetic with mul-
tiplication is decidable with high complexity. Satisfiability in ratio-
nal arithmetic without multiplication is efficiently decidable. Rational
arithmetic without multiplication is indistinguishable from real arith-
metic without multiplication.
– Recursive data structures include records, lists, stacks, and queues.
Satisfiability in the quantifier-free fragment is efficiently decidable.

– Arrays can be read and written. Satisfiability in the quantifier-free


fragment is decidable. Chapter 11 studies a larger fragment in which
satisfiability is still decidable.
• Combination theories. How can decision procedures for multiple theories
be combined to decide satisfiability in combination theories?
Studying first-order theories is important for two reasons. First, theories
formalize into FOL interesting structures and operations on the structures.
Second, satisfiability in some theories or fragments of theories is decidable and
thus can be reasoned about algorithmically, whereas satisfiability in general
FOL is undecidable. Part II of this book focuses on such theories and frag-
ments that are useful for program analysis. Chapters 5 and 6 provide many
examples of formulae from combinations of these theories in the context of
program verification.

Bibliographic Remarks
The undecidability of validity in FOL [13] motivated the subsequent study of
first-order theories and fragments. In 1929, Presburger proved that satisfiabil-
ity in arithmetic without multiplication is decidable [73]. Tarski showed in the
1930s that real arithmetic is decidable even with multiplication, although the
Second World War delayed the publication of this result [90]. The axiomati-
zation of recursive data structures that we study is from work by Nelson and
Oppen [66]. Oppen studied a variation in which structures are acyclic [69].
The axiomatization of arrays, in particular the read-over-write axioms, is due
to McCarthy [59]. The Nelson-Oppen combination method is based on work
by Nelson and Oppen in the late 1970s and early 1980s [65].

Exercises
3.1 (Semantic argument in TE ). Use the semantic method to argue the
validity of the following ΣE -formulae, or identify a counterexample (a falsifying
TE -interpretation).
(a) f (x, y) = f (y, x) → f (a, y) = f (y, a)
(b) f (g(x)) = g(f (x)) ∧ f (g(f (y))) = x ∧ f (y) = x → g(f (x)) = x
(c) f (f (f (a))) = f (f (a)) ∧ f (f (f (f (a)))) = a → f (a) = a
(d) f (f (f (a))) = f (a) ∧ f (f (a)) = a → f (a) = a
(e) p(x) ∧ f (f (x)) = x ∧ f (f (f (x))) = x → p(f (x))

3.2 (Semantic argument in TZ ). Use the semantic method to argue the


validity of the following ΣZ -formulae, or identify a counterexample (a falsifying
TZ -interpretation).
(a) x ≤ y ∧ z = x + 1 → z ≤ y

(b) x ≤ y ∧ z = x − 1 → z≤y
(c) 3x = 2 → x ≤ 0
(d) 1 ≤ x ∧ x ≤ 2 → x=1 ∨ x=2
(e) 1 ≤ x ∧ x + y ≤ 3 ∧ 1≤y → x=1 ∨ x=2
(f) 0 ≤ x ∧ 0 ≤ x + y ∧ x + y ≤ 1 ∧ (y ≤ −2 ∨ 2 ≤ y) → 0 ≤ −1

3.3 (Semantic argument in TQ ). Use the semantic method to argue the va-
lidity of the following ΣQ -formulae, or identify a counterexample (a falsifying
TQ -interpretation).
(a) 3x = 2 → x ≤ 0
(b) 0 ≤ x + 2y ∧ 2x + y ≤ 1
(c) 1 ≤ x ∧ x ≤ 2 → x = 1 ∨ x = 2

3.4 (Semantic argument in Tcons ). Use the semantic method to argue the
validity of the following Σcons -formulae, or identify a counterexample (a falsi-
fying Tcons -interpretation).
(a) car(x) = y ∧ cdr(x) = z → x = cons(y, z)
(b) ¬atom(x) ∧ car(x) = y ∧ cdr(x) = z → x = cons(y, z)

3.5 (Semantic argument in TA ). Use the semantic method to argue the va-
lidity of the following ΣA -formulae, or identify a counterexample (a falsifying
TA -interpretation).
(a) ahi ⊳ ei[j] = e → i = j
(b) ahi ⊳ ei[j] = e → a[j] = e
(c) ahi ⊳ ei[j] = e → i = j ∨ a[j] = e
(d) ahi ⊳ eihj ⊳ f i[k] = g ∧ j 6= k ∧ i = j → a[k] = g

3.6 (Semantic argument in combinations). For each of the following for-


mulae, identify the combination of theories in which it lies. To avoid ambiguity,
prefer TZ to TQ . Then argue its validity in that combination of theories using
the semantic method, or identify a counterexample.
(a) 1 ≤ x ∧ x ≤ 2 ∧ cons(1, y) 6= cons(x, y) → cons(2, y) = cons(x, y)
(b) a[i] ≥ 1 ∧ a[i] + x ≤ 2 ∧ x > 0 ∧ x = i → ahx ⊳ 2i[i] = 1
(c) 1 ≤ x ∧ x ≤ 2 ∧ cons(1, y) 6= cons(x, y) → cons(2, y) = cons(x, y)
(d) x + y = z ∧ f (z) = z → f (x + y) = z
(e) g(x + y, z) = f (g(x, y)) ∧ x + z = y ∧ z ≥ 0 ∧ x ≥ y ∧ g(x, x) = z
→ f (z) = g(2x, 0)

3.7 (Semantic argument in combinations). Redo Exercise 3.6, preferring


TQ to TZ .
4 Induction

Even though this proposition may have an infinite number of cases,


I shall give a very short proof of it assuming two lemmas. The first,
which is self evident, is that the proposition is valid for the second row.
The second is that if the proposition is valid for any row then it must
necessarily be valid for the following row.
— Blaise Pascal
Traité du Triangle Arithmétique, c. 1654
This chapter discusses induction, a classic proof technique for proving
first-order theorems with universal quantifiers. Section 4.1 begins with step-
wise induction, which may be familiar to the reader from earlier education.
Section 4.2 then introduces complete induction in the context of arithmetic.
Complete induction is theoretically equivalent in power to stepwise induction
but sometimes produces more concise proofs. Section 4.3 generalizes com-
plete induction to well-founded induction in the context of arithmetic and
recursive data structures. Finally, Section 4.4 covers a form of well-founded
induction over logical formulae called structural induction. It is useful for
reasoning about correctness of decision procedures and properties of logical
theories and their interpretations.
We apply induction in various ways throughout the book. Structural induc-
tion is applied in proofs. Additionally, induction is the basis for the program
verification methods of Chapter 5.

4.1 Stepwise Induction


We review stepwise induction for arithmetic and then show that it extends
naturally to other theories, such as the theory of lists Tcons .

Arithmetic

Recall from Chapter 3 that the theory of Peano arithmetic TPA formalizes
arithmetic over the natural numbers. Its axioms include an instance of the
(induction) axiom schema

F [0] ∧ (∀n. F [n] → F [n + 1]) → ∀x. F [x]

for each ΣPA -formula F [x] with only one free variable x. This axiom schema
says that to prove ∀x. F [x] — that is, F [x] is TPA -valid for all natural numbers
x — it is sufficient to do the following:
• For the base case, prove that F [0] is TPA -valid.
• For the inductive step, assume as the inductive hypothesis that for
some arbitrary natural number n, F [n] is TPA -valid. Then prove that F [n+
1] is TPA -valid under this assumption.
These two steps comprise the stepwise induction principle for Peano (and
Presburger) arithmetic.
Example 4.1. Consider the theory TPA+ obtained from augmenting TPA with
the following axioms:
• ∀x. x0 = 1 (exp. zero)
• ∀x, y. xy+1 = xy · x (exp. successor)
• ∀x, z. exp 3 (x, 0, z) = z (exp 3 zero)
• ∀x, y, z. exp 3 (x, y + 1, z) = exp 3 (x, y, x · z) (exp 3 successor)
The first two axioms define exponentiation xy , while the latter two axioms
define a ternary function exp 3 (x, y, z).
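Computationally, exp3 is just a tail-recursive (accumulator-passing) formulation of exponentiation, which is why the strengthened property exp3 (x, y, z) = x^y · z used later in this example is the natural invariant. The following Python transcription is ours and merely tests that claim on small values; it is not part of the formal development.

    # Direct transcription of the exp and exp3 axioms of Example 4.1.
    def exp(x, y):
        return 1 if y == 0 else exp(x, y - 1) * x          # (exp. zero/successor)

    def exp3(x, y, z):
        return z if y == 0 else exp3(x, y - 1, x * z)      # (exp3 zero/successor)

    # The strengthened invariant: exp3(x, y, z) = x^y * z, so exp3(x, y, 1) = x^y.
    assert all(exp3(x, y, z) == exp(x, y) * z
               for x in range(4) for y in range(6) for z in range(4))
    print(exp3(2, 10, 1))   # 1024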
Let us prove that the following formula is TPA+ -valid:

∀x, y. exp 3 (x, y, 1) = xy . (4.1)

We need to choose either x or y as the induction variable. Considering the


exp 3 axioms, it appears that y is the smarter choice: (exp 3 successor) defines
exp 3 recursively by considering the predecessor of y + 1.
Therefore, we prove by stepwise induction on y that

F [y] : ∀x. exp 3 (x, y, 1) = xy .

For the base case, we prove

F [0] : ∀x. exp 3 (x, 0, 1) = x0 .

But x0 = 1 by (exp. zero), and exp 3 (x, 0, 1) = 1 by (exp 3 zero).


Assume as the inductive hypothesis that for arbitrary natural number n,

F [n] : ∀x. exp 3 (x, n, 1) = xn . (4.2)



We want to prove that


F [n + 1] : ∀x. exp 3 (x, n + 1, 1) = xn+1 . (4.3)
By (exp 3 successor), we have
exp 3 (x, n + 1, 1) = exp 3 (x, n, x · 1) .
Unfortunately, the inductive hypothesis (4.2) does not apply to the left side
of the equation since n 6= n + 1, and it does not apply to the right side of
the equation because the third argument is x · 1 rather than 1. Continuing to
apply axioms is unlikely to bring us closer to the proof. Thus, we have failed
to prove the property.
What went wrong in the proof? Did we choose the wrong induction vari-
able? Would x have worked better? In fact, it is often the case that the prop-
erty must be strengthened to allow the induction to go through. A stronger
theorem provides a stronger inductive hypothesis.
Let us strengthen the property to be proved to
∀x, y, z. exp 3 (x, y, z) = xy · z . (4.4)
It clearly implies the desired property (4.1): just choose z = 1.
Again, we must choose the induction variable. Based on (exp 3 successor),
we use y again. Thus, we prove by stepwise induction on y that
F [y] : ∀x, z. exp 3 (x, y, z) = xy · z .
For the base case, we prove
F [0] : ∀x, z. exp 3 (x, 0, z) = x0 · z .
From (exp 3 zero), we have exp 3 (x, 0, z) = z, while from (exp. zero), we have
x0 · z = 1 · z = z.
Assume as the inductive hypothesis that
F [n] : ∀x, z. exp 3 (x, n, z) = xn · z (4.5)
for arbitrary natural number n. We want to prove that
F [n + 1] : ∀x, z ′ . exp 3 (x, n + 1, z ′ ) = xn+1 · z ′ , (4.6)
where we have renamed z to z ′ for convenience. We have
exp 3 (x, n + 1, z ′ ) = exp 3 (x, n, x · z ′ ) (exp 3 successor)
= xn · (x · z ′ ) IH (4.5), z 7→ x · z ′
= xn+1 · z ′ (exp. successor)
finishing the proof. The annotation z 7→ x·z ′ indicates that x·z ′ is substituted
for z when applying the inductive hypothesis (4.5). This substitution is jus-
tified because z is universally quantified. Renaming z to z ′ avoids confusion
during the application of the inductive hypothesis in the second line. 

Lists

We can define stepwise induction over recursive data structures such as lists
(see Chapters 3 and 9). Consider the theory of lists Tcons . Stepwise induction
in Tcons is defined according to the following schema

(∀ atom u. F [u]) ∧ (∀u, v. F [v] → F [cons(u, v)]) → ∀x. F [x]

for Σcons -formulae F [x] with only one free variable x. The notation ∀ atom u. F [u]
abbreviates ∀u. atom(u) → F [u]. In other words, to prove ∀x. F [x] — that is,
F [x] is Tcons -valid for all lists x — it is sufficient to do the following:
• For the base case, prove that F [u] is Tcons -valid for an arbitrary atom u.
• For the inductive step, assume as the inductive hypothesis that for
some arbitrary list v, F [v] is valid. Then prove that for arbitrary list u,
F [cons(u, v)] is Tcons -valid under this assumption.
These steps comprise the stepwise induction principle for lists.
Example 4.2. Consider the theory Tcons+ obtained from augmenting Tcons with
the following axioms:
• ∀ atom u. ∀v. concat (u, v) = cons(u, v) (concat. atom)
• ∀u, v, x. concat (cons(u, v), x) = cons(u, concat (v, x)) (concat. list)
• ∀ atom u. rvs(u) = u (reverse atom)
• ∀x, y. rvs(concat (x, y)) = concat (rvs(y), rvs(x)) (reverse list)
• ∀ atom u. flat(u) (flat atom)
• ∀u, v. flat (cons(u, v)) ↔ atom(u) ∧ flat (v) (flat list)
The first two axioms define the concat function, which concatenates two lists
together. For example,

concat(cons(a, b), cons(b, cons(c, d)))

= cons(a, cons(b, cons(b, cons(c, d)))) .

The next two axioms define the rvs function, which reverses a list. For exam-
ple,

rvs(cons(a, cons(b, c))) = cons(c, cons(b, a)) .

Note, however, that rvs is undefined on lists like cons(cons(a, b), c), for
cons(cons(a, b), c) cannot result from concatenating two lists together. There-
fore, the final two axioms define the flat predicate, which evaluates to ⊤ on
a list iff every element is an atom. For example, cons(a, cons(b, c)) is flat , but
cons(cons(a, b), c) is not because the first element of the list is itself a list.
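Before the proof, it may help to see the three functions at work. The following Python sketch is our encoding (nested pairs for lists; the definition of rvs is valid for flat lists only, in line with the remark above); it follows the axioms and tests the property to be proved on a sample list.

    # Our nested-pair encoding of Example 4.2's functions, for flat lists.
    def cons(x, y): return (x, y)
    def atom(x):    return not isinstance(x, tuple)

    def concat(u, v):                      # (concat. atom) / (concat. list)
        return cons(u, v) if atom(u) else cons(u[0], concat(u[1], v))

    def flat(x):                           # (flat atom) / (flat list)
        return True if atom(x) else (atom(x[0]) and flat(x[1]))

    def rvs(x):
        # (reverse atom); for flat x = cons(u, v) we have x = concat(u, v),
        # so (reverse list) gives rvs(x) = concat(rvs(v), u).
        if atom(x):
            return x
        u, v = x
        return concat(rvs(v), u)

    x = cons("a", cons("b", "c"))
    assert flat(x) and rvs(rvs(x)) == x    # the property proved in Example 4.2
    print(rvs(x))                          # ('c', ('b', 'a')), i.e. cons(c, cons(b, a))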
Let us prove that the following formula is Tcons+ -valid:

∀x. flat (x) → rvs(rvs(x)) = x . (4.7)

For example,

rvs(rvs(cons(a, cons(b, c)))) = rvs(cons(c, cons(b, a)))


= cons(a, cons(b, c))

We prove by stepwise induction on x that

F [x] : flat (x) → rvs(rvs(x)) = x .

For the base case, we consider arbitrary atom u and prove

F [u] : flat (u) → rvs(rvs(u)) = u .

But rvs(rvs(u)) = u follows from two applications of (reverse atom).


Assume as the inductive hypothesis that for arbitrary list v,

F [v] : flat(v) → rvs(rvs(v)) = v . (4.8)

We want to prove that for arbitrary list u,

F [cons(u, v)] : flat (cons(u, v)) → rvs(rvs(cons(u, v))) = cons(u, v) . (4.9)

Consider two cases: either atom(u) or ¬atom(u).


If ¬atom(u), then

flat(cons(u, v)) ⇔ atom(u) ∧ flat(v) ⇔ ⊥ ,

by (flat list) and assumption. Therefore, (4.9) holds since its antecedent is ⊥.
If atom(u), then we have that

flat(cons(u, v)) ⇔ atom(u) ∧ flat(v) ⇔ flat(v)

by (flat list). Furthermore,

rvs(rvs(cons(u, v)))
= rvs(rvs(concat (u, v))) (concat. atom)
= rvs(concat (rvs(v), rvs(u))) (reverse list)
= concat(rvs(rvs(u)), rvs(rvs(v))) (reverse list)
= concat(u, rvs(rvs(v))) (reverse atom)
= concat(u, v) IH (4.8), since flat (v)
= cons(u, v) (concat. atom)

which finishes the proof. 

4.2 Complete Induction


Complete induction is a form of induction that sometimes yields more
concise proofs. For the theory of arithmetic TPA it is defined according to the
following schema

(∀n. (∀n′ . n′ < n → F [n′ ]) → F [n]) → ∀x. F [x]

for ΣPA -formulae F [x] with only one free variable x. In other words, to prove
∀x. F [x] — that is, F [x] is TPA -valid for all natural numbers x — it is sufficient
to follow the complete induction principle:
• Assume as the inductive hypothesis that for arbitrary natural number
n and for every natural number n′ such that n′ < n, F [n′ ] is TPA -valid.
Then prove that F [n] is TPA -valid.
It appears that we are missing a base case. In practice, a case analysis usually
requires at least one base case. In other words, the base case is implicit in
the structure of complete induction. For example, for n = 0, the inductive
hypothesis does not provide any information — there does not exist a natural
number n′ < 0. Hence, F [0] must be shown separately without assistance from
the inductive hypothesis.

Example 4.3. Consider another augmented version of Peano arithmetic, TPA ,
that defines integer division. It has the usual axioms of TPA plus the following:
• ∀x, y. x<y → quot(x, y) = 0 (quotient less)
• ∀x, y. y>0 → quot(x + y, y) = quot (x, y) + 1 (quotient successor)
• ∀x, y. x<y → rem(x, y) = x (remainder less)
• ∀x, y. y>0 → rem(x + y, y) = rem(x, y) (remainder successor)
These axioms define functions for computing integer quotients quot(x, y) and
remainders rem(x, y). For example, quot (5, 3) = 1 and rem(5, 3) = 2. We
prove two properties, which the reader may recall from grade school, about
these functions. First, we prove that the remainder is always less than the
divisor:

∀x, y. y > 0 → rem(x, y) < y . (4.10)

Then we prove that

∀x, y. y > 0 → x = y · quot(x, y) + rem(x, y) . (4.11)

For property (4.10), (remainder successor) suggests that we apply complete


induction on x to prove

F [x] : ∀y. y > 0 → rem(x, y) < y . (4.12)

Thus, for the inductive hypothesis, assume that for arbitrary natural number
x,

∀x′ . x′ < x → ∀y. y > 0 → rem(x′ , y) < y . (4.13)
(The consequent of the outer implication is F [x′ ].)

Let y be an arbitrary positive natural number. Consider two cases: either


x < y or ¬(x < y).

If x < y, then
rem(x, y) = x (remainder less)
<y by assumption x < y
as desired.
If ¬(x < y), then there is a natural number n, n < x, such that x = n + y.
Compute
rem(x, y) = rem(n + y, y) x=n+y
= rem(n, y) (remainder successor)
<y IH (4.13), x′ 7→ n, since n < x
finishing the proof of this property.
For property (4.11), (remainder successor) again suggests that we apply
complete induction on x to prove
G[x] : ∀y. y > 0 → x = y · quot (x, y) + rem(x, y) . (4.14)
Thus, for the inductive hypothesis, assume that for arbitrary natural number
x,
∀x′ . x′ < x → ∀y. y > 0 → x′ = y · quot(x′ , y) + rem(x′ , y) . (4.15)
(The consequent of the outer implication is G[x′ ].)

Let y be an arbitrary positive natural number. Consider two cases: either


x < y or ¬(x < y).
If x < y, then
y·quot (x, y) + rem(x, y)
= y · 0 + rem(x, y) (quotient less)
=x (remainder less)
as desired.
If ¬(x < y), then there is a natural number n < x such that x = n + y.
Compute
y·quot (x, y) + rem(x, y)
= y · quot(n + y, y) + rem(n + y, y) x=n+y
= y · (quot (n, y) + 1) + rem(n + y, y) (quotient successor)
= y · (quot (n, y) + 1) + rem(n, y) (remainder successor)
= (y · quot (n, y) + rem(n, y)) + y
=n+y IH (4.15), x′ 7→ n, since n < x
=x x=n+y
finishing the proof of this property. 
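As a quick, finite sanity check of properties (4.10) and (4.11) — not a replacement for the inductive proofs — the quot and rem axioms can be transcribed into Python and tested on small inputs. The transcription below is ours.

    # Transcription of the quot/rem axioms of Example 4.3 (repeated subtraction).
    def quot(x, y):
        return 0 if x < y else quot(x - y, y) + 1      # (quotient less/successor)

    def rem(x, y):
        return x if x < y else rem(x - y, y)           # (remainder less/successor)

    # Finite sanity check of properties (4.10) and (4.11).
    assert all(rem(x, y) < y and x == y * quot(x, y) + rem(x, y)
               for x in range(200) for y in range(1, 20))
    print(quot(5, 3), rem(5, 3))   # 1 2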
In the next section, we generalize complete induction so that we can apply
it in other theories.

4.3 Well-Founded Induction


A binary predicate ≺ over a set S is a well-founded relation iff there does
not exist an infinite sequence s1 , s2 , s3 , . . . of elements of S such that each
successive element is less than its predecessor:

s1 ≻ s2 ≻ s3 ≻ · · · ,

where s ≺ t iff t ≻ s. In other words, each sequence of elements of S that


decreases according to ≺ is finite.

Example 4.4. The relation < is well-founded over the natural numbers. Any
sequence of natural numbers decreasing according to < is finite:

1023 > 39 > 30 > 29 > 8 > 3 > 0 .

However, the relation < is not well-founded over the rationals. Consider the
infinite decreasing sequence
1 > 1/2 > 1/3 > 1/4 > · · · ,

that is, the sequence si = 1/i for i ≥ 1. 
Example 4.5. Consider the theory TconsPA , which includes the axioms of Tcons
and TPA and the following axioms:
• ∀ atom u, v. u ≼c v ↔ u = v (≼c (1))
• ∀ atom u. ∀v. ¬atom(v) → ¬(v ≼c u) (≼c (2))
• ∀ atom u. ∀v, w. u ≼c cons(v, w) ↔ u = v ∨ u ≼c w (≼c (3))
• ∀u1 , v1 , u2 , v2 . cons(u1 , v1 ) ≼c cons(u2 , v2 )
↔ (u1 = u2 ∧ v1 ≼c v2 ) ∨ cons(u1 , v1 ) ≼c v2 (≼c (4))
• ∀x, y. x ≺c y ↔ x ≼c y ∧ x 6= y (≺c )
• ∀ atom u. |u| = 1 (length atom)
• ∀u, v. |cons(u, v)| = 1 + |v| (length list)
The first four axioms define the sublist relation ≼c . x ≼c y holds iff x is a
(not necessarily strict) sublist of y. The next axiom defines the strict sublist
relation: x ≺c y iff x is a strict sublist of y. The final two axioms define the
length function, which returns the number of elements in a list.
The strict sublist relation ≺c is well-founded on the set of all lists. One can
prove that the number of sublists of a list is finite; and that its set of strict
sublists is a superset of the set of strict sublists of any of its sublists. Hence,
there cannot be an infinite sequence of lists descending according to ≺c . 

Well-founded induction generalizes complete induction to arbitrary


theory T by allowing the use of any binary predicate ≺ that is well-founded
in the domain of every T -interpretation. It is defined in the theory T with
well-founded relation ≺ by the following schema

(∀n. (∀n′ . n′ ≺ n → F [n′ ]) → F [n]) → ∀x. F [x]

for Σ-formulae F [x] with only one free variable x. In other words, to prove the
T -validity of ∀x. F [x], it is sufficient to follow the well-founded induction
principle:
• Assume as the inductive hypothesis that for arbitrary element n and
for every element n′ such that n′ ≺ n, F [n′ ] is T -valid. Then prove that
F [n] is T -valid.
Complete induction in TPA of Section 4.2 is a specific instance of well-founded
induction that uses the well-founded relation <.
A theory of lists augmented with the first five axioms of Example 4.5 has
well-founded induction in which the well-founded relation is ≺c .

Example 4.6. Consider proving the trivial property

∀x. |x| ≥ 1 (4.16)


in TconsPA , which was defined in Example 4.5. We apply well-founded induction
on x using the well-founded relation ≺c to prove

F [x] : |x| ≥ 1 . (4.17)

For the inductive hypothesis, assume that

∀x′ . x′ ≺c x → |x′ | ≥ 1 . (4.18)
(The consequent |x′ | ≥ 1 is F [x′ ].)

Consider two cases: either atom(x) or ¬atom(x).


In the first case |x| = 1 ≥ 1 by (length atom).
In the second case x is not an atom, so x = cons(u, v) for some u, v by the
(construction) axiom. Then

|x| = |cons(u, v)|


= 1 + |v| (length list)
≥1+1 IH (4.18), x′ 7→ v, since v ≺c cons(u, v)
≥1

as desired. Exercise 4.2 asks the reader to prove formally that ∀u, v. v ≺c
cons(u, v).
This property is also easily proved using stepwise induction. 

In applying well-founded induction, we need not restrict ourselves to the


intended domain D of a theory T . A useful class of well-founded relations are
lexicographic relations. From a finite set of pairs of sets and well-founded
relations (S1 , ≺1 ), . . . , (Sm , ≺m ), construct the set

S = S1 × · · · × Sm ,

and define the relation ≺:

(s1 , . . . , sm ) ≺ (t1 , . . . , tm ) ⇔ ∨_{i=1}^{m} ( si ≺i ti ∧ ∧_{j=1}^{i−1} sj = tj )

for si , ti ∈ Si . That is, for elements s : (s1 , . . . , sm ), t : (t1 , . . . , tm ) of S, s ≺ t


iff at some position i, si ≺i ti , and for all preceding positions j, sj = tj .
For convenience, we abbreviate (s1 , . . . , sm ) by s and thus write, for example,
s ≺ t.
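The definition translates directly into code. The following Python sketch is ours (the name lex_less is hypothetical): it compares tuples componentwise using given component orders and reproduces the comparisons with <3 used in Example 4.7 below.

    # A generic lexicographic extension of component orders (our sketch).
    def lex_less(s, t, orders):
        # s < t iff at some position i, s[i] <_i t[i] and s[j] = t[j] for all j < i.
        for si, ti, less in zip(s, t, orders):
            if less(si, ti):
                return True
            if si != ti:
                return False
        return False

    lt = lambda a, b: a < b
    print(lex_less((11, 9, 104), (11, 13, 3), [lt, lt, lt]))   # True
    print(lex_less((11, 13, 3), (11, 9, 104), [lt, lt, lt]))   # False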
Lexicographic well-founded induction has the form

(∀n. (∀n′ . n′ ≺ n → F [n′ ]) → F [n]) → ∀x. F [x]

for Σ-formula F [x] with only free variables x = {x1 , . . . , xm }. Notice that the
form of this induction principle is the same as well-founded induction. The
only difference is that we are considering tuples n = (n1 , . . . , nm ) rather than
single elements n.

Example 4.7. Consider the following puzzle. You have a bag of red, yellow,
and blue chips. If only one chip remains in the bag, you take it out. Otherwise,
you remove two chips at random:
1. If one of the two removed chips is red, you do not put any chips in the
bag.
2. If both of the removed chips are yellow, you put one yellow chip and five
blue chips in the bag.
3. If one of the chips is blue and the other is not red, you put ten red chips
in the bag.
These cases cover all possibilities for the two chips. Does this process always
halt?
We prove the following property: for all bags of chips, you can execute
the choose-and-replace process only a finite number of times before the bag is
empty. Let the triple

(#yellow, #blue, #red)

represent the current state of the bag. Such a tuple is in the set of triples of
natural numbers S : N3 . Let <3 be the natural lexicographic extension of <
to such triples. For example,

(11, 13, 3) 6<3 (11, 9, 104) but (11, 9, 104) <3 (11, 13, 3) .

We prove that for arbitrary bag state (y, b, r) represented by the triple of
natural numbers y, b, and r, only a finite number of steps remain.

For the base cases, consider when the bag has no chips (state (0, 0, 0)) or
only one chip (one of states (1, 0, 0), (0, 1, 0), or (0, 0, 1)). In the first case, you
are done; in the second set of cases, only one step remains.
Assume for the inductive hypothesis that for any bag state (y ′ , b′ , r′ ) such
that

(y ′ , b′ , r′ ) <3 (y, b, r) ,

only a finite number of steps remain. Now remove two chips from the current
bag, represented by state (y, b, r). Consider the three possible cases:
1. If one of the two removed chips is red, you do not put any chips in the bag.
Then the new bag state is (y − 1, b, r − 1), (y, b − 1, r − 1), or (y, b, r − 2).
Each is less than (y, b, r) by <3 .
2. If both of the removed chips are yellow, you put one yellow chip and five
blue chips in the bag. Then the new bag state is (y − 1, b + 5, r), which is
less than (y, b, r) by <3 .
3. If one of the chips is blue and the other is not red, you put ten red chips in
the bag. Then the new bag state is (y − 1, b − 1, r + 10) or (y, b − 2, r + 10).
Each is less than (y, b, r) by <3 .
In all cases, we can apply the inductive hypothesis to deduce that only a finite
number of steps remain from the next state. Since only one step of the process
is required to get to the next state, there are only a finite number of steps
remaining from the current state (y, b, r). Hence, the process always halts. 
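Although the inductive argument above is the real proof, the process is easy to simulate, and a simulation can check at run time that the measure decreases at every step. The sketch below is ours; the assertion relies on the fact that Python compares tuples lexicographically.

    import random

    # Simulation of Example 4.7 (our illustration). The state (yellow, blue, red)
    # strictly decreases in the lexicographic order <3 at every step, so the
    # process terminates; the assertion checks the decrease along one random run.
    def step(y, b, r):
        chips = ["y"] * y + ["b"] * b + ["r"] * r
        c1, c2 = random.sample(chips, 2)
        if "r" in (c1, c2):                       # case 1: discard both chips
            removed = {"y": 0, "b": 0, "r": 0}
            removed[c1] += 1; removed[c2] += 1
            return y - removed["y"], b - removed["b"], r - removed["r"]
        if c1 == "y" and c2 == "y":               # case 2: put back 1 yellow, 5 blue
            return y - 1, b + 5, r
        # case 3: one blue, the other not red: put back 10 red
        return (y - 1, b - 1, r + 10) if "y" in (c1, c2) else (y, b - 2, r + 10)

    state, steps = (5, 5, 5), 0
    while sum(state) >= 2:
        nxt = step(*state)
        assert nxt < state                        # tuple comparison is lexicographic
        state, steps = nxt, steps + 1
    print("halted after", steps, "steps; final state", state)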

Example 4.8. Consider proving the property

∀x, y. x ≼c y → |x| ≤ |y| (4.19)


in TconsPA . Let ≺2c be the natural lexicographic extension of ≺c to pairs of lists.
That is, (x1 , y1 ) ≺2c (x2 , y2 ) iff x1 ≺c x2 ∨ (x1 = x2 ∧ y1 ≺c y2 ).
We apply lexicographic well-founded induction to pairs (x, y) to prove

F [x, y] : x ≼c y → |x| ≤ |y| . (4.20)

For the inductive hypothesis, assume that

∀x′ , y ′ . (x′ , y ′ ) ≺2c (x, y) → x′ ≼c y ′ → |x′ | ≤ |y ′ | . (4.21)
(The consequent of the outer implication is F [x′ , y ′ ].)

Now consider arbitrary lists x and y. Consider two cases: either atom(x)
or ¬atom(x).
If atom(x), then

|x| = 1 (length atom)


≤ |y| Example 4.6

Hence, regardless of whether x ≼c y, we have that |x| ≤ |y| so that (4.20)


holds.
If ¬atom(x), then consider two cases: either atom(y) or ¬atom(y). If
atom(y), then

x ≼c y ⇔ ⊥

by (≼c (2)); therefore, (4.20) holds trivially.


For the final case, we have that ¬atom(x) and ¬atom(y). Then x =
cons(u1 , v1 ) and y = cons(u2 , v2 ) for some lists u1 , v1 , u2 , v2 . We have

x ≼c y ⇔ cons(u1 , v1 ) ≼c cons(u2 , v2 ) assumption


⇔ (u1 = u2 ∧ v1 ≼c v2 ) ∨ cons(u1 , v1 ) ≼c v2 (≼c (4))

The disjunction suggests two possibilities. Consider the first disjunct. Because
v1 ≺c cons(u1 , v1 ) = x, we have that

(v1 , v2 ) ≺2c (x, y) ,

allowing us to appeal to the inductive hypothesis (4.21): from v1 ≼c v2 , deduce


that |v1 | ≤ |v2 |. Then with two applications of (length list), we have

|x| ≤ |y| ⇔ 1 + |v1 | ≤ 1 + |v2 | ⇔ |v1 | ≤ |v2 | .

Therefore, |x| ≤ |y| and (4.20) holds for this case.


Suppose the second disjunct (cons(u1 , v1 ) ≼c v2 ) holds. We again look to
the inductive hypothesis (4.21). We have

(cons(u1 , v1 ), v2 ) ≺2c (x, y)

because cons(u1 , v1 ) = x and v2 ≺c cons(u2 , v2 ) = y. Therefore, the inductive


hypothesis tells us that |x| ≤ |v2 |, while (length list) implies that |v2 | < |y|. In
short,

|x| ≤ |v2 | < |y| ,

which implies |x| ≤ |y| as desired, completing the proof. 

Example 4.9. Augment the theory of Presburger arithmetic TN (see Chap-


ters 3 and 7) with the following axioms to define the Ackermann function:
• ∀y. ack (0, y) = y + 1 (ack left zero)
• ∀x. ack (x + 1, 0) = ack (x, 1) (ack right zero)
• ∀x, y. ack (x + 1, y + 1) = ack (x, ack (x + 1, y)) (ack successor)
The Ackermann function grows quickly for increasing arguments:
• ack (0, 0) = 1
• ack (1, 1) = 3

• ack (2, 2) = 7
• ack (3, 3) = 61
• ack (4, 4) = 2^(2^(2^(2^16))) − 3
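The axioms translate directly into a recursive Python definition. The sketch below is ours and is practical only for tiny arguments, but it reproduces the values listed above and checks, on small inputs, the property ack (x, y) > y proved below.

    import sys
    sys.setrecursionlimit(100000)

    # Direct transcription of the ack axioms (our illustration; impractical
    # beyond tiny arguments, since the recursion explodes).
    def ack(x, y):
        if x == 0:
            return y + 1                      # (ack left zero)
        if y == 0:
            return ack(x - 1, 1)              # (ack right zero)
        return ack(x - 1, ack(x, y - 1))      # (ack successor)

    print([ack(n, n) for n in range(4)])      # [1, 3, 7, 61]
    assert all(ack(x, y) > y for x in range(4) for y in range(4))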
One might expect that proving properties about the Ackermann function
would be difficult.
However, lexicographic well-founded induction allows us to reason about
certain properties of the function. Define <2 as the natural lexicographic ex-
tension of < to pairs of natural numbers. Now consider input arguments to
ack and the resulting arguments in recursive calls:
• (ack left zero) does not involve a recursive call.
• In (ack right zero), (x + 1, 0) >2 (x, 1).
• In (ack successor),
– (x + 1, y + 1) >2 (x + 1, y), and
– (x + 1, y + 1) >2 (x, ack (x + 1, y)).
As the arguments decrease according to <2 with each level of recursion, we
conclude that the computation of ack (x, y) halts for every x and y. In Chap-
ter 5, we show that finding well-founded relations is a general technique for
showing that functions always halt.
Additionally, we can induct over the execution of ack to prove properties
of the ack function itself. Let us prove that

∀x, y. ack (x, y) > y (4.22)

is TNack -valid. We apply lexicographic well-founded induction to the arguments


of ack to prove

F [x, y] : ack (x, y) > y (4.23)

for arbitrary natural numbers x and y. For the inductive hypothesis, assume
that

∀x′ , y ′ . (x′ , y ′ ) <2 (x, y) → ack (x′ , y ′ ) > y ′ . (4.24)
(The consequent ack (x′ , y ′ ) > y ′ is F [x′ , y ′ ].)

Consider three cases: x = 0, x > 0 ∧ y = 0, and x > 0 ∧ y > 0.


If x = 0, then ack (0, y) = y + 1 > y by (ack left zero), as desired.
If x > 0 ∧ y = 0, then

ack (x, 0) = ack (x − 1, 1)

by (ack right zero). Since

(x′ : x − 1, y ′ : 1) <2 (x, y) ,

the inductive hypothesis (4.24) tells us that



ack (x − 1, 1) > 1 .

Therefore, we have

ack (x, 0) = ack (x − 1, 1) > 1 ,

so ack (x, 0) > 0 as desired.


For the final case, x > 0 ∧ y > 0, we have

ack (x, y) = ack (x − 1, ack (x, y − 1))

by (ack successor). Since

(x′ : x − 1, y ′ : ack (x, y − 1)) <2 (x, y) ,

the inductive hypothesis (4.24) implies that

ack (x − 1, ack (x, y − 1)) > ack (x, y − 1) .

Furthermore, since

(x′ : x, y ′ : y − 1) <2 (x, y) ,

the inductive hypothesis (4.24) implies that

ack (x, y − 1) > y − 1 .

All together, then, we have

ack (x, y) = ack (x − 1, ack (x, y − 1)) > ack (x, y − 1) > y − 1 ;

hence, ack (x, y) > (y − 1) + 1 = y, completing the proof. 

4.4 Structural Induction


Induction has many other applications outside of reasoning about the valid-
ity of first-order formulae. In this section, we introduce the proof technique
of structural induction for proving properties about formulae themselves.
Structural induction is applied in Section 2.7, in analyzing the quantifier elim-
ination procedures of Chapter 7, and in other applications throughout the
book.
Define the strict subformula relation over FOL formulae as follows:
two formulae F1 and F2 are related by the strict subformula relation iff F1 is
a strict subformula of F2 . The strict subformula relation is well founded over
the set of FOL formulae since every formula, having only a finite number of
symbols, has only a finite number of strict subformulae; and each of its strict
subformulae has fewer strict subformulae than it does. To prove a desired
property of FOL formulae, instantiate the well-founded induction principle
with the strict subformula relation:

• Assume as the inductive hypothesis that for arbitrary FOL formula F


and for every strict subformula G of F , G has the desired property. Then
prove that F has the property.
Since atoms do not have strict subformulae, they are treated as base cases.
This induction principle is the structural induction principle.

Example 4.10. Exercise 1.3 asks the reader to prove that certain logical
connectives are redundant in the presence of others. Formally, the exercise
is asking the reader to prove the following claim: Every propositional formula
F is equivalent to a propositional formula F ′ constructed with only the logical
connectives ⊤, ∧, and ¬.
There are three base cases to consider:
• The formula ⊤ can be represented directly as ⊤.
• The formula ⊥ is equivalent to ¬⊤.
• Any propositional variable P can be represented directly as P .
For the inductive step, consider formulae G, G1 , and G2 , and assume as
the inductive hypothesis that each is equivalent to formulae G′ , G′1 , and G′2 ,
respectively, which are constructed only from the connectives ⊤, ∧, and ¬
(and propositional variables, of course). We show that each possible formula
that can be constructed from G, G1 , and G2 with only one logical connective
is equivalent to another constructed with only ⊤, ∧, and ¬:
• ¬G is equivalent to ¬G′ from the inductive hypothesis.
• By considering the truth table over the four possible valuations of
G1 and G2 , one can establish that G1 ∨ G2 is equivalent
to ¬(¬G′1 ∧ ¬G′2 ). By the inductive hypothesis, the latter formula is con-
structed only from propositional variables, ⊤, ∧, and ¬.
• By similar reasoning, G1 → G2 is equivalent to ¬(G′1 ∧¬G′2 ), which satisfies
the claim.
• Similar reasoning handles G1 ↔ G2 as well.
Hence, the claim is proved.
Note that the main argument is essentially similar to the answer that the
reader might have provided in answering Exercise 1.3. Structural induction
merely provides the basis for lifting the truth-table argument to a general
statement about propositional formulae. 
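The construction in this proof is a recursive program over the syntax tree of the formula, which is exactly the computational content of the structural induction. The Python sketch below uses an ad hoc tuple encoding of propositional formulae (an illustrative choice of ours, not the book's); each branch of the recursion corresponds to one case of the proof.

    # Formulae as nested tuples: ('top',), ('bot',), ('var', 'P'), ('not', F),
    # ('and', F, G), ('or', F, G), ('implies', F, G), ('iff', F, G).

    def to_top_and_not(f):
        """Return a formula equivalent to f built only from top, and, not."""
        tag = f[0]
        if tag in ('top', 'var'):
            return f                                        # base cases
        if tag == 'bot':
            return ('not', ('top',))                        # bottom = not top
        if tag == 'not':
            return ('not', to_top_and_not(f[1]))
        if tag == 'and':
            return ('and', to_top_and_not(f[1]), to_top_and_not(f[2]))
        if tag == 'or':        # F or G  =  not(not F and not G)
            return ('not', ('and', ('not', to_top_and_not(f[1])),
                                   ('not', to_top_and_not(f[2])))))
        if tag == 'implies':   # F -> G  =  not(F and not G)
            return ('not', ('and', to_top_and_not(f[1]),
                                   ('not', to_top_and_not(f[2]))))
        if tag == 'iff':       # F <-> G  =  (F -> G) and (G -> F)
            return ('and', to_top_and_not(('implies', f[1], f[2])),
                           to_top_and_not(('implies', f[2], f[1])))
        raise ValueError(tag)

    # Example: P or bottom  becomes  not(not P and not(not top)).
    print(to_top_and_not(('or', ('var', 'P'), ('bot',))))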

Structural induction is also useful for reasoning about interpretations of


formulae, as the following example shows.

Example 4.11. This example relies on several basic concepts of set theory;
however, even the reader unfamiliar with set theory can understand the ap-
plication of structural induction without understanding the actual claim.
Consider ΣQ -formulae F [x1 , . . . , xn ] in which the only predicate is ≤,
the only logical connectives are ∨ and ∧, and the only quantifier is ∀. We

prove that the set of satisfying TQ -interpretations of F (intuitively, those TQ -


interpretations that assign to x1 , . . . , xn values from Qn that satisfy F ) de-
scribes a closed subset of Qn .
For the base case, consider any inequality α ≤ β with free variables
x1 , . . . , xn . From basic set theory, the set of satisfying points is closed.
For the inductive step, consider formulae G, G1 , and G2 constructed
as specified. Assume as the inductive hypothesis that the satisfying TQ -
interpretations for each comprise closed sets. Consider applying the allowed
logical connectives and quantifier:
• G1 ∧ G2 : The set described by this formula is the set-theoretic intersection
of the sets described by G1 and G2 , and is thus closed by the inductive
hypothesis and set theory.
• G1 ∨ G2 : Similarly, the set described by this formula is the set-theoretic
union of the sets described by G1 and G2 , and is thus closed by the induc-
tive hypothesis and set theory.
• ∀x. G: Consider subformula G with free variable x (if x is not free in G,
then the formula is equivalent to just G, which describes a closed set by
the inductive hypothesis). For each value a/b ∈ Q, consider the formula

G_{a/b} : G ∧ bx ≤ a ∧ a ≤ bx .

The set described by each G_{a/b} is closed according to the inductive hy-
pothesis and reasoning similar to the previous cases. From set theory,
the intersection of all such sets is still closed, so the set of satisfying TQ -
interpretations of ∀x. G describes a closed set.
The induction is complete, so the claim is proved.
Results from Chapter 7 prove that ∃ also preserves closed sets in TQ . 

Remark 4.12. Example 4.11 considers a subset of FOL formulae. However,


this subset is by definition closed under conjunction, disjunction, and universal
quantification: if F , F1 , and F2 are in the subset, then so are F1 ∧ F2 , F1 ∨ F2 ,
and ∀x. F ; and conversely. In other words, all strict subformulae of a formula
in the subset are also in the subset, so that structural induction is applicable.

The proof of Lemma 2.31 provides another example of the application of


structural induction.

4.5 Summary

This chapter covers several induction principles in several first-order theories:


• Stepwise induction is presented in the context of integer arithmetic and
lists. The induction principle requires defining a step such as adding one
or constructing a list with one more element.

• Complete induction is presented in the context of integer arithmetic. The


induction principle relies on the well-foundedness of the < predicate.
Rather than assuming that the desired property holds for one element
n and proving the property for the case n + 1 as in stepwise induction, one
assumes that the property holds for all elements n′ < n and proves that
it holds for n. This stronger assumption sometimes yields easier or more
concise proofs.
• Well-founded induction generalizes complete induction to other theories; it
is presented in the context of lists and lexicographic tuples. The induction
principle requires a well-founded relation over the domain.
• Structural induction is an instance of well-founded induction in which the
domain is formulae and the well-founded relation is the strict subformula
relation.
Besides being an important tool for proving first-order validities, induc-
tion is the basis for both verification methodologies studied in Chapter 5.
Structural induction also serves as the basis for the quantifier elimination
procedures studied in Chapter 7.

Bibliographic Remarks
The induction proofs in Examples 4.1, 4.3, and 4.9 are taken from the text of
Manna and Waldinger [55].
Blaise Pascal (1623–1662) and Jacob Bernoulli (1654–1705) are recognized
as having formalized stepwise and complete induction, respectively. Less for-
mal versions of induction appear in texts by Francesco Maurolico (1494–1575);
Rabbi Levi Ben Gershon (1288–1344), who recognized induction as a distinct
form of mathematical proof; Abu Bekr ibn Muhammad ibn al-Husayn Al-
Karaji (953–1029); and Abu Kamil Shuja Ibn Aslam Ibn Mohammad Ibn
Shaji (850–930) [97]. Some historians claim that Euclid may have applied
induction informally.

Exercises
4.1 (T_cons^+). Prove the following in T_cons^+:
(a) ∀u, v. flat(u) ∧ flat (v) → flat (concat (u, v))
(b) ∀u. flat(u) → flat (rvs(u))
4.2 (T_cons^PA). Prove or disprove the following in T_cons^PA:
(a) ∀u. u ⪯c u
(b) ∀u, v, w. cons(u, v) ⪯c w → v ⪯c w
(c) ∀u, v. v ≺c cons(u, v)

4.3 (T_cons^+ ∪ T_cons^PA). Prove the following in T_cons^+ ∪ T_cons^PA:
(a) ∀u, v. |concat (u, v)| = |u| + |v|
(b) ∀u. flat(u) → |rvs(u)| = |u|

4.4 (Chips). Does the process of Example 4.7 still halt if


(a) in Step 1, you return one red chip to the bag?
(b) in Step 1, you add one blue chip?
(c) in Step 1, you add one blue chip; and in Step 3, you return the blue chip
to the bag but do not add any other chips?

4.5 (Strict sublist). Modify Example 4.8 to prove

∀x, y. x ≺c y → |x| < |y| .

4.6 (Structural induction). Prove that every first-order formula F is equiv-


alent to a first-order formula F ′ constructed with only the logical connectives
⊤, ∧, and ¬ and the quantifier ∀.

4.7 (Finite number of sublists). Prove that the number of sublists of a
list (defined in T_cons^PA) is finite.
4.8 (⋆ ≺c is well-founded). Prove that ≺c , defined in T_cons^PA, is well-founded
over lists. To avoid circularity, do not apply well-founded induction in this
proof. Hint: Prove that ≺c is transitive and irreflexive (∀u. ¬(u ≺c u)); then
apply Exercise 4.7.
5
Program Correctness: Mechanics

When examining the detail of the algorithm, it seems probable that the
proof will be helpful in explaining not only what is happening but why.
— Tony Hoare
An Axiomatic Basis for Computer Programming, 1969
We are finally ready to apply FOL and induction to a real problem: spec-
ifying and proving properties of programs. In this chapter, we develop the
three foundational methods that underlie all verification and program analy-
sis techniques. In the next chapter, we discuss strategies for applying them.
First, specification is the precise statement of properties that a program
should exhibit. The language of FOL offers precision. The remaining task
is to develop a scheme for embedding FOL statements into program text
as program annotations. We focus on two forms of properties. Partial
correctness properties, or safety properties, assert that certain states —
typically, error states — cannot ever occur during the execution of a program.
An important subset of this form of property is the partial correctness of
programs: if a program halts, then its output satisfies some relation with
its input. Total correctness properties, or progress properties, assert that
certain states are eventually reached during program execution. Section 5.1
presents specification in the context of a simple programming language, pi.
The next foundational method is the inductive assertion method for
proving partial correctness properties. The inductive assertion method is
based on the mathematical induction of Chapter 4. To prove that every state
during the execution of a program satisfies FOL formula F , prove as the base
case that F holds at the beginning of execution; assume as the inductive hy-
pothesis that F currently holds (at some point during the execution); and
prove as the inductive step that F holds after one more step of the program.
Section 5.2 discusses the mechanics for reducing a program with a partial cor-
rectness specification to this inductive argument. The challenge in applying
this method is to discover additional annotations to make the induction go
through. Chapter 6 discusses strategies for finding the extra information.

The third foundational method is the ranking function method for


proving total correctness properties. Proving total correctness breaks down
into two arguments. First, one proves that some partial correctness property
holds using the inductive assertion method; second, one argues that some set
of loops and recursive functions always halt. The ranking function method
applies to the latter argument: one associates with each loop and recursive
function a ranking function that maps the program variables to a well-
founded domain (see Chapter 4); then one proves that whenever program
control moves from one ranking function to the next, the value decreases ac-
cording to the well-founded relation. Since the relation is well-founded, the
looping and recursion must eventually halt. A typical total correctness prop-
erty asserts that the program halts and its output satisfies some relation with
its input. The ranking function method applies to the first conjunct (the pro-
gram halts); the inductive assertion method applies to the second (its output
satisfies some relation with its input). Section 5.3 presents the ranking func-
tion method.
We explain all concepts in this and the next chapter with a set of example
programs that manipulate arrays. We chose these programs for several rea-
sons. First, they should be familiar to most readers; the reader can thus focus
on the verification methodology rather than on understanding new complex
programs. Second, our correctness proofs rely heavily on the decision proce-
dures that are discussed in Part II of this book. Finally, they are small but
dense, allowing us to exhibit common techniques for proving interesting facts
about programs. The reader should keep in mind, however, that the methods
of this chapter underlie software and hardware analyses that are applied in
practice.
The reader may find the contents of this chapter to be rather technical.
Indeed, the transformation of an annotated program into a set of verification
conditions is a purely mechanical task. Fortunately, a verifying compiler
does this work in practice: it parses an annotated program, checking the syn-
tax and semantics as usual, and generates a set of verification conditions. But
just as learning how compilers work is important for understanding program-
ming languages, learning how verifying compilers work is important for un-
derstanding verification and programming languages with annotations. More-
over, applying the steps of generating verification conditions provides practice
in manipulating and understanding FOL formulae, a useful skill. Chapter 6
discusses strategies for writing annotations, a task that cannot be fully auto-
mated.

5.1 pi: A Simple Imperative Language


This section introduces the programming language pi, an imperative language
with facilities for annotations. To allow us to focus on the fundamentals of
program verification, pi lacks complicating features of typical imperative lan-

@pre ⊤
@post ⊤
bool LinearSearch(int[] a, int ℓ, int u, int e) {
for @ ⊤
(int i := ℓ; i ≤ u; i := i + 1) {
if (a[i] = e) return true;
}
return false;
}

Fig. 5.1. LinearSearch

guages: its data types do not include pointer or reference types; and it does
not allow global variables, although it does have global constants (see Exercise
6.5). After reading this chapter and Chapter 12, the interested reader should
consult the wide literature on program analysis to learn how the techniques of
these chapters extend to reasoning about standard programming languages.

5.1.1 The Language

Because pi is superficially a C-like language with restrictions, we present the


essential features of pi through examples.

Example 5.1. Figure 5.1 lists the function LinearSearch, which searches the
range [ℓ, u] of an array a of integers for a value e. It returns true iff the given
array contains the value between the lower bound ℓ and upper bound u. It
behaves correctly only if 0 ≤ ℓ and u < |a|; otherwise, the array a is accessed
outside of its domain [0, |a| − 1]. |a| denotes the length of array a.
Observe that most of the syntax is similar to C. For example, the for loop
sets i to be ℓ initially and then executes the body of the loop and increments i
by 1 as long as i ≤ u. Also, an integer array has type int[], which is constructed
from base type int. One syntactic difference occurs in assignment, which is
written := to distinguish it from the equality predicate =. We use = as the
equality predicate, rather than ==, to correspond to the standard equality
predicate of FOL. Finally, unlike C, pi has type bool and constants true and
false.
Notice the lines beginning with @. They are program annotations, which
we discuss in detail in the next section.
In LinearSearch, a, ℓ, u, and e are the formal parameters (also, param-
eters) of the function. If LinearSearch is called as LinearSearch(b, 0, |b| − 1, v),
then b, 0, |b| − 1, and v are the arguments. 

Example 5.2. Figure 5.2 lists the recursive function BinarySearch, which
searches a range [ℓ, u] of a sorted (weakly increasing: a[i] ≤ a[j] if i ≤ j)
array a of integers for a value e. Like LinearSearch, it returns true iff the

@pre ⊤
@post ⊤
bool BinarySearch(int[] a, int ℓ, int u, int e) {
if (ℓ > u) return false;
else {
int m := (ℓ + u) div 2;
if (a[m] = e) return true;
else if (a[m] < e) return BinarySearch(a, m + 1, u, e);
else return BinarySearch(a, ℓ, m − 1, e);
}
}

Fig. 5.2. BinarySearch

given array contains the value in the range [ℓ, u]. It behaves correctly only if
0 ≤ ℓ and u < |a|.
One level of recursion operates as follows. If the lower bound ℓ of the range
is greater than the upper bound u, then the (empty) subarray cannot contain
e, so it returns false. Otherwise, it examines the middle element a[m] of the
subarray: if it is e, then the subarray clearly contains e; otherwise, it recurses
on the left half if a[m] < e and on the right half if a[m] > e.
pi syntactically distinguishes between integer division and real division: for
int variables a and b, write a div b instead of a/b. Integer division is defined
as follows:
a div b  =def  ⌊a/b⌋ .

That is, a div b is equal to the greatest integer less than or equal to a/b (the
floor of a/b). 
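For readers who wish to experiment, Python's integer division operator // happens to have the same floor semantics, so it can be used to sanity-check the definition; this is an observation about Python, not a feature of pi.

    # Python's // is floor division, matching pi's div: a div b = floor(a/b).
    assert 7 // 2 == 3
    assert (-7) // 2 == -4       # the floor of -3.5, not truncation toward zero
    assert (3 + 8) // 2 == 5     # the midpoint computation of BinarySearch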

Example 5.3. Figure 5.3 lists the function BubbleSort, which sorts an integer
array. It works by “bubbling” the largest element of the left unsorted region of
the array toward the sorted region on the right; this element then becomes the
left element of the sorted region, enlarging the region by one cell. In Figure
5.4, for example, the first line shows an array in which the rightmost boxed
cells comprise the sorted region and the other cells comprise the unsorted
region. In the final line, the sorted region has been expanded by one cell.
Figure 5.4 lists a portion of a sample execution trace. The right two cells
(5, 6) of the array have already been sorted. In the trace, the inner loop
moves the largest element 4 of the unsorted region to the right to join the
sorted region, which is indicated by the dotted rectangle. In the first two
steps, a[j] ≤ a[j + 1] (2 ≤ 3 and 3 ≤ 4), so the values of cell j and j + 1
are not swapped in either case. In the subsequent two steps, a[j] > a[j + 1]
(4 > 1 and 4 > 2), causing a swap at each step. In the fifth step, the inner
loop’s guard i < j no longer holds, so the inner loop exits and the outer

@pre ⊤
@post ⊤
int[] BubbleSort(int[] a0 ) {
int[] a := a0 ;
for @ ⊤
(int i := |a| − 1; i > 0; i := i − 1) {
for @ ⊤
(int j := 0; j < i; j := j + 1) {
if (a[j] > a[j + 1]) {
int t := a[j];
a[j] := a[j + 1];
a[j + 1] := t;
}
}
}
return a;
}

Fig. 5.3. BubbleSort

[Figure 5.4 shows six successive snapshots of the array 2 3 4 1 2 5 6, with
markers below the cells indicating the positions of j and i at each step of the
inner loop.]

Fig. 5.4. Sample execution of BubbleSort

loop decrements i by 1. The sorted region has been expanded by one cell, as
indicated by the final dotted rectangle. The last step shows the beginning of
the next round of the inner loop.
Because pi does not have pointer or reference types, all data are passed by
value, including arrays and structures. If BubbleSort were missing the return

typedef struct qs {
int pivot;
int[] array;
} qs;

Fig. 5.5. Structure qs

statement, then calling it would not have any discernible effect on the calling
context. Additionally, pi does not allow updates to parameters, so BubbleSort
assigns a0 to a fresh variable a in the first line. This artificial requirement
makes reasoning about functions easier: in annotations (see Section 5.1.2)
throughout the function, one can always reference the input. 
In this book, our example programs manipulate arrays rather than re-
cursive data structures. The reason is that we can express more interesting
properties about arrays in the fragment of the theory of arrays studied in
Chapter 11 than we can about lists in the fragment of the theory of recursive
data structures studied in Chapter 9. This bias is a reflection of the structure
and content of this book, not of what is theoretically possible.
However, we sometimes use records, a basic recursive data type, to allow
a function to return multiple values. The following example illustrates such a
record type, which is used in the program QuickSort (see Section 6.2).
Example 5.4. The structure qs of Figure 5.5 is a record with two fields: the
pivot field of type int and the array field of type array. If x is a variable of
type qs, then x.pivot returns the value in its pivot field; also, x.array[i] := v
assigns v to position i of x’s array field. 

5.1.2 Program Annotations


The most important feature of pi is the capacity for complex function an-
notations. An annotation is a FOL formula F whose free variables include
only the program variables of the function in which the annotation occurs. An
annotation F at location L asserts that F is true whenever program control
reaches L. We discuss several forms of annotations in this section.

Function Specifications
The function specification of a function is a pair of annotations. The func-
tion precondition is a formula F whose free variables include only the for-
mal parameters. It specifies what should be true upon entering the function
— or, in other words, under what inputs the function is expected to work. The
function postcondition is a formula G whose free variables include only the
formal parameters and the special variable rv representing the return value
of the function. The postcondition relates the function’s output (the return
value rv ) to its input (the parameters).

@pre 0 ≤ ℓ ∧ u < |a|


@post rv ↔ ∃i. ℓ ≤ i ≤ u ∧ a[i] = e
bool LinearSearch(int[] a, int ℓ, int u, int e) {
for @ ⊤
(int i := ℓ; i ≤ u; i := i + 1) {
if (a[i] = e) return true;
}
return false;
}

Fig. 5.6. LinearSearch with function specification

@pre ⊤
@post rv ↔ ∃i. 0 ≤ ℓ ≤ i ≤ u < |a| ∧ a[i] = e
bool LinearSearch(int[] a, int ℓ, int u, int e) {
if (ℓ < 0 ∨ u ≥ |a|) return false;
for @ ⊤
(int i := ℓ; i ≤ u; i := i + 1) {
if (a[i] = e) return true;
}
return false;
}

Fig. 5.7. Robust LinearSearch with function specification

Example 5.5. In Example 5.1, we informally specified the behavior of Lin-


earSearch as follows: LinearSearch returns true iff the array a contains the
value e in the range [ℓ, u]. It behaves correctly only when ℓ ≥ 0 and u < |a|.
Function specifications formalize such statements. Figure 5.6 presents Lin-
earSearch with its specification. The precondition asserts that the lower bound
ℓ should be at least 0 and that the upper bound u should be less than the
length |a| of the array a. The postcondition asserts that the return value rv
is true iff a[i] = e for some index i ∈ [ℓ, u] of a. 
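A function specification can also be read operationally: check the precondition on entry and check the postcondition against the return value. The Python sketch below mirrors Figure 5.6 in that spirit; it is a dynamic check on one concrete input, not the static proof developed in Section 5.2, and the function and variable names are our own.

    def linear_search(a, lo, up, e):
        # @pre 0 <= lo and up < |a|
        assert 0 <= lo and up < len(a), "precondition violated"
        rv = False
        for i in range(lo, up + 1):
            if a[i] == e:
                rv = True
                break
        # @post rv <-> exists i. lo <= i <= up and a[i] = e
        assert rv == any(a[i] == e for i in range(lo, up + 1)), "postcondition violated"
        return rv

    assert linear_search([3, 1, 4, 1, 5], 1, 3, 4) is True
    assert linear_search([3, 1, 4, 1, 5], 0, 2, 5) is False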

Example 5.6. A nontrivial precondition (a formula other than ⊤) is not al-


ways acceptable, especially if a function is public to a module. Figure 5.7 lists
a more robust version of linear search. The formula

0 ≤ ℓ ≤ i ≤ u < |a|

abbreviates

0 ≤ ℓ ∧ ℓ ≤ i ∧ i ≤ u ∧ u < |a| .

A nontrivial precondition is sometimes acceptable for a function that is private


to a module. The verification method of this chapter checks that every instance
of a call to such a function obeys the precondition. 

@pre 0 ≤ ℓ ∧ u < |a| ∧ sorted(a, ℓ, u)


@post rv ↔ ∃i. ℓ ≤ i ≤ u ∧ a[i] = e
bool BinarySearch(int[] a, int ℓ, int u, int e) {
if (ℓ > u) return false;
else {
int m := (ℓ + u) div 2;
if (a[m] = e) return true;
else if (a[m] < e) return BinarySearch(a, m + 1, u, e);
else return BinarySearch(a, ℓ, m − 1, e);
}
}

Fig. 5.8. BinarySearch with function specification

@pre ⊤
@post sorted(rv , 0, |rv | − 1)
int[] BubbleSort(int[] a0 ) {
int[] a := a0 ;
for @ ⊤
(int i := |a| − 1; i > 0; i := i − 1) {
for @ ⊤
(int j := 0; j < i; j := j + 1) {
if (a[j] > a[j + 1]) {
int t := a[j];
a[j] := a[j + 1];
a[j + 1] := t;
}
}
}
return a;
}

Fig. 5.9. BubbleSort with function specification

Example 5.7. Figure 5.8 lists BinarySearch with its specification. As ex-
pected, its postcondition is identical to the postcondition of LinearSearch.
However, its precondition also states that the array a is sorted.
The sorted predicate is defined in the combined theory of integers and
arrays, TZ ∪ TA :

sorted(a, ℓ, u) ⇔ ∀i, j. ℓ ≤ i ≤ j ≤ u → a[i] ≤ a[j] .
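The sorted predicate is straightforward to realize as an executable check, which is convenient for testing annotations on concrete arrays; the helper below is our own and is not part of pi.

    def is_sorted(a, lo, up):
        """sorted(a, lo, up): a[i] <= a[j] whenever lo <= i <= j <= up."""
        return all(a[i] <= a[j]
                   for i in range(lo, up + 1)
                   for j in range(i, up + 1))

    assert is_sorted([1, 2, 2, 5], 0, 3)
    assert not is_sorted([2, 1], 0, 1)
    assert is_sorted([2, 1], 1, 0)    # an empty range is trivially sorted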

Example 5.8. Figure 5.9 lists BubbleSort with its specification. Given any
array, the returned array is sorted. Of course, other properties are desirable

and could be specified as well. For example, the returned array rv should be
a permutation of the original array a0 (see Exercise 6.5). 
Section 5.2 presents a method for proving that a function satisfies its
partial correctness specification: if the function precondition is satisfied and
the function halts, then the function postcondition holds upon return. Section
5.3 discusses a method for proving that, additionally, the function always halts.

Loop Invariants
Each for loop and while loop has an attendant annotation called the loop
invariant. A while loop
while
@F
(⟨condition⟩) {
⟨body⟩
}
says to apply the ⟨body⟩ as long as ⟨condition⟩ holds. The assertion F must
hold at the beginning of every iteration. It is evaluated before the ⟨condition⟩
is evaluated, so it must hold even on the final iteration when ⟨condition⟩ is
false. Therefore, on entering the ⟨body⟩ of the loop,
F ∧ ⟨condition⟩
must hold, and on exiting the loop,
F ∧ ¬⟨condition⟩
must hold.
To consider a for loop, translate the loop
for
@F
(⟨initialize⟩; ⟨condition⟩; ⟨increment⟩) {
⟨body⟩
}
into the equivalent loop
⟨initialize⟩;
while
@F
(⟨condition⟩) {
⟨body⟩
⟨increment⟩
}
F must hold after the ⟨initialize⟩ statement has been evaluated and, on each
iteration, before the ⟨condition⟩ is evaluated.

@pre 0 ≤ ℓ ∧ u < |a|


@post rv ↔ ∃i. ℓ ≤ i ≤ u ∧ a[i] = e
bool LinearSearch(int[] a, int ℓ, int u, int e) {
for
@L : ℓ ≤ i ∧ (∀j. ℓ ≤ j < i → a[j] ≠ e)
(int i := ℓ; i ≤ u; i := i + 1) {
if (a[i] = e) return true;
}
return false;
}

Fig. 5.10. LinearSearch with loop invariant

Example 5.9. Figure 5.10 lists LinearSearch with a nontrivial loop invariant
at L. It asserts that whenever control reaches L, the loop index is at least ℓ
and that a[j] ≠ e for previously examined indices j. 
Section 5.2 shows that loop invariants are crucial for constructing an in-
ductive argument that a function obeys its specification.
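Before attempting a proof, one informal way to gain confidence in a candidate loop invariant is to evaluate it at runtime on sample inputs, at exactly the program point where the annotation sits. The Python harness below does this for the invariant at L of Figure 5.10; it is a testing aid of our own, not part of the inductive assertion method.

    def linear_search_checked(a, lo, up, e):
        assert 0 <= lo and up < len(a)
        i = lo
        while True:
            # @L : lo <= i  and  forall j. lo <= j < i -> a[j] != e
            assert lo <= i and all(a[j] != e for j in range(lo, i))
            if not (i <= up):           # loop guard
                break
            if a[i] == e:
                return True
            i = i + 1
        return False

    assert linear_search_checked([7, 8, 9], 0, 2, 9) is True
    assert linear_search_checked([7, 8, 9], 0, 2, 4) is False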

Assertions

In pi, one can add an annotation anywhere. When an annotation is not a


function precondition, function postcondition, or loop invariant, we call it an
assertion. Assertions allow programmers to provide a formal comment. For
example, if at the statement

i := i + k;

the programmer thinks that k is positive, then the programmer can add an
assertion stating that supposition:

@ k > 0;
i := i + k;

Later, the programmer's hypothesis about k is verified with formal verification


at compile time or with dynamic assertion tests at runtime.
Runtime assertions are a special class of assertions. In most program-
ming languages, runtime errors include division by 0, modulo by 0, and
dereference of null. In particular, division by 0 and modulo by 0 cause hard-
ware exceptions, while only some languages, such as Java, catch a dereference
of null. In pi, runtime errors include division by 0, modulo by 0, and accessing
an array out of bounds. The pi compiler generates runtime assertions to catch
runtime errors.
Example 5.10. Figure 5.11 lists LinearSearch with runtime assertions. The
array read a[i] is protected by the assertion that i is a legal index of a. 

@pre ⊤
@post ⊤
bool LinearSearch(int[] a, int ℓ, int u, int e) {
for @ ⊤
(int i := ℓ; i ≤ u; i := i + 1) {
@ 0 ≤ i < |a|;
if (a[i] = e) return true;
}
return false;
}

Fig. 5.11. LinearSearch with runtime assertions

@pre ⊤
@post ⊤
bool BinarySearch(int[] a, int ℓ, int u, int e) {
if (ℓ > u) return false;
else {
@ 2 ≠ 0;
int m := (ℓ + u) div 2;
@ 0 ≤ m < |a|;
if (a[m] = e) return true;
else {
@ 0 ≤ m < |a|;
if (a[m] < e) return BinarySearch(a, m + 1, u, e);
else return BinarySearch(a, ℓ, m − 1, e);
}
}
}

Fig. 5.12. BinarySearch with runtime assertions

Example 5.11. Figure 5.12 lists BinarySearch with runtime assertions. The
first assertion protects the division: it asserts that 2 ≠ 0, which clearly holds.
The next two assertions protect the array reads. 

Example 5.12. Figure 5.13 lists BubbleSort with compiler-generated runtime


assertions. All assertions protect array accesses. The first two runtime asser-
tions are sufficient to protect all array accesses. Figure 5.14 lists a concise
version. 

5.2 Partial Correctness


Having specified and implemented each function of a program, we would like
to prove that the functions obey their specifications (from another perspective,

@pre ⊤
@post ⊤
int[] BubbleSort(int[] a0 ) {
int[] a := a0 ;
for @ ⊤
(int i := |a| − 1; i > 0; i := i − 1) {
for @ ⊤
(int j := 0; j < i; j := j + 1) {
@ 0 ≤ j < |a|;
@ 0 ≤ j + 1 < |a|;
if (a[j] > a[j + 1]) {
@ 0 ≤ j < |a|;
int t := a[j];
@ 0 ≤ j < |a|;
@ 0 ≤ j + 1 < |a|;
a[j] := a[j + 1];
@ 0 ≤ j + 1 < |a|;
a[j + 1] := t;
}
}
}
return a;
}

Fig. 5.13. BubbleSort with runtime assertions

that their specifications reflect their actual behavior). A function is partially


correct if when the function’s precondition is satisfied on entry, its post-
condition is satisfied when the function returns (if it ever does). In general,
functions may not halt: a loop’s guard could be incorrect, or a recursive func-
tion could fail to handle a particular base case. Section 5.3 discusses how to
prove that a function always halts.
We present the inductive assertion method for proving that a program
is partially correct. The method reduces each function and its annotations to
a finite set of verification conditions (VCs), which are FOL formulae. If
all of a function’s VCs are valid, then the function obeys its specification. The
reduction occurs in two stages: first, each function of the annotated program is
broken down into a finite set of basic paths (Sections 5.2.1 and 5.2.2); second,
each basic path generates a verification condition (Section 5.2.4). Sections
5.2.3 and 5.2.5 provide a more abstract view of the inductive assertion method.
Loops complicate proofs of partial correctness because they create an un-
bounded number of paths from function entry to exit. Recursive functions
similarly complicate proofs. A path is a sequence of program statements. For
loops, loop invariants cut the paths into a finite set of basic paths, while for
recursive functions, the function specification of the recursive function cuts
the paths. In this section, we assume that we are given function specifications

@pre ⊤
@post ⊤
int[] BubbleSort(int[] a0 ) {
int[] a := a0 ;
for @ ⊤
(int i := |a| − 1; i > 0; i := i − 1) {
for @ ⊤
(int j := 0; j < i; j := j + 1) {
@ 0 ≤ j < |a| ∧ 0 ≤ j + 1 < |a|;
if (a[j] > a[j + 1]) {
int t := a[j];
a[j] := a[j + 1];
a[j + 1] := t;
}
}
}
return a;
}

Fig. 5.14. BubbleSort with compressed runtime assertions

@pre 0 ≤ ℓ ∧ u < |a|


@post rv ↔ ∃i. ℓ ≤ i ≤ u ∧ a[i] = e
bool LinearSearch(int[] a, int ℓ, int u, int e) {
for
@L : ℓ ≤ i ∧ (∀j. ℓ ≤ j < i → a[j] ≠ e)
(int i := ℓ; i ≤ u; i := i + 1) {
if (a[i] = e) return true;
}
return false;
}

Fig. 5.15. LinearSearch with loop invariants

and loop invariants and concentrate on the task of generating the correspond-
ing verification conditions. In practice, this task is performed by a verifying
compiler. Chapter 6 discusses strategies for constructing specifications and
loop invariants.

5.2.1 Basic Paths: Loops

A basic path is a sequence of instructions that begins at the function pre-


condition or a loop invariant and ends at a loop invariant, an assertion, or
the function postcondition. Moreover, a loop invariant can only occur at the
beginning or the ending of a basic path. Thus, basic paths do not cross loops.
The following examples illustrate the characteristics of basic paths.

Example 5.13. Figure 5.15 lists an annotated version of LinearSearch. Its


basic paths are the following. The first basic path starts at the function pre-
condition, enters the for loop via the initialization statement, and ends at
the loop invariant L:
(1)
@pre 0 ≤ ℓ ∧ u < |a|
i := ℓ;
@L : ℓ ≤ i ∧ (∀j. ℓ ≤ j < i → a[j] ≠ e)

The second basic path begins at the loop invariant at L, passes the loop
guard i ≤ u, passes the guard a[i] = e of the if statement, executes the
return (of true), and ends at the postcondition:
(2)
@L : ℓ ≤ i ∧ (∀j. ℓ ≤ j < i → a[j] ≠ e)
assume i ≤ u;
assume a[i] = e;
rv := true;
@post rv ↔ ∃j. ℓ ≤ j ≤ u ∧ a[j] = e

This path exhibits two new aspects of basic paths. First, return statements
become assignments to the special variable rv representing the return value.
Second, guards arising in program statements (in for loop guards, while
loop guards, or if statements) become assume statements in basic paths.
An assume statement assume c in a basic path means that the remainder of
the basic path is executed only if the condition c holds at assume c. Each
guard with condition c results in two assumptions: the guard holds (c) or
it does not hold (¬c). Therefore, each guard produces two paths with the
same prefix up to the guard. They diverge on the assumption: one basic path
has the statement assume c, and the other has the statement assume ¬c.
These assumptions and the control structure of the program determine the
construction of the remainder of the basic paths.
For example, the third path has the same prefix as (2) but makes the
opposite assumption at the if statement guard: it assumes a[i] ≠ e rather
than a[i] = e. Therefore, this path loops back around to the loop invariant:
(3)
@L : ℓ ≤ i ∧ (∀j. ℓ ≤ j < i → a[j] ≠ e)
assume i ≤ u;
assume a[i] ≠ e;
i := i + 1;
@L : ℓ ≤ i ∧ (∀j. ℓ ≤ j < i → a[j] ≠ e)

The final basic path has the same prefix as (2) and (3) but makes the
opposite assumption at the for loop guard: it assumes i > u rather than
i ≤ u. Therefore, this path exits the loop and returns false:

[Figure: @pre reaches L via (1); (3) loops from L back to L; (2) and (4) lead
from L to @post.]

Fig. 5.16. Visualization of basic paths of LinearSearch

(4)
@L : ℓ ≤ i ∧ (∀j. ℓ ≤ j < i → a[j] ≠ e)
assume i > u;
rv := false;
@post rv ↔ ∃j. ℓ ≤ j ≤ u ∧ a[j] = e

To avoid forgetting a basic path, we list paths in a depth-first order. When


a guard is encountered, assume that it holds and generate the resulting paths;
then assume that it does not hold and generate the resulting paths. In this
example, the loop guard i ≤ u in (2) is first encountered; (2) and (3) follow
from the assumption that the loop guard holds, while (4) follows from the
assumption that it does not hold. The if statement guard a[i] = e is next
encountered in (2); (2) follows from the assumption that it holds, while (3)
follows from the assumption that it does not hold. Figure 5.16 visualizes these
basic paths. 

Example 5.14. Figure 5.17 lists BubbleSort with loop invariants. The outer
loop invariant at L1 asserts that
• i is in the range [−1, |a| − 1] (if |a| = 0, then i is initially −1);
• a is sorted in the range [i, |a| − 1];
• and a is partitioned such that each element in the range [0, i] is at most
(less than or equal to) each element in the range [i + 1, |a| − 1].
Its inner loop invariant at L2 asserts that
• i is in the range [1, |a| − 1], and j is in the range [0, i];
• a is sorted in the range [i, |a| − 1] as in the outer loop;
• a is partitioned as in the outer loop;
• and a is also partitioned such that each element in the range [0, j − 1] is
at most a[j].
The partitioned predicate is defined in the theory TZ ∪ TA :

partitioned(a, ℓ1 , u1 , ℓ2 , u2 )
⇔ ∀i, j. ℓ1 ≤ i ≤ u1 < ℓ2 ≤ j ≤ u2 → a[i] ≤ a[j] .

@pre ⊤
@post sorted(rv , 0, |rv | − 1)
int[] BubbleSort(int[] a0 ) {
int[] a := a0 ;
for
@L1 : −1 ≤ i < |a|
      ∧ partitioned(a, 0, i, i + 1, |a| − 1)
      ∧ sorted(a, i, |a| − 1)
(int i := |a| − 1; i > 0; i := i − 1) {
for
@L2 : 1 ≤ i < |a| ∧ 0 ≤ j ≤ i
      ∧ partitioned(a, 0, i, i + 1, |a| − 1)
      ∧ partitioned(a, 0, j − 1, j, j)
      ∧ sorted(a, i, |a| − 1)
(int j := 0; j < i; j := j + 1) {
if (a[j] > a[j + 1]) {
int t := a[j];
a[j] := a[j + 1];
a[j + 1] := t;
}
}
}
return a;
}

Fig. 5.17. BubbleSort with loop invariants

Performing a depth-first exploration, the first basic path starts at the pre-
condition and ends at the outer loop invariant at L1 :
(1)
@pre ⊤;
a := a0 ;
i := |a| − 1;
@L1 : −1 ≤ i < |a| ∧ partitioned(a, 0, i, i + 1, |a| − 1) ∧ sorted(a, i, |a| − 1)

The second basic path starts at L1 and ends at the inner loop invariant at L2
(recall that the annotation is checked after the loop initialization j := 0):
(2)
@L1 : −1 ≤ i < |a| ∧ partitioned(a, 0, i, i + 1, |a| − 1) ∧ sorted(a, i, |a| − 1)
assume i > 0;
j := 0;
@L2 : 1 ≤ i < |a| ∧ 0 ≤ j ≤ i ∧ partitioned(a, 0, i, i + 1, |a| − 1)
      ∧ partitioned(a, 0, j − 1, j, j) ∧ sorted(a, i, |a| − 1)

The third and fourth basic paths follow the inner loop, each handling one
assumption on the guard a[j] > a[j + 1] of the if statement:

 (3) 
@L2 : 1 ≤ i < |a| ∧ 0 ≤ j ≤ i ∧ partitioned(a, 0, i, i + 1, |a| − 1)
      ∧ partitioned(a, 0, j − 1, j, j) ∧ sorted(a, i, |a| − 1)
assume j < i;
assume a[j] > a[j + 1];
t := a[j];
a[j] := a[j + 1];
a[j + 1] := t;
j := j + 1;
@L2 : 1 ≤ i < |a| ∧ 0 ≤ j ≤ i ∧ partitioned(a, 0, i, i + 1, |a| − 1)
      ∧ partitioned(a, 0, j − 1, j, j) ∧ sorted(a, i, |a| − 1)

 (4) 
@L2 : 1 ≤ i < |a| ∧ 0 ≤ j ≤ i ∧ partitioned(a, 0, i, i + 1, |a| − 1)
      ∧ partitioned(a, 0, j − 1, j, j) ∧ sorted(a, i, |a| − 1)
assume j < i;
assume a[j] ≤ a[j + 1];
j := j + 1;
@L2 : 1 ≤ i < |a| ∧ 0 ≤ j ≤ i ∧ partitioned(a, 0, i, i + 1, |a| − 1)
      ∧ partitioned(a, 0, j − 1, j, j) ∧ sorted(a, i, |a| − 1)

The fifth basic path starts at L2 , exits the inner loop, and decrements i on its
way to L1 :
 (5) 
@L2 : 1 ≤ i < |a| ∧ 0 ≤ j ≤ i ∧ partitioned(a, 0, i, i + 1, |a| − 1)
      ∧ partitioned(a, 0, j − 1, j, j) ∧ sorted(a, i, |a| − 1)
assume j ≥ i;
i := i − 1;
@L1 : −1 ≤ i < |a| ∧ partitioned(a, 0, i, i + 1, |a| − 1) ∧ sorted(a, i, |a| − 1)

The final basic path starts at L1 , exits the outer loop, and then exits the
function, returning the (presumably sorted) array a:
(6)
@L1 : −1 ≤ i < |a| ∧ partitioned(a, 0, i, i + 1, |a| − 1) ∧ sorted(a, i, |a| − 1)
assume i ≤ 0;
rv := a;
@post sorted(rv , 0, |rv| − 1)

Figure 5.18 visualizes these basic paths. 

Example 5.15. Figure 5.19 lists BubbleSort with runtime assertions at L3


and a different set of loop invariants at L1 and L2 relevant for proving the
runtime assertions. Six basic paths correspond in structure to those of Figure
5.18, although their content varies based on the new precondition, postcondi-

[Figure: @pre reaches L1 via (1); (2) leads from L1 to L2 ; (3) and (4) loop
from L2 back to L2 ; (5) leads from L2 back to L1 ; (6) leads from L1 to @post.]

Fig. 5.18. Visualization of basic paths of BubbleSort

@pre ⊤
@post ⊤
int[] BubbleSort(int[] a0 ) {
int[] a := a0 ;
for
@L1 : −1 ≤ i < |a|
(int i := |a| − 1; i > 0; i := i − 1) {
for
@L2 : 0 < i < |a| ∧ 0 ≤ j ≤ i
(int j := 0; j < i; j := j + 1) {
@L3 : 0 ≤ j < |a| ∧ 0 ≤ j + 1 < |a|;
if (a[j] > a[j + 1]) {
int t := a[j];
a[j] := a[j + 1];
a[j + 1] := t;
}
}
}
return a;
}

Fig. 5.19. BubbleSort with runtime assertions

tion, and loop invariants. These basic paths ignore the runtime assertion at
L3 . Then one additional basic path ends at the runtime assertion:
(7)
@L2 : 0 < i < |a| ∧ 0 ≤ j ≤ i
assume j < i;
@L3 : 0 ≤ j < |a| ∧ 0 ≤ j + 1 < |a|



@pre 0 ≤ ℓ ∧ u < |a| ∧ sorted(a, ℓ, u)


@post rv ↔ ∃i. ℓ ≤ i ≤ u ∧ a[i] = e
bool BinarySearch(int[] a, int ℓ, int u, int e) {
if (ℓ > u) return false;
else {
int m := (ℓ + u) div 2;
if (a[m] = e) return true;
else if (a[m] < e) {
@R1 : 0 ≤ m + 1 ∧ u < |a| ∧ sorted(a, m + 1, u);
return BinarySearch(a, m + 1, u, e);
} else {
@R2 : 0 ≤ ℓ ∧ m − 1 < |a| ∧ sorted(a, ℓ, m − 1);
return BinarySearch(a, ℓ, m − 1, e);
}
}
}

Fig. 5.20. BinarySearch with function call assertions

5.2.2 Basic Paths: Function Calls

Like loops, recursive functions create an unbounded number of paths within


programs. But just as loop invariants cut loops to produce a finite number of
basic paths, function specifications cut function calls.
Recall that the function postcondition is a relation between the return
value rv and the formal parameters. A function’s postcondition summarizes
the effects of calling it. We use these summaries to replace function calls in
basic paths.

Remark 5.16. The postconditions of the functions of a program need only


include information that is relevant for proving the given specification, so the
summaries may be incomplete. Ignoring irrelevant aspects of functions reduces
the size of annotations. Chapter 6 discusses techniques for developing function
specifications.

The replacement of function calls by function summaries makes the listing


of basic paths (and the resulting analysis described in Section 5.2.4) local to
functions. Basic paths do not span multiple functions. However, recall that the
function postcondition is guaranteed to hold on return only when the function
precondition is satisfied on entry. To ensure that the precondition is satisfied,
each instance of a function call generates an extra basic path in which the
called function’s precondition is asserted. An example clarifies this discussion.

Example 5.17. Figure 5.8 lists BinarySearch with its function specification.
BinarySearch contains two (recursive) function calls. In Figure 5.20, each func-
tion call is protected by a function call assertion at R1 and R2 . Each asser-
tion is constructed by applying a substitution to BinarySearch’s precondition

F [a, ℓ, u, e] : 0 ≤ ℓ ∧ u < |a| ∧ sorted(a, ℓ, u) .

The first function call is BinarySearch(a, m + 1, u, e), so the function call as-
sertion at R1 is F σ1 , where

σ1 : {a ↦ a, ℓ ↦ m + 1, u ↦ u, e ↦ e} .

The notation of Section 1.5 allows us to write F [a, m + 1, u, e].


The second function call is BinarySearch(a, ℓ, m − 1, e), so the function call
assertion at R2 is F [a, ℓ, m−1, e]. These assertions are treated in the same way
as other assertions, such as runtime assertions. So far, we have the following
basic paths:
(1)
@pre 0 ≤ ℓ ∧ u < |a| ∧ sorted(a, ℓ, u)
assume ℓ > u;
rv := false;
@post rv ↔ ∃i. ℓ ≤ i ≤ u ∧ a[i] = e

(2)
@pre 0 ≤ ℓ ∧ u < |a| ∧ sorted(a, ℓ, u)
assume ℓ ≤ u;
m := (ℓ + u) div 2;
assume a[m] = e;
rv := true;
@post rv ↔ ∃i. ℓ ≤ i ≤ u ∧ a[i] = e

(3)
@pre 0 ≤ ℓ ∧ u < |a| ∧ sorted(a, ℓ, u)
assume ℓ ≤ u;
m := (ℓ + u) div 2;
assume a[m] ≠ e;
assume a[m] < e;
@R1 : 0 ≤ m + 1 ∧ u < |a| ∧ sorted(a, m + 1, u)

(5)
@pre 0 ≤ ℓ ∧ u < |a| ∧ sorted(a, ℓ, u)
assume ℓ ≤ u;
m := (ℓ + u) div 2;
assume a[m] ≠ e;
assume a[m] ≥ e;
@R2 : 0 ≤ ℓ ∧ m − 1 < |a| ∧ sorted(a, ℓ, m − 1)

Because BinarySearch lacks loops, each basic path starts at the function pre-
condition.

It remains to consider paths (4) and (6), which pass through the recursive
function calls and end at the postcondition. Paths (3) and (5) end in the
function call assertions at R1 and R2 protecting these function calls. Since
they assert that the called BinarySearch’s precondition holds, we can assume
that the returned values obey the postcondition of BinarySearch in each of
the calling contexts. Therefore, we can use the function postcondition as a
summary of the function call:
(4)
@pre 0 ≤ ℓ ∧ u < |a| ∧ sorted(a, ℓ, u)
assume ℓ ≤ u;
m := (ℓ + u) div 2;
assume a[m] ≠ e;
assume a[m] < e;
assume v1 ↔ ∃i. m + 1 ≤ i ≤ u ∧ a[i] = e;
rv := v1 ;
@post rv ↔ ∃i. ℓ ≤ i ≤ u ∧ a[i] = e

The lines

assume v1 ↔ ∃i. m + 1 ≤ i ≤ u ∧ a[i] = e;


rv := v1 ;

arise as follows. First, translate the return statement

return BinarySearch(a, m + 1, u, e);

into an assignment to rv , as usual:

rv := BinarySearch(a, m + 1, u, e);

Next, given that the precondition holds (from path (3)), assume that the
postcondition holds. Therefore, summarize the function call with a relation
based on BinarySearch’s postcondition,

G[a, ℓ, u, e, rv] : rv ↔ ∃i. ℓ ≤ i ≤ u ∧ a[i] = e .

Specifically, the relation is G[a, m + 1, u, e, v1 ], where v1 is a fresh variable


that captures the return value. In the basic path, assume this relation; then
use the return value v1 in the assignment:

assume G[a, m + 1, u, e, v1 ];
rv := v1 ;

These are the penultimate lines of (4). Hence, (4) replaces the function call
BinarySearch(a, m + 1, u, e) with a summary based on the function postcon-
dition. Now reasoning about the basic path does not require reasoning about
all of BinarySearch at once.

[Figure: paths (1) and (2) lead from @pre directly to @post; (3) and (4) lead
from @pre to R1 , and (5) and (6) from @pre to R2 ; (4) and (6) continue from
R1 and R2 , respectively, to @post.]

Fig. 5.21. Visualization of basic paths of BinarySearch

Construct the final basic path for the function call BinarySearch(a, ℓ, m −
1, e) similarly:
(6)
@pre 0 ≤ ℓ ∧ u < |a| ∧ sorted(a, ℓ, u)
assume ℓ ≤ u;
m := (ℓ + u) div 2;
assume a[m] ≠ e;
assume a[m] ≥ e;
assume v2 ↔ ∃i. ℓ ≤ i ≤ m − 1 ∧ a[i] = e;
rv := v2 ;
@post rv ↔ ∃i. ℓ ≤ i ≤ u ∧ a[i] = e

Again, v2 is a fresh variable.


Figure 5.21 visualizes these basic paths. Paths (4) and (6) are shown to
pass through locations R1 and R2 , respectively. 
For the general case, consider function f with prototype
@pre F [p1 , . . . , pn ]
@post G[p1 , . . . , pn , rv ]
type0 f(type1 p1 , . . . , typen pn )
Suppose that f is called in context
w := f(e1 , . . . , en );
where e1 , . . . , en are expressions. Then augment the calling context with the
function call assertion:
@ F [e1 , . . . , en ];
w := f(e1 , . . . , en );
Treat this new assertion the same as any assertion: it results in at least one
basic path ending in
...
@ F [e1 , . . . , en ]

Finally, in basic paths that pass through the function call, replace the function
call by an assumption and assignment constructed from the postcondition,
where v is a fresh variable:
...
assume G[e1 , . . . , en , v];
w := v;
...

Note that rv need not have type bool as in BinarySearch. For example, for
a function with prototype
@pre ⊤
@post rv ≥ x
int g(int x)
the statement
w := g(n + 1);
is summarized in basic paths as follows:
assume v ≥ n + 1;
w := v;
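The instantiation of a postcondition at a call site is nothing more than substitution of the actual arguments, and of a fresh variable for rv, into the postcondition formula. The sketch below performs the substitution for the call to g above using the z3 Python API (assuming the z3-solver package is installed; z3 is an SMT solver, not part of the book's toolchain):

    from z3 import Ints, substitute

    x, rv, n, v = Ints('x rv n v')

    G = rv >= x                                    # @post of g
    summary = substitute(G, (x, n + 1), (rv, v))   # G[n + 1, v]
    print(summary)                                 # e.g. prints  v >= n + 1

    # The call  w := g(n + 1)  is then summarized in a basic path by
    #   assume v >= n + 1;
    #   w := v;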

5.2.3 Program States

Before presenting the final step in proving partial correctness, we formalize


program state. A program state s is an assignment of values of the proper
type to all variables. The program variables include a distinguished variable
pc, the program counter. It holds the current location of control.
Example 5.18. The state
s : {pc ↦ L1 , a ↦ [2; 0; 1], i ↦ 2, j ↦ 0, t ↦ 2, rv ↦ []}
is a state of BubbleSort in which control resides at L1 . 
A state can be extended to a logical interpretation. Suppose that T is the
theory that captures the functions (+, −, etc.) and predicates (=, <, etc.)
of the program. Extend a state s to a logical T -interpretation I : (DI , αI ):
let DI be all values of the program types; and construct αI by using the as-
signments of s and adding assignments for all logical functions and predicates
so that I is a T -interpretation. Subsequently, when we say a state, we mean
either the assignment of program variables to values or the extension to a T -
interpretation for the appropriate theory T , depending on the context. When
we write s |= F , we mean that I |= F for a T -interpretation I extending s.

[Figure: a state s inside the region wp(F, S) is mapped by statement S to a
state s′ inside the region F .]

Fig. 5.22. Weakest precondition

Example 5.19. To extend

s : {pc ↦ L1 , a ↦ [2; 0; 1], i ↦ 2, j ↦ 0, t ↦ 2, rv ↦ []}

to a (TZ ∪ TA )-interpretation I : (DI , αI ), let DI be the set of all integers and


arrays of integers; and for αI , assign the standard functions to ·[·], ·⟨· ⊳ ·⟩, +,
−, etc. 
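Concretely, a state can be represented as a finite map from variable names to values, and s |= F can then be checked by evaluating F under that map. The small sketch below does this for the state of Example 5.18 and two sample formulae; the encoding of states and formulae is an illustrative choice of ours.

    # The state s of Example 5.18 as a dictionary.
    s = {'pc': 'L1', 'a': [2, 0, 1], 'i': 2, 'j': 0, 't': 2, 'rv': []}

    def models(state, formula):
        """s |= F, for F given as a Python predicate over the state."""
        return formula(state)

    # s satisfies  0 <= j <= i < |a|  but not  sorted(a, 0, |a| - 1).
    assert models(s, lambda st: 0 <= st['j'] <= st['i'] < len(st['a']))
    assert not models(s, lambda st: all(st['a'][k] <= st['a'][k + 1]
                                        for k in range(len(st['a']) - 1)))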

5.2.4 Verification Conditions

Our goal is to reduce an annotated function to a finite set of FOL formu-


lae, called verification conditions, such that their validity implies that the
function’s behavior agrees with its annotations. The reduction to basic paths
reduces reasoning about the function to reasoning about a finite set of ba-
sic paths. The final reduction from basic paths to VCs requires a mechanism
for incorporating the effects of program statements into FOL formulae. The
weakest precondition predicate transformer is the mechanism. A predi-
cate transformer p is a function

p : FOL × stmts → FOL

that maps a FOL formula F ∈ FOL and program statement S ∈ stmts to a


FOL formula.
The weakest precondition wp(F, S) has the defining characteristic that if
state s is such that

s |= wp(F, S)

and if statement S is executed on state s to produce state s′ , then

s′ |= F .

This situation is visualized in Figure 5.22. The region labeled F is the set
of states that satisfy F ; similarly, the region labeled wp(F, S) is the set of

states that satisfy wp(F, S). Every state s on which executing statement S
leads to a state s′ in the F region must be in the wp(F, S) region.
Define the weakest precondition for the two statement types of basic paths
introduced in Section 5.2.1:
• Assumption: What must hold before statement assume c is executed to
ensure that F holds afterward? If c → F holds before, then satisfying c in
assume c guarantees that F holds afterward:
wp(F, assume c) ⇔ c → F
• Assignment : What must hold before statement v := e is executed to ensure
that F [v] holds afterward? If F [e] holds before, then assigning e to v with
v := e makes F [v] hold afterward:
wp(F [v], v := e) ⇔ F [e]
For a sequence of statements S1 ; . . . ; Sn , define
wp(F, S1 ; . . . ; Sn ) ⇔ wp(wp(F, Sn ), S1 ; . . . ; Sn−1 ) .
The weakest precondition moves a formula backward over a sequence of state-
ments: for F to hold after executing S1 ; . . . ; Sn , wp(F, S1 ; . . . ; Sn ) must hold
before executing the statements. Because basic paths have only assumption
and assignment statements, the definition of wp is complete.
Then the verification condition of basic path

@F
S1 ;
..
.
Sn ;
@G

is
F → wp(G, S1 ; . . . ; Sn ) .
Its validity implies that when F holds before the statements of the path are
executed, then G holds afterward. Traditionally, this verification condition is
denoted by the Hoare triple
{F }S1 ; . . . ; Sn {G} .
Example 5.20. Consider the basic path
(1)
@x≥0
x := x + 1;
@x≥1

The VC is

{x ≥ 0}x := x + 1{x ≥ 1} : x ≥ 0 → wp(x ≥ 1, x := x + 1)

so compute

wp(x ≥ 1, x := x + 1)
⇔ (x ≥ 1){x ↦ x + 1}
⇔ x + 1 ≥ 1
⇔ x ≥ 0

Simplifying the VC based on this computation produces

x≥0 → x≥0,

which is clearly TZ -valid. 
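The two wp rules and the rule for sequences are simple enough to implement mechanically. The sketch below encodes a basic path as a list of assume and assignment statements over z3 terms and builds the verification condition F → wp(G, path); it handles integer variables only, the statement encoding is our own, and it assumes the z3-solver package is available. Applied to the basic path of Example 5.20, z3 confirms the VC.

    from z3 import Int, Implies, substitute, prove

    x = Int('x')

    def wp(post, stmt):
        """Weakest precondition of a single basic-path statement."""
        kind = stmt[0]
        if kind == 'assume':                    # wp(F, assume c)  =  c -> F
            return Implies(stmt[1], post)
        if kind == 'assign':                    # wp(F[v], v := e)  =  F[e]
            _, v, e = stmt
            return substitute(post, (v, e))
        raise ValueError(kind)

    def vc(pre, stmts, post):
        """VC of the basic path  @ pre; S1; ...; Sn; @ post."""
        f = post
        for stmt in reversed(stmts):            # move the postcondition backward
            f = wp(f, stmt)
        return Implies(pre, f)

    # Example 5.20:  {x >= 0}  x := x + 1  {x >= 1}.
    prove(vc(x >= 0, [('assign', x, x + 1)], x >= 1))   # prints "proved"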

Example 5.21. We generate the VC corresponding to basic path (2) in Ex-


ample 5.13 (LinearSearch):
(2)
@L : F : ℓ ≤ i ∧ (∀j. ℓ ≤ j < i → a[j] ≠ e)
S1 : assume i ≤ u;
S2 : assume a[i] = e;
S3 : rv := true;
@post G : rv ↔ ∃j. ℓ ≤ j ≤ u ∧ a[j] = e

The VC is

F → wp(G, S1 ; S2 ; S3 ) ,

so compute

wp(G, S1 ; S2 ; S3 )
⇔ wp(wp(rv ↔ ∃j. ℓ ≤ j ≤ u ∧ a[j] = e, rv := true), S1 ; S2 )
⇔ wp(true ↔ ∃j. ℓ ≤ j ≤ u ∧ a[j] = e, S1 ; S2 )
⇔ wp(∃j. ℓ ≤ j ≤ u ∧ a[j] = e, S1 ; S2 )
⇔ wp(wp(∃j. ℓ ≤ j ≤ u ∧ a[j] = e, assume a[i] = e), S1 )
⇔ wp(a[i] = e → ∃j. ℓ ≤ j ≤ u ∧ a[j] = e, S1 )
⇔ wp(a[i] = e → ∃j. ℓ ≤ j ≤ u ∧ a[j] = e, assume i ≤ u)
⇔ i ≤ u → (a[i] = e → ∃j. ℓ ≤ j ≤ u ∧ a[j] = e)

Replacing F and wp(G, S1 ; S2 ; S3 ) according to the computed formula results


in the VC

ℓ ≤ i ∧ (∀j. ℓ ≤ j < i → a[j] ≠ e)


→ (i ≤ u → (a[i] = e → ∃j. ℓ ≤ j ≤ u ∧ a[j] = e))

or, equivalently,

ℓ ≤ i ∧ (∀j. ℓ ≤ j < i → a[j] ≠ e) ∧ i ≤ u ∧ a[i] = e


→ ∃j. ℓ ≤ j ≤ u ∧ a[j] = e

according to the equivalence

F1 ∧ F2 → (F3 → (F4 → F5 )) ⇔ (F1 ∧ F2 ∧ F3 ∧ F4 ) → F5 .


This formula is (TZ ∪ TA )-valid. 
Example 5.22. For basic path (3) of Example 5.13 (LinearSearch),
(3)
@L : F : ℓ ≤ i ∧ (∀j. ℓ ≤ j < i → a[j] ≠ e)
S1 : assume i ≤ u;
S2 : assume a[i] ≠ e;
S3 : i := i + 1;
@L : G : ℓ ≤ i ∧ (∀j. ℓ ≤ j < i → a[j] ≠ e)

we use a different strategy to compute the corresponding VC

F → wp(G, S1 ; S2 ; S3 ) .

We collect all modifications to G during applications of wp and then apply
them all at once. Compute

wp(G, S1 ; S2 ; S3 )
⇔ wp(wp(G, i := i + 1), S1 ; S2 )
⇔ wp(G{i ↦ i + 1}, S1 ; S2 )
⇔ wp(wp(G{i ↦ i + 1}, assume a[i] ≠ e), S1 )
⇔ wp(a[i] ≠ e → G{i ↦ i + 1}, S1 )
⇔ wp(a[i] ≠ e → G{i ↦ i + 1}, assume i ≤ u)
⇔ i ≤ u → a[i] ≠ e → G{i ↦ i + 1}

Then the VC is

F → i ≤ u → a[i] ≠ e → G{i ↦ i + 1} ,

or

ℓ ≤ i ∧ (∀j. ℓ ≤ j < i → a[j] ≠ e) ∧ i ≤ u ∧ a[i] ≠ e
→ ℓ ≤ i + 1 ∧ ∀j. ℓ ≤ j < i + 1 → a[j] ≠ e ,

which is (TZ ∪ TA )-valid. 


Example 5.23. Consider basic path (2) of Example 5.17 (BinarySearch):
(2)
@pre F : 0 ≤ ℓ ∧ u < |a| ∧ sorted(a, ℓ, u)
S1 : assume ℓ ≤ u;
S2 : m := (ℓ + u) div 2;
S3 : assume a[m] = e;
S4 : rv := true;
@post G : rv ↔ ∃i. ℓ ≤ i ≤ u ∧ a[i] = e

The VC is

F → wp(G, S1 ; S2 ; S3 ; S4 ) ,

so compute

wp(G, S1 ; S2 ; S3 ; S4 )
⇔ wp(wp(G, rv := true), S1 ; S2 ; S3 )
⇔ wp(G{rv ↦ true}, S1 ; S2 ; S3 )
⇔ wp(wp(G{rv ↦ true}, assume a[m] = e), S1 ; S2 )
⇔ wp(a[m] = e → G{rv ↦ true}, S1 ; S2 )
⇔ wp(wp(a[m] = e → G{rv ↦ true}, m := (ℓ + u) div 2), S1 )
⇔ wp((a[m] = e → G{rv ↦ true}){m ↦ (ℓ + u) div 2}, S1 )
⇔ wp((a[m] = e → G{rv ↦ true}){m ↦ (ℓ + u) div 2}
     , assume ℓ ≤ u)
⇔ ℓ ≤ u → (a[m] = e → G{rv ↦ true}){m ↦ (ℓ + u) div 2}

Applying the substitutions produces

ℓ ≤ u → a[(ℓ + u) div 2] = e → G{rv ↦ true, m ↦ (ℓ + u) div 2} .

Simplifying the VC accordingly

0 ≤ ℓ ∧ u < |a| ∧ sorted(a, ℓ, u) ∧ ℓ ≤ u ∧ a[(ℓ + u) div 2] = e


→ ∃i. ℓ ≤ i ≤ u ∧ a[i] = e

reveals that it is (TZ ∪ TA )-valid: ℓ ≤ u from the antecedent implies that


ℓ ≤ (ℓ + u) div 2 ≤ u, so that a[(ℓ + u) div 2] = e implies that ∃i. ℓ ≤ i ≤
u ∧ a[i] = e. 

Example 5.24. Consider basic path (3) of Example 5.14 (BubbleSort):


 (3) 
@L2 : F : 1 ≤ i < |a| ∧ 0 ≤ j ≤ i
          ∧ partitioned(a, 0, i, i + 1, |a| − 1)
          ∧ partitioned(a, 0, j − 1, j, j) ∧ sorted(a, i, |a| − 1)
S1 : assume j < i;
S2 : assume a[j] > a[j + 1];
S3 : t := a[j];
S4 : a[j] := a[j + 1];
S5 : a[j + 1] := t;
S6 : j := j + 1;
@L2 : G : 1 ≤ i < |a| ∧ 0 ≤ j ≤ i
          ∧ partitioned(a, 0, i, i + 1, |a| − 1)
          ∧ partitioned(a, 0, j − 1, j, j) ∧ sorted(a, i, |a| − 1)

The VC is

F → wp(G, S1 ; S2 ; S3 ; S4 ; S5 ; S6 )

so compute

wp(G, S1 ; S2 ; S3 ; S4 ; S5 ; S6 )
⇔ wp(wp(G, j := j + 1), S1 ; S2 ; S3 ; S4 ; S5 )
⇔ wp(G{j ↦ j + 1}, S1 ; S2 ; S3 ; S4 ; S5 )
⇔ wp(wp(G{j ↦ j + 1}, a[j + 1] := t), S1 ; S2 ; S3 ; S4 )
⇔ wp(G{j ↦ j + 1}{a ↦ a⟨j + 1 ⊳ t⟩}, S1 ; S2 ; S3 ; S4 )

Array assignment a[j + 1] := t translates to the functional substitution {a ↦
a⟨j + 1 ⊳ t⟩} according to the theory of arrays, TA . That is, the assignment
is modeled by substituting a⟨j + 1 ⊳ t⟩ for every instance of a affected by the
assignment. Continuing,

⇔ wp(G{j ↦ j + 1, a ↦ a⟨j + 1 ⊳ t⟩}, S1 ; S2 ; S3 ; S4 )
⇔ wp(wp(G{j ↦ j + 1, a ↦ a⟨j + 1 ⊳ t⟩}, a[j] := a[j + 1])
     , S1 ; S2 ; S3 )
⇔ wp(G{j ↦ j + 1, a ↦ a⟨j + 1 ⊳ t⟩}{a ↦ a⟨j ⊳ a[j + 1]⟩}
     , S1 ; S2 ; S3 )
⇔ wp(G{j ↦ j + 1, a ↦ a⟨j ⊳ a[j + 1]⟩⟨j + 1 ⊳ t⟩}, S1 ; S2 ; S3 )

The array assignment S4 similarly translates to a substitution. Composing
the two substitutions produces the substitution in which the doubly-modified
array a⟨j ⊳ a[j + 1]⟩⟨j + 1 ⊳ t⟩ replaces a. Then

⇔ wp(wp(G{j ↦ j + 1, a ↦ a⟨j ⊳ a[j + 1]⟩⟨j + 1 ⊳ t⟩}, t := a[j])
     , S1 ; S2 )
⇔ wp(G{j ↦ j + 1, a ↦ a⟨j ⊳ a[j + 1]⟩⟨j + 1 ⊳ t⟩}{t ↦ a[j]}
     , S1 ; S2 )
⇔ wp(G{j ↦ j + 1, a ↦ a⟨j ⊳ a[j + 1]⟩⟨j + 1 ⊳ a[j]⟩, t ↦ a[j]}
     , S1 ; S2 )

Here, the composition of substitutions results in applying {t ↦ a[j]} to the
term a⟨j ⊳ a[j + 1]⟩⟨j + 1 ⊳ t⟩, which contains t. To finish,

⇔ wp(wp(G{j ↦ j + 1, a ↦ a⟨j ⊳ a[j + 1]⟩⟨j + 1 ⊳ a[j]⟩, t ↦ a[j]}
        , assume a[j] > a[j + 1])
     , S1 )
⇔ wp(a[j] > a[j + 1]
        → G{j ↦ j + 1, a ↦ a⟨j ⊳ a[j + 1]⟩⟨j + 1 ⊳ a[j]⟩, t ↦ a[j]}
     , S1 )
⇔ wp(a[j] > a[j + 1]
        → G{j ↦ j + 1, a ↦ a⟨j ⊳ a[j + 1]⟩⟨j + 1 ⊳ a[j]⟩, t ↦ a[j]}
     , assume j < i)
⇔ j < i → a[j] > a[j + 1]
        → G{j ↦ j + 1, a ↦ a⟨j ⊳ a[j + 1]⟩⟨j + 1 ⊳ a[j]⟩, t ↦ a[j]}
⇔ j < i ∧ a[j] > a[j + 1]
        → G{j ↦ j + 1, a ↦ a⟨j ⊳ a[j + 1]⟩⟨j + 1 ⊳ a[j]⟩, t ↦ a[j]}

Finally, applying the substitution



σ : {j ↦ j + 1, a ↦ a⟨j ⊳ a[j + 1]⟩⟨j + 1 ⊳ a[j]⟩, t ↦ a[j]}

to G produces Gσ:

1 ≤ i < |a⟨j ⊳ a[j + 1]⟩⟨j + 1 ⊳ a[j]⟩| ∧ 0 ≤ j + 1 ≤ i
∧ partitioned(a⟨j ⊳ a[j + 1]⟩⟨j + 1 ⊳ a[j]⟩, 0, i, i + 1,
              |a⟨j ⊳ a[j + 1]⟩⟨j + 1 ⊳ a[j]⟩| − 1)
∧ partitioned(a⟨j ⊳ a[j + 1]⟩⟨j + 1 ⊳ a[j]⟩, 0, j, j + 1, j + 1)
∧ sorted(a⟨j ⊳ a[j + 1]⟩⟨j + 1 ⊳ a[j]⟩, i, |a⟨j ⊳ a[j + 1]⟩⟨j + 1 ⊳ a[j]⟩| − 1)

The length function | · | obeys the following axiom:

∀a, i, e. |a⟨i ⊳ e⟩| = |a| (array length)

In other words, modifying an array’s elements does not change its size. Under
this axiom, Gσ is equivalent to

1 ≤ i < |a| ∧ 0 ≤ j + 1 ≤ i
∧ partitioned(a⟨j ⊳ a[j + 1]⟩⟨j + 1 ⊳ a[j]⟩, 0, i, i + 1, |a| − 1)
∧ partitioned(a⟨j ⊳ a[j + 1]⟩⟨j + 1 ⊳ a[j]⟩, 0, j, j + 1, j + 1)
∧ sorted(a⟨j ⊳ a[j + 1]⟩⟨j + 1 ⊳ a[j]⟩, i, |a| − 1)

Thus, the VC is
 
( 1 ≤ i < |a| ∧ 0 ≤ j ≤ i
  ∧ partitioned(a, 0, i, i + 1, |a| − 1)
  ∧ partitioned(a, 0, j − 1, j, j)
  ∧ sorted(a, i, |a| − 1)
  ∧ j < i ∧ a[j] > a[j + 1] )
→ ( 1 ≤ i < |a| ∧ 0 ≤ j + 1 ≤ i
    ∧ partitioned(a⟨j ⊳ a[j + 1]⟩⟨j + 1 ⊳ a[j]⟩, 0, i, i + 1, |a| − 1)
    ∧ partitioned(a⟨j ⊳ a[j + 1]⟩⟨j + 1 ⊳ a[j]⟩, 0, j, j + 1, j + 1)
    ∧ sorted(a⟨j ⊳ a[j + 1]⟩⟨j + 1 ⊳ a[j]⟩, i, |a| − 1) )

which is (TZ ∪ TA )-valid. 
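The store term a⟨i ⊳ e⟩ used throughout this example can be modeled as a function that returns a modified copy of the array, which also makes the (array length) axiom evident. A minimal Python sketch with our own naming:

    def store(a, i, e):
        """The array a<i |> e>: like a, except that position i holds e."""
        b = list(a)                  # copy; the original array is unchanged
        b[i] = e
        return b

    a = [5, 1, 3]
    b = store(store(a, 0, 1), 1, 5)  # a<0 |> 1><1 |> 5>
    assert b == [1, 5, 3]
    assert len(b) == len(a)          # the (array length) axiom
    assert a == [5, 1, 3]            # stores do not mutate the original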

5.2.5 P -Invariant and P -Inductive

Let us consider the inductive assertion method abstractly.


A program is a set of functions with some distinguished entry function.
Let P be a program with distinguished entry function f such that f has func-
tion precondition Fpre and initial location L0 . A computation of program P ,
or a P -computation, is a sequence of states

s 0 , s1 , s2 , . . .

such that

1. s0 [pc] = L0 and s0 |= Fpre ;


2. and for each index i, si+1 is a result of executing the instruction at si [pc]
on state si .
The notation si [pc] indicates the value of pc given by state si . A computation
may be finite or infinite. Infinite computations arise when a program contains
a function that does not always halt (either through recursion or looping).
A formula F annotating location L of program P is P -invariant (also, is
an invariant of P , or is a P -invariant) iff for all P -computations s0 , s1 , s2 , . . .
and for each index i, if si [pc] = L then si |= F . The annotations of P are
P -invariant iff each annotation is P -invariant. This definition is not imple-
mentable: checking if F is P -invariant requires checking an infinite number of
P -computations in general, even if all computations are finite.
Instead, we use the inductive assertion method introduced in this chapter.
If all verification conditions generated from program P are T -valid (in the
proper theory T ), then the annotations are P -inductive (also, are inductive
P -invariants) and therefore P -invariant. In summary, we have the following
theorem.

Theorem 5.25 (Verification Conditions). If for every basic path

@Li : F
S1 ;
..
.
Sn ;
@Lj : G

of program P , the verification condition

{F }S1 ; . . . ; Sn {G}

is valid (in the appropriate theory), then the annotations are P -invariant. In
particular, the annotations are P -inductive.

In other words, when P ’s annotations are P -inductive, each function of P


obeys its specification.
Henceforth, when P is obvious from the context, we say invariant instead
of P -invariant and inductive instead of P -inductive.

5.3 Total Correctness


Partial correctness is just one step in proving the total correctness of a
function or program. Total correctness of a function asserts that if the input
satisfies the function precondition, then the function eventually halts and pro-
duces output that satisfies its function postcondition. In other words, total

correctness requires proving partial correctness and that the function always
halts on input satisfying its precondition. We focus now on the latter task.
Proving function termination is based on well-founded relations (see Chap-
ter 4). Define a set S with a well-founded relation ≺. Then find a function δ
mapping program states to S such that δ decreases according to ≺ along every
basic path. Since ≺ is well-founded, there cannot exist an infinite sequence of
program states; otherwise, they would map to an infinite decreasing sequence
in S. The function δ is called a ranking function.

Choosing Well-Founded Relations and Ranking Functions

The concept of well-founded relations is the fundamental principle of this


method. The ranking function itself is really just a convenience. One could
directly construct a well-founded relation over the program states and show
that the output state of each basic path is less than the input state according
to this relation. However, it is often conceptually easier to find a function
that maps states to a known set S with a known well-founded relation ≺. For
example, we usually choose as S the set of n-tuples of natural numbers and as
≺ the lexicographic extension <n of < to n-tuples of natural numbers, where
n varies according to the application.
As with partial correctness, we consider total correctness in the context
of loops first and then in the context of recursion. Section 6.2 examines a
program with both loops and recursion.

Example 5.26. Figure 5.23 lists BubbleSort with ranking annotations. It con-
tains one new type of annotation: ↓ (i + 1, i + 1) and ↓ (i + 1, i − j) assert that
the functions (i + 1, i + 1) and (i + 1, i − j), respectively, are ranking functions.
These functions map states of BubbleSort onto pairs of natural numbers S : N2
with well-founded relation <2 . Intuitively, we have captured two separate ar-
guments. The outer loop eventually finishes because i decreases to 0; hence
i + 1 decreases as well. Why do we use i + 1 rather than i? When |a| = 0, the
initial assignment to i is −1. While i + 1 is always nonnegative, i is not; and
recall that we want to map into the natural numbers. The inner loop halts
because j increases to i; hence i − j decreases to 0. Therefore, our intuition
tells us that i + 1 is important for the outer loop, and i − j is important for
the inner loop.
Placing these two functions i + 1 and i − j together as a pair (i + 1, i − j)
provides the annotation for the inner loop. We expect i + 1 to remain constant
while the inner loop is executing and decreasing i − j. For the annotation of
the outer loop, we note that i + 1 > i − j = i − 0 = i on entry to the inner
loop, so that (i + 1, i + 1) >2 (i + 1, i − j).
The loop annotations assert that the ranking functions (i + 1, i + 1) and
(i + 1, i − j) map program states to pairs of natural numbers. Hence, we need
to prove that the loop annotations are inductive using the inductive assertion

@pre ⊤
@post ⊤
int[] BubbleSort(int[] a0 ) {
int[] a := a0 ;
for
@L1 : i + 1 ≥ 0
↓ (i + 1, i + 1)
(int i := |a| − 1; i > 0; i := i − 1) {
for
@L2 : i + 1 ≥ 0 ∧ i − j ≥ 0
↓ (i + 1, i − j)
(int j := 0; j < i; j := j + 1) {
if (a[j] > a[j + 1]) {
int t := a[j];
a[j] := a[j + 1];
a[j + 1] := t;
}
}
}
return a;
}

Fig. 5.23. BubbleSort with annotations to prove termination

method. We leave this step to the reader. It only remains to prove that the
functions decrease along each basic path.
The relevant basic paths are the following:
(1)
@L1 : i + 1 ≥ 0
↓L1 : (i + 1, i + 1)
assume i > 0;
j := 0;
↓L2 : (i + 1, i − j)

(2)
@L2 : i + 1 ≥ 0 ∧ i − j ≥ 0
↓L2 : (i + 1, i − j)
assume j < i;
assume a[j] > a[j + 1];
t := a[j];
a[j] := a[j + 1];
a[j + 1] := t;
j := j + 1;
↓L2 : (i + 1, i − j)

(3)
@L2 : i + 1 ≥ 0 ∧ i − j ≥ 0
↓L2 : (i + 1, i − j)
assume j < i;
assume a[j] ≤ a[j + 1];
j := j + 1;
↓L2 : (i + 1, i − j)

(4)
@L2 : i + 1 ≥ 0 ∧ i − j ≥ 0
↓L2 : (i + 1, i − j)
assume j ≥ i;
i := i − 1;
↓L1 : (i + 1, i + 1)

The paths entering and exiting the outer loop at L1 are not relevant for the
termination argument. The entering path does not begin with a ranking func-
tion annotation, so there is nothing to prove. The exiting path leads to the
return statement.
For termination purposes, paths (2) and (3) can be treated the same:

@L2 : i + 1 ≥ 0 ∧ i − j ≥ 0
↓L2 : (i + 1, i − j)
assume j < i;
···
j := j + 1;
↓L2 : (i + 1, i − j)

The excluded statements do not impact the value of the ranking functions.


Verification Conditions

A basic path beginning and ending with a ranking function

@F
↓ δ[x]
S1 ;
..
.
Sk ;
↓ κ[x]

induces a verification condition



F → wp(κ ≺ δ[x0 ], S1 ; · · · ; Sk ){x0 ↦ x} .

In words, we must prove that the value of κ after executing statements


S1 ; · · · ; Sk is less than the value of δ before executing the statements. There-
fore, we rename the variables of δ to something new (x0 in this case), compute
the weakest precondition of κ ≺ δ[x0 ] across the statements S1 ; · · · ; Sk , and
then rename the new variables back to their original names (x in this case).
This renaming process preserves the value of δ[x] in its context. The anno-
tation F can provide extra invariant information with which to prove this
relation.

Example 5.27. Let us return to the proof of Example 5.26 that BubbleSort
halts. Path (1) induces the following verification condition:

i + 1 ≥ 0 ∧ i > 0 → (i + 1, i − 0) <2 (i + 1, i + 1) ,

which is valid. Paths (2) and (3) induce the verification condition:

i + 1 ≥ 0 ∧ i − j ≥ 0 ∧ j < i → (i + 1, i − (j + 1)) <2 (i + 1, i − j) ,

which is valid. Finally, (4) induces the verification condition

i + 1 ≥ 0 ∧ i − j ≥ 0 ∧ j ≥ i → ((i − 1) + 1, (i − 1) + 1) <2 (i + 1, i − j) ,

which is also valid. Hence, BubbleSort always halts. Combined with the proof
of the sortedness property, we can now say that BubbleSort is totally correct
with respect to its specification: it always halts and returns a sorted array.
Let us work through the construction of the final verification condition
for basic path (4). First, replace i and j with i0 and j0 , respectively, in the
function annotating L2 : (i0 + 1, i0 − j0 ). Then compute

wp((i + 1, i + 1) <2 (i0 + 1, i0 − j0 ), assume j ≥ i; i := i − 1)


⇔ wp(((i − 1) + 1, (i − 1) + 1) <2 (i0 + 1, i0 − j0 ), assume j ≥ i)
⇔ j ≥ i → (i, i) <2 (i0 + 1, i0 − j0 )

Now, having preserved the original value of (i + 1, i − j) at L2 , replace i0 and


j0 with i and j, respectively:

j ≥ i → (i, i) <2 (i + 1, i − j) .

Noting that (4) begins by asserting i + 1 ≥ 0 ∧ i − j ≥ 0, we have our


verification condition:

i + 1 ≥ 0 ∧ i − j ≥ 0 ∧ j ≥ i → (i, i) <2 (i + 1, i − j) .

In this proof, the loop annotations (other than the ranking functions) do
not have any bearing on the termination argument. Their purpose is only to
prove that the given functions map to the natural numbers. 
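Expanding <2 by its definition, (a1 , a2 ) <2 (b1 , b2 ) iff a1 < b1 , or a1 = b1 and a2 < b2 , turns the three verification conditions above into quantifier-free ΣZ -formulae, so any decision procedure for linear integer arithmetic settles them. The following sketch checks each by testing its negation for unsatisfiability; it assumes the Z3 Python bindings (the z3-solver package), an external tool rather than the procedures of Part II.

from z3 import Ints, And, Or, Implies, Not, Solver, unsat

i, j = Ints('i j')

def lex2(a1, a2, b1, b2):
    # (a1, a2) <2 (b1, b2): lexicographic extension of < to pairs
    return Or(a1 < b1, And(a1 == b1, a2 < b2))

vcs = [
    # path (1): outer loop head to inner loop head
    Implies(And(i + 1 >= 0, i > 0),
            lex2(i + 1, i - 0, i + 1, i + 1)),
    # paths (2) and (3): one iteration of the inner loop
    Implies(And(i + 1 >= 0, i - j >= 0, j < i),
            lex2(i + 1, i - (j + 1), i + 1, i - j)),
    # path (4): inner loop exit back to the outer loop head
    Implies(And(i + 1 >= 0, i - j >= 0, j >= i),
            lex2(i, i, i + 1, i - j)),
]

for n, vc in enumerate(vcs, 1):
    s = Solver()
    s.add(Not(vc))                       # valid iff the negation is unsat
    print('VC', n, 'valid:', s.check() == unsat)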

@pre u − ℓ + 1 ≥ 0
@post ⊤
↓ u−ℓ+1
bool BinarySearch(int[] a, int ℓ, int u, int e) {
if (ℓ > u) return false;
else {
int m := (ℓ + u) div 2;
if (a[m] = e) return true;
else if (a[m] < e) return BinarySearch(a, m + 1, u, e);
else return BinarySearch(a, ℓ, m − 1, e);
}
}

Fig. 5.24. BinarySearch always halts

Besides loops, recursion is another source of nontermination and hence re-


quires ranking annotations as well. Again, we break the termination argument
down to the level of basic paths.

Example 5.28. Figure 5.24 lists the BinarySearch function. u − ℓ + 1 maps


the formal parameters of BinarySearch to S : N with well-founded relation <.
Why do we use u − ℓ + 1 as the ranking function? Intuitively, the interval
[ℓ, u] shortens at each level of recursion, so u − ℓ is a good guess at a ranking
function. However, it may be that ℓ > u so that u − ℓ does not map into N;
but in this case ℓ = u + 1, so u − ℓ + 1 does map into N.
We now need to prove two properties of u − ℓ + 1:
1. Because u − ℓ + 1 has type int, we must prove that u − ℓ + 1 in fact maps
to N: whenever u − ℓ + 1 is evaluated at function entry, it must be the
case that u − ℓ + 1 ≥ 0.
2. We must prove that u−ℓ+1 decreases from function entry to each recursive
call.
The function precondition asserts the first property, so we need only prove that
the function specification is inductive as usual. To prove the second property,
we reduce the argument to basic paths: u − ℓ + 1 should decrease across each
basic path.
Convince yourself that the annotations other than the ranking argument
are inductive.
We now consider the ranking argument. The relevant basic paths of the
function are the following:

(1)
@pre u − ℓ + 1 ≥ 0
↓ u−ℓ+1
assume ℓ ≤ u;
m := (ℓ + u) div 2;
assume a[m] ≠ e;
assume a[m] < e;
↓ u − (m + 1) + 1

(2)
@pre u − ℓ + 1 ≥ 0
↓ u−ℓ+1
assume ℓ ≤ u;
m := (ℓ + u) div 2;
assume a[m] ≠ e;
assume a[m] ≥ e;
↓ (m − 1) − ℓ + 1

Two other basic paths exist from function entry to the first two return state-
ments; however, as the recursion ends at each, they are irrelevant to the ter-
mination argument.
The basic paths induce two verification conditions. Before examining them,
notice that the assume statements about a[m] are irrelevant to the termination
argument. Now, the first VC is

u − ℓ + 1 ≥ 0 ∧ ℓ ≤ u ∧ ···
→ u − (((ℓ + u) div 2) + 1) + 1 < u − ℓ + 1 ,

where · · · elides the literals involving a[m]. It is TZ -valid. The VC

u − ℓ + 1 ≥ 0 ∧ ℓ ≤ u ∧ ···
→ (((ℓ + u) div 2) − 1) − ℓ + 1 < u − ℓ + 1

for the second basic path is also TZ -valid, so BinarySearch halts on all input
in which ℓ is initially at most u + 1.
Section 6.2 provides an alternative to using the awkward ranking function
u − ℓ + 1. Additionally, the argument proves termination on all input. 
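As a dynamic sanity check of the ranking argument, and not a replacement for the verification conditions, the following hypothetical Python transliteration of BinarySearch asserts at every call that u − ℓ + 1 is a natural number and that it has strictly decreased since the parent call.

def binary_search(a, l, u, e, bound=None):
    measure = u - l + 1
    assert measure >= 0                  # u - l + 1 maps into N
    if bound is not None:
        assert measure < bound           # and decreases at each recursive call
    if l > u:
        return False
    m = (l + u) // 2                     # l, u >= 0 here, so // matches div
    if a[m] == e:
        return True
    if a[m] < e:
        return binary_search(a, m + 1, u, e, measure)
    return binary_search(a, l, m - 1, e, measure)

a = [1, 3, 5, 7, 9, 11]
for x in range(-1, 13):
    binary_search(a, 0, len(a) - 1, x)   # no assertion fires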

5.4 Summary
This chapter introduces the specification and verification of sequential pro-
grams. It covers:
• The programming language pi.

• Program specification. Specifying a program involves writing function pre-


conditions and function postconditions as first-order assertions. A func-
tion precondition asserts on which inputs the function is defined, while a
function postcondition asserts the form of the returned data. Other anno-
tations: loop invariants, assertions, runtime assertions.
• Partial correctness, which guarantees that if a function halts and the in-
put satisfies the function precondition, then its returned value satisfies the
function postcondition. Partial correctness is proved via an inductive argu-
ment. Additional annotations strengthen the inductive hypothesis. Basic
paths, program state, verification conditions, inductive invariants.
• Total correctness, which guarantees additionally that the program or func-
tion always halts. Total correctness requires mapping, via a ranking func-
tion, program states to a domain with a well-founded relation and proving
that the mapped values in the domain decrease as computation progresses.
Typically, proving termination requires additional partial correctness an-
notations.
Chapter 6 discusses strategies for applying the techniques of this chapter.
The methods introduced in this chapter are fundamental to verification
of software and hardware. However, we present them in a simple context.
Decades of research have made much more possible.
The verification conditions listed in examples or produced by programs in
this chapter are all decided using the decision procedures dis-
cussed in Part II. We have focused on specifications that can be expressed in
the fragments of theories introduced in Chapter 3. More complex specifica-
tions may require general mechanical theorem proving. However, mechanical
theorem provers often rely on the decision procedures of Part II when possible
for speed and to minimize human interaction.
Chapter 12 introduces algorithms for deducing annotations.

Bibliographic Remarks

Formally proving program correctness has been a subject of active research for
five decades. McCarthy argues in [59, 58] for a “mathematical science of com-
putation”. Floyd [34] and Hoare [39] introduce the main concepts for proving
property invariance and termination. In particular, they develop Floyd-Hoare
logic. Manna describes a verification style similar to ours [52]. The weakest
precondition predicate transformer was first formalized by Dijkstra [28].
King describes in his thesis [50] the idea of a verifying compiler, which
generates and proves during compilation the verification conditions that arise
from program annotations. See [27] for a discussion of the Extended Static
Checker, a verifying compiler for Java.

@pre p(a0 )
@post sorted(rv , 0, |rv | − 1)
int[] InsertionSort(int[] a0 ) {
int[] a := a0 ;
for
@ r1 (a, a0 , i, j)
(int i := 1; i < |a|; i := i + 1) {
int t := a[i];
for
@ r2 (a, a0 , i, j)
(int j := i − 1; j ≥ 0; j := j − 1) {
if (a[j] ≤ t) break;
a[j + 1] := a[j];
}
a[j + 1] := t;
}
return a;
}

Fig. 5.25. InsertionSort for Exercise 5.1(a)

Exercises
5.1 (Basic paths). For each of the following functions, replace each @pre ⊤
with a fresh predicate p over the function parameters, each @post ⊤ with a
fresh predicate q over rv and the function parameters, and each @ ⊤ with a
fresh predicate r over the function variables. As an example, see Figure 5.25
for the replacements for part (a). Then list the basic paths.
(a) InsertionSort of Figure 6.8.
(b) merge of Figure 6.9.
(c) ms of Figure 6.9.

5.2 (Weakest precondition). Compute the following formulae:


(a) wp(x ≥ 0, x := x − k; assume k ≤ 1)
(b) wp(x ≥ 0, assume k ≤ x; x := x − k)
(c) wp(x ≥ 0, x := x − k; assume k ≤ x)
(d) wp(x + 2y ≥ 3, x := x + 1; assume x > 0; y := y + x)

5.3 (Verification condition generation). Generate the VCs for the follow-
ing basic paths:
(1)
@ x > 0;
x := x − k;
assume k ≤ 1;
@ x ≥ 0;

(2)
@ ⊤;
assume k ≤ x;
x := x − k;
@ x ≥ 0;

(3)
@ ⊤;
x := x − k;
assume k ≤ x;
@ x ≥ 0;

(4)
@ k ≥ 0;
x := x − k;
assume k ≤ x;
@ x ≥ 0;

(5)
@ y ≥ 0;
x := x + 1;
assume x > 0;
y := y + x;
@ x + 2y ≥ 3;

Which are TZ -valid? Which are TQ -valid?


5.4 (Verification conditions). For each basic path generated in Exercise
5.1, list the corresponding VCs.
5.5 (Public functions). Example 5.6 asserts that a function that is acces-
sible outside a module should have a reasonable function precondition, such
as ⊤. Implement verified public wrapper functions to LinearSearch and Bina-
rySearch for searching an entire array. The function preconditions should be
reasonable.
5.6 (The div function). Integer division, even by a constant, is not a func-
tion of TZ ; however, it is useful for reasoning about programs like BinarySearch.
Show how basic paths that include the div function can be altered to use only
standard linear arithmetic. Hint : Use an additional assume statement. How
does this change affect the resulting VCs?
5.7 (Ackermann function). Implement the ack function of Example 4.9 as
a pi program and prove that it always halts.
6 Program Correctness: Strategies

The basis of our approach is the notion of an interpretation of a pro-


gram: that is, an association of a proposition with each connection in
the flow of control through a program.
— Robert W. Floyd
Assigning Meanings to Programs, 1967
As in other applications of mathematical induction (see Example 4.1),
the main challenge in applying the inductive assertion method is discovering
extra information to make the inductive argument succeed. Consider a typical
partial correctness property that asserts that a function’s output satisfies some
relation with its input. Assuming this property as the inductive hypothesis
does not provide any information about how the function behaves between
entry and exit. It is a weak hypothesis. Therefore, one must provide more
information about the function in the form of additional program annotations.
Section 6.1 discusses strategies for discovering this additional information.
Section 6.2 applies the strategies to prove that QuickSort always returns a
sorted array.

6.1 Developing Inductive Annotations


The machinery presented in Section 5.2 automatically reduces an annotated
function to a finite set of verification conditions. Furthermore, the decision
procedures discussed in Part II of this text make deciding the validity of
many verification conditions an automatable task. For example, all verification
conditions in this text fall within decidable fragments of FOL, unless otherwise
noted. Thus, determining if a program’s annotations are inductive can be
automated in many cases. Ongoing research in decision procedures continually
expands the set of annotations that produce decidable verification conditions.
Developing annotations is a different matter. As expected, writing a func-
tion specification requires human ingenuity, just as implementing the function

requires it. Of course, simple assertions, such as that a program is free of run-
time errors, can be generated automatically.
Writing the loop invariants also requires human ingenuity. A certain level
of human intervention is acceptable: the programmer ought to know certain
facts about her/his code. Loop invariants often capture insights into how the
code works and what it accomplishes. Developing the implementation and an-
notations simultaneously results in more robust systems. Finally, annotations
formally document code, facilitating better development in team projects.
However, Section 5.2.5 points out a fundamental limitation of the inductive
assertion method of program verification: loop invariants must be inductive
for the corresponding verification conditions to be valid, not just invariant.
Consequently, the programmer can assert many facts that are indeed invariant;
yet if the annotations are not inductive, the facts cannot be proved.
Much research addresses automatic (inductive) invariant discovery. For ex-
ample, algorithms exist for discovering linear and polynomial relations among
integer and real variables. Such invariants can, for example, provide loop in-
dex bounds, prove the lack of division by 0, or prove that an index into an
array is within bounds. Other methods exist for discovering the “shape” of
memory in programming languages with pointers, allowing, for example, the
partially automated analysis of linked lists. One of the most important roles
of automatic invariant discovery is strengthening the programmer’s annota-
tions into inductive annotations. Chapter 12 introduces invariant generation
procedures. However, no set of algorithms will ever fully replace humans in
writing verified software.
In this section, we suggest structured techniques for developing inductive
annotations to prove partial correctness. We emphasize that the methods are
just heuristics: human ingenuity is still the most important ingredient in form-
ing proofs.

6.1.1 Basic Facts

To begin a proof, include basic facts in loop invariants. Basic facts include loop
index ranges and other “obvious” facts. To be inductive, complex assertions
usually require these basic facts. We illustrate the development of basic facts
through several examples.
Example 6.1. Consider the loop of LinearSearch (see also Figure 5.1):

for
@L : ⊤
(int i := ℓ; i ≤ u; i := i + 1) {
if (a[i] = e) return true;
}

Based on the initialization of i, the loop guard, and that i is only modified by
being incremented in the loop update, we know that at L,

ℓ≤i≤u+1 .

Notice the upper bound. It is a common mistake to forget that on the final
iteration, the loop guard is not true. Our basic annotation of the loop is the
following:

for
@L : ℓ ≤ i ≤ u + 1
(int i := ℓ; i ≤ u; i := i + 1) {
if (a[i] = e) return true;
}

Example 6.2. Consider the loops of BubbleSort (see also Figure 5.3):

for
@L1 : ⊤
(int i := |a| − 1; i > 0; i := i − 1) {
for
@L2 : ⊤
(int j := 0; j < i; j := j + 1) {
if (a[j] > a[j + 1]) {
int t := a[j];
a[j] := a[j + 1];
a[j + 1] := t;
}
}
}

The outer loop index i ranges according to

−1 ≤ i < |a|

Why −1? If |a| = 0, then |a| − 1 = −1 so that i is initially −1. Keep in mind
that “corner cases” like this one are just as important as normal cases (and
perhaps even more important when considering correctness: corner cases are
often the source of bugs). In the inner loop, the range of i is more restricted:

0 < i < |a|

because of the outer loop guard.


In the inner loop, j ranges according to

0≤j≤i.

Therefore, our basic annotation of the two loops is the following:



for
@L1 : −1 ≤ i < |a|
(int i := |a| − 1; i > 0; i := i − 1) {
for
@L2 : 0 < i < |a| ∧ 0 ≤ j ≤ i
(int j := 0; j < i; j := j + 1) {
if (a[j] > a[j + 1]) {
int t := a[j];
a[j] := a[j + 1];
a[j + 1] := t;
}
}
}

Note that the loops modify just the elements of a, not a itself. Therefore,
we could add the annotation

|a| = |a0 |

to both loops. Such an annotation would be useful if the postcondition as-


serted, for example, that

|rv | = |a0 | .

For the property that we address (sorted(rv , 0, |rv | − 1)), this annotation is
not useful. 
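Candidate annotations can also be tested before any proof is attempted. The following Python instrumentation of BubbleSort, a testing aid of our own rather than part of the inductive assertion method, asserts the basic facts at L1 and L2 on random inputs. A failing assertion refutes a candidate invariant; passing runs prove nothing, but they catch mistakes such as forgetting that the loop guard is false on the final visit to the loop head.

import random

def bubble_sort(a0):
    a = list(a0)
    i = len(a) - 1
    while True:
        assert -1 <= i < len(a)                    # basic facts at L1
        if not (i > 0):
            break
        j = 0
        while True:
            assert 0 < i < len(a) and 0 <= j <= i  # basic facts at L2
            if not (j < i):
                break
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
            j += 1
        i -= 1
    return a

for _ in range(1000):
    xs = [random.randint(-5, 5) for _ in range(random.randint(0, 8))]
    assert bubble_sort(xs) == sorted(xs)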

6.1.2 The Precondition Method

Basic facts provide a foundation for more interesting information. The pre-
condition method (also called the “backward substitution” or “backward
propagation” method) is a strategy for developing more interesting informa-
tion in a structured way. Again, we emphasize that the method is a heuristic,
not an algorithm: it provides some guidance for the human rather than re-
placing the human’s intuition and ingenuity.
The precondition method consists of the following steps:
1. Identify a fact F that is known at one location L in the function (@L : F )
but that is not supported by annotations earlier in the function.
2. Repeat:
a) Compute the weakest preconditions of F backward through the func-
tion, ending at loop invariants or at the beginning of the function.
b) At each new annotation location L′ , generalize the new facts to new
formula F ′ (@L′ : F ′ ).
We illustrate the technique through examples.

Example 6.3. Consider the loop of LinearSearch (see also Figure 5.1), anno-
tated with basic facts:

for
@L : ℓ ≤ i ≤ u + 1
(int i := ℓ; i ≤ u; i := i + 1) {
if (a[i] = e) return true;
}
return false;

The postcondition of LinearSearch is

rv ↔ ∃i. ℓ ≤ i ≤ u ∧ a[i] = e .

Consider basic path (4) of Example 5.13 but with the current loop invariant
substituted for the first assertion:
(4)
@L : F1 : ℓ ≤ i ≤ u + 1
S1 : assume i > u;
S2 : rv := false;
@post F2 : rv ↔ ∃j. ℓ ≤ j ≤ u ∧ a[j] = e

Note that we continue to number basic paths as they were numbered in Ex-
ample 5.13. The VC

{F1 }S1 ; S2 {F2 } : ℓ ≤ i ≤ u + 1 ∧ i > u → ¬(∃j. ℓ ≤ j ≤ u ∧ a[j] = e)

is not (TZ ∪ TA )-valid. Essentially, the antecedent does not assert anything
useful about the content of a. Write the consequent as

F : ∀j. ℓ ≤ j ≤ u → a[j] ≠ e

by pushing in the negation. F says that if LinearSearch exits via S2 , then no


element of a in the range [ℓ, u] is e. But F is not supported by the current
loop invariant at L. In short, F2 is a fact that is not supported by earlier
annotations.
Having identified an unsupported fact, we compute preconditions. To prop-
agate F2 back to the loop invariant, compute

wp(F2 , S1 ; S2 )
⇔ wp(wp(F2 , rv := false), S1 )
⇔ wp(F2 {rv ↦ false}, S1 )
⇔ wp(F2 {rv ↦ false}, assume i > u)
⇔ i > u → F2 {rv ↦ false}
⇔ i > u → ∀j. ℓ ≤ j ≤ u → a[j] ≠ e

The final formula



G : i > u → ∀j. ℓ ≤ j ≤ u → a[j] ≠ e ,


in particular the antecedent i > u, and some intuition suggests the general-
ization
G′ : ∀j. ℓ ≤ j < i → a[j] ≠ e.
We compute one backward iteration through the loop to increase our confi-
dence:
(3)
@L : H : ?
S1 : assume i ≤ u;
S2 : assume a[i] ≠ e;
S3 : i := i + 1;
@L : G : i > u → ∀j. ℓ ≤ j ≤ u → a[j] ≠ e

Then
wp(G, S1 ; S2 ; S3 )
⇔ wp(wp(G, i := i + 1), S1 ; S2 )
⇔ wp(G{i ↦ i + 1}, S1 ; S2 )
⇔ wp(wp(G{i ↦ i + 1}, assume a[i] ≠ e), S1 )
⇔ wp(a[i] ≠ e → G{i ↦ i + 1}, S1 )
⇔ wp(a[i] ≠ e → G{i ↦ i + 1}, assume i ≤ u)
⇔ i ≤ u → a[i] ≠ e → G{i ↦ i + 1}
⇔ i ≤ u ∧ a[i] ≠ e ∧ i + 1 > u → ∀j. ℓ ≤ j ≤ u → a[j] ≠ e
⇔ i = u ∧ a[u] ≠ e → ∀j. ℓ ≤ j ≤ u → a[j] ≠ e
⇔ i = u ∧ a[u] ≠ e → ∀j. ℓ ≤ j ≤ u − 1 → a[j] ≠ e
⇔ i = u ∧ a[u] ≠ e → ∀j. ℓ ≤ j ≤ i − 1 → a[j] ≠ e
To obtain the second-to-last line from the third-to-last, note that the an-
tecedent already asserts that a[u] ≠ e; hence, its occurrence as the case j = u
of ∀j. ℓ ≤ j ≤ u · · · is redundant. The final line is realized by applying the
equality i = u to the upper bound on j. As we suspected, it seems that the
right bound on j should be related to the progress of i, rather than being fixed
to u. This observation from computing the weakest precondition matches our
intuition. One trick to generalize assertions is to replace fixed terms (bounds,
indices, etc.) with terms that evolve according to the loop counter.
Thus, we settle on the formula
G′ : ∀j. ℓ ≤ j < i → a[j] ≠ e .
That is, all previously checked entries of a do not equal e. We add this assertion
to the loop invariant:
for
@L : ℓ ≤ i ≤ u + 1 ∧ (∀j. ℓ ≤ j < i → a[j] ≠ e)
(int i := ℓ; i ≤ u; i := i + 1) {
if (a[i] = e) return true;
}

The result is similar to the annotation in Figure 5.15. Generating and checking
the corresponding VCs reveals that the annotations are inductive. 
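The key verification condition, that the strengthened invariant together with the exit condition i > u implies the universal fact needed at the postcondition, can also be handed to an off-the-shelf solver. The sketch below assumes the Z3 Python bindings; the quantified array reasoning lies outside the decidable fragments of Chapter 3, but Z3's instantiation heuristics typically discharge it.

from z3 import Array, IntSort, Ints, ForAll, Implies, And, Not, Solver, unsat

a = Array('a', IntSort(), IntSort())
l, u, i, e, j = Ints('l u i e j')        # l stands for the book's letter ell

inv = And(l <= i, i <= u + 1,
          ForAll(j, Implies(And(l <= j, j < i), a[j] != e)))
post = ForAll(j, Implies(And(l <= j, j <= u), a[j] != e))
vc = Implies(And(inv, i > u), post)

s = Solver()
s.add(Not(vc))
print('VC valid:', s.check() == unsat)    # expected: True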

Example 6.4. Consider the version of BinarySearch of Figure 5.12 that con-
tains runtime assertions but has only a trivial function specification ⊤. Using
the precondition method, we infer a function precondition that makes the an-
notations inductive. Contexts that call BinarySearch are then forced to obey
this function precondition, guaranteeing a lack of runtime errors.
Consider the path from function entry to the assertion protecting the array
access:
(·)
@pre H : ?
S1 : assume ℓ ≤ u;
S2 : m := (ℓ + u) div 2;
@ F : 0 ≤ m < |a|

Compute

wp(F, S1 ; S2 )
⇔ wp(wp(F, m := (ℓ + u) div 2), S1 )
⇔ wp(F {m ↦ (ℓ + u) div 2}, S1 )
⇔ wp(F {m ↦ (ℓ + u) div 2}, assume ℓ ≤ u)
⇔ ℓ ≤ u → F {m ↦ (ℓ + u) div 2}
⇔ ℓ ≤ u → 0 ≤ (ℓ + u) div 2 < |a|
⇐ 0 ≤ ℓ ∧ u < |a|

The final line implies the penultimate line, for if 0 ≤ ℓ ∧ u < |a| and ℓ ≤ u,
then both 0 ≤ ℓ < |a| and 0 ≤ u < |a|; hence, their mean is also in the range
[0, |a| − 1]. Therefore, it is guaranteed that

0 ≤ ℓ ∧ u < |a| → wp(F, S1 ; S2 )

is TZ -valid.
The formula 0 ≤ ℓ ∧ u < |a| appears as the function precondition in
Figure 6.1. The annotations are inductive, proving that the runtime assertion
0 ≤ m < |a| holds in every execution of BinarySearch in which the precondition
0 ≤ ℓ ∧ u < |a| is satisfied. 
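The implication used in the last step, that 0 ≤ ℓ ∧ u < |a| together with ℓ ≤ u bounds the mean, is itself a TZ -validity. A small sketch, again assuming the Z3 Python bindings and writing n for |a|, confirms it:

from z3 import Ints, And, Implies, Not, Solver, unsat

l, u, n = Ints('l u n')
m = (l + u) / 2                           # Z3 integer division, the book's div

pre = And(0 <= l, u < n)
wp_assert = Implies(l <= u, And(0 <= m, m < n))

s = Solver()
s.add(Not(Implies(pre, wp_assert)))
print('valid:', s.check() == unsat)       # expected: True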

Example 6.5. Consider the following code fragment of BubbleSort (see also
Figure 5.3)

@pre 0 ≤ ℓ ∧ u < |a|


@post ⊤
bool BinarySearch(int[] a, int ℓ, int u, int e) {
if (ℓ > u) return false;
else {
@ 2 6= 0;
int m := (ℓ + u) div 2;
@ 0 ≤ m < |a|;
if (a[m] = e) return true;
else if (a[m] < e) return BinarySearch(a, m + 1, u, e);
else return BinarySearch(a, ℓ, m − 1, e);
}
}

Fig. 6.1. BinarySearch with runtime assertions

for
@L1 : −1 ≤ i < |a|
(int i := |a| − 1; i > 0; i := i − 1) {
for
@L2 : 0 < i < |a| ∧ 0 ≤ j ≤ i
(int j := 0; j < i; j := j + 1) {
if (a[j] > a[j + 1]) {
int t := a[j];
a[j] := a[j + 1];
a[j + 1] := t;
}
}
}
return a;
and its postcondition
F : sorted(rv , 0, |rv| − 1) .
Consider the path
(6)
@L1 : G : ?
S1 : assume i ≤ 0;
S2 : rv := a;
@post F : sorted(rv , 0, |rv| − 1)

Computing wp(F, S1 ; S2 ) produces the formula


F ′ : i ≤ 0 → sorted(a, 0, |a| − 1) ,
which tells us (not surprisingly) that a should be sorted upon exiting the
outer loop. Observe the index variable of the outer loop: it starts at |a| − 1

and decrements down to 0. Therefore, recalling the trick to replace fixed terms
(bounds, indices, etc.) with terms that evolve according to the loop counter
suggests the following generalization of F ′ :

G : sorted(a, i, |a| − 1) .

G trivially holds upon entering the outer loop; moreover, it follows from the
behavior of i that progress is made by working down the array. The outer loop
invariant L1 should include G. Thus, we have

@L1 : −1 ≤ i < |a| ∧ sorted(a, i, |a| − 1)

so far.
Propagate G via wp to the inner loop along the path from the exit of the
inner loop L2 to the top of the outer loop L1 :
(5)
@L2 : H : ?
S1 : assume j ≥ i;
S2 : i := i − 1;
@L1 : G : sorted(a, i, |a| − 1)

The result at L2 is the formula

H ′ : j ≥ i → sorted(a, i − 1, |a| − 1) ,

which states that when the inner loop has finished, the range [i − 1, |a| − 1] is
sorted. Immediately generalizing H ′ to

H ′′ : sorted(a, i − 1, |a| − 1)

is too strong. For suppose H ′′ were to annotate the inner loop at L2 , and
consider the path
(2)
@L1 : G : sorted(a, i, |a| − 1)
S1 : assume i > 0;
S2 : j := 0;
@L2 : H ′′ : sorted(a, i − 1, |a| − 1)

Computing

G → wp(H ′′ , assume i > 0; j := 0)

produces

sorted(a, i, |a| − 1) ∧ i > 0 → sorted(a, i − 1, |a| − 1) ,

which is not (TZ ∪ TA )-valid. All we know at L1 (with respect to sortedness of


a) is G : sorted(a, i, |a|−1). Essentially, sorted(a, i−1, |a|−1) at L2 is a special

case that definitely holds only when the inner loop has finished. Therefore,
we generalize H ′ to the weaker assertion H : sorted(a, i, |a| − 1), which claims
that a smaller subrange of a is sorted.
At this point, we have annotated the loops of BubbleSort as follows:

for
@L1 : −1 ≤ i < |a| ∧ sorted(a, i, |a| − 1)
(int i := |a| − 1; i > 0; i := i − 1) {
for
@L2 : 0 < i < |a| ∧ 0 ≤ j ≤ i ∧ sorted(a, i, |a| − 1)
(int j := 0; j < i; j := j + 1) {
if (a[j] > a[j + 1]) {
int t := a[j];
a[j] := a[j + 1];
a[j + 1] := t;
}
}
}
The resulting VCs are not valid. Further annotations require some insight on
our part, which leads us to the next section. 
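A single concrete state already witnesses the invalidity; the following few lines of Python (our own check, with sorted_rng transliterating sorted) exhibit such a countermodel.

def sorted_rng(a, lo, hi):
    return all(a[k] <= a[k + 1] for k in range(lo, hi))

# sorted(a, i, |a|-1) and i > 0 hold, yet sorted(a, i-1, |a|-1) fails
a, i = [3, 1, 2], 1
assert sorted_rng(a, i, len(a) - 1) and i > 0
assert not sorted_rng(a, i - 1, len(a) - 1)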

6.1.3 A Strategy

In general, proofs require insights beyond generalizing formulae obtained


through the precondition method. We adopt the following strategy when prov-
ing partial correctness.
First, decompose the function specification into atomic properties. Then
analyze each atomic property. For example, to prove that BubbleSort returns a
sorted array that is a permutation of the input, study the sortedness property
and permutation property separately. In some cases, several atomic properties
may have to be examined together to complete the proof. For each basic
property, apply the following steps:
1. Assert basic facts (Section 6.1.1).
2. Repeat:
a) Use the precondition method to propagate annotations (Section 6.1.2).
b) Formalize an insight.
The second step suggests applying the precondition method until nothing
more can be learned. Then pause, understand another essential fact about
the program, and resume applying the precondition method.
While Chapter 12 discusses the foundations of algorithms for automati-
cally generating inductive annotations, the reader should be aware that even
the best of these algorithms cannot approach the abilities of a human. Take
heart! Experience has shown that students quickly become adept at annotat-
ing programs.

Example 6.6. We resume our analysis of BubbleSort from Example 6.5. Some
cogitation (and observation of sample traces; see Figure 5.4) suggests that
BubbleSort exhibits the following behavior: the inner loop propagates the
largest value of the unsorted region to the right side of the unsorted region,
thus expanding the sorted region. At every iteration, j is the index of the
largest value found so far. In other words, all values in the range [0, j − 1] are
at most a[j]:

F : partitioned(a, 0, j − 1, j, j) .

This observation should be added as an annotation at L2 . Having gained new


insight into BubbleSort, we return to the precondition method and propagate
F back to the outer loop at L1 along the path
(2)
@L1 : H : ?
S1 : assume i > 0;
S2 : j := 0;
@L2 : F : partitioned(a, 0, j − 1, j, j)

resulting in the new annotation

wp(F, S1 ; S2 ) : i > 0 → partitioned(a, 0, −1, 0, 0)

at L1 . The result is trivially valid according to the definition of partitioned,


so it does not contribute any new information. Thus, we finish this round of
Step 2 with the annotations

for
@L1 : −1 ≤ i < |a| ∧ sorted(a, i, |a| − 1)
(int i := |a| − 1; i > 0; i := i − 1) {
for
@L2 : 0 < i < |a| ∧ 0 ≤ j ≤ i
      ∧ partitioned(a, 0, j − 1, j, j) ∧ sorted(a, i, |a| − 1)
(int j := 0; j < i; j := j + 1) {
if (a[j] > a[j + 1]) {
int t := a[j];
a[j] := a[j + 1];
a[j + 1] := t;
}
}
}

The annotations are not yet inductive.


Some further meditation enlightens us with the following: the sorted region
must contain the largest elements of a, for the inner loop has assiduously
moved the largest element of the unsorted region to the sorted region. In
other words, the sorted range [i + 1, |a| − 1] contains the largest elements of a:

G : partitioned(a, 0, i, i + 1, |a| − 1) .

This observation should be added as an annotation at L1 . Now we prop-


agate G from L1 to L2 . Recall from Example 6.5 that our propagation of
sorted(a, i, |a| − 1) to the inner loop was unsuccessful. Similarly,

wp(G, assume j ≥ i; i := i − 1)
⇔ j ≥ i → partitioned(a, 0, i − 1, i, |a| − 1)

for the path


(5)
@L2 : H : ?
assume j ≥ i;
i := i − 1;
@L1 : G : partitioned(a, 0, i, i + 1, |a| − 1)

cannot be generalized to

partitioned(a, 0, i − 1, i, |a| − 1) .

Instead, consider the path from L1 to L2 :


(2)
@L1 : G : partitioned(a, 0, i, i + 1, |a| − 1)
S1 : assume i > 0;
S2 : j := 0;
@L2 : H : ?

Find the strongest formula H that can annotate the inner loop such that the
VC

partitioned(a, 0, i, i + 1, |a| − 1) → wp(H, S1 ; S2 )

for the path is valid. In other words, seek a formula H annotating the inner
loop that is supported by the annotation G of the outer loop. The strongest
such formula is G itself.
These new annotations result in Figure 5.17. 

6.2 Extended Example: QuickSort


In this section, we bring together the concepts studied in this and the last
chapters through a single example. We prove that QuickSort always halts and
returns a sorted array. We argue at the level of annotations, leaving a computer
or the reader to check the VCs.
Figure 6.2 lists the high-level functions of QuickSort. QuickSort is a wrapper
function (or public interface) for the recursive function qsort, which sorts array
a0 in the range [ℓ, u]. As in BubbleSort, the first line of qsort assigns a0 to an

typedef struct qs {
int pivot;
int[] array;
} qs;

@pre ⊤
@post sorted(rv , 0, |rv | − 1)
int[] QuickSort(int[] a) {
return qsort(a, 0, |a| − 1);
}

@pre ⊤
@post ⊤
int[] qsort(int[] a0 , int ℓ, int u) {
int[] a := a0 ;
if (ℓ ≥ u) return a;
else {
qs p := partition(a, ℓ, u);
a := p.array;
a := qsort(a, ℓ, p.pivot − 1);
a := qsort(a, p.pivot + 1, u);
return a;
}
}

Fig. 6.2. Main functions of QuickSort

array a because qsort modifies a (recall that pi does not allow parameters
to be modified). The qs data structure holds the two data that the partition
function, listed in Figure 6.3, returns: the pivot index pivot and the partitioned
array array.
One level of recursion of qsort works as follows. If ℓ ≥ u, then the trivial
range [ℓ, u] of a0 is already sorted. Otherwise, partition chooses a pivot index
pi ∈ [ℓ, u], remembering the pivot value a[pi] as pv. It then swaps cells pi and
u of a so that the randomly chosen pivot now appears on the right side of the
[ℓ, u] subarray. random has the following prototype:

@pre ℓ ≤ u
@post ℓ ≤ rv ≤ u
int random(int ℓ, int u);

The for loop of partition partitions a such that all elements at most pv
are on the left and all elements greater than pv are on the right. Within the
loop, j < u, so that the pivot value pv, stored in a[u], remains untouched.
When the loop finishes, if i < u − 1, then the value a[i + 1] is the first value
greater than pv; otherwise, all elements of a are at most pv. Finally, partition

@pre ⊤
@post ⊤
qs partition(int[] a0 , int ℓ, int u) {
int[] a := a0 ;
int pi := random(ℓ, u);
int pv := a[pi];
a[pi] := a[u];
a[u] := pv;

int i := ℓ − 1;
for @ ⊤
(int j := ℓ; j < u; j := j + 1) {
if (a[j] ≤ pv) {
i := i + 1;
t := a[i];
a[i] := a[j];
a[j] := t;
}
}

t := a[i + 1];
a[i + 1] := a[u];
a[u] := t;
return
{ pivot = i + 1;
array = a;
};
}

Fig. 6.3. QuickSort’s partition function

swaps the pivot value a[u] with a[i + 1] so that a is partitioned as follows in
the range [ℓ, u]: cells to the left of i + 1 have value at most pv; a[i + 1] = pv;
and cells to the right of i + 1 have value greater than pv. It returns the pivot
index i + 1 and the partitioned array a via an instance of the qs data type.
Finally, qsort recursively sorts the subarrays to the left and to the right of
the pivot index.
Figure 6.4 presents a sample trace. In the first line, partition chooses the
second cell as the pivot and swaps it with cell u. The subsequent six lines follow
the partition’s loop as it partitions elements according to pv. The penultimate
line shows the swap that brings the pivot element into the pivot position. The
final line shows the state of the array when it is returned to qsort. qsort calls
itself recursively on the two indicated subarrays. We encourage the reader to
understand QuickSort and the sample trace before reading further.

[Figure 6.4 is a diagram and is not reproduced here. Starting from the array
0 3 2 6 5, it shows partition swapping the chosen pivot (value 3) to position u,
the loop exchanging elements so that values at most the pivot value collect on
the left, and the final swap placing the pivot value between the two subarrays,
yielding 0 2 3 6 5.]

Fig. 6.4. Sample execution of QuickSort

6.2.1 Partial Correctness

We prove that if QuickSort halts, then it returns a sorted array. First, we


develop the function specifications for qsort and partition so that QuickSort
and qsort have inductive annotations. We then leave the annotation of the
loop of partition as Exercise 6.3.
First, annotate qsort and partition with their function specifications. To
avoid runtime errors, the function preconditions should include
0 ≤ ℓ ∧ u < |a0 | .
Next, while the returned array is not the same as the input array a0 in either
function, we know that their lengths are the same:
|rv | = |a0 | .
We also observe that neither qsort nor partition modify the array outside of
the range [ℓ, u]. Thus, we note in the function postcondition that

beq(rv , a0 , 0, ℓ − 1) ∧ beq(rv , a0 , u + 1, |a0 | − 1) .

The bounded equality predicate beq is defined

beq(a, b, k1 , k2 ) ⇔ ∀i. k1 ≤ i ≤ k2 → a[i] = b[i]

in the theory TZ ∪ TA . It asserts that two arrays are equal in the index range
[k1 , k2 ].
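For experimentation, sorted, partitioned, and beq transliterate directly into executable predicates. The following Python helpers, a hypothetical testing aid rather than part of pi, let one check candidate pre- and postconditions of qsort and partition on concrete runs before attempting the inductive proof.

def sorted_rng(a, lo, hi):
    # sorted(a, lo, hi): a[lo] <= a[lo+1] <= ... <= a[hi]
    return all(a[k] <= a[k + 1] for k in range(lo, hi))

def partitioned(a, l1, u1, l2, u2):
    # every a[k] with l1 <= k <= u1 is at most every a[m] with l2 <= m <= u2
    return all(a[k] <= a[m]
               for k in range(l1, u1 + 1)
               for m in range(l2, u2 + 1))

def beq(a, b, k1, k2):
    # a and b agree on the index range [k1, k2]
    return all(a[k] == b[k] for k in range(k1, k2 + 1))

def qsort_post(rv, a0, l, u):
    # the first qsort postcondition given below (before strengthening)
    return (len(rv) == len(a0)
            and beq(rv, a0, 0, l - 1)
            and beq(rv, a0, u + 1, len(a0) - 1)
            and sorted_rng(rv, l, u))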
The annotations for partition vary slightly because of its return type:

|rv .array| = |a0 | ∧ beq(rv .array, a0 , 0, ℓ − 1)


∧ beq(rv .array, a0 , u + 1, |a0 | − 1)

Since partition returns an integer (pivot) and an array (array) in a qs struc-


ture, the postcondition asserts facts about rv .array.
To finish with qsort, formalize that qsort sorts the range [ℓ, u] of the given
array:

sorted(rv , ℓ, u) .

So far, then, we have specified qsort as follows:

@pre 0 ≤ ℓ ∧ u < |a0|
@post |rv| = |a0| ∧ beq(rv, a0, 0, ℓ − 1) ∧ beq(rv, a0, u + 1, |a0| − 1)
      ∧ sorted(rv, ℓ, u)
int[] qsort(int[] a0 , int ℓ, int u)

To finish the specification of partition, recall that partition is intended to


return an array that is partitioned around the pivot element. Therefore, let us
formalize the description that we gave above. Observe that the specification
of random guarantees that

ℓ ≤ rv .pivot ≤ u ,

as desired. Now, for the left subarray,

∀i. ℓ ≤ i < rv .pivot → rv .array[i] ≤ rv .array[rv .pivot] ,

or, as a partition,

partitioned(rv .array, ℓ, rv .pivot − 1, rv.pivot, rv .pivot) ,

while for the right subarray,

∀i. rv .pivot < i ≤ u → rv .array[rv .pivot] < rv .array[i] .

We weaken this assertion slightly to

partitioned(rv .array, rv .pivot, rv .pivot, rv .pivot + 1, u) ,



which does not capture the strict inequality but is more convenient for rea-
soning. For partition, we have thus specified the following:

@pre 0≤ ℓ ∧ u < |a0 | 


|rv .array| = |a0 | ∧ beq(rv .array, a0 , 0, ℓ − 1)
 ∧ beq(rv .array, a0 , u + 1, |a0 | − 1) 
 

@post ∧ ℓ ≤ rv .pivot ≤ u 

 ∧ partitioned(rv .array, ℓ, rv .pivot − 1, rv .pivot, rv .pivot) 
∧ partitioned(rv .array, rv .pivot, rv .pivot, rv .pivot + 1, u)
qs partition(int[] a0 , int ℓ, int u)

Let us step back a moment. Essentially, we have specified that qsort does
not modify the array outside of the range [ℓ, u]. Regarding the subarray given
by [ℓ, u], all we have asserted is that it is sorted in the returned array. Focus
on the recursive calls to qsort: is knowing that the ranges [ℓ, p.pivot − 1] and
[p.pivot+ 1, u] of a are sorted enough to conclude that the range [ℓ, u] is sorted
when a is returned? In other words, is the VC corresponding to the following
basic path valid? The basic path follows the path from the precondition to
the second return statement, using the function call abstraction introduced
in Section 5.2.1 to abstract away function calls:
(·)
@pre 0 ≤ ℓ ∧ u < |a0 |
a := a0 ;
assumeℓ < u; 
|v1 .array| = |a| ∧ beq(v1 .array, a, 0, ℓ − 1)
 ∧ beq(v1 .array, a, u + 1, |a| − 1) 
 

assume ∧ ℓ ≤ v1 .pivot ≤ u ;

 ∧ partitioned(v1 .array, ℓ, v1 .pivot − 1, v1 .pivot, v1 .pivot) 
∧ partitioned(v1 .array, v1 .pivot, v1 .pivot, v1 .pivot + 1, u)
p := v1 ;
a := p.array;
 
|v | = |a| ∧ beq(v2 , a, 0, ℓ − 1) ∧ beq(v2 , a, p.pivot, |a| − 1)
assume 2 ;
∧ sorted(v2 , ℓ, p.pivot − 1)
a := v2; 
|v | = |a| ∧ beq(v3 , a, 0, p.pivot) ∧ beq(v3 , a, u + 1, |a| − 1)
assume 3 ;
∧ sorted(v3 , p.pivot + 1, u)
a := v3 ;
rv:= a; 
|rv | = |a0 | ∧ beq(rv , a0 , 0, ℓ − 1) ∧ beq(rv , a0 , u + 1, |a0 | − 1)
@
∧ sorted(rv , ℓ, u)

The corresponding VC is not valid. The assumptions about v1 , v2 , and v3 are


not strong enough to imply that the range [ℓ, u] of rv is sorted.
The standard approach to addressing this problem is to reason simulta-
neously that qsort returns an array that is a permutation of its input array

(a permuted array contains the same elements as the original array but pos-
sibly in a different order). However, reasoning about permutations presents
a problem. A straightforward formalization of permutation is not possible in
FOL, instead requiring second-order logic. We could assert that the output
is a weak permutation of the input: all values occurring in the input array
occur in the output array but possibly with a varying number of occurrences.
Formally,

∀e. (∃i. a0 [i] = e) ↔ (∃j. rv [j] = e) .

In Exercise 6.5, we ask the reader to explore an approximation to weak per-


mutation.
However, that the elements are permuted is a stronger statement than
necessary to prove sortedness. Instead, notice that QuickSort imposes a larger
partitioning of the intermediate arrays than we have previously observed in
our analysis. At every level of recursion of qsort, the elements of a0 in the
range [ℓ, u] are at least the elements to their left and at most the elements to
their right. Formally, we strengthen the specification of qsort as follows:
 
@pre  0 ≤ ℓ ∧ u < |a0|
      ∧ partitioned(a0, 0, ℓ − 1, ℓ, u)
      ∧ partitioned(a0, ℓ, u, u + 1, |a0| − 1)
@post |rv| = |a0| ∧ beq(rv, a0, 0, ℓ − 1) ∧ beq(rv, a0, u + 1, |a0| − 1)
      ∧ partitioned(rv, 0, ℓ − 1, ℓ, u)
      ∧ partitioned(rv, ℓ, u, u + 1, |rv| − 1)
      ∧ sorted(rv, ℓ, u)
int[] qsort(int[] a0 , int ℓ, int u)

Of course, now the specification of partition must be strengthened to carry


this reasoning through the main basic path of qsort:
 
@pre  0 ≤ ℓ ∧ u < |a0|
      ∧ partitioned(a0, 0, ℓ − 1, ℓ, u)
      ∧ partitioned(a0, ℓ, u, u + 1, |a0| − 1)
@post |rv.array| = |a0| ∧ beq(rv.array, a0, 0, ℓ − 1)
      ∧ beq(rv.array, a0, u + 1, |a0| − 1)
      ∧ partitioned(rv.array, 0, ℓ − 1, ℓ, u)
      ∧ partitioned(rv.array, ℓ, u, u + 1, |rv.array| − 1)
      ∧ ℓ ≤ rv.pivot ≤ u
      ∧ partitioned(rv.array, ℓ, rv.pivot − 1, rv.pivot, rv.pivot)
      ∧ partitioned(rv.array, rv.pivot, rv.pivot, rv.pivot + 1, u)
qs partition(int[] a0 , int ℓ, int u)

That is, partition preserves this partitioning even as it manipulates the ele-
ments in the range [ℓ, u]. Indeed, partition itself imposes the necessary
partitioning for the next level of recursion, which we already observed earlier as
the final partitioned assertions of the function postcondition.
The annotations of QuickSort and qsort are inductive. Exercise 6.3 asks
the reader to finish the proof by annotating the for loop of partition so that
the annotations of partition are also inductive.

6.2.2 Total Correctness

To prove total correctness — that QuickSort actually returns a sorted array —


we need to prove that QuickSort always halts. Our implementation of QuickSort
has both recursive behavior in function qsort and looping behavior in function
partition. We must analyze both possible sources of nontermination.
To prove that the loop in partition always halts, let us use the obvious
ranking function of δ1 : u − j suggested by the structure of the loop. δ1 clearly
maps the program state to Z, but we prove the stronger fact that δ1 actually
maps the program state to N with well-founded relation <. In particular, we
prove that
• u − j ≥ 0 is a loop invariant;
• and u − j decreases on each iteration.
Annotating the loop with the bounds on j suggested by the loop structure
proves that the loop always halts:

for
@L1 : ℓ ≤ j ∧ j ≤ u
↓ δ1 : u − j
(int j := ℓ; j < u; j := j + 1)

Proving that the recursion of qsort always halts is superficially more dif-
ficult. The argument that we would like to make is that u − ℓ decreases on
each recursive call, which requires proving that the pivot value returned by
partition lies within the range [ℓ, u].
Observe, however, that u − ℓ may be negative when qsort is called with
ℓ > u. But in this case, ℓ = u + 1, for either |a0 | = 0, and qsort was called
from QuickSort; or p.pivot = ℓ or p.pivot = u, and qsort was called recursively.
More generally, we can establish that u − ℓ + 1 ≥ 0 is an invariant of qsort.
Hence, δ2 : u − ℓ + 1 is our proposed ranking function that maps the program
states to N with well-founded relation <.
Figure 6.5 formalizes the arguments that δ1 and δ2 are ranking functions.
Notice that bounds on i are proved as loop invariants at L1 . These bounds
imply that rv .pivot lies within the range [ℓ, u] as required.
One trick that would avoid reasoning about the case in which ℓ > u is to
cut the recursion at a point within qsort rather than at function entry. Figure
6.6 provides an alternate argument in which the ranking function labels the

@pre u − ℓ + 1 ≥ 0
@post ⊤
↓ δ2 : u − ℓ + 1
int[] qsort(int[] a0 , int ℓ, int u) {
int[] a := a0 ;
if (ℓ ≥ u) return a;
else {
qs p := partition(a, ℓ, u);
a := p.array;
a := qsort(a, ℓ, p.pivot − 1);
a := qsort(a, p.pivot + 1, u);
return a;
}
}

@pre ℓ ≤ u
@post ℓ ≤ rv .pivot ∧ rv .pivot ≤ u
qs partition(int[] a0 , int ℓ, int u) {
..
.
int i := ℓ − 1;
for
@L1 : ℓ ≤ j ∧ j ≤ u ∧ ℓ − 1 ≤ i ∧ i < j
↓ δ1 : u − j
(int j := ℓ; j < u; j := j + 1) {
..
.
}
..
.
return
{ pivot = i + 1;
array = a;
};
}

Fig. 6.5. QuickSort always halts

else branch in qsort. The first branch terminates the recursion. partition is
annotated as in Figure 6.5.
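The same termination argument can be exercised dynamically. The following hypothetical Python transliteration of qsort and partition asserts that u − ℓ + 1 is nonnegative at every call and strictly smaller than at the parent call, that u − j decreases across each iteration of partition's loop, and that the returned pivot index lies in [ℓ, u].

import random

def partition(a, l, u):
    a = list(a)
    pi = random.randint(l, u)
    pv = a[pi]
    a[pi], a[u] = a[u], pv
    i = l - 1
    j, prev = l, None
    while j < u:
        assert u - j >= 0 and (prev is None or u - j < prev)   # delta1 = u - j
        prev = u - j
        if a[j] <= pv:
            i += 1
            a[i], a[j] = a[j], a[i]
        j += 1
    a[i + 1], a[u] = a[u], a[i + 1]
    assert l <= i + 1 <= u                 # the pivot index lies in [l, u]
    return i + 1, a

def qsort(a, l, u, bound=None):
    measure = u - l + 1                    # delta2 = u - l + 1
    assert measure >= 0
    if bound is not None:
        assert measure < bound             # decreases at each recursive call
    a = list(a)
    if l >= u:
        return a
    p, a = partition(a, l, u)
    a = qsort(a, l, p - 1, measure)
    a = qsort(a, p + 1, u, measure)
    return a

for _ in range(200):
    xs = [random.randint(-9, 9) for _ in range(random.randint(0, 10))]
    assert qsort(xs, 0, len(xs) - 1) == sorted(xs)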

6.3 Summary
This chapter presents strategies for specifying and proving the correctness of
sequential programs. It covers:
• Strategies for proving partial correctness. The need for strengthening an-
notations. Basic facts; the precondition method.

@pre ⊤
@post ⊤
int[] qsort(int[] a0 , int ℓ, int u) {
int[] a := a0 ;
if (ℓ ≥ u) return a;
else {
↓ δ3 : u − ℓ
qs p := partition(a, ℓ, u);
a := p.array;
a := qsort(a, ℓ, p.pivot − 1);
a := qsort(a, p.pivot + 1, u);
return a;
}
}

Fig. 6.6. Alternate argument that QuickSort always halts

@pre ⊤
@post ∀i. 0 ≤ i < |rv | → rv [i] ≥ 0
int[] abs(int[] a0 ) {
int[] a := a0 ;
for @ ⊤
(int i := 0; i < |a|; i := i + 1) {
if (a[i] < 0) {
a[i] := − a[i];
}
}
return a;
}

Fig. 6.7. Computing the absolute value of elements of a0

• A full proof that QuickSort returns a sorted array.

Bibliographic Remarks
QuickSort was discovered by Tony Hoare, who also proposed specifying and
verifying programs using FOL [39].

Exercises
6.1 (Absolute value). Prove the partial correctness of abs in Figure 6.7.
That is, annotate the function; list basic paths and verification conditions;
and argue that the VCs are valid.

@pre ⊤
@post sorted(rv , 0, |rv | − 1)
int[] InsertionSort(int[] a0 ) {
int[] a := a0 ;
for @ ⊤
(int i := 1; i < |a|; i := i + 1) {
int t := a[i];
for @ ⊤
(int j := i − 1; j ≥ 0; j := j − 1) {
if (a[j] ≤ t) break;
a[j + 1] := a[j];
}
a[j + 1] := t;
}
return a;
}

Fig. 6.8. InsertionSort

6.2 (InsertionSort). Prove the partial correctness of InsertionSort. That is,


annotate the function; list basic paths and verification conditions; and argue
that the VCs are valid. See Figure 6.8.
As in other languages, the break statement moves control to the loop exit.
6.3 (QuickSort). Finish the proof of the sortedness property of QuickSort. That
is, annotate partition of Figure 6.3 so that its precondition and postcondition
annotations given at the end of Section 6.2 are inductive.

6.4 (MergeSort). Prove the partial correctness of MergeSort. See Figure 6.9.
The function merge uses the pi keyword new, which allocates an array of the
specified size. Therefore, it is known after the allocation to buf that |buf | =
u − ℓ + 1.
First, deduce the function specifications for ms and merge by focusing
on MergeSort and ms. Prove MergeSort and ms correct with respect to these
annotations. Then analyze merge.
Since MergeSort is fairly long, you need not list basic paths and VCs. Just
present MergeSort with its inductive annotations.

6.5 (Weak permutation). Define weak permutation as follows:

∀e. (∃i. a[i] = e) ↔ (∃j. b[j] = e) . (6.1)

Unfortunately, the decision procedures for arrays discussed in Chapter 11 can-


not decide the validity of VCs arising from wperm annotations, as such VCs
fall outside of the studied fragments of TA . Instead, we describe an approxi-
mation.

Consider annotating BubbleSort as in Figure 6.10. The define keyword


defines a global constant. In this case, e is defined to have some nondetermin-
istic value; that is, e stands for an arbitrary integer. The annotations then use
this e in wperm literals, where wperm is defined as follows:

wperm(a, a0 , e) ⇔ (∃i. a[i] = e) ↔ (∃j. a0 [j] = e) .

Compared to the full definition (6.1) of weak permutation, wperm does not
have a universally quantified variable e; instead, it uses a given expression e,
in this case the global constant e.
(a) Argue that the annotations of Figure 6.10 are inductive. That is, list the
VCs and argue their validity.
(b) Argue that the annotations imply that BubbleSort actually satisfies the
weakest permutation property. That is, prove that the validity of the VC

wperm(a, a0 , e) ∧ a′ = . . . → wperm(a′ , a0 , e)

implies the validity of the VC

(∀e. (∃i. a[i] = e) ↔ (∃j. a0 [j] = e)) ∧ a′ = . . .


→ (∀e. (∃i. a′ [i] = e) ↔ (∃j. a0 [j] = e)) .

(c) Can this approximation be used to prove the weak permutation prop-
erty of
(i) InsertionSort (Figure 6.8)?
(ii) MergeSort (Figure 6.9)?
(iii) QuickSort (Section 6.2)?
If so, prove it. If not, explain why not.

6.6 (Sets with arrays). Implement an API (application programming


interface) for manipulating sets. The underlying data structure of the imple-
mentation is arrays.
(a) Prove the correctness of the union function of Figure 6.11 by adding in-
ductive annotations.
(b) Implement and specify and prove the correctness of an intersection func-
tion, which takes two arrays a0 and b0 and returns the intersection of the
sets they represent.
(c) Implement and specify and prove the correctness of a subset function,
which takes two arrays a0 and b0 and returns true iff the first set, repre-
sented by a0 , is a subset of the second set, represented by b0 .

6.7 (Sets with sorted arrays). Implement an API for manipulating sets.
The underlying data structure of the implementation is sorted arrays.
(a) Prove the correctness of the union function of Figure 6.12 by adding in-
ductive annotations.

(b) Implement and specify and prove the correctness of an intersection func-
tion, which takes two sorted arrays a0 and b0 and returns the intersection
of the sets they represent as a sorted set.
(c) Implement and specify and prove the correctness of a subset function,
which takes two sorted arrays a0 and b0 and returns true iff the first set,
represented by a0 , is a subset of the second set, represented by b0 .

6.8 (QuickSort halts). Provide the basic paths and verification conditions for
the proof of Section 6.2.2 that QuickSort always halts.

6.9 (Intuitive ranking functions). Following the proof that the recursion
of qsort halts, move the location of the ranking function annotations in the
following functions to produce more intuitive arguments:
(a) BinarySearch, Figure 5.2
(b) BubbleSort, Figure 5.3
(c) InsertionSort, Figure 6.8

6.10 (Fewer annotations). Notice in the annotated BubbleSort of Figure


5.17 that there are only a finite number of basic paths from function entry to
L2 , from L2 to function exit, and from L2 back to itself.
(a) List these basic paths.
(b) Annotate only the inner loop of BubbleSort so that the VCs corresponding
to the basic paths of (a) are valid.
(c) Treat InsertionSort of Figure 5.25 similarly.
(d) Similarly, annotate only the inner loop of BubbleSort with a ranking an-
notation.
(e) Treat InsertionSort similarly.

@pre ⊤
@post sorted(rv , 0, |rv | − 1)
int[] MergeSort(int[] a) {
return ms(a, 0, |a| − 1);
}

@pre ⊤
@post ⊤
int[] ms(int[] a0 , int ℓ, int u) {
int[] a := a0 ;
if (ℓ ≥ u) return a;
else {
int m := (ℓ + u) div 2;
a := ms(a, ℓ, m);
a := ms(a, m + 1, u);
a := merge(a, ℓ, m, u);
return a;
}
}

@pre ⊤
@post ⊤
int[] merge(int[] a0 , int ℓ, int m, int u) {
int[] a := a0 , buf := new int[u − ℓ + 1];
int i := ℓ, j := m + 1;
for @ ⊤
(int k := 0; k < |buf |; k := k + 1) {
if (i > m) {
buf [k] := a[j];
j := j + 1;
} else if (j > u) {
buf [k] := a[i];
i := i + 1;
} else if (a[i] ≤ a[j]) {
buf [k] := a[i];
i := i + 1;
} else {
buf [k] := a[j];
j := j + 1;
}
}
for @ ⊤
(k := 0; k < |buf |; k := k + 1) {
a[ℓ + k] := buf [k];
}
return a;
}

Fig. 6.9. MergeSort



define int e = ?;

@pre ⊤
@post wperm(a, a0 , e)
int[] BubbleSort(int[] a0 ) {
int[] a := a0 ;
for
@L1 : −1 ≤ i < |a| ∧ wperm(a, a0 , e)
(int i := |a| − 1; i > 0; i := i − 1) {
for
@L2 : 0 ≤ j < i ∧ i < |a| ∧ wperm(a, a0 , e)
(int j := 0; j < i; j := j + 1) {
if (a[j] > a[j + 1]) {
int t := a[j];
a[j] := a[j + 1];
a[j + 1] := t;
}
}
}
return a;
}

Fig. 6.10. BubbleSort with annotations for weak permutation

define int e = ?;

@pre ⊤
@post (∃i. 0 ≤ i < |rv | ∧ rv [i] = e)
      ↔ ((∃i. 0 ≤ i < |a0 | ∧ a0 [i] = e) ∨ (∃i. 0 ≤ i < |b0 | ∧ b0 [i] = e))
int[] union(int[] a0 , int[] b0 ) {
int[] u := new int[|a0 | + |b0 |];
int j := 0;
for @ ⊤
(int i = 0; i < |a0 |; i := i + 1) {
u[j] := a0 [i];
j := j + 1;
}
for @ ⊤
(int i = 0; i < |b0 |; i := i + 1) {
u[j] := b0 [i];
j := j + 1;
}
return u;
}

Fig. 6.11. Function union of the linear set implementation



define int e = ?;

@pre sorted(a0 , 0, |a0 | − 1) ∧ sorted(b0 , 0, |b0 | − 1)
@post sorted(rv , 0, |rv | − 1)
      ∧ (((∃i. 0 ≤ i < |a0 | ∧ a0 [i] = e) ∨ (∃i. 0 ≤ i < |b0 | ∧ b0 [i] = e))
         ↔ (∃i. 0 ≤ i < |rv | ∧ rv [i] = e))
int[] union(int[] a0 , int[] b0 ) {
int[] u := new int[|a0 | + |b0 |];
int i := 0, j := 0;
for @ ⊤
(int k = 0; k < |u|; k := k + 1) {
if (i ≥ |a0 |) {
u[k] := b0 [j];
j := j + 1;
}
else if (j ≥ |b0 |) {
u[k] := a0 [i];
i := i + 1;
}
else if (a0 [i] ≤ b0 [j]) {
u[k] := a0 [i];
i := i + 1;
}
else {
u[k] := b0 [j];
j := j + 1;
}
}
return u;
}

Fig. 6.12. Function union of the sorted set implementation


10
Combining Decision Procedures

The expressions which arise in program manipulation often do not fall


within any. . . naturally defined theories — they usually involve mixed
terms containing functions and predicates from several theories.
— Greg Nelson and Derek C. Oppen
Simplification by Cooperating Decision Procedures, 1979
Chapters 7–9 consider decision procedures for theories that each formalize
just one data type. Yet almost all formulae in Chapter 5 are formulae of
union theories. For example, many assert facts in TZ ∪ TA about arrays of
integers indexed by integers. Additionally, the decision procedure for the array
property fragment of TAZ that we discuss in Chapter 11 requires a procedure for
the quantifier-free fragment of TZ ∪ TA . Can we reuse the decision procedures
of Chapters 7–9 to decide satisfiability of formulae in union theories, or must
we invent a new procedure for each combination?
Fortunately, there is a general result for quantifier-free fragments of union
theories that allows us to reuse the procedures. This chapter discusses the
Nelson-Oppen combination method for constructing decision procedures
for union theories from decision procedures for individual theories. Section
10.1 introduces the method and discusses its limitations. Then Section 10.2
presents a nondeterministic version, for which correctness is proved in Section
10.4; and Section 10.3 presents the more practical deterministic version.
In this chapter, decision procedures for individual theories apply just to
quantifier-free fragments. We rely on Cooper’s method with all optimizations
for considering quantifier-free ΣZ -formulae. Procedures for the other theories
already apply only to their quantifier-free fragments.

10.1 Combining Decision Procedures


Consider two theories T1 and T2 over signatures Σ1 and Σ2 , respectively. For
the quantifier-free fragments of T1 and T2 , we have decision procedures P1 and
P2 . How do we decide satisfiability in the quantifier-free fragment of T1 ∪ T2 ?

Example 10.1. Consider the (ΣE ∪ ΣZ )-formula

F : 1 ≤ x ∧ x ≤ 2 ∧ f (x) ≠ f (1) ∧ f (x) ≠ f (2) .

Chapter 9 describes a decision procedure for TE , while Chapter 7 presents a


decision procedure for TZ . We would like to combine these decision procedures
to decide the (TE ∪ TZ )-satisfiability of F and other quantifier-free (ΣE ∪ ΣZ )-
formulae. 

The Nelson-Oppen combination method (N-O method) combines


decision procedures for the quantifier-free fragments of several theories into
one decision procedure for the quantifier-free fragment of the union theory.
In our presentation of the N-O method, we usually discuss combining two
theories and their decision procedures; however, the N-O method can com-
bine an arbitrary number of theories and procedures. Additionally, we restrict
ourselves to considering conjunctive formulae; however, the satisfiability of
arbitrary (quantifier-free) formulae can be considered by converting to DNF
and checking each disjunct.
Besides being restricted to quantifier-free formulae, the N-O method has
two additional restrictions. First, the signatures Σ1 and Σ2 can only share
equality =:

Σ1 ∩ Σ2 = {=} .

Second, the theories T1 and T2 must be stably infinite.


A theory T with signature Σ is stably infinite if for every quantifier-free
Σ-formula F , if F is T -satisfiable, then there exists some T -interpretation
that satisfies F and has a domain of infinite cardinality. We illustrate this
concept with two example theories.

Example 10.2. Consider the theory Ta,b with signature

Σa,b : {a, b, =} ,

where both a and b are constants, and axiom


1. ∀x. x = a ∨ x = b (two)
Because of axiom (two), every Ta,b -interpretation I : (DI , αI ) is such that
the domain DI has at most two elements: |DI | ≤ 2. Hence, Ta,b is not stably
infinite. 

Example 10.3. We prove that TE is stably infinite. Consider the TE -satisfiable


quantifier-free ΣE -formula F with arbitrary satisfying TE -interpretation I :
(DI , αI ) in which αI maps = to =I . Let A be any infinite set disjoint from
DI . Then construct new interpretation J : (DJ , αJ ):
• DJ = DI ∪ A

• αJ = {= ↦ =J , . . .}, where for v1 , v2 ∈ DJ ,

      v1 =J v2 is defined to be
          v1 =I v2   if v1 , v2 ∈ DI ,
          ⊤          if v1 is the same element as v2 ,
          ⊥          otherwise.

J is a TE -interpretation satisfying F with infinite domain. Hence, TE is stably


infinite. 
The other theories discussed in this book are also stably infinite.
Example 10.4. Consider the quantifier-free conjunctive (ΣE ∪ ΣZ )-formula

F : 1 ≤ x ∧ x ≤ 2 ∧ f (x) ≠ f (1) ∧ f (x) ≠ f (2) .

The signatures of TE and TZ only share =. Also, both theories are stably
infinite. Hence, the N-O combination of the decision procedures for TE and TZ
decides the (TE ∪ TZ )-satisfiability of F .
Intuitively, F is (TE ∪ TZ )-unsatisfiable. For the first two literals imply
x = 1 ∨ x = 2 so that f (x) = f (1) ∨ f (x) = f (2). Yet the last two literals
contradict this conclusion. 

10.2 Nelson-Oppen Method: Nondeterministic Version


In this section, we discuss the nondeterministic version of the N-O method.
While simple to present, it suffers from high complexity. Section 10.3 refor-
mulates the method to be deterministic and efficient.
Consider a quantifier-free conjunctive (Σ1 ∪ Σ2 )-formula F . The N-O
method proceeds in two steps.

10.2.1 Phase 1: Variable Abstraction

The variable abstraction phase transforms a quantifier-free conjunctive for-


mula F into two quantifier-free conjunctive formulae, a Σ1 -formula F1 and a
Σ2 -formula F2 , such that F and F1 ∧ F2 are (T1 ∪ T2 )-equisatisfiable. That
is, F is (T1 ∪ T2 )-satisfiable iff F1 ∧ F2 is (T1 ∪ T2 )-satisfiable. F1 and F2 are
linked via a set of shared variables.
For term t, let hd(t) be the root symbol; e.g., hd(f (x)) = f . Then for
i, j ∈ {1, 2} and i ≠ j, repeat the following transformations as long as possible:
1. if function f ∈ Σi and hd(t) ∈ Σj ,

F [f (t1 , . . . , t, . . . , tn )] =⇒ F [f (t1 , . . . , w, . . . , tn )] ∧ w = t

2. if predicate p ∈ Σi and hd(t) ∈ Σj ,

F [p(t1 , . . . , t, . . . , tn )] =⇒ F [p(t1 , . . . , w, . . . , tn )] ∧ w = t

3. if hd(s) ∈ Σi and hd(t) ∈ Σj ,

F [s = t] =⇒ F [w = t] ∧ w = s

w is a fresh variable in each application of a transformation. Transformation
3 also applies to s ≠ t literals: replace F [s ≠ t] with F [w ≠ t] ∧ w = s.
After applying the transformations, each literal of the resulting formula
falls entirely within the signature of one of the two theories (or possibly within
each if it is just an equality x = y or a disequality x ≠ y between variables:
such literals are in every signature since they do not have symbols other
than =). Divide the literals into two sets, one for each theory. These sets are
not disjoint when there is a literal that is an equality or disequality between
variables. Then return the conjunction of each set.
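
The following Python sketch illustrates this variable abstraction phase on the
formula of Example 10.5 below. It is only an illustration, not the book's code:
the tuple encoding of terms, the symbol-to-theory map sig, and the helper
names are assumptions made for this sketch.

import itertools

fresh = (f"w{i}" for i in itertools.count(1))        # supply of fresh shared variables

def purify_term(t, theory, sig, defs):
    """Return a term pure in `theory`; alien subterms become fresh variables."""
    if isinstance(t, str):                           # variables are pure in any theory
        return t
    head = t[0]
    if sig[head] != theory:                          # alien head: abstract the whole term
        w = next(fresh)
        defs.append((sig[head], ('=', w, purify_term(t, sig[head], sig, defs))))
        return w
    return (head,) + tuple(purify_term(a, theory, sig, defs) for a in t[1:])

def purify_literal(lit, sig, defs):
    """Purify one literal; return (home theory index, purified literal)."""
    neg = lit[0] == 'not'
    atom = lit[1] if neg else lit
    pred, args = atom[0], atom[1:]
    # home theory: the predicate's theory; for `=`, the theory of some argument's
    # head symbol (an equality between two variables could go to either theory)
    theory = sig.get(pred) or next(
        (sig[a[0]] for a in args if not isinstance(a, str)), 1)
    pure = (pred,) + tuple(purify_term(a, theory, sig, defs) for a in args)
    return theory, ('not', pure) if neg else pure

# Example 10.5 below:  1 ≤ x  ∧  x ≤ 2  ∧  f(x) ≠ f(1)  ∧  f(x) ≠ f(2)
sig = {'<=': 1, '1': 1, '2': 1, 'f': 2}              # theory 1 = T_Z, theory 2 = T_E
lits = [('<=', ('1',), 'x'), ('<=', 'x', ('2',)),
        ('not', ('=', ('f', 'x'), ('f', ('1',)))),
        ('not', ('=', ('f', 'x'), ('f', ('2',))))]
defs, split = [], {1: [], 2: []}
for lit in lits:
    th, pure = purify_literal(lit, sig, defs)
    split[th].append(pure)
for th, eq in defs:
    split[th].append(eq)
print(split[1])   # F_Z: the bounds on x together with w1 = 1 and w2 = 2
print(split[2])   # F_E: f(x) ≠ f(w1) and f(x) ≠ f(w2)

Running the sketch reproduces the separation computed by hand in Example 10.5.
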

Example 10.5. Consider (ΣE ∪ ΣZ )-formula

F : 1 ≤ x ∧ x ≤ 2 ∧ f (x) ≠ f (1) ∧ f (x) ≠ f (2) .

Since f ∈ ΣE and 1 ∈ ΣZ , replace f (1) by f (w1 ) and add w1 = 1 by trans-


formation 1. Similarly, replace f (2) by f (w2 ) and add w2 = 2.
Now, the literals

1 ≤ x, x ≤ 2, w1 = 1, and w2 = 2

are TZ -literals, while the literals

f (x) ≠ f (w1 ) and f (x) ≠ f (w2 )

are TE -literals. Hence, construct the ΣZ -formula

FZ : 1 ≤ x ∧ x ≤ 2 ∧ w1 = 1 ∧ w2 = 2

and the ΣE -formula

FE : f (x) ≠ f (w1 ) ∧ f (x) ≠ f (w2 ) .

FZ and FE share the variables x, w1 , and w2 . FZ ∧FE is (TE ∪TZ )-equisatisfiable


to F . 

Example 10.6. Consider the (ΣE ∪ ΣZ )-formula

F : f (x) = x + y ∧ x ≤ y + z ∧ x + z ≤ y ∧ y = 1 ∧ f (x) ≠ f (2) .

Intuitively, F is (TE ∪TZ )-satisfiable: consider an interpretation in which x = 0,


y = 1, z = 1, f (0) = 1, and f (2) = 2.
In the first literal, hd(f (x)) = f ∈ ΣE and hd(x + y) = + ∈ ΣZ ; thus, by
transformation 3, replace the literal with

w1 = x + y ∧ w1 = f (x) .

In the last literal, f ∈ ΣE but 2 ∈ ΣZ , so by transformation 1, replace it with


f (x) ≠ f (w2 ) ∧ w2 = 2 .
Now, separating the literals results in two formulae:
FZ : w1 = x + y ∧ x ≤ y + z ∧ x + z ≤ y ∧ y = 1 ∧ w2 = 2
is a ΣZ -formula, and
FE : w1 = f (x) ∧ f (x) ≠ f (w2 )
is a ΣE -formula. The conjunction FZ ∧ FE is (TE ∪ TZ )-equisatisfiable to F . 

10.2.2 Phase 2: Guess and Check


Phase 1 separates (Σ1 ∪ Σ2 )-formula F into two formulae, Σ1 -formula F1 , and
Σ2 -formula F2 . F1 and F2 are linked by a set of shared variables. Let
V = shared(F1 , F2 ) = free(F1 ) ∩ free(F2 )
be the shared variables of F1 and F2 . Let E be an equivalence relation over
V . The arrangement α(V, E) of V induced by E is the formula
    α(V, E) :   ⋀_{u,v ∈ V. uEv}  u = v   ∧   ⋀_{u,v ∈ V. ¬(uEv)}  u ≠ v ,

which asserts that variables related by E are equal and that variables unre-
lated by E are not equal. The formula F is (T1 ∪ T2 )-satisfiable iff there exists
an equivalence relation E of V such that
• F1 ∧ α(V, E) is T1 -satisfiable, and
• F2 ∧ α(V, E) is T2 -satisfiable.
Otherwise, F is (T1 ∪ T2 )-unsatisfiable.
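
As a concrete illustration of guess and check, here is a small Python sketch.
It is not the book's algorithm; the callbacks sat1 and sat2, standing for the
decision procedures P1 and P2 applied to lists of literals, are hypothetical.

def partitions(vs):
    """Yield every partition (a list of blocks) of the list vs."""
    if not vs:
        yield []
        return
    first, rest = vs[0], vs[1:]
    for p in partitions(rest):
        for i in range(len(p)):                 # put `first` into an existing block
            yield p[:i] + [[first] + p[i]] + p[i + 1:]
        yield p + [[first]]                     # or into a new singleton block

def arrangement(partition):
    """Equalities within blocks, disequalities across blocks."""
    lits = []
    for b in partition:
        lits += [('=', u, v) for u, v in zip(b, b[1:])]
    for i, b in enumerate(partition):
        for c in partition[i + 1:]:
            lits += [('distinct', u, v) for u in b for v in c]
    return lits

def nelson_oppen_nondet(f1, f2, shared, sat1, sat2):
    for p in partitions(list(shared)):
        k = arrangement(p)
        if sat1(f1 + k) and sat2(f2 + k):
            return True                         # some arrangement works: satisfiable
    return False                                # no arrangement works: unsatisfiable

For the three shared variables of Example 10.7 below, partitions enumerates
exactly the five partitions listed there.
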
Example 10.7. Consider (ΣE ∪ ΣZ )-formula
F : 1 ≤ x ∧ x ≤ 2 ∧ f (x) ≠ f (1) ∧ f (x) ≠ f (2) .
Phase 1 separates this formula into the ΣZ -formula
FZ : 1 ≤ x ∧ x ≤ 2 ∧ w1 = 1 ∧ w2 = 2
and the ΣE -formula
FE : f (x) ≠ f (w1 ) ∧ f (x) ≠ f (w2 ) ,
with
V = shared(FZ , FE ) = {x, w1 , w2 } .
There are 5 equivalence relations to consider, which we list by stating the
partitions:

1. {{x, w1 , w2 }}, i.e., x = w1 = w2 : FE ∧ α(V, E) is TE -unsatisfiable because
it cannot be the case that both x = w1 and f (x) ≠ f (w1 ).
2. {{x, w1 }, {w2 }}, i.e., x = w1 , x ≠ w2 : FE ∧ α(V, E) is TE -unsatisfiable
because it cannot be the case that both x = w1 and f (x) ≠ f (w1 ).
3. {{x, w2 }, {w1 }}, i.e., x = w2 , x ≠ w1 : FE ∧ α(V, E) is TE -unsatisfiable
because it cannot be the case that both x = w2 and f (x) ≠ f (w2 ).
4. {{x}, {w1 , w2 }}, i.e., x ≠ w1 , w1 = w2 : FZ ∧ α(V, E) is TZ -unsatisfiable
because it cannot be the case that both w1 = w2 and w1 = 1 ∧ w2 = 2.
5. {{x}, {w1 }, {w2 }}, i.e., x ≠ w1 , x ≠ w2 , w1 ≠ w2 : FZ ∧ α(V, E) is TZ -
unsatisfiable because it cannot be the case that both x ≠ w1 ∧ x ≠ w2
and x = w1 = 1 ∨ x = w2 = 2 (since 1 ≤ x ≤ 2 implies that x = 1 ∨ x = 2
in TZ ).
Hence, F is (TE ∪ TZ )-unsatisfiable. 

Example 10.8. Consider the (Σcons ∪ ΣZ )-formula

F : car(x) + car(y) = z ∧ cons(x, z) ≠ cons(y, z) .

After two applications of transformation 1, Phase 1 separates F into the Σcons -


formula

Fcons : w1 = car(x) ∧ w2 = car(y) ∧ cons(x, z) ≠ cons(y, z)

and the ΣZ -formula

FZ : w1 + w2 = z ,

with

V = shared(Fcons , FZ ) = {z, w1 , w2 } .

Consider the equivalence relation E given by the partition

{{z}, {w1}, {w2 }} .

The arrangement

α(V, E) : z ≠ w1 ∧ z ≠ w2 ∧ w1 ≠ w2

satisfies both Fcons and FZ : Fcons ∧α(V, E) is Tcons -satisfiable, and FZ ∧α(V, E)
is TZ -satisfiable. Hence, F is (Tcons ∪ TZ )-satisfiable. 

10.2.3 Practical Efficiency

Phase 2 is formulated as “guess and check”: first, guess an equivalence relation


E, then check the induced arrangement. Unfortunately, the number of equiva-
lence relations increases significantly with the number of shared variables. The

number of equivalence relations is given by the sequence of Bell numbers,


which grows super-exponentially. For example, just 12 shared variables induce
over four million equivalence relations. Hence, the guess-and-check method is
impractical.
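
The growth claim can be checked with a few lines of Python, using the Bell
triangle recurrence; this is only an illustration, not part of the book's
development:

def bell(n):
    """n-th Bell number: the number of partitions of an n-element set."""
    row = [1]
    for _ in range(n - 1):
        new = [row[-1]]                  # each row starts with the previous row's last entry
        for x in row:
            new.append(new[-1] + x)
        row = new
    return row[-1]

print([bell(n) for n in range(1, 6)])    # [1, 2, 5, 15, 52]
print(bell(12))                          # 4213597 -- over four million arrangements
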
However, there is no need to guess the entire equivalence relation at once;
instead, construct it incrementally, as the following example illustrates:

Example 10.9. In Example 10.6, Phase 1 separates the (ΣE ∪ ΣZ )-formula

F : f (x) = x + y ∧ x ≤ y + z ∧ x + z ≤ y ∧ y = 1 ∧ f (x) ≠ f (2)

into ΣZ -formula

FZ : w1 = x + y ∧ x ≤ y + z ∧ x + z ≤ y ∧ y = 1 ∧ w2 = 2

and ΣE -formula

FE : w1 = f (x) ∧ f (x) ≠ f (w2 )

Then

V = shared(FZ , FE ) = {x, w1 , w2 } .

We attempt to construct an arrangement.


1. Suppose x = w1 . But then w1 = x + y of FZ implies that y = 0, yet FZ
asserts that y = 1. Hence, x ≠ w1 .
2. FZ ∧ x ≠ w1 and FE ∧ x ≠ w1 are TZ - and TE -satisfiable, respectively.
3. Suppose x = w2 . But f (x) ≠ f (w2 ) of FE contradicts this supposition.
Hence, x ≠ w2 .
4. FZ ∧ x ≠ w1 ∧ x ≠ w2 and FE ∧ x ≠ w1 ∧ x ≠ w2 are TZ - and
TE -satisfiable, respectively.
5. Suppose w1 = w2 . No contradiction exists.
We discovered the arrangement

x ≠ w1 ∧ x ≠ w2 ∧ w1 = w2 ,

so F is (TE ∪ TZ )-satisfiable. 

Readers interested in implementing a simple Nelson-Oppen-based decision


procedure could consider this incremental-construction “optimization” of the
nondeterministic method. However, in practice, implementations are based on
the deterministic method described in the next section.

10.3 Nelson-Oppen Method: Deterministic Version


Phase 1 of the deterministic version is the same as in the nondeterministic
version.
Phase 2 of the nondeterministic method (both the guess-and-check method
and the optimized incremental construction) proposes a set of equalities and
disequalities and then lets each decision procedure Pi check the set with the
corresponding formula Fi . In contrast, Phase 2 of the deterministic version
asks the decision procedures P1 and P2 to propagate information in the form
of new equalities.
A convex theory is particularly well-suited for propagating equalities.
Section 10.3.1 discusses convex theories. Then Section 10.3.2 presents the
deterministic Nelson-Oppen method.

10.3.1 Convex Theories

If a conjunctive formula in a convex theory implies a disjunction of equalities


between variables, then it actually implies a single equality. Formally, consider
a quantifier-free conjunctive Σ-formula F and a disjunction
    G :  ⋁_{i=1}^{n}  ui = vi ,                                        (10.1)

for variables ui and vi . Theory T is convex if for every such F and G, if


    F  ⇒  ⋁_{i=1}^{n}  ui = vi

then

F ⇒ ui = vi for some i ∈ {1, . . . , n} .

If F implies G, then F actually implies one of the disjuncts of G.


Intuitively, F cannot be “covered” by any disjunction of equalities — no
matter how many — if no single equality covers F (F is covered by a formula
if F implies it). This intuition is especially apparent for vector spaces (Section
8.2): a plane cannot be covered by a finite disjunction of lines; it cannot even
be covered by a finite disjunction of other planes unless at least one of the
planes is the plane itself.

Example 10.10. The theory of integers TZ is not convex. For consider the
quantifier-free conjunctive ΣZ -formula

F : 1≤z ∧ z≤2 ∧ u=1 ∧ v=2.

Then

F ⇒ z=u ∨ z=v ,

but neither

F ⇒ z = u nor F ⇒ z=v .
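
Because F bounds z and fixes u and v, a brute-force check over a small range
of integers already confirms this claim. The following Python snippet is only
an illustration, not part of the book's argument:

from itertools import product

def F(z, u, v):
    return 1 <= z <= 2 and u == 1 and v == 2

models = [m for m in product(range(-3, 4), repeat=3) if F(*m)]
print(all(z == u or z == v for z, u, v in models))   # True:  F implies z = u ∨ z = v
print(all(z == u for z, u, v in models))             # False: F does not imply z = u
print(all(z == v for z, u, v in models))             # False: F does not imply z = v
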


Example 10.11. The theory of arrays TA is not convex. For consider the
quantifier-free conjunctive ΣA -formula

F : a⟨i ◁ v⟩[j] = v .

Then

F ⇒ i = j ∨ a[j] = v ,

but neither

F ⇒ i=j nor F ⇒ a[j] = v .


Example 10.12. ⋆ The theory of rationals TQ is convex, as it is convex in a
geometric sense (see Chapter 8).
Each equality ui = vi of the disjunction G of (10.1) is geometrically convex,
but G itself is not. Consider, for example,

H: x=y ∨ x=z .

Let SH be the set of points satisfying H. The point (x, y, z) = (0, 0, 1) is


included in SH , as is the point (1, 0, 1). However, the average of the two
points, (1/2, 0, 1) (choosing λ = 1/2), is not in SH . Indeed, choose any two points

(u, u, v1 ) and (w, v2 , w)

from Sx=y and Sx=z , respectively, such that neither is in their intersection
Sx=y=z (i.e., v1 ≠ u and v2 ≠ w). Then for any λ ∈ (0, 1), the point

(λu + (1 − λ)w, λu + (1 − λ)v2 , λv1 + (1 − λ)w)

is neither in Sx=y nor in Sx=z .

Suppose, then, that F ⇒ G : ⋁_{i=1}^{n} ui = vi , but for no i ∈ {1, . . . , n} does
F ⇒ ui = vi . Then it must be the case that there are two points s1 and s2 of
SF in separate subsets Sui =vi , Suj =vj , i 6= j, of SG . By the argument above,
the points on the line segment between s1 and s2 are not in SG and thus not
in SF . Then F is not geometrically convex, a contradiction.
Thus, TQ is convex. 
Exercise 10.5 asks the reader to prove that the theories TE and Tcons are
also convex.

10.3.2 Phase 2: Equality Propagation

Recall that the nondeterministic version guesses an equivalence relation E


over the shared variables V and checks that both F1 ∧ α(V, E) is T1 -satisfiable
and F2 ∧α(V, E) is T2 -satisfiable. If it finds a satisfying equivalence relation E,
it declares that F is (T1 ∪ T2 )-satisfiable. This method suffers from the enor-
mous number of equivalence relations that are possible even over small sets
of shared variables. In the deterministic version, a central manager asks the
decision procedures P1 and P2 to report any new implied equalities between
shared variables. It then adds this new information to the already discovered
equalities and propagates it to the other decision procedure. This method is
efficient.
In the context of already discovered equalities E, a decision procedure Pi
for a convex theory Ti discovers a new equality u = v, for shared variables u
and v, when

Fi ∧ E ⇒ u = v .

The central manager then propagates this new equality to the other decision
procedure.
If Tj is not convex, Pj discovers a new disjunction of equalities S when
    Fj ∧ E  ⇒  ⋁_{ui = vi ∈ S}  (ui = vi ) ,

for shared variables ui and vi . In this case, the central manager must split the
disjunction and search along multiple branches. Each branch assumes one of
the disjuncts. The search along a branch ends either when a full arrangement
is discovered (so the original formula is (T1 ∪ T2 )-satisfiable; see below) or
when all sub-branches end in contradiction (Ti -unsatisfiability for some i).
In the latter case, the central manager tries another branch. If no branches
remain to try, then the central manager declares the original formula to be
(T1 ∪ T2 )-unsatisfiable.
If at some point, neither P1 nor P2 finds a new equality (or a disjunction
of equalities in the non-convex case), then the central manager concludes that
the given formula is (T1 ∪ T2 )-satisfiable. For if E is the set of all learned
equalities, S is the set of all possible remaining equalities, and
    F1 ∧ E  ⇏  ⋁_{ui = vi ∈ S}  (ui = vi )     and     F2 ∧ E  ⇏  ⋁_{ui = vi ∈ S}  (ui = vi ) ,

(which must hold when no new disjunctions of equalities are discovered), then
    F1 ∧ E ∧ ⋀_{ui = vi ∈ S}  (ui ≠ vi )     and     F2 ∧ E ∧ ⋀_{ui = vi ∈ S}  (ui ≠ vi )

are T1 -satisfiable and T2 -satisfiable, respectively. Hence, the discovered ar-


rangement is
    α(V, E)  =  E ∧ ⋀_{ui = vi ∈ S}  (ui ≠ vi ) ,

and F is (T1 ∪ T2 )-satisfiable.
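
The following Python sketch outlines this propagation loop for two convex
theories. It is a sketch under assumptions: the interfaces sat[i] (Ti-satisfiability
of a list of literals) and implies[i] (does a list of literals Ti-imply an equality?)
are hypothetical stand-ins for P1 and P2, and the non-convex case, which
additionally requires branching on implied disjunctions, is omitted.

from itertools import combinations

def nelson_oppen_convex(f1, f2, shared, sat, implies):
    """sat = (sat1, sat2), implies = (implies1, implies2)."""
    eqs = []                                      # discovered shared equalities E
    candidates = list(combinations(shared, 2))    # every potential shared equality
    while True:
        if not sat[0](f1 + eqs) or not sat[1](f2 + eqs):
            return False                          # (T1 ∪ T2)-unsatisfiable
        new = None
        for u, v in candidates:
            if implies[0](f1 + eqs, ('=', u, v)) or implies[1](f2 + eqs, ('=', u, v)):
                new = (u, v)
                break
        if new is None:
            return True                           # nothing left to propagate: satisfiable
        eqs.append(('=',) + new)                  # propagate the new equality to both sides
        candidates.remove(new)
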

Example 10.13. Consider the (ΣE ∪ ΣQ )-formula

F : f (f (x) − f (y)) ≠ f (z) ∧ x ≤ y ∧ y + z ≤ x ∧ 0 ≤ z .

F is (TE ∪TQ )-unsatisfiable: the final three literals imply that z = 0 and x = y,
so that f (x) = f (y). But then from the first literal, f (0) ≠ f (0) since both
f (x) − f (y) and z equal 0.
Phase 1 separates F into two formulae. According to transformation 1, it
replaces f (x) by u, f (y) by v, and u − v by w, resulting in ΣE -formula

FE : f (w) ≠ f (z) ∧ u = f (x) ∧ v = f (y)

and ΣQ -formula

FQ : x ≤ y ∧ y + z ≤ x ∧ 0 ≤ z ∧ w = u − v ,

with

V = shared(FE , FQ ) = {x, y, z, u, v, w} .

Recall that TE and TQ are convex theories. The decision procedure PQ for
TQ discovers

FQ ⇒ x = y

from x ≤ y ∧ y + z ≤ x ∧ 0 ≤ z, so

E1 : x = y .

Then PE discovers the new congruence f (x) = f (y) from x = y, so that

FE ∧ E1 ⇒ u = v ,

yielding

E2 : x = y ∧ u = v .

But then

FQ ∧ E2 ⇒ z = w

since w = u − v = 0, according to u = v, and z = 0. Propagating this equality


back to PE via

E3 : x = y ∧ u = v ∧ z = w

                               {}
FQ |= x = y
                            {x = y}
                                                     FE ∧ x = y |= u = v
                         {x = y, u = v}
FQ ∧ u = v |= z = w
                      {x = y, u = v, z = w}
                                                     FE ∧ z = w |= ⊥

Fig. 10.1. Summary of Example 10.13

reveals the contradiction

FE ∧ E3 ⇒ ⊥ ;

in particular, z = w contradicts f (w) ≠ f (z). Therefore, F is (TE ∪ TQ )-


unsatisfiable.
Since both TE and TQ are convex, no case splitting was required.
Figure 10.1 summarizes this argument. The left and right halves list deduc-
tions made in TQ and TE , respectively. The sets in the middle are the deduced
sets of shared equalities. The deductions terminate with ⊥, indicating that F
is (TE ∪ TQ )-unsatisfiable. 
Example 10.14. Consider the (ΣE ∪ ΣZ )-formula

F : 1 ≤ x ∧ x ≤ 2 ∧ f (x) ≠ f (1) ∧ f (x) ≠ f (2) .

While TE is convex, TZ is not. Thus, we should expect some case splits.


According to transformation 1, Phase 1 replaces f (1) by f (w1 ) and f (2)
by f (w2 ), resulting in the ΣZ -formula

FZ : 1 ≤ x ∧ x ≤ 2 ∧ w1 = 1 ∧ w2 = 2

and the ΣE -formula

FE : f (x) ≠ f (w1 ) ∧ f (x) ≠ f (w2 ) ,

with

V = shared(FZ , FE ) = {x, w1 , w2 } .

Immediately, PZ recognizes that

FZ ⇒ x = w1 ∨ x = w2 ,

since 1 ≤ x ≤ 2 implies that either x = 1 or x = 2. Hence, case split on these


two disjuncts. For the first case, propagate

                          {}
                           ⋆
        x = w1                         x = w2

      {x = w1 }                      {x = w2 }

 FE ∧ x = w1 |= ⊥               FE ∧ x = w2 |= ⊥

        ⊥                              ⊥

⋆ : FZ |= x = w1 ∨ x = w2

Fig. 10.2. Summary of Example 10.14

E1a : x = w1

to PE , which discovers that

FE ∧ E1a ⇒ ⊥ ,

as x = w1 contradicts f (x) ≠ f (w1 ).


For the second case,

E1b : x = w2 .

Again, PE discovers that

FE ∧ E1b ⇒ ⊥ ,

as x = w2 contradicts f (x) ≠ f (w2 ).


As all branches end in contradiction, F is (TE ∪ TZ )-unsatisfiable.
Figure 10.2 summarizes this argument. Unlike in Example 10.13 and Fig-
ure 10.1, the nonconvexity of TZ causes the argument to branch along two
possibilities. Each branch ends in a contradiction. 

Example 10.15. Consider the (ΣE ∪ ΣZ )-formula

F : 1 ≤ x ∧ x ≤ 3 ∧ f (x) ≠ f (1) ∧ f (x) ≠ f (3) ∧ f (1) ≠ f (2) .

Applying transformation 1 of Phase 1 three times produces the ΣZ -formula

FZ : 1 ≤ x ∧ x ≤ 3 ∧ w1 = 1 ∧ w2 = 2 ∧ w3 = 3

and the ΣE -formula

FE : f (x) ≠ f (w1 ) ∧ f (x) ≠ f (w3 ) ∧ f (w1 ) ≠ f (w2 ) ,

with

V = shared(FZ , FE ) = {x, w1 , w2 , w3 } .

From 1 ≤ x ≤ 3, PZ discovers that

FZ ⇒ x = w1 ∨ x = w2 ∨ x = w3 .

Recall that TZ is not convex. On case

E1a : x = w1 ,

PE finds that

FE ∧ E1a ⇒ ⊥

because of f (x) ≠ f (w1 ). On case

E1b : x = w2 ,

neither PZ nor PE discovers any contradiction or new equality. That is,

FZ ∧ E1b ⇏ x = w1 ∨ x = w3 ∨ w1 = w2 ∨ w1 = w3 ∨ w2 = w3

and

FE ∧ E1b ⇏ x = w1 ∨ x = w3 ∨ w1 = w2 ∨ w1 = w3 ∨ w2 = w3 ;

or, in other words,

FZ ∧ E1b ∧ x ≠ w1 ∧ x ≠ w3 ∧ w1 ≠ w2 ∧ w1 ≠ w3 ∧ w2 ≠ w3

is TZ -satisfiable, and

FE ∧ E1b ∧ x ≠ w1 ∧ x ≠ w3 ∧ w1 ≠ w2 ∧ w1 ≠ w3 ∧ w2 ≠ w3

is TE -satisfiable. Thus, F is (TE ∪ TZ )-satisfiable.


Figure 10.3 summarizes this argument. The middle branch terminates with
a satisfying arrangement. We did not actually explore the right branch. 

10.3.3 Equality Propagation: Implementation

Equality propagation can be implemented somewhat efficiently without mod-


ifying the individual decision procedures. For convex theory Tj , test each
possible equality ui = vi . Suppose that Fj is the Σj -formula constructed in
Phase 1 and E is the conjunction of equalities discovered so far. Then check
if any equality ui = vi is implied:

Fj ∧ E ⇒ ui = vi .

Any implied equality should be propagated to the other theories.
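
With only a satisfiability interface available, the implication test can be
realized by a single unsatisfiability query, since Fj ∧ E implies u = v exactly
when Fj ∧ E ∧ u ≠ v is Tj-unsatisfiable. A minimal Python sketch, where the
callback sat_j is an assumed stand-in for Pj:

def implies_eq(sat_j, fj, eqs, u, v):
    """Does fj ∧ eqs imply u = v in Tj?  Check that adding u ≠ v is unsatisfiable."""
    return not sat_j(fj + eqs + [('distinct', u, v)])

Such a test can serve as the implies callback in the propagation loop sketched
at the end of Section 10.3.2.
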




                              {}
                               ⋆
      x = w1               x = w2               x = w3

    {x = w1 }            {x = w2 }            {x = w3 }

FE ∧ x = w1 |= ⊥                         FE ∧ x = w3 |= ⊥

       ⊥                                        ⊥

⋆ : FZ |= x = w1 ∨ x = w2 ∨ x = w3

Fig. 10.3. Summary of Example 10.15

This procedure is not applicable to a non-convex theory Tk . A procedure


for a non-convex theory must be able to find disjunctions of equalities that
are implied by a Σk -formula Fk . Moreover, the disjunctions should be as small
as possible since the Nelson-Oppen method must branch on each disjunct. A
disjunction is minimal if it is implied by Fk and if each smaller disjunction
is not implied by Fk .
A simple procedure to find a minimal disjunction is based on the obser-
vation that any disjunction that contains a minimal disjunction — which is
implied by Fk by definition — is also implied by Fk . Therefore, we can strip
off extra disjuncts one-by-one. First, consider the disjunction of all equalities
at once. If it is not implied, then no subset is implied either, so we are done.
Otherwise, drop each equality in turn: if the remaining disjunction is still
implied by Fk , continue with this smaller disjunction; otherwise, restore the
equality and continue. When all equalities have been considered, the result-
ing disjunction is minimal. This procedure requires checking Tk -satisfiability
O(|V |²) times, where V is the set of shared variables. Exercise 10.4 asks the
reader to describe a procedure based on binary search that requires asymp-
totically fewer satisfiability checks when the final disjunction is small relative
to the disjunction of all equalities.
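
A Python rendering of this stripping procedure follows. It is a sketch under
the assumption of an oracle implies(lits, disj) that reports whether Fk together
with the current equalities implies the disjunction disj:

def minimal_disjunction(implies, lits, all_eqs):
    """Return a minimal implied disjunction of equalities, or None if none is implied."""
    if not implies(lits, list(all_eqs)):
        return None                      # no disjunction of equalities is implied
    disj = list(all_eqs)
    for eq in list(all_eqs):
        trial = [e for e in disj if e != eq]
        if trial and implies(lits, trial):
            disj = trial                 # eq was not needed; drop it for good
    return disj                          # still implied, and now minimal

One initial check plus one check per candidate equality gives the O(|V |²)
bound quoted above.
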


10.4 Correctness of the Nelson-Oppen Method
In this section, we prove the correctness of the Nelson-Oppen combination
method. We reason at the level of arrangements, which is more suited to the
nondeterministic version of the method. However, Section 10.3 shows how to
construct an arrangement in the deterministic version, as well, so the following
proof can be extended to the deterministic version. We also focus on the second
phase of the nondeterministic procedure, which chooses an arrangement if
one exists. We thus assume that the variable abstraction phase is correct: it

produces formulae F1 and F2 such that F1 ∧ F2 is (T1 ∪ T2 )-equisatisfiable to the


given (Σ1 ∪ Σ2 )-formula F .
A theory has equality (or is a theory with equality) if its signature in-
cludes the binary predicate = and its axioms imply reflexivity, symmetry, and
transitivity of equality. The pure equality fragment of a theory with equality
is composed of formulae that are possibly quantified Boolean combinations of
equalities between variables.
Theorem 10.16 (Sound & Complete). Consider stably infinite theories T1
and T2 such that Σ1 ∩Σ2 = {=}. For conjunctive quantifier-free Σ1 -formula F1
and conjunctive quantifier-free Σ2 -formula F2 , F1 ∧ F2 is (T1 ∪ T2 )-satisfiable
iff there exists an arrangement K = α(shared(F1 , F2 ), E) such that F1 ∧ K is
T1 -satisfiable and F2 ∧ K is T2 -satisfiable.
Soundness is straightforward. Suppose that F1 ∧ F2 is (T1 ∪ T2 )-satisfiable
with satisfying (T1 ∪ T2 )-interpretation I. Extract from I the equivalence
relation E such that the arrangement K = α(shared(F1 , F2 ), E) is satisfied by
I. Then F1 ∧K and F2 ∧K are both satisfied by I, which can be viewed as both
a T1 -interpretation and a T2 -interpretation, so that they are T1 -satisfiable and
T2 -satisfiable, respectively.
Completeness is more complicated. Let K = α(shared(F1 , F2 ), E) be an
arrangement such that F1 ∧K and F2 ∧K are T1 -satisfiable and T2 -satisfiable,
respectively. Suppose that F1 ∧ F2 is (T1 ∪ T2 )-unsatisfiable. We derive a
contradiction.
The outline of the proof is the following. Because F1 ∧ F2 is (T1 ∪ T2 )-
unsatisfiable, we know that F1 implies ¬F2 in T1 ∪ T2 . An adaptation of
the Craig Interpolation Lemma (Theorem 2.38) tells us that there is
a quantifier-free formula H such that F1 implies H over all infinite T1 -
interpretations (T1 -interpretations with infinite domains) and F2 implies ¬H
over all infinite T2 -interpretations: H interpolates between F1 and F2 . We then
show that the arrangement K implies H, which means that F2 implies ¬K over
all infinite T2 -interpretations. In other words, no infinite T2 -interpretation sat-
isfies F2 ∧K. Yet if T2 is stably infinite and F2 ∧K is T2 -satisfiable as assumed,
then F2 ∧ K is satisfied by some infinite T2 -interpretation, a contradiction.
We now present the details of the proof. First, because we are considering
only stably infinite theories, we need only consider interpretations with infinite
domains. For we can extend a T1 - or T2 -interpretation with a finite domain
to a T1 - or T2 -interpretation with an infinite domain. Therefore, define ⇒∗ as
a weaker form of implication: F ⇒∗ G iff G is true on every interpretation I
that has an infinite domain and that satisfies F . Similarly, weaken ⇔ to ⇔∗ .
If F ⇒∗ G, we say that F weakly implies G; if F ⇔∗ G, we say that F is
weakly equivalent to G.
Recall from Section 2.7.4 the following theorem.
Theorem 10.17 (Compactness Theorem). A countable set of first-order
formulae S is simultaneously satisfiable iff the conjunction of every finite sub-
set is satisfiable.


Since F1 ∧ F2 is (T1 ∪ T2 )-unsatisfiable, the Compactness Theorem tells


us that there exist a conjunction S1 of a finite subset of axioms of T1 and a
conjunction S2 of a finite subset of axioms of T2 such that S1 ∧ F1 ∧ S2 ∧ F2 is
(first-order) unsatisfiable. Choose S1 and S2 to include the axioms that imply
reflexivity, symmetry, and transitivity of equality. Then, rearranging, we have
that

S1 ∧ F1 ⇒ ¬S2 ∨ ¬F2 . (10.2)

Recall from Section 2.7.4 the following theorem.


Theorem 10.18 (Craig Interpolation Lemma). If F1 ⇒ F2 , then there
exists a formula H such that F1 ⇒ H, H ⇒ F2 , and each free variable,
function symbol, and predicate symbol of H appears in F1 and F2 .
Hence, from implication (10.2), there exists an interpolant H ′ such that
free(H ′ ) = shared(F1 , F2 ) and

S1 ∧ F1 ⇒ H ′ and S2 ∧ H ′ ⇒ ¬F2 .

The latter implication is derived by rearranging H ′ ⇒ ¬S2 ∨ ¬F2 . Because =


is the only predicate or function shared between S1 ∧F1 and S2 ∧F2 , H ′ is of a
special form: its atoms are equalities between variables of shared(F1 , F2 ). How-
ever, H ′ may have quantifiers. We prove next that in fact a “weak” quantifier-
free interpolant H exists.
Lemma 10.19 (Weak Quantifier Elimination for Pure Equality). Con-
sider any stably infinite theory T with equality. For each pure equality formula
F , there exists a quantifier-free pure equality formula F ′ such that F is weakly
T -equivalent to F ′ .
Proof. Consider pure equality formula ∃x. G[x, y], where G is quantifier-free
with free variables x and y. Define

G0 : G{x = x ↦ true, x = y1 ↦ false, . . . , x = yn ↦ false}

and, for i ∈ {1, . . . , n},

Gi : G{x ↦ yi } .

We claim that ∃x. G is weakly T -equivalent to

G′ : G0 ∨ G1 ∨ · · · ∨ Gn .

For G′ asserts that x is either equal to some free variable yi or not. Because
we consider only interpretations with infinite domains, it is always possible
for x not to equal any yi .
By Section 7.1, we have a weak quantifier elimination procedure over the
pure equality fragment of T . It is weak because equivalence is only guaranteed
to hold on infinite interpretations. 
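
A small Python sketch of the construction in this proof follows; it is an
illustration only, and the tuple encoding of pure equality formulas is an
assumption. It produces the weakly equivalent quantifier-free disjunction
G0 ∨ G1 ∨ · · · ∨ Gn and reproduces the computation of Example 10.20 below.

def subst(f, m):
    """Apply the variable renaming m (a dict) to formula f."""
    if f[0] == 'eq':
        return ('eq', m.get(f[1], f[1]), m.get(f[2], f[2]))
    return (f[0],) + tuple(subst(g, m) for g in f[1:])

def drop_x(f, x):
    """G0: atoms x = x become true; atoms equating x with another variable become false."""
    if f[0] == 'eq':
        a, b = f[1], f[2]
        if x in (a, b):
            return ('true',) if a == b else ('false',)
        return f
    return (f[0],) + tuple(drop_x(g, x) for g in f[1:])

def eliminate_exists(x, body, ys):
    """Quantifier-free formula weakly equivalent to (exists x. body), free variables ys."""
    disjuncts = [drop_x(body, x)] + [subst(body, {x: y}) for y in ys]
    out = disjuncts[0]
    for d in disjuncts[1:]:
        out = ('or', out, d)
    return out

# Example 10.20's G : ∃z. z ≠ x ∧ z ≠ y
G = ('and', ('not', ('eq', 'z', 'x')), ('not', ('eq', 'z', 'y')))
print(eliminate_exists('z', G, ['x', 'y']))   # G0 ∨ Gx ∨ Gy as a nested tuple
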

Example 10.20. Consider the pure equality formula

F : x ≠ y ∧ (∀z. z = x ∨ z = y) .

For eliminating z, consider the negation of the second conjunct,

G : ∃z. z ≠ x ∧ z ≠ y ,

for which we have

G0 : ¬⊥ ∧ ¬⊥ ⇔ ⊤

and

Gx : x ≠ x ∧ x ≠ y ⇔ ⊥          Gy : y ≠ x ∧ y ≠ y ⇔ ⊥ .

Then

G′ : G0 ∨ Gx ∨ Gy ⇔ ⊤ .

Substituting into F , we have

x ≠ y ∧ ¬(⊤) ⇔ ⊥ .

Hence, over infinite interpretations satisfying the axioms of equality, F is


equivalent to ⊥.
However, in an interpretation with a two-element domain that satisfies the
equality axioms, F is not equivalent to ⊥, but rather to x 6= y. For if x 6= y
on such an interpretation, then every element is equal either to x or to y. 

Continuing the main theorem, we claim that there exists a quantifier-free


pure equality formula H over shared(F1 , F2 ) such that

S1 ∧ F1 ⇒∗ H and S2 ∧ H ⇒∗ ¬F2 .

For by Lemma 10.19, a quantifier-free pure equality formula H exists such


that H is weakly equivalent to the Craig interpolant H ′ in any stably infinite
theory with equality.
For the next step, recall from the beginning of the proof that F1 ∧ K is
T1 -satisfiable and F2 ∧ K is T2 -satisfiable, where K = α(shared(F1 , F2 ), E) is
an arrangement. We thus know that

S1 ∧ F1 ∧ K and S2 ∧ F2 ∧ K

are (first-order) satisfiable. Moreover, as T1 and T2 are stably infinite, each of


these formulae has an interpretation with an infinite domain.
Now, K is a conjunction of equalities and disequalities between pairs of
variables of shared(F1 , F2 ). Moreover, by the definition of an arrangement, K
is as strong as possible: no additional equality literals L over shared(F1 , F2 )


can be added to K without either K and K ∧ L being equivalent in a theory


with equality or K ∧ L being unsatisfiable in a theory with equality. Based on
this observation, construct the formula K ′ by conjoining additional equality
literals: for each pair of variables u, v ∈ shared(F1 , F2 ), conjoin either u = v
or u 6= v, depending on which maintains the satisfiability of K ′ in a theory
with equality. Now, since S1 ∧ F1 ∧ K is satisfiable, then so is S1 ∧ F1 ∧ K ′ ,
indeed by the same interpretations.
We claim that the DNF representation of H must include K ′ or a (con-
junctive) subformula of K ′ as a disjunct. Suppose not; then every disjunct
of the DNF representation of H contradicts the satisfying interpretations of
S1 ∧ F1 ∧ K ′ , of which at least one exists. Therefore, K ′ ⇒ H, and — because
K and K ′ are equivalent in a theory with equality — K ⇒ H. In other words,
the discovered arrangement K is a special case of the weak interpolant H.
To finish, we have

S2 ∧ H ⇒∗ ¬F2 ,

or, rearranging,

S2 ∧ F2 ⇒∗ ¬H .

From K ⇒ H, we have ¬H ⇒ ¬K, so

S2 ∧ F2 ⇒∗ ¬K .

But this weak implication contradicts that S2 ∧ F2 ∧ K is satisfied by some


infinite interpretation. Thus, F1 ∧ F2 is actually (T1 ∪ T2 )-satisfiable, and the
Nelson-Oppen method is correct.


10.5 Complexity
Assume that T1 and T2 are stably infinite theories such that Σ1 ∩ Σ2 = {=}.
Also, they have decision procedures P1 and P2 for their respective conjunctive
quantifier-free fragments.

Theorem 10.21 (Complexity: Convex Theories). If convex theories T1


and T2 have PTIME decision procedures P1 and P2 , then the Nelson-Oppen
combination based on equality propagation is a PTIME decision procedure for
the conjunctive quantifier-free fragment of T1 ∪ T2 .

Theorem 10.22 (Complexity: Non-Convex Theories). If T1 and T2


have NPTIME decision procedures P1 and P2 , then the Nelson-Oppen com-
bination based on equality propagation is an NPTIME decision procedure for
the conjunctive quantifier-free fragment of T1 ∪ T2 .

10.6 Summary
Combining decision procedures in a general and efficient manner is crucial
for most applications. This chapter covers the Nelson-Oppen combination
method, in particular:
• The nondeterministic Nelson-Oppen method. Three requirements: the the-
ories only share =; the theories are stably infinite; and the considered for-
mula is quantifier-free. Variable abstraction, separation into theory-specific
formulae. Shared variables, equivalence relations over shared variables, ar-
rangements.
• The deterministic Nelson-Oppen method. Convex theories. Equality prop-
agation.
• Correctness of the Nelson-Oppen method, which follows from the Craig
Interpolation Lemma of Chapter 2.
• Complexity. When the individual decision procedures are convex and run
in polynomial time, the combination procedure runs in polynomial time.
The Nelson-Oppen combination method provides a general means of reasoning
simultaneously about the theories studied in this book using the individual
decision procedures. Being able to reason in union theories is crucial. For
example, almost all of the verification conditions of Chapters 5 and 6 are
expressed in multiple signatures.

Bibliographic Remarks
Nelson and Oppen describe the Nelson-Oppen combination method [65]. Their
original proof of correctness was flawed; Oppen presents a corrected proof in
[70], and Nelson presents a corrected proof in [64]. Oppen also proves in [70]
the complexity results that we state. Tinelli and Harandi present an alter-
nate proof of correctness in [92]. Our correctness proof derives from that of
Nelson and Oppen. See [56] for another presentation of the method and its
correctness.
Another general combination method that has received much attention is
that of Shostak [84]. See the work of Ruess and Shankar [78] for a correct
presentation of the method.

Exercises
10.1 (DP for combinations). For each of the following formulae, identify
the combination of theories in which it lies. To avoid ambiguity, prefer TZ to
TQ . Then apply the N-O method using the appropriate decision procedures.
Use either the nondeterministic or deterministic version. Provide a level of
detail as in the examples of the chapter.

(a) 1 ≤ x ∧ x ≤ 2 ∧ cons(1, y) ≠ cons(x, y) ∧ cons(2, y) ≠ cons(x, y)


(b) a[i] ≥ 1 ∧ a[i] + x ≤ 2 ∧ x > 0 ∧ x = i ∧ a⟨x ◁ 2⟩[i] ≠ 1

10.2 (Deterministic N-O). Apply the deterministic N-O method to the


following formulae. Prefer TQ to TZ .
(a) 1 ≤ x ∧ x ≤ 2 ∧ cons(1, y) ≠ cons(x, y) ∧ cons(2, y) ≠ cons(x, y)
(b) x + y = z ∧ f (z) = z ∧ f (x + y) ≠ z
(c) g(x + y, z) = f (g(x, y)) ∧ x + z = y ∧ z ≥ 0 ∧ x ≥ y
∧ g(x, x) = z ∧ f (z) ≠ g(2x, 0)

10.3 (⋆ Equality propagation in TE ). Section 10.3.3 explains general tech-


niques for propagating equalities. However, some decision procedures are eas-
ily modified to propagate new equalities. Describe such a modification of the
congruence closure algorithm of Chapter 9.

10.4 (⋆ Equality propagation). Consider conjunctive Σ-formula F of non-


convex theory T and the disjunction of equalities
    G :  ⋁_{i=1}^{n}  ui = vi

such that F ⇒ G. Describe a procedure based on binary search that discovers


a minimal disjunction G′ of the equalities of G that is implied by F . If the
procedure returns a disjunction with m equalities, then it should have invoked
the decision procedure for T at most O(m lg n) times. Hint: The solution is
related to the solution of Exercise 8.1(e).

10.5 (⋆ Convex theories). Prove that the following theories are convex:
(a) TE
(b) Tcons

10.6 (⋆ Complexity). Prove the complexity results about the N-O method.
(a) Theorem 10.21.
(b) Theorem 10.22.
