
A Practical Introduction to Computer

Architecture

Daniel Page ⟨dan@phoo.org⟩

git # 8b6da880 @ 2023-09-27



CONTENTS

I Tools and Techniques

1 Mathematical preliminaries
  1.1 Propositional logic
    1.1.1 Connectives
    1.1.2 Quantifiers
  1.2 Collections: sequences
    1.2.1 Basic definition
    1.2.2 Operations
    1.2.3 Advanced definition and short-hands
  1.3 Collections: sets
    1.3.1 Basic definition
    1.3.2 Operations
    1.3.3 Products
    1.3.4 Advanced definition and short-hands
  1.4 Collections: some additional special-cases
    1.4.1 Tuples
    1.4.2 Strings
  1.5 Functions
    1.5.1 Composition
    1.5.2 Properties
    1.5.3 Relations
  1.6 Boolean algebra
    1.6.1 Manipulation
    1.6.2 Functions
    1.6.3 Normal (or standard) forms
  1.7 Signals
  1.8 Representations
    1.8.1 Bits, bytes and words
    1.8.2 Positional number systems
    1.8.3 Representing integer numbers, i.e., members of Z
    1.8.4 Representing real numbers, i.e., members of R
    1.8.5 Representing characters
  1.9 A conclusion: steps toward a digital logic

2 Basics of digital logic
  2.1 Switches and transistors
    2.1.1 A brief tour of fundamental principles
    2.1.2 Implementing transistors
  2.2 Combinatorial logic
    2.2.1 A suite of simplified logic gates

    2.2.2 Harnessing the universality of NAND and NOR
    2.2.3 Designing circuits for arbitrary combinatorial functions
    2.2.4 Physical properties of combinatorial logic
    2.2.5 Building block components
  2.3 Sequential logic
    2.3.1 Clocks
    2.3.2 Latches, flip-flops and registers
    2.3.3 Putting everything together: general clocking strategies
  2.4 Pipelined logic
    2.4.1 An analogy: car production lines
    2.4.2 Treating logic as a production line
    2.4.3 Some concrete examples
  2.5 Implementation and fabrication technologies
    2.5.1 Silicon fabrication
    2.5.2 (Re)programmable fabrics

3 Basics of computer arithmetic
  3.1 Introduction
  3.2 High-level ALU architecture
  3.3 Components for addition and subtraction
    3.3.1 Addition
    3.3.2 Subtraction
    3.3.3 Carry and overflow detection
  3.4 Components for shift and rotation
    3.4.1 Introductory concepts and theory
    3.4.2 Iterative designs
    3.4.3 Combinatorial designs
  3.5 Components for multiplication
    3.5.1 Introductory concepts and theory
    3.5.2 Iterative, bit-serial designs
    3.5.3 Iterative, digit-serial designs
    3.5.4 Combinatorial designs
    3.5.5 Some multiplier case-studies
  3.6 Components for comparison
    3.6.1 Unsigned comparison
    3.6.2 Signed comparison
    3.6.3 Beyond equality and less than

4 Basics of memory technology
  4.1 Introduction
  4.2 Memory cells
    4.2.1 Static RAM (SRAM) cells
    4.2.2 Dynamic RAM (DRAM) cells
    4.2.3 ROM cells
  4.3 Memory cells ⇝ devices
    4.3.1 Static RAM (SRAM) devices
    4.3.2 Dynamic RAM (DRAM) devices
    4.3.3 ROM devices
  4.4 Memory devices ⇝ modules

5 Computational machines: Finite State Machines (FSMs)
  5.1 State machines: from simple to more complex control-paths
    5.1.1 A rough overview of FSM-related theory
    5.1.2 Practical implementation of FSMs in hardware

II Appendices

A Example exam-style questions
  A.1 Chapter 1
  A.2 Chapter 2
  A.3 Chapter 3

  A.4 Chapter 4
  A.5 Chapter ??
  A.6 Chapter ??

B Example exam-style solutions
  B.1 Chapter 1
  B.2 Chapter 2
  B.3 Chapter 3
  B.4 Chapter 4
  B.5 Chapter ??
  B.6 Chapter ??


LIST OF FIGURES

1.1 A collection of Venn diagrams for standard set operations.
1.2 An example Venn diagram showing membership of two sets.
1.3 Number lines illustrating the mapping of 8-bit sequences to integer values using three different representations.
1.4 A visualisation of the impact of increasing q, the number of fractional digits, in a fixed-point representation; the result is increased detail within the rendering of a Mandelbrot fractal.
1.5 Single- and double-precision IEEE-754 floating-point formats described graphically as bit-sequences and concretely as C structures.
1.6 A short C program that performs direct manipulation of IEEE floating-point numbers.
1.7 A teletype machine being used by UK-based Royal Air Force (RAF) operators during WW2 (public domain image, source: http://en.wikipedia.org/wiki/File:WACsOperateTeletype.jpg).
1.8 A table describing the printable ASCII character set.

2.1 The sub-atomic structure of a lithium atom.
2.2 A simple circuit conditionally connecting a capacitor (or battery) to a lamp depending on the state of a switch.
2.3 Some simple examples of Boolean-style control of a lamp by combinations of switches.
2.4 A 6P1P (i.e., a 100W to 200W, photo-sensitive type) vacuum tube (public domain image, source: http://en.wikipedia.org/wiki/File:6P1P.jpg).
2.5 A moth found by operators of the Harvard Mark 2; the “bug” was trapped within the computer and caused it to malfunction (public domain image, source: http://en.wikipedia.org/wiki/File:H96566k.jpg).
2.6 A replica of the first point-contact transistor, a precursor of designs such as the MOSFET, constructed at Bell Labs (public domain image, source: http://en.wikipedia.org/wiki/File:Replica-of-first-transistor.jpg).
2.7 A high-level diagram of a MOSFET transistor, showing the terminal and body materials.
2.8 A pair of N-MOSFET and P-MOSFET transistors, arranged to form a CMOS cell.
2.9 Symbolic descriptions of N-MOSFET and P-MOSFET transistors.
2.10 MOSFET-based implementations of NOT, NAND and NOR logic gates.
2.11 A voltage-oriented truth table for NOT, NAND and NOR logic gates.
2.12 Representation of standard logic gates in English, Boolean algebra, C and symbolic notations.
2.13 Truth tables for standard logic gates.
2.14 Identities for standard logic gates in terms of NAND and NOR.
2.15 4- and 3-input example Boolean functions respectively.
2.16 Quine-McCluskey simplification, step #1: extraction of prime implicants.
2.17 Quine-McCluskey simplification, step #2: covering the prime implicants table.
2.18 An illustration of idealised and realistic switching activity wrt. a MOSFET-based NOT gate.
2.19 A behavioural waveform demonstrating the effects of propagation delay on an XOR implementation.


2.20 A simple design, involving just a NOT and an AND gate, that exhibits glitch-like behaviour.
2.21 A contrived circuit illustrating the idea of fan-out, whereby one source gate may need to drive n target gates.
2.22 An overview of 2-input (resp. 2-output), 1-bit multiplexer (resp. demultiplexer) cells.
2.23 Application of the isolated and cascaded replication design patterns.
2.24 An overview of equality and less than comparators.
2.25 An overview of half- and full-adder cells.
2.26 Gate universality used to implement a NAND- and NOR-based half-adder. Note that the dashed boxes in the NAND and NOR implementations (middle and bottom) are translations of the primitive gates within the more natural description (top).
2.27 An example encoder/decoder pair.
2.28 An incorrect counter design, using naive “looped” feedback.
2.29 An illustration of standard features in 1- and 2-phase clocks.
2.30 Symbolic descriptions of D-type latch and flip-flop components (note the triangle annotation around en).
2.31 A collection of NOR- and NAND-based SR type latches, with simpler (top) to more complicated (middle and bottom) control features.
2.32 A case-by-case overview of NOR-based SR latch behaviour; notice that there are two sane cases for S = 0 and R = 0, and no sane cases for when S = 1 and R = 1.
2.33 An annotated SR latch, decomposed into two NOR gates and then into transistors; r0, the output of the top NOR gate, is used as input by the bottom NOR gate and r1, the output from the bottom NOR gate, is used as input by the top NOR gate (although the physical connections are not drawn).
2.34 A NOR-based D-type flip-flop created using a glitch generator.
2.35 A NOR-based D-type flip-flop created using a primary-secondary organisation of latches.
2.36 An n-bit register, with n replicated 1-bit components synchronised using the same enable signal.
2.37 A correct counter design, using sequential logic components.
2.38 Two illustrative waveforms, outlining stages of computation within the associated counter design.
2.39 Two different high-level clocking strategies.
2.40 Production line #1, staffed with pre-Ford workers.
2.41 Production line #2, staffed with post-Ford workers.
2.42 Four different ways to split a (hypothetical) component X into stages.
2.43 A problematic pipeline, and a solution involving the use of pipeline registers and a control signal to indicate when each stage should advance.
2.44 An illustrative waveform, outlining the stages of computation as a pipeline is driven by a clock.
2.45 An unpipelined, abstract combinatorial circuit and a 3-stage pipelined alternative.
2.46 An unpipelined, 8-bit Multiply-Accumulate (MAC) circuit and a 3-stage pipelined alternative.
2.47 An unpipelined, 8-bit logarithmic shift circuit and a 3-stage pipelined alternative.
2.48 A high-level illustration of a lithography-based fabrication process.
2.49 Bonding wires connected to a high quality gold pad (public domain image, source: http://en.wikipedia.org/wiki/Image:Wirebond-ballbond.jpg).
2.50 A heatsink ready to be attached, via the z-clip, to a circuit in order to dissipate heat (public domain image, source: http://en.wikipedia.org/wiki/File:Pin_fin_heat_sink_with_a_z-clip.png).
2.51 A timeline of Intel processor innovation demonstrating Moore’s Law (data from http://www.intel.com/technology/mooreslaw/).
2.52 Conceptual diagrams of a PLA fabric.
2.53 Conceptual diagrams of an FPGA fabric.

3.1 Two high-level ALU architectures: each combines a number of sub-components, but does so using a different strategy.
3.2 An n-bit, ripple-carry adder described using a circuit diagram.
3.3 An n-bit, ripple-carry subtractor described using a circuit diagram.
3.4 An n-bit, ripple-carry adder/subtractor described using a circuit diagram.
3.5 An n-bit, carry look-ahead adder described using a circuit diagram.
3.6 An illustration depicting the structure of carry look-ahead logic, which is formed by an upper- and lower-tree of OR and AND gates respectively (with leaf nodes representing gi and pi terms for example).
3.7 An overview of half- and full-subtractor cells.
3.8 An iterative design for n-bit (left-)shift described using a circuit diagram.
3.9 A combinatorial design for n-bit (left-)shift described using a circuit diagram.


3.10 Two examples demonstrating different strategies for accumulation of base-b partial products resulting from two 3-digit operands.
3.11 An iterative, bit-serial design for (n × n)-bit multiplication described using a circuit diagram.
3.12 A tabular description of stages in example (4 × 4)-bit Wallace and Dadda tree multiplier designs.
3.13 An (n × n)-bit tree multiplier design, described using a circuit diagram.
3.14 An example (4 × 4)-bit Wallace-based tree multiplier design, described using a circuit diagram.
3.15 An n-bit, unsigned equality comparison described using a circuit diagram.
3.16 An n-bit, unsigned less than comparison described using a circuit diagram.

4.1 ...
4.2 ...
4.3 ...
4.4 ...
4.5 ...
4.6 ...
4.7 ...
4.8 ...

5.1 An example FSM to decide whether there is an odd number of 1 elements in some sequence X.
5.2 An example FSM modelling a simple vending machine.
5.3 Two generic FSM frameworks (for different clocking strategies) into which one can place implementations of the state, δ (the transition function) and ω (the output function).
5.4 Two illustrative waveforms (for different clocking strategies), outlining stages of computation within the associated FSM framework.
5.5 An example FSM modelling an ascending modulo 6 counter.
5.6 An example FSM modelling an ascending or descending modulo 6 counter.
5.7 An example FSM modelling a traffic light controller.

A.1 A set of 5 different Karnaugh maps, captioned with an associated option.
A.2 A truth table for the 4-input Boolean function f.
A.3 MOSFET-based implementations of C0 and C1.
A.4 An implementation of a full-adder cell.
A.5 An implementation of a cyclic n-bit counter.
A.6 A combinatorial logic design, described using N-type and P-type MOSFET transistors.
A.7 A sequential logic design, containing two D-type flip-flops.
A.8 A combinatorial logic design, described using N-type and P-type MOSFET transistors; note that the pull-down network is (partially) missing.
A.9 An SR-latch, described in terms of abstract components labelled ⊙.
A.10 An SR-latch variant, which includes additional inputs P, C, and en.
A.11 A 4Mbit DRAM block diagram (source: http://www.micross.com/pdf/MT4C4001J.pdf).
A.12 A diagrammatic description of an 8-bit micro-processor and associated memory system.
A.13 An FSM implementation, which has 4 inputs (1-bit Φ1, Φ2 and rst on the left-hand side; 8-bit s spread within the design) and 1 output (1-bit r on the right-hand side).
A.14 A waveform describing behaviour of Φ1, Φ2, and rst within Figure A.13.
A.15 A NAND-based implementation of a D-type latch.
A.16 A NAND-based implementation of a 2-input XOR gate.
A.17 A NAND-based implementation of a 2-input, 1-bit multiplexer.
A.18 Implementation of a simple FSM, using D-type latches and a 2-phase clock.
A.19 Implementation of a simple FSM, using D-type latches and a 2-phase clock.
A.20 The instruction set for an example 4-register counter machine.
A.21 The high-level data- and control-path for an example 4-register counter machine.
A.22 The low-level decoder implementation for an example 4-register counter machine.

B.1 A time line illustrating behaviour of a multiplexer-based, primary-secondary flip-flop.


Part I

Tools and Techniques


CHAPTER 1

MATHEMATICAL PRELIMINARIES

In Mathematics you don’t understand things. You just get used to them.

– von Neumann

The goal of this Chapter is to provide a fairly comprehensive overview of theory that underpins the rest of the book. At
first glance the content may seem a little dry, and is often excluded in other similar books. It seems clear, however, that
without a solid understanding of said theory, using the constituent topics to solve practical problems will be much harder.
The topics covered all relate to the field of discrete Mathematics; they include propositional logic, sets and functions,
Boolean algebra and number systems. These four topics combine to produce a basis for formal methods to describe,
manipulate and implement digital systems. Readers with a background in Mathematics or Computer Science might skip
this Chapter and use it simply for reference; those approaching it from some other background would be advised to read
the material in more detail.

1.1 Propositional logic


Definition 1.1. A proposition is a statement whose meaning, termed the truth value, is either true or false (less
formally, we say the statement is true if it has a truth value of true and false if it has a truth value of false). A given
proposition can involve one or more variables; only when concrete values are assigned to the variables can the meaning
of a proposition be evaluated.

In part because we use them naturally in language, it almost seems too formal to define what a proposition is.
However, by doing so we can start to use them as a building block to describe what propositional logic is and
how it works. This is best explained step-by-step by example:

Example 1.1. The statement

“the temperature is 90◦ C”

is a proposition since it is definitely either true or false. When we take a proposition and decide whether it is true
or false, we say we have evaluated it. However, there are clearly a lot of statements that are not propositions
because they do not state any proposal. For example,

“turn off the heat”

is a command or request of some kind; it does not evaluate to a truth value. Propositions must also be well
defined in the sense that they are definitely either true or false, i.e., there are no “grey areas” in between. The
statement

“90◦ C is too hot”


is not a proposition, because it could be true or false depending on the context: 90◦ C is probably too hot for
body temperature, but not for a cup of coffee. Finally, some statements seem to be propositions but cannot be
evaluated because they are paradoxical: a famous example is the so-called liar paradox, usually attributed to
the Greek philosopher Eubulides, who stated it as

“a man says that he is lying, is what he says true or false?”

although a clearer version is the more commonly referenced

“this statement is false” .

If the man is telling the truth, everything he says must be true which means he is lying and hence everything
he says is false. Conversely, if the man is lying everything he says is false so he cannot be lying (because he
said that he was). In terms of the statement, we cannot be sure of the truth value so this cannot be classified as
a proposition.

Example 1.2. When a proposition contains one or more variables, we can only evaluate it having first assigned
each a concrete value. For example, consider

“x◦ C equals 90◦ C”

where x is a variable. By assigning x a value we get a proposition; setting x = 10, for example, gives

“10◦ C equals 90◦ C”

which clearly evaluates to false. Setting x = 90 gives

“90◦ C equals 90◦ C”

which evaluates to true.

Definition 1.2. Informally, a propositional function is just a short-hand way of writing a proposition; we give the
function a name and a list of free variables. So, for example, the function

f (x, y) : x = y

is called f and has two variables named x and y. If we use the function as f (10, 20), performing the binding x = 10 and
y = 20, it has the same meaning as 10 = 20.
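Viewed computationally, a propositional function is just a function that returns a truth value once concrete values are bound to its variables. The following minimal C sketch (C being the language used for examples later in the book) models f; the choice of int for the variable types is an assumption made purely for illustration:

#include <stdbool.h>
#include <stdio.h>

/* the propositional function f(x, y) : x = y, modelled as a C function
   that maps a concrete binding of x and y to a truth value            */
bool f( int x, int y ) {
  return x == y;
}

int main( void ) {
  printf( "%d\n", f( 10, 20 ) ); /* the binding x = 10, y = 20 yields false, i.e., 0 */
  printf( "%d\n", f( 90, 90 ) ); /* the binding x = 90, y = 90 yields true,  i.e., 1 */
  return 0;
}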

Example 1.3. We might write


g : “the temperature is 90◦ C”
and hence use g (the left-hand side) as a short-hand for the longer proposition (the right-hand side): it works
the same way in the sense that g tells us the truth value of said proposition. Here, g has no free variables;
imagine we extend our example to write

h(x) : “x◦ C equals 90◦ C”.

Now, h is representing a longer proposition. When we bind x to a value via h(10), we find

h(10) = “10◦ C equals 90◦ C”

which can be evaluated to false.

1.1.1 Connectives
Definition 1.3. A connective binds together a number of propositional terms into a single, compound proposition called
an expression. For brevity, we use symbols to denote common connectives:

• “not x” is denoted ¬x, and often termed logical complement (or negation).

• “x and y” is denoted x ∧ y, and often termed logical conjunction.

• “x or y” is denoted x ∨ y, and often called an inclusive-or, and termed logical (inclusive) disjunction.

• “x or y but not x and y” is denoted x ⊕ y, and often called an exclusive-or, and termed logical (exclusive) disjunction.

git # 8b6da880 @ 2023-09-27 16


© Daniel Page ⟨dan@phoo.org⟩

• “x implies y” is denoted x ⇒ y, and sometimes written as “if x then y”, and termed logical implication, and
finally
• “x is equivalent to y” is denoted x ≡ y, and sometimes written as “x if and only if y” or even “x iff. y”, and termed
logical equivalence.

Note that we group statements using parentheses when there could be some confusion about the order they are applied.
As such (x ∧ y) is the same as x ∧ y, and (x ∧ y) ∨ z simply means we apply the ∧ connective to x and y first, then ∨ to
the result and z.
Definition 1.4. Provided we include parentheses in a compound proposition, there will be no ambiguity wrt. the order
connectives are applied. For instance, if we write
(x ∧ y) ∨ z
it is clear that we first resolve the conjunction of x and y, then the disjunction of that result and z.
If parentheses are not included however, we rely on precedence rules to determine the order for us. In short, the
following list

1. ¬,
2. ∧,
3. ∨,
4. ⇒,
5. ≡

assigns a precedence level to each connective. Using the same example as above, if we omit the parentheses and instead
write
x∧y∨z
we still get the same result: ∧ has a higher precedence level than ∨ (sometimes we say ∧ “binds more tightly” to operands
than ∨), so we resolve the former before the latter.
Example 1.4. For example, the expression

“the temperature is less than 90◦ C ∧ the temperature is greater than 10◦ C”

contains two terms that propose

“the temperature is less than 90◦ C”

and

“the temperature is greater than 10◦ C” .

These terms are joined together using the ∧ connective so that the whole expression evaluates to true if both of
the terms are true, otherwise it evaluates to false. In a similar way we might write a compound proposition

“the temperature is less than x◦ C ∧ the temperature is greater than y◦ C”

which can only be evaluated when we assign values to the variables x and y.
Definition 1.5. The meaning of connectives is usually described in a tabular form which enumerates the possible values
each term can take and what the resulting truth value is; we call this a truth table.

x     | y     | ¬x    | x ∧ y | x ∨ y | x ⊕ y | x ⇒ y | x ≡ y
------+-------+-------+-------+-------+-------+-------+------
false | false | true  | false | false | false | true  | true
false | true  | true  | false | true  | true  | true  | false
true  | false | false | false | true  | true  | false | false
true  | true  | false | true  | true  | false | true  | true
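Each connective has a direct, or easily derived, counterpart in C, a fact used again when standard logic gates are introduced in Chapter 2. The following sketch reproduces the truth table above by enumerating every assignment; note that C has no dedicated implication or equivalence operators, so x ⇒ y is derived via the standard identity ¬x ∨ y, and x ≡ y as equality of truth values:

#include <stdbool.h>
#include <stdio.h>

int main( void ) {
  /* enumerate every assignment to x and y, printing one table row each */
  for( int x = 0; x <= 1; x++ ) {
    for( int y = 0; y <= 1; y++ ) {
      bool nx  = !x;          /* ¬x                     */
      bool cnj =  x && y;     /* x ∧ y                  */
      bool dsj =  x || y;     /* x ∨ y                  */
      bool xr  =  x ^  y;     /* x ⊕ y                  */
      bool imp = !x || y;     /* x ⇒ y, i.e., ¬x ∨ y    */
      bool eqv =  x == y;     /* x ≡ y                  */
      printf( "%d %d | %d %d %d %d %d %d\n", x, y, nx, cnj, dsj, xr, imp, eqv );
    }
  }
  return 0;
}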

Example 1.5. The ¬ connective complements (or negates) the truth value of a given expression. Considering
the expression
¬(x > 10),
we find that the expression ¬(x > 10) is true if the term x > 10 is false and the expression is false if x > 10 is
true. If we assign x = 9, x > 10 is false and hence the expression ¬(x > 10) is true. If we assign x = 91, x > 10 is
true and hence the expression ¬(x > 10) is false.

git # 8b6da880 @ 2023-09-27 17


© Daniel Page ⟨dan@phoo.org⟩

Example 1.6. The meaning of the ∧ connective is also as one would expect; the expression
(x > 10) ∧ (x < 90)
is true if both the expressions x > 10 and x < 90 are true, otherwise it is false. So if x = 20, the expression is true.
But if x = 9 or x = 91, then it is false: even though one or other of the terms is true, they are not both true.
Example 1.7. The inclusive-or and exclusive-or connectives are fairly similar. The expression
(x > 10) ∨ (x < 90)
is true if either x > 10 or x < 90 is true or both of them are true. Here we find that all the assignments x = 20,
x = 9 and x = 91 mean the expression is true; in fact it is hard to find an x for which it evaluates to false!
Conversely, the expression
(x > 10) ⊕ (x < 90)
is only true if only one of either x > 10 or x < 90 is true; if they are both true then the expression is false. We
now find that setting x = 20 means the expression is false while both x = 9 and x = 91 mean it is true.
Example 1.8. Implication is more tricky. If we write x ⇒ y, we typically call x the hypothesis and y the
conclusion. In order to justify the truth table for implication, consider the example
(x is prime ) ∧ (x ≠ 2) ⇒ (x ≡ 1 (mod 2))
i.e., if x is a prime other than 2, it follows that it is odd. Therefore, if x is prime then the expression is true if
x ≡ 1 (mod 2) and false otherwise (since the implication is invalid). If x is not prime, then the expression does
not really say anything about the expected outcome: we only know what to expect if x was prime. Since it could
still be that x ≡ 1 (mod 2) even when x is not prime, based on what we know from the example, we assume it
is true when this case occurs.
Put in a less formal way, the idea is that anything can follow from a false hypothesis. If the hypothesis is
false, we cannot be sure whether or not the conclusion is false: we therefore assume it is possibly true,
which is sort of an “optimistic default”. Consider a less formal example to support this. The statement “if I
am unhealthy then I will die” means x = “I am unhealthy” and y = “I will die”, and that r = x ⇒ y has four
possible cases:
1. I am healthy and do not die, so x = false, y = false and r = true,
2. I am healthy and die, so x = false, y = true and r = true,
3. I am unhealthy and do not die, so x = true, y = false and r = false, and
4. I am unhealthy and die, so x = true, y = true and r = true.
The first two cases do not contradict the original statement (since in them I am healthy, so it doesn’t apply):
only the third case does, in that I do not die (maybe I had a good doctor for instance).
Example 1.9. In contrast, equivalence is fairly simple. The expression x ≡ y is only true if x and y evaluate
to the same value. This matches the concept of equality in other contexts, such as between numbers. As an
example, consider
(x is odd ) ≡ (x ≡ 1 (mod 2)).
This expression is true since if the left side is true, the right side must also be true and vice versa. If we change
it to
(x is odd ) ≡ (x is prime ),
then the expression is false. To see this, note that only some odd numbers are prime: just because a number
is odd does not mean it is always prime although if it is prime it must be odd (apart from the corner case of
x = 2). So the equivalence works in one direction but not the other and hence the expression is false.
Definition 1.6. An expression which is equivalent to true, no matter what values are assigned to any variables, is called
a tautology; an expression which is equivalent to false is called a contradiction.
Definition 1.7. We call two expressions logically equivalent if they are composed of the same variables and have the
same truth value for every possible assignment to those variables. More formally, two expressions x and y are equivalent
iff. x ≡ y can be proved a tautology.
Various subtleties emerge when trying to prove two expressions are logically equivalent, but for our purposes
it suffices to adopt a brute-force approach by a) enumerating the values each variable can take, then b) checking
whether or not the expressions produce identical truth values in all cases. Note that, in practice, this can clearly
become difficult wrt. the amount of work required: with n variables there will be 2^n possible assignments, which
grows (very) quickly as n grows.
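This brute-force strategy translates directly into a program: enumerate all 2^n assignments, conveniently packed into the bits of a counter, and evaluate both expressions under each. The sketch below uses, as an illustrative pair, the two sides of one of De Morgan's laws (an identity we simply take on trust here), namely ¬(x ∧ y) and ¬x ∨ ¬y:

#include <stdbool.h>
#include <stdio.h>

/* two candidate expressions in the variables x and y */
bool lhs( bool x, bool y ) { return !( x && y ); }
bool rhs( bool x, bool y ) { return !x || !y;    }

int main( void ) {
  /* with n = 2 variables there are 2^n = 4 assignments: pack each
     assignment into the bits of the counter i                     */
  for( int i = 0; i < ( 1 << 2 ); i++ ) {
    bool x = ( i >> 0 ) & 1, y = ( i >> 1 ) & 1;
    if( lhs( x, y ) != rhs( x, y ) ) {
      printf( "not equivalent: differ for x = %d, y = %d\n", x, y );
      return 1;
    }
  }
  printf( "equivalent: lhs and rhs agree for every assignment\n" );
  return 0;
}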

git # 8b6da880 @ 2023-09-27 18


© Daniel Page ⟨dan@phoo.org⟩

1.1.2 Quantifiers
Definition 1.8. A free variable in a given expression is one which has not yet been assigned a value. Roughly speaking,
a quantifier allows a free variable to take one of many values:
• the universal quantifier “for all x, y is true” is denoted ∀ x [y], while
• the existential quantifier “there exists an x such that y is true” is denoted ∃ x [y].
We say that binding a quantifier to a variable quantifies it; after it has been quantified we say it is bound (rather than
free).
As an aside, quantifiers can be roughly viewed as moving from propositional logic into predicate (or first-order)
logic (with second-order logic then a further extension, e.g., allowing quantification of relations). Put more
simply, however, when we encounter an expression such as
∃ x [y]
we are essentially assigning x all possible values; to make the expression true, just one of these values needs to
make the expression y true. Likewise, when we encounter
∀ x [y]
we are again assigning x all possible values. This time however, to make the expression true, all of them need
to make the expression y true.
Example 1.10. Consider the following
“there exists an x such that x ≡ 0 (mod 2)”
which we can rewrite symbolically as
∃ x [x ≡ 0 (mod 2)].
In this case, x is bound by an ∃ quantifier; we are asserting that for some value of x, it is true that x ≡ 0 (mod 2).
Restating the same thing another way, if just one x means x ≡ 0 (mod 2) is true then the whole (quantified)
expression is true. Clearly x = 2 satisfies this condition, so the expression is true.
Example 1.11. Consider the following
“for all x, x ≡ 0 (mod 2)”
which we can rewrite symbolically as
∀ x [x ≡ 0 (mod 2)].
This is a more general assertion about x, demanding that for all x it is true that x ≡ 0 (mod 2). Taking the
opposite approach to the above, to conclude the whole (quantified) expression is false we need an x such that
x ≢ 0 (mod 2). This is easy, because any odd value of x is good enough, so the expression is false.
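Quantifiers over an unbounded domain such as Z cannot be evaluated by exhaustive search, but restricted to a finite range (an assumption made here purely for illustration) they reduce to loops: ∃ needs just one witness, whereas ∀ tolerates no counter-example. A minimal C sketch:

#include <stdbool.h>
#include <stdio.h>

bool p( int x ) { return ( x % 2 ) == 0; }  /* the proposition x ≡ 0 (mod 2) */

/* ∃ x [ p(x) ] over 0 ≤ x < n: true given just one witness                  */
bool exists( int n ) {
  for( int x = 0; x < n; x++ ) if(  p( x ) ) return true;
  return false;
}

/* ∀ x [ p(x) ] over 0 ≤ x < n: false given just one counter-example         */
bool forall( int n ) {
  for( int x = 0; x < n; x++ ) if( !p( x ) ) return false;
  return true;
}

int main( void ) {
  printf( "exists = %d, forall = %d\n", exists( 10 ), forall( 10 ) ); /* 1, 0 */
  return 0;
}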

1.2 Collections: sequences


Definition 1.9. A sequence is an ordered collection of elements, which can be of any (but normally homogeneous) type.
The size or length of a sequence, denoted |X| (or, alternatively, #X elsewhere), is the number of elements it contains.
The order of elements is important, with an index used to refer to each one: the i-th element of a sequence X is denoted
Xi, st. 0 ≤ i < |X| and Xj = ⊥ for j < 0 or j ≥ |X|.

1.2.1 Basic definition


Example 1.12. Consider a sequence of elements
A = ⟨0, 3, 1, 2⟩
which one can think of as being like a list, read from left-to-right. In this case, we conclude, for example, that |A| = 4,
A0 = 0, A1 = 3, A2 = 1, and A3 = 2; A4 = ⊥, because that element does not exist (i.e., the index 4 is too large, and
so deemed out-of-bounds).
Example 1.13. Each element in the sequence A is a number, but we might equally define a sequence of characters
such as
B = ⟨‘a’, ‘b’, ‘c’, ‘d’, ‘e’⟩.
However, since the order of elements is important, if we define

C = ⟨2, 1, 3, 0⟩

then clearly A ≠ C because, for example, A0 ≠ C0. Both A and B are sequences of homogeneous type: their
elements are all numbers and characters respectively.

git # 8b6da880 @ 2023-09-27 19


© Daniel Page ⟨dan@phoo.org⟩

1.2.2 Operations
The concatenate operator can be used to join two sequences together. Although most often used on the right-
hand side of an equality (or an assignment), it is also allowed on the left-hand side: in such a case, it performs
“deconcatenation” by splitting apart a sequence.

Example 1.14. Imagine we start with two 4-element sequences

F = ⟨0, 1, 2, 3⟩
G = ⟨4, 5, 6, 7⟩

Their concatenation is denoted

H = F ∥ G = ⟨0, 1, 2, 3⟩ ∥ ⟨4, 5, 6, 7⟩ = ⟨0, 1, 2, 3, 4, 5, 6, 7⟩

noting that the result H is an 8-element sequence whose first (resp. last) four elements match F (resp. G).
Likewise, we might write
I ∥ J = H

where now the concatenation operator appears on the left-hand side: this works basically the same way but in
reverse, meaning
I ∥ J = H = ⟨0, 1, 2, 3, 4, 5, 6, 7⟩ = ⟨0, 1, 2, 3⟩ ∥ ⟨4, 5, 6, 7⟩

and so I = F = ⟨0, 1, 2, 3⟩ and J = G = ⟨4, 5, 6, 7⟩. Note that this approach demands the left- and right-hand sides
have the same length, so elements can be organised appropriately.
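Concatenation (and deconcatenation) of fixed-length sequences has an obvious realisation in C, with a sequence modelled as an array; the fixed lengths here are an assumption that keeps the sketch minimal:

#include <stdio.h>
#include <string.h>

int main( void ) {
  int F[ 4 ] = { 0, 1, 2, 3 };
  int G[ 4 ] = { 4, 5, 6, 7 };
  int H[ 8 ];

  /* H = F ∥ G: copy F into the first four slots of H, and G into the rest */
  memcpy( H,     F, sizeof( F ) );
  memcpy( H + 4, G, sizeof( G ) );

  /* I ∥ J = H: deconcatenation is the reverse split                       */
  int I[ 4 ], J[ 4 ];
  memcpy( I, H,     sizeof( I ) );
  memcpy( J, H + 4, sizeof( J ) );

  for( int i = 0; i < 8; i++ ) printf( "%d ", H[ i ] ); /* 0 1 2 3 4 5 6 7 */
  printf( "\n" );
  return 0;
}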

1.2.3 Advanced definition and short-hands


It can make sense to avoid enumerating a sequence completely, which is the approach used above: explicitly
including each element can become laborious, error prone, or simply inconvenient. The examples below show
various short-hands to address this problem:

1. Where there is no ambiguity, ellipses (or continuation dots) are allowed to replace one or more elements.
For example, again consider the sequence

B = ⟨‘a’, ‘b’, ‘c’, ‘d’, ‘e’⟩

which we could rewrite as


B = ⟨‘a’, ‘b’, . . . , ‘e’⟩.

with the ellipsis representing the sub-sequence ⟨‘c’, ‘d’⟩. In fact, this approach is sometimes required. In B
there was a well defined start and end to the sequence, but in

E = ⟨1, 2, 3, 4, . . .⟩

the ellipsis represents elements we either do not know, or which do not matter: because there is no end
to the sequence, we cannot necessarily fill in the ellipsis as before. Note that this also means |E| might be
infinite or simply unknown.

2. It can be convenient to apply similar reasoning to the indices used to specify elements. For example,

B0,1,...,3 = B0,1,2,3
= B0 ∥ B1 ∥ B2 ∥ B3
= ⟨‘a’, ‘b’, ‘c’, ‘d’⟩

3. The so-called comprehension (or builder) notation allows generation of a sequence using a rule. Consider

F = ⟨x | 4 ≤ x < 8⟩ = ⟨4, 5, 6, 7⟩

for example: the comprehension includes a) an output expression (i.e., x) and b) a rule (or predicate, i.e.,
4 ≤ x < 8) that limits the instances of variables considered when forming the output. Informally, you
might read this example as “all x such that x is between 4 and 7”.
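A comprehension also has a natural computational reading: loop over candidate instances of the variable, apply the rule as a filter, and emit the output expression for each instance that survives. In the C sketch below, the candidate range 0 ≤ x < 16 is an arbitrary assumption standing in for “all instances of x”:

#include <stdio.h>

int main( void ) {
  /* F = ⟨ x | 4 ≤ x < 8 ⟩ = ⟨ 4, 5, 6, 7 ⟩ */
  int F[ 8 ]; int n = 0;

  for( int x = 0; x < 16; x++ ) { /* candidate instances of x */
    if( 4 <= x && x < 8 ) {       /* the rule, or predicate   */
      F[ n++ ] = x;               /* the output expression    */
    }
  }

  for( int i = 0; i < n; i++ ) printf( "%d ", F[ i ] ); /* 4 5 6 7 */
  printf( "\n" );
  return 0;
}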

git # 8b6da880 @ 2023-09-27 20


© Daniel Page ⟨dan@phoo.org⟩

1.3 Collections: sets


Definition 1.10. A set is an unordered collection of elements; the elements may only occur once (otherwise we have a
bag or multi-set), and can normally be of any (but homogeneous) type.
The size or cardinality of a set, denoted |X| (or, alternatively, #X elsewhere), is the number of elements it contains. If
the element x is in (resp. not in) the set X, we say x is a member of X (resp. not a member) or write x ∈ X (resp. x ∉ X).

As an aside, this suggests the elements can potentially be other sets. Russell’s paradox, a discovery by Bertrand
Russell in 1901, describes an issue with formal set theory that stems from this fact. In a sense, the paradox is a
rephrasing of the liar paradox seen earlier. Consider A, the set of all sets which do not contain themselves: the
question is, does A contain itself? If it does, it should not be in A by definition but it is; if it does not, it should
be in the set A by definition but it is not.

1.3.1 Basic definition


Example 1.15. Consider the set of integers between two and eight (inclusive), which we can define as

A = {2, 3, 4, 5, 6, 7, 8}.

In this case, we conclude, for example, that |A| = 7, 2 ∈ A, and 9 ∉ A (i.e., 2 is a member, but 9 is not a member,
of A). Notice that, unlike a sequence, because the order of elements is irrelevant, it makes no sense to refer to
them via an index: Ai implies there is some specific i-th element, but, without a specific order, which element it
refers to is unclear. However, also note the same fact means if we define

B = {8, 7, 6, 5, 4, 3, 2}

then we can conclude A = B.

1.3.2 Operations
Definition 1.11. A sub-set, say Y, of a set X is such that for every y ∈ Y we have that y ∈ X. This is denoted Y ⊆ X.
Conversely, we can say X is a super-set of Y and write X ⊇ Y.
From this definition, it follows that every set is a valid sub-set and super-set of itself and, therefore, that X = Y iff.
X ⊆ Y and Y ⊆ X. If X ≠ Y we use the terms proper sub-set and proper super-set, and so write Y ⊂ X and X ⊃ Y
respectively.

Definition 1.12. For sets X and Y, we have that

• the union of X and Y is X ∪ Y = {x | x ∈ X ∨ x ∈ Y},

• the intersection of X and Y is X ∩ Y = {x | x ∈ X ∧ x ∈ Y},

• the difference of X and Y is X − Y = {x | x ∈ X ∧ x ∉ Y}, and

• the complement of X is X̅ = {x | x ∈ U ∧ x ∉ X}.

We say X and Y are disjoint (or mutually exclusive) if X ∩ Y = ∅. Note also that the difference operation can be
rewritten in terms of the complement, as X − Y = X ∩ Y̅.

Definition 1.13. The union and intersection operations preserve a law of cardinality called the principle of inclusion,
which states
|A ∪ B| = |A| + |B| − |A ∩ B|.
This property has a simple intuition, in that elements in both A and B will be counted twice by |A| and |B|; this is corrected
via the last term (i.e., via |A ∩ B|).

Definition 1.14. The power set of a set X, denoted P(X), is the set of every possible sub-set of X. Note that ∅ is a member
of all power sets.

On first reading, these definitions can seem quite abstract. However, we have another tool at our disposal
which describes what they mean in a more concrete, visual way. This tool is a Venn diagram, named after
mathematician John Venn who invented the concept in 1881. The idea is that sets are represented by regions
drawn inside a frame that implicitly represents the universal set U. By placing the regions inside each other
and overlapping their boundaries, we can describe most set-related concepts very easily.

git # 8b6da880 @ 2023-09-27 21


© Daniel Page ⟨dan@phoo.org⟩

Figure 1.1: A collection of Venn diagrams for standard set operations: (a) A ∪ B, (b) A ∩ B, (c) A − B, and (d) A̅.

Figure 1.2: An example Venn diagram showing membership of two sets.

git # 8b6da880 @ 2023-09-27 22


© Daniel Page ⟨dan@phoo.org⟩

Example 1.16. Figure 1.1 includes four Venn diagrams which describe the union, intersection, difference, and
complement operations: there is a shaded region representing members of each resulting set. For example, in
the diagram for A ∪ B the shaded region covers all of the sets A and B: the result contains all elements in either
A or B or both.

Example 1.17. Consider the sets


A = {1, 2, 3, 4}
B = {3, 4, 5, 6}
where the universal set is
U = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}.
Recalling that elements within a given region are members of that set, Figure 1.2 describes several cases. Notice
that

1. the union of A and B is A ∪ B = {1, 2, 3, 4, 5, 6}, i.e., elements which are either members of A or B or both;
note that the elements 3 and 4 do not appear twice because said result is a set,

2. the intersection of A and B is A ∩ B = {3, 4}, i.e., elements that are members of both A and B,

3. the difference between A and B is A − B = {1, 2}, i.e., elements that are members of A but not also members
of B, and

4. the complement of A is A̅ = {5, 6, 7, 8, 9, 10}, i.e., elements that are not members of A.

We can also use this example to verify that the principle of inclusion holds: given |A| = 4 and |B| = 4, checking
the above shows |A ∪ B| = 6 and |A ∩ B| = 2 so by the principle of inclusion we have 6 = 4 + 4 − 2.
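Given a small, finite universal set such as U = {1, 2, . . . , 10}, one concrete representation (which foreshadows the digital logic of Chapter 2) is a bit-mask: bit i of a machine word is 1 iff. the element i is a member. Union, intersection, difference, and complement then map directly onto bit-wise C operators, and the principle of inclusion can be checked mechanically; a sketch, under that representational assumption:

#include <stdio.h>

typedef unsigned int set_t;      /* bit i is 1 iff. element i is a member */

#define ELEM(x)  ( 1u << (x) )
#define U_MASK   0x7FEu          /* bits 1 through 10 set, i.e., U itself */

int card( set_t X ) {            /* |X|: count the 1 bits                 */
  int n = 0;
  for( ; X != 0; X >>= 1 ) n += X & 1;
  return n;
}

int main( void ) {
  set_t A = ELEM( 1 ) | ELEM( 2 ) | ELEM( 3 ) | ELEM( 4 );
  set_t B = ELEM( 3 ) | ELEM( 4 ) | ELEM( 5 ) | ELEM( 6 );

  set_t uni = A | B;             /* A ∪ B           = {1,2,3,4,5,6}       */
  set_t inr = A & B;             /* A ∩ B           = {3,4}               */
  set_t dif = A & ~B;            /* A − B           = {1,2}               */
  set_t cmp = U_MASK & ~A;       /* complement of A = {5,6,7,8,9,10}      */

  /* the principle of inclusion: |A ∪ B| = |A| + |B| − |A ∩ B|            */
  printf( "%d = %d + %d - %d\n", card( uni ), card( A ), card( B ), card( inr ) );
  printf( "|A - B| = %d, |complement of A| = %d\n", card( dif ), card( cmp ) );
  return 0;
}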

1.3.3 Products
Definition 1.15. The Cartesian product (or cross product) of n sets, say X0 , X1 , . . . , Xn−1 , is defined as

X0 × X1 × · · · × Xn−1 = {⟨x0 , x1 , . . . , xn−1 ⟩ | x0 ∈ X0 ∧ x1 ∈ X1 ∧ · · · ∧ xn−1 ∈ Xn−1 }.

In the most simple case of n = 2, the Cartesian product X0 × X1 is the set of all possible pairs where the first item in the
pair is a member of X0 and the second item is a member of X1 .

Definition 1.16. The Cartesian product of a set X with itself n times is denoted X^n; for completeness, we define X^0 = ∅
and X^1 = X. A special-case of this notation is X^*, which applies the Kleene star operator: this captures the Cartesian
product of X with itself a finite number of times (i.e., zero or more); a more precise definition is therefore

X^* = {⟨x0, x1, . . . , xn−1⟩ | n ≥ 0, xi ∈ X},

which is sometimes extended to include a so-called Kleene plus, st.

X^+ = {⟨x0, x1, . . . , xn−1⟩ | n ≥ 1, xi ∈ X}.

Example 1.18. Imagine we have the set A = {0, 1}. The Cartesian product of A with itself is

A × A = A^2 = {⟨0, 0⟩, ⟨0, 1⟩, ⟨1, 0⟩, ⟨1, 1⟩}.

That is, the pairs in A × A (or A^2, if you prefer) represent all possible sequences a) whose length is two, and b)
whose elements are members of A.
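Enumerating a Cartesian product is equally direct: one nested loop per copy of the set, emitting each pair in turn (an n-fold product generalises to n nested loops, or recursion). A sketch for A = {0, 1}:

#include <stdio.h>

int main( void ) {
  int A[ 2 ] = { 0, 1 };

  /* A × A = A^2: each iteration emits one pair ⟨x0, x1⟩ */
  for( int i = 0; i < 2; i++ ) {
    for( int j = 0; j < 2; j++ ) {
      printf( "<%d, %d>\n", A[ i ], A[ j ] );
    }
  }
  return 0;
}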

1.3.4 Advanced definition and short-hands


Definition 1.17. Some sets are hard (or impossible) to define using the notation used so far, and therefore need some
special treatment:

• The set ∅, called the null set or empty set, contains no elements: it is empty, meaning |∅| = 0. Note that ∅ is a set
not an element: one cannot write the empty set as {∅} since this is the set with one element, that element being the
empty set itself.

• The contents of the set U, called the universal set, depends on the context. Roughly speaking, it contains every
element from the problem being considered.

git # 8b6da880 @ 2023-09-27 23


© Daniel Page ⟨dan@phoo.org⟩

It can make sense to avoid enumerating a set completely, which is the approach used above: explicitly including
each element can become laborious, error prone, or simply inconvenient. The examples below show various
short-hands to address this problem:

1. Where there is no ambiguity, ellipses (or continuation dots) are allowed to replace one or more elements.
For example, we might rewrite the set A as

A = {2, 3, . . . , 7, 8}

with the ellipsis representing the sub-set {4, 5, 6}. In fact, this approach is sometimes required. Imagine we
want to define a set of even integers which are greater than or equal to two: this set has an infinite size,
so we need to define it as
C = {2, 4, 6, 8, . . .}.

2. The so-called comprehension (or builder) notation allows generation of a set using a rule. Consider

D = {x | f (x)}.

for example: the comprehension includes a) an output expression (i.e., x) and b) a rule (or predicate,
i.e., f (x)) that limits the instances of variables considered when forming the output. Informally, you
might read this example as “all x such that f (x) = true”. Using the same idea, we could rewrite previous
examples as
A = {x | 2 ≤ x ≤ 8},

and
C = {x | x > 0 ∧ x ≡ 0 (mod 2)}

and so define the same sets we defined explicitly.

Definition 1.18. Several useful sets that relate to numbers can be defined:

• The integers are whole numbers which can be positive or negative and also include zero; this set is denoted by

Z = {. . . , −3, −2, −1, 0, +1, +2, +3, . . .}

or alternatively
Z = {0, ±1, ±2, ±3, . . .}.

• The natural numbers are whole numbers which are non-negative (i.e., positive or zero); they are denoted by the set

N = {0, 1, 2, 3, . . .}.

and represent a sub-set of Z.

• The binary numbers are simply one and zero, i.e.,

B = {0, 1},

and represent a sub-set of N.

• The rational numbers are those which can be expressed in the form x/y, where x and y are both integers and
termed the numerator and denominator. This set is denoted

Q = {x/y | x ∈ Z ∧ y ∈ Z ∧ y ≠ 0}

where we disallow a value of y = 0 to avoid problems. Clearly the set of rational numbers is a super-set of Z, N,
and B, since, for example, we can write x/1 to convert any x ∈ Z into a member of Q. However, not all numbers are
rational: some are irrational in the sense that it is impossible to find a x and y such that they exactly represent the
required result. Examples include the value of π which is approximated by, but not exactly equal to, 22/7.


1.4 Collections: some additional special-cases


1.4.1 Tuples
Definition 1.19. It is common to use the term tuple as a synonym for sequence: a sequence of n elements is an n-tuple,
or simply a tuple if the number of elements is irrelevant. Note that the special cases

n = 2 ↝ 2-tuple ↝ pair
n = 3 ↝ 3-tuple ↝ triple

have intuitive names.

Note that, from here on, we use the terms sequence and tuple as an informal way to distinguish between cases
where elements are, respectively, a) of (potentially) homogeneous and heterogeneous type, and/or b) mutable
(i.e., can be altered) and immutable (i.e., cannot be altered).

Example 1.19. Noting the bracketing style used to differentiate it from a sequence, we can define an example
2-tuple or pair as
A = (4, ‘f’)

In this case, the elements A0 = 4 and A1 = ‘f’ clearly have different types: the first is a number and the second
is a character.

1.4.2 Strings
Definition 1.20. An alphabet is a non-empty set of symbols.

Definition 1.21. A string X wrt. some alphabet Σ is a sequence, of finite length, whose elements are members of Σ, i.e.,

X = ⟨X0 , X1 , . . . , Xn−1 ⟩

for some n st. Xi ∈ Σ for 0 ≤ i < n; if n is zero, we term X the empty string and denote it ϵ. It can be useful, and
is common, to write the elements in a human-readable form termed a string literal: this basically just means writing them
from right-to-left without any associated notation (e.g., brackets or commas).

Definition 1.22. A language is a set of strings.

Example 1.20. If
Σ = {0, 1}

then the strings of length n = 2 (left), and the corresponding literal (right), are as follows:

⟨0, 0⟩ ≡ 00
⟨1, 0⟩ ≡ 01
⟨0, 1⟩ ≡ 10
⟨1, 1⟩ ≡ 11

Example 1.21. If
Σ = {‘a’, ‘b’, . . . , ‘z’}

then the strings of length n = 2 (left), and the corresponding literal (right), are as follows:

⟨‘a’, ‘a’⟩ ≡ aa
⟨‘b’, ‘a’⟩ ≡ ab
..
.
⟨‘a’, ‘b’⟩ ≡ ba
⟨‘b’, ‘b’⟩ ≡ bb
..
.
⟨‘z’, ‘z’⟩ ≡ zz


1.5 Functions
Definition 1.23. If X and Y are sets, a function f from X to Y is a process that maps each element of X to an element of
Y. We write this as
f :X→Y

where X is termed the domain of f and Y is the codomain of f . For an element x ∈ X, which we term the pre-image,
there is only one y = f (x) ∈ Y which is termed the image of x. Finally, the set

{y | y = f (x) ∧ x ∈ X ∧ y ∈ Y}

which is all possible results, is termed the range of f and is always a sub-set of the codomain.

From this definition it might seem as though we can only have functions with one input and one output.
However, we are perfectly entitled to use structured sets; this means we can use a Cartesian product as the domain.
For example, we can define a function
f :A×A→B

which takes elements from the Cartesian product A × A as input, and produces an element of B as output. So
since the inputs are of the form ⟨x, y⟩ ∈ A × A, f takes two input values “packaged up” as a single pair.

Example 1.22. Consider a function Inv which takes an integer x as input, and produces the rational number
1/x as output:



Inv :  Z → Q
       x ↦ 1/x

Note that here we write the function signature, which defines the domain and codomain of Inv, inline with
the definition of the function behaviour. This is simply a short-hand for writing the function signature

Inv : Z → Q

and function behaviour


Inv(x) ↦ 1/x

separately. In either case the domain of Inv is Z, because it accepts an integer as input; the codomain is Q,
because it produces a rational number as output. If we take an integer and apply the function to get something
like Inv(2) = 1/2, we have that 1/2 is the image of 2 or conversely 2 is the pre-image of 1/2 under Inv.

Example 1.23. Consider the function




Max :  Z × Z → Z
       ⟨x, y⟩ ↦ x  if x > y
                y  otherwise

This is the maximum function on integers; it takes two integers as input and produces an integer, the maximum
of the inputs, as output. So if we take the pair of integers ⟨2, 4⟩ say, and then apply the function, we get
Max(2, 4) = 4. In this case, the domain of Max is Z × Z and the codomain is Z; the integer 4 is the image of the
pair ⟨2, 4⟩ under Max.
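Though we are getting ahead of ourselves somewhat, it can help to see such a definition mirrored in C: the following minimal sketch (hypothetical, with the int type standing in as a finite approximation of Z, and the input pair ⟨x, y⟩ passed as two parameters) captures Max:

#include <stdio.h>

/* A finite approximation of Max : Z x Z -> Z: the int type stands in
   for Z, and the input pair <x,y> becomes two separate parameters.   */
int max_int( int x, int y ) {
  return ( x > y ) ? x : y;
}

int main() {
  printf( "%d\n", max_int( 2, 4 ) );  /* prints 4, i.e., Max(2,4) = 4 */
  return 0;
}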

1.5.1 Composition
Definition 1.24. Given two functions f : X → Y and g : Y → Z, the composition of f and g is denoted

g ◦ f : X → Z.

The notation g ◦ f should be read as “apply g to the result of applying f ”. That is, given some input x ∈ X, this
composition is equivalent to applying y = f (x) and then z = g(y) to get the result z ∈ Z. More formally, we have

(g ◦ f )(x) = g( f (x)).


1.5.2 Properties
Definition 1.25. For a given function f , we say that f is

• surjective if the range equals the codomain, i.e., there are no elements in the codomain which do not have a
pre-image in the domain,

• injective if no two elements in the domain have the same image in the range, and

• bijective if the function is both surjective and injective, i.e., every element in the codomain has exactly one
pre-image in the domain.

Using the examples above, we clearly have that Inv is not surjective but Max is. This follows because we
can construct a rational 2/3 which does not have an integer pre-image under Inv so the function cannot be
surjective. Equally, for any integer x in the range of Max there is always a pair ⟨x, y⟩ in the domain such that
x > y so Max is surjective, in fact there are lots of them since Z is infinite in size! In the same way, we have that
Inv is injective but Max is not. Only one pre-image x maps to the value 1/x in the range under Inv but there
are multiple pairs ⟨x, y⟩ which map to the same image under Max, for example 4 is the image of both ⟨1, 4⟩ and
⟨2, 4⟩ under Max.

Definition 1.26. The identity function I on a set X is defined by





I :  X → X
     x ↦ x

so that it maps all elements to themselves. Given two functions f and g defined by f : X → Y and g : Y → X, if g ◦ f is
the identity function on set X and f ◦ g is the identity on set Y, then f is the inverse of g and g is the inverse of f . We
denote this by f = g⁻¹ and g = f⁻¹. If a function f has an inverse, we hence have f⁻¹ ◦ f = I.

The inverse of a function maps elements from the codomain back into the domain, reversing the original
function. It is easy to see that not all functions have an inverse. In particular, if a function is not injective there
will be more than one potential pre-image for the inverse of any image; this suggests we cannot sensibly map
from the codomain back into the domain. The Inv function is another, more concrete example: some values
in the codomain, such as 1/2, do have a pre-image, namely 2, yet others, such as 2/3, do not. In the latter case the
only candidate, 3/2, is not an integer, i.e., not a member of the domain Z, so we cannot map it from the codomain
back into the domain. Put another way, Inv, as we have defined it at least, has no inverse.

Example 1.24. Consider the successor function on integers





Succ :  Z → Z
        x ↦ x + 1

which takes an integer x as input and produces the successor (or next) integer x + 1 as output. This function is
bijective, since the codomain and range are the same and no two integers have the same successor. As a result,
the inverse is easy to describe as



Pred :  Z → Z
        x ↦ x − 1

which is the predecessor function: it takes an integer x as input and produces x − 1 as output. To see that
Succ⁻¹ = Pred and Pred⁻¹ = Succ note that

(Pred ◦ Succ)(x) = (x + 1) − 1 = x

which is the identity function, and conversely that

(Succ ◦ Pred)(x) = (x − 1) + 1 = x

which is also the identity function.
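As a quick sanity check, a short, hypothetical C sketch can test both compositions exhaustively over a small sub-range of int (which again stands in for Z, ignoring overflow at the extremes of the representable range):

#include <assert.h>

int succ( int x ) { return x + 1; }  /* approximates Succ */
int pred( int x ) { return x - 1; }  /* approximates Pred */

int main() {
  for( int x = -1000; x <= 1000; x++ ) {
    assert( pred( succ( x ) ) == x );  /* (Pred o Succ)(x) = x */
    assert( succ( pred( x ) ) == x );  /* (Succ o Pred)(x) = x */
  }
  return 0;
}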


1.5.3 Relations
Definition 1.27. Informally, a binary relation f on a set X is like a propositional function which takes members of the
set as input and “filters” them to produce an output. As a result, for a set X the relation f forms a sub-set of X × X. For
a given set X and a binary relation f , we say f is

• reflexive if f (x, x) = true for all x ∈ X,

• symmetric if f (x, y) = true implies f (y, x) = true for all x, y ∈ X, and

• transitive if f (x, y) = true and f (y, z) = true implies f (x, z) = true for all x, y, z ∈ X.

If f is reflexive, symmetric and transitive, then we call it an equivalence relation.

Example 1.25. Consider a set A = {1, 2, 3, 4}, whose Cartesian product is


 


 ⟨1, 1⟩, ⟨1, 2⟩, ⟨1, 3⟩, ⟨1, 4⟩, 


 ⟨2, 1⟩, ⟨2, 2⟩, ⟨2, 3⟩, ⟨2, 4⟩, 
A×A= .

 



 ⟨3, 1⟩, ⟨3, 2⟩, ⟨3, 3⟩, ⟨3, 4⟩, 



⟨4, 1⟩, ⟨4, 2⟩, ⟨4, 3⟩, ⟨4, 4⟩

 

Imagine we define a function




Equ :  Z × Z → {false, true}
       ⟨x, y⟩ ↦ true   if x = y
                false  otherwise

which tests whether two inputs are equal. Using the function we can form a sub-set of A × A called AEqu, for
example, by “filtering out” all but the pairs ⟨x, y⟩ st. Equ(x, y) = true to get

AEqu = {⟨1, 1⟩, ⟨2, 2⟩, ⟨3, 3⟩, ⟨4, 4⟩}.

For members of A, say x, y, z ∈ A,

1. Equ(x, x) = true, so the relation is reflexive,

2. if Equ(x, y) = true then Equ(y, x) = true, so the relation is symmetric, and

3. if Equ(x, y) = true and Equ(y, z) = true then Equ(x, z) = true, so the relation is transitive

and hence an equivalence relation. Now imagine we define another function




Lth :  Z × Z → {false, true}
       ⟨x, y⟩ ↦ true   if x < y
                false  otherwise

which tests whether one input is less than another. Taking the same approach as above, we can form

ALth = {⟨1, 2⟩, ⟨1, 3⟩, ⟨1, 4⟩, ⟨2, 3⟩, ⟨2, 4⟩, ⟨3, 4⟩}

of all pairs (x, y) with x, y ∈ A st. Lth(x, y) = true. Now, for members of A, say x, y, z ∈ A,

1. Lth(x, x) = false, so the relation is not reflexive (it is irreflexive),

2. if Lth(x, y) = true then Lth(y, x) = false, so the relation is not symmetric (it is anti-symmetric), but

3. if Lth(x, y) = true and Lth(y, z) = true then Lth(x, z) = true, so the relation is transitive.
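Checking properties like these by brute-force is straightforward; the following hypothetical C sketch (with lth playing the role of Lth) simply tests every x, y, and z drawn from A:

#include <stdbool.h>
#include <stdio.h>

/* Lth as a Boolean-valued function on (a finite approximation of) Z x Z. */
bool lth( int x, int y ) {
  return x < y;
}

int main() {
  int A[ 4 ] = { 1, 2, 3, 4 };
  bool refl = true, symm = true, trans = true;

  for( int i = 0; i < 4; i++ ) {
    /* reflexivity demands f(x,x) = true for every x              */
    if( !lth( A[ i ], A[ i ] ) ) refl = false;
    for( int j = 0; j < 4; j++ ) {
      /* symmetry demands f(x,y) = true implies f(y,x) = true     */
      if( lth( A[ i ], A[ j ] ) && !lth( A[ j ], A[ i ] ) ) symm = false;
      for( int k = 0; k < 4; k++ ) {
        /* transitivity demands f(x,y) = true and f(y,z) = true
           imply f(x,z) = true                                    */
        if( lth( A[ i ], A[ j ] ) && lth( A[ j ], A[ k ] ) &&
           !lth( A[ i ], A[ k ] ) ) trans = false;
      }
    }
  }

  /* prints reflexive=0 symmetric=0 transitive=1 */
  printf( "reflexive=%d symmetric=%d transitive=%d\n", refl, symm, trans );
  return 0;
}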


1.6 Boolean algebra


Most people encounter elementary algebra fairly early on at school. Even if the name is unfamiliar, the basic
idea should be: one has

• a set of values, e.g., Z,

• a set of operators, e.g., +,

• a set of relations, e.g., =, and

• a set of axioms which dictate what the operators and relations mean and how they work.

Again, you may not know what these axioms are called, but you probably do know how they work. For
example, given x, y, z ∈ Z, you might know a) we can write x + (y + z) = (x + y) + z, i.e., say that addition is
associative, or b) we can write x · 1 = x, i.e., say that the multiplicative identity of x is 1. In reality, we can be
much more general than this: when we discuss “an” algebra, all we really mean is a set of values for which
there is a well defined set of operators, relations and axioms; abstract algebra is basically concerned with sets
of values that are, potentially, not numbers.

Definition 1.28. An abstract algebra includes

• a set of values, say X,

• a set of binary operators


⊙ : X × X → X,

• a set of unary operators


⊘ : X → X,

• a set of binary relations


⊖ : X × X → {false, true},
and

• a set of axioms which dictate what the operators and relations mean and how they work.

In the early 1840s, mathematician George Boole put this generality to good use by combining (or, in fact,
unifying) concepts in logic and set theory: the result forms Boolean algebra [1]. Put (rather too) simply, Boole
saw that working with a logic expression is much the same as working with an arithmetic expression, and
reasoned that the axioms of the latter should apply to the former as well. Based on what we already know, for
example, 0 and false and ∅ are all sort of equivalent, as are 1 and true and U; likewise, x ∧ y and x ∩ y are sort
of equivalent, as are x ∨ y and x ∪ y, and ¬x and the set complement x̄. More formally, we can see that the
identity axiom applies in the same way:
x ∨ false = x x ∧ true = x
x∪∅ = x x∩U = x
x+0 = x x·1 = x
Ironically, this was viewed as somewhat obscure; Boole himself did not necessarily regard logic directly as a
mathematical concept. It was not until 1937 that Claude Shannon, then a student of Electrical Engineering and
Mathematics, saw the potential of using Boolean algebra to represent and manipulate digital information [7].
This insight is fundamentally important, essentially allowing a “link” between theory (i.e., Mathematics) and
practice (i.e., physical circuits that we can build).

Definition 1.29. Putting everything together produces the following definition for Boolean algebra. Consider the set
B = {0, 1} on which there are two binary operators


∧ :  B × B → B
     ⟨x, y⟩ ↦ 0  if x = 0 and y = 0
              0  if x = 0 and y = 1
              0  if x = 1 and y = 0
              1  if x = 1 and y = 1

and 

∨ :  B × B → B
     ⟨x, y⟩ ↦ 0  if x = 0 and y = 0
              1  if x = 0 and y = 1
              1  if x = 1 and y = 0
              1  if x = 1 and y = 1

and a unary operator




¬ :  B → B
     x ↦ 1  if x = 0
         0  if x = 1

AND, OR and NOT respectively; they are governed by the following axioms

commutativity x∧y ≡ y∧x


association (x ∧ y) ∧ z ≡ x ∧ (y ∧ z)
distribution x ∧ (y ∨ z) ≡ (x ∧ y) ∨ (x ∧ z)
identity x∧1 ≡ x
null x∧0 ≡ 0
idempotency x∧x ≡ x
inverse x ∧ ¬x ≡ 0
absorption x ∧ (x ∨ y) ≡ x
de Morgan ¬(x ∧ y) ≡ ¬x ∨ ¬y

commutativity x∨y ≡ y∨x


association (x ∨ y) ∨ z ≡ x ∨ (y ∨ z)
distribution x ∨ (y ∧ z) ≡ (x ∨ y) ∧ (x ∨ z)
identity x∨0 ≡ x
null x∨1 ≡ 1
idempotency x∨x ≡ x
inverse x ∨ ¬x ≡ 1
absorption x ∨ (x ∧ y) ≡ x
de Morgan ¬(x ∨ y) ≡ ¬x ∧ ¬y

equivalence x≡y ≡ (x ⇒ y) ∧ (y ⇒ x)
implication x⇒y ≡ ¬x ∨ y
involution ¬¬x ≡ x

Note that the ∧ and ∨ operations in Boolean algebra behave in a similar way to · and + in elementary algebra: as such,
they are sometimes referred to as “product” and “sum” operations (and denoted · and + as a result).
Definition 1.30. In line with propositional logic, it is common to add a third binary operator called XOR:


⊕ :  B × B → B
     ⟨x, y⟩ ↦ 0  if x = 0 and y = 0
              1  if x = 0 and y = 1
              1  if x = 1 and y = 0
              0  if x = 1 and y = 1

More generally, XOR is an example of a derived operator, a name which hints at the fact it is a short-hand derived from
operators we already have. Put another way, because

x ⊕ y ≡ (¬x ∧ y) ∨ (x ∧ ¬y),

XOR can be defined in terms of AND, OR and NOT. Two other examples, which will be useful later, are

• “NOT-AND” or NAND, which is denoted and defined as

x ⊼ y ≡ ¬(x ∧ y),

and


• “NOT-OR” or NOR, which is denoted and defined as

x ⊽ y ≡ ¬(x ∨ y).

Definition 1.31. A functionally complete (or universal) set of Boolean operators is st. every possible truth table can
be described by combining the constituent members into a Boolean expression. For example, the sets {¬, ∧} and {¬, ∨} are
functionally complete.

In 1921, Emil Post developed [5] a set of necessary and sufficient conditions for such a description to be valid
(i.e., a method to prove whether a given set is or is not functionally complete); where such a set is singleton,
i.e., contains one operator only, that operator is termed a Sheffer function [8] (after Henry Sheffer, who, during
1912, independently rediscovered work of 1880 by Charles Sanders Peirce). For example, the singleton sets
{ ⊼ } and { ⊽ } are functionally complete, meaning NAND and NOR can both be described as Sheffer functions.

Definition 1.32. Certain operators (and hence axioms) are termed monotone: this means changing an operand either
leaves the result unchanged, or that it always changes the same way as the operand. Conversely, other operators are
termed non-monotone when these conditions do not hold.

Example 1.26. We can describe

x ∧ 0

as monotone, because changing x does not change the result (which is always 0); the expression

x ∧ 1

is also monotone, but for the second reason: notice that if x = 0 then the result is 0 whereas if x = 1 then the
result is 1, st. changing x from 0 to 1 (resp. from 1 to 0) changes the result in the same way.

Definition 1.33. The fact there are AND and OR forms of most axioms hints at a more general underlying principle.
Consider a Boolean expression e: the principle of duality states that the dual expression eᴰ is formed by

1. leaving each variable as is,

2. swapping each ∧ with ∨ and vice versa, and

3. swapping each 0 with 1 and vice versa.

Of course e and eᴰ are different expressions, and clearly not equivalent; if we start with some e ≡ f however, then we do
still get eᴰ ≡ f ᴰ.
As an example, consider axioms for

1. distribution, e.g., if
e = x ∧ (y ∨ z) ≡ (x ∧ y) ∨ (x ∧ z)
then
eᴰ = x ∨ (y ∧ z) ≡ (x ∨ y) ∧ (x ∨ z)
and

2. identity, e.g., if
e=x∧1≡x
then
eᴰ = x ∨ 0 ≡ x.

Definition 1.34. The de Morgan axiom can be turned into a more general principle. Consider a Boolean expression e: the
principle of complements states that the complement expression ¬e is formed by

1. swapping each variable x with the complement ¬x,

2. swapping each ∧ with ∨ and vice versa, and

3. swapping each 0 with 1 and vice versa.


As an example, consider that if


e = x ∧ y ∧ z,
then by the above we should find
f = ¬e = (¬x) ∨ (¬y) ∨ (¬z).
Proof:
x y z ¬x ¬y ¬z e f
0 0 0 1 1 1 0 1
0 0 1 1 1 0 0 1
0 1 0 1 0 1 0 1
0 1 1 1 0 0 0 1
1 0 0 0 1 1 0 1
1 0 1 0 1 0 0 1
1 1 0 0 0 1 0 1
1 1 1 0 0 0 1 0

1.6.1 Manipulation
Saying we have manipulated an expression just means we have transformed it from one form to another; when
done correctly, this should imply the original and alternative, transformed forms are equivalent. Often this is
presented as a derivation, or sequence of steps, each of which relates to an axiom or assumption (so is assumed valid
by definition).

Example 1.27. Consider the (supposed) equality

(a ∧ b ∧ c) ∨ (¬a ∧ b) ∨ (a ∧ b ∧ ¬c) = b,

for example, which we can prove is valid via the derivation

(a ∧ b ∧ c) ∨ (¬a ∧ b) ∨ (a ∧ b ∧ ¬c)
  = (a ∧ b ∧ c) ∨ (a ∧ b ∧ ¬c) ∨ (¬a ∧ b)   (commutativity)
  = ((a ∧ b) ∧ (c ∨ ¬c)) ∨ (¬a ∧ b)         (distribution)
  = ((a ∧ b) ∧ 1) ∨ (¬a ∧ b)                (inverse)
  = (a ∧ b) ∨ (¬a ∧ b)                      (identity)
  = b ∧ (a ∨ ¬a)                            (distribution)
  = b ∧ 1                                   (inverse)
  = b                                       (identity)

Of course we might employ a brute-force approach instead. If we write a truth table for the left- and right-hand
sides, this allows us to compare them: if the outputs match in all rows, we can conclude the left- and right-hand
sides are equivalent. For example,

a b c t0 = a ∧ b ∧ c t1 = ¬a ∧ b t2 = a ∧ b ∧ ¬c t0 ∨ t1 ∨ t2
0 0 0 0 0 0 0
0 0 1 0 0 0 0
0 1 0 0 1 0 1
0 1 1 0 1 0 1
1 0 0 0 0 0 0
1 0 1 0 0 0 0
1 1 0 0 0 1 1
1 1 1 1 0 0 1

shows the left- and right-hand sides are equivalent, as expected. Of course if there were more variables, we
would need to enumerate all possible values of each one. Our truth table would grow, and, at some point,
the derivation-type approach starts to become more attractive: we achieve the same outcome, but without
brute-force enumeration.
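To make the brute-force approach concrete, the following hypothetical C sketch enumerates all 2³ = 8 assignments to a, b, and c, and checks that the left- and right-hand sides match in every row:

#include <assert.h>
#include <stdbool.h>

int main() {
  for( int a = 0; a <= 1; a++ ) {
    for( int b = 0; b <= 1; b++ ) {
      for( int c = 0; c <= 1; c++ ) {
        /* left-hand side: (a AND b AND c) OR (NOT a AND b) OR (a AND b AND NOT c) */
        bool lhs = ( a && b && c ) || ( !a && b ) || ( a && b && !c );
        /* right-hand side: b */
        bool rhs = b;
        assert( lhs == rhs );  /* the outputs match in all 8 rows */
      }
    }
  }
  return 0;
}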

Example 1.28. Another motivation for manipulating a given expression is to produce an alternative with some
goal or metric in mind; a common metric to use is the number of operators each expression uses, i.e., how
simple they are (which is one way to judge their evaluation cost), with the task then termed simplification.
Consider the exclusive-or operator, i.e., an expression x ⊕ y, which we can write as the more complicated
expression

(y ∧ ¬x) ∨ (x ∧ ¬y)


An aside: how many n-input Boolean functions are there?

To be more concrete, imagine we are interested in the function

f : Bⁿ → B.

Note that each of the n inputs can obviously be assigned one of two values, namely 0 or 1, so there are 2ⁿ
possible assignments to n inputs. For example, if f were to have n = 1 input, say x, there would be 2¹ = 2
possible assignments because x can either be 0 or 1. In the same way, for n = 2 inputs, say x and y, there are
2² = 4 possible assignments: we can have

x=0 y=0
x=0 y=1
x=1 y=0
x=1 y=1

This is why a truth table for n inputs will have 2ⁿ rows: each row details one assignment to the inputs, and the
associated output.
So, how many functions are there? A function with n inputs is specified by a truth table with 2ⁿ rows; each
row includes an output that is assigned 0 or 1, depending on exactly which function the truth table describes.
So to count how many functions there are, we can just count how many possible assignments there are to the
2ⁿ outputs. The correct answer is 2^(2ⁿ).

or the less complicated (i.e., simpler) expression

(x ∨ y) ∧ ¬(x ∧ y).

One can prove these are equivalent by writing truth tables for them, as we did above. To do so, however, we
need the expressions in the first place: how did we get the alternative from the original one?
The answer is we start with one expression, and (somehow intelligently) apply axioms to move step-by-
step toward the other. For example, to do so more easily, notice that we can manipulate each term in the first
expression whose form is p ∧ ¬q as follows:

(p ∧ ¬q)
= (p ∧ ¬q) ∨ 0 (identity)
= (p ∧ ¬q) ∨ (p ∧ ¬p) (inverse)
= p ∧ (¬p ∨ ¬q) (distribution)

This introduces a new rule that we can make use of; since it was derived from axioms we assume are valid, we
can assume it is valid as well. Using it, we can rewrite the original expression as

(y ∧ ¬x) ∨ (x ∧ ¬y)
= (x ∧ (¬x ∨ ¬y)) ∨ (y ∧ (¬x ∨ ¬y)) (p ∧ ¬q rule above)
= (x ∨ y) ∧ (¬x ∨ ¬y) (distribution)
= (x ∨ y) ∧ ¬(x ∧ y) (de Morgan)

which gives us the alternative we are looking for, noting it requires 4 operators rather than 5.
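Again we can check the claim exhaustively; a minimal, hypothetical C sketch evaluates the original expression, the simpler alternative, and the built-in XOR operator for each of the 4 possible inputs:

#include <assert.h>
#include <stdbool.h>

int main() {
  for( int x = 0; x <= 1; x++ ) {
    for( int y = 0; y <= 1; y++ ) {
      bool orig = ( y && !x ) || ( x && !y );  /* (y AND NOT x) OR (x AND NOT y) */
      bool simp = ( x || y ) && !( x && y );   /* (x OR y) AND NOT (x AND y)     */
      assert( orig == simp );                  /* the two forms are equivalent   */
      assert( orig == ( x ^ y ) );             /* and both compute x XOR y       */
    }
  }
  return 0;
}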

1.6.2 Functions

Definition 1.35. Given the definition of Boolean algebra, it is perhaps not surprising that a generic n-input, 1-output
Boolean function f can be described as
f : Bⁿ → B.
It is possible to extend this definition so it caters for m-outputs; we write the function signature as

g : Bⁿ → Bᵐ.


An aside: an enumeration of 2-input Boolean functions.

We know there are 2^(2ⁿ) Boolean functions with n inputs; this represents a lot of functions as n grows. However,
for a small number of inputs, say n = 2, 2^(2ⁿ) = 2^(2²) = 2⁴ = 16 functions is fairly manageable. In fact, we can easily
write them all down: if fi denotes the i-th such function, we find

x y f0 f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 f11 f12 f13 f14 f15


0 0 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
0 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
1 0 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1
1 1 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
This hints that one way to see why 2^(2ⁿ) is correct is to view each column for fi as filled by i represented in binary.
How many (unsigned) integers can be represented in m bits? The answer is 2ᵐ, suggesting 2^(2ⁿ) (unsigned)
integers can be represented in 2ⁿ bits and hence there are 2^(2ⁿ) functions.
Some of the functions should look familiar, and, either way, we can try to describe them in English language
terms vs. their truth table. Note, for example, that

• f0 is the constant 0 function (i.e., f0 (x, y) = 0, ignoring x and y),


• f1 is disjunction composed with complement (i.e., f1 (x, y) = ¬(x ∨ y)),

• f2 is inhibition (i.e., f2 (x, y) = y ∧ ¬x, which is like x < y),


• f3 is complement (i.e., f3 (x, y) = ¬x, ignoring y),
• f4 is inhibition (i.e., f4 (x, y) = x ∧ ¬y, which is like y < x),

• f5 is complement (i.e., f5 (x, y) = ¬y, ignoring x),


• f6 is non-equivalence (i.e., f6 (x, y) = x ⊕ y, which is like x ≠ y),
• f7 is conjunction composed with complement (i.e., f7 (x, y) = ¬(x ∧ y)),
• f8 is conjunction (i.e., f8 (x, y) = x ∧ y),

• f9 is equivalence (i.e., f9 (x, y) = ¬(x ⊕ y), which is like x = y),


• f10 is identity (i.e., f10 (x, y) = y, ignoring x),
• f11 is implication (i.e., f11 (x, y) = x =⇒ y),

• f12 is identity (i.e., f12 (x, y) = x, ignoring y),


• f13 is implication (i.e., f13 (x, y) = y =⇒ x),
• f14 is disjunction (i.e., f14 (x, y) = x ∨ y), and
• f15 is the constant 1 function (i.e., f15 (x, y) = 1, ignoring x and y).


This can be thought of as m separate n-input, 1-output Boolean functions, i.e.,

g0 : Bⁿ → B
g1 : Bⁿ → B
...
gm−1 : Bⁿ → B

where the output of g is described by

g(x) ↦ g0 (x) ∥ g1 (x) ∥ . . . ∥ gm−1 (x).

That is, the output of g is just the m individual 1-bit outputs gi (x) concatenated together. This is often termed a vectorial
Boolean function: the inputs and outputs are vectors (or sequences) over the set B rather than single elements of it.
Definition 1.36. A Boolean-valued function (or predicate function)

f : X → {0, 1}

is a function whose output is a Boolean value: note the contrast with a Boolean function, in so far as it places no restriction
on what the input (i.e., the set X) must be.
Example 1.29. Consider a 2-input, 1-output Boolean function, whose signature we can write as

f : B² → B

st. for r, x, y ∈ B, the input is a pair ⟨x, y⟩ and the output for a given x and y is written r = f (x, y). The function
itself can be specified in two ways. First, as previously, we could enumerate all possible input combinations,
and specify corresponding outputs. This can be written equivalently in the form of an inline function behaviour,
or as a truth table:
f (x, y) ↦ 0  if x = 0 and y = 0          x  y  f (x, y)
           1  if x = 0 and y = 1          0  0  0
           1  if x = 1 and y = 0    ≡     0  1  1
           0  if x = 1 and y = 1          1  0  1
                                          1  1  0
However, with a large number of inputs, this becomes difficult. As a short-hand, we can therefore specify f as
a Boolean expression instead, e.g.,
f : ⟨x, y⟩ ↦ (¬x ∧ y) ∨ (x ∧ ¬y).
This basically tells us how to compute outputs, rather than listing those outputs explicitly.
Example 1.30. Consider a 2-input, 2-output Boolean function
 2

 B → B2




x = 0 and y=0
 

  ⟨0, 0⟩ if
h:
 
x = 0 and y=1

 
 ⟨1, 0⟩ if
⟨x, y⟩ 7→
 
x = 1 and y=0

⟨1, 0⟩ if

 


 
x = 1 and y=1
 
⟨0, 1⟩ if
 

which we might write more compactly as the truth table

x y h(x, y)
0 0 ⟨0, 0⟩
0 1 ⟨1, 0⟩
1 0 ⟨1, 0⟩
1 1 ⟨0, 1⟩

Clearly we can decompose h into


 2

 B → B




x = 0 and y = 0
 

  0 if
h0 : 
 
x = 0 and y = 1

 
 1 if
⟨x, y⟩ 7→
 
x = 1 and y = 0

1 if

 


 
x = 1 and y = 1
 
0 if
 


and

h1 :  B² → B
      ⟨x, y⟩ ↦ 0  if x = 0 and y = 0
               0  if x = 0 and y = 1
               0  if x = 1 and y = 0
               1  if x = 1 and y = 1

meaning that
h(x, y) ≡ h0 (x, y) ∥ h1 (x, y).
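Note that the two component functions coincide with operators we have already met, in the sense that h0 matches XOR and h1 matches AND. A minimal, hypothetical C sketch of h therefore just computes the two 1-bit outputs separately, then forms the pair:

#include <stdio.h>

int h0( int x, int y ) { return x ^ y; }  /* matches the truth table for h0 */
int h1( int x, int y ) { return x & y; }  /* matches the truth table for h1 */

int main() {
  for( int x = 0; x <= 1; x++ ) {
    for( int y = 0; y <= 1; y++ ) {
      /* the output of h is the pair <h0(x,y), h1(x,y)> */
      printf( "h(%d,%d) = <%d,%d>\n", x, y, h0( x, y ), h1( x, y ) );
    }
  }
  return 0;
}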

1.6.3 Normal (or standard) forms


Definition 1.37. Consider a Boolean expression:

1. When the expression is written as a sum (i.e., OR) of terms which each comprise the product (i.e., AND) of variables,
e.g.,

(a ∧ b ∧ c) ∨ (d ∧ e ∧ f ),

it is said to be in disjunctive normal form or Sum of Products (SoP) form; the terms (e.g., a ∧ b ∧ c) are called the
minterms. Note that each variable can exist as-is or complemented using NOT, meaning

(¬a ∧ b ∧ c) ∨ (d ∧ ¬e ∧ f )

is also a valid SoP expression.

2. When the expression is written as a product (i.e., AND) of terms which each comprise the sum (i.e., OR) of variables,
e.g.,

(a ∨ b ∨ c) ∧ (d ∨ e ∨ f ),

it is said to be in conjunctive normal form or Product of Sums (PoS) form; the terms (e.g., a ∨ b ∨ c) are called the
maxterms. As above each variable can exist as-is or complemented using NOT.

Example 1.31. Consider a 2-input, 1-output Boolean function

g :  B² → B
     ⟨x, y⟩ ↦ 0  if x = 0 and y = 0
              1  if x = 0 and y = 1
              1  if x = 1 and y = 0
              0  if x = 1 and y = 1

Writing this as a truth table, i.e.,


x y g(x, y)
0 0 0
0 1 1
1 0 1
1 1 0
the minterms are the second and third rows, while the maxterms are the first and fourth rows. An expression
for g in SoP form is
gSoP (x, y) = (¬x ∧ y) ∨ (x ∧ ¬y).
where terms ¬x ∧ y and x ∧ ¬y represent minterms of g: when the term is 1 or 0 the corresponding output is 1 or
0. It is usually crucial that all the variables appear in all the minterms so that the function is exactly described.
To see why this is so, consider writing an incorrect SoP expression by removing the reference to y from the first
minterm so as to get
(¬x) ∨ (x ∧ ¬y).


Now ¬x is 1 for the first and second rows, rather than the second (as was the case with ¬x ∧ y), so we have
described another function h ≠ g, given by

x y h(x, y)
0 0 1
0 1 1
1 0 1
1 1 0

In a similar way, we can construct a PoS expression for g as

gPoS (x, y) = (x ∨ y) ∧ (¬x ∨ ¬y).

where x ∨ y and ¬x ∨ ¬y are the maxterms of g. By manipulating the expressions, we can prove that gSoP and
gPoS are just two different ways to write the same function, i.e., g. Recall that for p and q

(p ∧ ¬q) = (p ∧ ¬q) ∨ 0 (identity)


= (p ∧ ¬q) ∨ (p ∧ ¬p) (inverse)
= p ∧ (¬p ∨ ¬q) (distribution)
Using this rule, we can show

gSoP (x, y) = (¬x ∧ y) ∨ (x ∧ ¬y)


= (y ∧ ¬x) ∨ (x ∧ ¬y) (commutativity)
= (x ∧ (¬x ∨ ¬y)) ∨ (y ∧ (¬x ∨ ¬y)) (p ∧ ¬q rule above)
= (x ∨ y) ∧ (¬x ∨ ¬y) (distribution)
= gPoS

1.7 Signals
Definition 1.38. In general, a signal can be described as a descriptive function (abstractly), or a physical quantity
(concretely), that varies in time or space so as to represent and/or communicate (i.e., convey) information. We say that

• a discrete-time signal is valid for a discrete (so countable) range of time indices, e.g., t ∈ Z,
• a continuous-time signal is valid for a continuous (so uncountable) range of time indices, e.g., t ∈ R,
• a discrete-value signal has a value from a discrete (so countable) range, e.g., f (t) ∈ Z, and
• a continuous-value signal has a value from a continuous (so uncountable) range, e.g., f (t) ∈ R.

Definition 1.39. The term analogue signal is a synonym of continuous-value signal: a physical quantity that varies in
time is typically used to represent (that is, it is analogous to) some abstract variable.
Definition 1.40. Strictly speaking, digital signal is a synonym of discrete-value signal: it will have a digital (i.e.,
discrete or exact) value. This terminology is often overloaded, however, and taken to mean a signal whose value is either
0 or 1 (cf. logic signal).
The transition of a digital signal from 0 to 1 (resp. 1 to 0) is called a positive (resp. negative) edge; we often say it
has toggled from 0 to 1 (resp. 1 to 0). During any time the signal has a value of 1 (resp. 0), we say it is at a positive
(resp. negative) level (and use the term pulse as a synonym for positive level, i.e., the period between a positive and
negative edge).
Definition 1.41. It is common to describe a signal by plotting it as a waveform: the y-axis represents the value of the
signal as it varies over time, as represented by the x-axis.
Note that it is common, though incorrect, to describe discrete-time signals by using a continuous plot; connecting
discrete points implies a formally incorrect description (i.e., it gives the impression of a continuous-time
signal). Doing so typically stems from either a) the fact said discrete-time signal is derived from an associated
continuous-time signal (e.g., the latter has been quantised wrt. time, by sampling it at discrete time indices),
or b) aesthetics, in the sense it is easier to see when printed.

1.8 Representations
God made the integers; all the rest is the work of man.

– Kronecker


1.8.1 Bits, bytes and words


Definition 1.42. Used as a term by Claude Shannon in 1948 [6] (but attributed to John Tukey), a bit is a binary digit.
As a result, a given bit is a member of the set B = {0, 1}; it can be used to represent a truth value, i.e., false or true, and
hence a Boolean value within the context of Boolean algebra.

Definition 1.43. An n-bit bit-sequence (or binary sequence) is a member of the set Bⁿ, i.e., it is an n-tuple of bits.
Much like other sequences, we use Xi to denote the i-th bit of a binary sequence X and |X| = n to denote the number of
bits in X.

Definition 1.44. Instead of writing out X ∈ Bn symbolically, i.e., writing ⟨X0 , X1 , . . . , Xn−1 ⟩, we sometimes prefer to list
the bits within a bit-literal (or bit-string, wrt. an implicit alphabet Σ = {0, 1}). For example, consider the following
bit-sequence
X = ⟨1, 1, 0, 1, 1, 1, 1⟩
st. |X| = 7, which can be written as the bit-literal

X = 1111011.

The question is however, what does a bit-sequence mean: what does it represent, other than just an (unstruc-
tured) sequence of bits? The answer is they can represent anything we decide they do; there is just one key
concept, namely

X̂ ↦ X

i.e., the representation of X (on the left) maps to the value of X (on the right).

That is, all we need is a) a representation and mapping specified concretely (i.e., written down, vs. reasoned
about abstractly), and b) a mapping that means the right thing wrt. values, plus is ideally consistent in both
directions (e.g., does not change based on the context, and is injective st. a single representation cannot be
interpreted ambiguously). Notice the (subtle) annotation on the left-hand side of the mapping: X̂ is intended to
highlight this is a representation of some X, whose value therefore depends on the mapping used. Put another
way, this suggests different mappings may legitimately map the same X̂ to different values by interpreting the
bit-sequence differently. This is, essentially, what means we can represent such a rich set of data (e.g., the pixels
in an image) using only a bit (or sequence thereof) as a starting point.

1.8.1.1 Properties
Definition 1.45. Following the idea of vectorial Boolean function, given an n-element bit-sequence X, and an m-element
bit-sequence Y we can clearly

1. overload ⊘ ∈ {¬}, i.e., write


R = ⊘X,
to mean
Ri = ⊘Xi
for 0 ≤ i < n,

2. overload ⊖ ∈ {∧, ∨, ⊕}, i.e., write


R = X ⊖ Y,
to mean
Ri = Xi ⊖ Yi
for 0 ≤ i < n = m, where if n ≠ m, we pad either X or Y with 0 until n = m.
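C gives us this sort of overloading more or less for free: the bitwise operators ~, &, |, and ^ apply the underlying Boolean operator at every bit position of a word at once. A minimal sketch, assuming 8-bit sequences held in a uint8_t, is as follows:

#include <stdint.h>
#include <stdio.h>

int main() {
  uint8_t x = 0x0F, y = 0x55;  /* two 8-bit sequences             */

  uint8_t r0 = ~x;             /* R_i = NOT x_i       for each i  */
  uint8_t r1 = x & y;          /* R_i = x_i AND y_i   for each i  */
  uint8_t r2 = x | y;          /* R_i = x_i OR  y_i   for each i  */
  uint8_t r3 = x ^ y;          /* R_i = x_i XOR y_i   for each i  */

  printf( "%02X %02X %02X %02X\n", r0, r1, r2, r3 );  /* F0 05 5F 5A */
  return 0;
}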

Definition 1.46. Given two n-bit sequences X and Y, we can define some important properties named after Richard
Hamming, a researcher at Bell Labs:


An aside: the origins and impact of endianness.

The term endianness stems from a technical article [2], written in the 1980s by Danny Cohen, using Gulliver’s
Travels as an inspiration/analogy: an argument over whether cracking the big- or small-end of a soft-boiled
egg is proper in the former, inspired terminology wrt. arguments over byte ordering in the latter. It does a
brilliant job of surveying the significant impact of what is, at face value, a fairly trivial choice.

• The Hamming weight of X is the number of bits in X that are equal to 1, i.e., the number of times Xi = 1. This
can be expressed as
HW(X) = ∑_{i=0}^{n−1} Xi.

• The Hamming distance between X and Y is the number of bits in X that differ from the corresponding bit in Y,
i.e., the number of times Xi ≠ Yi. This can be expressed as

HD(X, Y) = ∑_{i=0}^{n−1} Xi ⊕ Yi.

Note that both quantities naturally generalise to non-binary sequences.

Example 1.32. For example, given A = ⟨1, 0, 0, 1⟩ and B = ⟨0, 1, 1, 1⟩ we find that

HW(A) = ∑_{i=0}^{n−1} Ai = 1 + 0 + 0 + 1 = 2

and

HD(A, B) = ∑_{i=0}^{n−1} Ai ⊕ Bi = (1 ⊕ 0) + (0 ⊕ 1) + (0 ⊕ 1) + (1 ⊕ 1) = 1 + 1 + 1 + 0 = 3

st. two bits in A equal 1, and three bits differ between A and B.
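Both quantities are easy to compute; a minimal, hypothetical C sketch, which holds each 4-bit sequence little-endian in the low bits of a uint8_t, is as follows:

#include <stdint.h>
#include <stdio.h>

/* Hamming weight: count the bits of x (of which there are n) equal to 1. */
int hw( uint8_t x, int n ) {
  int t = 0;
  for( int i = 0; i < n; i++ ) {
    t += ( x >> i ) & 1;
  }
  return t;
}

/* Hamming distance: the weight of x XOR y, st. each differing bit
   position contributes 1.                                          */
int hd( uint8_t x, uint8_t y, int n ) {
  return hw( x ^ y, n );
}

int main() {
  uint8_t A = 0x9;  /* <1,0,0,1> read little-endian, i.e., 1001(2) */
  uint8_t B = 0xE;  /* <0,1,1,1> read little-endian, i.e., 1110(2) */
  printf( "%d %d\n", hw( A, 4 ), hd( A, B, 4 ) );  /* prints 2 3   */
  return 0;
}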

Definition 1.47. Imagine that X ∥ p appends a parity bit p to some n-bit sequence X. Doing so implements a form of
error detecting code: having defined

Par⁺(X) =    ∑_{i=0}^{i<n} Xi (mod 2)   =    ⊕_{i=0}^{i<n} Xi
Par⁻(X) = ¬( ∑_{i=0}^{i<n} Xi (mod 2) ) = ¬( ⊕_{i=0}^{i<n} Xi )

we say that

• setting p = Par⁺(X) implements an even-parity code, in the sense that X ∥ p will have an even number of i st.
Xi = 1, whereas

• setting p = Par⁻(X) implements an odd-parity code, in the sense that X ∥ p will have an odd number of i st.
Xi = 1.

If/when the type is irrelevant, we drop the super-script and simply write Par(X) instead.

Example 1.33. For example, given A = ⟨1, 0, 0, 1⟩ and B = ⟨0, 1, 1, 1⟩ we find that

Par⁺(A) =    ⊕_{i=0}^{n−1} Ai   =    1 ⊕ 0 ⊕ 0 ⊕ 1   = 0
Par⁻(A) = ¬( ⊕_{i=0}^{n−1} Ai ) = ¬( 1 ⊕ 0 ⊕ 0 ⊕ 1 ) = 1
Par⁺(B) =    ⊕_{i=0}^{n−1} Bi   =    0 ⊕ 1 ⊕ 1 ⊕ 1   = 1
Par⁻(B) = ¬( ⊕_{i=0}^{n−1} Bi ) = ¬( 0 ⊕ 1 ⊕ 1 ⊕ 1 ) = 0
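Since XOR is addition modulo 2, computing a parity bit in C amounts to folding XOR across the bits; a minimal, hypothetical sketch is as follows:

#include <stdint.h>
#include <stdio.h>

/* Even parity: XOR together the n bits of x. */
int par_even( uint8_t x, int n ) {
  int p = 0;
  for( int i = 0; i < n; i++ ) {
    p ^= ( x >> i ) & 1;
  }
  return p;
}

/* Odd parity: the complement of even parity. */
int par_odd( uint8_t x, int n ) {
  return !par_even( x, n );
}

int main() {
  uint8_t A = 0x9, B = 0xE;  /* <1,0,0,1> and <0,1,1,1>, as above */
  printf( "%d %d %d %d\n", par_even( A, 4 ), par_odd( A, 4 ),
                           par_even( B, 4 ), par_odd( B, 4 ) );
  return 0;  /* prints 0 1 1 0 */
}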


1.8.1.2 Ordering

There is, by design, no “structure” to a bit-literal. This can be problematic if, for example, we need a way
to make sure the order of bits in the bit-literal is clear wrt. the corresponding bit-sequence. The same issues
appear whenever describing a large(r) quantity in terms of small(er) parts, but, focusing on bits, we can describe
endianness as follows:
Definition 1.48. A given literal, say
X = 1111011,
can be interpreted in two ways:
1. A little-endian ordering is where we read bits in a literal from right-to-left, i.e.,
XLE = ⟨X0 , X1 , X2 , X3 , X4 , X5 , X6 ⟩ = ⟨1, 1, 0, 1, 1, 1, 1⟩,
where

• the Least-Significant Bit (LSB) is the right-most in the literal (i.e., X0 ), and
• the Most-Significant Bit (MSB) is the left-most in the literal (i.e., Xn−1 = X6 ).

2. A big-endian ordering is where we read bits in a literal from left-to-right, i.e.,


XBE = ⟨X6 , X5 , X4 , X3 , X2 , X1 , X0 ⟩ = ⟨1, 1, 1, 1, 0, 1, 1⟩,
where

• the Least-Significant Bit (LSB) is the left-most in the literal (i.e., Xn−1 = X6 ), and
• the Most-Significant Bit (MSB) is the right-most in the literal (i.e., X0 ).

Unless specified, from here on it is (fairly) safe to assume that a little-endian convention is used. Keep in mind
that having selected an endianness convention, which acts as a rule for conversion, there is no real distinction
between a bit-sequence and a bit-literal: we can convert between them in either little-endian or big-endian
cases.
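To make the two conventions concrete, the following hypothetical C sketch takes the literal above, held as a character string, and prints the elements X0, X1, . . . in order under each convention:

#include <stdio.h>
#include <string.h>

int main() {
  char X[] = "1111011";         /* the bit-literal            */
  int  n   = strlen( X );

  printf( "little-endian: " );  /* X_0 is the right-most char */
  for( int i = 0; i < n; i++ ) {
    printf( "%c", X[ n - 1 - i ] );
  }
  printf( "\n" );

  printf( "big-endian:    " );  /* read chars left-to-right   */
  for( int i = 0; i < n; i++ ) {
    printf( "%c", X[ i ] );
  }
  printf( "\n" );

  return 0;
}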

1.8.1.3 Grouping
Definition 1.49. Some bit-sequences are given special names depending on their length. Given a word size w (e.g., the
natural size as dictated by a given processor), we can define
bit ≡ 1-bit
nybble ≡ 4-bit
byte ≡ 8-bit

half-word ≡ (w/2)-bit
word ≡ w-bit
double-word ≡ (w · 2)-bit
quad-word ≡ (w · 4)-bit
but note that standards in particular often use the term octet as a synonym for byte (st. an octet string is therefore a
byte-sequence): although less natural, we follow this terminology where it seems of value to match associated literature.
Example 1.34. Given a bit-sequence
B = ⟨1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0⟩
it can be attractive to group the bits into short(er) sub-sequences. For example, we could rewrite the sequence
as either
C = ⟨⟨1, 1, 0, 0⟩, ⟨0, 0, 0, 0⟩, ⟨1, 0, 0, 0⟩, ⟨1, 0, 1, 0⟩⟩
= ⟨1, 1, 0, 0⟩ ∥ ⟨0, 0, 0, 0⟩ ∥ ⟨1, 0, 0, 0⟩ ∥ ⟨1, 0, 1, 0⟩
= ⟨1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0⟩
= B

D = ⟨⟨1, 1, 0, 0, 0, 0, 0, 0⟩, ⟨1, 0, 0, 0, 1, 0, 1, 0⟩⟩


= ⟨1, 1, 0, 0, 0, 0, 0, 0⟩ ∥ ⟨1, 0, 0, 0, 1, 0, 1, 0⟩
= ⟨1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0⟩
= B


st. C has four elements (each of which is a sub-sequence of four bits from B), while D has two elements (each of
which is a sub-sequence of eight bits from B). It is important to see that we have not altered the bits themselves,
just how they are grouped together: we can easily “flatten out” the sub-sequences and reconstruct the original
sequence B.
Example 1.35. Consider the four nybbles in C, i.e., the four 4-bit sub-sequences
C0 = ⟨1, 1, 0, 0⟩
C1 = ⟨0, 0, 0, 0⟩
C2 = ⟨1, 0, 0, 0⟩
C3 = ⟨1, 0, 1, 0⟩
If we want to reconstruct C itself, we need to know which order to put the sub-sequences in: via a little-endian
convention we get
CLE = ⟨C0 , C1 , C2 , C3 ⟩ = ⟨⟨1, 1, 0, 0⟩, ⟨0, 0, 0, 0⟩, ⟨1, 0, 0, 0⟩, ⟨1, 0, 1, 0⟩⟩
whereas via a big-endian convention we get
CBE = ⟨C3 , C2 , C1 , C0 ⟩ = ⟨⟨1, 0, 1, 0⟩, ⟨1, 0, 0, 0⟩, ⟨0, 0, 0, 0⟩, ⟨1, 1, 0, 0⟩⟩.

1.8.1.4 Units
There is a standard notation for measuring multiplicities of bits and bytes: a suffix specifies the quantity (‘b’ or
“bit” for bits, ‘B’ for bytes), and a prefix specifies a multiplier. Although the notation remains consistent, some
ambiguities about how to interpret prefixes complicate matters.
The International System of Units (SI) works with decimal, base-10 prefixes so, for example, a kilobit means
10³ = 1000 bits. As a result, we find that

1 kilobit  (kbit) = 10³  bits = 1000 bits
1 megabit  (Mbit) = 10⁶  bits = 1 000 000 bits
1 gigabit  (Gbit) = 10⁹  bits = 1 000 000 000 bits
1 terabit  (Tbit) = 10¹² bits = 1 000 000 000 000 bits

1 kilobyte (kB)   = 10³  bytes = 1000 bytes
1 megabyte (MB)   = 10⁶  bytes = 1 000 000 bytes
1 gigabyte (GB)   = 10⁹  bytes = 1 000 000 000 bytes
1 terabyte (TB)   = 10¹² bytes = 1 000 000 000 000 bytes
However, in the context of Computer Science the same English prefixes are commonly (ab)used to specify
a binary, base-2 multiplier. For example, kilo will be read to mean 2¹⁰ = 1024 ≃ 1000: RAM or hard disk
capacity is commonly measured in this way, for example. To eliminate resulting ambiguity, the International
Electrotechnical Commission (IEC) added some more SI prefixes; the result is that

1 kibibit  (Kibit) = 2¹⁰ bits = 1024 bits
1 mebibit  (Mibit) = 2²⁰ bits = 1 048 576 bits
1 gibibit  (Gibit) = 2³⁰ bits = 1 073 741 824 bits
1 tebibit  (Tibit) = 2⁴⁰ bits = 1 099 511 627 776 bits

1 kibibyte (KiB)   = 2¹⁰ bytes = 1024 bytes
1 mebibyte (MiB)   = 2²⁰ bytes = 1 048 576 bytes
1 gibibyte (GiB)   = 2³⁰ bytes = 1 073 741 824 bytes
1 tebibyte (TiB)   = 2⁴⁰ bytes = 1 099 511 627 776 bytes
The question is, which should we use? Does it really matter? Clearly, yes: if we buy a hard disk which says it
holds 1 terabyte of data, we hope they are talking, in traditional terms, about a tebibyte, i.e., 1, 099, 511, 627, 776
bytes rather than 1, 000, 000, 000, 000 bytes, because then we get more storage capacity! In the same way,
imagine we are comparing two hard disks: we need to make sure their quoted storage capacity use the same
units, or the comparison will be unfair.
From here on, we try to make consistent use of the new SI prefixes: when we say kilobyte or kB we mean
10³ bytes, and when we say kibibyte or KiB we mean 2¹⁰ bytes. On one hand, this might not be popular from
a historical point of view; on the other hand, it should mean we are clear and consistent.


An aside: the shift-and-mask paradigm, part #1.

Given some w-bit word, the shift-and-mask paradigm allows us to extract (or isolate) individual or contiguous
sequences of bits. Understanding this is crucial in many areas, and it is often used in lower-level C programs; this,
and related techniques, are often termed “bit twiddling” or “bit bashing”.

• Imagine we want to set the i-th bit of some x, i.e., xi , to 1. This can be achieved by computing

x ∨ (1 ≪ i)

For example, if x = 0011(2) and i = 2 then we compute

x ∨ ( 0001(2) ≪ i )
0011(2) ∨ ( 0001(2) ≪ 2 )
0011(2) ∨ 0100(2)
0111(2)

meaning initially x2 = 0, then we changed it so x2 = 1.

• Imagine we want to set the i-th bit of some x, i.e., xi , to 0. This can be achieved by computing

x ∧ ¬(1 ≪ i)

For example, if x = 0111(2) and i = 2 then we compute

x ∧ ¬ ( 0001(2) ≪ i )
0111(2) ∧ ¬ ( 0001(2) ≪ 2 )
0111(2) ∧ ¬ ( 0100(2) )
0111(2) ∧ 1011(2)
0011(2)

meaning initially x2 = 1, then we changed it so x2 = 0.

In both cases, the idea is to first create an appropriate mask then combine it with x to get x′ ; in both cases we
do no actual arithmetic, only Boolean-style operations.


An aside: the shift-and-mask paradigm, part #2.

Imagine we want to extract an m-bit sub-word (i.e., m contiguous bits) starting at the i-th bit of some x. This
can be achieved by computing
(x ≫ i) ∧ ((1 ≪ m) − 1)
The computation is a little more complicated, but basically the same principles apply: first we create an
appropriate mask (the right-hand term) and combine it with x (the left-hand term). For example, if x = 1011(2)
and m = 2:

• If i = 0 then we want to extract the sub-word ⟨x0 , x1 ⟩

( x ≫ i ) ∧ ( ( 1 ≪ m ) − 1 )
( 1011(2) ≫ 0 ) ∧ ( ( 1 ≪ 2 ) − 1 )
( 1011(2) ) ∧ ( ( 0100(2) ) − 1 )
( 1011(2) ) ∧ ( 0011(2) )
0011(2)

meaning ⟨x0 , x1 ⟩ = ⟨1, 1⟩ as expected.


• If i = 1 then we want to extract the sub-word ⟨x1 , x2 ⟩

( x ≫ i ) ∧ ( ( 1 ≪ m ) − 1 )
( 1011(2) ≫ 1 ) ∧ ( ( 1 ≪ 2 ) − 1 )
( 0101(2) ) ∧ ( ( 0100(2) ) − 1 )
( 0101(2) ) ∧ ( 0011(2) )
0001(2)

meaning ⟨x1 , x2 ⟩ = ⟨1, 0⟩ as expected.

• If i = 2 then we want to extract the sub-word ⟨x2 , x3 ⟩

( x ≫ i ) ∧ ( ( 1 ≪ m ) − 1 )
( 1011(2) ≫ 2 ) ∧ ( ( 1 ≪ 2 ) − 1 )
( 0010(2) ) ∧ ( ( 0100(2) ) − 1 )
( 0010(2) ) ∧ ( 0011(2) )
0010(2)

meaning ⟨x2 , x3 ⟩ = ⟨0, 1⟩ as expected.

Notice that the (0001(2) ≪ m) − 1 term is basically giving us a way to create a value y where ym−1...0 = 1, i.e.,
whose 0-th through to (m − 1)-th bits are 1. If we know m ahead of time, we can clearly simplify this by
providing y directly rather than computing it.


An aside: the shift-and-mask paradigm, part #3.

As a special case of extracting an m-element sub-sequence, when we set m = 1 we extract the i-th bit of x alone.
This is a useful and common operation: following the above, it is achieved by computing

(x ≫ i) ∧ 1,

i.e., replacing the general-purpose mask with the special-purpose constant (1 ≪ 1) − 1 = 2 − 1 = 1. For example:

• If x = 0011(2) and i = 2 then we compute

( x ≫ i ) ∧ 1
( 0011(2) ≫ 2 ) ∧ 1
( 0000(2) ) ∧ 1
0000(2)

meaning x2 = 0.
• If x = 0011(2) and i = 0 then we compute

( x ≫ i ) ∧ 1
( 0011(2) ≫ 0 ) ∧ 1
( 0011(2) ) ∧ 1
0001(2)

meaning x0 = 1.
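Collecting the three parts together, one way to package them as (hypothetical) C helpers is as follows; the macro names are our own, invented for illustration:

#include <assert.h>

/* set the i-th bit of x to 1                              */
#define BIT_SET( x, i )     ( ( x ) |  ( 1 << ( i ) ) )
/* set the i-th bit of x to 0                              */
#define BIT_CLR( x, i )     ( ( x ) & ~( 1 << ( i ) ) )
/* extract an m-bit sub-word starting at the i-th bit of x */
#define BITS_GET( x, i, m ) ( ( ( x ) >> ( i ) ) & ( ( 1 << ( m ) ) - 1 ) )
/* extract the i-th bit of x alone                         */
#define BIT_GET( x, i )     ( ( ( x ) >> ( i ) ) & 1 )

int main() {
  assert( BIT_SET(  0x3, 2 )    == 0x7 );  /* 0011(2) -> 0111(2)      */
  assert( BIT_CLR(  0x7, 2 )    == 0x3 );  /* 0111(2) -> 0011(2)      */
  assert( BITS_GET( 0xB, 1, 2 ) == 0x1 );  /* bits 1 and 2 of 1011(2) */
  assert( BIT_GET(  0x3, 2 )    == 0x0 );  /* bit 2 of 0011(2)        */
  return 0;
}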

1.8.2 Positional number systems


As humans, and because (mostly) we have ten fingers and toes, we are used to working with numbers written
down using digits from the set {0, 1, . . . , 9}. Imagine we write down such a number, say 123. It may not be as
common, but hopefully you can believe this is roughly the same as writing the sequence

Â = ⟨A0 , A1 , A2 ⟩ = ⟨3, 2, 1⟩
given that 3 is the first digit of 123, 2 is the second digit and so on; we are reading digits in the sequence from
left-to-right vs. right-to-left in the literal, but otherwise they capture the same meaning.
But how do we know what either 123 or Â means? Informally at least, writing 123 intuitively means the
value “one hundred and twenty three” which might be rephrased as “one hundred, two tens and three units”.
The latter case suggests how to add formalism to this intuition: we are just weighting each digit 1, 2 and 3 by
some amount then adding everything up. For example, per the above we are computing the value via

Â ↦ 123 = 1 · 100 + 2 · 10 + 3 · 1.

We could also write the same thing as

Â ↦ 123 = 1 · 10² + 2 · 10¹ + 3 · 10⁰

given 10⁰ = 1 and 10¹ = 10, or more formally still as

Â ↦ 123 = ∑_{i=0}^{|A|−1} Ai · 10ⁱ,

meaning we add up the terms

A0 · 10⁰ = 3 · 10⁰ = 3 · 1   = 3
A1 · 10¹ = 2 · 10¹ = 2 · 10  = 20
A2 · 10² = 1 · 10² = 1 · 100 = 100
to make a total of 123 as expected. Put another way, the sequence Â represents the value “one hundred and
twenty three”. Two facts start to emerge, namely


1. each digit is being weighted by a power of some base (or radix), which in this case is 10, and
2. the exponent in said weight is related to the position of the corresponding digit: the i-th digit is weighted
by 10ⁱ.
A neat outcome of identifying the base as some sort of parameter is that we can consider choices other than
b = 10. Generalising the example somewhat provides the following definition:
Definition 1.50. A base-b (or radix-b) positional number system uses digits from a digit set X = {0, 1, . . . , b − 1}.
A number x is represented using n digits in total, m of which form the fractional part, i.e.,
x̂ = ⟨x0 , x1 , . . . , xn−1 ⟩ ↦ ± ∑_{i=−m}^{n−m−1} xi · bⁱ

where xi ∈ X; we term x̂ the base-b expansion of x.


Definition 1.51. The following common choices correspond to
b = 2  ↝ binary
b = 8  ↝ octal
b = 10 ↝ decimal
b = 16 ↝ hexadecimal
numbers.
Example 1.36. Reconsider the example above: imagine we select b = 2, then make a claim that “one hundred
and twenty three” is represented by
B̂ = ⟨B0 , B1 , B2 , B3 , B4 , B5 , B6 , B7 ⟩
= ⟨1, 1, 0, 1, 1, 1, 1, 0⟩
where, per the definition, now the digit set used is st. Bi ∈ {0, 1} for 0 ≤ i < 8 (and thus implicitly setting n = 8
and m = 0). The value represented by B̂ is given using exactly the same approach, i.e.,
∑_{i=0}^{|B|−1} Bi · 2ⁱ,

noting that where we previously had 10 we now have 2, st. we add up the terms
B0 · 2⁰ = 1 · 2⁰ = 1 · 1   = 1
B1 · 2¹ = 1 · 2¹ = 1 · 2   = 2
B2 · 2² = 0 · 2² = 0 · 4   = 0
B3 · 2³ = 1 · 2³ = 1 · 8   = 8
B4 · 2⁴ = 1 · 2⁴ = 1 · 16  = 16
B5 · 2⁵ = 1 · 2⁵ = 1 · 32  = 32
B6 · 2⁶ = 1 · 2⁶ = 1 · 64  = 64
B7 · 2⁷ = 0 · 2⁷ = 0 · 128 = 0
to obtain a total of 123 as before.
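Evaluating a base-b expansion is easily mechanised; a minimal, hypothetical C sketch, which keeps the digits little-endian in an array and assumes m = 0 (i.e., no fractional part), is as follows:

#include <stdio.h>

/* Compute the value of the n-digit, base-b expansion x, i.e., the sum
   of x[i] * b^i for 0 <= i < n.                                        */
long eval( int* x, int n, int b ) {
  long t = 0, w = 1;  /* w holds b^i as i steps upward */
  for( int i = 0; i < n; i++ ) {
    t += x[ i ] * w;
    w *= b;
  }
  return t;
}

int main() {
  int A[ 3 ] = { 3, 2, 1 };                 /* 123 in base-10 */
  int B[ 8 ] = { 1, 1, 0, 1, 1, 1, 1, 0 };  /* 123 in base-2  */
  printf( "%ld %ld\n", eval( A, 3, 10 ), eval( B, 8, 2 ) );
  return 0;  /* prints 123 123 */
}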

1.8.2.1 Digits
Describing elements in the digit set {0, 1, . . . , b − 1}, for whatever b, using a single digit can be fairly important;
using multiple digits, for example, can start to introduce some ambiguity wrt. how we interpret a literal. In
particular, once we select a b > 10 we hit a problem: we run out of single Roman-style digits that we can write
down.
Example 1.37. Consider the same example as above where we have the literal 123: we know that if b = 10 and
Â = ⟨3, 2, 1⟩ then

Â ↦ 123 = 1 · 10² + 2 · 10¹ + 3 · 10⁰.

However, if b = 16, although we know

123 = 7 · 16¹ + 11 · 16⁰

we have no single-digit way to write 11. To solve this problem, we use the symbols (or in fact characters) A . . . F
to represent 10 . . . 15. Otherwise everything works the same way, meaning for example, that if B̂ = ⟨B, 7⟩ then

B̂ ↦ 123 = 7 · 16¹ + B · 16⁰
        = 7 · 16¹ + 11 · 16⁰


An aside: octal and hexadecimal as a short-hand for binary.

It is useful to remember that octal and hexadecimal can be viewed as just a short-hand for binary: each octal
or hexadecimal digit represents exactly three or four binary digits respectively. This can make it much easier to
write and remember long sequences of binary digits. As an example, consider hexadecimal. Each hexadecimal
digit xi ∈ {0, 1, . . . , 15} can be represented using four bits (since there are 2⁴ = 16 possible combinations), so can
be viewed instead as those four binary digits.
Using a concrete example, the following translation steps

2223 = 1 · 1 + 1 · 2 + 1 · 4 + 1 · 8 + 0 · 16 + 1 · 32 +
       0 · 64 + 1 · 128 + 0 · 256 + 0 · 512 + 0 · 1024 + 1 · 2048
     = 1 · 2⁰ + 1 · 2¹ + 1 · 2² + 1 · 2³ + 0 · 2⁴ + 1 · 2⁵ +
       0 · 2⁶ + 1 · 2⁷ + 0 · 2⁸ + 0 · 2⁹ + 0 · 2¹⁰ + 1 · 2¹¹
     = ⟨1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1⟩(2)
     = ⟨⟨1, 1, 1, 1⟩(2) , ⟨0, 1, 0, 1⟩(2) , ⟨0, 0, 0, 1⟩(2) ⟩(16)
     = ⟨15(10) , 10(10) , 8(10) ⟩(16)
     = ⟨F(16) , A(16) , 8(16) ⟩(16)
     = ⟨F, A, 8⟩(16)
     = 15 · 16⁰ + 10 · 16¹ + 8 · 16²
     = 15 · 1 + 10 · 16 + 8 · 256
     = 2223

are clearly valid.


In C you can in fact write decimal literals (which is the default), and hexadecimal literals using the prefix
0x. However, beware of literals starting with 0: this will be interpreted as octal! For example, 012 has the same
value as 10 because
012 ↦ 1 · 8¹ + 2 · 8⁰
    = 10(10)
    = 1 · 10¹ + 0 · 10⁰
    ↦ 10
You cannot directly express binary literals in C, although doing so is possible in other languages (e.g., Python,
via a 0b prefix).
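A tiny, hypothetical program demonstrates the point: all three literals below denote the same value, despite being written in three different bases:

#include <stdio.h>

int main() {
  int d =  123;  /* a decimal     literal                     */
  int h = 0x7B;  /* a hexadecimal literal, also 123           */
  int o = 0173;  /* an octal      literal: note the leading 0 */
  printf( "%d %d %d\n", d, h, o );  /* prints 123 123 123     */
  return 0;
}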

[Figure 1.3: Number lines illustrating the mapping of 8-bit sequences to integer values using three different representations: (a) sign-magnitude, (b) one’s-complement, and (c) two’s-complement; the legend distinguishes a direct copy with a contiguous number line, a direct copy with a non-contiguous number line, and a reversed copy with a non-contiguous number line.]

1.8.2.2 Notation
Amazingly there are not many jokes about Computer Science, but here are two (bad, comically speaking)
examples:
1. There are only 10 types of people in the world: those who understand binary, and those who do not.
2. Why did the Computer Scientist always confuse Halloween and Christmas? Because 31 Oct equals 25
Dec.
Whether or not you laughed at them, both jokes stem from ambiguity in the representation of numbers: there
is an ambiguity between “ten” written in decimal and binary in the former, and “twenty five” written in octal
and decimal in the latter.
Look at the first joke: it is basically saying that the literal 10 can be interpreted as binary or decimal, i.e., as
1 · 2 + 0 · 1 = 2 in binary and 1 · 10 + 0 · 1 = 10 in decimal. So the two types of people are those who understand
that 2 can be represented by 10, and those that do not. Now look at the second joke: this is a play on words in
that “Oct” can mean “October” but also “octal” or base-8 and “Dec” can mean “December” but also “decimal”
or base-10. With this in mind, we see that
3 · 8 + 1 · 1 = 25 = 2 · 10 + 5 · 1.
i.e., 31 Oct equals 25 Dec in the sense that 31 in base-8 equals 25 in base-10.
Put in context, we saw above that the decimal sequence Â and decimal number 123 are basically the same
iff. we interpret Â in the right way. The if in that statement is a problem, in the sense there is ambiguity: if we
follow the same reasoning as in the jokes, how do we know what base the literal 01111011 is written down in? It
could mean the decimal number 123 (i.e., “one hundred and twenty three”) if we interpret it using b = 2, or the
decimal number 01111011 (i.e., “one million, one hundred and eleven thousand and eleven”) if we interpret it
using b = 10; clearly that is quite a difference!
To clear up this ambiguity, where necessary we write literal numbers and representations with the base
appended to them. For example, we write 123(10) to show that 123 should be interpreted in base-10, or
01111011(2) to show that 01111011 should be interpreted in base-2. We can now be clear, for example, that
123(10) = 01111011(2) ; using this notation, the two jokes become even less amusing when written simply as
10(2) = 2(10) and 31(8) = 25(10) .
Example 1.38. Consider a case where m ≠ 0, which allows negative values of i and therefore negative powers
of the base: whereas m = 0 implies no fractional part to the resulting value, because 10⁻¹ = 1/10 = 0.1 and
10⁻² = 1/100 = 0.01, for example, when m ≠ 0 we can write down numbers which do have fractional parts.
Consider that

123.3125(10) = 1 · 10² + 2 · 10¹ + 3 · 10⁰ + 3 · 10⁻¹ + 1 · 10⁻² + 2 · 10⁻³ + 5 · 10⁻⁴

given we have n = 7 digits, m = 4 of which capture the fractional part. Of course since the definition is the
same, we can do the same thing using a different base, e.g.,

123.3125(10) = 1 · 2⁶ + 1 · 2⁵ + 1 · 2⁴ + 1 · 2³ + 0 · 2² + 1 · 2¹ + 1 · 2⁰ + 0 · 2⁻¹ + 1 · 2⁻² + 0 · 2⁻³ + 1 · 2⁻⁴
             = 1111011.0101(2) .
The decimal point in the former has the same meaning (i.e., as a separator between fractional and non-fractional
parts) when translated into a binary point in the latter; more generally we call this a fractional point where
the base is irrelevant.
Example 1.39. We mentioned previously that certain numbers are irrational: the definition of Q suggested that,
in such cases, we could not find an x and y such that x/y provided the result required.
In fact, the base a number is represented in has some impact on whether its expansion is finite. Informally, we
already know that when we write 1/3 as a decimal number we have 0.3333 . . .; the ellipsis means the sequence
recurs infinitely. 1/10, however, has a finite expansion when written as 0.1 in decimal, but a recurring one when
written in binary; the closest approximation is 0.000110011 . . ..

1.8.3 Representing integer numbers, i.e., members of Z


The positional number systems explored above afford a flexible way to represent numbers in theory, e.g., if
we just want to write down some examples. However, some (implicit) challenges exist when using them in
practice. Considering the set of integers Z, for example, we have no way to cater for a) the infinite size of this
set, given the finite concrete resources available to us (which bound the number of digits we can have in a
given base-b expansion), and b) the fact members of the set can be positive or negative. In even more concrete
terms, consider some integer data types available in C:


An aside: the actual range of C integer data types.

In describing the C data type int as implying an associated set (or range)

  Z_int = { −2^{31}, . . . , 0, . . . , +2^{31} − 1 }

of values, we simplified what is, in reality, a somewhat complicated issue. In short, the C language specifies the above much more abstractly; the C compiler and platform (i.e., processor) make the details concrete, allowing us to reason as we did above.
It is worth looking at this issue in more detail: on one hand it is not often covered elsewhere, but, on
the other hand, will help avoid making assumptions that may be (subtly, and infrequently) incorrect. Other
descriptions exist, but we follow that in [9] due to the clarity of presentation. Considering integer data types
only, i.e., for each type
T ∈ {char, short, int, long, long long},
the C language defines two abstract properties:

1. the signed’ness of a type T is denoted

     S(T) = { 0 if T is unsigned
            { 1 if T is signed

   and allows us to distinguish between unsigned int and int, for example, and

2. the rank of a type T, denoted R(T), is an abstract measure of size (and hence range); rather than a numerical value, types are simply ordered st.

     R(char) < R(short) < R(int) < R(long) < R(long long).

The platform provides concrete detail, in particular assigning a width (or size) of

W(T) ∈ {1, 2, 4, 8}

bytes to each type; this is termed the data model. Based on use of two’s-complement, we can derive the range of each type as

  I(T) = { { 0, . . . , +2^{8·W(T)} − 1 }                                if S(T) = 0, st. T is unsigned
         { { −2^{8·W(T)−1}, . . . , 0, . . . , +2^{8·W(T)−1} − 1 }       if S(T) = 1, st. T is signed

which matches our own definitions. Although the platform can select W(T) for each T, a crucial restriction
applies: for any types T1 and T2 where R(T1 ) < R(T2 ), the property W(T1 ) ≤ W(T2 ) must hold. Put another
way, we can be sure the width of int is less than or equal to that of long, even if those widths are not known; it
cannot be the other way around, for example, st. long is wider than int.
So we assumed W(int) = 4 in our description, but this is not the only possibility. [9, Table 1] surveys various data models, noting, for example, that

LP32 ILP32
W(char) 1 1
W(short) 2 2
W(int) 2 4
W(long) 4 4
W(long long) 8 8

are valid possibilities: if we assume W(int) = 4 in a program compiled and executed on a platform associated with the left-hand data model, problems may well occur, whereas the right-hand data model matches that assumption.


  unsigned char ↦ Z_unsigned char = { 0, . . . , +2^8 − 1 }
  unsigned int  ↦ Z_unsigned int  = { 0, . . . , +2^32 − 1 }
  char          ↦ Z_char          = { −2^7, . . . , 0, . . . , +2^7 − 1 }
  int           ↦ Z_int           = { −2^31, . . . , 0, . . . , +2^31 − 1 }

This is meant to illustrate, for example, that the int data type, which one might describe as “an integer”, is in fact an approximation of the integers (i.e., of Z): the range of values is finite. That said, however, why use this particular approximation?
We can answer this question by investigating concrete representations used in C, basing our discussion on
positional number systems via use of bit-sequences (of fixed length n) to encode members of Z. Note that where
appropriate, we use colour to highlight parts of each representation that determine the sign and magnitude
(or size) of the associated value; since we are representing integers, we implicitly set m = 0 within the general
definition of a positional number system (since there is, by definition, no fractional part in an integer).

1.8.3.1 Unsigned integers


Natural binary expansion
Definition 1.52. An unsigned integer can be represented in n bits by using the natural binary expansion. That is, we have

  x̂ = ⟨x_0, x_1, . . . , x_{n−1}⟩ ↦ Σ_{i=0}^{n−1} x_i · 2^i

for x_i ∈ {0, 1}, and

  0 ≤ x ≤ 2^n − 1.
Example 1.40. If n = 8 for example, we can represent values in the range +0 . . . +255; selected cases are as follows:

  11111111 ↦ 1 · 2^7 + 1 · 2^6 + 1 · 2^5 + 1 · 2^4 + 1 · 2^3 + 1 · 2^2 + 1 · 2^1 + 1 · 2^0 = +255(10)
  . . .
  10000101 ↦ 1 · 2^7 + 0 · 2^6 + 0 · 2^5 + 0 · 2^4 + 0 · 2^3 + 1 · 2^2 + 0 · 2^1 + 1 · 2^0 = +133(10)
  . . .
  10000000 ↦ 1 · 2^7 + 0 · 2^6 + 0 · 2^5 + 0 · 2^4 + 0 · 2^3 + 0 · 2^2 + 0 · 2^1 + 0 · 2^0 = +128(10)
  01111111 ↦ 0 · 2^7 + 1 · 2^6 + 1 · 2^5 + 1 · 2^4 + 1 · 2^3 + 1 · 2^2 + 1 · 2^1 + 1 · 2^0 = +127(10)
  . . .
  01111011 ↦ 0 · 2^7 + 1 · 2^6 + 1 · 2^5 + 1 · 2^4 + 1 · 2^3 + 0 · 2^2 + 1 · 2^1 + 1 · 2^0 = +123(10)
  . . .
  00000001 ↦ 0 · 2^7 + 0 · 2^6 + 0 · 2^5 + 0 · 2^4 + 0 · 2^3 + 0 · 2^2 + 0 · 2^1 + 1 · 2^0 = +1(10)
  00000000 ↦ 0 · 2^7 + 0 · 2^6 + 0 · 2^5 + 0 · 2^4 + 0 · 2^3 + 0 · 2^2 + 0 · 2^1 + 0 · 2^0 = +0(10)
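The natural binary expansion translates directly into code. As a minimal sketch (the function name nbe and the bit-per-array-element representation are illustrative choices, not a standard API), consider:

#include <stdio.h>

// compute the value represented by the n-bit sequence x = <x_0,...,x_{n-1}>,
// where x[ i ] in { 0, 1 } is the i-th (least-significant first) bit
unsigned int nbe( int n, int x[] ) {
  unsigned int r = 0;
  for( int i = n - 1; i >= 0; i-- ) {
    r = ( r * 2 ) + x[ i ]; // Horner's rule: accumulates x_i * 2^i
  }
  return r;
}

int main( int argc, char* argv[] ) {
  int x[ 8 ] = { 1, 1, 0, 1, 1, 1, 1, 0 }; // <x_0,...,x_7>, i.e., 01111011
  printf( "%u\n", nbe( 8, x ) );           // prints 123
  return 0;
}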

Binary Coded Decimal (BCD) BCD is an alternative method of representing unsigned integers: rather than
representing the number itself as a bit-sequence, the idea is to write it in decimal and encode each decimal digit
independently. The overall representation is the concatenation of bit-sequences which result from encoding
the decimal digits:
Definition 1.53. Consider the function

  f : {0, 1, . . . , 9} → B^4

defined st.

  f(0) = ⟨0, 0, 0, 0⟩     f(5) = ⟨1, 0, 1, 0⟩
  f(1) = ⟨1, 0, 0, 0⟩     f(6) = ⟨0, 1, 1, 0⟩
  f(2) = ⟨0, 1, 0, 0⟩     f(7) = ⟨1, 1, 1, 0⟩
  f(3) = ⟨1, 1, 0, 0⟩     f(8) = ⟨0, 0, 0, 1⟩
  f(4) = ⟨0, 0, 1, 0⟩     f(9) = ⟨1, 0, 0, 1⟩


which encodes a decimal digit d into a corresponding 4-bit sequence; this function corresponds to the Simple Binary Coded Decimal (SBCD), or BCD 8421, standard. Given the decimal number

  x = ⟨x_0, x_1, . . . , x_{n−1}⟩(10),

the BCD representation is

  x̂ = ⟨ f(x_0), f(x_1), . . . , f(x_{n−1}) ⟩.

Example 1.41. If n = 8 for example, we can represent values in the range +0 . . . +99999999; selected cases are as follows:

  10011001100110011001100110011001 ↦ ⟨ ⟨1,0,0,1⟩, ⟨1,0,0,1⟩, ⟨1,0,0,1⟩, ⟨1,0,0,1⟩, ⟨1,0,0,1⟩, ⟨1,0,0,1⟩, ⟨1,0,0,1⟩, ⟨1,0,0,1⟩ ⟩
                                   ↦ ⟨9, 9, 9, 9, 9, 9, 9, 9⟩(10)
                                   =  +99999999(10)
  . . .
  00000000000000000000000100100011 ↦ ⟨ ⟨1,1,0,0⟩, ⟨0,1,0,0⟩, ⟨1,0,0,0⟩, ⟨0,0,0,0⟩, ⟨0,0,0,0⟩, ⟨0,0,0,0⟩, ⟨0,0,0,0⟩, ⟨0,0,0,0⟩ ⟩
                                   ↦ ⟨3, 2, 1, 0, 0, 0, 0, 0⟩(10)
                                   =  +123(10)
  . . .
  00000000000000000000000000000001 ↦ ⟨ ⟨1,0,0,0⟩, ⟨0,0,0,0⟩, ⟨0,0,0,0⟩, ⟨0,0,0,0⟩, ⟨0,0,0,0⟩, ⟨0,0,0,0⟩, ⟨0,0,0,0⟩, ⟨0,0,0,0⟩ ⟩
                                   ↦ ⟨1, 0, 0, 0, 0, 0, 0, 0⟩(10)
                                   =  +1(10)
  00000000000000000000000000000000 ↦ ⟨ ⟨0,0,0,0⟩, ⟨0,0,0,0⟩, ⟨0,0,0,0⟩, ⟨0,0,0,0⟩, ⟨0,0,0,0⟩, ⟨0,0,0,0⟩, ⟨0,0,0,0⟩, ⟨0,0,0,0⟩ ⟩
                                   ↦ ⟨0, 0, 0, 0, 0, 0, 0, 0⟩(10)
                                   =  +0(10)
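Because each decimal digit maps to one 4-bit nibble, BCD encoding is easily expressed using shifts and masks. The following sketch (the function name bcd is an illustrative choice) packs n decimal digits into an unsigned int, one nibble per digit; printing the result in hexadecimal makes the per-digit structure visible:

#include <stdio.h>

// encode x as n BCD digits: the i-th decimal digit of x occupies
// bits 4i .. 4i+3 of the result
unsigned int bcd( int n, unsigned int x ) {
  unsigned int r = 0;
  for( int i = 0; i < n; i++ ) {
    r |= ( x % 10 ) << ( 4 * i ); // encode the i-th decimal digit
    x /= 10;
  }
  return r;
}

int main( int argc, char* argv[] ) {
  printf( "%08X\n", bcd( 8, 123 ) ); // prints 00000123
  return 0;
}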

1.8.3.2 Signed integers

Sign-magnitude

Definition 1.54. A signed integer can be represented in n bits by using the sign-magnitude approach; 1 bit is reserved for the sign (0 means positive, 1 means negative) and n − 1 for the magnitude. That is, we have

  x̂ = ⟨x_0, x_1, . . . , x_{n−1}⟩ ↦ (−1)^{x_{n−1}} · Σ_{i=0}^{n−2} x_i · 2^i

for x_i ∈ {0, 1}, and

  −2^{n−1} + 1 ≤ x ≤ +2^{n−1} − 1.

Note that there are two representations of zero (i.e., +0 and −0).

Example 1.42. If n = 8, for example, we can represent values in the range −127 . . . +127; selected cases are as follows:

  01111111 ↦ (−1)^0 · ( 1 · 2^6 + 1 · 2^5 + 1 · 2^4 + 1 · 2^3 + 1 · 2^2 + 1 · 2^1 + 1 · 2^0 ) = +127(10)
  . . .
  01111011 ↦ (−1)^0 · ( 1 · 2^6 + 1 · 2^5 + 1 · 2^4 + 1 · 2^3 + 0 · 2^2 + 1 · 2^1 + 1 · 2^0 ) = +123(10)
  . . .
  00000001 ↦ (−1)^0 · ( 0 · 2^6 + 0 · 2^5 + 0 · 2^4 + 0 · 2^3 + 0 · 2^2 + 0 · 2^1 + 1 · 2^0 ) = +1(10)
  00000000 ↦ (−1)^0 · ( 0 · 2^6 + 0 · 2^5 + 0 · 2^4 + 0 · 2^3 + 0 · 2^2 + 0 · 2^1 + 0 · 2^0 ) = +0(10)
  10000000 ↦ (−1)^1 · ( 0 · 2^6 + 0 · 2^5 + 0 · 2^4 + 0 · 2^3 + 0 · 2^2 + 0 · 2^1 + 0 · 2^0 ) = −0(10)
  10000001 ↦ (−1)^1 · ( 0 · 2^6 + 0 · 2^5 + 0 · 2^4 + 0 · 2^3 + 0 · 2^2 + 0 · 2^1 + 1 · 2^0 ) = −1(10)
  . . .
  11111011 ↦ (−1)^1 · ( 1 · 2^6 + 1 · 2^5 + 1 · 2^4 + 1 · 2^3 + 0 · 2^2 + 1 · 2^1 + 1 · 2^0 ) = −123(10)
  . . .
  11111111 ↦ (−1)^1 · ( 1 · 2^6 + 1 · 2^5 + 1 · 2^4 + 1 · 2^3 + 1 · 2^2 + 1 · 2^1 + 1 · 2^0 ) = −127(10)

One’s-complement

Definition 1.55. The one’s-complement method represents a signed integer in n bits by assigning the complement of x (i.e., ¬x) the value −x. That is, given

  x̂ = ⟨x_0, x_1, . . . , x_{n−1}⟩ ↦ Σ_{i=0}^{n−2} x_i · 2^i

for x_i ∈ {0, 1}, then the encoding of ¬x is assumed to represent −x. This means we have

  −2^{n−1} + 1 ≤ x ≤ +2^{n−1} − 1.

Note that there are two representations of zero (i.e., +0 and −0).

Example 1.43. If n = 8 for example, we can represent values in the range −127 . . . +127; selected cases are as follows:

  01111111 ↦ 0 · 2^7 + 1 · 2^6 + 1 · 2^5 + 1 · 2^4 + 1 · 2^3 + 1 · 2^2 + 1 · 2^1 + 1 · 2^0 = +127(10)
  . . .
  01111011 ↦ 0 · 2^7 + 1 · 2^6 + 1 · 2^5 + 1 · 2^4 + 1 · 2^3 + 0 · 2^2 + 1 · 2^1 + 1 · 2^0 = +123(10)
  . . .
  00000001 ↦ 0 · 2^7 + 0 · 2^6 + 0 · 2^5 + 0 · 2^4 + 0 · 2^3 + 0 · 2^2 + 0 · 2^1 + 1 · 2^0 = +1(10)
  00000000 ↦ 0 · 2^7 + 0 · 2^6 + 0 · 2^5 + 0 · 2^4 + 0 · 2^3 + 0 · 2^2 + 0 · 2^1 + 0 · 2^0 = +0(10)
  11111111 ↦ −0(10)
  11111110 ↦ −1(10)
  . . .
  10000100 ↦ −123(10)
  . . .
  10000000 ↦ −127(10)

Two’s-complement

Definition 1.56. A signed integer can be represented in n bits by using the two’s-complement approach. The basic idea is to weight the (n − 1)-th bit using −2^{n−1} rather than +2^{n−1}, and all other bits as normal. That is, we have

  x̂ = ⟨x_0, x_1, . . . , x_{n−1}⟩ ↦ x_{n−1} · −2^{n−1} + Σ_{i=0}^{n−2} x_i · 2^i

for x_i ∈ {0, 1}, and

  −2^{n−1} ≤ x ≤ +2^{n−1} − 1.


Example 1.44. If n = 8 for example, we can represent values in the range −128 . . . +127; selected cases are as follows:

  01111111 ↦ 0 · −2^7 + 1 · 2^6 + 1 · 2^5 + 1 · 2^4 + 1 · 2^3 + 1 · 2^2 + 1 · 2^1 + 1 · 2^0 = +127(10)
  . . .
  01111011 ↦ 0 · −2^7 + 1 · 2^6 + 1 · 2^5 + 1 · 2^4 + 1 · 2^3 + 0 · 2^2 + 1 · 2^1 + 1 · 2^0 = +123(10)
  . . .
  00000001 ↦ 0 · −2^7 + 0 · 2^6 + 0 · 2^5 + 0 · 2^4 + 0 · 2^3 + 0 · 2^2 + 0 · 2^1 + 1 · 2^0 = +1(10)
  00000000 ↦ 0 · −2^7 + 0 · 2^6 + 0 · 2^5 + 0 · 2^4 + 0 · 2^3 + 0 · 2^2 + 0 · 2^1 + 0 · 2^0 = +0(10)
  11111111 ↦ 1 · −2^7 + 1 · 2^6 + 1 · 2^5 + 1 · 2^4 + 1 · 2^3 + 1 · 2^2 + 1 · 2^1 + 1 · 2^0 = −1(10)
  . . .
  10000101 ↦ 1 · −2^7 + 0 · 2^6 + 0 · 2^5 + 0 · 2^4 + 0 · 2^3 + 1 · 2^2 + 0 · 2^1 + 1 · 2^0 = −123(10)
  . . .
  10000000 ↦ 1 · −2^7 + 0 · 2^6 + 0 · 2^5 + 0 · 2^4 + 0 · 2^3 + 0 · 2^2 + 0 · 2^1 + 0 · 2^0 = −128(10)
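To summarise, the same 8-bit literal is assigned a different value by each representation; the following sketch (with illustrative helper names sm, oc, and tc, not a standard API) interprets the literal 10000101 under each of the three schemes:

#include <stdio.h>

int sm( unsigned char x ) { // sign-magnitude
  return ( x & 0x80 ) ? -( x & 0x7F ) : ( x & 0x7F );
}
int oc( unsigned char x ) { // one's-complement: negative values encode -(~x)
  return ( x & 0x80 ) ? -( unsigned char )( ~x ) : x;
}
int tc( unsigned char x ) { // two's-complement: the MSB is weighted by -2^7
  return ( x & 0x80 ) ? ( int )( x ) - 256 : x;
}

int main( int argc, char* argv[] ) {
  unsigned char x = 0x85; // the literal 10000101
  printf( "%d %d %d\n", sm( x ), oc( x ), tc( x ) ); // prints -5 -122 -123
  return 0;
}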

Given that two’s-complement is the de facto choice for signed integer representation, it warrants some further explanation: it is important to grasp how the representation works.
One approach is to consider Figure 1.3c, which is a number line of values in two’s-complement representation. Offset a little to the left, it shows that 0 (bottom) is represented by the literal 00000000 (which is, of course, equivalent to a bit-sequence ⟨0, 0, 0, 0, 0, 0, 0, 0⟩); reading from that point toward the right shows that unsigned integers up to 255 could be represented using their natural binary representation. Sometimes you see such a number line wrapped into a circle, to emphasise the fact that the values it captures wrap around: when we reach 255 (or 11111111), and given we have n = 8 bits here, the next value is 0 (or 00000000) because the representation wraps around. Toward the left of 0, it starts to become clear that two’s-complement basically “moves” the upper or right-hand range of what would be 128 to 255: by using a large, negative weight for the (n − 1)-th bit it moves the (positive) range 128 to 255 into the (negative) range −128 to −1. This movement is direct, in the sense the order of the range is preserved; this contrasts with sign-magnitude, for example, which, per the same idea in Figure 1.3a, reverses the range as it is moved. This difference stems from the fact that two’s-complement fits the concept of a positional number system naturally, whereas the same cannot be said of sign-magnitude, where the sign bit is sort of a special case (i.e., weighted abnormally). However subtle this point is, it is important. More specifically, the fact that

1. there is one representation of the value zero, and

2. as we step left or right through representations, they remain in-order wrt. the values they represent

means we can apply the same approach to arithmetic using signed integers represented using two’s-complement as with the simpler case of unsigned integers; this is not true of sign-magnitude, for example, arguably making it less attractive as a result.
Another approach is via an appeal to intuition: if we have x and add −x, i.e., compute x + (−x), then we intuitively expect to produce 0 as a result. The two’s-complement representation satisfies this: we can see from the above that

  x =  2(10) ↦ 0 0 0 0 0 0 1 0
  y = −2(10) ↦ 1 1 1 1 1 1 1 0 +
  c =          1 1 1 1 1 1 1 0 0
  r =  0(10) ↦ 0 0 0 0 0 0 0 0

meaning that if we ignore the carry-out (which cannot be captured: we have too few bits), we get the result expected. As a by-product, this yields a useful fact:

Definition 1.57. The term two’s-complement can be used as a noun (i.e., to describe the representation) or a verb (i.e., to describe an operation): the latter case defines “taking the two’s-complement of x” to mean negating x and thus computing the representation of −x. To do so, we compute −x ↦ ¬x + 1.

To see why this is true, first note that for an x represented in two’s-complement, adding x to ¬x produces −1 as a result. For example:

  x =  2(10) ↦ 0 0 0 0 0 0 1 0
  y = ¬2(10) ↦ 1 1 1 1 1 1 0 1 +
  c =          0 0 0 0 0 0 0 0 0
  r = −1(10) ↦ 1 1 1 1 1 1 1 1


This should make sense, in that each corresponding i-th bit in x and ¬x will be the opposite of each other: either one will be 0 and the other 1, or vice versa, st. their sum will always be 1. The result is off-by-one, however, in the sense we produce −1 rather than the expected 0. So, if we compute

  x = 2(10)      ↦ 0 0 0 0 0 0 1 0
  y = ¬2(10) + 1 ↦ 1 1 1 1 1 1 1 0 +
  c =              1 1 1 1 1 1 1 1 0
  r = 0(10)      ↦ 0 0 0 0 0 0 0 0

instead, then we are back to the same example as above: the result is 0, st. −x ↦ ¬x + 1.
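Assuming (as is near universal) a two's-complement platform, this identity is directly observable in C:

#include <stdio.h>

int main( int argc, char* argv[] ) {
  int x = 2;
  printf( "%d\n", ~x + 1 ); // prints -2: taking the two's-complement negates x
  printf( "%d\n", x + ~x ); // prints -1: each bit pair x_i, ~x_i sums to 1
  return 0;
}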

1.8.4 Representing real numbers, i.e., members of R


In the above, we considered concrete representations for elements in Z. Each case used positional number
systems as an underlying idea, but in different ways (so with different properties); each one coped with the
infinite size of Z by approximating it with a bit-sequence of fixed length (i.e., using n bits). Using the same
motivation, namely the observation that C yields the approximations

  float  ↦ R_float  ≃ { −3.40 · 10^38 , . . . , +3.40 · 10^38 }   ∪ { ±∞, NaN }
  double ↦ R_double ≃ { −1.79 · 10^308 , . . . , +1.79 · 10^308 } ∪ { ±∞, NaN }

we can apply roughly the same approach to represent R, the set of real numbers. Since we know a positional number system can accommodate numbers with a fractional part (via an m > 0), the fact we can consider representations for R should not be surprising. However, the approach we use does differ somewhat: it makes sense to ignore the previous notation etc. and start afresh with another underlying idea. That is, we will approximate some x by taking a base-b integer m (signed or otherwise) and scaling it, i.e., have

  x̂ ↦ m · b^e ≃ x

for some e. Two more concrete representations based on this idea can be described as follows:

1. if e is fixed (i.e., does not vary between different x and hence m) we have a fixed-point representation,
whereas

2. if e is not fixed (i.e., can vary between different x and hence m) we have a floating-point representation.

1.8.4.1 Fixed-point

Definition 1.58. The goal of a fixed-point representation is to allow expression of real numbers whose form is

  x = m · b^{−q}

or, equivalently,

  x = m · (1 / b^q),
where

• m ∈ Z is the mantissa, and

• q ∈ N is the exponent.

Informally, the magnitude of such a number is given by applying a scaling factor to the mantissa. Since the exponent is
fixed, this essentially means interpreting m, and hence x, as two components, i.e.,

1. a q-digit fractional component taken from the least-significant digits, and

2. a p-digit integral component taken from the most-significant digits


(a) Q^S_{32,0}  (b) Q^S_{31,1}  (c) Q^S_{30,2}  (d) Q^S_{29,3}  (e) Q^S_{28,4}  (f) Q^S_{27,5}  (g) Q^S_{26,6}  (h) Q^S_{25,7}  (i) Q^S_{16,16}

Figure 1.4: A visualisation of the impact of increasing q, the number of fractional digits, in a fixed-point representation; the result is increased detail within the rendering of a Mandelbrot fractal.


where n = p + q; we use the notation Q_{p,q} to denote this. Abusing notation a little, we have that

  x̂ = ⟨x_0, x_1, . . . , x_{n−1}⟩
     ↦ ⟨ m_0, . . . , m_{q−2}, m_{q−1}, m_{n−p}, . . . , m_{n−2}, m_{n−1} ⟩_{Q_{p,q}}   (q fractional digits, then p integral digits)
     ↦ m · (1 / b^q)
     ↦ ( Σ_{i=0}^{n−1} m_i · b^i ) · (1 / b^q)
     ↦ Σ_{i=0}^{n−1} m_i · b^{i−q}

Definition 1.59. There are some important quantities relating to a fixed-point representation Q_{p,q}:

• The resolution is the smallest difference between representable values, i.e., the value 1/b^q.

• The precision is essentially n, the number of digits in the representation; in a sense this (in combination with the resolution) governs the range of values that can be represented.
Example 1.45. This might seem confusing, but the basic idea is as above. That is, given an integer x̂, we just shift the fractional point by a fixed amount to determine the associated value. Imagine we set b = 10, n = 7, q = 4 and write the literal

  x̂ = 1233125.

Interpreting x̂ in the fixed-point representation specified by n and q means there are q = 4 fractional digits, i.e., 3125, and p = n − q = 7 − 4 = 3 integral digits, i.e., 123. Therefore

  x̂ ↦ x = 1233125 · (1 / 10^4) = 123.3125(10),

meaning we have simply taken x̂ and shifted the fractional point by q = 4 digits. Put yet another way, we again alter the weight associated with each digit: taken as an integer the i-th digit will be weighted by b^i, but interpreting the same digit as above means weighting it by b^{i−q}.
Example 1.46. There is a neat way to visualise the intuitive effect of adding more precision to (i.e., increasing
the number of fractional digits in) a fixed-point representation. Figure 1.4 includes different renderings of the
Mandelbrot fractal, named after mathematician Benoît Mandelbrot. Each rendering uses a 32-bit integer, i.e.,
n = 32, to specify a fixed-point representation but with different values of q, i.e., different numbers of fractional
digits. Quite clearly, as we increase q there is more detail. Without expanding on the detail, the fractal is
rendered by sampling points on a circle of radius 2 centred at the point (0, 0). With no fractional digits, we
can sample points (x, y) with x, y ∈ Z st. x, y ∈ {−2, −1, 0, +1, +2}; this is restrictive in the sense it allows only
a few points. However, by adding more fractional digits we can sample many more intermediate points, e.g.,
(0.5, 0.5) and so on, meaning more detail in the rendering.
Example 1.47. Although the definition is general enough to accommodate any choice of b, it may not be
surprising that b = 2 is attractive: this allows us to reuse what we know about representing integers using
bit-sequences, and apply it to representing real numbers using a fixed-point representation.
• We can describe an unsigned fixed-point representation based on an unsigned integer; imagine we select n = 8 with p = 5 and q = 3, denoted Q^U_{5,3}. This means

    x̂ = ⟨x_0, x_1, . . . , x_7⟩ ↦ ( Σ_{i=0}^{p+q−1} x_i · 2^i ) · (1 / 2^q),

  which produces a value in the range

    0 ≤ x ≤ 2^p − 1/2^q,

  or rather 0 ≤ x ≤ 31.875 with a resolution of 0.125. For example

    x̂ = 15(10)
       = 00001111(2)
       ↦ 00001111_{Q^U_{5,3}}
       ↦ 0 · 2^4 + 0 · 2^3 + 0 · 2^2 + 0 · 2^1 + 1 · 2^0 + 1 · 2^{−1} + 1 · 2^{−2} + 1 · 2^{−3}
       ↦ 1.875(10)


  [ bit 31: s | bits 30 . . . 23: e | bits 22 . . . 0: m ]

(a) 32-bit, single-precision format as a bit-sequence.

typedef struct __ieee32_t {
  uint32_t m : 23, // mantissa
           e :  8, // exponent
           s :  1; // sign
} ieee32_t ;

(b) 32-bit, single-precision format as a C structure.


  [ bit 63: s | bits 62 . . . 52: e | bits 51 . . . 0: m ]

(c) 64-bit, double-precision format as a bit-sequence.

typedef struct __ieee64_t {
  uint64_t m : 52, // mantissa
           e : 11, // exponent
           s :  1; // sign
} ieee64_t ;

(d) 64-bit, double-precision format as a C structure.

Figure 1.5: Single- and double- precision IEEE-754 floating-point formats described graphically as bit-sequences and
concretely as C structures.

• We can describe a signed fixed-point representation based on a two’s-complement signed integer; imagine we select n = 8 with p = 5 and q = 3, denoted Q^S_{5,3}. This means

    x̂ = ⟨x_0, x_1, . . . , x_7⟩ ↦ ( −x_{p+q−1} · 2^{p+q−1} + Σ_{i=0}^{p+q−2} x_i · 2^i ) · (1 / 2^q),

  which produces a value in the range

    −2^{p−1} ≤ x ≤ 2^{p−1} − 1/2^q,

  or rather −16 ≤ x ≤ 15.875 with a resolution of 0.125. For example

    x̂ = 142(10)
       = 10001111(2)
       ↦ 10001111_{Q^S_{5,3}}
       ↦ 1 · −2^4 + 0 · 2^3 + 0 · 2^2 + 0 · 2^1 + 1 · 2^0 + 1 · 2^{−1} + 1 · 2^{−2} + 1 · 2^{−3}
       ↦ −14.125(10)
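Arithmetic in such representations reduces to integer arithmetic plus re-scaling. The following is a minimal sketch of the unsigned case Q^U_{5,3} (the type and function names are illustrative choices, not a standard API); note how multiplication must divide out one redundant scale factor, truncating the result to the resolution 1/2^3:

#include <stdio.h>
#include <stdint.h>

typedef uint8_t q5_3_t; // an 8-bit integer, implicitly scaled by 2^-3

double q5_3_to_double( q5_3_t x ) { return x / 8.0; }
q5_3_t q5_3_add( q5_3_t x, q5_3_t y ) { return x + y; } // scale factors already match
q5_3_t q5_3_mul( q5_3_t x, q5_3_t y ) {
  return ( uint16_t )( x ) * y >> 3; // re-scale the double-width product
}

int main( int argc, char* argv[] ) {
  q5_3_t x = 0x0F; // 00001111, i.e., 1.875
  printf( "%f\n", q5_3_to_double( x ) );                // prints 1.875000
  // 1.875 * 1.875 = 3.515625 exactly, truncated here to the resolution 0.125
  printf( "%f\n", q5_3_to_double( q5_3_mul( x, x ) ) ); // prints 3.500000
  return 0;
}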

1.8.4.2 Floating-point

Definition 1.60. The goal of a floating-point representation is to allow expression of real numbers whose form is

  x = −1^s · m · b^e

where

• s ∈ {0, 1} is the sign bit,
• m ∈ N is the mantissa, and
• e ∈ Z is the exponent.

Informally, the magnitude of such a number is given by applying a scaling factor to the mantissa; since the exponent can vary, it acts to “float” the fractional point, denoted ◦, into the correct position.
We say that a number of the form

  x = −1^s · ( m_{n−1} ◦ m_{n−2} . . . m_1 m_0 ) · b^e,

whose mantissa has n digits,


is normalised: the fractional point is initially (i.e., before it is moved via the scaling factor) assumed to be after the first
non-zero digit of the mantissa. Note that n determines the precision.
Definition 1.61. IEEE-754 specifies two floating-point representations (or formats); each format represents a floating-point number as a bit-sequence by concatenating together three components, i.e., the mantissa, the exponent and the sign bit. There are two features to keep in mind:

• Imagine x is normalised as above: since b = 2, we know m_{n−1} = 1, because the leading digit of the mantissa must be non-zero. This means we do not need to include m_{n−1} explicitly in the representation of x, the now implicit value being termed a (or the) hidden digit.

• The exponent needs a signed integer representation; one might imagine that two’s-complement is suitable, but instead an approach called biasing is used. Essentially this means the representation of x adds a constant β to the real value of e so the stored exponent is always positive, i.e., we store

    ê = e + β,

  recovering the real exponent as e = ê − β.

The formats are as follows:

• The single-precision, 32-bit floating-point format allocates the least-significant 23 bits to the mantissa, the next-significant 8 bits to the exponent and the most-significant bit to the sign:

    x̂ = ⟨x_0, x_1, . . . , x_31⟩
       = ⟨ m_0, m_1, . . . , m_22, e_0, e_1, . . . , e_7, s ⟩_{Q_{IEEE−32−bit}}
       ↦ −1^s · m · 2^{e−127}

  Note that here, β = 127.

• The double-precision, 64-bit floating-point format allocates the least-significant 52 bits to the mantissa, the next-significant 11 bits to the exponent and the most-significant bit to the sign:

    x̂ = ⟨x_0, x_1, . . . , x_63⟩
       = ⟨ m_0, m_1, . . . , m_51, e_0, e_1, . . . , e_10, s ⟩_{Q_{IEEE−64−bit}}
       ↦ −1^s · m · 2^{e−1023}

  Note that here, β = 1023.

Definition 1.62. The IEEE floating-point representations reserve some values in order to represent special quantities.
For example, reserved values are used to represent +∞, −∞ and NaN, or not-a-number: +∞ and −∞ can occur when
a result overflows beyond the limits of what can be represented, NaN occurs, for example, as a result of division by zero.
For the single-precision, 32-bit format these special values are

  00000000000000000000000000000000 ↦ +0
  10000000000000000000000000000000 ↦ −0
  01111111100000000000000000000000 ↦ +∞
  11111111100000000000000000000000 ↦ −∞
  01111111100000100000000000000000 ↦ NaN
  11111111100100010001001010101010 ↦ NaN

with similar forms for the double-precision, 64-bit format.


Example 1.48. Imagine we want to represent x = 123.3125(10) in the single-precision, 32-bit IEEE floating-point
format. First we write the number in binary

x = 1111011.0101(2)

before normalising it, meaning we shift it so that there is only one digit to the left of the binary point, to get

x = 1.1110110101(2) · 2^6.


Recalling that we do not store the implicit hidden digit (i.e., the digit to the left of the binary point), our
mantissa, exponent and sign become

m = 11101101010000000000000(2)
e = 00000110(2)
s = 0(2)

noting we pad both with less-significant zeros to ensure each is of the correct length. Finally, we can convert
each component into a literal using their associated representations, i.e.,

m̂ = 11101101010000000000000
ê = 10000101
ŝ = 0

noting that we bias e (i.e., add 127 to e = 6) to get the result, and concatenate the components into the single
literal
x̂ = 01000010111101101010000000000000.
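We can sanity-check this result using the C standard library function ldexp (which computes m · 2^e): re-attaching the hidden digit and un-biasing the exponent should recover the original value. A sketch, using the field values derived above:

#include <stdio.h>
#include <math.h>

int main( int argc, char* argv[] ) {
  unsigned long s = 0x0, e = 0x85, m = 0x76A000; // s, biased e, and 23-bit m from above
  // x = -1^s * ( 1 + m * 2^-23 ) * 2^( e - 127 )
  double x = ( s ? -1.0 : 1.0 )
           * ldexp( 1.0 + ldexp( ( double )( m ), -23 ), ( int )( e ) - 127 );
  printf( "%f\n", x ); // prints 123.312500
  return 0;
}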
Definition 1.63. Consider a case where the result of some arithmetic operation (or conversion) requires more digits of
precision than are available. That is, it cannot be represented exactly within the n digits of mantissa available. To combat
this problem, we can use the concept of rounding. For example, you probably already know that if we only have two
digits of precision available then

• 1.24(10) is rounded to 1.2(10) because the last digit is less than five, while
• 1.27(10) is rounded to 1.3(10) because the last digit is greater than or equal to five.

Such a rounding mode is essentially a rule that takes the ideal result, i.e., the result if one could use infinite precision,
to the most suitable representable result.
The IEEE-754 specification mandates the availability of four rounding modes. In each case, the idea is to imagine the ideal result x is written using an l > n digit mantissa m, i.e.,

  x = −1^s · ( m_{l−1} ◦ m_{l−2} . . . m_1 m_0 ) · b^e,

whose mantissa has l digits. To round x, we copy the most-significant n digits of m to get

  x′ = −1^s · ( m′_{n−1} ◦ m′_{n−2} . . . m′_1 m′_0 ) · b^e,

where m′_i = m_{i+l−n}, then “patch” m′_0 = m_{l−n} according to rules given by the rounding mode. Within the following, we offer some decimal examples for clarity (minor alterations apply in binary), rounding for n = 2 digits of precision in each example. Note that the C standard library offers access to these features, using constant values FE_TONEAREST, FE_UPWARD, FE_DOWNWARD, and FE_TOWARDZERO respectively to refer to the rounding modes themselves. For example, the rint function rounds a floating-point value using the currently selected IEEE-754 rounding mode; this can be inspected and set using the fegetround and fesetround functions.
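The following sketch exercises these library features; note that, strictly speaking, a conforming program should enable the FENV_ACCESS pragma, and the observed behaviour can depend on compiler flags, so treat the commented outputs as the expected results rather than a guarantee:

#include <fenv.h>
#include <math.h>
#include <stdio.h>

#pragma STDC FENV_ACCESS ON

int main( int argc, char* argv[] ) {
  double x = 2.5; // exactly half way between 2.0 and 3.0
  fesetround( FE_TONEAREST  ); printf( "%+.1f\n", rint( x ) ); // +2.0, ties to even
  fesetround( FE_UPWARD     ); printf( "%+.1f\n", rint( x ) ); // +3.0
  fesetround( FE_DOWNWARD   ); printf( "%+.1f\n", rint( x ) ); // +2.0
  fesetround( FE_TOWARDZERO ); printf( "%+.1f\n", rint( x ) ); // +2.0
  return 0;
}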
Definition 1.64. Sometimes termed Banker’s Rounding, the round to nearest mode alters basic rounding to provide more specific treatment when the ideal result is exactly half way between representable results, i.e., when m_{l−n−1} = 5 and all subsequent digits are zero. It can be described via the following rules:

• If m_{l−n−1} ≤ 4, then do not alter m′_0.

• If m_{l−n−1} ≥ 6, then alter m′_0 by adding one.

• If m_{l−n−1} = 5 and at least one of the trailing digits from m_{l−n−2} onward is non-zero, then alter m′_0 by adding one.

• If m_{l−n−1} = 5 and all of the trailing digits from m_{l−n−2} onward are zero, then alter m′_0 to the nearest even digit. That is:

  – if m′_0 ≡ 0 (mod 2) then do not alter it, but
  – if m′_0 ≡ 1 (mod 2) then alter it by adding one.

Example 1.49. Using the round to nearest mode, we find that


• 1.24(10) rounds to 1.2(10) ,


• 1.27(10) rounds to 1.3(10) ,
• 1.251(10) rounds to 1.3(10) ,
• 1.250(10) rounds to 1.2(10) , and
• 1.350(10) rounds to 1.4(10) .

Definition 1.65. Sometimes termed ceiling, the round toward +∞ mode can be described via the following rules:

• If x is positive (i.e., s = 0) and m_{l−n−1} is non-zero, then alter m′_0 by adding one.
• If x is negative (i.e., s = 1), the trailing digits from m_{l−n−1} onward are discarded.

Example 1.50. Under the round toward +∞ mode, we find that

• 1.24(10) rounds to 1.3(10) ,


• 1.27(10) rounds to 1.3(10) ,
• 1.20(10) rounds to 1.2(10) ,
• −1.24(10) rounds to −1.2(10) ,
• −1.27(10) rounds to −1.2(10) , and
• −1.20(10) rounds to −1.2(10) .

Definition 1.66. Sometimes termed floor, the round toward −∞ mode can be described via the following rules:

• If x is positive (i.e., s = 0), the trailing digits from m_{l−n−1} onward are discarded.
• If x is negative (i.e., s = 1) and m_{l−n−1} is non-zero, then alter m′_0 by adding one.

Example 1.51. Under the round toward −∞ mode, we find that

• −1.24(10) rounds to −1.3(10) ,


• −1.27(10) rounds to −1.3(10) ,
• −1.20(10) rounds to −1.2(10) ,
• 1.24(10) rounds to 1.2(10) ,
• 1.27(10) rounds to 1.2(10) , and
• 1.20(10) rounds to 1.2(10) .

Definition 1.67. The round toward zero mode operates as round toward −∞ for positive numbers and as round toward
+∞ for negative numbers.
Example 1.52. Under the round toward zero mode, we find that

• 1.27(10) rounds to 1.2(10) ,


• 1.24(10) rounds to 1.2(10) ,
• 1.20(10) rounds to 1.2(10) ,
• −1.27(10) rounds to −1.2(10) ,
• −1.24(10) rounds to −1.2(10) , and
• −1.20(10) rounds to −1.2(10) .

Example 1.53. The (slightly cryptic) C program in Figure 1.6 offers a practical demonstration that floating-point
works as expected. The idea is to “overlap” a single-precision, 32-bit floating-point value called x with an
instance of the ieee32_t structure called y; main creates an instance of this union, calling it t. Since we can
access individual fields within t.y (e.g., the sign bit t.y.s, or the mantissa t.y.m), we can observe the effect
altering them has on the value of t.x. Compiling and executing the program gives the following output:


typedef union __view32_t {
  float    x;
  ieee32_t y;
} view32_t ;

typedef union __view64_t {
  double   x;
  ieee64_t y;
} view64_t ;

(a) Two unions which “overlap” the representations of an actual floating-point field x with an instance y of the structure(s) defined in Figure 1.5.

int main( int argc , char* argv [] ) {
  view32_t t;

  t.x = 2.8;
  printf ( "%+9f %01X %02X %06X\n", t.x, t.y.s, t.y.e, t.y.m );

  t.y.s = 0x01;
  printf ( "%+9f %01X %02X %06X\n", t.x, t.y.s, t.y.e, t.y.m );

  t.y.e = 0x81;
  printf ( "%+9f %01X %02X %06X\n", t.x, t.y.s, t.y.e, t.y.m );

  t.y.s = 0x00;
  t.y.e = 0xFF;
  t.y.m = 0x400000 ;
  printf ( "%+9f %01X %02X %06X\n", t.x, t.y.s, t.y.e, t.y.m );

  t.y.s = 0x00;
  t.y.e = 0xFF;
  t.y.m = 0x000000 ;
  printf ( "%+9f %01X %02X %06X\n", t.x, t.y.s, t.y.e, t.y.m );

  return 0;
}

(b) A driver function main that uses an instance t of view32_t to demonstrate how manipulating fields in t.y impacts on the value of t.x.

Figure 1.6: A short C program that performs direct manipulation of IEEE floating-point numbers.

+2.800000 0 80 333333
-2.800000 1 80 333333
-5.600000 1 81 333333
+nan 0 FF 400000
+inf 0 FF 000000

The question is, what on earth does this mean? We can answer this by looking at each part of the program (each concluding with a call to printf that produces one of the lines of output):

• t.x is set to 2.8(10), and then t.x and each component of t.y is printed. The output shows that

    t.y.s = 0(16)      ↦ 0
    t.y.e = 80(16)     ↦ 10000000
    t.y.m = 333333(16) ↦ 01100110011001100110011

  Accounting for the bias and including the hidden bit, this of course represents the value

    −1^0 · 1.01100110011001100110011(2) · 2^1,

  or 2.8(10) as expected.

• t.y.s is set to 01(16) = 1(10), and then t.x and each component of t.y is printed. We expect that setting the sign bit to 1 rather than 0 will change t.x from being positive to negative; this is confirmed by the output, which shows t.x is equal to −2.8(10) as expected.

• t.y.e is set to 81(16) = 129(10), and then t.x and each component of t.y is printed. We expect that setting the exponent to 129 rather than 128 will double t.x (the unbiased value of the exponent is now 129 − 127 = 2 st. the mantissa is scaled by 2^2 = 4 rather than 2^1 = 2); this is confirmed by the output, which shows t.x is equal to −5.6(10) as expected.

• t.y.s, t.y.e and t.y.m are set to reserved values corresponding to NaN and +∞.


Figure 1.7: A teletype machine being used by UK-based Royal Air Force (RAF) operators during WW2 (public domain
image, source: http://en.wikipedia.org/wiki/File:WACsOperateTeletype.jpg).

1.8.5 Representing characters


So far we have examined techniques to represent numbers, but clearly we might want to work with other
types of data; a computer can process all manner of data such as emails, images, music and so on. The
approach used to represent characters (or letters) is a good example: basically we just need a way translate
from what we want into a numerical representation (which we already know how to deal with) and back
again. More specifically, we need two functions: Ord(x) which takes a character x and gives us back the
corresponding numerical representation, and Chr(y) which takes a numerical representation y and gives back
the corresponding character. But how should the functions work? Fortunately, people have thought about
this problem for us and provided standards we can use. One of the oldest and most simple is the American
Standard Code for Information Interchange (ASCII), pronounced “ass key”.
ASCII has a rich history, but was developed to permit communication between early teleprinter devices.
These were like a combination of a typewriter and a telephone, and were able to communicate text to each
other before innovations such as the fax machine. Later, but long before monitors and graphics cards existed,
similar devices allowed users to send input to early computers and receive output from them. Figure 1.8 shows
the 128-entry ASCII table which tells us how characters are represented as numbers. Of the entries, 95 are
printable characters we can instantly recognise (including SPC which is short for “space”). There are also 33
others which represent non-printable control characters: originally, these would have been used to control the
teleprinter rather than to have it print something. For example, the CR and LF characters (short for “carriage
return” and “line feed”) would combine to move the print head onto the next line; we still use these characters
to mark the end of lines in text files. Other control characters also play a role in modern computers. For
example, the BEL (short for “bell”) characters play a “ding” sound when printed to most UNIX terminals, we
have keyboards with keys that relate to DEL and ESC (short for “delete” and “escape”) and so on.
Since there are 128 entries in the table, ASCII characters can be and are represented by 8-bit bytes. However, notice that 2^7 = 128 and 2^8 = 256, so in fact we could represent 256 characters: essentially one of the bits is not used by the ASCII encoding. Specific computer systems sometimes use the unused bit to permit use of an “extended” ASCII table with 256 entries; the extra characters in this table can be used for special purposes. For example, foreign language characters are often defined in this range (e.g., é or ø), and “block” characters are included for use by artists who form text-based pictures. However, the original use of the unused bit was as an error detection mechanism.
Given the table, we can see, for example, that Chr(104) = ‘h’, i.e., if we see the number 104 then this represents the character ‘h’. Conversely we have that Ord(‘h’) = 104. Although in a sense any consistent translation between characters and numbers like this would do, ASCII has some useful properties. Look specifically at the alphabetic characters:

• Imagine we want to test if one character x is alphabetically before some other y. The way the ASCII
translation is specified, we can simply compare their numeric representation. If we find Ord(x) < Ord(y)
then the character x is before the character y in the alphabet. For example ‘a’ is before ‘c’ because

Ord(‘a’) = 97 < 99 = Ord(‘c’).

• Imagine we want to convert a character x from lower-case into upper-case. The lower-case characters are


y = Ord(x)  x = Chr(y)  |  y = Ord(x)  x = Chr(y)  |  y = Ord(x)  x = Chr(y)  |  y = Ord(x)  x = Chr(y)
0 NUL 1 SOH 2 STX 3 ETX
4 EOT 5 ENQ 6 ACK 7 BEL
8 BS 9 HT 10 LF 11 VT
12 FF 13 CR 14 SO 15 SI
16 DLE 17 DC1 18 DC2 19 DC3
20 DC4 21 NAK 22 SYN 23 ETB
24 CAN 25 EM 26 SUB 27 ESC
28 FS 29 GS 30 RS 31 US
32 SPC 33 ! 34 " 35 #
36 $ 37 % 38 & 39 '
40 ( 41 ) 42 * 43 +
44 , 45 - 46 . 47 /
48 0 49 1 50 2 51 3
52 4 53 5 54 6 55 7
56 8 57 9 58 : 59 ;
60 < 61 = 62 > 63 ?
64 @ 65 A 66 B 67 C
68 D 69 E 70 F 71 G
72 H 73 I 74 J 75 K
76 L 77 M 78 N 79 O
80 P 81 Q 82 R 83 S
84 T 85 U 86 V 87 W
88 X 89 Y 90 Z 91 [
92 \ 93 ] 94 ^ 95 _
96 ` 97 a 98 b 99 c
100 d 101 e 102 f 103 g
104 h 105 i 106 j 107 k
108 l 109 m 110 n 111 o
112 p 113 q 114 r 115 s
116 t 117 u 118 v 119 w
120 x 121 y 122 z 123 {
124 | 125 } 126 ~ 127 DEL

Figure 1.8: A table describing the printable ASCII character set.


represented numerically as the contiguous range 97 . . . 122; the upper-case characters as the contiguous range 65 . . . 90. So we can convert from lower-case into upper-case simply by subtracting 32. For example Chr(Ord(‘a’) − 32) = ‘A’.
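In C these translation functions are essentially free: a char is just a small integer, so Ord and Chr amount to a change of interpretation rather than any computation. A short sketch:

#include <stdio.h>

int main( int argc, char* argv[] ) {
  char x = 'h';
  printf( "%d\n", x );         // prints 104, i.e., Ord('h')
  printf( "%c\n", 104 );       // prints h,   i.e., Chr(104)
  printf( "%c\n", x - 32 );    // prints H,   i.e., lower- to upper-case
  printf( "%d\n", 'a' < 'c' ); // prints 1,   i.e., 'a' is before 'c'
  return 0;
}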

1.9 A conclusion: steps toward a digital logic


If we were to summarise all the pieces of theory accumulated above, the list would be roughly as follows:

1. We know that we can define Boolean algebra, which gives us a) a set of values (i.e., B = {0, 1}), b) a set of (unary and binary) operators (i.e., NOT, AND, and OR), and c) a set of axioms. This means we can construct Boolean expressions and manipulate them while preserving their meaning.

2. We can describe Boolean functions of the form

     B^n → B

   and hence also construct more general functions of the form

     B^n → B^m

   using m separate functions whose outputs are concatenated together. It therefore makes sense that NOT, AND, and OR are well-defined for B^n as well as B: we overload AND, for example, and write r = x ∧ y as a short-hand for r_i = x_i ∧ y_i where 0 ≤ i < n.

3. We can represent various objects, such as numbers, using sequences of bits. Since we can describe Boolean functions of the form

     B^n → B^m,

   we can construct such functions that perform arithmetic with numbers we are representing. Imagine, for example, we have two integers x and y represented by the n-bit sequences x̂ and ŷ. To compute the (integer) sum of x and y, all we need is a Boolean function

     f : B^n × B^n → B^n

   st.

     r̂ = f(x̂, ŷ) ↦ x + y,

   i.e., whose output r̂ represents r = x + y (a sketch of such an f appears at the end of this Section).

What we end up with is the ability to perform meaningful computation; fairly simple computation, granted, but computation none the less. Fundamentally, this is what computers are: they are just devices that perform computation. So if you follow through all the theory, we have developed a “blueprint” for how to build a real computer. That is, we have a link (however tenuous) between a theoretical model of computation based on Mathematics and the first steps toward a practical realisation of that model.
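To make the final point concrete, the following is a sketch (not a definitive implementation) of such an f for n = 8, realised in C but using only bit-wise Boolean operators: XOR (itself expressible via NOT, AND, and OR) computes each per-bit sum, AND computes each per-bit carry, and the shift simply “wires” each carry into the next bit position. Assuming unsigned char is 8 bits wide, eight iterations are enough for any carry to propagate:

#include <stdio.h>

unsigned char add( unsigned char x, unsigned char y ) {
  for( int i = 0; i < 8; i++ ) {
    unsigned char s = x ^ y;          // per-bit sum,   ignoring carries
    unsigned char c = ( x & y ) << 1; // per-bit carry, moved left one position
    x = s; y = c;
  }
  return x; // after 8 iterations, all carries have propagated (or fallen off)
}

int main( int argc, char* argv[] ) {
  printf( "%d\n", add( 123, 4 ) ); // prints 127
  return 0;
}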

References
[1] G. Boole. An investigation of the laws of thought. Walton & Maberly, 1854 (see p. 29).
[2] D. Cohen. “On Holy Wars and a Plea for Peace”. In: IEEE Computer 14.10 (1981), pp. 48–54 (see p. 39).
[3] D. Goldberg. “What Every Computer Scientist Should Know About Floating-Point Arithmetic”. In: ACM
Computing Surveys 23.1 (1991), pp. 5–48.
[4] C. Petzold. Code: Hidden Language of Computer Hardware and Software. Microsoft Press, 2000.
[5] E.L. Post. “Introduction to a General Theory of Elementary Propositions”. In: American Journal of Mathe-
matics 43.3 (1921), pp. 163–185 (see p. 31).
[6] C.E. Shannon. “A mathematical theory of communication”. In: Bell System Technical Journal 27.3 (1948),
pp. 379–423 (see p. 38).
[7] C.E. Shannon. “A Symbolic Analysis of Relay and Switching Circuits”. In: Transactions of the American
Institute of Electrical Engineers (AIEE) 57.12 (1938), pp. 713–723 (see p. 29).
[8] H.M. Sheffer. “A set of five independent postulates for Boolean algebras, with applications to logical
constants”. In: Transactions of the American Mathematical Society 14.4 (1913), pp. 481–488 (see p. 31).
[9] C. Wressnegger et al. “Twice the Bits, Twice the Trouble: Vulnerabilities Induced by Migrating to 64-Bit
Platforms”. In: Computer and Communications Security (CCS). 2016, pp. 541–552 (see p. 49).


CHAPTER 2

BASICS OF DIGITAL LOGIC

Scientists build to learn; Engineers learn to build.

– Brooks

In the previous Chapter, we made some statements regarding various features of digital logic without backing them
up with any evidence or explanation. Adopting a “from atoms upwards” approach in order to support material in
subsequent Chapters, this Chapter has two central aims that, in combination, describe digital logic. First it expands on
previous statements, such as the above, demonstrating how they can be satisfied using introductory Physics. Note that
a detailed, in-depth treatment of such material could fill another book, and, arguably, is not strictly required given the
remit of this book; the focus is therefore at a high level, offering an overview of only pertinent details at the right level of
abstraction. For example, to connect theory such as Boolean algebra to practice, it is important to understand how we
can design and manufacture implementations of Boolean operators that can physically provide the same functionality.
Then, second, it explains why doing so is useful and important: the bulk of the Chapter demonstrates, step-by-step, how
successively higher-level components, capable of successively more complex and so useful computation, can be designed
and implemented.

2.1 Switches and transistors


Even complex use of digital logic is, at the lowest level of detail, based on remarkably simple building blocks:
fundamentally, all we really need is a way to manufacture a switch.
In the subsequent Sections we focus exclusively on transistors, whose design and behaviour depend on
sub-atomic properties of the materials they are created from. There is a good reason for this focus: transistors
are (currently) the dominant way to realise digital logic, and can be found in most if not all devices we routinely
use. However, it is crucial to remember that transistors are not the only option. Put another way, provided
correct switch-like behaviour is possible, we might legitimately select another implementation technology. Since
new materials and manufacturing processes, applications and quality metrics will all appear due to advances
in technology, understanding the underlying principles is as important as any specific example (such as the
transistor), because it, like anything, could be superseded over time.

2.1.1 A brief tour of fundamental principles


2.1.1.1 Atoms and sub-atomic particles
Everything around us is formed from building blocks called atoms; in turn, each atom is formed from sub-
atomic particles including a) a group of nucleons, either protons or neutrons, in a central core or nucleus, and
b) a cloud of electrons orbiting said nucleus. The number of such sub-atomic particles yields information about
the associated atom. More specifically, the number of protons dictates the atomic number (or family: this is
what we mean by the term element) whereas the number of neutrons dictates the isotope (or instance, within
that family). Likewise, the electrons can orbit the nucleus in one of several levels (or shells) in what is termed
the electron configuration.


An aside: describing basic physics using the hydraulic analogy.

For some, the electrical properties of atoms and sub-atomic particles may be an unfamiliar topic. As a result, it
is common, and potentially quite useful, to align them with more familiar concepts via the so-called hydraulic
analogy.
Imagine a water tower (resp. battery), connected via pipes (resp. wires) which eventually power a water wheel (resp. lamp):

• the water pressure (resp. electrical potential) is dictated by how much water (resp. electrical charge) is
held in the water tower,
• water flows along the pipes; a wider pipe (resp. a wire with lower resistivity) allows water to flow more
easily, and hence quicker, than a narrower pipe (resp. wire with higher resistivity),

• when the water reaches the water wheel, it causes it to turn as a result of two properties: the pressure
(resp. voltage), and the flow rate (resp. current) of the water.


Figure 2.1: The sub-atomic structure of a lithium atom.

Example 2.1. By consulting a suitable periodic table, consider that

• silicon has atomic number fourteen; it has three shells containing two, eight and four electrons respec-
tively, whereas

• lithium has atomic number three; it has two shells containing two and one electrons respectively.

Definition 2.1. Each type of sub-atomic particle carries a specific electrical charge: electrons carry a negative charge,
protons carry a positive charge, and neutrons carry no (or a neutral) charge; the unit of measurement is the coulomb
(after Charles-Augustin de Coulomb). This suggests any atom with an imbalance of electrons and protons will have a
non-neutral charge overall; we term such cases an ion, st. negatively (resp. positively) charged ions will have more (resp.
fewer) electrons than protons.

2.1.1.2 Electrical charge, current, and voltage


The sub-atomic particles in an atom are bound together by forces that make sure they remain a cohesive whole.
More specifically, nucleons are bound together by a “strong” nuclear force, whereas electrons are bound to the
nucleus by a “weak” electromagnetic attraction to the protons; electrons in more inner shells are bound more
tightly. That said, the force binding electrons can be overcome in a process called ionisation: an atom can
be turned into an ion by displacing electrons, thereby producing imbalance between the number of electrons



Figure 2.2: A simple circuit conditionally connecting a capacitor (or battery) to a lamp depending on the state of a switch.

(a) A battery-and-lamp AND-style computation.

(b) A battery-and-lamp OR-style computation.

(c) A battery-and-lamp XOR-style computation.

Figure 2.3: Some simple examples of Boolean-style control of a lamp by combinations of switches.

and protons, using some energy. The exact amount of energy required relates to how tightly the electrons are
bound to the nucleus, and so by the type of atom. Electrons also exhibit a property whereby they repel each
other, but are attracted by holes (or “gaps”) in a given electron cloud; this implies they can move.
Definition 2.2. Electrical current refers to a (net) flow of electric charge; the unit of measurement is the ampere (or
amp, after André-Marie Ampére).
Definition 2.3. Electrical potential difference (or, more often, voltage) refers to the difference in electrical potential
energy between two points per unit electric charge; the unit of measurement is the volt (after Alessandro Volta). Informally,
you can think of voltage as the electrical work (or the effort) needed to move (or drive) the electrons and hence cause a
flow of current.
Definition 2.4. Electrical power refers to the rate of electrical work, i.e., the amount of charge driven, per unit of time,
by a given voltage; the unit of measurement is the watt (after James Watt). We say electrical power is dissipated (or
“consumed”) when electrical potential energy associated with some charge is converted into another form (e.g., heat or
light) by a component (or load).
An electron can move between atoms, doing so from a point of more negative charge toward a point of more
positive charge, i.e., from lower to higher voltage, or driven by a potential difference. This movement or flow
of valence electrons from one point to another suggests a (net) flow of charge and hence a current between the
two points.
This is formally termed electron current, in part to distinguish it from conventional current: when we use the term current, we almost universally mean the latter. Although electron current describes the flow of negative charge, conventional current means we actually focus on what would be the flow of positive charge if that were possible (i.e., the opposite of electron current). Put another way, some electron moving from a more negative point X to a more positive point Y will make Y more negative and X more positive; the electron current is from X to Y, whereas the conventional current is from Y to X. This is why you might traditionally think of charge moving from a terminal labelled +ve to that labelled −ve on a battery. Set in the context of what we now know to be true, this is confusing1. However, it also has a clear historical lineage we are now stuck with: Benjamin Franklin adopted this convention in the mid 1700s, also labelling charge using the positive or negative terminology, during his pioneering study of electricity. Either way, from here on, you should read current as a synonym for conventional current.

2.1.1.3 Conductors, insulators and semi-conductors


Definition 2.5. Two materials with different sub-atomic composition may exhibit different properties wrt. their conduc-
tivity and resistivity; these terms (which are antithetical) state whether a material allows or prevents the movement of
electrons.

Definition 2.6. A conductor, e.g., a metal, has high-conductivity (resp. low-resistivity) and allows electrons to move
easily, whereas an insulator, e.g., a vacuum, has low-conductivity (resp. high-resistivity) and does not allow electrons to
move easily.

When we describe a material as conductive or resistive, we typically mean it is on a spectrum between the
two: although unlikely to represent a perfect conductor or insulator, we mean it is closer to one end of the
spectrum or other (e.g., is more conductive or more resistive). Although such properties are inherent in the
material, it is possible to explicitly manipulate the sub-atomic composition of semi-conductor materials using
a process called doping. For example, imagine we need a material for some task; any non-ideal material will
have non-ideal properties wrt. the task. The idea is instead to take some non-ideal material as a starting point,
then dope (or combine) it with a dopant material: their combination should be similar to the starting point,
but more ideal wrt. the properties required.

Example 2.2. Consider pure silicon, whose outermost electron shell contains four electrons (so is only about half full); it is more or less an insulator. Doping with a boron or aluminium acceptor creates extra holes, while doping with a phosphorus or arsenic donor creates extra electrons.

An important use-case for doping is the production of semi-conductor materials. Although various materials
might exhibit the properties of a semi-conductor, doping allows careful control over the ratio of electrons vs.
hole and hence conductivity (resp.resistivity) of the result. Rather than rely on the perfect material being
naturally available, we therefore produce a material with exactly the properties required for a given task.

Definition 2.7. A doped semi-conductor material falls into one of two classes, namely

1. an N-type semi-conductor has an abundance of electrons produced by doping with a donor material, or

2. a P-type semi-conductor has an abundance of holes produced by doping with an acceptor material.

2.1.1.4 Using switches for computation


Example 2.3. Consider Figure 2.2, which includes a capacitor (top), a lamp (bottom-right), and a push-button
switch (bottom-left). The capacitor is constructed using two conductive plates separated by an insulator (called
a dielectric); it stores electrical energy (meaning it is similar to a battery2 ), and, since this one has already been
charged, one of the plates has many electrons and the other many holes. Electrons cannot move through the
insulator, so the only way for them to move from the negative onto the positively charged plate (i.e., from low
to high potential) is via the (conductive) wire. This is only possible when the switch is closed: when the switch
is closed, electrons are allowed to flow through the lamp which causes it to light up.

Following from this example, it may be worth convincing ourselves that a switch is useful for something
beyond controlling a lamp as above. To provide an answer, we just need to generalise the example: we a) use
multiple switches, and b) treat each switch as an input and the lamp as an output. Put another way, imagine
we have two switches labelled x and y; we are interested in how their combination controls the lamp labelled
r, so, in effect, what the function f described by r = f (x, y) is.

Example 2.4. Consider Figure 2.3:

1. Figure 2.3a controls the lamp via two switches, and models an AND operator: r = f (x, y) |= x ∧ y. Only
when both of the switches are closed will the lamp be on: if either is open, there is no connection with
the battery.
1 See, e.g., http://xkcd.com/567/.
2 Although the analogy is reasonable, keep in mind that a battery differs from a capacitor: behaviour of the former is due to a chemical process, which converts chemical energy to electrical energy and thus delivers a flow of electrons (i.e., a current).


2. Figure 2.3b controls the lamp via two switches, and models an OR operator: r = f (x, y) |= x ∨ y. The lamp will be on when one or the other, or both, of the switches are closed: there is a connection with the battery unless both of the switches are open.

3. Figure 2.3c controls the lamp via two switches, and models an XOR operator: r = f (x, y) |= x ⊕ y. This time, the switches sort of operate in the opposite way to each other; to make a connection between the lamp and battery along the top (resp. bottom) wire, the left-hand switch needs to be closed (resp. open) while the right-hand switch needs to be open (resp. closed): there is a connection with the battery if one or the other, but not both, of the switches are closed. You often find this sort of arrangement in homes, where a single light on some stairs is controlled by switches located at the top and bottom.

On one hand, the examples above should be encouraging: they show we can mirror the behaviour of Boolean
operators, using a careful organisation of multiple switches. On the other hand, however, push-button switches
are mechanically operated: we want an electrically operated switch, which is actuated (i.e., pressed or released)
via an electrical property (e.g., a flow of electrons) rather than by hand. Crucially, this will allow the output of
one such operator to be used as the input to another, and therefore the implementation of larger expressions.
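Before introducing such switches, we can already check the intended behaviour abstractly: the following sketch simply tabulates r = f(x, y) for each of the three circuits in Figure 2.3, using C’s bit-wise operators in place of physical switches:

#include <stdio.h>

int main( int argc, char* argv[] ) {
  printf( " x y | AND OR XOR\n" );
  for( int x = 0; x <= 1; x++ ) {
    for( int y = 0; y <= 1; y++ ) {
      // each circuit computes a Boolean function of the two switch states
      printf( " %d %d |  %d   %d   %d\n", x, y, x & y, x | y, x ^ y );
    }
  }
  return 0;
}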

2.1.2 Implementing transistors


2.1.2.1 Switches in a pre-transistor era: vacuum tubes
From a historical perspective, numerous different electrically operated switch designs have been conceived
and used; it is both interesting and useful to examine them in some detail, because their properties act as
motivation for modern alternatives. In particular, the vacuum tube (or thermionic valve), is a compelling
example because it was used extensively by early generations of computing equipment; it still often plays
a role in high-end audio equipment. The idea is to use a glass or ceramic envelope to maintain a vacuum
surrounding an electron-producing filament (or cathode) and a metal plate (or anode). When the filament is
heated, electrons are produced into the vacuum which are attracted by the plate resulting in a current between
the two. In simple terms, this implements a switch: when the filament is heated the switch is on, when it is
cooled the switch is off.
The design outlined above, plus the example in Figure 2.4, hint at some potential disadvantages: vacuum tubes offer
the functionality we require but, in relative terms, are physically large, operate slowly, and are unreliable. The
latter two properties both stem from the need to (repeatedly) heat and cool the filament: this takes some time,
and stresses the filament to the point where it fails (much like a light bulb failing when turned on or off). As
an aside, the terms bug and debug (allegedly) stem from failure of this sort. In 1947, operators of the
Harvard Mark II computer, developed by Howard Aiken, discovered a moth inside one of the components; the
unfortunate insect had shorted the component, resulting in a malfunction. Although the terms had
been used previously in various other contexts, Grace Hopper and her team are now often cited as introducing
them to Computer Science. Certainly this real-life bug, shown in Figure 2.5, is so famous it is still on display
in the Smithsonian Museum of American History!

2.1.2.2 Design of MOSFET transistors


A transistor can in fact be used for various tasks; for example, transistors can act as amplifiers. However, when
used as switches they

1. allow charge to flow between two terminals (i.e., act as a conductor) when we turn the switch on, and

2. prevent charge flowing between two terminals (i.e., act as a resistor) when we turn the switch off.

The word transistor is a portmanteau of “transfer resistor”, offering a hint as to the underlying principle: a
transistor is a resistor, but one we can control by altering how resistive it is. Put more simply, we can control it
st. it is conductive when we want to turn the switch on and resistive when we want to turn it off.
The question is then how such behaviour is realised. Improvements and different trade-offs have given us
numerous transistor designs, but we focus on just one: the Field Effect Transistor (FET), initially designed and
patented by Julius Lilienfeld in 1925. However, at that point in time the general understanding of sub-atomic
behaviour was less advanced than now, meaning use of his design was limited. This changed in 1952, when a team
of Engineers at Bell Labs, led by William Shockley, invented what is now termed a junction gate FET (or
JFET, due to some legal wranglings wrt. the Lilienfeld patent). In turn, this gave rise to the Metal Oxide
Semi-Conductor Field-Effect Transistor (MOSFET), invented in 1960 by Dawon Kahng and Martin Atalla,
also at Bell Labs. These designs deliver the properties we require without limiting the complexity of modern
digital logic components; in particular, they a) have the switch-like functionality described as useful thus far,
while b) simultaneously being physically small, operating quickly, reliable, and easy to manufacture.


Figure 2.4: A 6P1P (i.e., a 100W to 200W, photo-sensitive type) vacuum tube (public domain image, source: http://en.wikipedia.org/wiki/File:6P1P.jpg).

Figure 2.5: A moth found by operators of the Harvard Mark II; the “bug” was trapped within the computer and caused it to malfunction (public domain image, source: http://en.wikipedia.org/wiki/File:H96566k.jpg).

Figure 2.6: A replica of the first point-contact transistor, a precursor of designs such as the MOSFET, constructed at Bell Labs (public domain image, source: http://en.wikipedia.org/wiki/File:Replica-of-first-transistor.jpg).


Figure 2.7: A high-level diagram of a MOSFET transistor, showing the terminal (source, gate, and drain) and body materials.

Figure 2.8: A pair of N-MOSFET and P-MOSFET transistors, arranged to form a CMOS cell.

Figure 2.7 offers a high-level description of a MOSFET, in which atomic-scale layers of semi-conductor
material are combined with metal or poly-silicon layers for the terminals; although a lower-level, more detailed
description would require deeper understanding of the related Physics (see, e.g., [8]), we already have enough
background to explain the basic concept at this high level. In short, the switch-like behaviour is realised by
using the gate terminal to control a conductive channel between the source and drain terminals. Unlike
a JFET, where an explicit semi-conductor layer is constructed for use as the channel, in a MOSFET transistor
the channel is induced. Specifically, applying a small potential difference to the gate terminal repels holes in
the P-type body; doing so forms a depletion layer in which the number of holes is depleted. As the potential
difference applied grows, an inversion layer is formed at the surface: the abundance of electrons relative to
the number of (repelled) holes inverts the properties of the P-type body, turning it into N-type and so forming
a conductive channel between the N-type source and drain terminals.
Realising this behaviour in practice depends on the careful selection of semi-conductor materials; Figure 2.9
illustrates the symbols used for two MOSFET variants. These symbols abstract away the implementation detail
(retaining only the terminals, with d, s and g denoting the drain, source and gate), which is as follows:
Definition 2.8. An N-MOSFET (or N-type MOSFET, or N-channel MOSFET, or NPN MOSFET) is constructed
from N-type semi-conductor terminals and a P-type body:

• applying a potential difference to the gate widens the conductive channel, meaning source and drain are connected
(i.e., act like a conductor); the transistor is activated.
• removing the potential difference from the gate narrows the conductive channel, meaning source and drain are
disconnected (i.e., act like an insulator); the transistor is deactivated.

Definition 2.9. A P-MOSFET (or P-type MOSFET, or P-channel MOSFET, or PNP MOSFET) is constructed
from P-type semi-conductor terminals and an N-type body:

• applying a potential difference to the gate narrows the conductive channel, meaning source and drain are disconnected
(i.e., act like an insulator); the transistor is deactivated.
• removing the potential difference from the gate widens the conductive channel, meaning source and drain are
connected (i.e., act like a conductor); the transistor is activated.

Put another way, for an N-MOSFET, applying a large potential difference to the gate terminal produces a wider
conductive channel, and so allows electrons (i.e., current) to flow between source and drain. Conversely, a
small potential difference (or at least smaller than some threshold) means a narrower conductive channel,
which prevents said flow. The gate terminal therefore offers functionality much like a switch: controlling the
potential difference applied controls conductivity between source and drain, and hence regulates the current.

2.1.2.3 Physical properties of MOSFET transistors


Various physical properties stem from the design of MOSFET transistors; since they are related, we define these
step-by-step in what follows.
Definition 2.10. One or more power rails supply voltage levels to each transistor, connecting to the gate or source
terminal.


(a) An N-MOSFET transistor. (b) A P-MOSFET transistor.

Figure 2.9: Symbolic descriptions of N-MOSFET and P-MOSFET transistors.

Definition 2.11. The threshold voltage of a given MOSFET (i.e., either N- or P-MOSFET) is the minimum voltage
level (i.e., potential difference between gate and source) required to activate the transistor and thus connect the source and
drain; below the threshold voltage, the source and drain remain disconnected.

Definition 2.12. The concept of sub-threshold leakage (or just leakage) relates to a non-ideal property of the
conductive channel: below the threshold voltage the source and drain are not perfectly disconnected, st. a small flow of
electrons (i.e., the leakage current) is possible.

2.1.2.4 Organisation of MOSFET transistors into CMOS-based logic gates

Rather than use MOSFET transistors in isolation, it is common to organise them into larger combinations; by
offering a higher level of abstraction, such combinations are usually easier to reason about from both functional
and behavioural perspectives.
Ultimately the aim is to (re)produce Section 2.1.1 where we outlined Boolean-like functionality using
mechanical switches, but now by using transistors. A popular³ first step relates to the organisation of two transis-
tors (pairing an N-MOSFET with a P-MOSFET) to form one Complementary Metal-Oxide Semi-Conductor
(CMOS) component we term a cell. This approach, as illustrated at a high-level by Figure 2.8, was first con-
ceived in 1963 by Frank Wanlass at Fairchild Semi-conductor. The idea is to organise the transistors so they
operate in a complementary manner:

Definition 2.13. CMOS-based design strategies typically use two distinct parts to form a given component: there will be

1. a pull-up network of P-MOSFET transistors between the Vdd power rail and the output, and

2. a pull-down network of N-MOSFET transistors between the Vss power rail and the output.

A consequence of this logic style is that only one of the pull-up or pull-down networks can be active (i.e., connected) at a
time.

Definition 2.14. The power dissipation of a CMOS cell, and hence a CMOS-based design more generally, can be described
in terms of

1. a static component, where the transistors remain in a given state (so are “idle” in some sense), and

2. a dynamic component, where the transistors switch state, i.e., the gate changes from being driven by Vdd to Vss ,
or vice versa.

CMOS exhibits a marginal amount of sub-threshold leakage, so the majority of power dissipation occurs due to switching
activity.
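Although we do not derive it here, a commonly used first-order model for the dynamic component is

P_dynamic ≈ α · C · Vdd² · f

where α denotes the switching activity, C the switched capacitance, and f the frequency of operation; among other things, this hints at why lowering Vdd is such an effective way to reduce overall power dissipation.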

This has some obvious advantages, which make CMOS an attractive choice vs. alternatives. In particular,
when organising lots of transistors in close proximity, CMOS will have lower overall power consumption and
heat dissipation, and, in turn, better reliability.
The next step is to package CMOS cells into small, useful building-blocks that act as the next-level component
above transistors themselves. As an example, consider building a component which inverts the input st. if the
input x is Vdd the output is Vss and vice versa.

Example 2.5. Consider Figure 2.10a, where


3 It is important to stress that CMOS is not the only possible logic style: although it represents a first step here, it may not be necessary
if an alternative is used instead.


(a) A CMOS-based NOT gate. (b) A CMOS-based, 2-input NAND gate. (c) A CMOS-based, 2-input NOR gate.

Figure 2.10: MOSFET-based implementations of NOT, NAND and NOR logic gates.


 x    y   | NOT  NAND  NOR
Vss  Vss  | Vdd  Vdd   Vdd
Vss  Vdd  | Vdd  Vdd   Vss
Vdd  Vss  | Vss  Vdd   Vss
Vdd  Vdd  | Vss  Vss   Vss

Figure 2.11: A voltage-oriented truth table for NOT, NAND and NOR logic gates.

An aside: naming conventions for voltage levels.

In a CMOS-based design strategy, we normally refer to the power rails as Vdd and Vss . The ‘d’ stands for drain:
Vdd could be read as “voltage level at the drain”, st. it also makes sense to have Vss read as “voltage level at the
source”. This naming convention seems to stem from earlier bipolar-based transistors, where Vcc and Vee are
sort of the same thing but for the collector and emitter terminals.
This all starts to become a little involved however, and beyond the scope of what we want to discuss.
All we really care about is that Vdd and Vss make our transistors work correctly, and we can tell them apart.
Although it might be too informal for some tastes, it is therefore enough to keep the following in mind:

• Vdd is the high or positive voltage level, e.g., 3.3V or 5V, and
• Vss is the low or negative voltage level, e.g., 0V ≃ GND.

Note that GND refers to ground: this can be thought of as a) a reference point other voltages are measured
relative to (note that voltage is a synonym for potential difference, meaning we need such a reference), or b) a
(or the) return path, i.e., the point to which electrons will move due to their preference to move from high to
low potential difference.

1. connecting x to Vss means the top P-MOSFET will be connected, the bottom N-MOSFET will be discon-
nected, so r will be connected to Vdd , while
2. connecting x to Vdd means the top P-MOSFET will be disconnected, the bottom N-MOSFET will be
connected, so r will be connected to Vss .

Note that even with this simple organisation, we can identify the pull-up and pull-down networks; although
there is just one transistor in each, it is true that the P-MOSFET connects Vdd to the output iff. x = Vss and the
N-MOSFET connects Vss to the output iff. x = Vdd . We can of course consider more complex organisations
under the same design strategy, by increasing the number of transistors.
Example 2.6. Consider Figure 2.10b, where

1. connecting both x and y to Vss means both top P-MOSFETs will be connected, both bottom N-MOSFETS
will be disconnected, so r will be connected to Vdd ,
2. connecting x to Vdd and y to Vss means the right-most P-MOSFET will be connected, the upper-most
N-MOSFET will be disconnected, so r will be connected to Vdd ,
3. connecting x to Vss and y to Vdd means the left-most P-MOSFET will be connected, the lower-most
N-MOSFET will be disconnected, so r will be connected to Vdd , while
4. connecting both x and y to Vdd means both top P-MOSFETs will be disconnected, both bottom N-MOSFETS
will be connected, so r will be connected to Vss .

Example 2.7. Consider Figure 2.10c, where

1. connecting both x and y to Vss means both top P-MOSFETs will be connected, both bottom N-MOSFETS
will be disconnected, so r will be connected to Vdd ,
2. connecting x to Vdd and y to Vss means the upper-most P-MOSFET will be disconnected, the left-most
N-MOSFET will be connected, so r will be connected to Vss ,
3. connecting x to Vss and y to Vdd means the lower-most P-MOSFET will be disconnected, the right-most
N-MOSFET will be connected, so r will be connected to Vss , while


4. connecting both x and y to Vdd means both top P-MOSFETs will be disconnected, both bottom N-MOSFETS
will be connected, so r will be connected to Vss .

A second aspect of the design strategy is made evident by increasing the number of transistors. Specifically, the
two examples include P-MOSFETs organised in parallel (st. either can be activated to connect Vdd to the output)
and N-MOSFETs organised in series (st. both must be activated to connect Vss to the output), or vice versa.
Hopefully it is obvious that the three examples model the NOT, NAND (or NOT AND) and NOR (or NOT
OR) Boolean operators respectively; this fact is reinforced by Figure 2.11. Either way, the fact is that from a
starting point involving atomic-level concepts we have developed components that we can reason about wrt.
both theory and practice. That is, we have used electrical switches to implement Boolean algebra: instead of
reasoning about computation involving the latter in theory, we can now actually build components that do that
computation in practice.
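To make this concrete, what follows is a minimal switch-level sketch in C (the modelling approach and all names are our own assumptions, not from the original text): each MOSFET is modelled by whether it conducts for a given gate value, and the pull-up and pull-down networks of Figure 2.10 are composed exactly as described by Examples 2.5, 2.6 and 2.7.

#include <stdio.h>

/* model each voltage level as 1 (Vdd) or 0 (Vss), and each MOSFET by
   whether it conducts for a given gate value                          */
static int nmos(int g) { return g == 1; } /* conducts when gate at Vdd */
static int pmos(int g) { return g == 0; } /* conducts when gate at Vss */

/* resolve a CMOS cell: an active pull-up network drives Vdd, an active
   pull-down network drives Vss; -1 would denote a disconnected output,
   which cannot occur given complementary networks                     */
static int drive(int up, int down) { return up ? 1 : down ? 0 : -1; }

static int not_gate (int x)        { return drive(pmos(x),            nmos(x));            }
/* NAND per Figure 2.10b: parallel pull-up, series pull-down           */
static int nand_gate(int x, int y) { return drive(pmos(x) || pmos(y), nmos(x) && nmos(y)); }
/* NOR  per Figure 2.10c: series pull-up, parallel pull-down           */
static int nor_gate (int x, int y) { return drive(pmos(x) && pmos(y), nmos(x) || nmos(y)); }

int main(void) {
  for (int x = 0; x <= 1; x++)
    for (int y = 0; y <= 1; y++)
      printf("x=%d y=%d : NOT(x)=%d NAND(x,y)=%d NOR(x,y)=%d\n",
             x, y, not_gate(x), nand_gate(x, y), nor_gate(x, y));
  return 0;
}

Enumerating the inputs reproduces Figure 2.11, with 1 and 0 standing in for Vdd and Vss respectively.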

2.1.2.5 Some common terminology in CMOS-based logic design


Definition 2.15. The process used to manufacture organisations of transistors, plus their associated properties and
constraints, is a logic style (or logic family): examples include CMOS and TTL.

Definition 2.16. A given logic style will suggest an associated standard cell, i.e., an organisation of transistors that
realises a higher-level building block, namely either a) a computational component (e.g., a Boolean AND operator), or b)
a storage component (e.g., a latch); the former is more naturally described as a logic gate. Each such cell will have an
associated functional specification (i.e., a truth table or excitation table), and a behavioural specification (e.g., detailing
propagation delay).

Definition 2.17. A standard cell library is a collection of standard cells, used as building-blocks in a design.

Definition 2.18. The standard cell methodology permits design abstraction, in the sense a design can be specified at
a high- vs. low-level (i.e., in terms of standard cells, vs. transistors).

Definition 2.19. A Gate Equivalent (GE) is a unit of measurement used to assess the (area) complexity of a digital
logic design independently from the manufacturing process technology. It is common (e.g., for CMOS) to consider a
2-input NAND gate as 1 GE: you can think about it as a normalisation factor for manufacturing processes, st. designs
specified using different processes can be compared fairly.
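As a worked (and hypothetical) example, imagine a process whose 2-input NAND cell occupies 4 µm²: a design occupying 12000 µm² under that process would be quoted as 12000/4 = 3000 GE, and could then be compared in a like-for-like manner with a competing design quoted in GE but manufactured using some other process.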

2.2 Combinatorial logic


2.2.1 A suite of simplified logic gates
It should already be clear that designing functionality, even as simple as single Boolean operators, is hard at
the transistor-level: transistors are too low a level of abstraction, st. the amount of detail is prohibitive at a
larger scale or higher level. To address this problem, we usually adopt a more abstract view of logic gates by
taking two steps: we 1) forget about the voltage levels Vss and Vdd , abstractly labelling them 0 and 1, then 2)
forget about the power rails, and just draw a symbol to represent each gate (with suitable inputs and outputs).
Figure 2.12 highlights several different notations for the resulting logic gates, including each of the NOT,
NAND and NOR gates from above and also AND, OR and XOR from Chapter 1; corresponding truth tables
are shown in Figure 2.13. Keep the following in mind:

• We are assuming the voltage levels used to represent values on each wire are perfect in some sense. In
short, we assume the associated signals have a “square” waveform and so are digital signals (i.e., only
ever have a value of 0 or 1). In reality this can be dubious, because physical phenomena that underpin
those voltage levels mean the edges of said signals might be “rounded” and so imperfect (e.g., have a
value of 0.5 say); we basically ignore this issue, at least until later.

• An inversion bubble on the output of a gate is used to denote the fact that the output is inverted. As
such, a buffer (or BUF) is simply a gate that connects the input directly to the output; a NOT gate is then
a buffer that inverts the input to form the output.

• For completeness we have included the NXOR (sometimes written XNOR) gate, which has the obvious
meaning but is seldom used in practice; per Chapter 1, we use ∧̄, ∨̄ and ⊕̄ as a short-hand to denote
NAND, NOR and NXOR respectively. Clearly, for example, we have

x ∧̄ y ≡ ¬(x ∧ y).


r is x          ≡   r = x       ≡   r = x
r is NOT x      ≡   r = ¬x      ≡   r = !x
r is x NAND y   ≡   r = x ∧̄ y   ≡   r = !(x & y)
r is x NOR y    ≡   r = x ∨̄ y   ≡   r = !(x | y)
r is x AND y    ≡   r = x ∧ y   ≡   r = x & y
r is x OR y     ≡   r = x ∨ y   ≡   r = x | y
r is x XOR y    ≡   r = x ⊕ y   ≡   r = x ^ y

Figure 2.12: Representation of standard logic gates in English, Boolean algebra, C and symbolic notations.

BUF            NOT
x | r          x | r
0 | 0          0 | 1
1 | 1          1 | 0
(a) A 1-input, 1-output buffer.  (b) A 1-input, 1-output NOT gate.

AND              NAND
x y | r          x y | r
0 0 | 0          0 0 | 1
0 1 | 0          0 1 | 1
1 0 | 0          1 0 | 1
1 1 | 1          1 1 | 0
(c) A 2-input, 1-output AND gate.  (d) A 2-input, 1-output NAND gate.

OR               NOR
x y | r          x y | r
0 0 | 0          0 0 | 1
0 1 | 1          0 1 | 0
1 0 | 1          1 0 | 0
1 1 | 1          1 1 | 0
(e) A 2-input, 1-output OR gate.  (f) A 2-input, 1-output NOR gate.

XOR              NXOR
x y | r          x y | r
0 0 | 0          0 0 | 1
0 1 | 1          0 1 | 0
1 0 | 1          1 0 | 0
1 1 | 0          1 1 | 1
(g) A 2-input, 1-output XOR gate.  (h) A 2-input, 1-output NXOR gate.

Figure 2.13: Truth tables for standard logic gates.


Figure 2.14: Identities for standard logic gates in terms of NAND and NOR.

• Given 2-input gates such as AND, OR, and XOR, we use a short-hand and draw the gates with more
inputs; this is equivalent to making a tree of 2-input gates since, for example, we have

(w ∧ x ∧ y ∧ z) ≡ (w ∧ x) ∧ (y ∧ z).

Now, by treating the gates as operators per Boolean algebra we can combine them together and design
components that fall into a category often termed combinatorial logic; the gate behaviours combine to compute
a result continuously, with their output updated whenever an input changes.
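As an aside, this abstract view maps naturally onto C (in the spirit of the C notation referenced by Figure 2.12, though the sketch itself is ours): each 2-input gate can be treated as an expression over 0/1 values, st. enumerating the inputs reproduces the truth tables of Figure 2.13.

#include <stdio.h>

int main(void) {
  printf("x y | AND NAND OR NOR XOR NXOR\n");
  for (int x = 0; x <= 1; x++)
    for (int y = 0; y <= 1; y++)
      printf("%d %d |  %d    %d   %d   %d   %d    %d\n",
             x, y, x & y, !(x & y),  /* AND, NAND */
                   x | y, !(x | y),  /* OR,  NOR  */
                   x ^ y, !(x ^ y)); /* XOR, NXOR */
  return 0;
}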

2.2.2 Harnessing the universality of NAND and NOR


Following from the above, (at least) two questions should be immediately apparent:

1. Chapter 1 suggests NOT, AND, and OR are the operators to focus on, so why design NAND and NOR
from transistors? and

2. given the design of NAND and NOR from transistors was an involved, detailed process, is there a way
to avoid repeating this for AND and OR?

The answer to both questions stems from the functional completeness of NAND and NOR: they are universal,
in the sense we can implement every other logic gate using one or other of them alone (as already discussed in
Chapter 1). The identities

¬x    ≡ x ∧̄ x
x ∧ y ≡ (x ∧̄ y) ∧̄ (x ∧̄ y)
x ∨ y ≡ ¬x ∧̄ ¬y ≡ (x ∧̄ x) ∧̄ (y ∧̄ y)

and

¬x    ≡ x ∨̄ x
x ∧ y ≡ ¬x ∨̄ ¬y ≡ (x ∨̄ x) ∨̄ (y ∨̄ y)
x ∨ y ≡ (x ∨̄ y) ∨̄ (x ∨̄ y)

replicated diagrammatically in Figure 2.14, demonstrate why; one can easily verify them via enumeration, e.g.,
in

x y | x ∧̄ y  x ∧̄ x  y ∧̄ y  (x ∧̄ y) ∧̄ (x ∧̄ y)  (x ∧̄ x) ∧̄ (y ∧̄ y)
0 0 |   1      1      1            0                  0
0 1 |   1      1      0            0                  1
1 0 |   1      0      1            0                  1
1 1 |   0      0      0            1                  1

and

x y | x ∨̄ y  x ∨̄ x  y ∨̄ y  (x ∨̄ x) ∨̄ (y ∨̄ y)  (x ∨̄ y) ∨̄ (x ∨̄ y)
0 0 |   1      1      1            0                  0
0 1 |   0      1      0            0                  1
1 0 |   0      0      1            0                  1
1 1 |   0      0      0            1                  1
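The same verification is easily mechanised; the following minimal sketch in C (the macros and names are our own) checks each identity by exhaustive enumeration of x and y:

#include <stdio.h>

#define NAND(a, b) (!((a) && (b)))
#define NOR(a, b)  (!((a) || (b)))

int main(void) {
  int ok = 1;
  for (int x = 0; x <= 1; x++)
    for (int y = 0; y <= 1; y++) {
      ok &= (!x)     == NAND(x, x);                   /* NOT via NAND */
      ok &= (x && y) == NAND(NAND(x, y), NAND(x, y)); /* AND via NAND */
      ok &= (x || y) == NAND(NAND(x, x), NAND(y, y)); /* OR  via NAND */
      ok &= (!x)     == NOR(x, x);                    /* NOT via NOR  */
      ok &= (x && y) == NOR(NOR(x, x), NOR(y, y));    /* AND via NOR  */
      ok &= (x || y) == NOR(NOR(x, y), NOR(x, y));    /* OR  via NOR  */
    }
  printf("%s\n", ok ? "all identities hold" : "some identity fails");
  return 0;
}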


This is enormously important: it explains why designing NAND and NOR from transistors made sense in the
first place, but, moreover, it allows us to implement any Boolean expression, and so any Boolean function, from
NAND and NOR gates alone. The manufacture of such implementations, which we cover in Section 2.5, will
be vastly easier as a result. At the transistor-level, we need only deal with some (large) number of one building
block (i.e., NAND or NOR) vs. the added complexity and effort associated with many such building blocks (i.e.,
AND, OR, XOR, and so on): everything at a low level is expressed in terms of NAND or NOR, and so implemented
by exactly the organisations of N- and P-MOSFETs we have already seen.

2.2.3 Designing circuits for arbitrary combinatorial functions


Now we have logic gates that act as physical implementations of each Boolean operator, the next challenge is
how to produce Boolean expressions for some (arbitrary) Boolean function. Put another way, the challenge is to
take a specification of a function f , e.g., a truth table, and derive a Boolean expression e which computes it.
Chapter 1 provides a complete enough background that we can attempt to address this challenge in a
mechanical, algorithmic manner; doing so contrasts with deriving or manipulating expressions by hand using
the Boolean axioms. Several viable approaches and thus algorithms exist, which we investigate in the following
Sections: each has advantages and disadvantages, and can be described as taking the description of f as input,
and producing e in SoP form as output.

2.2.3.1 Some design patterns


Before dealing with arbitrary Boolean functions, it is useful to start with some specific examples that can be
solved by using a design pattern (or template): although they may or may not apply to a particular problem,
whenever they do apply they represent a pre-designed solution we can use as is without further effort.
We use a specific example to introduce each design pattern below: in each case, a 2-input, 1-bit AND gate is
used to solve some sort of problem. It is crucial to remember that the example illustrates a more general pattern:
we will see cases where this is true later.

1. If, within some larger design, we use an AND gate to compute

r=x∧y

and then, somewhere else, compute


r′ = x ∧ y
we can replace the two AND gates with one: it is obvious that r = r′ = x ∧ y, so the output of a single
AND gate can be shared between the two usage points. This simplification is possible, but harder to
capture within a single Boolean expression: using

r = (w ∧ x ∧ y) ∨ (x ∧ y ∧ z)

as an example, it is usual to first define some intermediate, say

t=x∧y

then rewrite the expression as


r = (w ∧ t) ∨ (t ∧ z).
Doing so acts as a direct analogue to common sub-expression elimination, an optimisation commonly applied by
C compilers to expressions in C programs; a short sketch following this list illustrates all three patterns in C.
2. A 2-input, m-bit AND gate can be realised using isolated replication of 2-input, 1-bit AND gates. That
is, if x and y are m-bit values then
r=x∧y
is computed via
ri = xi ∧ yi
for 0 ≤ i < m, i.e., m separate 2-input, 1-bit gates, each i-th instance of which uses xi and yi to produce the
output ri .
3. An n-input, 1-bit AND gate can be realised using cascaded replication of 2-input, 1-bit AND gates. That
is,

r = ⋀_{i=0}^{n−1} x_i


An aside: why NAND not AND?!

Arguments based on universality of NAND and NOR motivate a preference for these building blocks by
a preference for minimalism: using a single building block to implement every other component will offer
manufacturing advantages, for example, vs. a more diverse set.
That said, it is reasonable to question what other motivations exist. Put another way, what would happen
if we wanted an AND design in similar, transistor-based terms? A common starting point for such questions is
the following

[diagram: two transistors in series connecting Vdd to an output r, gated by x and y; there is no pull-down network]

in which we only have a pull-up network. The reasoning is often that if x = Vdd and y = Vdd then r = Vdd as
required, whereas if x = Vss or y = Vss then r is disconnected; in defining what 0 and 1 mean, if we just define
disconnected as 0 then maybe this design is valid? A counterargument (among many) is to think about what
happens if we use r elsewhere as an input, e.g.,

[diagram: the same two-transistor arrangement, with r now driving the gate terminal of a transistor in a second layer]

Now, if r is disconnected, the top-most transistor in the second layer simply does not work: the gate terminal
is disconnected from either Vdd or Vss so the transistor cannot function.
It turns out there is a solution to this sort of issue, which is to opt for a pull-down resistor rather than a
network of transistors, i.e., something like

[diagram: the same arrangement, augmented with a pull-down resistor connecting r to Vss]

which you could think of as providing a “default” value to any disconnected wire. The problem is, now we
have to reason about and manufacture another component (i.e., the resistor): both of these are out of scope, so,
at least here, this approach is not viable.


for n = 4 is the same as

r = (x0 ∧ x1) ∧ (x2 ∧ x3).
This expression forms a tree of AND gates, which, in this case is balanced; it is more attractive than
equivalents such as
r = x0 ∧ (x1 ∧ (x2 ∧ x3 ))
because although they use the same number of gates, the critical path of the former is shorter (i.e.,
representing 2 rather than 3 such gates).
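As promised, here is a minimal sketch of all three design patterns in C; the function names, and the choice of m = 32 in the second, are illustrative assumptions:

#include <stdio.h>
#include <stdint.h>

/* pattern 1: share the sub-expression t = x AND y, computing it once */
static int shared(int w, int x, int y, int z) {
  int t = x & y;
  return (w & t) | (t & z);
}

/* pattern 2: isolated replication; one & on 32-bit values acts as
   m = 32 separate 1-bit AND gates, i.e., r_i = x_i AND y_i          */
static uint32_t replicated(uint32_t x, uint32_t y) {
  return x & y;
}

/* pattern 3: cascaded replication; a balanced tree of 2-input ANDs
   realises a 4-input AND with a critical path of 2 (vs. 3) gates    */
static int cascaded(int x0, int x1, int x2, int x3) {
  return (x0 & x1) & (x2 & x3);
}

int main(void) {
  printf("%d %08x %d\n", shared(1, 1, 1, 0),
         (unsigned) replicated(0xF0F0F0F0u, 0xFF00FF00u),
         cascaded(1, 1, 1, 1));
  return 0;
}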

2.2.3.2 Mechanical derivation method #1


Imagine we are tasked with deriving a Boolean expression that implements some Boolean function f . The
function has n inputs I0 , I1 , . . . , In−1 , and one output O; we are given a truth table that describes it. The idea
is to follow a (fairly) simple algorithm:

1. Find a set T such that i ∈ T iff. O = 1 in the i-th row of the truth table.
2. For each i ∈ T, form a term ti by AND’ing together all the variables while following two rules:
(a) if I j = 1 in the i-th row, then we use
Ij
as is, but
(b) if I j = 0 in the i-th row, then we use
¬I j .
3. An expression implementing the function is then formed by OR’ing together all the terms, i.e.,
e = ⋁_{i∈T} t_i,

which is in SoP form.

Intuitively, each i ∈ T will produce a minterm t_i in the SoP form: each term t_i ANDs inputs together (to form
their product), whereas e ORs together the terms (to form their sum). Each minterm fully specifies an input
assignment (i.e., a value for each input) for a row of the truth table where the output is 1; in a sense, we are
“covering” (or dealing with) each such row by doing so.
Example 2.8. Consider the task of implementing an expression for XOR, i.e., an e in SoP form which implements
f (x, y) = x ⊕ y, a truth table for which is reproduced (cf. Figure 2.13) here for clarity:

XOR
x y r
0 0 0
0 1 1
1 0 1
1 1 0

1. Looking at the truth table, it is clear there are

• n = 2 inputs that we denote I0 = x and I1 = y, and


• one output that we denote O = r.

Likewise, it is clear that T = {1, 2} because O = 1 in rows 1 and 2, whereas O = 0 in rows 0 and 3.
2. Each term ti for i ∈ T = {1, 2} is formed as follows:

• For i = 1, we find
– I0 = x = 0 and so we use ¬x,
– I1 = y = 1 and so we use y
and hence form the term t1 = ¬x ∧ y.
• For i = 2, we find
– I0 = x = 1 and so we use x,


– I1 = y = 0 and so we use ¬y
and hence form the term t2 = x ∧ ¬y.

3. The expression implementing the function is therefore

e = ⋁_{i∈T} t_i = ⋁_{i∈{1,2}} t_i = (¬x ∧ y) ∨ (x ∧ ¬y)

which is in SoP form.

For example, notice that the row for i = 1 produces the minterm t1 = ¬x ∧ y meaning “the row where x = 0 and
y = 1”, whereas the row for i = 2 produces the minterm t2 = x ∧ ¬y meaning “the row where x = 1 and y = 0”;
combining the minterms together, we get an SoP expression that specifies rows where the output should be 1
as “either x = 0 and y = 1, or x = 1 and y = 0”.
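Reflecting the machine-friendly character of this method, a few lines of C suffice to reproduce Example 2.8; the truth table encoding (row i stores x as the more-significant of the two input bits) and the names used are our own assumptions:

#include <stdio.h>

int main(void) {
  const char *name[] = { "x", "y" };  /* the inputs I_0 = x and I_1 = y  */
  int f[4] = { 0, 1, 1, 0 };          /* the XOR truth table, row by row */

  for (int i = 0; i < 4; i++) {
    if (!f[i]) continue;              /* step 1: i is in T iff. O = 1    */
    printf("t_%d = ", i);
    for (int j = 0; j < 2; j++) {     /* step 2: AND together the inputs */
      int bit = (i >> (1 - j)) & 1;   /* the value of I_j in row i       */
      printf("%s%s%s", j ? " AND " : "", bit ? "" : "NOT ", name[j]);
    }
    printf("\n");                     /* step 3: OR the printed terms    */
  }
  return 0;
}

Running this prints t_1 = NOT x AND y and t_2 = x AND NOT y, matching the terms derived above.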

2.2.3.3 Mechanical derivation method #2: Karnaugh maps

Example 2.9. Consider the truth table in Figure 2.15a which describes a 4-input Boolean function, and the SoP
expression
r = ( ¬w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨
( ¬w ∧ ¬x ∧ ¬y ∧ z ) ∨
( ¬w ∧ ¬x ∧ y ∧ ¬z ) ∨
( ¬w ∧ x ∧ ¬y ∧ ¬z ) ∨
( ¬w ∧ x ∧ ¬y ∧ z ) ∨
( w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨
( w ∧ ¬x ∧ y ∧ ¬z ) ∨
( w ∧ ¬x ∧ y ∧ z ) ∨
( w ∧ x ∧ y ∧ z )
resulting from application of the method above.
Although it only becomes apparent when you do so, deriving such an expression is tedious and error prone;
although the algorithm is simple, it could be described as machine-friendly (in the sense it is best executed
by a computer). The complexity of the expression, in the sense it contains many operators, is more obvious.
Although we could simplify it by applying Boolean axioms, for example, this is again quite tedious. It is
natural to ask, therefore, whether (and if so, how) we can improve the original method wrt. these problems.
The Karnaugh map, invented in 1953 by Maurice Karnaugh while working at Bell Labs [2], is an alternative
method which offers (at least) two advantages over the original: it a) offers a more visual and so arguably
human-friendly way to derive the resulting expression, and b) automatically applies various optimisations
while doing so, st. we no longer need to apply (as much) post-derivation optimisation by hand. Although an
example more usefully illustrates how to use a Karnaugh map, and so the advantages above, the method itself
is best summarised using another algorithm. Again imagine we are tasked with deriving a Boolean expression
that implements some Boolean function f with n inputs and one output:

1. Draw a rectangular (p × q)-element grid, st.

(a) p ≡ q ≡ 0 (mod 2), and

(b) p · q = 2^n,

and each row and column represents one input combination; order rows and columns according to a
Gray code.

2. Fill the grid elements with the output corresponding to inputs for that row and column.

3. Cover rectangular groups of adjacent 1 elements which are of total size 2^m for some m; groups can “wrap
around” edges of the grid and overlap.
4. Translate each group into one term of an SoP form Boolean expression e where

(a) bigger groups, and


An aside: binary versus Gray code.

Consider a sequence of unsigned, n-bit integers; selecting n = 4, for example, and starting from zero, such a
sequence would be
⟨0, 0, 0, 0⟩ ↦ 0(10)
⟨1, 0, 0, 0⟩ ↦ 1(10)
⟨0, 1, 0, 0⟩ ↦ 2(10)
⟨1, 1, 0, 0⟩ ↦ 3(10)
⟨0, 0, 1, 0⟩ ↦ 4(10)
⟨1, 0, 1, 0⟩ ↦ 5(10)
⟨0, 1, 1, 0⟩ ↦ 6(10)
⟨1, 1, 1, 0⟩ ↦ 7(10)
...
where the RHS describes a (decimal) value, and the LHS describes the (binary) representation of that value.
Notice that moving from ⟨1, 1, 0, 0⟩ to the next entry ⟨0, 0, 1, 0⟩ means changing 3 bits: the 0-th and 1-st bits
toggle from 1 to 0, and the 2-nd bit from 0 to 1. Now consider an alternative ordering of the same integers:

⟨0, 0, 0, 0⟩ ↦ 0(10)
⟨1, 0, 0, 0⟩ ↦ 1(10)
⟨1, 1, 0, 0⟩ ↦ 3(10)
⟨0, 1, 0, 0⟩ ↦ 2(10)
⟨0, 1, 1, 0⟩ ↦ 6(10)
⟨0, 0, 1, 0⟩ ↦ 4(10)
⟨1, 0, 1, 0⟩ ↦ 5(10)
⟨1, 1, 1, 0⟩ ↦ 7(10)
...

Now, moving from any entry to the next or the previous one will always toggle one bit: such an ordering is
termed a Gray code after Frank Gray who made reference to it in a 1953 patent application (such orderings
had been known and used for quite some time before that). Crucially,

1. we can produce an ordering that satisfies the same property for any n, and
2. the alternative ordering is just a permutation of the original: we keep the same values (and the same
representations), but just rearrange them within the sequence.
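As a minimal sketch, the standard binary-reflected Gray code places the value i ⊕ (i ≫ 1) at position i; note this is just one of many valid Gray codes, and not identical to the ordering listed above:

#include <stdio.h>

int main(void) {
  for (unsigned i = 0; i < 8; i++) {
    unsigned g = i ^ (i >> 1);     /* entry i of the reordering */
    printf("position %u : value %u, i.e., <%u, %u, %u>\n",
           i, g, g & 1, (g >> 1) & 1, (g >> 2) & 1);
  }
  return 0;
}

Consecutive values of g always differ in exactly one bit, as required.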


w x y z | r
0 0 0 0 | 1
0 0 0 1 | 1
0 0 1 0 | 1
0 0 1 1 | 0
0 1 0 0 | 1
0 1 0 1 | 1
0 1 1 0 | 0
0 1 1 1 | 0
1 0 0 0 | 1
1 0 0 1 | 0
1 0 1 0 | 1
1 0 1 1 | 1
1 1 0 0 | 0
1 1 0 1 | 0
1 1 1 0 | 0
1 1 1 1 | 1
(a) A 4-input example.

x y z | r
0 0 0 | 0
0 0 1 | 0
0 1 0 | 1
0 1 1 | 1
1 0 0 | 0
1 0 1 | ?
1 1 0 | 1
1 1 1 | ?
(b) A 3-input example.

Figure 2.15: 4- and 3-input example Boolean functions respectively.

(b) fewer groups

mean a simpler expression.
Based on this description, the underlying reason it delivers the claimed (or in fact any) advantages is far from
intuitive. However, there is a way to explain it: the central observation is that if we find two minterms st.
their input assignment differs in exactly one input, we can simplify the resulting expression by eliminating that
input. If you (re-)consider Figure 2.15a, the minterms associated with
(w, x, y, z) = (1, 0, 1, 0)
and
(w, x, y, z) = (1, 0, 1, 1),
i.e., rows of the truth table for i = 10 and i = 11, satisfy exactly this condition: they differ wrt. z, which is 0
in the first case and 1 in the second case. In the original method, we would implement them using the two
minterms
w ∧ ¬x ∧ y ∧ ¬z
and
w ∧ ¬x ∧ y ∧ z.
However, this is overly pessimistic and so sub-optimal: the value of z is irrelevant provided w = 1, x = 0, and
y = 1, because the output is 1 either way. As such, we eliminate z and use the LHS of
w ∧ ¬x ∧ y ≡ (w ∧ ¬x ∧ y ∧ ¬z) ∨ (w ∧ ¬x ∧ y ∧ z)
which is equivalent to the RHS and therefore cover both cases via a single, simpler expression.
Example 2.10. The best way to illustrate this in practice is to fully examine the truth table in Figure 2.15a:
1. Essentially, the first two steps just translate information from the truth table into the map (or grid); keep
in mind that we are representing the same information, i.e., the specification of f , in both cases.
Since f has n = 4 inputs, the associated truth table has 2^4 = 16 rows; by selecting p = q = 4, we can draw
the following square grid with enough elements to capture those rows

           w x
          00   01   11   10
y z  00 |  0 |  1 |  5 |  4
     01 |  2 |  3 |  7 |  6
     11 | 10 | 11 | 15 | 14
     10 |  8 |  9 | 13 | 12


Correctly interpreting the grid layout is crucial, since we need to translate rows of the truth table into the
correct elements. Note that w and x relate to the columns (or horizontal axis), whereas y and z relate to
the rows (or vertical axis). The left-most column, for example, relates to cases where w and x both have
the value 0, i.e., where (w, x) = (0, 0); reading that column top-to-bottom, the rows within it relate to
cases where (y, z) = (0, 0), (0, 1), (1, 1) and (1, 0). The other columns, read from left-to-right, are similar for
y and z, but for the remaining cases where (w, x) = (0, 1), (1, 1) and (1, 0). As such, we can now fill each
element in the grid with the output listed in the corresponding truth table row to get

           w x
          00   01   11   10
y z  00 |  1 |  1 |  0 |  1
     01 |  1 |  1 |  0 |  0
     11 |  0 |  0 |  1 |  1
     10 |  1 |  0 |  0 |  1

Labels above and to the left of this grid make the input values explicit: the columns, read from left-to-right,
are where (w, x) = (0, 0), (0, 1), (1, 1) and (1, 0), st. the 1-st and 2-nd (or middle) columns are where x = 1, for
example, whereas the 0-th and 3-rd (or outer) columns are where x = 0. Elsewhere you might instead see bars
drawn above and to the left of the grid, denoting cases where the associated input is 1. Either way, the ordering
might, reasonably, seem odd: note that in row- and column-wise directions, a Gray code is used. From
top-to-bottom, elements in a column are for (y, z) = (0, 0), (0, 1), (1, 1) and (1, 0), not (y, z) = (0, 0), (0, 1),
(1, 0) and (1, 1) which might seem more natural. The reason for this choice will be made apparent later, but,
for now, keep in mind that it is what allows the Karnaugh map to deliver the advantages outlined above.

2. The next step is to cover 1 elements in the grid. In a sense, this is analogous to what we did in the original
method when we identified each row in the truth table where the output was 1: there we would have a
group for each 1 element, but here we can form larger groups and cover multiple 1 elements.
The rules state we can form rectangular, potentially overlapping groups whose size is a power-of-two
(i.e., 2^m for some m): provided we follow them, each group formed will represent a term we then need to
implement as part of the SoP expression. The larger the groups, the fewer inputs will be included in each
of the terms; the fewer the groups, the fewer terms there are. An example grouping in this case is as follows:

           w x
          00   01   11   10
y z  00 |  1 |  1 |  0 |  1
     01 |  1 |  1 |  0 |  0
     11 |  0 |  0 |  1 |  1
     10 |  1 |  0 |  0 |  1

Here we have four groups:

• a group of four elements in the top left-hand corner spanning the 0-th and 1-st rows and columns,
• a group of one element in the top right-hand corner,
• a group of two elements in the 2-nd row spanning the 2-nd and 3-rd columns, and
• a group of two elements which wrap around the bottom-left and bottom-right corners.

3. Finally, we need to translate each group into a term in the SoP expression. As an example, consider the
first group (i.e., of four elements in the top left-hand corner) and the values each input is assigned within
it. It should become clear that the value of x is irrelevant provided that w = 0. Put another way, fixing
w = 0 means we include the two left-most columns only (excluding the two right-most columns because
they relate to cases where w = 1). In the same way, the value of z is irrelevant provided that y = 0.


By specifying values for each relevant input and ignoring the irrelevant inputs, we can implement this
term as
¬w ∧ ¬y
to cover all four cells in that group; we are specifying “the columns where w = 0 and rows where y = 0”,
which restricts us precisely to elements within the group. By applying similar reasoning to the other
three groups, we find that

r = ( ¬w ∧ ¬y ) ∨
( w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨
( ¬x ∧ y ∧ ¬z ) ∨
( w ∧ y ∧ z )

which is equivalent to but clearly simpler than the result we derived originally: there are a) fewer terms,
and b) each term is the combination of fewer inputs.

Example 2.11. The result above is simpler than the original, but it turns out we can do better still by more
careful formation of the groups. More specifically, we could consider the following alternative
           w x
          00   01   11   10
y z  00 |  1 |  1 |  0 |  1
     01 |  1 |  1 |  0 |  0
     11 |  0 |  0 |  1 |  1
     10 |  1 |  0 |  0 |  1

where there are now three groups, namely

• a group of four elements in the top left-hand corner spanning the 0-th and 1-st rows and columns,

• a group of two elements in the 2-nd row spanning the 2-nd and 3-rd columns, and

• a group of four elements which wrap around the top-left, bottom-left, top-right, and bottom-right corners.

The end result is a simpler expression, including one less term:

r = ( ¬w ∧ ¬y ) ∨
( ¬x ∧ ¬z ) ∨
( w ∧ y ∧ z ) .

Constructive use of don’t care entries An important feature or extension of truth tables, as defined so far, is
the potential to include so-called don’t care entries: rather than 0 or 1, we use ? to denote we do not care what
the value is (vs. we do not know what the value is, for example). When used in the context of an output, it
can be rationalised by considering a component whose output simply does not matter given some combination
of inputs: maybe this input is invalid, st. the output is never used due to the resulting error.

Example 2.12. Consider the truth table

x y | r
0 0 | 0
0 1 | ?
? 0 | 1

which describes some 2-input Boolean function, where don’t care entries are used in two roles:

1. On the LHS, wrt. the input x. In this case, the ? represents a short-hand, because by saying we don’t care
what the value of x is we expand that one row into two: one for x = 0 and one for x = 1, which is like
saying “irrespective of x (so if x = 0 or x = 1), provided y = 0 then r = 1”.

2. On the RHS, wrt. the output r. In this case, the ? represents a choice, because by saying we don’t care
what the value of r is we can select whatever suits us: it could be thought of like a “wildcard” of some
sort.


This concept has various applications, but is immediately useful during the derivation of an expression from the
specification (including don’t care entries) of some function. In short, both the original method and the Karnaugh
map alternative can, at a high level, be described as covering 1 entries in the truth table (either individually, or
in a group); in both cases, fewer 1 entries implies a simpler SoP expression. As such, it makes sense to deal
with don’t care entries (in the output) in a way that helps: we are free to treat them as 0 or 1, so a) treating them
as 0 means we do not need to cover them with a group, whereas b) treating them as 1 means we can potentially
form larger groups.
Example 2.13. Consider the truth table in Figure 2.15b which describes a 3-input Boolean function and thus
has 2^3 = 8 rows; selecting p = 2 and q = 4 yields the (empty) map

         x y
        00   01   11   10
  z 0 |  0 |  1 |  5 |  4
    1 |  2 |  3 |  7 |  6

then filled as follows:

         x y
        00   01   11   10
  z 0 |  0 |  1 |  1 |  0
    1 |  0 |  1 |  ? |  ?

Consider the following two groupings of the same map: the left-hand option forms two groups of two elements
each (one covering the 1-st column, the other covering the 0-th row of the two middle columns), whereas the
right-hand option forms a single group of four elements (covering both middle columns in their entirety, by
treating the ? elements as 1).

The left-hand option treats the element associated with x = 1 and z = 1 in the 1-st row, 2-nd column as 0: as
such it is not covered by a group, and we are forced to form two rectangular groups as a result st. the resulting
expression is
r = (¬x ∧ y) ∨ (y ∧ ¬z).
In contrast, the right-hand option treats the element as a 1, meaning it can be included in a single, larger group.
This produces the (much) simpler expression r = y.

Why Gray code?! In the example above, we informally cited the use of Gray code ordering for rows and
columns in a Karnaugh map as important wrt. the advantages it then offers. The easiest way to see why this is
true is via another example where we do not use this approach.
Example 2.14. Consider the truth table

w x y z | r
0 0 0 0 | 0
0 0 0 1 | 0
0 0 1 0 | 0
0 0 1 1 | 0
0 1 0 0 | 1
0 1 0 1 | 0
0 1 1 0 | 0
0 1 1 1 | 0
1 0 0 0 | 0
1 0 0 1 | 0
1 0 1 0 | 0
1 0 1 1 | 0
1 1 0 0 | 1
1 1 0 1 | 0
1 1 1 0 | 0
1 1 1 1 | 0


which describes some 4-input function f . By using a Gray code ordering, we translate it into the following
Karnaugh map

           w x
          00   01   11   10
y z  00 |  0 |  1 |  1 |  0
     01 |  0 |  0 |  0 |  0
     11 |  0 |  0 |  0 |  0
     10 |  0 |  0 |  0 |  0

that allows formation of a single group that covers the two adjacent 1 elements in the 0-th row; this group produces
the SoP expression

r = x ∧ ¬y ∧ ¬z,

noting that the value of w is irrelevant in this case (i.e., provided x = 1, y = 0 and z = 0, that alone is enough to
cover the group). Now consider a similar Karnaugh map without a Gray code ordering

           w x
          00   01   10   11
y z  00 |  0 |  1 |  0 |  1
     01 |  0 |  0 |  0 |  0
     10 |  0 |  0 |  0 |  0
     11 |  0 |  0 |  0 |  0

which is more like a Veitch diagram [10], a precursor to the Karnaugh map. Note, for example, that the 2-nd
column now represents cases where w = 1 and x = 0, and the 3-rd column now represents cases where w = 1
and x = 1: the 2-nd and 3-rd columns are swapped versus the original Karnaugh map (and likewise for the
rows). The problem is, now we cannot make a single group that covers the same two elements: we now need
two groups, each covering one element. These groups obviously produce a more complicated SoP expression,
namely
r = ( w ∧ x ∧ ¬y ∧ ¬z ) ∨
( ¬w ∧ x ∧ ¬y ∧ ¬z )
where we now include w even though we know it is not required; to get the same result as before, we would
now have to manipulate the expression by hand using suitable axiomatic steps.
This basically demonstrates that by using a Gray code ordering, where one bit will always toggle in the input
assignment when moving between rows and/or columns, we support precisely the observation outlined at the
start of the Section. Put another way, we wanted to identify input assignments that differed wrt. one input only
so as to eliminate that input; by ensuring that two adjacent (including wrap-around) rows or columns satisfy
this property, a group that spans them will naturally translate into a term that eliminates the single, different
input that identifies them.

2.2.3.4 Mechanical derivation method #3: Quine-McCluskey


Although Karnaugh maps can represent functions with any number of inputs, they become unwieldy to
draw and use for larger n; the reason for this scalability problem stems fundamentally from the emphasis on
a human-friendly vs. machine-friendly algorithm. However, we can address the problem by investigating
Quine-McCluskey minimisation: this is a method developed independently by Willard Quine [7] and Edward
McCluskey [4] in the mid 1950s. It is reasonable to think of Quine-McCluskey as offering the advantages of both
the previous methods: it a) can be automated easily, while also b) automatically applying various optimisations,
and so avoids the need for (as much) post-derivation optimisation by hand. Unlike the previous methods where
we could write a concise description of the algorithm, it is easiest to explain this one inline with an example.
The following Sections do so by (re)considering the truth table in Figure 2.15a.

Step #1: extraction of prime implicants The first step is to produce a table, Table 2.16 for this example, that
we extend step-by-step: we

1. initialise the 0-th section by extracting each minterm from the truth table (i.e., each input assignment st.
the output is 1), then

2. process the i-th section to construct the (i + 1)-th section, iterating until no progress can be made.


Section  Group  Implicant                       Used
   0       0    0                 (0, 0, 0, 0)   ✓
           1    1                 (1, 0, 0, 0)   ✓
                2                 (0, 1, 0, 0)   ✓
                4                 (0, 0, 1, 0)   ✓
                8                 (0, 0, 0, 1)   ✓
           2    5                 (1, 0, 1, 0)   ✓
                10                (0, 1, 0, 1)   ✓
           3    11                (1, 1, 0, 1)   ✓
           4    15                (1, 1, 1, 1)   ✓
   1       0    0 + 1             (?, 0, 0, 0)   ✓
                0 + 2             (0, ?, 0, 0)   ✓
                0 + 4             (0, 0, ?, 0)   ✓
                0 + 8             (0, 0, 0, ?)   ✓
           1    1 + 5             (1, 0, ?, 0)   ✓
                4 + 5             (?, 0, 1, 0)   ✓
                2 + 10            (0, 1, 0, ?)   ✓
                8 + 10            (0, ?, 0, 1)   ✓
           2    10 + 11           (?, 1, 0, 1)
           3    11 + 15           (1, 1, ?, 1)
   2       0    0 + 1 + 4 + 5     (?, 0, ?, 0)
                0 + 2 + 8 + 10    (0, ?, 0, ?)
                0 + 4 + 1 + 5     (?, 0, ?, 0)   duplicate
                0 + 8 + 2 + 10    (0, ?, 0, ?)   duplicate

Figure 2.16: Quine-McCluskey simplification, step #1: extraction of prime implicants.

                  0   1   2   4   8   5   10  11  15
0 + 1 + 4 + 5     ✓   ✓       ✓       ✓
0 + 2 + 8 + 10    ✓       ✓       ✓       ✓
10 + 11                                   ✓   ✓
11 + 15                                       ✓   ✓

Figure 2.17: Quine-McCluskey simplification, step #2: covering the prime implicants table.


Based on an input assignment represented as a tuple, in this case (z, y, x, w), we identify each minterm using
an integer: you can see the nine minterms extracted from Figure 2.15a at the top of Table 2.16. In the table,
each entry (i.e., each row) is called an implicant; they are assigned a group based on the number of elements in
the associated tuple that equal 1. Consider section 0, for example. Implicant 0, represented by (0, 0, 0, 0) (st. w = 0,
x = 0, y = 0 and z = 0), is assigned group 0 because zero elements of the representation equal 1. In contrast,
implicant 5, represented by (1, 0, 1, 0) (st. w = 0, x = 1, y = 0 and z = 1), and implicant 10, represented by (0, 1, 0, 1)
(st. w = 1, x = 0, y = 1 and z = 0), are both assigned group 2 because two elements of their representations
equal 1.
Recall from our simplification using Karnaugh maps that we were able to apply a rule to implement both
minterms w ∧ ¬x ∧ y ∧ ¬z and w ∧ ¬x ∧ y ∧ z with a single, simpler expression w ∧ ¬x ∧ y because the value
of z is irrelevant. We use a similar approach here, using the i-th section to construct the (i + 1)-th section by
comparing members of the j-th and ( j + 1)-th groups in the former; our goal is to find pairs of implicants whose
representations differ in one element, and combine them together. We skip comparison of the j-th group and
groups other than the ( j + 1)-th, because by definition they cannot satisfy the criterion. As an example, consider
construction of the 1-st section from the 0-th section: we

• compare implicant 0 from group 0 with implicants 1, 2, 4 and 8 from group 1,

• compare implicants 1, 2, 4 and 8 from group 1 with implicants 5 and 10 from group 2,

• compare implicants 5 and 10 from group 2 with implicant 11 from group 3, and

• compare implicant 11 from group 3 with implicant 15 from group 4.

In the new section, we replace the differing element of paired implicants with ? to highlight the fact we don’t
care about that input: combining implicants 0 and 1 represented by the tuples (0, 0, 0, 0) and (1, 0, 0, 0), for
example, produces an implicant represented by (?, 0, 0, 0). Furthermore, each implicant from the i-th section
which is used to form an implicant in the (i + 1)-th section is marked with a ✓ next to it; implicants 0, 1, 2, 4 and
8 are thus marked due to the comparison between groups 0 and 1 and their use in forming implicants 0 + 1,
0 + 2, 0 + 4 and 0 + 8.
The process is iterated, constructing subsequent sections until we can no longer make progress, i.e., there are
no implicants that can be combined. Table 2.16 includes three sections, noting that section 2 has no implicants
that can be combined and so is the last constructed. In addition, it illustrates the fact that combination of implicants
in the i-th section may produce duplicates in the (i + 1)-th section: here, we can see (0, ?, 0, ?) and (?, 0, ?, 0) are
duplicated. Whenever this occurs, we ignore the duplicates and omit them from further comparisons.
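A minimal sketch of the combining test at the heart of this step, in C; representing an implicant as a (bits, mask) pair, where mask marks the ? positions, is our own assumption rather than notation from the text:

#include <stdio.h>
#include <stdbool.h>

typedef struct { unsigned bits, mask; } implicant;

/* two implicants combine iff. they agree on which inputs are ?, and
   their fixed values differ in exactly one position                 */
static bool combine(implicant a, implicant b, implicant *out) {
  if (a.mask != b.mask) return false;
  unsigned diff = a.bits ^ b.bits;      /* the positions that differ   */
  if (diff == 0 || (diff & (diff - 1))) /* zero, or more than one, bit */
    return false;
  out->bits = a.bits & b.bits;          /* clear the differing bit ... */
  out->mask = a.mask | diff;            /* ... and mark it as ?        */
  return true;
}

int main(void) {
  implicant m0 = { 0x0, 0x0 }, m1 = { 0x1, 0x0 }, r; /* minterms 0 and 1 */
  if (combine(m0, m1, &r))
    printf("0 + 1 : bits=%x mask=%x\n", r.bits, r.mask);
  return 0;
}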

Step #2: covering the prime implicants table Any unmarked implicants are termed prime implicants: these
form the focus of a second step whose task is to produce the SoP expression. The content of Table 2.16 includes
four prime implicants, namely
0 + 1 + 4 + 5   ↦ (?, 0, ?, 0)
0 + 2 + 8 + 10  ↦ (0, ?, 0, ?)
10 + 11         ↦ (?, 1, 0, 1)
11 + 15         ↦ (1, 1, ?, 1)
These are used to form a prime implicant table, as in Table 2.17: it lists the prime implicants along the left-hand
side and the original minterms along the top, and includes a ✓ character in every element where a given prime
implicant includes a given minterm.
The goal now is to select a combination of the prime implicants which covers all of the original minterms.
For example, the implicant 0 + 1 + 4 + 5 covers the minterms 0, 1, 4 and 5; selecting this as well as
implicant 10 + 11 will cover 0, 1, 4, 5, 10 and 11. Before doing so, we can make our task easier by identifying
the set of essential prime implicants, i.e., those which are the only cover for a given minterm. We can see the
prime implicant 11 + 15 is such a case in Table 2.17, because it is the only way to cover minterm 15; as a result,
we must include it in our expression.
The process for coverage is fairly intuitive: we start with the essential prime implicants, and then draw a line
through the associated row in the prime implicant table; when a line goes through a ✓, we also draw a line
through that column. The resulting lines show which minterms are currently covered by prime implicants we
have selected for inclusion in our SoP expression. For our example we

• draw a line through the row for implicant 11 + 15, and hence through the columns for minterms 11 and
15,

• draw a line through the row for implicant 0 + 1 + 4 + 5, and hence through the columns for minterms 0,
1, 4 and 5, and finally


• draw a line through the row for implicant 0 + 2 + 8 + 10, and hence through the columns for minterms
0, 2, 8 and 10.

The end result shows that by using prime implicants 0 + 1 + 4 + 5, 0 + 2 + 8 + 10, and 11 + 15, we can cover
all the original minterms; we need not include prime implicant 10 + 11, for example, since minterms 10 and 11
are both covered elsewhere. Looking at the associated tuples, we have

0 + 1 + 4 + 5   ↦ (?, 0, ?, 0)
0 + 2 + 8 + 10  ↦ (0, ?, 0, ?)
11 + 15         ↦ (1, 1, ?, 1)

Following the rule that for some input t

if t = 0 → use ¬t
if t = 1 → use t
if t = ? → ignore t

we form a term for each prime implicant listed and thus implement the SoP expression as

r = ( ¬w ∧ ¬y ) ∨
( ¬x ∧ ¬z ) ∨
( w ∧ y ∧ z )

as per our original attempt using Karnaugh maps.

2.2.4 Physical properties of combinatorial logic


2.2.4.1 Delay: from static to dynamic (i.e., including time) evaluation
Definition 2.20. Within some combinatorial logic, two classes of delay (which is often described as propagation delay,
with a hint toward delay of signals more generally) dictate the time between a change to some input and the corresponding
change (if any) in an output: these are

• wire delay, which relates to the time taken for current to move through the conductive wire from one point to
another, and

• gate delay, which relates to the time taken for transistors in each gate to switch between connected and unconnected
states.

The latter is typically larger than the former, and both relate to the associated implementations: the latter relates to
properties of the transistors used, the former to properties of the wire (e.g., conductivity, length, and so on).

Definition 2.21. The critical path through some combinatorial logic is the longest sequential sequence of delays (so wire
and/or gate delays) between the inputs and outputs.

Although such wire and gate delays are typically very small, when many gates are placed in series or when
wires are very long, the delays add up; the problem of managing the result is multiplied as the complexity of
combinatorial logic increases. The concept of wire delay is perhaps more intuitive than gate delay, so it makes
sense to expand a little on the latter; the example below attempts to explain the cause.

Example 2.15. Consider Figure 2.18, which includes an idealised (left-hand side, in Figure 2.18a) and (more)
realistic (right-hand side, in Figure 2.18b) illustration of what happens when the input to a MOSFET-based
NOT gate, i.e., Figure 2.10a, switches.
The idea is to stress the fact that in the idealised case, there is an instantaneous change in the output voltage:
the plot representing the output is square-edged, changing (or swinging) from 5V (i.e., 1) to 0V (i.e., 0) the
instant that the input voltage changes from 0V (i.e., 0) to 5V (i.e., 1), or, more precisely, when it reaches the
threshold voltage. Note that the illustration includes output voltage levels above 0V and below 5V that represent
the threshold at which said output is interpreted as a 0 or 1, but since the change is instantaneous these are
irrelevant.
In contrast, the realistic case suggests a non-instantaneous change in the output voltage, i.e., it takes some
time. The characteristics of the now curved plot relate to properties of the transistors. However, the important
thing to realise is that the input voltage will take some time to change between 0V (i.e., 0) and 5V (i.e., 1), so
there is some delay in the output voltage changing from 5V (i.e., 1) to 0V (i.e., 0); this also suggests there is a
(short) period of time where the output voltage cannot be interpreted as either 0 or 1.


(a) Idealised, square switching activity. (b) Realistic, curved switching activity.

Figure 2.18: An illustration of idealised and realistic switching activity wrt. a MOSFET-based NOT gate.

(a) An annotated implementation of an XOR gate, using NOT, AND and OR gates. (b) A waveform tracking intermediate results that occur when x is changed from 0 to 1.

Figure 2.19: A behavioural waveform demonstrating the effects of propagation delay on an XOR implementation.

Figure 2.20: A simple design, involving just a NOT and an AND gate, that exhibits glitch-like behaviour.

Figure 2.21: A contrived circuit illustrating the idea of fan-out, whereby one source gate may need to drive n target gates.


Although this property is often abstracted when illustrating the value of a wire in a waveform, meaning
transitions from 0 to 1, or vice versa, are square-edged, it can be captured with sloped edges, as in the following waveform for two signals x and y:

[A waveform in which x toggles with square-edged, instantaneous transitions, while y toggles with sloped-edged transitions that take some time.]

Notice that x and y toggle between 0 and 1 in the same way, but transitions in the former (resp. latter) are
instantaneous (resp. take some time). Whether implicit or explicit, the gate delay property still exists, and has
an impact on evaluation of larger combinatorial designs:

Example 2.16. Consider Figure 2.19a, which shows the implementation of an XOR gate (i.e., one derived from NOT, AND, and OR gates). If we take a static approach to evaluating the output using the inputs, it is reasonable
that by setting x = 0 and y = 1 we get
x = 0
y = 1
t0 = ¬x = 1
t1 = ¬y = 0
t2 = t0 ∧ y = 1
t3 = t1 ∧ x = 0
r = t2 ∨ t3 = 1

However, this ignores the impact of delay on the evaluation process; if we take a dynamic approach and imagine
the delay of

1. a NOT gate is 10ns,

2. an AND gate is 20ns, and

3. an OR gate is 20ns,

this changes matters. Imagine we toggle the inputs from x = 0, y = 1 to x = 1, y = 1; immediately we introduce
time, in the sense we have introduced previous values of x and y rather than just current values. An illustration
of the gate behaviour, however simplistic, is given in Figure 2.19b. The waveform starts when the gate is in the correct state given the inputs x = 0, y = 1, after which the inputs are toggled to x = 1, y = 1 (at 0ns). Notice that the result is not valid immediately. In particular, we can examine points in the waveform and show that the final and intermediate results are actually incorrect. For example, it takes 10ns before either NOT gate produces the correct output on t0 and t1 ; the result r remains incorrect until 50ns: gate delay has caused a gap between the inputs being toggled, and the output being valid.
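This dynamic behaviour is easy to mimic in software. The following C sketch, our own illustration rather than anything from the design itself, models each signal simply by the time (in ns) at which it becomes valid, using the gate delays listed above:

#include <stdio.h>

/* A signal is modelled by the time (in ns) at which it becomes valid;
 * a gate output becomes valid one gate delay after the latest of its
 * two inputs.
 */
static int settle ( int delay , int t_lhs , int t_rhs ) {
  return delay + ( t_lhs > t_rhs ? t_lhs : t_rhs );
}

int main ( void ) {
  int t_x  = 0, t_y = 0;                /* x and y toggle at 0ns           */
  int t_t0 = settle( 10, t_x,  t_x  );  /* t0 = ¬x      : NOT delay = 10ns */
  int t_t1 = settle( 10, t_y,  t_y  );  /* t1 = ¬y      : NOT delay = 10ns */
  int t_t2 = settle( 20, t_t0, t_y  );  /* t2 = t0 ∧ y  : AND delay = 20ns */
  int t_t3 = settle( 20, t_t1, t_x  );  /* t3 = t1 ∧ x  : AND delay = 20ns */
  int t_r  = settle( 20, t_t2, t_t3 );  /* r  = t2 ∨ t3 : OR  delay = 20ns */
  printf( "r valid after %dns\n", t_r );  /* prints 50ns                   */
  return 0;
}

Computing validity times in this way amounts to measuring the delay along the longest path through the logic, i.e., the critical path.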

To conclude, it is important to stress the central role a critical path has: it is a limiting factor or bound on how
quickly some combinatorial logic computes outputs, i.e., it dictates the associated latency. That may not seem
important, but obviously we prefer an optimised design that has lower latency; this implies a design challenge,
in that we almost always want to minimise the critical path.

Example 2.17. Following the example above, consider Figure 2.19a: this XOR design has a critical path that
goes through a NOT gate, then an AND gate, and then an OR gate: the path has a total delay of 50ns. In a way,
this formalises what we found above: it took 50ns to get the correct output r from inputs x and y. However,
examining the critical path delivers this information with no evaluation; it basically tells us the design can
never compute outputs in less time, which of course might imply that whatever system said design is placed in is further limited as a result.

2.2.4.2 Glitches as a by-product of delay

Definition 2.22. A glitch is normally defined to describe a (momentary) change wrt. some wire, which may cause a
(momentarily) invalid or incorrect output if used as an input to some gate; the cause is typically delay of some sort, e.g.,
a mismatch in when two gate inputs become valid.

Example 2.18. Consider Figure 2.20, wherein the two AND gate inputs are forced to be valid at different times
due to imbalanced delay: it clearly takes longer for the value of x to propagate through the NOT gate than
directly along the wire. The net result is that if we toggle x = 0 to x = 1 then back again, we produce a short
glitch, i.e.,


[A waveform plotting x, t, and r over the interval 0ns to 60ns: r, which should remain 0 throughout, momentarily pulses to 1 when x is toggled, the width of the pulse matching the NOT gate delay.]

2.2.4.3 On the sanity of buffer gates


Figure 2.13 included a so-called buffer gate, whose function can be described as r = x: no computation is
performed per se, because the output matches the input. As such, it is reasonable to question the purpose of
such a gate; we could eliminate it (or just replace it with a wire) and produce an equivalent result. It turns out
the buffer can be used in two somewhat subtle roles:

1. Although the functionality of a buffer is r = x, there is still some associated gate delay (roughly equivalent
to a NOT gate); it can thus be used to equalise the delay through different paths in some combinatorial
logic, and thus help solve the glitch problem outlined above. Within Figure 2.19a, for example, one can
imagine adding a buffer between y and the second input to the top AND gate; this would ensure that ¬x
and the buffered version of y arrive at the inputs to said gate at the same time.

2. Recall that the output of each MOSFET-based gate was formed by conditionally connecting Vdd or Vss to r;
the inputs, e.g., x and y, simply control which connection was made. This is important, because it implies
that even if the inputs are in some way “weak” then the output will be amplified, so equal to the “strong”
levels Vdd or Vss . A buffer can therefore be viewed as a way to get r, an identical but amplified version of x.

Neither of these facts is particularly important within the remit of what we cover, but it is nonetheless important to keep them in mind if you see buffer gates in designs elsewhere.

2.2.4.4 Fan-in and fan-out


The terms fan-in and fan-out refer to properties of logic gates associated with their inputs and outputs:

Definition 2.23. Consider a given logic gate:

• The term fan-in is used to describe the number of inputs to a given gate.

• The term fan-out is used to describe the number of inputs (so in a rough sense the number of other gates) the
output of a given gate is connected to.

The former is easier to explain: it is just a way to formalise the fact that, wlog. a 2-input AND gate that
computes r = x ∧ y has fan-in of 2, whereas a 3-input AND gate that computes r = x ∧ y ∧ z has fan-in of 3.
A gate with higher fan-in will typically switch more slowly than a gate with lower fan-in; this stems from the
fact the larger number of inputs are processed using a more complex internal organisation of transistors.
The latter is still easy to explain, but harder to justify as important. The idea is that, ideally, we are free
to connect the output of a given source gate to the inputs of say m other target gates; in practice, however,
there is a limit on m. It stems from increased load on the source gate, and so longer propagation delay: it
basically takes longer for the driving voltage to meet the required threshold. In addition, a transistor is limited
wrt. the current driven through it before it will malfunction in some way; if the fan-out requires this to be
exceeded, then the under-supplied source gate will fail somehow. So, in a sense, fan-out is an intrinsic versus
extrinsic implication of propagation delay (where the latter simply delays computation in some sense, the
former disrupts it). For example, consider the contrived design in Figure 2.21: the source AND gate on the
left-hand side is used to drive m other target AND gates to the right. Unless the source gate drives enough
current onto its output, it may malfunction because the target gates will not receive enough of a share to operate
correctly. The implementation of each gate will be rated wrt. fan-out, which essentially says how many is too many, i.e., the number of target gates which can be safely connected to a source gate; CMOS-based gates have quite a high fan-out rating, in that perhaps 100 target gates or more can be connected to a single source.

2.2.4.5 3-state logic


In a sense, fan-out constrains m, the number of target gates we might connect to n = 1 source gate. But what
about n, and, in particular, what happens if we drive any number of target gates with the output from n = 0
source gates (i.e., none), or n > 1 source gates (e.g., two rather than one)?


Suspend disbelief for a moment and assume these cases could be of use in some way; hopefully it is obvious that neither is likely to yield an outcome we want, or indeed one we can reason precisely about. In the first case, the input is neither 0 nor 1 so it is unclear what the output will be. Perhaps the only caveat to this is where one input alone can dictate the output; reconsider Figure 2.10b for example, which implements a NAND gate and so computes r = ¬(x ∧ y). The truth table for NAND suggests if y = 0 then r = 1 irrespective of x: this reasoning is validated by the implementation, since if y = 0 one P-MOSFET will always connect Vdd to r irrespective of the other. This aside, however, so in general, if an input is not a Boolean value then it remains unlikely we get the
Boolean-like behaviour intended. In the same way, in the second case we basically join n outputs together: this
is more dangerous, because both drive current along the wire. The outcome depends on a number of factors,
but is, again, normally not a positive one wrt. the behaviour we want.
We can mitigate this issue by extending the idea of 2-state, Boolean logic values into 3-state logic. There
are two main ideas:

1. We introduce a new logic value, hence the name 3-state, called Z or high impedance; the easiest way to
think about this value is as representing a null, or disconnected value that can be safely “overpowered”
by any other value (i.e., 0 or 1).

2. We introduce a new logic gate, a so-called enable gate, which is essentially just a switch implemented
using a single transistor, i.e.,

[A symbol for the enable gate: an input x, an output r, and an enable signal en controlling the switch.]

The associated truth table accommodates the high impedance value as follows:

Enable
x en r
0 0 Z
1 0 Z
Z 0 Z
0 1 0
1 1 1
Z 1 Z
0 Z Z
1 Z Z
Z Z Z

In combination, these steps allow us to cope with both cases above. The first case is now less of an issue: we
still might not get the behaviour we wanted, but at least we can reason about it. In the second case, we can use
the enable gate to allow conditional access to a shared wire: if en = 0 the output is Z so not driven, meaning
another driver could be safely connected to and use the same wire. However, when en = 1 the output is x;
nothing else should be driving a value along this wire or we are back to the situation which caused the original
problem.
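As a rough illustration of these ideas, consider the following C sketch (ours, not part of the design): it models the three logic values, implements the enable gate per the truth table above, and resolves two drivers sharing a wire. The extra value VX, flagging two active but disagreeing drivers, is a modelling convention borrowed from HDLs (cf. the x value of Verilog) rather than a fourth physical state:

#include <stdio.h>

typedef enum { V0, V1, VZ, VX } tri_t;  /* 0, 1, high impedance, contention */

/* the enable gate: r = x when en = 1, otherwise high impedance */
tri_t enable_gate ( tri_t x , tri_t en ) {
  if ( en != V1 ) return VZ;                 /* en = 0 or Z: output not driven */
  return ( x == V0 || x == V1 ) ? x : VZ;    /* en = 1     : output follows x  */
}

/* resolve two drivers on a shared wire: Z is safely "overpowered" by
 * 0 or 1, whereas two active, disagreeing drivers are flagged as VX
 */
tri_t resolve ( tri_t a , tri_t b ) {
  if ( a == VZ ) return b;
  if ( b == VZ ) return a;
  return ( a == b ) ? a : VX;
}

int main ( void ) {
  /* two sources share one wire, with at most one enabled at a time */
  tri_t r = resolve( enable_gate( V1, V1 ), enable_gate( V0, V0 ) );
  printf( "%d\n", r );                       /* prints 1: the enabled source wins */
  return 0;
}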

2.2.4.6 Stable, unstable, and meta-stable states


Definition 2.24. Consider a component with a given output: the output (or component) can be said to be in

• a stable state if the output is predictable, i.e., either 0 or 1, whereas

• an unstable state if the output is unpredictable, e.g., either 0, 1, a voltage level between the thresholds for either, or oscillating between the two somehow.

Definition 2.25. A meta-stable state is an unstable state, which, after some period of time, will resolve to some stable
state: the output eventually settles to either a 0 or 1 (i.e., becomes stable), but we cannot predict which or when.

Instances of instability typically stem from some form of logical inconsistency in a design, and, in the case of
meta-stability, are only ever resolved due to physical characteristics of the implementation (e.g., strength of
transistors).

Example 2.19. Consider the following example


[Symbols and circuits for the 2-input multiplexer and 2-output demultiplexer, alongside the following truth tables:]

MUX2
c x y r
0 0 ? 0
0 1 ? 1
1 ? 0 0
1 ? 1 1

DEMUX2
c x r1 r0
0 0 ? 0
0 1 ? 1
1 0 0 ?
1 1 1 ?

(a) The multiplexer as a symbol. (b) The multiplexer as a truth table. (c) The multiplexer as a circuit. (d) The demultiplexer as a symbol. (e) The demultiplexer as a truth table. (f) The demultiplexer as a circuit.

Figure 2.22: An overview of 2-input (resp. 2-output), 1-bit multiplexer (resp. demultiplexer) cells.

[A circuit in which the output r of a NOT gate is fed back to its own input x.]

which could be captured using the (logically inconsistent) expression x = r = ¬x. Clearly there is a problem, because if x = 0 it should be 1 due to the NOT gate, and if x = 1 it should be 0; as a result, the output r will be unstable and oscillate somehow (potentially at a rate that is related to the gate delay involved).

2.2.5 Building block components


We have already seen it is convenient to design combinatorial logic using logic gates rather than transistors;
in short, this allows a higher level of abstraction. In the same way, it may be convenient to design larger,
more complex combinatorial logic components using smaller, less complex combinatorial logic components.
The latter are, in a sense, just standard building blocks that are useful when designing the former. Where
appropriate, they allow us to decompose a larger component into smaller components; this is often attractive,
in that designing the larger component within one, monolithic task is often a lot more difficult.
Without a context, it is easy to look at the building blocks we cover in the following and deem them odd or
even useless. Keep in mind that each one is covered specifically because it is useful; think of them as a way to
practice the techniques developed so far, and believe we will make use of them later (e.g., in Chapter 3).

2.2.5.1 Components for choosing between options


The idea of choice is crucial in constructing larger components: often we want to control the component, for
example making it operate differently depending on some input. The idea is that

1. a multiplexer

• has m inputs,
• has 1 output,
• uses a (⌈log2 (m)⌉)-bit control signal input to choose which input is connected to the output,


[Left: four 2-input, 1-bit multiplexers, the i-th of which takes the i-th bits of the 4-bit inputs x and y and produces the i-th bit of r, all sharing the same control signal c. Right: three 2-input, 1-bit multiplexers in two layers, the first layer controlled by c0 and the second by c1 , selecting one of w, x, y, and z.]

(a) A 2-input, 4-bit multiplexer. (b) A 4-input, 1-bit multiplexer.

Figure 2.23: Application of the isolated and cascaded replication design patterns.

while
2. a demultiplexer
• has 1 input,
• has m outputs,
• uses a (⌈log2 (m)⌉)-bit control signal input to choose which output is connected to the input,
noting that each input and output is n-bit. We can describe how the components behave using C as an
analogy. For example, ignoring the number of bits in each input, output and control signal, the statement
switch ( c ) {
case 0 : r = w; break ;
case 1 : r = x; break ;
case 2 : r = y; break ;
case 3 : r = z; break ;
}

acts similarly to a 4-input multiplexer: depending on the control signal c, one of the inputs (i.e., w, x, y, or z) is
assigned to the output (i.e., r). Likewise,
switch ( c ) {
case 0 : r0 = x; break ;
case 1 : r1 = x; break ;
case 2 : r2 = x; break ;
case 3 : r3 = x; break ;
}

acts similarly to a 4-output demultiplexer: depending on the control signal c, one of the outputs (i.e., r0,
r1, r2, or r3) is assigned from the input (i.e., x). Although attractive, using such an analogy needs care. In
particular, keep in mind the C fragments include an implicit, discrete order wrt. the assignments. In contrast,
the component design means an analogous connection is evaluated in a continuous manner: whenever either
the control signal or any input changes, the output may change to match.
This behaviour stems from a design based on combinatorial logic, which is easy to develop for both
components; in a similar way to before, we write down a truth table that describes the behaviour we require,
then derive a Boolean expression to implement that behaviour:
Example 2.20. Consider the case of a 2-input, 1-bit multiplexer, a truth table for which is
outlined in Figure 2.22b. The idea is we have two 1-bit inputs x and y, and one 1-bit control signal c; we want
to drive r with either x or y depending on whether c = 0 or c = 1. The truth table should make sense in that
when c = 0 the output r matches x, and when c = 1 the output r matches y; the don’t care entries, and so truth
table as a whole, can be read as “if c = 0 then r = x irrespective of y, whereas if c = 1 then r = y irrespective of
x”. From the truth table, we can arrive at the expression
r = ( ¬c ∧ x ) ∨
( c ∧ y )
which is shown diagrammatically in Figure 2.22c.


Example 2.21. Consider the case of a 2-output, 1-bit demultiplexer, a truth table for which is
outlined in Figure 2.22e. The idea is we have two 1-bit outputs r0 and r1 , and one 1-bit control signal c; we
want to drive either r0 or r1 with x depending on whether c = 0 or c = 1. The truth table should make sense in
that when c = 0 the output r0 matches x, and when c = 1 the output r1 matches x; the don’t care entries, and so
truth table as a whole, can be read as “if c = 0 then r0 = x and r1 is irrelevant, whereas if c = 1 then r1 = x and
r0 is irrelevant”. From the truth table, we can derive the expression

r0 = ¬c ∧ x
r1 = c ∧ x
shown diagrammatically in Figure 2.22f.
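Both sets of expressions translate directly into C, giving a minimal software sketch of the two components (all values are in {0, 1}; the function names are ours):

/* 2-input, 1-bit multiplexer: r = (¬c ∧ x) ∨ (c ∧ y) */
int mux2 ( int c , int x , int y ) {
  return ( ( ~c & x ) | ( c & y ) ) & 1;
}

/* 2-output, 1-bit demultiplexer: r0 = ¬c ∧ x and r1 = c ∧ x */
void demux2 ( int c , int x , int *r0 , int *r1 ) {
  *r0 = ( ~c & x ) & 1;
  *r1 = (  c & x ) & 1;
}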
For more general m-input (resp. m-output), n-bit alternatives, we employ the design patterns outlined earlier
using the 2-input (resp. 2-output), 1-bit components as a starting point.
Example 2.22. Consider the task of designing a 2-input, n-bit multiplexer, wlog. taking n = 4 as an example. Note that with m = 2 inputs, we need ⌈log2 (m)⌉ = 1 control signal: one of 2^1 = 2 possible control signal assignments is used to select each input.
Figure 2.23a illustrates the design, which uses replication. The idea is simple: we use n separate 2-input,
1-bit multiplexers where the i-th instance accepts the i-th bit of each input x and y and produces the i-th bit of
the output r. Or, put another way, since each instance is controlled by the same c, they are all either selecting
some bit of x or of y to produce r.
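The replication pattern has a direct software analogue: C bitwise operators apply the same 1-bit expression to every bit position at once, provided the single control signal c is first replicated into a mask (mirroring the shared c wire). A sketch for n = 4, with illustrative names:

#include <stdint.h>

/* a 2-input, 4-bit multiplexer: the 1-bit expression (¬c ∧ x) ∨ (c ∧ y)
 * applied to all four bit positions at once
 */
uint8_t mux2_4bit ( int c , uint8_t x , uint8_t y ) {
  uint8_t m = c ? 0xF : 0x0;                         /* c replicated across 4 bits */
  return ( ( uint8_t )( ~m & x ) | ( m & y ) ) & 0xF;
}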
Example 2.23. Consider the task of designing an m-input, 1-bit multiplexer, wlog. taking m = 4 as an example. Note that with m = 4 inputs, we need ⌈log2 (m)⌉ = 2 control signals: one of 2^2 = 4 possible control signal assignments is used to select each input.
One strategy would be to simply write down a larger truth table, i.e.,

MUX4
c1 c0 w x y z r
0 0 0 ? ? ? 0
0 0 1 ? ? ? 1
0 1 ? 0 ? ? 0
0 1 ? 1 ? ? 1
1 0 ? ? 0 ? 0
1 0 ? ? 1 ? 1
1 1 ? ? ? 0 0
1 1 ? ? ? 1 1

and then derive a larger Boolean expression

r = ( ¬c0 ∧ ¬c1 ∧ w ) ∨
( c0 ∧ ¬c1 ∧ x ) ∨
( ¬c0 ∧ c1 ∧ y ) ∨
( c0 ∧ c1 ∧ z )
This yields a reasonable result, but as the number of inputs grows the task becomes more difficult. An alternative
is to divide-and-conquer, using 2-input, 1-bit multiplexers to decompose the larger decision task into smaller
steps. Figure 2.23b illustrates the design, which uses a cascade. The first, left-most layer of multiplexers is
controlled by c0 : the top-most instance produces w if c0 = 0, or x if c0 = 1, whereas the bottom-most instance
produces y if c0 = 0, or z if c0 = 1. These outputs are fed into a second, right-most layer that uses c1 to select
appropriately: if c1 = 0 the output of the top-most multiplexer in the first layer is selected, whereas if c1 = 1
the output of the bottom-most multiplexer in the first layer is selected. The overall result r is the same as our
dedicated design above, but hopefully it is clear the cascaded design is conceptually a lot simpler.
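In terms of the mux2 function sketched earlier, the cascaded design amounts to the following:

/* a 4-input, 1-bit multiplexer as a cascade of 2-input instances */
int mux4 ( int c0 , int c1 , int w , int x , int y , int z ) {
  int t0 = mux2( c0, w, x );    /* top-most    first-layer instance  */
  int t1 = mux2( c0, y, z );    /* bottom-most first-layer instance  */
  return mux2( c1, t0, t1 );    /* second layer selects between them */
}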

2.2.5.2 Components for doing basic arithmetic


Chapter 1 addressed the challenge of representing numbers, integers for example, as n-bit binary sequences;
a question left open was how we might do arithmetic with those numbers, or, more precisely, how we might
do computation with the associated representations. Since we are now able to design arbitrary Boolean
functionality, we can start to investigate this question.
The general, high-level task is to design a large, more complex combinatorial logic component that imple-
ments some arithmetic operation (e.g., integer addition): it might accept n-bit inputs x̂ and ŷ that represent x
and y, and produce an n-bit result r̂ st.
r̂ = f (x̂, ŷ) ↦ x + y,


[Symbols and circuits for the 1-bit equality and less than comparators, alongside the following truth tables:]

Equal
x y r
0 0 1
0 1 0
1 0 0
1 1 1

Less-Than
x y r
0 0 0
0 1 1
1 0 0
1 1 0

(a) The equality comparator as a symbol. (b) The equality comparator as a truth table. (c) The equality comparator as a circuit. (d) The less than comparator as a symbol. (e) The less than comparator as a truth table. (f) The less than comparator as a circuit.

Figure 2.24: An overview of equality and less than comparators.

i.e., an r̂ that represents the sum of x and y. The content of Chapter 3 does exactly this. As a means of support,
however, a more specific, lower-level first step considers a set of less complex 1-bit building block components:
although not so useful alone, they will act as building blocks within the more general alternatives.

Comparators In contrast to arithmetic proper, where we expect both inputs and output to be numbers, a comparison compares numerical inputs and thus produces a Boolean output. Various types of comparison are useful, but it is enough to consider two in particular, which deal with 1-bit inputs: the others can be derived from these comparators.

Example 2.24. Given 1-bit inputs x and y, an equality comparator computes

r = 1 if x = y, and r = 0 otherwise.

From the associated truth table, shown in Figure 2.24b, we can derive the expression

r = ¬(x ⊕ y).

Example 2.25. Given 1-bit inputs x and y, a less than comparator computes

r = 1 if x < y, and r = 0 otherwise.

From the associated truth table, shown in Figure 2.24e, we can derive the expression

r = ¬x ∧ y.

While fairly self explanatory, the truth tables may seem a little odd as a result of their dealing with 1-bit inputs. However, reading through them row-wise should demonstrate their content is sane: using less than as an example, consider that the truth table mirrors your intuition wrt. this comparison by stating that 0 is not less than 0, 0 is less than 1, 1 is not less than 0, and, finally, 1 is not less than 1. Note that the equality comparator design hints that an inequality comparator can be simpler still: inverting the expression, we find r = x ⊕ y provides an inequality comparison

r = 1 if x ≠ y, and r = 0 otherwise,

because, by definition, when x = y (i.e., x = 0 and y = 0, or x = 1 and y = 1) x ⊕ y = 0, and when x ≠ y (i.e., x = 0 and y = 1, or x = 1 and y = 0) x ⊕ y = 1.
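For completeness, the three 1-bit comparisons can be sketched in C as follows (again for values in {0, 1}):

/* r = ¬(x ⊕ y): 1 iff x and y are equal     */
int eq1 ( int x , int y ) { return !( x ^ y );     }
/* r = ¬x ∧ y  : 1 iff x is less than y      */
int lt1 ( int x , int y ) { return ( ~x & y ) & 1; }
/* r = x ⊕ y   : 1 iff x and y are not equal */
int ne1 ( int x , int y ) { return x ^ y;          }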


[Symbols and circuits for the half- and full-adder, alongside the following truth tables:]

Half-Adder
x y co s
0 0 0 0
0 1 0 1
1 0 0 1
1 1 1 0

Full-Adder
ci x y co s
0 0 0 0 0
0 0 1 0 1
0 1 0 0 1
0 1 1 1 0
1 0 0 0 1
1 0 1 1 0
1 1 0 1 0
1 1 1 1 1

(a) The half-adder as a symbol. (b) The half-adder as a truth table. (c) The half-adder as a circuit. (d) The full-adder as a symbol. (e) The full-adder as a truth table. (f) The full-adder as a circuit.

Figure 2.25: An overview of half- and full-adder cells.

Adders The simplest arithmetic operation, conceptually at least, is addition. There are two variants of a 1-bit
adder, instances of which will be sufficient to construct larger, n-bit adders later:
Example 2.26. Given 1-bit inputs x and y, a half-adder component computes a 1-bit sum s and carry-out co (i.e., the LSB and MSB of the 2-bit sum x + y) as output. The corresponding truth table shown in Figure 2.25b can be used to derive associated Boolean expressions

co = x ∧ y
s = x ⊕ y

illustrated in Figure 2.25c.


Example 2.27. Given 1-bit inputs x and y and a carry-in ci, a full-adder component computes a 1-bit sum s and
carry-out co (i.e., the LSB and MSB of the 2-bit sum x + y + ci), as output. The corresponding truth table shown
in Figure 2.25e can be used to derive associated Boolean expressions

co = (x ∧ y) ∨ (x ∧ ci) ∨ (y ∧ ci)
= (x ∧ y) ∨ ((x ⊕ y) ∧ ci)
s = x ⊕ y ⊕ ci

illustrated in Figure 2.25f.


As was the case with comparators, the truth tables may seem a little odd as a result of their dealing with 1-bit
inputs; again, reading through them row-wise should demonstrate their content is sane. The half-adder, for
example, is st.

• if we add x = 0 to y = 0, the sum is 0 and there is no carry-out,


• if we add x = 0 to y = 1, the sum is 1 and there is no carry-out,
• if we add x = 1 to y = 0, the sum is 1 and there is no carry-out, and finally
• if we add x = 1 to y = 1, the sum is 2: since we cannot represent 2 using a single bit, we set s = 0 and
co = 1 to indicate there is a carry-out.

Note that the full-adder design is essentially two half-adders joined in a cascade: to accommodate the extra
carry-in the first instance computes t = x + y with the second one then computing s = t + ci. Also, note that the


[Three gate-level half-adder circuits, each with inputs x and y and outputs s and co:]

(a) An expanded half-adder, with XOR in terms of NOT, AND and OR. (b) A half-adder based on NAND gates only. (c) A half-adder based on NOR gates only.

Figure 2.26: Gate universality used to implement a NAND- and NOR-based half-adder. Note that the dashed boxes
in the NAND and NOR implementations (middle and bottom) are translations of the primitive gates within the more
natural description (top).


Boolean expressions listed for a full-adder effectively include two (equivalent) options for co. One reason to
prefer the second is that given we need to compute both co and s, it contains the shared term x ⊕ y which can
be capitalised on during optimisation.
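A corresponding C sketch makes the role of the shared term explicit: full_adder computes t = x ⊕ y once, then uses it in both s and the second (preferred) option for co:

/* half-adder: s and co are the LSB and MSB of the 2-bit sum x + y */
void half_adder ( int x , int y , int *s , int *co ) {
  *s  = x ^ y;
  *co = x & y;
}

/* full-adder: s and co are the LSB and MSB of the 2-bit sum x + y + ci */
void full_adder ( int x , int y , int ci , int *s , int *co ) {
  int t = x ^ y;                    /* shared term                   */
  *s  = t ^ ci;                     /* s  = x ⊕ y ⊕ ci               */
  *co = ( x & y ) | ( t & ci );     /* co = (x ∧ y) ∨ ((x ⊕ y) ∧ ci) */
}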
As an aside, the half-adder represents a simple enough design to explore the idea of gate universality in (a
little) more detail:
Example 2.28. Given the natural half-adder implementation in Figure 2.26a as a starting point, Figure 2.26b
describes an equivalent using NAND gates only.
Example 2.29. Given the natural half-adder implementation in Figure 2.26a as a starting point, Figure 2.26c
describes an equivalent using NOR gates only.
Focusing on the NAND-based variant, for example, naive translation using identities (annotated using dashed
boxes, wrt. the original gates in the natural implementation) yields an implementation with 11 NAND gates.
As you might expect, we can improve this with some careful optimisation: writing x ⊼ y = ¬(x ∧ y) for NAND, and capitalising on the equivalence

x ⊼ (y ⊼ y) ≡ x ⊼ (x ⊼ y),

we can write
s = x ⊕ y
= (¬x ∧ y) ∨ (x ∧ ¬y) (XOR identity)
= ¬(¬x ∧ y) ⊼ ¬(x ∧ ¬y) (OR into NAND identity)
= (¬¬x ∨ ¬y) ⊼ (¬x ∨ ¬¬y) (de Morgans)
= (x ∨ ¬y) ⊼ (¬x ∨ y) (involution)
= (¬x ⊼ ¬¬y) ⊼ (¬¬x ⊼ ¬y) (OR into NAND identity)
= (¬x ⊼ y) ⊼ (x ⊼ ¬y) (involution)
= ((x ⊼ x) ⊼ y) ⊼ (x ⊼ (y ⊼ y)) (NOT into NAND identity)
= ((x ⊼ y) ⊼ y) ⊼ (x ⊼ (x ⊼ y))
which uses 4 NAND gates due to the shared term x ⊼ y, which is also shared with

co = x ∧ y
= (x ⊼ y) ⊼ (x ⊼ y)

meaning 5 NAND gates for the whole half-adder (which is roughly the same as the number of non-NAND gates within the natural implementation). There are more direct ways to manipulate the expression for s, but notice
that in the above a) steps 1 to 5 yield a result equivalent to Figure 2.26b, b) steps 6 to 7 eliminate any (obviously)
redundant NOT gates, and c) step 8 reorganises the gates to maximise shared logic (rather than eliminating any
gates outright). Although this is a specific example, these steps demonstrate a general strategy that often has a
counter-intuitive impact on any given design: correctly optimised, using NAND (or NOR) often yields a lower
(if any) increase in gate count vs. your expectation or an initial translation. Put another way, although they
can be harder to work with, they do not imply a less efficient result wrt. area (while also retaining advantages
such as regularity).
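The 5-gate result is easy to check exhaustively: the following C sketch builds s and co from a NAND helper per the final expressions above, and asserts they match the natural expressions s = x ⊕ y and co = x ∧ y for all four input combinations:

#include <assert.h>

static int nand ( int x , int y ) { return !( x && y ); }

int main ( void ) {
  for ( int x = 0; x <= 1; x++ ) {
    for ( int y = 0; y <= 1; y++ ) {
      int g  = nand( x, y );                          /* shared term x ⊼ y */
      int s  = nand( nand( g, y ), nand( x, g ) );    /* 4 NAND gates      */
      int co = nand( g, g );                          /* 1 further gate    */
      assert( s == ( x ^ y ) && co == ( x & y ) );
    }
  }
  return 0;
}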

2.2.5.3 Components that translate between representations


Informally at least, encoder and decoder components can be viewed as translators. Consider the communication
between two parties (or components) as an example
[A diagram of communication between two parties: an n-bit input x enters the Encoder, an m-bit code word x′ is transmitted, and the Decoder recovers the n-bit x.]

where the encoder accepts the input x, and encodes it into an x′ that is then transmitted; the decoder receives x′ ,
decodes it so as to recover the original x. Phrased as such, both encoder and decoder are basically translating
between representations because x′ could be thought of as a different representation of x we normally term a
code word.
Definition 2.26. Modelling an encoder and decoder as two functions

Encoder : {0, 1}^n → {0, 1}^m
Decoder : {0, 1}^m → {0, 1}^n

we use the term n-to-m to describe either component where it has n inputs and m outputs:


[Circuits for the encoder and decoder, alongside the following truth tables:]

Enc-4-to-2
x3 x2 x1 x0 x′1 x′0
0 0 0 1 0 0
0 0 1 0 0 1
0 1 0 0 1 0
1 0 0 0 1 1

Dec-2-to-4
x′1 x′0 x3 x2 x1 x0
0 0 0 0 0 1
0 1 0 0 1 0
1 0 0 1 0 0
1 1 1 0 0 0

(a) The encoder as a truth table. (b) The encoder as a circuit. (c) The decoder as a truth table. (d) The decoder as a circuit.

Figure 2.27: An example encoder/decode pair.

1. an n-to-m encoder translates an n-bit input value into an m-bit code word, and

2. an m-to-n decoder translates an m-bit code word into an n-bit output value.

Definition 2.27. If for every code word x we have HW(x) = 1, i.e., every possible code word has exactly one bit set to 1,
we call the associated encoder (resp. decoder) one-hot (or one-of-many).

Definition 2.28. A priority encoder is st. priority (or preference) is given to one input over another. This concept is
most obviously useful in a one-hot encoder, allowing it to cope gracefully with erroneous situations where HW(x) > 1:
the idea is that if xi = 1 and x j = 1, then priority is given to xi say (meaning the fact x j = 1 is ignored).

These formalisms hide various subtleties, most notably the fact that it only makes sense to discuss encoder
and decoders in context: both a) the encoding (resp. decoding) scheme and so structure of code words, and
b) parameterisation of said scheme (e.g., n and m), are totally domain-specific, meaning we cannot describe a
general encoder (resp. decoder) in a sensible manner.

• The n-to-m terminology suggests inputs (resp. outputs) drawn from sets of 2^n (resp. 2^m ) values. However, it is clearly possible, and often useful, for some code words to remain unused; as such, it can be useful to relax the terminology and think of n-value and m-value sets instead. Note there is no strict requirement that m > n, or vice versa.

• Normally we need to consider the encoder and decoder together, as their behaviour is related: we
normally expect
(Decode ◦ Encode)(x) = x,
i.e., Decode = Encode^−1 . This fact implies that it is not always possible to describe a valid decoder (resp.
encoder) for a given encoder (resp. decoder): some functions have no inverse. That said, however, some
contexts do not need a decoder: the problem at hand may be st. the code word is useful as is, and the
corresponding x need not be recovered.

Example 2.30. Consider the task of taking n inputs, say xi for 0 ≤ i < n, and producing an unsigned integer x′ that determines which xi = 1, given that x j = 0 for all j ≠ i. In other words, we want an encoder that takes x and
produces some x′ as a result; the task of taking x′ and recovering each xk for 0 ≤ k < n demands a corresponding


[A cascade of n full-adders whose carry-outs feed the next carry-in: the first carry-in is 0, the y inputs are set st. the cascade adds 1, and each sum output ri is “looped” directly back into the corresponding x input.]
Figure 2.28: An incorrect counter design, using naive “looped” feedback.

decoder. This problem might be motivated by a need to control components: if we have n such components in
a system, the decoder could, for instance, be used to enable one of them at a time.
By setting n = 4 for example, the encoder will have four inputs x0 , x1 , x2 , and x3 , with the decoder having four corresponding outputs; this implies
x′ ∈ {0, 1, 2, 3} and hence m = 2, meaning two outputs x′0 and x′1 . Figure 2.27a and Figure 2.27c show truth tables
for the two components. For the encoder, we derive the Boolean expressions

x′0 = x1 ∨ x3
x′1 = x2 ∨ x3

and for the decoder


x0 = ¬x′0 ∧ ¬x′1
x1 = x′0 ∧ ¬x′1
x2 = ¬x′0 ∧ x′1
x3 = x′0 ∧ x′1
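These expressions translate into the following C sketch, with x′0 and x′1 written as xp0 and xp1, and each bit passed separately to mirror the expressions:

/* the 4-to-2 encoder: x′0 = x1 ∨ x3 and x′1 = x2 ∨ x3; note that x0
 * need not appear, since it is encoded as x′ = 00
 */
void encode_4to2 ( int x0 , int x1 , int x2 , int x3 ,
                   int *xp0 , int *xp1 ) {
  *xp0 = x1 | x3;
  *xp1 = x2 | x3;
}

/* the 2-to-4 decoder, recovering each xi from the code word */
void decode_2to4 ( int xp0 , int xp1 ,
                   int *x0 , int *x1 , int *x2 , int *x3 ) {
  *x0 = !xp0 & !xp1;
  *x1 =  xp0 & !xp1;
  *x2 = !xp0 &  xp1;
  *x3 =  xp0 &  xp1;
}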

Example 2.31. Using the previous example for motivation, imagine we break the rules and set both x1 = 1 and
x2 = 1: the encoder fails, producing
x′0 = x1 ∨ x3 = 1
x′1 = x2 ∨ x3 = 1
as the code word (incorrectly suggesting that x3 = 1). To address problems of this sort, we can employ a priority
encoder, giving x2 priority over x1 for example (or, more generally, every xi has priority over x j for i > j). To
capture this requirement, we rewrite the truth table as follows:

PriorityEnc-4-to-2
x3 x2 x1 x0 x′1 x′0
0 0 0 1 0 0
0 0 1 ? 0 1
0 1 ? ? 1 0
1 ? ? ? 1 1

Take the 2-nd row for example: although potentially x0 = 1 or x1 = 1, the output gives priority to x2 . That is,
provided that x2 = 1 and x3 = 0 (i.e., irrespective of x0 and x1 ) the output will be st. x′0 = 0 and x′1 = 1. The
associated Boolean expressions are updated accordingly to

x′0 = (x1 ∧ ¬x2 ∧ ¬x3 ) ∨ x3


x′1 = (x2 ∧ ¬x3 ) ∨ x3
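The updated expressions can be sketched in C in the same style; note that setting x1 = x2 = 1 now (correctly) yields the code word for x2 :

/* the 4-to-2 priority encoder: xi takes priority over xj for i > j */
void priority_encode_4to2 ( int x0 , int x1 , int x2 , int x3 ,
                            int *xp0 , int *xp1 ) {
  *xp0 = ( x1 & !x2 & !x3 ) | x3;
  *xp1 = ( x2 & !x3 )       | x3;
}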

2.3 Sequential logic


Imagine we are tasked with designing an n-bit counter, i.e., a component whose n-bit output r steps through
values 0, 1, . . . , 2^n − 1 and then cycles (i.e., starts from 0 again). Recall that we have a 1-bit full-adder component;
Chapter 3 later explains how to extend this into an adder that can compute x + y for n-bit x and y, the basic idea
being to organise n full-adder instances in a cascade where the carry-out of each i-th instance connects to the
carry-in of the next, (i + 1)-th instance. A natural attempt at the counter design might therefore use such an adder, computing r ← r + 1 as illustrated in Figure 2.28: we essentially set one input of the adder to y = 1 and the
other to x = r, suggesting the adder will compute r + 1. This might sound like a reasonable approach in theory,
but has (at least) two practical flaws:

1. we cannot initialise the value, and

2. we do not let the output of each full-adder settle before it is used again as an input: they are computed
continuously (because there is a loop, from x through the full-adder to r and so back to x).


[Top: a 1-phase clock signal, annotated with a clock cycle, positive and negative levels, and positive and negative edges. Bottom: a 2-phase clock comprising signals Φ1 and Φ2 , annotated with the parameters δ1 , δ2 , δ3 , and δ4 .]

(a) A 1-phase clock. (b) A 2-phase clock.

Figure 2.29: An illustration of standard features in 1- and 2-phase clocks.

So despite the fact it intuitively functions as required, this design is far from ideal and, in fact, invalid. Perhaps
the only use it has is to illustrate some fundamental limitations of combinatorial logic. More specifically, we
cannot control when a component computes an output (since it does so continuously), nor have it remember
said output once produced. We need a different approach, which along with components used to support it, is
termed sequential logic: we need

• some way to control (e.g., synchronise) components,

• one or more components that remember what state they are in, and

• a mechanism to perform computation as a sequence of steps, rather than continuously,

which are addressed step-by-step in the following Sections.

2.3.1 Clocks
If we want to perform computation as a sequence of steps, we need to exert control over the components
involved: for example, it could be important to synchronise each component st. they all start (or stop)
computation at the same time. We use a special control signal to do this:

Definition 2.29. A clock signal is simply a digital signal that oscillates between 0 and 1 in a regular fashion.

Note that despite the terminology, in the context of digital logic a clock is somewhat analogous to a metronome:
rather than tracking the (wall-clock) time, for example, it simply produces a regular series of “ticks” (or features)
that are used to synchronise associated actions.

Clock features Since a clock signal is a digital signal, it shares features such as positive and negative edges
and levels as previously outlined within Chapter 1 and now by Figure 2.29a. That said, however, several
specific features are also important:

Definition 2.30. The interval between a given positive (resp. negative) edge and the next positive (resp. negative) edge
is termed a clock cycle. Additional terms you commonly encounter stem from this definition: for example, the clock
period is the time taken for a clock cycle to complete, while the clock frequency (or clock rate) is the number of clock
cycles completed in a unit of time (typically each second, and hence the inverse of the clock period).

Definition 2.31. The time the clock signal spends at positive and negative levels need not be equal; the term duty cycle
is used to describe the ratio between these times. A clock will typically have a duty cycle of a half, meaning the signal is
at a positive level (literally “doing its duty”) for the same time it is at a negative level, but clearly other ratios are valid.

These features are harnessed by a clocking strategy, which is a formal way of saying “a way of using the clock
signal”. For example, we might use a clock edge to trigger the start of some computation, or a clock level to
enable or disable (e.g., reset) some computation.


Clock generation In a sense, any signal could be deemed a clock signal provided it satisfies the definition(s)
above. However, in practice there is a set of distinguished clock signals generated by a) an external or b) an
internal clock source (or clock generator) component.
Example 2.32. An external clock source is commonly provided using a piezoelectric crystal. When a voltage
is applied across such a component, it will oscillate according to a natural frequency (related to the physical
characteristics of the crystal); roughly speaking, one can use the resulting electrical field generated by this
oscillation as a clock signal.
Definition 2.32. It can be useful to manipulate a given clock signal, in order to alter it wrt. features such as frequency; this
is common whenever the clock signal is provided as an input to a design, but the design has specific internal requirements.
In this context, the original and manipulated cases are sometimes termed the reference clock signal and derived clock
signal.
Increasing the frequency of, i.e., multiplying, a reference clock is possible but somewhat beyond our scope;
dedicated designs exist to solve this problem, but we omit a detailed overview. Decreasing the frequency of,
i.e., dividing, a reference clock is much easier. Imagine that each positive edge of the reference clock clk causes
a counter c to be incremented: assuming c = 0 initially, the individual bits of c can be visualised as

[A waveform plotting clk and the counter bits c0 , c1 , and c2 , as c steps through the values 0, 1, 2, 3, 4, 5, . . .]

Notice that each successive bit of c models clk, but with a period that is twice as long: formally, the (i − 1)-th bit of the counter c acts like clk divided by 2^i . This means, for example, that if i = 1 we can extract a clock signal with 1/2^i = 1/2^1 = 1/2 times (i.e., half) the frequency via the 0-th bit of c.
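The division is easy to demonstrate with a short C sketch (ours, purely illustrative): incrementing a counter once per positive edge of clk, each bit of c toggles half as often as the bit below it:

#include <stdio.h>

int main ( void ) {
  unsigned int c = 0;
  for ( int edge = 1; edge <= 8; edge++ ) {  /* 8 positive edges of clk    */
    c++;                                     /* increment on each edge     */
    printf( "c=%u: c0=%u c1=%u c2=%u\n",     /* c0 toggles each edge, i.e.,
                                                at half the frequency, c1
                                                at a quarter, and so on    */
            c, c & 1, ( c >> 1 ) & 1, ( c >> 2 ) & 1 );
  }
  return 0;
}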

Clock distribution
Definition 2.33. As with the power rails, a given clock signal must be distributed (or supplied) to each component that
makes use of it; a clock distribution network is tasked with doing so.
Definition 2.34. The term clock skew describes a phenomenon whereby a clock signal arrives at one component along a
different path to another, and so at a different time; this suggests the two components are unsynchronised.
Example 2.33. Example clock distribution network topologies include the H-tree, which is a form of space
filling curve. The advantage of an H-tree is that the wire delay from the clock generator to each target component is uniform: this helps minimise the potential for clock skew.
Definition 2.35. The term clock domain defines the influence of control exerted by a specific clock signal; every
component in a given clock domain is controlled by the same clock signal.
It is hard(er) to reason about the relationship between the features in different clock signals that imply different
clock domains. This means, for example, that a) synchronising, and/or b) communicating values between
components in two, different clock domains is harder than if the same components are in the one, single clock
domain: intuitively, for example, it is hard to tell when positive edges on said clocks may occur at the same
time (and so synchronise the components, say). As a result, points of interaction between (i.e., at the boundary
of) clock domains (e.g., so-called clock domain crossings) demand careful attention.

From 1-phase to n-phase clocks Although it is easiest to think of a single clock signal, as illustrated in
Figure 2.29a, more complicated arrangements are both possible and useful. A central example is the concept
of an n-phase clock, which sees the clock distributed as n separate signals along n separate wires.
A common4 instance of this general concept is the 2-phase clock: the idea is that the clock is represented by two signals, often labelled Φ1 and Φ2 . Figure 2.29b shows how the signals behave relative to each other. Note that features within a 1-phase clock, e.g., the clock period, levels and edges, translate naturally to both Φ1 and Φ2 . However, notice the additional guarantee that their positive levels are non-overlapping: while
Φ1 is at a positive level, Φ2 is always at a negative level and vice versa. This behaviour is controlled by four
parameters
4 Based admittedly on limited experience, it seems that relatively few textbooks cover both 1- and 2-phase clocking strategies: in some

ways this is a pity, since the use of 2-phase clocks is certainly simpler given the requirement for latches rather than flip-flops. If you want
an alternative overview, then [11, Section 5] offers an example.


• δ1 is the period between a negative edge on Φ2 and a positive edge on Φ1 ,

• δ2 is the period between a positive edge on Φ1 and a negative edge on Φ1 ,

• δ3 is the period between a negative edge on Φ1 and a positive edge on Φ2 , and

• δ4 is the period between a positive edge on Φ2 and a negative edge on Φ2 .

Adjusting these parameters will shorten or elongate the period of Φ1 and/or Φ2 , or the “gaps” between them,
but the central principle of their being non-overlapping is maintained.

2.3.2 Latches, flip-flops and registers


Our second requirement is a component which remembers what state it is in, which is to say it stores a value
(state and value are used synonymously). Put more formally, it should retain some current state Q (which can
also be read as an output), and allow update to some next state Q′ (which is provided as an input, meaning we
basically store the input value).

Definition 2.36. A stateful component can be classified as being

1. an astable, where the component is not stable in either state and flips uncontrolled between states,

2. a monostable, where the component is stable in one state and flips uncontrolled but periodically between states,
or

3. a bistable, where the component is stable in two states and flips between states under control of an input.

The third class, i.e., bistables, is our focus here, since it has the most useful behaviour.

Definition 2.37. Given a suitable bistable component controlled using an enable signal en that determines when updates
happen, we say it can be

1. level-triggered, i.e., updated by a given level on en, or

2. edge-triggered, i.e., updated by a given edge on en.

The former type is typically termed a latch, with the latter termed a flip-flop.

Latches are sometimes described as transparent: this term refers to the fact that while enabled, their input and
output will match since the state (which matches the output) is being updated with the input. This is not the
case with flip-flops, because their state is only updated at the exact instant of an edge.

Definition 2.38. Whether a positive or negative level (resp. edge) of some signal controls the component depends on
whether it is active high or active low; a signal en is often written ¬en to denote the latter case.

Definition 2.39. It is common for a given latch or flip-flop design to include additional control signals; an important
example is a reset signal rst, that is often included to allow (re)initialisation of a design.

Definition 2.40. When used as a verb rather than a noun (cf. logic gate), gate means to conditionally turn off some
component or feature.

Example 2.34. Consider a component whose 1-bit input x is gated by AND’ing it with a control signal g: the
input provided to the component is
x′ = g ∧ x.
We say g gates x because if g = 0 then x′ = g ∧ x = 0 ∧ x = 0: whatever the value of x, the component gets
x′ = 0 as input if g = 0, hence x has been “turned off” by g. In contrast, if g = 1 then x′ = g ∧ x = 1 ∧ x = x: the
component gets x′ = x as normal if g = 1.

Our description of such components has so far been very abstract; the goal in what follows is to remedy this
situation. First, we describe the high-level design and behaviour of some latch and flip-flop components. Then
we show how this behaviour can be realised, using a lower-level design expressed in terms of logic gates. In
combination, we focus specifically on the goal of developing an n-bit register based on D-type latches and/or
flip-flops.


2.3.2.1 Common latch and flip-flop types

High-level descriptions of behaviour There are four common, concrete instantiations of the somewhat
abstract components described above. That is, we usually rely on four common latch and flip-flop types:

1. An SR-type latch (resp. SR-type flip-flop) component has two inputs S (or set) and R (or reset):

• when enabled and


– S = 0, R = 0 the component retains Q,
– S = 1, R = 0 the component updates to Q = 1,
– S = 0, R = 1 the component updates to Q = 0,
– S = 1, R = 1 the component is meta-stable,
but
• when not enabled, the component is in storage mode and retains Q.

The corresponding behaviour is described as follows:

SR-Latch/SR-FlipFlop
Current Next
S R Q ¬Q Q′ ¬Q′
0 0 0 1 0 1
0 0 1 0 1 0
0 1 ? ? 0 1
1 0 ? ? 1 0
1 1 ? ? ? ?

2. A D-type latch or “data latch” (resp. D-type flip-flop) component has one input D:

• when enabled and


– D = 1 the component updates to Q = 1,
– D = 0 the component updates to Q = 0,
but
• when not enabled, the component is in storage mode and retains Q.

The corresponding behaviour is described as follows:

D-Latch/D-FlipFlop
Current Next
D Q ¬Q Q′ ¬Q′
0 ? ? 0 1
1 ? ? 1 0

3. A JK-type latch (resp. JK-type flip-flop) component has two inputs J (or set) and K (or reset):

• when enabled and


– J = 0, K = 0 the component retains Q,
– J = 1, K = 0 the component updates to Q = 1,
– J = 0, K = 1 the component updates to Q = 0,
– J = 1, K = 1 the component toggles Q,
but
• when not enabled, the component is in storage mode and retains Q.

The corresponding behaviour is described as follows:


D Q D Q
en en
¬Q ¬Q

(a) A level-triggered, D-type latch. (b) A edge-triggered, D-type flip-flop.

Figure 2.30: Symbolic descriptions of D-type latch and flip-flop components (note the triangle annotation around en).

JK-Latch/JK-FlipFlop
Current Next
J K Q ¬Q Q′ ¬Q′
0 0 0 1 0 1
0 0 1 0 1 0
0 1 ? ? 0 1
1 0 ? ? 1 0
1 1 0 1 1 0
1 1 1 0 0 1

4. A T-type latch or “toggle latch” (resp. T-type flip-flop) component has one input T:

• when enabled and


– T = 0 the component retains Q,
– T = 1 the component toggles Q,
but
• when not enabled, the component is in storage mode and retains Q.

The corresponding behaviour is described as follows:

T-Latch/T-FlipFlop
Current Next
T Q ¬Q Q′ ¬Q′
0 0 1 0 1
0 1 0 1 0
1 0 1 1 0
1 1 0 0 1

It is useful to look in more detail at the D-type component, since this will help explain the basic concepts. The
component has

• in addition to the enable signal en, one input called D, and

• two outputs, Q and ¬Q; we can ignore ¬Q usually, but note that it should always be the inverse of Q.

The truth table that describes the behaviour is split into two halves, which is unlike what we have seen
previously: the left-hand half is a description of the current state, the right-hand a description of the next state,
i.e., after we perform an update. Sometimes this is termed an excitation table to distinguish it from a standard
truth table. So, for example, the first row can be read as “if D = 0, then no matter what the current state is then
the next state should be Q = 0”, and the second row can be read as “if D = 1, then no matter what the current
state is then the next state should be Q = 1”. Put another way, this component works as required: we can either
update it to store D when enabled, or operate it in storage mode to retain Q otherwise.
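The difference between the level- and edge-triggered variants can be captured behaviourally in C (a sketch under our own naming): the latch follows D for as long as en is at a positive level, whereas the flip-flop samples D only when it sees a positive edge on en, detected here via the previous value of en:

/* level-triggered D-type latch: transparent while en = 1 */
int d_latch_step ( int *Q , int D , int en ) {
  if ( en ) { *Q = D; }
  return *Q;
}

/* edge-triggered D-type flip-flop: updates only on a 0 -> 1 edge of en */
typedef struct { int Q; int en_prev; } dff_t;

int d_flipflop_step ( dff_t *f , int D , int en ) {
  if ( en && !f->en_prev ) { f->Q = D; }
  f->en_prev = en;
  return f->Q;
}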
Armed with this knowledge, we can already think about using such components in our designs: we
expand on their internal design in the following Sections, but can already use more abstract symbols shown
in Figure 2.30 to differentiate between the latch and flip-flop versions. Similar symbols describe components
other than the D-type one we have focused on. They typically retain the triangle annotation (or absence
thereof) on en, and commonly omit any unused outputs (e.g., ¬Q).


Low(er)-level descriptions of behaviour Still focusing on the D-type component, lower-level use can be
illustrated using a timing diagram, which shows the behaviour of the enable signal en (which we assume is
active high), the input D and the output Q. For a D-type latch we have something like the following:

[A waveform plotting en, D, and Q, with vertical dashed lines marking points in time t0 through t5 .]

The vertical dashed lines highlight important points in time; between t1 and t2 , and t3 and t4 for instance, en is
at a positive level so the latch state is updated to match D. Otherwise, for example between t0 and t1 , en is at a
negative level, so changes to D do not affect the latch state: the latch is in storage mode, meaning it retains the
current state. Swapping to a D-type flip-flop, the behaviour changes:

[A waveform plotting en, D, and Q, with vertical dashed lines marking positive edges of en at t0 , t1 , and t2 .]

Now the flip-flop state will be updated to match D only at the points in time where en transitions from 0 to 1;
this happens at t0 , t1 and t2 , meaning interim changes to D have no effect on the flip-flop state.

Definition 2.41. Using a component of this type is more difficult in practice than alluded to by these examples. Although
we largely ignore them from here on, the following are important:

1. The setup time (resp. hold time) is the minimum period of time that D must be stable before (resp. after) use to
update the component.
Think of the clock feature (either level or edge) as triggering the act of sampling from D in order to update the state.
As such, the two timing restrictions mentioned make sure the sample is reliable: they specify a window, around the
change to en, where D has to be stable for some period of time.

2. The clock-to-output time is an artefact of propagation delay: a delay will exist between the update event being
triggered by the associated clock feature, and the output Q changing to match.

These time periods or delays will be determined by the implementation of the component; ideally they will be minimised,
which makes the component easier to use (i.e., more tolerant).

Example 2.35. The concepts of setup, hold, and clock-to-output time are illustrated in the following (intention-
ally exaggerated) waveform relating to a D-type (edge triggered) flip-flop:

[A waveform plotting en, D, and Q: the setup and hold times form a window around each positive edge of en within which D must be stable, while the clock-to-output time separates the edge from the corresponding change in Q.]


[Gate-level circuits for six latch variants, each with inputs S and R (or D), an enable signal en where applicable, and outputs Q and ¬Q:]

(a) A NOR-based SR type latch. (b) A NAND-based SR type latch. (c) A NOR-based SR type latch with enable signal. (d) A NAND-based SR type latch with enable signal. (e) A NOR-based SR type latch with enable signal and R = ¬S. (f) A NAND-based SR type latch with enable signal and R = ¬S.

Figure 2.31: A collection of NOR- and NAND-based SR type latches, with simpler (top) to more complicated (middle
and bottom) control features.

[Five annotated copies of the NOR-based SR latch, each showing the values forced onto Q and ¬Q:]

(a) A case for S = 1, R = 0. (b) A case for S = 0, R = 1. (c) A case for S = 0, R = 0. (d) A case for S = 0, R = 0. (e) A case for S = 1, R = 1.

Figure 2.32: A case-by-case overview of NOR-based SR latch behaviour; notice that there are two sane cases for S = 0
and R = 0, and no sane cases for when S = 1 and R = 1.


An aside: NAND- rather than NOR-based latches.

As an aside, we can construct (more or less) the same component using NAND rather than NOR gates; the
NAND-based versions are shown alongside each of the associated NOR-based figures. This change implies a subtle difference in behaviour, however. Essentially, the storage and meta-stable states are swapped over:
when enabled and

• S = 1, R = 1 (rather than S = 0 and R = 0) the component retains Q, and

• S = 0, R = 0 (rather than S = 1 and R = 1) the component is meta-stable.

In addition, the Q and ¬Q outputs from the component swap over as well. In short, the NAND-based version
still achieves the same goal, but we need to carefully translate the behaviour when using it within a larger
design. It is often termed an S̄R̄ latch (i.e., one with active-low inputs) rather than an SR latch to highlight this fact, a convention we adopt to avoid confusion about which type of component is meant.

2.3.2.2 Implementation step #1: a basic SR latch

The first step is somewhat counter-intuitive. We start by looking at the Set-Reset or SR latch: the circuit
shown in Figure 2.31a has two inputs called S and R which are the set and reset signals, and two outputs Q
and ¬Q. Internally, the arrangement will probably seem odd in comparison to other designs we have seen so
far: the outputs of each NOR gate is wired to the input of the other, an arrangement we say is cross-coupled.
Understanding the behaviour of the design as a whole depends on a property of the NOR gates. Recall
(e.g., from Figure 2.13) that we can describe NOR using a truth table as follows:
NOR
x y r
0 0 1
0 1 0
1 0 0
1 1 0

In particular, this illustrates the fact that if either x = 1 or y = 1 then the result must be r = x ⊽ y = 0, writing x ⊽ y = ¬(x ∨ y) for NOR. Put another way, we can write two axioms

x ⊽ 1 = 0
1 ⊽ y = 0
These are important, because they allow us to resolve the loop introduced by the cross-coupled nature of NOR
gates in this design. We can see how, on a case-by-case basis, by observing output for each possible assignment:
this is shown in Figure 2.32.
• if S = 1, R = 0 (Figure 2.32a) then we force Q = 1, ¬Q = 0 (irrespective of what they were previously), because the top NOR gate must output 0 given that 1 ⊽ y = 0,
• if S = 0, R = 1 (Figure 2.32b) then we force Q = 0, ¬Q = 1 (irrespective of what they were previously), because the bottom NOR gate must output 0 given that x ⊽ 1 = 0,
• if S = 0, R = 0 then the outputs are not uniquely defined by the inputs: there are in fact two logically
consistent possibilities (Figure 2.32c and Figure 2.32d), namely Q = 1, ¬Q = 0 or Q = 0, ¬Q = 1,
• if S = 1, R = 1 (Figure 2.32e) then we force Q = 0, ¬Q = 0: in a sense this is contradictory, because we
expect each to be the inverse of the other, but hints at another problem.
In the final case, the latch could be (and has been) described as being in a meta-stable state because the eventual
output is not predictable. An intuitive reading is that it makes no sense to both set and reset the value, so some
form of unexpected behaviour for S = R = 1 is therefore not unreasonable. More specifically though, once we
return to S = 0, R = 0 the latch must settle in one or other of the two possibilities outlined above: we cannot
predict which one, however, so the subsequent state of the latch is essentially random.
Note that in terms of the specified behaviour, the design does what we want. For example, we can set
or reset the current state (per Figure 2.32a and Figure 2.32b) or retain the current state (per Figure 2.32c and
Figure 2.32d) as need be. However, this high-level description avoids two perfectly reasonable questions,
namely


[The SR latch decomposed into eight transistors: the top NOR gate comprises P-MOSFETs t0 and t1 (in series from Vdd ) and N-MOSFETs t2 and t3 (in parallel to Vss ), taking S and r1 as inputs and producing r0 ; the bottom NOR gate comprises P-MOSFETs t4 and t5 and N-MOSFETs t6 and t7 , taking R and r0 as inputs and producing r1 .]

Figure 2.33: An annotated SR latch, decomposed into two NOR gates and then into transistors; r0 , the output of the top
NOR gate, is used as input by the bottom NOR gate and r1 , the output from the bottom NOR gate, is used as input by
the top NOR gate (although the physical connections are not drawn).

1. how does the latch settle into any state, particularly given the case where S = R = 0 seems to imply there
are two options, and

2. how does it stay in one of those states when S = R = 0.

Up to a point, it is reasonable to consider that if the latch settles into one of the two logically consistent states, there is just no motivation for it to subsequently change into the other; therefore, it will retain the same state.
To provide greater detail, however, we rely on Figure 2.33. The idea is it decomposes the SR latch design into
eight individual transistors (labelled t0 through to t7 ) which implement the two NOR gates; this annotation is
important because it allows a clear explanation of their behaviour.

Question #1: how does the latch settle into a state? You can use similar reasoning for all four cases, but we focus on S = 0 and R = 1, which mean

• t0 is a P-MOSFET, so is connected since S = 0,

• t2 is an N-MOSFET, so is disconnected since S = 0,

• t4 is a P-MOSFET, so is disconnected since R = 1, and

• t6 is an N-MOSFET, so is connected since R = 1.

This means r1 = 0 because t6 is connected and t4 is disconnected. Now we can see that

• t1 is a P-MOSFET, so is connected since r1 = 0,

• t3 is an N-MOSFET, so is disconnected since r1 = 0.

This means r0 = 1 because t0 and t1 are connected, while t2 and t3 are disconnected. Finally, we can check for
consistency, noting


• t5 is a P-MOSFET, so is disconnected since r0 = 1, and

• t7 is an N-MOSFET, so is connected since r0 = 1.

This means r1 = 0 because t6 and t7 are connected, while t4 and t5 are disconnected: we knew that anyway. So,
in short, the circuit settles into a stable state even though it might seem the “loop” would prevent it from doing so, and the state is valid in the sense that r0 and r1 (i.e., Q and ¬Q) are each other's inverse as expected.

Question #2: how does the latch remain in a state? Now imagine we flip to S = R = 0, meaning we would
like to retain the state fixed above, i.e., keep Q = 0 until we want to update it again. Two transistors change as
a result of R changing

• t4 is a P-MOSFET, so is now connected since R = 0,

• t6 is an N-MOSFET, so is now disconnected since R = 0.

However, everything else stays the same, i.e.,

• t5 is a P-MOSFET, so is still disconnected since r0 = 1, meaning that t4 being connected does not connect
r1 to Vdd , and

• t7 is an N-MOSFET, so is still connected since r0 = 1, meaning that t6 being disconnected does not
disconnect r1 from Vss .

That is, there is no motivation (or physical stimulus) for the transistors to flip into the other stable state
(i.e., where S = R = 0 and Q = 1) and so the current state is therefore retained.
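The same reasoning can also be captured as a behavioural sketch of one NOR gate at the transistor level, assuming the structure annotated in Figure 2.33 (a series pair of P-MOSFETs forming the pull-up network, and a parallel pair of N-MOSFETs forming the pull-down network); the function name is hypothetical.

  def nor_via_transistors(a, b):
      # pull-up network: two P-MOSFETs in series, each connected when its
      # gate input is 0, so the output is tied to Vdd only if both conduct
      pull_up = (a == 0) and (b == 0)
      # pull-down network: two N-MOSFETs in parallel, each connected when
      # its gate input is 1, so the output is tied to Vss if either conducts
      pull_down = (a == 1) or (b == 1)
      assert pull_up != pull_down   # exactly one network conducts at a time
      return 1 if pull_up else 0

  assert nor_via_transistors(0, 0) == 1
  assert nor_via_transistors(0, 1) == 0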

2.3.2.3 Implementation step #2: controlling latch updates


The initial SR latch design is arguably too simple, in the sense that it is hard to use. We have little or no control over
when an update happens for instance, because any change to S or R might provoke this; it is also unattractive
that we can produce unpredictable behaviour by (perhaps unintentionally) driving it with inputs that would
cause meta-stability. Fortunately, both of these problems can be solved with only simple alterations to the
original design:

1. To control when an update happens, we gate S and R by adding an extra input en and two AND gates:
the internal latch inputs become
S′ = S ∧ en
R′ = R ∧ en
When en = 0, S and R are irrelevant: S′ and R′ will always be 0 because, for example, S′ = S ∧ 0 = 0. This
means when en = 0 the latch can never be updated. When en = 1 however, S and R are passed through
into the latch as input because S′ = S ∧ 1 = S.
Put another way, the result shown in Figure 2.31c is now clearly level-triggered because S and R only
matter during a positive level of en. Note that although en can be considered a generic enable signal, we
can use a clock signal to provoke regular, synchronised updates.

2. To avoid the situation where S = R = 1, we simply force R = ¬S by inserting a NOT gate between them
to disallow the case where S = R; Figure 2.31e shows the result, where the single input is now labelled D.
By following the above, the latch inputs become

S′ = D ∧ en
R′ = ¬D ∧ en

This might seem to imply that we cannot put the latch into storage mode any longer. However, remember
that when en = 0 we always have S′ = R′ = 0 irrespective of D, so en basically decides if we retain Q (if
en = 0) or update it with D (if en = 1).

The result now represents the D-type latch component discussed originally. Reiterating, when enabled and

• D = 1 the component updates to Q = 1,

• D = 0 the component updates to Q = 0,

but when not enabled, the component is in storage mode and retains Q.
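As a minimal sketch (in Python, with hypothetical names), the update rule of the resulting D-type latch can be modelled as follows:

  def d_latch(Q, D, en):
      # gated inputs: S' = D AND en, and R' = (NOT D) AND en, so the
      # problematic case S' = R' = 1 can never occur
      S, R = D & en, (1 - D) & en
      if S:
          return 1   # update: Q = 1
      if R:
          return 0   # update: Q = 0
      return Q       # S' = R' = 0: storage mode, so retain Q

  Q = d_latch(0, D=1, en=1)   # enabled:     Q becomes 1
  Q = d_latch(Q, D=0, en=0)   # not enabled: Q retained as 1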



Figure 2.34: A NOR-based D-type flip-flop created using a glitch generator.


Figure 2.35: A NOR-based D-type flip-flop created using a primary-secondary organisation of latches.

2.3.2.4 Implementation step #3: from latch to flip-flop


Our next problem is that although the level-triggered D-type latch gives some control, it is not very fine-grained.
Put simply, although we restrict updates to a positive (resp. negative, for active low components) level where
en = 1 (resp. en = 0), this is potentially a lengthy period of time; the input D may potentially change several
times during this period for instance. To give more precise control over updates, we might try to convert the
latch into a flip-flop: this means restricting updates to the precise, and hence much smaller, instant in time that
an edge on en occurs.
There are various ways to realise this alteration; flip-flop design as a topic is broad enough that it starts to
go outside our scope wrt. level of detail. In the following Sections we therefore cover two approaches, both at
a somewhat high level.

Using a glitch generator One approach is to construct a circuit that intentionally generates a glitch (or
pulse), i.e., an output whose value will be 1 for a short period of time, namely when en transitions from
0 to 1. The glitch then approximates an edge, even though we are still actually using a level; doing so can be
rationalised by noting that as long as the glitch period is short, it will give us finer grained control than the
original latch.

Example 2.36. Reconsider Figure 2.20, whereby a glitch is generated (in that case unintentionally) for a (short)
period of time when en = 1 and t = 1. We can drive the original D-type latch using such a design: Figure 2.34
illustrates the result, which now approximates a flip-flop due to the approximation of edge-based triggering.

Using a primary-secondary organisation An alternative approach is a primary-secondary⁵ organisation of


two latches, which yields the edge-triggered behaviour we want. Figure 2.35 shows the result, which is basically
one latch (the primary, on the left) in series with a second latch (the secondary, on the right). The idea is to split
a clock cycle into two half-cycles such that

1. while en = 1, i.e., during the first half-cycle, the primary latch is enabled,

2. while en = 0, i.e., during the second half-cycle, the secondary latch is enabled.

In practical terms, this means that while en = 1, i.e., during a positive level on en, the primary latch stores the input. Then, the instant en = 0, i.e., at a negative edge on en, the secondary latch stores the output of the primary latch: you can think of it as triggering a transfer from the primary to the secondary latch, or as the secondary latch only being sensitive to the output of the primary latch rather than the input. The fact that the transfer is instantaneous, in the sense it occurs as the result of a (in this case negative) edge when en flips from 1 to 0, means we get what we want, i.e., an edge-triggered flip-flop.
⁵ Historically, the terms master and slave have often been used in place of primary and secondary. Per [3, Section 1.1], however, and
despite some debate, the former are typically viewed as inappropriate now. We deliberately use the latter, therefore, noting that doing so
may imply a need to translate the former when aligning with other literature.


(a) A level-triggered register based on D-type latches.

(b) An edge-triggered register based on D-type flip-flops.

Figure 2.36: An n-bit register, with n replicated 1-bit components synchronised using the same enable signal.

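A behavioural sketch of the primary-secondary organisation, under the simplifying assumption that we evaluate both latches once per change of en (all names hypothetical), might read:

  class DFlipFlop:
      # a negative edge-triggered D-type flip-flop, modelled as a primary
      # latch (enabled while en = 1) feeding a secondary latch (enabled
      # while en = 0): D reaches Q only when en flips from 1 to 0
      def __init__(self):
          self.primary, self.Q = 0, 0

      def drive(self, D, en):
          if en == 1:
              self.primary = D        # primary transparent, secondary opaque
          else:
              self.Q = self.primary   # secondary transparent, primary opaque
          return self.Q

  ff = DFlipFlop()
  ff.drive(D=1, en=1)   # first half-cycle: primary stores 1, Q unchanged
  ff.drive(D=0, en=0)   # negative edge:    Q becomes 1; D is now ignored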

2.3.2.5 Implementation step #4: an n-bit register


The D-type component we have, either the latch or flip-flop version, holds a 1-bit state; to store a larger, n-bit
state we simply group together n such components into a register. This just means replicating the relevant
component type, and synchronising updates to them all using the same enable signal.
Figure 2.36 shows the general structure. We can read the current value of the register from the Q outputs:
Qi is the current state of the i-th bit held by the register. We can latch (or store) a new value Q′ into the register
by driving each Di with Q′i then waiting for an update to be triggered (which depending on the component
type, means waiting for an appropriate level or edge on en).
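For example, a hypothetical behavioural model of such a register might be:

  class Register:
      # an n-bit register: n 1-bit D-type components which all share the
      # same enable signal
      def __init__(self, n):
          self.Q = [0] * n

      def drive(self, D, en):
          if en:                # an update is triggered
              self.Q = list(D)
          return self.Q         # otherwise the current state is retained

  r = Register(n=4)
  r.drive([1, 0, 1, 1], en=1)   # latch a new value
  r.drive([0, 0, 0, 0], en=0)   # retains [1, 0, 1, 1]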

2.3.3 Putting everything together: general clocking strategies


2.3.3.1 A robust n-bit counter design
So now think back to our original problem outlined at the beginning of the Section: given everything accumu-
lated so far, how do we solve it? We can use one or other of two designs; both attempt to “break” the loop
evident in the original design by inserting storage components, and therefore differ as a result of opting for
either flip-flops or latches.

Example 2.37. Figure 2.37a represents a solution based on use of flip-flops, which implies a 1-phase clocking
strategy. The top-half of the Figure shows an n-bit ripple-carry adder; the idea is that it computes r′ ← r + 1.
This part is roughly the same as the initial, faulty solution. The bottom-half of the Figure shows an n-bit,
edge-triggered register; the idea is that it stores the current value of r. Beyond this, two features of the design
are vitally important:

1. Notice that the 1-bit sum produced as output by each full-adder is AND’ed with ¬rst. This acts as a
limited reset mechanism, in the sense rst gates the output register input (resp. adder output): if rst = 1
(so ¬rst = 0) then the register input will always be zero, whereas if rst = 0 (so ¬rst = 1) then the register
input will match the adder output. Put another way, if rst = 1 then the value subsequently latched by the
input flip-flop is forced to be zero: this is important, because when powered-on the current value will be
undefined and hence unusable.

2. Notice that each D-type flip-flop in the register is synchronised by clk (which we assume is a clock):
positive edges on clk provoke them to update the stored value r with r′ ← r + 1.
The original loop is broken, because the update is instantaneous not continuous: there is a “gap” between
computing and storing values, in the sense that the adder has an entire clock cycle to compute the result
r + 1 given r is stored in the flip-flops. Provided that the propagation delay associated with the adder
is less than the clock period (i.e., we do not update r faster than r′ is computed) the problem is solved
and r cycles through the required values in discrete steps controlled by the clock.
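As a sketch of the resulting behaviour (abstracting the adder, the register, and the reset gating into a single update rule; the names are hypothetical):

  class Counter:
      # on each positive clock edge the register latches either 0 (if
      # rst = 1) or r + 1, which the adder has had a whole cycle to compute
      def __init__(self, n):
          self.n, self.r = n, 0

      def clk_edge(self, rst):
          self.r = 0 if rst else (self.r + 1) % (2 ** self.n)
          return self.r

  c = Counter(n=8)
  c.clk_edge(rst=1)                        # reset: r = 0
  [c.clk_edge(rst=0) for _ in range(3)]    # r steps through 1, 2, 3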


(a) Using a 1-phase clock and flip-flop based register(s).

(b) Using a 2-phase clock and latch based register(s).

Figure 2.37: A correct counter design, using sequential logic components.


(a) Using a 1-phase clock and flip-flop based register(s).

(b) Using a 2-phase clock and latch based register(s).

Figure 2.38: Two illustrative waveforms, outlining stages of computation within the associated counter design.


(a) Using a 1-phase clock.

(b) Using a 2-phase clock.

Figure 2.39: Two different high-level clocking strategies.

Example 2.38. Figure 2.37b represents a solution based on use of latches, which implies a 2-phase clocking strategy. A reasonable question to ask is why we cannot just replace the flip-flops with latches. Imagine we did this: since the latches are level-triggered, they will be updated whenever clk = 1. So on one hand we have broken the original loop, but on the other hand the loop is still there when clk = 1 because the latches are essentially transparent.
To resolve this the design uses two sets of latches, one to store the adder input and one to store the adder
output. Only one set is enabled at a time, because we use a 2-phase clock to control them; when Φ2 = 1 the
output latches store the adder output, then when Φ1 = 1 the input latches store whatever the output latches
stored and subsequently provide a new input to the adder. Clearly we need more storage components to
support this approach, but you can think of this as a trade-off wrt. reduced complexity of latches versus
flip-flops. Put another way, the design might be less efficient in terms of area but is much easier to reason
about.

2.3.3.2 Generalising the two design strategies

Figure 2.39 generalises the two counter solutions in the previous Section; you can think of both as general
frameworks, or architectures, that can be filled-in with concrete details to realise the solution to a specific
problem. These can be generalised a little further by noting the following:

Definition 2.42. A typical circuit based on sequential logic will be comprised of

1. a data-path, of computational and/or storage components, and

2. a control-path, that tells components in the data-path what to do and when to do it.

For example, within the two counter solutions we clearly have computational (i.e., the adder) and storage
components (i.e., the register), and also mechanisms to control them (i.e., the reset AND gates).

2.4 Pipelined logic


Consider some combinatorial logic component called X. In terms of efficiency, the critical path of the component
presents a major hurdle: it is what limits how quickly a result can be produced. To cope we might attempt one
of at least two approaches, namely

1. try to apply various low-level optimisations with the goal of reducing the critical path of X, or

2. apply the higher-level technique of pipelining, restructuring X as investigated by the rest of this Section.


Figure 2.40: Production line #1, staffed with pre-Ford workers.

Figure 2.41: Production line #2, staffed with post-Ford workers.

2.4.1 An analogy: car production lines


Production (or assembly) lines in the context of manufacturing offer a great analogy for the concept of pipelined
logic, which is simpler than you might expect. The basic idea of a production line is for the result to be produced
as the combination of a number of stages.
Though probably not the first to employ such a process, the manufacture of cars within the Ford Motor
Company is a good example. Ford, under direction of the owner Henry Ford, used a system of continuous
production lines to build cars. While one person was assembling the engine of car number one, another could
be attending to the body work on car number two, while yet another could be fitting the wheels to car number
three. By around 1913, Ford had his production line down to such a fine art that they were able to double the
output of all their competitors, selling half of all cars purchased in the USA. Although assigning each worker
a dedicated task reduced accidents and the time wasted through workers wandering around the factory, the fact that
they stood in the same place for long periods performing repetitive tasks meant that RSI-type injuries were
common. Ford combated the resulting high turnover of staff by increasing wages to $5 a day, cutting shift
lengths to eight hours a day and installing a dedicated medical department. Productivity soared and the cost
of producing each vehicle decreased as a result.
Figure 2.40 and Figure 2.41 show two production lines: imagine #1 is pre-Ford and #2 is post-Ford if you
want. Notice that the production of a given car is still sequential: it moves through the stages of production in
order, one at a time in both cases. However, production line #2 benefits by overlapping production of different


cars with each other, i.e., producing more than one at a time, in parallel. We can measure the efficiency of the
production lines #1 and #2 using two metrics, the first of which probably seems more natural:
Definition 2.43. The latency is the total time elapsed before a given input is operated on to produce an output; this is
simply the sum of the latencies of each stage.
Definition 2.44. The throughput (or bandwidth) is the rate at which new inputs can be supplied (resp. outputs
collected).
The point is that although the latency associated with one car is not changed (it takes 4 time units to produce
a car in both production lines), the throughput is: in production line #2 we produce a new car every time unit
(once the production line is full), whereas we only produce one every 4 time units in #1. In a sense this is an
obvious byproduct of the fact that in production line number #1 some of the stages are idle at any given time,
but in number #2 they are all active eventually.
If we generalise, an n-stage production line will ideally give us an n-fold improvement in throughput.
However, there are some caveats:
• The maximum improvement comes only when we can keep the production line full of work: if the first
stage does not start because there is a lack of demand, the production line as a whole is utilised less
efficiently.
• If we cannot advance the production line for some reason (perhaps one stage is delayed by a lack of
parts), we say it has stalled; this also reduces utilisation.
• The speed at which the production line can be advanced is limited by the slowest stage; to minimise
idle time, balance is needed between the workload of stages. That is, if there is one stage that takes
significantly longer than the rest (e.g., it involves some relatively time consuming task), it will hold up
the rest.
• Usually a production line will not be perfect: moving the result of one stage to the next will take some
time, so there is some (perhaps small) overhead associated with all stages. This overhead typically reduces
efficiency; minimising it means we can get closer to the ideal n-fold improvement.

2.4.2 Treating logic as a production line


Fortunately, pipelined logic does not suffer from the human-related problems that the Ford production line
did: our logic gates never tire, get RSI, or complain about wages for example! Other than this, the principles
are almost exactly the same. That is, we aim to
1. split some combinatorial logic X into a pipeline of n pipeline stages, say Xi for 0 ≤ i < n, arranged in
sequence,
2. have each stage perform one step of the overall computation, with in-flight (or active) partial computation
advancing through the pipeline stage-by-stage, and
3. supply inputs into the first stage X0 , and collect outputs from the last stage Xn−1 .

2.4.2.1 Problem #1: how to structure the pipeline


Given X, our first problem is in two parts: first where can we split it to produce the Xi (which depends heavily
on what X is), and second where should we split it?
A generic answer to the first question is hard, since it depends on the component itself. About the most
general approach we can start with is to identify natural splitting points, i.e., look at X and see where there are
steps in the overall computation that can be grouped together. The second question is, however, easier: once
we have an idea where we can split X, we can look at all the options and select the one that produces the best
result. More specifically, we know the slowest stage dictates how fast we can advance the pipeline; our goal is
therefore to balance the stages (as far as possible) so idleness is minimised (i.e., we avoid one stage waiting for
another). This is illustrated by Figure 2.42 wherein four options for splitting some component X into stages Xi
are given:
1. a 1-stage unpipelined design, basically representing the original component X,
2. a 2-stage pipelined design where X0 has a larger latency than X1 ,
3. a 2-stage pipelined design where X1 has a larger latency than X0 , and
4. a 3-stage pipelined design where all stages have equal latency.
Focusing only on the idea of balancing the stages, the last option is most attractive: since all stages take the
same time to compute their part of the overall result, selecting this option will minimise potential idleness.


(a) Option #1: 1-stage, unpipelined (X: 300ns).

(b) Option #2: 2-stage, pipelined but unbalanced (X0: 200ns, X1: 100ns).

(c) Option #3: 2-stage, pipelined but unbalanced (X0: 100ns, X1: 200ns).

(d) Option #4: 3-stage, pipelined and balanced (X0, X1, X2: 100ns each).

Figure 2.42: Four different ways to split a (hypothetical) component X into stages.

2.4.2.2 Problem #2: how to control the pipeline

The next problem is how we control the pipeline so it does what we want, at the right time, to produce the
right outputs from the right inputs. Consider Figure 2.43a, which outlines some generic pipeline stages (whose
behaviour is irrelevant). There are two key problems:

1. Fundamentally, the stages cannot operate on different inputs if there is nowhere to store those inputs: if
we supply a new input to X0 at each step, where does the first input go once the first step is finished? It
should be processed by X1 , but instead it will vanish when replaced with the input for the second step.
2. Imagine in the j-th clock cycle the i-th stage Xi computes a partial result ti required by the (i + 1)-th stage.
If the stages are connected by a wire, as soon as the i-th stage changes ti this potentially disrupts what the
(i + 1)-th stage is doing.

So instead, we connect the stages by pipeline registers, say Ri. This means the (i + 1)-th stage can have
a separate, stable input which only changes when the register latches a new value, i.e., when the pipeline
advances. However, each pipeline register takes time to operate, and so adds to the total latency.
Figure 2.43c outlines the new structure, which resolves both problems above. The structure is controlled
by adv, shown here as a single global signal that advances all stages at the same time by having the output of
each Xi stored in Ri and hence used subsequently as input to Xi+1 . Figure 2.44 gives a high-level overview of
progression through the pipeline, controlled by positive edges on adv.
The implication of this structure is that we need to take more care wrt. how we split X into stages.
Specifically, more pipeline registers means larger overall latency; as a result, we cannot simply split X into as
many stages as we need to have them balanced. Rather, we must make a trade-off between increased latency (as
the result of some pipeline registers) and increased throughput (as the result of the pipelined design overall).

2.4.3 Some concrete examples


So far, our discussion has been necessarily abstract: many details of a concrete pipeline depend on the
component under consideration.
Example 2.39. Consider Figure 2.45, wherein an abstract component X is shown in both unpipelined and
pipelined forms. In the unpipelined case we find that the latency is

300 + 20ns = 320ns,


An aside: synchronous versus asynchronous pipelines.

A synchronous pipeline is a term used to describe a pipeline structure where all stages are globally synchronised, controlled using a single global signal adv which you can think of as a clock; to reinforce this fact, the period between advances is often termed a pipeline cycle.
In an asynchronous pipeline the aim is to remove the need for global control over when the pipeline
advances, and hence remove the need for a global clock. Roughly speaking, control is devolved into the
pipeline stages themselves: for one stage to advance, it must engage in a simple handshake with the preceding
and subsequent stages to agree when to advance. More formally each Xi controls advi , the local signal that
determines when it advances, by communicating with Xi−1 and Xi+1 .
This is advantageous in that stages can operate as fast or slow as their workload, rather than a global clock,
dictates: the asynchronous pipeline can advance whenever the result is ready rather than being pessimistic
and forcing advancement at the rate of the slowest stage. However, although the global clock is removed one
potential disadvantage of this approach is overhead in provision of the handshake mechanism that has to exist
between stages; clearly this can become quite complex depending on the pipeline structure.

while the throughput is

    1/320ns = 3.12 × 10⁶ operations/s

if we measure the latency of computing and storing the result. However, for a 3-stage pipeline (using the same measure) the latency is

    100 + 20 + 100 + 20 + 100 + 20ns = 360ns,

while the throughput is

    1/120ns = 8.33 × 10⁶ operations/s.
That is, we have improved the throughput by (roughly) a factor of three: we now get an output from the
pipeline (resp. can provide new input) every 120ns rather than 320ns. The drawback is that the overall latency
of a given operation is slightly more, i.e., 360ns rather than 320ns.
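We can check this arithmetic with a short sketch, using the (hypothetical) stage and register latencies from above:

  stages, reg = [100, 100, 100], 20   # stage and register latencies, in ns

  unpipelined_latency = 300 + reg                        # 320ns
  pipelined_cycle     = max(s + reg for s in stages)     # 120ns per advance
  pipelined_latency   = sum(s + reg for s in stages)     # 360ns in total

  print(1 / (unpipelined_latency * 1e-9))   # ~3.12 * 10^6 operations/s
  print(1 / (pipelined_cycle     * 1e-9))   # ~8.33 * 10^6 operations/s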

Great. But what use is this? The point is, we can relate this abstract example to a concrete component which
acts as motivation for why such an improvement is worthwhile.

Example 2.40. Consider a component that performs the logical left-shift of some 8-bit vector x by a distance of
y ∈ {0, 1, . . . , 7} bits. There are a variety of approaches to designing a circuit with the required behaviour, but
one of the simplest is a combinatorial, logarithmic shifter. We will look at the design in detail in Chapter 3, but
the idea is illustrated by Figure 2.46a. In short, the result is computed using three steps: each step produces
an intermediate result by either shifting an intermediate input by some fixed distance (the i-th stage shifts by
2i bits), or simply passing it through unaltered. For example, if we select y = 6(10) = 110(2) then

1. since y0 = 0, the 0-th stage passes the input x through unaltered to form the intermediate result x′ , then

2. since y1 = 1, the 1-st stage shifts the intermediate input x′ by a distance of 21 = 2 bits to form the
intermediate result x′′ , then

3. since y2 = 1, the 2-nd stage shifts the intermediate input x′′ by a distance of 22 = 4 bits to form the result r

meaning overall, x is shifted by 2 + 4 = 6 bits as required.
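A behavioural sketch of the shifter, with each loop iteration standing in for one stage, might read (Python, hypothetical names):

  def lshift8(x, y):
      # 8-bit logarithmic left-shift: the i-th stage either shifts the
      # intermediate result by 2^i bits or passes it through unaltered,
      # as selected by the i-th bit of the distance y (0 <= y < 8)
      for i in range(3):
          if (y >> i) & 1:
              x = (x << (2 ** i)) & 0xFF   # shift by 2^i, keep 8 bits
      return x

  assert lshift8(0x01, 6) == 0x40   # y = 110 in binary: shift by 2, then 4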


Applying the same reasoning as above, Figure 2.47b splits the design into a 3-stage pipeline; this decision is natural given that the computation is trivially split into three stages of equal latency. The critical path is now determined by just one stage rather than all three, since each stage works independently; the 1-st and 2-nd stages, for example, compute results using an input in the 1-st and 2-nd pipeline registers, while the 0-th stage computes a result using the input x. As such, we get a similar benefit as in the abstract example: basically we improve the throughput by nearly a factor of three, with a slight increase in overall latency as a result of the extra registers.

2.5 Implementation and fabrication technologies


When we write software (i.e., a program), we usually intend to use it somehow (i.e., execute it on a computer).
The program (or description of behaviour) is often compiled into a form we can use; depending on the processor we want to use, our program might be compiled in different ways and produce different executable forms.


(a) Option #1: without pipeline registers.

(b) Option #2: with pipeline registers and a global control signal.

(c) Option #3: with pipeline registers and multiple, per-stage control signals.

Figure 2.43: A problematic pipeline, and a solution involving the use of pipeline registers and a control signal to indicate when each stage should advance.

In a rough sense, the same process applies to circuits: once we have a description of behaviour, we need to
actually realise the corresponding components (i.e., logic gates or transistors) so that we can use them. There
are various ways to achieve this, which depend on the underlying technology used: using semi-conductors to
construct transistors is not the only option. Although the topic is somewhat beyond the scope of this book, it
is useful to understand some approaches and technologies involved: at very least, it acts to connect theoretical
concepts with their practical realisation.

2.5.1 Silicon fabrication


2.5.1.1 Lithography
The construction of semi-conductor-based circuits is very similar to how pictures are printed, or at least were
printed before the era of digital photography and laser printers! The act of printing pictures onto a surface
is termed lithography and has been used for a couple of centuries to produce posters, maps and so on; the
process involves controlled use of chemical processes within a controlled environment, often termed a dark
room. The basic idea is to coat a surface, which we usually call the substrate, with a photosensitive chemical.
We then expose the substrate to light projected through a negative, or mask, of the required image; the end
result is a representation of said image left on the substrate where light reacts with the chemical. After washing
the substrate, one can treat it with further chemicals so that the treated areas representing the original image
are able to accept inks while the rest of the substrate cannot.


Figure 2.44: An illustrative waveform, outlining the stages of computation as a pipeline is driven by a clock.

(a) Option #1: an unpipelined design (X: 300ns, R: 20ns).

(b) Option #2: a 3-stage pipelined design (each Xi: 100ns, each Ri: 20ns).

Figure 2.45: An unpipelined, abstract combinatorial circuit and a 3-stage pipelined alternative.

For semi-conductors the analogous process is photolithography, and involves very similar steps which are
illustrated by Figure 2.48. We again start (Figure 2.48a) with a substrate, which is usually a wafer of silicon;
this is often circular by virtue of machining it from a synthetic ingot, or boule, of very pure silicon. After being
cut into shape, the wafer is polished to produce a surface suitable for the next stage. We can now coat it with
a layer of base material we wish to work with (Figure 2.48b), for example a doped silicon or metal. Then we
coat the whole thing with a photosensitive chemical, usually called a photo-resist (Figure 2.48c). Two types
exist, a positive one which hardens when hidden from light and a negative one which hardens when exposed
to light. By projecting a mask of the circuit onto the result (Figure 2.48d), one can harden the photo-resist so
that only the required areas are covered with a hardened covering. After baking the result to fix the hardened
photo-resist, and etching to remove the surplus base material, one is left with a layer of the base material only
where dictated by the mask (Figure 2.48e to Figure 2.48g).
The process iterates to produce many layers of potentially different materials, i.e., the result is 3D not 2D. We
might need layers of N-type and P-type semi-conductor and a metal layer to produce transistors, for example.
The feature size (e.g., 90nm CMOS) relates to the resolution of this process; for example, accuracy of the
photolithographic process dictates the width of wires or density of transistors. Regularity of such features
is a major advantage: we can manufacture many similar components in a layer using one photolithographic
process. For example, if we aim to manufacture many transistors they will all be composed of the same layers
albeit in different locations on the substrate.

2.5.1.2 Packaging
Before we can use the “raw” output from the photolithography, a process of packaging is typically applied.
At very least, the first step is to cut out individual components from the resulting wafer: remember that we
can produce many identical components using the same process, so this step gives us a single component we
can use. Before we do so however, each component is typically mounted on a plastic base and connected
to externally accessible pins (or pads) with bonding wires. This makes the inputs to and outputs from the
component (which may be physically tiny and delicate) easier to access. A protective, often plastic, package


(a) Option #1: an unpipelined design.

(b) Option #2: a 2-stage pipelined design.

Figure 2.46: An unpipelined, 8-bit Multiply-ACcumulate (MAC) circuit and a 2-stage pipelined alternative.


(a) Option #1: an unpipelined design.

(b) Option #2: a 3-stage pipelined design.

Figure 2.47: An unpipelined, 8-bit logarithmic shift circuit and a 3-stage pipelined alternative.


(a) The substrate.
(b) Coating of base material.
(c) Coating of photosensitive chemical.
(d) Exposure to a (simple) mask.
(e) Application of etching.
(f) Application of etching.
(g) Final result.

Figure 2.48: A high-level illustration of a lithography-based fabrication process.


Figure 2.49: Bonding wires connected to a high quality gold pad (public domain image, source: http://en.wikipedia.org/wiki/Image:Wirebond-ballbond.jpg).

Figure 2.50: A heatsink ready to be attached, via the z-clip, to a circuit in order to dissipate heat (public domain image, source: http://en.wikipedia.org/wiki/File:Pin_fin_heat_sink_with_a_z-clip.png).

is also applied to prevent physical damage; large or power-hungry components might also mandate use of a
heat sink (and fan) to dissipate heat.
The final result is a self-contained component, which we can describe as a microchip (or simply a chip) and
start to integrate with other components to construct a larger system.

2.5.1.3 Moore’s Law


Gordon Moore, co-founder of Intel, is credited with identification of an important and influential trend as-
sociated with development of transistor-based technology. The so-called Moore’s Law was originally an
observation [5] in 1965

The complexity for minimum component costs has increased at a rate of roughly a factor of two per year.
Certainly over the short term this rate can be expected to continue, if not to increase. Over the longer term,
the rate of increase is a bit more uncertain, although there is no reason to believe it will not remain nearly
constant for at least 10 years. That means by 1975, the number of components per integrated circuit for
minimum cost will be 65,000.

– Moore

and later updated: in short “the number of transistors that can be fabricated in unit area doubles roughly every
two years”. In a sense, this has become a form of a self-fulfilling prophecy in that the “law” is now an accepted
truth: industry is forced to deliver improvements, and is in part driven by the law rather than the other way
around!
Figure 2.51 demonstrates the manifestation of Moore’s Law on the development of Intel processors. The
implications for design of such processors, and circuits more generally, can be viewed in (at least) two ways:

1. If one can fit more transistors in unit area, the transistors are getting smaller and hence working faster
due to their physical characteristics. As a result one can take a fixed design and, over time, it will get
faster or use less power as a result of Moore’s Law.


Figure 2.51: A timeline of Intel processor innovation demonstrating Moore’s Law (data from http://www.intel.com/technology/mooreslaw/).

2. If one can fit more transistors in unit area, then one can design and implement more complex structures
in the same fixed area. As a result, over time, one can use the extra transistors to improve the design yet
keep it roughly the same size.

There is no “free lunch” however; Moore notes that as feature size decreases (i.e., transistors get smaller)
two problems become more and more important. First, power consumption and heat dissipation become an
issue: it is harder to distribute power to the more densely packed transistors and keep them within operational temperature limits. Second, process variation, which may imply defects and reduce yield, starts to increase, meaning there is a higher chance that a manufactured chip malfunctions.

2.5.2 (Re)programmable fabrics


Among many alternatives to manufacture of circuits using silicon-based transistors, two in particular are inter-
esting. You can think of them as making two steps from silicon (implying a fixed circuit once manufactured),
toward a fabric that can be reprogrammed again and again (more like software) to form any circuit required.
The resulting performance and flexibility characteristics blur traditional boundaries between hardware and
software, and such fabrics are therefore increasingly important components with a broad range of applications.

2.5.2.1 Programmable Logic Arrays (PLAs)

A Programmable Logic Array (PLA) is a general-purpose fabric that can be configured to implement specific
SoP or PoS expressions as combinatorial circuits. The fabric itself accepts n inputs, say xi for 0 ≤ i < n, and
produces m outputs, say rj for 0 ≤ j < m, via logic gates arranged in two planes. Using an AND-OR type PLA
as an example, the first plane computes a set of minterms using AND gates; those minterms are fed as input
to a second plane of OR gates whose output is the required SoP expression. An OR-AND type PLA simply
reverses the ordering of the planes, thus allowing implementation of PoS expressions.
This does not hint at a PLA being particularly remarkable: why is it any different to the combinatorial circuits
we have seen already? The crucial difference is how we end up with the required circuit. The starting point is
a generic, clean fabric as shown in Figure 2.52a. At this point you can think of all of the gates being connected
to all corresponding gate inputs via existing connection points at wire junctions (filled circles), and fuses at
the gate inputs (filled boxes). This is transformed into a specific circuit using a process roughly analogous to
programming: we selectively blow fuses, guided by a configuration that is derived from the circuit design.
Normally a fuse acts as a conductive material, somewhat like a wire; when the fuse is blown using some
directed energy, however, it becomes a resistive material. Therefore, to form the required connections we


(a) A “clean” PLA fabric, with fuses (filled boxes) acting as potential connections between the AND and OR planes.

(b) The PLA fabric with blown fuses (empty boxes) to implement a half-adder.

Figure 2.52: Conceptual diagrams of a PLA fabric.


simply blow all the fuses⁶ where no connection is required. Figure 2.52b shows an example, where fuses have been blown (now shown as unfilled boxes) to form various connections (shown as thick lines). As a result, this PLA computes

    r0 = (x0 ∧ ¬x1) ∨ (¬x0 ∧ x1) = x0 ⊕ x1

and

    r1 = x0 ∧ x1,

i.e., it is a half-adder.
We say that a PLA fabric is one-time programmable. Put simply, once a fuse (or antifuse) is blown, it cannot be unblown. Since a PLA can only be configured once, it is not unreasonable to think of a PLA as being like a ROM (in the sense that once programmed, the content is fixed), but with the advantage of being able to optimise for don't care states. However, the fixed structure means that, versus conventional combinatorial logic, it has the disadvantage of being less (easily) able to capitalise on optimisations such as sharing logic for common sub-expressions.
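Behaviourally, we can sketch a configured AND-OR PLA as follows (a hypothetical model: intact fuses are represented by entries in per-plane lists, and the configuration shown implements the half-adder of Figure 2.52b):

  def pla(x, and_plane, or_plane):
      # and_plane[i] lists the literals of the i-th product term as
      # (input index, required value) pairs; or_plane[j] lists which
      # product terms feed the j-th output OR gate
      terms = [all(x[i] == v for (i, v) in term) for term in and_plane]
      return [int(any(terms[t] for t in out)) for out in or_plane]

  # r0 = (x0 AND NOT x1) OR (NOT x0 AND x1), and r1 = x0 AND x1
  and_plane = [[(0, 1), (1, 0)], [(0, 0), (1, 1)], [(0, 1), (1, 1)]]
  or_plane  = [[0, 1], [2]]

  assert pla([1, 0], and_plane, or_plane) == [1, 0]
  assert pla([1, 1], and_plane, or_plane) == [0, 1]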

2.5.2.2 Field Programmable Gate Arrays (FPGAs)


Although a PLA might be useful for some tasks, two clear limitations are evident: such a fabric
1. is special-purpose in so much as it implements only SoP- or PoS-type designs, as a result of the wiring
and gate structure, and is constrained by parameters such as n and m, and
2. is only one-time programmable, since once the fuses are irreversibly blown it then implements a fixed
circuit.
As such, one could consider generalising the underlying idea by a) allowing the wiring and gate structure to be
configured freely, and then b) allowing this configuration to be performed multiple times, using some type of
memory instead of fuses for each element of configuration data. A Field Programmable Gate Array (FPGA)
fabric is the result, whose goal is basically to offer a general-purpose, many-time programmable fabric: the
FPGA can be configured with one circuit design and then re-configured with another design at a later point in
time.
Figure 2.53a is a conceptual representation of an FPGA fabric, which is basically a collection of logic resources
(or blocks) organised in a two-dimensional mesh; the logic blocks are connected using routing resources placed
between them. Both the logic and routing resources are controlled by a configuration termed a bit-stream.
For instance, the routing resources are conceptually similar to fuses in the sense they determine connectivity;
unlike fuses, they are re-configurable switches that can be turned on and off as required rather than blown in
a one-off act. In a similar way the logic resources are analogous to logic gates, but now their function can be changed to suit as part of the configuration process: a specific logic resource might be configured to act as an AND gate in one circuit, then as an XOR gate in another at some later point in time. This produces a much more flexible structure than a PLA, without the limit of one-time programmability.
This alone is a fairly big step forward, but the logic resources offer even more features internally: they
are not just reconfigurable logic gates. Although the architecture of different brands and families within a
brand differ, we focus on Xilinx Virtex-4 devices as an example. The central Virtex-4 logic resource is called a Configurable Logic Block (CLB): each CLB is connected to (and hence can communicate with) immediate neighbours, and contains four slices. Figure 2.53b is a block diagram of a Virtex-4 slice, which contains
• two 4-input, 1-output Look-Up Tables (LUTs),
• two D-type flip-flops,
• a suite of arithmetic cells, including two 1-bit full-adders, and
• several interconnected multiplexers.
The important thing to grasp is that although this looks like a fixed circuit design, various aspects of it are reconfigurable. A good example is the LUT content. Each LUT is basically a 16-cell SRAM memory: given a 4-bit input i, it reads the i-th SRAM cell and uses this as the 1-bit output. So by storing appropriate values in the SRAM during the device configuration phase, the LUT can be used to compute any 4-input, 1-output Boolean function. Likewise, the 4-input multiplexers acting as input to the two flip-flops are controlled by the device configuration, not control signals generated by another part of the circuit. In addition to standard CLBs, Virtex-4 FPGAs also offer various other special-purpose logic resources. Figure 2.53a attempts to show this fact by including
⁶ Alternatively, one can consider an antifuse, which acts in the opposite way to a fuse (normally it is a resistor, but when blown it is a conductor). Using antifuses at each junction means the configuration process blows each antifuse at a junction where a connection is required.


(a) The mesh of configurable logic (large boxes) and communication resources (small boxes).

(b) An example Virtex-4 slice, including two LUTs, two D-type flip-flops, and a suite of arithmetic cells.

Figure 2.53: Conceptual diagrams of an FPGA fabric.


• a Digital Clock Manager (DCM) block, which allows a fixed input clock to be manipulated in a way
that suits the device configuration,
• a Block RAM (BRAM) block, instances of which act like memory devices, and are often realised using
SRAM or similar,

• an Input/Output (I/O) block, which allow off-fabric communication.

Other possibilities include common arithmetic building blocks, multipliers for instance, which would be
relatively costly to construct using the CLB resources yet are often required.
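As a behavioural sketch of the LUT mechanism described above (hypothetical names; the real configuration process is, of course, more involved), consider:

  def make_lut(f):
      # "configuration": fill the 16-cell SRAM with the truth table of a
      # 4-input, 1-output Boolean function f, then index it by the input
      cells = [f((i >> 0) & 1, (i >> 1) & 1, (i >> 2) & 1, (i >> 3) & 1)
               for i in range(16)]
      return lambda x0, x1, x2, x3: cells[x0 | (x1 << 1) | (x2 << 2) | (x3 << 3)]

  # the same physical LUT can be configured as an AND gate in one circuit,
  # then re-configured as an XOR gate in another
  and4 = make_lut(lambda a, b, c, d: a & b & c & d)
  xor2 = make_lut(lambda a, b, c, d: a ^ b)   # unused inputs are ignored

  assert and4(1, 1, 1, 1) == 1 and xor2(1, 0, 0, 0) == 1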
The added complexity of supporting such flexibility typically means FPGAs have a lower maximum clock
frequency, and will consume more power than a comparable implementation directly in silicon. As such,
they are often used as a prototyping device for designs which will eventually be fabricated using a more
high-performance technology. Other applications include those where debugging and updating hardware
is important, meaning an FPGA-based solution is as flexible as software while also improving performance.
Consider space exploration, for example: it turns out to be exceptionally useful to be able to remotely fix bugs in hardware rather than write off a multi-million pound satellite which is orbiting Mars (and hence out of the reach of any local repair crew).

References
[1] D. Harris and S. Harris. Digital Design and Computer Architecture: From Gates to Processors. Morgan Kaufmann, 2007.
[2] M. Karnaugh. “The map method for synthesis of combinatorial logic circuits”. In: Transactions of American
Institute of Electrical Engineers 72.9 (1953), pp. 593–599 (see p. 81).
[3] M. Knodel and N. ten Oever. Terminology, Power and Oppressive Language. Internet Engineering Task Force
(IETF) Internet Draft. 2018. url: https://tools.ietf.org/id/draft-knodel-terminology-00.html
(see p. 114).
[4] E.J. McCluskey. “Minimization of Boolean function”. In: Bell System Technical Journal 35.5 (1956), pp. 1417–
1444 (see p. 87).
[5] G.E. Moore. “Cramming more components onto integrated circuits”. In: Electronics Magazine 38.8 (1965),
pp. 114–117 (see p. 128).
[6] C. Petzold. Code: Hidden Language of Computer Hardware and Software. Microsoft Press, 2000.
[7] W.V. Quine. “The problem of simplifying truth functions”. In: The American Mathematical Monthly 59.8
(1952), pp. 521–531 (see p. 87).
[8] R.J. Smith and R.C. Dorf. “Chapter 12: Transistors and Integrated Circuits”. In: Circuits, Devices and
Systems. 5th ed. Wiley, 1992 (see p. 71).
[9] A.S. Tanenbaum and T. Austin. Structured Computer Organisation. 6th ed. Prentice Hall, 2012.
[10] E.W. Veitch. “A Chart Method for Simplifying Truth Functions”. In: ACM National Meeting. 1952, pp. 127–
133 (see p. 87).
[11] N.H.E. Weste and K. Eshraghian. Principles of CMOS VLSI Design: A Systems Perspective. 2nd ed. Addison
Wesley, 1993 (see p. 105).


CHAPTER 3

BASICS OF COMPUTER ARITHMETIC

The whole of arithmetic now appeared within the grasp of mechanism.

– Babbage

In Chapter 1, we saw how numbers could be represented using bit-sequences. More specifically, we demonstrated
various techniques to represent both unsigned and signed integers using n-bit sequences. In Chapter 2, we then investigated
how logic gates capable of computing Boolean operations (such as NOT, AND, and OR) and higher-level building block
components could be designed and manufactured.
One way to view this content is as a set of generic techniques. We have the ability to design and implement components
that compute any Boolean function, for example, and reason about their behaviour in terms of Physics. A natural next
step is to be more specific: what function would be useful? Among many possible options, the field of computer
arithmetic provides some good choices. In short, arithmetic is something most people would class as computation;
something as simple as a desktop calculator could still be classed as a basic computer. As such, the goal of this Chapter
is to combine the previous material, producing a range of high-level building blocks that perform computation involving
integers: although this is useful and interesting in itself, it is important to keep in mind that it also represents a starting
point for study of more general computation.

3.1 Introduction
In general terms, an Arithmetic and Logic Unit (ALU) is a component (or collection thereof) tasked with
computation. The concept stems from the design of EDVAC by John von Neumann [9]: he foresaw that
a general-purpose computer would need to perform basic Mathematical operations on numbers, so it is
“reasonable that [the computer] should contain specialised organs for these operations”. In short, the modern
ALU is an example of such an organ: as part of a micro-processor, an ALU supports execution of instructions
by computing results associated with arithmetic expressions such as x + y in a given C program.
One can view a concrete ALU at two levels, namely 1) at a low(er) level, in terms of how the constituent
components themselves are designed, or 2) at a high(er) level, in terms of how said components are organised.
In this Chapter we focus primarily on the former, which implies a focus on computer arithmetic. The challenge
is roughly as follows: given one or more n-bit sequences that represent numbers, say x̂ and ŷ, how can we
design a component, i.e., a Boolean function f we can then implement as a circuit, whose output represents an
arithmetic operation? For example, if we want to compute

r̂ = f(x̂, ŷ) ↦ x + y,

i.e., an r̂ that represents the sum of x̂ and ŷ, how can we design a suitable function

f : Bⁿ × Bⁿ → Bⁿ

that realises the operation correctly while also satisfying additional design metrics once implemented as a
circuit?
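As a sketch of what is being asked (a specification rather than a circuit; the names, and the choice of unsigned addition modulo 2ⁿ, are our own illustrative assumptions), consider:

  n = 8

  def to_bits(x):    # an n-bit sequence representing the integer x
      return [(x >> i) & 1 for i in range(n)]

  def from_bits(b):  # the integer represented by the bit-sequence b
      return sum(b_i << i for (i, b_i) in enumerate(b))

  def f(x_hat, y_hat):   # a candidate f : B^n x B^n -> B^n for addition
      return to_bits((from_bits(x_hat) + from_bits(y_hat)) % (2 ** n))

  assert from_bits(f(to_bits(100), to_bits(27))) == 127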


(a) An unintegrated architecture: each i-th sub-component Ci deals with all of a different operation.

(b) An integrated architecture: each i-th sub-component Ci deals with a different part (e.g., the i-th bit of the output) of all operations.

Figure 3.1: Two high-level ALU architectures: each combines a number of sub-components, but does so using a different strategy.

Often you will have already encountered long-hand, “school-book” techniques for arithmetic operations
such as addition and multiplication. These allow you to perform the associated computation manually, which can be leveraged to address the challenge of designing such an f. That is, we can use an approach whereby we a) recap your intuition about what the arithmetic operation means and how it works at a fundamental level, b)
formalise this as an algorithm, then, finally, c) design a circuit to implement the algorithm (often by starting
with a set of 1-bit building blocks, then later extending them to cope with n-bit inputs and outputs). Although
effective, the challenge of applying this approach is magnified by what is typically a large design space of
options and trade-offs. For example, we might implement f using combinatorial components alone, or widen
this remit by considering sequential components to support state and so on: with any approach involving a
trade-off, the sanity of opting for one option over another requires careful analysis of the context.
After first surveying higher-level, architectural options for an abstract ALU, this Chapter deals more
concretely with a set of low-level components: each Section basically applies the approach above to a different
arithmetic operation. From here on, keep in mind that the scope is constrained by several simplifications:

1. The large design space of options for any given operation dictates we take a somewhat selective approach.
A major example of this is our focus on integer arithmetic only: arithmetic with fixed- and floating-
point numbers is an important topic, but we ignore it entirely and instead refer to [10, Part V] for a
comprehensive treatment.

2. We use sequences of n = 8 bits to represent integers, assuming use of two’s-complement representation


where said integers are signed; any algorithms (eventually) operate in base-2 (i.e., set b = 2) as a result.
Even so, most algorithms are developed and presented in a manner amenable to generalisation. For
example, they often support larger n or different b with minimal alterations.

3. Having fixed the representation of integers, writing x̂ is somewhat redundant: we relax this notation and
simply write x as a result. However, we introduce extra notation to clarify whether a given operation
is signed or unsigned: for an operation ⊙, we use ⊙s and ⊙u to denote signed and unsigned versions
respectively. With no annotation of this type, you can assume the signed’ness of the operator is irrelevant.

3.2 High-level ALU architecture


As the name suggests, a typical ALU will perform roughly three classes of operation: arithmetic, logical (typi-
cally focused on operations involving individual bits, contrasting with arithmetic operations on representations
of integers using multiple bits), and comparison. Although a given ALU is often viewed as a single unit, having
a separate ALU for each class can have advantages. For example, this allows different classes of operation
(e.g., an addition and comparison) to be performed at the same time. To prevent a single unit becoming too


complex, it can also be advantageous to have separate ALUs for different classes of input; use of a dedicated
(so separate from the ALU) Floating-Point Unit (FPU) for floating-point computation is a common example.
These possibilities aside, at a high-level an ALU is simply a collection of sub-components; we provide one
or more inputs (wlog. say x and y), and control it using op to select the operation required. Of course, some
operations will produce a different sized output than others: an (n × n)-bit multiplication produces a 2n-bit
output, but any comparison will only ever produce a 1-bit output for example. One can therefore view the ALU
as conceptually producing a single output r, but in reality it might have multiple outputs that are used as and
when appropriate. To be concrete, imagine we want an ALU which performs say m = 11 different operations

⊙ ∈ {+, −, ·, ∧, ∨, ⊕, ∨̄, ≪, ≫, =, <}

meaning it can perform addition, subtraction, multiplication, a range of bit-wise Boolean operations (AND,
OR, XOR and NOR), left- and right-shift, and two comparisons (equality and less than): it computes r = x ⊙ y
for an ⊙ selected by op. Figure 3.1 shows two strategies for realising the ALU, each using sub-components (the
i-th of which is denoted Ci ) of a different form and in a different way:

1. Figure 3.1a illustrates an architecture where each sub-component implements a different operation in full.
For example, C0 and C1 might compute all n bits of x + y and x − y respectively; the ALU output is selected,
from the m sub-component outputs, using op to control a suitable multiplexer.
Although, as shown, each sub-component is always active, in reality it might be advantageous to power-
down a sub-component which is not being used. This could, for example, reduce power consumption or
heat dissipation.
2. Figure 3.1b illustrates an architecture where each sub-component implements all operations, but does so
wrt. a single bit only. For example, C0 and C1 might compute the 0-th and 1-st bits of x + y and x − y
respectively (depending on op).

Tanenbaum and Austin [11, Chapter 3, Figures 3-18/3-19] focus on the second strategy, discussing a 1-bit ALU
slice before dealing with their combination. Such 1-bit ALUs are often available as standard building blocks,
so this focus makes a lot of sense on one hand. On the other hand, an arguable disadvantage is that such a
focus complicates the overarching study of computer arithmetic. Put another way, focusing at a low-level on
1-bit ALU slices arguably makes it hard(er) to see how some higher-level arithmetic works. As a result, we
focus instead on the first strategy in what follows: we consider designs for each i-th sub-component, realising
each operation (a Ci for addition, for example) in isolation.
Essentially this means we ignore high-level organisation and optimisation of the ALU from here on, but of
course both strategies have merits. For example, as we will see in the following, overlap exists between different
arithmetic circuit designs: intuitively, the computation of addition and subtraction is similar for example. The
second strategy is advantageous therefore, since said overlap can more easily be capitalised upon to reduce
overall gate count. However, arithmetic circuits that require multiple steps to compute an output (using an
FSM for example) are hard(er) to realise using the second strategy than the first. As a result, a mix of both
strategies as and when appropriate is often a reasonable compromise.
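To make the first strategy concrete before considering individual components, the following C model is our own sketch (the names op_t and alu are hypothetical, not part of any design above): each case of the switch statement plays the role of a separate sub-component Ci , with op acting as the multiplexer control.

  #include <stdint.h>

  /* A behavioural sketch of the first strategy, for n = 8: each case
     models a sub-component C_i computing one operation in full, and op
     selects which result is produced. */
  typedef enum {
    OP_ADD, OP_SUB, OP_MUL, OP_AND, OP_OR, OP_XOR, OP_NOR,
    OP_SHL, OP_SHR, OP_EQ, OP_LT
  } op_t;

  uint16_t alu( op_t op, uint8_t x, uint8_t y ) {
    switch ( op ) {
      case OP_ADD : return ( uint8_t )( x + y );
      case OP_SUB : return ( uint8_t )( x - y );
      case OP_MUL : return ( uint16_t )( x ) * ( uint16_t )( y ); /* 2n-bit output */
      case OP_AND : return x & y;
      case OP_OR  : return x | y;
      case OP_XOR : return x ^ y;
      case OP_NOR : return ( uint8_t )( ~( x | y ) );
      case OP_SHL : return ( uint8_t )( x << ( y % 8 ) );
      case OP_SHR : return x >> ( y % 8 );
      case OP_EQ  : return x == y; /* 1-bit output */
      case OP_LT  : return x <  y; /* 1-bit output */
    }
    return 0;
  }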

3.3 Components for addition and subtraction


Perhaps more so than other aspects of computer arithmetic, the meaning and use of addition and subtraction
should be familiar. (Very) formally, an addition operation computes the sum r = x + y using an x and y which
are both termed an addend in this context; likewise, subtraction computes the difference r = x − y using a
minuend x and a subtrahend y. This terminology hints at the fact that addition is commutative but subtraction
is not: x + y = y + x but x − y , y − x.
The challenge of course is how we compute these results. The goal in each case is to first describe the
computation algorithmically, then translate this into a design (or set of designs) for a circuit we can construct
from logic gates.

3.3.1 Addition
Example 3.1. Consider the following unsigned, base-10 addition of x = 107(10) to y = 14(10) :

x = 107(10) ↦ 1 0 7
y =  14(10) ↦ 0 1 4 +
c =           0 0 1 0
r = 121(10) ↦ 1 2 1


An aside: sign extension.

Although not an arithmetic operation per se, the issue of type conversion is an important related concept
nonetheless. Where such a conversion is performed explicitly (e.g., by the programmer) it is formally termed
a cast, and where performed implicitly (or automatically, e.g., by the compiler) it is termed a coercion; either
conversion, depending on the types involved, may or may not retain the same value due to the range of
representable values involved.
As an example, imagine you write a C program that includes a cast of an n-bit integer x into an n′ -bit
integer r. Four cases can occur:

1. If x and r are unsigned and n ≤ n′ , r is formed by padding x with n′ − n bits equal to 0, at the most-
significant end.

2. If x and r are signed and n ≤ n′ , r is formed by padding x with n′ − n bits equal to the sign bit (i.e., the
MSB or (n − 1)-th bit of x) at the most-significant end.
3. If x and r are unsigned and n > n′ , r is formed by truncating x, i.e., removing (and discarding) n − n′ bits
from the most-significant end.
4. If x and r are signed and n > n′ , r is formed by truncating x, i.e., removing (and discarding) n − n′ bits
from the most-significant end.

The second case above is often termed sign extension, and is required (vs. the first case) because simply
padding x with 0 may turn it from a negative to positive value. For example, imagine n = 16 (i.e., the short
type) and n′ = 32 (i.e., the int type): if x = −1(10) , the two options yield

x           = 1111111111111111(2)                 = −1(10)
ext32_0 (x) = 00000000000000001111111111111111(2) = 65535(10)
ext32_± (x) = 11111111111111111111111111111111(2) = −1(10)

where the latter retains the value of x, whereas the former does not.
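The same four cases can be observed directly in C; the following sketch is our own (the helper names ext32_0, ext32_s and ext32_manual are hypothetical), with the last function showing the padding performed by hand:

  #include <stdint.h>

  /* unsigned, n <= n': pad with 0 at the most-significant end */
  uint32_t ext32_0( uint16_t x ) {
    return ( uint32_t )( x );
  }

  /* signed, n <= n': pad with the sign bit; the compiler performs the
     sign extension as part of the cast */
  int32_t ext32_s( int16_t x ) {
    return ( int32_t )( x );
  }

  /* the same sign extension, performed explicitly */
  int32_t ext32_manual( uint16_t x ) {
    uint32_t r = x;
    if ( x & 0x8000 ) { /* MSB, i.e., sign bit, is set ...         */
      r |= 0xFFFF0000;  /* ... so fill the upper 16 bits with 1    */
    }
    return ( int32_t )( r );
  }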

If we write them naturally, it is clear that |107(10) | = 3 and |14(10) | = 2. However, the resulting mismatch will
become inconvenient: in this example and from here on, we force x and y to have the same length by padding
them with more-significant zero digits. Although this may look odd, keep in mind this padding can be ignored
without altering the associated value (i.e., we are confident 14(10) = 014(10) , however odd the latter looks when
written down).
Most people will have at least seen something similar to this, but, to ensure the underlying concept is clear, r
is being computed by working from the least-significant, right-most digits (i.e., x0 and y0 ) towards the most-
significant, left-most digits (i.e., xn−1 and yn−1 ) of the operands x and y. In English, in each i-th step (or column,
as read from right to left) we sum the i-th digits xi and yi and a carry-in ci (produced by the previous, (i − 1)-th
step); since this sum is potentially larger than a single base-b digit is allowed to be, we produce the i-th digit of
the result ri and a carry-out ci+1 (for use by the next, (i + 1)-th step). We call c a (or the) carry chain, and say
carries propagate from one step to the next.
This description can be written more formally: Algorithm 1 offers one way to do so. Notice the loop in
lines #2 to #5 iterates through values of i from 0 to n − 1, with the body in lines #3 and #4 computing ri and ci
respectively. You can read the latter as “if the sum of xi , yi and ci fits within a single base-b digit there is no
carry into the next step, otherwise there is a carry”. Notice that the algorithm sets c0 = ci to allow a carry into
the overall operation (in the example we assumed ci = 0), and co = cn allowing a carry-out; the sum of two
n-digit integers is an (n + 1)-digit result, but the algorithm produces an n-digit result r and separate 1-digit
carry-out co (which you could, of course, think of as two parts of a single, larger result).
A reasonable question is why a larger carry (i.e., a ci+1 > 1) is not possible. To answer this, we should first
note that although line #4 is written as a conditional statement, it could be rewritten st.

ri ← (xi + yi + ci ) mod b
ci+1 ← (xi + yi + ci ) div b

where mod and div are integer modulo and division: this makes more sense in a way, because the latter
assignment can be read as “the number of whole multiples of b carried into the next, (i + 1)-th column”. By


Figure 3.2: An n-bit, ripple-carry adder described using a circuit diagram.

Figure 3.3: An n-bit, ripple-carry subtractor described using a circuit diagram.

Figure 3.4: An n-bit, ripple-carry adder/subtractor described using a circuit diagram.

Figure 3.5: An n-bit, carry look-ahead adder described using a circuit diagram.

Figure 3.6: An illustration depicting the structure of carry look-ahead logic, which is formed by an upper- and lower-tree
of OR and AND gates respectively (with leaf nodes representing gi and pi terms for example).


Input: Two unsigned, n-digit, base-b integers x and y, and a 1-digit carry-in ci ∈ {0, 1}
Output: An unsigned, n-digit, base-b integer r = x + y, and a 1-digit carry-out co ∈ {0, 1}
1 r ← 0, c0 ← ci
2 for i = 0 upto n − 1 step +1 do
3 ri ← (xi + yi + ci ) mod b
4 if (xi + yi + ci ) < b then ci+1 ← 0 else ci+1 ← 1
5 end
6 co ← cn
7 return r, co
Algorithm 1: An algorithm for addition of base-b integers.

Input: Two unsigned, n-digit, base-b integers x and y, and a 1-digit borrow-in bi ∈ {0, 1}
Output: An unsigned, n-digit, base-b integer r = x − y, and a 1-digit borrow-out bo ∈ {0, 1}
1 r ← 0, c0 ← bi
2 for i = 0 upto n − 1 step +1 do
3 ri ← (xi − yi − ci ) mod b
4 if (xi − yi − ci ) ≥ 0 then ci+1 ← 0 else ci+1 ← 1
5 end
6 bo ← cn
7 return r, bo
Algorithm 2: An algorithm for subtraction of base-b integers.

considering bounds on (i.e., the maximum values of) each of the inputs, we can show ci ≤ 1 for 0 ≤ i ≤ n. That
is,

1. in the 0-th step we compute x0 + y0 + c0 , which can be at most (b − 1) + (b − 1) + 1 = 2 · b − 1 (given we know


x0 , y0 ∈ {0, 1, ..., b − 1}, and set c0 = ci ∈ {0, 1} in line #1); as a result, c1 can be at most (2b − 1) div b = 1,

2. in the i-th step we compute xi + yi + ci , which can be at most (b − 1) + (b − 1) + 1 = 2 · b − 1 (given we know


xi , yi ∈ {0, 1, ..., b − 1}, and set ci ∈ {0, 1} per the above); as a result, ci+1 can be at most (2b − 1) div b = 1,

so we know (inductively) that the carry out of the i-th step into the next, (i + 1)-th step is at most 1 (and so
either 0 or 1); this is true no matter what value of b is selected.
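Algorithm 1 also translates almost directly into software; the following C transcription is our own sketch, storing one base-b digit per array element, least-significant digit first:

  /* r = x + y for n-digit, base-b integers x and y: a direct transcription
     of Algorithm 1, with lines #3 and #4 forming the loop body. */
  void add_base_b( int b, int n, const int x[], const int y[],
                   int ci, int r[], int* co ) {
    int c = ci;                      /* line #1: c_0 = ci */
    for ( int i = 0; i < n; i++ ) {
      int t  = x[ i ] + y[ i ] + c;
      r[ i ] = t % b;                /* line #3: r_i      */
      c      = ( t < b ) ? 0 : 1;    /* line #4: c_{i+1}  */
    }
    *co = c;                         /* line #6: co = c_n */
  }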

Example 3.2. Consider the following trace of Algorithm 1, for x = 107(10) and y = 14(10) :

i xi yi ci r xi + yi + ci ci+1 ri r′
⟨0, 0, 0⟩ ⟨0, 0, 0⟩
0 7 4 0 ⟨0, 0, 0⟩ 11 1 1 ⟨1, 0, 0⟩
1 0 1 1 ⟨1, 0, 0⟩ 2 0 2 ⟨1, 2, 0⟩
2 1 0 0 ⟨1, 2, 0⟩ 1 0 1 ⟨1, 2, 1⟩
0 ⟨1, 2, 1⟩

Throughout this Chapter, a similar style is used to describe step-by-step behaviour of an algorithm for specific
inputs (particularly those which include one or more loops). Read from left-to-right, there is typically a section
of loop counters, such as i and j, a section of variables as they are at the start of each iteration, a section of
variables computed during an iteration, and a section of variables as they are at the end of each iteration. If
variable t in the left-hand section is updated during an iteration, we write it as t′ (read as “the new value of t”)
in the right-hand section.

An important feature in the presentation of Algorithm 1 is use of a general b: when invoking it, we can select
any concrete value of b we want. When discussing representation of integers, b = 2 was a natural selection
because it aligned with concepts in Boolean algebra; the same is true here, within a discussion of computation
involving such integers.

Example 3.3. Consider the following unsigned, base-2 addition of x = 107(10) = 01101011(2) to y = 14(10) =
00001110(2)
x = 107(10) ↦ 0 1 1 0 1 0 1 1
y =  14(10) ↦ 0 0 0 0 1 1 1 0 +
c =           0 0 0 0 1 1 1 0 0
r = 121(10) ↦ 0 1 1 1 1 0 0 1


and the corresponding trace of Algorithm 1


i xi yi ci r xi + yi + ci ci+1 ri r′
⟨0, 0, 0, 0, 0, 0, 0, 0⟩ ⟨0, 0, 0, 0, 0, 0, 0, 0⟩
0 1 0 0 ⟨0, 0, 0, 0, 0, 0, 0, 0⟩ 1 0 1 ⟨1, 0, 0, 0, 0, 0, 0, 0⟩
1 1 1 0 ⟨1, 0, 0, 0, 0, 0, 0, 0⟩ 2 1 0 ⟨1, 0, 0, 0, 0, 0, 0, 0⟩
2 0 1 1 ⟨1, 0, 0, 0, 0, 0, 0, 0⟩ 2 1 0 ⟨1, 0, 0, 0, 0, 0, 0, 0⟩
3 1 1 1 ⟨1, 0, 0, 0, 0, 0, 0, 0⟩ 3 1 1 ⟨1, 0, 0, 1, 0, 0, 0, 0⟩
4 0 0 1 ⟨1, 0, 0, 1, 0, 0, 0, 0⟩ 1 0 1 ⟨1, 0, 0, 1, 1, 0, 0, 0⟩
5 1 0 0 ⟨1, 0, 0, 1, 1, 0, 0, 0⟩ 1 0 1 ⟨1, 0, 0, 1, 1, 1, 0, 0⟩
6 1 0 0 ⟨1, 0, 0, 1, 1, 1, 0, 0⟩ 1 0 1 ⟨1, 0, 0, 1, 1, 1, 1, 0⟩
7 0 0 0 ⟨1, 0, 0, 1, 1, 1, 1, 0⟩ 0 0 0 ⟨1, 0, 0, 1, 1, 1, 1, 0⟩
0 ⟨1, 0, 0, 1, 1, 1, 1, 0⟩
which produces r = 01111001(2) = 121(10) as expected.
Better still, if we assume use of two’s-complement (which we reasoned in Chapter 1 was sane), then the
algorithm can compute the sum of signed x and y without change:
Example 3.4. Consider the following signed, base-2 addition of x = 107(10) ↦ 01101011(2) to y = −14(10) ↦
11110010(2) (both represented using two’s-complement)

x = 107(10) ↦ 0 1 1 0 1 0 1 1
y = −14(10) ↦ 1 1 1 1 0 0 1 0 +
c =           1 1 1 0 0 0 1 0 0
r =  93(10) ↦ 0 1 0 1 1 1 0 1
and the corresponding trace of Algorithm 1
i xi yi ci r xi + yi + ci ci+1 ri r′
⟨0, 0, 0, 0, 0, 0, 0, 0⟩ ⟨0, 0, 0, 0, 0, 0, 0, 0⟩
0 1 0 0 ⟨0, 0, 0, 0, 0, 0, 0, 0⟩ 1 0 1 ⟨1, 0, 0, 0, 0, 0, 0, 0⟩
1 1 1 0 ⟨1, 0, 0, 0, 0, 0, 0, 0⟩ 2 1 0 ⟨1, 0, 0, 0, 0, 0, 0, 0⟩
2 0 0 1 ⟨1, 0, 0, 0, 0, 0, 0, 0⟩ 1 0 1 ⟨1, 0, 1, 0, 0, 0, 0, 0⟩
3 1 0 0 ⟨1, 0, 1, 0, 0, 0, 0, 0⟩ 1 0 1 ⟨1, 0, 1, 1, 0, 0, 0, 0⟩
4 0 1 0 ⟨1, 0, 1, 1, 0, 0, 0, 0⟩ 1 0 1 ⟨1, 0, 1, 1, 1, 0, 0, 0⟩
5 1 1 0 ⟨1, 0, 1, 1, 1, 0, 0, 0⟩ 2 1 0 ⟨1, 0, 1, 1, 1, 0, 0, 0⟩
6 1 1 1 ⟨1, 0, 1, 1, 1, 0, 0, 0⟩ 3 1 1 ⟨1, 0, 1, 1, 1, 0, 1, 0⟩
7 0 1 1 ⟨1, 0, 1, 1, 1, 0, 1, 0⟩ 2 1 0 ⟨1, 0, 1, 1, 1, 0, 1, 0⟩
1 ⟨1, 0, 1, 1, 1, 0, 1, 0⟩
which produces r = 01011101(2) 7→ 93(10) as expected.
Intuitively, the reason no change is required is because both the unsigned and signed, two’s-complement
representations perfectly fit the definition of a positional number system: they express the value as a summation
of weighted terms, whereas sign-magnitude, for example, needs a special case (namely a sign factor of −1 when xn−1 = 1) to
capture the sign. As a result of this feature the carry chain still functions in the same way, for example.

3.3.1.1 Ripple-carry adders


Having developed and reasoned about an algorithm for addition, the next challenge is to translate it into a
concrete design we can implement as a circuit. At first glance this may seem difficult, not least because the
algorithm contains a loop. Crucially however, we can unroll this loop once n is fixed: this means we copy and
paste the loop body (i.e., lines #3 and #4) n times, replacing i with the correct value in each i-th copy.
Example 3.5. Given n = 4, the loop in Algorithm 1 can be unrolled into the straight-line alternative
1 c0 ← ci
2 r0 ← (x0 + y0 + c0 ) mod b
3 if (x0 + y0 + c0 ) < b then c1 ← 0 else c1 ← 1
4 r1 ← (x1 + y1 + c1 ) mod b
5 if (x1 + y1 + c1 ) < b then c2 ← 0 else c2 ← 1
6 r2 ← (x2 + y2 + c2 ) mod b
7 if (x2 + y2 + c2 ) < b then c3 ← 0 else c3 ← 1
8 r3 ← (x3 + y3 + c3 ) mod b
9 if (x3 + y3 + c3 ) < b then c4 ← 0 else c4 ← 1
10 co ← c4


Notice that if we select b = 2, the body of the loop and therefore each replicated step in the unrolled alternative
computes the 1-bit addition
ri = xi ⊕ yi ⊕ ci
ci+1 = (xi ∧ yi ) ∨ (xi ∧ ci ) ∨ (yi ∧ ci )
Put another way, it matches the full-adder cell we produced a design for in Chapter 2. Substituting one for the
other, we simply have n full-adder instances connected via respective carry-in and carry-out: each i-th instance
computes the sum of xi and yi and a carry-in ci , and produces the sum ri and a carry-out ci+1 . The design,
which is termed a ripple-carry adder since the carry “ripples” or propagates through the carry chain, is shown
in Figure 3.2.
The algorithm and associated design satisfy the required functionality: they can compute the sum of n-bit
addends x and y. As such, one might question whether exploring other designs is necessary. Any metric
applied to the design may provide some motivation, but the concept of critical path is particularly important
here. Recall from Chapter 2 that the critical path of a circuit is defined as the longest sequential sequence of
gates; here, the critical path runs through the entire circuit from the 0-th to the (n − 1)-th full-adder instance.
Put another way, the carry chain represents an order on the computation of digits in r: ri cannot be computed
until ci is known, so the i-th step cannot be computed until every j-th step for j < i is computed, due to the carry
chain. This implies the critical path can be approximated by O(n) gate delays; our motivation for exploring
other designs is therefore the possibility of improving on this, and thus computing r with lower latency (i.e.,
less delay).

3.3.1.2 Carry look-ahead adders


One approach to removing the constraint imposed by a carry chain might be to separate computation of the
carry and sum. At first glance this probably seems impossible, or at least difficult: we argued above that the
latter depends on the former! However, notice that we can say at least something about how each i-th step of
the ripple-carry adder works independently of the others. We know for instance that

1. if xi + yi > b − 1 it generates a carry, i.e., sets ci+1 = 1 irrespective of ci ,


2. if xi + yi = b − 1 it propagates a carry, i.e., sets ci+1 = 1 iff. ci = 1, and
3. if xi + yi < b − 1 it absorbs a carry, i.e., sets ci+1 = 0 irrespective of ci .

Example 3.6. Consider the following unsigned, base-10 addition of x = 456(10) to y = 444(10)

x = 456(10) ↦ 4 5 6
y = 444(10) ↦ 4 4 4 +
c =           0 1 1 0
r = 900(10) ↦ 9 0 0

where the three rules above apply as follows:

1. In the 0-th column, xi + yi = x0 + y0 = 6 + 4 = 10 which is greater than b − 1 = 10 − 1 = 9. Put another way,


this is already too large to represent using a single base-b digit and will hence always generate a carry
into the next, (i + 1)-th step irrespective of whether there is a carry-in or not.
2. In the 1-st column, xi + yi = x1 + y1 = 5 + 4 = 9 which is equal to b − 1 = 10 − 1 = 9. Put another way, this
is at the limit of what we can represent using a single base-b digit: iff. there is a carry-in, then there will
be a carry-out.
3. In the 2-nd column, xi + yi = x2 + y2 = 4 + 4 = 8 which is less than b − 1 = 10 − 1 = 9. Put another way, even
if there is a carry into the i-th stage there will never be a carry-out because 8 + 1 can be accommodated
within the single base-b digit r2 of the sum.

A Carry Look-Ahead (CLA) adder takes advantage of the fact that using base-2 makes application of the rules
simple. In particular, imagine we use gi and pi to indicate whether the i-th step will generate or propagate a
carry respectively. We can express these as

gi = xi ∧ yi
pi = xi ⊕ yi

which can be explained in words:

• we generate a carry-out if both xi = 1 and yi = 1 since no matter what the carry-in is, their sum cannot be
represented in a single base-b digit, and


• we propagate a carry-out if either xi = 1 or yi = 1 since this plus any carry-in will also produce a sum
which cannot be represented in a single base-b digit.

Given gi and pi we have that


ci+1 = gi ∨ (ci ∧ pi )

where, again, c0 = ci and we produce a carry-out cn = co. Again this can be explained in words: at the i-th stage
“there is a carry-out if either the i-th stage generates a carry itself, or there is a carry-in and the i-th stage will
propagate it”. As an aside, note that it is common to see gi and pi written as

gi = xi ∧ yi
pi = xi ∨ yi

Of course, when used in the above both expressions have the same meaning: if xi = 1 and yi = 1, then gi = 1
so it does not matter what the corresponding pi is (given the OR will yield 1, since the left-hand term gi = 1,
irrespective of the right-hand term). As such, use of an OR gate rather than an XOR is preferred because the
former requires fewer transistors.
Like the ripple-carry adder, once we fix n we can unwind the recursion to get an expression for the carry
into each i-th full-adder cell:

c0 = ci
c1 = g0 ∨ (ci ∧ p0 )
c2 = g1 ∨ (g0 ∧ p1 ) ∨ (ci ∧ p0 ∧ p1 )
c3 = g2 ∨ (g1 ∧ p2 ) ∨ (g0 ∧ p1 ∧ p2 ) ∨ (ci ∧ p0 ∧ p1 ∧ p2 )
..
.

This looks horrendous, but notice that the general structure is of the form shown in Figure 3.6: both the bottom-
and top-half are balanced binary trees (st. leaf nodes are gi and pi terms, and internal nodes are AND and OR
gates respectively) that implement the SoP expression for a given ci . We are able to use this organisation as
a result of having decoupled computation of ci from the corresponding ri , which is, essentially, what yields an
advantage: the critical path (i.e., the depth of the structure, or longest path from the root to some leaf) is shorter
than for a ripple-carry adder. Stated in a formal way, the former is described by O(log n) gate delays due to the
tree structure, and the latter by O(n) as a result of the linear structure.
The resulting design is shown in Figure 3.5. In contrast with the ripple-carry adder design in Figure 3.2,
all the full-adder instances are independent: the carry chain previously linking them has now been eliminated.
Instead, the i-th such instance produces gi and pi ; these inputs are used by the carry look-ahead logic to produce
ci . The design hides an important trade-off, namely the associated gate count. Although we have reduced
the critical path, the gate count is now much higher: a rough estimate would be O(n) and O(n2 ) gates for
ripple-carry and carry look-ahead adders respectively. It can therefore be attractive to combine several, small(er) carry
look-ahead adders (e.g., 8-bit adders) in a large(r) ripple-carry configuration (e.g., to form a larger, 32-bit,
adder).

3.3.1.3 Carry-save adders

A second approach to eliminating the carry chain is to bend the rules a little, and look at a slightly different
problem. The ripple-carry and the carry look-ahead adder compute the sum of two addends x and y; what
happens if we consider three addends x, y, and z, and thus compute x + y + z rather than x + y?
A carry-save adder offers a solution for this alternative problem. It is often termed a 3 : 2 compressor
because it compresses three n-bit inputs x, y and z into two n-bit outputs r′ and c′ (termed the partial sum and
shifted carry). Put another way, a carry-save “adder” computes the actual sum r = x + y + z in two steps: 1)
a compression step produces a partial sum and shifted carry, then 2) an addition step combines them into the
actual sum.
The first step amounts to replacing c with z in the ripple-carry adder design, meaning that for the i-th
full-adder instance we have
r′i = xi ⊕ yi ⊕ zi
c′i = (xi ∧ yi ) ∨ (xi ∧ zi ) ∨ (yi ∧ zi )

Unlike the ripple-carry adder, where the instances are connected via a carry chain, the expressions for r′i and
c′i only use the i-th digits of x, y, and z: computation of each i-th digit of r′ and c′ is independent. Crucially,
this means each r′i and c′i can be computed at the same time; the critical path runs through just one full-adder
instance, rather than all n instances as in a ripple-carry adder.


Example 3.7. Consider computation of the partial sum and shifted carry from x = 96(10) = 01100000(2) , y =
14(10) = 00001110(2) and z = 11(10) = 00001011(2) :
x = 96(10) ↦ 0 1 1 0 0 0 0 0
y = 14(10) ↦ 0 0 0 0 1 1 1 0
z = 11(10) ↦ 0 0 0 0 1 0 1 1

r′ =         0 1 1 0 0 1 0 1
c′ =         0 0 0 0 1 0 1 0
After computing r′ and c′ , we combine them via the second step by computing r = r′ + 2 · c′ using a standard
(e.g., ripple-carry) adder. You could think of this step as propagating the carries, now represented separately
(from the sum) by c′ .
Example 3.8. Consider computation of the actual sum from r′ = 01100101 and c′ = 00001010:
r′     =        0 1 1 0 0 1 0 1
2 · c′ =      0 0 0 0 1 0 1 0 0 +
c      =    0 0 0 0 0 0 1 0 0 0
r = 121(10) ↦ 0 0 1 1 1 1 0 0 1
which produces r = 001111001(2) = 121(10) as expected.
Given we need this step to produce r, it is reasonable to ask why we bother with this approach at
all: it seems as if we are limited in the same way as if we used a ripple-carry adder in the first place. With m = 1
compression step, the answer is that we have a critical path of O(1) + O(n) gate delays vs. O(n) + O(n) if we used
two ripple-carry adders (one to compute t = x + y, then another to compute r = t + z). The more general idea,
however, is we compute many compression steps (i.e., m > 1) and then a single, addition step: if we do this, the
cost associated with the addition step becomes less significant (i.e., is amortised) as m grows larger. Later, in
Section 3.5 when we look at designs for multiplication, the utility of this approach should become clear.
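Both steps are simple to express in C; the following sketch is ours (the name csa is hypothetical), and highlights that the compression step is entirely bitwise, i.e., free of any carry chain:

  #include <stdint.h>

  /* One 3:2 compression step: x, y and z are compressed into the partial
     sum *rp and shifted carry *cp, with each bit formed independently. */
  void csa( uint8_t x, uint8_t y, uint8_t z, uint8_t* rp, uint8_t* cp ) {
    *rp = x ^ y ^ z;
    *cp = ( x & y ) | ( x & z ) | ( y & z );
  }

Matching Examples 3.7 and 3.8, csa( 96, 14, 11, &rp, &cp ) yields rp = 101 and cp = 10, and the addition step rp + 2 * cp then recovers r = 121.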

3.3.2 Subtraction
3.3.2.1 Redesigning a ripple-carry adder
Subtraction is conceptually, and so computationally, similar to addition. In essence, the same steps are evident:
we again work from the least-significant, right-most digits (i.e., x0 and y0 ) towards the most-significant, left-
most digits (i.e., xn−1 and yn−1 ). At each i-th step (or column), we now compute the difference of the i-th digits
xi and yi and a borrow-in produced by the previous, (i − 1)-th step; this difference is potentially smaller than
zero, so we produce the i-th digit of the result and a borrow-out into the next, (i + 1)-th step. This description is
formalised in a similar way by Algorithm 2. Note that although the name c is slightly counter-intuitive (it now
represents a borrow- rather than carry-chain), we stick to the same notation as an adder to stress the similar
use.
Example 3.9. Consider the following unsigned, base-10 subtraction of y = 14(10) from x = 107(10)
x = 107(10) ↦ 1 0 7
y =  14(10) ↦ 0 1 4 −
c =           0 1 0 0
r =  93(10) ↦ 0 9 3
and the corresponding trace of Algorithm 2
i xi yi ci r xi − yi − ci ci+1 ri r′
        ⟨0, 0, 0⟩           ⟨0, 0, 0⟩
0 7 4 0 ⟨0, 0, 0⟩  3 0 3 ⟨3, 0, 0⟩
1 0 1 0 ⟨3, 0, 0⟩ −1 1 9 ⟨3, 9, 0⟩
2 1 0 1 ⟨3, 9, 0⟩  0 0 0 ⟨3, 9, 0⟩
        0 ⟨3, 9, 0⟩
which produces r = 93(10) as expected.
Example 3.10. Consider the following unsigned, base-2 subtraction of y = 14(10) = 00001110(2) from x = 107(10) =
01101011(2)
x = 107(10) ↦ 0 1 1 0 1 0 1 1
y =  14(10) ↦ 0 0 0 0 1 1 1 0 −
c =           0 0 0 1 1 1 0 0 0
r =  93(10) ↦ 0 1 0 1 1 1 0 1


Half-Subtractor            Full-Subtractor
x y | bo d                 bi x y | bo d
0 0 |  0 0                  0 0 0 |  0 0
0 1 |  1 1                  0 0 1 |  1 1
1 0 |  0 1                  0 1 0 |  0 1
1 1 |  0 0                  0 1 1 |  0 0
                            1 0 0 |  1 1
                            1 0 1 |  1 0
                            1 1 0 |  0 0
                            1 1 1 |  1 1

(a) The half-subtractor as a truth table.  (b) The half-subtractor as a circuit (diagram omitted).
(c) The full-subtractor as a truth table.  (d) The full-subtractor as a circuit (diagram omitted).

Figure 3.7: An overview of half- and full-subtractor cells.

and the corresponding trace of Algorithm 2

i xi yi ci r xi − yi − ci ci+1 ri r′
⟨0, 0, 0, 0, 0, 0, 0, 0⟩ ⟨0, 0, 0, 0, 0, 0, 0, 0⟩
0 1 0 0 ⟨0, 0, 0, 0, 0, 0, 0, 0⟩ 1 0 1 ⟨1, 0, 0, 0, 0, 0, 0, 0⟩
1 1 1 0 ⟨1, 0, 0, 0, 0, 0, 0, 0⟩ 0 0 0 ⟨1, 0, 0, 0, 0, 0, 0, 0⟩
2 0 1 0 ⟨1, 0, 0, 0, 0, 0, 0, 0⟩ −1 1 1 ⟨1, 0, 1, 0, 0, 0, 0, 0⟩
3 1 1 1 ⟨1, 0, 1, 0, 0, 0, 0, 0⟩ −1 1 1 ⟨1, 0, 1, 1, 0, 0, 0, 0⟩
4 0 0 1 ⟨1, 0, 1, 1, 0, 0, 0, 0⟩ −1 1 1 ⟨1, 0, 1, 1, 1, 0, 0, 0⟩
5 1 0 1 ⟨1, 0, 1, 1, 1, 0, 0, 0⟩ 0 0 0 ⟨1, 0, 1, 1, 1, 0, 0, 0⟩
6 1 0 0 ⟨1, 0, 1, 1, 1, 0, 0, 0⟩ 1 0 1 ⟨1, 0, 1, 1, 1, 0, 1, 0⟩
7 0 0 0 ⟨1, 0, 1, 1, 1, 0, 1, 0⟩ 0 0 0 ⟨1, 0, 1, 1, 1, 0, 1, 0⟩
0 ⟨1, 0, 1, 1, 1, 0, 1, 0⟩

which produces r = 01011101(2) = 93(10) as expected.


Since the algorithm is more or less the same, it follows that a circuit to implement it would also be the same:
Figure 3.3 illustrates this. The only difference, of course, is in the loop body, where we need the subtraction
equivalent to half- and full-adder cells. More specifically, we need 1) a half-subtractor that takes two 1-bit values,
say x and y, and subtracts one from the other to produce a difference and a borrow-out, say d and bo, and 2) a
full-subtractor that extends a half-subtractor by including a borrow-in bi as an additional input. Unsurprisingly,
Figure 3.7 demonstrates the components themselves are simple to design and write as the Boolean expressions

bo = ¬x ∧ y
d = x ⊕ y

and
bo = (¬x ∧ y) ∨ (¬(x ⊕ y) ∧ bi)
d = x ⊕ y ⊕ bi
respectively. Keep in mind that bi and bo perform the same role as ci and co previously: the subtraction
analogue of the ripple-carry adder, an n-bit ripple-borrow subtractor perhaps, is identical except for the borrow
chain through all n full-subtractor instances.


3.3.2.2 Reusing a ripple-carry adder


As we have seen, subtraction is similar to addition. This is more obvious still if we write x − y ≡ x + (−y):
the subtraction required (on the LHS) could be computed by adding x to the negation of y (on the RHS). Given
we compute an addition in both cases, we might opt for a second approach by designing a single component
that allows selection of either addition or subtraction: given a control signal op, we might have

    r = { x + y + ci   if op = 0
        { x − y − ci   if op = 1

for example. Notice that as well as controlling computation of the sum or difference of x and y, op will control
use of ci as a carry- or borrow-in depending on whether an addition or subtraction is computed. The advantage
is that, at a high-level, the design

[circuit diagram omitted: n XOR gates conditionally invert y and ci under control of op, feeding the internal inputs x′ , y′ and ci′ of a single ripple-carry adder that produces r]

includes one internal adder. Versus two separate, similar components (i.e., an adder and a subtractor), this is
already a useful optimisation outright; in designs for multiplication this will be amplified further.
The question is, how should we control the internal inputs to the adder (namely x′ , y′ and ci′ ) st. given
all the external inputs (namely op, x, y and ci) the correct output r is produced? By using two’s-complement
representation, we saw in Chapter 1 that
−y 7→ ¬y + 1
for any given y. The idea is to use this identity, translating from what we want to compute into what we already
can compute:

op ci r op ci r op ci r op ci r
0 0 x + y + ci 0 0 x+y+0 0 0 x+y+0 0 0 x+y+0
0 1 x + y + ci ≡ 0 1 x+y+1 ≡ 0 1 x+y+1 ≡ 0 1 x+y+1
1 0 x − y − ci 1 0 x−y−0 1 0 x + (¬y + 1) − 0 1 0 x + (¬y) + 1
1 1 x − y − ci 1 1 x−y−1 1 1 x + (¬y + 1) − 1 1 1 x + (¬y) + 0

The left-most table just captures what we said above: if op = 0 (in the top two rows) we want to compute
x + y + ci, but if op = 1 (in the bottom two rows) we want to compute x − y − ci. Moving from left-to-right, we
substitute in values of ci then apply the identity for −y in the bottom rows; the right-most table simply folds
the constants together. In the right-most table, all the cases (for addition and subtraction, so where op = 0 or
op = 1) are of the same form, which we can cope with using the internal adder: we have op, x, y and ci, so can
just translate via
op ci xi yi ci′ x′i y′i
0 0 0 0 0 0 0
0 1 1 1 1 1 1
1 0 0 0 1 0 1
1 1 1 1 0 1 0
i.e., ci′ = ci ⊕ op, x′i = xi and y′i = yi ⊕ op. That is, x is unchanged whereas yi and ci are XOR’ed with op to
conditionally invert them (in the bottom two rows, where we need ¬yi rather than yi ). Figure 3.4 illustrates the
result, where it is important to see that the overhead, versus in this case a ripple-carry adder, is simply an extra
n + 1 XOR gates.
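The same conditional inversion is easily mimicked in C; in this sketch of ours (the name addsub is hypothetical), the mask m replicates op across all n = 8 bits, playing the role of the n + 1 XOR gates:

  #include <stdint.h>

  /* op = 0 computes r = x + y + ci, op = 1 computes r = x - y - ci. */
  uint8_t addsub( int op, uint8_t x, uint8_t y, int ci, int* co ) {
    uint8_t  m = op ? 0xFF : 0x00;          /* y_i' = y_i ^ op */
    uint16_t t = ( uint16_t )( x ) +
                 ( uint8_t  )( y ^ m ) +
                 ( ( ci ^ op ) & 1 );       /* ci'  = ci  ^ op */
    *co = ( t >> 8 ) & 1;
    return ( uint8_t )( t );
  }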

3.3.3 Carry and overflow detection


Consider the addition of some n-bit inputs x and y: the magnitude of r will be too large to represent in n bits,
st. it is incorrect, if either

1. x and y are (and hence the addition is) unsigned and there is a carry-out, or


2. x and y are (and hence the addition is) signed but the sign of r makes no sense
which are termed carry and overflow errors respectively. The two cases can be illustrated using some (specific)
examples:
Example 3.11. Consider the following unsigned, base-2 addition of x = 15(10) ↦ 1111(2) to y = 1(10) ↦ 0001(2)

x = 15(10) ↦ 1 1 1 1
y =  1(10) ↦ 0 0 0 1 +
c =          1 1 1 1 0
r =  0(10) ↦ 0 0 0 0

which produces an incorrect result r = 0000(2) ↦ 0(10) due to a carry error.
Example 3.12. Consider the following signed, base-2 addition of x = −1(10) ↦ 1111(2) to y = 1(10) ↦ 0001(2)
(both represented using two’s-complement)

x = −1(10) ↦ 1 1 1 1
y =  1(10) ↦ 0 0 0 1 +
c =          1 1 1 1 0
r =  0(10) ↦ 0 0 0 0

which produces a correct result r = 0000(2) ↦ 0(10) .
Example 3.13. Consider the following signed, base-2 addition of x = 7(10) ↦ 0111(2) to y = 1(10) ↦ 0001(2) (both
represented using two’s-complement)

x =  7(10) ↦ 0 1 1 1
y =  1(10) ↦ 0 0 0 1 +
c =          0 1 1 1 0
r = −8(10) ↦ 1 0 0 0

which produces an incorrect result r = 1000(2) ↦ −8(10) due to an overflow error.
To deal with such errors in a sensible manner, we really need two steps: 1) detect that the error has occurred,
then 2) apply some mechanism, e.g., to communicate or correct the error.
Detecting the carry error is simple: as suggested by the first example above, we need to inspect the carry-out.
In this example, which assumes n = 4, the correct result r = 16 has a magnitude which cannot be accommodated
in the number of bits available; an incorrect result r = 0 is therefore produced, with the carry-out (i.e., the fact
that if the result is computed by Algorithm 1, it produces co = 1) signalling an error. However, notice that if we
have signed x and y, as in the second example, any carry-out is irrelevant: in this case, the result r = 0 is correct
and the carry-out should be discarded.
This suggests detecting the overflow error requires more thought, with the third example suggesting a
starting point. In this case, x is the largest positive integer we can represent using n = 4 bits; adding y = 1
means the value wraps-around (as discussed in Chapter 1) to form a negative result r = −8. Clearly this is
impossible, in the sense that for positive x and y we can never end up with a negative sum: this mismatch
allows us to conclude than an overflow error occurred. More specifically, in the case of addition, we apply the
following set of rules (with a similar set possible for subtraction):
x +ve y -ve ⇒ no overflow
x -ve y +ve ⇒ no overflow
x +ve y +ve r +ve ⇒ no overflow
x +ve y +ve r -ve ⇒ overflow
x -ve y -ve r +ve ⇒ overflow
x -ve y -ve r -ve ⇒ no overflow
Note that testing the sign of x or y is trivial, because it will be determined by their MSBs as a result of how
two’s-complement is defined: x is positive, for example, iff. xn−1 = 0 and negative otherwise. Based on this,
detection of an overflow error is computed as
of = ( xn−1 ∧ yn−1 ∧ ¬rn−1 ) ∨
( ¬xn−1 ∧ ¬yn−1 ∧ rn−1 )
or in words: “there is an overflow if either x is positive and y is positive and r is negative, or if x is negative
and y is negative and r is positive”. This can be further simplified to
of = cn ⊕ cn−1
where c is the carry chain during addition of x and y: basically this XORs the carry-in and the carry-out of the
(n − 1)-th full-adder. As such, an overflow is signalled, i.e., of = 1, in two cases: either


An aside: shift operators in C and Java.

The fact there are two different classes of shift operation demands some care when writing programs; put
simply, in a given programming language you need to make sure you select the correct operator. In C, both
left- and right-shifts use the operators << and >> irrespective of whether they are arithmetic or logical; the type
of the operand being shifted dictates the class of shift. For example

1. if x is of type int (i.e., x is a signed integer) then the expression x >> 2 implies an arithmetic right-shift,
whereas
2. if x is of type unsigned int (i.e., x is an unsigned integer) then the expression x >> 2 implies a logical
right-shift.

In contrast, Java has no unsigned integer data types so needs to take a different approach: arithmetic and
logical right-shifts are specified by two different operators, meaning

1. the expression x >> 2 implies an arithmetic right-shift, whereas

2. the expression x >>> 2 implies a logical right-shift.

1. cn = 0 and cn−1 = 1, which can only occur if xn−1 = 0 and yn−1 = 0 (i.e., x and y are both positive but r is
negative), or

2. cn = 1 and cn−1 = 0, which can only occur if xn−1 = 1 and yn−1 = 1 (i.e., x and y are both negative but r
is positive).
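Both conditions are cheap to detect in software as well; the following C sketch is our own (the name add_flags is hypothetical, and n = 8), computing the sum plus carry and overflow flags using the sign-based rule above:

  #include <stdint.h>

  void add_flags( uint8_t x, uint8_t y, uint8_t* r, int* cf, int* of ) {
    uint16_t t = ( uint16_t )( x ) + ( uint16_t )( y );
    *r  = ( uint8_t )( t );
    *cf = ( t >> 8 ) & 1;                        /* carry:    unsigned error */
    int sx = x >> 7, sy = y >> 7, sr = *r >> 7;  /* sign bits of x, y and r  */
    *of = ( sx & sy & !sr ) | ( !sx & !sy & sr );/* overflow: signed   error */
  }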

Once an error condition is detected (during a relevant operation by the ALU, for example), the next question
is what to do about it: clearly the error needs to be managed somehow, or the incorrect result will be used as
normal. There are numerous options, but two in particular illustrate the general approach:

1. provide the incorrect result as normal, (e.g., truncate the result to n bits by discarding bits we cannot
accommodate), but signal the error condition somehow (e.g., via a status register or some form of
exception), or

2. fix the incorrect result somehow, according to pre-planned rules (e.g., saturate or clamp the result to the
largest integer we can represent in n bits).

In short, the choice is between delegating responsibility to whatever is using the ALU (in the former) and
making the ALU itself responsible (in the latter); both have advantages and disadvantages, and may therefore
be appropriate in different situations.

3.4 Components for shift and rotation

3.4.1 Introductory concepts and theory


3.4.1.1 Abstract shift operations, described as arithmetic
Although one would not normally think of doing a long-hand shift, as with addition or subtraction, it is
possible to consider such an operation in arithmetic terms: a shift of some base-b integer x by a distance of y
digits has the same effect as multiplying x by b^y . That is,

    r = x · b^y = ( Σ_{i=0}^{n−1} xi · b^i ) · b^y
                = Σ_{i=0}^{n−1} ( xi · b^i · b^y )
                = Σ_{i=0}^{n−1} xi · b^(i+y)


Notice that if y is positive it increases the weight associated with a given digit xi , hence “shifting” said digit to
the left in the sense it assumes a more-significant position. If y is negative, on the other hand, it decreases the
weight associated with xi and the digit “shifts” to the right; in this case, the operation acts as a division instead,
because clearly
x · b^(−y) = x · ( 1 / b^y ) = x / b^y .
This argument applies for any b, and, as you might expect, we will ultimately be interested in b = 2 since this
aligns with our approach for representing integers.
Example 3.14. Consider a base-10 shift of x = 123(10) by y = 2
    r = x · b^y = Σ_{i=0}^{n−1} xi · b^(i+y)
                = x0 · b^(0+2) + x1 · b^(1+2) + x2 · b^(2+2)
                = 3 · 10^2 + 2 · 10^3 + 1 · 10^4
                = 300 + 2000 + 10000
                = 12300(10)

which produces r = x · b^y = 123(10) · 10^2 = 12300(10) as expected.


Example 3.15. Consider a base-10 shift of x = 123(10) by y = −2
    r = x · b^y = Σ_{i=0}^{n−1} xi · b^(i+y)
                = x0 · b^(0−2) + x1 · b^(1−2) + x2 · b^(2−2)
                = 3 · 10^(−2) + 2 · 10^(−1) + 1 · 10^0
                = 0.03 + 0.2 + 1
                = 1.23(10)

which produces r = x · b^y = 123(10) · 10^(−2) = 1.23(10) as expected.


Example 3.16. Consider a base-2 shift of x = 51(10) = 110011(2) by y = 2
    r = x · b^y = Σ_{i=0}^{n−1} xi · b^(i+y)
                = x0 · b^(0+2) + x1 · b^(1+2) + x2 · b^(2+2) + x3 · b^(3+2) + x4 · b^(4+2) + x5 · b^(5+2)
                = 1 · 2^2 + 1 · 2^3 + 0 · 2^4 + 0 · 2^5 + 1 · 2^6 + 1 · 2^7
                = 4 + 8 + 0 + 0 + 64 + 128
                = 11001100(2)
                = 204(10)

which produces r = x · b^y = 51(10) · 2^2 = 204(10) as expected.


Example 3.17. Consider a base-2 shift of x = 51(10) = 110011(2) by y = −2
    r = x · b^y = Σ_{i=0}^{n−1} xi · b^(i+y)
                = x0 · b^(0−2) + x1 · b^(1−2) + x2 · b^(2−2) + x3 · b^(3−2) + x4 · b^(4−2) + x5 · b^(5−2)
                = 1 · 2^(−2) + 1 · 2^(−1) + 0 · 2^0 + 0 · 2^1 + 1 · 2^2 + 1 · 2^3
                = 0.25 + 0.5 + 0 + 0 + 4 + 8
                = 1100.11(2)
                = 12.75(10)

which produces r = x · b^y = 51(10) · 2^(−2) = 12.75(10) as expected.

3.4.1.2 Concrete shift (and rotate) operations of n-bit sequences


Recall from Chapter 1 that we represent signed or unsigned integers using an n-bit sequence or an equivalent
literal. For example, wrt. an unsigned representation using n = 8 bits, each of the following

x = 218(10)
  = 11011010(2)
  ↦ ⟨0, 1, 0, 1, 1, 0, 1, 1⟩
  ↦ 11011010


describes the same value: using the literal notation in what follows is more natural, but keep in mind that the
equivalence above allows us to translate the same reasoning to any of the alternatives.
Based on our description in the previous Section, we need to consider what a shift operation means when
applied to an integer represented by an n-bit sequence.
Definition 3.1. Two types of shift can be applied to an n-bit sequence x:

1. a left-shift, where y > 0, can be defined as

r = x ≪ y = xn−1−abs(y) ∥ · · · ∥ x1 ∥ x0 ∥ ? ? · · · ?    (n bits in total)
and
2. a right-shift, where y < 0, can be defined as

r = x ≫ y = ? ? · · · ? ∥ xn−1 ∥ · · · ∥ xabs(y)+1 ∥ xabs(y)    (n bits in total)

where y is termed the distance, and each ? represents a “gap” bit that must be filled to ensure r has n bits.
Definition 3.2. When computing a shift, any gap is filled in according to some rules:

1. logical shift, where left-shift discards MSBs and fills the gap in LSBs with zeros, and right-shift discards LSBs
and fills the gap in MSBs with zeros, and
2. arithmetic shift, where left-shift discards MSBs and fills the gap in LSBs with zeros, and right-shift discards LSBs
and fills the gap in MSBs with a sign bit.

Phrased in this way, a rotate operation (of some x by a distance of y) is the same as a logical shift except that any gap is
filled by the other end of x rather than zero: that is,

1. a left-rotate (for which we use the operator ⋘, vs. ≪ for the corresponding shift) yields a gap in the LSBs which
is filled by the MSBs that would be discarded by a left-shift, and

2. a right-rotate (for which we use the operator ⋙, vs. ≫ for the corresponding shift) yields a gap in the MSBs
which is filled by the LSBs that would be discarded by a right-shift.

Example 3.18. Consider the base-2 shift and rotation of an n = 8 bit x = 218(10) = 11011010(2) ≡ 11011010 by a
distance of y = 2:

1. logical left- and right-shift produce

x ≪u y = 11011010 ≪u 2 = 01101000
x ≫u y = 11011010 ≫u 2 = 00110110

2. arithmetic left- and right-shift produce

x ≪s y = 11011010 ≪s 2 = 01101000
x ≫s y = 11011010 ≫s 2 = 11110110

and
3. logical left- and right-rotate produce

x ⋘ y = 11011010 ⋘ 2 = 01101011
x ⋙ y = 11011010 ⋙ 2 = 10110110

These examples hopefully illustrate the somewhat convoluted definitions more clearly: in reality, the
underlying concepts are reasonably simple. Consider the logical left-shift: looking step-by-step at

x ≪u y = 11011010 ≪u 2
= 011010??
= 01101000

the idea is that


1. we discard two bits from the left-hand, most-significant end because they cannot be accommodated, plus

2. at the right-hand, less-significant end we need to fill the resulting gap: this is a logical shift, so they are
replaced with 0.

On the other hand, some more subtle points are important. First, note the importance of knowing n when
performing these operations. If we did not know n, or did not fix it say, a left-shift would just elongate the
literal: instead of discarding MSBs, the literal grows to form an n + y bit result. Likewise, if we do not know n
then rotate cannot be defined in a sane way; a left-rotate cannot fill the gap in the LSBs with discarded MSBs,
because they are not discarded! Second, use of terminology including “left” and “right” explains why it is
easier to reason about these operations by using literals. In short, doing so means the operations both have
the intuitive effect by moving elements left or right: using a sequence, under our notation at least, the effect is
counter-intuitive (i.e., the wrong way around). Third, and finally, if y is a known, fixed constant then shift and
rotate operations require no actual arithmetic: we are simply moving bits in x left or right. As a result, a circuit
that left- or right-shifts x by a fixed distance y simply connects wires from each xi (or zero say, to fill LSBs or
MSBs) to ri+y . We can use this fact as a building block in more general circuits that can cater for scenarios where
y is not fixed. Even then, however, we typically assume a) the sign of y is always positive (which is captured in
the above via use of abs(y)), which is sane because we have specific left- and right-shift (or rotate) operations
vs. one generic operation, and b) the magnitude of y is restricted to 0 ≤ y < n meaning y has m = ⌈log2 (n)⌉ bits.
We then have a choice of how to deal with a y outside this range. Typically, we either let r be undefined for any
y > n or y < 0 or consider y modulo n.
Although logical and arithmetic left-shift are equivalent (i.e., a gap is zero-filled in both cases), this is not
so for right-shift; as such, it is fair to question why arithmetic right-shift is included as a special case. Recall
the original description above, where a shift of x by y was equated to a multiplication of x by b^y . If x uses a
signed representation, a multiplication, and therefore also a shift, ideally preserves the sign: if x is positive
(resp. negative) then we expect x · b^y to be positive (resp. negative). This is essentially the purpose of an
arithmetic right-shift, in the sense it preserves the sign of x and hence has the correct arithmetic meaning. Both
the underlying issue and the impact of this special case are clarified by an example:

Example 3.19. Assuming n = 8, consider that

x   = −38(10)              ↦ 11011010
x/2 = −38(10) /2 = −19(10) ↦ 11101101

when represented using two’s-complement. We know a shift using y = −1 should mean

x · b^y = x · 2^(−1) = x · ( 1 / 2 ) = x / 2 .
However, using logical right-shift, we get

r = x ≫u y = 11011010 ≫u 1
           = 01101101
           ↦ 109(10)

whereas if we use arithmetic right-shift we get

r = x ≫s y = 11011010 ≫s 1
           = 11101101
           ↦ −19(10)

as expected. In the former we fill the MSBs with zero, which turns x into a positive r; in the latter we fill the
MSBs with the sign bit of x to preserve the sign in r. This highlights the reason there is no need for a special
case for arithmetic left-shift. With right-shift we fill MSBs of, so dictate the sign bit of, the result; in contrast, a
left-shift means filling LSBs in the result, so the sign bit remains as is (i.e., is preserved by default).
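Connecting this to the earlier aside on shift operators in C and Java, Example 3.19 can be reproduced in C (a sketch of ours); keep in mind that right-shifting a negative signed integer is, strictly speaking, implementation-defined in C, although an arithmetic shift is the common behaviour:

  #include <stdint.h>
  #include <stdio.h>

  int main( void ) {
    uint8_t u =            0xDA;    /* 11011010: unsigned 218                 */
    int8_t  s = ( int8_t )( 0xDA ); /* 11011010: two's-complement -38         */

    printf( "%u\n", ( unsigned )( uint8_t )( u >> 1 ) ); /* 109: logical      */
    printf( "%d\n", (      int )(  int8_t )( s >> 1 ) ); /* -19: arithmetic,
                                        assuming the common implementation-
                                        defined behaviour                     */
    return 0;
  }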

3.4.2 Iterative designs


Imagine we want to shift some x (wlog. left or right) by a distance of y, then shift that result again by a distance
y′ ; a subtle but important fact is that we can combine the two shifts into one, i.e., we know that

(x ≪ y) ≪ y′ ≡ x ≪ (y + y′ ).


Input: An n-bit sequence x, and an unsigned integer distance 0 ≤ y < n


Output: The n-bit sequence x ≪ y
1 r←x
2 for i = 0 upto y − 1 step +1 do
3 r←r≪1
4 end
5 return r
Algorithm 3: An algorithm for n-bit (left-)shift.

Input: An n-bit sequence x, and an unsigned integer distance 0 ≤ y < n


Output: The n-bit sequence x ≪ y
1 r ← x, m ← ⌈log2 (n)⌉
2 for i = 0 upto m − 1 step +1 do
3 if yi = 1 then
4 r ← r ≪ 2i
5 end
6 end
7 return r
Algorithm 4: An algorithm for n-bit (left-)shift.

Figure 3.8: An iterative design for n-bit (left-)shift described using a circuit diagram.

Figure 3.9: A combinatorial design for n-bit (left-)shift described using a circuit diagram.


Example 3.20. Consider the base-2, logical left-shift of an n = 8 bit x = 218(10) = 11011010(2) ≡ 11011010, first by
a distance of y = 2 then by a distance of y′ = 4:
(x≪y) ≪ y′ = ( 11011010 ≪ 2 ) ≪ 4
= ( 01101000 ) ≪ 4
= 10000000

x ≪ (y + y′ ) = 11011010 ≪ (2 + 4)
= 11011010 ≪ 6
= 10000000
Using the same reasoning, it should be obvious that if y = 6 then
r = x≪y = x≪6
= (((((x ≪ 1) ≪ 1) ≪ 1) ≪ 1) ≪ 1) ≪ 1
Put another way, we can decompose one large shift on the LHS into several smaller shifts on the RHS: six
repeated shifts each by a distance of 1 bit produce the same result as one shift by a distance of 6 bits. This
approach is formalised by Algorithm 3.
Example 3.21. Consider the following trace of Algorithm 3, for y = 6(10) :

i r r′
x
0 x x≪1 r′ ←r≪1
1 x≪1 x≪2 r′ ←r≪1
2 x≪2 x≪3 r′ ←r≪1
3 x≪3 x≪4 r′ ←r≪1
4 x≪4 x≪5 r′ ←r≪1
5 x≪5 x≪6 r′ ←r≪1
x≪6
which produces r = x ≪ 6 as expected.
Figure 3.8 captures the components required to implement this algorithm; the design highlights a trade-off
between area and latency in which smaller area is favoured. Specifically, we only need a) a register to store r
(left-hand side), and b) a component to perform a 1-bit left-shift (center), which realises line #3 of Algorithm 3
and so needs no actual logic (since the shift distance is constant). However, this data-path demands an associated
control-path that realises the loop. We can do so using an FSM of course: in each i-th step, the FSM latches
r′ (representing the combinatorial result r ≪ 1) into r ready for the (i + 1)-th step; implementation of such an
FSM clearly demands a register to hold i and suitable control logic, both of which add somewhat to the area
(and design complexity). Even so, the trade-off is essentially that we have a simple computational step but, as
a result, need to iterate through y such steps to compute the (eventual) result.
So far so good, but what about right-shift? Or, rotate?! Crucially, we can support the entire suite of shift-like
operations via a fairly simple alteration to line #3 of Algorithm 3: we simply need to change the component at the
center of our design. For example, if we replace r ← r ≪ 1 with r ← r ≫ 1 we get a design that performs a
right- vs. left-shift. Even better, if we replace r ← r ≪ 1 with something more involved, namely
        { r ≪ 1   if op = 0
        { r ≫ 1   if op = 1
    r ← {
        { r ⋘ 1   if op = 2
        { r ⋙ 1   if op = 3
then provided we supply op as an extra input, the resulting design can perform left- and right-shift, and
left- and right-rotate: a multiplexer, controlled by op, decides which result (produced by each different,
individual operation) to update r with. We still iterate through y steps, meaning the end result is now a left-
or right-shift, or left- or right-rotate by a distance of y. One can view this as an application of the unintegrated
ALU architecture in Figure 3.1a, but at a lower (or internal, component) level vs. higher, ALU level.
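The following C sketch of ours (the name shift_iter is hypothetical) mirrors this iterative design for n = 8: one simple step per iteration, with op selecting which of the four operations each step performs:

  #include <stdint.h>

  uint8_t shift_iter( int op, uint8_t x, unsigned y ) {
    uint8_t r = x;
    for ( unsigned i = 0; i < y; i++ ) {
      switch ( op ) {
        case 0 : r = ( uint8_t )(   r << 1                );  break; /* left-shift   */
        case 1 : r = ( uint8_t )(   r >> 1                );  break; /* right-shift  */
        case 2 : r = ( uint8_t )( ( r << 1 ) | ( r >> 7 ) );  break; /* left-rotate  */
        case 3 : r = ( uint8_t )( ( r >> 1 ) | ( r << 7 ) );  break; /* right-rotate */
      }
    }
    return r;
  }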

3.4.3 Combinatorial designs


Again following the same reasoning as above, it should be clear that if y = 6 then
r = x≪y = x≪6
= (x ≪ 2) ≪ 4
= (x ≪ 21 ) ≪ 22


Although the example is the same, the underlying strategy is to express y st. each smaller shift is by a power-
of-two distance (i.e., by 2i for some i). As such, if we write y in base-2 then each bit yi tells us whether or not
to shift by a distance derived from i: we can compute the result via application of a simple rule “if yi = 1 then
shift the accumulated result by a distance of 2i , otherwise leave it as it is” which is formalised by Algorithm 4.

Example 3.22. Consider the following trace of Algorithm 4, for y = 6(10) st. m = ⌈log2 (n)⌉ = 3:

i r yi r′
x
0 x 0 x r′ ← r
1 x 1 x≪2 r′ ← r ≪ 21
2 x≪2 1 x≪6 r′ ← r ≪ 22
x≪6

which produces r = x ≪ 6 as expected.

Translating the algorithm into a corresponding design harnesses the same idea as the ripple-carry adder: once
n is known, we unroll the loop by copy and pasting the loop body (i.e., lines #3 to #5) n times, replacing i with
the correct value in each i-th copy. Doing so given y = 6, for example, produces the straight-line alternative

1 r←x
2 if y0 = 1 then r ← r ≪ 20
3 if y1 = 1 then r ← r ≪ 21
4 if y2 = 1 then r ← r ≪ 22

which makes it (more) clear that we are essentially performing a series of choices: if the (i − 1)-th stage produces
t as output, the i-th uses yi to choose between producing t or t ≪ 2i for use by the (i + 1)-th stage. All the shifts
themselves are by fixed constants (which we already argued are trivial), so these stages are really just a cascade
of multiplexers.
Figure 3.9 translates this idea into a concrete circuit. The trade-off between latency and area is swapped vs.
that for the previous, iterative design. On one hand, the component is combinatorial: it takes 1 step to perform
each operation (vs. n), whose latency is dictated by the critical path, and can do so without the need for an FSM.
On the other hand, however, it is likely to use significantly more area (relating to the logic gates required for
each multiplexer).
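Algorithm 4 likewise translates into a compact C sketch (ours; the name shift_barrel is hypothetical, with n = 8 so m = 3); in hardware the loop is unrolled, each iteration becoming one multiplexer stage:

  #include <stdint.h>

  uint8_t shift_barrel( uint8_t x, unsigned y ) {
    uint8_t r = x;
    for ( unsigned i = 0; i < 3; i++ ) {       /* m = 3 stages        */
      if ( ( y >> i ) & 1 ) {
        r = ( uint8_t )( r << ( 1u << i ) );   /* shift by 2^i        */
      }
    }
    return r;
  }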

3.5 Components for multiplication

Formally, a multiplication operation1 computes the product r = y·x based on the multiplier y and multiplicand
x. Despite a focus on integer values of x and y here, the techniques covered sit within a more general case often
described as scalar multiplication: abstractly, x could be any object from a suitable structure (wlog. an integer,
meaning x ∈ Z) that is multiplied, while y is an integer scalar that does the multiplying.
In the case of addition, we covered several possible strategies with some associated trade-offs. This is
exacerbated with multiplication, where a much larger design space exists. Even so, the same approach2
is adopted: we again start by investigating the computation above from an algorithmic perspective, then
somehow translate this into a design for a circuit we can construct from logic gates.

1 Why write y · x rather than x · y, which would match addition for example?! Since multiplication is commutative, we could legitimately
use the operands either way around: it makes no difference to the result. Given the choice, we opt for y · x basically because it matches the
notation [y] x often used for more general scalar multiplication.
2 Note that we ignore various optimisations for squaring operations, i.e., a multiplication r = y · x where we know x = y so in fact r = x^2.
See, for example, [10, Chapter 12.5].


(a) Using operand scanning: partial products yj · xi are generated operand-by-operand, i.e., ordered by j then i.
(b) Using product scanning: partial products yj · xi are generated column-by-column, i.e., ordered by the weight j + i.

Figure 3.10: Two examples demonstrating different strategies for accumulation of base-b partial products resulting from
two 3-digit operands.

3.5.1 Introductory concepts and theory


3.5.1.1 Options for long-hand multiplication
Example 3.23. Consider the following unsigned, base-10 multiplication of x = 623(10) by y = 567(10) :

x  = 623(10)                   ↦           6 2 3
y  = 567(10)                   ↦           5 6 7 ×
p0 = 7 · 3 · 10^0 = 21(10)     ↦             2 1
p1 = 7 · 2 · 10^1 = 140(10)    ↦           1 4
p2 = 7 · 6 · 10^2 = 4200(10)   ↦         4 2
p3 = 6 · 3 · 10^1 = 180(10)    ↦           1 8
p4 = 6 · 2 · 10^2 = 1200(10)   ↦         1 2
p5 = 6 · 6 · 10^3 = 36000(10)  ↦       3 6
p6 = 5 · 3 · 10^2 = 1500(10)   ↦         1 5
p7 = 5 · 2 · 10^3 = 10000(10)  ↦       1 0
p8 = 5 · 6 · 10^4 = 300000(10) ↦     3 0
r  = 353241(10)                ↦ 3 5 3 2 4 1

The idea of long-hand multiplication is that to compute r = y · x (at the bottom) from x and y (at the top), we
generate and then sum a set of partial products (in the middle): each pi is generated by multiplying a digit
from y with a digit from x, which we term a digit-multiplication. Within this context, and multiplication in
general, we use the following definition:
Definition 3.3. The result of a digit-multiplication between x j and yi is said to be reweighted by the combined weight
of the digits being multiplied: if x j has weight j and yi has weight i, the result will have weight j + i.
Informally at least, this explains why each partial product is offset by some distance from the right-hand edge.
In the example above, note that

• y0 and x0 have weight 0, so p0 = y0 · x0 = 21(10) has weight 0 + 0 = 0,


• y1 and x1 have weight 1, so p4 = y1 · x1 = 12(10) has weight 1 + 1 = 2, and
• y2 and x1 have weight 2 and 1 respectively, so p7 = y2 · x1 = 10(10) has weight 2 + 1 = 3

st. p7 is offset (or left-shifted) by 3 digits and so weighted by 103 : during summation of the partial products, it
is representing y2 · x1 · 103 = 10000(10) .
The question then is how to generate and sum the partial products. It turns out there are (at least) two strategies for doing so. These are described in Figure 3.10, which highlights a difference wrt. how the digit-multiplications are managed. More specifically:

• The left-hand strategy is termed operand scanning, and is formalised by Algorithm 5. The idea is to loop through digits of x and y, accumulating each digit-multiplication into the relevant digit of the result r.
• The right-hand strategy is termed product scanning, and is formalised by Algorithm 6. The idea is to
loop through digits of the result r, so that when computing the i-th such digit ri we accumulate all relevant
digit-multiplications stemming from x and y.

git # 8b6da880 @ 2023-09-27 155


© Daniel Page ⟨dan@phoo.org⟩

Input: Two unsigned, n-digit, base-b integers x and y


Output: An unsigned, 2n-digit, base-b integer r = y · x
1 r←0
2 for j = 0 upto n − 1 step +1 do
3 c←0
4 for i = 0 upto n − 1 step +1 do
5 u · b + v = t ← y j · xi + r j+i + c
6 r j+i ← v
7 c←u
8 end
9 r j+n ← c
10 end
11 return r
Algorithm 5: An algorithm for multiplication of base-b integers using operand scanning.

Input: Two unsigned, n-digit, base-b integers x and y


Output: An unsigned, 2n-digit, base-b integer r = y · x
1 r ← 0, c0 ← 0, c1 ← 0, c2 ← 0
2 for k = 0 upto n + n − 1 step +1 do
3 for j = 0 upto n − 1 step +1 do
4 for i = 0 upto n − 1 step +1 do
5 if ( j + i) = k then
6 u · b + v = t ← y j · xi
7 c · b + c0 = t ← c0 + v
8 c · b + c1 = t ← c1 + u + c
9 c2 ← c2 + c
10 end
11 end
12 end
13 rk ← c0 , c0 ← c1 , c1 ← c2 , c2 ← 0
14 end
15 rn+n−1 ← c0
16 return r
Algorithm 6: An algorithm for multiplication of base-b integers using product scanning.
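To make the strategy concrete, the following is a minimal Python sketch of Algorithm 5; the function name, and the use of little-endian digit lists, are illustrative assumptions rather than anything mandated by the algorithm itself (Algorithm 6 differs only in iterating over digits of the result r, maintaining the three carry registers, rather than over digits of y):

def mul_operand_scanning(x, y, b):
    # x and y are n-digit base-b integers, held as lists with the
    # least-significant digit first; the result r has 2n digits
    n = len(x)
    assert len(y) == n
    r = [0] * (2 * n)
    for j in range(n):                       # loop over digits of y
        c = 0
        for i in range(n):                   # loop over digits of x
            t = y[j] * x[i] + r[j + i] + c   # digit-multiplication, then accumulate
            u, v = divmod(t, b)              # split t st. t = u * b + v
            r[j + i] = v                     # v is the retained digit ...
            c = u                            # ... u is the carry
        r[j + n] = c                         # flush the final carry of this row
    return r

assert mul_operand_scanning([3, 2, 6], [7, 6, 5], 10) == [1, 4, 2, 3, 5, 3]
# i.e., 623 * 567 = 353241, matching Example 3.24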


Example 3.24. Consider the following trace of Algorithm 5, which computes a base-10 operand scanning
multiplication for x = 623(10) and y = 567(10)

j   i   r                   c   yj  xi   t = yj · xi + rj+i + c   r′                  c′
        ⟨0, 0, 0, 0, 0, 0⟩
0   0   ⟨0, 0, 0, 0, 0, 0⟩  0   7   3    21                       ⟨1, 0, 0, 0, 0, 0⟩  2
0   1   ⟨1, 0, 0, 0, 0, 0⟩  2   7   2    16                       ⟨1, 6, 0, 0, 0, 0⟩  1
0   2   ⟨1, 6, 0, 0, 0, 0⟩  1   7   6    43                       ⟨1, 6, 3, 0, 0, 0⟩  4
0       ⟨1, 6, 3, 0, 0, 0⟩  4                                     ⟨1, 6, 3, 4, 0, 0⟩
1   0   ⟨1, 6, 3, 4, 0, 0⟩  0   6   3    24                       ⟨1, 4, 3, 4, 0, 0⟩  2
1   1   ⟨1, 4, 3, 4, 0, 0⟩  2   6   2    17                       ⟨1, 4, 7, 4, 0, 0⟩  1
1   2   ⟨1, 4, 7, 4, 0, 0⟩  1   6   6    41                       ⟨1, 4, 7, 1, 0, 0⟩  4
1       ⟨1, 4, 7, 1, 0, 0⟩  4                                     ⟨1, 4, 7, 1, 4, 0⟩
2   0   ⟨1, 4, 7, 1, 4, 0⟩  0   5   3    22                       ⟨1, 4, 2, 1, 4, 0⟩  2
2   1   ⟨1, 4, 2, 1, 4, 0⟩  2   5   2    13                       ⟨1, 4, 2, 3, 4, 0⟩  1
2   2   ⟨1, 4, 2, 3, 4, 0⟩  1   5   6    35                       ⟨1, 4, 2, 3, 5, 0⟩  3
2       ⟨1, 4, 2, 3, 5, 0⟩  3                                     ⟨1, 4, 2, 3, 5, 3⟩
        ⟨1, 4, 2, 3, 5, 3⟩
producing r = 353241(10) as expected.
Example 3.25. Consider the following trace of Algorithm 6, which computes a base-10 product scanning
multiplication for x = 623(10) and y = 567(10)

k   j   i   r                   c2  c1  c0   yj  xi   t = yj · xi   r′                  c′2  c′1  c′0
            ⟨0, 0, 0, 0, 0, 0⟩  0   0   0
0   0   0   ⟨0, 0, 0, 0, 0, 0⟩  0   0   0    7   3    21            ⟨0, 0, 0, 0, 0, 0⟩  0    2    1
0           ⟨0, 0, 0, 0, 0, 0⟩  0   2   1                           ⟨1, 0, 0, 0, 0, 0⟩  0    0    2
1   0   1   ⟨1, 0, 0, 0, 0, 0⟩  0   0   2    7   2    14            ⟨1, 0, 0, 0, 0, 0⟩  0    1    6
1   1   0   ⟨1, 0, 0, 0, 0, 0⟩  0   1   6    6   3    18            ⟨1, 0, 0, 0, 0, 0⟩  0    3    4
1           ⟨1, 0, 0, 0, 0, 0⟩  0   3   4                           ⟨1, 4, 0, 0, 0, 0⟩  0    0    3
2   0   2   ⟨1, 4, 0, 0, 0, 0⟩  0   0   3    7   6    42            ⟨1, 4, 0, 0, 0, 0⟩  0    4    5
2   1   1   ⟨1, 4, 0, 0, 0, 0⟩  0   4   5    6   2    12            ⟨1, 4, 0, 0, 0, 0⟩  0    5    7
2   2   0   ⟨1, 4, 0, 0, 0, 0⟩  0   5   7    5   3    15            ⟨1, 4, 0, 0, 0, 0⟩  0    7    2
2           ⟨1, 4, 0, 0, 0, 0⟩  0   7   2                           ⟨1, 4, 2, 0, 0, 0⟩  0    0    7
3   1   2   ⟨1, 4, 2, 0, 0, 0⟩  0   0   7    6   6    36            ⟨1, 4, 2, 0, 0, 0⟩  0    4    3
3   2   1   ⟨1, 4, 2, 0, 0, 0⟩  0   4   3    5   2    10            ⟨1, 4, 2, 0, 0, 0⟩  0    5    3
3           ⟨1, 4, 2, 0, 0, 0⟩  0   5   3                           ⟨1, 4, 2, 3, 0, 0⟩  0    0    5
4   2   2   ⟨1, 4, 2, 3, 0, 0⟩  0   0   5    5   6    30            ⟨1, 4, 2, 3, 0, 0⟩  0    3    5
4           ⟨1, 4, 2, 3, 0, 0⟩  0   3   5                           ⟨1, 4, 2, 3, 5, 0⟩  0    0    3
            ⟨1, 4, 2, 3, 5, 0⟩  0   0   3                           ⟨1, 4, 2, 3, 5, 3⟩  0    0    3
            ⟨1, 4, 2, 3, 5, 3⟩
producing r = 353241(10) as expected.
Notice that given n-digit x and y, we produce a larger 2n-digit product r = y · x; this can be rewritten as

y · x = r1 · b^n + r0

to show the 2n-digit r can be considered as two n-digit halves of the same size as x and y. The reason for
doing so is to stress the fact that although we typically want to compute r, sometimes it is enough to compute
r0 : this so-called truncated multiplication basically just discards r1 , the n most-significant digits of r (or does
not compute them in the first place), and retains r0 .

3.5.1.2 Multiplication as repeated addition


In Section 3.3.1, the study of long-hand addition led naturally to an implementable design: the ripple-carry
adder in Figure 3.2 is a very direct translation of Algorithm 1. It is harder to make the same claim here, in the
sense there is no (or at least a lot less of an) obvious route from one to the other. This suggests taking a step
back to rethink what multiplication actually means.
At a more fundamental level than the long-hand approaches described, one can view multiplication as just
repeated addition. Put another way,
r = y · x = x + x + · · · + x + x    (y terms)


Input: Two unsigned, n-digit, base-b integers x and y


Output: An unsigned, 2n-digit, base-b integer r = y · x
1 r←0
2 for i = 0 upto y − 1 step +1 do
3 r←r+x
4 end
5 return r
Algorithm 7: An algorithm for multiplication, using repeated addition (with y treated as any integer).

Input: Two unsigned, n-digit, base-b integers x and y


Output: An unsigned, 2n-digit, base-b integer r = y · x
1 r←0
2 for i = 0 upto n − 1 step +1 do
3 r ← r + yi · x · b^i
4 end
5 return r
Algorithm 8: An algorithm for multiplication, using repeated addition (with y written in base-b).

st. if we select y = 14(10) , then we obviously have

r = 14 · x = x + x + x + x + x + x + x + x + x + x + x + x + x + x.

This is important because we already covered how to compute an addition, plus how to design associated
circuits. So to compute a multiplication, we essentially just need to reuse our addition circuit in the right way:
Algorithm 7 states the obvious, in the sense it captures this idea by simply adding x to r (which is initialised to
0) in a loop that iterates y times. Directly using repeated addition is unattractive, however, since the number of operations performed relates to the magnitude of y. That is, we need y − 1 operations3 in total, so for some n-bit y we perform O(2^n) operations: this grows quickly, even for modest values of n (say n = 32).
Fortunately, improvements are easy to identify. Another way to look at the multiplication of x by y is as inclusion of an extra weight to the digits that describe y. That is, writing y in base-b yields

r = y · x = ( Σ_{i=0}^{n−1} yi · b^i ) · x
          =   Σ_{i=0}^{n−1} yi · x · b^i

Example 3.26. Consider a base-2 multiplication of x by y = 14(10) ↦ 1110(2) , which we can expand into a sum of n = 4 terms as follows:

y · x = y0 · x · 2^0 + y1 · x · 2^1 + y2 · x · 2^2 + y3 · x · 2^3
      =  0 · x · 2^0 +  1 · x · 2^1 +  1 · x · 2^2 +  1 · x · 2^3
      =  0 · x       +  2 · x       +  4 · x       +  8 · x
      = 14 · x

Intuitively, this should already seem more attractive: there are only n terms (relating to the n digits in y) in
the summation so we only need n − 1, or O(n), operations to compute their sum. Using a similar format to
Algorithm 7, this is formalised by Algorithm 8. However, a problem lurks: line #3 of the algorithm reads

r ← r + yi · x · b^i

or, put another way, our goal is to compute a multiplication but each step in doing so needs a further two
multiplications itself! This is a chicken-and-egg style problem, but can be resolved by our selecting b = 2:

1. Multiplying x by yi can be achieved without a multiplication: given we know yi ∈ {0, 1, . . . b − 1} = {0, 1},
if yi = 0 then yi · x = 0, and if yi = 1 then yi · x = x. Put simply, we make a choice between 0 and x using yi
rather than multiply x by yi .
3 Why y − 1 and not y: Algorithm 7 certainly performs y iterations of the loop! If you think about it, although doing so would make it more complicated, the algorithm could be improved given we know the first addition (i.e., when i = 0) will add x to 0. Put another way, we could avoid this initial addition and simply initialise r ← x and perform one less iteration (i.e., y − 1 vs. y). Although the difference is minor, and so a distraction from the main argument here, you can see this more easily by counting the number of + operators in the expansion above: for y = 14 we have 13 such additions.


2. Multiplying r by 2 can be achieved without a multiplication: clearly 2 · r = r + r = r ≪ 1, so we use a shift (or, if you prefer, an addition) instead.

So, in short, these facts mean the two multiplications in line #3 are pseudo-multiplications (or “fake” multipli-
cations) because we can replace them with a non-multiplication equivalent whenever b = 2.
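As a minimal Python sketch of this specialisation (the function name is illustrative), Algorithm 8 with b = 2 collapses into a loop of selects and shifts; note that neither pseudo-multiplication uses a genuine multiply:

def mul_shift_add(x, y, n):
    # computes r = y * x for an n-bit multiplier y, using only a
    # 1-bit select (y_i is 0 or 1) and left-shifts
    r = 0
    for i in range(n):
        yi = (y >> i) & 1        # extract the digit y_i
        if yi == 1:              # y_i * x is a choice between 0 and x ...
            r = r + (x << i)     # ... and x * 2^i is a left-shift
    return r

assert mul_shift_add(6, 14, 4) == 84   # 14 * 6, per Example 3.26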

3.5.1.3 A high-level overview of the design space

Although the reformulations above might not seem useful yet, they represent an important starting point from
which we can later construct various concrete algorithms and associated designs and implementations. Within
the large design space of all possible options, we will focus on a selection summarised as follows:

more time                                                      less time
less space                                                     more space
<----------------------------------------------------------------------->

iterative,         iterative,         combinatorial,     combinatorial,
bit-serial         digit-serial       digit-parallel     bit-parallel

Section 3.5.2      Section 3.5.3             Section 3.5.4

You can think of options within this design space in a similar way to Section 3.4, where, for example, we overviewed options for the shift operation. Iterative options for multiplication typically deal with one (or at least a few) partial products in each step; many (i.e., more than 1) steps, and hence more time, will be required to compute the result, but less space is required to do so (essentially because less computation is performed per step). Combinatorial options make the opposite trade-off, requiring just a single step to compute the result. However, a) the critical path, and so the time said step takes due to the associated delay, and b) the space required are typically both large(r). Unlike shift operations, a clear separation between iterative and combinatorial options is harder to make; trade-offs that blur the boundaries between some options are attractive, and explored where relevant.

3.5.1.4 Decomposition (or divide-and-conquer) techniques

Consider two designs which compute r = y · x. Irrespective of how they compute r, they differ wrt. the limits
placed on x and y: the first design can deal with n-bit y and x, whereas the second can only deal with smaller,
m-bit values (st. m < n).
Within this context, consider the specific case of m = n/2 (in which we assume n is even). As such, we can split x and y into two parts, i.e., write

x = x1 · 2^(n/2) + x0
y = y1 · 2^(n/2) + y0

where each xi and yi is an (n/2)-bit integer. Likewise, we can write

r = r2 · 2^n + r1 · 2^(n/2) + r0

where
r2 = y1 · x1
r1 = y1 · x0 + y0 · x1
r0 = y0 · x0

and st. working through the multiplication as follows

r = y · x = (y1 · 2^(n/2) + y0) · (x1 · 2^(n/2) + x0)
          = (y1 · 2^(n/2) · x1 · 2^(n/2)) + (y1 · 2^(n/2) · x0) + (y0 · x1 · 2^(n/2)) + (y0 · x0)
          = (y1 · x1 · 2^n) + (y1 · x0 · 2^(n/2)) + (y0 · x1 · 2^(n/2)) + (y0 · x0)
          = (y1 · x1) · 2^n + (y1 · x0 + y0 · x1) · 2^(n/2) + (y0 · x0)
          = r2 · 2^n + r1 · 2^(n/2) + r0

demonstrates the result is correct. The more general, underlying idea is we decompose the single, larger n-bit multiplication into several, smaller (n/2)-bit multiplications: in this case, we compute the larger n-bit product r using four (n/2)-bit multiplications (plus several auxiliary additions). In a sense, this is an instance of divide-
and-conquer, a strategy often used in the design of algorithms: sorting algorithms such as merge-sort and


quick-sort, for example, will decompose the problem of sorting a larger sequence into that of sorting several
smaller sequences. The Karatsuba-Ofman [8] (re)formulation4 offers further improvement, by first computing
t2 = y1 · x1
t1 = (y0 + y1 ) · (x0 + x1 )
t0 = y0 · x0
then rewrites the terms of r as
r2 = t2
r1 = t1 − t0 − t2
r0 = t0
Doing so requires three (n/2)-bit multiplications (although now there are more auxiliary additions and/or subtractions). This suggests a general trade-off: we could consider performing fewer, larger n-bit multiplications or more, smaller (n/2)-bit multiplications. If we accept the premise that designs for (n/2)-bit multiplication will be inherently less complex than an n-bit equivalent, this leads us to adopt one of (at least) two approaches:
1. instantiate and operate several smaller multipliers in parallel (e.g., compute y1 · x1 at the same time as
y0 · x0 ) in an attempt to reduce the overall latency,
2. instantiate and reuse one smaller multiplier (e.g., first compute y1 · x1 then y0 · x0 ) in an attempt to reduce the overall area.
Although this can be useful in the sense it widens the design space of options, making a decision whether the
original monolithic approach or the decomposed approach is better wrt. some metric can be quite subtle (and
depend delicately on the concrete value of n).
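A minimal Python sketch of both decompositions follows, assuming n is even and that Python's native * stands in for the (n/2)-bit multiplier component; the function names are illustrative:

def mul_decomposed(x, y, n):
    # one larger n-bit product via four (n/2)-bit multiplications
    h = n // 2
    mask = (1 << h) - 1
    x0, x1 = x & mask, x >> h        # x = x1 * 2^(n/2) + x0
    y0, y1 = y & mask, y >> h        # y = y1 * 2^(n/2) + y0
    r2 = y1 * x1
    r1 = y1 * x0 + y0 * x1
    r0 = y0 * x0
    return (r2 << n) + (r1 << h) + r0

def mul_karatsuba(x, y, n):
    # the Karatsuba-Ofman variant: three (n/2)-bit multiplications,
    # at the cost of extra additions and subtractions
    h = n // 2
    mask = (1 << h) - 1
    x0, x1 = x & mask, x >> h
    y0, y1 = y & mask, y >> h
    t2 = y1 * x1
    t1 = (y0 + y1) * (x0 + x1)
    t0 = y0 * x0
    return (t2 << n) + ((t1 - t0 - t2) << h) + t0

assert mul_decomposed(623, 567, 10) == mul_karatsuba(623, 567, 10) == 623 * 567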

3.5.1.5 Multiplier recoding techniques


The use of multiplier recoding provides a broad set of strategies for improving both iterative and combinatorial
designs; we focus on examples of the former, but keep in mind the general principles can be applied to both.
The underlying idea is to
1. spend some effort before multiplication to recode (or transform) y into some equivalent y′ , then
2. be more efficient during multiplication by using y′ as the multiplier rather than y.
This is a rough description, however, because simple (enough) recoding may be possible during multiplication
rather than strictly beforehand.
In simple terms, recoding y means using a different representation: we represent the same value, but in a
way that allows some sort of advantage during multiplication.
Example 3.27. Consider y = 30(10) , here written in base-10. Among many options, some alternative representa-
tions for this value are
y = 30(10)
  ↦ ⟨0, 1, 1, 1, 1, 0, 0, 0⟩(2)
  ↦ ⟨2, 3, 1, 0⟩(4)
  ↦ ⟨0, −1, 0, 0, 0, +1, 0, 0⟩(2)
  ↦ ⟨−2, 0, 2, 0⟩(4)
The first case is y represented in base-2, which is somewhat obvious given what we already know; after this, the cases less obviously represent the same value. However, it is important to see why they are equivalent. The second case requires a larger digit set: each yi ∈ {0, 1, 2, 3} as a result of it using a base-4 representation, but, even so, we still have
⟨2, 3, 1, 0⟩(4) ↦ 2 · 4^0 + 3 · 4^1 + 1 · 4^2 + 0 · 4^3
                = 2 + 12 + 16
                = 30
The third and fourth cases use signed digit sets; for example, in the third case each yi ∈ {−1, 0, +1}. Again, we
still have
⟨0, −1, 0, 0, 0, +1, 0, 0⟩(2) ↦ 0 · 2^0 − 1 · 2^1 + 0 · 2^2 + 0 · 2^3 + 0 · 2^4 + 1 · 2^5 + 0 · 2^6 + 0 · 2^7
                              = −2 + 32
                              = 30
It might not be immediately clear why these representations offer any advantage. Intuitively, however, note
two things:
4 Other extensions and generalisations also exist. For example, we could apply the strategy recursively (i.e., decomposing the (n/2)-bit multiplications in a similar way), or attempt other forms of split (st. x and y are split into say 3 parts, rather than 2 as above).


Input: Two unsigned, n-bit, base-2 integers x and y


Output: An unsigned, 2n-bit, base-2 integer r = y · x
1 r←0
2 for i = n − 1 downto 0 step −1 do
3 r←2·r
4 if yi = 1 then
5 r←r+x
6 end
7 end
8 return r
Algorithm 9: An algorithm for multiplication of base-2 integers using an iterative, left-to-right, bit-serial strategy.

Input: Two unsigned, n-bit, base-2 integers x and y


Output: An unsigned, 2n-bit, base-2 integer r = y · x
1 r←0
2 for i = 0 upto n − 1 step +1 do
3 if yi = 1 then
4 r←r+x
5 end
6 x←2·x
7 end
8 return r
Algorithm 10: An algorithm for multiplication of base-2 integers using an iterative, right-to-left, bit-serial strategy.

1. the first case requires eight base-2 digits to represent a given y, but the fourth case can do the same with only four, and

2. the first case requires four non-zero base-2 digits to represent this y, but the fourth case can do the same with only two.

In short, features such as these, when generalised, allow more efficient strategies (in time and/or space) for
multiplication using y′ than y.
Whatever representation we select, however, it is crucial that any overhead related to producing and using the recoded y′ is always less than the associated improvement during multiplication. Put another way, if the improvement is small and the overhead is large, then overall we are worse off: we may as well not use recoding at all! This requires careful analysis, for example to judge the relative merits of a specific recoding
strategy given a specific n.

3.5.2 Iterative, bit-serial designs


Definition 3.4. As originally written, Horner's Rule [7] is a method for evaluating polynomials: it states that a polynomial a(x) can be written as

a0 + a1 · x + · · · + an−1 · x^(n−1) ≡ a0 + x · (a1 + x · (· · · + x · (an−1)))

where the RHS factors out powers of the indeterminate x from the LHS.

This fact provides a starting point for an iterative multiplier design. Consider the similarity between a polynomial

a(x) = Σ_{i=0}^{i<n} ai · x^i

and an integer y represented using a positional number system

y = Σ_{i=0}^{i<n} yi · b^i .


[Figure 3.11 (circuit diagram, not reproduced): a register holding r feeds a 1-bit left-shift component and an adder; a multiplexer controlled by yi selects between 2 · r and 2 · r + x to form the next value r′ , which is latched back into the register.]

Figure 3.11: An iterative, bit-serial design for (n × n)-bit multiplication described using a circuit diagram.

Put simply, there is no difference wrt. the form: only the names of variables are changed, plus b represents an implicit parameter in the latter whereas x is an explicit indeterminate in the former. As a result, we can consider a similar way of evaluating

y · x = Σ_{i=0}^{i<n} yi · x · b^i

which again has the same form.

Example 3.28. Consider a base-2 multiplication of x by y = 14(10) ↦ 1110(2) . As previously stated we would write

y · x = Σ_{i=0}^{i<n} yi · x · b^i
      = y0 · x · 2^0 + y1 · x · 2^1 + y2 · x · 2^2 + y3 · x · 2^3

but now this can be rewritten using Horner's Rule as

y·x = y0 · x + 2 · ( y1 · x + 2 · ( y2 · x + 2 · ( y3 · x + 2 · ( 0 ) ) ) )
= 0·x+2·( 1·x+2·( 1·x+2·( 1·x+2·(0))))
= 0·x+2·( 1·x+2·( 1·x+2·( 1·x+ 0 )))
= 0·x+2·( 1·x+2·( 1·x+2·( 1·x )))
= 0·x+2·( 1·x+2·( 1·x+ 2·x ))
= 0·x+2·( 1·x+2·( 3·x ))
= 0·x+2·( 1·x+ 6·x )
= 0·x+2·( 7·x )
= 0·x+ 14 · x
= 14 · x

There are two sane approaches to evaluate the bracketed expression: we either

1. work inside-out, starting with the inner-most sub-expression and processing y from most- to least-
significant bit (i.e., from yn−1 to y0 ), meaning they are read left-to-right, or

2. work outside-in, starting with the outer-most sub-expression and processing y from least- to most-
significant bit (i.e., from y0 to yn−1 ), meaning they are read right-to-left.

Either way, note that each successive multiplication by 2 eventually accumulates to produce each 2^i . Using y3 · x as an example, we see it multiplied by 2 a total of three times: this means we end up with

2 · (2 · (2 · (y3 · x))) = y3 · x · 2^3 ,

and hence the original term required. Putting everything together, to compute the result we maintain an accumulator r that holds the current (or partial) result during evaluation; using r, the computation could be
described as

• start with the inner sub-expression, initially setting r = 0, then

• to realise each step of evaluation, apply a simple 2-part rule: first double the accumulated result (i.e., set
r to 2 · r), then add yi · x to the accumulated result (i.e., set r to r + yi · x).


Slightly more formally this implies iterative application of the rule

r ← yi · x + 2 · r

which is further formalised by Algorithm 9: notice that line #1 realises the first point above while lines #3 to #6
realise the second point, with a loop spanning lines #2 to #7 iterating over them to realise each step. Although
we will continue to focus on this approach, it is interesting to note, as an aside, that Algorithm 10 will yield the
same result.

Example 3.29. Consider the following trace of Algorithm 9, for y = 14(10) ↦ 1110(2) :

i   r      yi   r′
    0
3   0      1    x       r′ ← 2 · r + x
2   x      1    3·x     r′ ← 2 · r + x
1   3·x    1    7·x     r′ ← 2 · r + x
0   7·x    0    14·x    r′ ← 2 · r
    14·x

Algorithm 9 is termed the left-to-right variant, since it processes y from the most- down to the least-significant
bit (i.e., starting with yn−1 , on the left-hand end of y when written as a literal).

Example 3.30. Consider the following trace of Algorithm 10, for y = 14(10) ↦ 1110(2) :

i   r      x      yi   r′      x′
    0      x
0   0      x      0    0       2·x     x′ ← 2 · x
1   0      2·x    1    2·x     4·x     r′ ← r + x, x′ ← 2 · x
2   2·x    4·x    1    6·x     8·x     r′ ← r + x, x′ ← 2 · x
3   6·x    8·x    1    14·x    16·x    r′ ← r + x, x′ ← 2 · x
    14·x

Algorithm 10 is termed the right-to-left variant, since it processes y from the least- up to the most-significant
bit (i.e., starting with y0 , on the right-hand end of y when written as a literal).
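Both variants are captured by the following minimal Python sketch (function names are illustrative), which mirrors the traces above:

def mul_bitserial_l2r(x, y, n):        # Algorithm 9
    r = 0
    for i in range(n - 1, -1, -1):     # process y from MSB down to LSB
        r = 2 * r                      # first double the accumulator ...
        if (y >> i) & 1:
            r = r + x                  # ... then conditionally add x
    return r

def mul_bitserial_r2l(x, y, n):        # Algorithm 10
    r = 0
    for i in range(n):                 # process y from LSB up to MSB
        if (y >> i) & 1:
            r = r + x
        x = 2 * x                      # double x rather than r
    return r

assert mul_bitserial_l2r(6, 14, 4) == mul_bitserial_r2l(6, 14, 4) == 84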

Whereas the left-to-right variant only updates r, the right-to-left alternative updates r and x; this may be deemed
an advantage for the former, since we only need one register (at least one that is updated in any way, vs. simply
a fixed input) rather than two. Beyond this, however, how does either strategy compare to the approach based on repeated addition, which took O(2^n) operations in the worst case? In both algorithms, the number of operations performed is dictated by the number of loop iterations: using Algorithm 9 as an example, in each iteration we a) always perform a shift to compute r ← 2 · r, then b) conditionally perform an addition to compute r ← r + x (which will be required in half the iterations on average, assuming a random y). In other words we perform O(n) operations, which is now dictated by the size of y (say n = 8 or n = 32) rather than the magnitude of y (say 2^n = 2^8 = 256 or 2^n = 2^32 = 4294967296) as it was before.
Whether we use Algorithm 9 or Algorithm 10, the general strategy is termed bit-serial multiplication because we use the 1-bit value yi in each iteration; the remaining challenge is to translate this strategy into a
concrete design we can implement. We did something similar by translating Algorithm 3 into an iterative
design for left-shift in Figure 3.8, so can adopt the same idea here: Figure 3.11 outlines a (partial) design that
implements the loop body (in lines #3 to #6) of Algorithm 9. Notice that, as before,

• the left-hand side shows a register to store r (i.e., the current value of r at the start of the loop body),

• the right-hand side shows a register to store r′ (i.e., the next value of r at the end of the loop body), and

• the middle shows some combinatorial logic that computes r′ from r: this is more complex than the
left-shift case, but the idea is that a) the 1-bit left-shift component computes r ≪ 1 = 2 · r, then b) the
multiplexer component selects between 2 · r and 2 · r + x (the latter of which is computed by an adder)
depending on yi .

To control this data-path, we again need an FSM: in each i-th step it takes r′ (representing yi · x + 2 · r, per the above) and latches it back into r ready for the (i + 1)-th step. A similar trade-off is again evident, in the sense that although we only need an adder and multiplexer (plus a register for r and the FSM), the result will be computed after n steps.


Input: Two unsigned, n-bit, base-2 integers x and y, an integer digit size d
Output: An unsigned, 2n-bit, base-2 integer r = y · x
1 r←0
2 for i = n − 1 downto 0 step −d do
3 r ← 2^d · r
4 if yi...i−d+1 ≠ 0 then
5 r ← r + yi...i−d+1 · x
6 end
7 end
8 return r
Algorithm 11: An algorithm for multiplication of base-2 integers using an iterative, left-to-right, digit-serial strategy.

Input: Two unsigned, n-bit, base-2 integers x and y, an integer digit size d
Output: An unsigned, 2n-bit, base-2 integer r = y · x
1 r←0
2 for i = 0 upto n − 1 step +d do
3 if yi+d−1...i ≠ 0 then
4 r ← r + yi+d−1...i · x
5 end
6 x ← 2^d · x
7 end
8 return r
Algorithm 12: An algorithm for multiplication of base-2 integers using an iterative, right-to-left, digit-serial strategy.

3.5.3 Iterative, digit-serial designs


3.5.3.1 Improvements via standard digit-serial multiplication: using an unsigned digit set
By definition, a bit-serial multiplier processes a 1-bit digit of y in each of n steps. However, this actually represents a special case of more general digit-serial multiplication: a digit size d is selected, and used to recode y by splitting it into d-bit digits, each processed in one of n/d steps (noting that d = 1 is the special case referred to above). Provided d divides n, extracting each d-bit digit from y is easy: by writing y in binary, recoding it to form y′ means splitting the sequence of bits into d-element sub-sequences.

Example 3.31. Consider d = 2, and some y st. n = 4: this implies we process n/d = 4/2 = 2 digits in y, each of 2 bits. Based on what we covered originally, we already know, for example, that in base-2

r = y · x = ( Σ_{i=0}^{n−1} yi · 2^i ) · x
          =   Σ_{i=0}^{n−1} yi · x · 2^i
          = y0 · x · 2^0 + y1 · x · 2^1 + y2 · x · 2^2 + y3 · x · 2^3
          = y0 · x + 2 · (y1 · x + 2 · (y2 · x + 2 · (y3 · x + 2 · (0))))

The only change is to combine y0 and y1 into a single digit whose value is y0 + 2 · y1 ; this is basically just treating the two 1-bit digits in y as one 2-bit digit. By doing so, we can rewrite the expression as follows:

r = y · x = (y0 + 2 · y1) · x · 2^0 + (y2 + 2 · y3) · x · 2^2
          = y1...0 · x · 2^0 + y3...2 · x · 2^2
          = y1...0 · x + 2^2 · (y3...2 · x + 2^2 · (0))

The term y1...0 should be read as "the bits of y from 1 down to 0 inclusive", so clearly y1...0 ∈ {0, 1, 2, 3}. As such, consider a base-2 multiplication of x by y = 14(10) ↦ 1110(2) :

r = y · x = y1...0 · x + 2^2 · (y3...2 · x + 2^2 · (0))
          = 10(2) · x + 2^2 · (11(2) · x + 2^2 · (0))
          = 2 · x + 2^2 · (3 · x + 2^2 · (0))
          = 2 · x + 12 · x
          = 14 · x


To implement this new strategy, however, Algorithm 9 needs to be generalised for any d. Recall that for the special case of d = 1, we already saw and used a rule

r ← yi · x + 2 · r.

Looking at the example above, a similar form

r ← yi...i−d+1 · x + 2^d · r

can be identified, which differs slightly in both left- and right-hand terms of the addition.

• The right-hand term is simple to accommodate. Rather than multiply r by 2 as before, we now multiply it by 2^d ; we already know this can be realised by left-shifting r by a distance of d, i.e., computing 2^d · r ≡ r ≪ d.

• The left-hand term is more tricky. For d = 1, we needed to compute yi · x but argued doing so was essentially a choice: because yi ∈ {0, 1}, the result is either 0 or x. Now, each d-bit digit

yi...i−d+1 ∈ {0, 1, . . . , 2^d − 1}

could be any one of 2^d values rather than 2^1 = 2, so either a) the choice is more involved, i.e., includes more cases, or b) we abandon the idea of it being a choice at all, instead using a combinatorial (d × n)-bit multiplier to compute yi...i−d+1 · x directly (related designs are covered in Section 3.5.4); you can view this component as replacing the multiplexer shown in Figure 3.11, which, by analogy, realised the (1 × n)-bit multiplication yi · x.

Making these changes yields Algorithm 11. Note that line #4 could be implemented via either option above, and that, as outlined above, extracting the digit yi...i−d+1 from y is simple enough that we view it as happening during multiplication, ignoring the need to formally recode y into y′ beforehand. Either way, a clear advantage is already evident: we now require n/d steps to compute the result.
Example 3.32. Consider the following trace of Algorithm 11, for y = 14(10) ↦ 1110(2) :

i   r      yi...i−d+1       r′
    0
3   0      11(2) = 3(10)    3·x     r′ ← 2^2 · r + 3 · x
1   3·x    10(2) = 2(10)    14·x    r′ ← 2^2 · r + 2 · x
    14·x

Assuming use of a combinatorial (d × n)-bit multiplier, one way to think about a digit-serial multiplier is as a hybrid combination of iterative and combinatorial designs: it is iterative, in that it performs n/d steps, but now
each i-th such step utilises a (d × n)-bit combinatorial multiplier component. Given we can select d, the hybrid
can be configured to make a trade-off between time and space: larger d implies fewer steps of computation but
also a larger combinatorial multiplier, and vice versa.
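A minimal Python sketch of Algorithm 11 follows, assuming d divides n and using Python's native * to stand in for the (d × n)-bit multiplier component; the function name is illustrative:

def mul_digitserial_l2r(x, y, n, d):
    r = 0
    for i in range(n - d, -1, -d):          # most-significant digit first
        r = r << d                          # r <- 2^d * r
        digit = (y >> i) & ((1 << d) - 1)   # extract the d-bit digit y_{i+d-1...i}
        if digit != 0:
            r = r + digit * x               # the (d x n)-bit multiplication
    return r

assert mul_digitserial_l2r(6, 14, 4, 2) == 84   # matches Example 3.32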

3.5.3.2 Improvements via Booth multiplication: using a signed digit set


A question: what is the most efficient way to compute r = 15 · x, i.e., r = y · x for the fixed multiplier y = 15?
We already know that left-shifting x by a fixed distance requires no computation, so a reasonable first answer
might be to compute

r = 15 · x = 8 · x + 4 · x + 2 · x + 1 · x
           = 2^3 · x + 2^2 · x + 2^1 · x + 2^0 · x
           = x ≪ 3 + x ≪ 2 + x ≪ 1 + x ≪ 0

A better strategy exists however: remember that computing a subtraction is (more or less) as easy as an
addition, so we might instead opt for

r = 15 · x = 16 · x − 1 · x
           = 2^4 · x − 2^0 · x
           = x ≪ 4 − x ≪ 0

Intuitively, this latter strategy should seem preferable given we only sum two terms rather than four. Booth
recoding [2] is a standard recoding-based strategy for multiplication which generalises this example. Although
various versions of the approach are considered in what follows, the advantages they all offer stem from use
of a signed representation of y and hence use of addition and subtraction operations.


Original, base-2 Booth recoding

Definition 3.5. Given a binary sequence y, a run of 1 (resp. 0) bits between i and j means yk = 1 (resp. yk = 0) for i ≤ k ≤ j; in simple terms, this means there is a sub-sequence of consecutive bits in y whose value is 1 (resp. 0).

Example 3.33. Consider y = 30(10) ↦ 00011110(2) : we can clearly identify

• a run of one 0 bit between i = 0 and j = 0, i.e., yk = 0 for 0 ≤ k ≤ 0,

• a run of four 1 bits between i = 1 and j = 4, i.e., each yk = 1 for 1 ≤ k ≤ 4, and

• a run of three 0 bits between i = 5 and j = 7, i.e., each yk = 0 for 5 ≤ k ≤ 7.

As a starting point we consider base-2 Booth recoding; the idea is to identify a run of 1 bits in y between i and j, and then replace it with a +1 digit at position j + 1 and a −1 digit at position i, st. the run's contribution is represented by the value 2^(j+1) − 2^i .

Example 3.34. Consider y = 30(10) ↦ 00011110(2) : since there is a run of four 1 bits between i = 1 and j = 4, and the fact that

2^(j+1) − 2^i = 2^(4+1) − 2^1 = 2^5 − 2^1 = 30,

we can recode

y = 30(10) ↦ 2^4 + 2^3 + 2^2 + 2^1 ↦ ⟨0, +1, +1, +1, +1, 0, 0, 0⟩(2)

into

y′ = ⟨0, −1, 0, 0, 0, +1, 0, 0⟩(2) ↦ −2^1 + 2^5 ↦ 30(10)

which clearly still represents the same value (albeit now via a signed digit set, st. y′i ∈ {0, ±1} vs. yi ∈ {0, 1}).
Using the same intuition as previously, the recoded y′ is preferable to y because it has a lower weight (i.e., number of non-zero digits). We can see the impact this feature has by illustrating how such a y′ might be used during multiplication. Given x = 6(10) ↦ 00000110(2) , for example, we would normally compute r = y · x as

x  = 6(10)  ↦ 00000110(2)
y  = 30(10) ↦ 00011110(2)   ×
p0 =  0 · x · 2^0 = 0(10)
p1 = +1 · x · 2^1 = +12(10)
p2 = +1 · x · 2^2 = +24(10)
p3 = +1 · x · 2^3 = +48(10)
p4 = +1 · x · 2^4 = +96(10)
p5 =  0 · x · 2^5 = 0(10)
p6 =  0 · x · 2^6 = 0(10)
p7 =  0 · x · 2^7 = 0(10)
r  = 180(10) ↦ 0000000010110100(2)

and thus accumulate four non-zero partial products. However, by first recoding y into y′ we find

x     = 6(10)  ↦ 00000110(2)
y     = 30(10) ↦ 00011110(2)   ×
y′(2) = 30(10) ↦ 0 0 +1 0 0 0 −1 0
p0 =  0 · x · 2^0 = 0(10)
p1 = −1 · x · 2^1 = −12(10)
p2 =  0 · x · 2^2 = 0(10)
p3 =  0 · x · 2^3 = 0(10)
p4 =  0 · x · 2^4 = 0(10)
p5 = +1 · x · 2^5 = +192(10)
p6 =  0 · x · 2^6 = 0(10)
p7 =  0 · x · 2^7 = 0(10)
r  = 180(10) ↦ 0000000010110100(2)

which requires accumulation of two non-zero partial products.


Modified, base-4 Booth recoding A base-2 Booth recoding already seems to produce what we want. However,
there is a subtle problem: using y′ does not always yield an improvement over y itself. This can be demonstrated
by example:
Example 3.35. Consider x = 6(10) ↦ 00000110(2) and y = 5(10) ↦ 00000101(2) , for which, based on recoding y, we would compute r = y · x as

x     = 6(10) ↦ 00000110(2)
y     = 5(10) ↦ 00000101(2)   ×
y′(2) = 5(10) ↦ 0 0 0 0 +1 −1 +1 −1
p0 = −1 · x · 2^0 = −6(10)
p1 = +1 · x · 2^1 = +12(10)
p2 = −1 · x · 2^2 = −24(10)
p3 = +1 · x · 2^3 = +48(10)
p4 =  0 · x · 2^4 = 0(10)
p5 =  0 · x · 2^5 = 0(10)
p6 =  0 · x · 2^6 = 0(10)
p7 =  0 · x · 2^7 = 0(10)
r  = 30(10) ↦ 0000000000011110(2)

This requires accumulation of four non-zero partial products, as would be the case if using y as is.
Based on the original Booth recoding as a first step, to resolve this problem we employ a second recoding step
based on the y′ (2) we already have:

1. reading y′ right-to-left, group the recoded digits into pairs of the form (y′i , y′i+1 ), then
2. treat each pair as a single digit whose value is y′i + 2 · y′i+1 per

y′i = 0     y′i+1 = 0     ↦   0
y′i = +1    y′i+1 = 0     ↦   +1
y′i = −1    y′i+1 = 0     ↦   −1
y′i = 0     y′i+1 = +1    ↦   +2
y′i = +1    y′i+1 = +1    ↦   not possible
y′i = −1    y′i+1 = +1    ↦   +1
y′i = 0     y′i+1 = −1    ↦   −2
y′i = +1    y′i+1 = −1    ↦   −1
y′i = −1    y′i+1 = −1    ↦   not possible

meaning that y′i+1 has twice the weight of y′i .

Given we originally had a signed base-2 recoding of y, we now have a signed base-4 recoding of the same y (termed the modified Booth recoding): each pair represents a digit in {0, ±1, ±2}. Note that the two invalid (or impossible) pairs exist because of the original Booth recoding: we cannot encounter them, because the first recoding step will have already eliminated the associated run.
Example 3.36. Consider x = 6(10) ↦ 00000110(2) and y = 5(10) ↦ 00000101(2) ; based on the modified recoding, we would compute r = y · x as

x     = 6(10) ↦ 00000110(2)
y     = 5(10) ↦ 00000101(2)   ×
y′(2) = 5(10) ↦ 0 0 0 0 +1 −1 +1 −1
y′(4) = 5(10) ↦ +1 +1
p0 = +1 · x · 2^0 = +6(10)
p2 = +1 · x · 2^2 = +24(10)
p4 =  0 · x · 2^4 = 0(10)
p6 =  0 · x · 2^6 = 0(10)
r  = 30(10) ↦ 0000000000011110(2)

and this accumulates two non-zero partial products rather than four.


Input: An unsigned, n-bit, base-2 integer y
Output: A base-2 Booth recoding y′ of y
1 y′ ← ∅
2 for i = 0 upto n step 2 do
3    if (i − 1) < 0 then t0 ← 0 else t0 ← yi−1
4    if i ≥ n then t1 ← 0 else t1 ← yi
5    if (i + 1) ≥ n then t2 ← 0 else t2 ← yi+1
6    with t = t2 t1 t0 , set

         y′i   ←   0 if t ∈ { 000(2) , 011(2) , 100(2) , 111(2) }
                  +1 if t ∈ { 001(2) , 101(2) }
                  −1 if t ∈ { 010(2) , 110(2) }

         y′i+1 ←   0 if t ∈ { 000(2) , 001(2) , 110(2) , 111(2) }
                  +1 if t ∈ { 010(2) , 011(2) }
                  −1 if t ∈ { 100(2) , 101(2) }

7 end
8 return y′
Algorithm 13: An algorithm for base-2 Booth recoding.

An algorithm for Booth-based recoding Both the first and second recoding steps above are still presented in
a somewhat informal manner, because the goal was to demonstrate the idea; to make use of them in practice,
we obviously need an algorithm. Fortunately, such an algorithm is simple to construct: notice that in a base-2
Booth recoding

• y′i depends on yi−1 and yi , while

• y′i+1 depends on yi and yi+1

and since these digits are paired to form the base-4 Booth recoding, each digit of the latter depends on yi−1 , yi , and yi+1 . Thanks to this observation, the recoding process is easier than it may appear: assuming suitable padding
of y (i.e., y j = 0 for j < 0 and j ≥ n), we can produce digits of y′ from a 2- or 3-bit sub-sequence (or window) of
bits in y via
                   Unsigned     Signed          Signed
                   base-2       base-2          base-4
yi+1   yi   yi−1                y′i+1    y′i    y′i/2
0      0    0                   0        0      0
0      0    1                   0        +1     +1
0      1    0                   +1       −1     +1
0      1    1                   +1       0      +2
1      0    0                   −1       0      −2
1      0    1                   −1       +1     −1
1      1    0                   0        −1     −1
1      1    1                   0        0      0
Algorithm 13 and Algorithm 14 capture these rules in algorithms that produce base-2 and base-4 recodings of a given y respectively. Crucially, one can unroll the loop to produce a combinatorial circuit. For Algorithm 14, say, one would replicate a single recoding cell: each instance of the cell would accept three bits of y as input (namely yi+1 , yi , and yi−1 ) and produce a digit of the recoding as output. This implies that the recoding could be performed during rather than before the subsequent multiplication; the only significant overhead relates to increased area.
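For example, the following is a minimal Python sketch of Algorithm 14 driven directly by the window table above (the function and table names are illustrative):

BASE4_DIGIT = { 0b000:  0, 0b001: +1, 0b010: +1, 0b011: +2,
                0b100: -2, 0b101: -1, 0b110: -1, 0b111:  0 }

def booth_recode_base4(y, n):
    # y is an unsigned n-bit integer; the result is a little-endian
    # list of digits, each in {0, +-1, +-2}
    yp = []
    for i in range(0, n + 1, 2):
        t0 = (y >> (i - 1)) & 1 if i - 1 >= 0 else 0   # y_{i-1}, padded
        t1 = (y >> i) & 1 if i < n else 0              # y_i, padded
        t2 = (y >> (i + 1)) & 1 if i + 1 < n else 0    # y_{i+1}, padded
        yp.append(BASE4_DIGIT[(t2 << 2) | (t1 << 1) | t0])
    return yp

assert booth_recode_base4(14, 4) == [-2, 0, +1]   # cf. Example 3.37 below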

An algorithm for Booth-based multiplication Finally, we can address the problem of using the recoded
multiplier to actually perform the multiplication above: ideally this should be more efficient than the bit-serial
starting point. Algorithm 15 captures the result, which one can think of as a form of digit-serial multiplier:
each iteration of the loop processes a digit of a recoding formed from multiple bits in y.

1. In Algorithm 9, |y| = n dictates the number of loop iterations; Algorithm 11 improves this to n/d for appropriate choices of d. In comparison, Algorithm 15 requires fewer, i.e.,

|y′| ≃ |y|/2 ≃ n/2,


Input: An unsigned, n-bit, base-2 integer y
Output: A base-4 Booth recoding y′ of y
1 y′ ← ∅
2 for i = 0 upto n step 2 do
3    if (i − 1) < 0 then t0 ← 0 else t0 ← yi−1
4    if i ≥ n then t1 ← 0 else t1 ← yi
5    if (i + 1) ≥ n then t2 ← 0 else t2 ← yi+1
6    with t = t2 t1 t0 , set

         y′i/2 ←   0 if t = 000(2)
                  +1 if t = 001(2)
                  +1 if t = 010(2)
                  +2 if t = 011(2)
                  −2 if t = 100(2)
                  −1 if t = 101(2)
                  −1 if t = 110(2)
                   0 if t = 111(2)

7 end
8 return y′
Algorithm 14: An algorithm for base-4 Booth recoding.

Input: An unsigned, n-bit, base-2 integer x, and a base-4 Booth recoding y′ of some integer y
Output: An unsigned, 2n-bit, base-2 integer r = y · x
1 r ← 0
2 for i = |y′| − 1 downto 0 step −1 do
3    r ← 2^2 · r
4    r ←  r − 2 · x if y′i = −2
         r − 1 · x if y′i = −1
         r         if y′i =  0
         r + 1 · x if y′i = +1
         r + 2 · x if y′i = +2
5 end
6 return r
Algorithm 15: An algorithm for multiplication of base-2 integers using an iterative, left-to-right, digit-serial strategy with base-4 Booth recoding.

iterations. As with the digit-serial strategy, to allow this to work, we need to compute 2^2 · r in line #3 (rather than 2 · r), but this can be realised by left-shifting r by a distance of 2, i.e., computing 2^2 · r ≡ r ≪ 2.
2. In Algorithm 9 we had yi ∈ {0, 1}, and in Algorithm 11 we had yi...i−d+1 ∈ {0, 1, . . . , 2^d − 1}. In Algorithm 15, however, we have y′i ∈ {0, ±1, ±2}. This basically means we have to test each non-zero y′i against more cases than before: line #4 captures them in one case analysis rather than use a more lengthy set of conditions. In short, dealing with y′i = −1 vs. y′i = +1 is easy: we simply subtract x from r rather than adding x to r. In the same way, dealing with y′i = −2 and y′i = +2 means subtracting (resp. adding) 2 · x from (resp. to) r; since 2 · x can be computed via a shift of x (vs. an extra addition), there is no real overhead vs. subtracting (resp. adding) x itself.

Example 3.37. Consider y = 14(10) ↦ 1110(2) : we first use Algorithm 14 as follows

i   yi+1   yi   yi−1     t2   t1   t0   t         y′
0   1      0    ⊥        1    0    0    100(2)    ⟨−2⟩
2   1      1    1        1    1    1    111(2)    ⟨−2, 0⟩
4   ⊥      ⊥    1        0    0    1    001(2)    ⟨−2, 0, +1⟩
                                                  ⟨−2, 0, +1⟩

to recode y into y′ , then use Algorithm 15 as follows

i   y′i   r      r′
          0
2   +1    0      x       r′ ← 2^2 · r + 1 · x
1   0     x      4·x     r′ ← 2^2 · r
0   −2    4·x    14·x    r′ ← 2^2 · r − 2 · x
          14·x

Input: Two unsigned, n-bit, base-2 integers x and y, an integer digit size d
Output: An unsigned, 2n-bit, base-2 integer r = y · x
1 r←0
2 for i = 0 upto n − 1 step +d do
3 if yd−1...0 ≠ 0 then
4 r ← r + yd−1...0 · x
5 end
6 x ← x · 2^d
7 y ← y/2^d
8 if y = 0 then
9 return r
10 end
11 end
12 return r
Algorithm 16: An algorithm for multiplication of base-2 integers using an iterative, right-to-left, digit-serial strategy with early termination.

to produce the result expected in three rather than six steps.
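The following minimal Python sketch of Algorithm 15 consumes the recoding produced by the booth_recode_base4 sketch above; 2 · x is formed by a shift, so each step needs only an addition or subtraction:

def mul_booth_base4(x, yp):
    r = 0
    for d in reversed(yp):              # most-significant recoded digit first
        r = r << 2                      # r <- 2^2 * r
        if   d == -2: r = r - (x << 1)
        elif d == -1: r = r - x
        elif d == +1: r = r + x
        elif d == +2: r = r + (x << 1)  # d == 0 leaves r unchanged
    return r

assert mul_booth_base4(6, booth_recode_base4(14, 4)) == 84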

3.5.3.3 Improvements via early termination: avoiding unnecessary iterations


Example 3.38. Consider digit-serial multiplication using d = 2 where y = 30(10) ↦ 00011110(2) : as a first step
we recode y into y′ = ⟨10(2) , 11(2) , 01(2) , 00(2) ⟩ by splitting the former sequence of bits into 2-bit sub-sequences.
We then process y′ either left-to-right or right-to-left, as reflected by traces of Algorithm 11

i   r      yi...i−d+1       r′
    0
7   0      00(2) = 0(10)    0       r′ ← 2^2 · r
5   0      01(2) = 1(10)    1·x     r′ ← 2^2 · r + 1 · x
3   1·x    11(2) = 3(10)    7·x     r′ ← 2^2 · r + 3 · x
1   7·x    10(2) = 2(10)    30·x    r′ ← 2^2 · r + 2 · x
    30·x

and Algorithm 12

i   r      x        yi+d−1...i       r′      x′
    0      x
0   0      x        10(2) = 2(10)    2·x     2^2 ·x    r′ ← r + 2 · x, x′ ← 2^2 · x
2   2·x    2^2 ·x   11(2) = 3(10)    14·x    2^4 ·x    r′ ← r + 3 · x, x′ ← 2^2 · x
4   14·x   2^4 ·x   01(2) = 1(10)    30·x    2^6 ·x    r′ ← r + 1 · x, x′ ← 2^2 · x
6   30·x   2^6 ·x   00(2) = 0(10)    30·x    2^8 ·x    r′ ← r + 0 · x, x′ ← 2^2 · x
    30·x
respectively; both produce r = 30 · x as expected.
Notice that the 2 MSBs of y are both 0, i.e., y7 = y6 = 0, st. y7...6 = 0 and hence y′3 = 00(2) . This fact can be harnessed to optimise both algorithms. Algorithm 11 processes y′ left-to-right, so y′3 is the first digit: the iteration where i = 7 extracts the digit

yi...i−d+1 = y7...6 = y′3 = 0.

As such, we know that

r ← r + yi...i−d+1 · x

leaves r unchanged: the condition yi...i−d+1 ≠ 0 allows us to skip said update for i = 7, and thus be more efficient. Algorithm 12 processes y′ right-to-left, so y′3 is the last digit: the iteration where i = 6 extracts the digit

yi+d−1...i = y7...6 = y′3 = 0.

The same argument applies here, in the sense we can skip the associated update of r. In fact, we can be more
aggressive by skipping multiple such updates. If in some i-th iteration the digits processed by all j-th iterations
for j > i are zero, then we may as well stop: none of them will update r, meaning the algorithm can return
it early as is (rather than perform extra iterations). This strategy is normally termed early termination; using
Algorithm 12 as a starting point, it is realised by Algorithm 16.


Example 3.39. Consider the following trace of Algorithm 16 for y = 30(10) ↦ 00011110(2) :

i   r      x        y              yd−1...0         r′      x′       y′
    0      x        00011110(2)
0   0      x        00011110(2)    10(2) = 2(10)    2·x     2^2 ·x   00000111(2)    r′ ← r + 2 · x, x′ ← x · 2^2 , y′ ← y/2^2
2   2·x    2^2 ·x   00000111(2)    11(2) = 3(10)    14·x    2^4 ·x   00000001(2)    r′ ← r + 3 · x, x′ ← x · 2^2 , y′ ← y/2^2
4   14·x   2^4 ·x   00000001(2)    01(2) = 1(10)    30·x    2^6 ·x   00000000(2)    r′ ← r + 1 · x, x′ ← x · 2^2 , y′ ← y/2^2
    30·x

Once r, x, and y have been updated within the iteration for i = 4, we find y′ = 0: this triggers the conditional
statement, meaning r is returned early after three (via line #9) vs. four (via line #12) iterations: the correct result
r = 30 · x is produced as expected.
Although this should seem attractive, some trade-offs and caveats apply. First, the loop body, spanning lines
#3 to #10 of Algorithm 16, is obviously more complex than the equivalent in Algorithm 12. Specifically, r, x,
and y all need to be updated, and the FSM controlling iteration needs to test y and conditionally return r: this
makes it more complex as well. Second, this added complexity, which typically means an increased area, only
potentially (rather than definitively) reduces the latency of multiplication. Put simply, the number of iterations
now depends on the value of y (i.e., whether y′ contains more-significant digits that are 0 st. the algorithm can
skip them), which we cannot know a priori: if this property does not hold, the algorithm will be no better than
standard digit-serial multiplication.
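A minimal Python sketch of Algorithm 16 follows (the function name is illustrative), again assuming d divides n:

def mul_early_termination(x, y, n, d):
    r = 0
    for _ in range(0, n, d):
        digit = y & ((1 << d) - 1)   # low d-bit digit y_{d-1...0}
        if digit != 0:
            r = r + digit * x
        x = x << d                   # x <- x * 2^d
        y = y >> d                   # y <- y / 2^d
        if y == 0:
            return r                 # all remaining digits are zero: stop early
    return r

assert mul_early_termination(6, 30, 8, 2) == 180   # matches Example 3.39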

3.5.4 Combinatorial designs


3.5.4.1 A vanilla tree multiplier
In Section 3.5.2, we made use of Horner’s Rule as our starting point; the iterative nature by which the associated
expression
y · x = y0 · x + 2 · (y1 · x + 2 · (· · · yn−1 · x + 2 · (0)))
was evaluated translated naturally into an iterative algorithm. For a combinatorial alternative, however, we
adopt a different starting point and (re)consider
y · x = Σ_{i=0}^{i<n} yi · x · b^i .

Developing a design directly from this expression is surprisingly easy: we just need to generate each term,
which represents a partial product, then add them up. Figure 3.13 is a (combinatorial) tree multiplier whose
design stems from this idea. It can be viewed, from top-to-bottom, as three layers:

1. The top layer is comprised of n groups of n AND gates: the i-th group computes xj ∧ yi for 0 ≤ j < n, meaning it outputs either 0 if yi = 0 or x if yi = 1. You can think of the AND gates as performing all n^2 possible (1 × 1)-bit multiplications of some xj and yi , or a less general form of multiplexer that selects between 0 and x based on yi .
2. The middle layer is comprised of n left-shift components. The i-th component shifts by a fixed distance of i bits, meaning the output is either 0 if yi = 0, or x · 2^i if yi = 1. Put another way, the output of the i-th component in the middle layer is

yi · x · 2^i

i.e., some i-th partial product in the long-hand description of y · x.
3. The bottom layer is a balanced, binary tree of adder components: these accumulate the partial products resulting from the middle layer, meaning the output is

r = Σ_{i=0}^{n−1} yi · x · 2^i = y · x

as required.

In Section 3.5.2, both iterative multiplier designs we produced made a trade-off: they required O(n) time and O(1) space, thus representing high(er) latency but low(er) area. Here we have more or less the exact opposite trade-off. The design is combinatorial, so takes O(1) time where the constant involved basically represents the critical path. However, it clearly takes a lot more space; it is difficult to state formally how much, but the fact the design includes a tree of several adders vs. one adder hints this could be significant.
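As a behavioural model only, the following minimal Python sketch mirrors the three layers (assuming, for simplicity, that n is a power of two so the tree stays balanced); it says nothing about the gate-level subtleties discussed next:

def mul_tree(x, y, n):
    # top + middle layers: generate the n partial products y_i * x * 2^i
    pp = [ (x << i) if (y >> i) & 1 else 0 for i in range(n) ]
    # bottom layer: a balanced, binary tree of pairwise additions
    while len(pp) > 1:
        pp = [ pp[j] + pp[j + 1] for j in range(0, len(pp), 2) ]
    return pp[0]

assert mul_tree(6, 14, 4) == 84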


Beyond this comparison, it is important to consider various subtleties that emerge if the block diagram is implemented as a concrete circuit. First, notice that the critical path looks like O(log2(n)) gate delays, because this describes the depth of the (balanced) tree as used to form the bottom layer. However, because each node in said tree is itself an adder, the actual critical path is more like O(n · log2(n)). Even this turns out to be optimistic: notice, second, that those adders lower-down in the tree (i.e., closer to the root) must be larger (and hence more complex) than those higher-up. This is simply because the intermediate results get larger; the first level adds two n-bit partial products to produce an (n + 1)-bit intermediate result, whereas the last level adds two (2n − 1)-bit intermediate values to produce the 2n-bit result.

3.5.4.2 Wallace and Dadda tree multipliers


An obvious next question is whether and how we can improve the vanilla tree multiplier design. There are various possibilities, but one would be to focus on reducing the critical path and latency. One idea is to reimplement the tree using carry-save rather than ripple-carry adders; this is a natural replacement given the former was specifically introduced for use in contexts where we accumulate multiple inputs (or partial products). Another idea is to examine Figure 3.13 in detail, and identify features that can be optimised at a low(er)-level. Candidates might include where we can use half- vs. full-adder cells, which are, of course, less complex. The Wallace multiplier [13] and Dadda multiplier [5] designs employ a combination of both approaches, with the goal of reducing the critical path: they still represent combinatorial designs, but aim to have a smaller critical path and hence lower latency.
As with the vanilla tree multiplier, Wallace and Dadda multipliers are comprised of a number of layers. More specifically, you should think of both as comprising

1. an initial layer,
2. O(log n) layers of reduction, and
3. a final layer

where the difference is, basically, how those layers are designed. The initial layer generates the partial products, then the second and third layers accumulate them; this is somewhat similar to the tree multiplier. However, rather than perform the latter using a tree of general-purpose adders, a carefully designed, special-purpose tree is employed. Producing a design for Wallace and Dadda multipliers follows a different process than we have used before. Rather than develop an algorithm then translate it into a design, the multipliers are generated directly by an algorithm. Given a value of n as input, Algorithm 17 and Algorithm 18 generate Wallace and Dadda multipliers respectively; both are described in three steps that mirror the layers above.
Example 3.40. Consider n = 4, where we want to produce a Wallace multiplier design that computes the
product r = y · x for 4-bit x and y; to do so, we use Algorithm 17.
An initial layer multiplies x j with yi for 0 ≤ i, j < 4, st. we produce

• one weight-0 wire, i.e., x0 · y0 ,


• two weight-1 wires, i.e., x0 · y1 and x1 · y0 ,
• three weight-2 wires, i.e., x0 · y2 , x2 · y0 , and x1 · y1 ,
• four weight-3 wires, i.e., x0 · y3 , x3 · y0 , x1 · y2 , and x2 · y1 ,
• three weight-4 wires, i.e., x1 · y3 , x3 · y1 , and x2 · y2 ,
• two weight-5 wires, i.e., x2 · y3 , and x3 · y2 ,
• one weight-6 wire, i.e., x3 · y3 , and, finally,
• zero weight-7 wires.

Figure 3.12a details the subsequent two reduction layers. For example, in the first reduction layer

• there is one input wire with weight-0, so we use a pass-through operation (denoted PT) which results in one weight-0 wire as output,

• there are two input wires with weight-1, so we use a half-adder operation (denoted HA) which results in one weight-1 wire (the sum) and one weight-2 wire (the carry) as output, and

• there are three input wires with weight-2, so we use a full-adder operation (denoted FA) which results in one weight-2 wire (the sum) and one weight-3 wire (the carry) as output.

1. In the initial layer, multiply (i.e., AND) together each xj with each yi to produce a total of n^2 intermediate wires. Recall each wire (essentially the result of a 1-bit digit-multiplication) has a weight stemming from the digits in x and y, e.g., x0 · y0 has weight 0, x1 · y2 has weight 3 and so on.
2. Reduce the number of intermediate wires using layers composed of full and half adders:
• Combine any three wires with same weight using a full-adder; the result in the next layer is one
wire of the same weight (i.e., the sum) and one wire a higher weight (i.e., the carry).
• Combine any two wires with same weight using a half adder; the result in the next layer is one
wire of the same weight (i.e., the sum) and one wire a higher weight (i.e., the carry).
• If there is only one wire with a given weight, just pass it through to the next layer.
3. In the final layer, after enough reduction layers, there will be at most two wires of any given weight:
merge the wires to form two 2n-bit values (padding as required), then add them together with an adder
component.

Algorithm 17: An algorithm to generate a Wallace tree multiplier design.

1. In the initial layer, multiply (i.e., AND) together each xj with each yi to produce a total of n^2 intermediate wires. Recall each wire (essentially the result of a 1-bit digit-multiplication) has a weight stemming from the digits in x and y, e.g., x0 · y0 has weight 0, x1 · y2 has weight 3 and so on.
2. Reduce the number of intermediate wires using layers composed of full and half adders:

• Combine any three wires with same weight using a full-adder; the result in the next layer is one
wire of the same weight (i.e. the sum) and one wire a higher weight (i.e. the carry).
• If there are two wires with the same weight left, let w be that weight then:
– If w ≡ 2 mod 3 then combine the wires using a half-adder; the result in the next layer is one
wire of the same weight (i.e., the sum) and one wire a higher weight (i.e., the carry).
– Otherwise, just pass them through to the next layer.
• If there is only one wire with a given weight, just pass it through to the next layer.

3. In the final layer, after enough reduction layers, there will be at most two wires of any given weight:
merge the wires to form two 2n-bit values (padding as required), then add them together with an adder
component.

Algorithm 18: An algorithm to generate a Dadda tree multiplier design.


                   Layer 1                              Layer 2
Weight   Input Wires  Operation  Output Wires   Input Wires  Operation  Output Wires
0        1            PT         1              1            PT         1
1        2            HA         1              1            PT         1
2        3            FA         2              2            HA         1
3        4            FA         3              3            FA         2
4        3            FA         2              2            HA         2
5        2            HA         2              2            HA         2
6        1            PT         2              2            HA         2
7        0                       0              0                       1

(a) Using a Wallace-based multiplier.

                   Layer 1                              Layer 2
Weight   Input Wires  Operation  Output Wires   Input Wires  Operation  Output Wires
0        1            PT         1              1            PT         1
1        2            PT         2              2            PT         2
2        3            FA         1              1            PT         1
3        4            FA         3              3            FA         1
4        3            FA         2              2            HA         2
5        2            PT         3              3            FA         2
6        1            PT         1              1            PT         2
7        0                       0              0                       0

(b) Using a Dadda-based multiplier.

Figure 3.12: A tabular description of stages in example (4 × 4)-bit Wallace and Dadda tree multiplier designs.

The resulting design, including the final layer, is illustrated by Figure 3.14.
Notice that n is the only input to the algorithm(s), so although the example is specific to n = 4 the general
structure will remain similar. In fact, the example highlights some important general points:

• we have 1 initial layer, log2 (n) = log2 (4) = 2 reduction layers, and 1 final layer,
• the reduction layers yield at most two wires with a given weight; we form then sum two 2n-bit values
(e.g., using a ripple-carry adder) to produce the result, and, crucially,
• there are no intra-layer carries in the reduction layer(s): the only carry chains that appear are inter-layer,
during reduction, or in the final layer.

Phrased as such, it should be clear why the concept of carry-save addition is relevant: the reduction and final
layers employ essentially the same concept, by compressing many inputs into few(er) outputs until the point
they can be summed to produce the result. If you look again at Algorithm 17 and Algorithm 18, the difference
between the two is within the second step: in the Dadda design, the number of wires of a given weight remains, by design, close to a multiple of three, which facilitates use of 3 : 2 compressors as a means of reduction. As hinted at, a crucial feature of both designs is that each adder cell within the reduction layer(s) operates in parallel so has an O(1) critical path; this suggests the overall critical path will be O(1 + log2(n) + n) = O(n) gate delays in both cases. Comparing Figure 3.12a with Figure 3.12b, we see the main difference is wrt. space not time. More specifically, for n = 4 the Wallace multiplier uses 6 half-adders and 4 full-adders, whereas the Dadda multiplier uses 1 half-adder and 5 full-adders; this is a trend that holds for larger n.
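To see where the wire counts in Figure 3.12a come from, the following minimal Python sketch applies the reduction rule of Algorithm 17 to per-weight wire counts (it only counts wires, rather than constructing the circuit; names are illustrative):

def wallace_layers(n):
    counts = [0] * (2 * n)
    for i in range(n):                  # initial layer: n^2 AND gates
        for j in range(n):
            counts[i + j] += 1
    layers = [counts]
    while max(counts) > 2:              # reduce until <= 2 wires per weight
        nxt = [0] * (len(counts) + 1)
        for w, c in enumerate(counts):
            fa, rest = divmod(c, 3)     # full-adders consume 3 wires ...
            ha, rest = divmod(rest, 2)  # ... half-adders consume 2
            nxt[w] += fa + ha + rest    # sums (and pass-throughs) stay at w
            nxt[w + 1] += fa + ha       # carries move up to weight w + 1
        counts = nxt[: 2 * n]           # the product fits in 2n bits
        layers.append(counts)
    return layers

for layer in wallace_layers(4):
    print(layer)   # [1,2,3,4,3,2,1,0], [1,1,2,3,2,2,2,0], [1,1,1,2,2,2,2,1]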

3.5.5 Some multiplier case-studies


One of the challenges outlined at the start of this Section related to the large design space wrt. multiplication;
although we have only covered a sub-set of that design space, hopefully the challenge is already clear! As a
result of the possible trade-offs, it is hard to identify a single “correct” design. This means different micro-
processors, for example, legitimately opt for different designs so as to match their design constraints. In this
Section, we attempt to survey such choices in a set of real micro-processors: the survey is by no means
exhaustive, but will, none the less, offer better understanding of the associated constraints and designs used to
address them.


[Figure 3.13 (circuit diagram, not reproduced): n groups of AND gates form the products yi · xj , fixed-distance left-shift components weight each partial product, and a balanced binary tree of adders sums them.]

Figure 3.13: An (n × n)-bit tree multiplier design, described using a circuit diagram.

Figure 3.14: An example (4 × 4)-bit Wallace-based tree multiplier design, described using a circuit diagram.


Example 3.41. The ARM Cortex-M0 [4] processor supports multiplication via the muls instruction: it yields
a truncated result, st. r = x · y is the least-significant 32 bits of the actual product given 32-bit x and y. The
instruction can be supported in two ways by a given implementation: the design can be combinatorial,
requiring 1 cycle, or iterative, requiring 32 cycles. The Cortex-M0 is typically deployed in an embedded
context, where area and power consumption are paramount. The latter, iterative multiplier design may
therefore be an attractive choice: assuming increased latency (or time) can be tolerated, it satisfies the goal of
minimising area (or space) associated with this component of the associated ALU.

Example 3.42. The ARM7TDMI [1] processor houses a (8 · 32)-bit combinatorial multiplier; it supports digit-
serial multiplication (for the fixed case where d = 8) with early termination, as invoked by a range of instructions
including umull. One must assume ARM selected this design based on careful analysis. For instance, it seems
fair to claim that

• using a digit-serial multiplier makes a good trade-off between time and space (due to the hybrid,
combinatorial and iterative nature), which is particularly important for embedded processors, plus

• although early termination adds some overhead, it often produces a reduction in latency because of the
types of y used: quite a significant proportion will relate to address arithmetic, where y is (relatively)
small (e.g., as used to compute offsets from a fixed base address).

Example 3.43. Thomas and Balatoni [12] describe a (12 × 12)-bit multiplier design, intended for use in the PDP-8
computer: the design is based on an iterative strategy that makes use of Booth recoding.

Example 3.44. The MIPS R4000 [6] processor takes a somewhat similar, somewhat dissimilar approach to that
described here: it houses a Booth-based multiplier using exactly the recoding strategy described, but within a
(64 · 64)-bit combinatorial rather than an iterative design.
Mirapuri et al. [6, Page 13] detail the design, which splits the multiplication into Booth recoding, multi-
plicand selection, partial product generation and product accumulation steps. A series of carry-save adders
accumulate the partial products, which produces a result r that is stored into two 64-bit registers called hi and
lo (meaning more- and less-significant 64-bit halves).

Example 3.45. An iterative, bit-serial multiplier requires n steps to compute the product; with no further
optimisation, this constraint is inherent in the design. Although the data-path required is minimal, the need
for iterative use of that data-path demands a control-path (i.e., an FSM) of some sort. When placed in a
micro-processor, a resulting question is why we bother having a dedicated multiplier at all: why not just have
an instruction that performs one step of multiplication, and let the program make iterative use of it?
The MIPS-X processor [3] provides a concrete example of this approach: using a slightly rephrased notation
to match what has been described here, [3, Section 4.4.4] basically defines

                                       if GPR[y]31 = 1 then
                                           GPR[r] ← GPR[r] + GPR[x]
                                           GPR[y] ← GPR[y] ≪ 1
mstep GPR[x], GPR[y], GPR[r]  ↦        else
                                           GPR[r] ← GPR[r]
                                           GPR[y] ← GPR[y] ≪ 1
                                       end

i.e., a multiply-step instruction essentially matching lines #3 to #6 in Algorithm 9. As such, the idea is to implement
a loop that iterates over mstep as described in [3, Appendix IV]. The reason y is left-shifted is so that one can
test GPR[y]31 rather than GPR[y]i ; the former is updated by the shift, st. in each i-th iteration it does contain
GPR[y]i as required (given iteration is left-to-right, so starts with i = 31 and ends with i = 0).
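To make the mechanism concrete, the following is a minimal sketch of the idea in C, assuming n = 32 and unsigned x and y. The function name mstep, and the explicit doubling of r within the loop, are assumptions made here so the sketch computes a complete (truncated) product; they are not a transcription of the exact MIPS-X definition in [3].

#include <stdint.h>
#include <stdio.h>

/* One multiply step, mirroring the definition above: a conditional add
   of x into r based on the MSB of y, then a left-shift of y by one bit. */
static void mstep( uint32_t x, uint32_t* y, uint32_t* r ) {
  if( *y >> 31 ) {     /* test GPR[y]_31, i.e., the current MSB of y      */
    *r += x;           /* accumulate x into the product                   */
  }
  *y <<= 1;            /* shift y, so the next bit moves into position 31 */
}

int main( void ) {
  uint32_t x = 123, y = 456, r = 0;

  for( int i = 31; i >= 0; i-- ) {
    r <<= 1;           /* assumption: double r, so each conditional add   */
                       /* lands at the correct weight (MSB-first order)   */
    mstep( x, &y, &r );
  }

  printf( "%u\n", r ); /* prints 56088 = 123 * 456 (mod 2^32)             */

  return 0;
}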
There are advantages and disadvantages of either approach, i.e., use of a dedicated multiplier vs. an mstep
instruction, with some examples including:

• The mstep instruction removes the need for an FSM to control the dedicated multiplier, essentially
harnessing the existing processor as a control-path. As such, the overhead to support multiplication
within the processor is further reduced.

• On one hand, the 1-step nature of mstep suggests single-cycle execution; in contrast, the n-step nature of
the dedicated multiplier suggests multi-cycle execution. However, this is phrased in terms of processor
cycles: it could be reasonable for a dedicated multiplier and processor to make use of different clock
frequencies. If the former is higher than the latter, n multiplier cycles can take less time than n processor
cycles, and so less time than execution of n mstep instructions.


An aside: comparison using arithmetic.

It is tempting to avoid designing dedicated circuits for general-purpose comparison, by instead using arithmetic
to make the task easier (or more special-purpose at least). Glossing over the issue of signed’ness, we know for
example that

• x = y is the same as x − y = 0, and


• x < y is the same as x − y < 0

so we could re-purpose a circuit for subtraction to perform both tasks: we just compute t = x − y and then claim

• x = y iff. each ti = 0, and


• x < y iff. t < 0, or rather tn−1 = 1 given we are using two’s-complement.

The idea here is that the general-purpose comparison of x and y is translated into a special-purpose comparison
of t and 0.
This sleight of hand seems attractive, but turns out to have some arguable disadvantages. Primarily, we
need to cope with signed x and y, and hence deal with cases where x − y overflows for example. In addition,
one could argue a dedicated circuit for comparison can be more efficient than subtraction: even if we reuse one
subtraction circuit for both tasks, cases might occur where this is not possible (e.g., in a micro-processor,
where often we need to do both at the same time).
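As a sketch of this aside in C, assuming n = 32 and two's-complement x and y (the function names here are mine): a single subtraction t = x − y supports both tests, with the overflow caveat handled via the usual combination of the sign and overflow flags.

#include <stdbool.h>
#include <stdint.h>

/* x = y iff. every bit of t = x - y is 0. */
static bool eq_via_sub( int32_t x, int32_t y ) {
  uint32_t t = ( uint32_t )( x ) - ( uint32_t )( y );
  return t == 0;
}

/* x < y via the sign of t = x - y; since x - y can overflow, the sign
   bit alone is not enough, so combine it with an overflow flag v (the
   usual "N xor V" rule for signed comparison). */
static bool lt_via_sub( int32_t x, int32_t y ) {
  uint32_t t  = ( uint32_t )( x ) - ( uint32_t )( y );
  bool     sx = ( ( uint32_t )( x ) >> 31 ) & 1;    /* sign of x         */
  bool     sy = ( ( uint32_t )( y ) >> 31 ) & 1;    /* sign of y         */
  bool     st = (                t   >> 31 ) & 1;   /* sign of t, t_{n-1} */
  bool     v  = ( sx != sy ) && ( st != sx );       /* overflow occurred */
  return st != v;
}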

Input: Two unsigned, n-digit, base-b integers x and y


Output: If x = y then true, otherwise false
1 for i = n − 1 downto 0 step −1 do
2 if xi , yi then
3 return false
4 end
5 end
6 return true
Algorithm 19: An algorithm for equality comparison between base-b integers.

• By including the mstep instruction, the MIPS-X ISA exposes details of the implementation and so fixes
how a given program should compute r = y · x. If, in contrast, it had a mul instruction with obvious
semantics, then any given implementation of the processor could opt for an iterative or combinatorial
multiplier while maintaining compatibility.

• At least for simple processors, one instruction is executed at a time: for n-bit x and y, this means the
processor will be kept busy for n cycles while executing n mstep instructions. With a dedicated multiplier,
however, one could at least imagine the processor doing something else in the n cycles while the multiplier
is kept busy.

3.6 Components for comparison

As with the 1-bit building blocks for addition (namely the half- and full-adder), we already covered designs
for 1-bit equality and less than comparators in Chapter 2; these require a handful of logic gates to implement.
Again as with addition, the challenge is essentially how we extend these somehow. The idea of this Section is
to tackle this step-by-step: first we consider designs for comparison of unsigned yet larger, n-bit x and y, then
we extend these designs to cope with signed x and y, and finally consider how to support a suite of comparison
operations beyond just equality and less than.


(a) An AND plus equality comparator based design. (b) An OR plus non-equality comparator based design.

Figure 3.15: An n-bit, unsigned equality comparison described using a circuit diagram.
Figure 3.16: An n-bit, unsigned less than comparison described using a circuit diagram.

3.6.1 Unsigned comparison


3.6.1.1 Unsigned equality
Example 3.46. Consider two cases of comparison between unsigned x and y expressed in base-10

x = 123(10) x = 121(10)
y = 123(10) y = 123(10)

where, obviously, x = y in the left-hand case, and x ≠ y in the right-hand case.

More formally, x and y are equal iff. each digit of x is equal to the corresponding digit of y, so xi = yi for
0 ≤ i < n. As such, in the left-hand case x = y because xi = yi for 0 ≤ i < 3; in the right-hand case x ≠ y because
xi ≠ yi for i = 0. This fact is true in any base, and in base-2 we have a component that can perform the 1-bit
comparison xi = yi : to cope with larger x and y, we just combine instances of it together.
Read out loud, “if x0 equals y0 and x1 equals y1 and ... xn−1 equals yn−1 then x equals y, otherwise x does
not equal y” highlights the basic strategy: each i-th of n instances of a 1-bit equality comparator will compare
xi and yi , then we AND together the results. However, we need to take care wrt. the gate count: by looking
at the truth table
xi yi xi ≠ yi xi = yi
0  0  0       1
0  1  1       0
1  0  1       0
1  1  0       1
it should be clear that the former (inequality) is simply an XOR gate, whereas the latter (equality) needs an
XOR and a NOT gate to implement directly. So we could either

1. use a dedicated XNOR gate whose cost is roughly the same as XOR given that

x ⊕ y ≡ (x ∧ ¬y) ∨ (¬x ∧ y)

and
¬(x ⊕ y) ≡ (¬x ∧ ¬y) ∨ (x ∧ y),
or

2. compute x =u y ≡ ¬(x ≠u y) instead, i.e., test whether x is not equal to y, then invert the result.


Input: Two unsigned, n-digit, base-b integers x and y


Output: If x < y then true, otherwise false
1 for i = n − 1 downto 0 step −1 do
2 if xi < yi then
3 return true
4 end
5 else if xi > yi then
6 return false
7 end
8 end
9 return false
Algorithm 20: An algorithm for less than comparison between base-b integers.

Both designs are illustrated in Figure 3.15: it is important to see that both compute the same result, but use a
different internal design motivated loosely by the standard cell library available (i.e., what gate types we can
use and their relative efficiency in time and space).
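As a sketch in C of option 2 (and hence of Figure 3.15b), assuming n = 32; the function name eq32 is an assumption made here.

#include <stdbool.h>
#include <stdint.h>

static bool eq32( uint32_t x, uint32_t y ) {
  uint32_t d = x ^ y;    /* d_i = 1 iff. x_i differs from y_i: n XOR gates */
  bool   neq = d != 0;   /* OR together the n inequality results           */
  return !neq;           /* then invert, yielding x = y                    */
}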

3.6.1.2 Unsigned less than


Example 3.47. Consider three cases of comparison between unsigned x and y expressed in base-10

x = 121(10) x = 323(10) x = 123(10)


y = 123(10) y = 123(10) y = 123(10)

where, obviously, x < y in the left-hand case, x > y in the middle case, and x = y in the right-hand case.
Although the examples offer intuitively obvious results, determining why, in a formal sense, x is less than y (or
not) is more involved than the case of equality. A somewhat algorithmic strategy is as follows: work from the
most-significant, left-most digits (i.e., xn−1 and yn−1 ) towards the least-significant, right-most digits (i.e., x0 and
y0 ) and at each i-th step, apply a set of rules that say

1. if xi < yi then x < y,


2. if xi > yi then x > y, but
3. if xi = yi then we need to check the rest of x and y, i.e., move on to look at xi−1 and yi−1 .

This can be used to explain the example:

• in the left-hand case we find xi = yi for i = 2 and i = 1 but x0 = 1 < 3 = y0 and conclude x < y,
• in the middle case, when i = 2, we find x2 = 3 > 1 = y2 and conclude x > y, while
• in the right-hand case, we find xi = yi for all i and conclude x = y.

Algorithm 20 captures this more formally: as described, a loop iterates from the most- to least-significant digits of
x and y, and at each i-th step applies the rules above. That is, if xi < yi then x < y and if xi > yi then x ≮ y; if
xi = yi then the loop continues iterating, dealing with the next (i − 1)-th step until it has processed all the digits.
Notice that if the loop actually concludes, then we know that xi = yi for all i and so x ≮ y.
Of course when x and y are written in base-2, our task is easier still because each xi , yi ∈ {0, 1}; this means
we can use our existing 1-bit comparators. As such, translating the algorithm into a concrete design means
reformulating it more directly wrt. said comparators. The idea is to recursively compute
t0 = (x0 < y0 )
ti = (xi < yi ) ∨ ((xi = yi ) ∧ ti−1 )

which matches our less formal rules above: at each i-th step, “x is less than y if xi < yi or xi = yi and comparing
the rest of x is less than the rest of y”. Each step simply requires one of each comparator plus an extra AND and
an extra OR gate; if we have n-bit x and y, we have n such steps as illustrated in Figure 3.16.
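The recursion translates directly into C; the following sketch assumes n = 32 and unsigned x and y (the name lt32 is mine), with each loop iteration playing the role of one stage in Figure 3.16.

#include <stdbool.h>
#include <stdint.h>

static bool lt32( uint32_t x, uint32_t y ) {
  bool t = false;                           /* no stage below i = 0       */

  for( int i = 0; i < 32; i++ ) {
    bool xi = ( x >> i ) & 1;
    bool yi = ( y >> i ) & 1;

    /* t_i = ( x_i < y_i ) or ( ( x_i = y_i ) and t_{i-1} ) */
    t = ( !xi && yi ) || ( ( xi == yi ) && t );
  }

  return t;                                 /* t_{n-1} is the result      */
}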
Example 3.48. Consider less than comparison for n = 4 bit x and y, st. the unwound recursion

t0 = (x0 < y0 )
t1 = (x1 < y1 ) ∨ ((x1 = y1 ) ∧ t0 )
t2 = (x2 < y2 ) ∨ ((x2 = y2 ) ∧ t1 )
t3 = (x3 < y3 ) ∨ ((x3 = y3 ) ∧ t2 )


yields a result t3 . For x = 5(10) ↦ 0101(2) and y = 7(10) ↦ 0111(2) we can see that

x0 < y0 = false x0 = y0 = true


x1 < y1 = true x1 = y1 = false
x2 < y2 = false x2 = y2 = true
x3 < y3 = false x3 = y3 = true
so
t0 = (x0 < y0 )
= false

t1 = (x1 < y1 ) ∨ ((x1 = y1 ) ∧ t0 )


= true ∨ false
= true

t2 = (x2 < y2 ) ∨ ((x2 = y2 ) ∧ t1 )


= false ∨ true
= true

t3 = (x3 < y3 ) ∨ ((x3 = y3 ) ∧ t2 )


= false ∨ true
= true
and, since t3 = true, conclude that x <u y as expected.

3.6.2 Signed comparison


Signed and unsigned equality comparison are equivalent, meaning we can use the unsigned comparison above
in both cases. To see why, note the unsigned comparison we formulated tests whether each xi is the same as yi
for 0 ≤ i < n. For signed x and y we do exactly the same thing: if xi differs from yi , then the value x represents
will differ from the value y represents irrespective of whether the representation is signed or unsigned.
However, signed less than comparison is not as simple. To produce the behaviour required, we use unsigned
less than as a sub-component within a design for signed less than: for x <s y the rules

x +ve y -ve  ↦  x ≮s y
x -ve y +ve  ↦  x <s y
x +ve y +ve  ↦  x <s y if abs(x) <u abs(y)
x -ve y -ve  ↦  x <s y if abs(y) <u abs(x)

produce the result we want. The first two cases are obvious: if x is positive and y is negative it cannot ever be
true that x < y, while if x is negative and y is positive it is always true that x < y. The other two cases need more
explanation, but basically the idea is to consider the magnitudes of x and y only by computing then comparing
abs(x) and abs(y), the absolute values of x and y. Note that in the case where x and y are both negative the
order of comparison is flipped. This is because a larger negative x will be less than a smaller negative y (and
vice versa); when considering their absolute values, the comparison is therefore reversed.
Example 3.49. Set n = 4:
1. if x = +4(10) ↦ ⟨0, 0, 1, 0⟩(2) and y = −6(10) ↦ ⟨0, 1, 0, 1⟩(2) , then x ≮s y since x is +ve and y is -ve,
2. if x = +6(10) ↦ ⟨0, 1, 1, 0⟩(2) and y = −4(10) ↦ ⟨0, 0, 1, 1⟩(2) , then x ≮s y since x is +ve and y is -ve,
3. if x = −4(10) ↦ ⟨0, 0, 1, 1⟩(2) and y = +6(10) ↦ ⟨0, 1, 1, 0⟩(2) , then x <s y since x is -ve and y is +ve, and
4. if x = −6(10) ↦ ⟨0, 1, 0, 1⟩(2) and y = +4(10) ↦ ⟨0, 0, 1, 0⟩(2) , then x <s y since x is -ve and y is +ve.
Example 3.50. Again set n = 4:
1. if x = +4(10) ↦ ⟨0, 0, 1, 0⟩(2) and y = +6(10) ↦ ⟨0, 1, 1, 0⟩(2) , then x <s y since x is +ve and y is +ve and
abs(x) = 4 <u 6 = abs(y),
2. if x = +6(10) ↦ ⟨0, 1, 1, 0⟩(2) and y = +4(10) ↦ ⟨0, 0, 1, 0⟩(2) , then x ≮s y since x is +ve and y is +ve and
abs(x) = 6 ≮u 4 = abs(y),
3. if x = −4(10) ↦ ⟨0, 0, 1, 1⟩(2) and y = −6(10) ↦ ⟨0, 1, 0, 1⟩(2) , then x ≮s y since x is -ve and y is -ve and
abs(y) = 6 ≮u 4 = abs(x), and
4. if x = −6(10) ↦ ⟨0, 1, 0, 1⟩(2) and y = −4(10) ↦ ⟨0, 0, 1, 1⟩(2) , then x <s y since x is -ve and y is -ve and
abs(y) = 4 <u 6 = abs(x).


Since x and y are represented using two's-complement, we can make a slight improvement by rewriting the
rules more simply as

x +ve y -ve  ↦  x ≮s y
x -ve y +ve  ↦  x <s y
x +ve y +ve  ↦  x <s y if chop(x) <u chop(y)
x -ve y -ve  ↦  x <s y if chop(x) <u chop(y)

where chop(x) = xn−2...0 , meaning chop(x) is x with the MSB (which determines the sign of x) removed; this
is valid because a small negative integer becomes a large positive integer (and vice versa) when the MSB is
removed. Doing so is much simpler than computing abs(x), because we just truncate or ignore the MSBs.
Example 3.51. Again set n = 4:
1. if x = +4(10) ↦ ⟨0, 0, 1, 0⟩(2) and y = +6(10) ↦ ⟨0, 1, 1, 0⟩(2) , then x <s y since x is +ve and y is +ve and
chop(x) = 4 <u 6 = chop(y),
2. if x = +6(10) ↦ ⟨0, 1, 1, 0⟩(2) and y = +4(10) ↦ ⟨0, 0, 1, 0⟩(2) , then x ≮s y since x is +ve and y is +ve and
chop(x) = 6 ≮u 4 = chop(y),
3. if x = −4(10) ↦ ⟨0, 0, 1, 1⟩(2) and y = −6(10) ↦ ⟨0, 1, 0, 1⟩(2) , then x ≮s y since x is -ve and y is -ve and
chop(x) = 4 ≮u 2 = chop(y), and
4. if x = −6(10) ↦ ⟨0, 1, 0, 1⟩(2) and y = −4(10) ↦ ⟨0, 0, 1, 1⟩(2) , then x <s y since x is -ve and y is -ve and
chop(x) = 2 <u 4 = chop(y).
The question is, finally, how do we implement these rules as a design? As in the case of overflow detection, we
use the fact that testing the sign of x or y is trivial. As a result, we can write


x <s y =  {  false                  if ¬xn−1 ∧ yn−1
          {  true                   if xn−1 ∧ ¬yn−1
          {  chop(x) <u chop(y)     otherwise

which can be realised by a multiplexer: producing the LHS just amounts to selecting an option from the RHS
using xn−1 and yn−1 , i.e., the sign of x and y, as control signals.
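A sketch of this rule in C, assuming n = 32 (the name lts32 is mine): the two sign bits select between the three options, exactly as the multiplexer would.

#include <stdbool.h>
#include <stdint.h>

static bool lts32( int32_t x, int32_t y ) {
  bool     sx = ( ( uint32_t )( x ) >> 31 ) & 1;  /* x_{n-1}               */
  bool     sy = ( ( uint32_t )( y ) >> 31 ) & 1;  /* y_{n-1}               */
  uint32_t cx = ( uint32_t )( x ) & 0x7FFFFFFF;   /* chop(x) = x_{n-2...0} */
  uint32_t cy = ( uint32_t )( y ) & 0x7FFFFFFF;   /* chop(y) = y_{n-2...0} */

  if( !sx &&  sy ) return false;                  /* x +ve, y -ve          */
  if(  sx && !sy ) return true;                   /* x -ve, y +ve          */

  return cx < cy;                                 /* chop(x) <_u chop(y)   */
}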

3.6.3 Beyond equality and less than


Once we have components for equality and less than comparison, whether they are signed or unsigned, all
other comparisons can be derived using a set of identities. For example, one can easily verify that

x ≠ y ≡ ¬(x = y)
x ≤ y ≡ (x < y) ∨ (x = y)
x ≥ y ≡ ¬(x < y)
x > y ≡ ¬(x < y) ∧ ¬(x = y)

meaning the result of all six comparisons between x and y on the LHS can easily be realised using just

• one component for x = y,


• one component for x < y, and
• four (two NOT, an OR, and an AND) extra logic gates

rather than instantiating additional, dedicated components.
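For example, the following sketch in C (with names of my choosing) shows how all six results follow from the two component outputs alone:

#include <stdbool.h>

typedef struct {
  bool eq, ne, lt, le, ge, gt;
} cmp_t;

static cmp_t derive( bool eq, bool lt ) {
  cmp_t r;

  r.eq = eq;             /* from the dedicated component                    */
  r.lt = lt;             /* from the dedicated component                    */
  r.ne = !eq;            /* x != y  =  not ( x = y ),          i.e., one NOT */
  r.ge = !lt;            /* x >= y  =  not ( x < y ),          i.e., one NOT */
  r.le = lt || eq;       /* x <= y  =  ( x < y ) or ( x = y ), i.e., one OR  */
  r.gt = !lt && !eq;     /* x >  y  =  reusing the two NOTs, plus one AND    */

  return r;
}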

References
[1] ARM7TDMI Technical Reference Manual. Tech. rep. DDI-0210C. ARM Ltd., 2004. url: http://infocenter.
arm.com/help/topic/com.arm.doc.ddi0210c/index.html (see p. 176).
[2] A.D. Booth. “A Signed Binary Multiplication Technique”. In: Quarterly Journal of Mechanics and Applied
Mathematics 4.2 (1951), pp. 236–240 (see p. 165).
[3] P. Chow. MIPS-X Instruction Set And Programmer’s Manual. Tech. rep. CSL-86-289. Computer Systems
Laboratory, Stanford University, 1998 (see p. 176).
[4] Cortex-M0 Technical Reference Manual. Tech. rep. DDI-0432C. ARM Ltd., 2009. url: http://infocenter.
arm.com/help/topic/com.arm.doc.ddi0432c/index.html (see p. 176).


[5] L. Dadda. “Some Schemes for Parallel Multipliers”. In: Alta Frequenza 34 (1965), pp. 349–356 (see p. 172).
[6] J. Heinrich. MIPS R4000 Microprocessor User’s Manual. 2nd. 1994 (see p. 176).
[7] W.G. Horner. “A new method of solving numerical equations of all orders, by continuous approximation”.
In: Philosophical Transactions (1819), pp. 308–335 (see p. 161).
[8] A. Karatsuba and Y. Ofman. “Multiplication of Many-Digital Numbers by Automatic Computers”. In:
Physics-Doklady 7 (1963), pp. 595–596 (see p. 160).
[9] J. von Neumann. First Draft of a Report on the EDVAC. Tech. rep. 1945 (see p. 135).
[10] B. Parhami. Computer Arithmetic: Algorithms and Hardware Designs. 1st ed. Oxford University Press, 2000
(see pp. 136, 154).
[11] A.S. Tanenbaum and T. Austin. Structured Computer Organisation. 6th ed. Prentice Hall, 2012 (see p. 137).
[12] P.A.V. Thomas and N. Balatoni. “A hardware multiplier/divider for the PDP 8S computer”. In: Behavior
Research Methods & Instrumentation 3.2 (1971), pp. 89–91 (see p. 176).
[13] C.S. Wallace. “A Suggestion for Fast Multipliers”. In: IEEE Transactions on Computers 13.1 (1964), pp. 14–17
(see p. 172).


CHAPTER

4
BASICS OF MEMORY TECHNOLOGY

4.1 Introduction
1. one or more channels, each backed by
2. one or more physical banks, each composed from
3. one or more devices, each composed from

4. one or more logical banks, of


5. one or more arrays, of
6. many cells

4.2 Memory cells


4.2.1 Static RAM (SRAM) cells
4.2.2 Dynamic RAM (DRAM) cells
4.2.3 ROM cells

4.3 Memory cells ⇝ devices


4.3.1 Static RAM (SRAM) devices
4.3.2 Dynamic RAM (DRAM) devices
4.3.3 ROM devices

4.4 Memory devices ⇝ modules


Figure 4.1: ...


Figure 4.2: ...

Figure 4.3: ...

Figure 4.4: ...


Figure 4.5: ...

Figure 4.6: ...

Figure 4.7: ...


Figure 4.8: ...


CHAPTER

5
COMPUTATIONAL MACHINES: FINITE
STATE MACHINES (FSMS)

5.1 State machines: from simple to more complex control-paths


The topic of automata, specifically Finite State Machines (FSMs), has a very formal basis; basically, they are
models of computation, not too far from topics such as Turing Machines (TMs). Put another way, you can think
of an FSM as a computer, albeit a simple one.
The control-path in Figure 2.37 is very simple: this is partly an artefact of the problem at hand of course,
but masks the difficulty of dealing with more complicated problems. FSMs represent an attractive solution
however, allowing us to reason about and implement more complicated, general-purpose control-paths.

5.1.1 A rough overview of FSM-related theory


Definition 5.1. An FSM is a (theoretical) machine that can be in a finite set of states. The machine consumes input
symbols from an alphabet (which defines which symbols are valid and so on) one at a time; symbols make the machine
transition from one state to another according to a transition function. When the input is exhausted, the machine
halts; depending on the state it halts in, the machine is said to accept or reject the input. The set of inputs accepted by
the machine is termed the language accepted; this can be used to classify the machine itself.
Definition 5.2. Based on the fact that
1. entry actions happen when entering a given state,
2. exit actions happen when exiting a given state,
3. input actions happen based on the state and any input received, and
4. transition actions happen when a given transition between states is performed
we can categorise an FSM based on output behaviour:

1. a Moore-style FSM only uses entry actions, i.e., the output depends on the state only, while
2. a Mealy-style FSM only uses input actions, i.e., the output depends on the state and the input.

An alternative classification relates to transition behaviour, where an FSM is deemed

1. deterministic if for each state there is always one transition for each possible input (i.e., we always know what the
next state should be), or
2. non-deterministic if for each state there might be zero, one or more transitions for each possible input (i.e., we
only know what the next state could be).


δ
Q        Xi = 0   Xi = 1
Seven    Seven    Sodd
Sodd     Sodd     Seven

(a) A tabular description.    (b) A diagrammatic description.

Figure 5.1: An example FSM to decide whether there is an odd number of 1 elements in some sequence X.


δ
Q      Xi = 10   Xi = 20
S0     S10       S20
S10    S20       S30
S20    S30       S⊥
S30    S⊥        S⊥

(a) A tabular description.    (b) A diagrammatic description.

Figure 5.2: An example FSM modelling a simple vending machine.

Definition 5.3. A given FSM can be defined via the following:

1. S, a finite set of states and a distinguished start state s ∈ S.

2. A ⊆ S, a finite set of accepting states.

3. An input alphabet Σ and output alphabet Γ.

4. A transition function
δ : S × Σ → S.

5. An output function
ω:S→Γ

in the case of a Moore FSM, or


ω:S×Σ→Γ

in the case of a Mealy FSM.

Note that:

• The FSM itself might be enough to solve a given problem, but it is common to control an associated data-path using
the outputs.

• A special “empty” (or null) input denoted ϵ allows a transition which can always occur.

• It is common to allow δ to be a partial function, i.e., a function which is not defined for all inputs.

• If the FSM is non-deterministic, then δ might instead give a set of possibilities that is sampled from.

More simply, you can think of an FSM as a directed graph where moving between nodes (which represent
each state) means consuming the input on the corresponding edge. Some examples should show that the fairly
formal description above translates into a much more manageable reality.


5.1.1.1 Example #1: even or odd number of 1 elements


Imagine we are tasked with designing an FSM that decides whether a binary sequence X has an odd number
of 1 elements in it (i.e., it computes the parity of X). The input alphabet in this case is

Σ = {0, 1}

since each Xi can either be 0 or 1. The FSM can clearly be in two states: having consumed the input so far, it
can either have seen an even or odd number of 1 elements. Therefore we can say

S = {Seven , Sodd },

have s = Seven as the starting state, and let A = {Sodd } be the (singleton) set of accepting states. There is no output
as such, so in this case both the output alphabet Γ and the output function ω are irrelevant.
Our final task is to define the transition function. Figure 5.1 includes a tabular and a diagrammatic
description of the same thing. The tabular, truth table style description is easier to discuss. The idea is that
it lists the current state (left-hand side), alongside the next state for each possible input (right-hand side). In
words, the rows read as follows:

• if we are in state Seven and the input Xi = 0 then we stay in state Seven ,

• if we are in state Seven and the input Xi = 1 then we move to state Sodd ,

• if we are in state Sodd and the input Xi = 0 then we stay in state Sodd , and

• if we are in state Sodd and the input Xi = 1 then we move to state Seven .

The intuition is, for example and with a similar argument possible for the state Sodd , that if we are in state Seven
(i.e., have seen an even number of 1 elements so far) and the next input is 1, then we have now seen an odd
number of 1 elements so move to state Sodd . Conversely, if we are in state Seven (i.e., have seen an even number
of 1 elements so far) and the next input is 0, then we have still seen an even number of 1 elements so stay in
state Seven .
Consider some examples of the FSM in operation

1. For the input X = ⟨1, 0, 1, 1⟩ the transitions are

Seven ⇝ Sodd ⇝ Sodd ⇝ Seven ⇝ Sodd    (consuming X0 = 1, X1 = 0, X2 = 1, X3 = 1 in turn)

meaning we start in state Seven then

(a) move to Sodd since X0 = 1,


(b) stay in Sodd since X1 = 0,
(c) move to Seven since X2 = 1, and finally
(d) move to Sodd since X3 = 1

Since we finish in state Sodd , the input is accepted and hence we conclude it has an odd number of 1
elements.

2. For the input X = ⟨0, 1, 1, 0⟩ the transitions are

Seven ⇝ Seven ⇝ Sodd ⇝ Seven ⇝ Seven    (consuming X0 = 0, X1 = 1, X2 = 1, X3 = 0 in turn)

meaning we start in state Seven then

(a) stay in Seven since X0 = 0,


(b) move to Sodd since X1 = 1,
(c) move to Seven since X2 = 1, and finally
(d) stay in Seven since X3 = 0.

Since we finish in state Seven , the input is rejected and hence we conclude it has an even number of 1
elements.
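The FSM is small enough that a direct simulation in C is instructive; the following sketch (with names of my choosing) encodes δ from Figure 5.1a as a lookup table and replays the first example input:

#include <stdio.h>

typedef enum { S_EVEN = 0, S_ODD = 1 } state_t;

/* delta[Q][X_i], a transcription of the tabular description */
static const state_t delta[ 2 ][ 2 ] = {
  /* X_i = 0   X_i = 1 */
  {  S_EVEN,   S_ODD  },   /* from S_even */
  {  S_ODD,    S_EVEN }    /* from S_odd  */
};

int main( void ) {
  int     X[ 4 ] = { 1, 0, 1, 1 };      /* the first example input        */
  state_t Q      = S_EVEN;              /* the start state s = S_even     */

  for( int i = 0; i < 4; i++ ) {
    Q = delta[ Q ][ X[ i ] ];
  }

  /* Q = S_odd, an accepting state, so this prints "accept" */
  printf( "%s\n", Q == S_ODD ? "accept" : "reject" );

  return 0;
}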


5.1.1.2 Example #2: a vending machine

Imagine we are tasked with designing an FSM that controls a vending machine. The machine accepts tokens
worth 10 or 20 units: when the total value of tokens entered reaches 30 units it delivers a chocolate bar but
it does not give change. That is, the exact amount must be entered otherwise an error occurs, all tokens are
ejected and we start afresh.
The design is clearly a little more complex this time. The input alphabet is basically just the tokens that the
machine can accept, so we have
Σ = {10, 20}.
The set of states the machine can be in is easy to enumerate: it can either have accepted tokens totalling 0, 10,
20 or 30 units in it or be in the error state which we denote by ⊥. Thus, we can say

S = {S⊥ , S0 , S10 , S20 , S30 }

and clearly set s = S0 since initially the machine has accepted no tokens. There is one accepting state, which is
when tokens totalling 30 units have been accepted, so A = {S30 }. Since there is again no output, our final task is again
to define the transition function. As before, Figure 5.2 outlines a tabular and diagrammatic description.

1. For the input X = ⟨10, 20⟩ the transitions are

S0 ⇝ S10 ⇝ S30    (consuming X0 = 10, X1 = 20 in turn)

meaning we start in state S0 then

(a) move to S10 since X0 = 10, and finally

(b) move to S30 since X1 = 20.

Since we finish in state S30 , the input is accepted and we get a chocolate bar as output!

2. For the input X = ⟨20, 20⟩ the transitions are

S0 ⇝ S20 ⇝ S⊥    (consuming X0 = 20, X1 = 20 in turn)

meaning we start in state S0 then

(a) move to S20 since X0 = 20, and finally

(b) move to S⊥ since X1 = 20.

Since we finish in state S⊥ , the error state, the input is rejected and the tokens are returned.

Note that the input marked ϵ is the empty input; that is, with no input we can move between the accepting or
error states back into the start state thus resetting the machine. So for example, once we accept or reject the
input we might assume the machine returns to state S0 .
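The same table-driven approach extends to the vending machine; in the following sketch (again with names of my choosing), S_BOT plays the role of the error state S⊥, and the ϵ-style reset is simply a reassignment of Q:

#include <stdio.h>

typedef enum { S0 = 0, S10, S20, S30, S_BOT } state_t;

/* delta[Q][t], where t = 0 means a 10 token and t = 1 a 20 token */
static const state_t delta[ 5 ][ 2 ] = {
  {  S10,   S20   },   /* from S0    */
  {  S20,   S30   },   /* from S10   */
  {  S30,   S_BOT },   /* from S20   */
  {  S_BOT, S_BOT },   /* from S30   */
  {  S_BOT, S_BOT }    /* from S_bot */
};

int main( void ) {
  int     X[ 2 ] = { 20, 20 };          /* the second example input       */
  state_t Q      = S0;

  for( int i = 0; i < 2; i++ ) {
    Q = delta[ Q ][ X[ i ] == 20 ];
  }

  /* Q = S_bot, so this prints "rejected" and the tokens are returned */
  printf( "%s\n", Q == S30 ? "chocolate" : "rejected" );

  Q = S0;                               /* the epsilon transition: reset  */

  return 0;
}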

5.1.2 Practical implementation of FSMs in hardware


Based on the formal definition above, Figure 5.3 illustrates a general framework into which we can place
concrete implementations of the component parts in a specific FSM. It is crucial to notice that when drawn as
a diagram like this, we can have

1. the state implemented by a register (i.e., a group of latches or flip-flops), and

2. the δ and ω functions implemented using combinatorial logic only: they are functions of the current state
and any input.

The behaviour of the framework is illustrated by Figure 5.4. The idea is that within a given current clock cycle

1. ω computes the output from the current state and input, and

2. δ computes the next state from the current state and input


(a) Using a 1-phase clock.    (b) Using a 2-phase clock.

Figure 5.3: Two generic FSM frameworks (for different clocking strategies) into which one can place implementations of
the state, δ (the transition function) and ω (the output function).

such that the next state is latched by the positive clock edge marking the next clock cycle. So we have a period
of computation in which ω and δ operate, then an update triggered by a positive clock edge which steps the
FSM from the current state into the next state. What results is a series of steps, under control of the clock, each
performing some computation. As such, it should be clear that the clock frequency determines how quickly
computation occurs; it has to be fast enough to satisfy the design goals, yet slow enough to cope with the
critical path of a given step of computation. That is, the faster the clock oscillates the faster we step through the
computation, but if it is too fast we cannot finish one step before the next one starts.

To summarise, this is a framework for a computer we can build: we know how each of the components
function, and can reason about their behaviour from the transistor-level upward. To solve a concrete problem
using the framework, we follow a (fairly) standard sequence of steps:

1. Count the number of states required, and give each state an abstract label.
2. Describe the state transition and output functions using a tabular or diagrammatic approach.
3. Decide how the states will be represented, i.e., assign concrete values to the abstract labels, and allocate
a large enough register to hold the state.
4. Express the functions δ and ω as (optimised) Boolean expressions, i.e., combinatorial logic.
5. Place the registers and combinatorial logic into the framework.

Versus a theoretical alternative, it is less common for a hardware-based FSM to have accepting states,
since we cannot usually halt the circuit (without turning it off); we might include idle or error states to cope. In
addition, and although the framework does not show it, it is common to have a reset input that (re)initialises
the FSM into the start state. For one thing, this avoids the need to turn the FSM off then on again to reset it!

Example #1: an ascending modulo 6 counter Imagine we are tasked with designing an FSM that acts as a
cyclic counter modulo n (rather than 2^n as before). If n = 6 for example, we want a component whose output r
steps through values
0, 1, 2, 3, 4, 5, 0, 1, . . . ,
with the modular reduction representing control behaviour (versus the uncontrolled counter that was cyclic
by default). In this case it is clear the FSM can be in one of 6 states (since the counter value is one of
0, 1, . . . , 5), which we label S0 , S1 , . . . , S5 . Figure 5.5 includes tabular and diagrammatic descriptions of the
transition function, both of which are a little dull: they simply move from one state to the next (with the ϵ
meaning no input is required), cycling from S5 back to S0 .



Figure 5.4: Two illustrative waveforms (for different clocking strategies), outlining stages of computation within the associated FSM framework: (a) using a 1-phase clock, and (b) using a 2-phase clock.



An aside: binary versus one-hot encodings.

The fact that state assignment occurs quite late in the design of a given FSM is intentional: it allows us to
optimise the representation based on what we do with it. So far, we have used a natural, binary encoding to
represent the i-th of n states as a (⌈log2 (n)⌉)-bit unsigned integer i. For example, if n = 6 we use

S0 ↦ ⟨0, 0, 0⟩
S1 ↦ ⟨1, 0, 0⟩
S2 ↦ ⟨0, 1, 0⟩
S3 ↦ ⟨1, 1, 0⟩
S4 ↦ ⟨0, 0, 1⟩
S5 ↦ ⟨1, 0, 1⟩
This is not the only option, however.


A one-hot encoding represents the i-th of n states as a sequence X st. Xi = 1 and Xj = 0 for j ≠ i. For
example, if n = 6 again, then we use
S0 ↦ ⟨1, 0, 0, 0, 0, 0⟩
S1 ↦ ⟨0, 1, 0, 0, 0, 0⟩
S2 ↦ ⟨0, 0, 1, 0, 0, 0⟩
S3 ↦ ⟨0, 0, 0, 1, 0, 0⟩
S4 ↦ ⟨0, 0, 0, 0, 1, 0⟩
S5 ↦ ⟨0, 0, 0, 0, 0, 1⟩
meaning that for S0 , the 0-th bit is 1 and all others are 0. On one hand, and depending on n, this might mean
we need more flip-flops to store the state (i.e., n instead of ⌈log2 (n)⌉). On the other hand, we potentially get two
advantages, namely

1. transition between states is easier (we simply rotate any given encoding by the right distance to get
another), and
2. switching behaviour (and hence power consumption) is reduced since only two bits toggle for any change
(one from 1 to 0, and one from 0 to 1).
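As a sketch in C of the first advantage, assuming the n = 6 one-hot encoding above is held in the low 6 bits of a byte (the name onehot_next is mine): the transition function is a rotation, i.e., pure wiring, rather than general combinatorial logic.

#include <stdint.h>

/* Given a one-hot state q (bit i set for state S_i), compute the next
   state by rotating left by one place within the low 6 bits. */
static uint8_t onehot_next( uint8_t q ) {
  return ( uint8_t )( ( ( q << 1 ) | ( q >> 5 ) ) & 0x3F );
}

For example, onehot_next maps S5 ↦ 0b100000 back to S0 ↦ 0b000001, closing the cycle.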

Clearly 2^3 = 8 > 6, so we can represent the current state using a 3-bit integer Q = ⟨Q0 , Q1 , Q2 ⟩. That is,

S0 ↦ ⟨0, 0, 0⟩ ≡ 000(2)
S1 ↦ ⟨1, 0, 0⟩ ≡ 001(2)
S2 ↦ ⟨0, 1, 0⟩ ≡ 010(2)
S3 ↦ ⟨1, 1, 0⟩ ≡ 011(2)
S4 ↦ ⟨0, 0, 1⟩ ≡ 100(2)
S5 ↦ ⟨1, 0, 1⟩ ≡ 101(2)

To implement the FSM, all we need to do is derive Boolean equations for the transition function δ so it can
compute the next state Q′ from Q; with this FSM there is no input, so δ is a function of the current state. To do
so, we first rewrite the tabular description of δ by replacing the abstract labels with concrete values. The result
is a truth table, i.e.,
δ ω
Q2 Q1 Q0 Q′2 Q′1 Q′0 r2 r1 r0
0 0 0 0 0 1 0 0 0
0 0 1 0 1 0 0 0 1
0 1 0 0 1 1 0 1 0
0 1 1 1 0 0 0 1 1
1 0 0 1 0 1 1 0 0
1 0 1 0 0 0 1 0 1
1 1 0 ? ? ? ? ? ?
1 1 1 ? ? ? ? ? ?

which encodes the same information. For example, if the current state is Q = ⟨0, 0, 0⟩ (i.e., we are in state S0 )
then the next state should be Q′ = ⟨1, 0, 0⟩ (i.e., state S1 ). Note that there are 2 unused states, namely ⟨0, 1, 1⟩


An aside: Moore vs. Mealy style FSMs.

When written symbolically, the motivation for using either a Moore or Mealy style FSMs may be unclear. When
the framework for implementing FSMs is taken into account, however, the issue should become more concrete:

• In a Moore FSM the output depends on the current state only, implying changes to any input are only
relevant when the state is updated; you can think of this as meaning the inputs are only relevant in
relation to the clock signal that triggers said update (i.e., they are only taken into account periodically,
rather than continuously).
• In contrast, a Mealy FSM allows the output to depend on the current state and any input. ω is a
combinatorial function, so this implies the output can change a) in relation to the clock signal as a result
of an update to the state, and/or b) at any time as a result of changes to the input. You could think of this as
meaning the FSM is more responsive, in the sense that although the state is updated at the same frequency
(i.e., in relation to the same features of the clock) the output can continuously, and instantaneously, change
if/when the input changes.

Both are viable options, so it is not true that one is correct or incorrect. However, it is clearly important to
understand the (subtle) difference so an informed choice can be made within some specific context.

δ         ω
Q    Q′   r
S0   S1   0
S1   S2   1
S2   S3   2
S3   S4   3
S4   S5   4
S5   S0   5

(a) A tabular description.    (b) A diagrammatic description.

Figure 5.5: An example FSM modelling an ascending modulo 6 counter.

and ⟨1, 1, 1⟩, which we include in the table: the next state in either of these cases does not matter since they are
invalid, so the entries are don’t care.
To summarise, we need to derive Boolean expressions for each of Q′2 , Q′1 and Q′0 in terms of Q2 , Q1 and Q0 .
This can be achieved by applying the Karnaugh map technique to get

Q′2 (columns Q1 Q0 = 00, 01, 11, 10):

  Q2 = 0 :  0  0  1  0
  Q2 = 1 :  1  0  ?  ?

Q′1 (columns Q1 Q0 = 00, 01, 11, 10):

  Q2 = 0 :  0  1  0  1
  Q2 = 1 :  0  0  ?  ?

Q′0 (columns Q1 Q0 = 00, 01, 11, 10):

  Q2 = 0 :  1  0  0  1
  Q2 = 1 :  1  0  ?  ?
which produce
Q′2 = ( Q1 ∧ Q0 )∨
( Q2 ∧ ¬Q0 )

Q′1 = ( ¬Q2 ∧ ¬Q1 ∧ Q0 )∨


( Q1 ∧ ¬Q0 )

Q′0 = ( ¬Q0 )


δ                      ω
          Q′                f
Q      d = 0   d = 1   r    d = 0   d = 1
S0     S1      S5      0    0       1
S1     S2      S0      1    0       0
S2     S3      S1      2    0       0
S3     S4      S2      3    0       0
S4     S5      S3      4    0       0
S5     S0      S4      5    1       0

(a) A tabular description.    (b) A diagrammatic description.

Figure 5.6: An example FSM modelling an ascending or descending modulo 6 counter.

δ                            ω
          Q′
Q      rst = 0   rst = 1     Mg   Ma   Mr   Ag   Aa   Ar
S0     S1        S6          1    0    0    0    0    1
S1     S2        S6          0    1    0    0    0    1
S2     S3        S6          0    0    1    0    1    0
S3     S4        S6          0    0    1    1    0    0
S4     S5        S6          0    0    1    0    1    0
S5     S0        S6          0    1    0    0    0    1
S6     S0        S6          0    0    1    0    0    1

(a) A tabular description.    (b) A diagrammatic description.

Figure 5.7: An example FSM modelling a traffic light controller.


Now we have enough to fill in the FSM framework: the state is simply a 3-bit register, and δ is represented by
circuit analogues of the expressions above. Note that in this case, the output function ω is trivial: the counter
output r = Q due to our state assignment, so in a sense ω is just the identity function.
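As a check, the expressions can be transcribed into C and iterated; the following sketch (names mine) confirms the sequence 0, 1, 2, 3, 4, 5, 0, . . . emerges from the derived equations:

#include <stdbool.h>
#include <stdio.h>

static unsigned delta( unsigned Q ) {
  bool Q2 = ( Q >> 2 ) & 1, Q1 = ( Q >> 1 ) & 1, Q0 = Q & 1;

  bool N2 = (         Q1 &&  Q0 ) || ( Q2 && !Q0 );   /* Q'_2 */
  bool N1 = ( !Q2 && !Q1 &&  Q0 ) || ( Q1 && !Q0 );   /* Q'_1 */
  bool N0 =   !Q0;                                    /* Q'_0 */

  return ( N2 << 2 ) | ( N1 << 1 ) | ( N0 << 0 );
}

int main( void ) {
  unsigned Q = 0;                     /* the start state S_0              */

  for( int i = 0; i < 8; i++ ) {
    printf( "%u ", Q );               /* prints 0 1 2 3 4 5 0 1           */
    Q = delta( Q );
  }

  printf( "\n" );

  return 0;
}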

Example #2: an ascending or descending modulo 6 counter Now imagine we need to upgrade the previous
example: we are tasked with designing an FSM that again acts as a cyclic counter modulo n, but whose direction
can also be controlled. If n = 6 for example, we want a component whose output r steps through values
0, 1, 2, 3, 4, 5, 0, 1, . . .

or
0, 5, 4, 3, 2, 1, 0, 5, . . .
depending on some input d, plus has an output f to signal when the cycle occurs (i.e., when the current value
is last or first in the sequence, depending on d).
The possible states are the same as before: we still have 6 states, labelled S0 , S1 , . . . , S5 . The difference is how
transitions between states occur; this is illustrated by Figure 5.6, in which the new tabular and diagrammatic
descriptions of the transition function are shown. Although it looks more complicated, we take exactly the
same approach as before: we start by rewriting the tabular description of δ by replacing the abstract labels with
concrete values to yield:
δ ω
d Q2 Q1 Q0 Q′2 Q′1 Q′0 r2 r1 r0 f
0 0 0 0 0 0 1 0 0 0 0
0 0 0 1 0 1 0 0 0 1 0
0 0 1 0 0 1 1 0 1 0 0
0 0 1 1 1 0 0 0 1 1 0
0 1 0 0 1 0 1 1 0 0 0
0 1 0 1 0 0 0 1 0 1 1
0 1 1 0 ? ? ? ? ? ? ?
0 1 1 1 ? ? ? ? ? ? ?
1 0 0 0 1 0 1 0 0 0 1
1 0 0 1 0 0 0 0 0 1 0
1 0 1 0 0 0 1 0 1 0 0
1 0 1 1 0 1 0 0 1 1 0
1 1 0 0 0 1 1 1 0 0 0
1 1 0 1 1 0 0 1 0 1 0
1 1 1 0 ? ? ? ? ? ? ?
1 1 1 1 ? ? ? ? ? ? ?
The table is larger since we need to consider d as input as well as Q, but the process is the same: to compute
δ, we just need a set of appropriate Boolean expressions. So next we translate the truth table into a set of
Karnaugh maps
Q′2 (columns Q1 Q0 = 00, 01, 11, 10; rows d Q2):

  d Q2 = 00 :  0  0  1  0
  d Q2 = 01 :  1  0  ?  ?
  d Q2 = 11 :  0  1  ?  ?
  d Q2 = 10 :  1  0  0  0

Q′1 (columns Q1 Q0 = 00, 01, 11, 10; rows d Q2):

  d Q2 = 00 :  0  1  0  1
  d Q2 = 01 :  0  0  ?  ?
  d Q2 = 11 :  1  0  ?  ?
  d Q2 = 10 :  0  0  1  0

Q′0 (columns Q1 Q0 = 00, 01, 11, 10; rows d Q2):

  d Q2 = 00 :  1  0  0  1
  d Q2 = 01 :  1  0  ?  ?
  d Q2 = 11 :  1  0  ?  ?
  d Q2 = 10 :  1  0  0  1

and finally produce


Q′2 = ( ¬d ∧ Q1 ∧ Q0 )∨
( ¬d ∧ Q2 ∧ ¬Q0 )∨
( d ∧ Q2 ∧ Q0 )∨
( d ∧ ¬Q2 ∧ ¬Q1 ∧ ¬Q0 )

Q′1 = ( ¬d ∧ ¬Q2 ∧ ¬Q1 ∧ Q0 )∨


( ¬d ∧ Q1 ∧ ¬Q0 )∨
( d ∧ Q2 ∧ ¬Q0 )∨
( d ∧ Q1 ∧ Q0 )

Q′0 = ( ¬Q0 )


This time however, we need to deal with ω more carefully: we can still generate the counter output trivially as
r = Q, but also need to compute f somehow. This is straight-forward of course, because using the truth table
we can write
f (columns Q1 Q0 = 00, 01, 11, 10; rows d Q2):

  d Q2 = 00 :  0  0  0  0
  d Q2 = 01 :  0  1  ?  ?
  d Q2 = 11 :  0  0  ?  ?
  d Q2 = 10 :  1  0  0  0

and finally produce


f =( ¬d ∧ Q2 ∧ Q0 )∨
( d ∧ ¬Q2 ∧ ¬Q1 ∧ ¬Q0 )
which completes the design in the sense we have now specified all components of the framework.
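Again the expressions can be checked by transcription into C; a minimal sketch (names mine) of δ and the f component of ω follows, with d = 1 selecting the descending direction per the truth table above:

#include <stdbool.h>

static unsigned delta_ud( bool d, unsigned Q, bool* f ) {
  bool Q2 = ( Q >> 2 ) & 1, Q1 = ( Q >> 1 ) & 1, Q0 = Q & 1;

  bool N2 = ( !d &&         Q1 &&  Q0 ) || ( !d &&  Q2 && !Q0 ) ||
            (  d &&  Q2 &&         Q0 ) || (  d && !Q2 && !Q1 && !Q0 );
  bool N1 = ( !d && !Q2 && !Q1 &&  Q0 ) || ( !d &&  Q1 && !Q0 ) ||
            (  d &&  Q2 && !Q0        ) || (  d &&  Q1 &&  Q0 );
  bool N0 =   !Q0;

  /* f = ( !d and Q2 and Q0 ) or ( d and !Q2 and !Q1 and !Q0 ) */
  *f = ( !d && Q2 && Q0 ) || ( d && !Q2 && !Q1 && !Q0 );

  return ( N2 << 2 ) | ( N1 << 1 ) | ( N0 << 0 );
}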

Example #3: a traffic light controller Imagine we are tasked with designing a traffic light controller for two
roads (a main road and an access road) that intersect. The requirements are to

1. stop cars crashing into each other, so the behaviour should see
(a) green on main road and red on access road, then
(b) amber on main road and red on access road, then
(c) red on main road and amber on access road, then
(d) red on main road and green on access road, then
(e) red on main road and amber on access road, then
(f) amber on main road and red on access road,
and then cycle, and
2. allow an emergency stop button to force red on both main and access roads while pushed, then reset the
system into an initial start state when released.

First we need to take stock of the problem itself: there is basically one input (the emergency stop button,
denoted rst) and six outputs (namely the traffic light values, denoted M g , Ma and Mr for the main road and A g ,
Aa and Ar for the access road). Next we try to develop a precise description of the FSM behaviour. We need 7
states in total: S0 , S1 , . . . , S5 represent steps in the normal traffic light sequence, and S6 is an extra emergency
stop state. Figure 5.7 shows both tabular and diagrammatic descriptions of the transition function; in essence,
it is similar to the counter example (in the sense that it cycles from S0 through to S5 and back again) provided
rst = 0, but if rst = 1 in any state then we move to S6 . As an aside however, it is important to see this
description represents one solution among several, derived from what is (by design) an imprecise question. Put
another way, we have already made several choices. One example is the decision to use a separate emergency
stop state, and have the FSM enter this as the next state of any current state provided rst = 1; the red lights are
both forced on by virtue of being in the emergency stop state, rather than by rst per se. Another valid approach
might be to have ω depend on rst as well (rather than just Q, so it turns from a Moore-based into a Mealy-based
FSM) and forcing the red lights on as soon as rst = 1 and irrespective of what state the FSM is in. In some ways
this is arguably more attractive, in the sense that the emergency stop is instant: we no longer need to wait for
the next clock cycle when the next state is latched. Likewise, we have opted to make the first state listed in the
question (i.e., green on the main road and red on the access road) the initial state; since the sequence is cyclic
this choice seems a little arbitrary, so other choices (plus what state the FSM restarts in after an emergency stop)
might also seem reasonable.
Given our various choices however, we next follow standard practice by translating the description into an
implementation. Since 2^3 = 8 > 7 we can represent the current and next states via 3-bit integers Q = ⟨Q0 , Q1 , Q2 ⟩
and Q′ = ⟨Q′0 , Q′1 , Q′2 ⟩, where
S0 ↦ ⟨0, 0, 0⟩ ≡ 000(2)
S1 ↦ ⟨1, 0, 0⟩ ≡ 001(2)
S2 ↦ ⟨0, 1, 0⟩ ≡ 010(2)
S3 ↦ ⟨1, 1, 0⟩ ≡ 011(2)
S4 ↦ ⟨0, 0, 1⟩ ≡ 100(2)
S5 ↦ ⟨1, 0, 1⟩ ≡ 101(2)
S6 ↦ ⟨0, 1, 1⟩ ≡ 110(2)


and we have one unused state (namely ⟨1, 1, 1⟩). As such, both input and output registers will be comprised
of three 1-bit storage components, in this case D-type latches. Now that we have a concrete value for each abstract
state label, we can expand the tabular description of the FSM into a (lengthy) truth table:

δ ω
rst Q2 Q1 Q0 Q′2 Q′1 Q′0 Mg Ma Mr Ag Aa Ar
0 0 0 0 0 0 1 1 0 0 0 0 1
0 0 0 1 0 1 0 0 1 0 0 0 1
0 0 1 0 0 1 1 0 0 1 0 1 0
0 0 1 1 1 0 0 0 0 1 1 0 0
0 1 0 0 1 0 1 0 0 1 0 1 0
0 1 0 1 0 0 0 0 1 0 0 0 1
0 1 1 0 0 0 0 0 0 1 0 0 1
0 1 1 1 ? ? ? ? ? ? ? ? ?
1 0 0 0 1 1 0 1 0 0 0 0 1
1 0 0 1 1 1 0 0 1 0 0 0 1
1 0 1 0 1 1 0 0 0 1 0 1 0
1 0 1 1 1 1 0 0 0 1 1 0 0
1 1 0 0 1 1 0 0 0 1 0 1 0
1 1 0 1 1 1 0 0 1 0 0 0 1
1 1 1 0 1 1 0 0 0 1 0 0 1
1 1 1 1 ? ? ? ? ? ? ? ? ?

Although this looks intimidating, the point is that

• the transition function δ is just three Boolean expressions, one for each Q′i , using rst, Q2 , Q1 and Q0 as
input,

• the output function ω is just six Boolean expressions, one for each Mi and A j , using rst, Q2 , Q1 and Q0 as
input.

So we just need to derive each expression. For δ, the Karnaugh maps


Q′2 (columns Q1 Q0 = 00, 01, 11, 10; rows rst Q2):

  rst Q2 = 00 :  0  0  1  0
  rst Q2 = 01 :  1  0  ?  0
  rst Q2 = 11 :  1  1  ?  1
  rst Q2 = 10 :  1  1  1  1

Q′1 (columns Q1 Q0 = 00, 01, 11, 10; rows rst Q2):

  rst Q2 = 00 :  0  1  0  1
  rst Q2 = 01 :  0  0  ?  0
  rst Q2 = 11 :  1  1  ?  1
  rst Q2 = 10 :  1  1  1  1

Q′0 (columns Q1 Q0 = 00, 01, 11, 10; rows rst Q2):

  rst Q2 = 00 :  1  0  0  1
  rst Q2 = 01 :  1  0  ?  0
  rst Q2 = 11 :  0  0  ?  0
  rst Q2 = 10 :  0  0  0  0

can be used to produce


Q′2 = ( rst )∨
( Q2 ∧ ¬Q1 ∧ ¬Q0 )∨
( Q1 ∧ Q0 )

Q′1 = ( rst )∨
( ¬Q2 ∧ ¬Q1 ∧ Q0 )∨
( ¬Q2 ∧ Q1 ∧ ¬Q0 )

Q′0 = ( ¬rst ∧ ¬Q1 ∧ ¬Q0 )∨


( ¬rst ∧ ¬Q2 ∧ ¬Q0 )
Likewise for ω, we find
Mg (columns Q1 Q0 = 00, 01, 11, 10):

  Q2 = 0 :  1  0  0  0
  Q2 = 1 :  0  0  ?  0

Ma (columns Q1 Q0 = 00, 01, 11, 10):

  Q2 = 0 :  0  1  0  0
  Q2 = 1 :  0  1  ?  0

Mr (columns Q1 Q0 = 00, 01, 11, 10):

  Q2 = 0 :  0  0  1  1
  Q2 = 1 :  1  0  ?  1

Ag (columns Q1 Q0 = 00, 01, 11, 10):

  Q2 = 0 :  0  0  1  0
  Q2 = 1 :  0  0  ?  0

Aa (columns Q1 Q0 = 00, 01, 11, 10):

  Q2 = 0 :  0  0  0  1
  Q2 = 1 :  1  0  ?  0

Ar (columns Q1 Q0 = 00, 01, 11, 10):

  Q2 = 0 :  1  1  0  0
  Q2 = 1 :  0  1  ?  1


can be used to produce

Mg = ( ¬Q2 ∧ ¬Q1 ∧ ¬Q0 )

Ma = ( ¬Q1 ∧ Q0 )

Mr = ( Q1 ) ∨ ( Q2 ∧ ¬Q0 )

Ag = ( Q1 ∧ Q0 )

Aa = ( ¬Q2 ∧ Q1 ∧ ¬Q0 ) ∨ ( Q2 ∧ ¬Q1 ∧ ¬Q0 )

Ar = ( ¬Q2 ∧ ¬Q1 ) ∨ ( ¬Q1 ∧ Q0 ) ∨ ( Q2 ∧ Q1 )

As before, these expressions can be used to fill in the FSM framework to yield a resulting design for the
controller.



Part II

Appendices


APPENDIX

A
EXAMPLE EXAM-STYLE QUESTIONS

A.1 Chapter 1
Q1. We studied representation of unsigned integers using a base-b positional number system. Which of the
following literals
A: 10101
B: 11111
C: 11120
D: 12200
E: 12345
represents the unsigned decimal integer 123(10) in base-3 (or ternary, digits in which are termed trits).

Q2. Imagine that two signed, 8-bit integers x and y are represented using two’s-complement and sign-magnitude
respectively, and both of which have the decimal value 51(10) . If the most-significant bit of both x and y is set
to 1, what are their new (decimal) values?
A: −77(10) and 179(10)
B: −77(10) and −51(10)
C: −51(10) and −77(10)
D: 179(10) and 179(10)
E: 179(10) and −51(10)

Q3. Imagine that two signed, 16-bit integers x and y are represented using two’s-complement; their product r = x · y
is a signed, 32-bit integer also represented using two’s-complement. What is the largest (i.e., whose magnitude
is greatest) negative value of r possible?
A: −0
B: −32768
C: −65535
D: −1073709056
E: −2147483648

Q4. Imagine you write a C program that defines signed, 16-bit integer variables x and y (of type short) and then
assigns them the decimal values 256(10) and 4852(10) respectively. If x and y are then cast into signed, 8-bit
integers (of type char), which of the following
A: 0 and 12
B: 0 and −12


C: −1 and 256
D: −1 and −52
E: 0 and 52
identifies their decimal value? Or, put another way, which are the result of evaluating the two expressions
( char )( x ) and ( char )( y )?

Q5. Consider two signed, 8-bit integer variables x and r (of type char) used in a C program. If x has the decimal
value 9(10) and an assignment
r = ( ~x << 4 ) | 0x97
is executed, what is the decimal value of r afterwards?
A: −9(10)
B: −1(10)
C: 0(10)
D: 1(10)
E: 9(10)

Q6. In general, some x is a fixed point of a function f if f (x) equals x, i.e., if f maps x to itself. Consider the following
function
int8_t abs( int8_t x ) {
int8_t r;

if( x >= 0 ) {
r = x;
}
else {
r = -x;
}

return r;
}

implemented in C: abs was written in an attempt to compute the absolute value of x, a signed, 8-bit integer
represented using two's-complement. How many of the 2^8 = 256 possible values of x are fixed points of abs?
A: 0
B: 127
C: 128
D: 129
E: 256

Q7. Imagine that within a given C function, you declare signed, 8-bit integer variables (i.e., variables whose type is
int8_t) x and r. Assume C represents signed integers using two’s-complement, and the right-shift operator
yields arithmetic (rather than logical) shift: if x has the (decimal) value −10(10) , what (decimal) value does r
have after the assignment
r = ~( ( x >> 2 ) ^ 0xF4 )
is executed?
A: −10(10)
B: 10(10)
C: 11(10)
D: 54(10)
E: 203(10)

Q8. Consider two unsigned, 8-bit integer variables, x and y, as declared in some C function by using the type
uint8_t. For how many assignments to these variables will the Hamming weight of their unsigned, 8-bit
integer sum, i.e., x + y, be zero? Put another way, how many elements does the set

{(x, y) | HW(x + y) = 0}

have?
A: 0


B: 1
C: 255
D: 256
E: 65536

Q9. Consider a literal x̂ = 10, which represents a value x using a base-b positional number system. Based on this
information alone, which of the following values
A: x=1
B: x=2
C: x=b
D: x = 10
E: x = 16
is correct?

Q10. Assuming an n-bit x and use of two’s-complement representation for signed integers, which of the following
identities
A: x ∧ ¬x ≡ 0(10)
B: x ∨ ¬x ≡ −1(10)
C: x ⊕ ¬x ≡ −1(10)
D: x + ¬x ≡ −1(10)
E: x − ¬x ≡ −1(10)
is not correct?

Q11. a For the sets A = {1, 2, 3}, B = {3, 4, 5} and U = {1, 2, 3, 4, 5, 6, 7, 8}, compute the following:

i |A|.
ii A ∪ B.
iii A ∩ B.
iv A − B.
v A.
vi {x | 2 · x ∈ U}.

b For each of the following decimal integers, write down the 8-bit binary representation in sign-magnitude
and two’s-complement:

i +0.
ii −0.
iii +72.
iv −34.
v −8.
vi 240.

Q12. For some 32-bit integer x, explain what is meant by the Hamming weight of x; write a short C function to
compute the Hamming weight of a given 32-bit input.

Q13. From the following list


A: (x ∧ y) ⊕ z
B: (¬x ∨ y) ⊕ z
C: (x ∨ ¬y) ⊕ z
D: ¬(x ∨ y) ⊕ z
E: ¬¬(x ∨ y) ⊕ z
identify each Boolean expression that evaluates to 1 given the assignment x = 0, y = 0 and z = 1.


Q14. One of the following equivalences


A: (x ∧ y) ∧ z ≡ x ∧ (y ∧ z)
B: x∨1 ≡x
C: x ∨ ¬x ≡ 1
D: ¬(x ∨ y) ≡ ¬x ∧ ¬y
E: ¬¬x ≡ x
is incorrect: identify which.

Q15. The Boolean expression


(x ∨ (z ∨ y)) ∧ ¬(¬y ∧ ¬z)
is equivalent to which of the following alternatives?
A: y∨z
B: ((x ∨ z) ∨ y)) ∧ (x ∨ z)
C: (x ∧ y) ∨ (x ∧ z)
D: (x ∨ y) ∧ ¬(x ∨ z)
E: (x ∧ z) ∨ (x ∧ y)

Q16. The Boolean expression


(x ∨ y) ∨ (x ∧ z)
is equivalent to which of the following alternatives?
A: (x ∨ y) ∧ (x ∨ z)
B: (x ∨ y) ∧ z
C: (x ∨ y) ∧ (x ∧ z)
D: x∨y
E: (x ∧ y) ∨ x

Q17. A given set of Boolean operators may be termed functionally complete (or universal): this means any Boolean
function can be expressed using a Boolean expression involving elements of the set alone. For example, because
we know the NAND operator is functionally complete, we can also term the sets { ⊼ } and {∧, ¬} functionally
complete. Noting that ≢ and ⇏ denote the inverse of equivalence and implication respectively (i.e., not
equivalent, and does not imply), which of the following sets
A: {⊕, ∨}
B: {⇒, ≢}
C: {⇒, ⇏}
D: all of the above
E: none of the above
is/are functionally complete?

Q18. How many n-input, 1-output Boolean functions are there?


A: 1
B: n
C: 2^n
D: 2^(2^n)
E: 2^(2^(2^n))

Q19. Consider the equivalence


(y ∧ ¬x) ∨ (x ∧ ¬y) ≡ (x ∨ y) ∧ ¬(x ∧ y),
the LHS of which can be manipulated into the RHS by applying the following sequence of Boolean axioms:

identity ⇝ inverse ⇝ distribution ⇝ commutativity ⇝ distribution ⇝ commutativity ⇝ X

The final axiom is missing, i.e., replaced with X: which of the following options for X yields a valid derivation?
A: Absorption


B: Idempotency
C: Implication
D: Null
E: de Morgan

Q20. One of the following equivalences


A: (x ∧ y) ∧ z ≡ x ∧ (y ∧ z)
B: x ⇒ y ≡ ¬x ∨ y
C: x ∧ (x ∨ y) ≡ y
D: ¬(x ∨ y) ≡ ¬x ∧ ¬y
E: x∨0 ≡ x
is incorrect: identify which.

Q21. Identify which of the following Boolean expressions


A: x ∧ (x ∨ ¬x)
B: (x ∨ ¬x) ∧ x
C: x
D: ¬x
E: ¬((¬x ∨ ¬x) ∧ (¬x ∨ x))
is not equivalent to
x ∧ x ∨ x ∧ ¬x.

Figure A.1: A set of 5 different Karnaugh maps over the variables w, x, y, and z, each captioned with an associated option.


Q22. Consider the following truth table, which describes a Boolean function f :

w x y z f (w, x, y, z)
0 0 0 0 1
0 0 0 1 0
0 0 1 0 1
0 0 1 1 0
0 1 0 0 0
0 1 0 1 0
0 1 1 0 0
0 1 1 1 0
1 0 0 0 1
1 0 0 1 1
1 0 1 0 1
1 0 1 1 0
1 1 0 0 1
1 1 0 1 0
1 1 1 0 0
1 1 1 1 0

Which of the Karnaugh maps shown in Figure A.1 will yield the most efficient (in terms of the number of
operators involved), correct Boolean expression for f ?

Q23. Identify which of the following Boolean expressions


A: x∧y
B: x∧z
C: y∧z
D: x∧y∧z
E: 1
is equivalent to
x ∧ y ∨ x ∧ y ∧ z.

Q24. Consider a Boolean function f with n = 1 input x. How many such functions are not idempotent, i.e., how
many f exist such that ∀x ∈ {0, 1}, f ( f (x)) = f (x) does not hold?
A: 0
B: 1
C: 2
D: 3
E: 4

Q25. Consider a Boolean function f with n = 2 inputs x and y. How many such functions are symmetric, i.e., how
many f exist such that ∀x, y ∈ {0, 1}, f (x, y) = f (y, x) holds?
A: 0
B: 1
C: 2
D: 8
E: 16

Q26. Which of the following Boolean expressions


A: x ∧ ¬x ∨ y ∧ (1 ∨ x)
B: 0∨x∧y∨y
C: x∧y
D: y
E: ¬((¬x ∨ (x ∧ ¬y)) ∧ ¬y)
is not equivalent to
x ∧ (¬x ∨ y) ∨ y.


Q27. Which of the following Boolean expressions


A: ¬x
B: x
C: ¬y
D: y
E: 0
is equivalent to
(x ∨ y) ∧ (x ∨ ¬y).

Q28. a Write out a truth table for the Boolean function

f (a, b, c) = (a ∧ b ∧ ¬c) ∨ (a ∧ ¬b ∧ c) ∨ (¬a ∧ ¬b ∧ c),

then decide how many

i input combinations, and


ii outputs where f (a, b, c) = 1

exist in it.
b Consider the Boolean function
f (a, b, c, d) = ¬a ∧ b ∧ ¬c ∧ d.
Which of the following assignments

i a = 0, b = 0, c = 0 and d = 1,
ii a = 0, b = 1, c = 0 and d = 1,
iii a = 1, b = 1, c = 1 and d = 1,
iv a = 0, b = 0, c = 1 and d = 0.

produces the output f (a, b, c, d) = 1?


c Which of the following Boolean expressions

i (a ∨ b ∨ d) ∧ (¬c ∨ d),
ii (a ∧ b ∧ d) ∨ (¬c ∧ d),
iii (a ∨ b ∨ d) ∨ (¬c ∨ d).

is in Sum-of-Products (SoP) standard form?


d Identify each equivalence that is correct:

i a ∨ 1 ≡ a.
ii a ⊕ 1 ≡ ¬a.
iii a ∧ 1 ≡ a.
iv ¬(a ∧ b) ≡ ¬a ∨ ¬b.

e Identify each equivalence that is correct:

i ¬¬a ≡ a.
ii ¬(a ∧ b) ≡ ¬a ∨ ¬b.
iii ¬a ∧ b ≡ a ∧ ¬b.
iv ¬a ≡ a ⊕ a.

Q29. a The OR form of the null axiom is x ∨ 1 ≡ 1. Which of the following options

i x ∧ 1 ≡ 1,
ii x ∧ 0 ≡ 0,
iii x ∨ 0 ≡ 0,


iv x ∧ x ≡ x,

is the dual of this axiom?


b Given the Boolean equation
f = ¬a ∧ ¬b ∨ ¬c ∨ ¬d ∨ ¬e,
which of the following

i ¬ f = a ∨ b ∨ c ∨ d ∨ e,
ii ¬ f = a ∧ b ∧ c ∧ d ∧ e,
iii ¬ f = a ∧ b ∧ (c ∨ d ∨ e),
iv ¬ f = a ∧ b ∨ ¬c ∨ ¬d ∨ ¬e,
v ¬ f = (a ∨ b) ∧ c ∧ d ∧ e

is correct?
c If we write the de Morgan axiom in English, which of the following

i NOR is equivalent to AND if each input to AND is complemented,


ii NAND is equivalent to OR if each input to OR is complemented,
iii AND is equivalent to NOR if each input to NOR is complemented, or
iv NOR is equivalent to NAND if each input to NAND is complemented.

describes the correct equivalence?

Q30. a Identify which one of these Boolean expressions

i c∨d∨e
ii ¬c ∧ ¬d ∧ ¬e
iii ¬a ∧ ¬b
iv ¬a ∧ ¬b ∧ ¬c ∧ ¬d ∧ ¬e

is the correct result of simplifying

(¬(a ∨ b) ∧ ¬(c ∨ d ∨ e)) ∨ ¬(a ∨ b).

b If you simplify the Boolean expression

(a ∨ b ∨ c) ∧ ¬(d ∨ e) ∨ (a ∨ b ∨ c) ∧ (d ∨ e)

into a form that contains the fewest operators possible, which of the following options

i a ∨ b ∨ c,
ii ¬a ∧ ¬b ∧ ¬c,
iii d ∨ e,
iv ¬d ∧ ¬e,
v none of the above

do you end up with and why?


c If you simplify the Boolean expression

a ∧ c ∨ c ∧ (¬a ∨ a ∧ b)

into a form that contains the fewest operators possible, which of the following options

i (b ∧ c) ∨ c,
ii c ∨ (a ∧ b ∧ c),
iii a ∧ c,


iv a ∨ (b ∧ c),
v none of the above

do you end up with and why?

d Consider the Boolean expression

a ∧ b ∨ a ∧ b ∧ c ∨ a ∧ b ∧ c ∧ d ∨ a ∧ b ∧ c ∧ d ∧ e ∨ a ∧ b ∧ c ∧ d ∧ e ∧ f.

Which of the following simplifications

i a ∧ b ∧ c ∧ d ∧ e ∧ f,
ii a ∧ b ∨ c ∧ d ∨ e ∧ f,
iii a ∨ b ∨ c ∨ d ∨ e ∨ f,
iv a ∧ b,
v c ∧ d,
vi e ∧ f,
vii a ∨ b ∧ (c ∨ d ∧ (e ∨ f ))
viii ((a ∨ b) ∧ c) ∨ d ∧ e ∨ f

is correct?

e Given the options

i 1,
ii 2,
iii 3,
iv 4,

decide which is the least number of operators required to compute the same result as

f (a, b, c) = (a ∧ b) ∨ a ∧ (a ∨ c) ∨ b ∧ (a ∨ c).

Show how you arrived at your decision.

f Prove that
(¬x ∧ y) ∨ (¬y ∧ x) ∨ (¬x ∧ ¬y) ≡ ¬x ∨ ¬y.

g Prove that
(x ∧ y) ∨ (y ∧ z ∧ (y ∨ z)) ≡ y ∧ (x ∨ z).

h Simplify the Boolean expression


¬(a ∨ b) ∧ ¬(¬a ∨ ¬b)
into a form which contains the fewest operators possible.
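
(For equivalences such as those in Q30f and Q30g, brute-force enumeration is a useful sanity check, though not a substitute for an axiomatic proof; the C sketch below checks Q30f by comparing both sides under every assignment.)

#include <stdio.h>

int main( void ) {
  for( int x = 0; x <= 1; x++ ) {
    for( int y = 0; y <= 1; y++ ) {
      // LHS and RHS of the claimed equivalence, using 0/1-valued ints
      int lhs = ( !x && y ) || ( !y && x ) || ( !x && !y );
      int rhs = ( !x ) || ( !y );

      printf( "x=%d y=%d lhs=%d rhs=%d\n", x, y, lhs, rhs );
    }
  }

  return 0;
}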

A.2 Chapter 2
Q31. From the following list
A: has N-type semiconductor terminals and P-type body
B: has P-type semiconductor terminals and N-type body
C: is paired with another N-MOSFET to form a CMOS cell
D: has a threshold voltage above which the transistor is deemed active
identify each statement that correctly describes an N-MOSFET.

Q32. Consider the following implementation of a 2-input NAND gate:


[Diagram: a transistor-level implementation, connected between the power rails Vdd and Vss.]

From the following list


A: two inputs x and y, and one output r
B: a pull-up network of P-MOSFET transistors
C: a pull-down network of BJT transistors
D: two power rails supplying different voltage levels
E: a flux capacitor
identify each component evident in the implementation?

Q33. Consider the following organisation of MOSFET transistors

[Diagram: an organisation of MOSFET transistors, connected between the power rails Vdd and Vss.]

which implements a 3-input Boolean function r = f (x, y, z). Which function, from the following, do you think
it matches?


A: r=x∧y∧z
B: r=x
C: r = ¬(x ∧ (y ∨ z))
D: r = x ∧ (y ∨ z)
E: r=x∨y∨z

Q34. Recall that a 2-input XOR operator can be described via the following truth table:

XOR
x y r
0 0 0
0 1 1
1 0 1
1 1 0

An implementation of this operator is realised by combining logic gate instances, e.g., for NOT, NAND, AND,
NOR, and OR, while attempting to minimise the total number of underlying MOSFET-based transistors. How
many such transistors do you think it uses?
A: 14
B: 16
C: 18
D: 20
E: 22

Q35. A buffer can be described as a “pass through” logic gate: although it performs no computation (i.e., the output
r matches the input x, so r = x), it does impose a delay (often roughly the same as a NOT gate). It may be
termed a non-inverting buffer (cf. an inverting buffer, or NOT gate) because of this.
You are asked to implement a buffer, using an unconstrained organisation of N- and P-MOSFET transistors
alone. Assuming you attempt to minimise the number used, how many transistors do you need?
A: 0
B: 2
C: 4
D: 6
E: 8

Q36. Recalling that ? denotes don’t-care, the following truth table

f
x y z r
0 0 0 1
0 0 1 1
0 1 0 0
0 1 1 1
1 0 0 0
1 0 1 1
1 1 0 ?
1 1 1 1

describes a 3-input, 1-output Boolean function f st. r = f (x, y, z). Which of the following Boolean expressions
A: (¬x ⊕ ¬y) ∧ z
B: (¬x ⊕ ¬y) ∨ z
C: (¬x ∧ ¬y) ∧ z
D: (¬x ∧ ¬y) ∨ z
E: (¬x ∨ ¬y) ∧ z
correctly realises f ?


Q37. Imagine you want to design an 8-input, 8-bit multiplexer. Rather than do so from scratch, you intend to form
the design using multiple instances of an existing 2-input, 1-bit multiplexer component. How many do you
need?
A: 1
B: 8
C: 24
D: 40
E: 56

Q38. The following diagram

[Diagram: four full-adder instances in a chain; the i-th instance takes inputs xi and yi, produces the sum output ri, and passes its carry-out co to the carry-in ci of the next instance.]

illustrates a 4-bit ripple-carry adder circuit, constructed using 4 full-adder instances: it computes the sum
r = x + y + ci, given two operands x and y and a carry-in ci, and an associated carry-out co. Given the
propagation delay of NOT, AND, OR and XOR gates is 10ns, 20ns, 20ns and 60ns respectively, which of the
following
A: 120ns
B: 180ns
C: 240ns
D: 280ns
E: 480ns
most accurately reflects the critical path of the entire circuit?

Q39. Imagine you use the ripple-carry adder in the previous question to compute an unsigned addition within some
larger circuit. Having seen your design, your friend suggests they can optimise it: they claim that replacing each
full-adder instance with a half-adder instance will halve the total number of logic gates required. However, they
admit the optimisation does have a disadvantage. Specifically, although any value of x can be accommodated,
the optimised circuit can only produce the correct output for some values of y. Which of the following values
of y
A: −1
B: 0
C: 1
D: any 2 ≤ y < 8
E: any 8 ≤ y < 16
will produce the correct output?

Q40. Consider the following combinatorial circuit

[Diagram: a gate-level combinatorial circuit mapping the 4-bit input x = ⟨x0, x1, x2, x3⟩ to the 4-bit output r = ⟨r0, r1, r2, r3⟩.]

with a 4-bit input x and a 4-bit output r. Which of the following best describes the purpose of this circuit?
A: it computes the Hamming weight of x


w x y z r = f (w, x, y, z)
0 0 0 0 0
0 0 0 1 1
0 0 1 0 ?
0 0 1 1 1
0 1 0 0 1
0 1 0 1 1
0 1 1 0 1
0 1 1 1 0
1 0 0 0 1
1 0 0 1 1
1 0 1 0 0
1 0 1 1 1
1 1 0 0 1
1 1 0 1 1
1 1 1 0 1
1 1 1 1 0

Figure A.2: A truth table for the 4-input Boolean function f .

B: it computes the parity of x


C: it swaps the most-significant 2-bit half of x with the least-significant 2-bit half of x
D: it adds the most-significant 2-bit half of x to the least-significant 2-bit half of x (treating it as an unsigned,
4-bit integer)
E: it negates x (treating it as a signed, 4-bit integer represented using two’s-complement)

Q41. Recalling that ? denotes don’t-care, consider the truth table shown in Figure A.2.
a Construction of a Karnaugh map for f demands formation of a set of groups; these (collectively) cover
all 1 entries. Assuming the most efficient approach is adopted when forming said groups, how many
are required?

A: 1
B: 2
C: 3
D: 4
E: 6

b Using the Karnaugh map above (plus any subsequent optimisation steps you deem necessary), derive a
Boolean expression for f that minimises the number of operators required. How many operators remain
in said expression?

A: 1
B: 4
C: 5
D: 11
E: 12

Q42. Consider the following waveform


which details the behaviour of three signals labelled x, y and z. Which of the following components could the
behaviour illustrated relate to?
A: an SR-type flip-flop
B: an SR-type latch
C: a D-type flip-flop
D: a D-type latch
E: a T-type flip-flop

Q43. The following diagram

S Q

R ¬Q

illustrates a preliminary NAND-based SR-latch design, in the sense it currently lacks an enable signal. If Q and
Q′ denote the current and next state respectively, which of the following excitation tables

     Current       Next
     S R Q ¬Q    Q′ ¬Q′

A:   0 0 0  1    0  1
     0 0 1  0    1  0
     0 1 ?  ?    0  1
     1 0 ?  ?    1  0
     1 1 ?  ?    0  0

B:   0 0 ?  ?    1  1
     0 1 ?  ?    1  0
     1 0 ?  ?    0  1
     1 1 0  1    0  1
     1 1 1  0    1  0

C:   0 0 ?  ?    0  1
     1 1 ?  ?    1  0

D:   0 ? ?  ?    0  1
     1 ? ?  ?    1  0

E:   ? 0 ?  ?    0  1
     ? 1 ?  ?    1  0

correctly captures the behaviour of this circuit?

Q44. Although perhaps unusual, the following diagram

[Diagram: a circuit composed of two 2-input, 1-bit multiplexer instances connected together, with inputs a and c and an output r.]

illustrates a circuit with well defined behaviour. Based on analysis of this behaviour, which of the following
components
A: a flip-flop
B: a latch
C: a RAM cell
D: a ROM cell
E: a clock multiplier
does the circuit implement?


Figure A.3: MOSFET-based implementations of C0 and C1: (a) C0, using P-type MOSFETs, and (b) C1, using N-type MOSFETs.

Q45. An m-output, 1-bit demultiplexer connects a 1-bit input x to one of m separate 1-bit outputs (say ri for 0 ≤ i < m).
The output is selected using an l-bit control signal c (or, equivalently, c is a collection of l separate 1-bit control
signals). If m = 5, what is the minimum value of l required?
A: 0
B: 1
C: 2
D: 3
E: 4

Q46. Figure A.3 describes the implementation of two components denoted C0 and C1 . Each component Ci produces
one output ri given two inputs x and y, and has been implemented using MOSFET transistors.

a The truth table below includes 5 possibilities for outputs r0 and r1 (stemming from instances of C0 and
C1 ), given x and y. Recall that Vss and Vdd are used to represent 0 and 1 respectively: which option is
correct?
          A.      B.      C.      D.      E.
x y     r0 r1   r0 r1   r0 r1   r0 r1   r0 r1
0 0 1 0 0 0 1 0 Z 0 1 Z
0 1 1 1 0 0 0 0 Z Z Z Z
1 0 1 1 0 0 0 0 Z Z Z Z
1 1 0 1 1 0 0 0 1 Z Z 0

b The vendor of these components claims they can be used to implement any Boolean function; their
reasoning is based on the fact that a NAND gate can be implemented using instances of C0 and C1 .
Imagine you adhere to a design strategy where any given wire is driven by at most one non-Z value at
any given time, and want to minimise the number of C0 and C1 instances used: how many of each do
you need to implement a NAND gate?

  A.      B.      C.      D.      E.
C0 C1   C0 C1   C0 C1   C0 C1   C0 C1
1 1 5 3 3 5 3 3 5 5

Q47. Moore’s Law is an observation about the number of transistors which can be fabricated within some fixed unit
of area: it observes that this number doubles roughly every two years. Which of the following properties of
MOSFET-based transistors act as a constraint with respect to Moore’s Law?
A: Feature size
B: Power consumption
C: Heat dissipation
D: All of the above


Figure A.4: An implementation of a full-adder cell, with inputs ci, y, and x, outputs s and co, and intermediate wires t0, t1, t2, and t3.

E: None of the above

Q48. Figure A.4 shows an implementation of a full-adder cell. It uses three 1-bit inputs denoted x, y, and ci (the
carry-in), to compute two 1-bit outputs denoted s (the sum) and co (the carry-out); several other intermediate
wires, namely t0 , t1 , t2 , and t3 , are labelled for reference. Let

(x, y, ci) → (x′ , y′ , ci′ )

denote a change in said inputs: the LHS captures current values, whereas the RHS captures next (or new)
values. For example,
(0, 0, 0) → (1, 0, 0)
toggles x from 0 to 1, while both y and ci remain 0. Which of the following options will cause s to toggle in the
shortest period of time (i.e., with the shortest delay)?
A: (0, 0, 0) → (0, 0, 1)
B: (0, 0, 1) → (0, 1, 1)
C: (0, 1, 1) → (0, 0, 1)
D: (1, 1, 1) → (1, 1, 0)
E: (1, 0, 1) → (0, 1, 1)

Q49. Figure A.5 shows an implementation of a cyclic n-bit counter. While the counter is operational (i.e., while not
reset, and given a clock signal), each ri will transition between 0 and 1 at a different frequency. For the concrete
case of n = 4, which does so at the lowest frequency?
A: r4
B: r3
C: r2
D: r1
E: r0

Q50. Consider a 16-bit register, constructed from CMOS-based D-type latches. Based on high-level reasoning about
this component alone, if the initial value stored is DEAD(16) then overwriting it with which of the following
A: BEEF(16)
B: F00D(16)
C: 1234(16)
D: FFFF(16)
E: 0000(16)
might you expect to consume the most power?


Figure A.5: An implementation of a cyclic n-bit counter.


Q51. You are tasked with implementing the Boolean function

r = f (x, y, z) = (¬x ∧ ¬y ∧ ¬z) ∨ (y ∧ ¬z) ∨ (y ∧ z).

Each of the five options (i.e., columns) in the table below

A B C D E
1. × ✓ × × ✓
2. × × ✓ × ✓
3. × × × ✓ ✓

states whether f can (a tick) or cannot (a cross) be implemented using a given set of components (i.e., row),
namely

a an 8-input, 1-bit multiplexer,


b a 4-input, 1-bit multiplexer,
c a 2-input, 1-bit multiplexer, an OR gate, and a NOT gate,

plus the constant values 0 and 1. For example, option C states that f can be implemented by using component
set 2 but not 1 or 3. Which option do you think is correct?

Q52. Consider the fact that

[Diagram: a NOT gate shown as equivalent to a 2-input, 1-bit multiplexer whose data inputs are the constants 1 and 0, and whose control input is x; the multiplexer output is then r = ¬x.]

i.e., that one can implement a NOT gate using one instance of a 2-input, 1-bit multiplexer component. Assuming
you want to minimise the number of multiplexer instances, identify how many are required to implement the
expression
(x ∧ y) ∨ z.

A: 1
B: 2
C: 3
D: 6
E: 8

Q53. Consider the combinatorial logic design as shown in Figure A.6, which is described using N-type and P-type
MOSFET transistors. Within the design, three inputs (i.e., x, y, and z) and one output (i.e., r) can be identified;
note that several transistors (e.g., m0 ) and intermediate signals (e.g., t0 ) are annotated for reference. Which of
the following Boolean expressions
A: ¬x
B: ¬((x ∨ y) ∧ z)
C: (¬(x ∨ y)) ∧ z
D: ¬(x ∧ y ∧ ¬z)
E: ¬(x ∨ y ∨ ¬z)
does the design implement?

Q54. Consider the sequential logic design as shown in Figure A.7, which contains two D-type flip-flops. Within the
design, one output (i.e., r) can be identified; note that several intermediate signals (e.g., t0 ) are annotated for
reference. If the clock signal clk has a frequency of 400MHz, what is the frequency of r?
A: 100MHz
B: 200MHz


Figure A.6: A combinatorial logic design, described using N-type and P-type MOSFET transistors.

Figure A.7: A sequential logic design, containing two D-type flip-flops.

C: 400MHz
D: 800MHz
E: 1600MHz

Q55. Consider the following combinatorial logic design

[Diagram: a design built around a 2-input, 1-bit multiplexer, whose data and control ports are driven by the inputs p and q.]

which is described using a 2-input, 1-bit multiplexer. Within the design, two inputs (i.e., p and q) and one
output (i.e., r) can be identified. Which of the following Boolean expressions
A: r = ¬p
B: r=p∧q
C: r = ¬(p ∧ q)
D: r=p⊕q
E: r = ¬(p ⊕ q)
correctly reflects the relationship between inputs and output?


Figure A.8: A combinatorial logic design, described using N-type and P-type MOSFET transistors; note that the pull-down network is (partially) missing.

Q56. A NAND-based SR latch implementation can be realised as follows:

S Q

R ¬Q

Imagine that the two NAND gates have a non-zero, but unequal gate delay associated with them, i.e., the top
gate has the delay x whereas the bottom gate has the delay x ± δ for some x and δ > 0. If the current input
S = R = 0 is changed instantaneously to S = R = 1, what will the outputs be?
A: Q = 1, ¬Q = 1
B: either Q = 0, ¬Q = 1 or Q = 1, ¬Q = 0
C: either Q = 1, ¬Q = 1 or Q = 0, ¬Q = 0
D: Q = 0, ¬Q = 0
E: None of the above

Q57. Consider the combinatorial logic design as shown in Figure A.8, which is described using N-type and P-type
MOSFET transistors. Within the design, three inputs (i.e., x, y, and z) and one output (i.e., r) can be identified;
note that several transistors (e.g., m0 ) and intermediate signals (e.g., t0 ) are annotated for reference. Despite the
fact that the pull-down network is (partially) missing, it is still possible to infer how the design works: which
of the following Boolean expressions
A: r=x⊕y
B: r = (¬x ∧ ¬y) ∨ ¬z
C: r = (¬x ∨ ¬y) ∧ ¬z
D: r = (x ∧ y) ∨ z
E: r = (x ∨ y) ∧ z
correctly reflects the relationship between inputs and output?

Q58. Consider the following MOSFET transistor


[Diagram: a single MOSFET whose gate is driven by en, with a channel connecting x to r.]

If x ∈ {0, 1} and en ∈ {0, 1}, how many different values can r potentially take?
A: 1
B: 2
C: 3
D: 4
E: 5

Q59. Consider a combinatorial logic component defined by

r = f (x, y) = 1 if x > y, and 0 otherwise

For how many combinations of the unsigned, 2-bit inputs x and y is the output r = 1?
A: 1
B: 2
C: 4
D: 6
E: 8

Q60. Consider a micro-processor which is compatible with the ARMv7-A ISA. During execution of an instruction,
the fetch stage of the fetch-decode-execute cycle computes PC + 4, which (potentially) forms the program
counter in the next cycle. In an initial implementation of the micro-processor, PC + 4 is computed by using
a general-purpose ripple-carry adder. Said adder is subsequently optimised, however, by capitalising on the
special-purpose form of computation: ARMv7-A demands that PC is word-aligned, for example.
Assuming that logic gates for NOT, AND, OR, and XOR require 1, 2, 2, and 4 units of area respectively, the
general-purpose solution requires

32 · (2 · XOR + 2 · AND + 1 · OR) = 32 · (2 · 4 + 2 · 2 + 1 · 2) = 448

units of area due to the use of 32 full-adder cells. If the optimisation aims to minimise area, which of the
following options
A: 2.00
B: 2.31
C: 2.33
D: 2.52
E: 3.14
most accurately reflects the improvement factor offered by the special-purpose solution?

Q61. Binary-Coded Decimal (BCD) is a representation for decimal integers, where each decimal digit in some x is
represented independently by a 4-bit binary sequence in r. For example,

x = 123(10) ↦ ⟨0011(2) , 0010(2) , 0001(2) ⟩ = r.

Note that because each 0 ≤ xi < 10 and 2^4 = 16 > 10, some values of the associated BCD-encoded digit ri are
impossible.
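
(As a concrete, software illustration of the representation, the following C sketch, whose function name and packing order are choices made purely for illustration, encodes a 3-digit decimal integer into BCD.)

#include <stdint.h>

// encode 0 <= x < 1000 into BCD: the i-th decimal digit of x is packed
// into the i-th 4-bit field of r, least-significant digit first, so,
// e.g., 123 yields 0x123
uint16_t bcd_encode( uint16_t x ) {
  uint16_t r = 0;

  for( int i = 0; i < 3; i++ ) {
    r |= ( x % 10 ) << ( 4 * i );
    x /= 10;
  }

  return r;
}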
Imagine you are asked to implement a 4-input Boolean function f using combinatorial logic, which will be
used to process BCD-encoded digits. Select an option to complete the blanks in the sentence “a Karnaugh map cell
which contains a ____ can be treated as either ____ or ____ in order to ____ the resulting term”, so that it
correctly describes how you might deal with an impossible BCD-encoded digit.
A: don’t care, AND, OR, eliminate
B: duplicate, 1, 0, verify
C: unknown, 1, 0, simplify
D: don’t care, 1, 0, simplify


Figure A.9: An SR-latch, described in terms of abstract components labelled ⊙.

Figure A.10: An SR-latch variant, which includes additional inputs P, C, and en.

E: unknown, 0, 1, optimise

Q62. The block diagram in Figure A.9 describes a sequential logic component, or, more specifically, an SR-type
latch: it does so by using two abstract components labelled ⊙. If the associated excitation table is as follows

S R Q ¬Q Q′ ¬Q′
1 1 0 1 0 1
1 1 1 0 1 0
1 0 ? ? 1 0
0 1 ? ? 0 1
0 0 ? ? ? ?

which of the following gate types


A: XOR
B: AND
C: OR
D: NAND
E: NOR
has been used to concretely instantiate the abstract components (i.e., replace each ⊙)?

Q63. Figure A.10 describes a sequential logic component, or, more specifically, a variant of the SR-latch: in addition
to S and R, it also includes the inputs labelled P, C, and en.

a Which of the following options

S and R P and C
A synchronous synchronous
B synchronous asynchronous
C asynchronous synchronous
D asynchronous asynchronous

most accurately classifies the inputs?


b Which of the following options

S and R P and C
A active low active low
B active low active high
C active high active low
D active high active high

most accurately classifies the inputs?

Q64. Write the simplest (i.e., with fewest operators) possible Boolean expression that implements the Boolean
function
r = f (x, y, z)
described by
f
x y z r
0 0 0 0
0 0 1 1
0 1 0 1
0 1 1 ?
1 0 0 1
1 0 1 0
1 1 0 ?
1 1 1 1
where ? denotes don’t care.

Q65. Take the Boolean expression


¬(x ∨ y)
and draw a gate-level circuit diagram that computes an equivalent result using only 2-input NAND gates.

Q66. Recall that an SR latch has two inputs S (or set) and R (or reset); if S = R = 1, the two outputs Q and ¬Q are
undefined. This issue can be resolved by using a reset-dominate latch: the alternative design has the same
inputs and outputs, but resets the latch (i.e., has Q = 0 and ¬Q = 1) whenever S = R = 1.
Using a gate-level circuit diagram, describe how a reset-dominate latch can be implemented using only
NOR gates and at most one AND gate.

Q67. The quality of the design for some hardware component is often judged by measuring efficiency, for example
how quickly it can produce output on average. Name two other metrics that might be considered.

Q68. a Describe how N-type and P-type MOSFET transistors are constructed using silicon and how they operate
as switches.
b Draw a diagram to show how N-type and P-type MOSFET transistors can be used to implement a NAND
gate. Show your design works by describing the transistor states for each input combination.

Q69. The following diagram


[Diagram: a transistor-level CMOS design, connected between the power rails Vdd and Vss.]


details a 2-input NAND gate comprised of two P-MOSFET transistors (top) and two N-MOSFET transistors
(bottom). Draw a similar diagram for a 3-input NAND gate.

Q70. Moore’s Law predicts the number of CMOS-based transistors we can manufacture within a fixed sized area
will double roughly every two years; this is often interpreted as doubling computational efficiency over the
same period. Briefly explain two limits which mean this trend cannot be sustained indefinitely.

Q71. Given that ? is the don’t care state, consider the following truth table which describes a function p with four
inputs (a, b, c and d) and two outputs (e and f ):

p
a b c d e f
0 0 0 0 0 0
0 0 0 1 0 1
0 0 1 0 1 0
0 0 1 1 ? ?
0 1 0 0 0 1
0 1 0 1 1 0
0 1 1 0 0 0
0 1 1 1 ? ?
1 0 0 0 1 0
1 0 0 1 0 0
1 0 1 0 0 1
1 0 1 1 ? ?
1 1 0 0 ? ?
1 1 0 1 ? ?
1 1 1 0 ? ?
1 1 1 1 ? ?

a From the truth table above, write down the corresponding Sum of Products (SoP) equations for e and f .
b Simplify the two SoP equations so that they use the minimum number of logic gates possible. You can
assume the two equations can share logic.

Q72. Using a Karnaugh map, derive a Boolean expression for the function

r = f (x, y, z)

described by the truth table


f
x y z r
0 0 0 1
0 0 1 1
0 1 0 1
0 1 1 0
1 0 0 0
1 0 1 1
1 1 0 0
1 1 1 ?
where ? denotes don’t care.

Q73. NAND is a universal logic gate in the sense that the behaviour of NOT, AND and OR gates can be implemented
using only NAND. Show how this is possible using a truth table to demonstrate your solution.

Q74. Both NAND and NOR gates are described as universal because any other Boolean gate (i.e., AND, OR, NOT)
can be constructed using them. Imagine your friend suggests a 4-input, 1-bit multiplexer (that selects between
four 1-bit inputs using two 1-bit control signals to produce a 1-bit output) is also universal: state whether or
not you believe them, and explain why.


Q75. Consider the following circuit, where the propagation delays of the logic gates in the circuit are 10ns for NOT,
20ns for AND, 20ns for OR, and 60ns for XOR:

[Diagram: a combinatorial circuit built from such gates.]

a Draw a Karnaugh map for this circuit and derive a Sum of Products (SoP) expression for the result.
b Describe advantages and disadvantages of your SoP expression and the dynamic behaviour it produces.
c If the circuit is used as combinatorial logic within a clocked system, what is the maximum clock speed of
the system?

Q76. A game uses nine LEDs to display the result of rolling a six-sided dice; the i-th LED, say Li for 0 ≤ i < 9, is
driven with 1 or 0 to turn it on or off respectively. A 3-bit register D represents the dice as an unsigned integer.

a The LEDs are arranged as follows,

L0 L3 L6

L1 L4 L7

L2 L5 L8

and the required mapping between dice and LEDs, given a filled dot means an LED is on, is

[Diagram: for each of D = 1 through D = 6, the corresponding standard dice dot pattern shown on the LED grid above.]

Using Karnaugh maps as appropriate, write a simplified Boolean expression for each LED (i.e., for each
Li in terms of D).
b The 2-input XOR, AND, OR and NOT gates used to implement your expressions have propagation delays
of 40, 20, 20 and 10 nanoseconds respectively. Calculate how many times per-second the dice can be
rolled, i.e., D can be updated, if the LEDs are to provide the correct output.


c The results of individual dice throws will be summed using a ripple-carry adder circuit, to give a total;
each 3-bit output D will be added to and stored in an n-bit accumulator register A.

i Using a high-level block diagram, show how an n-bit ripple-carry adder circuit is constructed from
full-adder cells.
ii If m = 8 throws of the dice are to be summed, what value for n should be selected?
iii Imagine that instead of D, we want to add 2 · D to A. Doubling D can be achieved by computing
either D + D or D ≪ 1 (i.e., a left-shift of D by 1 bit). Carefully state which method is preferable, and
why.

Q77. Consider a simple component called C that compares two inputs x and y (both are unsigned 8-bit integers) in
order to produce their maximum and minimum as two outputs:

[Diagram: component C accepts x from the left and y from the top, and outputs min(x, y) to the right and max(x, y) below.]

Instances of C can be connected in a mesh to sort integers: the input is fed into the top and left-hand edges of
the mesh, the sorted output appears on the bottom and right-hand edges. An example is given below:

[Diagram: a 4 × 4 example mesh of C instances; the values 5, 2, 4, 1 enter along the top edge and 3, 2, 6, 7 along the left edge, while the sorted values 1, 2, 2, 3 exit along the right edge and 7, 6, 5, 4 along the bottom edge.]

a Using standard building blocks (e.g., adder, multiplexer etc.) rather than individual logic gates, draw a
block diagram that implements the component C.
b Imagine that an n × n mesh of components is created. Based on your design for C and clearly stating any
assumptions you need to make, write down an expression for the critical path of such a mesh.
c Algorithms for sorting integers can clearly be implemented on a general-purpose processor. Explain two
advantages and two disadvantages of using such a processor versus using a mesh like that above.

Q78. Imagine you are working for a company developing the “Pee”, a portable games console. The user interface is
a fancy controller that has

• three fire buttons represented by the 1-bit inputs F0 , F1 and F2 , and


• an 8-direction D-pad represented by the 3-bit input D

and you are charged with designing some aspects of it.


a The fire button inputs are described as level triggered and active high; explain what this means (in
comparison to the alternatives in each case).
b Some customers want an “autofire” feature that will automatically and repeatedly press the F0 fire button
for them. The autofire can operate in four modes, selected by a switch called M: off (where the fire button
F0 works as normal), slow, fast or very fast (where the fire button F0 is turned on and off repeatedly at the
selected speed). Stating any assumptions and showing your working where appropriate, design a circuit
that implements such a feature.
c In an attempt to prevent counterfeiting, each controller can only be used with the console it was sold
with. This protocol is used:
[Protocol diagram: the parties P and C exchange c and r, where c ←$ {0,1}^3 denotes sampling a random 3-bit challenge and r = T(c) the computed response; the steps are spelled out in words below.]

which, in words, means that


• the console generates a random 3-bit number c and sends it to the controller,
• the controller computes a 3-bit result r = T(c) and sends it to the console,
• the console checks that r matches T(c) and assumes the controller is valid if so.

i There is some debate as to whether the protocol should be synchronous or asynchronous; explain
what your recommendation would be and why.
ii The function T is simply a look-up table. For example

T(x) = 2 if x = 0,   4 if x = 4,
       6 if x = 1,   0 if x = 5,
       7 if x = 2,   5 if x = 6,
       1 if x = 3,   3 if x = 7.

Each pair of console and controller has such a T fixed inside them during the manufacturing process.
Stating any assumptions and showing your working where appropriate, explain how this T might
be implemented as a circuit.

Q79. Imagine you have three Boolean values x, y, and z. Given access to as many AND and OR gates as you want
but only two NOT gates, write a set of Boolean expressions to compute all three results ¬x, ¬y and ¬z.

Q80. SAT is the problem of finding an assignment to n Boolean variables which means a given Boolean expression
is satisfied, i.e., evaluates to 1. For example, given n = 3 and the expression

(x ∧ y) ∨ ¬z,

x = 1, y = 1, z = 0 is one assignment (amongst several) which solves the associated SAT problem.
The ability to solve SAT can be used to test whether or not two n-input, 1-output combinatorial circuits C1
and C2 are equivalent. Show how this is possible.

Q81. Consider the following combinatorial circuit, which is the composition of four parts (labelled A, B, C and REG):
each part is annotated with a name and an associated critical path. The circuit computes an output r = f (x)
from the corresponding input x.

x A B C REG r = f (x)
10ns 30ns 20ns 10ns

With respect to this circuit,

a first define the terms latency and throughput, then


b explain how and why you would expect use of pipelining to influence both metrics.

Q82. The figure below shows a block of combinatorial logic built from seven parts; the name and latency of each
part is displayed inside it. Note that the last part is a register which stores the result:

x A B C D E F REG r = f (x)
40ns 10ns 30ns 10ns 50ns 10ns 10ns

It is proposed to pipeline the block of logic using two stages such that there is a pipeline register in between
parts D and E:

x A B C D REG E F REG r = f (x)


40ns 10ns 30ns 10ns 10ns 50ns 10ns 10ns

a Explain the terms latency and throughput in relation to the idea of pipelining.
b Calculate the overall latency and throughput of the initial circuit described above.
c Calculate the overall latency and throughput of the circuit after the proposed change.
d Calculate the number of extra pipeline registers required to maximise the circuit throughput; state this
new throughput and the associated latency. Explain the advantages and disadvantages of this change.

This is a (large) set of example Boolean minimisation questions: each asks you to transform some truth table
describing an n-input Boolean function into a Boolean expression. Each solution includes

1. a reference implementation (produced by forming a SoP expression with a full term for each minterm,
i.e., row where r = 1), and
2. a Karnaugh map annotated with sensible groups, and an optimised implementation based on those
groups.

The goal is to focus on producing the latter, since the former is somewhat easier. Keep in mind and take care
wrt. the following:
• There are 2^(2^n) Boolean functions with n inputs (or 3^(2^n) if you include don't-care as a valid output); for
small n a complete set of functions is included, whereas for large n there is only a random sub-set.
• No real effort is made to order the questions, and only minor effort to avoid duplicates. That said, there
should be no trivial (in the sense r = 1 or r = 0 for all inputs, e.g., tautological) cases.
• The questions and solutions are generated automatically, meaning a small but real chance of bugs in the
associated implementation!

Q83.
y z r
0 0 1
0 1 0
1 0 1
1 1 1

Q84.
y z r
0 0 1
0 1 1
1 0 0
1 1 1

Q85.
y z r
0 0 1
0 1 0
1 0 0
1 1 0


Q86.
y z r
0 0 1
0 1 1
1 0 0
1 1 0

Q87.
y z r
0 0 0
0 1 0
1 0 0
1 1 1

Q88.
y z r
0 0 1
0 1 0
1 0 0
1 1 1

Q89.
y z r
0 0 0
0 1 0
1 0 1
1 1 0

Q90.
y z r
0 0 0
0 1 1
1 0 1
1 1 0

Q91.
y z r
0 0 0
0 1 1
1 0 0
1 1 0

Q92.
y z r
0 0 1
0 1 0
1 0 1
1 1 0

Q93.
y z r
0 0 0
0 1 ?
1 0 1
1 1 0


Q94.
y z r
0 0 ?
0 1 ?
1 0 0
1 1 1

Q95.
y z r
0 0 0
0 1 1
1 0 ?
1 1 1

Q96.
y z r
0 0 1
0 1 ?
1 0 1
1 1 0

Q97.
y z r
0 0 1
0 1 1
1 0 0
1 1 ?

Q98.
x y z r
0 0 0 0
0 0 1 0
0 1 0 0
0 1 1 0
1 0 0 1
1 0 1 1
1 1 0 1
1 1 1 0

Q99.
x y z r
0 0 0 1
0 0 1 1
0 1 0 1
0 1 1 0
1 0 0 1
1 0 1 0
1 1 0 1
1 1 1 1

Q100.
x y z r
0 0 0 1
0 0 1 1
0 1 0 0
0 1 1 1
1 0 0 0
1 0 1 1
1 1 0 0
1 1 1 0


Q101.
x y z r
0 0 0 0
0 0 1 0
0 1 0 1
0 1 1 1
1 0 0 0
1 0 1 1
1 1 0 0
1 1 1 1

Q102.
x y z r
0 0 0 0
0 0 1 0
0 1 0 0
0 1 1 0
1 0 0 0
1 0 1 1
1 1 0 0
1 1 1 1

Q103.
x y z r
0 0 0 0
0 0 1 1
0 1 0 0
0 1 1 0
1 0 0 1
1 0 1 1
1 1 0 1
1 1 1 0

Q104.
x y z r
0 0 0 1
0 0 1 1
0 1 0 0
0 1 1 0
1 0 0 1
1 0 1 1
1 1 0 1
1 1 1 1

Q105.
x y z r
0 0 0 1
0 0 1 0
0 1 0 0
0 1 1 0
1 0 0 0
1 0 1 0
1 1 0 1
1 1 1 0


Q106.
x y z r
0 0 0 0
0 0 1 0
0 1 0 0
0 1 1 1
1 0 0 0
1 0 1 0
1 1 0 1
1 1 1 1

Q107.
x y z r
0 0 0 0
0 0 1 0
0 1 0 1
0 1 1 1
1 0 0 1
1 0 1 1
1 1 0 0
1 1 1 1

Q108.
x y z r
0 0 0 1
0 0 1 ?
0 1 0 0
0 1 1 ?
1 0 0 0
1 0 1 ?
1 1 0 1
1 1 1 0

Q109.
x y z r
0 0 0 1
0 0 1 ?
0 1 0 0
0 1 1 0
1 0 0 0
1 0 1 1
1 1 0 0
1 1 1 ?

Q110.
x y z r
0 0 0 0
0 0 1 ?
0 1 0 1
0 1 1 1
1 0 0 ?
1 0 1 1
1 1 0 ?
1 1 1 ?


Q111.
x y z r
0 0 0 ?
0 0 1 0
0 1 0 ?
0 1 1 ?
1 0 0 ?
1 0 1 ?
1 1 0 0
1 1 1 1

Q112.
x y z r
0 0 0 1
0 0 1 1
0 1 0 0
0 1 1 ?
1 0 0 ?
1 0 1 ?
1 1 0 0
1 1 1 0

Q113.
x y z r
0 0 0 ?
0 0 1 0
0 1 0 0
0 1 1 ?
1 0 0 1
1 0 1 ?
1 1 0 1
1 1 1 1

Q114.
x y z r
0 0 0 ?
0 0 1 1
0 1 0 1
0 1 1 1
1 0 0 1
1 0 1 0
1 1 0 1
1 1 1 1

Q115.
x y z r
0 0 0 0
0 0 1 1
0 1 0 ?
0 1 1 ?
1 0 0 1
1 0 1 0
1 1 0 0
1 1 1 ?


Q116.
x y z r
0 0 0 1
0 0 1 ?
0 1 0 0
0 1 1 1
1 0 0 0
1 0 1 ?
1 1 0 ?
1 1 1 1

Q117.
x y z r
0 0 0 0
0 0 1 0
0 1 0 0
0 1 1 0
1 0 0 1
1 0 1 1
1 1 0 0
1 1 1 ?

Q118.
w x y z r
0 0 0 0 1
0 0 0 1 1
0 0 1 0 0
0 0 1 1 0
0 1 0 0 1
0 1 0 1 0
0 1 1 0 0
0 1 1 1 1
1 0 0 0 1
1 0 0 1 0
1 0 1 0 0
1 0 1 1 0
1 1 0 0 0
1 1 0 1 0
1 1 1 0 0
1 1 1 1 0

Q119.
w x y z r
0 0 0 0 1
0 0 0 1 0
0 0 1 0 1
0 0 1 1 0
0 1 0 0 0
0 1 0 1 1
0 1 1 0 0
0 1 1 1 1
1 0 0 0 1
1 0 0 1 1
1 0 1 0 0
1 0 1 1 1
1 1 0 0 1
1 1 0 1 0
1 1 1 0 1
1 1 1 1 0


Q120.
w x y z r
0 0 0 0 1
0 0 0 1 0
0 0 1 0 0
0 0 1 1 0
0 1 0 0 0
0 1 0 1 1
0 1 1 0 1
0 1 1 1 0
1 0 0 0 0
1 0 0 1 0
1 0 1 0 0
1 0 1 1 0
1 1 0 0 1
1 1 0 1 0
1 1 1 0 0
1 1 1 1 1

Q121.
w x y z r
0 0 0 0 0
0 0 0 1 1
0 0 1 0 0
0 0 1 1 0
0 1 0 0 1
0 1 0 1 1
0 1 1 0 0
0 1 1 1 0
1 0 0 0 1
1 0 0 1 1
1 0 1 0 1
1 0 1 1 1
1 1 0 0 1
1 1 0 1 0
1 1 1 0 1
1 1 1 1 0

Q122.
w x y z r
0 0 0 0 1
0 0 0 1 0
0 0 1 0 0
0 0 1 1 1
0 1 0 0 1
0 1 0 1 0
0 1 1 0 0
0 1 1 1 1
1 0 0 0 1
1 0 0 1 0
1 0 1 0 1
1 0 1 1 1
1 1 0 0 1
1 1 0 1 0
1 1 1 0 1
1 1 1 1 0


Q123.
w x y z r
0 0 0 0 0
0 0 0 1 1
0 0 1 0 0
0 0 1 1 0
0 1 0 0 1
0 1 0 1 0
0 1 1 0 1
0 1 1 1 0
1 0 0 0 0
1 0 0 1 1
1 0 1 0 0
1 0 1 1 0
1 1 0 0 1
1 1 0 1 1
1 1 1 0 1
1 1 1 1 1

Q124.
w x y z r
0 0 0 0 0
0 0 0 1 0
0 0 1 0 1
0 0 1 1 1
0 1 0 0 1
0 1 0 1 1
0 1 1 0 1
0 1 1 1 1
1 0 0 0 1
1 0 0 1 0
1 0 1 0 1
1 0 1 1 0
1 1 0 0 1
1 1 0 1 0
1 1 1 0 0
1 1 1 1 0

Q125.
w x y z r
0 0 0 0 0
0 0 0 1 0
0 0 1 0 1
0 0 1 1 0
0 1 0 0 1
0 1 0 1 1
0 1 1 0 0
0 1 1 1 1
1 0 0 0 0
1 0 0 1 1
1 0 1 0 0
1 0 1 1 1
1 1 0 0 1
1 1 0 1 1
1 1 1 0 1
1 1 1 1 0


Q126.
w x y z r
0 0 0 0 1
0 0 0 1 0
0 0 1 0 1
0 0 1 1 1
0 1 0 0 1
0 1 0 1 0
0 1 1 0 1
0 1 1 1 1
1 0 0 0 1
1 0 0 1 1
1 0 1 0 1
1 0 1 1 1
1 1 0 0 1
1 1 0 1 1
1 1 1 0 0
1 1 1 1 0

Q127.
w x y z r
0 0 0 0 0
0 0 0 1 1
0 0 1 0 0
0 0 1 1 0
0 1 0 0 0
0 1 0 1 0
0 1 1 0 1
0 1 1 1 0
1 0 0 0 1
1 0 0 1 1
1 0 1 0 0
1 0 1 1 1
1 1 0 0 0
1 1 0 1 0
1 1 1 0 0
1 1 1 1 1

Q128.
w x y z r
0 0 0 0 0
0 0 0 1 ?
0 0 1 0 ?
0 0 1 1 0
0 1 0 0 0
0 1 0 1 ?
0 1 1 0 ?
0 1 1 1 ?
1 0 0 0 0
1 0 0 1 0
1 0 1 0 1
1 0 1 1 ?
1 1 0 0 0
1 1 0 1 1
1 1 1 0 1
1 1 1 1 ?


Q129.
w x y z r
0 0 0 0 ?
0 0 0 1 ?
0 0 1 0 0
0 0 1 1 1
0 1 0 0 1
0 1 0 1 1
0 1 1 0 0
0 1 1 1 0
1 0 0 0 1
1 0 0 1 1
1 0 1 0 1
1 0 1 1 1
1 1 0 0 0
1 1 0 1 ?
1 1 1 0 0
1 1 1 1 0

Q130.
w x y z r
0 0 0 0 ?
0 0 0 1 ?
0 0 1 0 1
0 0 1 1 0
0 1 0 0 0
0 1 0 1 0
0 1 1 0 0
0 1 1 1 0
1 0 0 0 1
1 0 0 1 0
1 0 1 0 ?
1 0 1 1 0
1 1 0 0 1
1 1 0 1 ?
1 1 1 0 0
1 1 1 1 1

Q131.
w x y z r
0 0 0 0 0
0 0 0 1 1
0 0 1 0 0
0 0 1 1 0
0 1 0 0 0
0 1 0 1 ?
0 1 1 0 0
0 1 1 1 1
1 0 0 0 0
1 0 0 1 0
1 0 1 0 1
1 0 1 1 1
1 1 0 0 1
1 1 0 1 1
1 1 1 0 0
1 1 1 1 ?


Q132.
w x y z r
0 0 0 0 0
0 0 0 1 0
0 0 1 0 0
0 0 1 1 ?
0 1 0 0 0
0 1 0 1 1
0 1 1 0 1
0 1 1 1 ?
1 0 0 0 ?
1 0 0 1 1
1 0 1 0 0
1 0 1 1 ?
1 1 0 0 ?
1 1 0 1 1
1 1 1 0 ?
1 1 1 1 0

Q133.
w x y z r
0 0 0 0 0
0 0 0 1 ?
0 0 1 0 0
0 0 1 1 ?
0 1 0 0 1
0 1 0 1 0
0 1 1 0 0
0 1 1 1 0
1 0 0 0 ?
1 0 0 1 ?
1 0 1 0 ?
1 0 1 1 1
1 1 0 0 1
1 1 0 1 ?
1 1 1 0 0
1 1 1 1 0

Q134.
w x y z r
0 0 0 0 0
0 0 0 1 0
0 0 1 0 1
0 0 1 1 ?
0 1 0 0 0
0 1 0 1 0
0 1 1 0 ?
0 1 1 1 1
1 0 0 0 ?
1 0 0 1 0
1 0 1 0 0
1 0 1 1 0
1 1 0 0 ?
1 1 0 1 0
1 1 1 0 1
1 1 1 1 1


Q135.
w x y z r
0 0 0 0 ?
0 0 0 1 1
0 0 1 0 ?
0 0 1 1 0
0 1 0 0 1
0 1 0 1 ?
0 1 1 0 1
0 1 1 1 ?
1 0 0 0 ?
1 0 0 1 1
1 0 1 0 ?
1 0 1 1 ?
1 1 0 0 ?
1 1 0 1 ?
1 1 1 0 0
1 1 1 1 1

Q136.
w x y z r
0 0 0 0 ?
0 0 0 1 0
0 0 1 0 ?
0 0 1 1 1
0 1 0 0 ?
0 1 0 1 ?
0 1 1 0 1
0 1 1 1 1
1 0 0 0 1
1 0 0 1 ?
1 0 1 0 ?
1 0 1 1 ?
1 1 0 0 1
1 1 0 1 ?
1 1 1 0 ?
1 1 1 1 1

Q137.
w x y z r
0 0 0 0 ?
0 0 0 1 0
0 0 1 0 1
0 0 1 1 0
0 1 0 0 1
0 1 0 1 1
0 1 1 0 0
0 1 1 1 1
1 0 0 0 ?
1 0 0 1 0
1 0 1 0 ?
1 0 1 1 0
1 1 0 0 ?
1 1 0 1 ?
1 1 1 0 0
1 1 1 1 ?

A.3 Chapter 3
Q138. Mike Rowchip was an engineer, working on an ALU design for a new processor: he had completed the design
and implementation of most but not all modules, before he was, unfortunately, run over by a bus. You have


been tasked with completing the missing modules.


Mike first prototyped each module in software, using C functions, which he then used as a specification
for (and reference for testing) the associated hardware implementation. The prototype for one of the missing
modules is
uint8_t f( uint8_t x, uint8_t y ) {
  uint8_t m = 1;

  while( x & m ) {
    x = x & ~m;
    m = m << 1;
  }

  return x | m;
}

However, Mike left no documentation beyond this. Thanks Mike. What do you think it does?
A: add x to y
B: compute the Hamming weight of x
C: left-rotate x by 1 bit
D: increment x
E: decrement x

Q139. Consider the following conditional statement written in C


if( c ) {
...
}

wherein c is a placeholder for the condition expression; the statement body (i.e., the continuation dots) is
executed iff. evaluating the condition expression yields a non-zero result. C represents signed integers using
two’s-complement: for an unsigned, 32-bit integer x, imagine we want to execute the statement body if either
every bit of x is 0 or every bit of x is 1. Which of the following choices for the condition expression would
achieve this?
A: ( x == 0 ) || ( x == -1 )
B: !x || !(~x)
C: ( x + 1 ) < 2
D: All of the above
E: None of the above

Q140. Figure ?? captures the design of an n-bit ripple-carry adder, constructed using n full-adder instances connected
by a carry chain denoted c. If c0 = ci (the carry-in) and cn = co (the carry-out), then ci would more generally
denote the carry into the i-th full-adder instance. If n = 4 and ci = 0, which of the following options

A x = 0000(2) y = 0000(2)
B x = 1100(2) y = 0001(2)
C x = 0100(2) y = 0100(2)
D x = 1011(2) y = 1001(2)
E x = 0110(2) y = 0101(2)

would produce c2 = 1?

Q141. Consider two integers x and y, whose sum r = x + y is computed using a ripple-carry adder; x, y, and r are all
8-bit signed integers, represented using two’s-complement. The associated flag

f = 0 if x + y did not overflow, and
    1 if x + y did overflow

is used to signal whether an overflow occurred during computation of r. Which of the following
A: f = r8
B: f = (x7 ∧ y7 ∧ ¬r7 ) ∨ (¬x7 ∧ ¬y7 ∧ r7 )
C: f = (x7 ∨ y7 ∨ ¬r7 ) ∧ (¬x7 ∨ ¬y7 ∨ r7 )
D: f = x7 ∧ y7 ∧ r7


E: f = x7 ⊕ y7 ⊕ r7
is the correct Boolean expression for f ?
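
(Candidate expressions for f can be tested exhaustively. The following C scaffold, a sketch only, compares a candidate, which is left as a placeholder, against a wider-precision reference across all 2^16 operand pairs: an overflow occurs exactly when the true sum falls outside the 8-bit two's-complement range [−128, 127].)

#include <stdint.h>
#include <stdio.h>

int main( void ) {
  for( int x = -128; x < 128; x++ ) {
    for( int y = -128; y < 128; y++ ) {
      int ref = ( x + y < -128 ) || ( x + y > 127 ); // reference flag

      // ... evaluate a candidate expression here, using the bits x7, y7,
      //     and r7 of the (wrapped) 8-bit operands and sum, then compare
      //     the result with ref ...
      ( void )( ref );
    }
  }

  return 0;
}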

Q142. An n-bit ripple-carry adder has a critical path that can be described as O(n) gate delays. Explain intuitively
why this is the case, and name an alternative whose critical path is shorter.

Q143. Give a single-line C expression to test if a non-zero integer x is an exact power-of-two; i.e., if x = 2n for some n
then the expression should evaluate to a non-zero value, otherwise it evaluates to zero.
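
(One well-known solution, sketched below rather than stated as the only answer, relies on the fact that a power-of-two has exactly one bit set, so clearing the least-significant set bit must yield zero.)

#include <stdint.h>

// for non-zero x, this is non-zero iff. x = 2^n: x - 1 flips the
// least-significant set bit and all bits below it, so the AND is zero
// exactly when no higher bit of x was set
int is_pow2( uint32_t x ) {
  return ( x & ( x - 1 ) ) == 0;
}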

Q144. Imagine you are writing a C program that includes a variable called x. If x has the type char and a current
value of 127, what is the new value after

a decrementing (i.e., subtracting 1 from it), or


b incrementing (i.e., adding 1 to it)

the variable?

Q145. Imagine x represents a two’s-complement, signed integer using 4 bits; xi denotes the i-th bit of x. Write a
human-readable description (i.e., the meaning) of what the Boolean function

f (x) = ¬x3 ∧ (x2 ∨ x1 ∨ x0 )

computes arithmetically.

Q146. Given an n-bit input x, draw a block diagram of an efficient (i.e., with a short critical path) combinatorial circuit
that can compute r = 7 · x (i.e., multiply x by the constant 7). Take care to label each component, and the size
(in bits) of each input and output.

Q147. Let xi and yi denote the i-th bit of two unsigned, 2-bit integers x and y (meaning that 0 ≤ i < 2). Design a
(2 × 2)-bit combinatorial multiplier circuit that can compute the 4-bit product r = x · y.

Q148. a Comparison operations for a given processor take two 16-bit operands and return zero if the comparison
is false or non-zero if it is true. By constructing some of the comparisons using combinations of other
operations, show that implementing all of =, ≠, <, ≤, > and ≥ is wasteful. State the smallest set of
comparisons that need dedicated hardware such that all the standard comparisons can be executed.
b The ALU in the same processor design does not include a multiply instruction. So that programmers can
still multiply numbers, write an efficient C function to multiply two 16-bit inputs together and return the
16-bit lower half of the result. You can assume the inputs are always positive.
c The population count or Hamming weight of x, denoted by HW(x) say, is the number of bits in the binary
expansion of x that equal one. Some processors have a dedicated instruction to do this but the proposed
one does not; write an efficient C function to compute the population count of 16-bit inputs.
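
(With reference to part b, one candidate solution is the classic shift-and-add approach, sketched below.)

#include <stdint.h>

// multiply via shift-and-add: for each set bit of y, accumulate a
// correspondingly shifted copy of x; the arithmetic is performed on
// uint16_t, so the result is naturally the 16-bit lower half
uint16_t mul16( uint16_t x, uint16_t y ) {
  uint16_t r = 0;

  while( y != 0 ) {
    if( y & 1 ) {
      r += x;
    }
    x <<= 1;
    y >>= 1;
  }

  return r;
}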

Q149. Imagine we want to compute the result of multiplying two n-bit numbers x and y together, i.e., r = x · y, where
n is even. One can adopt a divide-and-conquer approach to this computation by splitting x and y into two
parts each of size n/2 bits
x = x1 · 2^(n/2) + x0
y = y1 · 2^(n/2) + y0
and then computing the full result
r = r2 · 2^n + r1 · 2^(n/2) + r0
via the parts
r2 = x1 · y1
r1 = x1 · y0 + x0 · y1
r0 = x0 · y0.
The naive approach above uses four multiplications of (n/2)-bit values. The Karatsuba-Ofman method reduces
this to three multiplications (and some extra low-cost operations); show how this is achieved.
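
(A concrete sketch for n = 16, so n/2 = 8, is as follows: since (x1 + x0) · (y1 + y0) = r2 + r1 + r0, the middle part can be recovered as r1 = (x1 + x0) · (y1 + y0) − r2 − r0 using just one further multiplication.)

#include <stdint.h>

// Karatsuba-Ofman for 16-bit operands: three 8-bit multiplications,
// plus a handful of additions, subtractions, and shifts
uint32_t karatsuba16( uint16_t x, uint16_t y ) {
  uint32_t x0 = x & 0xFF, x1 = x >> 8;
  uint32_t y0 = y & 0xFF, y1 = y >> 8;

  uint32_t r2 = x1 * y1;
  uint32_t r0 = x0 * y0;
  uint32_t r1 = ( ( x1 + x0 ) * ( y1 + y0 ) ) - r2 - r0;

  return ( r2 << 16 ) + ( r1 << 8 ) + r0;
}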


Q150. Assume that unsigned integers are represented in 4 bits.

a What is the result of using a normal 4-bit adder circuit to compute the sum 10 + 12?

b A saturating (or clamped) adder is such that if an overflow occurs, i.e., the result does not fit into 4 bits,
the highest possible result is returned instead. With a clamped 4-bit addition denoted by ⊎, we have that
10 ⊎ 12 = 15 for example. In general, for an n-bit clamped adder

x ⊎ y = x + y      if x + y < 2^n
        2^n − 1    otherwise

Design a circuit that implements a 4-bit adder of this type.
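
(As a software analogue of the required circuit, part b can be prototyped as follows; a hardware version would instead use the adder's carry-out to select, via a multiplexer, between the sum and the all-ones value.)

#include <stdint.h>

// 4-bit clamped addition: form the (at most 5-bit) sum, then clamp to
// 2^4 - 1 = 15 whenever the result overflows 4 bits
uint8_t clamped_add4( uint8_t x, uint8_t y ) {
  uint8_t t = ( x & 0xF ) + ( y & 0xF );

  return ( t > 0xF ) ? 0xF : t;
}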

Q151. A software application needs 8-bit, unsigned modular multiplication, i.e., it needs to compute

x·y (mod N)

which is the same as


t − (N · ⌊t/N⌋)
where t = x · y. You have been asked to extend an existing ALU to support this operation. The high cost of
a dedicated circuit for division rules out that option; using standard building blocks (e.g., adder, multiplexer)
rather than individual logic gates, draw a block diagram of an alternative solution.

A.4 Chapter 4
Q152. From the following list
A: the design of a DRAM cell includes more transistors than an SRAM cell
B: an SRAM cell can store more information than a DRAM cell
C: SRAM cells can be accessed more quickly than DRAM cells
D: DRAM cells require a mechanism to refresh their content
identify each statement that correctly describes SRAM and DRAM cells.

Q153. Figure A.11 illustrates the design of a DRAM memory. The labels on four components in the block diagram
have been blanked-out, then replaced with the symbols α, β, γ, and δ: which of the following mappings

A: α ↦ row address buffer,     β ↦ row address decoder,     γ ↦ column address buffer,  δ ↦ column address decoder
B: α ↦ row address buffer,     β ↦ column address buffer,   γ ↦ row address decoder,    δ ↦ column address decoder
C: α ↦ column address buffer,  β ↦ row address buffer,      γ ↦ column address decoder, δ ↦ row address decoder
D: α ↦ row address decoder,    β ↦ column address decoder,  γ ↦ row address buffer,     δ ↦ column address buffer
E: α ↦ column address decoder, β ↦ row address decoder,     γ ↦ column address buffer,  δ ↦ row address buffer
do you think is correct?



Figure A.11: A 4Mbit DRAM block diagram (source: http://www.micross.com/pdf/MT4C4001J.pdf).

Q154. Identify each statement which is correct:


A: SRAM-based memories typically have a lower access latency than DRAM-based alternatives
B: SRAM-based memories typically have a lower density than DRAM-based alternatives
C: a memory that uses big-endian byte ordering will have a higher access latency than one using a little-endian
order
D: a Harvard-style organisation means both instructions and data are stored in the same memory

Q155. Consider a DRAM-based memory device with a capacity of 65536 addressable bytes. Of the following options
A: 8 address pins, 65536 cells
B: 16 address pins, 65536 cells
C: 8 address pins, 524288 cells
D: 16 address pins, 524288 cells
E: none of the above
which offers the most likely description of said device?

Q156. Consider an SRAM-based memory device, which has a 4-bit data bus and 12-bit address bus. If your goal is to
construct a 32KiB memory and only have such devices available, how many will you need?
A: 1
B: 2
C: 4
D: 8
E: 16

Q157. Imagine you are using a 1kB, byte-addressable memory within some larger system. In doing so, you make
a mistake which means the 4-th address wire A4 is not correctly connected: it therefore has the fixed value
A4 = 0. Which of the following options
A: 1
B: 4


Figure A.12: A diagrammatic description of an 8-bit micro-processor and associated memory system.

C: 256
D: 512
E: 1024
reflects the number of addresses now accessible if the memory is
a an SRAM, or
b a DRAM.

Q158. Consider an 8-bit micro-processor, connected to a memory system via an 18-bit address bus: let A denote said
bus, such that Ai for 0 ≤ i < 18 is the i-th bit. The memory system is comprised of 4 separate memory devices
(either RAMs or ROMs) denoted MEM0 , MEM1 , MEM2 , and MEM3 . An address decoder maps addresses to
memory devices by controlling a set of associated enable (or chip select) signals, i.e., en0 , en1 , en2 , and en3 .
Figure A.12 offers a diagrammatic version of the same description, noting various extraneous control signals
are omitted for clarity.
If the enable signals are
en0 = ¬A17 ∧ ¬A16 ∧ ¬A15
en1 = ¬A17 ∧ ¬A16 ∧ A15 ∧ ¬A14
en2 = ¬A17 ∧ A16
en3 = A17 ∧ A16 ∧ A15 ∧ A14 ∧ A13 ∧ A12 ∧ A11 ∧ A10 ∧ A9 ∧ A8 ∧ A7 ∧ A6 ∧ A5
which memory device is address A = 48350 mapped to?
A: MEM0
B: MEM1
C: MEM2
D: MEM3

Q159. Draw a transistor-level circuit diagram describing a 6T SRAM memory cell.

Q160. Consider a 1Mbit SRAM memory device (i.e., housing a total of 10^6 SRAM memory cells, each holding a 1-bit
value), and a DRAM-based alternative with the same capacity: you are tasked with deciding which device to
use within some larger system. After reading the data sheets, it seems that
a the DRAM-based device might be harder to integrate into the system, and
b the SRAM-based device should have a lower access latency.
Briefly explain why each statement is accurate.

Q161. At a high level, a DRAM memory device could be described as an array (or matrix) of 1-bit cells with an
interface including a data pin, address pins and control pins (e.g., chip select, output and write enable, row
and column strobes). Carefully explain the purpose of
a row and column buffers, and
b row and column decoders
which represent components in such a device.


A.5 Chapter ??
Q162. Consider a Finite State Machine (FSM) whose concrete implementation is as follows:

D Q D Q
en en
¬Q ¬Q r
Φ2

D Q D Q
en en
¬Q ¬Q

Φ1

Notice that the implementation is based on use of four D-type latches, and a 2-phase clock supplied via Φ1 and
Φ2 ; one additional input x plus one output r are also evident. To function correctly, a clock generator ensures
Φ1 and Φ2 are driven as follows:

[Waveform: the 2-phase clock signals Φ1 and Φ2, whose period ρ is annotated.]

a From the following list

A: Φ1 and Φ2 are digital signals


B: Φ1 and Φ2 are non-overlapping
C: Φ1 and Φ2 are gated
D: Φ1 and Φ2 are unskewed
E: Φ1 and Φ2 each have a duty cycle of 33%

identify each property the clock generator must guarantee is true for the implementation to function
correctly.

b Consider the two D-type latches at the bottom of the diagram, which form a 2-bit register. Imagine
the value stored in this register is expressed as a 2-bit integer: when the implementation is initially
powered-on, is this value equal to

A: 00(2)
B: 01(2)
C: 10(2)
D: 11(2)
E: any of the above

c Any FSM specification will include a transition function, often denoted δ, which can be described in


either tabular or diagrammatic form. Of the following options

[Five candidate state-transition diagrams, labelled A to E: each has a start state S0 plus further states (up to S4 in option A, up to S3 in options B, C, and D, and up to S2 in option E), with transitions labelled x = 0 and x = 1.]
which captures the transition function of this FSM?


d Which of the following FSM types, namely

A: Mealy
B: Moore

does this implementation represent?


e In the 2-phase clock waveform above, ρ illustrates the clock period: recall this is inversely proportional
to the clock frequency. Imagine the gate delays for NOT, AND, and 2- and 3-input OR gates are 10ns, 20ns,
20ns, and 30ns respectively, and the critical path associated with a D-type latch is 60ns. Which of the following
best matches the maximum possible clock frequency of this implementation?

A: 1.0kHz
B: 5.9MHz
C: 9.0MHz
D: 9.5MHz
E: 1.0GHz

f Which of the following best describes the purpose of this FSM?

A: set r = 1 iff. the current value of x is different from the previous value of x,
B: act as a modulo 4 counter that is incremented by the value of x, and set r = 1 when the current counter
value is zero,
C: compute the Hamming weight of a sequence fed as input bit-by-bit via x, and set r = 1 once this is
equal to 3
D: count the number of consecutive times x = 1, and set r = 1 once this is equal to 3

E: inspect the sequence fed as input bit-by-bit via x, and set r = 1 iff. this sequence, when interpreted
as an unsigned integer, is odd

Q163. Figure A.13 and Figure A.14 describe an FSM implementation and an associated waveform. When read left-
to-right, the waveform captures how values of Φ1 and Φ2 (a 2-phase clock), and rst (a reset signal) change over
time; the other input s maintains the value A6(16) throughout. Note that the waveform is annotated with some
instances and periods in time (e.g., ρ, and each ti ).

a What is the value of r at time t0 ?

A: 0
B: 1
C: undefined

b What is the value of r at time t1 ?

A: 0
B: 1
C: undefined

c What is the value of r at time t2 ?

A: 0
B: 1
C: undefined

d Consider the following NAND-based implementations

D-type latch ↦ Figure A.15
2-input XOR gate ↦ Figure A.16
2-input, 1-bit multiplexer ↦ Figure A.17

relating to components used within Figure A.13. The waveform is annotated with ρ, which illustrates the
clock period. If a 2-input NAND gate imposes a gate delay of Tnand = 10ns, which value most closely
reflects the maximum possible clock frequency?

A: 1.0MHz
B: 1.2GHz
C: 3.8MHz
D: 5.9MHz
E: 6.6MHz

Q164. Consider the design as shown in Figure A.18, which implements a simple Finite State Machine (FSM) using
D-type latches and a 2-phase clock. Note that the r output reflects whether the FSM is in an accepting state, the
rst input resets the FSM into the start state, and the Xi input drives transitions between states: the idea is that
the i-th element of a sequence
X = ⟨X0 , X1 , . . . , Xn−1 ⟩
is provided as input, via Xi , in the i-th step. Assuming the entirety of X is consumed, which of the following
A: r = X0 ⊕ X1 ⊕ · · · ⊕ Xn−1
B: r = X0 ∧ X1 ∧ · · · ∧ Xn−1
C: r = ¬(X0 ∧ X1 ∧ · · · ∧ Xn−1)
D: r = X0 ∨ X1 ∨ · · · ∨ Xn−1
E: r = ¬(X0 ∨ X1 ∨ · · · ∨ Xn−1)
best describes the output from, or functionality of the FSM?

Figure A.13: An FSM implementation, which has 4 inputs (1-bit Φ1 , Φ2 and rst on the left-hand side; 8-bit s spread
within the design) and 1 output (1-bit r on the right-hand side).

Figure A.14: A waveform describing behaviour of Φ1 , Φ2 , and rst within Figure A.13.

Figure A.15: A NAND-based implementation of a D-type latch.

Figure A.16: A NAND-based implementation of a 2-input XOR gate.

Figure A.17: A NAND-based implementation of a 2-input, 1-bit multiplexer.

Figure A.18: Implementation of a simple FSM, using D-type latches and a 2-phase clock.

Q165. Consider the design as shown in Figure A.19, which implements a simple Finite State Machine (FSM) using
D-type latches and a 2-phase clock; note that it includes one output labelled r, and one input labelled x. Which
of the following options
A: 2
B: 3
C: 5
D: 6
E: 9
reflects

a the number of gates involved in the output function implementation?

b the number of gates involved in the transition function implementation?

Q166. The parity function f accepts an n-bit sequence X as input, and yields f (X) = 1 iff. X has an odd number
of elements equal to 1. If f (X) = 1 (resp. f (X) = 0), we say the parity of X is odd (resp. even). Using a
combinatorial circuit, one can compute this as

f (X) = X0 ⊕ X1 ⊕ · · · ⊕ Xn−1

since XOR can be thought of as addition modulo two. However, how could we design a Finite State Machine
(FSM) to compute f (X) when supplied with X one element at a time? Explain step-by-step how you would
solve this challenge: start with a high-level design for any FSM then fill in detail required for this FSM. Are there
any features or requirements you can add to this basic description so the FSM is deemed “better” somehow?

Q167. Imagine you are asked to build a simple DNA matching hardware circuit as part of a research project. The
circuit will be given DNA strings which are sequences of tokens that represent chemical building blocks. The
goal is to search a large input sequence of DNA tokens for a small sequence indicative of some feature.
The circuit will receive one token per clock cycle as input; the possible tokens are adenine (A), cytosine
(C), guanine (G) and thymine (T). The circuit should, given the input sequence, set an output flag to 1 when
the matching sequence ACT is found somewhere in the input or 0 otherwise. You can assume the inputs are
infinitely long, i.e., the circuit should just keep searching forever and set the flag when the match is a success.

a Design a circuit to perform the required task, show all your working and explain any design decisions
you make.

Figure A.19: Implementation of a simple FSM, using D-type latches and a 2-phase clock.

b Now imagine you are asked to build two new matching circuits which should detect the sequences CAG
and TTT respectively. It is proposed that instead of having three separate circuits, they are combined into a
single circuit that matches the input sequence against one matching sequence selected with an additional
input. Describe one advantage and one disadvantage you can think of for the two implementation
options.

Q168. A revolutionary, ecologically sound washing machine is under development by your company. When turned
on, the machine starts in the idle state awaiting input. The washing cycle consists of the three stages: fill (when
it fills with water), wash (when the wash occurs), spin (when spin dying occurs); the machine then returns to
idle when it is finished. Two buttons control the machine: pressing B0 starts the washing cycle, pressing B1
cancels the washing cycle at any stage and returns the machine to idle; if both buttons are pressed at the same
time, the machine continues as normal as if neither were pressed.

a You are asked to design a circuit to control the washing machine. Draw a diagram illustrating states the
washing machine can be in, and valid transitions between them.

b Translate your diagram from above into a corresponding, tabular description of the transition function.

c Using an appropriate technique, derive Boolean expressions which allow computation of the transition
function; note that because the washing machine is ecologically sound, minimising the overall gate count
is important.

Q169. Recall that an n-bit Gray code is a cyclic, 2^n-element sequence S where each i-th element Si is itself an n-element
binary sequence, and the Hamming distance between adjacent elements is one, i.e.,

HD(Si , Si−1 (mod 2^n) ) = HD(Si , Si+1 (mod 2^n) ) = 1.

a Using an expression (rather than words), define

i HW(X), the Hamming weight of a binary sequence X, and


ii HD(X, Y), the Hamming distance between binary sequences X and Y.

b Consider a D-type flip-flop, capable of storing a 1-bit value, realised using CMOS-based transistors
arranged into logic gates. Using a gate-level circuit diagram, describe the design of such a component
(clearly explaining the purpose of each part).

c Imagine successive elements of a 3-bit Gray code sequence are stored, one after another, in a register
realised using flip-flops of the type described above. The fact only one bit changes each time the register
is updated could be viewed as advantageous: explain why.
d Using a block diagram, draw a generic Finite State Machine (FSMs) framework, including for example
δ, ω and any input and output; clearly explain the purpose of each component in the framework.
e Using the framework outlined above, design a concrete FSM which has

• two 1-bit inputs rst and clk, and


• one 3-bit output r.

and whose behaviour is as follows: at each positive edge of the clock signal clk, if rst = 0 then r should be
updated with the next element of a 3-bit Gray code, otherwise r should be reset to the first element.
Note that your answer should provide enough detail to fully specify each component in the framework
(e.g., Boolean expressions for δ).

Q170. An electronic security system, designed to prevent unauthorised use of a door, is attached to a mains electricity
supply. The system has the following components:

• Three buttons, say Bi for 0 ≤ i < 3, whose value is initially 0; when pressed, a button remains pressed and
the value changes to 1.
• A door handle modelled by H, where H = 1 when the handle is turned, and H = 0 when the handle is unturned.

• A lock mechanism modelled by L, where L = 1 when the door is locked, and L = 0 when the door is unlocked.

If the door handle is turned after the order of button presses matches a 3-element password sequence P, the
door should be unlocked; if there is a mismatch, it should remain locked. The mechanism is reset (and all
buttons released) whenever the handle is turned (whether or not the door is unlocked). If P = ⟨B1 , B0 , B2 ⟩, then
for example

• B1 then B0 then B2 is pressed, then the handle is turned, the door is unlocked, i.e., L is set to 0, and the
mechanism is reset,
• B0 then B1 then B2 is pressed, then the handle is turned, the door remains locked, i.e., L is set to 1, and the
mechanism is reset,
• B1 then B0 is pressed, then the handle is turned, the door remains locked, i.e., L is set to 1, and the
mechanism is reset.

a Using a block diagram, draw a generic Finite State Machine (FSMs) framework, including for example
the transition and output functions (i.e., δ and ω) and any input and output; clearly explain the purpose
of each component in the framework.
b Imagine the password is fixed to P = ⟨B2 , B0 , B1 ⟩. Using the framework outlined above, design a concrete
FSM which can be used to control the security system as required.
Note that your answer should provide enough detail to fully specify each component in the framework
(e.g., Boolean expressions for the transition function).
c After inspecting your design, someone claims they can avoid the need for a clock signal: explain how
this is possible.
d The same person suggests an alternative approach whereby P is not fixed, but rather stored in an SRAM
memory device. Although this approach could be more useful, explain one reason it could be viewed as
disadvantageous.
e Before being sold, each physical system needs to be tested to ensure it functions as advertised. Explain a
suitable testing strategy for your design, and any alterations required to facilitate it.

Q171. Imagine you are John Connor in the film Terminator II: your aim is to design a device that guesses ATM (or
cash machine) Personal Identification Numbers (PINs) using brute-force search. The ATM uses 4-digit decimal
PINs, examples being 1234 and 9876. The device stores a current PIN denoted P: it performs each guess in
sequence by first checking whether P is correct, then incrementing P ready for the next step. The process
concludes when P is deemed correct.

a Two potential representations for the PIN are suggested:

a decimal representation in which the PIN is stored as a sequence of four unsigned integers, i.e., P =
⟨P0 , P1 , P2 , P3 ⟩, with each 0 ≤ Pi < 10, or
a binary representation in which the PIN is stored as a single unsigned integer, i.e., P, with 0 ≤ P < 10000.

State one advantage of each option, and explain which you think is more appropriate.

b A combinatorial component within the device should take the current PIN P as input, and produce two
outputs:

• the guess sent to the ATM, i.e., G = ⟨G0 , G1 , G2 , G3 ⟩, where each 0 ≤ Gi < 10 is the i-th decimal digit
of the current PIN, and
• the incremented PIN P′ ready for the next guess.

Produce a design for this component; include a block diagram and enough detail to fully specify how a
gate-level implementation could be performed.

c The device is controlled by a simple Finite State Machine (FSM) which can be described diagrammatically:

[State diagram, rendered graphically in the original, over states S0 to S4 with S0 the start state: the transition labels include b = 0, b = 1, ϵ, r = 0, and r = 1, matching the explanatory description below.]

In a more explanatory form, the idea is as follows:

• The device starts in state S0 , in which P is initialised; once the start button b is pressed, it moves into
state S1 .
• In state S1 , P is driven as input into combinatorial component and the device moves into state S2 .
• In state S2 , G is sent to the ATM and P′ is latched to form the new value of P; the device moves into
state S3 .
• In state S3 the device checks the ATM response r. If r = 1 then G was the correct guess and the device
moves into state S4 where it halts (i.e., remains in S4 ); otherwise, the device moves into state S1 and
the process repeats.

Focusing on the diagram above only, produce a design for the FSM; include a block diagram, and enough
detail to fully specify how a gate-level implementation could be performed.

Q172. A given counter machine has r = 4 registers, and supports an instruction set detailed in Figure A.20. Consider
two configurations of this counter machine:

a the program, held in memory as machine code, is fixed to

MEM = ⟨ 0A3(16) , 060(16) , 080(16) , 097(16) ,
        050(16) , 020(16) , 083(16) , 0C0(16) ⟩

and the initial configuration is

C0 = (l = 0, v0 = 0, v1 = 2, v2 = 1, v3 = 0),

(in each 9-bit encoding, the bits are read 8 down to 0 from left to right: bits 8 to 6 hold the opcode, bits 5 to 4 hold addr, and bits 3 to 0 hold target)

Li : Raddr ← Raddr + 1 then goto Li+1 ↦ 000 addr 0000
Li : Raddr ← Raddr − 1 then goto Li+1 ↦ 001 addr 0000
Li : if Raddr = 0 then goto Ltarget else goto Li+1 ↦ 010 addr target
Li : halt ↦ 011 00 0000
Figure A.20: The instruction set for an example 4-register counter machine.

b the program, held in memory as machine code, is fixed to

MEM = ⟨ 0B3(16) , 070(16) , 080(16) , 097(16) ,
        030(16) , 050(16) , 083(16) , 0AB(16) ,
        030(16) , 060(16) , 087(16) , 0C0(16) ⟩

and the initial configuration is

C0 = (l = 0, v0 = 0, v1 = 3, v2 = 2, v3 = 1).

For each configuration, a) produce a trace of execution, then b) decide which of the following options
A: Compare the values in R1 and R2 , setting R3 to reflect the result
B: Add the values in R1 and R2 , setting R3 to reflect the result
C: Swap the values in R1 and R2
D: Copy the value in R1 into R2 , retaining the value in R1
E: Copy the value in R1 into R2 , clearing the value in R1
is the best description of what the associated program does.
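Producing such traces by hand is error-prone, so it can help to mechanise execution. The following C sketch is illustrative only: it assumes the encoding of Figure A.20 (opcode in bits 8 to 6, addr in bits 5 to 4, and target in bits 3 to 0), and the names step, MEM, l, and v are arbitrary.

#include <stdint.h>
#include <stdio.h>

/* Execute one instruction of the 4-register counter machine, returning
   0 once a halt instruction is reached and 1 otherwise. */
int step( const uint16_t* MEM, uint16_t* l, uint16_t* v ) {
  uint16_t inst   = MEM[ *l ];
  uint16_t op     = ( inst >> 6 ) & 0x7; /* opcode,   bits 8..6 */
  uint16_t addr   = ( inst >> 4 ) & 0x3; /* register, bits 5..4 */
  uint16_t target = ( inst >> 0 ) & 0xF; /* target,   bits 3..0 */

  switch( op ) {
    case 0  : v[ addr ] += 1; *l += 1;                   return 1;
    case 1  : v[ addr ] -= 1; *l += 1;                   return 1;
    case 2  : *l = ( v[ addr ] == 0 ) ? target : *l + 1; return 1;
    default :                                            return 0; /* halt */
  }
}

int main( void ) {
  /* the program and initial configuration from part a) */
  uint16_t MEM[ 8 ] = { 0x0A3, 0x060, 0x080, 0x097,
                        0x050, 0x020, 0x083, 0x0C0 };
  uint16_t l = 0, v[ 4 ] = { 0, 2, 1, 0 };

  while( step( MEM, &l, v ) ) {
    printf( "l=%d v0=%d v1=%d v2=%d v3=%d\n", l, v[ 0 ], v[ 1 ], v[ 2 ], v[ 3 ] );
  }

  return 0;
}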

Q173. Figure A.20 describes the instruction set of an example 4-register counter machine. Consider some i-th encoded,
machine code instruction 0A5(16) expressed in hexadecimal. Which of the following
A: halt computation
B: if register 2 equals 0 then goto instruction 5, else goto instruction i + 1
C: if register 10 equals 0 then goto instruction 5, else goto instruction i + 1
D: increment register 2, then goto instruction i + 1
E: decrement register 10, then goto instruction i + 1
best describes the instruction semantics?

Q174. Figure A.21 outlines, at a high level, a 4-register counter machine implementation; Figure A.22 completes said
implementation, detailing internals of the decoder component. Note that the multiplexer inputs should be
read left-to-right, and use zero-based indexing. Using the left-most multiplexer in the decoder as an example,
if the 3-bit control-signal derived from inst is 001(2) = 1(10) then the 1-st input is selected; this means the output
is 2(10) . Which of the following
A: Li : if R3 = 0 then goto L9 else goto Li+1
B: Li : if R3 + 1 = 0 then goto L9 else goto Li+1
C: Li : R3 ← R3 + 1 then goto Li+1
D: Li : R3 ← 0 then goto Li+1
E: None of the above
describes the semantics of a machine code instruction 100111001(2) for this counter machine?

A.6 Chapter ??
Q175. Write the following as Verilog declarations:

a An 8-bit little-endian wire vector called a.

Figure A.21: The high-level data- and control-path for an example 4-register counter machine.
Figure A.22: The low-level decoder implementation for an example 4-register counter machine.



b A 5-bit big-endian wire vector called b.

c A 32-bit register called c.

d A signed 16-bit register called d.

e A memory of 1024 elements, each 8 bits in size, called e.

f A generate variable called f.

Q176. Given the declarations:

wire [ 3 : 0 ] a;
wire [ 3 : 0 ] b;
wire [ 1 : 0 ] c;
wire [ 3 : 0 ] d;
wire           e;

assign a = 4'b1101;
assign b = 4'b01XX;

what are the values resulting from the following assignments:

a assign c = a[ 1 : 0 ];

b assign c = a[ 3 : 2 ];

c assign d = a & b;

d assign d = a ^ b;

e assign d = { a[3:2], a[1:0] };

f assign d = { a[1:0], a[3:2] };

g assign c = { 2{ b[ 1 ] } };

h assign c = { 2{ b[ 2 ] } };

i assign e = &a;

j assign e = ^a;

Q177. a Consider the following Verilog processes for appropriately defined 1-bit wires a, b, x, y, p and q:

always @ ( posedge p ) begin
  x <= a;
  y <= b;
end

always @ ( posedge q ) begin
  x <= b;
  y <= a;
end

Given that p and q are independent and may change at any time, write down one potential problem with
this design and outline one potential solution.

b Consider the following Verilog process, for appropriately defined 1-bit wire clk and 2-bit wire vector
state, which implements a state machine with three states:

always @ ( posedge clk ) begin
  case( state )
    0 : begin do_0; state = 1; end
    1 : begin do_1; state = 2; end
    2 : begin do_2; state = 0; end
  endcase
end

If this process constitutes the entirety of the design, write down one potential problem with it and outline
one potential solution.

Q178. A Decimal Digit Detector (DDD) is a device that accepts a 1-bit input at each positive clock edge, and waits
until four such bits have been received. At this point, it sets a 1-bit output signal to true if the 4-bit accumulated
input (interpreted in little-endian form) is a valid Binary Coded Decimal (BCD) digit and false if it is not; it
then repeats the process for the next set of four input bits.
Design a Verilog module to model the DDD device; your design should incorporate a reset signal that is
able to initialise the DDD.

Q179. A comparator C is a function which takes two unsigned n-bit integers x and y as input and produces max(x, y)
and min(x, y) as outputs. One can think of C as sorting the 2-element sequence (x, y) into the resulting sequence
(min(x, y), max(x, y)). Design a Verilog module to model a single comparator for n-bit numbers.

Q180. An n-bit shift register Q is a register (say n D-type flip-flops) whose content is right-shifted on each positive
edge of a shared clock. This means if Qi refers to the i-th bit of Q,

• Q0 is discarded (i.e., shifted out of the register),


• every Qi for 0 ≤ i < n − 1 is set to Qi+1 , and
• Qn−1 is replaced by some new value (i.e., shifted in to the register).

A Linear Feedback Shift Register (LFSR) is a type of pseudo-random number generator based on this compo-
nent: at each positive clock edge

• Q0 is used as a 1-bit pseudo-random output from the LFSR, and


• Qn−1 is replaced by a bit dictated by other bits in Q, which are XOR’ed together according to a tap
sequence.

For example, if the tap sequence is T = ⟨0, 3, 4, 5, 7⟩ then the new bit that replaces Qn−1 is
t = ⊕_{t∈T} Qt = Q0 ⊕ Q3 ⊕ Q4 ⊕ Q5 ⊕ Q7 .

Design a Verilog module to model an 8-bit LFSR with the tap sequence above; your design should incorporate
a reset signal which initialises the LFSR with a seed value given as input.
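Although the question demands a Verilog module, a software model can be useful to check a candidate design against expected output. The following C sketch is illustrative only (the names Q and lfsr_step are arbitrary); it assumes the tap sequence above, and an 8-bit register with Q0 held in the LSB.

#include <stdint.h>
#include <stdio.h>

static uint8_t Q; /* the 8-bit shift register, Q_0 held in the LSB */

/* Perform one clock step: emit Q_0, right-shift, and shift the
   feedback bit t = Q_0 xor Q_3 xor Q_4 xor Q_5 xor Q_7 into Q_7. */
int lfsr_step( void ) {
  int out = Q & 1;

  int t = ( ( Q >> 0 ) ^ ( Q >> 3 ) ^ ( Q >> 4 ) ^ ( Q >> 5 ) ^ ( Q >> 7 ) ) & 1;

  Q = ( uint8_t )( ( Q >> 1 ) | ( t << 7 ) );

  return out;
}

int main( void ) {
  Q = 0x01; /* reset: initialise with a (non-zero) seed value */

  for( int i = 0; i < 16; i++ ) {
    printf( "%d", lfsr_step() );
  }
  printf( "\n" );

  return 0;
}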

APPENDIX

B
EXAMPLE EXAM-STYLE SOLUTIONS

B.1 Chapter 1
S1. The question essentially just demands application of

x̂ ↦ ∑_{i=0}^{i<n} xi · b^i ,

which defines a mapping between the representation (on the LHS) and value (on the RHS) of x. In this case,
setting b = 3 allows computation of a (decimal) value for each representation (i.e., literal); we need to select the
one whose value turns out to be 123(10) . As such, it should be clear that

x̂ = 11120 ↦ ⟨0, 2, 1, 1, 1⟩(3)
          ↦ ∑_{i=0}^{i<n} xi · 3^i
          ↦ 0 · 3^0 + 2 · 3^1 + 1 · 3^2 + 1 · 3^3 + 1 · 3^4
          ↦ 0 · 1 + 2 · 3 + 1 · 9 + 1 · 27 + 1 · 81
          ↦ 123(10)

st. 11120 is the correct answer.
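The same mapping is easy to evaluate mechanically; the following C sketch is illustrative only (the names x, b, t, and w are arbitrary), and accumulates xi · b^i for the base-3 literal above.

#include <stdio.h>

int main( void ) {
  int x[ 5 ] = { 0, 2, 1, 1, 1 }; /* little-endian digits of 11120 in base 3 */
  int b = 3, t = 0, w = 1;

  for( int i = 0; i < 5; i++ ) {
    t += x[ i ] * w; /* accumulate x_i * b^i */
    w *= b;          /* update   w = b^(i+1) */
  }

  printf( "%d\n", t ); /* prints 123 */

  return 0;
}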

S2. It should be clear that


x̂ = ⟨x0 , x1 , . . . , xn−1 ⟩
   = ⟨1, 1, 0, 0, 1, 1, 0, 0⟩
   ↦ −xn−1 · 2^(n−1) + ∑_{i=0}^{n−2} xi · 2^i
   ↦ 2^0 + 2^1 + 2^4 + 2^5
   ↦ 1 + 2 + 16 + 32
   ↦ 51(10)

ŷ = ⟨y0 , y1 , . . . , yn−1 ⟩
   = ⟨1, 1, 0, 0, 1, 1, 0, 0⟩
   ↦ (−1)^(yn−1) · ∑_{i=0}^{n−2} yi · 2^i
   ↦ 1 · (2^0 + 2^1 + 2^4 + 2^5)
   ↦ 1 + 2 + 16 + 32
   ↦ 51(10)

i.e., both x and y are represented by the binary literal 00110011, which yields the decimal value 51(10) in both

cases. However, if we set the MSB of x̂ and ŷ to 1, then we get

x̂′ = ⟨x′0 , x′1 , . . . , x′n−1 ⟩
    = ⟨1, 1, 0, 0, 1, 1, 0, 1⟩
    ↦ −x′n−1 · 2^(n−1) + ∑_{i=0}^{n−2} x′i · 2^i
    ↦ 2^0 + 2^1 + 2^4 + 2^5 − 2^7
    ↦ 1 + 2 + 16 + 32 − 128
    ↦ −77(10)

ŷ′ = ⟨y′0 , y′1 , . . . , y′n−1 ⟩
    = ⟨1, 1, 0, 0, 1, 1, 0, 1⟩
    ↦ (−1)^(y′n−1) · ∑_{i=0}^{n−2} y′i · 2^i
    ↦ −1 · (2^0 + 2^1 + 2^4 + 2^5)
    ↦ −1 − 2 − 16 − 32
    ↦ −51(10)
so −77 and −51 is the correct answer.

S3. In 16 bits, the largest possible positive value we can represent using two’s-complement is

2^(n−1) − 1 = 2^(16−1) − 1 = 32767

and the largest possible negative value is

−2^(n−1) = −2^(16−1) = −32768

Using this, we can deduce the following:

a The largest possible positive product is given by

x · y = −32768 · −32768 = 1073741824

which is somewhat counter-intuitive given both operands are negative, but stems from the fact that in
two’s-complement more negative values can be represented than positive.
b The largest possible negative product is given by

x · y = −32768 · 32767 = −1073709056

or
x · y = 32767 · −32768 = −1073709056.
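As a quick cross-check, the following C sketch (illustrative only) evaluates both extreme products; it assumes a platform with a 32-bit int, which is wide enough that the products themselves cannot overflow.

#include <stdint.h>
#include <stdio.h>

int main( void ) {
  int min = INT16_MIN; /* -32768, the largest negative 16-bit value */
  int max = INT16_MAX; /* +32767, the largest positive 16-bit value */

  printf( "%d\n", min * min ); /*  1073741824, largest positive product */
  printf( "%d\n", min * max ); /* -1073709056, largest negative product */

  return 0;
}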

S4. First, note that


x = 256(10) = 0000000100000000(2)
y = 4852(10) = 0001001011110100(2)
Casting from short to char basically truncates the value from 16 to 8 bits (i.e., leaves the 8 LSBs only), meaning

( char )( x ) = 00000000(2) = 0(10)


( char )( y ) = 11110100(2) = −12(10)
st. 0 and −12 is the correct answer.
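The following C sketch (illustrative only) demonstrates the truncation; note it assumes char is 8 bits, and uses signed char explicitly since the signedness of plain char is implementation-defined.

#include <stdio.h>

int main( void ) {
  short x =  256; /* 0000000100000000 in binary */
  short y = 4852; /* 0001001011110100 in binary */

  printf( "%d\n", ( signed char )( x ) ); /* prints   0, i.e., 00000000 */
  printf( "%d\n", ( signed char )( y ) ); /* prints -12, i.e., 11110100 */

  return 0;
}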

S5. Given
x = 9(10)
= 00001001(2)
0x97 = 97(16)
= 10010111(2)
the expression ( ~x << 4 ) | 0x97 evaluates to give

( ~x ) = 11110110(2)
( ~x << 4 ) = 01100000(2)
( ~x << 4 ) | 0x97 = 11110111(2)
= −9(10)
st. r = −9 is the correct answer.

S6. For a signed, n-bit integers represented using two’s-complement we know that

−2^(n−1) ≤ x ≤ 2^(n−1) − 1

so for the 8-bit examples in this case


−128 ≤ x ≤ 127.
Of the 256 possible values of x, 128 are negative (i.e., < 0) and 128 are positive (i.e., ≥ 0). Clearly all positive
values are fixed points: for these, the if statement causes the assignment r = x to be executed. So the question
is whether or not any of the negative values are fixed points?
At first glance, this would seem impossible so none is an understandable response. However, recall that
negation in two’s-complement is achieved via

r = −x = ¬x + 1.

Now consider what happens if we negate x = −128:

−x = ¬x +1
= ¬ 10000000(2) + 1
= 01111111(2) + 1
= 10000000(2)
= x

This means x = −128 is a fixed point, i.e., that abs fails to work correctly for this value (since we cannot
represent 128 in 8 bits using two’s-complement). As such, 129 is the correct answer: the 128 positive values,
plus the 1 negative value from above.
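A brute-force enumeration supports this count; the following C sketch is illustrative only, and note that the wrap-around behaviour of the negation (and of the conversion back to 8 bits) is, strictly, implementation-defined in C, though two's-complement wrapping is the common case.

#include <stdint.h>
#include <stdio.h>

int main( void ) {
  int t = 0;

  for( int i = -128; i < 128; i++ ) {
    int8_t x = ( int8_t )( i );
    int8_t r = ( int8_t )( ( x < 0 ) ? -x : x ); /* an 8-bit abs, st. -(-128) wraps to -128 */

    if( r == x ) {
      t = t + 1;
    }
  }

  printf( "%d\n", t ); /* prints 129 */

  return 0;
}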

S7. First, note that


x = −10(10)
  ↦ −2^7 + 2^6 + 2^5 + 2^4 + 2^2 + 2^1
  ↦ 11110110

Next, since x is signed (implying an arithmetic vs. logical right-shift), we can evaluate the expression as follows

~( ( x >> 2 ) ^ 0xF4 ) ↦ ¬((11110110 ≫ 2) ⊕ 11110100)
                       ↦ ¬(11111101 ⊕ 11110100)
                       ↦ ¬(00001001)
                       ↦ 11110110
                       ↦ −2^7 + 2^6 + 2^5 + 2^4 + 2^2 + 2^1
                       ↦ −10(10)
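This can be reproduced in C, per the sketch below (illustrative only). Note that right-shifting a negative operand is, strictly, implementation-defined in C: the sketch assumes the common case of an arithmetic shift, matching the reasoning above.

#include <stdint.h>
#include <stdio.h>

int main( void ) {
  int8_t x = -10;

  int8_t r = ( int8_t )( ~( ( x >> 2 ) ^ 0xF4 ) ); /* evaluated in (wider) int, then truncated */

  printf( "%d\n", r ); /* prints -10 */

  return 0;
}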

S8. We know that

0 ≤ x, y < 2^8 = 256

due to their type and representation, so there are 256 · 256 = 65536 possible assignments to the variables, i.e.,
pairs
(x, y)
to consider. Only the case where their sum x + y is zero will yield a Hamming weight of zero: the representation
of a non-zero sum will have at least one bit in it equal to one, and hence a Hamming weight of greater than
zero.
The obvious initial answer would be that the assignment x = 0 and y = 0 is the only case where x + y = 0
and hence HW(x + y) = 0. However, others exist due to the effect of overflow: x = 255 and y = 1 should yield
x + y = 256, for example, but, due to overflow (i.e., the fact we cannot represent 256 as an unsigned, 8-bit
integer), actually yields x + y = 0 and hence HW(x + y) = 0. Applying this principle more generally, the
correct answer is that 256 pairs will yield x + y = 0 and hence HW(x + y) = 0: put simply, every possible x
has exactly one y that will yield the sum x + y = 0.
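The claim is simple to verify exhaustively; the following C sketch (illustrative only) counts the pairs directly, with the cast capturing the modulo 2^8 wrap-around of the unsigned, 8-bit sum.

#include <stdint.h>
#include <stdio.h>

int main( void ) {
  int t = 0;

  for( int x = 0; x < 256; x++ ) {
    for( int y = 0; y < 256; y++ ) {
      if( ( uint8_t )( x + y ) == 0 ) {
        t = t + 1;
      }
    }
  }

  printf( "%d\n", t ); /* prints 256 */

  return 0;
}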

S9. The central idea in a positional number system is


x̂ = ⟨x0 , x1 , . . . , xn−1 ⟩ ↦ ∑_{i=0}^{n−1} xi · b^i ,

i.e., each i-th digit in the (left-hand side) literal is weighted by b^i and accumulated to yield the (right-hand side)
represented value. In this case the literal has n = 2 digits, so, given x0 = 0 and x1 = 1, we can see that

x̂ = ⟨x0 , x1 ⟩ ↦ x0 · b^0 + x1 · b^1 = 0 · 1 + 1 · b = 0 + b = b.

This fact holds for any b, and of course if we knew b one of the other options might be correct (e.g., if b = 10 then
x = 10 would also be correct).

S10. First, note that in two’s-complement representation

0(10) ↦ 00000000
−1(10) ↦ 11111111

yield n-bit “all 0” and “all 1” sequences for any n. Now consider an arbitrary x, and each option stemming
from it:

x      = 01101010 ↦ 106(10)
¬x     = 10010101 ↦ −107(10)
x ∧ ¬x = 00000000 ↦ 0(10)
x ∨ ¬x = 11111111 ↦ −1(10)
x ⊕ ¬x = 11111111 ↦ −1(10)
x + ¬x = 11111111 ↦ −1(10)
x − ¬x = 11010101 ↦ −43(10)
Note that if the i-th bit of x is 0 then that of ¬x will be 1; if the i-th bit of x is 1 then that of ¬x will be 0. Based
on this, the AND case will always yield the “all 0” sequence, i.e., 0, because the per-bit computation will be
0 ∧ 1 or 1 ∧ 0 (both yielding 0). Likewise, the OR case will always yield the “all 1” sequence, i.e., −1, because
the per-bit computation will be 0 ∨ 1 or 1 ∨ 0 (both yielding 1); the same is true for both the XOR case and the
addition case, with the latter stemming from the absence of carries. In fact, all these cases will apply for any
n and x based on the same reasoning. So the subtraction case is incorrect: the result is also slightly confusing
in that this example overflows (the actual result 213(10) cannot be represented in 8 bits, at least when using
two’s-complement).

S11. a i |A| = 3.
ii A ∪ B = {1, 2, 3, 4, 5}.
iii A ∩ B = {3}.
iv A − B = {1, 2}.
v Ā = {4, 5, 6, 7, 8}.
vi {x | 2 · x ∈ U} = {1, 2, 3, 4}.
b i +0 in sign-magnitude is 00000000, in two’s-complement is 00000000.
ii −0 in sign-magnitude is 10000000, in two’s-complement is 00000000.
iii +72 in sign-magnitude is 01001000, in two’s-complement is 01001000.
iv −34 in sign-magnitude is 10100010, in two’s-complement is 11011110.
v −8 in sign-magnitude is 10001000, in two’s-complement is 11111000.
vi This is a trick question: one cannot represent 240 in 8-bit sign-magnitude or two’s-complement; the
incorrect guess of 11110000 in two’s-complement for example is actually −16.

S12. The population count or Hamming weight of x, denoted by H(x) say, is the number of bits in the binary
expansion of x that equal one. Using an unsigned 32-bit integer x for example, an implementation might be
written as follows:

#include <stdint.h>

int H( uint32_t x ) {
  int t = 0;

  for( int i = 0; i < 32; i++ ) {
    if( ( x >> i ) & 1 ) {
      t = t + 1;
    }
  }

  return t;
}

S13. Writing
t0 = (x ∧ y) ⊕ z
t1 = (¬x ∨ y) ⊕ z
t2 = (x ∨ ¬y) ⊕ z
t3 = ¬(x ∨ y) ⊕ z
t4 = ¬¬(x ∨ y) ⊕ z
for brevity, we can write the following truth table:
x y z t0 t1 t2 t3 t4
0 0 0 0 1 1 1 0
0 0 1 1 0 0 0 1
0 1 0 0 1 0 0 1
0 1 1 1 0 1 1 0
1 0 0 0 0 1 0 1
1 0 1 1 1 0 1 0
1 1 0 1 1 1 0 1
1 1 1 0 0 0 1 0
Looking at the row where x = 0, y = 0 and z = 1, it is clear that t0 = 1 and t4 = 1, so (x ∧ y) ⊕ z and ¬¬(x ∨ y) ⊕ z
are the correct answers.
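Truth tables of this sort can also be generated by exhaustive enumeration in software; the following C sketch (illustrative only) compares t0 and t4 for every assignment, and the two expressions can of course be swapped for those in later questions.

#include <stdio.h>

int main( void ) {
  for( int x = 0; x <= 1; x++ ) {
    for( int y = 0; y <= 1; y++ ) {
      for( int z = 0; z <= 1; z++ ) {
        int t0 = ( x & y ) ^ z;       /*  (x ∧ y) ⊕ z  */
        int t4 = ( !!( x | y ) ) ^ z; /* ¬¬(x ∨ y) ⊕ z */

        printf( "%d %d %d : %d %d\n", x, y, z, t0, t4 );
      }
    }
  }

  return 0;
}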

S14. You may be able to just spot which one is incorrect, but looking at each case exhaustively (via a truth table for
the LHS and RHS of the supposed equivalence), we see that
x y z (x ∧ y) ∧ z x ∧ (y ∧ z) x∨1 x x ∨ ¬x 1 ¬(x ∨ y) ¬x ∧ ¬y ¬¬x x
0 0 0 0 0 1 0 1 1 1 1 0 0
0 0 1 0 0 1 0 1 1 1 1 0 0
0 1 0 0 0 1 0 1 1 0 0 0 0
0 1 1 0 0 1 0 1 1 0 0 0 0
1 0 0 0 0 1 1 1 1 0 0 1 1
1 0 1 0 0 1 1 1 1 0 0 1 1
1 1 0 0 0 1 1 1 1 0 0 1 1
1 1 1 1 1 1 1 1 1 0 0 1 1
and identify x ∨ 1 ≡ x as the incorrect case: it probably should be x ∨ 1 ≡ 1, or maybe x ∧ 1 ≡ x.

S15. By writing
t0 = x ∨ (z ∨ y)
t1 = ¬(¬y ∧ ¬z)
we can shorten the expression to
t2 = t0 ∧ t1 .
Then, you can see either by enumeration, i.e.,
x y z t0 t1 t2 y∨z
0 0 0 0 0 0 0
0 0 1 1 1 1 1
0 1 0 1 1 1 1
0 1 1 1 1 1 1
1 0 0 1 0 0 0
1 0 1 1 1 1 1
1 1 0 1 1 1 1
1 1 1 1 1 1 1
or via the derivation
(x ∨ (z ∨ y)) ∧ ¬(¬y ∧ ¬z)
= (x ∨ (z ∨ y)) ∧ (y ∨ z) (de Morgan)
= (x ∨ (y ∨ z)) ∧ (y ∨ z) (commutativity)
= (x ∨ (y ∨ z)) ∧ ((y ∨ z) ∨ 0) (identity)
= ((y ∨ z) ∨ x) ∧ ((y ∨ z) ∨ 0) (commutativity)
= (y ∨ z) ∨ (x ∧ 0) (distribution)
= (y ∨ z) ∨ (0) (null)
= (y ∨ z) (identity)
that the correct answer is y ∨ z.

S16. By writing
t0 = x∨y
t1 = x∧z
we can shorten the expression to
t2 = t0 ∨ t1 .
Then, you can see either by enumeration, i.e.,
x y z t0 t1 t2 x∨y
0 0 0 0 0 0 0
0 0 1 0 0 0 0
0 1 0 1 0 1 1
0 1 1 1 0 1 1
1 0 0 1 0 1 1
1 0 1 1 1 1 1
1 1 0 1 0 1 1
1 1 1 1 1 1 1
or via the derivation
(x ∨ y) ∨ (x ∧ z)
= (y ∨ x) ∨ (x ∧ z) (commutativity)
= y ∨ (x ∨ (x ∧ z)) (association)
= y∨x (absorption)
= x∨y (commutativity)
that the correct answer is x ∨ y.

S17. In the same way as NAND, we know NOR is functionally complete: writing x ↓ y for the NOR of x and y, this can be shown via
¬x ≡ x ↓ x
x ∧ y ≡ ¬x ↓ ¬y ≡ (x ↓ x) ↓ (y ↓ y)
x ∨ y ≡ (x ↓ y) ↓ (x ↓ y)
Then, since x ↓ y ≡ ¬(x ∨ y), clearly {∨, ¬} is functionally complete: this set can be rewritten directly as {↓}.
We can harness these facts to show that in fact all other options are functionally complete.
• Given the truth table
x y ⊕
0 0 0
0 1 1
1 0 1
1 1 0
we have ¬x ≡ x ⊕ 1, or, put another way, we can construct ¬ using ⊕. Overall then,
{⊕, ∨} ⇝ {¬, ∨},
i.e., we have constructed the RHS from the LHS; we know the RHS is functionally complete, so the LHS
is as well.
• This option is somewhat more difficult: using the same strategy as above, we now need to construct both
¬ and ∨.
– Given the truth table
x y ≡ ≢
0 0 1 0
0 1 0 1
1 0 0 1
1 1 1 0
it should be clear that since x ≢ y ≡ x ⊕ y, we have ¬x ≡ x ≢ 1. Alternatively, given the truth table
x y ⇒
0 0 1
0 1 1
1 0 0
1 1 1
we have x ⇒ 0 ≡ ¬x.

– Given the axiom x ⇒ y ≡ ¬x ∨ y, we could write ¬x ⇒ y ≡ ¬(¬x) ∨ y ≡ x ∨ y. That is, based on


constructions for ¬ and ⇒ (the LHS) we can construct ∨ (the RHS).

Overall then,
{⇒, ≢} ⇝ {¬, ∨},

i.e., we have constructed the RHS from the LHS; we know the RHS is functionally complete, so the LHS
is as well.

• Based on the above, it should be clear that

{⇒} ⇝ {¬, ∨}

because we can construct both ¬ and ∨ using it alone; in the above ≢ was potentially redundant in fact,
which is also true of ⇏ here. Alternatively, given the truth table

x y ⇒ ⇏
0 0 1 0
0 1 1 0
1 0 0 1
1 1 1 0

we could write 1 ⇏ y ≡ ¬y, and use ⇏ as a replacement for ≢ and so ¬ as required in the above. Either
way, it is clearly the case that
{⇒, ⇏} ⇝ {¬, ∨}

i.e., we have constructed the RHS from the LHS; we know the RHS is functionally complete, so the LHS
is as well.

S18. First, note that each input can obviously be assigned one of two values, namely 0 or 1, so there are 2^n possible
assignments to n inputs. For example, if we have 1 input, say x, there are 2^1 = 2 possible assignments because
x can either be 0 or 1. In the same way, for 2 inputs, say x and y, there are 2^2 = 4 possible assignments: we can
have
x=0 y=0
x=0 y=1
x=1 y=0
x=1 y=1

This is why a truth table for n inputs will have 2^n rows: each row details one assignment to the inputs, and the
associated output.
So how many functions are there? A function with n inputs means a truth table with 2^n rows; each row
includes an output that can either be 0 or 1 (depending on exactly which function the truth table describes). So
to count how many functions there are, we can just count how many possible assignments there are to the 2^n
outputs. The correct answer is 2^(2^n).

S19. The derivation is as follows

LHS = (y ∧ ¬x) ∨ (x ∧ ¬y)


= (y ∧ ¬x) ∨ 0 ∨ (x ∧ ¬y) ∨ 0 (identity)
= (y ∧ ¬x) ∨ (y ∧ ¬y) ∨ (x ∧ ¬y) ∨ (x ∧ ¬x) (inverse)
= (y ∧ (¬x ∨ ¬y)) ∨ (x ∧ (¬y ∨ ¬x)) (distribution)
= ((¬x ∨ ¬y) ∧ y) ∨ ((¬x ∨ ¬y) ∧ x) (commutativity)
= (¬x ∨ ¬y) ∧ (x ∨ y) (distribution)
= (x ∨ y) ∧ (¬x ∨ ¬y) (commutativity)
= (x ∨ y) ∧ ¬(x ∧ y) (de Morgan)
= RHS

suggesting that the correct option is the de Morgan axiom.

S20. These are all (or close to) Boolean axioms, which potentially can be identified by just looking at them: for
example, the first one is the association axiom. Taking a more systematic approach, the following, exhaustive
truth-table

x y z (x ∧ y) ∧ z x ∧ (y ∧ z) x⇒y ¬x ∨ y x ∧ (x ∨ y) ¬(x ∨ y) ¬x ∧ ¬y y x∨0 x


0 0 0 0 0 1 1 0 1 1 0 0 0
0 0 1 0 0 1 1 0 1 1 0 0 0
0 1 0 0 0 1 1 0 0 0 1 0 0
0 1 1 0 0 1 1 0 0 0 1 0 0
1 0 0 0 0 0 0 1 0 0 0 1 1
1 0 1 0 0 0 0 1 0 0 0 1 1
1 1 0 0 0 1 1 1 0 0 1 1 1
1 1 1 1 1 1 1 1 0 0 1 1 1

demonstrates that
x ∧ (x ∨ y) ≢ y.
This is an incorrect version of the absorption axiom, which could be corrected to read

x ∧ (x ∨ y) ≡ x.

S21. Adding parentheses for clarity throughout, using either derivation

(x ∧ x) ∨ (x ∧ ¬x)
= x ∧ (x ∨ ¬x) (distribution)

(x ∧ x) ∨ (x ∧ ¬x)
= x ∧ (x ∨ ¬x) (distribution)
= (x ∨ ¬x) ∧ x (commutativity)

(x ∧ x) ∨ (x ∧ ¬x)
= x ∨ (x ∧ ¬x) (idempotency)
= x∨0 (inverse)
= x (identity)

(x ∧ x) ∨ (x ∧ ¬x)
= ¬(¬(x ∧ x) ∧ ¬(x ∧ ¬x)) (deMorgan)
= ¬((¬x ∨ ¬x) ∧ (¬x ∨ x)) (deMorgan)

or, failing that, enumeration

x x ∧ x ∨ x ∧ ¬x x ∧ (x ∨ ¬x) (x ∨ ¬x) ∧ x x ¬x ¬((¬x ∨ ¬x) ∧ (¬x ∨ x))


0 0 0 0 0 1 0
1 1 1 1 1 0 1

means we can conclude that


x ∧ x ∨ x ∧ ¬x ≢ ¬x.

S22. At first glance, this question looks like a lot of work. However, we can immediately rule out several options
because the associated Karnaugh maps are clearly invalid:

• option A is invalid because the dimensions do not match the truth table: it ignores z, and so is for a 3-input
rather than 4-input function,

• option B is invalid because the content does not match the truth table: the truth table has 6 entries equal
to 1 whereas the Karnaugh map has 5,

• option D is invalid because the 3-element red group is invalid: groups must be rectangular, but this is
L-shaped.

So only options C and E remain. Even just looking at them, we can guess that option C will yield a more efficient
expression because it uses fewer, larger groups (option E uses unit-sized groups only). In more detail

• Option C yields
r = f (w, x, y, z) = ( ¬x ∧ ¬z ) ∨
( w ∧ ¬y ∧ ¬z ) ∨
( w ∧ ¬x ∧ ¬y )
and thus 5 AND, 2 OR, and 6 NOT operators.
• Option E yields
r = f (w, x, y, z) = ( w ∧ ¬x ∧ y ∧ ¬z ) ∨
( ¬w ∧ ¬x ∧ y ∧ ¬z ) ∨
( ¬w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨
( w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨
( w ∧ x ∧ ¬y ∧ ¬z ) ∨
( w ∧ ¬x ∧ ¬y ∧ z )
and thus 18 AND, 5 OR, and 16 NOT operators.
Even considering the (significant) potential for applying common sub-expression, e.g., computing and sharing
the result of ¬x once versus using one operator for each instance, option C will clearly involve fewer
operators.

S23. Adding parentheses for clarity throughout, using either derivation


(x ∧ y) ∨ (x ∧ y ∧ z)
= (x ∧ y ∧ 1) ∨ (x ∧ y ∧ z) (identity)
= (x ∧ y) ∧ (1 ∨ z) (distribution)
= (x ∧ y) ∧ (z ∨ 1) (commutativity)
= x ∧ y ∧ 1 (null)
= x ∧ y (identity)
or, failing that, enumeration
x y z (x ∧ y) ∨ (x ∧ y ∧ z) x∧y x∧z y∧z x∧y∧z 1
0 0 0 0 0 0 0 0 1
0 0 1 0 0 0 0 0 1
0 1 0 0 0 0 0 0 1
0 1 1 0 0 0 1 0 1
1 0 0 0 0 0 0 0 1
1 0 1 0 0 1 0 0 1
1 1 0 1 1 0 0 0 1
1 1 1 1 1 1 1 1 1
means we can conclude that
x ∧ y ∨ x ∧ y ∧ z ≡ x ∧ y.

S24. One can show that there are 2^(2^n) possible Boolean functions with n inputs. In this case n = 1 so we know there
are 2^(2^1) = 2^2 = 4 such functions, which we can enumerate as follows:
x f0 f1 f2 f3
0 0 0 1 1
1 0 1 0 1
In essence, f0 is the constant 0 function (noting f0 ( f0 (x)) = 0 = f0 (x)), f1 is the identity function (noting
f1 ( f1 (x)) = x = f1 (x)), f2 is the complement function (noting f2 ( f2 (x)) = ¬¬x = x ≠ ¬x = f2 (x)), and f3 is the
constant 1 function (noting f3 ( f3 (x)) = 1 = f3 (x)). As such, only 1 of the 4 possible functions, namely f2 , is not
idempotent.

S25. One can show that there are 2^(2^n) possible Boolean functions with n inputs. In this case n = 2 so we know there
are 2^(2^2) = 2^4 = 16 such functions, which we can enumerate as follows:
x y f0 f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 f11 f12 f13 f14 f15
0 0 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
0 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
1 0 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1
1 1 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1

The cases where x = y = 0 and x = y = 1 are naturally symmetric, so the question is basically whether or not
the remaining cases are also symmetric, i.e., whether or not fi (0, 1) = fi (1, 0) holds for a given i. By inspection
of the truth table above, one can see that the relationship holds for each
i ∈ S = {0, 1, 6, 7, 8, 9, 14, 15},
noting that |S| = 8 possible functions are therefore symmetric. As an aside, symmetric Boolean functions are
known as Boolean counting functions in some contexts; one can prove that there are 2^(n+1) such functions with
n inputs, which in our case means 2^(n+1) = 2^(2+1) = 2^3 = 8 as expected.

S26. Adding parentheses for clarity throughout, using either derivation


(x ∧ (¬x ∨ y)) ∨ y
= (x ∧ ¬x) ∨ (x ∧ y) ∨ y (distribution)
= (x ∧ ¬x) ∨ y ∨ (x ∧ y) (commutativity)
= (x ∧ ¬x) ∨ y ∨ (y ∧ x) (commutativity)
= (x ∧ ¬x) ∨ y (absorption)
= (x ∧ ¬x) ∨ (y ∧ 1) (identity)
= (x ∧ ¬x) ∨ (y ∧ (x ∨ 1)) (null)
= (x ∧ ¬x) ∨ (y ∧ (1 ∨ x)) (commutativity)

(x ∧ (¬x ∨ y)) ∨ y
= (x ∧ ¬x) ∨ (x ∧ y) ∨ y (distribution)
= 0 ∨ (x ∧ y) ∨ y (inverse)

(x ∧ (¬x ∨ y)) ∨ y
= (x ∧ ¬x) ∨ (x ∧ y) ∨ y (distribution)
= 0 ∨ (x ∧ y) ∨ y (inverse)
= (x ∧ y) ∨ 0 ∨ y (commutativity)
= (x ∧ y) ∨ y ∨ 0 (commutativity)
= (x ∧ y) ∨ y (identity)
= y ∨ (x ∧ y) (commutativity)
= y ∨ (y ∧ x) (commutativity)
= y (absorption)

(x ∧ (¬x ∨ y)) ∨ y
= ¬(¬(x ∧ (¬x ∨ y)) ∧ ¬y) (deMorgan)
= ¬((¬x ∨ ¬(¬x ∨ y)) ∧ ¬y) (deMorgan)
= ¬((¬x ∨ (x ∧ ¬y)) ∧ ¬y) (deMorgan)
or, failing that, enumeration
x y x ∧ (¬x ∨ y) ∨ y x ∧ ¬x ∨ y ∧ (1 ∨ x) 0∨x∧y∨y x∧y y ¬((¬x ∨ (x ∧ ¬y)) ∧ ¬y)
0 0 0 0 0 0 0 0
0 1 1 1 1 0 1 1
1 0 0 0 0 0 0 0
1 1 1 1 1 1 1 1
means we can conclude that
x ∧ (¬x ∨ y) ∨ y ≢ x ∧ y.

S27. Adding parentheses for clarity throughout, using either derivation


(x ∨ y) ∧ (x ∨ ¬y)
= x ∨ (y ∧ ¬y) (distribution)
= x∨0 (inverse)
= x (identity)
or, failing that, enumeration
x y (x ∨ y) ∧ (x ∨ ¬y) ¬x x ¬y y 0
0 0 0 1 0 1 0 0
0 1 0 1 0 0 1 0
1 0 1 0 1 1 0 0
1 1 1 0 1 0 1 0

means we can conclude that


(x ∨ y) ∧ (x ∨ ¬y) ≡ x.

S28. a The truth table for this function is as follows


a b c (a ∧ b ∧ ¬c) (a ∧ ¬b ∧ c) (¬a ∧ ¬b ∧ c) f (a, b, c)
0 0 0 0 0 0 0
0 0 1 0 0 1 1
0 1 0 0 0 0 0
0 1 1 0 0 0 0
1 0 0 0 0 0 0
1 0 1 0 1 0 1
1 1 0 1 0 0 1
1 1 1 0 0 0 0

Since there are n = 3 input variables, there are clearly 2^n = 2^3 = 8 input combinations; three of these
produce 1 as an the output from the function.
b The truth table for this function is as follows
a b c d f (a, b, c, d)
0 0 0 0 0
0 0 0 1 0
0 0 1 0 0
0 0 1 1 0
0 1 0 0 0
0 1 0 1 1
0 1 1 0 0
0 1 1 1 0
1 0 0 0 0
1 0 0 1 0
1 0 1 0 0
1 0 1 1 0
1 1 0 0 0
1 1 0 1 0
1 1 1 0 0
1 1 1 1 0

so since there is only one case where f (a, b, c, d) = 1, the only assignment given which matches the criteria
is a = 0, b = 1, c = 0 and d = 1.
This hints at a general principle: when we have an expression like this, a term such as ¬x can be read as
“x should be 0” and x as “x should be 1”. So the expression as a whole is read as “a should be 0 and b
should be 1 and c should be 0 and d should be 1”. Since we have basically fixed all four inputs, only one
entry of the truth table matches. On the other hand, if we instead had

f (a, b, c, d) = ¬a ∧ b ∧ ¬c

for example, we would be saying “a should be 0 and b should be 1 and c should be 0, and d can be
anything” which gives two possible assignments (i.e., a = 0, b = 1, c = 0 and either d = 0 or d = 1).
c Informally, SoP form means there are say n terms in the expression: each term is the conjunction of some
variables (or their complement), and the expression is the disjunction of the terms. As conjunction and
disjunction basically means the AND and OR operators, and AND and OR act sort of like multiplication
and addition, the SoP name should make some sense: the expression is sort of like the sum of terms
which are themselves each a product of variables. The second option is correct as a result; the first and
last violate the form described above somehow (e.g., the first case is in the opposite, PoS form).
d One can easily make a comparison using a truth table such as

a b a∨1 a⊕1 ¬a a∧1 ¬(a ∧ b) ¬a ∨ ¬b


0 0 1 1 1 0 1 1
0 1 1 1 1 0 1 1
1 0 1 0 0 1 1 1
1 1 1 0 0 1 0 0

from which it should be clear that all the equations are correct except for the first one. That is, a ∨ 1 ≠ a
but rather a ∨ 1 = 1.

e i Inspecting the following truth table


a ¬a ¬¬a
0 1 0
0 1 0
1 0 1
1 0 1
shows this equivalence is correct (this is the involution axiom).
ii Inspecting the following truth table

a b ¬a ¬b a ∧ b ¬(a ∧ b) ¬a ∨ ¬b
0 0 1 1 0 1 1
0 1 1 0 0 1 1
1 0 0 1 0 1 1
1 1 0 0 1 0 0

shows this equivalence is correct (this is the de Morgan axiom).


iii Inspecting the following truth table

a b ¬a ¬b ¬a ∧ b a ∧ ¬b
0 0 1 1 0 0
0 1 1 0 1 0
1 0 0 1 0 1
1 1 0 0 0 0

shows this equivalence is incorrect.


iv Inspecting the following truth table
a ¬a a⊕a
0 1 0
1 0 0
shows this equivalence is incorrect.

S29. a The dual of any expression is constructed by using the principle of duality, which informally means
swapping each AND with OR (and vice versa) and each 0 with 1 (and vice versa); this means, for
example, we can take the OR form of each axiom and produce the AND form (and vice versa).
So in this case, we start with an OR form: this means the dual will be the corresponding AND form. Making
the swaps required means we end up with
x∧0≡0
so the second option is correct.

b This question is basically asking for the complement of f , since the options each have ¬ f on the left-
hand side: this means using the principle of complements, a generalisation of the de Morgan axiom, by
swapping each variable with the complement (and vice versa), each AND with OR (and vice versa), and
each 0 with 1 (and vice versa). If we apply these rules (taking care with the parenthesis) to

f = ¬a ∧ ¬b ∨ ¬c ∨ ¬d ∨ ¬e,

we end up with
¬ f = (a ∨ b) ∧ c ∧ d ∧ e
which matches the last option.

c The de Morgan axiom, which can be generalised using by the principle of complements, says that

¬(x ∧ y) ≡ ¬x ∨ ¬y

or conversely that
¬(x ∨ y) ≡ ¬x ∧ ¬y

You can think of either form as “pushing” the NOT operator on the left-hand side into the parentheses:
this acts to complement each variable, and swap the AND to an OR (or vice versa). We know that
x ↑ y ≡ ¬(x ∧ y)
x ↓ y ≡ ¬(x ∨ y)
(writing ↑ for NAND and ↓ for NOR). So pattern matching against the options, it is clear the first one is correct, for example, because
x ↓ y ≡ ¬(x ∨ y) ≡ ¬x ∧ ¬y
where the right-hand side matches the description of an AND whose two inputs are complemented.
Likewise, the second one is correct because
x ↑ y ≡ ¬(x ∧ y) ≡ ¬x ∨ ¬y.

S30. a The third option, i.e., ¬a ∧ ¬b is the correct one; the three simplification steps, via two axioms, are as
follows:
¬ (a ∨ b) ∧ ¬ (c ∨ d ∨ e) ∨ ¬ (a ∨ b)
= (¬a ∧ ¬b) ∧ ¬ (c ∨ d ∨ e) ∨ (¬a ∧ ¬b) (de Morgan)
= (¬a ∧ ¬b) ∧ (¬c ∧ ¬d ∧ ¬e) ∨ (¬a ∧ ¬b) (de Morgan)
= ¬a ∧ ¬b (absorption)
b We can clearly see that
(a ∨ b ∨ c) ∧ ¬(d ∨ e) ∨ (a ∨ b ∨ c) ∧ (d ∨ e)
= (a ∨ b ∨ c) ∧ (¬(d ∨ e) ∨ (d ∨ e)) (distribution)
= (a ∨ b ∨ c) ∧ ((d ∨ e) ∨ ¬(d ∨ e)) (commutativity)
= (a ∨ b ∨ c) ∧ 1 (inverse)
= a∨b∨c (identity)
meaning the first option is the correct one.
c We can clearly see that
a ∧ c ∨ c ∧ (¬a ∨ a ∧ b)
= (a ∧ c) ∨ (c ∧ (¬a ∨ (a ∧ b))) (precedence)
= (c ∧ a) ∨ (c ∧ (¬a ∨ (a ∧ b))) (commutativity)
= c ∧ (a ∨ ¬a ∨ (a ∧ b)) (distribution)
= c ∧ (1 ∨ (a ∧ b)) (inverse)
= c ∧ ((a ∧ b) ∨ 1) (commutativity)
= c∧1 (null)
= c (identity)
meaning the last option is the correct one: none of the above is correct, since the correct simplification is
actually just c.
d The fourth option, i.e., a ∧ b is correct. This basically stems from repeated application of the absorption
axiom, the AND form of which states
x ∨ (x ∧ y) ≡ x.
Applying it from left-to-right, we find that
a∧b∨a∧b∧c∨a∧b∧c∧d∨a∧b∧c∧d∧e∨a∧b∧c∧d∧e∧ f
= (a ∧ b) ∨ (a ∧ b) ∧ (c) ∨ (a ∧ b) ∧ (c ∧ d) ∨ (a ∧ b) ∧ (c ∧ d ∧ e) ∨ (a ∧ b) ∧ (c ∧ d ∧ e ∧ f ) (precedence)
= (a ∧ b) ∨ (a ∧ b) ∧ (c ∧ d) ∨ (a ∧ b) ∧ (c ∧ d ∧ e) ∨ (a ∧ b) ∧ (c ∧ d ∧ e ∧ f ) (absorption)
= (a ∧ b) ∨ (a ∧ b) ∧ (c ∧ d ∧ e) ∨ (a ∧ b) ∧ (c ∧ d ∧ e ∧ f ) (absorption)
= (a ∧ b) ∨ (a ∧ b) ∧ (c ∧ d ∧ e ∧ f ) (absorption)
= (a ∧ b) (absorption)

e We can simplify this function as follows


f (a, b, c) = (a ∧ b) ∨ a ∧ (a ∨ c) ∨ b ∧ (a ∨ c)
= (a ∧ b) ∨ a ∨ b ∧ (a ∨ c) (absorption)
= a ∨ (a ∧ b) ∨ b ∧ (a ∨ c) (commutativity)
= a ∨ b ∧ (a ∨ c) (absorption)
= a ∨ (b ∧ a) ∨ (b ∧ c) (distribution)
= a ∨ (a ∧ b) ∨ (b ∧ c) (commutativity)
= a ∨ (b ∧ c) (absorption)

at which point there is nothing else that can be done: we end up with 2 operators (an AND and an OR),
so the second option is correct.

f Working from the right-hand side toward the left, we have that

¬x ∨ ¬y
= (¬x ∧ 1) ∨ ¬y (identity)
= (¬x ∧ 1) ∨ (¬y ∧ 1) (identity)
= (¬x ∧ (y ∨ ¬y)) ∨ (¬y ∧ 1) (inverse)
= (¬x ∧ (y ∨ ¬y)) ∨ (¬y ∧ (x ∨ ¬x)) (inverse)
= (¬x ∧ y) ∨ (¬x ∧ ¬y) ∨ (¬y ∧ (x ∨ ¬x)) (distribution)
= (¬x ∧ y) ∨ (¬x ∧ ¬y) ∨ (¬y ∧ x) ∨ (¬y ∧ ¬x) (distribution)
= (¬x ∧ y) ∨ (¬y ∧ x) ∨ (¬x ∧ ¬y) ∨ (¬x ∧ ¬y) (commutativity)
= (¬x ∧ y) ∨ (¬y ∧ x) ∨ (¬x ∧ ¬y) (idempotency)

g By writing
t0 = x ∧ y
t1 = y ∧ z
t2 = y ∨ z
t3 = x ∨ z
t4 = t1 ∧ t2

we can shorten the LHS and RHS to


f = t0 ∨ t4
g = y ∧ t3

and then perform a brute-force enumeration

x y z t0 t1 t2 t3 t4 f g
0 0 0 0 0 0 0 0 0 0
0 0 1 0 0 1 1 0 0 0
0 1 0 0 0 1 0 0 0 0
0 1 1 0 1 1 1 1 1 1
1 0 0 0 0 0 1 0 0 0
1 0 1 0 0 1 1 0 0 0
1 1 0 1 0 1 1 0 1 1
1 1 1 1 1 1 1 1 1 1

to demonstrate that f = g, i.e., the equivalence holds. Note that this approach is not as robust if the
intermediate steps are not shown; simply including f and g in the truth table does not give much more
confidence than simply writing the equivalence!
To prove the equivalence using an axiomatic approach, the following steps can be applied:

(x ∧ y) ∨ (y ∧ z ∧ (y ∨ z))
= (x ∧ y) ∨ (y ∧ z ∧ y) ∨ (y ∧ z ∧ z) (distribution)
= (x ∧ y) ∨ (y ∧ y ∧ z) ∨ (y ∧ z ∧ z) (commutativity)
= (x ∧ y) ∨ (y ∧ z) ∨ (y ∧ z) (idempotency)
= (x ∧ y) ∨ (y ∧ z) (idempotency)
= (y ∧ x) ∨ (y ∧ z) (commutativity)
= y ∧ (x ∨ z) (distribution)

h Using four simplification steps, via three axioms and the AND operator, as follows

¬(a ∨ b) ∧ ¬(¬a ∨ ¬b)
= (¬a ∧ ¬b) ∧ (a ∧ b) (de Morgan)
= (¬a ∧ a) ∧ (¬b ∧ b) (commutativity and association)
= false ∧ false (inverse)
= false (operator)

we get a form that contains zero operators (which by definition must be the fewest).

B.2 Chapter 2
S31. We can deal with the first two statements in one go: an N-MOSFET (or N-type MOSFET) has N-type semicon-
ductor terminals and a P-type body. If the type of semiconductors were swapped, we have a P-MOSFET (or
P-type MOSFET) not an N-MOSFET.
A CMOS cell is a pairing of two transistors. However, it depends on use of complementary types, namely
one N-type and one P-type, rather than the same type as suggested. As such, this statement is incorrect
(although subtly so).
A given N-MOSFET is deemed active (resp. inactive) when current is allowed (resp. disallowed) to flow.
The flow is, in some sense, controlled by the voltage level applied: it acts to widen or narrow the associated
depletion region. At some threshold, “enough” voltage is applied for there to be “enough” current flowing for
source and drain to be deemed connected and hence the MOSFET active. So this statement is true, although
arguably some detail is glossed over (e.g., the fact a leakage current will always exist, so a transistor is always a
little bit active).

S32. This is a NAND gate, so clearly there are two inputs and one output: by inspecting the circuit we can see them
labelled x, y and r, plus identify the two power rails labelled Vdd and Vss . The output r is “pulled-up” to the Vdd
voltage level iff. a connection is made via one or other of the top two transistors. Their parallel arrangement
gives a hint at their type, even if the system is not recognisable. If we look at the truth table for NAND, i.e.,
NAND
x y r
0 0 1
0 1 1
1 0 1
1 1 0
then if either x = 0 or y = 0 then r = 1, where Vss ≡ 0 and Vdd ≡ 1: these must be P-MOSFETS, therefore,
because we want a connection to be formed between r and Vdd if x = Vss or y = Vss . There is, of course,
a companion pull-down network allowing r to be “pulled-down” to the Vss voltage level. However, this is
constructed from a sequential arrangement of N-MOSFETS: we did not study BJT transistors, which represent
a different technology from MOSFETs.
Finally, note that it is highly unlikely you will see a flux capacitor in a circuit other than during Back To The
Future!

S33. Provided you know what the behaviour of the pull-up network (top, consisting of P-MOSFETs) and pull-down
network (bottom, consisting of N-MOSFETs) is, it is reasonably easy (if long winded) to answer the question by
looking at a case-by-case analysis. A more efficient way is to spot the sequential and parallel organisation of
MOSFETs:
• If x = Vss , y and z are irrelevant because a connection between r and Vdd is formed (via the bottom-left
P-MOSFET), while a connection between r and Vss is impossible (due to the top N-MOSFET).
• If x = Vdd , y and z are relevant:

– If y = Vss , z = Vss then a path between r and Vss is impossible; in this case, however, a path between
r and Vdd is formed (via the top P-MOSFETs).
– If y = Vss , z = Vdd then a path between r and Vss is formed (via the top and bottom-right N-MOSFETs).
– If y = Vdd , z = Vss then a path between r and Vss is formed (via the top and bottom-left N-MOSFETs).
– If y = Vdd , z = Vdd then a path between r and Vdd is impossible; in this case, however, a path between
r and Vss is formed (via the bottom N-MOSFETs).

So given Vss ≡ 0 and Vdd ≡ 1, we can write


x y z r = f (x, y, z)
0 0 0 1
0 0 1 1
0 1 0 1
0 1 1 1
1 0 0 1
1 0 1 0
1 1 0 0
1 1 1 0

which, translated into an expression, is r = ¬(x ∧ (y ∨ z)), or ¬x ∨ (¬y ∧ ¬z) if you prefer: basically r = 1 if x = 0
or both y = 0 and z = 0, otherwise r = 0.
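The case-by-case analysis can be double-checked by evaluating this expression exhaustively; the following C sketch is illustrative only.

#include <stdio.h>

int main( void ) {
  for( int x = 0; x <= 1; x++ ) {
    for( int y = 0; y <= 1; y++ ) {
      for( int z = 0; z <= 1; z++ ) {
        int r = !( x && ( y || z ) ); /* r = ¬(x ∧ (y ∨ z)) */

        printf( "%d %d %d : %d\n", x, y, z, r );
      }
    }
  }

  return 0;
}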

S34. This question is tricky, in the sense there are lots of ways an XOR gate can be constructed using logic gate
instances. One can show, for example, that x ⊕ y is equivalent to

a (¬x ∧ y) ∨ (x ∧ ¬y),

b (x ∨ y) ∧ ¬(x ∧ y),

c t0 ↑ t1 where t0 = (x ↑ x) ↑ y and t1 = (y ↑ y) ↑ x (writing ↑ for NAND), or

d t1 ↑ t2 where t0 = x ↑ y, t1 = x ↑ t0 , and t2 = y ↑ t0 , or

e (x ↓ y) ↓ ¬(x ↑ y) (writing ↓ for NOR)

and potentially more besides: note that the question rules out the otherwise viable option of directly using
transistors, for example.
To compare the options, the starting point is how each logic gate we could use will be realised using
transistors. In reality this may increase the range of possible answers even further, but to provide an answer
imagine that we only consider the five options above, and assume that

NOT ⇝ 2 transistors
NAND ⇝ 4 transistors
NOR ⇝ 4 transistors
AND ⇝ 6 transistors
OR ⇝ 6 transistors

i.e., since we know we can construct NOT, NAND and NOR using the stated number of MOSFET-based
transistors, the best way to form AND and OR is simply to append a NOT to a NAND or NOR. This means we
can just count the logic gate instances, and translate:

2 · NOT + 2 · AND + 1 · OR ⇝ 22 transistors
1 · NOT + 2 · AND + 1 · OR ⇝ 20 transistors
5 · NAND ⇝ 20 transistors
4 · NAND ⇝ 16 transistors
1 · NOT + 1 · NAND + 2 · NOR ⇝ 14 transistors

Based on this 14 is the correct answer, although keep in mind it might be possible to do better based on using a
different set of options (for XOR) and assumptions.
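To gain confidence in constructions of this sort, it is easy to enumerate all input pairs in software. The following C sketch (illustrative only) checks the 4-NAND construction, i.e., option d, against x ⊕ y.

#include <stdio.h>

/* a 2-input NAND, i.e., nand( x, y ) = ¬( x ∧ y ) */
int nand( int x, int y ) {
  return !( x && y );
}

int main( void ) {
  for( int x = 0; x <= 1; x++ ) {
    for( int y = 0; y <= 1; y++ ) {
      int t0 = nand( x,  y  );
      int t1 = nand( x,  t0 );
      int t2 = nand( y,  t0 );
      int r  = nand( t1, t2 );

      printf( "%d %d : %d (expect %d)\n", x, y, r, x ^ y );
    }
  }

  return 0;
}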

S35. This is quite a tricky question, in the sense there are several plausible answers: selecting between them really
needs some justification, which of course is impossible with a multiple choice question! It is important to
keep in mind that the question focuses on design of some behaviour based on transistors, not precision wrt.
manufacture of the design. Put another way, the question is intentionally pitched at a high-level, with the
various caveats attempting to limit the possible answers:

a It might seem possible to use 0 transistors, in that one can just connect x directly to r. This may realise
the pass through behaviour required, but does not impose a delay (and more generally does not satisfy
use-cases for current or voltage buffers) so is not really valid.

b One could implement a buffer using 1 N-MOSFET transistor, say, in series with a pull-down resistor: the
idea is that when x = Vdd the transistor connects r to Vdd , otherwise the resistor pulls r down to Vss .
However, the material provided does not cover use of resistors: the question caters for this by asking for
an implementation using only transistors, and not listing 1 as an answer! This is justified further by the
fact by placing an emphasis on CMOS, using only an N-MOSFET might seem confusing therefore (given
the argument that N- and P-MOSFETs occur in pairs as part of a pull-down and pull-up network). So
although this is arguably the best answer, it is not the expected one!

c One could start by considering a NOT gate implementation, which will clearly invert x. It does so via
a P-MOSFET that connects Vdd to r, and an N-MOSFET that connects Vss to r. As such, when x = Vss
(resp. x = Vdd ) the P-MOSFET will be connected and N-MOSFET disconnected (resp. vice versa) meaning
that r = Vdd ≃ ¬x (resp. r = Vss ≃ ¬x). A somewhat simple observation is that if we swap the P- and

N-MOSFETs, we end up with a buffer. That is, if the P-MOSFET connects Vss to r and the N-MOSFET
connects Vdd to r, then the behaviour swaps st. if x = Vdd (resp. x = Vss ) then r = Vdd ≃ x (resp. r = Vss ≃ x).
So 2 transistors is a reasonable answer if we assume it is possible to organise them this way. In reality,
this is debatable: it is the opposite of normal pull-down and pull-up networks connected to Vdd and Vss , so
may not be reasonable under the constraints of a given manufacturing process. However, the question is
careful to ask for an unconstrained organisation for transistors, so the assumption is allowed.

d Finally, one could simply use two NOT gate implementations in series with each other: this inverts x
twice, computing r = ¬¬x = x using 4 transistors. This approach needs no assumptions, but on the other
hand is obviously not very efficient wrt. the number of transistors used.

In summary then, 2 transistors is a correct (or the expected) answer in this case (although you could quite
reasonably argue for other answers).

S36. From the truth table, one can form a Karnaugh map

          x z
        00  01  11  10
 y = 0   1   1   1   0
 y = 1   0   1   1   ?

which includes two groups: unlike some other examples, the don’t care in this case is uncovered (i.e., we treat
it as 0) since we cannot make fewer or larger groups by covering it (i.e., treating it as 1). Even so, translating
each group into a term yields the SoP expression

r = f (x, y, z) = (¬x ∧ ¬y) ∨ z

which is the correct answer: the fact this is the only one in SoP form at least hints at the correct answer even
without going through the above.

S37. To answer this question, the idea is that we

• use a 3-level cascade to produce an 8-input, 1-bit multiplexer, then

• replicate this 8 times to produce an 8-input, 8-bit multiplexer

using a generic 2-input, 1-bit multiplexer whose behaviour can be described as

    r = x if c = 0, or
    r = y otherwise,

i.e., it selects the input x (i.e., connects the output r to x) if the control signal c = 0, and selects the input y (i.e.,
connects the output r to y) if the control signal c = 1.
Imagine the 8, 1-bit inputs are named s, t, u, v, w, x, y and z and there is a 3-bit control signal c (to select
between 2^3 = 8 inputs). The cascade of 2-input, 1-bit multiplexers is constructed as follows

    t0 = s  if c0 = 0, or t  otherwise
    t1 = u  if c0 = 0, or v  otherwise
    t2 = w  if c0 = 0, or x  otherwise
    t3 = y  if c0 = 0, or z  otherwise
    t4 = t0 if c1 = 0, or t1 otherwise
    t5 = t2 if c1 = 0, or t3 otherwise
    r  = t4 if c2 = 0, or t5 otherwise

noting there are 3 layers (i.e., they form a tree of depth 3). Using a table such as

c c2 c1 c0 t0 t1 t2 t3 t4 t5 r
0 0 0 0 s u w y s w s
1 0 0 1 t v x z t x t
2 0 1 0 s u w y u y u
3 0 1 1 t v x z v z v
4 1 0 0 s u w y s w w
5 1 0 1 t v x z t x x
6 1 1 0 s u w y u y y
7 1 1 1 t v x z v z z

makes it clear the c-th input is selected as the output r. Given such a component, we then just replicate it 8
times: the same control signal is used for each i-th replication, which then produces the i-th bit of r by selecting
between the i-th bits of the 8-bit inputs s through to z.
We use 7 instances of the 2-input, 1-bit multiplexer for each of the cascades; there are 8 replicated cascades,
so the correct answer is that we need 7 · 8 = 56 instances.
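
The cascade is easy to validate by brute force; the following minimal Python sketch (mine, with illustrative function names) builds the 8-input multiplexer and confirms the c-th input is selected for every c:

def mux2(x, y, c):
    # 2-input, 1-bit multiplexer: x if c = 0, y otherwise
    return x if c == 0 else y

def mux8(s, t, u, v, w, x, y, z, c0, c1, c2):
    t0 = mux2(s, t, c0)              # level 1: four multiplexers
    t1 = mux2(u, v, c0)
    t2 = mux2(w, x, c0)
    t3 = mux2(y, z, c0)
    t4 = mux2(t0, t1, c1)            # level 2: two multiplexers
    t5 = mux2(t2, t3, c1)
    return mux2(t4, t5, c2)          # level 3: one multiplexer

inputs = list(range(8))              # label the i-th input with the value i
for c in range(8):
    c0, c1, c2 = (c >> 0) & 1, (c >> 1) & 1, (c >> 2) & 1
    assert mux8(*inputs, c0, c1, c2) == c
print(7 * 8)                         # 7 muxes per cascade, 8 cascades: 56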

S38. First, notice that the critical path (or longest sequential path) runs through the 4 full-adder instances (as a
result of the carry chain): the 3-rd (most-significant) instance cannot produce either co = c4 or r3 until the carry
propagates from the 0-th (least-significant) instance, and so is dependent on ci = c0, x0, and y0.
Next, we need some detail about each full-adder instance: the i-th such instance will compute the sum and
carry-out
ri = xi ⊕ yi ⊕ ci
ci+1 = (xi ∧ yi ) ∨ (xi ∧ ci ) ∨ (yi ∧ ci )
= (xi ∧ yi ) ∨ ((xi ⊕ yi ) ∧ ci )
from summands xi and yi , plus the carry-in ci (where c0 = ci and co = c4 ). There are two different options for
ci+1 , so, for the quoted gate delays, we find the critical paths from input to output can be described as:

             Input(s)   Output(s)   Critical path

             xi, yi     ri          TXOR + TXOR = 120ns
             ci         ri          TXOR = 60ns
  Option 1   xi, yi     ci+1        TAND + TOR + TOR = 60ns
             ci         ci+1        TAND + TOR + TOR = 60ns
  Option 2   xi, yi     ci+1        TXOR + TAND + TOR = 100ns
             ci         ci+1        TAND + TOR = 40ns

The correct answer will clearly differ depending on the option we select for ci+1 , but imagine we select the
latter: irrespective of whether it is better or worse, it matches the lecture slide(s). As such, we can deduce the
following:

• For the 0-th full-adder instance, it takes 100ns to generate c1 from c0 , x0 , and y0 .
• For the 1-st full-adder instance, it takes 40ns to generate c2 from c1 , x1 , and y1 . The reason it is not 100ns,
as you might expect, is because the gate computing x1 ⊕ y1 can produce an output before c1 is available;
this means it does not contribute to the critical path.
• For the 2-nd full-adder instance, it takes 40ns to generate c3 from c2 , x2 , and y2 . The reason it is not 100ns,
as you might expect, is because the gate computing x2 ⊕ y2 can produce an output before c2 is available;
this means it does not contribute to the critical path.
• For the 3-rd instance, it takes 40ns to generate c4 from c3 , x3 , and y3 . The reason it is not 100ns, as you
might expect, is because the gate computing x3 ⊕ y3 can produce an output before c3 is available; this
means it does not contribute to the critical path. Likewise, it takes 60ns to generate r3 from c3 , x3 , and
y3 ; for the same reason as above, this is not 120ns as you might expect.

So the critical path is 220ns wrt. c4 and 240ns wrt. r3 , and therefore 240ns overall. It turns out applying similar
reasoning to the former option yields a slightly longer critical path of 240ns wrt. c4 because

• For the 0-th full-adder instance, it takes 60ns to generate c1 from c0 , x0 , and y0 .
• For the 1-st full-adder instance, it takes 60ns to generate c2 from c1 , x1 , and y1 .
• For the 2-nd full-adder instance, it takes 60ns to generate c3 from c2 , x2 , and y2 .

• For the 3-rd instance, it takes 60ns to generate c4 from c3 , x3 , and y3 , and 60ns to generate r3 from c3 , x3 ,
and y3 . The reason it is not 120ns, as you might expect, is because the gate computing x3 ⊕ y3 can produce
an output before c3 is available; this means it does not contribute to the critical path.

but this has no impact on the answer, which is still 240ns overall.
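
The per-instance reasoning is mechanical enough to automate; here is a minimal Python sketch (mine, assuming the quoted delays TXOR = 60ns and TAND = TOR = 20ns, and the option 2 form of ci+1) that computes when each output becomes stable:

T_XOR, T_AND, T_OR = 60, 20, 20

c = 0                                # time at which c_0 = ci is stable (an input)
for i in range(4):
    # x_i XOR y_i and x_i AND y_i are ready before the incoming carry, so
    # the carry path adds one AND and one OR per instance
    s = max(c, T_XOR) + T_XOR                        # r_i = (x_i XOR y_i) XOR c_i
    c = max(T_AND, max(c, T_XOR) + T_AND) + T_OR     # c_{i+1}
    print(f"instance {i}: r_{i} stable at {s}ns, c_{i+1} stable at {c}ns")

The final line of output reports r3 stable at 240ns and c4 stable at 220ns, matching the reasoning above.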

S39. The optimisation they suggest is captured by the following:

[diagram: a chain of four half-adders; ci drives the y input of the 0-th instance, each carry-out co drives the y input of the next instance, xi drives the x input of the i-th instance, and the i-th sum output s forms ri]

Put another way, their idea implies the half-adders used no longer have a carry-in. This is not a problem in
terms of computing an addition: we can still use the two available inputs (vs. three for a full-adder) to provide
one operand (i.e., xi , the i-th bit of x) plus a carry-in from the previous half-adder instance. However, clearly
this is not the same addition as previously, since the input we would use to provide the other operand is no
longer available. That is, we can select x as normal, but can no longer provide a y input (the half-adders use
that input to propagate the carry).
Other than x, the only other input we can control is ci, which acts as the overall carry-in to the addition.
Since ci ∈ {0, 1}, this means y = 0 and y = 1 are the correct answers.
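
A minimal Python sketch (mine, not from the original material) makes the resulting behaviour concrete: the chain can only compute x + ci, i.e., an addition whose second operand is 0 or 1:

def half_adder(x, y):
    return x ^ y, x & y                  # sum, carry-out

def chain(x_bits, ci):
    carry, r = ci, []
    for x in x_bits:                     # least-significant bit first
        s, carry = half_adder(x, carry)
        r.append(s)
    return r, carry

for x in range(16):
    x_bits = [(x >> i) & 1 for i in range(4)]
    for ci in (0, 1):
        r, co = chain(x_bits, ci)
        value = sum(b << i for i, b in enumerate(r)) + (co << 4)
        assert value == x + ci           # i.e., an addition with y = 0 or y = 1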

S40. Recall that a 2-input XOR operator can be described via the following truth table:

XOR
x y r
0 0 0
0 1 1
1 0 1
1 1 0

Crucially, this demonstrates that

    x ⊕ x ≡ 0

and

    x ⊕ 0 ≡ x,

and hence

    x ⊕ x ⊕ y ≡ y

for any x and y.


Using these axioms, we can write expressions to describe the output of each XOR gate, and hence each
overall output ri . Inspecting the circuit from left-to-right and top-to-bottom, we can show

t0 = x0 ⊕ x2
t1 = x1 ⊕ x3
r0 = t2 = x0 ⊕ t0 = x0 ⊕ x0 ⊕ x2 = x2
r1 = t3 = x1 ⊕ t1 = x1 ⊕ x1 ⊕ x3 = x3
r2 = t4 = t0 ⊕ t2 = x0 ⊕ x2 ⊕ x2 = x0
r3 = t5 = t1 ⊕ t3 = x1 ⊕ x3 ⊕ x3 = x1

st. it becomes clear r0 = x2 , r1 = x3 , r2 = x0 , and r3 = x1 , i.e., the most- and least-significant 2-bit halves of x are
swapped over to produce r.
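
This is easily confirmed by exhaustive evaluation; a minimal Python sketch (mine) follows:

for x in range(16):
    x0, x1, x2, x3 = [(x >> i) & 1 for i in range(4)]
    t0, t1 = x0 ^ x2, x1 ^ x3
    r0, r1 = x0 ^ t0, x1 ^ t1            # = x2 and x3 respectively
    r2, r3 = t0 ^ r0, t1 ^ r1            # = x0 and x1 respectively
    assert (r0, r1, r2, r3) == (x2, x3, x0, x1)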

S41. a Although alternative organisations of the variables may alter the form, a representative Karnaugh map
would be:

r = f(w, x, y, z), with columns indexed by w x and rows by y z (both in Gray-code order):

 yz \ wx   00  01  11  10
    00      0   1   1   1
    01      1   1   1   1
    11      1   0   0   1
    10      ?   1   1   0

The most efficient approach would attempt to form the fewest groups possible, and the largest groups
possible: these will combine to minimise the number and complexity of each term in the resulting
expression. As such, the 1 entries can be covered by 4 groups per

[the same Karnaugh map, with the 4 chosen groups of 4 entries marked]

noting that a) each group covers 4 entries (with certain entries covered more than once), b) the don’t care
is assumed to be 0 and therefore remains uncovered, and c) 2 of the groups, in center rows and columns,
wrap around the left and right, and top and bottom edges respectively.
b Performing the translation as is, we find that

r = f (w, x, y, z) = ( ¬x ∧ z ) ∨
( x ∧ ¬z ) ∨
( x ∧ ¬y ) ∨
( w ∧ ¬y )

This expression requires 4 NOT, 4 AND, and 3 OR operators and so 4 + 4 + 3 = 11 in total. To further
reduce the number of operators, we can apply various optimisation steps. First, it is possible to show
that (¬x ∧ z) ∨ (x ∧ ¬z) ≡ x ⊕ z, meaning we collapse two terms into one term (involving one XOR).
Second, the term ¬y is used in two terms; we can compute the result once, using one NOT, and share it
between the terms. In combination, we produce the alternative

r = f(w, x, y, z) = ( x ⊕ z ) ∨
                    ( x ∧ t ) ∨
                    ( w ∧ t )

where t = ¬y. However, finally, it is possible to apply the distribution axiom to the latter two terms: by
rewriting it as

r = f(w, x, y, z) = ( x ⊕ z ) ∨
                    ( t ∧ ( x ∨ w ) )

we produce an expression that requires 1 NOT, 1 AND, 2 OR and 1 XOR operators and so 1 + 1 + 2 + 1 = 5
in total.
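
Since the optimisation steps are error-prone by hand, a minimal Python sketch (mine) can be used to check that the raw SoP translation and the 5-operator rewrite agree on all 16 inputs:

from itertools import product

def sop(w, x, y, z):                     # the 11-operator SoP translation
    return ((not x) and z) or (x and (not z)) or (x and (not y)) or (w and (not y))

def opt(w, x, y, z):                     # the 5-operator rewrite
    t = not y                            # the shared NOT
    return (x != z) or (t and (x or w))  # x != z plays the role of x XOR z

for w, x, y, z in product((0, 1), repeat=4):
    assert bool(sop(w, x, y, z)) == bool(opt(w, x, y, z))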

S42. For SR-type latch and flip-flop components, we expect at least S, R and en inputs and a Q output; in contrast, for
the D-type (resp. T-type) latch and flip-flop components we expect D (resp. T) and en inputs and a Q output,
which matches the three signals given. Even based on the argument that we might have Q and ¬Q from one or
other, this at least makes the former less likely. Additionally, we might expect S and R to be used in a controlled
way so as to avoid a
problematic meta-stable state; there are instances where x = y = z = 1 and x = y = z = 0, so this control is not
clearly being applied.
Narrowing down the choices further requires interpretation of the signal labelling and behaviour. While
somewhat tricky, it should be clear that the value of z changes to match y while x = 1 and is unchanged when
x = 0. Crucially, it is not the case that z changes to match y only at the point where x transitions from 0 to 1 (or
1 to 0): as shown in the right-hand portion of the waveform, z changes at any point that x = 1. As such, we
infer the component is likely to be a level-triggered latch not an edge-triggered flip-flop. Additionally, z does
not toggle between 0 and 1 while x = 1; it matches whatever y is.
In summary therefore, i.e., having ruled out a) SR-type components, b) flip-flop components, and c) a T-type
flip-flop, we conclude that the correct answer is a D-type latch: x represents the enable signal en, y represents
the input D and z represents the output Q.

S43. First, recall that the truth table for NAND is


NAND
x y r
0 0 1
0 1 1
1 0 1
1 1 0
If one or other (or both) of x and y is equal to 0 then the output r must be 1: only when x and y equal 1 do we
get r equal to 0. This gives a useful way to ascertain the behaviour of the circuit, in the sense we can enumerate
the set of consistent states it can be in:
a If S = 0 then the top NAND gate must output 1, and if R = 0 then the bottom NAND gate must output 1.
That is, S = 0 and R = 0 means we have Q = 1 and ¬Q = 1.
b If R = 0 then the bottom NAND gate must output 1, so if S = 1 then the top NAND gate outputs 1 ∧̄ 1 = 0
(writing ∧̄ for NAND). That is, S = 1 and R = 0 means we have Q = 0 and ¬Q = 1.
c If S = 0 then the top NAND gate must output 1, so if R = 1 then the bottom NAND gate outputs 1 ∧̄ 1 = 0.
That is, S = 0 and R = 1 means we have Q = 1 and ¬Q = 0.
d If S = 1 and R = 1 then one of two possibilities can apply: either

i the top NAND gate outputs 0, meaning the bottom NAND gate outputs 0 ∧̄ 1 = 1 (which is
consistent with the top gate computing 1 ∧̄ 1 = 0), or
ii the bottom NAND gate outputs 0, meaning the top NAND gate outputs 0 ∧̄ 1 = 1 (which is
consistent with the bottom gate computing 1 ∧̄ 1 = 0).

Put another way, S = R = 0 is the meta-stable state (which is inconsistent in the sense that Q = ¬Q: they
should differ) and S = R = 1 is the storage state (which retains whatever values Q and ¬Q already have); S = 0
and R = 1 sets Q = 1, while S = 1 and R = 0 resets Q = 0, whatever the current value of Q is. As such, the
second excitation table is correct (the first one is for a NOR-based SR-latch), noting this is sort of the inverse of
a NOR-based SR-latch wrt. the meaning of S and R.
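
The case analysis can be reproduced by iterating the two cross-coupled NAND gates to a fixed point; a minimal Python sketch (mine) follows:

def nand(x, y): return 1 - (x & y)

def settle(S, R, Q, nQ):
    for _ in range(4):                   # enough iterations to stabilise
        Q, nQ = nand(S, nQ), nand(R, Q)
    return Q, nQ

print(settle(1, 0, 0, 1))   # S = 1, R = 0: reset,   yields (Q, nQ) = (0, 1)
print(settle(0, 1, 1, 0))   # S = 0, R = 1: set,     yields (Q, nQ) = (1, 0)
print(settle(1, 1, 1, 0))   # S = 1, R = 1: storage, (Q, nQ) is unchanged
print(settle(0, 0, 0, 1))   # S = 0, R = 0: forces Q = nQ = 1, the meta-stable case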

S44. This is quite a tricky question, but the central feature to note, for both multiplexers in the circuit, is the loop
between output and (an) input:
• in the left-hand case the loop connects the multiplexer output, say t0 , to the y input, whereas
• in the right-hand case the loop connects the multiplexer output, say t1 , to the x input.
In both cases, this allows a 1-bit value to be stored by “holding” it in the loop. Put another way, when b = 0 then
t0 = a since the x input is selected; when b = 1, however, whatever value t0 has is fed back into the multiplexer.
This is conceptually similar to how SRAM cells work, for example. In that case we had a loop through two
NOT gates that acted to refresh (or reinforce) the stored value, plus extra access transistors. More formally,
in each case the loop results in bistability. Focusing on the left-hand multiplexer as an example, if b = 1 then
either of the states t0 = 0 or t0 = 1 (meaning y = 0 and y = 1) is stable (meaning it will not transition into the
other state without a stimulus), so the value is retained until updated (or lost if the power supply is removed).
In this case, the left-hand and right-hand multiplexers are organised in a primary-secondary form. The idea
is basically that b acts as an enable signal (typically it will be a clock), operating both primary and secondary
multiplexer in one of two modes. Per the above,

Figure B.1: A time line illustrating behaviour of a multiplexer-based, primary-secondary flip-flop:

    primary multiplexer in pass-through mode, secondary multiplexer in storage mode
      | positive edge on b: α latched by primary multiplexer
      v
    primary multiplexer in storage mode, secondary multiplexer in pass-through mode
      | negative edge on b: α latched by secondary multiplexer
      v
    primary multiplexer in pass-through mode, secondary multiplexer in storage mode

• when b = 0 the primary multiplexer passes a through to t0 , whereas the secondary multiplexer is in
storage mode, and
• when b = 1 the primary multiplexer is in storage mode, whereas the secondary multiplexer passes t0
through to t1
To understand their combined behaviour, focus on the instant in time when a positive edge occurs on b and
imagine that a = α for a value α ∈ {0, 1}. Before the edge, because b = 0, the primary and secondary multiplexers
will be in pass-through and storage mode respectively. This means t0 = a = α and c = t1 . At the instant the
edge occurs on b, t0 = α. Since the primary multiplexer flips into storage mode, this value is retained (i.e., fed
back around the loop into the y input) because b = 1: changes to a are irrelevant. Simultaneously, the secondary
multiplexer flips into pass-through mode st. c = α. Then, at some point, there is a negative edge on b meaning
both primary and secondary multiplexers flip back to the opposite mode. Given c = t1 = α at this instant, the
fact the secondary multiplexer is now in storage mode (again) means it retains the value of α. Likewise, since
the primary multiplexer is in pass-through mode, any change to a is reflected in t0 (but since the secondary
multiplexer is in storage mode, this is irrelevant to the value, namely α, it retains). Diagrammatically, this can
be viewed as in Figure B.1.
A somewhat reasonable analogy is that the primary and secondary multiplexers act as latches (each being
level triggered) in isolation, but as a flip-flop once combined. α can be viewed as being “passed along” a 2-step
“conveyor belt”. First, at a positive edge on b, the primary multiplexer stores whatever α is passed as
input by the user. Then, at the subsequent negative edge on b, that α is passed on to the secondary multiplexer
which stores it; in a sense, the primary multiplexer “protects” the stored α from subsequent changes to a until
another positive edge on b occurs.
So the correct answer is that the circuit represents a flip-flop, i.e., an edge triggered storage cell: a is the
flip-flop input, c is the flip-flop output (i.e., the stored value), and b is the flip-flop enable signal.
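
A minimal Python sketch (mine, not from the original material) of the two multiplexers and their feedback loops reproduces this edge-triggered behaviour:

def mux2(x, y, c): return x if c == 0 else y

t0 = t1 = 0                              # the values "held" in the two loops

def step(a, b):
    global t0, t1
    t0 = mux2(a, t0, b)                  # primary:   passes a through when b = 0
    t1 = mux2(t1, t0, b)                 # secondary: passes t0 through when b = 1
    return t1                            # = c, the stored output

# drive b like a clock: a is only captured into c at a positive edge on b
for a, b in [(1, 0), (1, 1), (0, 1), (0, 0), (0, 1)]:
    print(f"a={a} b={b} -> c={step(a, b)}")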

S45. Given an l-bit control signal c, the demultiplexer can select between at most 2^l outputs: we treat c as an
unsigned, l-bit integer which will clearly range in value between 0 and 2^l − 1. In general, we want an l st.
2^l ≥ m so each output can be specified; typically m is a power-of-two, since this matches the maximum number
of outputs that can be specified. However, in this case we have m = 5.
Since 2^2 = 4 < 5 and 2^3 = 8 > 5 we know a 2-bit control signal is not enough (it cannot select r4 since
0 ≤ c < 4), but a 3-bit control signal is (although it could cope with up to m = 8, and since 0 ≤ c < 8 select r5 , r6
and r7 if they existed). In summary then, l = 3 is the correct answer.

S46. a Note that if a given ri is not connected to either Vdd or Vss , it is deemed to have the high impedance value
Z. This suggests the correct truth table is
x y r0 r1
0 0 1 Z
0 1 Z Z
1 0 Z Z
1 1 Z 0
The reason is because C0 is st. r0 connects to Vdd via two (pull-up) P-type MOSFETs; since these MOSFETs
only connect source to drain if the gate is Vss , we can say that r0 = 1 if x = y = 0 and r0 = Z (i.e.,
disconnected) otherwise. Conversely, C1 is st. r1 connects to Vss via two (pull-down) N-type MOSFETs;
since these MOSFETs only connect source to drain if the gate is Vdd , we can say that r1 = 0 if x = y = 1
and r1 = Z (i.e., disconnected) otherwise.
b Note that the option using 1 instance of C0 and 1 instance of C1 sort of makes sense: one can implement a
NAND gate using 2 P-type and 2 N-type MOSFETS, matching those that exist within instances of C0 and
C1 . However, the question explicitly says we need to use instances of C0 and C1 : we cannot, for example,
“merge” their internal implementation to make this option viable. So, as a first step, we implement a
NOT gate as follows:

[diagram: a NOT gate formed from one instance of C0 and one instance of C1, both with x driving their inputs; the outputs t0 and t1 are wired together to form r]

This is useful because we can reuse it when implementing a NAND gate, but also because it explains the
design approach involved: the idea is basically that the output is driven by one instance of C0 or C1 at
a time, with all the others producing the high impedance value (which is “overridden” by the driving
value). The behaviour can be described as follows:
x t0 t1 r
0 1 Z 1
1 Z 0 0
Using the same design approach, we can now implement a NAND gate as follows:

[diagram: the inputs x and y are each inverted using a NOT gate as above; three instances of C0 are driven by the input pairs (x, y), (x, ¬y), and (¬x, y), and one instance of C1 is driven by (x, y), with the four outputs t0, t1, t2, and t3 wired together to form r]
Applying the same argument wrt. behaviour, we find that


x y t0 t1 t2 t3 r
0 0 1 Z Z Z 1
0 1 Z 1 Z Z 1
1 0 Z Z 1 Z 1
1 1 Z Z Z 0 0
matches the truth table for NAND: remembering to count the components within each NOT gate, we
therefore use 5 instances of C0 and 3 instances of C1 .
As an aside, note that one can implement a NOR gate by swapping the component types in the NAND
implementation: we therefore implement the required behaviour using 3 instances of C0 and 5 instances
of C1 . Also note that it is tempting to consider something like

[diagram: one instance of C1 driven by (x, y), plus two instances of C0 driven by (x, x) and (y, y) respectively, with the outputs t0, t1, and t2 wired together to form r]

as a solution. There is no corresponding option in the table, however: the reason for this is that it violates
the stated design strategy. This can be seen by considering the (hypothetical) truth table

x y t0 t1 t2 r
0 0 Z 1 1 1
0 1 Z Z 1 1
1 0 Z 1 Z 1
1 1 0 Z Z 0

which shows the case where x = 0 and y = 0 means both t1 = 1 and t2 = 1: in such a case r is driven by
two non-Z values.
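
The design approach is captured by a minimal Python sketch (mine), in which each output is either driven (0 or 1) or high impedance ('Z'), and a joined wire must have exactly one driver:

def c0(x, y): return 1 if (x, y) == (0, 0) else 'Z'      # the pull-up pair
def c1(x, y): return 0 if (x, y) == (1, 1) else 'Z'      # the pull-down pair

def join(*ts):
    driven = [t for t in ts if t != 'Z']
    assert len(driven) == 1, "exactly one instance may drive the wire"
    return driven[0]

def not_(x): return join(c0(x, x), c1(x, x))             # 1 C0 plus 1 C1

def nand(x, y):                                          # 5 C0 plus 3 C1 in total
    nx, ny = not_(x), not_(y)
    return join(c0(x, y), c0(x, ny), c0(nx, y), c1(x, y))

for x in (0, 1):
    for y in (0, 1):
        assert nand(x, y) == 1 - (x & y)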

S47. The first three options are all related to the fact that the number of transistors is increasing; the rate of increase
is important in that the associated limits are reached quicker, but is not so relevant beyond that.

• The fact there are more transistors in a fixed unit of area implies the transistors are smaller, and so, as
a result, that their feature size (i.e., the size of components in their design, such as channel length or
layer thicknesses) is also smaller. Limits clearly exist wrt. how small feature sizes can shrink. Even
if manufacturing processes keep pace, at some point the feature size would be measured in some small
number of atoms: at this scale it is plausible the transistor cannot operate correctly, and beyond it one
would need to consider a (radically) different approach.

• Dennard scaling states, roughly, that as transistors become smaller, their power density remains constant.
At face value then, power consumption should not represent a limit. However, Dennard scaling has
now started to break down: with such small feature sizes, otherwise insignificant factors (e.g., static
vs. dynamic power consumption) become significant, and thus lead to increased power consumption
if the number of transistors is increased. An indefinite increase in the amount of power supplied is not
plausible, meaning it acts to limit the number of transistors one can house per unit of area.

• With a small enough feature size, the channel allowing electrical current to flow through the transistor
will not always be able to “contain” it, i.e., there is leakage current. This manifests itself as heat, which
must be dissipated away from the transistors to ensure their correct operation. So, with a fixed capacity
to dissipate heat, this will act to limit the number of transistors one can house per unit of area.

In summary then, all the first three options plausibly constrain or limit Moore’s Law.

S48. A logical approach to this question would likely use two steps: 1) we need to assess which option(s) will toggle
s, then 2) find the shortest path from the inputs to s, which is sort of the opposite of the critical path, and use this
to decide the correct option.
Note that the truth table for this component (including the various annotated intermediate variables) is as
follows:
ci x y t0 t1 t2 co s
0 0 0 0 0 0 0 0
0 0 1 0 1 1 0 1
0 1 0 0 1 1 0 1
0 1 1 0 1 0 1 0
1 0 0 0 1 1 0 1
1 0 1 0 1 0 1 0
1 1 0 0 1 0 1 0
1 1 1 1 1 0 1 1
So, as a first step, we can see that

    (0, 0, 0) ⇒ t0 = 0, t2 = 0, s = 0  →  (0, 0, 1) ⇒ t0′ = 0, t2′ = 1, s′ = 1
    (0, 0, 1) ⇒ t0 = 0, t2 = 1, s = 1  →  (0, 1, 1) ⇒ t0′ = 0, t2′ = 0, s′ = 0
    (0, 1, 1) ⇒ t0 = 0, t2 = 0, s = 0  →  (0, 0, 1) ⇒ t0′ = 0, t2′ = 1, s′ = 1
    (1, 1, 1) ⇒ t0 = 1, t2 = 0, s = 1  →  (1, 1, 0) ⇒ t0′ = 0, t2′ = 0, s′ = 0
    (1, 0, 1) ⇒ t0 = 0, t2 = 0, s = 0  →  (0, 1, 1) ⇒ t0′ = 0, t2′ = 0, s′ = 0

i.e., all options bar the last one will toggle s (either from 0 to 1, or from 1 to 0). Next, imagine TNOT , TAND , and
TOR denote the gate delay of a NOT gate, and 2-input AND and OR gate respectively. We can consider five
paths from the inputs to s, which each pass through one of the gates organised in a column on the left-hand
side of the diagram.
  top      3-input AND ⇝ 2 · TAND + 1 · TOR
           3-input OR  ⇝ 1 · TAND + 3 · TOR
  ↕        2-input AND ⇝ 2 · TAND + 3 · TOR + 1 · TNOT
           2-input AND ⇝ 2 · TAND + 3 · TOR + 1 · TNOT
  bottom   2-input AND ⇝ 2 · TAND + 3 · TOR + 1 · TNOT
Given it is likely TAND ≃ TOR , it seems clear the top path will be the shortest. Put another way, given s = t0 ∨ t2 ,
we can toggle s by controlling t0 or t2 ; having identified the top path as the shortest, controlling t0 will allow
control over s within the shortest period of time. Of the options, the second to last one, i.e., (1, 1, 1) → (1, 1, 0),
is the only one that toggles t0 , and would therefore be deemed correct.

S49. Since this is a cyclic counter, we know selecting n = 4 means the output r will step through values

    0, 1, ..., 2^4 − 1 = 15, 0, 1, ....

Writing this information in a tabular form as follows

r r3 r2 r1 r0
0 0 0 0 0
1 0 0 0 1
2 0 0 1 0
3 0 0 1 1
4 0 1 0 0
5 0 1 0 1
6 0 1 1 0
7 0 1 1 1
8 1 0 0 0
9 1 0 0 1
10 1 0 1 0
11 1 0 1 1
12 1 1 0 0
13 1 1 0 1
14 1 1 1 0
15 1 1 1 1
0 0 0 0 0
1 0 0 0 1
.. .. .. .. ..
. . . . .

highlights the behaviour of each i-th bit ri as r itself changes.
The question asks which ri transitions between 0 and 1 at the lowest frequency, i.e., slowest. Obviously it
cannot be r4 : for n = 4 there is no r4 ! By looking at the table above, the solution is also obvious: the MSB r3
transitions between 0 and 1 slowest, doing so every 8 updates to r (vs. r0 , for example, which does so every
update).
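
A minimal Python sketch (mine) counts, for each bit, how often it toggles over one full cycle of the counter:

n = 4
toggles = [0] * n
for r in range(2 ** n):
    for i in range(n):
        if ((r >> i) & 1) != (((r + 1) >> i) & 1):
            toggles[i] += 1
for i in range(n):
    print(f"r{i} toggles {toggles[i]} times per cycle")
# r0 toggles 16 times per cycle, whereas r3 toggles only twice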

S50. CMOS combines complementary transistor types: one N-MOSFET, and one P-MOSFET. Since these transistors
behave in a complementary manner, it is always the case that one will be active and one inactive. This produces
an attractive feature wrt. power, in that static consumption (i.e., when there is no change in state) is very low.
The dynamic power consumption (i.e., when there is a change of state) is of course higher, but occurs only
when the inputs cause switching activity.
So, at a high-level at least, we could argue the highest consumption is likely when the initial value differs the
most from the stored value. That is, we argue that the highest switching activity occurs when the largest number
of bits stored by the register change. We can measure this using the Hamming distance between the initial and
stored values: given
t0 = DEAD(16) = 1101111010101101(2)
t1 = BEEF(16) = 1011111011101111(2)
t2 = F00D(16) = 1111000000001101(2)
t3 = 1234(16) = 0001001000110100(2)
t4 = FFFF(16) = 1111111111111111(2)
t5 = 0000(16) = 0000000000000000(2)

and recalling that

    HD(x, y) = ∑_{i=0}^{15} xi ⊕ yi,

we can compute
HD(t0 , t1 ) = 4
HD(t0 , t2 ) = 6
HD(t0 , t3 ) = 8
HD(t0 , t4 ) = 5
HD(t0 , t5 ) = 11
This means that based on our argument above, the correct answer would be 0000(16) . A more precise answer
requires a much more detailed analysis of the component: we would need to assess the implementation in
terms of individual transistors, and compute the so-called toggle count, i.e., switching activity, at that level
(rather than via the assumption about toggling the bits stored).
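
A minimal Python sketch (mine) reproduces these distances via the population count of an XOR:

t = [0xDEAD, 0xBEEF, 0xF00D, 0x1234, 0xFFFF, 0x0000]
for i in range(1, 6):
    print(f"HD(t0, t{i}) = {bin(t[0] ^ t[i]).count('1')}")
# prints 4, 6, 8, 5, and 11 respectively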

S51. Using a component set with some number of AND, OR, and NOT gates is clearly a more familiar approach.
However, all the component sets can be used to implement f . Writing out the truth table

x y z r
0 0 0 1
0 0 1 0
0 1 0 1
0 1 1 1
1 0 0 0
1 0 1 0
1 1 0 1
1 1 1 1

and a Karnaugh map

         y z
       00  01  11  10
 x = 0  1   0   1   1
 x = 1  0   0   1   1

for f helps explain how and why:

a Using 2-input, 1-bit multiplexers for clarity, component set 1 can be used to implement f as follows:

[diagram: a depth-3 tree of seven 2-input, 1-bit multiplexers: four leaf multiplexers select between the constant pairs (1, 0), (1, 1), (0, 0), and (1, 1) using c = z, two middle multiplexers select between the leaf outputs using c = y, and a root multiplexer selects between those using c = x to produce r]

b Using 2-input, 1-bit multiplexers for clarity, component set 2 can be used to implement f as follows:

[diagram: an implementation of f using three 2-input, 1-bit multiplexers, whose data inputs are drawn from the constant 1 and the variable y, and whose control inputs are drawn from x, y, and z]

c Component set 3 can be used to implement f as follows:

[diagram: an implementation of f using a NOT gate (forming ¬z), the constant 0, and 2-input, 1-bit multiplexers controlled by x and y]

S52. First, recall that the following truth table


c x y r
0 0 0 0
0 0 1 0
0 1 0 1
0 1 1 1
1 0 0 0
1 0 1 1
1 1 0 0
1 1 1 1
specifies the behaviour of a 2-input, 1-bit multiplexer: in short, we find that
    r = x if c = 0, or
    r = y if c = 1.
As such, we can implement an AND gate as follows:

[diagram: a 2-input multiplexer with x wired to the top data input, y wired to the bottom data input, and x also wired to the control signal c]

We can show why the implementation is valid (i.e., produces a result matching AND) by inspection:
x y r
0 0 x=0
0 1 x=0
1 0 y=0
1 1 y=1
Notice that x = 0 implies the multiplexer selects the top input and hence r = x, whereas x = 1 implies the
multiplexer selects the bottom input and hence r = y; overall, r clearly matches AND in the sense r = 1 if x = 1 and
y = 1. Using the same approach, we can implement OR as follows

[diagram: a 2-input multiplexer with y wired to the top data input, x wired to the bottom data input, and x wired to the control signal c]

and justify validity again by inspection:


x y r
0 0 y=0
0 1 y=1
1 0 x=1
1 1 x=1
Overall then, the expression
(x ∧ y) ∨ z
can be implemented using just two multiplexers: one to implement the AND operator, and one to implement
the OR operator.
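
A minimal Python sketch (mine) of the two constructions confirms the composite behaviour:

from itertools import product

def mux2(x, y, c): return x if c == 0 else y

def and_(x, y): return mux2(x, y, x)     # control c = x: selects x (= 0) or y
def or_(x, y):  return mux2(y, x, x)     # control c = x: selects y or x (= 1)

for x, y, z in product((0, 1), repeat=3):
    assert or_(and_(x, y), z) == ((x & y) | z)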

S53. This question can be approached in several ways. First, one could employ basic pattern matching: read
from left-to-right, the three dominant structures can be matched against known NAND, NOR, and NOT gate
implementations. As such, the design implements the expression

    r = ¬((x ∧̄ y) ∨̄ z)

(writing ∧̄ for NAND and ∨̄ for NOR), which we manipulate as follows

    ¬((x ∧̄ y) ∨̄ z)
    = ¬((¬(x ∧ y)) ∨̄ z)       (NAND)
    = ¬(¬((¬(x ∧ y)) ∨ z))    (NOR)
    = ¬(x ∧ y ∧ ¬z)           (involution, de Morgan)

such that

    r = ¬(x ∧ y ∧ ¬z).
Second, although it involves more work, one can enumerate the transistor and signal states for each input
combination. For example, using + (resp. −) to denote where a given transistor is connected or activated (resp.
disconnected or deactivated), we can write

x y z m0 m1 m2 m3 t0 m4 m5 m6 m7 t1 m8 m9 r
0 0 0 + + − − 1 − + + − 0 + − 1
0 0 1 + + − − 1 − − + + 0 + − 1
0 1 0 − + + − 1 − + + − 0 + − 1
0 1 1 − + + − 1 − − + + 0 + − 1
1 0 0 + − − + 1 − + + − 0 + − 1
1 0 1 + − − + 1 − − + + 0 + − 1
1 1 0 − − + + 0 + + − − 1 − + 0
1 1 1 − − + + 0 + − − + 0 + − 1

and then derive the expression


r = ¬(x ∧ y ∧ ¬z)
directly.

S54. To start with, keep in mind that this design uses flip-flops: these are edge-triggered (versus latches, which are
level-triggered). By focusing on and inspecting the left-hand flip-flop, we infer that the state will be updated
to reflect D = 1 ⊕ Q on each positive edge of clk. Given the truth table

x y r
0 0 0
0 1 1
1 0 1
1 1 0

we find that D = 1 ⊕ Q ≡ ¬Q, suggesting, therefore, that this is a toggle flip-flop constructed by using a
D-type flip-flop: on each positive edge of clk, the state will toggle either from 0 to 1 or from 1 to 0. Note that the
right-hand flip-flop has a similar construction, but that the lower input of the XOR comes from the left-hand
flip-flop.
Imagine that both flip-flops are reset, so their initial state is 0. We can draw a waveform which describes
each signal:

[waveform for clk, t0, t1, t2, t3, and r: t1 toggles on each positive edge of clk, i.e., at half its frequency, and t3 toggles at half the frequency of t1 again]

Put simply, this suggests that each toggle flip-flop acts to halve the frequency: t1 toggles at half the frequency
of clk and t3 toggles at a quarter of the frequency of clk. Given that clk has a frequency of 400MHz, we therefore
expect r = t3 to toggle with a frequency of 400/4 = 100MHz.
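
A minimal Python sketch (mine) of the two flip-flops, both clocked by clk and sampling the old value of t1, shows the frequency being halved twice:

t1 = t3 = 0                              # flip-flop states, initially reset
trace = []
for _ in range(16):                      # 16 positive edges of clk
    t1, t3 = t1 ^ 1, t3 ^ t1             # D = 1 XOR Q, and D = t1 XOR Q
    trace.append((t1, t3))
print("t1:", [a for a, _ in trace])      # period 2 edges: clk / 2 = 200MHz
print("t3:", [b for _, b in trace])      # period 4 edges: clk / 4 = 100MHz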

S55. We know that the multiplexer will select the top input if c = 0, or the bottom input if c = 1. This means the
design can be expressed as
r = (¬q ∧ ¬p) ∨ (q ∧ p),
i.e., if q = 0 then r = ¬p, whereas if q = 1 then r = p. As such, we can produce a truth table

p q r
0 0 1
0 1 0
1 0 0
1 1 1

from which it is clear(er) that r = ¬(p ⊕ q).

S56. If the current input is S = R = 0, then the latch must be in the invalid state Q = 1, ¬Q = 1. The question is
focused on instantaneously setting S = R = 1, which means the latch is in storage mode: the question is, what
state does it end up in? The answer is that it does not remain in the invalid state, due to the imbalanced gate
delays. Let Tt and Tb denote the top and bottom gate delay respectively:

a If Tt = x > x − δ = Tb , the output of the bottom gate, i.e., ¬Q, will change state first: it changes from 1 to
0, at which point the only valid (eventual) output from the top gate is 1 (i.e., it will stay the same). So,
the bottom gate “winning” by changing first is like we reset the latch via the bottom, R input.

b If Tt = x < x + δ = Tb , the output of the top gate, i.e., Q, will change state first: it changes from 1 to 0, at
which point the only valid (eventual) output from the bottom gate is 1 (i.e., it will stay the same). So, the
top gate “winning” by changing first is like we set the latch via the top, S input.

This means the latch outputs will be either Q = 0, ¬Q = 1 or Q = 1, ¬Q = 0.

S57. Since the question is ultimately about Boolean expressions, we use 0 and 1 in place of Vss and Vdd for convenience.
Recall that an N-type MOSFET is connected or activated (resp. disconnected or deactivated) if the gate
terminal is 1 (resp. 0), whereas a P-type MOSFET is connected or activated (resp. disconnected or
deactivated) if the gate terminal is 0 (resp. 1).
By inspection, t0 is clearly connected to 1 when either 1) x = 0 and y = 0 (through m0 and m1 ), or 2) z = 0
(through m2 ). Note that the connectives “and” and “or” used here reflect the sequential and parallel way
the P-type MOSFETs are organised. But, either way, and assuming a matching pull-down network, we can
write
t0 = (¬x ∧ ¬y) ∨ ¬z.
The output r is then produced via m3 and m4 , which form a NOT gate: this means

r = ¬((¬x ∧ ¬y) ∨ ¬z)

which, using the de Morgan axiom, we can rewrite as

r = (x ∨ y) ∧ z.

S58. This is an N-type MOSFET, used here as an enable gate: if en = 0 there is no connection between x and r, but
if en = 1 there is a connection between x and r. The former case is the more interesting, in the sense that r is
disconnected from any driving signal. This situation is modelled using 3-state logic, wherein an additional
high impedance value Z is considered. Using Z, we can therefore model the transistor using this truth-table:

x en r
0 0 Z
1 0 Z
0 1 0
1 1 1

Put simply, if en = 0 then r = Z because there is no driving signal, but if en = 1 then r = x ∈ {0, 1}. Based on the
potential values of x and en we conclude that r ∈ {0, 1, Z}, i.e., it can potentially take 3 different values.

S59. The simplest approach to producing a solution for this question is brute-force enumeration: by just inspecting
the truth-table, i.e.,
x1 x0 y1 y0 r
0 0 0 0 0 ⇒ 0≯0
0 0 0 1 0 ⇒ 0≯1
0 0 1 0 0 ⇒ 0≯2
0 0 1 1 0 ⇒ 0≯3
0 1 0 0 1 ⇒ 1>0
0 1 0 1 0 ⇒ 1≯1
0 1 1 0 0 ⇒ 1≯2
0 1 1 1 0 ⇒ 1≯3
1 0 0 0 1 ⇒ 2>0
1 0 0 1 1 ⇒ 2>1
1 0 1 0 0 ⇒ 2≯2
1 0 1 1 0 ⇒ 2≯3
1 1 0 0 1 ⇒ 3>0
1 1 0 1 1 ⇒ 3>1
1 1 1 0 1 ⇒ 3>2
1 1 1 1 0 ⇒ 3≯3
we can see that r = 1 in 6 cases.
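
The enumeration amounts to a one-line count in Python (a minimal sketch, mine):

print(sum(1 for x in range(4) for y in range(4) if x > y))   # prints 6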

S60. Several facts inform the optimisation one would expect.

• The adder inputs are PC and 4: whereas PC can be any 32-bit value, so the input is general-purpose, 4 is
a fixed, special-purpose input.

• Since PC is word-aligned, we know that

PC ≡ PC + 4 ≡ 0 (mod 4).

As a result, the 2 LSBs of PC + 4 always equal 0: there is no need to compute them.

• For a general-purpose adder, the carry-in input and carry-out output are useful in various situations.
Here, however, neither is useful and so they remain unused; as a result, we can optimise the adder by
eliminating them, e.g., simplifying the associated full- or half-adder cell.

Rather than a general-purpose ripple-carry adder, which uses 32 full-adder cells, we can use the facts above to
compute the same result by using 30 half-adder cells. That is, the adder design is of the form

[diagram: r0 and r1 are hard-wired to 0; a chain of 30 half-adders spans bits 2 to 31, with the constant 1 driving the y input of the bit-2 half-adder, xi driving the x input of the i-th half-adder, and each carry-out co driving the y input of the next]

where the half-adder that generates r31 is optimised even further by considering that the carry-out is unused.
Under the same assumptions, the resulting area is

    29 · (1 · XOR + 1 · AND) + 1 · (1 · XOR)
      = 29 · (1 · 4 + 1 · 2) + 1 · (1 · 4)
      = 29 · 6 + 1 · 4
      = 178

which is a factor of 448/178 = 2.52 improvement.
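
A minimal Python sketch (mine, not from the original material) of the optimised design demonstrates it computes PC + 4 for any word-aligned PC:

def pc_plus_4(pc):
    assert pc % 4 == 0                   # PC is word-aligned
    r, carry = [0, 0], 1                 # r0 = r1 = 0; y input of bit 2 is 1
    for i in range(2, 32):
        x = (pc >> i) & 1
        r.append(x ^ carry)              # half-adder sum
        carry = x & carry                # half-adder carry-out (unused at i = 31)
    return sum(b << i for i, b in enumerate(r))

for pc in (0, 4, 0x1000, 0xFFFFFFF8):
    assert pc_plus_4(pc) == (pc + 4) % 2 ** 32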

S61. Saying that the cell contains a “duplicate” makes no sense, and it is not true that we do not know what the cell
content, or output, is. Rather, we do not care what the cell content is: the input is impossible, so the output is
irrelevant to f.
Of the two remaining options, the correct one mirrors the associated approach. That is, the don’t care cell
can be treated as either 0 or 1; we select the option which will most effectively simplify the resulting term. Since
the input is impossible, that selection has no impact on the functionality of f for the possible inputs.

S62. Just by inspection, there is no way we can use either XOR, AND, or OR gate types to achieve the required
functionality. So if NAND and NOR are the only viable options, the question is asking which of them has been
used. The definition of NAND (written ∧̄) and NOR (written ∨̄) is as follows:

    x  y  x ∧̄ y  x ∨̄ y
    0  0    1      1
    0  1    1      0
    1  0    1      0
    1  1    0      0

This allows us to reason about the relationship between inputs (i.e., S and R) and outputs (i.e., Q and ¬Q).
For example, if the current state is Q = 0, ¬Q = 1 and we have S = 1 and R = 1 then the top component is
computing 0 = S ⊙ ¬Q = 1 ⊙ 1, and the bottom component is computing 1 = R ⊙ Q = 1 ⊙ 0; in this case the
top component is consistent with either NAND or NOR, but the bottom component is consistent with only
NAND. So, although we could test other cases to gain more confidence, this case alone acts as a strong enough
hint that ⊙ ≡ ∧̄, i.e., NAND gates have been used.

S63. Note that the additional inputs are often termed preset and clear, matching the labels used here.

a The output of a synchronous circuit depends on the input in a discrete manner, i.e., it can change only
at a specific point in time; this is achieved using a clock signal, which acts to control when components
in the circuit change, e.g., by gating them. In contrast, the output of an asynchronous circuit depends on the
input in a continuous manner, i.e., it can change at any time; there is no control over when changes in the
output take effect. The same terminology can be applied to the inputs themselves, or even signals more
generally. S and R are most accurately classified as synchronous. The reason is that they can influence
the output only when en = 1: we can see this by noting en ∧̄ x = 1 if en = 0, whereas en ∧̄ x = ¬x if en = 1
(writing ∧̄ for NAND).
As such, en can be said to gate S and R (and hence control their influence). P and C are most accurately
classified as asynchronous. The reason is that they can influence the output at any point in time, whether
en = 1 or en = 0 and so irrespective of en: in a sense this is obvious, because they do not interact with en
(they feed directly into the two NAND gates towards the right-hand side).

b When used to describe a control signal x, the terms active low and active high relate to when that signal
exerts control, i.e., whether that is when x = 0 or x = 1; in the former case, this is often highlighted by
writing ¬x or x rather than x as the input label. S and R are most accurately classified as active high: we
can show that if en = 1 then S = 1 will set Q = 1, whereas if en = 0 then R = 1 will reset Q = 0. P and C are
most accurately classified as active low: we can show that P = 0 will override S and set Q = 1, whereas
C = 0 will override R and reset Q = 0.

S64. Using a Karnaugh map, for example, one can produce the result

r = f (x, y, z) = y ∨ (z ∧ ¬x) ∨ (x ∧ ¬z)

which, by inspection, gives


f
x y z y z ∧ ¬x x ∧ ¬z r
0 0 0 0 0 0 0
0 0 1 0 1 0 1
0 1 0 1 0 0 1
0 1 1 1 1 0 1
1 0 0 0 0 1 1
1 0 1 0 0 0 0
1 1 0 1 0 1 1
1 1 1 1 0 0 1
However, notice that the sub-expression
(z ∧ ¬x) ∨ (x ∧ ¬z)
can be simplified to
z⊕x
so per the question, the simplest implementation of f is
r = f (x, y, z) = y ∨ (z ⊕ x).

S65. First, note that the following identities can be applied (writing ∧̄ for NAND):

    ¬x    ≡ x ∧̄ x
    x ∨ y ≡ (x ∧̄ x) ∧̄ (y ∧̄ y)
    x ∧ y ≡ (x ∧̄ y) ∧̄ (x ∧̄ y)

As such, we can write

    ¬(x ∨ y) ≡ ((x ∧̄ x) ∧̄ (y ∧̄ y)) ∧̄ ((x ∧̄ x) ∧̄ (y ∧̄ y)).

S66. The excitation table of a standard SR latch is


Current Next
S R Q ¬Q Q′ ¬Q′
0 0 0 1 0 1
0 0 1 0 1 0
0 1 ? ? 0 1
1 0 ? ? 1 0
1 1 ? ? ? ?
meaning the reset-dominate alteration gives
Current Next
S R Q ¬Q Q′ ¬Q′
0 0 0 1 0 1
0 0 1 0 1 0
0 1 ? ? 0 1
1 0 ? ? 1 0
1 1 ? ? 0 1
Given the inputs S and R, the circuit can be constructed by using a standard SR latch with two cross-coupled
NOR gates whose inputs are S′ and R′ . We simply set R′ = R and S′ = ¬R ∧ S so S′ is only 1 when R = 0 and
S = 1: if R = 1 (including the case when both R = 1 and S = 1), the latch is reset since S′ = 0. The circuit is as
follows:

-------+
| +---+
+---+ +->| | +---+
| | |AND|-- S' -->| |
|NOR|---->| | |NOR|-- ~Q --+
| | +---+ +->| | |
+---+ | +---+ |
+--|---------------+
| |
| +---------------+
| +---+ |
+---->| | |
|NOR|-- Q --+
----------------- R' -->| |
+---+


S67. A wide range of answers are clearly possible. Obvious examples include physical size, and power consumption
or heat dissipation. Other variants include worst-case versus average-case versions of each metric, for example
in the case of efficiency.

S68. a MOSFET transistors work by sandwiching together N-type and P-type semiconductor layers. The dif-
ferent types of layer are doped with different substances to create more holes or more electrons. For
example, in an N-type MOSFET the layers are constructed as follows

gate
+-------+
| metal |
==== source ========= drain ==== silicon oxide layer
+--+--------+---------+--------+--+
| | N-type | | N-type | |
| +--------+ +--------+ |
| P-type |
+---------------------------------+

with additional layers of silicon oxide and metal. There are three terminals on the transistor. Roughly
speaking, applying a voltage to the gate creates a channel between the source and drain through which
charge can flow. Thus the device acts like a switch: when the gate voltage is high, there is a flow of charge
but when it is low there is little flow of charge. A P-type MOSFET swaps the roles of N-type and P-type
semiconductor and hence implements the opposite switching behaviour.
b One can construct a NAND gate computing r = x ∧̄ y (i.e., r = ¬(x ∧ y)) from such transistors as follows:

V_dd
|
+-------+-------+
| |
v v
+--------+ +--------+
x -->| P-type | y -->| P-type |
+--------+ +--------+
| |
+---------------+---> r
|
+--------+
x -->| N-type |
+--------+
|
+--------+
y -->| N-type |
+--------+
^
|
+-------+
|
VSS

If x and y are connected to Vss then both top P-type transistors will be connected, and both bottom
N-type transistors will be disconnected; r will be connected to Vdd . If x and y are connected to Vdd
and Vss respectively then the right-most P-type transistor will be connected, and the lower-most N-
type transistor will be disconnected; r will be connected to Vdd . If x and y are connected to Vss and
Vdd respectively then the left-most P-type transistor will be connected, and the upper-most N-type
transistor will be disconnected; r will be connected to Vdd . If x and y are connected to Vdd then both top
P-type transistors will be disconnected, and both bottom N-type transistors will be connected; r will be
connected to Vss . In short, the behaviour we get is described by

    x    y    r
    Vss  Vss  Vdd
    Vss  Vdd  Vdd
    Vdd  Vss  Vdd
    Vdd  Vdd  Vss

which, if we substitute 0 and 1 for Vss and Vdd , matches that of the NAND operation.

S69. This question is a lot easier than it sounds; basically we just add two extra transistors (one P-MOSFET and one
N-MOSFET) to implement a similar high-level approach. That is, we want r connected to Vss only when each of
x, y and z are connected to Vdd ; this means the bottom, N-MOSFETs are in series. If any of x, y or z are connected
to Vss , we want r connected to Vdd ; this means the top, P-MOSFETs are in parallel. Diagrammatically, the result
is as follows:

[diagram: three P-MOSFETs in parallel connect Vdd to r, gated by x, y, and z respectively; three N-MOSFETs in series connect r to Vss, gated by x, y, and z]

S70. This is quite an open-ended question, but basically it asks for high-level explanations only. As such, some
example answers include the following:

a CMOS transistors are constructed from atomic-level understanding and manipulation; the immutable
size of atoms therefore acts as a fundamental limit on the size of any CMOS-based transistor.

b Feature scaling improves the operational efficiency of transistors, simply because smaller features reduce
delay. Beyond this however, one must utilise the extra transistors to achieve some useful task if compu-
tational efficiency is to scale as well: improvements to an architecture or design are often required, for
instance, to exploit parallelism and so on.

c Even assuming the transistors available can be harnessed to improve computational efficiency, this has
implications: more transistors within a fixed size area will increase power consumption and also heat
dissipation for example, both of which act as limits even if managed (e.g., via aggressive forms of cooling).

d On one hand, smaller transistors mean less cost per-transistor: with a fixed number of transistors, their
area and manufacturing cost will decrease. With a fixed-size area, and hence more transistors in it,
however, this probably means an increased defect rate during manufacture. The resulting cost implication
could act as an economic limit to transistor size.

S71. a The most basic interpretation (i.e., not really doing any grouping using Karnaugh maps but just picking
out each cell with a 1 in it) generates the following SoP equations

e = (¬a ∧ ¬b ∧ c ∧ ¬d) ∨ (a ∧ ¬b ∧ ¬c ∧ ¬d) ∨ (¬a ∧ b ∧ ¬c ∧ d)


f = (¬a ∧ ¬b ∧ ¬c ∧ d) ∨ (¬a ∧ b ∧ ¬c ∧ ¬d) ∨ (a ∧ ¬b ∧ c ∧ ¬d)

b From the basic SoP equations, we can use the don’t care states to eliminate some of the terms to get

e = (¬a ∧ ¬b ∧ c) ∨ (a ∧ ¬c ∧ ¬d) ∨ (b ∧ d)
f = (¬a ∧ ¬b ∧ d) ∨ (b ∧ ¬c ∧ ¬d) ∨ (a ∧ c)

then, we can share both the terms ¬a ∧ ¬b and ¬c ∧ ¬d since they occur in e and f .

S72. Simply transcribing the truth table into a suitable Karnaugh map gives

         y z
       00  01  11  10
 x = 0  1   1   0   1
 x = 1  0   1   ?   0

from which we can derive the SoP expression

r = (¬y ∧ z) ∨ (¬x ∧ ¬z).

S73. Define ∧̄ as the NAND operation with the truth table:

    x  y  x ∧̄ y
    0  0    1
    0  1    1
    1  0    1
    1  1    0

Using NAND, we can implement NOT, AND and OR as follows:

    ¬x    = x ∧̄ x
    x ∧ y = (x ∧̄ y) ∧̄ (x ∧̄ y)
    x ∨ y = (x ∧̄ x) ∧̄ (y ∧̄ y)

To prove this works, we can construct truth tables for the expressions and compare the results with what we
would expect; for NOT we have:

    x  x ∧̄ x  ¬x
    0    1     1
    1    0     0

while for AND we have:

    x  y  x ∧̄ y  (x ∧̄ y) ∧̄ (x ∧̄ y)  x ∧ y
    0  0    1             0             0
    0  1    1             0             0
    1  0    1             0             0
    1  1    0             1             1

and finally for OR we have:

    x  y  x ∧̄ x  y ∧̄ y  (x ∧̄ x) ∧̄ (y ∧̄ y)  x ∨ y
    0  0    1      1             0             0
    0  1    1      0             1             1
    1  0    0      1             1             1
    1  1    0      0             1             1

such that it should be clear all three are correct.

S74. Conventionally a 4-input, 1-bit multiplexer might be described using a truth table such as the following:

c1 c0 w x y z r
0 0 ? ? ? 0 0
0 0 ? ? ? 1 1
0 1 ? ? 0 ? 0
0 1 ? ? 1 ? 1
1 0 ? 0 ? ? 0
1 0 ? 1 ? ? 1
1 1 0 ? ? ? 0
1 1 1 ? ? ? 1

This assumes that there are four inputs, namely w, x, y and z, with two further control signals c1 and c0 deciding
which of them provides the output r. However, another valid way to write the same thing would be
c1 c0 r
0 0 w
0 1 x
1 0 y
1 1 z
This reformulation describes a 2-input, 1-output Boolean function whose behaviour is selected by fixing w, x,
y and z, i.e., connecting each of them directly to either 0 or 1. For instance, if w = x = y = 0 and z = 1 then the
truth table becomes
c1 c0 r
0 0 w=0
0 1 x =0
1 0 y =0
1 1 z =1
which is of course the same as AND. So depending on how w, x, y and z are fixed (on a per-instance basis) we
can form any 2-input, 1-output Boolean function; this includes NAND and NOR, which we know are universal,
meaning the multiplexer is also universal.

S75. a The expression for this circuit can be written as


e = (¬c ∧ ¬b) ∨ (b ∧ d) ∨ (¬a ∧ c ∧ ¬d) ∨ (a ∧ c ∧ ¬d)
which yields the Karnaugh map

 cd \ ab   00  01  11  10
    00      1   0   0   1
    01      1   1   1   1
    11      0   1   1   0
    10      1   1   1   1

(columns indexed by a b and rows by c d, both in Gray-code order)

and from which we can derive a simplified SoP form for e, namely
e = (b ∧ d) ∨ (¬b ∧ ¬c) ∨ (c ∧ ¬d)

b The advantages of this expression over the original are that it is simpler, i.e., contains fewer terms and
hence needs fewer gates for implementation, and shows that the input a is essentially redundant. We have
probably also reduced the critical path through the circuit since it is more shallow. The disadvantages
are that we still potentially have some glitching due to the differing delays through paths in the circuit,
although these existed before as well, and the large propagation delay.
c The longest sequential path through the circuit goes through a NOT gate, two AND gates and two OR
gates; the critical path is thus 90ns long. This time bounds how fast we can used it in a clocked system
since the clock period must be at least 90ns. So the shortest clock period would be 90ns, meaning the
clock ticks about 11111111 times a second (or at about 11MHz).

S76. a Examining the behaviour required, we can construct the following truth table:
D2 D1 D0 L8 L7 L6 L5 L4 L3 L2 L1 L0
0 0 0 ? ? ? ? ? ? ? ? ?
0 0 1 0 0 0 0 1 0 0 0 0
0 1 0 0 1 0 0 0 0 0 1 0
0 1 1 1 0 0 0 1 0 0 0 1
1 0 0 1 0 1 0 0 0 1 0 1
1 0 1 1 0 1 0 1 0 1 0 1
1 1 0 1 1 1 0 0 0 1 1 1
1 1 1 ? ? ? ? ? ? ? ? ?

Note that
L3 = 0
L5 = 0
L6 = L2
L7 = L1
L8 = L0

so actually we only need expressions for L0...2 and L4 , and that don’t care states are used to capture the
idea that D = 0 and D = 7 never occur. The resulting four Karnaugh maps

          D1 D0                        D1 D0
 L0     00  01  11  10     L1        00  01  11  10
 D2 = 0  ?   0   1   0     D2 = 0     ?   0   0   1
 D2 = 1  1   1   ?   1     D2 = 1     0   0   ?   1

          D1 D0                        D1 D0
 L2     00  01  11  10     L4        00  01  11  10
 D2 = 0  ?   0   0   0     D2 = 0     ?   1   1   0
 D2 = 1  1   1   ?   1     D2 = 1     0   1   ?   0

can be translated into the expressions:

L0 = D2 ∨ (D1 ∧ D0 )
L1 = (D1 ∧ ¬D0 )
L2 = D2
L4 = D0
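
A minimal Python sketch (mine) checks these expressions, plus the mirroring noted above, against the required patterns for each throw D = 1 .. 6:

def leds(D):
    D0, D1, D2 = (D >> 0) & 1, (D >> 1) & 1, (D >> 2) & 1
    L0 = D2 | (D1 & D0)
    L1 = D1 & (1 - D0)
    L2 = D2
    L4 = D0
    # L3 = L5 = 0, while L6, L7, and L8 mirror L2, L1, and L0
    return [L0, L1, L2, 0, L4, 0, L2, L1, L0]        # L0 .. L8

expected = {                                         # rows of the truth table
    1: [0, 0, 0, 0, 1, 0, 0, 0, 0],
    2: [0, 1, 0, 0, 0, 0, 0, 1, 0],
    3: [1, 0, 0, 0, 1, 0, 0, 0, 1],
    4: [1, 0, 1, 0, 0, 0, 1, 0, 1],
    5: [1, 0, 1, 0, 1, 0, 1, 0, 1],
    6: [1, 1, 1, 0, 0, 0, 1, 1, 1],
}
for D, pattern in expected.items():
    assert leds(D) == pattern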

b All the LEDs can be driven in parallel, i.e., the critical path relates to the single expression whose critical
path is the most. L2...6 have no logic involved, so we can discount them immediately. Of the two remaining
LEDs, we find
    L0 ⇝ 20ns + 20ns
    L1 ⇝ 10ns + 20ns

hence L0 represents the critical path of 40ns. Thus if one throw takes 40ns, we can perform

    1s / 40ns = (1 · 10^9)ns / 40ns = 25000000

throws per-second. Which is quite a lot, and certainly too many to actually see with the human eye!

c i A rough block diagram would resemble

+-----+ +-----+ +-----+


co <----------|co ci|<- - - - - |co ci|<----------|co ci|<---------- ci = 0
+--|s y|<----+ +--|s y|<----+ +--|s y|<----+
| | x|<-+ | | | x|<-+ | | | x|<-+ |
| +-----+ | | | +-----+ | | | +-----+ | |
| | | | | | | | |
| x_{n -1} | | x_1 | | x_0 |
| | | | | |
| y_{n -1} | y_1 | y_0
v v v
r_{n -1} r_1 r_0

ii If we sum 8 values 1 ≤ xi ≤ 6, where xi is the i-th throw (or i-th value of D supplied), then the
maximum total is 8 · 6 = 48. We can represent this in 6 bits, hence n = 6.

iii Using the left-shift method, we compute D′ = 2 · D by simply relabelling the bits in D. That is, D′0 = 0
and D′i+1 = Di for 0 ≤ i < 3. For example, given D = 6(10) = 110(2) we have

D′0 = 0
D′1 = D0 = 0
D′2 = D1 = 1
D′3 = D2 = 1

and hence D′ = 1100(2) = 12(10) . Since there is no need for any logic gates to implement this
method, the critical path is essentially nil: the only propagation delay relates to (small) wire delays.
In comparison to the larger critical path of a suitable n-bit adder, this clearly means the left-shift
approach is preferable.

S77. a A basic design would use two building blocks:

• lth_8bit compares two 8-bit inputs a and b and produces a 1-bit result r, where r = 1 if a < b and
r = 0 if a ≥ b:

a b
| |
v v
+-----------+
| lth_8bit |
+-----------+
|
v
r

• mux2_8bit selects between two 8-bit inputs; if the inputs are a and b, the output r = a if the control
signal s = 0, or r = b if s = 1:

a b
| |
v v
+-----------+
| mux2_8bit |<-- s
+-----------+
|
v
r

Based on these building blocks, one can describe the component C as follows:

x y
| |
v v
+-----------+
| lth_8bit |
+-----------+
y x | x y
| | | | |
v v | v v
+-----------+ | +-----------+
| mux2_8bit |<--+-->| mux2_8bit |
+-----------+ r +-----------+
| |
v v
min(x,y) max(x,y)

From a functional perspective, C compares x and y using an instance of the lth_8bit building block, and
then uses the result r as a control signal for two instances of mux2_8bit. The left-hand instance selects y
as the output if r = 0 and x if r = 1; that is, if x < y then the output is x = min(x, y) otherwise the output is
y = min(x, y). The right-hand instance swaps the inputs so it selects x as the output if r = 0 and y if r = 1;
that is, if x < y then the output is y = max(x, y) otherwise the output is x = max(x, y).
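
A minimal Python sketch (mine, with the building blocks modelled as functions) captures the component C:

def lth_8bit(a, b):
    return 1 if a < b else 0             # r = 1 iff a < b

def mux2_8bit(a, b, s):
    return a if s == 0 else b

def C(x, y):
    r = lth_8bit(x, y)
    return mux2_8bit(y, x, r), mux2_8bit(x, y, r)    # (min(x, y), max(x, y))

assert C(42, 17) == (17, 42)
assert C(17, 42) == (17, 42)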
b The short answer (which gets about half the marks) is that the longest path through the mesh will go
through 2n − 1 of the C components: this is the path from the top-left corner down along one edge to the
bottom-left and then along another edge to the bottom-right. So in a sense, if we write the propagation
delay associated with each instance of C as TC then the overall critical path is

(2n − 1) · TC .

In a bit more detail, the critical path through C is through one instance of lth_8bit and one instance of
mux2_8bit. So we could write the overall critical path is

(2n − 1) · (Tlth_8bit + Tmux2_8bit ).

To be more detailed than this, we need to think about individual logic gates. Imagine we assume
TXOR = 50ns, TAND = 20ns, TOR = 20ns and TNOT = 10ns.

• mux2_8bit is simply eight mux2_1bit instances placed in parallel with each other; that is, the i-th
such instance produces the i-th bit of the output based on the i-th bit of the inputs (but all using the
same control signal). Assuming that the propagation delay of AND and OR gates dominates that of
a NOT gate, the critical path through mux2_1bit will be TAND + TOR .
• lth_8bit is a combination of eight sub-components:

x_i y_i x_i y_i


| | | |
v v v v
+----------+ +----------+
| lth_1bit | | equ_1bit |
+----------+ +----------+
| |
| | t_{i -1}
| | |
| v v
| +----------+
| +----| AND |
| | +----------+
v v
+----------+
| OR |
+----------+
|
v
t_i

Each of these sub-components is placed in series so that ti−1 is an input from the previous sub-
component and ti is an output provided to the next.
Based on simple circuits derived from their truth tables, the critical paths for lth_1bit and equ_1bit
are TAND + TNOT and TXOR + TNOT respectively. Thus the critical path of the whole sub-component
is TXOR + TNOT + TAND + TOR (since the critical path of equ_1bit is longer). Overall, the critical path
of lth_8bit is
8 · (TXOR + TNOT + TAND + TOR ),
or more exactly
7 · (TXOR + TNOT + TAND + TOR ) + TAND + TNOT
because the 0-th sub-component is “special”: there is no input from the previous sub-component.

Using this we can write the overall critical path for the mesh as

(2n − 1) · (7 · TXOR + 8 · TNOT + 9 · TAND + 8 · TOR )

or roughly (2n − 1) · 770ns if we plug in the assumed delays.


c One problem is that the mesh does not always give the right result! If you were to build a 4 × 4 mesh and
feed 5, 6, 7, 8 into the top and 4, 3, 2, 1 into the left-hand side, the bottom reads 5, 6, 7, 8 and the right-hand
side reads 4, 3, 2, 1: numbers cannot move from bottom to top or from right to left, so there are some
inputs a mesh cannot sort.
Beyond this trick question, the main idea is that the mesh is special-purpose, the processor is general-
purpose: this implies a number of trade-offs in either direction that could be viewed as advantages or
disadvantages in certain cases. For example, depending on n, one might argue that the processor will
require more logic to realise it (since it will include features extraneous to the task of sorting). Since
it operates a fetch-decode-execute cycle to complete each instruction, there is an overhead (i.e., the fetch
and decode at least) which means it potentially performs the task of sorting less quickly. On the other
hand, once constructed the mesh is specialised to one task: it cannot be used to sort strings for example,
and the size of input (i.e., n) is fixed. The processor makes the opposite trade-off; it should be clear that
while it might be slower and potentially larger, it is vastly more flexible.

S78. a Imagine a component which is enabled (i.e., “turned on”) using the input en:

• The idea of the component being level triggered is that the value of en is important, not a change in
en: the component is enabled when en has a particular value, rather than at an edge when the value
changes.
• The fact en is active high means that the component is enabled when en = 1 (rather than en = 0 which
would make it active low). Though active high might seem the more logical choice, this is just part
of the component specification: as long as everything is consistent, i.e., uses the right semantics to
“turn on” the component, there is sometimes no major benefit of one approach over the other.

b Assume that M is a 4-state switch represented by a 2-bit value M = ⟨M0 , M1 ⟩: ⟨0, 0⟩ means off, ⟨1, 0⟩ means
slow, ⟨0, 1⟩ means fast and ⟨1, 1⟩ means very fast. Also assume there is a clock signal called clk available,
for example supplied by an oscillator of some form.
One approach would basically be to take clk and divide it to create two new clock signals c0 and c1 which
have a longer period: each of the clock signals could then satisfy the criteria of toggling the fire button
on and off at various speeds. A clock divider is fairly simple: the idea is to have a counter c clocked by
clk and to sample the (i − 1)-th bit of the counter: this behaves like clk divided by 2^i . For example the 0-th
bit acts like clk but with twice the period.
A circuit to do this is fairly simple: we need some D-type flip-flops to hold the counter state, and some
full-adders to increment the counter:

+-----+ +-----+
|co ci|<----------|co ci|<-- 0
+--|s y|<-- 0 +--|s y|<-- 1
| | x|<-+ | | x|<-+
| +-----+ | | +-----+ |
| | | |
| c_1 --+ | c_0 --+
| | | |
| +-----+ | | +-----+ |
+->|D Q|--+ +->|D Q|--+
| <|<-+ | <|<-+
| | | | | |
+-----+ | +-----+ |
| |
+-----------------+-- clk

Given such a component which runs freely as long as it is driven by clk, we want to feed the original
fire button F0 through to form the new fire button input F′0 when M = 0, and c1 , c0 or clk through when
M = 1, M = 2 or M = 3 (meaning a slow, fast or very fast toggling behaviour). We can describe this as the
following truth table:
M1 M0 F′0
0 0 F0
0 1 c1
1 0 c0
1 1 clk
This is essentially a multiplexer controlled by M, and permits us to write

F′0 = ( ¬M0 ∧ ¬M1 ∧ F0 ) ∨
      (  M0 ∧ ¬M1 ∧ c1 ) ∨
      ( ¬M0 ∧  M1 ∧ c0 ) ∨
      (  M0 ∧  M1 ∧ clk )

c i A synchronous protocol demands that the console and controller share a clock signal which acts
to synchronise their activity, e.g., ensures each one sends and receives data at the right time. The
problem with this is ensuring that the clock is not skewed for either component: since they are
physically separate, this might be hard and hence this is not such a good option.
An asynchronous protocol relies on extra connections between the components, e.g., “request” and
“acknowledge”, that allow them to engage in a form of transaction: the extra connections essentially
signal when data has been sent or received on the associated bus. This is more suitable given the
scenario: the extra connections could potentially be shared with those that already exist (e.g., F0 , F1 ,
F2 and D) thereby reducing the overhead, plus performance is not a big issue here (the protocol will
presumably only be executed once when the components are turned on or plugged in).
Both approaches have an issue in that
• once the protocol is run someone could just plug in another, fake controller, or

• simply intercept c and T(c) pairs until the whole look-up table is recovered, and then "imitate"
it using a fake controller
so neither is particularly robust from a security point of view!
ii The temptation here is to say that the use of a 3-bit memory (or register) is the right way to go.
Although this allows some degree of flexibility which is not required since the function is fixed, the
main disadvantage is retention of the content when the controller or console is turned off: some
form of non-volatile memory is therefore needed.
However, we can easily construct some dedicated logic to do the same thing. If we say that y = T(x),
then we can describe the behaviour of T using the following truth table:

x2 x1 x0 y2 y1 y0
0 0 0 0 1 0
0 0 1 1 1 0
0 1 0 1 1 1
0 1 1 0 0 1
1 0 0 1 0 0
1 0 1 0 0 0
1 1 0 1 0 1
1 1 1 0 1 1

This can be transformed into the following Karnaugh maps for y0 , y1 and y2
x1 x1 x1
x0 x0 x0
y2 00 01 11 10 y1 00 01 11 10 y0 00 01 11 10

0 0 1 0 1 0 1 1 0 1 0 0 0 1 1
0 1 5 4 0 1 5 4 0 1 5 4

x2 1 1 0 0 1 x2 1 0 0 1 0 x2 1 0 0 1 1
2 3 7 6 2 3 7 6 2 3 7 6

which in turn can be transformed into the following equations

y2 = ( x1 ∧ ¬x0 ) ∨
( x2 ∧ ¬x0 ) ∨
( ¬x2 ∧ ¬x1 ∧ x0 )
y1 = ( ¬x2 ∧ ¬x1 ) ∨
( ¬x2 ∧ ¬x0 ) ∨
( x2 ∧ x1 ∧ x0 )
y0 = ( x1 )

which are enough to implement the look-up table: we pass x as input, and it produces the right y
(for this fixed T) as output.

S79. This is a classic “puzzle” question in digital logic. There are a few ways to describe the strategy, but the one
used here is based on counting the number of inputs which are 1. In short, we start by computing

t1 = ¬(x ∧ y ∨ y ∧ z ∨ x ∧ z)
t2 = ¬((x ∧ y ∧ z) ∨ t1 ∧ (x ∨ y ∨ z))

which use our quota of NOT gates. The idea is that t1 = 1 iff. one or zero of x, y and z are 1, and in the same
way t2 = 1 iff. two or zero of x, y and z are 1. This can be hard to see, so consider a truth table

x y z x∧y y∧z x∧z x∧y∧z x∨y∨z


0 0 0 0 0 0 0 0
0 0 1 0 0 0 0 1
0 1 0 0 0 0 0 1
0 1 1 0 1 0 0 1
1 0 0 0 0 0 0 1
1 0 1 0 0 1 0 1
1 1 0 1 0 0 0 1
1 1 1 1 1 1 1 1

meaning
x y z x∧y∨y∧z∨x∧z t1 (x ∧ y ∧ z) ∨ t1 ∧ (x ∨ y ∨ z) t2
0 0 0 0 1 0 1
0 0 1 0 1 1 0
0 1 0 0 1 1 0
0 1 1 1 0 0 1
1 0 0 0 1 1 0
1 0 1 1 0 0 1
1 1 0 1 0 0 1
1 1 1 1 0 1 0
and hence t1 and t2 are as required. Now, we can generate the three results as

¬x = (t1 ∧ t2 ) ∨ (t1 ∧ (y ∨ z)) ∨ (t2 ∧ y ∧ z)


= (t1 ∧ t2 ) ∨ (t1 ∧ y) ∨ (t1 ∧ z) ∨ (t2 ∧ y ∧ z)
¬y = (t1 ∧ t2 ) ∨ (t1 ∧ (x ∨ z)) ∨ (t2 ∧ x ∧ z)
= (t1 ∧ t2 ) ∨ (t1 ∧ x) ∨ (t1 ∧ z) ∨ (t2 ∧ x ∧ z)
¬z = (t1 ∧ t2 ) ∨ (t1 ∧ (x ∨ y)) ∨ (t2 ∧ x ∧ y)
= (t1 ∧ t2 ) ∨ (t1 ∧ x) ∨ (t1 ∧ y) ∨ (t2 ∧ x ∧ y)
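
As a quick, hedged check of these formulas, the following minimal C sketch (names illustrative only) enumerates all eight inputs and compares the three derived outputs against the true negations; since x, y and z only take the values 0 and 1, the bitwise operators mirror the Boolean connectives:

#include <stdio.h>

int main( void ) {
  for( int x = 0; x <= 1; x++ ) {
  for( int y = 0; y <= 1; y++ ) {
  for( int z = 0; z <= 1; z++ ) {
    // t_1 = 1 iff. zero or one of x, y and z are 1
    int t1 = !( ( x & y ) | ( y & z ) | ( x & z ) );
    // t_2 = 1 iff. zero or two of x, y and z are 1
    int t2 = !( ( x & y & z ) | ( t1 & ( x | y | z ) ) );

    int nx = ( t1 & t2 ) | ( t1 & ( y | z ) ) | ( t2 & y & z );
    int ny = ( t1 & t2 ) | ( t1 & ( x | z ) ) | ( t2 & x & z );
    int nz = ( t1 & t2 ) | ( t1 & ( x | y ) ) | ( t2 & x & y );

    if( ( nx != !x ) || ( ny != !y ) || ( nz != !z ) ) {
      printf ( "mismatch for x=%d y=%d z=%d\n", x, y, z );
    }
  }
  }
  }

  return 0;
}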

S80. Imagine that for some n-bit input x, we let yi = Ci (x) denote the evaluation of Ci to get an output yi . As such,
the equivalence of C1 and C2 can be stated as a test whether y1 = y2 for all values of x; another way to say the
same thing is to test whether an x exists such that y1 ≠ y2 which will distinguish the circuits, i.e., imply they
are not equivalent.
Using the second formulation, we can write the test as y1 ⊕ y2 since the XOR will produce 1 when y1 differs
from y2 and 0 otherwise. As such, we have n Boolean variables (the bits of x) and want an assignment that
implies the expression C1 (x) ⊕ C2 (x) will evaluate to 1. This is the same as described in the description of SAT,
so if we can solve the SAT instance we prove the circuits are (not) equivalent.
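
For small n, the same test can of course be performed by brute force rather than via a SAT solver; a minimal C sketch (where C1 and C2 are hypothetical place-holders for functions that evaluate the two circuits) is:

// return 1 if the circuits agree on every n-bit input, and 0 otherwise
int equivalent( int n, int (*C1)( unsigned int ), int (*C2)( unsigned int ) ) {
  for( unsigned int x = 0; x < ( 1U << n ); x++ ) {
    // the "miter" output: 1 iff. the circuits disagree on this x
    if( C1( x ) ^ C2( x ) ) {
      return 0;
    }
  }

  return 1;
}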

S81. a The latency of the circuit is the time taken to perform the computation, i.e., to compute some r given x.
For this circuit, the latency is simply the sum of the critical paths.
b The throughput is the number of operations performed per unit time period. This is essentially the
number of operations we can start (resp. that finish) within that time period.
By pipelining the circuit, using say 3 stages, one might expect the latency to increase slightly (by virtue of
having to add pipeline registers between each stage) but the throughput to increase (by virtue of decreasing
the overall critical path to the longest stage, and hence increasing the maximum clock frequency). The trade-off
is strongly influenced by the number of and balance between stages, meaning careful analysis of the circuit
before applying the optimisation is important.

S82. a The latency of a circuit is the time elapsed between when a given operation starts and when it finishes.
The throughput of a circuit is the number of operations that can be started in each time period; that is,
how long it takes between when two subsequent operations can be started.
b The latency of the circuit is the sum of all the latencies of the parts, i.e.,

40ns + 10ns + 30ns + 10ns + 50ns + 10ns + 10ns = 160ns.

The throughput relates to the length of the longest pipeline stage; the circuit is not pipelined, so more
specifically we can say it is 1/(160 · 10^−9) operations per second.

c The new latency is still the sum of all the parts, but now includes the extra pipeline register:

40ns + 10ns + 30ns + 10ns + 10ns + 50ns + 10ns + 10ns = 170ns.

However, the throughput is now greater because the longest pipeline stage only has a latency of 100ns
(including the extra register). Specifically, the throughput increases to 1/(100 · 10^−9), which essentially means we
can start new operations more often than before.


d To maximise the throughput we need to minimise the latency of the longest pipeline stage (i.e., the one
whose individual latency is the largest) since this will act as a limit. The latency of part E is largest (at
50ns) and hence represents said limit: the longest pipeline stage cannot have a latency of less than 60ns
(i.e., the latency of part E plus the latency of a pipeline register).

We can achieve this by creating a 4-stage pipeline: adding two more pipeline registers, between parts B
and C and parts E and F, ensures the stages have latencies of

A + B + REG ↦ 40ns + 10ns + 10ns = 60ns
C + D + REG ↦ 30ns + 10ns + 10ns = 50ns
E + REG     ↦ 50ns + 10ns        = 60ns
F + REG     ↦ 10ns + 10ns        = 20ns
Overall, the latency is increased to 190ns but the throughput is 1/(60 · 10^−9).

S83. a Reference implementation:


r = ( ¬y ∧ ¬z ) ∨
( y ∧ ¬z ) ∨
( y ∧ z )

b Annotated Karnaugh map

z
0 1

0 1 0
0 1

y 1 1 1
2 3

and associated, optimised implementation:

r = ( ¬z ) ∨
( y )

S84. a Reference implementation:


r = ( ¬y ∧ ¬z ) ∨
( ¬y ∧ z ) ∨
( y ∧ z )

b Annotated Karnaugh map

z
0 1

0 1 1
0 1

y 1 0 1
2 3

and associated, optimised implementation:

r = ( ¬y ) ∨
( z )

S85. a Reference implementation:


r = ( ¬y ∧ ¬z )

b Annotated Karnaugh map

z
0 1

0 1 0
0 1

y 1 0 0
2 3

and associated, optimised implementation:

r = ( ¬y ∧ ¬z )

S86. a Reference implementation:


r = ( ¬y ∧ ¬z ) ∨
( ¬y ∧ z )

b Annotated Karnaugh map

z
0 1

0 1 1
0 1

y 1 0 0
2 3

and associated, optimised implementation:

r = ( ¬y )

S87. a Reference implementation:


r = ( y ∧ z )

b Annotated Karnaugh map

z
0 1

0 0 0
0 1

y 1 0 1
2 3

and associated, optimised implementation:

r = ( y ∧ z )

S88. a Reference implementation:


r = ( ¬y ∧ ¬z ) ∨
( y ∧ z )

b Annotated Karnaugh map

z
0 1

0 1 0
0 1

y 1 0 1
2 3

and associated, optimised implementation:

r = ( y ∧ z ) ∨
( ¬y ∧ ¬z )

S89. a Reference implementation:


r = ( y ∧ ¬z )

b Annotated Karnaugh map

z
0 1

0 0 0
0 1

y 1 1 0
2 3

and associated, optimised implementation:

r = ( y ∧ ¬z )

S90. a Reference implementation:


r = ( ¬y ∧ z ) ∨
( y ∧ ¬z )

b Annotated Karnaugh map

z
0 1

0 0 1
0 1

y 1 1 0
2 3

and associated, optimised implementation:

r = ( y ∧ ¬z ) ∨
( ¬y ∧ z )

S91. a Reference implementation:


r = ( ¬y ∧ z )

b Annotated Karnaugh map

z
0 1

0 0 1
0 1

y 1 0 0
2 3

and associated, optimised implementation:

r = ( ¬y ∧ z )

S92. a Reference implementation:


r = ( ¬y ∧ ¬z ) ∨
( y ∧ ¬z )

b Annotated Karnaugh map

z
0 1

0 1 0
0 1

y 1 1 0
2 3

and associated, optimised implementation:

r = ( ¬z )

S93. a Reference implementation:


r = ( y ∧ ¬z )

b Annotated Karnaugh map

z
0 1

0 0 ?
0 1

y 1 1 0
2 3

and associated, optimised implementation:

r = ( y ∧ ¬z )

S94. a Reference implementation:


r = ( y ∧ z )

b Annotated Karnaugh map

z
0 1

0 ? ?
0 1

y 1 0 1
2 3

and associated, optimised implementation:

r = ( z )

S95. a Reference implementation:


r = ( ¬y ∧ z ) ∨
( y ∧ z )

b Annotated Karnaugh map

z
0 1

0 0 1
0 1

y 1 ? 1
2 3

and associated, optimised implementation:

r = ( z )

S96. a Reference implementation:


r = ( ¬y ∧ ¬z ) ∨
( y ∧ ¬z )

b Annotated Karnaugh map

z
0 1

0 1 ?
0 1

y 1 1 0
2 3

and associated, optimised implementation:

r = ( ¬z )

S97. a Reference implementation:


r = ( ¬y ∧ ¬z ) ∨
( ¬y ∧ z )

b Annotated Karnaugh map

z
0 1

0 1 1
0 1

y 1 0 ?
2 3

and associated, optimised implementation:

r = ( ¬y )

S98. a Reference implementation:


r = ( x ∧ ¬y ∧ ¬z ) ∨
( x ∧ ¬y ∧ z ) ∨
( x ∧ y ∧ ¬z )

b Annotated Karnaugh map

x
z
00 01 11 10

0 0 0 1 1
0 1 5 4

y 1 0 0 0 1
2 3 7 6

and associated, optimised implementation:

r = ( x ∧ ¬z ) ∨
( x ∧ ¬y )

S99. a Reference implementation:


r = ( ¬x ∧ ¬y ∧ ¬z ) ∨
( ¬x ∧ ¬y ∧ z ) ∨
( ¬x ∧ y ∧ ¬z ) ∨
( x ∧ ¬y ∧ ¬z ) ∨
( x ∧ y ∧ ¬z ) ∨
( x ∧ y ∧ z )

b Annotated Karnaugh map

x
z
00 01 11 10

0 1 1 0 1
0 1 5 4

y 1 1 0 1 1
2 3 7 6

and associated, optimised implementation:


r = ( ¬x ∧ ¬y ) ∨
( ¬z ) ∨
( x ∧ y )

S100. a Reference implementation:


r = ( ¬x ∧ ¬y ∧ ¬z ) ∨
( ¬x ∧ ¬y ∧ z ) ∨
( ¬x ∧ y ∧ z ) ∨
( x ∧ ¬y ∧ z )
b Annotated Karnaugh map

x
z
00 01 11 10

0 1 1 1 0
0 1 5 4

y 1 0 1 0 0
2 3 7 6

and associated, optimised implementation:


r = ( ¬x ∧ ¬y ) ∨
( ¬x ∧ z ) ∨
( ¬y ∧ z )

S101. a Reference implementation:


r = ( ¬x ∧ y ∧ ¬z ) ∨
( ¬x ∧ y ∧ z ) ∨
( x ∧ ¬y ∧ z ) ∨
( x ∧ y ∧ z )
b Annotated Karnaugh map

x
z
00 01 11 10

0 0 0 1 0
0 1 5 4

y 1 1 1 1 0
2 3 7 6

and associated, optimised implementation:


r = ( x ∧ z ) ∨
( ¬x ∧ y )

S102. a Reference implementation:


r = ( x ∧ ¬y ∧ z ) ∨
( x ∧ y ∧ z )
b Annotated Karnaugh map

x
z
00 01 11 10

0 0 0 1 0
0 1 5 4

y 1 0 0 1 0
2 3 7 6

and associated, optimised implementation:


r = ( x ∧ z )

S103. a Reference implementation:


r = ( ¬x ∧ ¬y ∧ z ) ∨
( x ∧ ¬y ∧ ¬z ) ∨
( x ∧ ¬y ∧ z ) ∨
( x ∧ y ∧ ¬z )
b Annotated Karnaugh map

x
z
00 01 11 10

0 0 1 1 1
0 1 5 4

y 1 0 0 0 1
2 3 7 6

and associated, optimised implementation:


r = ( x ∧ ¬z ) ∨
( ¬y ∧ z )

S104. a Reference implementation:


r = ( ¬x ∧ ¬y ∧ ¬z ) ∨
( ¬x ∧ ¬y ∧ z ) ∨
( x ∧ ¬y ∧ ¬z ) ∨
( x ∧ ¬y ∧ z ) ∨
( x ∧ y ∧ ¬z ) ∨
( x ∧ y ∧ z )
b Annotated Karnaugh map

x
z
00 01 11 10

0 1 1 1 1
0 1 5 4

y 1 0 0 1 1
2 3 7 6

and associated, optimised implementation:


r = ( ¬y ) ∨
( x )

S105. a Reference implementation:


r = ( ¬x ∧ ¬y ∧ ¬z ) ∨
( x ∧ y ∧ ¬z )
b Annotated Karnaugh map

x
z
00 01 11 10

0 1 0 0 0
0 1 5 4

y 1 0 0 0 1
2 3 7 6

and associated, optimised implementation:


r = ( ¬x ∧ ¬y ∧ ¬z ) ∨
( x ∧ y ∧ ¬z )

S106. a Reference implementation:


r = ( ¬x ∧ y ∧ z ) ∨
( x ∧ y ∧ ¬z ) ∨
( x ∧ y ∧ z )
b Annotated Karnaugh map

x
z
00 01 11 10

0 0 0 0 0
0 1 5 4

y 1 0 1 1 1
2 3 7 6

and associated, optimised implementation:


r = ( x ∧ y ) ∨
( y ∧ z )

S107. a Reference implementation:


r = ( ¬x ∧ y ∧ ¬z ) ∨
( ¬x ∧ y ∧ z ) ∨
( x ∧ ¬y ∧ ¬z ) ∨
( x ∧ ¬y ∧ z ) ∨
( x ∧ y ∧ z )
b Annotated Karnaugh map

x
z
00 01 11 10

0 0 0 1 1
0 1 5 4

y 1 1 1 1 0
2 3 7 6

and associated, optimised implementation:


r = ( y ∧ z ) ∨
( x ∧ ¬y ) ∨
( ¬x ∧ y )

S108. a Reference implementation:


r = ( ¬x ∧ ¬y ∧ ¬z ) ∨
( x ∧ y ∧ ¬z )
b Annotated Karnaugh map

x
z
00 01 11 10

0 1 ? ? 0
0 1 5 4

y 1 0 ? 0 1
2 3 7 6

and associated, optimised implementation:

r = ( ¬x ∧ ¬y ) ∨
( x ∧ y ∧ ¬z )

S109. a Reference implementation:


r = ( ¬x ∧ ¬y ∧ ¬z ) ∨
( x ∧ ¬y ∧ z )

b Annotated Karnaugh map

x
z
00 01 11 10

0 1 ? 1 0
0 1 5 4

y 1 0 0 ? 0
2 3 7 6

and associated, optimised implementation:

r = ( ¬x ∧ ¬y ) ∨
( x ∧ z )

S110. a Reference implementation:


r = ( ¬x ∧ y ∧ ¬z ) ∨
( ¬x ∧ y ∧ z ) ∨
( x ∧ ¬y ∧ z )

b Annotated Karnaugh map

x
z
00 01 11 10

0 0 ? 1 ?
0 1 5 4

y 1 1 1 ? ?
2 3 7 6

and associated, optimised implementation:

r = ( y ) ∨
( z )

S111. a Reference implementation:


r = ( x ∧ y ∧ z )

b Annotated Karnaugh map

x
z
00 01 11 10

0 ? 0 ? ?
0 1 5 4

y 1 ? ? 1 0
2 3 7 6

and associated, optimised implementation:

r = ( y ∧ z )

S112. a Reference implementation:


r = ( ¬x ∧ ¬y ∧ ¬z ) ∨
( ¬x ∧ ¬y ∧ z )

b Annotated Karnaugh map

x
z
00 01 11 10

0 1 1 ? ?
0 1 5 4

y 1 0 ? 0 0
2 3 7 6

and associated, optimised implementation:

r = ( ¬y )

S113. a Reference implementation:


r = ( x ∧ ¬y ∧ ¬z ) ∨
( x ∧ y ∧ ¬z ) ∨
( x ∧ y ∧ z )

b Annotated Karnaugh map

x
z
00 01 11 10

0 ? 0 ? 1
0 1 5 4

y 1 0 ? 1 1
2 3 7 6

and associated, optimised implementation:

r = ( x )

S114. a Reference implementation:


r = ( ¬x ∧ ¬y ∧ z ) ∨
( ¬x ∧ y ∧ ¬z ) ∨
( ¬x ∧ y ∧ z ) ∨
( x ∧ ¬y ∧ ¬z ) ∨
( x ∧ y ∧ ¬z ) ∨
( x ∧ y ∧ z )

b Annotated Karnaugh map

x
z
00 01 11 10

0 ? 1 0 1
0 1 5 4

y 1 1 1 1 1
2 3 7 6

and associated, optimised implementation:

r = ( ¬z ) ∨
( ¬x ) ∨
( y )

S115. a Reference implementation:


r = ( ¬x ∧ ¬y ∧ z ) ∨
( x ∧ ¬y ∧ ¬z )

b Annotated Karnaugh map

x
z
00 01 11 10

0 0 1 0 1
0 1 5 4

y 1 ? ? ? 0
2 3 7 6

and associated, optimised implementation:

r = ( ¬x ∧ z ) ∨
( x ∧ ¬y ∧ ¬z )

S116. a Reference implementation:


r = ( ¬x ∧ ¬y ∧ ¬z ) ∨
( ¬x ∧ y ∧ z ) ∨
( x ∧ y ∧ z )

b Annotated Karnaugh map

x
z
00 01 11 10

0 1 ? ? 0
0 1 5 4

y 1 0 1 1 ?
2 3 7 6

and associated, optimised implementation:

r = ( ¬x ∧ ¬y ) ∨
( z )

S117. a Reference implementation:


r = ( x ∧ ¬y ∧ ¬z ) ∨
( x ∧ ¬y ∧ z )

b Annotated Karnaugh map

x
z
00 01 11 10

0 0 0 1 1
0 1 5 4

y 1 0 0 ? 0
2 3 7 6

and associated, optimised implementation:

r = ( x ∧ ¬y )

S118. a Reference implementation:

r = ( ¬w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨
( ¬w ∧ ¬x ∧ ¬y ∧ z ) ∨
( ¬w ∧ x ∧ ¬y ∧ ¬z ) ∨
( ¬w ∧ x ∧ y ∧ z ) ∨
( w ∧ ¬x ∧ ¬y ∧ ¬z )

b Annotated Karnaugh map

x
z
00 01 11 10

00 1 1 0 1
0 1 5 4

01 0 0 1 0
2 3 7 6
y
11 0 0 0 0
10 11 15 14
w
10 1 0 0 0
8 9 13 12

and associated, optimised implementation:

r = ( ¬w ∧ ¬x ∧ ¬y ) ∨
( ¬x ∧ ¬y ∧ ¬z ) ∨
( ¬w ∧ x ∧ y ∧ z ) ∨
( ¬w ∧ ¬y ∧ ¬z )

S119. a Reference implementation:

r = ( ¬w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨
( ¬w ∧ ¬x ∧ y ∧ ¬z ) ∨
( ¬w ∧ x ∧ ¬y ∧ z ) ∨
( ¬w ∧ x ∧ y ∧ z ) ∨
( w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨
( w ∧ ¬x ∧ ¬y ∧ z ) ∨
( w ∧ ¬x ∧ y ∧ z ) ∨
( w ∧ x ∧ ¬y ∧ ¬z ) ∨
( w ∧ x ∧ y ∧ ¬z )

b Annotated Karnaugh map

x
z
00 01 11 10

00 1 0 1 0
0 1 5 4

01 1 0 1 0
2 3 7 6
y
11 0 1 0 1
10 11 15 14
w
10 1 1 0 1
8 9 13 12

and associated, optimised implementation:

r = ( w ∧ x ∧ ¬z ) ∨
( w ∧ ¬y ∧ ¬z ) ∨
( ¬w ∧ x ∧ z ) ∨
( w ∧ ¬x ∧ z ) ∨
( ¬w ∧ ¬x ∧ ¬z )

S120. a Reference implementation:

r = ( ¬w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨
( ¬w ∧ x ∧ ¬y ∧ z ) ∨
( ¬w ∧ x ∧ y ∧ ¬z ) ∨
( w ∧ x ∧ ¬y ∧ ¬z ) ∨
( w ∧ x ∧ y ∧ z )

b Annotated Karnaugh map

x
z
00 01 11 10

00 1 0 1 0
0 1 5 4

01 0 0 0 1
2 3 7 6
y
11 0 0 1 0
10 11 15 14
w
10 0 0 0 1
8 9 13 12

and associated, optimised implementation:

r = ( ¬w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨
( w ∧ x ∧ y ∧ z ) ∨
( ¬w ∧ x ∧ y ∧ ¬z ) ∨
( ¬w ∧ x ∧ ¬y ∧ z ) ∨
( w ∧ x ∧ ¬y ∧ ¬z )

S121. a Reference implementation:

r = ( ¬w ∧ ¬x ∧ ¬y ∧ z ) ∨
( ¬w ∧ x ∧ ¬y ∧ ¬z ) ∨
( ¬w ∧ x ∧ ¬y ∧ z ) ∨
( w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨
( w ∧ ¬x ∧ ¬y ∧ z ) ∨
( w ∧ ¬x ∧ y ∧ ¬z ) ∨
( w ∧ ¬x ∧ y ∧ z ) ∨
( w ∧ x ∧ ¬y ∧ ¬z ) ∨
( w ∧ x ∧ y ∧ ¬z )

b Annotated Karnaugh map

x
z
00 01 11 10

00 0 1 1 1
0 1 5 4

01 0 0 0 0
2 3 7 6
y
11 1 1 0 1
10 11 15 14
w
10 1 1 0 1
8 9 13 12

and associated, optimised implementation:

r = ( ¬x ∧ ¬y ∧ z ) ∨
( w ∧ ¬z ) ∨
( ¬w ∧ x ∧ ¬y ) ∨
( w ∧ ¬x )

S122. a Reference implementation:


r = ( ¬w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨
( ¬w ∧ ¬x ∧ y ∧ z ) ∨
( ¬w ∧ x ∧ ¬y ∧ ¬z ) ∨
( ¬w ∧ x ∧ y ∧ z ) ∨
( w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨
( w ∧ ¬x ∧ y ∧ ¬z ) ∨
( w ∧ ¬x ∧ y ∧ z ) ∨
( w ∧ x ∧ ¬y ∧ ¬z ) ∨
( w ∧ x ∧ y ∧ ¬z )

b Annotated Karnaugh map

x
z
00 01 11 10

00 1 0 0 1
0 1 5 4

01 0 1 1 0
2 3 7 6
y
11 1 1 0 1
10 11 15 14
w
10 1 0 0 1
8 9 13 12

and associated, optimised implementation:


r = ( ¬x ∧ y ∧ z ) ∨
( w ∧ ¬z ) ∨
( ¬w ∧ y ∧ z ) ∨
( ¬y ∧ ¬z )

S123. a Reference implementation:


r = ( ¬w ∧ ¬x ∧ ¬y ∧ z ) ∨
( ¬w ∧ x ∧ ¬y ∧ ¬z ) ∨
( ¬w ∧ x ∧ y ∧ ¬z ) ∨
( w ∧ ¬x ∧ ¬y ∧ z ) ∨
( w ∧ x ∧ ¬y ∧ ¬z ) ∨
( w ∧ x ∧ ¬y ∧ z ) ∨
( w ∧ x ∧ y ∧ ¬z ) ∨
( w ∧ x ∧ y ∧ z )

b Annotated Karnaugh map

x
z
00 01 11 10

00 0 1 0 1
0 1 5 4

01 0 0 0 1
2 3 7 6
y
11 0 0 1 1
10 11 15 14
w
10 0 1 1 1
8 9 13 12

and associated, optimised implementation:


r = ( x ∧ ¬z ) ∨
( ¬x ∧ ¬y ∧ z ) ∨
( w ∧ x )

S124. a Reference implementation:

r = ( ¬w ∧ ¬x ∧ y ∧ ¬z ) ∨
( ¬w ∧ ¬x ∧ y ∧ z ) ∨
( ¬w ∧ x ∧ ¬y ∧ ¬z ) ∨
( ¬w ∧ x ∧ ¬y ∧ z ) ∨
( ¬w ∧ x ∧ y ∧ ¬z ) ∨
( ¬w ∧ x ∧ y ∧ z ) ∨
( w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨
( w ∧ ¬x ∧ y ∧ ¬z ) ∨
( w ∧ x ∧ ¬y ∧ ¬z )

b Annotated Karnaugh map

x
z
00 01 11 10

00 0 0 1 1
0 1 5 4

01 1 1 1 1
2 3 7 6
y
11 1 0 0 0
10 11 15 14
w
10 1 0 0 1
8 9 13 12

and associated, optimised implementation:

r = ( w ∧ ¬x ∧ ¬z ) ∨
( ¬w ∧ x ) ∨
( x ∧ ¬y ∧ ¬z ) ∨
( ¬w ∧ y )

S125. a Reference implementation:

r = ( ¬w ∧ ¬x ∧ y ∧ ¬z ) ∨
( ¬w ∧ x ∧ ¬y ∧ ¬z ) ∨
( ¬w ∧ x ∧ ¬y ∧ z ) ∨
( ¬w ∧ x ∧ y ∧ z ) ∨
( w ∧ ¬x ∧ ¬y ∧ z ) ∨
( w ∧ ¬x ∧ y ∧ z ) ∨
( w ∧ x ∧ ¬y ∧ ¬z ) ∨
( w ∧ x ∧ ¬y ∧ z ) ∨
( w ∧ x ∧ y ∧ ¬z )

b Annotated Karnaugh map

x
z
00 01 11 10

00 0 0 1 1
0 1 5 4

01 1 0 1 0
2 3 7 6
y
11 0 1 0 1
10 11 15 14
w
10 0 1 1 1
8 9 13 12

and associated, optimised implementation:

r = ( ¬w ∧ ¬x ∧ y ∧ ¬z ) ∨
( x ∧ ¬y ) ∨
( w ∧ x ∧ ¬z ) ∨
( w ∧ ¬x ∧ z ) ∨
( ¬w ∧ x ∧ z )

S126. a Reference implementation:

r = ( ¬w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨
( ¬w ∧ ¬x ∧ y ∧ ¬z ) ∨
( ¬w ∧ ¬x ∧ y ∧ z ) ∨
( ¬w ∧ x ∧ ¬y ∧ ¬z ) ∨
( ¬w ∧ x ∧ y ∧ ¬z ) ∨
( ¬w ∧ x ∧ y ∧ z ) ∨
( w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨
( w ∧ ¬x ∧ ¬y ∧ z ) ∨
( w ∧ ¬x ∧ y ∧ ¬z ) ∨
( w ∧ ¬x ∧ y ∧ z ) ∨
( w ∧ x ∧ ¬y ∧ ¬z ) ∨
( w ∧ x ∧ ¬y ∧ z )

b Annotated Karnaugh map

x
z
00 01 11 10

00 1 0 0 1
0 1 5 4

01 1 1 1 1
2 3 7 6
y
11 1 1 0 0
10 11 15 14
w
10 1 1 1 1
8 9 13 12

and associated, optimised implementation:

r = ( ¬w ∧ ¬z ) ∨
( w ∧ ¬x ) ∨
( ¬w ∧ y ) ∨
( w ∧ ¬y )

S127. a Reference implementation:

r = ( ¬w ∧ ¬x ∧ ¬y ∧ z ) ∨
( ¬w ∧ x ∧ y ∧ ¬z ) ∨
( w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨
( w ∧ ¬x ∧ ¬y ∧ z ) ∨
( w ∧ ¬x ∧ y ∧ z ) ∨
( w ∧ x ∧ y ∧ z )

b Annotated Karnaugh map

x
z
00 01 11 10

00 0 1 0 0
0 1 5 4

01 0 0 0 1
2 3 7 6
y
11 0 1 1 0
10 11 15 14
w
10 1 1 0 0
8 9 13 12

and associated, optimised implementation:

r = ( w ∧ ¬x ∧ ¬y ) ∨
( ¬x ∧ ¬y ∧ z ) ∨
( w ∧ y ∧ z ) ∨
( ¬w ∧ x ∧ y ∧ ¬z )

S128. a Reference implementation:

r = ( w ∧ ¬x ∧ y ∧ ¬z ) ∨
( w ∧ x ∧ ¬y ∧ z ) ∨
( w ∧ x ∧ y ∧ ¬z )

b Annotated Karnaugh map

x
z
00 01 11 10

00 0 ? ? 0
0 1 5 4

01 ? 0 ? ?
2 3 7 6
y
11 1 ? ? 1
10 11 15 14
w
10 0 0 1 0
8 9 13 12

and associated, optimised implementation:

r = ( w ∧ y ) ∨
( x ∧ z )

S129. a Reference implementation:

r = ( ¬w ∧ ¬x ∧ y ∧ z ) ∨
( ¬w ∧ x ∧ ¬y ∧ ¬z ) ∨
( ¬w ∧ x ∧ ¬y ∧ z ) ∨
( w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨
( w ∧ ¬x ∧ ¬y ∧ z ) ∨
( w ∧ ¬x ∧ y ∧ ¬z ) ∨
( w ∧ ¬x ∧ y ∧ z )

b Annotated Karnaugh map

x
z
00 01 11 10

00 ? ? 1 1
0 1 5 4

01 0 1 0 0
2 3 7 6
y
11 1 1 0 0
10 11 15 14
w
10 1 1 ? 0
8 9 13 12

and associated, optimised implementation:

r = ( ¬x ∧ z ) ∨
( w ∧ ¬x ) ∨
( ¬w ∧ ¬y )

S130. a Reference implementation:

r = ( ¬w ∧ ¬x ∧ y ∧ ¬z ) ∨
( w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨
( w ∧ x ∧ ¬y ∧ ¬z ) ∨
( w ∧ x ∧ y ∧ z )

b Annotated Karnaugh map

x
z
00 01 11 10

00 ? ? 0 0
0 1 5 4

01 1 0 0 0
2 3 7 6
y
11 ? 0 1 0
10 11 15 14
w
10 1 0 ? 1
8 9 13 12

and associated, optimised implementation:

r = ( w ∧ x ∧ ¬y ) ∨
( w ∧ x ∧ z ) ∨
( ¬x ∧ ¬z )

S131. a Reference implementation:

r = ( ¬w ∧ ¬x ∧ ¬y ∧ z ) ∨
( ¬w ∧ x ∧ y ∧ z ) ∨
( w ∧ ¬x ∧ y ∧ ¬z ) ∨
( w ∧ ¬x ∧ y ∧ z ) ∨
( w ∧ x ∧ ¬y ∧ ¬z ) ∨
( w ∧ x ∧ ¬y ∧ z )

b Annotated Karnaugh map

x
z
00 01 11 10

00 0 1 ? 0
0 1 5 4

01 0 0 1 0
2 3 7 6
y
11 1 1 ? 0
10 11 15 14
w
10 0 0 1 1
8 9 13 12

and associated, optimised implementation:

r = ( w ∧ x ∧ ¬y ) ∨
( ¬w ∧ ¬y ∧ z ) ∨
( w ∧ ¬x ∧ y ) ∨
( x ∧ z )

S132. a Reference implementation:

r = ( ¬w ∧ x ∧ ¬y ∧ z ) ∨
( ¬w ∧ x ∧ y ∧ ¬z ) ∨
( w ∧ ¬x ∧ ¬y ∧ z ) ∨
( w ∧ x ∧ ¬y ∧ z )

b Annotated Karnaugh map

x
z
00 01 11 10

00 0 0 1 0
0 1 5 4

01 0 ? ? 1
2 3 7 6
y
11 0 ? 0 ?
10 11 15 14
w
10 ? 1 1 ?
8 9 13 12

and associated, optimised implementation:

r = ( x ∧ y ∧ ¬z ) ∨
( w ∧ ¬y ) ∨
( ¬w ∧ x ∧ z )

S133. a Reference implementation:

r = ( ¬w ∧ x ∧ ¬y ∧ ¬z ) ∨
( w ∧ ¬x ∧ y ∧ z ) ∨
( w ∧ x ∧ ¬y ∧ ¬z )

b Annotated Karnaugh map

x
z
00 01 11 10

00 0 ? 0 1
0 1 5 4

01 0 ? 0 0
2 3 7 6
y
11 ? 1 0 0
10 11 15 14
w
10 ? ? ? 1
8 9 13 12

and associated, optimised implementation:


r = ( ¬x ∧ z ) ∨
( x ∧ ¬y ∧ ¬z )

S134. a Reference implementation:


r = ( ¬w ∧ ¬x ∧ y ∧ ¬z ) ∨
( ¬w ∧ x ∧ y ∧ z ) ∨
( w ∧ x ∧ y ∧ ¬z ) ∨
( w ∧ x ∧ y ∧ z )

b Annotated Karnaugh map


x
z
00 01 11 10

00 0 0 0 0
0 1 5 4

01 1 ? 1 ?
2 3 7 6
y
11 0 0 1 1
10 11 15 14
w
10 ? 0 0 ?
8 9 13 12

and associated, optimised implementation:


r = ( ¬w ∧ y ) ∨
( x ∧ y )

S135. a Reference implementation:


r = ( ¬w ∧ ¬x ∧ ¬y ∧ z ) ∨
( ¬w ∧ x ∧ ¬y ∧ ¬z ) ∨
( ¬w ∧ x ∧ y ∧ ¬z ) ∨
( w ∧ ¬x ∧ ¬y ∧ z ) ∨
( w ∧ x ∧ y ∧ z )

b Annotated Karnaugh map


x
z
00 01 11 10

00 ? 1 ? 1
0 1 5 4

01 ? 0 ? 1
2 3 7 6
y
11 ? ? 1 0
10 11 15 14
w
10 ? 1 ? ?
8 9 13 12

and associated, optimised implementation:

r = ( x ∧ z ) ∨
( ¬w ∧ x ) ∨
( ¬y )

S136. a Reference implementation:

r = ( ¬w ∧ ¬x ∧ y ∧ z ) ∨
( ¬w ∧ x ∧ y ∧ ¬z ) ∨
( ¬w ∧ x ∧ y ∧ z ) ∨
( w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨
( w ∧ x ∧ ¬y ∧ ¬z ) ∨
( w ∧ x ∧ y ∧ z )

b Annotated Karnaugh map

x
z
00 01 11 10

00 ? 0 ? ?
0 1 5 4

01 ? 1 1 1
2 3 7 6
y
11 ? ? 1 ?
10 11 15 14
w
10 1 ? ? 1
8 9 13 12

and associated, optimised implementation:

r = ( y ) ∨
( w )

S137. a Reference implementation:

r = ( ¬w ∧ ¬x ∧ y ∧ ¬z ) ∨
( ¬w ∧ x ∧ ¬y ∧ ¬z ) ∨
( ¬w ∧ x ∧ ¬y ∧ z ) ∨
( ¬w ∧ x ∧ y ∧ z )

b Annotated Karnaugh map

x
z
00 01 11 10

00 ? 0 1 1
0 1 5 4

01 1 0 1 0
2 3 7 6
y
11 ? 0 ? 0
10 11 15 14
w
10 ? 0 ? ?
8 9 13 12

and associated, optimised implementation:

r = ( x ∧ z ) ∨
( x ∧ ¬y ) ∨
( ¬x ∧ ¬z )

B.3 Chapter 3
S138. First, notice that the function does not use y at all, so the function cannot add x to y. All other options are
plausible: to assess which is correct, one could take a brute-force approach and execute it via
#include <stdio.h>

// f is the function given in the question
int main( int argc , char* argv [] ) {
  for( int i = 0; i < 256; i++ ) {
    printf ( "%3d %3d\n", i, f( i ) );
  }

  return 0;
}

Doing so shows that the function increments x. But why is this? Consider some specific examples:

• For x = 14(10) = 00001110(2) , the loop performs 0 iterations then terminates: the value

x | m = 00001110(2) ∨ 00000001(2)
= 00001111(2)
= 15(10)

i.e., x + 1 is returned.

• For x = 7(10) = 00000111(2) , the loop performs 3 iterations

x = x & ~m = 00000111(2) ∧ ¬00000001(2)


= 00000111(2) ∧ 11111110(2)
= 00000110(2)
m = m << 1 = 00000001(2) ≪ 1
= 00000010(2)

x = x & ~m = 00000110(2) ∧ ¬00000010(2)


= 00000110(2) ∧ 11111101(2)
= 00000100(2)
m = m << 1 = 00000010(2) ≪ 1
= 00000100(2)

x = x & ~m = 00000100(2) ∧ ¬00000100(2)


= 00000100(2) ∧ 11111011(2)
= 00000000(2)
m = m << 1 = 00000100(2) ≪ 1
= 00001000(2)

then terminates: the value


x | m = 00000000(2) ∨ 00001000(2)
= 00001000(2)
= 8(10)
i.e., x + 1 is returned.

More generally, the idea is that, as a result of initialising m to 1 and then left-shifting it in each iteration, the
while loop iterates over the consecutive least-significant bits of x which are 1; while doing so, x is assigned the value x & ~m, which clears
the current bit. Once the loop terminates, the value x | m is returned: this sets the current bit of x, which we
know is 0 due to the loop condition, to 1.
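
Based on this description, a plausible reconstruction of the function (a sketch inferred from the traces above, rather than necessarily the verbatim listing) is:

#include <stdint.h>

uint8_t f( uint8_t x ) {
  uint8_t m = 1;

  while( x & m ) {    // loop while the current bit of x is 1 ...
    x = x & ~m;       // ... clearing it as we go,
    m = m << 1;       // ... then moving to the next bit
  }

  return x | m;       // set the current (now 0) bit of x to 1
}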

S139. • x == 0 evaluates to non-zero if x = 0, i.e., every bit of x is 0. x == -1 evaluates to non-zero if x = −1 ≡
2^32 − 1, i.e., every bit of x is 1: although x is unsigned, the signed value −1 will "wrap around" to the
representation
x ↦ −2^31 + 2^30 + · · · + 2^1 + 2^0,
i.e., where every bit of x is 1. ( x == 0 ) || ( x == -1 ) therefore yields the required behaviour.

• !x evaluates to non-zero if x = 0, i.e., every bit of x is 0. !(~x) evaluates to non-zero if x = −1 ≡ 2^32 − 1,


i.e., every bit of x is 1: if every bit of x is 1, ~x means every bit is 0 and thus !(~x) evaluates to non-zero
in exactly that case. !x || !(~x) therefore yields the required behaviour.

• This option is arguably trickier, in the sense there is only one term; it is therefore harder to see how it
captures the two cases. To see why it does, you could apply sort of the opposite reasoning to the above. If
every bit of x is 0, then x = 0 and so x+1 = 1; ( x + 1 ) < 2 evaluates to non-zero in this case. If every bit
of x is 1, then x = −1 ≡ 2^32 − 1 and so x+1 = 0; ( x + 1 ) < 2 evaluates to non-zero in this case. Given x is
unsigned, all other representations it could take imply 1 ≤ x ≤ 2^32 − 2 and so 2 ≤ x + 1 ≤ 2^32 − 1;
in such cases we conclude that ( x + 1 ) < 2 evaluates to zero.
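
As a minimal sketch (assuming a 32-bit unsigned x, and modulo a possible compiler warning about the signed constant in the first case), the three options can be written side-by-side as:

#include <stdint.h>

// each returns non-zero iff. every bit of x is 0 or every bit of x is 1
int test_a( uint32_t x ) { return ( x == 0 ) || ( x == -1 ); }
int test_b( uint32_t x ) { return !x || !(~x); }
int test_c( uint32_t x ) { return ( x + 1 ) < 2; }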

S140. Although one could select the correct option by inspection, the easiest approach is simply to work through each
one. Doing that via a complete trace (e.g., of each intermediate computation, by each full-adder instance) is
overly verbose, so in the below, we capture the pertinent details only (noting the sequence representing the
carry-chain is read left-to-right; this matches the ripple-carry diagram, but might seem odd wrt. the right-to-left
order of digits in the literals):
x = 0000(2) = 0(10)
y = 0000(2) = 0(10)
r = 0000(2) = 0(10)
c = ⟨0, 0, 0, 0, 0⟩
c2 = 0

x = 1100(2) = 12(10)
y = 0001(2) = 1(10)
r = 1101(2) = 13(10)
c = ⟨0, 0, 0, 0, 0⟩
c2 = 0

x = 0100(2) = 4(10)
y = 0100(2) = 4(10)
r = 1000(2) = 8(10)
c = ⟨0, 0, 0, 1, 0⟩
c2 = 0

x = 1011(2) = 11(10)
y = 1001(2) = 9(10)
r = 0100(2) = 4(10)
c = ⟨0, 1, 1, 0, 1⟩
c2 = 1

x = 0110(2) = 6(10)
y = 0101(2) = 5(10)
r = 1011(2) = 11(10)
c = ⟨0, 0, 0, 1, 0⟩
c2 = 0

S141. In general, an overflow condition occurs when the correct result of some arithmetic operation cannot be
represented (meaning the result is then incorrect). Within the context outlined by the question, two instances of
this can occur: either 1) x is positive and y is positive, but r is negative, or 2) x is negative and y is negative, but
r is positive. In both instances, the sign of r is incorrect because the correct value of r has too large a magnitude
to represent in 8 bits.
In two’s-complement the MSB indicates the sign, meaning that x7 , y7 , and r7 indicate whether x, y, and r are
positive or negative respectively. Using this information we can translate each condition above into a Boolean
expression, i.e.,
x7 ∧ y7 ∧ ¬r7

indicates that x is negative, y is negative, and r is positive, while

¬x7 ∧ ¬y7 ∧ r7

indicates that x is positive, y is positive, and r is negative. As such, simply OR'ing these expressions together
produces the flag we require, i.e.,

f = (x7 ∧ y7 ∧ ¬r7 ) ∨ (¬x7 ∧ ¬y7 ∧ r7 ).
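
As a minimal C sketch (assuming the 8-bit values are each held in a uint8_t, with r the low 8 bits of the sum), the same flag can be computed as:

#include <stdint.h>

int overflow( uint8_t x, uint8_t y, uint8_t r ) {
  int x7 = ( x >> 7 ) & 1;  // sign bit of x
  int y7 = ( y >> 7 ) & 1;  // sign bit of y
  int r7 = ( r >> 7 ) & 1;  // sign bit of r

  return ( x7 & y7 & !r7 ) | ( !x7 & !y7 & r7 );
}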

S142. O(n) implies the critical path is proportional to the number of bits (including some constant factor) required
to represent each of the operands. The reason is the carry chain which runs through all n full-adders in the
design: each i-th full-adder produces a carry-out used as a carry-into the (i + 1)-th full-adder. This means each
i-th bit of the result depends on, and cannot be computed before, all j-th bits for 0 ≤ j < i.
An alternative, carry look-ahead design separates computation of carries from the full-adder cells them-
selves; this allows an organisation whose critical path can be described as O(log n), although the number of
logic gates required is less attractive.

S143. The relationship


x is a power-of-two ≡ (x ∧ (x − 1)) = 0
performs the test which can be written as the C expression

( x & ( x - 1 ) ) == 0

This works because if x is an exact power-of-two then x − 1 sets all bits less-significant than the n-th to one; when
this is AND'ed with x (which only has the n-th bit set to one) the result is zero. If x is not an exact power-of-two
then there will be bits in x other than the n-th set to one; in this case x − 1 only sets bits less-significant than the
least-significant one-bit, and hence there are others left over which, when AND'ed with x, result in a non-zero result.
Note that the expression fails, i.e., it is non-zero, for x = 0 but this is allowed since the question says x ≠ 0.
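For example, x = 8(10) = 1000(2) gives x − 1 = 0111(2) and 1000(2) ∧ 0111(2) = 0000(2) , whereas x = 12(10) = 1100(2) gives x − 1 = 1011(2) and 1100(2) ∧ 1011(2) = 1000(2) ≠ 0.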

S144. x is of type char, so is therefore represented using two’s-complement in 8 bits; values for such a representation
range between 2^(n−1) − 1 = 2^(8−1) − 1 = 127 and −2^(n−1) = −2^(8−1) = −128 inclusive. This means that by

a decrementing x we get the value before 127, which is 126, or

b incrementing x we get the value after 127, which is −128: the reason for this is that the representation of
127 is 01111111(2) , but the next value 10000000(2) is the largest negative value possible. That is, there has
been an overflow with the result “wrapping around”.

S145. The expression computes the comparison 0 < x. This is because if x < 0 then x3 = 1, and if x = 0 then
x3 = x2 = x1 = x0 = 0. Therefore, x > 0 if both x3 = 0 and one of xi ≠ 0 for i ∈ {2, 1, 0}. Strictly speaking, it tests
whether 0 < x ≤ 7 but the upper bound is implied by the representation of x: it cannot take a value greater
than 7 by definition.

S146.
The initial temptation is to use six adder components to compute

r=7·x=x+x+x+x+x+x+x

where the size of inputs and outputs increases as one progresses through the computation; a considered
approach might utilise carry save adders to reduce the critical path associated with the multiple summands,
but here we consider ripple-carry designs only.
A more efficient alternative would use three adders to compute

r = 7 · x = 4 · x + 2 · x + 1 · x = 2^2 · x + 2^1 · x + 2^0 · x

noting that the multiplications by powers-of-two are “free” since they can be achieved by simply relabelling
bits rather than computation. This approach can be further refined to compute

r = 7 · x = 8 · x − 1 · x = 2^3 · x − 2^0 · x

using just one adder (assuming addition and subtraction can be realised using the same component). Clearly
this will produce the shortest critical path, and relates to the following diagram:
+------------------+ +---+
x -- n-bit -->| 3-bit left - shift |-- (n+3) - bit -->| |
+------------------+ |sub|-- (n+4) - bit --> r
x -- n-bit -------------------------------------->| |
+---+
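
In C, for example, the same trick is the one-line sketch below (with the result widened to accommodate the extra output bits shown in the diagram):

#include <stdint.h>

// r = 7 * x, computed as 2^3 * x - 2^0 * x
uint32_t mul7( uint16_t x ) {
  return ( ( uint32_t )( x ) << 3 ) - ( uint32_t )( x );
}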

S147. The truth table for this operation is as follows:

x1 x0 y1 y0 r3 r2 r1 r0
0 0 0 0 0 0 0 0
0 0 0 1 0 0 0 0
0 0 1 0 0 0 0 0
0 0 1 1 0 0 0 0
0 1 0 0 0 0 0 0
0 1 0 1 0 0 0 1
0 1 1 0 0 0 1 0
0 1 1 1 0 0 1 1
1 0 0 0 0 0 0 0
1 0 0 1 0 0 1 0
1 0 1 0 0 1 0 0
1 0 1 1 0 1 1 0
1 1 0 0 0 0 0 0
1 1 0 1 0 0 1 1
1 1 1 0 0 1 1 0
1 1 1 1 1 0 0 1

Using four Karnaugh maps to produce each ri is overkill, since we can easily derive expressions for r0 and r3
by inspection. Therefore, transcribing the truth table into suitable Karnaugh maps for just r1 and r2 gives

x1 x1
x0 x0
r1 00 01 11 10 r2 00 01 11 10

00 0 0 0 0 00 0 0 0 0
0 1 5 4 0 1 5 4

01 0 0 1 1 01 0 0 0 0
2 3 7 6 2 3 7 6
y0 y0
11 0 1 0 1 11 0 0 0 1
10 11 15 14 10 11 15 14
y1 y1
10 0 1 1 0 10 0 0 1 1
8 9 13 12 8 9 13 12

from which we can derive the SoP expressions

r0 = ( x0 ∧ y0 )

r1 = ( x1 ∧ ¬ y1 ∧ y0 ) ∨
( x0 ∧ y1 ∧ ¬ y0 ) ∨
( ¬ x1 ∧ x0 ∧ y1 ) ∨
( x1 ∧ ¬ x0 ∧ y0 )

r2 = ( x1 ∧ ¬ x0 ∧ y1 ) ∨
( x1 ∧ y1 ∧ ¬ y0 )

r3 = ( x1 ∧ x0 ∧ y1 ∧ y0 )

S148. a Clearly we can implement ≠ by negating the result of = and likewise for < and ≥, and > and ≤.
Furthermore, we can build ≥ from > and =, and ≤ from < and =. So essentially we only need two
comparisons, say = and < to be able to compute the rest so long as we have the logic operations as well.
The choice of which three is simply a matter of which ones you want to go faster: the ones built from a
combination of other comparison and logic instructions will take longer to execute. One might take the
approach of looking at C programs and selecting the set most used. For example = and < are used a lot
to program typical loops; one might select them for this reason.

b You can be as fancy as you want with any optimisations or special cases, for example checking for
multiplication by zero, one or a power-of-two might be a good idea. But basically, the easiest way to do
this is as follows:

uint16_t mul( uint16_t x, uint16_t y ) {


switch ( x ) {
case 0 : return 0;
case 1 : return y;
case 2 : return y << 1;
case 4 : return y << 2;
case 8 : return y << 3;
case 16 : return y << 4;
}
switch ( y ) {
case 0 : return 0;
case 1 : return x;
case 2 : return x << 1;
case 4 : return x << 2;
case 8 : return x << 3;
case 16 : return x << 4;
}

uint16_t t = 0;

for( int i = 15; i >= 0; i-- ) {


t = t << 1;

if( ( y >> i ) & 1 ) {


t = t + x;
}
}

return t;
}

c A basic implementation might look like the following:

int H( uint16_t x ) {
int t = 0;

for( int i = 0; i < 16; i++ ) {


if( ( x >> i ) & 1 ) {
t = t + 1;
}
}

return t;
}

but this has a number of drawbacks. First, the overhead of operating the loop is quite high in comparison
to the content; for example the loop body needs only a few instructions, while it takes nearly as many
again to test and increment i during each iteration. Second, the number of branches in the code means
that pipelined processors might not execute them efficiently at all. An improvement is to use some form
of divide-and-conquer approach where we split the problem into 2-bit then 4-bit chunks and so on. The
result might look like:

int H( uint16_t x ) {
x = ( x & 0 x5555 ) + ( ( x >> 1 ) & 0 x5555 );
x = ( x & 0 x3333 ) + ( ( x >> 2 ) & 0 x3333 );
x = ( x & 0 x0F0F ) + ( ( x >> 4 ) & 0 x0F0F );
x = ( x & 0 x00FF ) + ( ( x >> 8 ) & 0 x00FF );

return ( int )( x );
}

S149. First, note that the result via a naive method would be

r2 = x1 · y1
r1 = x1 · y0 + x0 · y1
r0 = x0 · y0 .

However, we can write down three intermediate values using only three multiplications as

t2 = x1 · y1
t1 = (x0 + x1 ) · (y0 + y1 )
t0 = x0 · y0 .

The original result can then be expressed in terms of these intermediate values via

r2 = t2 = x1 · y1
r1 = t1 − t0 − t2 = x0 · y0 + x0 · y1 + x1 · y0 + x1 · y1 − x0 · y0 − x1 · y1
= x1 · y0 + x0 · y1
r0 = t0 = x0 · y0 .

So roughly speaking, overall we use three (n/2)-bit multiplications and four (n/2)-bit additions.
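
A minimal C sketch of the same idea, using 16-bit limbs so that x = x1 · 2^16 + x0 and y = y1 · 2^16 + y0 (the function name is illustrative only; the caller recombines r = r2 · 2^32 + r1 · 2^16 + r0), is:

#include <stdint.h>

void karatsuba( uint16_t x0, uint16_t x1, uint16_t y0, uint16_t y1,
                uint64_t* r0, uint64_t* r1, uint64_t* r2 ) {
  uint64_t t2 = ( uint64_t )( x1 ) * y1;
  uint64_t t1 = ( uint64_t )( x0 + x1 ) * ( y0 + y1 );  // (x0 + x1) and (y0 + y1) fit easily in 64 bits
  uint64_t t0 = ( uint64_t )( x0 ) * y0;

  *r2 = t2;
  *r1 = t1 - t0 - t2;  // = x1 * y0 + x0 * y1
  *r0 = t0;
}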

S150. a In binary, the addition we are looking at is

10(10) = 1010(2)
12(10) = 1100(2) +
10110(2)

where 10110(2) = 22(10) . In 4 bits this value is 0110(2) = 6(10) however, which is wrong.

b A design for the 4-bit clamped adder looks like this:

+------+ +------+ +------+ +------+


+-----|co ci|<----------|co ci|<----------|co ci|<----------|co ci|<- 0
| | x|<- x_3 | x|<- x_2 | x|<- x_1 | x|<- x_0
| +--|s y|<- y_3 +--|s y|<- y_2 +--|s y|<- y_1 +--|s y|<- y_0
| | +------+ | +------+ | +------+ | +------+
| | | | |
| | +------+ | +------+ | +------+ | +------+
| +->| OR |-> r_3 +->| OR |-> r_2 +->| OR |-> r_1 +->| OR |-> r_0
| +------+ +------+ +------+ +------+
| | | | |
+---------+------------------+------------------+------------------+

Essentially the idea is that if a carry-out occurs from the most-significant adder, this turns all the output
bits to 1 via the additional OR gates. That is, if the carry-out occurs then we get 1111(2) = 15(10) as the
result, i.e., the largest 4-bit result possible.

S151. Since we know nothing about N, there is no obvious short-cut to performing the modular reduction after the
multiplication. Instead, the most simple way to approach the design is to recall that

x · y = x + x + · · · + x + x    (y copies)

So to compute x · y (mod N), we just have to make sure that each of the additions is modulo N; then we can
use whatever method we want. A circuit for modular addition is actually quite simple:

+-----+ +-----+
x ->| |---+------------->| |
| add | | | sub |---> r
y ->| | | +-->| |
+-----+ | | +-----+
v |
+-----+ +-----+
| | | |
N ->| lth |--->| mux |
| | | |
+-----+ +-----+
^ ^
| |
N 0

In short, we add x and y together, and then compare the result t with N: if t is smaller, we select 0 as the output
from the multiplexer otherwise we select N. Then, we subtract the value we selected from t. The end result is
that we get x + y − 0 = x + y (mod N) if x + y < N, and x + y − N = x + y (mod N) if x + y ≥ N.
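
A minimal C sketch of this modular adder (assuming 0 ≤ x, y < N, with N small enough that x + y cannot overflow the type) is:

#include <stdint.h>

// compute x + y (mod N), assuming 0 <= x, y < N < 2^31
uint32_t mod_add( uint32_t x, uint32_t y, uint32_t N ) {
  uint32_t t = x + y;            // add
  return ( t < N ) ? t : t - N;  // conditionally subtract N
}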
Recall that an 8-bit, bit-serial multiplier would compute the product x · y as follows:

Input: An 8-bit multiplicand x, and 8-bit multiplier y


Output: The product x · y
1 t←0
2 for i = 7 downto 0 do
3 t←t+t
4 if yi = 1 then
5 t←t+x
6 end
7 end
8 return t

Armed with our modular adder circuit, we can rewrite this as


Input: An 8-bit modulus N, a multiplier 0 ≤ x < N and multiplicand 0 ≤ y < N
Output: The product x · y (mod N)
1 t←0
2 for i = 7 downto 0 do
3 t ← t + t (mod N)
4 if yi = 1 then
5 t ← t + x (mod N)
6 end
7 end
8 return t

which then simply demands eight iterations, under control of a clock, over the circuit

+---------+ +---------+ +---------+


N ->| | N ->| | N ->| |
t ->| mod add | x ->| mod add |------->| mux |---> t'
t ->| |---+-->| | +--->| |
+---------+ | +---------+ | +---------+
| | ^
+-----------------+ |
y_i

Notice that we first perform the operation t + t (mod N), then use a multiplexer to decide if we take t + t
(mod N) or t + t + x (mod N) as the next value of t. So each iterated use of the circuit represents an iteration
of the algorithm loop. Of course, one could construct a combinatorial multiplier using the same approach, i.e.,
replacing any standard adder circuits with modular alternatives.
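
Translated into C, the algorithm is the following minimal sketch (reusing the hedged mod_add from above):

// compute x * y (mod N) for an 8-bit y, via eight double-and-add iterations
uint32_t mod_mul( uint32_t x, uint8_t y, uint32_t N ) {
  uint32_t t = 0;

  for( int i = 7; i >= 0; i-- ) {
    t = mod_add( t, t, N );      // t <- t + t (mod N)

    if( ( y >> i ) & 1 ) {
      t = mod_add( t, x, N );    // t <- t + x (mod N)
    }
  }

  return t;
}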

B.4 Chapter 4
S152. We can deal with the statements one-by-one:

• DRAM cells are based on use of a capacitor, so only use one transistor (to access the capacitor), while an
SRAM cell uses only transistors: a 6T SRAM cell design requires six for example, so certainly more than
one.
• Both SRAM and DRAM cells store one bit of information: larger memories are constructed by replicating
the cells, but it is not true that one or other cell can store more information.
• Their transistor-based design makes the access latency of SRAM cells low, i.e., they can be accessed
quickly. In contrast, the need to (dis)charge the capacitor limits a DRAM cell in this respect; it is a fair
assumption that DRAM cell access latency will be greater as a result.
• Their need to retain charge in the capacitor means DRAM cells need to be refreshed, since over time that
charge will naturally leak (st. the stored value will “degrade” in some sense).

S153. Various clues should (in combination) be strong enough to hint that

α 7→ row address buffer


β 7 → column address buffer
γ 7 → row address decoder
δ 7 → column address decoder

is the correct answer. For example, note that:

• α is provided input from Ai (the address pins), and is controlled (indirectly) by RAS: this is the row
address strobe. As such, this is likely to be the row address buffer.
• β is provided input from Ai (the address pins), and is controlled (indirectly) by CAS: this is the column
address strobe. As such, this is likely to be the column address buffer.
• γ is taking the content of α and controlling signals on the left-hand side (horizontal orientation) of the
memory array, suggesting it computes the row address: it is likely to be the row address decoder.
• δ is taking the content of β and controlling signals on the top side (vertical orientation) of the memory
array, suggesting it computes the column address: it is likely to be the column address decoder.

S154. SRAMs have a lower access latency in part because of their design: using only transistors means their
operation is very fast. Therefore, the first statement is true. On the other hand, SRAMs are larger than
DRAMs since their design includes more components (typically six or so transistors versus one transistor and a
capacitor); as a result, their density (i.e., how many one can fit into unit area) is lower, and the second statement
is true as well. The third statement is false, and basically nonsense: the access latency should not depend on
the order. Finally, the fourth statement is also false. Rather, a stored program or von Neumann architecture
holds both instructions and data in the same memory: a Harvard architecture segregates them into separate
memories.

S155. There are 2^16 addressable bytes, meaning a 16-bit address needs to be supplied. However, in contrast to an
SRAM memory, a DRAM memory will normally use a 2-step (or more, potentially) approach: half the address
is supplied by each of the steps (under control of row and column address strobe signals), which requires only
half the number of address pins.
The memory stores bytes, i.e., 8-bit elements, so we expect there to be 8 duplicated arrays each consisting of
65536 cells. Overall, there will be 8 · 65536 = 524288 cells. So, in summary, an answer of 8 address pins, and
524288 cells is correct; the alternative of 16 address pins, and 524288 cells is not wrong per se, but certainly
less likely in practice.

S156. We want a 32KiB memory, i.e., 32 · 1024 = 32768 addressable words each of 8 bits (or 1 byte). The memory devices
we have use a 4-bit data bus and 12-bit address bus: this implies that each one has 2^12 = 4096 addressable
words each of 4 bits. Two such devices could be combined to support an 8-bit word size: we simply take the
LSBs of each byte from one device, and the MSBs from the other. Therefore, to construct the memory required
we need
(8 / 4) · (32768 / 4096) = 2 · 8 = 16
devices.
devices.

S157. a A 1kB, byte-addressable SRAM would usually require n = 10 address wires. The n-bit A means addresses
between 0 and 2^n − 1 = 2^10 − 1 = 1023 are accessible.
Consider a small(er) example of an SRAM where n = 3: the addresses
A2 A1 A0 A
0 0 0 000(2) ≡ 0(10)
0 0 1 001(2) ≡ 1(10)
0 1 0 010(2) ≡ 2(10)
0 1 1 011(2) ≡ 3(10)
1 0 0 100(2) ≡ 4(10)
1 0 1 101(2) ≡ 5(10)
1 1 0 110(2) ≡ 6(10)
1 1 1 111(2) ≡ 7(10)
are accessible. Now imagine the m-th address wire is misconnected where m = 1, meaning A1 = 0: this
yields
A2 A1 A0 A
0 0 0 000(2) ≡ 0(10)
0 0 1 001(2) ≡ 1(10)
0 0 0 000(2) ≡ 0(10)
0 0 1 001(2) ≡ 1(10)
1 0 0 100(2) ≡ 4(10)
1 0 1 101(2) ≡ 5(10)
1 0 0 100(2) ≡ 4(10)
1 0 1 101(2) ≡ 5(10)

so now only addresses 0(10) , 1(10) , 4(10) , and 5(10) are accessible. Put another way, 1/2 of the originally
accessible addresses will remain accessible. The same fact applies for any m, so for the full memory here
(n = 10, i.e., 2^10 = 1024 addresses) we conclude that 1024/2 = 512 addresses are accessible.

b A 1kB, byte-addressable DRAM would usually require n = 5 address wires. The n-bit A means addresses
between 0 and 2^(2·n) − 1 = 2^(2·5) − 1 = 2^10 − 1 = 1023 are accessible, because the address is communicated via
A in two steps: each communicates n bits, so produces a (2 · n)-bit address overall.
Consider a small(er) example of a DRAM where n = 2: if A_i^j denotes the i-th address wire as used in the
j-th step, the addresses
A_1^1 A_0^1 A_1^0 A_0^0 A
0 0 0 0 0000(2) ≡ 0(10)
0 0 0 1 0001(2) ≡ 1(10)
0 0 1 0 0010(2) ≡ 2(10)
0 0 1 1 0011(2) ≡ 3(10)
0 1 0 0 0100(2) ≡ 4(10)
0 1 0 1 0101(2) ≡ 5(10)
0 1 1 0 0110(2) ≡ 6(10)
0 1 1 1 0111(2) ≡ 7(10)
1 0 0 0 1000(2) ≡ 8(10)
1 0 0 1 1001(2) ≡ 9(10)
1 0 1 0 1010(2) ≡ 10(10)
1 0 1 1 1011(2) ≡ 11(10)
1 1 0 0 1100(2) ≡ 12(10)
1 1 0 1 1101(2) ≡ 13(10)
1 1 1 0 1110(2) ≡ 14(10)
1 1 1 1 1111(2) ≡ 15(10)
are accessible. Now imagine the m-th address wire is misconnected where m = 1, meaning A1 = 0: this
yields
A_1^1 A_0^1 A_1^0 A_0^0 A
0 0 0 0 0000(2) ≡ 0(10)
0 0 0 1 0001(2) ≡ 1(10)
0 0 0 0 0000(2) ≡ 0(10)
0 0 0 1 0001(2) ≡ 1(10)
0 1 0 0 0100(2) ≡ 4(10)
0 1 0 1 0101(2) ≡ 5(10)
0 1 0 0 0100(2) ≡ 4(10)
0 1 0 1 0101(2) ≡ 5(10)
0 0 0 0 0000(2) ≡ 0(10)
0 0 0 1 0001(2) ≡ 1(10)
0 0 0 0 0000(2) ≡ 0(10)
0 0 0 1 0001(2) ≡ 1(10)
0 1 0 0 0100(2) ≡ 4(10)
0 1 0 1 0101(2) ≡ 5(10)
0 1 0 0 0100(2) ≡ 4(10)
0 1 0 1 0101(2) ≡ 5(10)
so now only addresses 0(10) , 1(10) , 4(10) , and 5(10) are accessible.
Put another way, 1/4 of the originally accessible addresses will remain accessible. The same fact applies
for any m, so for the full memory here (2 · n = 10 address bits, i.e., 1024 addresses) we conclude that 1024/4 = 256 addresses are accessible.

S158. By checking all possible addresses

0 ≤ A < 2^18 = 262144(10) = 40000(16) ,

against the enable signals, we find that

en0 =1 for A ∈ { 00000(16) , . . . , 07FFF(16) } ⇒ MEM0 is enabled


en1 =1 for A ∈ { 08000(16) , . . . , 0BFFF(16) } ⇒ MEM1 is enabled
en2 =1 for A ∈ { 10000(16) , . . . , 1FFFF(16) } ⇒ MEM2 is enabled
en3 =1 for A ∈ { 3FFE0(16) , . . . , 3FFFF(16) } ⇒ MEM3 is enabled

Since
A = 48350(10) = 0BCDE(16) ,
this address therefore maps to memory device MEM1 because
08000(16) ≤ 0BCDE(16) ≤ 0BFFF(16) .
Of course, doing a similar search by hand is very time consuming; a manual solution would therefore use the
form of each eni as a short-cut. For example, we know
en0 = ¬A17 ∧ ¬A16 ∧ ¬A15 ,
i.e., en0 = 1 when the 3 MSBs of A are 0. This fact leads to the range
A ∈ {00000(16) , . . . , 07FFF(16) }
fairly directly, because it captures all 18-bit values whose 3 MSBs are 0, i.e.,
000000000000000000(2) = 00000(16)
000000000000000001(2) = 00001(16)
..
.
000111111111111111(2) = 07FFF(16)
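
For en0 , for instance, the short-cut amounts to the (hedged) C expression below, where the variable A is assumed to hold the 18-bit address; the other enables can be handled analogously from their own equations:

// en0 = 1 when the 3 MSBs of the 18-bit address A, i.e., A_17, A_16 and A_15, are all 0
int en0 = ( ( A >> 15 ) & 0x7 ) == 0;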

S159. The main components to include in such a diagram are


a the four internal transistors that form a loop of two NOT gates, and
b the two external transistors that allow access to the loop via the bit- and word-lines.
The diagram is as follows:
wl
|
v
+---------------------+---------------------+
| |
| ----+------------------+----- V_dd |
| | | |
| v v |
| +--------+ +--------+ |
| | P-type |--+ +--| P-type | |
| +--------+ | | +--------+ v
| | | | | +--------+
v | +---|------+--------| N-type |<-- ~bl
+--------+ | | | | +--------+
| N-type |------+-------|---+ |
+--------+ | | | |
+--------+ | | +--------+
| N-type |--+ +--| N-type |
+--------+ +--------+
^ ^
| |
----+------------------+----- VSS

S160. a Two main answers are clear. First, use of the DRAM device could imply a somewhat more involved, 2-step
access algorithm: it is common to latch the row and column buffers (under control of two dedicated row
and column strobes) in two steps, hence allowing half the number of pins to address the same number of
cells. Second, the DRAM cells need to be refreshed periodically since their content will decay. Typically
a mechanism to do this might be built into the device, but if not then the system itself will need to be
responsible for doing so.
b The obvious reason an SRAM device might have a lower access latency is because the individual cells
have a lower access latency: since SRAM cells are constructed from transistors, they can be accessed (read
from or written to) more quickly than a capacitor-based DRAM cell (which takes longer to charge and
discharge).

S161. a As the number of cells grows larger, providing enough address pins to identify each physical cell can
become impractical. One way to combat this problem is to multiplex the address pins; roughly this means
using less pins in more steps, e.g., n′ /2 pins in two steps rather than n′ pins in one step. Thus, under
control of the row and column strobes, two steps will see the (n′ /2)-bit row and column addresses latched
into the row and column buffer: once latched, the combined content forms a usable n′ -bit address. So in
short, the buffers are required to retain the row and column addresses during this process.

b Once the row and column buffers are latched with the address, the device is ready to access the identified
cell. However, depending on the geometry, i.e., number of rows and columns, a translation needs to
be made: this is the task of the row and column decoders. Essentially they implement the translation
between logical address and physical cell, activating said cell to perform the required operation (which
is either a read or a write).

B.5 Chapter 5
S162. Throughout the following, keep in mind that three main component groups can be identified in the implemen-
tation: read from bottom-to-top, these are

• an input register (bottom),

• some combinatorial logic (middle), and

• an output register (top).

We know this is an FSM, so we expect the input register to hold the current state and the combinatorial logic to
compute both the transition and output functions. Specifically, the input register and x are provided as input
to combinatorial logic (in the middle-left and -center) that represents the transition function. It computes the
next state, then stored in the output register; the rest of this logic (in the middle-right) clearly represents the
output function, since it computes r.

a i Saying a signal is digital is the same as saying it takes the values 0 and 1 only; for a clock signal, this
is the same as saying its form is a perfect square wave.
In practice this is difficult to achieve since transitions between 0 and 1 cannot be perfectly instanta-
neous. This implies each edge has a slope, however shallow this is. Even so, the same issue is true
for all digital logic components: if the inputs to an AND gate are neither 0 or 1 (or their associated
voltage levels), the output is undefined. So it is also fair to say this is a requirement for Φ1 and Φ2 ,
at least as far as is practical.
ii Φ1 and Φ2 are said to be non-overlapping in the sense a positive level on one always occurs at the
same time as a negative level on the other.
If this were not true, e.g., Φ1 = Φ2 , then a “loop” would form during the overlap: the output register
would be updated with whatever was computed by the combinatorial logic, which is fed by the
input register also being updated at the same time by the output register. This would likely result in
a malfunction of some sort: the input register could not settle into a stable state, for example.
iii To gate any signal, Φ1 and Φ2 included, means to (conditionally) disable them. This is typically
realised by adding extra logic, e.g., an AND gate, so the clock signal can be forced to 0.
This might be useful; it allows one to disable latch updates and so “pause” the FSM (e.g., to save
power when idle). However, doing so is not a requirement and not evident in this implementation.
iv In general, the concept of skew describes a situation where the clock signal arrives at two components
at different times; with a 2-phase clock, we might also find cases where Φ1 and Φ2 can arrive at the
same component at different times.
This is clearly undesirable, since the clock is meant to synchronise the component. If they were not
synchronised then malfunction is the likely result: one latch might be updated at a different time,
and hence with an unrelated value, than another, for example.
v The duty cycle of a clock (signal) is the percentage of each clock period in which it has a positive
level. For example, saying that Φ1 has a duty cycle of 33% is the same as saying Φ1 = 1 for a third of
the time (and hence Φ1 = 0 for two thirds of the time).
Here, there is no reason the duty cycle of Φ1 and Φ2 must be 33%. If it were 40%, for example, the
implementation would still function correctly. Assuming Φ1 and Φ2 have the same form, then of
course it must be true their duty cycles are less than 50% otherwise they would have to overlap.
Other than that, however, getting closer to 50% just means less separation between their positive
levels.

b This is not a trick question per se, but the correct answer is that the register might hold any 2-bit value
when powered-on. Although the register must settle into some state, it is not clear how we could predict
what this will be: the stored value is basically random, or more precisely determined by the physics of the
underlying implementation and fabrication process.

As an aside, this FSM has a related, unattractive design feature: there is no reset input. This means
there is no way to enforce a start state, i.e., whatever value is held in the register at power-on is used as
the start state: the only way to alter this, and hence make the FSM function as required, is to power-cycle
the implementation and hope the initial stored value is the required start state!
c Partly as a result of the multiple-choice format, this question might seem odd. Exactly the same concepts
are involved, but the steps are the opposite way around: rather than derive an implementation from a
given specification, it asks you to reverse engineer a specification from a given implementation.
Each of the registers comprises 2 D-type latches, so the FSM can be in at most 2² = 4 possible states.
Denoting the current (resp. next) state Q = ⟨Q0 , Q1 ⟩ (resp. Q′ = ⟨Q′0 , Q′1 ⟩), we can write an expression for
the next state and output in terms of the current state and input: this basically just means translating the
logic gate symbols into a mathematical form. That is, we can write

Q′1 = (Q1 ∧ Q0 ) ∨ (x ∧ Q1 ) ∨ (x ∧ Q0 )
Q′0 = (Q1 ∧ Q0 ) ∨ (x ∧ ¬Q0 )
r = (Q1 ∧ Q0 ) ∨ (x ∧ Q1 )

By enumerating the possible values of x, Q0 and Q1 we find

x Q1 Q0 Q′1 Q′0 r
0 0 0 0 0 0
0 0 1 0 0 0
0 1 0 0 0 0
0 1 1 1 1 1
1 0 0 0 1 0
1 0 1 1 0 0
1 1 0 1 1 1
1 1 1 1 1 1

i.e., we reverse engineer the transition and output functions, expressed as a truth table. For example, if
the current state is Q = ⟨0, 0⟩ and the input is x = 1 then we can see the next state will be Q′ = ⟨1, 0⟩ and
the output will be r = 0.
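
As a sanity check, the enumeration can be mechanised; the following C sketch (hypothetical, not part of the
original solution) evaluates the three expressions above for every combination of x, Q1 and Q0 , and prints
the truth table:

#include <stdio.h>

int main( void ) {
  // enumerate every combination of x, Q1 and Q0, then evaluate the
  // expressions for Q'1, Q'0 and r given above
  for( int x = 0; x <= 1; x++ ) {
    for( int Q1 = 0; Q1 <= 1; Q1++ ) {
      for( int Q0 = 0; Q0 <= 1; Q0++ ) {
        int N1 = ( Q1 & Q0 ) | ( x & Q1 ) | ( x & Q0 ); // Q'1
        int N0 = ( Q1 & Q0 ) | ( x & !Q0 );             // Q'0
        int r  = ( Q1 & Q0 ) | ( x & Q1 );              // r
        printf( "%d %d %d -> %d %d %d\n", x, Q1, Q0, N1, N0, r );
      }
    }
  }
  return 0;
}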
The truth table encodes the same information as a diagrammatic alternative. The only difference is the
use of a concrete representation, rather than an abstract label for each state. We need the former, because
of course the implementation stores and computes Boolean values: it cannot deal with what a label such as S0
or S3 means unless we give that label a value. So imagine we make such a (reverse) assignment, namely

⟨0, 0⟩ ↦ S0
⟨1, 0⟩ ↦ S1
⟨0, 1⟩ ↦ S2
⟨1, 1⟩ ↦ S3

Now we can say, for example that if the current state is S0 and the input is x = 1 then the next state will
be S1 and the output will be r = 0. This makes drawing the diagrammatic alternative a little easier: if
we draw 4 nodes for the four states, we join them with edges based on rows of the truth table. The end
result, and correct answer, is

start --> S0

S0 --(x=1)--> S1     S1 --(x=1)--> S2     S2 --(x=1)--> S3     S3 --(x=1)--> S3
S0 --(x=0)--> S0     S1 --(x=0)--> S0     S2 --(x=0)--> S0     S3 --(x=0)--> S3

d The central difference between Mealy- and Moore-type FSMs stems from how the output function is
defined. In the former, the output is a function of the current state and input; in the latter, only the current
state is relevant. For a set of states S and input and output alphabets Σ and Γ, this means for a Mealy-type
FSM we have
ω:S×Σ→Γ
whereas for a Moore-type FSM we have
ω : S → Γ.

For this FSM, we already know from the previous question that the output is described by

r = (Q1 ∧ Q0 ) ∨ (x ∧ Q1 ).

As such, it should be obvious r = ω(Q, x) is a function of both Q (the current state) and x (the input): this
is therefore a Mealy-type FSM.

e The behaviour of this FSM can be described as repeated iteration over two steps under control of the
clock. That is, it repeatedly does

• Step #1:
– the combinatorial logic computes the next state Q′ = δ(Q, x) and output r = ω(Q, x), and
– the output register latches Q′ .
Note that the critical path of this step is that from the Q output of the input register to the Q
output of the output register.
• Step #2:
– the input register latches Q′ as Q.
Note that the critical path of this step is that from the Q output of the output register to the Q
output of the input register.

f For example, the first step occurs during the period when Φ1 = 1 and the second when Φ2 = 1. Within
every clock period, i.e., within the “time limit” represented by ρ, both steps must be completed. Therefore,
we can say
ρ ≥ (Tlogic + Tlatch ) + (Tlatch )

where Tlatch and Tlogic are the critical paths associated with a D-type latch and the combinatorial logic
respectively.
Note the critical path runs through the middle-left or middle-center of the combinatorial logic: although
the former includes a NOT gate, the latter includes a 3- versus 2-input OR gate. Either way the delay is
50ns, so we can write
ρ ≥ 50 + 60 + 60ns
≥ 170ns
then compute the maximum clock frequency as

fmax = 1/ρ
= 1/170ns
≃ 5.9MHz
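
As a hypothetical cross-check (the variable names are illustrative only), this tiny C program redoes the
arithmetic above:

#include <stdio.h>

int main( void ) {
  double T_logic = 50e-9, T_latch = 60e-9;             // delays assumed above
  double rho     = ( T_logic + T_latch ) + T_latch;    // bound on clock period
  printf( "f_max = %.2f MHz\n", ( 1.0 / rho ) / 1e6 ); // prints f_max = 5.88 MHz
  return 0;
}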

g From the definition of the transition function (above), it should be clear the FSM will progress from left-
to-right as consecutive inputs x = 1 are encountered. Moreover, encountering an input of x = 0 means
restarting in state S0 , and when eventually the FSM reaches state S3 it stays in that state (whether x = 1
or x = 0). So it basically counts the number of consecutive times x = 1 until that count is 3 (if it is in state
Si , then the count is i). This already provides the correct answer, but is further confirmed by inspecting
the output function: r = 1 when the FSM is in state S3 , i.e., when the count is 3.

S163. a Before t0 , we can see that a pulse occurs on rst at the same time as Φ2 = 1; this acts as a reset, storing s (as a result
of the multiplexers) into the top register. Then, at t0 we find that Φ1 = 1: during this period, the design
stores into the bottom register the value provided by the top register (which, at that point, is fixed since Φ2 = 0).
As such, at t0 we expect the bottom register to store s and hence r to be the MSB of s, i.e., r = s7 = 1.

b At t1 the design has performed one cycle relative to t0 : the value stored in the bottom register at t0 is
updated by the middle of the design, then stored in the top register, and finally stored back in the bottom
register (ready for the next cycle). The middle of the design is fairly simple. Ignoring the less-significant
end since this does not impact r (yet), it basically just shifts the bits toward the more-significant end. At
t1 , we therefore expect the bottom register to hold a value st. r = s6 = 0.

c This design is a Linear Feedback Shift Register (LFSR); such a design might be used to support a variety of
use-cases, with a common example being the generation of (pseudo-)random bits. As the name suggests,
an LFSR is essentially an n-bit shift register. After initialising (or seeding) the register state with s,
successive updates are performed; each such update a) shifts-out an output bit (wlog. the MSB), which
forms the LFSR output, and b) shifts-in an input bit (wlog. the LSB), which is computed using a linear

function of the state. A set T captures the tap bits, which specify the function of x used to compute the
input bit; given n, T is selected to maximise the period of the LFSR, noting that x = 0 should be disallowed
to avoid trivial behaviour.
Both Fibonacci- and Galois-form LFSR designs are possible; in this case, we have an example of the
former, with n = 8 and T = {3, 4, 5, 7}. Given a state x, the update process, yielding an output bit r and a
next state x′ , can be formalised as
r  = x7
x′ = ( x ≪ 1 ) ∥ ( ⊕i∈T xi )
   = ( x6 ∥ x5 ∥ · · · ∥ x0 ) ∥ ( x3 ⊕ x4 ⊕ x5 ⊕ x7 )
As such, we can use a table to trace the state and output as it is updated:
i x x′ r
A6(16) seed x with s
0 A6(16) 4C(16) 1 generate 0-th output bit
1 4C(16) 99(16) 0 generate 1-st output bit
2 99(16) 33(16) 1 generate 2-nd output bit
3 33(16) 66(16) 0 generate 3-rd output bit
4 66(16) CD(16) 0 generate 4-th output bit
5 CD(16) 9A(16) 1 generate 5-th output bit
6 9A(16) 35(16) 1 generate 6-th output bit
7 35(16) 6A(16) 0 generate 7-th output bit
8 6A(16) D4(16) 0 generate 8-th output bit
⋮        ⋮        ⋮      ⋮
Using this table, we can infer that at time t2 (where the 8-th output bit is generated, which is the first bit
which is computed from x vs. matching s), r = 0.
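
To make the trace reproducible, the following C sketch (an assumption-laden model, not part of the original
text) implements the same Fibonacci-form update with n = 8 and T = {3, 4, 5, 7}, seeded with s = A6(16) :

#include <stdio.h>
#include <stdint.h>

int main( void ) {
  uint8_t x = 0xA6; // seed the state with s

  for( int i = 0; i < 9; i++ ) {
    uint8_t r = ( x >> 7 ) & 1;                    // output bit r = x_7
    uint8_t t = ( ( x >> 3 ) ^ ( x >> 4 ) ^
                  ( x >> 5 ) ^ ( x >> 7 ) ) & 1;   // tap bits T = {3,4,5,7}
    x = ( x << 1 ) | t;                            // x' = (x << 1) with t shifted-in
    printf( "%d : x' = %02X, r = %d\n", i, x, r ); // matches the table rows above
  }

  return 0;
}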
d Within the clock period (i.e., within the “time limit” which ρ dictates), two steps must be completed; those
steps are completed when Φ1 = 1 and Φ2 = 1 respectively, and can be described as 1) the top register must
be updated with a value computed by the middle of the design (i.e., the combinatorial logic) from the
value in the bottom register, then 2) the bottom register must be updated with the value in the top register.
So if Tlatch and Tlogic are the critical paths associated with a D-type latch and said combinatorial logic
respectively, then we can write
ρ ≥ (Tlogic + Tlatch ) + (Tlatch ).
Adding more detail, we could then reflect the critical path of components constituting the combinatorial
logic: writing
Tlogic = Txor + Txor + Tmux
then reflects the fact that the critical path includes two XOR gates and one multiplexer. Overall then, we
have
ρ ≥ (Txor + Txor + Tmux + Tlatch ) + (Tlatch )
≥ 2 · Tlatch + 2 · Txor + Tmux
Since we have the design of each component, we can, as a next step, be more concrete about each term
above: inspecting the NAND based designs, we can deduce
Tlatch = 4 · Tnand = 40ns
Txor = 3 · Tnand = 30ns
Tmux = 3 · Tnand = 30ns
and thus
ρ ≥ 2 · Tlatch + 2 · Txor + Tmux
≥ 2 · 40ns + 2 · 30ns + 30ns
≥ 80ns + 60ns + 30ns
≥ 170ns
Tlatch arguably represents the more tricky case, noting that the cross-coupled right-hand side means the
path is through 4 NAND gates. Finally, the maximum clock frequency is inversely proportional to this
critical path so we find
fmax = 1/ρ
= 1/170ns
≃ 5.9MHz
is correct.

S164. Basically this question is asking us to reverse engineer the FSM implementation into a design and hence
functionality; to do that, we can step backwards through the process that would normally step forwards.
The first step is therefore to inspect the implementation and extract pertinent features: 1) the bottom and
top D-type latches capture 1-bit current and next states, i.e., Q and Q′ , respectively, 2) between the two we can
identify an output function r = ω(Q) = ¬Q and a transition function Q′ = δ(Q, Xi , rst) = (¬rst) ∧ (¬Xi ∨ Q). Note
that we can classify this as a Moore-type FSM, since the output r is determined by the current state Q alone.
The next step is to reconstruct a concrete, tabular description of the FSM, i.e., a truth table, using ω and δ:

δ ω
rst Xi Q Q′ r
0 0 0 1 1
0 0 1 1 0
0 1 0 0 1
0 1 1 1 0
1 0 0 0 1
1 0 1 0 0
1 1 0 0 1
1 1 1 0 0

Because Q and Q′ are each represented by a single D-type latch, we can infer the FSM has (at most) two states.
Other assignments are possible provided we are consistent, but the most natural would be to say Q = 0 ↦ S0
and Q = 1 ↦ S1 . Given that rst = 1 forces Q′ = 0, we can infer that S0 is an initial state; given that r = ¬Q and
so r = 1 iff. Q = 0, we can infer that S0 is an accepting state.
The next step is to reconstruct an abstract, diagrammatic description of the FSM:

start --> S0

S0 --(Xi=1)--> S0     S0 --(Xi=0)--> S1
S1 --(Xi=1)--> S1     S1 --(Xi=0)--> S1

The final step demands some creativity, in the sense that we need to interpret the functionality realised: although
doing so is not trivial, we can approach it by trying to explain in words what the FSM does step-by-step. For
example, note that the FSM starts in state S0 and stays there while the input is Xi = 1. However, as soon as it
encounters an input st. Xi = 0 it will transition to state S1 : it stays there whether the input is Xi = 0 or Xi = 1.
So, put another way, the FSM

• stays in state S0 if Xi = 1 for all i; it therefore accepts such an input, meaning r = 1,

• transitions to and stays in state S1 if Xi = 0 for any i; it therefore rejects such an input, meaning r = 0.

This description matches the definition of AND: we have that

r = X0 ∧ X1 ∧ · · · ∧ Xn−1 .
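
A short C sketch (hypothetical, for illustration) simulates this FSM and confirms the interpretation: the state
update follows δ with rst = 0, and the final output is ¬Q:

#include <stdio.h>

int main( void ) {
  int X[] = { 1, 1, 0, 1 }; // example input: contains some Xi = 0, so expect r = 0

  int Q = 0;                // a pulse on rst forces the start state S0, i.e., Q = 0
  for( int i = 0; i < 4; i++ ) {
    Q = !X[ i ] | Q;        // delta with rst = 0: Q' = (not Xi) or Q
  }
  printf( "r = %d\n", !Q ); // omega: r = not Q, i.e., the AND of all the Xi
  return 0;
}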

S165. Interpreting the design, we can see that

Q′0 = x ∧ ( Q0 ∨ Q1 )
Q′1 = x ∧ ( ¬Q0 ∨ Q1 )

r = ¬x ∧ ( Q0 ∧ Q1 )

where the expressions for Q′0 and Q′1 constitute the transition function δ, and the expression for r constitutes
the output function ω. This means:

a 3 gates are involved in the output function implementation (2 AND gates, and 1 NOT gate), and
b 5 gates are involved in the transition function implementation (2 AND gates, 2 OR gates, and 1 NOT
gate).

S166. A generic, block diagram style framework for FSMs is as follows:

            +---------+
        +-->| \delta  |
        |   +---------+
        |      ^   |
        |    Q |   | Q'
        |      |   v
        |   +---------+
input --+   |  state  |<-- clock
        |   +---------+
        |        |
        |      Q |
        |        v
        |   +---------+
        +-->| \omega  |--> output
            +---------+

st.

• An n-bit register (middle component) holds Q, the current state of the FSM.

• Within a given clock period, the current state is provided as input to δ, the transition function: based on
Q and any input, this computes the next state Q′ .

• At the same time that δ is computing the next state, the output function ω computes any output from the
FSM; depending on the type of FSM, this might be based on Q only, or on Q and any input.

• A positive edge of the clock signal causes the state to be updated with the output from δ. That is, the FSM
advances from the current to next state; computation by δ and ω is performed in the same way during
the subsequent clock period, once Q has been updated with Q′ .

Note that this framework is assumed in any of the following questions that ask for it.
This FSM can be in one of two states: either the bits of X processed so far have an even or odd number of
elements equal to 1; we give each of the states a label, so in this case Seven and Sodd for example. Next we can
describe how the FSM transitions from some current state to a next state, i.e., how the transition function δ
works: based on an input Xi provided at each step, we might draw

start --> Seven

Seven --(Xi=0)--> Seven     Seven --(Xi=1)--> Sodd
Sodd  --(Xi=0)--> Sodd      Sodd  --(Xi=1)--> Seven

or equivalently say

                     δ
  Q          Q′ (Xi = 0)   Q′ (Xi = 1)
  Seven      Seven         Sodd
  Sodd       Sodd          Seven

where Q is the current state and Q′ is the next state.
Given the FSM has two states only, we can store the current state using a 1-bit register. Based on a natural
mapping of the abstract to concrete state labels (i.e., Seven 7→ 0 and Sodd 7→ 1), we can rewrite the transition
function as a truth table:
Xi Q Q′
0 0 0
0 1 1
1 0 1
1 1 0
and see clearly that Q′ = Q ⊕ Xi . Inspecting Q directly provides the output: if Q = 0 we have (so far) even
parity, and in contrast if Q = 1 we have odd parity. So in a sense the output function ω is simply the identity
function. In short, the low-level detail filled into the high-level design is very simple (in this case at least) once
the question has been digested.
One obvious addition would be some form of mechanism to reset the FSM: as stated above, we assume it
starts in the state Seven when powered-on but clearly this may not be true (the content in Q will essentially be
random initially).
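
As a footnote, the resulting design is simple enough that a C model (a sketch under the assumptions above,
not from the original text) fits in a few lines:

#include <stdint.h>

// model of the parity FSM: a 1-bit state Q, updated as Q' = Q xor Xi
uint8_t parity( const uint8_t* X, int n ) {
  uint8_t Q = 0;              // assume a reset forces the start state S_even
  for( int i = 0; i < n; i++ ) {
    Q = Q ^ ( X[ i ] & 1 );   // delta: Q' = Q xor Xi
  }
  return Q;                   // omega is the identity: 0 = even, 1 = odd parity
}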

S167. a There are several approaches to solving this problem. Possibly the easiest, but perhaps not the most
obvious, is to simply build a shift-register: the register stores the last three inputs, when a new input is
available the register shifts the content along by one which means the oldest input drops off one end and
the new input is inserted into the other end. One can then build a simple circuit to test the current state
of the shift-register to see if the last three inputs match what is required.
Alternatively, one can take a more heavy-weight approach and formulate the solution as a state machine.
First we need to decide on an encoding for our state; when searching through the input we can have
matched zero through three correct tokens, which we denote by the integer S stored in two bits using Q1
and Q0 as the most-significant and least-significant respectively. We also need an encoding of the actual
input tokens I which are being passed to the matching circuit. Arbitrarily we might select A = 0, C = 1,
G = 2 and T = 3 although other encodings are valid and might actually simplify things; we use I1 and
I0 to denote the most and least-significant bits of the input token I. From this we can now create a table
describing the mapping between current state S and input I to next state S′ which can be roughly written
as
I1 I0 Q1 Q0 Q′1 Q′0
0 0 0 0 0 1
0 1 0 0 0 0
1 0 0 0 0 0
1 1 0 0 0 0
0 0 0 1 0 1
0 1 0 1 1 0
1 0 0 1 0 0
1 1 0 1 0 0
0 0 1 0 0 1
0 1 1 0 0 0
1 0 1 0 0 0
1 1 1 0 1 1
0 0 1 1 0 1
0 1 1 1 0 0
1 0 1 1 0 0
1 1 1 1 0 0
Thus, if we are in state (Q1 , Q0 ) = (0, 0) = 0 and see an A input, we move to state (Q′1 , Q′0 ) = (0, 1) = 1
otherwise we stay in state (Q′1 , Q′0 ) = (0, 0) = 0. Now we can define the transition function from current
state to next state as
Q′0 = (¬Q1 ∧ ¬Q0 ∧ ¬I1 ∧ ¬I0 ) ∨
      (¬Q1 ∧ Q0 ∧ ¬I1 ∧ ¬I0 ) ∨
      (Q1 ∧ ¬Q0 ∧ ¬I1 ∧ ¬I0 ) ∨
      (Q1 ∧ Q0 ∧ ¬I1 ∧ ¬I0 ) ∨
      (Q1 ∧ ¬Q0 ∧ I1 ∧ I0 )
Q′1 = (¬Q1 ∧ Q0 ∧ ¬I1 ∧ I0 ) ∨
      (Q1 ∧ ¬Q0 ∧ I1 ∧ I0 )
with simplifications as appropriate. Finally, the output flag F will be set only according to

F = Q1 ∧ Q0

to signal when we have matched three characters. As such, we can realise the FSM framework described
in Solution 166 by filling each component with the associated implementation above.

b Making a general-purpose matching circuit will probably use less logic than having three separate circuits;
this will reduce the space required. As an extension one might consider implementing the transition and
output functions as a look-up table instead of hard-wiring them; this will mean the circuit could be used
to match any sequence providing the tables were correctly initialised. Introducing a more complex circuit
design could have the disadvantage of increasing the critical path (the longest sequential path through
the entire circuit). If the critical path is longer, the design will have to be clocked slower and hence will
not perform the matching function as quickly.
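
To illustrate the look-up table idea, the following C sketch (hypothetical; the table entries are transcribed
from the truth table above, using the encoding A = 0, C = 1, G = 2, T = 3) drives the transition function from
an array, so re-initialising the array re-targets the matcher to a different sequence:

#include <stdint.h>

// delta[ S ][ I ]: next state, transcribed from the truth table above
static const uint8_t delta[ 4 ][ 4 ] = {
  { 1, 2, 0, 0 }, // hmm: rows ordered from S = 0 to S = 3 below
};

static const uint8_t delta_tbl[ 4 ][ 4 ] = {
  { 1, 0, 0, 0 }, // from S = 0
  { 1, 2, 0, 0 }, // from S = 1
  { 1, 0, 0, 3 }, // from S = 2
  { 1, 0, 0, 0 }, // from S = 3
};

uint8_t match( const uint8_t* I, int n ) {
  uint8_t S = 0, F = 0;
  for( int i = 0; i < n; i++ ) {
    S  = delta_tbl[ S ][ I[ i ] & 3 ]; // transition function via look-up
    F |= ( S == 3 );                   // output: F = Q1 and Q0
  }
  return F; // 1 iff the sought three-token sequence occurred somewhere
}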

S168. a A basic diagram should show the four states and transitions between them which relate the movement
from one to the other as a result of the washing cycle, and movement as a result of input from the buttons;
for example a (very) basic diagram would be:

   +------+   +------+   +------+   +------+
+->| idle |-->| fill |-->| wash |-->| spin |
|  +------+   +------+   +------+   +------+
|     |          |          |          |
+-----+----------+----------+----------+

b Since there are four states, we can encode them using two bits; we assign the following encoding
idle = 00, fill = 01, wash = 10 and spin = 11. We use Q1 and Q0 to represent the current state, and Q′1 and
Q′0 to represent the next state; B1 and B0 are the input buttons. Using this notation, we can construct the
following state transition table which encodes the state machine diagram:
B1 B0 Q1 Q0 Q′1 Q′0
0 0 0 0 0 0
0 1 0 0 0 1
1 0 0 0 0 0
1 1 0 0 0 0
0 0 0 1 1 0
0 1 0 1 1 0
1 0 0 1 0 0
1 1 0 1 1 0
0 0 1 0 1 1
0 1 1 0 1 1
1 0 1 0 0 0
1 1 1 0 1 1
0 0 1 1 0 0
0 1 1 1 0 0
1 0 1 1 0 0
1 1 1 1 0 0
so that if, for example, the machine is in the wash state (i.e., Q1 = 1 and Q0 = 0) and no buttons are pressed
then the next state is spin (i.e., Q′1 = 1 and Q′0 = 1); however if button B1 is pressed to cancel the cycle, the
next state is idle (i.e., Q′1 = 0 and Q′0 = 0).
c From the state transition table, we can easily extract the two Karnaugh maps:

Karnaugh map for Q′1 (columns Q1 Q0 , rows B1 B0 ):

             Q1 Q0
             00   01   11   10
   B1 B0
   00         0    1    0    1
   01         0    1    0    1
   11         0    1    0    1
   10         0    0    0    0

Karnaugh map for Q′0 (columns Q1 Q0 , rows B1 B0 ):

             Q1 Q0
             00   01   11   10
   B1 B0
   00         0    0    0    1
   01         1    0    0    1
   11         0    0    0    1
   10         0    0    0    0

Basic expressions can be extracted from the tables as follows:


Q′1 = (¬Q1 ∧ Q0 ∧ B0 ) ∨ (¬Q1 ∧ Q0 ∧ ¬B1 ) ∨ (Q1 ∧ ¬Q0 ∧ B0 ) ∨ (Q1 ∧ ¬Q0 ∧ ¬B1 )
Q′0 = (Q1 ∧ ¬Q0 ∧ B0 ) ∨ (Q1 ∧ ¬Q0 ∧ ¬B1 ) ∨ (¬Q0 ∧ B0 ∧ ¬B1 )
and through sharing, i.e., by computing
t0 = ¬B1
t1 = ¬B0
t2 = ¬Q1
t3 = ¬Q0
t4 = t 2 ∧ Q0
t5 = t 3 ∧ Q1
we can simplify these to
Q′0 = (t5 ∧ B0 ) ∨ (t5 ∧ t0 ) ∨ (t3 ∧ B0 ∧ t0 )
Q′1 = (t4 ∧ B0 ) ∨ (t4 ∧ t0 ) ∨ (t5 ∧ B0 ) ∨ (t5 ∧ t0 )

S169. a The two properties are defined as follows:

i The Hamming weight of X is the number of bits in X that are equal to 1, i.e., the number of times
Xi = 1. This can be computed as
HW(X) = Σ_{i=0}^{n−1} Xi .

ii The Hamming distance between X and Y is the number of bits in X that differ from the corresponding
bit in Y, i.e., the number of times Xi ≠ Yi :

HD(X, Y) = Σ_{i=0}^{n−1} Xi ⊕ Yi .
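
Both properties translate directly into code; a minimal C sketch (assuming 32-bit values, and using the
identity HD(X, Y) = HW(X ⊕ Y)) is:

#include <stdint.h>

int HW( uint32_t X ) {
  int t = 0;
  for( int i = 0; i < 32; i++ ) {
    t += ( X >> i ) & 1; // count the bits st. Xi = 1
  }
  return t;
}

int HD( uint32_t X, uint32_t Y ) {
  return HW( X ^ Y );    // count the bits st. Xi != Yi
}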

b There are two main approaches to constructing a flip-flop of this type; since both start with an SR-latch,
the difference is mainly in how the edge-triggered behaviour is realised. Use of a primary-secondary
organisation is probably the more complete solution, but a simpler alternative would be to use a pulse
generator. The overall design can be described roughly as follows:

+---+ +---+
D--+---------------------------->| |---S-->| |
| |AND| |NOR|-->r_0 = ~Q
v +-->| | r_1 -->| |
+---+ +------------+ | +---+ +---+
| | | | |
|NOT| en -->| pulse gen. |--+
| | | | |
+---+ +------------+ | +---+ +---+
| +-->| |---R-->| |
| |AND| |NOR|-->r_1 = Q
+---------------------------->| | r_0 -->| |
+---+ +---+

There are basically four features to note:

i An SR-latch has two inputs S and R, and two outputs Q and ¬Q. When
• S = 0, R = 0 the component retains Q,
• S = 1, R = 0 the component updates to Q = 1,
• S = 0, R = 1 the component updates to Q = 0,
• S = 1, R = 1 the component is meta-stable.
The component is level-triggered in the sense that Q is updated within the period of time when
S = 1 or R = 1 (rather than when they transition to said values).
ii To provide more fine-grained control over the component, the two inputs are typically gated using
(i.e., AND’ed with) an enable signal en: when en = 0, the latch inputs are always zero and hence it
retains the same state, when en = 1 it can be updated as normal.
iii In order to change from the current level-triggered behaviour into an edge-triggered alternative,
one approach is to use a pulse generator. The idea here is to intentionally create a mismatch in
propagation delay into the inputs of an AND gate: each time en changes, the result is that we see a
small pulse on the output of the AND gate. Provided this is small enough, one can argue it acts like
an edge rather than a level.
iv Finally, the gated S and R inputs are tied together and controlled by one input D meaning S = D and
R = ¬D. This prevents the component being used erroneously: it can only retain or update the state.

c The power consumed by CMOS transistors can be decomposed into two parts: the static part (which
relates to leakage) and the dynamic part (which relates to power consumed when the transistor switches).
In short, a value switching (i.e., changing from one value to another) consumes much more power than
staying the same. In this case, we clearly have an advantage in that all but one of the n bits in the register
will stay the same; hence in terms of power consumption, storing elements of the Gray code (versus
some other sequence for example) is an advantage.

d See Solution 166.

e As an aside, a potentially neat approach here is to use a Johnson counter. This is basically an n-bit register
(initialised to zero) whose content is shifted by one place on each clock edge. The new incoming, 0-th bit
is computed as the NOT of the outgoing, (n − 1)-th bit and every other bit is shifted up by one place (e.g.,
each i-th bit for 0 ≤ i < n − 1 becomes the (i + 1)-th bit). For n = 3, this produces the sequence

⟨0, 0, 0⟩
⟨1, 0, 0⟩
⟨1, 1, 0⟩
⟨1, 1, 1⟩
⟨0, 1, 1⟩
⟨0, 0, 1⟩
⋮

which satisfies the Hamming distance property, but does not include all possible values: for example,
⟨1, 0, 1⟩ is not included. So this does not really answer the question in the sense that we require a
component that cycles through the full 2ⁿ-element sequence, an example of which is

⟨0, 0, 0⟩
⟨1, 0, 0⟩
⟨1, 1, 0⟩
⟨0, 1, 0⟩
⟨0, 1, 1⟩
⟨1, 1, 1⟩
⟨1, 0, 1⟩
⟨0, 0, 1⟩
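
Note that this sequence coincides with the binary-reflected Gray code, which can be generated directly; the
following C sketch (hypothetical, reading each tuple ⟨Q0 , Q1 , Q2 ⟩ with Q0 the least-significant bit) reproduces
the sequence above:

#include <stdio.h>

int main( void ) {
  for( int i = 0; i < 8; i++ ) {
    int g = i ^ ( i >> 1 ); // i-th element of the binary-reflected Gray code
    printf( "<%d,%d,%d>\n", ( g >> 0 ) & 1, ( g >> 1 ) & 1, ( g >> 2 ) & 1 );
  }
  return 0;
}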

As a result, we can use an FSM-based approach based on the framework in the question above. For n = 3
there are 2³ = 8 elements in the Gray code, and so a 3-bit state Q = ⟨Q0 , Q1 , Q2 ⟩ is enough to store the
current element. The output function ω is basically free: we simply provide the current state Q as output,
which is also the current element in the Gray code sequence. Based on the inputs Q and rst, the state
transition function δ can be described as follows:

rst Q2 Q1 Q0 Q′2 Q′1 Q′0


0 0 0 0 0 0 1
0 0 0 1 0 1 1
0 0 1 1 0 1 0
0 0 1 0 1 1 0
0 1 1 0 1 1 1
0 1 1 1 1 0 1
0 1 0 1 1 0 0
0 1 0 0 0 0 0
1 0 0 0 0 0 0
1 0 0 1 0 0 0
1 0 1 1 0 0 0
1 0 1 0 0 0 0
1 1 1 0 0 0 0
1 1 1 1 0 0 0
1 1 0 1 0 0 0
1 1 0 0 0 0 0

From this truth table we can (more easily than usual perhaps) extract Karnaugh maps for each bit of the
next state Q′

Karnaugh map for Q′2 (columns Q1 Q0 , rows rst Q2 ):

              Q1 Q0
              00   01   11   10
   rst Q2
   00          0    0    0    1
   01          0    1    1    1
   11          0    0    0    0
   10          0    0    0    0

Karnaugh map for Q′1 (columns Q1 Q0 , rows rst Q2 ):

              Q1 Q0
              00   01   11   10
   rst Q2
   00          0    1    1    1
   01          0    0    0    1
   11          0    0    0    0
   10          0    0    0    0

Karnaugh map for Q′0 (columns Q1 Q0 , rows rst Q2 ):

              Q1 Q0
              00   01   11   10
   rst Q2
   00          1    1    0    0
   01          0    0    1    1
   11          0    0    0    0
   10          0    0    0    0

and hence Boolean expressions

Q′2 = ( ¬rst ∧ Q2 ∧ Q0 ) ∨ ( ¬rst ∧ Q1 ∧ ¬Q0 )
Q′1 = ( ¬rst ∧ ¬Q2 ∧ Q0 ) ∨ ( ¬rst ∧ Q1 ∧ ¬Q0 )
Q′0 = ( ¬rst ∧ ¬Q2 ∧ ¬Q1 ) ∨ ( ¬rst ∧ Q2 ∧ Q1 )

Placing the associated combinatorial logic and a 3-bit, D-type flip-flop based register to store Q into the
generic framework, we end up with a component that cycles through our 3-bit Gray code sequence under
control of a clock signal.

S170. a See Solution 166.

b There are a few different ways to interpret some parts of the problem definition, but one reasonable
approach is as follows:

start --> S0

S0 --(B2=1 and H=0)--> S1     S1 --(B0=1 and H=0)--> S2     S2 --(B1=1 and H=0)--> S3

S0,S1,S2,S3 --(H=1)--> S0

Essentially, the idea is that by pressing buttons we advance from the starting state S0 toward the final state
S3 (as long as the handle is not turned, which means we go back to the start): when in S3 the door is
unlocked, otherwise it remains locked. In particular, if the buttons are pressed in the wrong order we
get “stuck” half way along the sequence and never reach S3 . For example if B1 is pressed while in state
S1 , the FSM does not (and cannot ever) transition into S2 since the button stays pressed: the only way to
“unstick” the FSM is to turn the handle, reset the mechanism and start again.
There are four states in total; since 2² = 4 we can represent the current state Q as a 2-bit integer, making
the concrete assignment
S0 ↦ ⟨0, 0⟩
S1 ↦ ⟨1, 0⟩
S2 ↦ ⟨0, 1⟩
S3 ↦ ⟨1, 1⟩

The FSM diagram can be expressed as a truth table, particular to this P, which captures the various
transitions:
H B2 B1 B0 Q1 Q0 Q′1 Q′0
0 0 ? ? 0 0 0 0
0 1 ? ? 0 0 0 1
1 ? ? ? 0 0 0 0
0 ? ? 0 0 1 0 1
0 ? ? 1 0 1 1 0
1 ? ? ? 0 1 0 0
0 ? 0 ? 1 0 1 0
0 ? 1 ? 1 0 1 1
1 ? ? ? 1 0 0 0
0 ? ? ? 1 1 1 1
1 ? ? ? 1 1 0 0
Implementing this truth table via a 6-input Karnaugh map is a little more tricky than with fewer inputs;
instead, we simply derive the expressions by inspection (i.e., by forming a term for each 1 entry in a given
output) to yield

Q′0 = ( ¬H ∧ B2 ∧ ¬Q1 ∧ ¬Q0 ) ∨
      ( ¬H ∧ ¬B0 ∧ ¬Q1 ∧ Q0 ) ∨
      ( ¬H ∧ B1 ∧ Q1 ∧ ¬Q0 ) ∨
      ( ¬H ∧ Q1 ∧ Q0 )

Q′1 = ( ¬H ∧ B0 ∧ ¬Q1 ∧ Q0 ) ∨
      ( ¬H ∧ ¬B1 ∧ Q1 ∧ ¬Q0 ) ∨
      ( ¬H ∧ B1 ∧ Q1 ∧ ¬Q0 ) ∨
      ( ¬H ∧ Q1 ∧ Q0 )

with minor optimisation possible thereafter. Returning to the framework, the idea is then that we

i instantiate the middle box with a 2-bit register, using D-type flip-flops for example, to store Q,
ii instantiate the top box to implement δ using the equations above,
iii instantiate the bottom box to implement ω using the equation

L = ¬(Q1 ∧ Q0 )

so the door is locked unless the FSM is in state S3 .

c The purpose of a clock signal is to control the FSM, advancing it through steps (i.e., transitions) with all
components synchronised. However, the only updates of state occur on positive transitions of Bi or H.
That is, the FSM only changes state when one of the buttons is pressed, or the handle turned: in each
case, this means the associated value transitions from 0 to 1. As a result, one could argue the expression

H ∨ B0 ∨ B1 ∨ B2

can be used to advance the FSM (i.e., latch the next state produced by the transition function), rather than
“polling” the buttons and handle at each clock edge to see if their value has changed.

d Among various valid answers, the following are clear:

i The content stored in an SRAM memory is lost if the power supply is removed: such devices depend
on a power supply so transistors used to maintain the stored content can operate. In the context of
the proposed approach, this means if a power cut occurs, for example, then the password will be
“forgotten” by the lock.
ii When the power supply comes back online the password might be essentially random due to the
way SRAMs work. If this is not true however, and the SRAM is initialised into a predictable value
(e.g., all zero), this could offer an attractive way to bypass the security offered!
iii Given physical access to the lock, one might simply read the password out of the SRAM. With an FSM
hard-wired to a single password, the analogue is arguably harder: one would need to (invasively)
reverse engineer the gate layout and connectivity, then the FSM design.

Less attractive answers include degradation of performance (e.g., as a result of SRAM access latency) or
increase in cost: given constraints of the application, neither seems particularly important. For example
the access latency of SRAM memory is measured in small fractions of a second; although arguably true
in general, from the perspective of a human user of the door lock the delay will be imperceptible.
e This is quite open-ended, but one reasonable approach would be as follows:

i This is a slightly loaded question in that it implies some alteration is needed; as such, marks might
typically be given for identifying the underlying reason, and explaining each aspect of the proposed
alteration.
The crucial point to realise is testing implementations of δ and ω, for example, depends on being
able to set (and possibly inspect) the state Q which acts as input to both. An example technology to
allow this would be JTAG, which requires an additional interface (inc. TDI, TDO, TCLK, TMS and
TRST pins) and also injection of a scan chain to access all flip-flops. This allows the test process to
scan a value into Q one bit at a time, run the system normally, then scan out Q to test it.
ii The idea would be to place each system under the control of a test stimulus that automates a series of
tests: the test stimulus has access to all inputs (i.e., the JTAG interface, each button and the handle)
and outputs (e.g., the JTAG interface, and the lock mechanism), and is tasked with making sure the
overall behaviour matches some reference.
In this context, the number of states, inputs and outputs is small enough that a brute force approach
is reasonable; this is also motivated by the fact there are no obvious boundary cases and so on. The
strategy would therefore be: for each entry in the truth table
• put the device in test mode,
• scan-in the state Q and drive each Bi with the associated values,
• put the device in normal mode, and force an update of the FSM using the clock signal,
• put the device in test mode,
• check the value of L matches that expected,
• scan-out and check the value of Q matches that expected.

An alternative answer might focus on some form of BIST, but in essence this just places all the above
inside the system rather than viewing it as something done externally.

S171. a At least three advantages (or disadvantages, depending on which way around you view the options) are
evident:

• With option one, extracting each digit of the current PIN to form a guess is trivial; with option
two this is much harder, in that we need to take the integer P and decompose it into a decimal
representation (through repeated division and modular reduction).
• With option one, incrementing the current PIN is harder (since the addition is in decimal); with
option two this is much easier, in that we can simply use a standard integer adder.
• With option one, the total storage requirement is 4 · 4 = 16 bits; with option two this is only 14 bits,
since 2¹⁴ = 16384 > 9999.

Based on this, and reading ahead to the next question, the decimal representation seems more attractive:
designing a decimal adder is significantly easier than a binary divider.
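
To see the difference concretely, the following C sketch (the function name and sizes are hypothetical) shows
the repeated division and modular reduction that option two forces on us when extracting guess digits:

#include <stdint.h>

// decompose a binary-represented PIN P (option two) into decimal digits;
// option one stores the digits G[ i ] directly, making this step free
void digits( uint16_t P, uint8_t G[ 4 ] ) {
  for( int i = 0; i < 4; i++ ) {
    G[ i ] = P % 10; // i-th decimal digit
    P      = P / 10;
  }
}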
b Given the choice, and although both options are viable, we focus on a design for the second, decimal
representation: this is simpler by some way, so the expected answer. At a high-level, the component can
be described as follows:

P_3 P_2 P_1 P_0


| | | |
G_3 <--+ G_2 <--+ G_1 <--+ G_0 <--+
| | | |
v v v v
+-----------+ +-----------+ +-----------+ +-----------+
| x | | x | | x | | x |
| | | | | | | |
|co add ci|<--|co add ci|<--|co add ci|<--|co add ci|<-- 1
| | | | | | | |
| r | | r | | r | | r |
+-----------+ +-----------+ +-----------+ +-----------+
| | | |
v v v v
P'_3 P'_2 P'_1 P'_0

Pi = Gi so production of the guess is trivial; the other output is a little harder. The basic idea is to use
something similar to a ripple-carry adder. Each i-th cell takes a decimal digit Pi and a carry-in from the
previous, (i − 1)-th cell; it produces a decimal digit P′i and a carry-out into the next (i + 1)-th cell. The
difference from a binary ripple-carry adder then is that it only accepts one digit rather than two as input
(since it increments P rather than computes a general-purpose addition), plus it obviously works with
decimal rather than binary digits.

There are various ways to approach the design of each decimal adder cell, but perhaps the most straight-
forward uses two stages:

x_3 x_2 x_1 x_0


| | | |
v v v v
+------------+ +------------+ +------------+ +------------+
| x | | x | | x | | x |
| | | | | | | |
| co ha ci |<--| co ha ci |<--| co ha ci |<--| co ha ci |<-- 0
| | | | | | | |
| r | | r | | r | | r |
+------------+ +------------+ +------------+ +------------+
| | | |
r'_3 r'_2 r'_1 r'_0
| | | |
v v v v
+---------------------------------------------------------------+
| |
co <--| modular reduction |
| |
+---------------------------------------------------------------+
| | | |
r_3 r_2 r_1 r_0
| | | |
v v v v

The first stage computes an integer sum r′ = x + ci. Although this could be realised using a standard
ripple-carry adder, we can make a more problem-specific improvement: a ripple-carry adder normally
uses full-adder cells that compute x + y + ci, but we lack the second input y. Thus we can use half-adder
cells instead, which use half the number of gates; we assume such a half-adder is available as a standard
component. The second stage takes r′ = x + ci as input, and produces the outputs r and co, implementing
the modular reduction. The range of each input means 0 ≤ r′ < 11, or equivalently that cases where
r′ > 10 are impossible. We can describe the behaviour of the stage using the following truth table:

r′3 r′2 r′1 r′0 co r3 r2 r1 r0


0 0 0 0 0 0 0 0 0
0 0 0 1 0 0 0 0 1
0 0 1 0 0 0 0 1 0
0 0 1 1 0 0 0 1 1
0 1 0 0 0 0 1 0 0
0 1 0 1 0 0 1 0 1
0 1 1 0 0 0 1 1 0
0 1 1 1 0 0 1 1 1
1 0 0 0 0 1 0 0 0
1 0 0 1 0 1 0 0 1
1 0 1 0 1 0 0 0 0
1 0 1 1 ? ? ? ? ?
1 1 0 0 ? ? ? ? ?
1 1 0 1 ? ? ? ? ?
1 1 1 0 ? ? ? ? ?
1 1 1 1 ? ? ? ? ?

As such, we can produce a set of Karnaugh maps

Karnaugh map for r3 (columns r′1 r′0 , rows r′3 r′2 ):

              r′1 r′0
              00   01   11   10
   r′3 r′2
   00          0    0    0    0
   01          0    0    0    0
   11          ?    ?    ?    ?
   10          1    1    ?    0

Karnaugh map for r2 (columns r′1 r′0 , rows r′3 r′2 ):

              r′1 r′0
              00   01   11   10
   r′3 r′2
   00          0    0    0    0
   01          1    1    1    1
   11          ?    ?    ?    ?
   10          0    0    ?    0

Karnaugh map for r1 (columns r′1 r′0 , rows r′3 r′2 ):

              r′1 r′0
              00   01   11   10
   r′3 r′2
   00          0    0    1    1
   01          0    0    1    1
   11          ?    ?    ?    ?
   10          0    0    ?    0

Karnaugh map for r0 (columns r′1 r′0 , rows r′3 r′2 ):

              r′1 r′0
              00   01   11   10
   r′3 r′2
   00          0    1    1    0
   01          0    1    1    0
   11          ?    ?    ?    ?
   10          0    1    ?    0

which translate fairly easily into Boolean expressions

r3 = r′3 ∧ ¬r′1
r2 = r′2
r1 = ¬r′3 ∧ r′1
r0 = r′0

that allow implementation.
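
A bit-level C model of one cell might read as follows; this is a sketch, and note that an expression for co
is not stated above, so co = r′3 ∧ r′1 is used here as one valid cover of the truth table given the impossible
cases:

#include <stdio.h>
#include <stdint.h>

void cell( uint8_t x, uint8_t ci, uint8_t* r, uint8_t* co ) {
  uint8_t rp  = x + ci;                   // first stage: r' = x + ci, 0 <= r' < 11
  uint8_t rp3 = ( rp >> 3 ) & 1;
  uint8_t rp1 = ( rp >> 1 ) & 1;

  *co = rp3 & rp1;                        // carry-out into the next cell
  *r  = ( rp & 0x5 )                      // r2 = r'2 and r0 = r'0 pass through
      | ( ( rp3 & !rp1 ) << 3 )           // r3 =     r'3 and not r'1
      | ( ( !rp3 & rp1 ) << 1 );          // r1 = not r'3 and     r'1
}

int main( void ) {
  uint8_t r, co;
  cell( 9, 1, &r, &co );                  // 9 + 1 = 10, so expect r = 0, co = 1
  printf( "r = %d, co = %d\n", r, co );
  return 0;
}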

c The FSM maintains a current state Q. Given there are five states, we can represent the current state as

Q = ⟨Q0 , Q1 , Q2 ⟩

i.e., three bits (since 2³ = 8 > 5), so the device could store it in a register comprised of three D-type
flip-flops; doing so accepts that there are three unused state representations.
We can represent the states as follows
S0 = ⟨0, 0, 0⟩
S1 = ⟨1, 0, 0⟩
S2 = ⟨0, 1, 0⟩
S3 = ⟨1, 1, 0⟩
S4 = ⟨0, 0, 1⟩
and therefore formulate a tabular transition function δ:

b r Q2 Q1 Q0 Q′2 Q′1 Q′0


0 ? 0 0 0 0 0 0
1 ? 0 0 0 0 0 1
? ? 0 0 1 0 1 0
? ? 0 1 0 0 1 1
? 0 0 1 1 0 0 1
? 1 0 1 1 1 0 0
? ? 1 0 0 1 0 0

Turning these into Karnaugh maps and then Boolean expressions is a little tricky due to the need for five
inputs. To cope, we assume there are no transitions from S0 and ignore b, then patch the equation for Q′0
(the only bit of the next state influenced by moving out of S0 ) appropriately. That is, we get the following

Karnaugh map for Q′2 (columns Q1 Q0 , rows r Q2 ):

             Q1 Q0
             00   01   11   10
   r Q2
   00         0    0    0    0
   01         1    ?    ?    ?
   11         1    ?    ?    ?
   10         0    0    1    0

Karnaugh map for Q′1 (columns Q1 Q0 , rows r Q2 ):

             Q1 Q0
             00   01   11   10
   r Q2
   00         0    1    0    1
   01         0    ?    ?    ?
   11         0    ?    ?    ?
   10         0    1    0    1

Karnaugh map for Q′0 (columns Q1 Q0 , rows r Q2 ):

             Q1 Q0
             00   01   11   10
   r Q2
   00         0    0    1    1
   01         0    ?    ?    ?
   11         0    ?    ?    ?
   10         0    0    0    1

which then translate into

Q′2 = ( r ∧ Q1 ∧ Q0 ) ∨ ( Q2 )
Q′1 = ( ¬Q1 ∧ Q0 ) ∨ ( Q1 ∧ ¬Q0 )
Q′0 = ( b ∧ ¬Q2 ∧ ¬Q1 ∧ ¬Q0 ) ∨ ( ¬r ∧ Q1 ) ∨ ( Q1 ∧ ¬Q0 )

S172. a First we need to decode the machine code program: using Figure A.20, we find that

0A3(16) = 010100011(2) ↦ L0 : if R2 = 0 then goto L3 else goto L1
060(16) = 001100000(2) ↦ L1 : R2 ← R2 − 1 then goto L2
080(16) = 010000000(2) ↦ L2 : if R0 = 0 then goto L0 else goto L3
097(16) = 010010111(2) ↦ L3 : if R1 = 0 then goto L7 else goto L4
050(16) = 001010000(2) ↦ L4 : R1 ← R1 − 1 then goto L5
020(16) = 000100000(2) ↦ L5 : R2 ← R2 + 1 then goto L6
083(16) = 010000011(2) ↦ L6 : if R0 = 0 then goto L3 else goto L7
0C0(16) = 011000000(2) ↦ L7 : halt

Next we can produce a trace of execution for the program: starting with the initial configuration given,

we find that

C0 = (0, 0, 2, 1, 0)
L0 { if R2 = 0 then goto L3 else goto L1
C1 = (1, 0, 2, 1, 0)
L1 { R2 ← R2 − 1 then goto L2
C2 = (2, 0, 2, 0, 0)
L2 { if R0 = 0 then goto L0 else goto L3
C3 = (0, 0, 2, 0, 0)
L0 { if R2 = 0 then goto L3 else goto L1
C4 = (3, 0, 2, 0, 0)
L3 { if R1 = 0 then goto L7 else goto L4
C5 = (4, 0, 2, 0, 0)
L4 { R1 ← R1 − 1 then goto L5
C6 = (5, 0, 1, 0, 0)
L5 { R2 ← R2 + 1 then goto L6
C7 = (6, 0, 1, 1, 0)
L6 { if R0 = 0 then goto L3 else goto L7
C8 = (3, 0, 1, 1, 0)
L3 { if R1 = 0 then goto L7 else goto L4
C9 = (4, 0, 1, 1, 0)
L4 { R1 ← R1 − 1 then goto L5
C10 = (5, 0, 0, 1, 0)
L5 { R2 ← R2 + 1 then goto L6
C11 = (6, 0, 0, 2, 0)
L6 { if R0 = 0 then goto L3 else goto L7
C12 = (3, 0, 0, 2, 0)
L3 { if R1 = 0 then goto L7 else goto L4
C13 = (7, 0, 0, 2, 0)
L7 { halt

where the final configuration halts execution.

As a result, stating that the program will “copy the value in R1 into R2 , clearing the value in R1 ” is the
best match. Note that the program itself is in two parts: L0 to L2 clear (or zero) R2 , and L3 to L6 move R1
into R2 . Also note that it depends on having R0 = 0, allowing the construction of unconditional branches
in L2 and L6 .
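
Since the decoded program is just a list of labelled steps, it can be simulated directly; this C sketch
(hypothetical, not part of the original text) reproduces the trace above for the initial configuration
(0, 0, 2, 1, 0):

#include <stdio.h>

int main( void ) {
  int PC = 0, R[ 4 ] = { 0, 2, 1, 0 }; // R0 = 0, R1 = 2, R2 = 1, R3 = 0

  while( 1 ) {
    printf( "(%d, %d, %d, %d, %d)\n", PC, R[ 0 ], R[ 1 ], R[ 2 ], R[ 3 ] );
    switch( PC ) {
      case 0 : PC = ( R[ 2 ] == 0 ) ? 3 : 1; break;
      case 1 : R[ 2 ] -= 1; PC = 2;          break;
      case 2 : PC = ( R[ 0 ] == 0 ) ? 0 : 3; break;
      case 3 : PC = ( R[ 1 ] == 0 ) ? 7 : 4; break;
      case 4 : R[ 1 ] -= 1; PC = 5;          break;
      case 5 : R[ 2 ] += 1; PC = 6;          break;
      case 6 : PC = ( R[ 0 ] == 0 ) ? 3 : 7; break;
      case 7 : return 0;                     // halt
    }
  }
}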

b First we need to decode the machine code program: using Figure A.20, we find that

0B3(16) = 010110011(2) ↦ L0 : if R3 = 0 then goto L3 else goto L1
070(16) = 001110000(2) ↦ L1 : R3 ← R3 − 1 then goto L2
080(16) = 010000000(2) ↦ L2 : if R0 = 0 then goto L0 else goto L3
097(16) = 010010111(2) ↦ L3 : if R1 = 0 then goto L7 else goto L4
030(16) = 000110000(2) ↦ L4 : R3 ← R3 + 1 then goto L5
050(16) = 001010000(2) ↦ L5 : R1 ← R1 − 1 then goto L6
083(16) = 010000011(2) ↦ L6 : if R0 = 0 then goto L3 else goto L7
0AB(16) = 010101011(2) ↦ L7 : if R2 = 0 then goto L11 else goto L8
030(16) = 000110000(2) ↦ L8 : R3 ← R3 + 1 then goto L9
060(16) = 001100000(2) ↦ L9 : R2 ← R2 − 1 then goto L10
087(16) = 010000111(2) ↦ L10 : if R0 = 0 then goto L7 else goto L11
0C0(16) = 011000000(2) ↦ L11 : halt

Next we can produce a trace of execution for the program: starting with the initial configuration given,

we find that
C0 = (0, 0, 3, 2, 1)
L0 { if R3 = 0 then goto L3 else goto L1
C1 = (1, 0, 3, 2, 1)
L1 { R3 ← R3 − 1 then goto L2
C2 = (2, 0, 3, 2, 0)
L2 { if R0 = 0 then goto L0 else goto L3
C3 = (0, 0, 3, 2, 0)
L0 { if R3 = 0 then goto L3 else goto L1
C4 = (3, 0, 3, 2, 0)
L3 { if R1 = 0 then goto L7 else goto L4
C5 = (4, 0, 3, 2, 0)
L4 { R3 ← R3 + 1 then goto L5
C6 = (5, 0, 3, 2, 1)
L5 { R1 ← R1 − 1 then goto L6
C7 = (6, 0, 2, 2, 1)
L6 { if R0 = 0 then goto L3 else goto L7
C8 = (3, 0, 2, 2, 1)
L3 { if R1 = 0 then goto L7 else goto L4
C9 = (4, 0, 2, 2, 1)
L4 { R3 ← R3 + 1 then goto L5
C10 = (5, 0, 2, 2, 2)
L5 { R1 ← R1 − 1 then goto L6
C11 = (6, 0, 1, 2, 2)
L6 { if R0 = 0 then goto L3 else goto L7
C12 = (3, 0, 1, 2, 2)
L3 { if R1 = 0 then goto L7 else goto L4
C13 = (4, 0, 1, 2, 2)
L4 { R3 ← R3 + 1 then goto L5
C14 = (5, 0, 1, 2, 3)
L5 { R1 ← R1 − 1 then goto L6
C15 = (6, 0, 0, 2, 3)
L6 { if R0 = 0 then goto L3 else goto L7
C16 = (3, 0, 0, 2, 3)
L3 { if R1 = 0 then goto L7 else goto L4
C17 = (7, 0, 0, 2, 3)
L7 { if R2 = 0 then goto L11 else goto L8
C18 = (8, 0, 0, 2, 3)
L8 { R3 ← R3 + 1 then goto L9
C19 = (9, 0, 0, 2, 4)
L9 { R2 ← R2 − 1 then goto L10
C20 = (10, 0, 0, 1, 4)
L10 { if R0 = 0 then goto L7 else goto L11
C21 = (7, 0, 0, 1, 4)
L7 { if R2 = 0 then goto L11 else goto L8
C22 = (8, 0, 0, 1, 4)
L8 { R3 ← R3 + 1 then goto L9
C23 = (9, 0, 0, 1, 5)
L9 { R2 ← R2 − 1 then goto L10
C24 = (10, 0, 0, 0, 5)
L10 { if R0 = 0 then goto L7 else goto L11
C25 = (7, 0, 0, 0, 5)
L7 { if R2 = 0 then goto L11 else goto L8
C26 = (11, 0, 0, 0, 5)
L11 { halt
where the final configuration halts execution.
As a result, stating that the program will “add the values in R1 and R2 , setting R3 to reflect the result” is
the best match. Note that the program itself is in three parts: L0 to L2 clear (or zero) R3 , L3 to L6 add R1
to R3 , L7 to L10 add R2 to R3 . Also note that it depends on having R0 = 0, allowing the construction of
unconditional branches in L2 , L6 , and L10 .

S173. First, note that


0A5(16) ≡ 000010100101(2) ↦ 010100101(2) .
We can see that the (red) opcode determines the instruction type, i.e.,

if Raddr = 0 then goto Ltarget else goto Li+1 .

More specifically, the (green) register address and the (blue) branch target address mean the instruction
semantics are
if R2 = 0 then goto L5 else goto Li+1 ,
i.e., if register 2 equals 0 then goto instruction 5, else goto instruction i + 1.

S174. Once fetched, the instruction inst = 100111001(2) is provided as input to the decoder: based on the implemen-
tation given, the decoder will therefore produce

• op = 0(10) because inst8,···6 = 100(2) = 4(10) ,


• wr = 1(10) because inst8,···6 = 100(2) = 4(10) ,
• addr = 3(10) because inst5,···4 = 11(2) = 3(10) ,
• target = 9(10) because inst3,···0 = 1001(2) = 9(10) ,
• jmp = 0(10) because ¬inst8 ∧ inst7 ∧ ¬inst6 = 0 ∧ 0 ∧ 1 = 0,
• halt = 0(10) because ¬inst8 ∧ inst7 ∧ inst6 = 0 ∧ 0 ∧ 0 = 0.

as output. Looking then at the data- and control-path to assess how these outputs are used to execute the
instruction, we conclude that

• since addr = 3(10) , the register R3 is read from,


• since op = 0(10) , this value is discarded and a 0 output by the multiplexer is written into register R′ ,
• since wr = 1(10) register R′ is written into register R3 ,
• since halt = 0(10) the counter machine does not halt,
• since jmp = 0(10) , the program counter is incremented as normal (i.e., not set to target, which in fact is
unused).

In general then, this instruction writes 0 into register Raddr ; given addr = 3 here,

Li : R3 ← 0 then goto Li+1

is therefore the correct semantics.

B.6 Chapter ??
S175. a wire [ 7 : 0 ] a;
b wire [ 0 : 4 ] b;
c reg [ 31 : 0 ] c;
d reg signed [ 15 : 0 ] d;
e reg [ 7 : 0 ] e[ 0 : 1023 ];
f genvar f;

S176. a c = 2'b01
b c = 2'b11
c d = 4'b010X
d d = 4'b10XX

e d = 4'b1101

f d = 4'b0111

g c = 2'bXX

h c = 2'b11

i e = 1'b0

j e = 1'b1

S177. a One potential problem is that if p and q can change at any time, and hence trigger execution of the processes
at any time, the two might change at exactly the same time. In this case, it is not clear which values x and y
will be assigned. Maybe the top assignment to x beats the bottom one, but the bottom assignment to y
beats the top one. Any combination is possible; since it is not clear which will occur, it is possible that x
and y do not get assigned the values one would expect.
As an attempt at a solution, we can try to exert some control over which block of assignments is executed
first. For example, we might try to place a guard around the assignments:

always @ ( posedge p ) begin


if( !q ) begin
x <= a;
y <= b;
end
end

always @ ( posedge q ) begin


if( !p ) begin
x <= b;
y <= a;
end
end

An alternative might be to combine the two blocks into one:

always @ ( posedge p, posedge q ) begin


if ( p ) begin
x <= a;
y <= b;
end
else if( q ) begin
x <= b;
y <= a;
end
end

since now the process is at least deterministic: if p is equal to 1 then the first block executes, if q is equal
to 1 then the second block executes and if both are equal to 1 we execute the first block as the default.

b The problem with this is that the state signal is not initialised; to start with it could be any value which
might either result in the state machine operating in the wrong sequence or, since case is used and not
casex or casez, none of the cases being matched at all. A slightly more minor issue is that we have to
assume that no other process assigns to state. For example, if another process sets state to 3 the state
machine process will malfunction.
The best way to rectify this problem is by introducing a reset signal called rst and initialising the state
variable whenever it is equal to 1:

always @ ( posedge clk , posedge rst ) begin


if( rst ) begin
state = 0;
end
else begin
case( state )
0 : begin do_0; state = 1; end
1 : begin do_1; state = 2; end
2 : begin do_2; state = 0; end
endcase
end
end

S178. This is a bit of a vague question; it does not mention styles of Verilog and so you can assume any valid style is
allowed. With this in mind, a rough solution might look something like this:

module DDD( clk , rst , bit , out );

input wire clk;


input wire rst;

input wire bit;


output wire out;

reg [ 3 : 0 ] t;

assign out = ( t >= 0 ) && ( t <= 9 );

always @ ( posedge clk ) begin


if( rst ) begin
t = 0;
end
else begin
t = { bit , t[ 3 : 1 ] };
end
end

endmodule

S179. Essentially we want a design that looks, at a high-level, like this:

       y
       |
       v
    +-----+
x-->|  C  |-->min(x,y)
    +-----+
       |
       v
    max(x,y)

and thus need some sort of comparison inside; we already know how to design a less than comparison which
is good enough. Thus, the Verilog module could look something like this:

module C( min , max , x, y );

parameter n = 8;

output wire [ n - 1 : 0 ] min;


output wire [ n - 1 : 0 ] max;

input wire [ n - 1 : 0 ] x;
input wire [ n - 1 : 0 ] y;

wire [ n - 1 : 0 ] w0 = ~( x ^ y );
wire [ n - 1 : 0 ] w1 = ( ~x & y );
wire [ n - 1 : 0 ] w2;

assign w2[ 0 ] = w1[ 0 ];

genvar i;

generate
for( i = 1; i < n; i = i + 1 ) begin
assign w2[ i ] = w1[ i ] | ( w0[ i ] & w2[ i - 1 ] );
end
endgenerate

assign min = ( w2[ n - 1 ] ) ? x : y;


assign max = ( w2[ n - 1 ] ) ? y : x;

endmodule

S180. In software the task is fairly simple; we simply have an 8-bit variable called Q which maintains the content of
the shift register, and allow two functions to initialise and update the value (i.e., clock the register) as follows:

#include <stdint.h>

uint8_t Q = 0;

void seed( uint8_t s ) {
  // seed the state
  Q = s;
}

uint8_t lfsr () {
  // compute outgoing bit
  uint8_t r = Q & 1;

  // compute incoming bit
  uint8_t t = ( ( Q >> 7 ) ^
                ( Q >> 5 ) ^
                ( Q >> 4 ) ^
                ( Q >> 3 ) ^
                ( Q >> 0 ) ) & 1;

  // update state
  Q = ( Q >> 1 ) |
      ( t << 7 ) ;

  return r;
}

Following this approach, in Verilog the design is also simple: we just need to deal with the required action
each time a positive clock edge triggers an update. For example, the following does more or less the same
thing as the C version:

module lfsr( input wire clk ,


input wire rst ,

input wire [ 7 : 0 ] s,
output reg r );

reg [ 7 : 0 ] Q;

always @ ( posedge clk ) begin


if ( rst ) begin
// seed the state
Q = s;
end else begin
// compute outgoing bit
r = Q[ 0 ] ;

// compute incoming bit , update state


Q = { Q[ 7 ] ^
Q[ 5 ] ^
Q[ 4 ] ^
Q[ 3 ] ^
Q[ 0 ], Q[ 7 : 1 ] };
end
end

endmodule

That is, we again have 8-bit register called Q; each time a positive edge occurs on clk we make a choice

• If rst is true then we initialise the LFSR by setting Q equal to the seed value s,
• If rst is false then we update the LFSR by first setting the output r equal to the 0-th bit of Q, then updating
Q via a concatenation expression (which performs the shift, meaning the (n − 1)-th bit of the result is the
XOR of the tap bits).
