Logic Gate
Architecture
CONTENTS
II Appendices
A Example exam-style questions
A.1 Chapter 1
A.2 Chapter 2
A.3 Chapter 3
LIST OF FIGURES
2.20 A simple design, involving just a NOT and an AND gate, that exhibits glitch-like behaviour.
2.21 A contrived circuit illustrating the idea of fan-out, whereby one source gate may need to drive n target gates.
2.22 An overview of 2-input (resp. 2-output), 1-bit multiplexer (resp. demultiplexer) cells.
2.23 Application of the isolated and cascaded replication design patterns.
2.24 An overview of equality and less-than comparators.
2.25 An overview of half- and full-adder cells.
2.26 Gate universality used to implement a NAND- and NOR-based half-adder. Note that the dashed boxes in the NAND and NOR implementations (middle and bottom) are translations of the primitive gates within the more natural description (top).
2.27 An example encoder/decoder pair.
2.28 An incorrect counter design, using naive “looped” feedback.
2.29 An illustration of standard features in 1- and 2-phase clocks.
2.30 Symbolic descriptions of D-type latch and flip-flop components (note the triangle annotation around en).
2.31 A collection of NOR- and NAND-based SR-type latches, with simpler (top) to more complicated (middle and bottom) control features.
2.32 A case-by-case overview of NOR-based SR latch behaviour; notice that there are two sane cases for S = 0 and R = 0, and no sane cases for S = 1 and R = 1.
2.33 An annotated SR latch, decomposed into two NOR gates and then into transistors; r0, the output of the top NOR gate, is used as input by the bottom NOR gate, and r1, the output of the bottom NOR gate, is used as input by the top NOR gate (although the physical connections are not drawn).
2.34 A NOR-based D-type flip-flop created using a glitch generator.
2.35 A NOR-based D-type flip-flop created using a primary-secondary organisation of latches.
2.36 An n-bit register, with n replicated 1-bit components synchronised using the same enable signal.
2.37 A correct counter design, using sequential logic components.
2.38 Two illustrative waveforms, outlining stages of computation within the associated counter design.
2.39 Two different high-level clocking strategies.
2.40 Production line #1, staffed with pre-Ford workers.
2.41 Production line #2, staffed with post-Ford workers.
2.42 Four different ways to split a (hypothetical) component X into stages.
2.43 A problematic pipeline, and a solution involving the use of pipeline registers and a control signal to indicate when each stage should advance.
2.44 An illustrative waveform, outlining the stages of computation as a pipeline is driven by a clock.
2.45 An unpipelined, abstract combinatorial circuit and a 3-stage pipelined alternative.
2.46 An unpipelined, 8-bit Multiply-Accumulate (MAC) circuit and a 3-stage pipelined alternative.
2.47 An unpipelined, 8-bit logarithmic shift circuit and a 3-stage pipelined alternative.
2.48 A high-level illustration of a lithography-based fabrication process.
2.49 Bonding wires connected to a high-quality gold pad (public domain image, source: http://en.wikipedia.org/wiki/Image:Wirebond-ballbond.jpg).
2.50 A heatsink ready to be attached, via the z-clip, to a circuit in order to dissipate heat (public domain image, source: http://en.wikipedia.org/wiki/File:Pin_fin_heat_sink_with_a_z-clip.png).
2.51 A timeline of Intel processor innovation demonstrating Moore’s Law (data from http://www.intel.com/technology/mooreslaw/).
2.52 Conceptual diagrams of a PLA fabric.
2.53 Conceptual diagrams of an FPGA fabric.
3.1 Two high-level ALU architectures: each combines a number of sub-components, but does so using a different strategy.
3.2 An n-bit, ripple-carry adder described using a circuit diagram.
3.3 An n-bit, ripple-carry subtractor described using a circuit diagram.
3.4 An n-bit, ripple-carry adder/subtractor described using a circuit diagram.
3.5 An n-bit, carry look-ahead adder described using a circuit diagram.
3.6 An illustration depicting the structure of carry look-ahead logic, which is formed by an upper- and lower-tree of OR and AND gates respectively (with leaf nodes representing gi and pi terms, for example).
3.7 An overview of half- and full-subtractor cells.
3.8 An iterative design for n-bit (left-)shift described using a circuit diagram.
3.9 A combinatorial design for n-bit (left-)shift described using a circuit diagram.
3.10 Two examples demonstrating different strategies for accumulation of base-b partial products resulting from two 3-digit operands.
3.11 An iterative, bit-serial design for (n × n)-bit multiplication described using a circuit diagram.
3.12 A tabular description of stages in example (4 × 4)-bit Wallace and Dadda tree multiplier designs.
3.13 An (n × n)-bit tree multiplier design, described using a circuit diagram.
3.14 An example, (4 × 4)-bit Wallace-based tree multiplier design, described using a circuit diagram.
3.15 An n-bit, unsigned equality comparison described using a circuit diagram.
3.16 An n-bit, unsigned less-than comparison described using a circuit diagram.
5.1 An example FSM to decide whether there is an odd number of 1 elements in some sequence X.
5.2 An example FSM modelling a simple vending machine.
5.3 Two generic FSM frameworks (for different clocking strategies) into which one can place implementations of the state, δ (the transition function) and ω (the output function).
5.4 Two illustrative waveforms (for different clocking strategies), outlining stages of computation within the associated FSM framework.
5.5 An example FSM modelling an ascending modulo 6 counter.
5.6 An example FSM modelling an ascending or descending modulo 6 counter.
5.7 An example FSM modelling a traffic light controller.
A.1 A set of 5 different Karnaugh maps, captioned with an associated option.
A.2 A truth table for the 4-input Boolean function f.
A.3 MOSFET-based implementations of C0 and C1.
A.4 An implementation of a full-adder cell.
A.5 An implementation of a cyclic n-bit counter.
A.6 A combinatorial logic design, described using N-type and P-type MOSFET transistors.
A.7 A sequential logic design, containing two D-type flip-flops.
A.8 A combinatorial logic design, described using N-type and P-type MOSFET transistors; note that the pull-down network is (partially) missing.
A.9 An SR-latch, described in terms of abstract components labelled ⊙.
A.10 An SR-latch variant, which includes additional inputs P, C, and en.
A.11 A 4Mbit DRAM block diagram (source: http://www.micross.com/pdf/MT4C4001J.pdf).
A.12 A diagrammatic description of an 8-bit micro-processor and associated memory system.
A.13 An FSM implementation, which has 4 inputs (1-bit Φ1, Φ2, and rst on the left-hand side; 8-bit s spread within the design) and 1 output (1-bit r on the right-hand side).
A.14 A waveform describing behaviour of Φ1, Φ2, and rst within Figure A.13.
A.15 A NAND-based implementation of a D-type latch.
A.16 A NAND-based implementation of a 2-input XOR gate.
A.17 A NAND-based implementation of a 2-input, 1-bit multiplexer.
A.18 Implementation of a simple FSM, using D-type latches and a 2-phase clock.
A.19 Implementation of a simple FSM, using D-type latches and a 2-phase clock.
A.20 The instruction set for an example 4-register counter machine.
A.21 The high-level data- and control-path for an example 4-register counter machine.
A.22 The low-level decoder implementation for an example 4-register counter machine.
LIST OF TABLES
Part I
CHAPTER
1
MATHEMATICAL PRELIMINARIES
In Mathematics you don’t understand things. You just get used to them.
– von Neumann
The goal of this Chapter is to provide a fairly comprehensive overview of theory that underpins the rest of the book. At
first glance the content may seem a little dry, and is often excluded in other similar books. It seems clear, however, that
without a solid understanding of said theory, using the constituent topics to solve practical problems will be much harder.
The topics covered all relate to the field of discrete Mathematics; they include propositional logic, sets and functions,
Boolean algebra and number systems. These four topics combine to produce a basis for formal methods to describe,
manipulate and implement digital systems. Readers with a background in Mathematics or Computer Science might skip
this Chapter and use it simply for reference; those approaching it from some other background would be advised to read
the material in more detail.
In part because we use them naturally in language, it almost seems too formal to define what a proposition is.
However, by doing so we can start to use them as a building block to describe what propositional logic is and
how it works. This is best explained step-by-step by example:
is a proposition since it is definitely either true or false. When we take a proposition and decide whether it is true
or false, we say we have evaluated it. However, there are clearly a lot of statements that are not propositions
because they do not state any proposal. For example,
is a command or request of some kind; it does not evaluate to a truth value. Propositions must also be well
defined in the sense that they are definitely either true or false, i.e., there are no “grey areas” in between. The
statement
is not a proposition, because it could be true or false depending on the context: 90 °C is probably too hot for
body temperature, but not for a cup of coffee. Finally, some statements seem to be propositions but cannot be
evaluated because they are paradoxical: a famous example is the so-called liar paradox, usually attributed to
the Greek philosopher Eubulides, who stated it as
If the man is telling the truth, everything he says must be true which means he is lying and hence everything
he says is false. Conversely, if the man is lying everything he says is false so he cannot be lying (because he
said that he was). In terms of the statement, we cannot be sure of the truth value so this cannot be classified as
a proposition.
Example 1.2. When a proposition contains one or more variables, we can only evaluate it having first assigned
each a concrete value. For example, consider
where x is a variable. By assigning x a value we get a proposition; setting x = 10, for example, gives
Definition 1.2. Informally, a propositional function is just a short-hand way of writing a proposition; we give the
function a name and a list of free variables. So, for example, the function
f (x, y) : x = y
is called f and has two variables named x and y. If we use the function as f (10, 20), performing the binding x = 10 and
y = 20, it has the same meaning as 10 = 20.
Now, h represents a longer proposition. When we bind x to a value via h(10), we find
1.1.1 Connectives
Definition 1.3. A connective binds together a number of propositional terms into a single, compound proposition called
an expression. For brevity, we use symbols to denote common connectives:
• “not x” is denoted ¬x, and often termed logical complement (or negation),
• “x and y” is denoted x ∧ y, and termed logical conjunction,
• “x or y” is denoted x ∨ y, often called an inclusive-or, and termed logical (inclusive) disjunction,
• “x or y but not x and y” is denoted x ⊕ y, often called an exclusive-or, and termed logical (exclusive) disjunction,
• “x implies y” is denoted x ⇒ y, sometimes written as “if x then y”, and termed logical implication, and finally
• “x is equivalent to y” is denoted x ≡ y, sometimes written as “x if and only if y” or even “x iff. y”, and termed logical equivalence.
Note that we group statements using parentheses when there could be some confusion about the order they are applied.
As such (x ∧ y) is the same as x ∧ y, and (x ∧ y) ∨ z simply means we apply the ∧ connective to x and y first, then ∨ to
the result and z.
Definition 1.4. Provided we include parentheses in a compound proposition, there will be no ambiguity wrt. the order
connectives are applied. For instance, if we write
(x ∧ y) ∨ z
it is clear that we first resolve the conjunction of x and y, then the disjunction of that result and z.
If parentheses are not included however, we rely on precedence rules to determine the order for us. In short, the
following list
1. ¬,
2. ∧,
3. ∨,
4. ⇒,
5. ≡
assigns a precedence level to each connective. Using the same example as above, if we omit the parentheses and instead
write
x∧y∨z
we still get the same result: ∧ has a higher precedence level than ∨ (sometimes we say ∧ “binds more tightly” to operands
than ∨), so we resolve the former before the latter.
Example 1.4. For example, the expression
“the temperature is less than 90 °C ∧ the temperature is greater than 10 °C”
and
These terms are joined together using the ∧ connective so that the whole expression evaluates to true if both of
the terms are true, otherwise it evaluates to false. In a similar way we might write a compound proposition
which can only be evaluated when we assign values to the variables x and y.
Definition 1.5. The meaning of connectives is usually described in a tabular form which enumerates the possible values
each term can take and what the resulting truth value is; we call this a truth table.
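A truth table can be produced mechanically by enumerating every assignment to the variables. The following is a small sketch in Python (not part of the text; the function name truth_table is our own):

```python
# Sketch: enumerate the truth table of a compound proposition by trying
# every possible truth-value assignment to its n variables.
from itertools import product

def truth_table(f, n):
    """Return rows ((x_0, ..., x_{n-1}), f(x_0, ..., x_{n-1}))."""
    return [(xs, f(*xs)) for xs in product([False, True], repeat=n)]

# The compound proposition (x AND y) OR z from the precedence discussion:
for xs, r in truth_table(lambda x, y, z: (x and y) or z, 3):
    print(xs, r)
```

Each row pairs one assignment with the resulting truth value, exactly as the tabular form does.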
Example 1.5. The ¬ connective complements (or negates) the truth value of a given expression. Considering
the expression
¬(x > 10),
we find that the expression ¬(x > 10) is true if the term x > 10 is false and the expression is false if x > 10 is
true. If we assign x = 9, x > 10 is false and hence the expression ¬(x > 10) is true. If we assign x = 91, x > 10 is
true and hence the expression ¬(x > 10) is false.
Example 1.6. The meaning of the ∧ connective is also as one would expect; the expression
(x > 10) ∧ (x < 90)
is true if both the expressions x > 10 and x < 90 are true, otherwise it is false. So if x = 20, the expression is true.
But if x = 9 or x = 91, then it is false: even though one or other of the terms is true, they are not both true.
Example 1.7. The inclusive-or and exclusive-or connectives are fairly similar. The expression
(x > 10) ∨ (x < 90)
is true if either x > 10 or x < 90 is true or both of them are true. Here we find that all the assignments x = 20,
x = 9 and x = 91 mean the expression is true; in fact it is hard to find an x for which it evaluates to false!
Conversely, the expression
(x > 10) ⊕ (x < 90)
is only true if only one of either x > 10 or x < 90 is true; if they are both true then the expression is false. We
now find that setting x = 20 means the expression is false while both x = 9 and x = 91 mean it is true.
Example 1.8. Implication is more tricky. If we write x ⇒ y, we typically call x the hypothesis and y the
conclusion. In order to justify the truth table for implication, consider the example
(x is prime) ∧ (x ≠ 2) ⇒ (x ≡ 1 (mod 2))
i.e., if x is a prime other than 2, it follows that it is odd. Therefore, if x is prime then the expression is true if
x ≡ 1 (mod 2) and false otherwise (since the implication is invalid). If x is not prime, then the expression does
not really say anything about the expected outcome: we only know what to expect if x was prime. Since it could
still be that x ≡ 1 (mod 2) even when x is not prime, based on what we know from the example, we assume it
is true when this case occurs.
Put in a less formal way, the idea is that anything can follow from a false hypothesis. If the hypothesis is
false, we cannot be sure whether or not the conclusion is false: we therefore assume it is possibly true,
which is sort of an “optimistic default”. Consider a less formal example to support this. The statement “if I
am unhealthy then I will die” means x = “I am unhealthy” and y = “I will die”, and that r = x ⇒ y has four
possible cases:
1. I am healthy and do not die, so x = false, y = false and r = true,
2. I am healthy and die, so x = false, y = true and r = true,
3. I am unhealthy and do not die, so x = true, y = false and r = false, and
4. I am unhealthy and die, so x = true, y = true and r = true.
The first two cases do not contradict the original statement (since in them I am healthy, so it doesn’t apply):
only the third case does, in that I do not die (maybe I had a good doctor for instance).
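The four cases above match the fact that x ⇒ y has the same truth values as (¬x) ∨ y, which we can check directly; a minimal sketch in Python (not part of the text):

```python
# Sketch: implication x => y behaves like (not x) or y; check the four
# health/death cases from the running example.
def implies(x, y):
    return (not x) or y

for x, y in [(False, False), (False, True), (True, False), (True, True)]:
    print(x, y, implies(x, y))  # only (True, False) gives False
```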
Example 1.9. In contrast, equivalence is fairly simple. The expression x ≡ y is only true if x and y evaluate
to the same value. This matches the concept of equality in other contexts, such as between numbers. As an
example, consider
(x is odd ) ≡ (x ≡ 1 (mod 2)).
This expression is true since if the left side is true, the right side must also be true and vice versa. If we change
it to
(x is odd ) ≡ (x is prime ),
then the expression is false. To see this, note that only some odd numbers are prime: just because a number
is odd does not mean it is always prime although if it is prime it must be odd (apart from the corner case of
x = 2). So the equivalence works in one direction but not the other and hence the expression is false.
Definition 1.6. An expression which is equivalent to true, no matter what values are assigned to any variables, is called
a tautology; an expression which is equivalent to false is called a contradiction.
Definition 1.7. We call two expressions logically equivalent if they are composed of the same variables and have the
same truth value for every possible assignment to those variables. More formally, two expressions x and y are equivalent
iff. x ≡ y can be proved a tautology.
Various subtleties emerge when trying to prove two expressions are logically equivalent, but for our purposes
it suffices to adopt a brute-force approach by a) enumerating the values each variable can take, then b) checking
whether or not the expressions produce identical truth values in all cases. Note that, in practice, this can clearly
become difficult wrt. the amount of work required: with n variables there will be 2ⁿ possible assignments, which
grows (very) quickly as n grows.
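The brute-force approach can be sketched directly; the following Python (not part of the text, with a hypothetical function name equivalent) checks all 2ⁿ assignments:

```python
# Sketch: brute-force test of logical equivalence for two n-variable
# expressions, by comparing them under all 2**n assignments.
from itertools import product

def equivalent(f, g, n):
    return all(f(*xs) == g(*xs) for xs in product([False, True], repeat=n))

# De Morgan's law as an example: not (x and y) iff. (not x) or (not y).
print(equivalent(lambda x, y: not (x and y),
                 lambda x, y: (not x) or (not y), 2))  # True
```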
1.1.2 Quantifiers
Definition 1.8. A free variable in a given expression is one which has not yet been assigned a value. Roughly speaking,
a quantifier allows a free variable to take one of many values:
• the universal quantifier “for all x, y is true” is denoted ∀ x [y], while
• the existential quantifier “there exists an x such that y is true” is denoted ∃ x [y].
We say that binding a quantifier to a variable quantifies it; after it has been quantified we say it is bound (rather than
free).
As an aside, quantifiers can be roughly viewed as moving from propositional logic into predicate (or first-order)
logic (with second-order logic then a further extension, e.g., allowing quantification of relations). Put more
simply, however, when we encounter an expression such as
∃ x [y]
we are essentially assigning x all possible values; to make the expression true, just one of these values needs to
make the expression y true. Likewise, when we encounter
∀ x [y]
we are again assigning x all possible values. This time however, to make the expression true, all of them need
to make the expression y true.
Example 1.10. Consider the following
“there exists an x such that x ≡ 0 (mod 2)”
which we can rewrite symbolically as
∃ x [x ≡ 0 (mod 2)].
In this case, x is bound by an ∃ quantifier; we are asserting that for some value of x, it is true that x ≡ 0 (mod 2).
Restating the same thing another way, if just one x means x ≡ 0 (mod 2) is true then the whole (quantified)
expression is true. Clearly x = 2 satisfies this condition, so the expression is true.
Example 1.11. Consider the following
“for all x, x ≡ 0 (mod 2)”
which we can rewrite symbolically as
∀ x [x ≡ 0 (mod 2)].
This is a more general assertion about x, demanding that for all x it is true that x ≡ 0 (mod 2). Taking the
opposite approach to the above, to conclude the whole (quantified) expression is false we need an x such that
x ≢ 0 (mod 2). This is easy, because any odd value of x is good enough, so the expression is false.
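Over a finite domain, the two quantifiers correspond directly to "at least one value works" versus "every value works"; a sketch in Python (not part of the text; the finite domain is our own assumption, since the quantifiers above range over all x):

```python
# Sketch: for a finite domain, the existential quantifier is any() and
# the universal quantifier is all().
domain = range(1, 10)  # an assumed finite stand-in for "all x"

exists = any(x % 2 == 0 for x in domain)  # ∃ x [x ≡ 0 (mod 2)]
forall = all(x % 2 == 0 for x in domain)  # ∀ x [x ≡ 0 (mod 2)]
print(exists, forall)  # True False
```

Here x = 2 witnesses the existential case, while x = 1 refutes the universal one, mirroring the two examples above.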
1.2.2 Operations
The concatenation operator can be used to join two sequences together. Although most often used on the right-
hand side of an equality (or an assignment), it is also allowed on the left-hand side: in such a case, it performs
“deconcatenation” by splitting apart a sequence.
F = ⟨0, 1, 2, 3⟩
G = ⟨4, 5, 6, 7⟩
H = F ∥ G = ⟨0, 1, 2, 3, 4, 5, 6, 7⟩
noting that the result H is an 8-element sequence whose first (resp. last) four elements match F (resp. G).
Likewise, we might write
I ∥ J=H
where now the concatenation operator appears on the left-hand side: this works basically the same way but in
reverse, meaning
I ∥ J = H = ⟨0, 1, 2, 3, 4, 5, 6, 7⟩ = ⟨0, 1, 2, 3⟩ ∥ ⟨4, 5, 6, 7⟩
and so I = F = ⟨0, 1, 2, 3⟩ and J = G = ⟨4, 5, 6, 7⟩. Note that this approach demands the left- and right-hand sides
have the same length, so elements can be organised appropriately.
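Modelling sequences as Python lists (a sketch, not part of the text), concatenation is list addition and "deconcatenation" is slicing at a known split point:

```python
# Sketch: sequence concatenation, and "deconcatenation" by splitting a
# sequence back into parts of known length; as the text notes, the
# left- and right-hand sides must have matching lengths.
F = [0, 1, 2, 3]
G = [4, 5, 6, 7]
H = F + G            # H = F || G

I, J = H[:4], H[4:]  # I || J = H, with |I| = |J| = 4
print(H, I, J)
```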
1. Where there is no ambiguity, ellipses (or continuation dots) are allowed to replace one or more elements.
For example, again consider the sequence
with the ellipsis representing the sub-sequence ⟨‘c’, ‘d’⟩. In fact, this approach is sometimes required. In B
there was a well defined start and end to the sequence, but in
E = ⟨1, 2, 3, 4, . . .⟩.
the ellipsis represents elements we either do not know, or which do not matter: because there is no end
to the sequence, we cannot necessarily fill in the ellipsis as before. Note that this also means |E| might be
infinite or simply unknown.
2. It can be convenient to apply similar reasoning to the indices used to specify elements. For example,
B0,1,...,3 = B0,1,2,3
= B0 ∥ B1 ∥ B2 ∥ B3
= ⟨‘a’, ‘b’, ‘c’, ‘d’⟩
3. The so-called comprehension (or builder) notation allows generation of a sequence using a rule. Consider
F = ⟨x | 4 ≤ x < 8⟩ = ⟨4, 5, 6, 7⟩
for example: the comprehension includes a) an output expression (i.e., x) and b) a rule (or predicate, i.e.,
4 ≤ x < 8) that limits the instances of variables considered when forming the output. Informally, you
might read this example as “all x such that x is between 4 and 7”.
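The comprehension notation maps directly onto a Python list comprehension (a sketch, not part of the text; the bounded domain is our own assumption):

```python
# Sketch: the sequence comprehension <x | 4 <= x < 8> as a Python list
# comprehension, combining an output expression (x) with a predicate.
F = [x for x in range(16) if 4 <= x < 8]  # domain 0..15 assumed for illustration
print(F)  # [4, 5, 6, 7]
```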
As an aside, this suggests the elements can potentially be other sets. Russell’s paradox, a discovery by Bertrand
Russell in 1901, describes an issue with formal set theory that stems from this fact. In a sense, the paradox is a
rephrasing of the liar paradox seen earlier. Consider A, the set of all sets which do not contain themselves: the
question is, does A contain itself? If it does, it should not be in A by definition but it is; if it does not, it should
be in the set A by definition but it is not.
A = {2, 3, 4, 5, 6, 7, 8}.
In this case, we conclude, for example, that |A| = 7, 2 ∈ A, and 9 ∉ A (i.e., 2 is a member, but 9 is not a member,
of A). Notice that, unlike a sequence, because the order of elements is irrelevant, it makes no sense to refer to
them via an index: Ai implies there is some specific i-th element, but, without a specific order, which element it
refers to is unclear. However, also note the same fact means if we define
B = {8, 7, 6, 5, 4, 3, 2}
1.3.2 Operations
Definition 1.11. A sub-set, say Y, of a set X is such that for every y ∈ Y we have that y ∈ X. This is denoted Y ⊆ X.
Conversely, we can say X is a super-set of Y and write X ⊇ Y.
From this definition, it follows that every set is a valid sub-set and super-set of itself and, therefore, that X = Y iff.
X ⊆ Y and Y ⊆ X. If X ≠ Y we use the terms proper sub-set and proper super-set, and so write Y ⊂ X and X ⊃ Y
respectively.
We say X and Y are disjoint (or mutually exclusive) if X ∩ Y = ∅. Note also that the complement operation can be
rewritten X − Y = X ∩ Y̅, where Y̅ denotes the complement of Y.
Definition 1.13. The union and intersection operations preserve a law of cardinality called the principle of inclusion,
which states
|A ∪ B| = |A| + |B| − |A ∩ B|.
This property has a simple intuition, in that elements in both A and B will be counted twice by |A| and |B|; this is corrected
via the last term (i.e., via |A ∩ B|).
Definition 1.14. The power set of a set X, denoted P(X), is the set of every possible sub-set of X. Note that ∅ is a member
of all power sets.
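The power set can be generated by taking sub-sets of each possible size; a sketch in Python (not part of the text; the function name power_set is our own):

```python
# Sketch: P(X) as all sub-sets of X, of every size from 0 (the empty set,
# always a member) up to |X| (X itself).
from itertools import chain, combinations

def power_set(X):
    xs = list(X)
    return [set(c) for c in chain.from_iterable(
        combinations(xs, r) for r in range(len(xs) + 1))]

print(power_set({0, 1}))  # contains set(), {0}, {1} and {0, 1}
```

Note that |P(X)| = 2^|X|, since each element is either in or out of a given sub-set.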
On first reading, these definitions can seem quite abstract. However, we have another tool at our disposal
which describes what they mean in a more concrete, visual way. This tool is a Venn diagram, named after
mathematician John Venn who invented the concept in 1881. The idea is that sets are represented by regions
drawn inside a frame that implicitly represents the universal set U. By placing the regions inside each other
and overlapping their boundaries, we can describe most set-related concepts very easily.
[Figure 1.1: four Venn diagrams, shading (a) A ∪ B, (b) A ∩ B, (c) A − B, and (d) A̅, plus an example diagram placing the elements 1 to 10 within the sets A and B.]
Example 1.16. Figure 1.1 includes four Venn diagrams which describe the union, intersection, difference, and
complement operations: there is a shaded region representing members of each resulting set. For example, in
the diagram for A ∪ B the shaded region covers all of the sets A and B: the result contains all elements in either
A or B or both.
1. the union of A and B is A ∪ B = {1, 2, 3, 4, 5, 6}, i.e., elements which are either members of A or B or both;
note that the elements 3 and 4 do not appear twice because said result is a set,
2. the intersection of A and B is A ∩ B = {3, 4}, i.e., elements that are members of both A and B,
3. the difference between A and B is A − B = {1, 2}, i.e., elements that are members of A but not also members
of B, and
4. the complement of A is A = {5, 6, 7, 8, 9, 10}, i.e., elements that are not members of A.
We can also use this example to verify that the principle of inclusion holds: given |A| = 4 and |B| = 4, checking
the above shows |A ∪ B| = 6 and |A ∩ B| = 2 so by the principle of inclusion we have 6 = 4 + 4 − 2.
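The same check can be run with Python's built-in set type (a sketch, not part of the text; the sets A = {1, 2, 3, 4} and B = {3, 4, 5, 6} are those implied by the example results above):

```python
# Sketch: verify the principle of inclusion for the example sets.
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

assert len(A | B) == len(A) + len(B) - len(A & B)  # 6 == 4 + 4 - 2
print(A | B, A & B, A - B)  # union, intersection, difference
```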
1.3.3 Products
Definition 1.15. The Cartesian product (or cross product) of n sets, say X0, X1, . . . , Xn−1, is defined as
X0 × X1 × · · · × Xn−1 = {(x0, x1, . . . , xn−1) | xi ∈ Xi for 0 ≤ i < n}.
In the most simple case of n = 2, the Cartesian product X0 × X1 is the set of all possible pairs where the first item in the
pair is a member of X0 and the second item is a member of X1.
Definition 1.16. The Cartesian product of a set X with itself n times is denoted Xⁿ; for completeness, we define X⁰ = ∅
and X¹ = X. A special-case of this notation is X∗, which applies the Kleene star operator: this captures the Cartesian
product of X with itself a finite number of times (i.e., zero or more): a more precise definition is therefore
X∗ = X⁰ ∪ X¹ ∪ X² ∪ · · ·
Example 1.18. Imagine we have the set A = {0, 1}. The Cartesian product of A with itself is
A × A = {(0, 0), (0, 1), (1, 0), (1, 1)}.
That is, the pairs in A × A (or A², if you prefer) represent all possible sequences a) whose length is two, and b)
whose elements are members of A.
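This product can be generated directly (a sketch in Python, not part of the text):

```python
# Sketch: the Cartesian product A x A, i.e., all length-2 sequences over A.
from itertools import product

A = {0, 1}
print(sorted(product(A, repeat=2)))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```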
• The set ∅, called the null set or empty set, contains no elements: it is empty, meaning |∅| = 0. Note that ∅ is a set
not an element: one cannot write the empty set as {∅} since this is the set with one element, that element being the
empty set itself.
• The contents of the set U, called the universal set, depends on the context. Roughly speaking, it contains every
element from the problem being considered.
It can make sense to avoid enumerating a set completely, which is the approach used above: explicitly including
each element can become laborious, error prone, or simply inconvenient. The examples below show various
short-hands to address this problem:
1. Where there is no ambiguity, ellipses (or continuation dots) are allowed to replace one or more elements.
For example, we might rewrite the set A as
A = {2, 3, . . . , 7, 8}
with the ellipsis representing the sub-set {4, 5, 6}. In fact, this approach is sometimes required. Imagine we
want to define a set of even integers which are greater than or equal to two: this set has an infinite size,
so we need to define it as
C = {2, 4, 6, 8, . . .}.
2. The so-called comprehension (or builder) notation allows generation of a set using a rule. Consider
D = {x | f (x)}.
for example: the comprehension includes a) an output expression (i.e., x) and b) a rule (or predicate,
i.e., f (x)) that limits the instances of variables considered when forming the output. Informally, you
might read this example as “all x such that f (x) = true”. Using the same idea, we could rewrite previous
examples as
A = {x | 2 ≤ x ≤ 8},
and
C = {x | x > 0 ∧ x ≡ 0 (mod 2)}
Definition 1.18. Several useful sets that relate to numbers can be defined:
• The integers are whole numbers which can be positive or negative and also include zero; this set is denoted by
Z = {. . . , −3, −2, −1, 0, 1, 2, 3, . . .}
or alternatively
Z = {0, ±1, ±2, ±3, . . .}.
• The natural numbers are whole numbers which are positive; they are denoted by the set
N = {0, 1, 2, 3, . . .}.
B = {0, 1},
• The rational numbers are those which can be expressed in the form x/y, where x and y are both integers and
termed the numerator and denominator. This set is denoted
Q = {x/y | x ∈ Z ∧ y ∈ Z ∧ y ≠ 0}
where we disallow a value of y = 0 to avoid problems. Clearly the set of rational numbers is a super-set of Z, N,
and B, since, for example, we can write x/1 to convert any x ∈ Z into a member of Q. However, not all numbers are
rational: some are irrational in the sense that it is impossible to find an x and y such that they exactly represent the
required result. Examples include the value of π, which is approximated by, but not exactly equal to, 22/7.
Note that, from here on, we use the terms sequence and tuple as an informal way to distinguish between cases
where elements are, respectively, a) of (potentially) homogeneous and heterogeneous type, and/or b) mutable
(i.e., can be altered) and immutable (i.e., cannot be altered).
Example 1.19. Noting the bracketing style used to differentiate it from a sequence, we can define an example
2-tuple or pair as
A = (4, ‘f’)
In this case, the elements A0 = 4 and A1 = ‘f’ clearly have different types: the first is a number, and the second
is a character.
1.4.2 Strings
Definition 1.20. An alphabet is a non-empty set of symbols.
Definition 1.21. A string X wrt. some alphabet Σ is a sequence, of finite length, whose elements are members of Σ, i.e.,
X = ⟨X0 , X1 , . . . , Xn−1 ⟩
for some n st. Xi ∈ Σ for 0 ≤ i < n; if n is zero, we term X the empty string and denote it ϵ. It can be useful, and
is common, to write elements in a human-readable form termed a string literal: this basically just means writing them
from right-to-left without any associated notation (e.g., brackets or commas).
Example 1.20. If
Σ = {0, 1}
then the strings of length n = 2 (left), and the corresponding literal (right), are as follows:
⟨0, 0⟩ ≡ 00
⟨1, 0⟩ ≡ 01
⟨0, 1⟩ ≡ 10
⟨1, 1⟩ ≡ 11
Example 1.21. If
Σ = {‘a’, ‘b’, . . . , ‘z’}
then the strings of length n = 2 (left), and the corresponding literal (right), are as follows:
⟨‘a’, ‘a’⟩ ≡ aa
⟨‘b’, ‘a’⟩ ≡ ab
⋮
⟨‘a’, ‘b’⟩ ≡ ba
⟨‘b’, ‘b’⟩ ≡ bb
⋮
⟨‘z’, ‘z’⟩ ≡ zz
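Enumerations like these can be generated mechanically; a short Python sketch (the helper names strings and literal are ours, not standard):

```python
from itertools import product

def strings(sigma, n):
    """Every string of length n over the alphabet sigma, as a tuple."""
    return list(product(sigma, repeat=n))

def literal(X):
    """Render a string as a literal by writing its elements right-to-left,
    so the sequence <1, 0> (i.e., X_0 = 1, X_1 = 0) becomes "01"."""
    return ''.join(str(x) for x in reversed(X))

S = strings([0, 1], 2)   # the 2^2 = 4 strings of length 2 over {0, 1}
```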
1.5 Functions
Definition 1.23. If X and Y are sets, a function f from X to Y is a process that maps each element of X to an element of
Y. We write this as
f :X→Y
where X is termed the domain of f and Y is the codomain of f . For an element x ∈ X, which we term the pre-image,
there is only one y = f (x) ∈ Y which is termed the image of x. Finally, the set
{y | y = f (x) ∧ x ∈ X ∧ y ∈ Y}
which is all possible results, is termed the range of f and is always a sub-set of the codomain.
From this definition it might seem as though we can only have functions with one input and one output.
However, we are perfectly entitled to use sets of sets; this means we can use a Cartesian product as the domain.
For example, we can define a function
f :A×A→B
which takes elements from the Cartesian product A × A as input, and produces an element of B as output. So
since the inputs are of the form ⟨x, y⟩ ∈ A × A, f takes two input values “packaged up” as a single pair.
Example 1.22. Consider a function Inv which takes an integer x as input, and produces the rational number
1/x as output:
Inv : Z → Q
      x ↦ 1/x
Note that here we write the function signature, which defines the domain and codomain of Inv, inline with
the definition of the function behaviour. This is simply a short-hand for writing the function signature
Inv : Z → Q
separately. In either case the domain of Inv is Z, because it accepts an integer as input; the codomain is Q,
because it produces a rational number as output. If we take an integer and apply the function to get something
like Inv(2) = 1/2, we have that 1/2 is the image of 2 or conversely 2 is the pre-image of 1/2 under Inv.
Consider also a function Max which takes a pair of integers as input, and produces the larger of the two as output:
Max : Z × Z → Z
      ⟨x, y⟩ ↦ x if x ≥ y, else y
This is the maximum function on integers; it takes two integers as input and produces an integer, the maximum
of the inputs, as output. So if we take the pair of integers ⟨2, 4⟩ say, and then apply the function, we get
Max(2, 4) = 4. In this case, the domain of Max is Z × Z and the codomain is Z; the integer 4 is the image of the
pair ⟨2, 4⟩ under Max.
1.5.1 Composition
Definition 1.24. Given two functions f : X → Y and g : Y → Z, the composition of f and g is denoted
g ◦ f : X → Z.
The notation g ◦ f should be read as “apply g to the result of applying f ”. That is, given some input x ∈ X, this
composition is equivalent to applying y = f (x) and then z = g(y) to get the result z ∈ Z. More formally, we have
(g ◦ f )(x) = g( f (x)).
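As a sketch of composition in Python (compose, f and g are illustrative names, not part of the text's notation):

```python
def compose(g, f):
    """Return g . f, i.e., the function x -> g(f(x)): f is applied first."""
    return lambda x: g(f(x))

f = lambda x: x + 1   # f : X -> Y
g = lambda y: y * 2   # g : Y -> Z
h = compose(g, f)     # h : X -> Z, with h(x) = g(f(x)) = (x + 1) * 2
```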
1.5.2 Properties
Definition 1.25. For a given function f , we say that f is
• surjective if the range equals the codomain, i.e., there are no elements in the codomain which do not have a
pre-image in the domain,
• injective if no two elements in the domain have the same image in the range, and
• bijective if the function is both surjective and injective, i.e., every element in the codomain is the image of exactly
one element in the domain.
Using the examples above, we clearly have that Inv is not surjective but Max is. This follows because we
can construct a rational 2/3 which does not have an integer pre-image under Inv so the function cannot be
surjective. Equally, for any integer x in the range of Max there is always a pair ⟨x, y⟩ in the domain such that
x > y so Max is surjective, in fact there are lots of them since Z is infinite in size! In the same way, we have that
Inv is injective but Max is not. Only one pre-image x maps to the value 1/x in the range under Inv but there
are multiple pairs ⟨x, y⟩ which map to the same image under Max, for example 4 is the image of both ⟨1, 4⟩ and
⟨2, 4⟩ under Max.
Definition 1.26. The identity function I on a set X is defined by
I : X → X
    x ↦ x
so that it maps all elements to themselves. Given two functions f and g defined by f : X → Y and g : Y → X, if g ◦ f is
the identity function on set X and f ◦ g is the identity on set Y, then f is the inverse of g and g is the inverse of f . We
denote this by f = g−1 and g = f −1 . If a function f has an inverse, we hence have f −1 ◦ f = I.
The inverse of a function maps elements from the codomain back into the domain, reversing the original
function. It is easy to see that not all functions have an inverse. In particular, if a function is not injective there
will be more than one potential pre-image for the inverse of any image; this suggests we cannot sensibly map
from the codomain back into the domain. The Inv function is another, more concrete example: some values,
such as 1/x, can be mapped back into the domain, namely to x, yet others, such as 2/3, cannot: the required
pre-image 3/2 is not an integer, i.e., not a member of the domain Z, so we cannot map it from the codomain
back into the domain. Put another way, Inv, as we have defined it at least, has no inverse.
As a concrete example, consider the successor function
Succ : Z → Z
       x ↦ x + 1
which takes an integer x as input and produces the successor (or next) integer x + 1 as output. This function is
bijective, since the codomain and range are the same and no two integers have the same successor. As a result,
the inverse is easy to describe as
Pred : Z → Z
       x ↦ x − 1
which is the predecessor function: it takes an integer x as input and produces x − 1 as output. To see that
Succ−1 = Pred and Pred−1 = Succ note that
(Pred ◦ Succ)(x) = (x + 1) − 1 = x
(Succ ◦ Pred)(x) = (x − 1) + 1 = x
1.5.3 Relations
Definition 1.27. Informally, a binary relation f on a set X is like a propositional function which takes members of the
set as input and “filters” them to produce an output. As a result, for a set X the relation f forms a sub-set of X × X. For
a given set X and a binary relation f , we say f is
• reflexive if f (x, x) = true for all x ∈ X,
• symmetric if f (x, y) = true implies f (y, x) = true for all x, y ∈ X, and
• transitive if f (x, y) = true and f (y, z) = true implies f (x, z) = true for all x, y, z ∈ X.
Consider the set A = {1, 2, 3, 4}, and a function Equ which tests whether two inputs are equal. Using the
function we can form a sub-set of A × A called AEqu , for example, by “filtering out” the pairs (x, y) with
Equ(x, y) = true to get
AEqu = {⟨1, 1⟩, ⟨2, 2⟩, ⟨3, 3⟩, ⟨4, 4⟩}.
Now, for members of A, say x, y, z ∈ A,
1. Equ(x, x) = true, so the relation is reflexive,
2. if Equ(x, y) = true then Equ(y, x) = true, so the relation is symmetric, and
3. if Equ(x, y) = true and Equ(y, z) = true then Equ(x, z) = true, so the relation is transitive
Consider instead a function Lth which tests whether one input is less than another. Taking the same approach as above, we can form
ALth = {⟨1, 2⟩, ⟨1, 3⟩, ⟨1, 4⟩, ⟨2, 3⟩, ⟨2, 4⟩, ⟨3, 4⟩}
of all pairs (x, y) with x, y ∈ A st. Lth(x, y) = true. Now, for members of A, say x, y, z ∈ A,
1. Lth(x, x) = false, so the relation is not reflexive,
2. if Lth(x, y) = true then Lth(y, x) = false, so the relation is not symmetric (it is anti-symmetric), but
3. if Lth(x, y) = true and Lth(y, z) = true then Lth(x, z) = true, so the relation is transitive.
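The "filtering" view of relations translates directly into executable form; a Python sketch using the same A and Lth, with each property checked exhaustively:

```python
from itertools import product

A = {1, 2, 3, 4}

def Lth(x, y):
    return x < y

# "Filter out" the pairs of A x A for which Lth holds, forming A_Lth.
A_Lth = {(x, y) for (x, y) in product(A, repeat=2) if Lth(x, y)}

# Check the three properties exhaustively over A.
reflexive  = all(Lth(x, x) for x in A)
symmetric  = all(Lth(y, x) for (x, y) in A_Lth)
transitive = all(Lth(x, z) for (x, y, z) in product(A, repeat=3)
                 if Lth(x, y) and Lth(y, z))
```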
Informally, an algebra comprises
• a set of values,
• a set of operators and relations defined over those values, and
• a set of axioms which dictate what the operators and relations mean and how they work.
Again, you may not know what these axioms are called, but you probably do know how they work. For
example, given x, y, z ∈ Z, you might know a) we can write x + (y + z) = (x + y) + z, i.e., say that addition is
associative, or b) we can write x · 1 = x, i.e., say that the multiplicative identity of x is 1. In reality, we can be
much more general than this: when we discuss “an” algebra, all we really mean is a set of values for which
there is a well defined set of operators, relations and axioms; abstract algebra is basically concerned with sets
of values that are, potentially, not numbers.
In the early 1840s, mathematician George Boole put this generality to good use by combining (or, in fact,
unifying) concepts in logic and set theory: the result forms Boolean algebra [1]. Put (rather too) simply, Boole
saw that working with a logic expression is much the same as working with an arithmetic expression, and
reasoned that the axioms of the latter should apply to the former as well. Based on what we already know, for
example, 0 and false and ∅ are all sort of equivalent, as are 1 and true and U; likewise, x ∧ y and x ∩ y are sort
of equivalent, as are x ∨ y and x ∪ y, and ¬x and the set complement of x. More formally, we can see that the
identity axiom applies in the same way:
x ∨ false = x    x ∧ true = x
x ∪ ∅ = x        x ∩ U = x
x + 0 = x        x · 1 = x
Ironically, this was viewed as somewhat obscure; Boole himself did not necessarily regard logic directly as a
mathematical concept. It was not until 1937 that Claude Shannon, then a student of Electrical Engineering and
Mathematics, saw the potential of using Boolean algebra to represent and manipulate digital information [7].
This insight is fundamentally important, essentially allowing a “link” between theory (i.e., Mathematics) and
practice (i.e., physical circuits that we can build).
Definition 1.29. Putting everything together produces the following definition for Boolean algebra. Consider the set
B = {0, 1}, on which there are two binary operators

∧ : B × B → B
    ⟨x, y⟩ ↦ 0 if x = 0 and y = 0
             0 if x = 0 and y = 1
             0 if x = 1 and y = 0
             1 if x = 1 and y = 1

and

∨ : B × B → B
    ⟨x, y⟩ ↦ 0 if x = 0 and y = 0
             1 if x = 0 and y = 1
             1 if x = 1 and y = 0
             1 if x = 1 and y = 1

plus a unary operator

¬ : B → B
    x ↦ 1 if x = 0
        0 if x = 1

These are termed AND, OR and NOT respectively; they are governed by the following axioms
equivalence x≡y ≡ (x ⇒ y) ∧ (y ⇒ x)
implication x⇒y ≡ ¬x ∨ y
involution ¬¬x ≡ x
Note that the ∧ and ∨ operations in Boolean algebra behave in a similar way to · and + in elementary algebra: as such,
they are sometimes referred to as “product” and “sum” operations (and denoted · and + as a result).
Definition 1.30. In line with propositional logic, it is common to add a third binary operator called XOR:
⊕ : B × B → B
    ⟨x, y⟩ ↦ 0 if x = 0 and y = 0
             1 if x = 0 and y = 1
             1 if x = 1 and y = 0
             0 if x = 1 and y = 1
More generally, XOR is an example of a derived operator, a name which hints at the fact it is a short-hand derived from
operators we already have. Put another way, because
x ⊕ y ≡ (¬x ∧ y) ∨ (x ∧ ¬y),
XOR can be defined in terms of AND, OR and NOT. Two other examples, which will be useful later, are
x ⊼ y ≡ ¬(x ∧ y),
i.e., NAND, and
x ⊽ y ≡ ¬(x ∨ y),
i.e., NOR.
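The claim that XOR derives from AND, OR and NOT can be checked exhaustively; a Python sketch over B × B (the operator names are ours):

```python
from itertools import product

def NOT(x):    return 1 - x
def AND(x, y): return x & y
def OR(x, y):  return x | y

def XOR(x, y):
    """The derived operator: x XOR y = (NOT x AND y) OR (x AND NOT y)."""
    return OR(AND(NOT(x), y), AND(x, NOT(y)))

# Tabulate the derived XOR over every input pair in B x B.
table = {(x, y): XOR(x, y) for (x, y) in product((0, 1), repeat=2)}
```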
Definition 1.31. A functionally complete (or universal) set of Boolean operators is st. every possible truth table can
be described by combining the constituent members into a Boolean expression. For example, the sets {¬, ∧} and {¬, ∨} are
functionally complete.
In 1921, Emil Post developed [5] a set of necessary and sufficient conditions for such a description to be valid
(i.e., a method to prove whether a given set is or is not functionally complete); where such a set is singleton,
i.e., contains one operator only, that operator is termed a Sheffer function [8] (after Henry Sheffer, who, during
1912, independently rediscovered work of 1880 by Charles Sanders Peirce). For example, the singleton sets
{⊼} and {⊽} are functionally complete, meaning NAND and NOR can both be described as Sheffer functions.
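Functional completeness of {NAND} can likewise be demonstrated constructively; a Python sketch using the standard derivations of NOT, AND and OR from NAND alone:

```python
def NAND(x, y):
    return 1 - (x & y)

# {NAND} is functionally complete: NOT, AND and OR all derive from it.
def NOT(x):    return NAND(x, x)            # x NAND x = not x
def AND(x, y): return NOT(NAND(x, y))       # negate the NAND
def OR(x, y):  return NAND(NOT(x), NOT(y))  # de Morgan via NAND
```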
Definition 1.32. Certain operators (and hence axioms) are termed monotone: this means changing an operand either
leaves the result unchanged, or that it always changes the same way as the operand. Conversely, other operators are
termed non-monotone when these conditions do not hold.
For example, consider the expression
x ∧ 1.
In this case, notice that if x = 0 then the result is 0 whereas if x = 1 then the result is 1: this suggests changing x
from 0 to 1 (resp. from 1 to 0) changes the result in the same way.
Definition 1.33. The fact there are AND and OR forms of most axioms hints at a more general underlying principle.
Consider a Boolean expression e: the principle of duality states that the dual expression eD is formed by
1. swapping each ∧ operator with an ∨ operator (and vice versa), and
2. swapping each constant 0 with a constant 1 (and vice versa).
Of course e and eD are different expressions, and clearly not equivalent; if we start with some e ≡ f however, then we do
still get eD ≡ f D .
As an example, consider axioms for
1. distribution, e.g., if
e = x ∧ (y ∨ z) ≡ (x ∧ y) ∨ (x ∧ z)
then
eD = x ∨ (y ∧ z) ≡ (x ∨ y) ∧ (x ∨ z)
and
2. identity, e.g., if
e=x∧1≡x
then
eD = x ∨ 0 ≡ x.
Definition 1.34. The de Morgan axiom can be turned into a more general principle. Consider a Boolean expression e: the
principle of complements states that the complement expression ¬e is formed by
1. swapping each ∧ operator with an ∨ operator (and vice versa),
2. swapping each constant 0 with a constant 1 (and vice versa), and
3. swapping each variable x with its complement ¬x.
1.6.1 Manipulation
Saying we have manipulated an expression just means we have transformed it from one form to another; when
done correctly, this should imply the original and alternative, transformed forms are equivalent. Often this is
presented as a derivation, or sequence of steps, each of which relates to an axiom or assumption (and so is
assumed valid by definition).
For example, we can prove that
(a ∧ b ∧ c) ∨ (¬a ∧ b) ∨ (a ∧ b ∧ ¬c) = b
via the derivation
(a ∧ b ∧ c) ∨ (¬a ∧ b) ∨ (a ∧ b ∧ ¬c)
= (a ∧ b ∧ c) ∨ (a ∧ b ∧ ¬c) ∨ (¬a ∧ b)   (commutativity)
= ((a ∧ b) ∧ (c ∨ ¬c)) ∨ (¬a ∧ b)         (distribution)
= ((a ∧ b) ∧ 1) ∨ (¬a ∧ b)                (inverse)
= (a ∧ b) ∨ (¬a ∧ b)                      (identity)
= b ∧ (a ∨ ¬a)                            (distribution)
= b ∧ 1                                   (inverse)
= b                                       (identity)
Of course we might employ a brute-force approach instead. If we write a truth table for the left- and right-hand
sides, this allows us to compare them: if the outputs match in all rows, we can conclude the left- and right-hand
sides are equivalent. For example,
a b c t0 = a ∧ b ∧ c t1 = ¬a ∧ b t2 = a ∧ b ∧ ¬c t0 ∨ t1 ∨ t2
0 0 0 0 0 0 0
0 0 1 0 0 0 0
0 1 0 0 1 0 1
0 1 1 0 1 0 1
1 0 0 0 0 0 0
1 0 1 0 0 0 0
1 1 0 0 0 1 1
1 1 1 1 0 0 1
shows the left- and right-hand sides are equivalent, as expected. Of course if there were more variables, we
would need to enumerate all possible values of each one. Our truth table would grow, and, at some point,
the derivation-type approach starts to become more attractive: we achieve the same outcome, but without
brute-force enumeration.
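The brute-force comparison is easy to mechanise; a Python sketch enumerating all 8 rows of the truth table above:

```python
from itertools import product

# Left- and right-hand sides of the claimed identity; 1 - x plays the
# role of NOT x when x is restricted to {0, 1}.
lhs = lambda a, b, c: (a & b & c) | ((1 - a) & b) | (a & b & (1 - c))
rhs = lambda a, b, c: b

# Enumerate all 2^3 = 8 rows and compare the two output columns.
equivalent = all(lhs(a, b, c) == rhs(a, b, c)
                 for (a, b, c) in product((0, 1), repeat=3))
```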
Example 1.28. Another motivation for manipulating a given expression is to produce an alternative with some
goal or metric in mind. A common metric is the number of operators each expression uses, i.e., how simple it
is; this is one way to judge evaluation cost, and the task is then termed simplification.
Consider the exclusive-or operator, i.e., an expression x ⊕ y, which we can write as the more complicated
expression
(y ∧ ¬x) ∨ (x ∧ ¬y)
As an aside, consider a generic n-input, 1-output Boolean function
f : Bn → B.
Note that each of the n inputs can obviously be assigned one of two values, namely 0 or 1, so there are 2^n
possible assignments to n inputs. For example, if f were to have n = 1 input, say x, there would be 2^1 = 2
possible assignments because x can either be 0 or 1. In the same way, for n = 2 inputs, say x and y, there are
2^2 = 4 possible assignments: we can have
x=0 y=0
x=0 y=1
x=1 y=0
x=1 y=1
This is why a truth table for n inputs will have 2^n rows: each row details one assignment to the inputs, and the
associated output.
So, how many functions are there? A function with n inputs is specified by a truth table with 2^n rows; each
row includes an output that is assigned 0 or 1, depending on exactly which function the truth table describes.
So to count how many functions there are, we can just count how many possible assignments there are to the
2^n outputs. The correct answer is 2^(2^n).
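We can sanity-check this count by generating the truth tables themselves; a Python sketch (the helper functions is ours):

```python
from itertools import product

def functions(n):
    """Enumerate every n-input Boolean function, each represented by its
    truth table: a tuple giving the output for each of the 2^n rows."""
    return list(product((0, 1), repeat=2 ** n))

# 2^(2^1) = 4 single-input functions; 2^(2^2) = 16 two-input functions.
```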
The alternative, simpler expression we have in mind is
(x ∨ y) ∧ ¬(x ∧ y).
One can prove these are equivalent by writing truth tables for them, as we did above. To do so, however, we
need the expressions in the first place: how did we get the alternative from the original one?
The answer is we start with one expression, and (somehow intelligently) apply axioms to move step-by-
step toward the other. For example, to do so more easily, notice that we can manipulate each term in the first
expression whose form is p ∧ ¬q as follows:
(p ∧ ¬q)
= (p ∧ ¬q) ∨ 0 (identity)
= (p ∧ ¬q) ∨ (p ∧ ¬p) (inverse)
= p ∧ (¬p ∨ ¬q) (distribution)
This introduces a new rule that we can make use of; since it was derived from axioms we assume are valid, we
can assume it is valid as well. Using it, we can rewrite the original expression as
(y ∧ ¬x) ∨ (x ∧ ¬y)
= (x ∧ (¬x ∨ ¬y)) ∨ (y ∧ (¬x ∨ ¬y)) (p ∧ ¬q rule above)
= (x ∨ y) ∧ (¬x ∨ ¬y) (distribution)
= (x ∨ y) ∧ ¬(x ∧ y) (de Morgan)
which gives us the alternative we are looking for, noting it requires 4 operators rather than 5.
1.6.2 Functions
Definition 1.35. Given the definition of Boolean algebra, it is perhaps not surprising that a generic n-input, 1-output
Boolean function f can be described as
f : Bn → B.
It is possible to extend this definition so it caters for m-outputs; we write the function signature as
g : Bn → Bm .
We know there are 2^(2^n) Boolean functions with n inputs; this represents a lot of functions as n grows. However,
for a small number of inputs, say n = 2, 2^(2^n) = 2^(2^2) = 2^4 = 16 functions is fairly manageable; in fact, we can
easily write them all down, with fi denoting the i-th such function. Equally, an m-output function g can be
viewed as a collection of m separate 1-output functions
g0 : Bn → B
g1 : Bn → B
..
.
gm−1 : Bn → B
That is, the output of g is just the m individual 1-bit outputs gi (x) concatenated together. This is often termed a vectorial
Boolean function: the inputs and outputs are vectors (or sequences) over the set B rather than single elements of it.
Definition 1.36. A Boolean-valued function (or predicate function)
f : X → {0, 1}
is a function whose output is a Boolean value: note the contrast with a Boolean function, in so far as it places no restriction
on what the input (i.e., the set X) must be.
Example 1.29. Consider a 2-input, 1-output Boolean function, whose signature we can write as
f : B2 → B
st. for r, x, y ∈ B, the input is a pair ⟨x, y⟩ and the output for a given x and y is written r = f (x, y). The function
itself can be specified in two ways. First, as previously, we could enumerate all possible input combinations,
and specify corresponding outputs. This can be written equivalently in the form of an inline function behaviour,
or as a truth table:
f : ⟨x, y⟩ ↦ 0 if x = 0 and y = 0
             1 if x = 0 and y = 1
             1 if x = 1 and y = 0
             0 if x = 1 and y = 1
or, equivalently, as the truth table
x y f (x, y)
0 0 0
0 1 1
1 0 1
1 1 0
However, with a large number of inputs, this becomes difficult. As a short-hand, we can therefore specify f as
a Boolean expression instead, e.g.,
f : ⟨x, y⟩ 7→ (¬x ∧ y) ∨ (x ∧ ¬y).
This basically tells us how to compute outputs, rather than listing those outputs explicitly.
Example 1.30. Consider a 2-input, 2-output Boolean function
h : B2 → B2
    ⟨x, y⟩ ↦ ⟨0, 0⟩ if x = 0 and y = 0
             ⟨1, 0⟩ if x = 0 and y = 1
             ⟨1, 0⟩ if x = 1 and y = 0
             ⟨0, 1⟩ if x = 1 and y = 1
or, equivalently, the truth table
x y h(x, y)
0 0 ⟨0, 0⟩
0 1 ⟨1, 0⟩
1 0 ⟨1, 0⟩
1 1 ⟨0, 1⟩
which we can decompose into the two 1-output Boolean functions
h0 : B2 → B
     ⟨x, y⟩ ↦ 0 if x = 0 and y = 0
              1 if x = 0 and y = 1
              1 if x = 1 and y = 0
              0 if x = 1 and y = 1
and
h1 : B2 → B
     ⟨x, y⟩ ↦ 0 if x = 0 and y = 0
              0 if x = 0 and y = 1
              0 if x = 1 and y = 0
              1 if x = 1 and y = 1
meaning that
h(x, y) ≡ h0 (x, y) ∥ h1 (x, y).
1. When the expression is written as a sum (i.e., OR) of terms which each comprise the product (i.e., AND) of variables,
e.g.,
(a ∧ b ∧ c) ∨ (d ∧ e ∧ f ),
it is said to be in disjunctive normal form or Sum of Products (SoP) form; the terms are called the minterms.
Note that each variable can exist as-is or complemented using NOT, meaning
(¬a ∧ b ∧ c) ∨ (d ∧ ¬e ∧ f )
is also in SoP form.
2. When the expression is written as a product (i.e., AND) of terms which each comprise the sum (i.e., OR) of variables,
e.g.,
(a ∨ b ∨ c) ∧ (d ∨ e ∨ f ),
it is said to be in conjunctive normal form or Product of Sums (PoS) form; the terms are called the maxterms.
As above each variable can exist as-is or complemented using NOT.
Now ¬x is 1 for the first and second rows, rather than the second alone (as was the case with ¬x ∧ y), so we have
described another function h ≠ g, given by
x y h(x, y)
0 0 1
0 1 1
1 0 1
1 1 0
where x ∨ y and ¬x ∨ ¬y are the maxterms of g. By manipulating the expressions, we can prove that gSoP and
gPoS are just two different ways to write the same function, i.e., g. Recall that for p and q, de Morgan’s axiom states that ¬(p ∧ q) ≡ ¬p ∨ ¬q and ¬(p ∨ q) ≡ ¬p ∧ ¬q.
1.7 Signals
Definition 1.38. In general, a signal can be described as a descriptive function (abstractly), or a physical quantity
(concretely), that varies in time or space so as to represent and/or communicate (i.e., convey) information. We say that
• a discrete-time signal is valid for a discrete (i.e., countable) range of time indices, e.g., t ∈ Z,
• a continuous-time signal is valid for a continuous (i.e., uncountable) range of time indices, e.g., t ∈ R,
• a discrete-value signal has a value from a discrete (i.e., countable) range, e.g., f (t) ∈ Z, and
• a continuous-value signal has a value from a continuous (i.e., uncountable) range, e.g., f (t) ∈ R.
Definition 1.39. The term analogue signal is a synonym of continuous-value signal: a physical quantity that varies in
time is typically used to represent (that is, it is analogous to) some abstract variable.
Definition 1.40. Strictly speaking, digital signal is a synonym of discrete-value signal: it will have a digital (i.e.,
discrete or exact) value. This terminology is often overloaded, however, and taken to mean a signal whose value is either
0 or 1 (cf. logic signal).
The transition of a digital signal from 0 to 1 (resp. 1 to 0) is called a positive (resp. negative) edge; we often say it
has toggled from 0 to 1 (resp. 1 to 0). During any time the signal has a value of 1 (resp. 0), we say it is at a positive
(resp. negative) level (and use the term pulse as a synonym for positive level, i.e., the period between a positive and
negative edge).
Definition 1.41. It is common to describe a signal by plotting it as a waveform: the y-axis represents the value of the
signal as it varies over time, which is represented by the x-axis.
Note that it is common, though incorrect, to describe discrete-time signals by using a continuous plot; connecting
discrete points implies a formally incorrect description (i.e., it gives the impression of a continuous-time
signal). Doing so typically stems from either a) the fact said discrete-time signal is derived from an associated
continuous-time signal (e.g., the latter has been quantised wrt. time, by sampling it at discrete time indices),
or b) aesthetics, in the sense it is easier to see when printed.
1.8 Representations
God made the integers; all the rest is the work of man.
– Kronecker
Definition 1.43. An n-bit bit-sequence (or binary sequence) is a member of the set Bn , i.e., it is an n-tuple of bits.
Much like other sequences, we use Xi to denote the i-th bit of a binary sequence X and |X| = n to denote the number of
bits in X.
Definition 1.44. Instead of writing out X ∈ Bn symbolically, i.e., writing ⟨X0 , X1 , . . . , Xn−1 ⟩, we sometimes prefer to list
the bits within a bit-literal (or bit-string, wrt. an implicit alphabet Σ = {0, 1}). For example, consider the following
bit-sequence
X = ⟨1, 1, 0, 1, 1, 1, 1⟩
st. |X| = 7, which can be written as the bit-literal
X = 1111011.
The question is however, what does a bit-sequence mean: what does it represent, other than just an (unstruc-
tured) sequence of bits? The answer is they can represent anything we decide they do; there is just one key
concept, namely
X̂ ↦ X
i.e., the representation of X (on the left) maps to the value of X (on the right).
That is, all we need is a) a representation and mapping specified concretely (i.e., written down, vs. reasoned
about abstractly), and b) a mapping that means the right thing wrt. values, plus is ideally consistent in both
directions (e.g., does not change based on the context, and is injective st. a single representation cannot be
interpreted ambiguously). Notice the (subtle) annotation on the left-hand side of the mapping: X̂ is intended to
highlight this is a representation of some X, whose value therefore depends on the mapping used. Put another
way, this suggests different mappings may legitimately map the same X̂ to different values by interpreting the
bit-sequence differently. This is, essentially, what means we can represent such a rich set of data (e.g., the pixels
in an image) using only a bit (or sequence thereof) as a starting point.
1.8.1.1 Properties
Definition 1.45. Following the idea of vectorial Boolean function, given an n-element bit-sequence X, and an m-element
bit-sequence Y we can clearly
Definition 1.46. Given two n-bit sequences X and Y, we can define some important properties named after Richard
Hamming, a researcher at Bell Labs:
The term endianness stems from a technical article [2], written in the 1980s by Danny Cohen, using Gulliver’s
Travels as an inspiration/analogy: an argument over whether cracking the big- or small-end of a soft-boiled
egg is proper in the former, inspired terminology wrt. arguments over byte ordering in the latter. It does a
brilliant job of surveying the significant impact of what is, at face value, a fairly trivial choice.
• The Hamming weight of X is the number of bits in X that are equal to 1, i.e., the number of times Xi = 1. This
can be expressed as
HW(X) = Σ_{i=0}^{n−1} Xi .
• The Hamming distance between X and Y is the number of bits in X that differ from the corresponding bit in Y,
i.e., the number of times Xi ≠ Yi . This can be expressed as
HD(X, Y) = Σ_{i=0}^{n−1} Xi ⊕ Yi .
Example 1.32. For example, given A = ⟨1, 0, 0, 1⟩ and B = ⟨0, 1, 1, 1⟩ we find that
HW(A) = Σ_{i=0}^{n−1} Ai = 1 + 0 + 0 + 1 = 2
and
HD(A, B) = Σ_{i=0}^{n−1} Ai ⊕ Bi = (1 ⊕ 0) + (0 ⊕ 1) + (0 ⊕ 1) + (1 ⊕ 1) = 1 + 1 + 1 + 0 = 3
st. two bits in A equal 1, and three bits differ between A and B.
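Both properties reduce to short computations over a bit-sequence; a Python sketch mirroring the sums above (HW and HD are the text's names, the encoding as tuples is ours):

```python
def HW(X):
    """Hamming weight: the number of elements of X equal to 1."""
    return sum(X)

def HD(X, Y):
    """Hamming distance: the number of indices i where X_i != Y_i."""
    return sum(x ^ y for (x, y) in zip(X, Y))

A = (1, 0, 0, 1)
B = (0, 1, 1, 1)
```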
Definition 1.47. Imagine that X ∥ p appends a parity bit p to some n-bit sequence X. Doing so implements a form of
error detecting code: having defined
Par+(X) = Σ_{i=0}^{n−1} Xi (mod 2) = ⊕_{i=0}^{n−1} Xi
Par−(X) = ¬( Σ_{i=0}^{n−1} Xi (mod 2) ) = ¬( ⊕_{i=0}^{n−1} Xi )
we say that
• setting p = Par+ (X) implements an even-parity code, in the sense that X ∥ p will have an even number of i st.
Xi = 1, whereas
• setting p = Par− (X) implements an odd-parity code, in the sense that X ∥ p will have an odd number of i st.
Xi = 1.
If/when the type is irrelevant, we drop the super-script and simply write Par(X) instead.
Example 1.33. For example, given A = ⟨1, 0, 0, 1⟩ and B = ⟨0, 1, 1, 1⟩ we find that
Par+(A) = ⊕_{i=0}^{n−1} Ai = 1 ⊕ 0 ⊕ 0 ⊕ 1 = 0
Par−(A) = ¬( ⊕_{i=0}^{n−1} Ai ) = ¬( 1 ⊕ 0 ⊕ 0 ⊕ 1 ) = 1
Par+(B) = ⊕_{i=0}^{n−1} Bi = 0 ⊕ 1 ⊕ 1 ⊕ 1 = 1
Par−(B) = ¬( ⊕_{i=0}^{n−1} Bi ) = ¬( 0 ⊕ 1 ⊕ 1 ⊕ 1 ) = 0
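The parity computations translate directly; a Python sketch (par_even and par_odd are our illustrative names for Par+ and Par−):

```python
def par_even(X):
    """Even-parity bit: the XOR of all bits, st. X || par_even(X) always
    contains an even number of 1 bits."""
    p = 0
    for x in X:
        p ^= x
    return p

def par_odd(X):
    """Odd-parity bit: the complement of the even-parity bit."""
    return 1 - par_even(X)

A = (1, 0, 0, 1)
B = (0, 1, 1, 1)
```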
1.8.1.2 Ordering
There is, by design, no “structure” to a bit-literal. This can be problematic if, for example, we need a way
to make sure the order of bits in the bit-literal is clear wrt. the corresponding bit-sequence. The same issues
appear whenever describing a large(r) quantity in terms of small(er) parts, but, focusing on bits, we can describe
endianness as follows:
Definition 1.48. A given literal, say
X = 1111011,
can be interpreted in two ways:
1. A little-endian ordering is where we read bits in a literal from right-to-left, i.e.,
XLE = ⟨X0 , X1 , X2 , X3 , X4 , X5 , X6 ⟩ = ⟨1, 1, 0, 1, 1, 1, 1⟩,
where
• the Least-Significant Bit (LSB) is the right-most in the literal (i.e., X0 ), and
• the Most-Significant Bit (MSB) is the left-most in the literal (i.e., Xn−1 = X6 ).
2. A big-endian ordering is where we read bits in a literal from left-to-right, i.e.,
XBE = ⟨X0 , X1 , X2 , X3 , X4 , X5 , X6 ⟩ = ⟨1, 1, 1, 1, 0, 1, 1⟩,
where
• the Least-Significant Bit (LSB) is the left-most in the literal (i.e., X0 ), and
• the Most-Significant Bit (MSB) is the right-most in the literal (i.e., Xn−1 = X6 ).
Unless specified, from here on it is (fairly) safe to assume that a little-endian convention is used. Keep in mind
that having selected an endianness convention, which acts as a rule for conversion, there is no real distinction
between a bit-sequence and a bit-literal: we can convert between them in either the little-endian or big-endian
case.
1.8.1.3 Grouping
Definition 1.49. Some bit-sequences are given special names depending on their length. Given a word size w (e.g., the
natural size as dictated by a given processor), we can define
bit ≡ 1-bit
nybble ≡ 4-bit
byte ≡ 8-bit
half-word ≡ (w/2)-bit
word ≡ w-bit
double-word ≡ (w · 2)-bit
quad-word ≡ (w · 4)-bit
but note that standards in particular often use the term octet as a synonym for byte (st. an octet string is therefore a
byte-sequence): although less natural, we follow this terminology where it seems of value to match associated literature.
Example 1.34. Given a bit-sequence
B = ⟨1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0⟩
it can be attractive to group the bits into short(er) sub-sequences. For example, we could rewrite the sequence
as either
C = ⟨⟨1, 1, 0, 0⟩, ⟨0, 0, 0, 0⟩, ⟨1, 0, 0, 0⟩, ⟨1, 0, 1, 0⟩⟩
= ⟨1, 1, 0, 0⟩ ∥ ⟨0, 0, 0, 0⟩ ∥ ⟨1, 0, 0, 0⟩ ∥ ⟨1, 0, 1, 0⟩
= ⟨1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0⟩
= B
or
D = ⟨⟨1, 1, 0, 0, 0, 0, 0, 0⟩, ⟨1, 0, 0, 0, 1, 0, 1, 0⟩⟩
  = ⟨1, 1, 0, 0, 0, 0, 0, 0⟩ ∥ ⟨1, 0, 0, 0, 1, 0, 1, 0⟩
  = ⟨1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0⟩
  = B
st. C has four elements (each of which is a sub-sequence of four bits from B), while D has two elements (each of
which is a sub-sequence of eight bits from B). It is important to see that we have not altered the bits themselves,
just how they are grouped together: we can easily “flatten out” the sub-sequences and reconstruct the original
sequence B.
Example 1.35. Consider the four nybbles in C, i.e., the four 4-bit sub-sequences
C0 = ⟨1, 1, 0, 0⟩
C1 = ⟨0, 0, 0, 0⟩
C2 = ⟨1, 0, 0, 0⟩
C3 = ⟨1, 0, 1, 0⟩
If we want to reconstruct C itself, we need to know which order to put the sub-sequences in: via a little-endian
convention we get
CLE = ⟨C0 , C1 , C2 , C3 ⟩ = ⟨⟨1, 1, 0, 0⟩, ⟨0, 0, 0, 0⟩, ⟨1, 0, 0, 0⟩, ⟨1, 0, 1, 0⟩⟩
whereas via a big-endian convention we get
CBE = ⟨C3 , C2 , C1 , C0 ⟩ = ⟨⟨1, 0, 1, 0⟩, ⟨1, 0, 0, 0⟩, ⟨0, 0, 0, 0⟩, ⟨1, 1, 0, 0⟩⟩.
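Grouping and (re)ordering can be sketched in Python (the helper group is ours, not standard):

```python
def group(X, m):
    """Split a bit-sequence X into len(X) / m sub-sequences of m bits each."""
    return [tuple(X[i:i + m]) for i in range(0, len(X), m)]

B = (1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0)
C = group(B, 4)              # the four nybbles C_0 .. C_3
C_LE = C                     # little-endian: C_0 first
C_BE = list(reversed(C))     # big-endian:    C_3 first

# Flattening C_LE reconstructs B exactly: the grouping alters no bits.
flat = tuple(b for nybble in C_LE for b in nybble)
```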
1.8.1.4 Units
There is a standard notation for measuring multiplicities of bits and bytes: a suffix specifies the quantity (‘b’ or
“bit” for bits, ‘B’ for bytes), and a prefix specifies a multiplier. Although the notation remains consistent, some
ambiguities about how to interpret prefixes complicate matters.
The International System of Units (SI) works with decimal, base-10 prefixes so, for example, a kilobit means
103 = 1000 bits. As a result, we find that
1 kilobit (kbit) = 103 bits = 1000 bits
1 megabit (Mbit) = 106 bits = 1 000 000 bits
1 gigabit (Gbit) = 109 bits = 1 000 000 000 bits
1 terabit (Tbit) = 1012 bits = 1 000 000 000 000 bits
Given some w-bit word, the shift-and-mask paradigm allows us to extract (or isolate) individual or contiguous
sequences of bits. Understanding this is crucial in many areas, and it is often used in lower-level C programs;
this, and related techniques, are often termed “bit twiddling” or “bit bashing”.
• Imagine we want to set the i-th bit of some x, i.e., xi , to 1. This can be achieved by computing
x ∨ (1 ≪ i).
For example, if x = 0011(2) and i = 2, then we compute
x ∨ ( 0001(2) ≪ i )
= 0011(2) ∨ ( 0001(2) ≪ 2 )
= 0011(2) ∨ 0100(2)
= 0111(2)
• Imagine we want to set the i-th bit of some x, i.e., xi , to 0. This can be achieved by computing
x ∧ ¬(1 ≪ i).
For example, if x = 0111(2) and i = 2, then we compute
x ∧ ¬ ( 0001(2) ≪ i )
= 0111(2) ∧ ¬ ( 0001(2) ≪ 2 )
= 0111(2) ∧ ¬ ( 0100(2) )
= 0111(2) ∧ 1011(2)
= 0011(2)
In both cases, the idea is to first create an appropriate mask then combine it with x to get x′ ; in both cases we
do no actual arithmetic, only Boolean-style operations.
Imagine we want to extract an m-bit sub-word (i.e., m contiguous bits) starting at the i-th bit of some x. This
can be achieved by computing
(x ≫ i) ∧ ((1 ≪ m) − 1)
The computation is a little more complicated, but basically the same principles apply: first we create an
appropriate mask (the right-hand term) and combine it with x (the left-hand term). For example, if x = 1011(2)
and m = 2, taking each of i = 0, i = 1 and i = 2 in turn:
( x ≫ i ) ∧ ( ( 1 ≪ m ) − 1 )
( 1011(2) ≫ 0 ) ∧ ( ( 1 ≪ 2 ) − 1 )
( 1011(2) ) ∧ ( ( 0100(2) ) − 1 )
( 1011(2) ) ∧ ( 0011(2) )
0011(2)
( x ≫ i ) ∧ ( ( 1 ≪ m ) − 1 )
( 1011(2) ≫ 1 ) ∧ ( ( 1 ≪ 2 ) − 1 )
( 0101(2) ) ∧ ( ( 0100(2) ) − 1 )
( 0101(2) ) ∧ ( 0011(2) )
0001(2)
( x ≫ i ) ∧ ( ( 1 ≪ m ) − 1 )
( 1011(2) ≫ 2 ) ∧ ( ( 1 ≪ 2 ) − 1 )
( 0010(2) ) ∧ ( ( 0100(2) ) − 1 )
( 0010(2) ) ∧ ( 0011(2) )
0010(2)
Notice that the (1 ≪ m) − 1 term is basically giving us a way to create a value y where ym−1...0 = 1, i.e.,
whose 0-th through to (m − 1)-th bits are 1. If we know m ahead of time, we can clearly simplify this by
providing y directly rather than computing it.
As a special case of extracting an m-element sub-sequence, when we set m = 1 we extract the i-th bit of x alone.
This is a useful and common operation: following the above, it is achieved by computing
(x ≫ i) ∧ 1,
i.e., replacing the general-purpose mask with the special-purpose constant (1 ≪ 1) − 1 = 2 − 1 = 1. For example:
• If x = 0011(2) and i = 2 then we compute
( x ≫ i ) ∧ 1
( 0011(2) ≫ 2 ) ∧ 1
( 0000(2) ) ∧ 1
0000(2)
meaning x2 = 0.
• If x = 0011(2) and i = 0 then we compute
( x ≫ i ) ∧ 1
( 0011(2) ≫ 0 ) ∧ 1
( 0011(2) ) ∧ 1
0001(2)
meaning x0 = 1.
Â = ⟨A0 , A1 , A2 ⟩ = ⟨3, 2, 1⟩
given that 3 is the first digit of 123, 2 is the second digit and so on; we are reading digits in the sequence from
left-to-right vs. right-to-left in the literal, but otherwise they capture the same meaning.
But how do we know what either 123 or Â means? Informally at least, writing 123 intuitively means the
value “one hundred and twenty three” which might be rephrased as “one hundred, two tens and three units”.
The latter case suggests how to add formalism to this intuition: we are just weighting each digit 1, 2 and 3 by
some amount then adding everything up. For example, per the above we are computing the value via
Â 7→ 123 = 1 · 100 + 2 · 10 + 3 · 1.
A0 · 10^0 = 3 · 10^0 = 3 · 1 = 3
A1 · 10^1 = 2 · 10^1 = 2 · 10 = 20
A2 · 10^2 = 1 · 10^2 = 1 · 100 = 100
to make a total of 123 as expected. Put another way, the sequence A represents the value “one hundred and
twenty three”. Two facts start to emerge, namely
1. each digit is being weighted by a power of some base (or radix), which in this case is 10, and
2. the exponent in said weight is related to the position of the corresponding digit: the i-th digit is weighted
by 10^i .
A neat outcome of identifying the base as some sort of parameter is that we can consider choices other than
b = 10. Generalising the example somewhat provides the following definition:
Definition 1.50. A base-b (or radix-b) positional number system uses digits from a digit set X = {0, 1, . . . , b − 1}.
A number x is represented using n digits in total, m of which form the fractional part, i.e.,
x̂ = ⟨x0 , x1 , . . . , xn−1 ⟩ 7→ ± ∑_{i=−m}^{n−m−1} xi · b^i
Example 1.36. Consider again the sequence B = ⟨1, 1, 0, 1, 1, 1, 1, 0⟩, now interpreted using base b = 2: where
we previously had 10 we now have 2, st. we add up the terms
B0 · 2^0 = 1 · 2^0 = 1 · 1 = 1
B1 · 2^1 = 1 · 2^1 = 1 · 2 = 2
B2 · 2^2 = 0 · 2^2 = 0 · 4 = 0
B3 · 2^3 = 1 · 2^3 = 1 · 8 = 8
B4 · 2^4 = 1 · 2^4 = 1 · 16 = 16
B5 · 2^5 = 1 · 2^5 = 1 · 32 = 32
B6 · 2^6 = 1 · 2^6 = 1 · 64 = 64
B7 · 2^7 = 0 · 2^7 = 0 · 128 = 0
to obtain a total of 123 as before.
1.8.2.1 Digits
Describing elements in the digit set {0, 1, . . . , b − 1}, for whatever b, using a single digit can be fairly important;
using multiple digits, for example, can start to introduce some ambiguity wrt. how we interpret a literal. In
particular, once we select a b > 10 we hit a problem: we run out of single Roman-style digits that we can write
down.
Example 1.37. Consider the same example as above where we have the literal 123: we know that if b = 10 and
Â = ⟨3, 2, 1⟩ then
Â 7→ 123 = 1 · 10^2 + 2 · 10^1 + 3 · 10^0 .
However, if b = 16, although we know
123 = 7 · 16^1 + 11 · 16^0
we have no single-digit way to write 11. To solve this problem, we use the symbols (or in fact characters) A . . . F
to represent 10 . . . 15. Otherwise everything works the same way, meaning for example, that if B̂ = ⟨B, 7⟩ then
B̂ 7→ 123 = 7 · 16^1 + B · 16^0
= 7 · 16^1 + 11 · 16^0 .
It is useful to remember that octal and hexadecimal can be viewed as just a short-hand for binary: each octal
or hexadecimal digit represents exactly three or four binary digits respectively. This can make it much easier to
write and remember long sequences of binary digits. As an example, consider hexadecimal. Each hexadecimal
digit xi ∈ {0, 1, . . . , 15} can be represented using four bits (since there are 2^4 = 16 possible combinations), so can
be viewed instead as those four binary digits.
Using a concrete example, 7B(16) translates digit-by-digit into binary: 7(16) 7→ 0111(2) and B(16) 7→ 1011(2) ,
st. 7B(16) = 01111011(2) .
Figure 1.3: Number lines illustrating the mapping of 8-bit sequences to integer values using three different representations.
© Daniel Page ⟨dan@phoo.org⟩
1.8.2.2 Notation
Amazingly there are not many jokes about Computer Science, but here are two (bad, comically speaking)
examples:
1. There are only 10 types of people in the world: those who understand binary, and those who do not.
2. Why did the Computer Scientist always confuse Halloween and Christmas? Because 31 Oct equals 25
Dec.
Whether or not you laughed at them, both jokes stem from ambiguity in the representation of numbers: there
is an ambiguity between “ten” written in decimal and binary in the former, and “twenty five” written in octal
and decimal in the latter.
Look at the first joke: it is basically saying that the literal 10 can be interpreted as binary or decimal, i.e., as
1 · 2 + 0 · 1 = 2 in binary and 1 · 10 + 0 · 1 = 10 in decimal. So the two types of people are those who understand
that 2 can be represented by 10, and those that do not. Now look at the second joke: this is a play on words in
that “Oct” can mean “October” but also “octal” or base-8 and “Dec” can mean “December” but also “decimal”
or base-10. With this in mind, we see that
3 · 8 + 1 · 1 = 25 = 2 · 10 + 5 · 1.
i.e., 31 Oct equals 25 Dec in the sense that 31 in base-8 equals 25 in base-10.
Put in context, we saw above that the decimal sequence Â and decimal number 123 are basically the same
iff. we interpret Â in the right way. The if in that statement is a problem, in the sense there is ambiguity: if we
follow the same reasoning as in the jokes, how do we know what base the literal 01111011 is written down in? It
could mean the decimal number 123 (i.e., “one hundred and twenty three”) if we interpret it using b = 2, or the
decimal number 01111011 (i.e., “one million, one hundred and eleven thousand and eleven”) if we interpret it
using b = 10; clearly that is quite a difference!
To clear up this ambiguity, where necessary we write literal numbers and representations with the base
appended to them. For example, we write 123(10) to show that 123 should be interpreted in base-10, or
01111011(2) to show that 01111011 should be interpreted in base-2. We can now be clear, for example, that
123(10) = 01111011(2) ; using this notation, the two jokes become even less amusing when written simply as
10(2) = 2(10) and 31(8) = 25(10) .
Example 1.38. Consider a case where m ≠ 0, which allows negative values of i and therefore negative powers
of the base: whereas m = 0 implies no fractional part to the resulting value, because 10^−1 = 1/10 = 0.1 and
10^−2 = 1/100 = 0.01, for example, when m ≠ 0 we can write down numbers which do have fractional parts.
Consider that
123.3125(10) = 1 · 10^2 + 2 · 10^1 + 3 · 10^0 + 3 · 10^−1 + 1 · 10^−2 + 2 · 10^−3 + 5 · 10^−4
given we have n = 7 digits, m = 4 of which capture the fractional part. Of course since the definition is the
same, we can do the same thing using a different base, e.g.,
123.3125(10) = 1 · 2^6 + 1 · 2^5 + 1 · 2^4 + 1 · 2^3 + 0 · 2^2 + 1 · 2^1 + 1 · 2^0 + 0 · 2^−1 + 1 · 2^−2 + 0 · 2^−3 + 1 · 2^−4
= 1111011.0101(2) .
The decimal point in the former has the same meaning (i.e., as a separator between fractional and non-fractional
parts) when translated into a binary point in the latter; more generally we call this a fractional point where
the base is irrelevant.
Example 1.39. We mentioned previously that certain numbers are irrational: the definition of Q suggested that,
in such cases, we could not find an x and y such that x/y provided the result required.
Although rationality itself does not depend on the base, the base a number is represented in does affect
whether its expansion terminates. Informally, we already know that when we write 1/3 as a decimal number we
have 0.3333 . . .; the ellipsis means the sequence recurs infinitely. 1/10, however, terminates when written as 0.1
in decimal, but recurs when written in binary; the closest approximation is 0.000110011 . . ..
In describing the C data type int as implying an associated set (or range)
of values, we simplified what is, in reality, a somewhat complicated issue. In short, the C language specifies the
above much more abstractly; the C compiler and platform (i.e., processor) make the details concrete, allowing us
to reason as we did above.
It is worth looking at this issue in more detail: on one hand it is not often covered elsewhere, but, on
the other hand, it will help avoid making assumptions that may be (subtly, and infrequently) incorrect. Other
descriptions exist, but we follow that in [9] due to the clarity of presentation. Considering integer data types
only, i.e., for each type
T ∈ {char, short, int, long, long long},
the C language defines two abstract properties:
1. the signedness of a type T, which allows us to distinguish between unsigned int and int, for example, and
2. the rank of a type T, denoted R(T), is an abstract measure of size (and hence range); rather than a numerical
value, types are simply ordered st.
R(char) < R(short) < R(int) < R(long) < R(long long).
The platform provides concrete detail, in particular assigning a width (or size) of
W(T) ∈ {1, 2, 4, 8}
bytes to each type; this is termed the data model. Based on use of two's-complement, we can derive the range
of each type as
−2^{8·W(T)−1} ≤ x ≤ +2^{8·W(T)−1} − 1
which matches our own definitions. Although the platform can select W(T) for each T, a crucial restriction
applies: for any types T1 and T2 where R(T1 ) < R(T2 ), the property W(T1 ) ≤ W(T2 ) must hold. Put another
way, we can be sure the width of int is less than or equal to that of long, even if those widths are not known; it
cannot be the other way around, for example, st. long is wider than int.
So we assumed W(int) = 4 in our description, but this is not the only possibility. [9, Table 1] surveys various
data models, noting, for example, that
LP32 ILP32
W(char) 1 1
W(short) 2 2
W(int) 2 4
W(long) 4 4
W(long long) 8 8
are valid possibilities: if we assume W(int) = 4 in a program compiled and executed on a platform associated
with the left-hand data model, problems may well occur, whereas the right-hand data model matches.
This is meant to illustrate, for example, that the int data type, which one might say as “an integer”, is in fact an
approximation of the integers (i.e., of Z): the range of values is finite. That said, however, why use this particular
approximation?
We can answer this question by investigating concrete representations used in C, basing our discussion on
positional number systems via use of bit-sequences (of fixed length n) to encode members of Z. Note that where
appropriate, we use colour to highlight parts of each representation that determine the sign and magnitude
(or size) of the associated value; since we are representing integers, we implicitly set m = 0 within the general
definition of a positional number system (since there is, by definition, no fractional part in an integer).
x̂ = ⟨x0 , x1 , . . . , xn−1 ⟩ 7→ ∑_{i=0}^{n−1} xi · 2^i
Binary Coded Decimal (BCD) BCD is an alternative method of representing unsigned integers: rather than
representing the number itself as a bit-sequence, the idea is to write it in decimal and encode each decimal digit
independently. The overall representation is the concatenation of bit-sequences which result from encoding
the decimal digits:
Definition 1.53. Consider the function f : {0, 1, . . . , 9} → B^4 with
f(d) = ⟨0, 0, 0, 0⟩ if d = 0
       ⟨1, 0, 0, 0⟩ if d = 1
       ⟨0, 1, 0, 0⟩ if d = 2
       ⟨1, 1, 0, 0⟩ if d = 3
       ⟨0, 0, 1, 0⟩ if d = 4
       ⟨1, 0, 1, 0⟩ if d = 5
       ⟨0, 1, 1, 0⟩ if d = 6
       ⟨1, 1, 1, 0⟩ if d = 7
       ⟨0, 0, 0, 1⟩ if d = 8
       ⟨1, 0, 0, 1⟩ if d = 9
which encodes a decimal digit d into a corresponding 4-bit sequence; this function corresponds to the Simple Binary Coded
Decimal (SBCD), or BCD 8421, standard. Given the decimal number x̂ = ⟨x0 , x1 , . . . , xn−1 ⟩(10) , the overall representation
is then the concatenation ⟨ f (x0 ), f (x1 ), . . . , f (xn−1 )⟩.
Example 1.41. If n = 8 for example, we can represent values in the range +0 . . . + 99999999; selected cases are
as follows:
10011001100110011001100110011001 7→ ⟨⟨1, 0, 0, 1⟩, ⟨1, 0, 0, 1⟩, ⟨1, 0, 0, 1⟩, ⟨1, 0, 0, 1⟩, ⟨1, 0, 0, 1⟩, ⟨1, 0, 0, 1⟩, ⟨1, 0, 0, 1⟩, ⟨1, 0, 0, 1⟩⟩
7→ ⟨9, 9, 9, 9, 9, 9, 9, 9⟩(10)
= +99999999(10)
...
00000000000000000000000100100011 7→ ⟨⟨1, 1, 0, 0⟩, ⟨0, 1, 0, 0⟩, ⟨1, 0, 0, 0⟩, ⟨0, 0, 0, 0⟩, ⟨0, 0, 0, 0⟩, ⟨0, 0, 0, 0⟩, ⟨0, 0, 0, 0⟩, ⟨0, 0, 0, 0⟩⟩
7→ ⟨3, 2, 1, 0, 0, 0, 0, 0⟩(10)
= +123(10)
...
00000000000000000000000000000001 7→ ⟨⟨1, 0, 0, 0⟩, ⟨0, 0, 0, 0⟩, ⟨0, 0, 0, 0⟩, ⟨0, 0, 0, 0⟩, ⟨0, 0, 0, 0⟩, ⟨0, 0, 0, 0⟩, ⟨0, 0, 0, 0⟩, ⟨0, 0, 0, 0⟩⟩
7→ ⟨1, 0, 0, 0, 0, 0, 0, 0⟩(10)
= +1(10)
00000000000000000000000000000000 7→ ⟨⟨0, 0, 0, 0⟩, ⟨0, 0, 0, 0⟩, ⟨0, 0, 0, 0⟩, ⟨0, 0, 0, 0⟩, ⟨0, 0, 0, 0⟩, ⟨0, 0, 0, 0⟩, ⟨0, 0, 0, 0⟩, ⟨0, 0, 0, 0⟩⟩
7→ ⟨0, 0, 0, 0, 0, 0, 0, 0⟩(10)
= +0(10)
Sign-magnitude
Definition 1.54. A signed integer can be represented in n bits by using the sign-magnitude approach; 1 bit is reserved
for the sign (0 means positive, 1 means negative) and n − 1 for the magnitude. That is, we have
x̂ = ⟨x0 , x1 , . . . , xn−1 ⟩ 7→ (−1)^{xn−1} · ∑_{i=0}^{n−2} xi · 2^i
meaning we have
−2^{n−1} + 1 ≤ x ≤ +2^{n−1} − 1.
Note that there are two representations of zero (i.e., +0 and −0).
Example 1.42. If n = 8, for example, we can represent values in the range −127 . . . + 127; selected cases are as
follows:
One’s-complement
Definition 1.55. The one’s-complement method represents a signed integer in n bits by assigning the complement of
x (i.e., ¬x) the value −x. That is, given
x̂ = ⟨x0 , x1 , . . . , xn−1 ⟩ 7→ ∑_{i=0}^{n−2} xi · 2^i
for xi ∈ {0, 1}, then the encoding of ¬x is assumed to represent −x. This means we have
−2^{n−1} + 1 ≤ x ≤ +2^{n−1} − 1.
Note that there are two representations of zero (i.e., +0 and −0).
Example 1.43. If n = 8 for example, we can represent values in the range −127 . . . + 127; selected cases are as
follows:
01111111 7→ 0 · 2^7 + 1 · 2^6 + 1 · 2^5 + 1 · 2^4 + 1 · 2^3 + 1 · 2^2 + 1 · 2^1 + 1 · 2^0 = +127(10)
...
01111011 7→ 0 · 2^7 + 1 · 2^6 + 1 · 2^5 + 1 · 2^4 + 1 · 2^3 + 0 · 2^2 + 1 · 2^1 + 1 · 2^0 = +123(10)
...
00000001 7→ 0 · 2^7 + 0 · 2^6 + 0 · 2^5 + 0 · 2^4 + 0 · 2^3 + 0 · 2^2 + 0 · 2^1 + 1 · 2^0 = +1(10)
00000000 7→ 0 · 2^7 + 0 · 2^6 + 0 · 2^5 + 0 · 2^4 + 0 · 2^3 + 0 · 2^2 + 0 · 2^1 + 0 · 2^0 = +0(10)
11111111 7→ −0(10)
11111110 7→ −1(10)
...
10000100 7→ −123(10)
...
10000000 7→ −127(10)
Two’s-complement
Definition 1.56. A signed integer can be represented in n bits by using the two’s-complement approach. The basic
idea is to weight the (n − 1)-th bit using −2n−1 rather than +2n−1 , and all other bits as normal. That is, we have
x̂ = ⟨x0 , x1 , . . . , xn−1 ⟩ 7→ xn−1 · −2^{n−1} + ∑_{i=0}^{n−2} xi · 2^i
Example 1.44. If n = 8 for example, we can represent values in the range −128 . . . + 127; selected cases are as
follows:
Given that two’s-complement is the de facto choice for signed integer representation, it warrants some further
explanation: it is important to grasp how the representation works.
One approach is to consider Figure 1.3c, which shows a number line of values in two's-complement repre-
sentation. Offset a little to the left, it shows that 0 (bottom) is represented by the literal 00000000 (which is, of
course, equivalent to the bit-sequence ⟨0, 0, 0, 0, 0, 0, 0, 0⟩); reading from that point toward the right shows that
unsigned integers up to 255 could be represented using their natural binary representation. Sometimes you see
such a number line wrapped into a circle, to emphasise the fact that the values it captures wrap around: when
we reach 255 (or 11111111), and given we have n = 8 bits here, the next value is 0 (or 00000000) because the
representation wraps around. Toward the left of 0, it starts to become clear that two's-complement basically
“moves” the upper or right-hand range of what would be 128 to 255: by using a large, negative weight for the
(n − 1)-th bit it moves the (positive) range 128 to 255 into the (negative) range −128 to −1. This movement is
direct, in the sense that the order of the range is preserved; this contrasts with sign-magnitude, for example,
which, per the same idea in Figure 1.3a, reverses the range as it is moved. This difference stems from the fact
that two's-complement fits the concept of a positional number system naturally, whereas the same cannot be
said of sign-magnitude, where the sign bit is a special case (i.e., weighted abnormally). However subtle this
point is, it is important.
More specifically, the fact that
1. the representations wrap-around modulo 2^n , and
2. as we step left or right through representations, they remain in-order wrt. the values they represent
means we can apply the same approach to arithmetic using signed integers represented using two’s-complement
as with the simpler case of unsigned integers; this is not true of sign-magnitude, for example, arguably making
it less attractive as a result.
Another approach is via an appeal to intuition: if we have x and add −x, i.e., compute x + (−x), then we
intuitively expect to produce 0 as a result. The two’s-complement representation satisfies this: we can see from
the above that
x = 2(10) 7→ 0 0 0 0 0 0 1 0
y = −2(10) 7→ 1 1 1 1 1 1 1 0 +
c = 1 1 1 1 1 1 1 0 0
r = 0(10) 7→ 0 0 0 0 0 0 0 0
meaning that if we ignore the carry-out (which cannot be captured: we have too few bits), we get the result
expected. As a by-product, this yields a useful fact:
Definition 1.57. The term two’s-complement can be used as a noun (i.e., to describe the representation) or a verb (i.e., to
describe an operation): the latter case defines “taking the two’s-complement of x” to mean negating x and thus computing
the representation of −x. To do so, we compute −x 7→ ¬x + 1.
To see why this is true, first note that for an x represented in two’s-complement, adding x to ¬x produces −1 as
a result. For example:
x = 2(10) 7→ 0 0 0 0 0 0 1 0
y = ¬2(10) 7→ 1 1 1 1 1 1 0 1 +
c = 0 0 0 0 0 0 0 0 0
r = −1(10) 7→ 1 1 1 1 1 1 1 1
This should make sense, in that each corresponding i-th bit in x and ¬x will be the opposite of each other: either
one will be 0 and the other is 1 or vice versa, st. their sum will always be 1. The result is off-by-one, however,
in the sense we produce −1 rather than the expected 0. So, if we compute
x = 2(10) 7→ 0 0 0 0 0 0 1 0
y = ¬2(10) + 1 7 → 1 1 1 1 1 1 1 0 +
c = 1 1 1 1 1 1 1 1 0
r = 0(10) 7→ 0 0 0 0 0 0 0 0
instead then we are back to the same example as above: the result is 0, so indeed −x 7→ ¬x + 1.
Having seen how to represent integers, we can apply roughly the same approach to represent R, the set of real
numbers. Since we know a positional number system can accommodate numbers with a fractional part (via an
m > 0), the fact we can consider representations for R should not be surprising. However, the approach we
use does differ somewhat: it makes sense to ignore the previous notation etc. and start afresh with another
underlying idea. That is, we will approximate some x by taking a base-b integer m (signed or otherwise) and
scaling it, i.e., have
x̂ 7→ m · b^e ≃ x
for some e. Two more concrete representations based on this idea can be described as follows:
1. if e is fixed (i.e., does not vary between different x and hence m) we have a fixed-point representation,
whereas
2. if e is not fixed (i.e., can vary between different x and hence m) we have a floating-point representation.
1.8.4.1 Fixed-point
Definition 1.58. The goal of a fixed-point representation is to allow expression of real numbers whose form is
x = m · b^−q
or, equivalently,
x = m · 1/b^q ,
where
• m ∈ Z is the mantissa, and
• q ∈ N is the exponent.
Informally, the magnitude of such a number is given by applying a scaling factor to the mantissa. Since the exponent is
fixed, this essentially means interpreting m, and hence x, as two components, i.e.,
Figure 1.4: A visualisation of the impact of increasing q, the number of fractional digits, in a fixed-point representation;
the result is increased detail within the rendering of a Mandelbrot fractal.
where n = p + q; we use the notation Qp,q to denote this. Abusing notation a little, we have that
x̂ = ⟨x0 , x1 , . . . , xn−1 ⟩
7→ m · 1/b^q
7→ ( ∑_{i=0}^{n−1} mi · b^i ) · 1/b^q
7→ ∑_{i=0}^{n−1} mi · b^{i−q}
Definition 1.59. There are some important quantities relating to a fixed-point representation Qp,q :
• The resolution is the smallest difference between representable values, i.e., the value 1/b^q .
• The precision is essentially n, the number of digits in the representation; in a sense this (in combination with the
resolution) governs the range of values that can be represented.
Example 1.45. This might seem confusing, but the basic idea is as above. That is, given an integer x, we just
shift the fractional point by a fixed amount to determine the associated value. Imagine we set b = 10, n = 7,
q = 4 and write the literal
x̂ = 1233125.
Interpreting x in the fixed-point representation specified by n and q means there are q = 4 fractional digits, i.e.,
3125, and p = n − q = 7 − 4 = 3 integral digits, i.e., 123. Therefore
x̂ 7→ x · 1/b^q = 1233125 · 1/10^4 = 123.3125(10) ,
meaning we have simply taken x and shifted the fractional point by q = 4 digits. Put yet another way, we
again alter the weight associated with each digit: taken as an integer the i-th digit will be weighted by b^i , but
interpreting the same digit as above means weighting it by b^{i−q} .
Example 1.46. There is a neat way to visualise the intuitive effect of adding more precision to (i.e., increasing
the number of fractional digits in) a fixed-point representation. Figure 1.4 includes different renderings of the
Mandelbrot fractal, named after mathematician Benoît Mandelbrot. Each rendering uses a 32-bit integer, i.e.,
n = 32, to specify a fixed-point representation but with different values of q, i.e., different numbers of fractional
digits. Quite clearly, as we increase q there is more detail. Without expanding on the detail, the fractal is
rendered by sampling points on a circle of radius 2 centred at the point (0, 0). With no fractional digits, we
can sample points (x, y) with x, y ∈ Z st. x, y ∈ {−2, −1, 0, +1, +2}; this is restrictive in the sense it allows only
a few points. However, by adding more fractional digits we can sample many more intermediate points, e.g.,
(0.5, 0.5) and so on, meaning more detail in the rendering.
Example 1.47. Although the definition is general enough to accommodate any choice of b, it may not be
surprising that b = 2 is attractive: this allows us to reuse what we know about representing integers using
bit-sequences, and apply it to representing real numbers using a fixed-point representation.
• We can describe an unsigned fixed-point representation based on an unsigned integer; imagine we select
n = 8 with p = 5 and q = 3, denoted Q^U_{5,3} . This means
x̂ = ⟨x0 , x1 , . . . , x7 ⟩ 7→ ( ∑_{i=0}^{p+q−1} xi · 2^i ) · 1/2^q
which produces a value in the range
0 ≤ x ≤ 2^p − 1/2^q
or rather 0 ≤ x ≤ 31.875 with a resolution of 0.125. For example
x̂ = 15(10)
= 00001111(2)
7→ 00001111(Q^U_{5,3})
7→ 0 · 2^4 + 0 · 2^3 + 0 · 2^2 + 0 · 2^1 + 1 · 2^0 + 1 · 2^−1 + 1 · 2^−2 + 1 · 2^−3
7→ 1.875(10)
[Bit layouts: in the single-precision format the sign s occupies bit 31, the exponent e bits 30 . . . 23, and the mantissa m
bits 22 . . . 0; in the double-precision format s occupies bit 63, e bits 62 . . . 52, and m bits 51 . . . 0.]
Figure 1.5: Single- and double-precision IEEE-754 floating-point formats described graphically as bit-sequences and
concretely as C structures.
• We can likewise describe a signed fixed-point representation based on a two's-complement integer; again
select n = 8 with p = 5 and q = 3, denoted Q^S_{5,3} . This means
x̂ = ⟨x0 , x1 , . . . , x7 ⟩ 7→ ( −x_{p+q−1} · 2^{p+q−1} + ∑_{i=0}^{p+q−2} xi · 2^i ) · 1/2^q
For example
x̂ = 143(10)
= 10001111(2)
7→ 10001111(Q^S_{5,3})
7→ 1 · −2^4 + 0 · 2^3 + 0 · 2^2 + 0 · 2^1 + 1 · 2^0 + 1 · 2^−1 + 1 · 2^−2 + 1 · 2^−3
7→ −14.125(10)
1.8.4.2 Floating-point
Definition 1.60. The goal of a floating-point representation is to allow expression of real numbers whose form is
x = (−1)^s · m · b^e
where
• s ∈ {0, 1} is the sign bit,
• m ∈ N is the mantissa, and
• e ∈ Z is the exponent.
Informally, the magnitude of such a number is given by applying a scaling factor to the mantissa; since the exponent can
vary, it acts to “float” the fractional point, denoted ◦, into the correct position.
We say that a number of the form
x = (−1)^s · (mn−1 ◦ mn−2 . . . m1 m0 ) · b^e ,
where the mantissa has n digits, is normalised: the fractional point is initially (i.e., before it is moved via the scaling
factor) assumed to be after the first non-zero digit of the mantissa. Note that n determines the precision.
Definition 1.61. IEEE-754 specifies two floating-point representations (or formats); each format represents a floating-
point number as a bit-sequence by concatenating together three components, i.e., the mantissa, the exponent and the sign
bit. There are two features to keep in mind:
• Imagine x is normalised as above: since b = 2, we know mn−1 = 1, because the leading digit of the mantissa must be
non-zero. This means we do not need to include mn−1 explicitly in the representation of x, the now implicit value
being termed a (or the) hidden digit.
• The exponent needs a signed integer representation; one might imagine that two's-complement is suitable, but
instead an approach called biasing is used. Essentially this means the representation of x stores ê = e + β, i.e.,
adds a constant β to the real value of e so the stored value is always positive; the real exponent is recovered as
e = ê − β.
• The single-precision, 32-bit floating-point format allocates the least-significant 23 bits to the mantissa, the next-
significant 8 bits to the exponent and the most-significant bit to the sign:
x̂ = ⟨x0 , x1 , . . . , x31 ⟩ 7→ (−1)^s · m · 2^{e−127}
• The double-precision, 64-bit floating-point format allocates the least-significant 52 bits to the mantissa, the next-
significant 11 bits to the exponent and the most-significant bit to the sign:
x̂ = ⟨x0 , x1 , . . . , x63 ⟩ 7→ (−1)^s · m · 2^{e−1023}
Definition 1.62. The IEEE floating-point representations reserve some values in order to represent special quantities.
For example, reserved values are used to represent +∞, −∞ and NaN, or not-a-number: +∞ and −∞ can occur when
a result overflows beyond the limits of what can be represented, while NaN occurs, for example, as a result of computing
0/0.
For the single-precision, 32-bit format these special values are
00000000000000000000000000000000 7→ +0
10000000000000000000000000000000 7 → −0
01111111100000000000000000000000 7 → +∞
11111111100000000000000000000000 7 → −∞
01111111100000100000000000000000 7 → NaN
11111111100100010001001010101010 7 → NaN
Imagine we want to represent 123.3125(10) in the single-precision format. We first write it as
x = 1111011.0101(2)
before normalising it, meaning we shift it so that there is only one digit to the left of the binary point, to get
x = 1.1110110101(2) · 2^6 .
Recalling that we do not store the implicit hidden digit (i.e., the digit to the left of the binary point), our
mantissa, exponent and sign become
m = 11101101010000000000000(2)
e = 00000110(2)
s = 0(2)
noting we pad each with zeros to ensure it is of the correct length. Finally, we can convert
each component into a literal using their associated representations, i.e.,
m̂ = 11101101010000000000000
ê = 10000101
ŝ = 0
noting that we bias e (i.e., add 127 to e = 6) to get the result, and concatenate the components into the single
literal
x̂ = 01000010111101101010000000000000.
Definition 1.63. Consider a case where the result of some arithmetic operation (or conversion) requires more digits of
precision than are available. That is, it cannot be represented exactly within the n digits of mantissa available. To combat
this problem, we can use the concept of rounding. For example, you probably already know that if we only have two
digits of precision available then
• 1.24(10) is rounded to 1.2(10) because the last digit is less than five, while
• 1.27(10) is rounded to 1.3(10) because the last digit is greater than or equal to five.
Such a rounding mode is essentially a rule that takes the ideal result, i.e., the result if one could use infinite precision,
to the most suitable representable result.
The IEEE-754 specification mandates the availability of four rounding modes. In each case, the idea is to
imagine the ideal result x is written using an l > n digit mantissa m, i.e.,
where m′i = mi+l−n , then “patch” m′0 = ml−n according to rules given by the rounding mode. Within the following,
we offer some decimal examples for clarity (minor alterations apply in binary), rounding for n = 2 digits of
precision in each example. Note that the C standard library offers access to these features, using the constant
values FE_TONEAREST, FE_UPWARD, FE_DOWNWARD, and FE_TOWARDZERO respectively to refer to the rounding
modes themselves. For example, the rint function rounds a floating-point value using the currently selected
IEEE-754 rounding mode; this can be inspected and set using the fegetround and fesetround functions.
Definition 1.64. Sometimes termed Banker’s Rounding, the round to nearest mode alters basic rounding to provide
more specific treatment when the ideal result is exactly half way between representable results, i.e., when m′0 = 5. It can
be described via the following rules:
Definition 1.65. Sometimes termed ceiling, the round toward +∞ mode can be described via the following rules:
• If x is positive (i.e., s = 0) and any trailing digit from ml−n−1 onward is non-zero, alter m′0 by adding one.
• If x is negative (i.e., s = 1), the trailing digits from ml−n−1 onward are discarded.
Definition 1.66. Sometimes termed floor, the round toward −∞ mode can be described via the following rules:
• If x is positive (i.e., s = 0), the trailing digits from ml−n−1 onward are discarded.
• If x is negative (i.e., s = 1) and any trailing digit from ml−n−1 onward is non-zero, alter m′0 by adding one.
Definition 1.67. The round toward zero mode operates as round toward −∞ for positive numbers and as round toward
+∞ for negative numbers.
Example 1.52. Under the round toward zero mode, we find that
Example 1.53. The (slightly cryptic) C program in Figure 1.6 offers a practical demonstration that floating-point
works as expected. The idea is to “overlap” a single-precision, 32-bit floating-point value called x with an
instance of the ieee32_t structure called y; main creates an instance of this union, calling it t. Since we can
access individual fields within t.y (e.g., the sign bit t.y.s, or the mantissa t.y.m), we can observe the effect
altering them has on the value of t.x. Compiling and executing the program gives the following output:
(a) Two unions which “overlap” the representations of an actual floating-point field x with an instance y of the structure(s)
defined in Figure 1.5.
t.x = 2.8;
printf ( "%+9f %01X %02X %06X\n", t.x, t.y.s, t.y.e, t.y.m );
t.y.s = 0x01;
printf ( "%+9f %01X %02X %06X\n", t.x, t.y.s, t.y.e, t.y.m );
t.y.e = 0x81;
printf ( "%+9f %01X %02X %06X\n", t.x, t.y.s, t.y.e, t.y.m );
t.y.s = 0x00;
t.y.e = 0xFF;
t.y.m = 0x400000;
printf ( "%+9f %01X %02X %06X\n", t.x, t.y.s, t.y.e, t.y.m );
t.y.s = 0x00;
t.y.e = 0xFF;
t.y.m = 0x000000;
printf ( "%+9f %01X %02X %06X\n", t.x, t.y.s, t.y.e, t.y.m );
return 0;
}
(b) A driver function main that uses an instance t of view32_t to demonstrate how manipulating fields in t.y impacts
on the value of t.x.
Figure 1.6: A short C program that performs direct manipulation of IEEE floating-point numbers.
+2.800000 0 80 333333
-2.800000 1 80 333333
-5.600000 1 81 333333
+nan 0 FF 400000
+inf 0 FF 000000
The question is, what on earth does this mean? We can answer this by looking at each part of the program
(each concluding with a call to printf that produces a line of the output):
• t.x is set to 2.8(10) , and then t.x and each component of t.y is printed. The output shows that
t.y.s = 0(16) 7→ 0
t.y.e = 80(16) 7→ 10000000
t.y.m = 333333(16) 7→ 01100110011001100110011
Accounting for the bias and including the hidden bit, this of course represents the value
(−1)^0 · 1.01100110011001100110011(2) · 2^1
or 2.8(10) as expected.
• t.y.s is set to 01(16) = 1(10) , and then t.x and each component of t.y is printed. We expect that setting
the sign bit to 1 rather than 0 will change t.x from being positive to negative; this is confirmed by the
output, which shows t.x is equal to −2.8(10) as expected.
• t.y.e is set to 81(16) = 129(10) , and then t.x and each component of t.y is printed. We expect that
setting the exponent to 129 rather than 128 will double t.x (the unbiased value of the exponent is now
129 − 127 = 2 st. the mantissa is scaled by 2^2 = 4 rather than 2^1 = 2); this is confirmed by the output,
which shows t.x is equal to −5.6(10) as expected.
• t.y.s, t.y.e and t.y.m are set to reserved values corresponding to NaN and +∞.
Figure 1.7: A teletype machine being used by UK-based Royal Air Force (RAF) operators during WW2 (public domain
image, source: http://en.wikipedia.org/wiki/File:WACsOperateTeletype.jpg).
• Imagine we want to test if one character x is alphabetically before some other y. The way the ASCII
translation is specified, we can simply compare their numeric representations. If we find Ord(x) < Ord(y)
then the character x is before the character y in the alphabet. For example ‘a’ is before ‘c’ because
Ord(‘a’) = 97 < 99 = Ord(‘c’).
• Imagine we want to convert a character x from lower-case into upper-case. The lower-case characters are
represented numerically as the contiguous range 97 . . . 122; the upper-case characters as the contiguous
range 65 . . . 90. So we can convert from lower-case into upper-case simply by subtracting 32. For example
Chr(Ord(‘a’) − 32) = ‘A’.
References
[1] G. Boole. An investigation of the laws of thought. Walton & Maberly, 1854 (see p. 29).
[2] D. Cohen. “On Holy Wars and a Plea for Peace”. In: IEEE Computer 14.10 (1981), pp. 48–54 (see p. 39).
[3] D. Goldberg. “What Every Computer Scientist Should Know About Floating-Point Arithmetic”. In: ACM
Computing Surveys 23.1 (1991), pp. 5–48.
[4] C. Petzold. Code: Hidden Language of Computer Hardware and Software. Microsoft Press, 2000.
[5] E.L. Post. “Introduction to a General Theory of Elementary Propositions”. In: American Journal of Mathe-
matics 43.3 (1921), pp. 163–185 (see p. 31).
[6] C.E. Shannon. “A mathematical theory of communication”. In: Bell System Technical Journal 27.3 (1948),
pp. 379–423 (see p. 38).
[7] C.E. Shannon. “A Symbolic Analysis of Relay and Switching Circuits”. In: Transactions of the American
Institute of Electrical Engineers (AIEE) 57.12 (1938), pp. 713–723 (see p. 29).
[8] H.M. Sheffer. “A set of five independent postulates for Boolean algebras, with applications to logical
constants”. In: Transactions of the American Mathematical Society 14.4 (1913), pp. 481–488 (see p. 31).
[9] C. Wressnegger et al. “Twice the Bits, Twice the Trouble: Vulnerabilities Induced by Migrating to 64-Bit
Platforms”. In: Computer and Communications Security (CCS). 2016, pp. 541–552 (see p. 49).
CHAPTER
2
BASICS OF DIGITAL LOGIC
In the previous Chapter, we made some statements regarding various features of digital logic without backing them
up with any evidence or explanation. Adopting a “from atoms upwards” approach in order to support material in
subsequent Chapters, this Chapter has two central aims that, in combination, describe digital logic. First, it expands on
previous statements, such as the above, demonstrating how they can be satisfied using introductory Physics. Note that
a detailed, in-depth treatment of such material could fill another book, and, arguably, is not strictly required given the
remit of this book; the focus is therefore at a high level, offering an overview of only pertinent details at the right level of
abstraction. For example, to connect theory such as Boolean algebra to practice, it is important to understand how we
can design and manufacture implementations of Boolean operators that can physically provide the same functionality.
Then, second, it explains why doing so is useful and important: the bulk of the Chapter demonstrates, step-by-step, how
successively higher-level components, capable of successively more complex and so useful computation, can be designed
and implemented.
For some, the electrical properties of atoms and sub-atomic particles may be an unfamiliar topic. As a result, it
is common, and potentially quite useful, to align them with more familiar concepts via the so-called hydraulic
analogy.
Imagine a water tower (resp. battery), connected via a pipe (resp. wire) which eventually powers a water
wheel (resp. lamp):
• the water pressure (resp. electrical potential) is dictated by how much water (resp. electrical charge) is
held in the water tower,
• water flows along the pipes; a wider pipe (resp. a wire with lower resistivity) allows water to flow more
easily, and hence quicker, than a narrower pipe (resp. wire with higher resistivity),
• when the water reaches the water wheel, it causes it to turn as a result of two properties: the pressure
(resp. voltage), and the flow rate (resp. current) of the water.
• silicon has atomic number fourteen; it has three shells containing two, eight and four electrons respectively, whereas
• lithium has atomic number three; it has two shells containing two and one electrons respectively.
Definition 2.1. Each type of sub-atomic particle carries a specific electrical charge: electrons carry a negative charge,
protons carry a positive charge, and neutrons carry no (or a neutral) charge; the unit of measurement is the coulomb
(after Charles-Augustin de Coulomb). This suggests any atom with an imbalance of electrons and protons will have a
non-neutral charge overall; we term such cases an ion, st. negatively (resp. positively) charged ions will have more (resp.
fewer) electrons than protons.
Figure 2.2: A simple circuit conditionally connecting a capacitor (or battery) to a lamp depending on the state of a switch.
Figure 2.3: Some simple examples of Boolean-style control of a lamp by combinations of switches.
and protons, using some energy. The exact amount of energy required relates to how tightly the electrons are
bound to the nucleus, and so by the type of atom. Electrons also exhibit a property whereby they repel each
other, but are attracted by holes (or “gaps”) in a given electron cloud; this implies they can move.
Definition 2.2. Electrical current refers to a (net) flow of electric charge; the unit of measurement is the ampere (or
amp, after André-Marie Ampère).
Definition 2.3. Electrical potential difference (or, more often, voltage) refers to the difference in electrical potential
energy between two points per unit electric charge; the unit of measurement is the volt (after Alessandro Volta). Informally,
you can think of voltage as the electrical work (or the effort) needed to move (or drive) the electrons and hence cause a
flow of current.
Definition 2.4. Electrical power refers to the rate of electrical work, i.e., the amount of charge driven, per unit of time,
by a given voltage; the unit of measurement is the watt (after James Watt). We say electrical power is dissipated (or
“consumed”) when electrical potential energy associated with some charge is converted into another form (e.g., heat or
light) by a component (or load).
An electron can move between atoms, doing so from a point of more negative charge toward a point of more
positive charge, i.e., from lower to higher voltage, or driven by a potential difference. This movement or flow
of valence electrons from one point to another suggests a (net) flow of charge and hence a current between the
two points.
This is formally termed electron current, in part to distinguish it from conventional current: when we
use the term current, we almost universally mean the latter. Although electron current describes the flow of
negative charge, conventional current means we actually focus on what would be the flow of positive charge if that were possible
(i.e., the opposite of electron current). Put another way, some electron moving from a more negative point X to
a more positive point Y will make Y more negative and X more positive; the electron current is from X to Y,
whereas the conventional current is from Y to X. This is why you might traditionally think of charge moving
from a terminal labelled +ve to that labelled −ve on a battery. Set in the context of what we now know to be
true, this is confusing1 . However, it also has a clear historical lineage we are now stuck with: Benjamin Franklin
adopted this convention in the mid 1700s, also labelling charge using the positive or negative terminology,
during his pioneering study of electricity. Either way, from here on, you should read current as a synonym for
conventional current.
Definition 2.6. A conductor, e.g., a metal, has high-conductivity (resp. low-resistivity) and allows electrons to move
easily, whereas an insulator, e.g., a vacuum, has low-conductivity (resp. high-resistivity) and does not allow electrons to
move easily.
When we describe a material as conductive or resistive, we typically mean it is on a spectrum between the
two: although unlikely to represent a perfect conductor or insulator, we mean it is closer to one end of the
spectrum or other (e.g., is more conductive or more resistive). Although such properties are inherent in the
material, it is possible to explicitly manipulate the sub-atomic composition of semi-conductor materials using
a process called doping. For example, imagine we need a material for some task; any non-ideal material will
have non-ideal properties wrt. the task. The idea is instead to take some non-ideal material as a starting point,
then dope (or combine) it with a dopant material: their combination should be similar to the starting point,
but more ideal wrt. the properties required.
Example 2.2. Consider pure silicon, whose outer electron shell of four electrons is only about half full; it is
more or less an insulator. Doping with a boron or aluminium acceptor creates extra holes, while doping with
a phosphorus or arsenic donor creates extra electrons.
An important use-case for doping is the production of semi-conductor materials. Although various materials
might exhibit the properties of a semi-conductor, doping allows careful control over the ratio of electrons vs.
holes and hence the conductivity (resp. resistivity) of the result. Rather than rely on the perfect material being
naturally available, we therefore produce a material with exactly the properties required for a given task.
Definition 2.7. A doped semi-conductor material falls into one of two classes, namely
1. an N-type semi-conductor has an abundance of electrons produced by doping with a donor material, or
2. a P-type semi-conductor has an abundance of holes produced by doping with an acceptor material.
Following from this example, it may be worth convincing ourselves that a switch is useful for something
beyond controlling a lamp as above. To provide an answer, we just need to generalise the example: we a) use
multiple switches, and b) treat each switch as an input and the lamp as an output. Put another way, imagine
we have two switches labelled x and y; we are interested in how their combination controls the lamp labelled
r, so, in effect, what the function f described by r = f (x, y) is.
1. Figure 2.3a controls the lamp via two switches, and models an AND operator: r = f (x, y) |= x ∧ y. Only
when both of the switches are closed will the lamp be on: if either is open, there is no connection with
the battery.
1 See, e.g., http://xkcd.com/567/.
2 Although the analogy is reasonable, keep in mind that a battery differs from a capacitor: behaviour of the former is due to a chemical
process, which converts chemical energy to electrical energy and thus delivers a flow of electrons (i.e., a current).
2. Figure 2.3b controls the lamp via two switches, and models an OR operator: r = f (x, y) |= x ∨ y. The
lamp will be on when one or the other, or both, of the switches are closed: there is a connection with the
battery unless both of the switches are open.
3. Figure 2.3c controls the lamp via two switches, and models an XOR operator: r = f (x, y) |= x ⊕ y. This
time, the switches sort of operate in the opposite way to each other; to make a connection between the
lamp and battery along the top (resp. bottom) wire, the left-hand switch needs to be closed (resp. open)
while the right-hand switch needs to be open (resp. closed): there is connection with the battery if one
or the other, but not both switches are closed. You often find this sort of arrangement in homes where a
single light on some stairs is controlled by switches located at the top and bottom.
On one hand, the examples above should be encouraging: they show we can mirror the behaviour of Boolean
operators, using a careful organisation of multiple switches. On the other hand, however, push-button switches
are mechanically operated: we want an electrically operated switch, which is actuated (i.e., pressed or released)
via an electrical property (e.g., a flow of electrons) rather than by hand. Crucially, this will allow the output of
one such operator to be used as the input to another, and therefore the implementation of larger expressions.
1. allow charge to flow between two terminals (i.e., act as a conductor) when we turn the switch on, and
2. prevent charge flowing between two terminals (i.e., act as a resistor) when we turn the switch off.
The word transistor is a portmanteau of “transfer resistor”, offering a hint as to the underlying principle: a
transistor is a resistor, but one we can control by altering how resistive it is. Put more simply, we can control it
st. it is conductive when we want to turn the switch on and resistive when we want to turn it off.
The question is then how such behaviour is realised. Improvements and different trade-offs have given us
numerous transistor designs, but we focus on just one: the Field Effect Transistor (FET), initially designed and
patented by Julius Lilienfeld in 1925. However, at that point in time the general understanding of sub-atomic
behaviour was less than now, meaning use of his design was limited. This changed in 1952, when a team
of Engineers at Bell Labs, led by William Shockley, invented what is now termed a junction gate FET (or
JFET, due to some legal wranglings wrt. the Lilienfeld patent). In turn, this gave rise to the Metal Oxide
Semi-Conductor Field-Effect Transistor (MOSFET), invented in 1960 by Dawon Kahng and Martin Atalla,
also at Bell Labs. These designs deliver the properties we require without limiting the complexity of modern
digital logic components; in particular, they a) have the switch-like functionality described as useful thus far,
while b) simultaneously being physically small, quick to operate, reliable, and easy to manufacture.
Figure 2.4: A 6P1P (i.e., a 100W to 200W, photo-sensitive type) vacuum tube (public domain image, source: http:
//en.wikipedia.org/wiki/File:6P1P.jpg).
Figure 2.5: A moth found by operations of the Harvard Mark 2; the “bug” was trapped within the computer and caused
it to malfunction (public domain image, source: http://en.wikipedia.org/wiki/File:H96566k.jpg).
Figure 2.6: A replica of the first point-contact transistor, a precursor of designs such as the MOSFET, constructed at Bell
Labs (public domain image, source: http://en.wikipedia.org/wiki/File:Replica-of-first-transistor.
jpg).
body
Figure 2.7: A high-level diagram of a MOSFET transistor, showing the terminal and body materials.
Figure 2.8: A pair of N-MOSFET and P-MOSFET transistors, arranged to form a CMOS cell.
Figure 2.7 offers a high-level description of a MOSFET, in which atomic-scale layers of semi-conductor
material are combined with metal or poly-silicon layers for the terminals; although a lower-level, more detailed
description would require deeper understanding of related Physics (see, e.g., [8]), we already have enough
background to explain the basic concept at this high level. In short, the switch-like behaviour is realised by
using the gate terminal to control a conductive channel between the source and drain terminals. Unlike
a JFET, where an explicit semi-conductor layer is constructed for use as the channel, in a MOSFET transistor
the channel is induced. Specifically, applying a small potential difference to the gate terminal repels holes in
the P-type body; doing so forms a depletion layer in which the number of holes is depleted. As the potential
difference applied grows, an inversion layer is formed at the surface: the abundance of electrons relative to
the number of (repelled) holes inverts the properties of the P-type body, turning it into N-type and so forming
a conductive channel between N-type source and drain terminals.
Realising this behaviour in practice depends on the careful selection of semi-conductor materials; Figure 2.9
illustrates the symbols used for two MOSFET variants. These symbols abstract away the implementation detail
(retaining only the terminals, with d, s and g denote the drain, source and gate), which is as follows:
Definition 2.8. An N-MOSFET (or N-type MOSFET, or N-channel MOSFET, or NPN MOSFET) is constructed
from N-type semi-conductor terminals and a P-type body:
• applying a potential difference to the gate widens the conductive channel, meaning source and drain are connected
(i.e., act like a conductor); the transistor is activated.
• removing the potential difference from the gate narrows the conductive channel, meaning source and drain are
disconnected (i.e., act like an insulator); the transistor is deactivated.
Definition 2.9. A P-MOSFET (or P-type MOSFET, or P-channel MOSFET, or PNP MOSFET) is constructed
from P-type semi-conductor terminals and an N-type body:
• applying a potential difference to the gate narrows the conductive channel, meaning source and drain are disconnected
(i.e., act like an insulator); the transistor is deactivated.
• removing the potential difference from the gate widens the conductive channel, meaning source and drain are
connected (i.e., act like a conductor); the transistor is activated.
Put another way, for an N-MOSFET, applying a large potential difference to the gate terminal produces a wider
conductive channel, and so allows electrons (i.e., current) to flow between source and drain. Conversely, a
small potential difference (or at least smaller than some threshold) means a narrower conductive channel,
which prevents said flow. The gate terminal therefore offers functionality much like a switch: controlling the
potential difference applied controls conductivity between source and drain, and hence regulates the current.
Figure 2.9: (a) An N-MOSFET transistor. (b) A P-MOSFET transistor.
Definition 2.11. The threshold voltage of a given MOSFET (i.e., either N- or P-MOSFET) is the minimum voltage
level (i.e., potential difference between gate and source) required to activate the transistor and thus connect the source and
drain; below the threshold voltage, the source and drain remain disconnected.
Definition 2.12. The concept of sub-threshold leakage (or just leakage) relates to a non-ideal property of the
conductive channel: below the threshold voltage the source and drain are not perfectly disconnected, st. a small flow of
electrons (i.e., the leakage current) is possible.
Rather than use MOSFET transistors in isolation, it is common to organise them into larger combinations; by
offering a higher level of abstraction, such combinations are usually easier to reason about from both functional
and behavioural perspectives.
Ultimately the aim is to (re)produce Section 2.1.1 where we outlined Boolean-like functionality using
mechanical switches, but now by using transistors. A popular3 first step relates to organisation of two
transistors (pairing an N-MOSFET with a P-MOSFET) to form one Complementary Metal-Oxide Semi-Conductor
(CMOS) component we term a cell. This approach, as illustrated at a high-level by Figure 2.8, was first
conceived in 1963 by Frank Wanlass at Fairchild Semi-conductor. The idea is to organise the transistors so they
operate in a complementary manner:
Definition 2.13. CMOS-based design strategies typically use two distinct parts to form a given component: there will be
1. a pull-up network of P-MOSFET transistors between the Vdd power rail and the output, and
2. a pull-down network of N-MOSFET transistors between the Vss power rail and the output.
A consequence of this logic style is that only one of the pull-up or pull-down networks can be active (i.e., connected) at a
time.
Definition 2.14. The power dissipation of a CMOS cell, and hence a CMOS-based design more generally, can be described
in terms of
1. a static component, where the transistors remain in a given state (so are “idle” in some sense), and
2. a dynamic component, where the transistors switch state, i.e., the gate changes from being driven by Vdd to Vss ,
or vice versa.
CMOS exhibits a marginal amount of sub-threshold leakage, so the majority of power dissipation occurs due to switching
activity.
This has some obvious advantages, which make CMOS an attractive choice vs. alternatives. In particular,
when organising lots of transistors in close proximity, CMOS will have lower overall power consumption and
heat dissipation, and, in turn, better reliability.
The next step is to package CMOS cells into small, useful building-blocks that act as the next-level component
above transistors themselves. As an example, consider building a component which inverts the input st. if the
input x is Vdd the output is Vss and vice versa.
(a) A CMOS-based NOT gate. (b) A CMOS-based, 2-input NAND gate. (c) A CMOS-based, 2-input NOR gate.
Figure 2.10: MOSFET-based implementations of NOT, NAND and NOR logic gates.
Figure 2.11: A voltage-oriented truth table for NOT, NAND and NOR logic gates.
In a CMOS-based design strategy, we normally refer to the power rails as Vdd and Vss . The ‘d’ stands for drain:
Vdd could be read as “voltage level at the drain” st. it also makes sense to have Vss read as “voltage level at the
source”. This naming convention seems to stem from earlier bipolar-based transistors, where Vcc and Vee are
sort of the same thing but for collector and emitter terminals.
This all starts to become a little involved however, and beyond the scope of what we want to discuss.
All we really care about is that Vdd and Vss make our transistors work correctly, and we can tell them apart.
Although it might be too informal for some tastes, it is therefore enough to keep the following in mind:
• Vdd is the high or positive voltage level, e.g., 3.3V or 5V, and
• Vss is the low or negative voltage level, e.g., 0V ≃ GND.
Note that GND refers to ground: this can be thought of as a) a reference point other voltages are measured
relative to (note that voltage is a synonym for potential difference, meaning we need such a reference), or b) a
(or the) return path, i.e., the point to which electrons will move due to their preference to move from high to
low potential difference.
Consider Figure 2.10a, where
1. connecting x to Vss means the top P-MOSFET will be connected, the bottom N-MOSFET will be disconnected, so r will be connected to Vdd , while
2. connecting x to Vdd means the top P-MOSFET will be disconnected, the bottom N-MOSFET will be
connected, so r will be connected to Vss .
Note that even with this simple organisation, we can identify the pull-up and pull-down networks; although
there is just one transistor in each, it is true that the P-MOSFET connects Vdd to the output iff. x = Vss and the
N-MOSFET connects Vss to the output iff. x = Vdd . We can of course consider more complex organisations
under the same design strategy, by increasing the number of transistors.
Example 2.6. Consider Figure 2.10b, where
1. connecting both x and y to Vss means both top P-MOSFETs will be connected, both bottom N-MOSFETS
will be disconnected, so r will be connected to Vdd ,
2. connecting x to Vdd and y to Vss means the right-most P-MOSFET will be connected, the upper-most
N-MOSFET will be disconnected, so r will be connected to Vdd ,
3. connecting x to Vss and y to Vdd means the left-most P-MOSFET will be connected, the lower-most
N-MOSFET will be disconnected, so r will be connected to Vdd , while
4. connecting both x and y to Vdd means both top P-MOSFETs will be disconnected, both bottom N-MOSFETS
will be connected, so r will be connected to Vss .
Consider Figure 2.10c, where
1. connecting both x and y to Vss means both top P-MOSFETs will be connected, both bottom N-MOSFETs
will be disconnected, so r will be connected to Vdd ,
2. connecting x to Vdd and y to Vss means the upper-most P-MOSFET will be disconnected, the left-most
N-MOSFET will be connected, so r will be connected to Vss ,
3. connecting x to Vss and y to Vdd means the lower-most P-MOSFET will be disconnected, the right-most
N-MOSFET will be connected, so r will be connected to Vss , while
4. connecting both x and y to Vdd means both top P-MOSFETs will be disconnected, both bottom N-MOSFETS
will be connected, so r will be connected to Vss .
A second aspect of the design strategy is made evident by increasing the number of transistors. Specifically, the
two examples include P-MOSFETs organised in parallel (st. either can be activated to connect Vdd to the output)
and N-MOSFETs organised in series (st. both must be activated to connect Vss to the output), or vice versa.
Hopefully it is obvious that the three examples model the NOT, NAND (or NOT AND) and NOR (or NOT
OR) Boolean operators respectively; this fact is reinforced by Figure 2.11. Either way, the fact is that from a
starting point involving atomic-level concepts we have developed components that we can reason about wrt.
both theory and practice. That is, we have used electrical switches to implement Boolean algebra; instead of
reasoning about computation involving the latter in theory, we can now actually build components that do that
computation in practice.
Definition 2.16. A given logic style will suggest an associated standard cell, i.e., an organisation of transistors that
realises a higher-level building block, namely either a a) computational component (e.g., a Boolean AND operator), or b)
storage component (e.g., a latch); where the former is more naturally described as a logic gate. Each such cell will have
associated functional specification (i.e., a truth table or excitation table), and behavioural specification (e.g., detailing
propagation delay).
Definition 2.17. A standard cell library is a collection of standard cells, used as building-blocks in a design.
Definition 2.18. The standard cell methodology permits design abstraction, in the sense a design can be specified at
a high- vs. low-level (i.e., in terms of standard cells, vs. transistors).
Definition 2.19. A Gate Equivalent (GE) is a unit of measurement used to assess the (area) complexity of a digital
logic design independently from the manufacturing process technology. It is common (e.g., for CMOS) to consider a
2-input NAND gate as 1 GE: you can think about it as a normalisation factor for manufacturing processes, st. designs
specified using different processes can be compared fairly.
• We are assuming the voltage levels used to represent values on each wire are perfect in some sense. In
short, we assume the associated signals have a “square” waveform and so are digital signals (i.e., only
ever have a value of 0 or 1). In reality this can be dubious, because physical phenomena that underpin
those voltage levels mean the edges of said signals might be “rounded” and so imperfect (e.g., have a
value of 0.5 say); we basically ignore this issue, at least until later.
• An inversion bubble on the output of a gate is used to denote the fact that the output is inverted. As
such, a buffer (or BUF) is simply a gate that connects the input directly to the output; a NOT gate is then
a buffer that inverts the input to form the output.
• For completeness we have included the NXOR (sometimes written XNOR) gate, which has the obvious
meaning but is seldom used in practice; per Chapter 1, we use ∧̄, ∨̄ and ⊕̄ as a short-hand to denote
NAND, NOR and NXOR respectively. Clearly, for example, we have
x ∧̄ y ≡ ¬(x ∧ y).
Figure 2.12: Representation of standard logic gates in English, Boolean algebra, C and symbolic notations.
BUF NOT
x r x r
0 0 0 1
1 1 1 0
(a) A 1-input, 1-output buffer. (b) A 1-input, 1-output NOT gate.
AND NAND
x y r x y r
0 0 0 0 0 1
0 1 0 0 1 1
1 0 0 1 0 1
1 1 1 1 1 0
(c) A 2-input, 1-output AND gate. (d) A 2-input, 1-output NAND gate.
OR NOR
x y r x y r
0 0 0 0 0 1
0 1 1 0 1 0
1 0 1 1 0 0
1 1 1 1 1 0
(e) A 2-input, 1-output OR gate. (f) A 2-input, 1-output NOR gate.
XOR NXOR
x y r x y r
0 0 0 0 0 1
0 1 1 0 1 0
1 0 1 1 0 0
1 1 0 1 1 1
(g) A 2-input, 1-output XOR gate. (h) A 2-input, 1-output NXOR gate.
Figure 2.14: Identities for standard logic gates in terms of NAND and NOR.
• Given 2-input gates such as AND, OR, and XOR, we use a short-hand and draw the gates with more
inputs; this is equivalent to making a tree of 2-input gates since, for example, we have
(w ∧ x ∧ y ∧ z) ≡ (w ∧ x) ∧ (y ∧ z).
Now, by treating the gates as operators per Boolean algebra we can combine them together and design
components that fall into a category often termed combinatorial logic; the gate behaviours combine to compute
a result continuously, with their output updated whenever an input changes.
Two questions naturally arise:
1. Chapter 1 suggests NOT, AND, and OR are the operators to focus on, so why design NAND and NOR
from transistors? and
2. given the design of NAND and NOR from transistors was an involved, detailed process, is there a way
to avoid repeating this for AND and OR?
The answer to both questions stems from the functional completeness of NAND and NOR: they are universal,
in the sense we can implement every other logic gate using one or other of them alone (as already discussed in
Chapter 1). The identities
¬x ≡ x ∧̄ x
x ∧ y ≡ (x ∧̄ y) ∧̄ (x ∧̄ y)
x ∨ y ≡ ¬x ∧̄ ¬y ≡ (x ∧̄ x) ∧̄ (y ∧̄ y)
and
¬x ≡ x ∨̄ x
x ∧ y ≡ ¬x ∨̄ ¬y ≡ (x ∨̄ x) ∨̄ (y ∨̄ y)
x ∨ y ≡ (x ∨̄ y) ∨̄ (x ∨̄ y)
replicated diagrammatically in Figure 2.14, demonstrate why; one can easily verify them via enumeration of
the possible values of x and y.
This is enormously important: it explains why designing NAND and NOR from transistors made sense in the
first place, but, moreover, it allows us to implement any Boolean expression, and so any Boolean function, from
NAND and NOR gates alone. The manufacture of such implementations, which we cover in Section 2.5, will
be vastly easier as a result. At the transistor-level, we only need to deal with some (large) number of one building
block (i.e., NAND or NOR) vs. the added complexity and effort associated with many such building blocks (i.e.,
AND, OR, XOR, and so on): everything at a low level is expressed in terms of NAND or NOR, so implemented
by exactly the organisations of N- and P-MOSFETs we have already seen.
Arguments based on the universality of NAND and NOR motivate a preference for these building blocks via
a preference for minimalism: using a single building block to implement every other component will offer
manufacturing advantages, for example, vs. a more diverse set.
That said, it is reasonable to question what other motivations exist. Put another way, what would happen
if we wanted an AND design in similar, transistor-based terms? A common starting point for such questions is
a circuit
in which we only have a pull-up network. The reasoning is often that if x = Vdd and y = Vdd then r = Vdd as
required, whereas if x = Vss or y = Vss then r is disconnected; in defining what 0 and 1 mean, if we just define
disconnected as 0 then maybe this design is valid? A counterargument (among many) is to think about what
happens if we use r elsewhere as an input, e.g., by connecting it to the gate terminal of a transistor in a second
layer of logic.
Now, if r is disconnected, the top-most transistor in the second layer simply does not work: the gate terminal
is disconnected from either Vdd or Vss so the transistor cannot function.
It turns out there is a solution to this sort of issue, which is to opt for a pull-down resistor rather than
a network of transistors, i.e., a resistor connected between r and Vss,
which you could think of as providing a “default” value to any disconnected wire. The problem is, now we
have to reason about and manufacture another component (i.e., the resistor): both of these are out of scope, so,
at least here, this approach is not viable.
1. Find a set T such that i ∈ T iff. O = 1 in the i-th row of the truth table.
2. For each i ∈ T, form a term ti by AND’ing together all the variables while following two rules:
(a) if Ij = 1 in the i-th row, then we use Ij as is, but
(b) if Ij = 0 in the i-th row, then we use ¬Ij.
3. An expression implementing the function is then formed by OR’ing together all the terms, i.e.,

e = ⋁_{i ∈ T} t_i ,
Intuitively, each i ∈ T will produce a minterm t_i in the SoP form: each term t_i ANDs inputs together (to form
their product), whereas e ORs together the terms (to form their sum). Each minterm fully specifies an input
assignment (i.e., a value for each input) for a row of the truth table where the output is 1; in a sense, we are
“covering” (or dealing with) each such row by doing so.
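The three steps can be sketched in a few lines of Python (the function and variable names are my own; f maps an n-bit input assignment to 0 or 1, and input j is written I<j>):

```python
# A sketch of the three-step sum-of-products (SoP) derivation described above.

def sop_terms(n, f):
    """Return one AND term (as a string) per truth-table row where f is 1."""
    terms = []
    for i in range(2 ** n):
        # Step 1: the input assignment for the i-th row, most significant first.
        bits = tuple((i >> (n - 1 - j)) & 1 for j in range(n))
        if f(bits) == 1:  # i is a member of the set T
            # Step 2: AND the inputs together, complementing those assigned 0.
            term = " & ".join(f"I{j}" if bits[j] else f"~I{j}" for j in range(n))
            terms.append(f"({term})")
    return terms

# Step 3: OR the terms together; XOR as the 2-input example, where T = {1, 2}.
print(" | ".join(sop_terms(2, lambda b: b[0] ^ b[1])))
# (~I0 & I1) | (I0 & ~I1)
```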
Example 2.8. Consider the task of implementing an expression for XOR, i.e., an e in SoP form which implements
f (x, y) = x ⊕ y, a truth table for which is reproduced (cf. Figure 2.13) here for clarity:
XOR
x y r
0 0 0
0 1 1
1 0 1
1 1 0
1. It is clear that T = {1, 2} because O = 1 in rows 1 and 2, whereas O = 0 in rows 0 and 3.
2. Each term ti for i ∈ T = {1, 2} is formed as follows:
• For i = 1, we find
– I0 = x = 0 and so we use ¬x,
– I1 = y = 1 and so we use y
and hence form the term t1 = ¬x ∧ y.
• For i = 2, we find
– I0 = x = 1 and so we use x,
– I1 = y = 0 and so we use ¬y
and hence form the term t2 = x ∧ ¬y.
3. An SoP expression is then formed by OR’ing together the terms, i.e.,

e = ⋁_{i ∈ T} t_i = ⋁_{i ∈ {1,2}} t_i = (¬x ∧ y) ∨ (x ∧ ¬y).
For example, notice that the row for i = 1 produces the minterm t1 = ¬x ∧ y meaning “the row where x = 0 and
y = 1”, whereas the row for i = 2 produces the minterm t2 = x ∧ ¬y meaning “the row where x = 1 and y = 0”;
combining the minterms together, we get an SoP expression that specifies rows where the output should be 1
as “either x = 0 and y = 1, or x = 1 and y = 0”.
Example 2.9. Consider the truth table in Figure 2.15a which describes a 4-input Boolean function, and the SoP
expression

r = ( ¬w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨
    ( ¬w ∧ ¬x ∧ ¬y ∧ z ) ∨
    ( ¬w ∧ ¬x ∧ y ∧ ¬z ) ∨
    ( ¬w ∧ x ∧ ¬y ∧ ¬z ) ∨
    ( ¬w ∧ x ∧ ¬y ∧ z ) ∨
    ( w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨
    ( w ∧ ¬x ∧ y ∧ ¬z ) ∨
    ( w ∧ ¬x ∧ y ∧ z ) ∨
    ( w ∧ x ∧ y ∧ z )

resulting from application of the method above.
Although it only becomes apparent when you do so, deriving such an expression is tedious and error prone;
although the algorithm is simple, it could be described as machine-friendly (in the sense it is best executed
by a computer). The complexity of the expression, in the sense it contains many operators, is more obvious.
Although we could simplify it by applying Boolean axioms, for example, this is again quite tedious. It is
natural to ask, therefore, whether (and if so, how) we can improve the original method wrt. these problems.
The Karnaugh map, invented in 1953 by Maurice Karnaugh while working at Bell Labs [2], is an alternative
method which offers (at least) two advantages over the original: it a) offers a more visual and so arguably
human-friendly way to derive the resulting expression, and b) automatically applies various optimisations
while doing so, st. we no longer need to apply (as much) post-derivation optimisation by hand. Although an
example more usefully illustrates how to use a Karnaugh map, and so the advantages above, the method itself
is best summarised using another algorithm. Again imagine we are tasked with deriving a Boolean expression
that implements some Boolean function f with n inputs and one output:
1. Draw a grid (or map) containing 2^n elements, in which each row and column represents one input
combination; order rows and columns according to a Gray code.
2. Fill the grid elements with the output corresponding to inputs for that row and column.
3. Cover rectangular groups of adjacent 1 elements which are of total size 2^m for some m; groups can “wrap
around” edges of the grid and overlap.
4. Translate each group into one term of an SoP form Boolean expression e, where each input whose value
is fixed across the group appears in the term (complemented if that value is 0), and each input whose
value varies within the group is omitted.
Consider a sequence of unsigned, n-bit integers; selecting n = 4, for example, and starting from zero, such a
sequence would be
⟨0, 0, 0, 0⟩ ↦ 0(10)
⟨1, 0, 0, 0⟩ ↦ 1(10)
⟨0, 1, 0, 0⟩ ↦ 2(10)
⟨1, 1, 0, 0⟩ ↦ 3(10)
⟨0, 0, 1, 0⟩ ↦ 4(10)
⟨1, 0, 1, 0⟩ ↦ 5(10)
⟨0, 1, 1, 0⟩ ↦ 6(10)
⟨1, 1, 1, 0⟩ ↦ 7(10)
⋮
where the RHS describes a (decimal) value, and the LHS describes the (binary) representation of that value.
Notice that moving from ⟨1, 1, 0, 0⟩ to the next entry ⟨0, 0, 1, 0⟩ means changing 3 bits: the 0-th and 1-st bits
toggle from 1 to 0, and the 2-nd bit from 0 to 1. Now consider an alternative ordering of the same integers:
⟨0, 0, 0, 0⟩ ↦ 0(10)
⟨1, 0, 0, 0⟩ ↦ 1(10)
⟨1, 1, 0, 0⟩ ↦ 3(10)
⟨0, 1, 0, 0⟩ ↦ 2(10)
⟨0, 1, 1, 0⟩ ↦ 6(10)
⟨0, 0, 1, 0⟩ ↦ 4(10)
⟨1, 0, 1, 0⟩ ↦ 5(10)
⟨1, 1, 1, 0⟩ ↦ 7(10)
⋮
Now, moving from any entry to the next or the previous one will always toggle one bit: such an ordering is
termed a Gray code after Frank Gray who made reference to it in a 1953 patent application (such orderings
had been known and used for quite some time before that). Crucially,
1. we can produce an ordering that satisfies the same property for any n, and
2. the alternative ordering is just a permutation of the original: we keep the same values (and the same
representations), but just rearrange them within the sequence.
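One common construction is the reflected binary Gray code, computed as g(i) = i XOR (i >> 1); note this is an aside of my own, and it yields a different (but equally valid) one-bit-change ordering than the listing above:

```python
# Reflected binary Gray code: adjacent codes differ in exactly one bit, and
# the property also holds across the wrap-around from the last code to the first.

def gray(i):
    return i ^ (i >> 1)

codes = [gray(i) for i in range(2 ** 4)]
for a, b in zip(codes, codes[1:] + codes[:1]):  # include the wrap-around step
    assert bin(a ^ b).count("1") == 1  # exactly one bit toggles

print(codes[:4])  # [0, 1, 3, 2]
```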
w x y z r
0 0 0 0 1
0 0 0 1 1
0 0 1 0 1
0 0 1 1 0
0 1 0 0 1
0 1 0 1 1
0 1 1 0 0
0 1 1 1 0
1 0 0 0 1
1 0 0 1 0
1 0 1 0 1
1 0 1 1 1
1 1 0 0 0
1 1 0 1 0
1 1 1 0 0
1 1 1 1 1

(a) A 4-input example.

x y z r
0 0 0 0
0 0 1 0
0 1 0 1
0 1 1 1
1 0 0 0
1 0 1 ?
1 1 0 1
1 1 1 ?

(b) A 3-input example.
1. The first step is to draw an empty map, in which columns correspond to (w, x) and rows to (y, z), both
ordered using a Gray code, and where each element is numbered according to the truth table row it
corresponds to:

           wx=00   wx=01   wx=11   wx=10
   yz=00       0       1       5       4
   yz=01       2       3       7       6
   yz=11      10      11      15      14
   yz=10       8       9      13      12
Correctly interpreting the grid layout is crucial, since we need to translate rows of the truth table into the
correct elements. Note that w and x relate to the columns (or horizontal axis), whereas y and z relate to
the rows (or vertical axis). The left-most column, for example, relates to cases where w and x both have
the value 0, i.e., where (w, x) = (0, 0); reading that column top-to-bottom, the rows within it relate to
cases where (y, z) = (0, 0), (0, 1), (1, 1) and (1, 0). The other columns, read from left-to-right, are similar wrt.
y and z, but cover the remaining cases where (w, x) = (0, 1), (1, 1) and (1, 0). As such, we can now fill each
element in the grid with the output listed in the corresponding truth table row to get
           wx=00   wx=01   wx=11   wx=10
   yz=00       1       1       0       1
   yz=01       1       1       0       0
   yz=11       0       0       1       1
   yz=10       1       0       0       1
Bars above and to the left of this grid denote cases where the associated input is 1: the 1-st and 2-nd (or
middle) columns are where x = 1, for example, whereas the 0-th and 3-rd (or outer) columns are where
x = 0. Elsewhere you might also see numbers above each column, or to the left of each row, to make
the values more explicit: they might show (w, x) = (0, 1) and (1, 1) (or just 01(2) and 11(2)) for the 1-st and
2-nd (or middle) columns, and (w, x) = (0, 0) and (1, 0) (or just 00(2) and 10(2)) for the 0-th and 3-rd (or
outer) columns. Either way, the ordering might, reasonably, seem odd: note that in row- and column-wise
directions, a Gray code is used. From top-to-bottom, elements in a column are for (y, z) = (0, 0), (0, 1), (1, 1)
and (1, 0), not (y, z) = (0, 0), (0, 1), (1, 0) and (1, 1) which might seem more natural. The reason for this
choice will be made apparent later, but, for now, keep in mind that it is what allows the Karnaugh map to
deliver the advantages outlined above.
2. The next step is to cover 1 elements in the grid. In a sense, this is analogous to what we did in the original
method when we identified each row in the truth table where the output was 1: there we would have a
group for each 1 element, but here we can form larger groups and cover multiple 1 elements.
The rules state we can form rectangular, potentially overlapping groups whose size is a power-of-two
(i.e., 2^m for some m): provided we follow them, each group formed will represent a term we then need to
implement as part of the SoP expression. The larger the group, the fewer inputs will be included in the
corresponding term; the fewer groups, the fewer terms there are. An example grouping in this case is as follows:
           wx=00   wx=01   wx=11   wx=10
   yz=00       1       1       0       1
   yz=01       1       1       0       0
   yz=11       0       0       1       1
   yz=10       1       0       0       1

where the groups are:
• a group of four elements in the top left-hand corner spanning the 0-th and 1-st rows and columns,
• a group of one element in the top right-hand corner,
• a group of two elements in the 2-nd row spanning the 2-nd and 3-rd columns, and
• a group of two elements which wrap around the bottom-left and bottom-right corners.
3. Finally, we need to translate each group into a term in the SoP expression. As an example, consider the
first group (i.e., of four elements in the top left-hand corner) and the values each input is assigned within
it. It should become clear that the value of x is irrelevant provided that w = 0. Put another way, fixing
w = 0 means we include the two left-most columns only (excluding the two right-most columns because
they relate to cases where w = 1). In the same way, the value of z is irrelevant provided that y = 0.
By specifying values for each relevant input and ignoring the irrelevant inputs, we can implement this
term as
¬w ∧ ¬y
to cover all four cells in that group; we are specifying “the columns where w = 0 and rows where y = 0”,
which restricts us precisely to elements within the group. By applying similar reasoning to the other
three groups, we find that
r = ( ¬w ∧ ¬y ) ∨
( w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨
( ¬x ∧ y ∧ ¬z ) ∨
( w ∧ y ∧ z )
which is equivalent to but clearly simpler than the result we derived originally: there are a) fewer terms,
and b) each term is the combination of fewer inputs.
Example 2.11. The result above is simpler than the original, but it turns out we can do better still by more
careful formation of the groups. More specifically, we could consider the following alternative
           wx=00   wx=01   wx=11   wx=10
   yz=00       1       1       0       1
   yz=01       1       1       0       0
   yz=11       0       0       1       1
   yz=10       1       0       0       1

where the groups are:
• a group of four elements in the top left-hand corner spanning the 0-th and 1-st rows and columns,
• a group of two elements in the 2-nd row spanning the 2-nd and 3-rd columns, and
• a group of four elements which wrap around the top-left, bottom-left, top-right, and bottom-right corners.
These groups yield

r = ( ¬w ∧ ¬y ) ∨
    ( ¬x ∧ ¬z ) ∨
    ( w ∧ y ∧ z ) .
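As a sanity check (my own, not part of the original method), the minterm expression and the simplified form can be compared by enumerating all 16 input assignments:

```python
from itertools import product

# The original function from the truth table in Figure 2.15a, encoded as the
# set of input assignments (w, x, y, z) where the output is 1.
def f(w, x, y, z):
    return int((w, x, y, z) in {
        (0, 0, 0, 0), (0, 0, 0, 1), (0, 0, 1, 0), (0, 1, 0, 0), (0, 1, 0, 1),
        (1, 0, 0, 0), (1, 0, 1, 0), (1, 0, 1, 1), (1, 1, 1, 1),
    })

# The simplified SoP form derived from the second Karnaugh-map grouping.
def g(w, x, y, z):
    return int((not w and not y) or (not x and not z) or (w and y and z))

# The two agree on every assignment, so the simplification is sound.
assert all(f(*v) == g(*v) for v in product((0, 1), repeat=4))
print("equivalent")
```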
Constructive use of don’t care entries An important feature or extension of truth tables, as defined so far, is
the potential to include so-called don’t care entries: rather than 0 or 1, we use ? to denote we do not care what
the value is (vs. we do not know what the value is, for example). When used in the context of an output, it
can be rationalised by considering a component whose output simply does not matter given some combination
of inputs: maybe this input is invalid, so the output is never used due to the resulting error.
A ? entry can appear in two places within a truth table:

1. On the LHS, wrt. an input x. In this case, the ? represents a short-hand, because by saying we don’t care
what the value of x is we expand that one row into two: one for x = 0 and one for x = 1, which is like
saying “irrespective of x (so if x = 0 or x = 1), provided y = 0 then r = 1”.
2. On the RHS, wrt. the output r. In this case, the ? represents a choice, because by saying we don’t care
what the value of r is we can select whatever suits us: it could be thought of like a “wildcard” of some
sort.
This concept has various applications, but is immediately useful during the derivation of an expression from the
specification (including don’t care entries) of some function. In short, both the original method and the Karnaugh
map alternative can, at a high level, be described as covering 1 entries in the truth table (either individually, or
in a group); in both cases, fewer 1 entries implies a simpler SoP expression. As such, it makes sense to deal
with don’t care entries (in the output) in a way that helps: we are free to treat them as 0 or 1, so a) treating them
as 0 means we do not need to cover them with a group, whereas b) treating them as 1 means we can potentially
form larger groups.
Example 2.13. Consider the truth table in Figure 2.15b which describes a 3-input Boolean function and thus
has 2^3 = 8 rows; selecting p = 2 and q = 4 yields the (empty) map
           xy=00   xy=01   xy=11   xy=10
    z=0        0       1       5       4
    z=1        2       3       7       6
which, filled with the outputs from the truth table, becomes

           xy=00   xy=01   xy=11   xy=10
    z=0        0       1       1       0
    z=1        0       1       ?       ?
The don’t care elements can then be treated as either 0 or 1 when forming groups; the left- and right-hand
options (shown side-by-side in the original figure) differ in this choice.
The left-hand option treats the element associated with x = 1 and z = 1 (i.e., in the 1-st row, 2-nd column) as 0:
as such it is not covered by a group, and we are forced to form two rectangular groups, st. the resulting
expression is
r = (¬x ∧ y) ∨ (y ∧ ¬z).
In contrast, the right-hand option treats the element as a 1, meaning it can be included in a single, larger group.
This produces the (much) simpler expression r = y.
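The benefit of treating the don’t care entry as 1 can be checked directly; in this sketch (my own encoding), r = y must agree with the table on every row whose output is specified, while the two ? rows may take either value:

```python
# Rows of the 3-input truth table in Figure 2.15b: (x, y, z) -> r, where None
# stands for a don't care (?) output.
table = {
    (0, 0, 0): 0, (0, 0, 1): 0, (0, 1, 0): 1, (0, 1, 1): 1,
    (1, 0, 0): 0, (1, 0, 1): None, (1, 1, 0): 1, (1, 1, 1): None,
}

# r = y matches every specified output, so the simplification is legitimate.
assert all(y == r for (x, y, z), r in table.items() if r is not None)
print("r = y is consistent with the specification")
```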
Why Gray code?! In the example above, we informally cited the use of Gray code ordering for rows and
columns in a Karnaugh map as important wrt. the advantages it then offers. The easiest way to see why this is
true is via another example where we do not use this approach.
Example 2.14. Consider the truth table
w x y z r
0 0 0 0 0
0 0 0 1 0
0 0 1 0 0
0 0 1 1 0
0 1 0 0 1
0 1 0 1 0
0 1 1 0 0
0 1 1 1 0
1 0 0 0 0
1 0 0 1 0
1 0 1 0 0
1 0 1 1 0
1 1 0 0 1
1 1 0 1 0
1 1 1 0 0
1 1 1 1 0
which describes some 4-input function f . By using a Gray code ordering, we translate it into the following
Karnaugh map
           wx=00   wx=01   wx=11   wx=10
   yz=00       0       1       1       0
   yz=01       0       0       0       0
   yz=11       0       0       0       0
   yz=10       0       0       0       0
that allows formation of a single group that covers the two 1 elements in the 0-th row; this group produces
the SoP expression

r = x ∧ ¬y ∧ ¬z,

noting that the value of w is irrelevant in this case (i.e., provided x = 1, y = 0 and z = 0, that alone is enough to
cover the group). Now consider a similar Karnaugh map without a Gray code ordering
           wx=00   wx=01   wx=10   wx=11
   yz=00       0       1       0       1
   yz=01       0       0       0       0
   yz=10       0       0       0       0
   yz=11       0       0       0       0
which is more like a Veitch diagram [10], a precursor to the Karnaugh map. Note, for example, that the 2-nd
column now represents cases where w = 1 and x = 0, and the 3-rd column now represents cases where w = 1
and x = 1: the 2-nd and 3-rd columns are swapped versus the original Karnaugh map (and likewise for the
rows). The problem is, now we cannot make a single group that covers the same two elements: we now need
two groups, each covering one element. These groups obviously produce a more complicated SoP expression,
namely
r = ( w ∧ x ∧ ¬y ∧ ¬z ) ∨
( ¬w ∧ x ∧ ¬y ∧ ¬z )
where we now include w even though we know it is not required; to get the same result as before, we would
now have to manipulate the expression by hand using suitable axiomatic steps.
This basically demonstrates that by using a Gray code ordering, where one bit will always toggle in the input
assignment when moving between rows and/or columns, we support precisely the observation outlined at the
start of the Section. Put another way, we wanted to identify input assignments that differed wrt. one input only
so as to eliminate that input; by ensuring that two adjacent (including wrap-around) rows or columns satisfy
this property, a group that spans them will naturally translate into a term that eliminates the single, different
input that identifies them.
Step #1: extraction of prime implicants The first step is to produce a table, Table 2.16 for this example, that
we extend step-by-step: we
1. initialise the 0-th section by extracting each minterm from the truth table (i.e., each input assignment st.
the output is 1), then
2. process the i-th section to construct the (i + 1)-th section, iterating until no progress can be made.
                 0   1   2   4   8   5  10  11  15
0+1+4+5          ✓   ✓           ✓       ✓
0+2+8+10         ✓       ✓       ✓           ✓
10+11                                        ✓   ✓
11+15                                            ✓   ✓

Table 2.17: Quine-McCluskey simplification, step #2: covering the prime implicants table.
Based on an input assignment represented as a tuple, in this case (z, y, x, w), we identify each minterm using
an integer: you can see the nine minterms extracted from Figure 2.15a at the top of Table 2.16. In the table,
each entry (i.e., each row) is called an implicant; they are assigned a group based on the number of elements in the
associated tuple that equal 1. Consider section 0, for example. Implicant 0, represented by (0, 0, 0, 0) (st. w = 0,
x = 0, y = 0 and z = 0), is assigned group 0 because zero elements of the representation equal 1. In contrast,
implicant 5, represented by (1, 0, 1, 0) (st. w = 0, x = 1, y = 0 and z = 1), and implicant 10, represented by (0, 1, 0, 1)
(st. w = 1, x = 0, y = 1 and z = 0), are both assigned group 2 because two elements of their representations
equal 1.
Recall from our simplification using Karnaugh maps that we were able to apply a rule to implement both
minterms w ∧ ¬x ∧ y ∧ ¬z and w ∧ ¬x ∧ y ∧ z with a single, simpler expression w ∧ ¬x ∧ y because the value
of z is irrelevant. We use a similar approach here, using the i-th section to construct the (i + 1)-th section by
comparing members of the j-th and ( j + 1)-th groups in the former; our goal is to find pairs of implicants whose
representations differ in one element, and combine them together. We skip comparison of the j-th group and
groups other than the ( j + 1)-th, because by definition they cannot satisfy the criterion. As an example, consider
construction of the 1-st section from the 0-th section: we

• compare implicant 0 from group 0 with implicants 1, 2, 4 and 8 from group 1,
• compare implicants 1, 2, 4 and 8 from group 1 with implicants 5 and 10 from group 2,
• compare implicants 5 and 10 from group 2 with implicant 11 from group 3, and
• compare implicant 11 from group 3 with implicant 15 from group 4.
In the new section, we replace the differing element of paired implicants with ? to highlight the fact we don’t
care about that input: combining implicants 0 and 1 represented by the tuples (0, 0, 0, 0) and (1, 0, 0, 0), for
example, produces an implicant represented by (?, 0, 0, 0). Furthermore, each implicant from the i-th section
which is used to form an implicant in the (i + 1)-th section is marked with a ✓ next to it; implicants 0, 1, 2, 4 and
8 are thus marked due to the comparison between groups 0 and 1 and their use in forming implicants 0 + 1,
0 + 2, 0 + 4 and 0 + 8.
The process is iterated, constructing subsequent sections until we can no longer make progress, i.e., there are
no implicants that can be combined. Table 2.16 includes three sections, noting that section 2 has no implicants
that can be combined and so is the last constructed. In addition, it illustrates the fact that combination of implicants
in the i-th section may produce duplicates in the (i + 1)-th section: here, we can see (0, ?, 0, ?) and (?, 0, ?, 0) are
duplicated. Whenever this occurs, we ignore the duplicates and omit them from further comparisons.
Step #2: covering the prime implicants table Any unmarked implicants are termed prime implicants: these
form the focus of a second step whose task is to produce the SoP expression. The content of Table 2.16 includes
four prime implicants, namely
0+1+4+5 7→ (?, 0, ?, 0)
0 + 2 + 8 + 10 7→ (0, ?, 0, ?)
10 + 11 7→ (?, 1, 0, 1)
11 + 15 7→ (1, 1, ?, 1)
These are used to form a prime implicant table, as in Table 2.17: it lists the prime implicants along the left-hand
side and the original minterms along the top, and includes a ✓ character in every element where a given prime
implicant includes a given minterm.
The goal now is to select a combination of the prime implicants which covers all of the original minterms.
For example, the prime implicant 0 + 1 + 4 + 5 covers the minterms 0, 1, 4 and 5; selecting this as well as
implicant 10 + 11 will cover 0, 1, 4, 5, 10 and 11. Before doing so, we can make our task easier by identifying
the set of essential prime implicants, i.e., those which are the only cover for a given minterm. We can see the
prime implicant 11 + 15 is such a case in Table 2.17, because it is the only way to cover minterm 15; as a result,
we must include it in our expression.
The process for coverage is fairly intuitive: we start with essential prime implicants, and then draw a line
through the associated row in the prime implicants table; when a line goes through a ✓, we also draw a line
through that column. The resulting lines show which minterms are currently covered by prime implicants we
have selected for inclusion in our SoP expression. For our example we
• draw a line through the row for implicant 11 + 15, and hence through the columns for minterms 11 and
15,
• draw a line through the row for implicant 0 + 1 + 4 + 5, and hence through the columns for minterms 0,
1, 4 and 5, and finally
• draw a line through the row for implicant 0 + 2 + 8 + 10, and hence through the columns for minterms
0, 2, 8 and 10.
The end result shows that by using prime implicants 0 + 1 + 4 + 5, 0 + 2 + 8 + 10, and 11 + 15, we can cover
all the original minterms; we need not include prime implicant 10 + 11, for example, since minterms 10 and
11 are already covered elsewhere. Looking at the associated tuples, we have

0 + 1 + 4 + 5 ↦ (?, 0, ?, 0)
0 + 2 + 8 + 10 ↦ (0, ?, 0, ?)
11 + 15 ↦ (1, 1, ?, 1)
Given each element t of a tuple, and the corresponding input:

if t = 0, we use the complemented input,
if t = 1, we use the input as is, and
if t = ?, we ignore the input;
we form a term for each prime implicant listed and thus implement the SoP expression as
r = ( ¬w ∧ ¬y ) ∨
( ¬x ∧ ¬z ) ∨
( w ∧ y ∧ z )
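The combination rule at the heart of step #1 is easy to express in code; a sketch (the helper name combine is my own), with tuples written (z, y, x, w) as in the text:

```python
# Merge two implicant tuples when they differ in exactly one position, as per
# step #1 of the Quine-McCluskey method.

def combine(a, b):
    """Return the merged implicant, with '?' at the differing position, or
    None if the implicants differ in more (or fewer) than one position."""
    diff = [i for i in range(len(a)) if a[i] != b[i]]
    if len(diff) != 1:
        return None
    i = diff[0]
    return a[:i] + ("?",) + a[i + 1:]

# Implicants 0 and 1 combine into (?, 0, 0, 0), as in Table 2.16 ...
assert combine((0, 0, 0, 0), (1, 0, 0, 0)) == ("?", 0, 0, 0)
# ... whereas implicants 5 and 10 differ in every position, so do not combine.
assert combine((1, 0, 1, 0), (0, 1, 0, 1)) is None
```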
• wire delay, which relates to the time taken for current to move through the conductive wire from one point to
another, and
• gate delay, which relates to the time taken for transistors in each gate to switch between connected and unconnected
states.
The latter is typically larger than the former, and both relate to the associated implementations: the latter relates to
properties of the transistors used, the former to properties of the wire (e.g., conductivity, length, and so on).
Definition 2.21. The critical path through some combinatorial logic is the longest sequence of delays (so wire
and/or gate delays) between the inputs and outputs.
Although such wire and gate delays are typically very small, when many gates are placed in series or when
wires are very long, the delays add up; the problem of managing the result is multiplied as the complexity of
combinatorial logic increases. The concept of wire delay is perhaps more intuitive than gate delay, so it makes
sense to expand a little on the latter; the example below attempts to explain the cause.
Example 2.15. Consider Figure 2.18, which includes an idealised (left-hand side, in Figure 2.18a) and (more)
realistic (right-hand side, in Figure 2.18b) illustration of what happens when the input to a MOSFET-based
NOT gate, i.e., Figure 2.10a, switches.
The idea is to stress the fact that in the idealised case, there is an instantaneous change in the output voltage:
the plot representing the output is square-edged, changing (or swinging) from 5V (i.e., 1) to 0V (i.e., 0) the
instant that the input voltage changes from 0V (i.e., 0) to 5V (i.e., 1), or, more precisely, when it reaches the
threshold voltage. Note that the illustration includes output voltage levels above 0V and below 5V that represent
the threshold at which said output is interpreted as a 0 or 1, but since the change is instantaneous these are
irrelevant.
In contrast, the realistic case suggests a non-instantaneous change in the output voltage, i.e., it takes some
time. The characteristics of the now curved plot relate to properties of the transistors. However, the important
thing to realise is that the input voltage will take some time to change between 0V (i.e., 0) and 5V (i.e., 1), so
there is some delay in the output voltage changing from 5V (i.e., 1) and 0V (i.e., 0); this also suggests there is a
(short) period of time where the output voltage cannot be interpreted as either 0 or 1.
(a) Idealised, square switching activity. (b) Realistic, curved switching activity.
Figure 2.18: An illustration of idealised and realistic switching activity wrt. a MOSFET-based NOT gate.
Figure 2.19: A behavioural waveform demonstrating the effects of propagation delay on an XOR implementation.
Figure 2.20: A simple design, involving just a NOT and an AND gate, that exhibits glitch-like behaviour.
Figure 2.21: A contrived circuit illustrating the idea of fan-out, whereby one source gate may need to drive n target gates.
Although this property is often abstracted when illustrating the value of a wire in a waveform, meaning
transitions from 0 to 1, or vice versa, are square-edged, it can be captured with sloped edges, e.g., in a
waveform showing two signals x and y.
Notice that x and y toggle between 0 and 1 in the same way, but transitions in the former (resp. latter) are
instantaneous (resp. take some time). Whether implicit or explicit, the gate delay property still exists, and has
an impact on evaluation of larger combinatorial designs:
Example 2.16. Consider Figure 2.19a, which shows the implementation of an XOR gate derived from (so
using) NOT, AND and OR gates. If we take a static approach to evaluating the output using the inputs, it is
reasonable that by setting x = 0 and y = 1 we get

x = 0
y = 1
t0 = ¬x = 1
t1 = ¬y = 0
t2 = t0 ∧ y = 1
t3 = t1 ∧ x = 0
r = t2 ∨ t3 = 1
However, this ignores the impact of delay on the evaluation process; if we take a dynamic approach and imagine
the delay of

1. a NOT gate is 10ns,
2. an AND gate is 20ns, and
3. an OR gate is 20ns,
this changes matters. Imagine we toggle the inputs from x = 0, y = 1 to x = 1, y = 1; immediately we introduce
time, in the sense we have introduced previous values of x and y rather than just current values. An illustration
of the gate behaviour is given in Figure 2.19b, however simplistic. The waveform starts when the gate is in the
correct state given the inputs x = 0, y = 1, after which the inputs are toggled to x = 1, y = 1 (at 0ns). Notice
that the result is not valid immediately. In particular, we can examine points in the waveform and show
that the final and intermediate results are actually incorrect. For example, it takes 10ns before either NOT gate
produces the correct output on t0 and t1 ; the result r remains incorrect until 50ns; gate delay has caused a gap
between the inputs being toggled, and output being valid.
To conclude, it is important to stress the central role a critical path has: it is a limiting factor or bound on how
quickly some combinatorial logic computes outputs, i.e., it dictates the associated latency. That may not seem
important, but obviously we prefer an optimised design that has lower latency; this implies a design challenge,
in that we almost always want to minimise the critical path.
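Finding the critical path can be phrased as a longest-path computation over the circuit graph; a minimal sketch (my own, using the gate delays from the example above and the XOR structure of Figure 2.19a):

```python
# Critical path = longest accumulated delay from any input to the output.
# Gate delays as assumed in the example: NOT = 10ns, AND = 20ns, OR = 20ns.
delay = {"NOT": 10, "AND": 20, "OR": 20}

# gate -> (type, predecessor gates); the primary inputs x and y arrive at 0ns,
# so only gate-to-gate dependencies need listing (wire delay is ignored here).
circuit = {
    "t0": ("NOT", []),           # t0 = NOT x
    "t1": ("NOT", []),           # t1 = NOT y
    "t2": ("AND", ["t0"]),       # t2 = t0 AND y
    "t3": ("AND", ["t1"]),       # t3 = t1 AND x
    "r":  ("OR",  ["t2", "t3"]), # r  = t2 OR t3
}

ready = {}
for g, (kind, preds) in circuit.items():  # insertion order is topological here
    ready[g] = max((ready[p] for p in preds), default=0) + delay[kind]

print(ready["r"])  # 50 (ns), matching the 50ns observed in the waveform
```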
Example 2.17. Following the example above, consider Figure 2.19a: this XOR design has a critical path that
goes through a NOT gate, then an AND gate, and then an OR gate: the path has a total delay of 50ns. In a way,
this formalises what we found above: it took 50ns to get the correct output r from inputs x and y. However,
examining the critical path delivers this information with no evaluation; it basically tells us the design can
never compute outputs in less time, which of course might imply the larger system in which said design is
placed is further limited as a result.
Definition 2.22. A glitch is normally defined to describe a (momentary) change wrt. some wire, which may cause a
(momentarily) invalid or incorrect output if used as an input to some gate; the cause is typically delay of some sort, e.g.,
a mismatch in when two gate inputs become valid.
Example 2.18. Consider Figure 2.20, wherein the two AND gate inputs are forced to be valid at different times
due to imbalanced delay: it clearly takes longer for the value of x to propagate through the NOT gate than
directly along the wire. The net result is that if we toggle x = 0 to x = 1 then back again, we produce a short
glitch on r whose width matches the NOT gate delay.
1. Although the functionality of a buffer is r = x, there is still some associated gate delay (roughly equivalent
to a NOT gate); it can thus be used to equalise the delay through different paths in some combinatorial
logic, and thus help solve the glitch problem outlined above. Within Figure 2.19a, for example, one can
imagine adding a buffer between y and the second input to the top AND gate; this would ensure that ¬x
and the buffered version of y arrive at the inputs to said gate at the same time.
2. Recall that the output of each MOSFET-based gate was formed by conditionally connecting Vdd or Vss to r;
the inputs, e.g., x and y, simply control which connection was made. This is important, because it implies
that even if the inputs are in some way “weak” then the output will be amplified, so equal to the “strong”
levels Vdd or Vss. A buffer can therefore be viewed as a way to get r, an identical but amplified version of x.
Neither of these facts is particularly important within the remit of what we cover, but it is nonetheless important
to keep them in mind if you see buffer gates in designs elsewhere.
• The term fan-in is used to describe the number of inputs to a given gate.
• The term fan-out is used to describe the number of inputs (so in a rough sense the number of other gates) the
output of a given gate is connected to.
The former is easier to explain: it is just a way to formalise the fact that, wlog. a 2-input AND gate that
computes r = x ∧ y has fan-in of 2, whereas a 3-input AND gate that computes r = x ∧ y ∧ z has fan-in of 3.
A gate with higher fan-in will typically switch more slowly than a gate with lower fan-in; this stems from the
fact the larger number of inputs are processed using a more complex internal organisation of transistors.
The latter is still easy to explain, but harder to justify as important. The idea is that, ideally, we are free
to connect the output of a given source gate to the inputs of say m other target gates; in practice, however,
there is a limit on m. It stems from increased load on the source gate, and so longer propagation delay: it
basically takes longer for the driving voltage to meet the required threshold. In addition, a transistor is limited
wrt. the current driven through it before it will malfunction in some way; if the fan-out requires this to be
exceeded, then the under-supplied source gate will fail somehow. So, in a sense, fan-out is an intrinsic versus
extrinsic implication of propagation delay (where the latter simply delays computation in some sense, the
former disrupts it). For example, consider the contrived design in Figure 2.21: the source AND gate on the
left-hand side is used to drive m other target AND gates to the right. Unless the source gate drives enough
current onto its output, it may malfunction because the target gates will not receive enough of a share to operate
correctly. The implementation of each gate will be rated wrt. fan-out, which essentially say how many is too
many, i.e., the the number of target gates which can be safely connected to a source gate; CMOS-based gates
have quite a high fan-out rating, perhaps 100 target gates or more can be connected to a single source.
Suspend disbelief for a moment and assume these cases could be of use in some way; hopefully it is obvious
that neither is likely to yield the outcome we want, or indeed one we can reason precisely about. In the first
case, the input is neither 0 nor 1 so it is unclear what the output will be. Perhaps the only caveat to this is where
one input alone can dictate the output; reconsider Figure 2.10b for example, which implements a NAND gate
and so computes r = ¬(x ∧ y). The truth table for NAND suggests if y = 0 then r = 1 irrespective of x: this
reasoning is validated by the implementation, since if y = 0 one P-MOSFET will always connect Vdd to r
irrespective of the other. This aside, however, so in general, if an input is not a Boolean value then it remains
unlikely we get the Boolean-like behaviour intended. In the same way, in the second case we basically join n
outputs together: this is more dangerous, because all of them drive current along the wire. The outcome
depends on a number of factors, but is, again, normally not a positive one wrt. the behaviour we want.
We can mitigate this issue by extending the idea of 2-state, Boolean logic values into 3-state logic. There
are two main ideas:
1. We introduce a new logic value, hence the name 3-state, called Z or high impedance; the easiest way to
think about this value is as representing a null, or disconnected value that can be safely “overpowered”
by any other value (i.e., 0 or 1).
2. We introduce a new logic gate, a so-called enable gate, which is essentially just a switch implemented
using a single transistor, i.e.,
[symbol: an enable gate, with input x, control signal en, and output r]
The associated truth table accommodates the high impedance value as follows:
Enable
x en r
0 0 Z
1 0 Z
Z 0 Z
0 1 0
1 1 1
Z 1 Z
0 Z Z
1 Z Z
Z Z Z
In combination, these steps allow us to cope with both cases above. The first case is now less of an issue: we
still might not get the behaviour we wanted, but at least we can reason about it. In the second case, we can use
the enable gate to allow conditional access to a shared wire: if en = 0 the output is Z so not driven, meaning
another driver could be safely connected to and use the same wire. However, when en = 1 the output is x;
nothing else should be driving a value along this wire or we are back to the situation which caused the original
problem.
• an unstable state if the output is unpredictable, e.g., it may be 0, 1, a voltage level between the thresholds for either,
or oscillate between the two somehow.
Definition 2.25. A meta-stable state is an unstable state, which, after some period of time, will resolve to some stable
state: the output eventually settles to either 0 or 1 (i.e., becomes stable), but we cannot predict which or when.
Instances of instability typically stem from some form of logical inconsistency in a design, and, in the case of
meta-stability, are only ever resolved due to physical characteristics of the implementation (e.g., strength of
transistors).
MUX2 (Figure 2.22b):

c x y | r
0 0 ? | 0
0 1 ? | 1
1 ? 0 | 0
1 ? 1 | 1

DEMUX2 (Figure 2.22e):

c x | r1 r0
0 0 |  ?  0
0 1 |  ?  1
1 0 |  0  ?
1 1 |  1  ?

(a) The multiplexer as a symbol. (b) The multiplexer as a truth table. (c) The multiplexer as a circuit.
(d) The demultiplexer as a symbol. (e) The demultiplexer as a truth table. (f) The demultiplexer as a circuit.
Figure 2.22: An overview of a 2-input (resp. 2-output), 1-bit multiplexer (resp. demultiplexer) cells.
x r
which could be captured using the (logically inconsistent) expression x = r = ¬x. Clearly there is a problem:
if x = 0 then r should be 1 due to the NOT gate, and if x = 1 then r should be 0; as a result, the output r will be
unstable and oscillate somehow (potentially at a rate related to the gate delay involved).
1. a multiplexer
• has m inputs,
• has 1 output,
• uses a (⌈log2 (m)⌉)-bit control signal input to choose which input is connected to the output,
[Figure 2.23 circuit diagrams: (a) uses four 2-input, 1-bit multiplexers in parallel, all sharing the control signal c; (b) cascades three 2-input, 1-bit multiplexers, with the first layer controlled by c0 and the second by c1.]
(a) A 2-input, 4-bit multiplexer. (b) A 4-input, 1-bit multiplexer.
Figure 2.23: Application of the isolated and cascaded replication design patterns.
while
2. a demultiplexer
• has 1 input,
• has m outputs,
• uses a (⌈log2 (m)⌉)-bit control signal input to choose which output is connected to the input,
noting that each input and output is n-bit. We can describe how the components behave using C as an
analogy. For example, ignoring the number of bits in each input, output and control signal, the statement
switch ( c ) {
case 0 : r = w; break ;
case 1 : r = x; break ;
case 2 : r = y; break ;
case 3 : r = z; break ;
}
acts similarly to a 4-input multiplexer: depending on the control signal c, one of the inputs (i.e., w, x, y, or z) is
assigned to the output (i.e., r). Likewise,
switch ( c ) {
case 0 : r0 = x; break ;
case 1 : r1 = x; break ;
case 2 : r2 = x; break ;
case 3 : r3 = x; break ;
}
acts similarly to a 4-output demultiplexer: depending on the control signal c, one of the outputs (i.e., r0,
r1, r2, or r3) is assigned from the input (i.e., x). Although attractive, using such an analogy needs care. In
particular, keep in mind the C fragments include an implicit, discrete order wrt. the assignments. In contrast,
the component design means an analogous connection is evaluated in a continuous manner: whenever either
the control signal or any input changes, the output may change to match.
This behaviour stems from a design based on combinatorial logic, which is easy to develop for both
components; in a similar way to before, we write down a truth table that describes the behaviour we require,
then derive a Boolean expression to implement that behaviour:
Example 2.20. Consider the case of a 2-input (resp. 2-output), 1-bit multiplexer, a truth table for which is
outlined in Figure 2.22b. The idea is we have two 1-bit inputs x and y, and one 1-bit control signal c; we want
to drive r with either x or y depending on whether c = 0 or c = 1. The truth table should make sense in that
when c = 0 the output r matches x, and when c = 1 the output r matches y; the don’t care entries, and so truth
table as a whole, can be read as “if c = 0 then r = x irrespective of y, whereas if c = 1 then r = y irrespective of
x”. From the truth table, we can arrive at the expression
r = ( ¬c ∧ x ) ∨
( c ∧ y )
which is shown diagrammatically in Figure 2.22c.
Example 2.21. Consider the case of a 2-output, 1-bit demultiplexer, a truth table for which is
outlined in Figure 2.22e. The idea is we have two 1-bit outputs r0 and r1 , and one 1-bit control signal c; we
want to drive either r0 or r1 with x depending on whether c = 0 or c = 1. The truth table should make sense in
that when c = 0 the output r0 matches x, and when c = 1 the output r1 matches x; the don’t care entries, and so
the truth table as a whole, can be read as “if c = 0 then r0 = x and r1 is irrelevant, whereas if c = 1 then r1 = x and
r0 is irrelevant”. From the truth table, we can derive the expression
r0 = ¬c ∧ x
r1 = c∧x
shown diagrammatically in Figure 2.22f.
For more general m-input (resp. m-output), n-bit alternatives, we employ the design patterns outlined earlier
using the 2-input (resp. 2-output), 1-bit components as a starting point.
Example 2.22. Consider the task of designing a 2-input, n-bit multiplexer, wlog. taking n = 4 as an example.
Note that with m = 2 inputs, we need ⌈log2 (m)⌉ = 1 control signals: one of 21 = 2 possible input assignments is
used to select each input.
Figure 2.23a illustrates the design, which uses replication. The idea is simple: we use n separate 2-input,
1-bit multiplexers where the i-th instance accepts the i-th bit of each input x and y and produces the i-th bit of
the output r. Or, put another way, since each instance is controlled by the same c, they are all either selecting
some bit of x or of y to produce r.
Example 2.23. Consider the task of designing an m-input, 1-bit multiplexer, wlog. taking m = 4 as an example.
Note that with m = 4 inputs, we need ⌈log2 (m)⌉ = 2 control signals: one of 22 = 4 possible input assignments is
used to select each input.
One strategy would be to simply write down a larger truth table, i.e.,
MUX4
c1 c0 w x y z r
0 0 0 ? ? ? 0
0 0 1 ? ? ? 1
0 1 ? 0 ? ? 0
0 1 ? 1 ? ? 1
1 0 ? ? 0 ? 0
1 0 ? ? 1 ? 1
1 1 ? ? ? 0 0
1 1 ? ? ? 1 1
r = ( ¬c0 ∧ ¬c1 ∧ w ) ∨
( c0 ∧ ¬c1 ∧ x ) ∨
( ¬c0 ∧ c1 ∧ y ) ∨
( c0 ∧ c1 ∧ z )
This yields a reasonable result, but as the number of inputs grows the task becomes more difficult. An alternative
is to divide-and-conquer, using 2-input, 1-bit multiplexers to decompose the larger decision task into smaller
steps. Figure 2.23b illustrates the design, which uses a cascade. The first, left-most layer of multiplexers is
controlled by c0 : the top-most instance produces w if c0 = 0, or x if c0 = 1, whereas the bottom-most instance
produces y if c0 = 0, or z if c0 = 1. These outputs are fed into a second, right-most layer that uses c1 to select
appropriately: if c1 = 0 the output of the top-most multiplexer in the first layer is selected, whereas if c1 = 1
the output of the bottom-most multiplexer in the first layer is selected. The overall result r is the same as our
dedicated design above, but hopefully it is clear the cascaded design is conceptually a lot simpler.
Equal (Figure 2.24b):

x y | r
0 0 | 1
0 1 | 0
1 0 | 0
1 1 | 1

Less-Than (Figure 2.24e):

x y | r
0 0 | 0
0 1 | 1
1 0 | 0
1 1 | 0

(a) The equality comparator as a symbol. (b) The equality comparator as a truth table. (c) The equality comparator as a circuit.
(d) The less than comparator as a symbol. (e) The less than comparator as a truth table. (f) The less than comparator as a circuit.
Figure 2.24: An overview of equality and less than comparators.
i.e., an r̂ that represents the sum of x̂ and ŷ. The content of Chapter 3 does exactly this. As a means of support,
however, a more specific, lower-level first step considers a set of less complex 1-bit building block components:
although not so useful alone, they will act as building blocks within the more general alternatives.
Comparators In contrast to arithmetic proper, where we expect both inputs and output to be numbers, a
comparison compares numerical inputs and thus produces a Boolean output. Various types of comparison are
useful, but it is enough to consider two in particular, each dealing with 1-bit inputs: the others can be derived
from these comparators.
Example 2.24. Given 1-bit inputs x and y, an equality comparator computes

r = 1 if x = y, or 0 otherwise.

From the associated truth table, shown in Figure 2.24b, we can derive the expression

r = ¬(x ⊕ y).
Example 2.25. Given 1-bit inputs x and y, a less than comparator computes

r = 1 if x < y, or 0 otherwise.

From the associated truth table, shown in Figure 2.24e, we can derive the expression

r = ¬x ∧ y.
While fairly self explanatory, the truth tables may seem a little odd as a result of their dealing with 1-bit inputs.
However, reading through them row-wise should demonstrate their content is sane: using less than as an
example, consider that the truth table mirrors your intuition wrt. this comparison by stating that 0 is not less
than 0, 0 is less than 1, 1 is not less than 0, and, finally, 1 is not less than 1. Note that the equality comparator
design hints that an inequality comparator can be simpler still: inverting the expression, we find r = x ⊕ y
provides an inequality comparison

r = 1 if x ≠ y, or 0 otherwise

because, by definition, when x = y (i.e., x = 0 and y = 0, or x = 1 and y = 1) x ⊕ y = 0, and when x ≠ y (i.e., x = 0
and y = 1, or x = 1 and y = 0) x ⊕ y = 1.
Half-Adder (Figure 2.25b):

x y | co s
0 0 |  0 0
0 1 |  0 1
1 0 |  0 1
1 1 |  1 0

Full-Adder (Figure 2.25e):

ci x y | co s
 0 0 0 |  0 0
 0 0 1 |  0 1
 0 1 0 |  0 1
 0 1 1 |  1 0
 1 0 0 |  0 1
 1 0 1 |  1 0
 1 1 0 |  1 0
 1 1 1 |  1 1

(a) The half-adder as a symbol. (b) The half-adder as a truth table. (c) The half-adder as a circuit.
(d) The full-adder as a symbol. (e) The full-adder as a truth table. (f) The full-adder as a circuit.
Figure 2.25: An overview of half- and full-adder cells.
Adders The simplest arithmetic operation, conceptually at least, is addition. There are two variants of a 1-bit
adder, instances of which will be sufficient to construct larger, n-bit adders later:
Example 2.26. Given 1-bit inputs x and y, a half-adder component computes a 1-bit sum s and carry-out co (i.e.,
the LSB and MSB of the 2-bit sum x + y) as output. The corresponding truth table shown in Figure 2.25b
can be used to derive associated Boolean expressions

co = x ∧ y
s  = x ⊕ y
Example 2.27. Given 1-bit inputs x and y plus a carry-in ci, a full-adder component computes a 1-bit sum s and
carry-out co (i.e., the LSB and MSB of the 2-bit sum x + y + ci) as output. The corresponding truth table shown
in Figure 2.25e can be used to derive associated Boolean expressions

co = (x ∧ y) ∨ (x ∧ ci) ∨ (y ∧ ci)
   = (x ∧ y) ∨ ((x ⊕ y) ∧ ci)
s  = x ⊕ y ⊕ ci
Note that the full-adder design is essentially two half-adders joined in a cascade: to accommodate the extra
carry-in the first instance computes t = x + y with the second one then computing s = t + ci. Also, note that the
(a) An expanded half-adder, with XOR in terms of NOT, AND and OR.
(b) A half-adder based on NAND gates only.
(c) A half-adder based on NOR gates only.
Figure 2.26: Gate universality used to implement a NAND- and NOR-based half-adder. Note that the dashed boxes
in the NAND and NOR implementations (middle and bottom) are translations of the primitive gates within the more
natural description (top).
Boolean expressions listed for a full-adder effectively include two (equivalent) options for co. One reason to
prefer the second is that given we need to compute both co and s, it contains the shared term x ⊕ y which can
be capitalised on during optimisation.
As an aside, the half-adder represents a simple enough design to explore the idea of gate universality in (a
little) more detail:
Example 2.28. Given the natural half-adder implementation in Figure 2.26a as a starting point, Figure 2.26b
describes an equivalent using NAND gates only.
Example 2.29. Given the natural half-adder implementation in Figure 2.26a as a starting point, Figure 2.26c
describes an equivalent using NOR gates only.
Focusing on the NAND-based variant, for example, naive translation using identities (annotated using dashed
boxes, wrt. the original gates in the natural implementation) yields an implementation with 11 NAND gates.
As you might expect, we can improve this with some careful optimisation: writing x ⊼ y = ¬(x ∧ y) for NAND,
and capitalising on the equivalence

x ⊼ (y ⊼ y) ≡ x ⊼ (x ⊼ y),
we can write
s = x ⊕ y
  = (¬x ∧ y) ∨ (x ∧ ¬y)             (XOR identity)
  = ¬(¬x ∧ y) ⊼ ¬(x ∧ ¬y)           (OR into NAND identity)
  = (¬¬x ∨ ¬y) ⊼ (¬x ∨ ¬¬y)         (de Morgan)
  = (x ∨ ¬y) ⊼ (¬x ∨ y)             (involution)
  = (¬x ⊼ ¬¬y) ⊼ (¬¬x ⊼ ¬y)         (OR into NAND identity)
  = (¬x ⊼ y) ⊼ (x ⊼ ¬y)             (involution)
  = ((x ⊼ x) ⊼ y) ⊼ (x ⊼ (y ⊼ y))   (NOT into NAND identity)
  = ((x ⊼ y) ⊼ y) ⊼ (x ⊼ (x ⊼ y))
which uses 4 NAND gates due to the shared term x ⊼ y, which is also shared with

co = x ∧ y
   = (x ⊼ y) ⊼ (x ⊼ y)

meaning 5 NAND gates for the whole half-adder (which is roughly the same number of non-NAND gates
within the natural implementation). There are more direct ways to manipulate the expression for s, but notice
that in the above a) steps 1 to 5 yield a result equivalent to Figure 2.26b, b) steps 6 to 7 eliminate any (obviously)
redundant NOT gates, and c) step 8 reorganises the gates to maximise shared logic (rather than eliminating any
gates outright). Although this is a specific example, these steps demonstrate a general strategy that often has a
counter-intuitive impact on any given design: correctly optimised, using NAND (or NOR) often yields a lower
(if any) increase in gate count vs. your expectation or an initial translation. Put another way, although they
can be harder to work with, they do not imply a less efficient result wrt. area (while also retaining advantages
such as regularity).
[diagram: input x → Encoder → code word x′ → Decoder → output x]
where the encoder accepts the input x, and encodes it into an x′ then transmitted; the decoder receives x′ , and
decodes it so as to recover the original x. Phrased as such, both encoder and decoder are basically translating
between representations because x′ could be thought of as a different representation of x we normally term a
code word.
Definition 2.26. Modelling an encoder and decoder as two functions
we use the term n-to-m to describe either component where it has n inputs and m outputs:
Enc-4-to-2 (Figure 2.27a):

x3 x2 x1 x0 | x′1 x′0
 0  0  0  1 |  0   0
 0  0  1  0 |  0   1
 0  1  0  0 |  1   0
 1  0  0  0 |  1   1

Dec-2-to-4 (Figure 2.27c):

x′1 x′0 | x3 x2 x1 x0
 0   0  |  0  0  0  1
 0   1  |  0  0  1  0
 1   0  |  0  1  0  0
 1   1  |  1  0  0  0

(a) The encoder as a truth table. (b) The encoder as a circuit.
(c) The decoder as a truth table. (d) The decoder as a circuit.
Figure 2.27: An example encoder/decoder pair.
1. an n-to-m encoder translates an n-bit input value into an m-bit code word, and
2. an m-to-n decoder translates an m-bit code word into an n-bit output value.
Definition 2.27. If for every code word x we have HW(x) = 1, i.e., every possible code word has exactly one bit set to 1,
we call the associated encoder (resp. decoder) one-hot (or one-of-many).
Definition 2.28. A priority encoder is st. priority (or preference) is given to one input over another. This concept is
most obviously useful in a one-hot encoder, allowing it to cope gracefully with erroneous situations where HW(x) > 1:
the idea is that if xi = 1 and x j = 1, then priority is given to xi say (meaning the fact x j = 1 is ignored).
These formalisms hide various subtleties, most notably the fact that it only makes sense to discuss encoders
and decoders in context: both a) the encoding (resp. decoding) scheme and so the structure of code words, and
b) the parameterisation of said scheme (e.g., n and m) are totally domain-specific, meaning we cannot describe a
general encoder (resp. decoder) in a sensible manner.
• The n-to-m terminology suggests inputs (resp. outputs) drawn from sets of 2^n (resp. 2^m) values. However,
it is clearly possible, and often useful, for some code words to remain unused; as such, it can be useful to
relax the terminology and think of n-value and m-value sets instead. Note there is no strict requirement
that m > n, or vice versa.
• Normally we need to consider the encoder and decoder together, as their behaviour is related: we
normally expect
(Decode ◦ Encode)(x) = x,
i.e., Decode = Encode−1 . This fact implies that it is not always possible to describe a valid decoder (resp.
encoder) for a given encoder (resp. decoder): some functions have no inverse. That said, however, some
contexts do not need a decoder: the problem at hand may be st. the code word is useful as is, and the
corresponding x need not be recovered.
Example 2.30. Consider the task of taking n inputs, say xi for 0 ≤ i < n, and producing an unsigned integer x′
that determines which xi = 1 given that for all j ≠ i, xj = 0. In other words, we want an encoder that takes x and
produces some x′ as a result; the task of taking x′ and recovering each xk for 0 ≤ k < n demands a corresponding
[Figure 2.28 circuit: a chain of full-adders whose sum outputs r0, r1, ..., rn−1 are fed (looped) back into their own inputs.]
Figure 2.28: An incorrect counter design, using naive “looped” feedback.
decoder. This problem might be motivated by a need to control components: if we have n such components in
a system, the decoder could, for instance, be used to enable one of them at a time.
By setting n = 4 for example, the encoder (resp. decoder) will have four inputs x0 , x1 , x2 , and x3 ; this implies
x′ ∈ {0, 1, 2, 3} and hence m = 2, meaning two outputs x′0 and x′1 . Figure 2.27a and Figure 2.27c show truth tables
for the two components. For the encoder, we derive the Boolean expressions
x′0 = x1 ∨ x3
x′1 = x2 ∨ x3
Example 2.31. Using the previous example for motivation, imagine we break the rules and set both x1 = 1 and
x2 = 1: the encoder fails, producing
x′0 = x1 ∨ x3 = 1
x′1 = x2 ∨ x3 = 1
as the code word (incorrectly suggesting that x3 = 1). To address problems of this sort, we can employ a priority
encoder, giving x2 priority over x1 for example (or, more generally, every xi has priority over xj for i > j). To
capture this requirement, we rewrite the truth table as follows:
PriorityEnc-4-to-2
x3 x2 x1 x0 x′1 x′0
0 0 0 1 0 0
0 0 1 ? 0 1
0 1 ? ? 1 0
1 ? ? ? 1 1
Take the 2-nd row for example: although potentially x0 = 1 or x1 = 1, the output gives priority to x2 . That is,
provided that x2 = 1 and x3 = 0 (i.e., irrespective of x0 and x1 ) the output will be st. x′0 = 0 and x′1 = 1. The
associated Boolean expressions are updated accordingly to

x′0 = (x1 ∧ ¬x2 ) ∨ x3
x′1 = x2 ∨ x3
2. we do not let the output of each full-adder settle before it is used again as an input: they are computed
continuously (because there is a loop, from x through the full-adder to r and so back to x).
[waveforms: the 2-phase clock signals Φ1 and Φ2, whose behaviour is controlled by the four parameters δ1, δ2, δ3, and δ4.]
(b) A 2-phase clock.
So despite the fact it intuitively functions as required, this design is far from ideal and, in fact, invalid. Perhaps
the only use it has is to illustrate some fundamental limitations of combinatorial logic. More specifically, we
cannot control when a component computes an output (since it does so continuously), nor have it remember
said output once produced. We need a different approach, which along with components used to support it, is
termed sequential logic: we need
• one or more components that remember what state they are in, and
2.3.1 Clocks
If we want to perform computation as a sequence of steps, we need to exert control over the components
involved: for example, it could be important to synchronise each component st. they all start (or stop)
computation at the same time. We use a special control signal to do this:
Definition 2.29. A clock signal is simply a digital signal that oscillates between 0 and 1 in a regular fashion.
Note that despite the terminology, in the context of digital logic a clock is somewhat analogous to a metronome:
rather than tracking the (wall-clock) time, for example, it simply produces a regular series of “ticks” (or features)
that are used to synchronise associated actions.
Clock features Since a clock signal is a digital signal, it shares features such as positive and negative edges
and levels as previously outlined within Chapter 1 and now by Figure 2.29a. That said, however, several
specific features are also important:
Definition 2.30. The interval between a given positive (resp. negative) edge and the next positive (resp. negative) edge
is termed a clock cycle. Additional terms you commonly encounter stem from this definition: for example, the clock
period is the time taken for a clock cycle to complete, while the clock frequency (or clock rate) is the number of clock
cycles completed in a unit of time (typically each second, and hence the inverse of the clock period).
Definition 2.31. The time the clock signal spends at positive and negative levels need not be equal; the term duty cycle
is used to describe the ratio between these times. A clock will typically have a duty cycle of a half, meaning the signal is
at a positive level (literally “doing its duty”) for the same time it is at a negative level, but clearly other ratios are valid.
These features are harnessed by a clocking strategy, which is a formal way of saying “a way of using the clock
signal”. For example, we might use a clock edge to trigger the start of some computation, or a clock level to
enable or disable (e.g., reset) some computation.
Clock generation In a sense, any signal could be deemed a clock signal provided it satisfies the definition(s)
above. However, in practice there is a set of distinguished clock signals generated by a) an external or b) an
internal clock source (or clock generator) component.
Example 2.32. An external clock source is commonly provided using a piezoelectric crystal. When a voltage
is applied across such a component, it will oscillate according to a natural frequency (related to the physical
characteristics of the crystal); roughly speaking, one can use the resulting electrical field generated by this
oscillation as a clock signal.
Definition 2.32. It can be useful to manipulate a given clock signal, in order to alter it wrt. features such as frequency; this
is common whenever the clock signal is provided as an input to a design, but the design has specific internal requirements.
In this context, the original and manipulated cases are sometimes termed the reference clock signal and derived clock
signal.
Increasing the frequency of, i.e., multiplying, a reference clock is possible but somewhat beyond our scope;
dedicated designs exist to solve this problem, but we omit a detailed overview. Decreasing the frequency of,
i.e., dividing, a reference clock is much easier. Imagine that each positive edge of the reference clock clk causes
a counter c to be incremented: assuming c = 0 initially, the individual bits of c can be visualised as
[waveforms: clk together with the counter bits c0, c1, and c2, as c steps through the values 0, 1, 2, 3, 4, 5, ...]
Notice that each successive bit of c models clk, but with a period that is twice as long: formally, the (i − 1)-th
bit of the counter c acts like clk divided by 2^i. This means, for example, that if i = 1 we can extract a clock signal
with 1/2^i = 1/2^1 = 1/2 times (i.e., half) the frequency via the 0-th bit of c.
Clock distribution
Definition 2.33. As with the power rails, a given clock signal must be distributed (or supplied) to each component that
makes use of it; a clock distribution network is tasked with doing so.
Definition 2.34. The term clock skew describes a phenomenon whereby a clock signal arrives at one component along a
different path to another, and so at a different time; this suggests the two components are unsynchronised.
Example 2.33. Example clock distribution network topologies include the H-tree, which is a form of space
filling curve. The advantage of an H-tree is that the wire delay from the clock generator to each target component
is uniform: this helps minimise the potential for clock skew.
Definition 2.35. The term clock domain defines the influence of control exerted by a specific clock signal; every
component in a given clock domain is controlled by the same clock signal.
It is hard(er) to reason about the relationship between the features in different clock signals that imply different
clock domains. This means, for example, that a) synchronising, and/or b) communicating values between
components in two, different clock domains is harder than if the same components are in the one, single clock
domain: intuitively, for example, it is hard to tell when positive edges on said clocks may occur at the same
time (and so synchronise the components, say). As a result, points of interaction between (i.e., at the boundary
of) clock domains (e.g., so-called clock domain crossings) demand careful attention.
From 1-phase to n-phase clocks Although it is easiest to think of a single clock signal, as illustrated in
Figure 2.29a, more complicated arrangements are both possible and useful. A central example is the concept
of an n-phase clock, which sees the clock distributed as n separate signals along n separate wires.
A common4 instance of this general concept is the 2-phase clock: the idea is that the clock is represented by
two signals, often labelled Φ1 and Φ2 . Figure 2.29b shows how the signals behave relative to each other. Note
that features within a 1-phase clock, e.g., the clock period, levels, and edges, translate naturally to both Φ1 and
Φ2 . However, notice the additional guarantee that their positive levels are non-overlapping: while
Φ1 is at a positive level, Φ2 is always at a negative level and vice versa. This behaviour is controlled by four
parameters
4 Based admittedly on limited experience, it seems that relatively few textbooks cover both 1- and 2-phase clocking strategies: in some
ways this is a pity, since the use of 2-phase clocks is certainly simpler given the requirement for latches rather than flip-flops. If you want
an alternative overview, then [11, Section 5] offers an example.
Adjusting these parameters will shorten or elongate the period of Φ1 and/or Φ2 , or the “gaps” between them,
but the central principle of their being non-overlapping is maintained.
1. an astable, where the component is not stable in either state and flips uncontrolled between states,
2. a monostable, where the component is stable in one state and flips uncontrolled but periodically between states,
or
3. a bistable, where the component is stable in two states and flips between states under control of an input.
The third class, bistables, is our focus here, since it has the most useful behaviour.
Definition 2.37. Given a suitable bistable component controlled using an enable signal en that determines when updates
happen, we say it can be either a) level-triggered, meaning updates occur while en is at an active level, or b) edge-triggered,
meaning updates occur only at an active edge of en.
The former type is typically termed a latch, with the latter termed a flip-flop.
Latches are sometimes described as transparent: this term refers to the fact that while enabled, their input and
output will match since the state (which matches the output) is being updated with the input. This is not the
case with flip-flops, because their state is only updated at the exact instant of an edge.
Definition 2.38. Whether a positive or negative level (resp. edge) of some signal controls the component depends on
whether it is active high or active low; a signal en is often written ¬en to denote the latter case.
Definition 2.39. It is common for a given latch or flip-flop design to include additional control signals; an important
example is a reset signal rst, that is often included to allow (re)initialisation of a design.
Definition 2.40. When used as a verb rather than a noun (cf. logic gate), gate means to conditionally turn off some
component or feature.
Example 2.34. Consider a component whose 1-bit input x is gated by AND’ing it with a control signal g: the
input provided to the component is
x′ = g ∧ x.
We say g gates x because if g = 0 then x′ = g ∧ x = 0 ∧ x = 0: whatever the value of x, the component gets
x′ = 0 as input if g = 0, hence x has been “turned off” by g. In contrast, if g = 1 then x′ = g ∧ x = 1 ∧ x = x: the
component gets x′ = x as normal if g = 1.
Our description of such components has so far been very abstract; the goal in what follows is to remedy this
situation. First, we describe the high-level design and behaviour of some latch and flip-flop components. Then
we show how this behaviour can be realised, using a lower-level design expressed in terms of logic gates. In
combination, we focus specifically on the goal of developing an n-bit register based on D-type latches and/or
flip-flops.
High-level descriptions of behaviour There are four common, concrete instantiations of the somewhat
abstract components described above. That is, we usually rely on four common latch and flip-flop types:
1. An SR-type latch (resp. SR-type flip-flop) component has two inputs S (or set) and R (or reset):
SR-Latch/SR-FlipFlop
Current Next
S R Q ¬Q Q′ ¬Q′
0 0 0 1 0 1
0 0 1 0 1 0
0 1 ? ? 0 1
1 0 ? ? 1 0
1 1 ? ? ? ?
2. A D-type latch or “data latch” (resp. D-type flip-flop) component has one input D:
D-Latch/D-FlipFlop
Current Next
D Q ¬Q Q′ ¬Q′
0 ? ? 0 1
1 ? ? 1 0
3. A JK-type latch (resp. JK-type flip-flop) component has two inputs J (or set) and K (or reset):
[symbols: two D-type components, each with input D, enable signal en, and outputs Q and ¬Q; the flip-flop version marks en with a triangle.]
Figure 2.30: Symbolic descriptions of D-type latch and flip-flop components (note the triangle annotation around en).
JK-Latch/JK-FlipFlop
Current Next
J K Q ¬Q Q′ ¬Q′
0 0 0 1 0 1
0 0 1 0 1 0
0 1 ? ? 0 1
1 0 ? ? 1 0
1 1 0 1 1 0
1 1 1 0 0 1
4. A T-type latch or “toggle latch” (resp. T-type flip-flop) component has one input T:
T-Latch/T-FlipFlop
Current Next
T Q ¬Q Q′ ¬Q′
0 0 1 0 1
0 1 0 1 0
1 0 1 1 0
1 1 0 0 1
It is useful to look in more detail at the D-type component, since this will help explain the basic concepts. The
component has
• an input D, plus an enable signal en, and
• two outputs, Q and ¬Q; we can usually ignore ¬Q, but note that it should always be the inverse of Q.
The truth table that describes the behaviour is split into two halves, which is unlike what we have seen
previously: the left-hand half is a description of the current state, the right-hand a description of the next state,
i.e., after we perform an update. Sometimes this is termed an excitation table to distinguish it from a standard
truth table. So, for example, the first row can be read as “if D = 0, then no matter what the current state is,
the next state should be Q = 0”, and the second row can be read as “if D = 1, then no matter what the current
state is, the next state should be Q = 1”. Put another way, this component works as required: we can either
update it to store D when enabled, or operate it in storage mode to retain Q otherwise.
Armed with this knowledge, we can already think about using such components in our designs: we
expand on their internal design in the following Sections, but can already use more abstract symbols shown
in Figure 2.30 to differentiate between the latch and flip-flop versions. Similar symbols describe components
other than the D-type one we have focused on. They typically retain the triangle annotation (or absence
thereof) on en, and commonly omit any unused outputs (e.g., ¬Q).
Low(er)-level descriptions of behaviour Still focusing on the D-type component, lower-level use can be
illustrated using a timing diagram, which shows the behaviour of the enable signal en (which we assume is
active high), the input D and the output Q. For a D-type latch we have something like the following:
[Timing diagram: the signals en, D, and Q plotted against time points t0 through t5.]
The vertical dashed lines highlight important points in time; between t1 and t2 , and t3 and t4 for instance, en is
at a positive level so the latch state is updated to match D. Otherwise, for example between t0 and t1 , en is at a
negative level so changes to D do not affect the latch state: the latch is in storage mode, meaning it retains the
current state. Swapping to a D-type flip-flop, the behaviour changes:
[Timing diagram: the signals en, D, and Q, with positive edges on en at t0, t1, and t2.]
Now the flip-flop state will be updated to match D only at the points in time where en transitions from 0 to 1;
this happens at t0 , t1 and t2 , meaning interim changes to D have no effect on the flip-flop state.
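The contrast between the two behaviours can be sketched in Python (an illustrative, discrete-time simulation only; the names are not from the book): a latch tracks D whenever en = 1, whereas a flip-flop samples D only at a positive edge on en.

```python
# Simulate a D-type latch (level-triggered) or flip-flop (edge-triggered)
# over discrete time steps, given traces of the en and D signals.

def simulate(en_trace, d_trace, edge_triggered, q0=0):
    q, prev_en, out = q0, 0, []
    for en, d in zip(en_trace, d_trace):
        if edge_triggered:
            if prev_en == 0 and en == 1:   # positive edge on en: sample D
                q = d
        elif en == 1:                      # positive level on en: transparent
            q = d
        prev_en = en
        out.append(q)
    return out

en = [0, 1, 1, 1, 0, 1]
d  = [1, 1, 0, 1, 0, 0]
# The latch follows every change of D while en = 1; the flip-flop keeps
# only the value of D seen at each positive edge of en.
print(simulate(en, d, edge_triggered=False))  # → [0, 1, 0, 1, 1, 0]
print(simulate(en, d, edge_triggered=True))   # → [0, 1, 1, 1, 1, 0]
```

In the edge-triggered case the interim change of D at the third step has no effect, mirroring the waveform discussed above.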
Definition 2.41. Using a component of this type is more difficult in practice than alluded to by these examples. Although
we largely ignore them from here on, the following are important:
1. The setup time (resp. hold time) is the minimum period of time that D must be stable before (resp. after) use to
update the component.
Think of the clock feature (either level or edge) as triggering the act of sampling from D in order to update the state.
As such, the two timing restrictions mentioned make sure the sample is reliable: they specify a window, around the
change to en, where D has to be stable for some period of time.
2. The clock-to-output time is an artefact of propagation delay: a delay will exist between the update event being
triggered by the associated clock feature, and the output Q changing to match.
These time periods or delays will be determined by the implementation of the component; ideally they will be minimised,
which makes the component easier to use (i.e., more tolerant).
Example 2.35. The concepts of setup, hold, and clock-to-output time are illustrated in the following (intention-
ally exaggerated) waveform relating to a D-type (edge triggered) flip-flop:
[Waveform: en, D, and Q annotated with the setup time and hold time (the window around the edge on en during which D must be stable) and the clock-to-output time (the delay before Q changes to match).]
[Figure 2.31: circuit diagrams of the SR-type latch variants, with port labels S, R, D, en, Q, and ¬Q.]
(c) A NOR-based SR-type latch with enable signal. (d) A NAND-based SR-type latch with enable signal.
(e) A NOR-based SR-type latch with enable signal and R = ¬S. (f) A NAND-based SR-type latch with enable signal and R = ¬S.
Figure 2.31: A collection of NOR- and NAND-based SR-type latches, with simpler (top) to more complicated (middle and bottom) control features.
[Figure 2.32: case-by-case signal values on the NOR-based SR latch.]
Figure 2.32: A case-by-case overview of NOR-based SR latch behaviour; notice that there are two sane cases for S = 0 and R = 0, and no sane case when S = 1 and R = 1.
As an aside, we can construct (more or less) the same component using NAND rather than NOR gates; the
NAND-based versions are shown alongside each of the associated NOR-based Figures. This change implies
a subtle difference in behaviour, however. Essentially, the storage and meta-stable states are swapped over:
for the NAND-based version, S = R = 1 retains the current state, whereas S = R = 0 is the problematic case.
In addition, the Q and ¬Q outputs from the component swap over as well. In short, the NAND-based version
still achieves the same goal, but we need to carefully translate the behaviour when using it within a larger
design. It is often termed a ¬S¬R latch rather than an SR latch to highlight this fact, a convention we adopt to avoid
confusion about which type of component is meant.
The first step is somewhat counter-intuitive. We start by looking at the Set-Reset or SR latch: the circuit
shown in Figure 2.31a has two inputs called S and R which are the set and reset signals, and two outputs Q
and ¬Q. Internally, the arrangement will probably seem odd in comparison to other designs we have seen so
far: the output of each NOR gate is wired to an input of the other, an arrangement we say is cross-coupled.
Understanding the behaviour of the design as a whole depends on a property of the NOR gates. Recall
(e.g., from Figure 2.13) that we can describe NOR using a truth table as follows:
NOR
x y r
0 0 1
0 1 0
1 0 0
1 1 0
In particular, this illustrates the fact that if either x = 1 or y = 1 then the result must be r = ¬(x ∨ y) = 0. Put another
way, we can write two axioms
¬(x ∨ 1) = 0
¬(1 ∨ y) = 0
These are important, because they allow us to resolve the loop introduced by the cross-coupled nature of NOR
gates in this design. We can see how, on a case-by-case basis, by observing output for each possible assignment:
this is shown in Figure 2.32.
• if S = 1, R = 0 (Figure 2.32a) then we force Q = 1, ¬Q = 0 (irrespective of what they were previously),
since the top NOR gate must output 0 given that ¬(1 ∨ y) = 0,
• if S = 0, R = 1 (Figure 2.32b) then we force Q = 0, ¬Q = 1 (irrespective of what they were previously),
since the bottom NOR gate must output 0 given that ¬(x ∨ 1) = 0,
• if S = 0, R = 0 then the outputs are not uniquely defined by the inputs: there are in fact two logically
consistent possibilities (Figure 2.32c and Figure 2.32d), namely Q = 1, ¬Q = 0 or Q = 0, ¬Q = 1,
• if S = 1, R = 1 (Figure 2.32e) then we force Q = 0, ¬Q = 0: in a sense this is contradictory, because we
expect each to be the inverse of the other, but hints at another problem.
In the final case, the latch could be (and has been) described as being in a meta-stable state because the eventual
output is not predictable. An intuitive reading is that it makes no sense to both set and reset the value, so some
form of unexpected behaviour for S = R = 1 is therefore not unreasonable. More specifically though, once we
return to S = 0, R = 0 the latch must settle in one or other of the two possibilities outlined above: we cannot
predict which one, however, so the subsequent state of the latch is essentially random.
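The case-by-case analysis can be reproduced with a small Python sketch (illustrative only; it resolves the cross-coupled loop by iterating until the outputs stop changing):

```python
# Resolve the cross-coupled NOR loop by repeated re-evaluation until the
# outputs (Q, ¬Q) reach a fixed point.

def nor(x, y):
    return 0 if (x or y) else 1

def sr_latch(s, r, q=0, nq=1):
    for _ in range(10):
        new_q, new_nq = nor(r, nq), nor(s, q)
        if (new_q, new_nq) == (q, nq):
            return q, nq
        q, nq = new_q, new_nq
    return q, nq  # did not settle within the iteration bound

assert sr_latch(1, 0) == (1, 0)        # set
assert sr_latch(0, 1) == (0, 1)        # reset
assert sr_latch(0, 0, 1, 0) == (1, 0)  # storage: retains Q = 1
assert sr_latch(0, 0, 0, 1) == (0, 1)  # storage: retains Q = 0
assert sr_latch(1, 1) == (0, 0)        # "contradictory" case: Q = ¬Q = 0
```

Note that for S = R = 0 the result depends on the state supplied as a starting point, matching the observation that there are two logically consistent possibilities.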
Note that in terms of the specified behaviour, the design does what we want. For example, we can set
or reset the current state (per Figure 2.32a and Figure 2.32b) or retain the current state (per Figure 2.32c and
Figure 2.32d) as need be. However, this high-level description avoids two perfectly reasonable questions,
namely
Figure 2.33: An annotated SR latch, decomposed into two NOR gates and then into transistors; r0 , the output of the top
NOR gate, is used as input by the bottom NOR gate and r1 , the output from the bottom NOR gate, is used as input by
the top NOR gate (although the physical connections are not drawn).
1. how does the latch settle into any state, particularly given the case where S = R = 0 seems to imply there
are two options, and
2. how does the latch remain in whichever state it settles into?
Up to a point, it is reasonable to consider that if the latch settles into one of the two logically consistent states,
there is just no motivation for it to subsequently change into the other; therefore, it retains the same state.
To provide greater detail, however, we rely on Figure 2.33. The idea is it decomposes the SR latch design into
eight individual transistors (labelled t0 through to t7 ) which implement the two NOR gates; this annotation is
important because it allows a clear explanation of their behaviour.
Question #1: how does the latch settle into a state? You can use similar reasoning for all four cases, but
focus on S = 0 and R = 1. Since R = 1, t6 is connected and t4 is disconnected, which means r1 = 0. Now, since
S = 0 and r1 = 0, both t0 and t1 are connected while t2 and t3 are disconnected, which means r0 = 1. Finally, we
can check for consistency: since R = 1 and r0 = 1, both t6 and t7 are connected while t4 and t5 are disconnected,
which means r1 = 0: we knew that anyway. So, in short, the circuit settles into a stable state even though it
might seem the “loop” would prevent it doing so, and is valid in the sense that r0 and r1 (i.e., Q and ¬Q) are
each other's inverse as expected.
Question #2: how does the latch remain in a state? Now imagine we flip to S = R = 0, meaning we would
like to retain the state fixed above, i.e., keep Q = 0 until we want to update it again. Two transistors change as
a result of R changing
• t5 is a P-MOSFET, so is still disconnected since r0 = 1, meaning that t4 being connected does not connect
r1 to Vdd , and
• t7 is an N-MOSFET, so is still connected since r0 = 1, meaning that t6 being disconnected does not
disconnect r1 from Vss .
That is, there is no motivation (or physical stimulus) for the transistors to flip into the other stable state
(i.e., where S = R = 0 and Q = 1) and so the current state is therefore retained.
1. To control when an update happens, we gate S and R by adding an extra input en and two AND gates:
the internal latch inputs become
S′ = S ∧ en
R′ = R ∧ en
When en = 0, S and R are irrelevant: S′ and R′ will always be 0 because, for example, S′ = S ∧ 0 = 0. This
means when en = 0 the latch can never be updated. When en = 1 however, S and R are passed through
into the latch as input because S′ = S ∧ 1 = S.
Put another way, the result shown in Figure 2.31c is now clearly level-triggered because S and R only
matter during a positive level of en. Note that although en can be considered a generic enable signal, we
can use a clock signal to provoke regular, synchronised updates.
2. To avoid the situation where S = R = 1, we simply force R = ¬S by inserting a NOT gate between them
to disallow the case where S = R; Figure 2.31e shows the result, where the single input is now labelled D.
By following the above, the latch inputs become
S′ = D ∧ en
R′ = ¬D ∧ en
This might seem to imply that we cannot put the latch into storage mode any longer. However, remember
that when en = 0 we always have S′ = R′ = 0 irrespective of D, so en basically decides if we retain Q (if
en = 0) or update it with D (if en = 1).
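The gated construction can be sketched in Python (illustrative only; the SR core is resolved by iterating the cross-coupled NOR gates as before):

```python
# A D-type latch built from a NOR-based SR latch by gating:
# S' = D AND en, and R' = (NOT D) AND en.

def nor(x, y):
    return 0 if (x or y) else 1

def d_latch(d, en, q, nq):
    s, r = d & en, (1 - d) & en   # S' = D ∧ en, R' = ¬D ∧ en
    for _ in range(10):
        new_q, new_nq = nor(r, nq), nor(s, q)
        if (new_q, new_nq) == (q, nq):
            break
        q, nq = new_q, new_nq
    return q, nq

assert d_latch(1, 1, 0, 1) == (1, 0)  # enabled: update to D = 1
assert d_latch(0, 1, 1, 0) == (0, 1)  # enabled: update to D = 0
assert d_latch(1, 0, 0, 1) == (0, 1)  # disabled: storage mode, Q retained
assert d_latch(0, 0, 1, 0) == (1, 0)  # disabled: storage mode, Q retained
```

Observe that S' = R' = 1 can never occur, so the meta-stable case is designed out.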
The result now represents the D-type latch component discussed originally. Reiterating: when enabled, the
component updates the state so that Q matches D, but when not enabled, the component is in storage mode and retains Q.
Figure 2.35: A NOR-based D-type flip-flop created using a primary-secondary organisation of latches.
Using a glitch generator One approach is to construct a circuit that intentionally generates a glitch (or
pulse), i.e., an output whose value will be 1 for a short period of time, namely when en transitions from
0 to 1. The glitch then approximates an edge, even though we are still actually using a level; doing so can be
rationalised by noting that as long as the glitch period is short, it will give us finer-grained control than the
original latch.
Example 2.36. Reconsider Figure 2.20, whereby a glitch is generated (in that case unintentionally) for a (short)
period of time when en = 1 and t = 1. We can drive the original D-type latch using such a design: Figure 2.34
illustrates the result, which now approximates a flip-flop due to the approximation of edge-based triggering.
1. while en = 1, i.e., during the first half-cycle, the primary latch is enabled,
2. while en = 0, i.e., during the second half-cycle, the secondary latch is enabled.
In practical terms, this means while en = 1, i.e., during a positive level on en, the primary latch stores the input.
Then, the instant en = 0, i.e., at a negative edge on en, the secondary latch stores the output of the primary
latch: you can think of it as triggering a transfer from the primary to the secondary latch, or as the secondary
latch only being sensitive to the output of the primary latch rather than the input. The fact that the transfer is
[Footnote 5] Historically, the terms master and slave have often been used in place of primary and secondary. Per [3, Section 1.1], however, and despite some debate, the former are typically viewed as inappropriate now. We deliberately use the latter, therefore, noting that doing so may imply a need to translate the former when aligning with other literature.
(a) A level-triggered register based on D-type latches.
(b) An edge-triggered register based on D-type flip-flops.
Figure 2.36: An n-bit register, with n replicated 1-bit components synchronised using the same enable signal.
instantaneous, in the sense it occurs as the result of an (in this case a negative) edge when en flips from 1 to 0,
means we get what we want, i.e., an edge-triggered flip-flop.
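The primary-secondary organisation can be sketched as follows (an illustrative Python simulation, not part of the book's design notation): the primary latch is transparent while en = 1, the secondary latch (whose output is Q) is transparent while en = 0, so Q only changes at a negative edge on en.

```python
# Simulate a primary-secondary pair of D-type latches over discrete time
# steps; p is the primary latch state, q the secondary (i.e., Q).

def primary_secondary(en_trace, d_trace, p=0, q=0):
    out = []
    for en, d in zip(en_trace, d_trace):
        if en == 1:
            p = d       # primary latch follows D
        else:
            q = p       # secondary latch follows the primary
        out.append(q)
    return out

en = [1, 1, 0, 0, 1, 0]
d  = [1, 0, 1, 1, 1, 0]
# Q picks up the value the primary held when en last dropped from 1 to 0;
# the change to D while en = 0 is ignored.
print(primary_secondary(en, d))  # → [0, 0, 0, 0, 0, 1]
```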
Example 2.37. Figure 2.37a represents a solution based on use of flip-flops, which implies a 1-phase clocking
strategy. The top-half of the Figure shows an n-bit ripple-carry adder; the idea is that it computes r′ ← r + 1.
This part is roughly the same as the initial, faulty solution. The bottom-half of the Figure shows an n-bit,
edge-triggered register; the idea is that it stores the current value of r. Beyond this, two features of the design
are vitally important:
1. Notice that the 1-bit sum produced as output by each full-adder is AND’ed with ¬rst. This acts as a
limited reset mechanism, in the sense rst gates the output register input (resp. adder output): if rst = 1
(so ¬rst = 0) then the register input will always be zero, whereas if rst = 0 (so ¬rst = 1) then the register
input will match the adder output. Put another way, if rst = 1 then the value subsequently latched by the
input flip-flop is forced to be zero: this is important, because when powered-on the current value will be
undefined and hence unusable.
2. Notice that each D-type flip-flop in the register is synchronised by clk (which we assume is a clock):
positive edges on clk provoke them to update the stored value r with r′ ← r + 1.
The original loop is broken, because the update is instantaneous not continuous: there is a “gap” between
computing and storing values, in the sense that the adder has an entire clock cycle to compute the result
r + 1 given r is stored in the flip-flops. Provided that the propagation delay associated with the adder
is less than the clock period (i.e., we do not update r faster than r′ is computed) the problem is solved
and r cycles through the required values in discrete steps controlled by the clock.
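The behaviour of this design can be summarised by a Python sketch (illustrative only; one call per positive clock edge), where the reset AND gates and the adder determine the value latched by the register:

```python
# One clock cycle of the counter: the register latches either 0 (if rst = 1)
# or r + 1 (the adder output), working modulo 2^n for an n-bit register.

def step(r, rst, n=4):
    s = (r + 1) % (2 ** n)        # adder output
    return 0 if rst else s        # AND with ¬rst gates the register input

r, trace = 0, []
for cycle in range(6):
    rst = 1 if cycle == 0 else 0  # assert rst on the first cycle only
    r = step(r, rst)
    trace.append(r)
print(trace)  # → [0, 1, 2, 3, 4, 5]
```

After the reset cycle, r advances by one per clock cycle, exactly as required.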
(a) Using a 1-phase clock and flip-flop based register(s).
[Figure 2.37b: the equivalent design using a 2-phase clock (Φ1 and Φ2) and latch-based registers.]
Figure 2.38: Two illustrative waveforms, outlining stages of computation within the associated counter design.
Example 2.38. Figure 2.37b represents a solution based on use of latches, which implies a 2-phase clocking
strategy. A reasonable question to ask is why we cannot just replace the flip-flops with latches? Imagine we did
this: since the latches are level-triggered, they will be updated whenever clk = 1. So on one hand we have broken
the original loop, but on the other hand the loop is still there when clk = 1 because the latches are essentially
transparent.
To resolve this the design uses two sets of latches, one to store the adder input and one to store the adder
output. Only one set is enabled at a time, because we use a 2-phase clock to control them; when Φ2 = 1 the
output latches store the adder output, then when Φ1 = 1 the input latches store whatever the output latches
stored and subsequently provide a new input to the adder. Clearly we need more storage components to
support this approach, but you can think of this as a trade-off wrt. reduced complexity of latches versus
flip-flops. Put another way, the design might be less efficient in terms of area but is much easier to reason
about.
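The alternation of the two phases can be sketched in Python (illustrative only; one loop iteration per full 2-phase clock cycle):

```python
# One full 2-phase cycle: when phi2 = 1 the output latches store the adder
# output, then when phi1 = 1 the input latches store what the output
# latches hold, providing a new input to the adder.

def two_phase_counter(cycles, n=4):
    inp = 0                          # input latches (adder input)
    for _ in range(cycles):
        out = (inp + 1) % (2 ** n)   # phi2 = 1: output latches store r + 1
        inp = out                    # phi1 = 1: input latches store the output
    return inp

assert two_phase_counter(5) == 5
```

Because only one set of latches is enabled at a time, the adder's input is stable while its output is being captured, which is precisely what breaks the loop.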
Figure 2.39 generalises the two counter solutions in the previous Section; you can think of both as general
frameworks, or architectures, that can be filled-in with concrete details to realise the solution to a specific
problem. These can be generalised a little further by noting that each comprises:
1. a data-path, containing components that compute and store values, and
2. a control-path, that tells components in the data-path what to do and when to do it.
For example, within the two counter solutions we clearly have computational (i.e., the adder) and storage
components (i.e., the register), and also mechanisms to control them (i.e., the reset AND gates).
Faced with a combinatorial component X whose critical path is too long, we can either:
1. try to apply various low-level optimisations with the goal of reducing the critical path of X, or
2. apply the higher-level technique of pipelining, restructuring X as investigated by the rest of this Section.
[Figure: two (hypothetical) car production lines, each split into four steps; production line #1 processes one car at a time through steps #1 to #4, whereas production line #2 overlaps the steps so that up to four cars are in progress at once.]
Production line #2 overlaps the production of cars with each other, i.e., producing more than one at a time, in parallel. We can measure the efficiency of the
production lines #1 and #2 using two metrics, the first of which probably seems more natural:
Definition 2.43. The latency is the total time elapsed before a given input is operated on to produce an output; this is
simply the sum of the latencies of each stage.
Definition 2.44. The throughput (or bandwidth) is the rate at which new inputs can be supplied (resp. outputs
collected).
The point is that although the latency associated with one car is unchanged (it takes 4 time units to produce
a car in both production lines), the throughput is not: in production line #2 we produce a new car every time unit
(once the production line is full), whereas we only produce one every 4 time units in #1. In a sense this is an
obvious byproduct of the fact that in production line #1 some of the stages are idle at any given time,
but in #2 they are all active eventually.
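The two metrics can be made concrete with a little arithmetic (a sketch, assuming each of the n stages takes one time unit and the line never stalls):

```python
# Time to produce m outputs on an n-stage line, with and without overlap.

def time_sequential(n, m):
    return n * m              # production line #1: one car at a time

def time_pipelined(n, m):
    return n + (m - 1)        # production line #2: overlap the cars

n, m = 4, 10
print(time_sequential(n, m))  # → 40 time units
print(time_pipelined(n, m))   # → 13 time units
```

The latency per car is n = 4 in both cases, but once the pipelined line is full it delivers one car per time unit, approaching the ideal n-fold improvement in throughput as m grows.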
If we generalise, an n-stage production line will ideally give us an n-fold improvement in throughput.
However, there are some caveats:
• The maximum improvement comes only when we can keep the production line full of work: if the first
stage does not start because there is a lack of demand, the production line as a whole is utilised less
efficiently.
• If we cannot advance the production line for some reason (perhaps one stage is delayed by a lack of
parts), we say it has stalled; this also reduces utilisation.
• The speed at which the production line can be advanced is limited by the slowest stage; to minimise
idle time, balance is needed between the workload of stages. That is, if there is one stage that takes
significantly longer than the rest (e.g., it involves some relatively time consuming task), it will hold up
the rest.
• Usually a production line will not be perfect: moving the result of one stage to the next will take some
time, so there is some (perhaps small) overhead associated with all stages. This overhead typically reduces
efficiency; minimising it means we can get closer to the ideal n-fold improvement.
[Figure 2.42 shows: X alone (300ns); X0 (200ns) followed by X1 (100ns); X0 (100ns) followed by X1 (200ns); and X0, X1, X2 (100ns each).]
Figure 2.42: Four different ways to split a (hypothetical) component X into stages.
The next problem is how we control the pipeline so it does what we want, at the right time, to produce the
right outputs from the right inputs. Consider Figure 2.43a, which outlines some generic pipeline stages (whose
behaviour is irrelevant). There are two key problems:
1. Fundamentally, the stages cannot operate on different inputs if there is nowhere to store those inputs: if
we supply a new input to X0 at each step, where does the first input go once the first step is finished? It
should be processed by X1 , but instead it will vanish when replaced with the input for the second step.
2. Imagine in the j-th clock cycle the i-th stage Xi computes a partial result ti required by the (i + 1)-th stage.
If the stages are connected by a wire, as soon as the i-th stage changes ti this potentially disrupts what the
(i + 1)-th stage is doing.
So instead, we connect the stages by pipeline registers, say Ri . This means the (i + 1)-th stage can have
a separate, stable input which only changes when the register latches a new value, i.e., when the pipeline
advances. However, each pipeline register takes time to operate, and so adds to the total latency.
Figure 2.43c outlines the new structure, which resolves both problems above. The structure is controlled
by adv, shown here as a single global signal that advances all stages at the same time by having the output of
each Xi stored in Ri and hence used subsequently as input to Xi+1 . Figure 2.44 gives a high-level overview of
progression through the pipeline, controlled by positive edges on adv.
The implication of this structure is that we need to take more care wrt. how we split X into stages.
Specifically, more pipeline registers means larger overall latency; as a result, we cannot simply split X into as
many stages as we need to have them balanced. Rather, we must make a trade-off between increased latency (as
the result of some pipeline registers) and increased throughput (as the result of the pipelined design overall).
A synchronous pipeline is a term used to describe a pipeline structure where all stages are globally synchro-
nised, controlled using a single global signal adv which you can think of as a clock; to reinforce this fact, the
period between advances is often termed a pipeline cycle.
In an asynchronous pipeline the aim is to remove the need for global control over when the pipeline
advances, and hence remove the need for a global clock. Roughly speaking, control is devolved into the
pipeline stages themselves: for one stage to advance, it must engage in a simple handshake with the preceding
and subsequent stages to agree when to advance. More formally each Xi controls advi , the local signal that
determines when it advances, by communicating with Xi−1 and Xi+1 .
This is advantageous in that stages can operate as fast or slow as their workload, rather than a global clock,
dictates: the asynchronous pipeline can advance whenever the result is ready rather than being pessimistic
and forcing advancement at the rate of the slowest stage. However, although the global clock is removed, one
potential disadvantage of this approach is the overhead of providing the handshake mechanism that has to exist
between stages; clearly this can become quite complex depending on the pipeline structure.
Great. But what use is this? The point is, we can relate this abstract example to a concrete component which
acts as motivation for why such an improvement is worthwhile.
Example 2.40. Consider a component that performs the logical left-shift of some 8-bit vector x by a distance of
y ∈ {0, 1, . . . , 7} bits. There are a variety of approaches to designing a circuit with the required behaviour, but
one of the simplest is a combinatorial, logarithmic shifter. We will look at the design in detail in Chapter 3, but
the idea is illustrated by Figure 2.46a. In short, the result is computed using three steps: each step produces
an intermediate result by either shifting an intermediate input by some fixed distance (the i-th stage shifts by
2i bits), or simply passing it through unaltered. For example, if we select y = 6(10) = 110(2) then
1. since y0 = 0, the 0-th stage passes the input x through unaltered to form the intermediate result x′ , then
2. since y1 = 1, the 1-st stage shifts the intermediate input x′ by a distance of 21 = 2 bits to form the
intermediate result x′′ , then
3. since y2 = 1, the 2-nd stage shifts the intermediate input x′′ by a distance of 22 = 4 bits to form the result r
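The three steps above can be expressed as a short Python sketch (illustrative only; the circuit itself is combinatorial, but the stage-by-stage logic is identical):

```python
# An 8-bit logical left-shifter built from three stages: stage i either
# shifts its input left by 2^i bits or passes it through, selected by bit
# y_i of the shift distance y.

def log_shift_left(x, y, width=8):
    for i in range(3):                          # stages 0, 1, 2: shift by 1, 2, 4
        if (y >> i) & 1:
            x = (x << (1 << i)) & ((1 << width) - 1)
    return x

# y = 6 = 110(2): stage 0 passes x through, stage 1 shifts by 2, and
# stage 2 shifts by 4, giving a total shift distance of 6 bits.
assert log_shift_left(0b00000011, 6) == 0b11000000

# Exhaustively check against a direct 8-bit shift.
assert all(log_shift_left(x, y) == (x << y) & 0xFF
           for x in range(256) for y in range(8))
```

Note that any distance y ∈ {0, 1, ..., 7} is realised as a composition of the three fixed-distance shifts, which is why only three stages are needed.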
[Figure 2.43: generic i-th and (i + 1)-th pipeline stages passing partial results ti , ti+1 , and ti+2 .]
(b) Option #2: with pipeline registers and a global control signal.
Figure 2.43: A problematic pipeline, and a solution involving the use of pipeline registers and a control signal to indicate
when each stage should advance.
processor we want to use, our program might be compiled in different ways and produce different executable
forms.
In a rough sense, the same process applies to circuits: once we have a description of behaviour, we need to
actually realise the corresponding components (i.e., logic gates or transistors) so that we can use them. There
are various ways to achieve this, which depend on the underlying technology used: using semi-conductors to
construct transistors is not the only option. Although the topic is somewhat beyond the scope of this book, it
is useful to understand some approaches and technologies involved: at very least, it acts to connect theoretical
concepts with their practical realisation.
Figure 2.44: An illustrative waveform, outlining the stages of computation as a pipeline is driven by a clock.
[Figure 2.45 shows: X (300ns) followed by a register R (20ns); and X0, X1, X2 (100ns each), each followed by a register R1, R2, R3 (20ns each).]
Figure 2.45: An unpipelined, abstract combinatorial circuit and a 3-stage pipelined alternative.
For semi-conductors the analogous process is photolithography, and involves very similar steps which are
illustrated by Figure 2.48. We again start (Figure 2.48a) with a substrate, which is usually a wafer of silicon;
this is often circular by virtue of machining it from a synthetic ingot, or boule, of very pure silicon. After being
cut into shape, the wafer is polished to produce a surface suitable for the next stage. We can now coat it with
a layer of base material we wish to work with (Figure 2.48b), for example a doped silicon or metal. Then we
coat the whole thing with a photosensitive chemical, usually called a photo-resist (Figure 2.48c). Two types
exist, a positive one which hardens when hidden from light and a negative one which hardens when exposed
to light. By projecting a mask of the circuit onto the result (Figure 2.48d), one can harden the photo-resist so
that only the required areas are covered with a hardened covering. After baking the result to fix the hardened
photo-resist, and etching to remove the surplus base material, one is left with a layer of the base material only
where dictated by the mask (Figure 2.48e to Figure 2.48g).
The process iterates to produce many layers of potentially different materials, i.e., the result is 3D not 2D. We
might need layers of N-type and P-type semi-conductor and a metal layer to produce transistors, for example.
The feature size (e.g., 90nm CMOS) relates to the resolution of this process; for example, accuracy of the
photolithographic process dictates the width of wires or density of transistors. Regularity of such features
is a major advantage: we can manufacture many similar components in a layer using one photolithographic
process. For example, if we aim to manufacture many transistors they will all be composed of the same layers
albeit in different locations on the substrate.
2.5.1.2 Packaging
Before we can use the “raw” output from the photolithography, a process of packaging is typically applied.
At very least, the first step is to cut out individual components from the resulting wafer: remember that we
can produce many identical components using the same process, so this step gives us a single component we
can use. Before we do so however, each component is typically mounted on a plastic base and connected
to externally accessible pins (or pads) with bonding wires. This makes the inputs to and outputs from the
component (which may be physically tiny and delicate) easier to access. A protective, often plastic, package
Figure 2.46: An unpipelined, 8-bit Multiply-Accumulate (MAC) circuit and a 3-stage pipelined alternative.
Figure 2.47: An unpipelined, 8-bit logarithmic shift circuit and a 3-stage pipelined alternative.
Figure 2.49: Bonding wires connected to a high quality gold pad (public domain image, source: http://en.wikipedia.
org/wiki/Image:Wirebond-ballbond.jpg).
Figure 2.50: A heatsink ready to be attached, via the z-clip, to a circuit in order to dissipate heat (public domain image,
source: http://en.wikipedia.org/wiki/File:Pin_fin_heat_sink_with_a_z-clip.png).
is also applied to prevent physical damage; large or power-hungry components might also mandate use of a
heat sink (and fan) to dissipate heat.
The final result is a self-contained component, which we can describe as a microchip (or simply a chip) and
start to integrate with other components to construct a larger system.
The complexity for minimum component costs has increased at a rate of roughly a factor of two per year.
Certainly over the short term this rate can be expected to continue, if not to increase. Over the longer term,
the rate of increase is a bit more uncertain, although there is no reason to believe it will not remain nearly
constant for at least 10 years. That means by 1975, the number of components per integrated circuit for
minimum cost will be 65,000.
– Moore
and later updated: in short “the number of transistors that can be fabricated in unit area doubles roughly every
two years”. In a sense, this has become a form of a self-fulfilling prophecy in that the “law” is now an accepted
truth: industry is forced to deliver improvements, and is in part driven by the law rather than the other way
around!
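The doubling rule is easy to make concrete (a sketch using an assumed two-year doubling period; `moore` is an illustrative name, not a standard function):

```python
# Project a transistor count forward under Moore's Law: the count doubles
# roughly every two years, i.e., grows by a factor of 2^(years / 2).

def moore(n0, years, period=2):
    return n0 * 2 ** (years / period)

print(moore(1000, 10))  # → 32000.0: a 32-fold increase over a decade
```

In other words, a fixed chip area that holds 1,000 transistors today would be expected to hold roughly 32,000 a decade later.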
Figure 2.51 demonstrates the manifestation of Moore’s Law on the development of Intel processors. The
implications for design of such processors, and circuits more generally, can be viewed in (at least) two ways:
1. If one can fit more transistors in unit area, the transistors are getting smaller and hence working faster
due to their physical characteristics. As a result one can take a fixed design and, over time, it will get
faster or use less power as a result of Moore’s Law.
[Chart: transistor counts (log scale, 10^3 to 10^9) for the Intel 4004, 8008, 8080, 8086/8088, 80286, 80386, 80486, Pentium, Pentium 4, Pentium M, and Pentium D, plotted against the years 1970 to 2005.]
Figure 2.51: A timeline of Intel processor innovation demonstrating Moore's Law (data from http://www.intel.com/technology/mooreslaw/).
2. If one can fit more transistors in unit area, then one can design and implement more complex structures
in the same fixed area. As a result, over time, one can use the extra transistors to improve the design yet
keep it roughly the same size.
There is no “free lunch” however; Moore notes that as feature size decreases (i.e., transistors get smaller)
two problems become more and more important. First, power consumption and heat dissipation become an
issue: it is harder to distribute power to the more densely packed transistors and keep them within operational
temperature limits. Second, process variation, which may imply defects and reduce yield, starts to increase
meaning a higher chance that a manufactured chip malfunctions.
A Programmable Logic Array (PLA) is a general-purpose fabric that can be configured to implement specific
SoP or PoS expressions as combinatorial circuits. The fabric itself accepts n inputs, say xi for 0 ≤ i < n, and
produces m outputs, say r j for 0 ≤ j < m via logic gates arranged in two planes. Using an AND-OR type PLA
as an example, the first plane computes a set of minterms using AND gates; those minterms are fed as input
to a second plane of OR gates whose output is the required SoP expression. An OR-AND type PLA simply
reverses the ordering of the planes, thus allowing implementation of PoS expressions.
This does not hint at a PLA being particularly remarkable: why is it any different to the combinatorial circuits
we have seen already? The crucial difference is how we end up with the required circuit. The starting point is
a generic, clean fabric as shown in Figure 2.52a. At this point you can think of all of the gates being connected
to all corresponding gate inputs via existing connection points at wire junctions (filled circles), and fuses at
the gate inputs (filled boxes). This is transformed into a specific circuit using a process roughly analogous to
programming: we selectively blow fuses, guided by a configuration that is derived from the circuit design.
Normally a fuse acts as a conductive material, somewhat like a wire; when the fuse is blown using some
directed energy, however, it becomes a resistive material. Therefore, to form the required connections we
(a) A “clean” PLA fabric, with fuses (filled boxes) acting as potential connections between the AND and OR planes.
(b) The PLA fabric with blown fuses (empty boxes) to implement a half-adder.
simply blow all the fuses6 where no connection is required. Figure 2.52b shows an example, where fuses have
been blown (now shown as unfilled boxes) to form various connections (shown as thick lines). As a result, this
PLA computes
r0 = (x0 ∧ ¬x1 ) ∨ (¬x0 ∧ x1 ) = x0 ⊕ x1
and
r1 = x0 ∧ x1 ,
i.e., it is a half-adder.
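To make the configured behaviour concrete, the following sketch (in Python, not part of the original text; the function name is invented for illustration) evaluates the SoP expressions computed by the blown-fuse PLA of Figure 2.52b, with the AND and OR planes modelled separately:

```python
# A sketch evaluating the expressions the configured PLA computes:
# r0 = (x0 AND NOT x1) OR (NOT x0 AND x1), and r1 = x0 AND x1.
def pla_half_adder(x0, x1):
    # AND plane: minterms formed by the surviving fuse connections
    m0 = x0 & ~x1 & 1   # x0 AND (NOT x1)
    m1 = ~x0 & 1 & x1   # (NOT x0) AND x1
    m2 = x0 & x1        # x0 AND x1
    # OR plane: each output ORs its selected minterms
    r0 = m0 | m1        # equals x0 XOR x1, the sum bit
    r1 = m2             # the carry bit
    return r0, r1

# check against the half-adder truth table
for x0 in (0, 1):
    for x1 in (0, 1):
        s, c = pla_half_adder(x0, x1)
        assert s == (x0 ^ x1) and c == (x0 & x1)
```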
We say that a PLA fabric is one-time programmable. Put simply, once a fuse (or antifuse) is blown, it
cannot be unblown. Since a PLA can only be configured once, it is not unreasonable to think of a PLA as being like a
ROM (in the sense that once programmed, the content is fixed), but one that has the advantage of being able to optimise
for don’t care states. However, the fixed structure means that, versus conventional combinatorial logic, it has
the disadvantage of being less (easily) able to capitalise on optimisations such as sharing logic for common
sub-expressions.
a conductor). Using antifuses at each junction means the configuration process blows each antifuse at a junction where a connection is
required.
(a) The mesh of configurable logic (large boxes) and communication resources (small boxes).
(b) An example Virtex-5 slice, including two LUTs, two D-type flip-flops and a suite of arithmetic cells.
• a Digital Clock Manager (DCM) block, which allows a fixed input clock to be manipulated in a way
that suits the device configuration,
• a Block RAM (BRAM) block, instances of which act like memory devices, and are often realised using
SRAM or similar,
Other possibilities include common arithmetic building blocks, multipliers for instance, which would be
relatively costly to construct using the CLB resources yet are often required.
The added complexity of supporting such flexibility typically means FPGAs have a lower maximum clock
frequency, and will consume more power than a comparable implementation directly in silicon. As such,
they are often used as a prototyping device for designs which will eventually be fabricated using a more
high-performance technology. Other applications include those where debugging and updating hardware
is important, meaning an FPGA-based solution is as flexible as software while also improving performance.
Consider space exploration for example: it turns out to be exceptionally useful to be able to remotely fix bugs
in hardware rather than write off a multi-million pound satellite which is orbiting Mars (and hence out of the
reach of any local repair men).
References
[1] D. Harris and S. Harris. Digital Design and Computer Architecture: From Gates to Processors. Morgan Kauf-
mann, 2007.
[2] M. Karnaugh. “The map method for synthesis of combinational logic circuits”. In: Transactions of American
Institute of Electrical Engineers 72.9 (1953), pp. 593–599 (see p. 81).
[3] M. Knodel and N. ten Oever. Terminology, Power and Oppressive Language. Internet Engineering Task Force
(IETF) Internet Draft. 2018. url: https://tools.ietf.org/id/draft-knodel-terminology-00.html
(see p. 114).
[4] E.J. McCluskey. “Minimization of Boolean functions”. In: Bell System Technical Journal 35.5 (1956), pp. 1417–
1444 (see p. 87).
[5] G.E. Moore. “Cramming more components onto integrated circuits”. In: Electronics Magazine 38.8 (1965),
pp. 114–117 (see p. 128).
[6] C. Petzold. Code: Hidden Language of Computer Hardware and Software. Microsoft Press, 2000.
[7] W.V. Quine. “The problem of simplifying truth functions”. In: The American Mathematical Monthly 59.8
(1952), pp. 521–531 (see p. 87).
[8] R.J. Smith and R.C. Dorf. “Chapter 12: Transistors and Integrated Circuits”. In: Circuits, Devices and
Systems. 5th ed. Wiley, 1992 (see p. 71).
[9] A.S. Tanenbaum and T. Austin. Structured Computer Organisation. 6th ed. Prentice Hall, 2012.
[10] E.W. Veitch. “A Chart Method for Simplifying Truth Functions”. In: ACM National Meeting. 1952, pp. 127–
133 (see p. 87).
[11] N.H.E. Weste and K. Eshraghian. Principles of CMOS VLSI Design: A Systems Perspective. 2nd ed. Addison
Wesley, 1993 (see p. 105).
CHAPTER
3
BASICS OF COMPUTER ARITHMETIC
– Babbage
In Chapter 1, we saw how numbers could be represented using bit-sequences. More specifically, we demonstrated
various techniques to represent both unsigned and signed integers using n-bit sequences. In Chapter 2, we then investigated
how logic gates capable of computing Boolean operations (such as NOT, AND, and OR) and higher-level building block
components could be designed and manufactured.
One way to view this content is as a set of generic techniques. We have the ability to design and implement components
that compute any Boolean function, for example, and reason about their behaviour in terms of Physics. A natural next
step is to be more specific: what function would be useful? Among many possible options, the field of computer
arithmetic provides some good choices. In short, arithmetic is something most people would class as computation;
something as simple as a desktop calculator could still be classed as a basic computer. As such, the goal of this Chapter
is to combine the previous material, producing a range of high-level building blocks that perform computation involving
integers: although this is useful and interesting in itself, it is important to keep in mind that it also represents a starting
point for study of more general computation.
3.1 Introduction
In general terms, an Arithmetic and Logic Unit (ALU) is a component (or collection thereof) tasked with
computation. The concept stems from the design of EDVAC by John von Neumann [9]: he foresaw that
a general-purpose computer would need to perform basic Mathematical operations on numbers, so it is
“reasonable that [the computer] should contain specialised organs for these operations”. In short, the modern
ALU is an example of such an organ: as part of a micro-processor, an ALU supports execution of instructions
by computing results associated with arithmetic expressions such as x + y in a given C program.
One can view a concrete ALU at two levels, namely 1) at a low(er) level, in terms of how the constituent
components themselves are designed, or 2) at a high(er) level, in terms of how said components are organised.
In this Chapter we focus primarily on the former, which implies a focus on computer arithmetic. The challenge
is roughly as follows: given one or more n-bit sequences that represent numbers, say x̂ and ŷ, how can we
design a component, i.e., a Boolean function f we can then implement as a circuit, whose output represents an
arithmetic operation? For example, if we want to compute
r̂ = f (x̂, ŷ) ↦ x + y,
i.e., an r̂ that represents the sum of x̂ and ŷ, how can we design a suitable function
f : B^n × B^n → B^n
that realises the operation correctly while also satisfying additional design metrics once implemented as a
circuit?
Figure 3.1: Two high-level ALU architectures: each combines a number of sub-components, but does so using a different
strategy.
Often you will have already encountered long-hand, “school-book” techniques for arithmetic operations
such as addition and multiplication. These allow you to perform the associated computation manually, which
can be leveraged to address the challenge of designing such an f . That is, we can use an approach whereby
we a) recap your intuition about what the arithmetic operation means and how it works at a fundamental level, b)
formalise this as an algorithm, then, finally, c) design a circuit to implement the algorithm (often by starting
with a set of 1-bit building blocks, then later extending them to cope with n-bit inputs and outputs). Although
effective, the challenge of applying this approach is magnified by what is typically a large design space of
options and trade-offs. For example, we might implement f using combinatorial components alone, or widen
this remit by considering sequential components to support state and so on: with any approach involving a
trade-off, the sanity of opting for one option over another requires careful analysis of the context.
After first surveying higher-level, architectural options for an abstract ALU, this Chapter deals more
concretely with a set of low-level components: each Section basically applies the approach above to a different
arithmetic operation. From here on, keep in mind that the scope is constrained by several simplifications:
1. The large design space of options for any given operation dictates we take a somewhat selective approach.
A major example of this is our focus on integer arithmetic only: arithmetic with fixed- and floating-
point numbers is an important topic, but we ignore it entirely and instead refer to [10, Part V] for a
comprehensive treatment.
3. Having fixed the representation of integers, writing x̂ is somewhat redundant: we relax this notation and
simply write x as a result. However, we introduce extra notation to clarify whether a given operation
is signed or unsigned: for an operation ⊙, we use ⊙s and ⊙u to denote signed and unsigned versions
respectively. With no annotation of this type, you can assume the signedness of the operator is irrelevant.
complex, it can also be advantageous to have separate ALUs for different classes of input; use of a dedicated
(so separate from the ALU) Floating-Point Unit (FPU) for floating-point computation is a common example.
These possibilities aside, at a high-level an ALU is simply a collection of sub-components; we provide one
or more inputs (wlog. say x and y), and control it using op to select the operation required. Of course, some
operations will produce a different sized output than others: an (n × n)-bit multiplication produces a 2n-bit
output, but any comparison will only ever produce a 1-bit output for example. One can therefore view the ALU
as conceptually producing a single output r, but in reality it might have multiple outputs that are used as and
when appropriate. To be concrete, imagine we want an ALU which performs say m = 11 different operations
⊙ ∈ {+, −, ·, ∧, ∨, ⊕, ⊽, ≪, ≫, =, <}
meaning it can perform addition, subtraction, multiplication, a range of bit-wise Boolean operations (AND,
OR, XOR and NOR), left- and right-shift, and two comparisons (equality and less than): it computes r = x ⊙ y
for an ⊙ selected by op. Figure 3.1 shows two strategies for realising the ALU, each using sub-components (the
i-th of which is denoted Ci ) of a different form and in a different way:
1. Figure 3.1a illustrates an architecture where each sub-component fully implements a different operation.
For example, C0 and C1 might compute all n bits of x + y and x − y respectively; the ALU output is selected,
from the m sub-component outputs, using op to control a suitable multiplexer.
Although, as shown, each sub-component is always active, in reality it might be advantageous to power-
down a sub-component which is not being used. This could, for example, reduce power consumption or
heat dissipation.
2. Figure 3.1b illustrates an architecture where each sub-component implements all operations, but does so
wrt. a single bit only. For example, C0 and C1 might compute the 0-th and 1-st bits of x + y and x − y
respectively (depending on op).
Tanenbaum and Austin [11, Chapter 3, Figures 3-18/3-19] focus on the second strategy, discussing a 1-bit ALU
slice before dealing with their combination. Such 1-bit ALUs are often available as standard building blocks,
so this focus makes a lot of sense on one hand. On the other hand, an arguable disadvantage is that such a
focus complicates the overarching study of computer arithmetic. Put another way, focusing at a low-level on
1-bit ALU slices arguably makes it hard(er) to see how some higher-level arithmetic works. As a result, we
focus instead on the first strategy in what follows: we consider designs for each i-th sub-component, realising
each operation (a Ci for addition, for example) in isolation.
Essentially this means we ignore high-level organisation and optimisation of the ALU from here on, but of
course both strategies have merits. For example, as we will see in the following, overlap exists between different
arithmetic circuit designs: intuitively, the computation of addition and subtraction is similar for example. The
second strategy is advantageous therefore, since said overlap can more easily be capitalised upon to reduce
overall gate count. However, arithmetic circuits that require multiple steps to compute an output (using an
FSM for example) are hard(er) to realise using the second strategy than the first. As a result, a mix of both
strategies as and when appropriate is often a reasonable compromise.
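As a rough behavioural illustration of the first strategy (names are invented here, and only a subset of the m = 11 operations is modelled), each sub-component computes its full n-bit result and op acts as the multiplexer control:

```python
# A sketch of the first ALU strategy: each "sub-component" computes all n
# bits of one operation; op selects among their outputs like a multiplexer.
N = 8
MASK = (1 << N) - 1          # model n-bit wrap-around

def alu(op, x, y):
    results = {
        '+': (x + y) & MASK,
        '-': (x - y) & MASK,
        '&': x & y,
        '|': x | y,
        '^': x ^ y,
        '=': int(x == y),    # comparisons only ever produce a 1-bit output
        '<': int(x < y),
    }
    return results[op]       # the multiplexer, controlled by op
```

Note that, as in the text, every sub-component is “active” (every dictionary entry is computed) even though only one result is selected.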
3.3.1 Addition
Example 3.1. Consider the following unsigned, base-10 addition of x = 107(10) to y = 14(10) :
x = 107(10) ↦ 1 0 7
y = 14(10) ↦ 0 1 4 +
c = 0 0 1 0
r = 121(10) ↦ 1 2 1
Although not an arithmetic operation per se, the issue of type conversion is an important related concept
none the less. Where such a conversion is performed explicitly (e.g., by the programmer) it is formally termed
a cast, and where performed implicitly (or automatically, e.g., by the compiler) it is termed a coercion; either
conversion, depending on the types involved, may or may not retain the same value due to the range of
representable values involved.
As an example, imagine you write a C program that includes a cast of an n-bit integer x into an n′ -bit
integer r. Four cases can occur:
1. If x and r are unsigned and n ≤ n′ , r is formed by padding x with n′ − n bits equal to 0, at the most-
significant end.
2. If x and r are signed and n ≤ n′ , r is formed by padding x with n′ − n bits equal to the sign bit (i.e., the
MSB or (n − 1)-th bit of x) at the most-significant end.
3. If x and r are unsigned and n > n′ , r is formed by truncating x, i.e., removing (and discarding) n − n′ bits
from the most-significant end.
4. If x and r are signed and n > n′ , r is formed by truncating x, i.e., removing (and discarding) n − n′ bits
from the most-significant end.
The second case above is often termed sign extension, and is required (vs. the first case) because simply
padding x with 0 may turn it from a negative to positive value. For example, imagine n = 16 (i.e., the short
type) and n′ = 32 (i.e., the int type): if x = −1(10) , the two options yield
x = 1111111111111111(2) = −1(10)
ext^32_0 (x) = 00000000000000001111111111111111(2) = 65535(10)
ext^32_± (x) = 11111111111111111111111111111111(2) = −1(10)
where the latter retains the value of x, whereas the former does not.
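A behavioural sketch of the two extension options (the function names ext0 and ext_pm are invented here to mirror the notation above, and unbounded Python integers stand in for fixed-width values):

```python
# A sketch of zero-padding vs. sign extension for the n = 16, n' = 32 example.
def ext0(x, n, n2):
    # case 1: pad with 0 bits at the most-significant end
    return x & ((1 << n) - 1)

def ext_pm(x, n, n2):
    # case 2: pad with copies of the sign bit (the MSB of the n-bit value)
    x &= (1 << n) - 1
    sign = (x >> (n - 1)) & 1
    if sign:
        x |= ((1 << n2) - 1) ^ ((1 << n) - 1)  # replicate the sign bit upward
    return x

x = 0xFFFF                    # -1 as a 16-bit two's-complement value
print(ext0(x, 16, 32))        # 65535: the value is not preserved
print(ext_pm(x, 16, 32))      # 4294967295, i.e., -1 as a 32-bit value
```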
If we write them naturally, it is clear that |107(10) | = 3 and |14(10) | = 2. However, the resulting mismatch will
become inconvenient: in this example and from here on, we force x and y to have the same length by padding
them with more-significant zero digits. Although this may look odd, keep in mind this padding can be ignored
without altering the associated value (i.e., we are confident 14(10) = 014(10) , however odd the latter looks when
written down).
Most people will have at least seen something similar to this, but, to ensure the underlying concept is clear, r
is being computed by working from the least-significant, right-most digits (i.e., x0 and y0 ) towards the most-
significant, left-most digits (i.e., xn−1 and yn−1 ) of the operands x and y. In English, in each i-th step (or column,
as read from right to left) we sum the i-th digits xi and yi and a carry-in ci (produced by the previous, (i − 1)-th
step); since this sum is potentially larger than a single base-b digit is allowed to be, we produce the i-th digit of
the result ri and a carry-out ci+1 (for use by the next, (i + 1)-th step). We call c a (or the) carry chain, and say
carries propagate from one step to the next.
This description can be written more formally: Algorithm 1 offers one way to do so. Notice the loop in
lines #2 to #5 iterates through values of i from 0 to n − 1, with the body in lines #3 and #4 computing ri and ci+1
respectively. You can read the latter as “if the sum of xi , yi and ci fits in a single base-b digit there is no
carry into the next step, otherwise there is a carry”. Notice that the algorithm sets c0 = ci to allow a carry into
the overall operation (in the example we assumed ci = 0), and co = cn allowing a carry-out; the sum of two
n-digit integers is an (n + 1)-digit result, but the algorithm produces an n-digit result r and separate 1-digit
carry-out co (which you could, of course, think of as two parts of a single, larger result).
A reasonable question is why a larger carry (i.e., a ci+1 > 1) is not possible. To answer this, we should first
note that although line #4 is written as a conditional statement, it could be rewritten st.
ri ← (xi + yi + ci ) mod b
ci+1 ← (xi + yi + ci ) div b
where mod and div are integer modulo and division: this makes more sense in a way, because the latter
assignment can be read as “the number of whole multiples of b carried into the next, (i + 1)-th column”. By
Figure 3.2: An n-bit, ripple-carry adder described using a circuit diagram.
Figure 3.3: An n-bit, ripple-carry subtractor described using a circuit diagram.
Figure 3.4: An n-bit, ripple-carry adder/subtractor described using a circuit diagram.
Figure 3.5: An n-bit, carry look-ahead adder described using a circuit diagram.
Figure 3.6: An illustration depicting the structure of carry look-ahead logic, which is formed by an upper- and lower-tree
of OR and AND gates respectively (with leaf nodes representing gi and pi terms for example).
Input: Two unsigned, n-digit, base-b integers x and y, and a 1-digit carry-in ci ∈ {0, 1}
Output: An unsigned, n-digit, base-b integer r = x + y, and a 1-digit carry-out co ∈ {0, 1}
1 r ← 0, c0 ← ci
2 for i = 0 upto n − 1 step +1 do
3 ri ← (xi + yi + ci ) mod b
4 if (xi + yi + ci ) < b then ci+1 ← 0 else ci+1 ← 1
5 end
6 co ← cn
7 return r, co
Algorithm 1: An algorithm for addition of base-b integers.
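Algorithm 1 can be transliterated almost directly; the sketch below (which assumes digits are stored least-significant first, and is illustrative rather than canonical) follows the same line structure:

```python
# A transliteration of Algorithm 1 for base-b addition; x and y are lists
# of digits, least-significant first.
def digit_add(x, y, b, ci=0):
    n = len(x)
    r, c = [0] * n, ci               # line #1: r <- 0, c_0 <- ci
    for i in range(n):               # line #2: for i = 0 upto n - 1
        t = x[i] + y[i] + c
        r[i] = t % b                 # line #3: the i-th result digit
        c = 0 if t < b else 1        # line #4: carry into the next column
    return r, c                      # lines #6/#7: n digits plus carry-out

# 107 + 14 = 121 in base 10
r, co = digit_add([7, 0, 1], [4, 1, 0], 10)
print(r, co)                         # [1, 2, 1] 0
```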
Input: Two unsigned, n-digit, base-b integers x and y, and a 1-digit borrow-in bi ∈ {0, 1}
Output: An unsigned, n-digit, base-b integer r = x − y, and a 1-digit borrow-out bo ∈ {0, 1}
1 r ← 0, c0 ← bi
2 for i = 0 upto n − 1 step +1 do
3 ri ← (xi − yi − ci ) mod b
4 if (xi − yi − ci ) ≥ 0 then ci+1 ← 0 else ci+1 ← 1
5 end
6 bo ← cn
7 return r, bo
Algorithm 2: An algorithm for subtraction of base-b integers.
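Algorithm 2 admits the same treatment; as before, this is an illustrative sketch with digits stored least-significant first:

```python
# A transliteration of Algorithm 2 for base-b subtraction with a borrow chain.
def digit_sub(x, y, b, bi=0):
    n = len(x)
    r, c = [0] * n, bi
    for i in range(n):
        t = x[i] - y[i] - c
        r[i] = t % b                 # Python's % already yields a base-b digit
        c = 0 if t >= 0 else 1       # borrow into the next column
    return r, c

# 107 - 14 = 93 in base 10
r, bo = digit_sub([7, 0, 1], [4, 1, 0], 10)
print(r, bo)                         # [3, 9, 0] 0
```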
considering bounds on (i.e., the maximum values of) each of the inputs, we can show ci ≤ 1 for 0 ≤ i ≤ n. That
is, since xi ≤ b − 1 and yi ≤ b − 1, if ci ≤ 1 then
xi + yi + ci ≤ (b − 1) + (b − 1) + 1 = 2 · b − 1 < 2 · b,
and hence ci+1 = (xi + yi + ci ) div b ≤ 1;
so we know (inductively) that the carry out of the i-th step into the next, (i + 1)-th step is at most 1 (and so
either 0 or 1); this is true no matter what value of b is selected.
Example 3.2. Consider the following trace of Algorithm 1, for x = 107(10) and y = 14(10) :
i xi yi ci r xi + yi + ci ci+1 ri r′
⟨0, 0, 0⟩ ⟨0, 0, 0⟩
0 7 4 0 ⟨0, 0, 0⟩ 11 1 1 ⟨1, 0, 0⟩
1 0 1 1 ⟨1, 0, 0⟩ 2 0 2 ⟨1, 2, 0⟩
2 1 0 0 ⟨1, 2, 0⟩ 1 0 1 ⟨1, 2, 1⟩
0 ⟨1, 2, 1⟩
Throughout this Chapter, a similar style is used to describe step-by-step behaviour of an algorithm for specific
inputs (particularly those which include one or more loops). Read from left-to-right, there is typically a section
of loop counters, such as i and j, a section of variables as they are at the start of each iteration, a section of
variables computed during an iteration, and a section of variables as they are at the end of each iteration. If
variable t in the left-hand section is updated during an iteration, we write it as t′ (read as “the new value of t”)
in the right-hand section.
An important feature in the presentation of Algorithm 1 is use of a general b: when invoking it, we can select
any concrete value of b we want. When discussing representation of integers, b = 2 was a natural selection
because it aligned with concepts in Boolean algebra; the same is true here, within a discussion of computation
involving such integers.
Example 3.3. Consider the following unsigned, base-2 addition of x = 107(10) = 01101011(2) to y = 14(10) =
00001110(2)
x = 107(10) ↦ 0 1 1 0 1 0 1 1
y = 14(10) ↦ 0 0 0 0 1 1 1 0 +
c = 0 0 0 0 1 1 1 0 0
r = 121(10) ↦ 0 1 1 1 1 0 0 1
Notice that if we select b = 2, the body of the loop and therefore each replicated step in the unrolled alternative
computes the 1-bit addition
ri = x i ⊕ y i ⊕ ci
ci+1 = (xi ∧ yi ) ∨ (xi ∧ ci ) ∨ (yi ∧ ci )
Put another way, it matches the full-adder cell we produced a design for in Chapter 2. Substituting one for the
other, we simply have n full-adder instances connected via respective carry-in and carry-out: each i-th instance
computes the sum of xi and yi and a carry-in ci , and produces the sum ri and a carry-out ci+1 . The design,
which is termed a ripple-carry adder since the carry “ripples” or propagates through the carry chain, is shown
in Figure 3.2.
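The ripple-carry structure can be sketched behaviourally as follows, with each loop iteration modelling one full-adder instance (the function names are invented for illustration):

```python
# A bit-level sketch of the ripple-carry adder: n chained full-adders, each
# computing r_i = x_i XOR y_i XOR c_i and the majority-vote carry expression.
def full_adder(x, y, c):
    r = x ^ y ^ c
    co = (x & y) | (x & c) | (y & c)
    return r, co

def ripple_carry_add(x, y, n, ci=0):
    r, c = 0, ci
    for i in range(n):               # the carry "ripples" through all n cells
        xi, yi = (x >> i) & 1, (y >> i) & 1
        ri, c = full_adder(xi, yi, c)
        r |= ri << i
    return r, c                      # the n-bit sum and the carry-out

print(ripple_carry_add(107, 14, 8))  # (121, 0)
```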
The algorithm and associated design satisfy the required functionality: they can compute the sum of n-bit
addends x and y. As such, one might question whether exploring other designs is necessary. Any metric
applied to the design may provide some motivation, but the concept of critical path is particularly important
here. Recall from Chapter 2 that the critical path of a circuit is defined as the longest sequential sequence of
gates; here, the critical path runs through the entire circuit from the 0-th to the (n − 1)-th full-adder instance.
Put another way, the carry chain represents an order on the computation of digits in r: ri cannot be computed
until ci is known, so the i-th step cannot be computed until every j-th step for j < i is computed, due to the carry
chain. This implies the critical path can be approximated by O(n) gate delays; our motivation for exploring
other designs is therefore the possibility of improving on this, and thus computing r with lower latency (i.e.,
less delay).
Example 3.6. Consider the following unsigned, base-10 addition of x = 456(10) to y = 444(10)
x = 456(10) ↦ 4 5 6
y = 444(10) ↦ 4 4 4 +
c = 0 1 1 0
r = 900(10) ↦ 9 0 0
A Carry Look-Ahead (CLA) adder takes advantage of the fact that using base-2 makes application of the rules
simple. In particular, imagine we use gi and pi to indicate whether the i-th step will generate or propagate a
carry respectively. We can express these as
gi = xi ∧ yi
pi = xi ⊕ yi
• we generate a carry-out if both xi = 1 and yi = 1 since no matter what the carry-in is, their sum cannot be
represented in a single base-b digit, and
• we propagate a carry-out if either xi = 1 or yi = 1 since this plus a carry-in of 1 will produce a sum
which cannot be represented in a single base-b digit.
Using these signals, the carry chain can be expressed as
ci+1 = gi ∨ (ci ∧ pi )
where, again, c0 = ci and we produce a carry-out cn = co. Again this can be explained in words: at the i-th stage
“there is a carry-out if either the i-th stage generates a carry itself, or there is a carry-in and the i-th stage will
propagate it”. As an aside, note that it is common to see gi and pi written as
gi = xi ∧ yi
pi = xi ∨ yi
Of course, when used in the above both expressions have the same meaning: if xi = 1 and yi = 1, then gi = 1
so it does not matter what the corresponding pi is (given the OR will yield 1, since the left-hand term gi = 1,
irrespective of the right-hand term). As such, use of an OR gate rather than an XOR is preferred because the
former requires fewer transistors.
Like the ripple-carry adder, once we fix n we can unwind the recursion to get an expression for the carry
into each i-th full-adder cell:
c0 = ci
c1 = g0 ∨ (ci ∧ p0 )
c2 = g1 ∨ (g0 ∧ p1 ) ∨ (ci ∧ p0 ∧ p1 )
c3 = g2 ∨ (g1 ∧ p2 ) ∨ (g0 ∧ p1 ∧ p2 ) ∨ (ci ∧ p0 ∧ p1 ∧ p2 )
...
This looks horrendous, but notice that the general structure is of the form shown in Figure 3.6: both the bottom-
and top-half are balanced binary trees (st. leaf nodes are gi and pi terms, and internal nodes are AND and OR
gates respectively) that implement the SoP expression for a given ci . We are able to use this organisation as
a result of having decoupled computation of ci from the corresponding ri , which is, essentially, what yields an
advantage: the critical path (i.e., the depth of the structure, or longest path from the root to some leaf) is shorter
than for a ripple-carry adder. Stated in a formal way, the former is described by O(log n) gate delays due to the
tree structure, and the latter by O(n) as a result of the linear structure.
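A behavioural sketch of the look-ahead idea follows (function names invented for illustration); note the loop computes the carries sequentially for clarity, whereas the hardware flattens each ci into the SoP trees of Figure 3.6:

```python
# A sketch of the carry look-ahead recurrence: per-bit generate and propagate
# signals, then c_{i+1} = g_i OR (c_i AND p_i), decoupled from the sum bits.
def cla_carries(x, y, n, ci=0):
    g = [(x >> i) & (y >> i) & 1 for i in range(n)]      # g_i = x_i AND y_i
    p = [((x >> i) ^ (y >> i)) & 1 for i in range(n)]    # p_i = x_i XOR y_i
    c = [ci]
    for i in range(n):     # in hardware, each c_i is a flattened SoP tree
        c.append(g[i] | (c[i] & p[i]))
    return c               # c[0] = ci, ..., c[n] = co

def cla_add(x, y, n, ci=0):
    c = cla_carries(x, y, n, ci)
    r = 0
    for i in range(n):     # each sum bit now only needs its own c_i
        r |= (((x >> i) ^ (y >> i) ^ c[i]) & 1) << i
    return r, c[n]

print(cla_add(107, 14, 8))  # (121, 0)
```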
The resulting design is shown in Figure 3.5. In contrast with the ripple-carry adder design in Figure 3.2,
all the full-adder instances are independent: the carry chain previously linking them has now been eliminated.
Instead, the i-th such instance produces gi and pi ; these inputs are used by the carry look-ahead logic to produce
ci . The design hides an important trade-off, namely the associated gate count. Although we have reduced
the critical path, the gate count is now much higher: a rough estimate would be O(n) and O(n2 ) gates for
ripple-carry and carry look-ahead adders respectively. It can therefore be attractive to combine several, small(er) carry
look-ahead adders (e.g., 8-bit adders) in a large(r) ripple-carry configuration (e.g., to form a larger, 32-bit,
adder).
A second approach to eliminating the carry chain is to bend the rules a little, and look at a slightly different
problem. The ripple-carry and the carry look-ahead adder compute the sum of two addends x and y; what
happens if we consider three addends x, y, and z, and thus compute x + y + z rather than x + y?
A carry-save adder offers a solution for this alternative problem. It is often termed a 3 : 2 compressor
because it compresses three n-bit inputs x, y and z into two n-bit outputs r′ and c′ (termed the partial sum and
shifted carry). Put another way, a carry-save “adder” computes the actual sum r = x + y + z in two steps: 1)
a compression step produces a partial sum and shifted carry, then 2) an addition step combines them into the
actual sum.
The first step amounts to replacing c with z in the ripple-carry adder design, meaning that for the i-th
full-adder instance we have
r′i = xi ⊕ yi ⊕ zi
c′i = (xi ∧ yi ) ∨ (xi ∧ zi ) ∨ (yi ∧ zi )
Unlike the ripple-carry adder, where the instances are connected via a carry chain, the expressions for r′i and
c′i only use the i-th digits of x, y, and z: computation of each i-th digit of r′ and c′ is independent. Crucially,
this means each r′i and c′i can be computed at the same time; the critical path runs through just one full-adder
instance, rather than all n instances as in a ripple-carry adder.
Example 3.7. Consider computation of the partial sum and shifted carry from x = 96(10) = 01100000(2) , y =
14(10) = 00001110(2) and z = 11(10) = 00001011(2) :
x = 96(10) ↦ 0 1 1 0 0 0 0 0
y = 14(10) ↦ 0 0 0 0 1 1 1 0
z = 11(10) ↦ 0 0 0 0 1 0 1 1
r′ = 0 1 1 0 0 1 0 1
c′ = 0 0 0 0 1 0 1 0
After computing r′ and c′ , we combine them via the second step by computing r = r′ + 2 · c′ using a standard
(e.g., ripple-carry) adder. You could think of this step as propagating the carries, now represented separately
(from the sum) by c′ .
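The two steps can be sketched as follows (function name invented for illustration), reproducing the values of Example 3.7:

```python
# A sketch of the carry-save approach: one compression step (pure bit-wise
# logic, no carry chain) then a single conventional addition.
def carry_save_compress(x, y, z):
    r_prime = x ^ y ^ z                        # partial sum, per bit
    c_prime = (x & y) | (x & z) | (y & z)      # shifted carry, per bit
    return r_prime, c_prime

x, y, z = 96, 14, 11
rp, cp = carry_save_compress(x, y, z)
r = rp + 2 * cp                                # the one addition step
print(rp, cp, r)                               # 101 10 121
```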
Example 3.8. Consider computation of the actual sum from r′ = 01100101 and c′ = 00001010:
r′ = 0 1 1 0 0 1 0 1
2 · c′ = 0 0 0 0 1 0 1 0 0 +
c = 0 0 0 0 0 0 1 0 0 0
r = 121(10) ↦ 0 0 1 1 1 1 0 0 1
which produces r = 001111001(2) = 121(10) as expected.
Given we need this step to produce r, it is reasonable to ask why we bother with this approach at
all: it seems as if we are limited in the same way as if we used a ripple-carry adder in the first place. With m = 1
compression step, the answer is that we have a critical path of O(1) + O(n) gate delays vs. O(n) + O(n) if we used
two ripple-carry adders (one to compute t = x + y, then another to compute r = t + z). The more general idea,
however, is we compute many compression steps (i.e., m > 1) and then a single, addition step: if we do this, the
cost associated with the addition step becomes less significant (i.e., is amortised) as m grows larger. Later, in
Section 3.5 when we look at designs for multiplication, the utility of this approach should become clear.
3.3.2 Subtraction
3.3.2.1 Redesigning a ripple-carry adder
Subtraction is conceptually, and so computationally, similar to addition. In essence, the same steps are evident:
we again work from the least-significant, right-most digits (i.e., x0 and y0 ) towards the most-significant, left-
most digits (i.e., xn−1 and yn−1 ). At each i-th step (or column), we now compute the difference of the i-th digits
xi and yi and a borrow-in produced by the previous, (i − 1)-th step; this difference is potentially smaller than
zero, so we produce the i-th digit of the result and a borrow-out into the next, (i + 1)-th step. This description is
formalised in a similar way by Algorithm 2. Note that although the name c is slightly counter-intuitive (it now
represents a borrow- rather than carry-chain), we stick to the same notation as an adder to stress the similar
use.
Example 3.9. Consider the following unsigned, base-10 subtraction of y = 14(10) from x = 107(10)
x = 107(10) ↦ 1 0 7
y = 14(10) ↦ 0 1 4 −
c = 0 1 0 0
r = 93(10) ↦ 0 9 3
and the corresponding trace of Algorithm 2
i xi yi ci r xi − yi − ci ci+1 ri r′
⟨0, 0, 0⟩ ⟨0, 0, 0⟩
0 7 4 0 ⟨0, 0, 0⟩ 3 0 3 ⟨3, 0, 0⟩
1 0 1 0 ⟨3, 0, 0⟩ −1 1 9 ⟨3, 9, 0⟩
2 1 0 1 ⟨3, 9, 0⟩ 0 0 0 ⟨3, 9, 0⟩
0 ⟨3, 9, 0⟩
which produces r = 93(10) as expected.
Example 3.10. Consider the following unsigned, base-2 subtraction of y = 14(10) = 00001110(2) from x = 107(10) =
01101011(2)
x = 107(10) ↦ 0 1 1 0 1 0 1 1
y = 14(10) ↦ 0 0 0 0 1 1 1 0 −
c = 0 0 0 1 1 1 0 0 0
r = 93(10) ↦ 0 1 0 1 1 1 0 1
Half-Subtractor
x y bo d
0 0 0 0
0 1 1 1
1 0 0 1
1 1 0 0
(a) The half-subtractor as a truth table.
(b) The half-subtractor as a circuit.
Full-Subtractor
bi x y bo d
0 0 0 0 0
0 0 1 1 1
0 1 0 0 1
0 1 1 0 0
1 0 0 1 1
1 0 1 1 0
1 1 0 0 0
1 1 1 1 1
(c) The full-subtractor as a truth table.
(d) The full-subtractor as a circuit.
i   xi  yi  ci  r                         xi − yi − ci  ci+1  ri  r′
                ⟨0, 0, 0, 0, 0, 0, 0, 0⟩                          ⟨0, 0, 0, 0, 0, 0, 0, 0⟩
0   1   0   0   ⟨0, 0, 0, 0, 0, 0, 0, 0⟩   1            0     1   ⟨1, 0, 0, 0, 0, 0, 0, 0⟩
1   1   1   0   ⟨1, 0, 0, 0, 0, 0, 0, 0⟩   0            0     0   ⟨1, 0, 0, 0, 0, 0, 0, 0⟩
2   0   1   0   ⟨1, 0, 0, 0, 0, 0, 0, 0⟩  −1            1     1   ⟨1, 0, 1, 0, 0, 0, 0, 0⟩
3   1   1   1   ⟨1, 0, 1, 0, 0, 0, 0, 0⟩  −1            1     1   ⟨1, 0, 1, 1, 0, 0, 0, 0⟩
4   0   0   1   ⟨1, 0, 1, 1, 0, 0, 0, 0⟩  −1            1     1   ⟨1, 0, 1, 1, 1, 0, 0, 0⟩
5   1   0   1   ⟨1, 0, 1, 1, 1, 0, 0, 0⟩   0            0     0   ⟨1, 0, 1, 1, 1, 0, 0, 0⟩
6   1   0   0   ⟨1, 0, 1, 1, 1, 0, 0, 0⟩   1            0     1   ⟨1, 0, 1, 1, 1, 0, 1, 0⟩
7   0   0   0   ⟨1, 0, 1, 1, 1, 0, 1, 0⟩   0            0     0   ⟨1, 0, 1, 1, 1, 0, 1, 0⟩
                                                        0         ⟨1, 0, 1, 1, 1, 0, 1, 0⟩
bo = ¬x ∧ y
d = x ⊕ y
and
bo = (¬x ∧ y) ∨ (¬(x ⊕ y) ∧ bi)
d = x ⊕ y ⊕ bi
respectively. Keep in mind that bi and bo perform the same role as ci and co previously: the subtraction
analogue of the ripple-carry adder, an n-bit ripple-borrow subtractor perhaps, is identical except for the borrow
chain through all n full-subtractor instances.
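This behaviour can be sketched in C; the function names are illustrative, and the gate-level equations above are modelled directly using bit-wise operators:

```c
#include <assert.h>
#include <stdint.h>

/* One full-subtractor instance: d = x xor y xor bi, with borrow-out
   bo = (not x and y) or (not (x xor y) and bi). */
void full_subtractor(int x, int y, int bi, int *d, int *bo) {
    *d  = x ^ y ^ bi;
    *bo = ((~x & y) | (~(x ^ y) & bi)) & 1;
}

/* An n-bit ripple-borrow subtractor: n full-subtractor instances are
   chained, each borrow-out feeding the next borrow-in. */
uint32_t ripple_borrow_sub(uint32_t x, uint32_t y, int n) {
    uint32_t r = 0;
    int bi = 0;
    for (int i = 0; i < n; i++) {
        int d, bo;
        full_subtractor((x >> i) & 1, (y >> i) & 1, bi, &d, &bo);
        r |= (uint32_t)d << i;
        bi = bo;
    }
    return r;
}
```

For instance, ripple_borrow_sub(107, 14, 8) reproduces the subtraction traced in Example 3.10.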
r =  x + y + ci  if op = 0
     x − y − ci  if op = 1
for example. Notice that as well as controlling computation of the sum or difference of x and y, op will control use of ci as a carry- or borrow-in depending on whether an addition or subtraction is computed. The advantage is that, at a high-level, the design
includes one internal adder. Versus two separate, similar components (i.e., an adder and a subtractor), this is
already a useful optimisation outright; in designs for multiplication this will be amplified further.
The question is, how should we control the internal inputs to the adder (namely x′ , y′ and ci′ ) st. given
all the external inputs (namely op, x, y and ci) the correct output r is produced? By using two’s-complement
representation, we saw in Chapter 1 that
−y 7→ ¬y + 1
for any given y. The idea is to use this identity, translating from what we want to compute into what we already
can compute:
op ci r             op ci r            op ci r                   op ci r
0  0  x + y + ci     0  0  x + y + 0     0  0  x + y + 0            0  0  x + y + 0
0  1  x + y + ci  ≡  0  1  x + y + 1  ≡  0  1  x + y + 1         ≡  0  1  x + y + 1
1  0  x − y − ci     1  0  x − y − 0     1  0  x + (¬y + 1) − 0     1  0  x + (¬y) + 1
1  1  x − y − ci     1  1  x − y − 1     1  1  x + (¬y + 1) − 1     1  1  x + (¬y) + 0
The left-most table just captures what we said above: if op = 0 (in the top two rows) we want to compute
x + y + ci, but if op = 1 (in the bottom two rows) we want to compute x − y − ci. Moving from left-to-right, we
substitute in values of ci then apply the identity for −y in the bottom rows; the right-most table simply folds
the constants together. In the right-most table, all the cases (for addition and subtraction, so where op = 0 or
op = 1) are of the same form, which we can cope with using the internal adder: we have op, x, y and ci, so can
just translate via
op ci xi yi   ci′ x′i y′i
0  0  0  0    0   0   0
0  1  1  1    1   1   1
1  0  0  0    1   0   1
1  1  1  1    0   1   0
i.e., ci′ = ci ⊕ op, x′i = xi and y′i = yi ⊕ op. That is, xi is unchanged whereas yi and ci are XOR'ed with op to conditionally invert them (in the bottom two rows, where we need ¬yi rather than yi ). Figure 3.4 illustrates the result, where it is important to see that the overhead, versus in this case a ripple-carry adder, is simply an extra n + 1 XOR gates.
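The conditional-inversion trick can be sketched as a word-level C model (rather than a gate-level one; the name add_sub is illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* Combined adder/subtractor: the internal adder always computes
   x' + y' + ci', where y' = y xor op (replicated per bit) and
   ci' = ci xor op.  op = 0 selects addition, op = 1 subtraction. */
uint8_t add_sub(uint8_t x, uint8_t y, int ci, int op) {
    uint8_t mask = op ? 0xFF : 0x00; /* yi xor op, for each of the n bits */
    return (uint8_t)(x + (uint8_t)(y ^ mask) + (ci ^ op));
}
```

With op = 1 and ci = 0 this computes x + ¬y + 1 = x − y, exactly per the right-most table above.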
1. x and y are (and hence the addition is) unsigned and there is a carry-out, or
2. x and y are (and hence the addition is) signed but the sign of r makes no sense
which are termed carry and overflow errors respectively. The two cases can be illustrated using some (specific)
examples:
Example 3.11. Consider the following unsigned, base-2 addition of x = 15(10) ↦ 1111(2) to y = 1(10) ↦ 0001(2)

  x = 15(10) ↦   1 1 1 1
  y =  1(10) ↦   0 0 0 1 +
  c =          1 1 1 1 0
  r =  0(10) ↦   0 0 0 0

which produces an incorrect result r = 0000(2) ↦ 0(10) due to a carry error.
Example 3.12. Consider the following signed, base-2 addition of x = −1(10) ↦ 1111(2) to y = 1(10) ↦ 0001(2) (both represented using two's-complement)

  x = −1(10) ↦   1 1 1 1
  y =  1(10) ↦   0 0 0 1 +
  c =          1 1 1 1 0
  r =  0(10) ↦   0 0 0 0

which produces a correct result r = 0000(2) ↦ 0(10) .
Example 3.13. Consider the following signed, base-2 addition of x = 7(10) ↦ 0111(2) to y = 1(10) ↦ 0001(2) (both represented using two's-complement)

  x =  7(10) ↦   0 1 1 1
  y =  1(10) ↦   0 0 0 1 +
  c =          0 1 1 1 0
  r = −8(10) ↦   1 0 0 0

which produces an incorrect result r = 1000(2) ↦ −8(10) due to an overflow error.
To deal with such errors in a sensible manner, we really need two steps: 1) detect that the error has occurred,
then 2) apply some mechanism, e.g., to communicate or correct the error.
Detecting the carry error is simple: as suggested by the first example above, we need to inspect the carry-out. This example assumes n = 4; the correct result r = 16 has a magnitude which cannot be accommodated in the number of bits available, so an incorrect result r = 0 is produced, with the carry-out (i.e., the fact that if the result is computed by Algorithm 1, it produces co = 1) signalling an error. However, notice that if we
have signed x and y, as in the second example, any carry-out is irrelevant: in this case, the result r = 0 is correct
and the carry-out should be discarded.
This suggests detecting the overflow error requires more thought, with the third example suggesting a
starting point. In this case, x is the largest positive integer we can represent using n = 4 bits; adding y = 1
means the value wraps-around (as discussed in Chapter 1) to form a negative result r = −8. Clearly this is
impossible, in the sense that for positive x and y we can never end up with a negative sum: this mismatch
allows us to conclude than an overflow error occurred. More specifically, in the case of addition, we apply the
following set of rules (with a similar set possible for subtraction):
x +ve, y −ve                ⇒ no overflow
x −ve, y +ve                ⇒ no overflow
x +ve, y +ve, r +ve         ⇒ no overflow
x +ve, y +ve, r −ve         ⇒ overflow
x −ve, y −ve, r +ve         ⇒ overflow
x −ve, y −ve, r −ve         ⇒ no overflow
Note that testing the sign of x or y is trivial, because it will be determined by their MSBs as a result of how
two’s-complement is defined: x is positive, for example, iff. xn−1 = 0 and negative otherwise. Based on this,
detection of an overflow error is computed as
of = ( xn−1 ∧ yn−1 ∧ ¬rn−1 ) ∨ ( ¬xn−1 ∧ ¬yn−1 ∧ rn−1 )
or in words: “there is an overflow if either x is positive and y is positive and r is negative, or if x is negative
and y is negative and r is positive”. This can be further simplified to
of = cn−1 ⊕ cn−2
where c is the carry chain during addition of x and y: basically this XORs the carry-in and the carry-out of the
(n − 1)-th full-adder. As such, an overflow is signalled, i.e., o f = 1, in two cases: either
The fact there are two different classes of shift operation demands some care when writing programs; put
simply, in a given programming language you need to make sure you select the correct operator. In C, both
left- and right-shifts use the operators << and >> irrespective of whether they are arithmetic or logical; the type
of the operand being shifted dictates the class of shift. For example
1. if x is of type int (i.e., x is a signed integer) then the expression x >> 2 implies an arithmetic right-shift,
whereas
2. if x is of type unsigned int (i.e., x is an unsigned integer) then the expression x >> 2 implies a logical
right-shift.
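The distinction can be seen in a short C sketch (note that right-shifting a negative signed value is implementation-defined before C23, although it is commonly arithmetic in practice):

```c
#include <assert.h>

/* The class of shift follows the type of the shifted operand. */
int arith_shift(int x)             { return x >> 2; } /* commonly arithmetic */
unsigned logical_shift(unsigned x) { return x >> 2; } /* always logical */
```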
In contrast, Java has no unsigned integer data types so needs to take a different approach: arithmetic and logical right-shifts are specified by two different operators, meaning, for example, that x >> 2 implies an arithmetic right-shift whereas x >>> 2 implies a logical right-shift.
1. cn−1 = 0 and cn−2 = 1, which can only occur if xn−1 = 0 and yn−1 = 0 (i.e., x and y are both positive but r is negative), or
2. cn−1 = 1 and cn−2 = 0, which can only occur if xn−1 = 1 and yn−1 = 1 (i.e., x and y are both negative but r is positive).
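The rule set can be sketched for n = 4 in C (the function name is illustrative; bit 3 is the sign bit):

```c
#include <assert.h>

/* Overflow detection for 4-bit signed addition: inspect the MSBs of
   x, y and the (truncated) result r, per the rules above. */
int overflow4(unsigned x, unsigned y) {
    unsigned r  = (x + y) & 0xF;
    unsigned sx = (x >> 3) & 1, sy = (y >> 3) & 1, sr = (r >> 3) & 1;
    /* overflow iff both operands negative and r positive, or both
       operands positive and r negative */
    return (sx & sy & !sr) | (!sx & !sy & sr);
}
```

For instance, overflow4(7, 1) flags the wrap-around of Example 3.13, while overflow4(0xF, 1) correctly reports no overflow for −1 + 1.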
Once an error condition is detected (during a relevant operation by the ALU, for example), the next question
is what to do about it: clearly the error needs to be managed somehow, or the incorrect result will be used as
normal. There are numerous options, but two in particular illustrate the general approach:
1. provide the incorrect result as normal, (e.g., truncate the result to n bits by discarding bits we cannot
accommodate), but signal the error condition somehow (e.g., via a status register or some form of
exception), or
2. fix the incorrect result somehow, according to pre-planned rules (e.g., saturate or clamp the result to the
largest integer we can represent in n bits).
In short, the choice is between delegating responsibility to whatever is using the ALU (in the former) and
making the ALU itself responsible (in the latter); both have advantages and disadvantages, and may therefore
be appropriate in different situations.
Notice that if y is positive it increases the weight associated with a given digit xi , hence “shifting” said digit to
the left in the sense it assumes a more-significant position. If y is negative, on the other hand, it decreases the
weight associated with xi and the digit “shifts” to the right; in this case, the operation acts as a division instead,
because clearly
x · b^{−y} = x · (1 / b^y) = x / b^y .
This argument applies for any b, and, as you might expect, we will ultimately be interested in b = 2 since this
aligns with our approach for representing integers.
Example 3.14. Consider a base-10 shift of x = 123(10) by y = 2
r = x · b^y = Σ_{i=0}^{n−1} xi · b^{i+y}
  = x0 · b^{0+2} + x1 · b^{1+2} + x2 · b^{2+2}
  = 3 · 10^2 + 2 · 10^3 + 1 · 10^4
  = 300 + 2000 + 10000
  = 12300(10)
x = 218(10)
  = 11011010(2)
  ↦ ⟨0, 1, 0, 1, 1, 0, 1, 1⟩
  ↦ 11011010
describes the same value: using the literal notation in what follows is more natural, but keep in mind that the
equivalence above allows us to translate the same reasoning to any of the alternatives.
Based on our description in the previous section, we need to consider what a shift operation means when applied to an integer represented by an n-bit sequence.
Definition 3.1. Two types of shift can be applied to an n-bit sequence x:
1. a left-shift, where y ≥ 0, can be defined as
   r = x ≪ y = xn−1−abs(y) ∥ · · · ∥ x1 ∥ x0 ∥ ? · · · ?
and
2. a right-shift, where y < 0, can be defined as
   r = x ≫ y = ? · · · ? ∥ xn−1 ∥ · · · ∥ xabs(y)+1 ∥ xabs(y)
where, in each case, the result r has n bits, y is termed the distance, and each ? represents a “gap” bit that must be filled to ensure r has n bits.
Definition 3.2. When computing a shift, any gap is filled in according to some rules:
1. logical shift, where left-shift discards MSBs and fills the gap in LSBs with zeros, and right-shift discards LSBs
and fills the gap in MSBs with zeros, and
2. arithmetic shift, where left-shift discards MSBs and fills the gap in LSBs with zeros, and right-shift discards LSBs
and fills the gap in MSBs with a sign bit.
Phrased in this way, a rotate operation (of some x by a distance of y) is the same as a logical shift except that any gap is
filled by the other end of x rather than zero: that is,
1. a left-rotate (for which we use the operator ⋘, vs. ≪ for the corresponding shift) yields a gap in the LSBs which is filled by the MSBs that would be discarded by a left-shift, and
2. a right-rotate (for which we use the operator ⋙, vs. ≫ for the corresponding shift) yields a gap in the MSBs which is filled by the LSBs that would be discarded by a right-shift.
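These rules can be sketched in C for n = 8 (rotl8 and rotr8 are illustrative names; the distance y is assumed to satisfy 1 ≤ y < 8):

```c
#include <assert.h>
#include <stdint.h>

/* Left-rotate: the gap in the LSBs is filled by the MSBs that a
   left-shift would discard. */
uint8_t rotl8(uint8_t x, unsigned y) {
    return (uint8_t)((x << y) | (x >> (8 - y)));
}

/* Right-rotate: the gap in the MSBs is filled by the LSBs that a
   right-shift would discard. */
uint8_t rotr8(uint8_t x, unsigned y) {
    return (uint8_t)((x >> y) | (x << (8 - y)));
}
```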
Example 3.18. Consider the base-2 shift and rotation of an n = 8 bit x = 218(10) = 11011010(2) ≡ 11011010 by a distance of y = 2:
1. logical left- and right-shift produce
   x ≪u y = 11011010 ≪u 2 = 01101000
   x ≫u y = 11011010 ≫u 2 = 00110110
2. arithmetic left- and right-shift produce
   x ≪s y = 11011010 ≪s 2 = 01101000
   x ≫s y = 11011010 ≫s 2 = 11110110
and
3. logical left- and right-rotate produce
   x ⋘ y = 11011010 ⋘ 2 = 01101011
   x ⋙ y = 11011010 ⋙ 2 = 10110110
These examples hopefully illustrate the somewhat convoluted definitions more clearly: in reality, the underlying concepts are reasonably simple. Consider the logical left-shift: looking step-by-step at
x ≪u y = 11011010 ≪u 2
= 011010??
= 01101000
1. we discard two bits from the left-hand, most-significant end because they cannot be accommodated, plus
2. at the right-hand, less-significant end we need to fill the resulting gap: this is a logical shift, so they are
replaced with 0.
On the other hand, some more subtle points are important. First, note the importance of knowing n when
performing these operations. If we did not know n, or did not fix it say, a left-shift would just elongate the
literal: instead of discarding MSBs, the literal grows to form an n + y bit result. Likewise, if we do not know n
then rotate cannot be defined in a sane way; a left-rotate cannot fill the gap in the LSBs with discarded MSBs,
because they are not discarded! Second, use of terminology including “left” and “right” explains why it is
easier to reason about these operations by using literals. In short, doing so means the operations both have
the intuitive effect by moving elements left or right: using a sequence, under our notation at least, the effect is
counter-intuitive (i.e., the wrong way around). Third, and finally, if y is a known, fixed constant then shift and
rotate operations require no actual arithmetic: we are simply moving bits in x left or right. As a result, a circuit
that left- or right-shifts x by a fixed distance y simply connects wires from each xi (or zero say, to fill LSBs or
MSBs) to ri+y . We can use this fact as a building block in more general circuits that can cater for scenarios where
y is not fixed. Even then, however, we typically assume a) the sign of y is always positive (which is captured in
the above via use of abs(y)), which is sane because we have specific left- and right-shift (or rotate) operations
vs. one generic operation, and b) the magnitude of y is restricted to 0 ≤ y < n meaning y has m = ⌈log2 (n)⌉ bits.
We then have a choice of how to deal with a y outside this range. Typically, we either let r be undefined for any y ≥ n or y < 0, or consider y modulo n.
Although logical and arithmetic left-shift are equivalent (i.e., a gap is zero-filled in both cases), this is not
so for right-shift; as such, it is fair to question why arithmetic right-shift is included as a special case. Recall
the original description above, where a shift of x by y was equated to a multiplication of x by b y . If x uses a
signed representation, a multiplication, and therefore also a shift, ideally preserves the sign: if x is positive
(resp. negative) then we expect x · b y to be positive (resp. negative). This is essentially the purpose of an
arithmetic right-shift, in the sense it preserves the sign of x and hence has the correct arithmetic meaning. Both
the underlying issue and impact of this special case is clarified by an example:
For example, given x = −38(10) ↦ 11011010 we expect

  x/2 = −38(10)/2 = −19(10) ↦ 11101101

since x · b^y = x · 2^{−1} = x · (1/2) = x/2. However, using logical right-shift, we get

  r = x ≫u y = 11011010 ≫u 1 = 01101101 ↦ 109(10)

whereas using arithmetic right-shift, we get

  r = x ≫s y = 11011010 ≫s 1 = 11101101 ↦ −19(10)
as expected. In the former we fill the MSBs with zero, which turns x into a positive r; in the latter we fill the MSBs with the sign bit of x to preserve the sign in r. This highlights the reason there is no need for a special case for arithmetic left-shift. With right-shift we fill the MSBs of, and so dictate the sign bit of, the result; in contrast, a left-shift means filling LSBs in the result, so the sign bit remains as is (i.e., is preserved by default).
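A portable C sketch of an arithmetic right-shift (avoiding the implementation-defined behaviour of >> on negative signed values before C23) is:

```c
#include <assert.h>
#include <stdint.h>

/* Arithmetic right-shift: fill the gap in the MSBs with the sign bit.
   For negative x, shift the complement (which is non-negative) and
   complement back, so the vacated MSBs end up as ones. */
int32_t asr(int32_t x, unsigned y) {
    if (x < 0)
        return (int32_t)~(~(uint32_t)x >> y);
    return (int32_t)((uint32_t)x >> y);
}
```

For instance, asr(−38, 1) yields −19, matching the example above.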
(x ≪ y) ≪ y′ ≡ x ≪ (y + y′ ).
Figure 3.8: An iterative design for n-bit (left-)shift described using a circuit diagram.
Figure 3.9: A combinatorial design for n-bit (left-)shift described using a circuit diagram.
Example 3.20. Consider the base-2, logical left-shift of an n = 8 bit x = 218(10) = 11011010(2) ≡ 11011010, first by
a distance of y = 2 then by a distance of y′ = 4:
(x ≪ y) ≪ y′ = (11011010 ≪ 2) ≪ 4
             = 01101000 ≪ 4
             = 10000000
x ≪ (y + y′) = 11011010 ≪ (2 + 4)
             = 11011010 ≪ 6
             = 10000000
Using the same reasoning, it should be obvious that if y = 6 then
r = x≪y = x≪6
= (((((x ≪ 1) ≪ 1) ≪ 1) ≪ 1) ≪ 1) ≪ 1
Put another way, we can decompose one large shift on the LHS into several smaller shifts on the RHS: six
repeated shifts each by a distance of 1 bit produce the same result as one shift by a distance of 6 bits. This
approach is formalised by Algorithm 3.
Example 3.21. Consider the following trace of Algorithm 3, for y = 6(10) :
i  r      r′
          x
0  x      x ≪ 1     r′ ← r ≪ 1
1  x ≪ 1  x ≪ 2     r′ ← r ≪ 1
2  x ≪ 2  x ≪ 3     r′ ← r ≪ 1
3  x ≪ 3  x ≪ 4     r′ ← r ≪ 1
4  x ≪ 4  x ≪ 5     r′ ← r ≪ 1
5  x ≪ 5  x ≪ 6     r′ ← r ≪ 1
          x ≪ 6
which produces r = x ≪ 6 as expected.
Figure 3.8 captures the components required to implement this algorithm; the design highlights a trade-off
between area and latency in which smaller area is favoured. Specifically, we only need a) a register to store r
(left-hand side), and b) a component to perform a 1-bit left-shift (center), which realises line #3 of Algorithm 3
and so needs no actual logic (since the shift distance is constant). However, this data-path demands an associated
control-path that realises the loop. We can do so using an FSM of course: in each i-th step, the FSM latches
r′ (representing the combinatorial result r ≪ 1) into r ready for the (i + 1)-th step; implementation of such an
FSM clearly demands a register to hold i and suitable control logic, both of which add somewhat to the area
(and design complexity). Even so, the trade-off is essentially that we have a simple computational step but, as
a result, need to iterate through y such steps to compute the (eventual) result.
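The iterative data-path can be modelled with a short C loop (for n = 8; the loop index plays the role of the FSM's counter register):

```c
#include <assert.h>
#include <stdint.h>

/* Iterative left-shift: y steps, each a fixed 1-bit left-shift of the
   accumulator r (cf. Algorithm 3); the constant-distance shift itself
   is just wiring, i.e., needs no actual logic. */
uint8_t lshift_iter(uint8_t x, unsigned y) {
    uint8_t r = x;
    for (unsigned i = 0; i < y; i++)
        r = (uint8_t)(r << 1);
    return r;
}
```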
So far so good, but what about right-shift? Or, rotate?! Crucially, we can support the entire suite of shift-like operations via a fairly simple alteration to line #3 of Algorithm 3: we simply need to change the component at the center of our design. For example, if we replace r ← r ≪ 1 with r ← r ≫ 1 we get a design that performs a right- vs. left-shift. Even better, if we replace r ← r ≪ 1 with something more involved, namely

r ←  r ≪ 1  if op = 0
     r ≫ 1  if op = 1
     r ⋘ 1  if op = 2
     r ⋙ 1  if op = 3
then provided we supply op as an extra input, the resulting design can perform left- and right-shift, and left- and right-rotate: a multiplexer, controlled by op, decides which result (produced by each different, individual operation) to update r with. We still iterate through y steps, meaning the end result is now a left-
or right-shift, or left- or right-rotate by a distance of y. One can view this as an application of the unintegrated
ALU architecture in Figure 3.1a, but at a lower (or internal, component) level vs. higher, ALU level.
Although the example is the same, the underlying strategy is to express y st. each smaller shift is by a power-
of-two distance (i.e., by 2i for some i). As such, if we write y in base-2 then each bit yi tells us whether or not
to shift by a distance derived from i: we can compute the result via application of a simple rule “if yi = 1 then
shift the accumulated result by a distance of 2i , otherwise leave it as it is” which is formalised by Algorithm 4.
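This rule can be modelled in C for n = 8 (so m = 3 stages; in hardware each if becomes a multiplexer selecting between r and r ≪ 2^i):

```c
#include <assert.h>
#include <stdint.h>

/* Logarithmic left-shift: scan the m = 3 bits of y; stage i either
   passes r through or shifts it by the fixed distance 2^i
   (cf. Algorithm 4). */
uint8_t lshift_log(uint8_t x, unsigned y) {
    uint8_t r = x;
    for (unsigned i = 0; i < 3; i++)
        if ((y >> i) & 1)
            r = (uint8_t)(r << (1u << i));
    return r;
}
```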
Example 3.22. Consider the following trace of Algorithm 4, for y = 6(10) st. m = ⌈log2 (n)⌉ = 3:

i  r      yi  r′
          x
0  x      0   x       r′ ← r
1  x      1   x ≪ 2   r′ ← r ≪ 2^1
2  x ≪ 2  1   x ≪ 6   r′ ← r ≪ 2^2
          x ≪ 6
Translating the algorithm into a corresponding design harnesses the same idea as the ripple-carry adder: once n is known, we unroll the loop by copying and pasting the loop body (i.e., lines #3 to #5) m times, replacing i with the correct value in each i-th copy. Doing so for m = 3, for example, produces the straight-line alternative
1 r←x
2 if y0 = 1 then r ← r ≪ 20
3 if y1 = 1 then r ← r ≪ 21
4 if y2 = 1 then r ← r ≪ 22
which makes it (more) clear that we are essentially performing a series of choices: if the (i − 1)-th stage produces
t as output, the i-th uses yi to choose between producing t or t ≪ 2i for use by the (i + 1)-th stage. All the shifts
themselves are by fixed constants (which we already argued are trivial), so these stages are really just a cascade
of multiplexers.
Figure 3.9 translates this idea into a concrete circuit. The trade-off between latency and area is swapped vs.
that for the previous, iterative design. On one hand, the component is combinatorial: it takes 1 step to perform
each operation (vs. n), whose latency is dictated by the critical path, and can do so without the need for an FSM.
On the other hand, however, it is likely to use significantly more area (relating to the logic gates required for
each multiplexer).
Formally, a multiplication operation1 computes the product r = y·x based on the multiplier y and multiplicand
x. Despite a focus on integer values of x and y here, the techniques covered sit within a more general case often
described as scalar multiplication: abstractly, x could be any object from a suitable structure (wlog. an integer,
meaning x ∈ Z) that is multiplied, while y is an integer scalar that does the multiplying.
In the case of addition, we covered several possible strategies with some associated trade-offs. This is
exacerbated with multiplication, where a much larger design space exists. Even so, the same approach2
is adopted: we again start by investigating the computation above from an algorithmic perspective, then
somehow translate this into a design for a circuit we can construct from logic gates.
1 Why write y · x rather than x · y, which would match addition for example?! Since multiplication is commutative, we could legitimately
(a) Using operand scanning. (b) Using product scanning.
Figure 3.10: Two examples demonstrating different strategies for accumulation of base-b partial products resulting from
two 3-digit operands.
x = 623(10) ↦ 6 2 3
y = 567(10) ↦ 5 6 7 ×

p0 = 7 · 3 · 10^0 =     21(10)
p1 = 7 · 2 · 10^1 =    140(10)
p2 = 7 · 6 · 10^2 =   4200(10)
p3 = 6 · 3 · 10^1 =    180(10)
p4 = 6 · 2 · 10^2 =   1200(10)
p5 = 6 · 6 · 10^3 =  36000(10)
p6 = 5 · 3 · 10^2 =   1500(10)
p7 = 5 · 2 · 10^3 =  10000(10)
p8 = 5 · 6 · 10^4 = 300000(10)

r = 353241(10)
The idea of long-hand multiplication is that to compute r = y · x (at the bottom) from x and y (at the top), we
generate and then sum a set of partial products (in the middle): each pi is generated by multiplying a digit
from y with a digit from x, which we term a digit-multiplication. Within this context, and multiplication in
general, we use the following definition:
Definition 3.3. The result of a digit-multiplication between x j and yi is said to be reweighted by the combined weight
of the digits being multiplied: if x j has weight j and yi has weight i, the result will have weight j + i.
Informally at least, this explains why each partial product is offset by some distance from the right-hand edge.
In the example above, note that

p7 = y2 · x1 · 10^{2+1}

st. p7 is offset (or left-shifted) by 3 digits and so weighted by 10^3 : during summation of the partial products, it is representing y2 · x1 · 10^3 = 10000(10) .
The question then is how to generate and sum the partial products. It turns out there are (at least) two strategies for doing so. These are described in Figure 3.10, which highlights a difference wrt. how the digit-multiplications are managed. More specifically:
• The left-hand strategy is termed operand scanning, and is formalised by Algorithm 5. The idea is to loop
through digits of x and y, accumulating the associated digit-multiplications into whatever the relevant
digit of the result r is.
• The right-hand strategy is termed product scanning, and is formalised by Algorithm 6. The idea is to
loop through digits of the result r, so that when computing the i-th such digit ri we accumulate all relevant
digit-multiplications stemming from x and y.
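The operand scanning strategy can be sketched in C on base-10 digit arrays (least-significant digit first; the function name is illustrative):

```c
#include <assert.h>
#include <string.h>

/* Operand scanning (cf. Algorithm 5): for each digit y[i], accumulate
   the digit-multiplications y[i]*x[j] into r[i+j], rippling a carry
   digit c along; the final carry of each pass lands in r[i+n]. */
void mul_operand_scan(const int *x, const int *y, int n, int *r) {
    memset(r, 0, sizeof(int) * (size_t)(2 * n));
    for (int i = 0; i < n; i++) {
        int c = 0;
        for (int j = 0; j < n; j++) {
            int t = y[i] * x[j] + r[i + j] + c;
            r[i + j] = t % 10;
            c = t / 10;
        }
        r[i + n] = c;
    }
}
```

For x = 623 and y = 567 (digits stored LSD-first) this reproduces the trace of Example 3.24.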
Example 3.24. Consider the following trace of Algorithm 5, which computes a base-10 operand scanning
multiplication for x = 623(10) and y = 567(10)
i  j  r                   c  yi  xj  t = yi · xj + ri+j + c  r′                  c′
      ⟨0, 0, 0, 0, 0, 0⟩
0  0  ⟨0, 0, 0, 0, 0, 0⟩  0  7   3   21                      ⟨1, 0, 0, 0, 0, 0⟩  2
0  1  ⟨1, 0, 0, 0, 0, 0⟩  2  7   2   16                      ⟨1, 6, 0, 0, 0, 0⟩  1
0  2  ⟨1, 6, 0, 0, 0, 0⟩  1  7   6   43                      ⟨1, 6, 3, 0, 0, 0⟩  4
0     ⟨1, 6, 3, 0, 0, 0⟩  4                                  ⟨1, 6, 3, 4, 0, 0⟩
1  0  ⟨1, 6, 3, 4, 0, 0⟩  0  6   3   24                      ⟨1, 4, 3, 4, 0, 0⟩  2
1  1  ⟨1, 4, 3, 4, 0, 0⟩  2  6   2   17                      ⟨1, 4, 7, 4, 0, 0⟩  1
1  2  ⟨1, 4, 7, 4, 0, 0⟩  1  6   6   41                      ⟨1, 4, 7, 1, 0, 0⟩  4
1     ⟨1, 4, 7, 1, 0, 0⟩  4                                  ⟨1, 4, 7, 1, 4, 0⟩
2  0  ⟨1, 4, 7, 1, 4, 0⟩  0  5   3   22                      ⟨1, 4, 2, 1, 4, 0⟩  2
2  1  ⟨1, 4, 2, 1, 4, 0⟩  2  5   2   13                      ⟨1, 4, 2, 3, 4, 0⟩  1
2  2  ⟨1, 4, 2, 3, 4, 0⟩  1  5   6   35                      ⟨1, 4, 2, 3, 5, 0⟩  3
2     ⟨1, 4, 2, 3, 5, 0⟩  3                                  ⟨1, 4, 2, 3, 5, 3⟩
      ⟨1, 4, 2, 3, 5, 3⟩
producing r = 353241(10) as expected.
Example 3.25. Consider the following trace of Algorithm 6, which computes a base-10 product scanning
multiplication for x = 623(10) and y = 567(10)
r = 14 · x = x + x + x + x + x + x + x + x + x + x + x + x + x + x.
This is important because we already covered how to compute an addition, plus how to design associated
circuits. So to compute a multiplication, we essentially just need to reuse our addition circuit in the right way:
Algorithm 7 states the obvious, in the sense it captures this idea by simply adding x to r (which is initialised to
0) in a loop that iterates y times. Directly using repeated addition is unattractive, however, since the number
of operations performed relates to the magnitude of y. That is, we need y − 1 operations3 in total, so for some
n-bit y we perform O(2n ) operations: this grows quickly, even for modest values of n (say n = 32).
Fortunately, improvements are easy to identify. Another way to look at the multiplication of x by y is as
inclusion of an extra weight to the digits that describe y. That is, writing y in base-b yields
r = y · x = ( Σ_{i=0}^{n−1} yi · b^i ) · x
          = Σ_{i=0}^{n−1} yi · x · b^i
Example 3.26. Consider a base-2 multiplication of x by y = 14(10) 7→ 1110(2) , which we can expand into a sum
of n = 4 terms as follows:
y·x = y0 · x · 20 + y1 · x · 21 + y2 · x · 22 + y3 · x · 23
= 0 · x · 20 + 1 · x · 21 + 1 · x · 22 + 1 · x · 23
= 0·x + 2·x + 4·x + 8·x
= 14 · x
Intuitively, this should already seem more attractive: there are only n terms (relating to the n digits in y) in
the summation so we only need n − 1, or O(n), operations to compute their sum. Using a similar format to
Algorithm 7, this is formalised by Algorithm 8. However, a problem lurks: line #3 of the algorithm reads
r ← r + y i · x · bi
or, put another way, our goal is to compute a multiplication but each step in doing so needs a further two
multiplications itself! This is a chicken-and-egg style problem, but can be resolved by our selecting b = 2:
1. Multiplying x by yi can be achieved without a multiplication: given we know yi ∈ {0, 1, . . . , b − 1} = {0, 1}, if yi = 0 then yi · x = 0, and if yi = 1 then yi · x = x. Put simply, we make a choice between 0 and x using yi rather than multiply x by yi .
2. Multiplying by b^i = 2^i can likewise be achieved without a multiplication: the factor is a fixed power of two, so it amounts to a left-shift by a distance of i.
3 Why y − 1 and not y: Algorithm 7 certainly performs y iterations of the loop! If you think about it, although doing so would make it
more complicated, the algorithm could be improved given we know the first addition (i.e., when i = 0) will add x to 0. Put another way,
we could avoid this initial addition and simply initialise r ← x and perform one less iteration (i.e., y − 1 vs. y). Although the difference
is minor, and so a distraction from the main argument here, you can see this more easily by counting the number of + operators in the
expansion above: for y = 14 we have 13 such additions.
So, in short, these facts mean the two multiplications in line #3 are pseudo-multiplications (or “fake” multipli-
cations) because we can replace them with a non-multiplication equivalent whenever b = 2.
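With b = 2 the whole of Algorithm 8 therefore collapses into a shift-and-add loop, sketched in C:

```c
#include <assert.h>
#include <stdint.h>

/* Shift-and-add multiplication: yi selects between adding 0 or x, and
   the factor 2^i is realised as a left-shift, so no "real"
   multiplications remain (cf. Algorithm 8 with b = 2). */
uint32_t mul_shift_add(uint32_t x, uint32_t y) {
    uint32_t r = 0;
    for (int i = 0; i < 32; i++)
        if ((y >> i) & 1)   /* yi chooses 0 or x      */
            r += x << i;    /* 2^i is just a shift    */
    return r;
}
```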
Although the reformulations above might not seem useful yet, they represent an important starting point from
which we can later construct various concrete algorithms and associated designs and implementations. Within
the large design space of all possible options, we will focus on a selection summarised as follows:
You can think of options within this design space in a similar way to Section 3.4, where, for example, we
overviewed options for the shift operation. Iterative options for multiplication typically deal with one (or at
least few) partial products in each step; many (i.e., more than 1) steps and hence more time will be required
to compute the result, but less space is required to do so (essentially because less computation is performed
per step). Combinatorial options make the opposite trade-off, requiring just a single step to compute the
result. However, the a) critical path, and so time said step takes due to the associated delay, and b) the
space required, are typically both large(r). Unlike shift operations, a clear separation between iterative and
combinatorial options is harder to make; trade-offs that blur the boundaries between some options are attractive,
and explored where relevant.
Consider two designs which compute r = y · x. Irrespective of how they compute r, they differ wrt. the limits
placed on x and y: the first design can deal with n-bit y and x, whereas the second can only deal with smaller,
m-bit values (st. m < n).
Within this context, consider the specific case of m = n/2 (in which we assume n is even). As such, we can split x and y into two parts, i.e., write

x = x1 · 2^{n/2} + x0
y = y1 · 2^{n/2} + y0
r = r2 · 2^n + r1 · 2^{n/2} + r0
where
r2 = y1 · x1
r1 = y1 · x0 + y0 · x1
r0 = y0 · x0
demonstrates the result is correct. The more general, underlying idea is we decompose the single, larger n-bit multiplication into several, smaller n/2-bit multiplications: in this case, we compute the larger n-bit product r using four n/2-bit multiplications (plus several auxiliary additions). In a sense, this is an instance of divide-and-conquer, a strategy often used in the design of algorithms: sorting algorithms such as merge-sort and quick-sort, for example, will decompose the problem of sorting a larger sequence into that of sorting several smaller sequences. The Karatsuba-Ofman [8] (re)formulation offers further improvement, by first computing
t2 = y1 · x1
t1 = (y0 + y1 ) · (x0 + x1 )
t0 = y0 · x0
then rewrites the terms of r as
r2 = t2
r1 = t1 − t0 − t2
r0 = t0
Doing so requires three n/2-bit multiplications (although now there are more auxiliary additions and/or subtractions). This suggests a general trade-off: we could consider performing fewer, larger n-bit multiplications or more, smaller n/2-bit multiplications. If we accept the premise that designs for n/2-bit multiplication will be inherently less complex than an n-bit equivalent, this leads us to adopt one of (at least) two approaches:
1. instantiate and operate several smaller multipliers in parallel (e.g., compute y1 · x1 at the same time as
y0 · x0 ) in an attempt to reduce the overall latency,
2. instantiate and reuse one smaller multiplier (e.g., first compute y1 · x1 then y0 · x0 ) in an attempt to reduce the overall area.
Although this can be useful in the sense it widens the design space of options, making a decision whether the
original monolithic approach or the decomposed approach is better wrt. some metric can be quite subtle (and
depend delicately on the concrete value of n).
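One level of the decomposition can be sketched in C, for n = 32 with 16-bit halves (karatsuba32 is an illustrative name):

```c
#include <assert.h>
#include <stdint.h>

/* One level of Karatsuba-Ofman: three 16-bit multiplications t0, t1,
   t2 (plus auxiliary additions/subtractions) replace the four of the
   schoolbook split. */
uint64_t karatsuba32(uint32_t x, uint32_t y) {
    uint64_t x0 = x & 0xFFFF, x1 = x >> 16;
    uint64_t y0 = y & 0xFFFF, y1 = y >> 16;
    uint64_t t2 = y1 * x1;
    uint64_t t1 = (y0 + y1) * (x0 + x1);
    uint64_t t0 = y0 * x0;
    /* r2 = t2, r1 = t1 - t0 - t2, r0 = t0 */
    return (t2 << 32) + ((t1 - t0 - t2) << 16) + t0;
}
```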
1. the first case requires eight base-2 digits to represent a given y, but the fourth case can do the same with only four, and
2. the first case requires four non-zero base-2 digits to represent this y, but the fourth case can do the same with only two.
In short, features such as these, when generalised, allow more efficient strategies (in time and/or space) for
multiplication using y′ than y.
Whatever representation we select, however, it is crucial that any overhead related to producing and using the recoded y′ is always less than the associated improvement during multiplication. Put another way, if the improvement is small and the overhead is large, then overall we are worse off: we may as well not use recoding at all! This requires careful analysis, for example to judge the relative merits of a specific recoding strategy given a specific n.
where the RHS factors out powers of the indeterminate x from the LHS.
This fact provides a starting point for an iterative multiplier design. Consider the similarity between a
polynomial
a(x) = Σ_{i=0}^{i<n} ai · x^i

and an integer y expressed in base-b,

y = Σ_{i=0}^{i<n} yi · b^i .
Figure 3.11: An iterative, bit-serial design for (n × n)-bit multiplication described using a circuit diagram.
Put simply, there is no difference wrt. the form: only the names of variables are changed, plus b represents
an implicit parameter in the latter whereas x is an explicit indeterminate in the former. As a result, we can
consider a similar way of evaluating

y · x = Σ_{i=0}^{i<n} yi · x · b^i .
Example 3.28. Consider a base-2 multiplication of x by y = 14(10) 7→ 1110(2) . As previously stated we would
write
y · x = Σ_{i=0}^{i<n} yi · x · 2^i
      = y0 · x · 2^0 + y1 · x · 2^1 + y2 · x · 2^2 + y3 · x · 2^3
but now this can be rewritten using Horner’s Rule as
y·x = y0 · x + 2 · ( y1 · x + 2 · ( y2 · x + 2 · ( y3 · x + 2 · ( 0 ) ) ) )
= 0·x+2·( 1·x+2·( 1·x+2·( 1·x+2·(0))))
= 0·x+2·( 1·x+2·( 1·x+2·( 1·x+ 0 )))
= 0·x+2·( 1·x+2·( 1·x+2·( 1·x )))
= 0·x+2·( 1·x+2·( 1·x+ 2·x ))
= 0·x+2·( 1·x+2·( 3·x ))
= 0·x+2·( 1·x+ 6·x )
= 0·x+2·( 7·x )
= 0·x+ 14 · x
= 14 · x
There are two sane approaches to evaluate the bracketed expression: we either
1. work inside-out, starting with the inner-most sub-expression and processing y from most- to least-
significant bit (i.e., from yn−1 to y0 ), meaning they are read left-to-right, or
2. work outside-in, starting with the outer-most sub-expression and processing y from least- to most-
significant bit (i.e., from y0 to yn−1 ), meaning they are read right-to-left.
Either way, note that each successive multiplication by 2 eventually accumulates to produce each 2i . Using
y3 · x as an example, we see it multiplied by 2 a total of three times: this means we end up with
2 · (2 · (2 · (y3 · x))) = y3 · x · 23 ,
and hence the original term required. Putting everything together, to compute the result we maintain an
accumulator r that holds the current (or partial) result during evaluation; using r, the computation could be
described as follows:
• to initialise the evaluation, set the accumulated result to zero (i.e., set r to 0), then
• to realise each step of evaluation, apply a simple 2-part rule: first double the accumulated result (i.e., set
r to 2 · r), then add yi · x to the accumulated result (i.e., set r to r + yi · x); in short, each step computes

r ← yi · x + 2 · r.

This is further formalised by Algorithm 9: notice that line #1 realises the first point above while lines #3 to #6
realise the second point, with a loop spanning lines #2 to #7 iterating over them to realise each step. Although
we will continue to focus on this approach, it is interesting to note, as an aside, that Algorithm 10 will yield the
same result.
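As a concrete (if informal) check of the two strategies, both algorithms translate directly into Python; this is a behavioural sketch only, the function names are ours, and x and y are assumed to be non-negative integers with y representable in n bits.

```python
def mul_l2r(x, y, n):
    """Left-to-right bit-serial multiplication (per Algorithm 9): process
    y from most- to least-significant bit, doubling the accumulator r
    and then conditionally adding x in each step."""
    r = 0
    for i in range(n - 1, -1, -1):
        r = 2 * r                 # r <- 2 . r
        if (y >> i) & 1:          # y_i = 1, so ...
            r = r + x             # ... r <- r + x
    return r

def mul_r2l(x, y, n):
    """Right-to-left bit-serial multiplication (per Algorithm 10): process
    y from least- to most-significant bit, doubling x instead of r."""
    r = 0
    for i in range(n):
        if (y >> i) & 1:          # y_i = 1, so ...
            r = r + x             # ... r <- r + x
        x = 2 * x                 # x <- 2 . x
    return r

print(mul_l2r(3, 14, 4), mul_r2l(3, 14, 4))  # prints: 42 42
```

Both variants perform one shift and at most one addition per bit of y, matching the O(n) operation count argued below.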
Example 3.29. Consider the following trace of Algorithm 9, for y = 14(10) 7→ 1110(2) :
i    r       yi   r′
     0
3    0       1    x        r′ ← 2 · r + x
2    x       1    3 · x    r′ ← 2 · r + x
1    3 · x   1    7 · x    r′ ← 2 · r + x
0    7 · x   0    14 · x   r′ ← 2 · r
             14 · x
Algorithm 9 is termed the left-to-right variant, since it processes y from the most- down to the least-significant
bit (i.e., starting with yn−1 , on the left-hand end of y when written as a literal).
Example 3.30. Consider the following trace of Algorithm 10, for y = 14(10) 7→ 1110(2) :
i    r        x       yi   r′       x′
     0        x
0    0        x       0    0        2 · x    x′ ← 2 · x
1    0        2 · x   1    2 · x    4 · x    r′ ← r + x, x′ ← 2 · x
2    2 · x    4 · x   1    6 · x    8 · x    r′ ← r + x, x′ ← 2 · x
3    6 · x    8 · x   1    14 · x   16 · x   r′ ← r + x, x′ ← 2 · x
     14 · x
Algorithm 10 is termed the right-to-left variant, since it processes y from the least- up to the most-significant
bit (i.e., starting with y0 , on the right-hand end of y when written as a literal).
Whereas the left-to-right variant only updates r, the right-to-left alternative updates r and x; this may be deemed
an advantage for the former, since we only need one register (at least one that is updated in any way, vs. simply
a fixed input) rather than two. Beyond this, however, how does either strategy compare to the approach
based on repeated addition which took O(2^n) operations in the worst case? In both algorithms, the number
of operations performed is dictated by the number of loop iterations: using Algorithm 9 as an example, in
each iteration we a) always perform a shift to compute r ← 2 · r, then b) conditionally perform an addition to
compute r ← r + x (which will be required in half the iterations on average assuming a random y). In other
words we perform O(n) operations, which is now dictated by the size of y (say n = 8 or n = 32) rather than
the magnitude of y (say 2^n = 2^8 = 256 or 2^n = 2^32 = 4294967296) as it was before.
Whether we use Algorithm 9 or Algorithm 10, the general strategy is termed bit-serial multiplication
because we use the 1-bit value yi in each iteration; the remaining challenge is to translate this strategy into a
concrete design we can implement. We did something similar by translating Algorithm 3 into an iterative
design for left-shift in Figure 3.8, so we can adopt the same idea here: Figure 3.11 outlines a (partial) design that
implements the loop body (in lines #3 to #6) of Algorithm 9. Notice that, as before,
• the left-hand side shows a register to store r (i.e., the current value of r at the start of the loop body),
• the right-hand side shows a register to store r′ (i.e., the next value of r at the end of the loop body), and
• the middle shows some combinatorial logic that computes r′ from r: this is more complex than the
left-shift case, but the idea is that a) the 1-bit left-shift component computes r ≪ 1 = 2 · r, then b) the
multiplexer component selects between 2 · r and 2 · r + x (the latter of which is computed by an adder)
depending on yi .
To control this data-path, we again need an FSM: in each i-th step it will take r′ (representing yi · x + 2 · r, per
the above) and latch it back into r ready for the (i + 1)-th step. A similar trade-off is again evident, in the
sense that although we only need an adder and multiplexer (plus a register for r and the FSM), the result will
be computed after n steps.
Input: Two unsigned, n-bit, base-2 integers x and y, an integer digit size d
Output: An unsigned, 2n-bit, base-2 integer r = y · x
1 r←0
2 for i = n − 1 downto 0 step −d do
3 r ← 2d · r
4 if yi...i−d+1 ≠ 0 then
5 r ← r + yi...i−d+1 · x
6 end
7 end
8 return r
Algorithm 11: An algorithm for multiplication of base-2 integers using an iterative, left-to-right, digit-serial
strategy.
Input: Two unsigned, n-bit, base-2 integers x and y, an integer digit size d
Output: An unsigned, 2n-bit, base-2 integer r = y · x
1 r←0
2 for i = 0 upto n − 1 step +d do
3 if yi+d−1...i ≠ 0 then
4 r ← r + yi+d−1...i · x
5 end
6 x ← 2d · x
7 end
8 return r
Algorithm 12: An algorithm for multiplication of base-2 integers using an iterative, right-to-left, digit-serial
strategy.
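The left-to-right, digit-serial strategy of Algorithm 11 can be sketched in Python as follows; this is a behavioural model only (the digit extraction and shift mirror the hardware), and the name and parameters are our choices.

```python
def mul_digit_serial_l2r(x, y, n, d):
    """Left-to-right digit-serial multiplication (per Algorithm 11):
    y is processed d bits at a time, so n should be a multiple of d."""
    assert n % d == 0
    r = 0
    for i in range(n - 1, -1, -d):
        r = (2 ** d) * r                           # r <- 2^d . r, i.e., r << d
        digit = (y >> (i - d + 1)) & (2 ** d - 1)  # extract y_{i...i-d+1}
        if digit != 0:
            r = r + digit * x                      # the (d x n)-bit multiplication
    return r

print(mul_digit_serial_l2r(6, 30, 8, 2))  # prints: 180
```

Setting d = 1 recovers the bit-serial behaviour of Algorithm 9, while d = n collapses the loop to a single step.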
Example 3.31. Consider d = 2, and some y st. n = 4: this implies we process n/d = 4/2 = 2 digits in y, each of 2 bits.
Based on what we covered originally, we already know, for example, that in base-2
r = y · x = ( Σ_{i=0}^{n−1} yi · 2^i ) · x
          = Σ_{i=0}^{n−1} yi · x · 2^i
          = y0 · x · 2^0 + y1 · x · 2^1 + y2 · x · 2^2 + y3 · x · 2^3
          = y0 · x + 2 · (y1 · x + 2 · (y2 · x + 2 · (y3 · x + 2 · (0))))
The only change is to combine y0 and y1 into a single digit whose value is y0 + 2 · y1 ; this is basically just treating
the two 1-bit digits in y as one 2-bit digit. By doing so, we can rewrite the expression as follows:
r = y · x = (y0 + 2 · y1 ) · x · 2^0 + (y2 + 2 · y3 ) · x · 2^2
          = y1...0 · x · 2^0 + y3...2 · x · 2^2
          = y1...0 · x + 2^2 · (y3...2 · x + 2^2 · (0))
The term y1...0 should be read as “the bits of y from 1 down to 0 inclusive”, so clearly y1...0 ∈ {0, 1, 2, 3}. As such,
consider a base-2 multiplication of x by y = 14(10) 7→ 1110(2) :
To implement this new strategy, however, Algorithm 9 needs to be generalised for any d. Recall that for the
special case of d = 1, we already saw and used a rule

r ← yi · x + 2 · r.

By analogy, a generalised rule

r ← yi...i−d+1 · x + 2^d · r

can be identified, which differs slightly in both left- and right-hand terms of the addition.
• The right-hand term is simple to accommodate. Rather than multiply r by 2 as before, we now multiply
it by 2^d; we already know this can be realised by left-shifting r by a distance of d, i.e., computing
2^d · r ≡ r ≪ d.
• The left-hand term is more tricky. For d = 1, we needed to compute yi · x but argued doing so was
essentially a choice: because yi ∈ {0, 1}, the result is either 0 or x. Now, each d-bit digit

yi...i−d+1 ∈ {0, 1, . . . , 2^d − 1}

could be any one of 2^d values rather than 2^1 = 2, so either a) the choice is more involved, i.e., includes
more cases, or b) we abandon the idea of it being a choice at all, instead using a combinatorial (d × n)-bit
multiplier to compute yi...i−d+1 · x directly (related designs are covered in Section 3.5.4); you can view this
component as replacing the multiplexer shown in Figure 3.11, which, by analogy, realised the (1 × n)-bit
multiplication yi · x.
Making these changes yields Algorithm 11. Note that line #4 could be implemented via either option above,
and that, as outlined above, extracting the digit yi...i−d+1 from y is simple enough that we view it as happening
during multiplication, ignoring the need to formally recode y into y′ beforehand. Either way, a clear advantage
is already evident: we now require n/d steps to compute the result.
Example 3.32. Consider the following trace of Algorithm 11, for y = 14(10) 7→ 1110(2) :

i    r       yi...i−d+1       r′
     0
3    0       11(2) = 3(10)    3 · x    r′ ← 2^2 · r + 3 · x
1    3 · x   10(2) = 2(10)    14 · x   r′ ← 2^2 · r + 2 · x
     14 · x
Assuming use of a combinatorial (d × n)-bit multiplier, one way to think about a digit-serial multiplier is as a
hybrid combination of iterative and combinatorial designs: it is iterative, in that it performs n/d steps, but now
each i-th such step utilises a (d × n)-bit combinatorial multiplier component. Given we can select d, the hybrid
can be configured to make a trade-off between time and space: larger d implies fewer steps of computation but
also a larger combinatorial multiplier, and vice versa.
A better strategy exists, however: remember that computing a subtraction is (more or less) as easy as an
addition, so we might instead opt for

r = 15 · x = 16 · x − 1 · x
           = 2^4 · x − 2^0 · x
           = x ≪ 4 − x ≪ 0

Intuitively, this latter strategy should seem preferable given we only sum two terms rather than four. Booth
recoding [2] is a standard recoding-based strategy for multiplication which generalises this example. Although
various versions of the approach are considered in what follows, the advantages they all offer stem from use
of a signed representation of y and hence use of addition and subtraction operations.
Definition 3.5. Given a binary sequence y, a run of 1 (resp. 0) bits between i and j means yk = 1 (resp. yk = 0) for
i ≤ k ≤ j; in simple terms, this means there is a sub-sequence of consecutive bits in y whose value is 1 (resp. 0).
As a starting point we consider base-2 Booth recoding; the idea is to identify a run of 1 bits in y between i and
j, and then replace it with a single digit whose weight is 2^(j+1) − 2^i.
Example 3.34. Consider y = 30(10) 7→ 00011110(2) : since there is a run of four 1 bits between i = 1 and j = 4, and
the fact that

2^(j+1) − 2^i = 2^(4+1) − 2^1 = 2^5 − 2^1 = 30,

we can recode

y = 30(10)
  7→ 2^4 + 2^3 + 2^2 + 2^1
  7→ ⟨0, +1, +1, +1, +1, 0, 0, 0⟩(2)

into

y′ = ⟨0, −1, +0, +0, +0, +1, 0, 0⟩(2)
   7→ −2^1 + 2^5
   7→ 30(10)
which clearly still represents the same value (albeit now via a signed digit set, st. y′i ∈ {0, ±1} vs. yi ∈ {0, 1}).
Using the same intuition as previously, the recoded y′ is preferable to y because it has a lower weight (i.e.,
number of non-zero digits). We can see the impact this feature has by illustrating how such a y′ might be used
during multiplication. Given x = 6(10) 7→ 00000110(2) , for example, we would normally compute r = y · x as
x = 6(10)  7→ 0 0 0 0 0 1 1 0
y = 30(10) 7→ 0 0 0 1 1 1 1 0 ×
p0 =  0 · x · 2^0 =   0(10)  7→ 0 0 0 0 0 0 0 0
p1 = +1 · x · 2^1 = +12(10)  7→ 0 0 0 0 0 1 1 0
p2 = +1 · x · 2^2 = +24(10)  7→ 0 0 0 0 0 1 1 0
p3 = +1 · x · 2^3 = +48(10)  7→ 0 0 0 0 0 1 1 0
p4 = +1 · x · 2^4 = +96(10)  7→ 0 0 0 0 0 1 1 0
p5 =  0 · x · 2^5 =   0(10)  7→ 0 0 0 0 0 0 0 0
p6 =  0 · x · 2^6 =   0(10)  7→ 0 0 0 0 0 0 0 0
p7 =  0 · x · 2^7 =   0(10)  7→ 0 0 0 0 0 0 0 0
r = 180(10) 7→ 0 0 0 0 0 0 0 0 1 0 1 1 0 1 0 0
and thus accumulate four non-zero partial products. However, by first recoding y into y′ we find
x = 6(10)  7→ 0 0 0 0 0 1 1 0
y = 30(10) 7→ 0 0 0 1 1 1 1 0 ×
y′(2) = 30(10) 7→ 0 0 +1 0 0 0 −1 0
p0 =  0 · x · 2^0 =    0(10)  7→ 0 0 0 0 0 0 0 0
p1 = −1 · x · 2^1 =  −12(10)  7→ 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0
p2 =  0 · x · 2^2 =    0(10)  7→ 0 0 0 0 0 0 0 0
p3 =  0 · x · 2^3 =    0(10)  7→ 0 0 0 0 0 0 0 0
p4 =  0 · x · 2^4 =    0(10)  7→ 0 0 0 0 0 0 0 0
p5 = +1 · x · 2^5 = +192(10)  7→ 0 0 0 0 0 1 1 0
p6 =  0 · x · 2^6 =    0(10)  7→ 0 0 0 0 0 0 0 0
p7 =  0 · x · 2^7 =    0(10)  7→ 0 0 0 0 0 0 0 0
r = 180(10) 7→ 0 0 0 0 0 0 0 0 1 0 1 1 0 1 0 0
Modified, base-4 Booth recoding A base-2 Booth recoding already seems to produce what we want. However,
there is a subtle problem: using y′ does not always yield an improvement over y itself. This can be demonstrated
by example:
Example 3.35. Consider x = 6(10) 7→ 00000110(2) and y = 5(10) 7→ 00000101(2) , which, based on recoding y, we
would compute r = y · x as

x = 6(10) 7→ 0 0 0 0 0 1 1 0
y = 5(10) 7→ 0 0 0 0 0 1 0 1 ×
y′(2) = 5(10) 7→ 0 0 0 0 +1 −1 +1 −1
p0 = −1 · x · 2^0 =  −6(10) 7→ 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0
p1 = +1 · x · 2^1 = +12(10) 7→ 0 0 0 0 0 1 1 0
p2 = −1 · x · 2^2 = −24(10) 7→ 1 1 1 1 1 1 1 1 1 1 1 0 1 0
p3 = +1 · x · 2^3 = +48(10) 7→ 0 0 0 0 0 1 1 0
p4 =  0 · x · 2^4 =   0(10) 7→ 0 0 0 0 0 0 0 0
p5 =  0 · x · 2^5 =   0(10) 7→ 0 0 0 0 0 0 0 0
p6 =  0 · x · 2^6 =   0(10) 7→ 0 0 0 0 0 0 0 0
p7 =  0 · x · 2^7 =   0(10) 7→ 0 0 0 0 0 0 0 0
r = 30(10) 7→ 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0
This requires accumulation of four non-zero partial products, as would be the case if using y as is.
Based on the original Booth recoding as a first step, to resolve this problem we employ a second recoding step
based on the y′ (2) we already have:
1. reading y′ right-to-left, group the recoded digits into pairs of the form (y′i , y′i+1 ), then
2. treat each pair as a single digit whose value is y′i + 2 · y′i+1 per
y′i = 0    y′i+1 = 0    7→  0
y′i = +1   y′i+1 = 0    7→ +1
y′i = −1   y′i+1 = 0    7→ −1
y′i = 0    y′i+1 = +1   7→ +2
y′i = +1   y′i+1 = +1   7→ not possible
y′i = −1   y′i+1 = +1   7→ +1
y′i = 0    y′i+1 = −1   7→ −2
y′i = +1   y′i+1 = −1   7→ −1
y′i = −1   y′i+1 = −1   7→ not possible
Given we originally had a signed base-2 recoding of y, we now have a signed base-4 recoding of the same y
(termed the modified Booth recoding): each pair represents a digit in {0, ±1, ±2}. Note that the two invalid (or
impossible) pairs exist because of the original Booth recoding: we cannot encounter them, because the first
recoding step will have already eliminated the associated run.
Example 3.36. Consider x = 6(10) 7→ 00000110(2) and y = 5(10) 7→ 00000101(2) ; based on the modified recoding,
we would compute r = y · x as
x = 6(10) 7→ 0 0 0 0 0 1 1 0
y = 5(10) 7→ 0 0 0 0 0 1 0 1 ×
y′(2) = 5(10) 7→ 0 0 0 0 +1 −1 +1 −1
y′(4) = 5(10) 7→ +1 +1
p0 = +1 · x · 2^0 =  +6(10) 7→ 0 0 0 0 0 1 1 0
p2 = +1 · x · 2^2 = +24(10) 7→ 0 0 0 0 0 1 1 0
p4 =  0 · x · 2^4 =   0(10) 7→ 0 0 0 0 0 0 0 0
p6 =  0 · x · 2^6 =   0(10) 7→ 0 0 0 0 0 0 0 0
r = 30(10) 7→ 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0
and thus accumulate two non-zero partial products rather than four.
An algorithm for Booth-based recoding Both the first and second recoding steps above are still presented in
a somewhat informal manner, because the goal was to demonstrate the idea; to make use of them in practice,
we obviously need an algorithm. Fortunately, such an algorithm is simple to construct: notice that in a base-2
Booth recoding each digit y′i depends only on yi and yi−1, and since these digits are paired to form the base-4
Booth recoding, each digit in that recoding depends on yi−1, yi, and yi+1. Thanks to this observation, the
recoding process is easier than it may appear: assuming suitable padding of y (i.e., yj = 0 for j < 0 and j ≥ n),
we can produce digits of y′ from a 2- or 3-bit sub-sequence (or window) of bits in y via
Unsigned        Signed        Signed
base-2          base-2        base-4
yi+1 yi yi−1    y′i+1  y′i    y′i/2
 0   0   0        0     0       0
 0   0   1        0    +1      +1
 0   1   0       +1    −1      +1
 0   1   1       +1     0      +2
 1   0   0       −1     0      −2
 1   0   1       −1    +1      −1
 1   1   0        0    −1      −1
 1   1   1        0     0       0
Algorithm 13 and Algorithm 14 capture these rules in algorithms that produce base-2 and base-4 recodings of
a given y respectively. Crucially, one can unroll the loop to produce a combinatorial circuit. For Algorithm 14,
say, one would replicate a single recoding cell: each instance of the cell would accept three bits of y as input
(namely yi+1, yi, and yi−1) and produce a digit of the recoding as output. This implies that the recoding could
be performed during rather than before the subsequent multiplication; the only significant overhead relates to
increased area.
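The windowed rule tabulated above can be checked with a short Python model; the closed form digit = y_{i−1} + y_i − 2 · y_{i+1} reproduces the signed base-4 column, while the function name and digit ordering (least-significant first) are our own choices.

```python
def booth_recode_base4(y, n):
    """Produce the signed base-4 (modified) Booth recoding of an n-bit y,
    least-significant digit first; the 3-bit window rule tabulated above
    collapses to digit = y_{i-1} + y_i - 2 . y_{i+1}."""
    def bit(j):
        # suitable padding: y_j = 0 for j < 0 and j >= n
        return (y >> j) & 1 if 0 <= j < n else 0
    # one window per even i; the extra window at i = n absorbs a final carry
    return [bit(i - 1) + bit(i) - 2 * bit(i + 1) for i in range(0, n + 1, 2)]

# each digit d_k has weight 4^k, so the recoding still represents y
ds = booth_recode_base4(14, 4)
print(ds, sum(d * 4 ** k for k, d in enumerate(ds)))  # prints: [-2, 0, 1] 14
```

For y = 14(10) 7→ 1110(2) this yields the digits ⟨−2, 0, +1⟩ used in the trace below.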
An algorithm for Booth-based multiplication Finally, we can address the problem of using the recoded
multiplier to actually perform the multiplication above: ideally this should be more efficient than the bit-serial
starting point. Algorithm 15 captures the result, which one can think of as a form of digit-serial multiplier:
each iteration of the loop processes a digit of a recoding formed from multiple bits in y.
1. In Algorithm 9, |y| = n dictates the number of loop iterations; Algorithm 11 improves this to n/d for
appropriate choices of d. In comparison, Algorithm 15 requires fewer, i.e.,

|y′| ≃ |y|/2 ≃ n/2,
Input: An unsigned, n-bit, base-2 integer x, and a base-4 Booth recoding y′ of some integer y
Output: An unsigned, 2n-bit, base-2 integer r = y · x
1 r←0
2 for i = |y′| − 1 downto 0 step −1 do
3     r ← 2^2 · r
4     r ← r − 2 · x if y′i = −2
          r − 1 · x if y′i = −1
          r + 1 · x if y′i = +1
          r + 2 · x if y′i = +2
5 end
6 return r
Algorithm 15: An algorithm for multiplication of base-2 integers using an iterative, left-to-right, digit-serial
strategy with base-4 Booth recoding.
iterations. As with the digit-serial strategy, to allow this to work, we need to compute 2^2 · r in line #3
(rather than 2 · r), but this can be realised by left-shifting r by a distance of 2, i.e., computing 2^2 · r ≡ r ≪ 2.
2. In Algorithm 9 we had yi ∈ {0, 1} and in Algorithm 11 we had yi...i−d+1 ∈ {0, 1, . . . , 2^d − 1}. In Algorithm 15,
however, we have y′i ∈ {0, ±1, ±2}. This basically means we have to test each non-zero y′i against more
cases than before: line #4 captures them in one rather than use a more lengthy set of conditions. In short,
dealing with y′i = −1 vs. y′i = +1 is easy: we simply subtract x from r rather than adding x to r. In the
same way, dealing with y′i = −2 and y′i = +2 means subtracting (resp. adding) 2 · x from (resp. to) r; since
2 · x can be computed via a shift of x (vs. an extra addition), there is no real overhead vs. subtracting
(resp. adding) x itself.
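Algorithm 15, together with the recoding it consumes, can be modelled in Python; this is a behavioural sketch with names of our choosing, and the recoding digits are produced inline so the function is self-contained.

```python
def booth_mul(x, y, n):
    """Multiply x by an n-bit y via its base-4 Booth recoding, processing
    the recoded digits left-to-right (per Algorithm 15)."""
    def bit(j):
        return (y >> j) & 1 if 0 <= j < n else 0
    # recode: one digit in {0, +-1, +-2} per window (y_{i+1}, y_i, y_{i-1})
    digits = [bit(i - 1) + bit(i) - 2 * bit(i + 1) for i in range(0, n + 1, 2)]
    r = 0
    for d in reversed(digits):    # most-significant digit first
        r = (2 ** 2) * r          # r <- 2^2 . r, i.e., r << 2
        if d != 0:
            r = r + d * x         # add or subtract x or 2 . x
    return r

print(booth_mul(3, 14, 4))  # prints: 42
```

Note that negative digits simply subtract x or 2 · x, reflecting the signed digit set.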
i   yi+1 yi yi−1   t2 t1 t0   t        y′
                                       ∅
0    1   0   ⊥      1  0  0   100(2)   ⟨−2⟩
2    1   1   1      1  1  1   111(2)   ⟨−2, 0⟩
4    ⊥   ⊥   1      0  0  1   001(2)   ⟨−2, 0, +1⟩
                                       ⟨−2, 0, +1⟩

i   y′i   r       r′
          0
2   +1    0       x        r′ ← 2^2 · r + 1 · x
1    0    x       4 · x    r′ ← 2^2 · r
0   −2    4 · x   14 · x   r′ ← 2^2 · r − 2 · x
          14 · x
Input: Two unsigned, n-bit, base-2 integers x and y, an integer digit size d
Output: An unsigned, 2n-bit, base-2 integer r = y · x
1 r←0
2 for i = 0 upto n − 1 step +d do
3 if yd−1...0 ≠ 0 then
4 r ← r + yd−1...0 · x
5 end
6 x ← x · 2^d
7 y ← y/2^d
8 if y = 0 then
9 return r
10 end
11 end
12 return r
Algorithm 16: An algorithm for multiplication of base-2 integers using an iterative, right-to-left, digit-serial
strategy with early termination.
Algorithm 11

i   r       yi...i−d+1       r′
    0
7   0       00(2) = 0(10)    0        r′ ← 2^2 · r
5   0       01(2) = 1(10)    1 · x    r′ ← 2^2 · r + 1 · x
3   1 · x   11(2) = 3(10)    7 · x    r′ ← 2^2 · r + 3 · x
1   7 · x   10(2) = 2(10)    30 · x   r′ ← 2^2 · r + 2 · x
    30 · x

Algorithm 12

i   r        x         yi+d−1...i      r′       x′
    0        x
0   0        x         10(2) = 2(10)   2 · x    2^2 · x   r′ ← r + 2 · x, x′ ← 2^2 · x
2   2 · x    2^2 · x   11(2) = 3(10)   14 · x   2^4 · x   r′ ← r + 3 · x, x′ ← 2^2 · x
4   14 · x   2^4 · x   01(2) = 1(10)   30 · x   2^6 · x   r′ ← r + 1 · x, x′ ← 2^2 · x
6   30 · x   2^6 · x   00(2) = 0(10)   30 · x   2^8 · x   r′ ← r + 0 · x, x′ ← 2^2 · x
    30 · x

respectively; both produce r = 30 · x as expected.
Notice that the 2 MSBs of y are both 0, i.e., y7 = y6 = 0 st. y7...6 = 0 and hence y′3 = 00(2) . This fact can
be harnessed to optimise both algorithms. Algorithm 11 processes y′ left-to-right so y′3 is the first digit: the
iteration where i = 7 extracts the digit y7...6 = 0, meaning the associated update of r can be skipped.
The same argument applies here, in the sense we can skip the associated update of r. In fact, we can be more
aggressive by skipping multiple such updates. If in some i-th iteration the digits processed by all j-th iterations
for j > i are zero, then we may as well stop: none of them will update r, meaning the algorithm can return
it early as is (rather than perform extra iterations). This strategy is normally termed early termination; using
Algorithm 12 as a starting point, it is realised by Algorithm 16.
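A behavioural Python model of Algorithm 16 (names are ours) makes the early-termination test explicit:

```python
def mul_early_term(x, y, n, d):
    """Right-to-left digit-serial multiplication with early termination
    (per Algorithm 16): once the remaining bits of y are all zero, no
    later iteration can change r, so we return immediately."""
    assert n % d == 0
    r = 0
    for _ in range(n // d):
        digit = y & (2 ** d - 1)   # extract y_{d-1...0}
        if digit != 0:
            r = r + digit * x
        x = x * (2 ** d)           # x <- x . 2^d
        y = y >> d                 # y <- y / 2^d
        if y == 0:
            return r               # early termination
    return r

print(mul_early_term(6, 30, 8, 2))  # prints: 180
```

Because y is shifted right each iteration, testing it against zero costs only an n-bit zero-detect, which is where the extra FSM complexity discussed below arises.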
Example 3.39. Consider the following trace of Algorithm 16 for y = 30(10) 7→ 00011110(2) :
i   r        x         y              yd−1...0        r′       x′        y′
    0        x         00011110(2)
0   0        x         00011110(2)    10(2) = 2(10)   2 · x    2^2 · x   00000111(2)   r′ ← r + 2 · x, x′ ← x · 2^2 , y′ ← y/2^2
2   2 · x    2^2 · x   00000111(2)    11(2) = 3(10)   14 · x   2^4 · x   00000001(2)   r′ ← r + 3 · x, x′ ← x · 2^2 , y′ ← y/2^2
4   14 · x   2^4 · x   00000001(2)    01(2) = 1(10)   30 · x   2^6 · x   00000000(2)   r′ ← r + 1 · x, x′ ← x · 2^2 , y′ ← y/2^2
    30 · x
Once r, x, and y have been updated within the iteration for i = 4, we find y′ = 0: this triggers the conditional
statement, meaning r is returned early after three (via line #9) vs. four (via line #12) iterations: the correct result
r = 30 · x is produced as expected.
Although this should seem attractive, some trade-offs and caveats apply. First, the loop body, spanning lines
#3 to #10 of Algorithm 16, is obviously more complex than the equivalent in Algorithm 12. Specifically, r, x,
and y all need to be updated, and the FSM controlling iteration needs to test y and conditionally return r: this
makes it more complex as well. Second, this added complexity, which typically means an increased area, only
potentially (rather than definitively) reduces the latency of multiplication. Put simply, the number of iterations
now depends on the value of y (i.e., whether y′ contains more-significant digits that are 0 st. the algorithm can
skip them), which we cannot know a priori: if this property does not hold, the algorithm will be no better than
standard digit-serial multiplication.
Developing a design directly from this expression is surprisingly easy: we just need to generate each term,
which represents a partial product, then add them up. Figure 3.13 is a (combinatorial) tree multiplier whose
design stems from this idea. It can be viewed, from top-to-bottom, as three layers:
1. The top layer is comprised of n groups of n AND gates: the i-th group computes xj ∧ yi for 0 ≤ j < n,
meaning it outputs either 0 if yi = 0 or x if yi = 1. You can think of the AND gates as performing all n^2
possible (1 × 1)-bit multiplications of some xj and yi, or a less general form of multiplexer that selects
between 0 and x based on yi.
2. The middle layer is comprised of n left-shift components. The i-th component shifts by a fixed distance
of i bits, meaning the output is either 0 if yi = 0, or x · 2^i if yi = 1. Put another way, the output of the i-th
component in the middle layer is

yi · x · 2^i

i.e., some i-th partial product in the long-hand description of y · x.
3. The bottom layer is a balanced, binary tree of adder components: these accumulate the partial products
resulting from the middle layer, meaning the output is

r = Σ_{i=0}^{n−1} yi · x · 2^i = y · x

as required.
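The three layers can be mimicked in Python; this is a behavioural model only (not gate-level), and in hardware the additions in the final loop happen in parallel across the tree.

```python
def tree_mul(x, y, n):
    """Behavioural model of the tree multiplier: an AND layer selects
    partial products, a shift layer weights them, and a balanced tree of
    additions accumulates them."""
    # top layer: y_i ? x : 0, i.e., all the (1 x n)-bit products
    pp = [x if (y >> i) & 1 else 0 for i in range(n)]
    # middle layer: fixed left-shifts give y_i . x . 2^i
    pp = [p << i for i, p in enumerate(pp)]
    # bottom layer: pairwise addition, halving the list each round
    while len(pp) > 1:
        pairs = [pp[j] + pp[j + 1] for j in range(0, len(pp) - 1, 2)]
        pp = pairs + ([pp[-1]] if len(pp) % 2 else [])
    return pp[0]

print(tree_mul(6, 30, 8))  # prints: 180
```

Each round of the while loop corresponds to one level of the adder tree, so there are log2(n) such levels.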
In Section 3.5.2, both iterative multiplier designs we produced made a trade-off: they required O(n) time and
O(1) space, thus representing high(er) latency but low(er) area. Here we have more or less the exact opposite
trade-off. The design is combinatorial, so takes O(1) time where the constant involved basically represents the
critical path. However, it clearly takes a lot more space; it is difficult to state formally how much, but the fact
the design includes a tree of several adders vs. one adder hints this could be significant.
Beyond this comparison, it is important to consider various subtleties that emerge if the block diagram is
implemented as a concrete circuit. First, notice that the critical path looks like O(log2 (n)) gate delays, because
this describes the depth of the (balanced) tree as used to form the bottom layer. However, because each node in
said tree is itself an adder, the actual critical path is more like O(n log2 (n)). Even this turns out to be optimistic:
notice, second, that those adders lower-down in the tree (i.e., closer to the root) must be larger (and hence
more complex) than those higher-up. This is simply because the intermediate results get larger; the first level
adds two n-bit partial products to produce an (n + 1)-bit intermediate result, whereas the last level adds two
(2n − 1)-bit intermediate values to produce the 2n-bit result.
1. an initial layer,
2. O(log n) layers of reduction, and
3. a final layer
where the difference is, basically, how those layers are designed. The initial layer generates the partial products,
then the second and third layers accumulate them; this is somewhat similar to the tree multiplier. However,
rather than perform the latter using a tree of general-purpose adders, a carefully designed, special-purpose
tree is employed. Producing a design for Wallace and Dadda multipliers follows a different process
than we have used before. Rather than develop an algorithm then translate it into a design, the multipliers are
generated directly by an algorithm. Given a value of n as input, Algorithm 17 and Algorithm 18 generate
Wallace and Dadda multipliers respectively; both are described in three steps that mirror the layers above.
Example 3.40. Consider n = 4, where we want to produce a Wallace multiplier design that computes the
product r = y · x for 4-bit x and y; to do so, we use Algorithm 17.
An initial layer multiplies x j with yi for 0 ≤ i, j < 4, st. we produce
Figure 3.12a details the subsequent two reduction layers. For example, in the first reduction layer
• there is one input wire with weight-0, so we use a pass-through operation (denoted PT) which results in
one weight-0 wire as output,
• there are two input wires with weight-1, so we use a half-adder operation (denoted HA) which results in
one weight-1 wire and one weight-2 wire as output, and
• there are three input wires with weight-2, so we use a full-adder operation (denoted FA) which results in
one weight-2 wire and one weight-3 wire as output.
1. In the initial layer, multiply (i.e., AND) together each xj with each yi to produce a total of n^2
intermediate wires. Recall each wire (essentially the result of a 1-bit digit-multiplication) has a weight
stemming from the digits in x and y, e.g., x0 · y0 has weight 0, x1 · y2 has weight 3 and so on.
2. Reduce the number of intermediate wires using layers composed of full and half adders:
• Combine any three wires with the same weight using a full-adder; the result in the next layer is one
wire of the same weight (i.e., the sum) and one wire of a higher weight (i.e., the carry).
• Combine any two wires with the same weight using a half-adder; the result in the next layer is one
wire of the same weight (i.e., the sum) and one wire of a higher weight (i.e., the carry).
• If there is only one wire with a given weight, just pass it through to the next layer.
3. In the final layer, after enough reduction layers, there will be at most two wires of any given weight:
merge the wires to form two 2n-bit values (padding as required), then add them together with an adder
component.
1. In the initial layer, multiply (i.e., AND) together each xj with each yi to produce a total of n^2
intermediate wires. Recall each wire (essentially the result of a 1-bit digit-multiplication) has a weight
stemming from the digits in x and y, e.g., x0 · y0 has weight 0, x1 · y2 has weight 3 and so on.
2. Reduce the number of intermediate wires using layers composed of full and half adders:
• Combine any three wires with the same weight using a full-adder; the result in the next layer is one
wire of the same weight (i.e., the sum) and one wire of a higher weight (i.e., the carry).
• If there are two wires with the same weight left, let w be that weight; then:
– If w ≡ 2 (mod 3) then combine the wires using a half-adder; the result in the next layer is one
wire of the same weight (i.e., the sum) and one wire of a higher weight (i.e., the carry).
– Otherwise, just pass them through to the next layer.
• If there is only one wire with a given weight, just pass it through to the next layer.
3. In the final layer, after enough reduction layers, there will be at most two wires of any given weight:
merge the wires to form two 2n-bit values (padding as required), then add them together with an adder
component.
                 Layer 1                       Layer 2
Weight   Input   Operation   Output    Input   Operation   Output
         wires               wires     wires               wires
  0        1        PT         1         1        PT         1
  1        2        HA         1         1        PT         1
  2        3        FA         2         2        HA         1
  3        4        FA         3         3        FA         2
  4        3        FA         2         2        HA         2
  5        2        HA         2         2        HA         2
  6        1        PT         2         2        HA         2
  7        0                   0         0                   1
(a) Using a Wallace-based multiplier.
                 Layer 1                       Layer 2
Weight   Input   Operation   Output    Input   Operation   Output
         wires               wires     wires               wires
  0        1        PT         1         1        PT         1
  1        2        PT         2         2        PT         2
  2        3        FA         1         1        PT         1
  3        4        FA         3         3        FA         1
  4        3        FA         2         2        HA         2
  5        2        PT         3         3        FA         2
  6        1        PT         1         1        PT         2
  7        0                   0         0                   0
(b) Using a Dadda-based multiplier.
Figure 3.12: A tabular description of stages in example (4 × 4)-bit Wallace and Dadda tree multiplier designs.
The resulting design, including the final layer, is illustrated by Figure 3.14.
Notice that n is the only input to the algorithm(s), so although the example is specific to n = 4 the general
structure will remain similar. In fact, the example highlights some important general points:
• we have 1 initial layer, log2 (n) = log2 (4) = 2 reduction layers, and 1 final layer,
• the reduction layers yield at most two wires with a given weight; we form, then sum, two 2n-bit values
(e.g., using a ripple-carry adder) to produce the result, and, crucially,
• there are no intra-layer carries in the reduction layer(s): the only carry chains that appear are inter-layer,
during reduction, or in the final layer.
Phrased as such, it should be clear why the concept of carry-save addition is relevant: the reduction and final
layers employ essentially the same concept, by compressing many inputs into few(er) outputs until the point
they can be summed to produce the result. If you look again at Algorithm 17 and Algorithm 18, the difference
between the two is within the second step: in the Dadda design, the number of wires of a given weight remains,
by-design, close to a multiple of three, which facilitate use of 3 : 2 compressors as a means of reduction. As
hinted at, a crucial feature of both designs is that each adder cell within the reduction layer(s) operates in
parallel so has an O(1) critical path; this suggests the overall critical path will be O(1 + log2 (n) + n) = O(n) gate
delays in both cases. Comparing Figure 3.12a with Figure 3.12b, we see the main difference is wrt. space not
time. More specifically, for n = 4 the Wallace multiplier uses 6 half-adders and 4 full-adders, and the Dadda
multiplier would use 1 half-adder and 5 full adders; this is a trend that holds for larger n.
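The wire bookkeeping performed by the reduction layers of Algorithm 17 can be simulated in Python; this models only the greedy full-adder/half-adder/pass-through rule (the function name and data layout are ours).

```python
from collections import defaultdict

def wallace_layer(counts):
    """One Wallace reduction layer: per weight, combine wires three at a
    time with full-adders, a leftover pair with a half-adder, and pass a
    single leftover wire through; carries land at the next-higher weight."""
    nxt = defaultdict(int)
    for w, c in counts.items():
        fa, rem = divmod(c, 3)      # full-adders, then ...
        ha, pt = divmod(rem, 2)     # ... a half-adder, then a pass-through
        nxt[w] += fa + ha + pt      # sums (and pass-throughs) keep weight w
        nxt[w + 1] += fa + ha       # carries move to weight w + 1
    return dict(nxt)

# n = 4: the initial layer yields 1, 2, 3, 4, 3, 2, 1 wires at weights 0..6
counts = {0: 1, 1: 2, 2: 3, 3: 4, 4: 3, 5: 2, 6: 1}
counts = wallace_layer(counts)      # cf. layer 1 of Figure 3.12a
counts = wallace_layer(counts)      # cf. layer 2: at most 2 wires per weight
print(max(counts.values()))         # prints: 2
```

After two layers every weight carries at most two wires, at which point the final ripple-carry addition can proceed.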
Figure 3.13: An (n × n)-bit tree multiplier design, described using a circuit diagram.
Figure 3.14: An example, (4 × 4)-bit Wallace-based tree multiplier design, described using a circuit diagram.
Example 3.41. The ARM Cortex-M0 [4] processor supports multiplication via the muls instruction: it yields
a truncated result, st. r = x · y is the least-significant 32 bits of the actual product given 32-bit x and y. The
instruction can be supported in two ways by a given implementation: the design can be combinatorial,
requiring 1 cycle, or iterative, requiring 32 cycles. The Cortex-M0 is typically deployed in an embedded
context, where area and power consumption are paramount. The latter, iterative multiplier design may
therefore be an attractive choice: assuming increased latency (or time) can be tolerated, it satisfies the goal of
minimising area (or space) associated with this component of the associated ALU.
Example 3.42. The ARM7TDMI [1] processor houses an (8 × 32)-bit combinatorial multiplier; it supports digit-
serial multiplication (for the fixed case where d = 8) with early termination, as invoked by a range of instructions
including umull. One must assume ARM selected this design based on careful analysis. For instance, it seems
fair to claim that
• using a digit-serial multiplier makes a good trade-off between time and space (due to the hybrid,
combinatorial and iterative nature), which is particularly important for embedded processors, plus
• although early termination adds some overhead, it often produces a reduction in latency because of the
types of y used: quite a significant proportion will relate to address arithmetic, where y is (relatively)
small (e.g., as used to compute offsets from a fixed base address).
Example 3.43. Thomas and Balatoni [12] describe a (12 × 12)-bit multiplier design, intended for use in the PDP-8
computer: the design is based on an iterative strategy that makes use of Booth recoding.
Example 3.44. The MIPS R4000 [6] processor takes a somewhat similar, somewhat dissimilar approach to that
described here: it houses a Booth-based multiplier using exactly the recoding strategy described, but within a
(64 × 64)-bit combinatorial rather than an iterative design.
Mirapuri et al. [6, Page 13] detail the design, which splits the multiplication into Booth recoding, multi-
plicand selection, partial product generation and product accumulation steps. A series of carry-save adders
accumulate the partial products, which produces a result r that is stored into two 64-bit registers called hi and
lo (meaning more- and less-significant 64-bit halves).
Example 3.45. An iterative, bit-serial multiplier requires n steps to compute the product; with no further
optimisation, this constraint is inherent in the design. Although the data-path required is minimal, the need
for iterative use of that data-path demands a control-path (i.e., an FSM) of some sort. When placed in a
micro-processor, a resulting question is why we bother having a dedicated multiplier at all: why not just have
an instruction that performs one step of multiplication, and let the program make iterative use of it?
The MIPS-X processor [3] provides a concrete example of this approach: using a slightly rephrased notation
to match what has been described here, [3, Section 4.4.4] basically defines

mstep GPR[x], GPR[y], GPR[r] ↦

    if GPR[y]31 = 1 then
        GPR[r] ← GPR[r] + GPR[x]
        GPR[y] ← GPR[y] ≪ 1
    else
        GPR[r] ← GPR[r]
        GPR[y] ← GPR[y] ≪ 1
    end

i.e., a multiply-step instruction essentially matching lines #3 to #6 in Algorithm 9. As such, the idea is to
implement a loop that iterates over mstep as described in [3, Appendix IV]. The reason y is left-shifted is so
that one can test GPR[y]31 rather than GPR[y]i ; the former is updated by the shift, st. in each i-th iteration it
does contain GPR[y]i as required (given iteration is left-to-right, so starts with i = 31 and ends with i = 0).
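A behavioural model makes the left-to-right iteration concrete. In the sketch below (the function name is my own), each step doubles the running result, conditionally adds x, and left-shifts y; note the doubling is taken from the generic left-to-right algorithm, an assumption here, since the mstep definition above shows only the conditional add and the shift of y.

```python
# A sketch of MSB-first (left-to-right) shift-and-add multiplication,
# modulo 2^n. The doubling of r each step is an assumption taken from
# the generic algorithm; the mstep-style part is the MSB test of y, the
# conditional accumulation of x, and the left-shift of y.
def multiply_msb_first(x, y, n=32):
    mask = (1 << n) - 1
    r = 0
    for _ in range(n):
        r = (r << 1) & mask          # shift the running result
        if (y >> (n - 1)) & 1:       # test the MSB, cf. GPR[y]31
            r = (r + x) & mask       # conditionally accumulate x
        y = (y << 1) & mask          # expose the next bit of y
    return r
```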
There are advantages and disadvantages of either approach, i.e., use of a dedicated multiplier vs. an mstep
instruction, with some examples including:
• The mstep instruction removes the need for an FSM to control the dedicated multiplier, essentially
harnessing the existing processor as a control-path. As such, the overhead to support multiplication
within the processor is further reduced.
• On one hand, the 1-step nature of mstep suggests single-cycle execution; in contrast, the n-step nature of
the dedicated multiplier suggests multi-cycle execution. However, this is phrased in terms of processor
cycles: it could be reasonable for a dedicated multiplier and processor to make use of different clock
frequencies. If the former is higher than the latter, n multiplier cycles can take less time than n processor
cycles, and so less time than execution of n mstep instructions.
It is tempting to avoid designing dedicated circuits for general-purpose comparison, by instead using arithmetic
to make the task easier (or more special-purpose at least). Glossing over the issue of signed’ness, we know for
example that

x = y ≡ x − y = 0
x < y ≡ x − y < 0

so we could re-purpose a circuit for subtraction to perform both tasks: we just compute t = x − y and then claim
x = y if t = 0 and x < y if t < 0. The idea here is that the general-purpose comparison of x and y is translated
into a special-purpose comparison of t and 0.
This sleight of hand seems attractive, but turns out to have some arguable disadvantages. Primarily, we
need to cope with signed x and y, and hence deal with cases where x − y overflows for example. In addition,
one could argue a dedicated circuit for comparison can be more efficient than subtraction: even if we reuse one
circuit for subtraction for both operations, cases might occur when this is not possible (e.g., in a micro-processor,
where often we need to do both at the same time).
• By including the mstep instruction, the MIPS-X ISA exposes details of the implementation and so fixes
how a given program should compute r = y · x. If, in contrast, it had a mul instruction with obvious
semantics, then any given implementation of the processor could opt for an iterative or combinatorial
multiplier while maintaining compatibility.
• At least for simple processors, one instruction is executed at a time: for n-bit x and y, this means the
processor will be kept busy for n cycles while executing n mstep instruction. With a dedicated multiplier,
however, one could at least imagine the processor doing something else in the n cycles while the multiplier
is kept busy.
As with the 1-bit building blocks for addition (namely the half- and full-adder), we already covered designs
for 1-bit equality and less than comparators in Chapter 2; these require a handful of logic gates to implement.
Again as with addition, the challenge is essentially how we extend these somehow. The idea of this Section is
to tackle this step-by-step: first we consider designs for comparison of unsigned yet larger, n-bit x and y, then
we extend these designs to cope with signed x and y, and finally consider how to support a suite of comparison
operations beyond just equality and less than.
(a) An AND plus equality comparator based design. (b) An OR plus non-equality comparator based design.
Figure 3.15: An n-bit, unsigned equality comparison described using a circuit diagram.
Figure 3.16: An n-bit, unsigned less than comparison described using a circuit diagram.
x = 123(10) x = 121(10)
y = 123(10) y = 123(10)
More formally, x and y are equal iff. each digit of x is equal to the corresponding digit of y, so xi = yi for
0 ≤ i < n. As such, in the left-hand case x = y because xi = yi for 0 ≤ i < 3; in the right-hand case x ≠ y because
xi ≠ yi for i = 0. This fact is true in any base, and in base-2 we have a component that can perform the 1-bit
comparison xi = yi : to cope with larger x and y, we just combine instances of it together.
Read out loud, “if x0 equals y0 and x1 equals y1 and ... xn−1 equals yn−1 then x equals y, otherwise x does
not equal y” highlights the basic strategy: each i-th of n instances of a 1-bit equality comparator will compare
xi and yi , then we AND together the results. However, we need to take care wrt. the gate count: by looking
at the truth table
xi yi xi , yi xi = yi
0 0 0 1
0 1 1 0
1 0 1 0
1 1 0 1
it should be clear that the former (inequality) is simply an XOR gate, whereas the latter (equality) needs an
XOR and a NOT gate to implement directly. So we could either
1. use a dedicated XNOR gate, whose cost is roughly the same as XOR given that

x ⊕ y ≡ (x ∧ ¬y) ∨ (¬x ∧ y)

and

¬(x ⊕ y) ≡ (¬x ∧ ¬y) ∨ (x ∧ y),

or

2. compute x =u y ≡ ¬(x ≠u y) instead, i.e., test whether x is not equal to y, then invert the result.
Both designs are illustrated in Figure 3.15: it is important to see that both compute the same result, but use a
different internal design motivated loosely by the standard cell library available (i.e., what gate types we can
use and their relative efficiency in time and space).
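Both Figure 3.15 designs can be mimicked bit-by-bit in software; the sketch below (function names are my own) confirms they compute the same result.

```python
# Design 1: per-bit XNOR (equality) results ANDed together.
def eq_xnor_and(x, y, n):
    r = 1
    for i in range(n):
        r &= 1 ^ (((x >> i) ^ (y >> i)) & 1)   # XNOR: 1 iff x_i = y_i
    return r

# Design 2: per-bit XOR (inequality) results ORed, then inverted.
def eq_xor_or_not(x, y, n):
    r = 0
    for i in range(n):
        r |= ((x >> i) ^ (y >> i)) & 1         # XOR: 1 iff x_i != y_i
    return 1 ^ r
```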
where, obviously, x < y in the left-hand case, x > y in the middle case, and x = y in the right-hand case.
Although the examples offer intuitively obvious results, determining why, in a formal sense, x is less than y (or
not) is more involved than the case of equality. A somewhat algorithmic strategy is as follows: work from the
most-significant, left-most digits (i.e., xn−1 and yn−1 ) towards the least-significant, right-most digits (i.e., x0 and
y0 ) and at each i-th step, apply a set of rules that say
• in the left-hand case we find xi = yi for i = 2 and i = 1 but x0 = 1 < 3 = y0 and conclude x < y,
• in the middle case, when i = 2, we find x2 = 3 > 1 = y2 and conclude x > y, while
• in the right-hand case, we find xi = yi for all i and conclude x = y.
Figure 20 captures this more formally: as described, a loop iterates from the most- to least-significant digits of
x and y, and at each i-th step applies the rules above. That is, if xi < yi then x < y and if xi > yi then x ≮ y; if
xi = yi then the loop continues iterating, dealing with the next (i − 1)-th step until it has processed all the digits.
Notice that if the loop actually concludes, then we know that xi = yi for all i and so x ≮ y.
Of course when x and y are written in base-2, our task is easier still because each xi , yi ∈ {0, 1}; this means
we can use our existing 1-bit comparators. As such, translating the algorithm into a concrete design means
reformulating it more directly wrt. said comparators. The idea is to recursively compute
t0 = (x0 < y0 )
ti = (xi < yi ) ∨ ((xi = yi ) ∧ ti−1 )
which matches our less formal rules above: at each i-th step, “x is less than y if xi < yi or xi = yi and comparing
the rest of x is less than the rest of y”. Each step simply requires one of each comparator plus an extra AND and
an extra OR gate; if we have n-bit x and y, we have n such steps as illustrated in Figure 3.16.
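The recursion translates directly into a loop over the bits, mirroring the ripple structure of Figure 3.16 (a sketch; the function name is my own).

```python
# t_i = (x_i < y_i) OR ((x_i = y_i) AND t_(i-1)), evaluated from the LSB
# upward; the final t is the unsigned less than result.
def less_than_unsigned(x, y, n):
    t = 0
    for i in range(n):
        xi, yi = (x >> i) & 1, (y >> i) & 1
        lt = (1 ^ xi) & yi        # 1-bit less than: x_i = 0 and y_i = 1
        eq = 1 ^ (xi ^ yi)        # 1-bit equality (XNOR)
        t = lt | (eq & t)
    return t
```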
Example 3.48. Consider less than comparison for n = 4 bit x and y, st. the unwound recursion
t0 = (x0 < y0 )
t1 = (x1 < y1 ) ∨ ((x1 = y1 ) ∧ t0 )
t2 = (x2 < y2 ) ∨ ((x2 = y2 ) ∧ t1 )
t3 = (x3 < y3 ) ∨ ((x3 = y3 ) ∧ t2 )
yields a result t3 . For x = 5(10) ↦ 0101(2) and y = 7(10) ↦ 0111(2) we can see that t0 = 0, then t1 = 1 (since
x1 = 0 < 1 = y1 ), and then t2 = t3 = 1 (since x2 = y2 and x3 = y3 ); the result t3 = 1 correctly indicates x <u y.
For signed x and y, the rules

x +ve, y -ve ↦ x ≮s y
x -ve, y +ve ↦ x <s y
x +ve, y +ve ↦ x <s y if abs(x) <u abs(y)
x -ve, y -ve ↦ x <s y if abs(y) <u abs(x)
produce the result we want. The first two cases are obvious: if x is positive and y is negative it cannot ever be
true that x < y, while if x is negative and y is positive it is always true that x < y. The other two cases need more
explanation, but basically the idea is to consider the magnitudes of x and y only by computing then comparing
abs(x) and abs(y), the absolute values of x and y. Note that in the case where x and y are both negative the
order of comparison is flipped. This is because a larger negative x will be less than a smaller negative y (and
vice versa); when considering their absolute values, the comparison is therefore reversed.
Example 3.49. set n = 4:
1. if x = +4(10) 7→ ⟨0, 0, 1, 0⟩(2) and y = −6(10) 7→ ⟨0, 1, 0, 1⟩(2) , then x ≮s y since x is +ve and y is -ve,
2. if x = +6(10) 7→ ⟨0, 1, 1, 0⟩(2) and y = −4(10) 7→ ⟨0, 0, 1, 1⟩(2) , then x ≮s y since x is +ve and y is -ve,
3. if x = −4(10) 7→ ⟨0, 0, 1, 1⟩(2) and y = +6(10) 7→ ⟨0, 1, 1, 0⟩(2) , then x <s y since x is -ve and y is +ve, and
4. if x = −6(10) ↦ ⟨0, 1, 0, 1⟩(2) and y = +4(10) ↦ ⟨0, 0, 1, 0⟩(2) , then x <s y since x is -ve and y is +ve.
Example 3.50. 1. if x = +4(10) 7→ ⟨0, 0, 1, 0⟩(2) and y = +6(10) 7→ ⟨0, 1, 1, 0⟩(2) , then x <s y since x is +ve and y is
+ve and abs(x) = 4 <u 6 = abs(y),
2. if x = +6(10) 7→ ⟨0, 1, 1, 0⟩(2) and y = +4(10) 7→ ⟨0, 0, 1, 0⟩(2) , then x ≮s y since x is +ve and y is +ve and
abs(x) = 6 ≮u 4 = abs(y),
3. if x = −4(10) 7→ ⟨0, 0, 1, 1⟩(2) and y = −6(10) 7→ ⟨0, 1, 0, 1⟩(2) , then x ≮s y since x is -ve and y is -ve and
abs(y) = 6 ≮u 4 = abs(x), and
4. if x = −6(10) 7→ ⟨0, 1, 0, 1⟩(2) and y = −4(10) 7→ ⟨0, 0, 1, 1⟩(2) , then x <s y since x is -ve and y is -ve and
abs(y) = 4 <u 6 = abs(x).
Since x and y are represented using two’s-complement, we can make a slight improvement by rewriting the
rules more simply as
x +ve y -ve 7→ x ≮s y
x -ve y +ve 7 → x <s y
x +ve y +ve 7 → x <s y if chop(x) <u chop(y)
x -ve y -ve 7→ x <s y if chop(x) <u chop(y)
where chop(x) = xn−2...0 , meaning chop(x) is x with the MSB (which determines the sign of x) removed; this
is valid because a small negative integer becomes a large positive integer (and vice versa) when the MSB is
removed. Doing so is much simpler than computing abs(x), because we just truncate or ignore the MSBs.
Example 3.51. 1. if x = +4(10) 7→ ⟨0, 0, 1, 0⟩(2) , y = +6(10) 7→ ⟨0, 1, 1, 0⟩(2) , x <s y since x is +ve and y is +ve and
chop(x) = 4 <u 6 = chop(y),
2. if x = +6(10) 7→ ⟨0, 1, 1, 0⟩(2) , y = +4(10) 7→ ⟨0, 0, 1, 0⟩(2) , x ≮s y since x is +ve and y is +ve and chop(x) = 6 ≮u
4 = chop(y),
3. if x = −4(10) 7→ ⟨0, 0, 1, 1⟩(2) , y = −6(10) 7→ ⟨0, 1, 0, 1⟩(2) , x ≮s y since x is -ve and y is -ve and chop(x) = 4 ≮u
2 = chop(y), and
4. if x = −6(10) 7→ ⟨0, 1, 0, 1⟩(2) , y = −4(10) 7→ ⟨0, 0, 1, 1⟩(2) , x <s y since x is -ve and y is -ve and chop(x) = 2 <u
4 = chop(y).
The question is, finally, how do we implement these rules as a design? As in the case of overflow detection, we
use the fact that testing the sign of x or y is trivial. As a result, we can write
x <s y = false                  if ¬xn−1 ∧ yn−1
         true                   if xn−1 ∧ ¬yn−1
         chop(x) <u chop(y)     otherwise
which can be realised by a multiplexer: producing the LHS just amounts to selecting an option from the RHS
using xn−1 and yn−1 , i.e., the sign of x and y, as control signals.
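The multiplexer-style selection can be sketched as follows (an illustration, with my own naming): the sign bits select between the two fixed outcomes and an unsigned comparison of the chopped operands.

```python
# Signed less than via the chop trick: if the sign bits differ the result
# is decided outright, otherwise the operands with their sign bits removed
# are compared as unsigned values.
def less_than_signed(x, y, n):
    sx, sy = (x >> (n - 1)) & 1, (y >> (n - 1)) & 1
    if sx != sy:
        return sx                     # x -ve, y +ve -> 1; x +ve, y -ve -> 0
    mask = (1 << (n - 1)) - 1         # drops the MSB, i.e., computes chop
    return 1 if (x & mask) < (y & mask) else 0
```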
x ≠ y ≡ ¬(x = y)
x ≤ y ≡ (x < y) ∨ (x = y)
x ≥ y ≡ ¬(x < y)
x > y ≡ ¬(x < y) ∧ ¬(x = y)

meaning the result of all six comparisons between x and y on the LHS can easily be realised using just the
equality and less than comparators, plus a small number of extra logic gates.
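The identities can be sketched as one small function (the name is my own) deriving the remaining four results from the two comparator outputs.

```python
# Given eq = [x = y] and lt = [x < y] from the two comparators, each of
# the other four comparisons needs only a NOT, OR, or AND gate or two.
def derive_comparisons(eq, lt):
    ne = 1 ^ eq                 # x != y
    le = lt | eq                # x <= y
    ge = 1 ^ lt                 # x >= y
    gt = (1 ^ lt) & (1 ^ eq)    # x >  y
    return ne, le, ge, gt
```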
References
[1] ARM7TDMI Technical Reference Manual. Tech. rep. DDI-0210C. ARM Ltd., 2004. url: http://infocenter.
arm.com/help/topic/com.arm.doc.ddi0210c/index.html (see p. 176).
[2] A.D. Booth. “A Signed Binary Multiplication Technique”. In: Quarterly Journal of Mechanics and Applied
Mathematics 4.2 (1951), pp. 236–240 (see p. 165).
[3] P. Chow. MIPS-X Instruction Set And Programmer’s Manual. Tech. rep. CSL-86-289. Computer Systems
Laboratory, Stanford University, 1998 (see p. 176).
[4] Cortex-M0 Technical Reference Manual. Tech. rep. DDI-0432C. ARM Ltd., 2009. url: http://infocenter.
arm.com/help/topic/com.arm.doc.ddi0432c/index.html (see p. 176).
[5] L. Dadda. “Some Schemes for Parallel Multipliers”. In: Alta Frequenza 34 (1965), pp. 349–356 (see p. 172).
[6] J. Heinrich. MIPS R4000 Microprocessor User’s Manual. 2nd. 1994 (see p. 176).
[7] W.G. Horner. “A new method of solving numerical equations of all orders, by continuous approximation”.
In: Philosophical Transactions (1819), pp. 308–335 (see p. 161).
[8] A. Karatsuba and Y. Ofman. “Multiplication of Many-Digital Numbers by Automatic Computers”. In:
Physics-Doklady 7 (1963), pp. 595–596 (see p. 160).
[9] J. von Neumann. First Draft of a Report on the EDVAC. Tech. rep. 1945 (see p. 135).
[10] B. Parhami. Computer Arithmetic: Algorithms and Hardware Designs. 1st ed. Oxford University Press, 2000
(see pp. 136, 154).
[11] A.S. Tanenbaum and T. Austin. Structured Computer Organisation. 6th ed. Prentice Hall, 2012 (see p. 137).
[12] P.A.V. Thomas and N. Balatoni. “A hardware multiplier/divider for the PDP 8S computer”. In: Behavior
Research Methods & Instrumentation 3.2 (1971), pp. 89–91 (see p. 176).
[13] C.S. Wallace. “A Suggestion for Fast Multipliers”. In: IEEE Transactions on Computers 13.1 (1964), pp. 14–17
(see p. 172).
CHAPTER
4
BASICS OF MEMORY TECHNOLOGY
4.1 Introduction
1. one or more channels, each backed by
2. one or more physical banks, each composed from
3. one or more devices, each composed from
(Figures omitted: transistor-level SRAM and DRAM cell designs; SRAM and DRAM devices built as cell arrays
with row/column decoders, sense amplifiers, and row/column buffers, driven by WE, OE, CS, RAS, and CAS
control signals; and multi-bank memories composed from 1Mbit × 8, × 16, and × 32 SDRAM devices.)
CHAPTER
5
COMPUTATIONAL MACHINES: FINITE
STATE MACHINES (FSMS)
1. a Moore-style FSM only uses entry actions, i.e., the output depends on the state only, while
2. a Mealy-style FSM only uses input actions, i.e., the output depends on the state and the input.
1. deterministic if for each state there is always one transition for each possible input (i.e., we always know what the
next state should be), or
2. non-deterministic if for each state there might be zero, one or more transitions for each possible input (i.e., we
only know what the next state could be).
δ       Xi = 0   Xi = 1
Q       Q′       Q′
Seven   Seven    Sodd
Sodd    Sodd     Seven

(a) A tabular description. (b) A diagrammatic description.
Figure 5.1: An example FSM to decide whether there is an odd number of 1 elements in some sequence X.
δ       Xi = 10   Xi = 20
Q       Q′        Q′
S0      S10       S20
S10     S20       S30
S20     S30       S⊥
S30     S⊥        S⊥

(a) A tabular description. (b) A diagrammatic description.

Figure 5.2: An example FSM to control a vending machine.
4. A transition function
δ : S × Σ → S.
5. An output function
ω:S→Γ
Note that:
• The FSM itself might be enough to solve a given problem, but it is common to control an associated data-path using
the outputs.
• A special “empty” (or null) input denoted ϵ allows a transition which can always occur.
• It is common to allow δ to be a partial function, i.e., a function which is not defined for all inputs.
• If the FSM is non-deterministic, then δ might instead give a set of possibilities that is sampled from.
More simply, you can think of an FSM as a directed graph where moving between nodes (which represent
each state) means consuming the input on the corresponding edge. Some examples should show that the fairly
formal description above translates into a much more manageable reality.
Σ = {0, 1}
since each Xi can either be 0 or 1. The FSM can clearly be in two states: having consumed the input so far, it
can either have seen an even or odd number of 1 elements. Therefore we can say
S = {Seven , Sodd },
have s = Seven as the starting state, and let A = {Sodd } be the (singleton) set of accepting states. There is no output
as such, so in this case both the output alphabet Γ and output function ω are irrelevant.
Our final task is to define the transition function. Figure 5.1 includes a tabular and a diagrammatic
description of the same thing. The tabular, truth table style description is easier to discuss. The idea is that
it lists the current state (left-hand side), alongside the next state for each possible input (right-hand side). In
words, the rows read as follows:
• if we are in state Seven and the input Xi = 0 then we stay in state Seven ,
• if we are in state Seven and the input Xi = 1 then we move to state Sodd ,
• if we are in state Sodd and the input Xi = 0 then we stay in state Sodd , and
• if we are in state Sodd and the input Xi = 1 then we move to state Seven .
The intuition is, for example and with a similar argument possible for the state Sodd , that if we are in state Seven
(i.e., have seen an even number of 1 elements so far) and the next input is 1, then we have now seen an odd
number of 1 elements so move to state Sodd . Conversely, if we are in state Seven (i.e., have seen an even number
of 1 elements so far) and the next input is 0, then we have still seen an even number of 1 elements so stay in
state Seven .
Consider some examples of the FSM in operation
Seven ⇝ Sodd ⇝ Sodd ⇝ Seven ⇝ Sodd (consuming X0 = 1, X1 = 0, X2 = 1, X3 = 1 in turn)
Since we finish in state Sodd , the input is accepted and hence we conclude it has an odd number of 1
elements.
Seven ⇝ Seven ⇝ Sodd ⇝ Seven ⇝ Seven (consuming X0 = 0, X1 = 1, X2 = 1, X3 = 0 in turn)
Since we finish in state Seven , the input is rejected and hence we conclude it has an even number of 1
elements.
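The FSM is small enough to execute directly; a minimal dictionary-based model (names are my own) of the tabular description:

```python
# The parity FSM of Figure 5.1: delta maps (state, input) to next state;
# an input sequence is accepted iff the final state is S_odd.
delta = {
    ("S_even", 0): "S_even", ("S_even", 1): "S_odd",
    ("S_odd",  0): "S_odd",  ("S_odd",  1): "S_even",
}

def odd_number_of_ones(X):
    state = "S_even"                  # the starting state s
    for x in X:
        state = delta[(state, x)]
    return state == "S_odd"           # A = { S_odd }

assert odd_number_of_ones([1, 0, 1, 1])        # accepted
assert not odd_number_of_ones([0, 1, 1, 0])    # rejected
```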
Imagine we are tasked with designing an FSM that controls a vending machine. The machine accepts tokens
worth 10 or 20 units: when the total value of tokens entered reaches 30 units it delivers a chocolate bar but
it does not give change. That is, the exact amount must be entered otherwise an error occurs, all tokens are
ejected and we start afresh.
The design is clearly a little more complex this time. The input alphabet is basically just the tokens that the
machine can accept, so we have
Σ = {10, 20}.
The set of states the machine can be in is easy to enumerate: it can either have accepted tokens totalling 0, 10,
20 or 30 units in it or be in the error state which we denote by ⊥. Thus, we can say

S = {S0 , S10 , S20 , S30 , S⊥ },

and clearly set s = S0 since initially the machine has accepted no tokens. There is one accepting state, which is
when a total of 30 tokens has been accepted, so A = {S30 }. Since there is again no output, our final task is again
to define the transition function. As before, Figure 5.2 outlines a tabular and diagrammatic description.
S0 ⇝ S10 ⇝ S30 (consuming X0 = 10, X1 = 20 in turn)
Since we finish in state S30 , the input is accepted and we get a chocolate bar as output!
S0 ⇝ S20 ⇝ S⊥ (consuming X0 = 20, X1 = 20 in turn)
Since we finish in state S⊥ , the error state, the input is rejected and the tokens are returned.
Note that the input marked ϵ is the empty input; that is, with no input we can move between the accepting or
error states back into the start state thus resetting the machine. So for example, once we accept or reject the
input we might assume the machine returns to state S0 .
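The vending machine FSM can likewise be executed directly (a sketch with my own naming; the self-loops on the error state stand in for the ϵ reset described above):

```python
# The vending machine FSM: states are the accumulated totals, S_bot is
# the error state; a token sequence is accepted iff it ends in S30.
delta = {
    ("S0",  10): "S10",   ("S0",  20): "S20",
    ("S10", 10): "S20",   ("S10", 20): "S30",
    ("S20", 10): "S30",   ("S20", 20): "S_bot",
    ("S30", 10): "S_bot", ("S30", 20): "S_bot",
    ("S_bot", 10): "S_bot", ("S_bot", 20): "S_bot",  # assumed: remain in error
}

def accepts(tokens):
    state = "S0"                     # s = S0: no tokens accepted yet
    for t in tokens:
        state = delta[(state, t)]
    return state == "S30"            # A = { S30 }

assert accepts([10, 20])             # exactly 30 units: chocolate bar
assert not accepts([20, 20])         # overshoot: error state
```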
2. the δ and ω functions implemented using combinatorial logic only: they are functions of the current state
and any input.
The behaviour of the framework is illustrated by Figure 5.4. The idea is that within a given current clock cycle
1. ω computes the output from the current state and input, and
2. δ computes the next state from the current state and input
Figure 5.3: Two generic FSM frameworks (for different clocking strategies) into which one can place implementations of
the state, δ (the transition function) and ω (the output function).
such that the next state is latched by the positive clock edge marking the next clock cycle. So we have a period
of computation in which ω and δ operate, then an update triggered by a positive clock edge which steps the
FSM from the current state into the next state. What results is a series of steps, under control of the clock, each
performing some computation. As such, it should be clear that the clock frequency determines how quickly
computation occurs; it has to be fast enough to satisfy the design goals, yet slow enough to cope with the
critical path of a given step of computation. That is, the faster the clock oscillates the faster we step through the
computation, but if it is too fast we cannot finish one step before the next one starts.
To summarise, this is a framework for a computer we can build: we know how each of the components
function, and can reason about their behaviour from the transistor-level upward. To solve a concrete problem
using the framework, we follow a (fairly) standard sequence of steps:
1. Count the number of states required, and give each state an abstract label.
2. Describe the state transition and output functions using a tabular or diagrammatic approach.
3. Decide how the states will be represented, i.e., assign concrete values to the abstract labels, and allocate
a large enough register to hold the state.
4. Express the functions δ and ω as (optimised) Boolean expressions, i.e., combinatorial logic.
5. Place the registers and combinatorial logic into the framework.
Versus a theoretical alternative, it is less common for a hardware-based FSM to have accepting states,
since we cannot usually halt the circuit (without turning it off); we might include idle or error states to cope. In
addition, and although the framework does not show it, it is common to have a reset input that (re)initialises
the FSM into the start state. For one thing, this avoids the need to turn the FSM off then on again to reset it!
Example #1: an ascending modulo 6 counter Imagine we are tasked with designing an FSM that acts as a
cyclic counter modulo n (rather than 2n as before). If n = 6 for example, we want a component whose output r
steps through values
0, 1, 2, 3, 4, 5, 0, 1, . . . ,
with the modular reduction representing control behaviour (versus the uncontrolled counter that was cyclic
by default). In this case it is clear the FSM can be in one of 6 states (since the counter value is one of
0, 1, . . . , 5), which we label S0 , S1 , . . . , S5 . Figure 5.5 includes tabular and diagrammatic descriptions of the
transition function, both of which are a little dull: they simply move from one state to the next (with the ϵ
meaning no input is required), cycling from S5 back to S0 .
The fact that state assignment occurs quite late in the design of a given FSM is intentional: it allows us to
optimise the representation based on what we do with it. So far, we have used a natural, binary encoding to
represent the i-th of n states as a (⌈log2 (n)⌉)-bit unsigned integer i. For example, if n = 6 we use
S0 7→ ⟨0, 0, 0⟩
S1 7 → ⟨1, 0, 0⟩
S2 7 → ⟨0, 1, 0⟩
S3 7 → ⟨1, 1, 0⟩
S4 7 → ⟨0, 0, 1⟩
S5 7 → ⟨1, 0, 1⟩
An alternative is a one-hot encoding, in which the i-th state is represented by an encoding whose i-th bit alone
is set; this can be attractive because
1. transition between states is easier (we simply rotate any given encoding by the right distance to get
another), and
2. switching behaviour (and hence power consumption) is reduced since only two bits toggle for any change
(one from 1 to 0, and one from 0 to 1).
Clearly 2^3 = 8 > 6, so we can represent the current state using a 3-bit integer Q = ⟨Q0 , Q1 , Q2 ⟩. That is,
S0 7→ ⟨0, 0, 0⟩ ≡ 000(2)
S1 7 → ⟨1, 0, 0⟩ ≡ 001(2)
S2 7 → ⟨0, 1, 0⟩ ≡ 010(2)
S3 7 → ⟨1, 1, 0⟩ ≡ 011(2)
S4 7 → ⟨0, 0, 1⟩ ≡ 100(2)
S5 7 → ⟨1, 0, 1⟩ ≡ 101(2)
To implement the FSM, all we need to do is derive Boolean equations for the transition function δ so it can
compute the next state Q′ from Q; with this FSM there is no input, so δ is a function of the current state. To do
so, we first rewrite the tabular description of δ by replacing the abstract labels with concrete values. The result
is a truth table, i.e.,
δ ω
Q2 Q1 Q0 Q′2 Q′1 Q′0 r2 r1 r0
0 0 0 0 0 1 0 0 0
0 0 1 0 1 0 0 0 1
0 1 0 0 1 1 0 1 0
0 1 1 1 0 0 0 1 1
1 0 0 1 0 1 1 0 0
1 0 1 0 0 0 1 0 1
1 1 0 ? ? ? ? ? ?
1 1 1 ? ? ? ? ? ?
which encodes the same information. For example, if the current state is Q = ⟨0, 0, 0⟩ (i.e., we are in state S0 )
then the next state should be Q′ = ⟨1, 0, 0⟩ (i.e., state S1 ). Note that there are 2 unused states, namely ⟨0, 1, 1⟩
When written symbolically, the motivation for using either a Moore or Mealy style FSMs may be unclear. When
the framework for implementing FSMs is taken into account, however, the issue should become more concrete:
• In a Moore FSM the output depends on the current state only, implying changes to any input are only
relevant when the state is updated; you can think of this as meaning the inputs are only relevant in
relation to the clock signal that triggers said update (i.e., they are only taken into account periodically,
rather than continuously).
• In contrast, a Mealy FSM allows the output to depend on the current state and any input. ω is a
combinatorial function, so this implies the output can change a) in relation to the clock signal as a result
of an update to the state, and/or b) at any time as a result of changes to the input. You could think of this as
meaning the FSM is more responsive, in the sense that although the state is updated at the same frequency
(i.e., in relation to the same features of the clock) the output can continuously, and instantaneously change
if/when the input changes.
Both are viable options, so it is not true that one is correct or incorrect. However, it is clearly important to
understand the (subtle) difference so an informed choice can be made within some specific context.
δ          ω
Q    Q′    r
S0   S1    0
S1   S2    1
S2   S3    2
S3   S4    3
S4   S5    4
S5   S0    5

(a) A tabular description. (b) A diagrammatic description.

Figure 5.5: The transition and output functions for an ascending modulo 6 counter.
and ⟨1, 1, 1⟩, which we include in the table: the next state in either of these cases does not matter since they are
invalid, so the entries are don’t care.
To summarise, we need to derive Boolean expressions for each of Q′2 , Q′1 and Q′0 in terms of Q2 , Q1 and Q0 .
This can be achieved by applying the Karnaugh map technique, one map per bit of Q′ , to get

Q′2 = ( Q1 ∧ Q0 ) ∨ ( Q2 ∧ ¬Q0 )
Q′1 = ( ¬Q2 ∧ ¬Q1 ∧ Q0 ) ∨ ( Q1 ∧ ¬Q0 )
Q′0 = ( ¬Q0 )
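Candidate sum-of-products expressions read off from the truth table above (with the two unused states never exercised) are easy to cross-check against the intended behaviour Q′ = (Q + 1) mod 6; a sketch, with my own naming:

```python
# Evaluate sum-of-products expressions for the next state of the
# modulo 6 counter, then compare against (Q + 1) mod 6 on valid states.
def next_state(q2, q1, q0):
    n2 = (q1 & q0) | (q2 & (1 ^ q0))
    n1 = ((1 ^ q2) & (1 ^ q1) & q0) | (q1 & (1 ^ q0))
    n0 = 1 ^ q0
    return (n2 << 2) | (n1 << 1) | n0

for s in range(6):
    assert next_state((s >> 2) & 1, (s >> 1) & 1, s & 1) == (s + 1) % 6
```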
         δ                   ω
      d = 0   d = 1      d = 0   d = 1
Q     Q′      Q′      r  f       f
S0    S1      S5      0  0       1
S1    S2      S0      1  0       0
S2    S3      S1      2  0       0
S3    S4      S2      3  0       0
S4    S5      S3      4  0       0
S5    S0      S4      5  1       0

(a) A tabular description. (b) A diagrammatic description.

Figure 5.6: The transition and output functions for an ascending or descending modulo 6 counter.
         δ                        ω
      rst = 0   rst = 1
Q     Q′        Q′        Mg  Ma  Mr  Ag  Aa  Ar
S0    S1        S6        1   0   0   0   0   1
S1    S2        S6        0   1   0   0   0   1
S2    S3        S6        0   0   1   0   1   0
S3    S4        S6        0   0   1   1   0   0
S4    S5        S6        0   0   1   0   1   0
S5    S0        S6        0   1   0   0   0   1
S6    S0        S6        0   0   1   0   0   1

(a) A tabular description. (b) A diagrammatic description.
Now we have enough to fill in the FSM framework: the state is simply a 3-bit register, δ is represented by
circuit analogues of the expressions above. Note that in this case, the output function ω is trivial: the counter
output r = Q due to our state assignment, so in a sense ω is just the identity function.
Example #2: an ascending or descending modulo 6 counter Now imagine we need to upgrade the previous
example: we are tasked with designing an FSM that again acts as a cyclic counter modulo n, but whose direction
can also be controlled. If n = 6 for example, we want a component whose output r steps through values
0, 1, 2, 3, 4, 5, 0, 1, . . .
or
0, 5, 4, 3, 2, 1, 0, 5, . . .
depending on some input d, plus has an output f to signal when the cycle occurs (i.e., when the current value
is last or first in the sequence, depending on d).
The possible states are the same as before: we still have 6 states, labelled S0 , S1 , . . . , S5 . The difference is how
transitions between states occur; this is illustrated by Figure 5.6, in which the new tabular and diagrammatic
descriptions of the transition function are shown. Although it looks more complicated, we take exactly the
same approach as before: we start by rewriting the tabular description of δ by replacing the abstract labels with
concrete values to yield:
δ ω
d Q2 Q1 Q0 Q′2 Q′1 Q′0 r2 r1 r0 f
0 0 0 0 0 0 1 0 0 0 0
0 0 0 1 0 1 0 0 0 1 0
0 0 1 0 0 1 1 0 1 0 0
0 0 1 1 1 0 0 0 1 1 0
0 1 0 0 1 0 1 1 0 0 0
0 1 0 1 0 0 0 1 0 1 1
0 1 1 0 ? ? ? ? ? ? ?
0 1 1 1 ? ? ? ? ? ? ?
1 0 0 0 1 0 1 0 0 0 1
1 0 0 1 0 0 0 0 0 1 0
1 0 1 0 0 0 1 0 1 0 0
1 0 1 1 0 1 0 0 1 1 0
1 1 0 0 0 1 1 1 0 0 0
1 1 0 1 1 0 0 1 0 1 0
1 1 1 0 ? ? ? ? ? ? ?
1 1 1 1 ? ? ? ? ? ? ?
The table is larger since we need to consider d as input as well as Q, but the process is the same: to compute
δ, we just need a set of appropriate Boolean expressions. So next we translate the truth table into a set of
Karnaugh maps
[Karnaugh maps for Q′2, Q′1, and Q′0, each plotted with d Q2 against Q1 Q0, yield expressions for each next-state bit, e.g.,]
Q′0 = ( ¬Q0 )
This time, however, we need to deal with ω more carefully: we can still generate the counter output trivially as
r = Q, but we also need to compute f somehow. This is straightforward, of course, because using the truth table
we can write
[A Karnaugh map for f, plotted with d Q2 against Q1 Q0, from which a suitable expression for f can be read off.]
Example #3: a traffic light controller Imagine we are tasked with designing a traffic light controller for two
roads (a main road and an access road) that intersect. The requirements are to
1. stop cars crashing into each other, so the behaviour should see
(a) green on main road and red on access road, then
(b) amber on main road and red on access road, then
(c) red on main road and amber on access road, then
(d) red on main road and green on access road, then
(e) red on main road and amber on access road, then
(f) amber on main road and red on access road,
and then cycle, and
2. allow an emergency stop button to force red on both main and access roads while pushed, then reset the
system into an initial start state when released.
First we need to take stock of the problem itself: there is basically one input (the emergency stop button,
denoted rst) and six outputs (namely the traffic light values, denoted M g , Ma and Mr for the main road and A g ,
Aa and Ar for the access road). Next we try to develop a precise description of the FSM behaviour. We need 7
states in total: S0 , S1 , . . . , S5 represent steps in the normal traffic light sequence, and S6 is an extra emergency
stop state. Figure 5.7 shows both tabular and diagrammatic descriptions of the transition function; in essence,
it is similar to the counter example (in the sense that it cycles from S0 through to S5 and back again) provided
rst = 0, but if rst = 1 in any state then we move to S6 . As an aside, however, it is important to see that this
description represents one solution among several derived from what is (by design) an imprecise question. Put
another way, we have already made several choices. One example is the decision to use a separate emergency
stop state, and have the FSM enter this as the next state of any current state provided rst = 1; the red lights are
both forced on by virtue of being in the emergency stop state, rather than by rst per se. Another valid approach
might be to have ω depend on rst as well (rather than just Q, so it turns from a Moore-based into a Mealy-based
FSM), forcing the red lights on as soon as rst = 1, irrespective of what state the FSM is in. In some ways
this is arguably more attractive, in the sense that the emergency stop is instant: we no longer need to wait for
the next clock cycle when the next state is latched. Likewise, we have opted to make the first state listed in the
question (i.e., green on the main road and red on the access road) the initial state; since the sequence is cyclic
this choice seems a little arbitrary, so other choices (plus what state the FSM restarts in after an emergency stop)
might also seem reasonable.
Given our various choices however, we next follow standard practice by translating the description into an
implementation. Since 2³ = 8 > 7 we can represent the current and next states via 3-bit integers Q = ⟨Q0 , Q1 , Q2 ⟩
and Q′ = ⟨Q′0 , Q′1 , Q′2 ⟩, where
S0 ↦ ⟨0, 0, 0⟩ ≡ 000(2)
S1 ↦ ⟨1, 0, 0⟩ ≡ 001(2)
S2 ↦ ⟨0, 1, 0⟩ ≡ 010(2)
S3 ↦ ⟨1, 1, 0⟩ ≡ 011(2)
S4 ↦ ⟨0, 0, 1⟩ ≡ 100(2)
S5 ↦ ⟨1, 0, 1⟩ ≡ 101(2)
S6 ↦ ⟨0, 1, 1⟩ ≡ 110(2)
and we have one unused state (namely ⟨1, 1, 1⟩). As such, both input and output registers will comprise
three 1-bit storage components, in this case D-type latches. Now that we have a concrete value for each abstract
state label, we can expand the tabular description of the FSM into a (lengthy) truth table:
δ ω
rst Q2 Q1 Q0 Q′2 Q′1 Q′0 Mg Ma Mr Ag Aa Ar
0 0 0 0 0 0 1 1 0 0 0 0 1
0 0 0 1 0 1 0 0 1 0 0 0 1
0 0 1 0 0 1 1 0 0 1 0 1 0
0 0 1 1 1 0 0 0 0 1 1 0 0
0 1 0 0 1 0 1 0 0 1 0 1 0
0 1 0 1 0 0 0 0 1 0 0 0 1
0 1 1 0 0 0 0 0 0 1 0 0 1
0 1 1 1 ? ? ? ? ? ? ? ? ?
1 0 0 0 1 1 0 1 0 0 0 0 1
1 0 0 1 1 1 0 0 1 0 0 0 1
1 0 1 0 1 1 0 0 0 1 0 1 0
1 0 1 1 1 1 0 0 0 1 1 0 0
1 1 0 0 1 1 0 0 0 1 0 1 0
1 1 0 1 1 1 0 0 1 0 0 0 1
1 1 1 0 1 1 0 0 0 1 0 0 1
1 1 1 1 ? ? ? ? ? ? ? ? ?
• the transition function δ is just three Boolean expressions, one for each Q′i , using rst, Q2 , Q1 and Q0 as
input,
• the output function ω is just six Boolean expressions, one for each Mi and A j , using rst, Q2 , Q1 and Q0 as
input.
[Karnaugh maps for Q′2, Q′1, and Q′0, each plotted with rst Q2 against Q1 Q0, yield expressions for each next-state bit, e.g.,]
Q′1 = ( rst )∨
      ( ¬Q2 ∧ ¬Q1 ∧ Q0 )∨
      ( ¬Q2 ∧ Q1 ∧ ¬Q0 )
[Karnaugh maps for Mg, Ma, Mr and Ag, Aa, Ar, each plotted with Q2 against Q1 Q0, yield]
Mg = ( ¬Q2 ∧ ¬Q1 ∧ ¬Q0 )
Ma = ( ¬Q1 ∧ Q0 )
Mr = ( Q1 )∨
     ( Q2 ∧ ¬Q0 )
Ag = ( Q1 ∧ Q0 )
Aa = ( ¬Q2 ∧ Q1 ∧ ¬Q0 )∨
     ( Q2 ∧ ¬Q1 ∧ ¬Q0 )
Ar = ( ¬Q2 ∧ ¬Q1 )∨
     ( ¬Q1 ∧ Q0 )∨
     ( Q2 ∧ Q1 )
As before, these expressions can be used to fill in the FSM framework to yield a resulting design for the
controller.
Part II
Appendices
APPENDIX
A
EXAMPLE EXAM-STYLE QUESTIONS
A.1 Chapter 1
Q1. We studied representation of unsigned integers using a base-b positional number system. Which of the
following literals
A: 10101
B: 11111
C: 11120
D: 12200
E: 12345
represents the unsigned decimal integer 123(10) in base-3 (or ternary, digits in which are termed trits).
Q2. Imagine that two signed, 8-bit integers x and y are represented using two's-complement and sign-magnitude
respectively; both have the decimal value 51(10) . If the most-significant bit of both x and y is set
to 1, what are their new (decimal) values?
A: −77(10) and 179(10)
B: −77(10) and −51(10)
C: −51(10) and −77(10)
D: 179(10) and 179(10)
E: 179(10) and −51(10)
Q3. Imagine that two signed, 16-bit integers x and y are represented using two’s-complement; their product r = x · y
is a signed, 32-bit integer also represented using two’s-complement. What is the largest (i.e., whose magnitude
is greatest) negative value of r possible?
A: −0
B: −32768
C: −65535
D: −1073709056
E: −2147483648
Q4. Imagine you write a C program that defines signed, 16-bit integer variables x and y (of type short) and then
assigns them the decimal values 256(10) and 4852(10) respectively. If x and y are then cast into signed, 8-bit
integers (of type char), which of the following
A: 0 and 12
B: 0 and −12
C: −1 and 256
D: −1 and −52
E: 0 and 52
identifies their decimal values? Or, put another way, which are the result of evaluating the two expressions
( char )( x ) and ( char )( y )?
Q5. Consider two signed, 8-bit integer variables x and r (of type char) used in a C program. If x has the decimal
value 9(10) and an assignment
r = ( ~x << 4 ) | 0x97
is executed, what is the decimal value of r afterwards?
A: −9(10)
B: −1(10)
C: 0(10)
D: 1(10)
E: 9(10)
Q6. In general, some x is a fixed point of a function f if f (x) equals x, i.e., if f maps x to itself. Consider the following
function
int8_t abs( int8_t x ) {
  int8_t r;

  if( x >= 0 ) {
    r =  x;
  }
  else {
    r = -x;
  }

  return r;
}
implemented in C: abs was written in an attempt to compute the absolute value of x, a signed, 8-bit integer
represented using two's-complement. How many of the 2⁸ = 256 possible values of x are fixed points of abs?
A: 0
B: 127
C: 128
D: 129
E: 256
Q7. Imagine that within a given C function, you declare signed, 8-bit integer variables (i.e., variables whose type is
int8_t) x and r. Assume C represents signed integers using two’s-complement, and the right-shift operator
yields arithmetic (rather than logical) shift: if x has the (decimal) value −10(10) , what (decimal) value does r
have after the assignment
r = ~( ( x >> 2 ) ^ 0xF4 )
is executed?
A: −10(10)
B: 10(10)
C: 11(10)
D: 54(10)
E: 203(10)
Q8. Consider two unsigned, 8-bit integer variables, x and y, as declared in some C function by using the type
uint8_t. For how many assignments to these variables will the Hamming weight of their unsigned, 8-bit
integer sum, i.e., x + y, be zero? Put another way, how many elements does the set
{(x, y) | HW(x + y) = 0}
have?
A: 0
B: 1
C: 255
D: 256
E: 65536
Q9. Consider a literal x̂ = 10, which represents a value x using a base-b positional number system. Based on this
information alone, which of the following values
A: x=1
B: x=2
C: x=b
D: x = 10
E: x = 16
is correct?
Q10. Assuming an n-bit x and use of two’s-complement representation for signed integers, which of the following
identities
A: x ∧ ¬x ≡ 0(10)
B: x ∨ ¬x ≡ −1(10)
C: x ⊕ ¬x ≡ −1(10)
D: x + ¬x ≡ −1(10)
E: x − ¬x ≡ −1(10)
is not correct?
Q11. a For the sets A = {1, 2, 3}, B = {3, 4, 5} and U = {1, 2, 3, 4, 5, 6, 7, 8}, compute the following:
i |A|.
ii A ∪ B.
iii A ∩ B.
iv A − B.
v Ā (i.e., the complement of A).
vi {x | 2 · x ∈ U}.
b For each of the following decimal integers, write down the 8-bit binary representation in sign-magnitude
and two’s-complement:
i +0.
ii −0.
iii +72.
iv −34.
v −8.
vi 240.
Q12. For some 32-bit integer x, explain what is meant by the Hamming weight of x; write a short C function to
compute the Hamming weight of a given 32-bit input.
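One possible answer (a sketch, using Kernighan's trick of repeatedly clearing the least-significant set bit):

```c
#include <stdint.h>

// Compute the Hamming weight, i.e., the number of bits of x equal to 1.
int hamming_weight( uint32_t x ) {
  int t = 0;

  while( x != 0 ) {
    x &= x - 1; // clear the least-significant set bit
    t++;
  }

  return t;
}
```

For example, hamming_weight( 0x7 ) yields 3, since 7(10) = 111(2).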
Q17. A given set of Boolean operators may be termed functionally complete (or universal): this means any Boolean
function can be expressed using a Boolean expression involving elements of the set alone. For example, because
we know the NAND operator is functionally complete, we can also term the sets { ⊼ } and { ∧, ¬ } functionally
complete. Noting that ≢ and ⇏ denote the inverse of equivalence and implication respectively (i.e., not
equivalent, and does not imply), which of the following sets
A: {⊕, ∨}
B: {⇒, ≢}
C: {⇒, ⇏}
D: all of the above
E: none of the above
is/are functionally complete?
The final axiom is missing, i.e., replaced with X: which of the following options for X yields a valid derivation?
A: Absorption
B: Idempotency
C: Implication
D: Null
E: de Morgan
[Five 4-input Karnaugh maps over w, x, y, and z, labelled A to E, each showing a different candidate grouping of the same cell values.]
Figure A.1: A set of 5 different Karnaugh maps, captioned with an associated option.
Q22. Consider the following truth table, which describes a Boolean function f :
w x y z f (w, x, y, z)
0 0 0 0 1
0 0 0 1 0
0 0 1 0 1
0 0 1 1 0
0 1 0 0 0
0 1 0 1 0
0 1 1 0 0
0 1 1 1 0
1 0 0 0 1
1 0 0 1 1
1 0 1 0 1
1 0 1 1 0
1 1 0 0 1
1 1 0 1 0
1 1 1 0 0
1 1 1 1 0
Which of the Karnaugh maps shown in Figure A.1 will yield the most efficient (in terms of the number of
operators involved), correct Boolean expression for f ?
Q24. Consider a Boolean function f with n = 1 input x. How many such functions are not idempotent, i.e., how
many f exist such that ∀x ∈ {0, 1}, f ( f (x)) = f (x) does not hold?
A: 0
B: 1
C: 2
D: 3
E: 4
Q25. Consider a Boolean function f with n = 2 inputs x and y. How many such functions are symmetric, i.e., how
many f exist such that ∀x, y ∈ {0, 1}, f (x, y) = f (y, x) holds?
A: 0
B: 1
C: 2
D: 8
E: 16
exist in it.
b Consider the Boolean function
f (a, b, c, d) = ¬a ∧ b ∧ ¬c ∧ d.
Which of the following assignments
i a = 0, b = 0, c = 0 and d = 1,
ii a = 0, b = 1, c = 0 and d = 1,
iii a = 1, b = 1, c = 1 and d = 1,
iv a = 0, b = 0, c = 1 and d = 0.
i (a ∨ b ∨ d) ∧ (¬c ∨ d),
ii (a ∧ b ∧ d) ∨ (¬c ∧ d),
iii (a ∨ b ∨ d) ∨ (¬c ∨ d).
i a ∨ 1 ≡ a.
ii a ⊕ 1 ≡ ¬a.
iii a ∧ 1 ≡ a.
iv ¬(a ∧ b) ≡ ¬a ∨ ¬b.
i ¬¬a ≡ a.
ii ¬(a ∧ b) ≡ ¬a ∨ ¬b.
iii ¬a ∧ b ≡ a ∧ ¬b.
iv ¬a ≡ a ⊕ a.
Q29. a The OR form of the null axiom is x ∨ 1 ≡ 1. Which of the following options
i x ∧ 1 ≡ 1,
ii x ∧ 0 ≡ 0,
iii x ∨ 0 ≡ 0,
iv x ∧ x ≡ x,
i ¬ f = a ∨ b ∨ c ∨ d ∨ e,
ii ¬ f = a ∧ b ∧ c ∧ d ∧ e,
iii ¬ f = a ∧ b ∧ (c ∨ d ∨ e),
iv ¬ f = a ∧ b ∨ ¬c ∨ ¬d ∨ ¬e,
v ¬ f = (a ∨ b) ∧ c ∧ d ∧ e
is correct?
c If we write the de Morgan axiom in English, which of the following
i c∨d∨e
ii ¬c ∧ ¬d ∧ ¬e
iii ¬a ∧ ¬b
iv ¬a ∧ ¬b ∧ ¬c ∧ ¬d ∧ ¬e
(a ∨ b ∨ c) ∧ ¬(d ∨ e) ∨ (a ∨ b ∨ c) ∧ (d ∨ e)
into a form that contains the fewest operators possible, which of the following options
i a ∨ b ∨ c,
ii ¬a ∧ ¬b ∧ ¬c,
iii d ∨ e,
iv ¬d ∧ ¬e,
v none of the above
a ∧ c ∨ c ∧ (¬a ∨ a ∧ b)
into a form that contains the fewest operators possible, which of the following options
i (b ∧ c) ∨ c,
ii c ∨ (a ∧ b ∧ c),
iii a ∧ c,
iv a ∨ (b ∧ c),
v none of the above
a ∧ b ∨ a ∧ b ∧ c ∨ a ∧ b ∧ c ∧ d ∨ a ∧ b ∧ c ∧ d ∧ e ∨ a ∧ b ∧ c ∧ d ∧ e ∧ f.
i a ∧ b ∧ c ∧ d ∧ e ∧ f,
ii a ∧ b ∨ c ∧ d ∨ e ∧ f,
iii a ∨ b ∨ c ∨ d ∨ e ∨ f,
iv a ∧ b,
v c ∧ d,
vi e ∧ f,
vii a ∨ b ∧ (c ∨ d ∧ (e ∨ f ))
viii ((a ∨ b) ∧ c) ∨ d ∧ e ∨ f
is correct?
i 1,
ii 2,
iii 3,
iv 4,
decide which is the least number of operators required to compute the same result as
f (a, b, c) = (a ∧ b) ∨ a ∧ (a ∨ c) ∨ b ∧ (a ∨ c).
f Prove that
(¬x ∧ y) ∨ (¬y ∧ x) ∨ (¬x ∧ ¬y) ≡ ¬x ∨ ¬y.
g Prove that
(x ∧ y) ∨ (y ∧ z ∧ (y ∨ z)) ≡ y ∧ (x ∨ z).
A.2 Chapter 2
Q31. From the following list
A: has N-type semiconductor terminals and P-type body
B: has P-type semiconductor terminals and N-type body
C: is paired with another N-MOSFET to form a CMOS cell
D: has a threshold voltage above which the transistor is deemed active
identify each statement that correctly describes an N-MOSFET.
[A transistor-level circuit diagram, connected between Vdd and Vss, is shown here]
which implements a 3-input Boolean function r = f (x, y, z). Which function, from the following, do you think
it matches?
A: r=x∧y∧z
B: r=x
C: r = ¬(x ∧ (y ∨ z))
D: r = x ∧ (y ∨ z)
E: r=x∨y∨z
Q34. Recall that a 2-input XOR operator can be described via the following truth table:
XOR
x y r
0 0 0
0 1 1
1 0 1
1 1 0
An implementation of this operator is realised by combining logic gate instances, e.g., for NOT, NAND, AND,
NOR, and OR, while attempting to minimise the total number of underlying MOSFET-based transistors. How
many such transistors do you think it uses?
A: 14
B: 16
C: 18
D: 20
E: 22
Q35. A buffer can be described as a “pass through” logic gate: although it performs no computation (i.e., the output
r matches the input x, so r = x), it does impose a delay (often roughly the same as a NOT gate). It may be
termed a non-inverting buffer (cf. an inverting buffer, or NOT gate) because of this.
You are asked to implement a buffer, using an unconstrained organisation of N- and P-MOSFET transistors
alone. Assuming you attempt to minimise the number used, how many transistors do you need?
A: 0
B: 2
C: 4
D: 6
E: 8
f
x y z r
0 0 0 1
0 0 1 1
0 1 0 0
0 1 1 1
1 0 0 0
1 0 1 1
1 1 0 ?
1 1 1 1
describes a 3-input, 1-output Boolean function f such that r = f (x, y, z). Which of the following Boolean expressions
A: (¬x ⊕ ¬y) ∧ z
B: (¬x ⊕ ¬y) ∨ z
C: (¬x ∧ ¬y) ∧ z
D: (¬x ∧ ¬y) ∨ z
E: (¬x ∨ ¬y) ∧ z
correctly realises f ?
Q37. Imagine you want to design an 8-input, 8-bit multiplexer. Rather than do so from scratch, you intend to form
the design using multiple instances of an existing 2-input, 1-bit multiplexer component. How many do you
need?
A: 1
B: 8
C: 24
D: 40
E: 56
[The circuit diagram shown here, chaining the carry-out co of each full-adder into the carry-in ci of the next,]
illustrates a 4-bit ripple-carry adder circuit, constructed using 4 full-adder instances: it computes the sum
r = x + y + ci, given two operands x and y and a carry-in ci, and an associated carry-out co. Given the
propagation delay of NOT, AND, OR and XOR gates is 10ns, 20ns, 20ns and 60ns respectively, which of the
following
A: 120ns
B: 180ns
C: 240ns
D: 280ns
E: 480ns
most accurately reflects the critical path of the entire circuit?
Q39. Imagine you use the ripple-carry adder in the previous question to compute an unsigned addition within some
larger circuit. Having seen your design, your friend suggests they can optimise it: they claim that replacing each
full-adder instance with a half-adder instance will halve the total number of logic gates required. However, they
admit the optimisation does have a disadvantage: although any value of x can be accommodated, the optimised
circuit can only produce the correct output for some values of y. Which of the following values
of y
A: −1
B: 0
C: 1
D: any 2 ≤ y < 8
E: any 8 ≤ y < 16
will produce the correct output?
[A gate-level circuit diagram is shown here]
with a 4-bit input x and a 4-bit output r. Which of the following best describes the purpose of this circuit?
A: it computes the Hamming weight of x
w x y z r = f (w, x, y, z)
0 0 0 0 0
0 0 0 1 1
0 0 1 0 ?
0 0 1 1 1
0 1 0 0 1
0 1 0 1 1
0 1 1 0 1
0 1 1 1 0
1 0 0 0 1
1 0 0 1 1
1 0 1 0 0
1 0 1 1 1
1 1 0 0 1
1 1 0 1 1
1 1 1 0 1
1 1 1 1 0
Q41. Recalling that ? denotes don’t-care, consider the truth table shown in Figure A.2.
a Construction of a Karnaugh map for f demands formation of a set of groups; these (collectively) cover
all of the 1 entries. Assuming the most efficient approach is adopted when forming said groups, how many
are required?
A: 1
B: 2
C: 3
D: 4
E: 6
b Using the Karnaugh map above (plus any subsequent optimisation steps you deem necessary), derive a
Boolean expression for f that minimises the number of operators required. How many operators remain
in said expression?
A: 1
B: 4
C: 5
D: 11
E: 12
[A waveform diagram is shown here] which details the behaviour of three signals labelled x, y and z. Which of the following components could the
behaviour illustrated relate to?
A: an SR-type flip-flop
B: an SR-type latch
C: a D-type flip-flop
D: a D-type latch
E: a T-type flip-flop
[The gate-level circuit diagram shown here, with inputs S and R and outputs Q and ¬Q,]
illustrates a preliminary NAND-based SR-latch design, in the sense it currently lacks an enable signal. If Q and
Q′ denote the current and next state respectively, which of the following excitation tables
Current Next
S R Q ¬Q Q′ ¬Q′
0 0 0 1 0 1
0 0 1 0 1 0
A 0 1 ? ? 0 1
1 0 ? ? 1 0
1 1 0 0
? ?
0 0 ? ? 1 1
0 1 ? ? 1 0
B 1 0 ? ? 0 1
1 1 0 1 0 1
1 1 1 0 1 0
(
0 0 ? ? 0 1
C 1 1 ? ? 1 0
(
0 ? ? ? 0 1
D 1 ? ? ? 1 0
(
? 0 ? ? 0 1
E ? 1 ? ? 1 0
[A diagram is shown here that]
illustrates a circuit with well defined behaviour. Based on analysis of this behaviour, which of the following
components
A: a flip-flop
B: a latch
C: a RAM cell
D: a ROM cell
E: a clock multiplier
does the circuit implement?
[Two transistor-level circuit diagrams are shown here, each connected between Vdd and Vss, with inputs x and y and outputs r0 and r1 .]
(a) C0 (using P-type MOSFETs). (b) C1 (using N-type MOSFETs).
Q45. An m-output, 1-bit demultiplexer connects a 1-bit input x to one of m separate 1-bit outputs (say ri for 0 ≤ i < m).
The output is selected using an l-bit control signal c (or, equivalently, c is a collection of l separate 1-bit control
signals). If m = 5, what is the minimum value of l required?
A: 0
B: 1
C: 2
D: 3
E: 4
Q46. Figure A.3 describes the implementation of two components denoted C0 and C1 . Each component Ci produces
one output ri given two inputs x and y, and has been implemented using MOSFET transistors.
a The truth table below includes 5 possibilities for outputs r0 and r1 (stemming from instances of C0 and
C1 ), given x and y. Recall that Vss and Vdd are used to represent 0 and 1 respectively: which option is
correct?
          A.       B.       C.       D.       E.
x y     r0 r1    r0 r1    r0 r1    r0 r1    r0 r1
0 0     1  0     0  0     1  0     Z  0     1  Z
0 1     1  1     0  0     0  0     Z  Z     Z  Z
1 0     1  1     0  0     0  0     Z  Z     Z  Z
1 1     0  1     1  0     0  0     1  Z     Z  0
b The vendor of these components claims they can be used to implement any Boolean function; their
reasoning is based on the fact that a NAND gate can be implemented using instances of C0 and C1 .
Imagine you adhere to a design strategy where any given wire is driven by at most one non-Z value at
any given time, and want to minimise the number of C0 and C1 instances used: how many of each do
you need to implement a NAND gate?
   A.        B.        C.        D.        E.
C0 C1     C0 C1     C0 C1     C0 C1     C0 C1
 1  1      5  3      3  5      3  3      5  5
Q47. Moore’s Law is an observation about the number of transistors which can be fabricated within some fixed unit
of area: it observes that this number doubles roughly every two years. Which of the following properties of
MOSFET-based transistors act as a constraint with respect to Moore’s Law?
A: Feature size
B: Power consumption
C: Heat dissipation
D: All of the above
[Figure A.4 is shown here: a gate-level full-adder with inputs ci, x, and y, intermediate wires t0 , t1 , t2 , t3 , and outputs s and co.]
Q48. Figure A.4 shows an implementation of a full-adder cell. It uses three 1-bit inputs denoted x, y, and ci (the
carry-in), to compute two 1-bit outputs denoted s (the sum) and co (the carry-out); several other intermediate
wires, namely t0 , t1 , t2 , and t3 , are labelled for reference. Let
(x, y, ci) → (x′ , y′ , ci′ )
denote a change in said inputs: the LHS captures current values, whereas the RHS captures next (or new)
values. For example,
(0, 0, 0) → (1, 0, 0)
toggles x from 0 to 1, while both y and ci remain 0. Which of the following options will cause s to toggle in the
shortest period of time (i.e., with the shortest delay)?
A: (0, 0, 0) → (0, 0, 1)
B: (0, 0, 1) → (0, 1, 1)
C: (0, 1, 1) → (0, 0, 1)
D: (1, 1, 1) → (1, 1, 0)
E: (1, 0, 1) → (0, 1, 1)
Q49. Figure A.5 shows an implementation of a cyclic n-bit counter. While the counter is operational (i.e., while not
reset, and given a clock signal), each ri will transition between 0 and 1 at a different frequency. For the concrete
case of n = 4, which does so at the lowest frequency?
A: r4
B: r3
C: r2
D: r1
E: r0
Q50. Consider a 16-bit register, constructed from CMOS-based D-type latches. Based on high-level reasoning about
this component alone, if the initial value stored is DEAD(16) then overwriting it with which of the following
A: BEEF(16)
B: F00D(16)
C: 1234(16)
D: FFFF(16)
E: 0000(16)
might you expect to consume the most power?
[Figure A.5 is shown here: a cyclic n-bit counter built from a chain of stages, each pairing a full-adder cell with D-type flip-flops, with outputs r0 , r1 , . . . , rn−1 , a reset signal rst, and a 2-phase clock Φ1 , Φ2 .]
A B C D E
1. × ✓ × × ✓
2. × × ✓ × ✓
3. × × × ✓ ✓
states whether f can (a tick) or cannot (a cross) be implemented using a given set of components (i.e., row),
namely
plus the constant values 0 and 1. For example, option C states that f can be implemented by using component
set 2 but not 1 or 3. Which option do you think is correct?
[A diagram is shown here equating a NOT gate with a 2-input, 1-bit multiplexer whose data inputs are the constants 1 and 0 and whose control input is x,]
i.e., that one can implement a NOT gate using one instance of a 2-input, 1-bit multiplexer component. Assuming
you want to minimise the number of multiplexer instances, identify how many are required to implement the
expression
(x ∧ y) ∨ z.
A: 1
B: 2
C: 3
D: 6
E: 8
Q53. Consider the combinatorial logic design as shown in Figure A.6, which is described using N-type and P-type
MOSFET transistors. Within the design, three inputs (i.e., x, y, and z) and one output (i.e., r) can be identified;
note that several transistors (e.g., m0 ) and intermediate signals (e.g., t0 ) are annotated for reference. Which of
the following Boolean expressions
A: ¬x
B: ¬((x ∨ y) ∧ z)
C: (¬(x ∨ y)) ∧ z
D: ¬(x ∧ y ∧ ¬z)
E: ¬(x ∨ y ∨ ¬z)
does the design implement?
Q54. Consider the sequential logic design as shown in Figure A.7, which contains two D-type flip-flops. Within the
design, one output (i.e., r) can be identified; note that several intermediate signals (e.g., t0 ) are annotated for
reference. If the clock signal clk has a frequency of 400MHz, what is the frequency of r?
A: 100MHz
B: 200MHz
[Figure A.6 is shown here: inputs x, y, and z, output r, transistors m0 to m9 , and intermediate signals t0 and t1 .]
Figure A.6: A combinatorial logic design, described using N-type and P-type MOSFET transistors.
[Figure A.7 is shown here: a sequential design containing two D-type flip-flops driven by clk, with intermediate signals t0 , t1 , t2 , t3 and output r.]
C: 400MHz
D: 800MHz
E: 1600MHz
[A circuit diagram is shown here]
which is described using a 2-input, 1-bit multiplexer. Within the design, two inputs (i.e., p and q) and one
output (i.e., r) can be identified. Which of the following Boolean expressions
A: r = ¬p
B: r=p∧q
C: r = ¬(p ∧ q)
D: r=p⊕q
E: r = ¬(p ⊕ q)
correctly reflects the relationship between inputs and output?
[Figure A.8 is shown here: inputs x, y, and z, output r, transistors m0 to m4 , and intermediate signal t0 .]
Figure A.8: A combinatorial logic design, described using N-type and P-type MOSFET transistors; note that the
pull-down network is (partially) missing.
[A NAND-based SR-latch diagram, with inputs S and R and outputs Q and ¬Q, is shown here.]
Imagine that the two NAND gates have a non-zero, but unequal gate delay associated with them, i.e., the top
gate has the delay x whereas the bottom gate has the delay x ± δ for some x and δ > 0. If the current input
S = R = 0 is changed instantaneously to S = R = 1, what will the outputs be?
A: Q = 1, ¬Q = 1
B: either Q = 0, ¬Q = 1 or Q = 1, ¬Q = 0
C: either Q = 1, ¬Q = 1 or Q = 0, ¬Q = 0
D: Q = 0, ¬Q = 0
E: None of the above
Q57. Consider the combinatorial logic design as shown in Figure A.8, which is described using N-type and P-type
MOSFET transistors. Within the design, three inputs (i.e., x, y, and z) and one output (i.e., r) can be identified;
note that several transistors (e.g., m0 ) and intermediate signals (e.g., t0 ) are annotated for reference. Despite the
fact that the pull-down network is (partially) missing, it is still possible to infer how the design works: which
of the following Boolean expressions
A: r=x⊕y
B: r = (¬x ∧ ¬y) ∨ ¬z
C: r = (¬x ∨ ¬y) ∧ ¬z
D: r = (x ∧ y) ∨ z
E: r = (x ∨ y) ∧ z
correctly reflects the relationship between inputs and output?
[A diagram of a buffer with input x, output r, and an enable signal en is shown here.]
If x ∈ {0, 1} and en ∈ {0, 1}, how many different values can r potentially take?
A: 1
B: 2
C: 3
D: 4
E: 5
r = f (x, y) = 1 if x > y, and 0 otherwise
For how many combinations of the unsigned, 2-bit inputs x and y is the output r = 1?
A: 1
B: 2
C: 4
D: 6
E: 8
Q60. Consider a micro-processor which is compatible with the ARMv7-A ISA. During execution of an instruction,
the fetch stage of the fetch-decode-execute cycle computes PC + 4, which (potentially) forms the program
counter in the next cycle. In an initial implementation of the micro-processor, PC + 4 is computed by using
a general-purpose ripple-carry adder. Said adder is subsequently optimised, however, by capitalising on the
special-purpose form of the computation: ARMv7-A demands that PC is word-aligned, for example.
Assuming that logic gates for NOT, AND, OR, and XOR require 1, 2, 2, and 4 units of area respectively, the
general-purpose solution requires
units of area due to the use of 32 full-adder cells. If the optimisation aims to minimise area, which of the
following options
A: 2.00
B: 2.31
C: 2.33
D: 2.52
E: 3.14
most accurately reflects the improvement factor offered by the special-purpose solution?
Q61. Binary-Coded Decimal (BCD) is a representation for decimal integers, where each decimal digit in some x is
represented independently by a 4-bit binary sequence in r. For example,
Note that because each 0 ≤ xi < 10 and 24 = 16 > 10, some values of the associated BCD-encoded digit ri are
impossible.
Imagine you are asked to implement a 4-input Boolean function f using combinatorial logic, which will be
used to process BCD-encoded digits. Select an option to complete the blanks in the sentence “a Karnaugh map cell
which contains a ___ can be treated as either ___ or ___ in order to ___ the resulting term”, so that it
correctly describes how you might deal with an impossible BCD-encoded digit.
A: don’t care, AND, OR, eliminate
B: duplicate, 1, 0, verify
C: unknown, 1, 0, simplify
D: don’t care, 1, 0, simplify
[Figure A.9 is shown here: a block diagram with inputs S and R and outputs Q and ¬Q, built from two abstract components labelled ⊙. Below it, Figure A.10: an SR-latch with inputs S, R, and en, and outputs Q and ¬Q.]
Figure A.10: An SR-latch variant, which includes additional inputs P, C, and en.
E: unknown, 0, 1, optimise
Q62. The block diagram in Figure A.9 describes a sequential logic component, or, more specifically, an SR-type
latch: it does so by using two abstract components labelled ⊙. If the associated excitation table is as follows
S R Q ¬Q Q′ ¬Q′
1 1 0 1 0 1
1 1 1 0 1 0
1 0 ? ? 1 0
0 1 ? ? 0 1
0 0 ? ? ? ?
Q63. Figure A.10 describes a sequential logic component, or, more specifically, a variant of the SR-latch: in addition
to S and R, it also includes the inputs labelled P, C, and en.
S and R P and C
A synchronous synchronous
B synchronous asynchronous
C asynchronous synchronous
D asynchronous asynchronous
S and R P and C
A active low active low
B active low active high
C active high active low
D active high active high
Q64. Write the simplest (i.e., with fewest operators) possible Boolean expression that implements the Boolean
function
r = f (x, y, z)
described by
f
x y z r
0 0 0 0
0 0 1 1
0 1 0 1
0 1 1 ?
1 0 0 1
1 0 1 0
1 1 0 ?
1 1 1 1
where ? denotes don’t care.
Q66. Recall that an SR latch has two inputs S (or set) and R (or reset); if S = R = 1, the two outputs Q and ¬Q are
undefined. This issue can be resolved by using a reset-dominant latch: the alternative design has the same
inputs and outputs, but resets the latch (i.e., has Q = 0 and ¬Q = 1) whenever S = R = 1.
Using a gate-level circuit diagram, describe how a reset-dominant latch can be implemented using only
NOR gates and at most one AND gate.
Q67. The quality of the design for some hardware component is often judged by measuring efficiency, for example
how quickly it can produce output on average. Name two other metrics that might be considered.
Q68. a Describe how N-type and P-type MOSFET transistors are constructed using silicon and how they operate
as switches.
b Draw a diagram to show how N-type and P-type MOSFET transistors can be used to implement a NAND
gate. Show your design works by describing the transistor states for each input combination.
[A transistor-level diagram, connected between Vdd and Vss, is shown here; it]
details a 2-input NAND gate comprised of two P-MOSFET transistors (top) and two N-MOSFET transistors
(bottom). Draw a similar diagram for a 3-input NAND gate.
Q70. Moore’s Law predicts the number of CMOS-based transistors we can manufacture within a fixed sized area
will double roughly every two years; this is often interpreted as doubling computational efficiency over the
same period. Briefly explain two limits which mean this trend cannot be sustained indefinitely.
Q71. Given that ? is the don’t care state, consider the following truth table which describes a function p with four
inputs (a, b, c and d) and two outputs (e and f ):
p
a b c d e f
0 0 0 0 0 0
0 0 0 1 0 1
0 0 1 0 1 0
0 0 1 1 ? ?
0 1 0 0 0 1
0 1 0 1 1 0
0 1 1 0 0 0
0 1 1 1 ? ?
1 0 0 0 1 0
1 0 0 1 0 0
1 0 1 0 0 1
1 0 1 1 ? ?
1 1 0 0 ? ?
1 1 0 1 ? ?
1 1 1 0 ? ?
1 1 1 1 ? ?
a From the truth table above, write down the corresponding Sum of Products (SoP) equations for e and f .
b Simplify the two SoP equations so that they use the minimum number of logic gates possible. You can
assume the two equations can share logic.
Q72. Using a Karnaugh map, derive a Boolean expression for the function
r = f (x, y, z)
Q73. NAND is a universal logic gate in the sense that the behaviour of NOT, AND and OR gates can be implemented
using only NAND. Show how this is possible using a truth table to demonstrate your solution.
Q74. Both NAND and NOR gates are described as universal because any other Boolean gate (i.e., AND, OR, NOT)
can be constructed using them. Imagine your friend suggests a 4-input, 1-bit multiplexer (that selects between
four 1-bit inputs using two 1-bit control signals to produce a 1-bit output) is also universal: state whether or
not you believe them, and explain why.
Q75. Consider the following circuit where the propagation delay of logic gates in the circuit are 10ns for NOT, 20ns
for AND, 20ns for OR and 60ns for XOR:
[circuit diagram not reproduced; signals labelled in the figure include b, d, and e]
a Draw a Karnaugh map for this circuit and derive a Sum of Products (SoP) expression for the result.
b Describe advantages and disadvantages of your SoP expression and the dynamic behaviour it produces.
c If the circuit is used as combinatorial logic within a clocked system, what is the maximum clock speed of
the system?
Q76. A game uses nine LEDs to display the result of rolling a six-sided dice; the i-th LED, say Li for 0 ≤ i < 9, is
driven with 1 or 0 to turn it on or off respectively. A 3-bit register D represents the dice as an unsigned integer.
The LEDs are physically arranged as

L0 L3 L6
L1 L4 L7
L2 L5 L8

and the required mapping between dice and LEDs, given a filled dot means an LED is on, is [not reproduced here].
a Using Karnaugh maps as appropriate, write a simplified Boolean expression for each LED (i.e., for each
Li in terms of D).
b The 2-input XOR, AND, OR and NOT gates used to implement your expressions have propagation delays
of 40, 20, 20 and 10 nanoseconds respectively. Calculate how many times per second the dice can be
rolled, i.e., D can be updated, if the LEDs are to provide the correct output.
c The results of individual dice throws will be summed using a ripple-carry adder circuit, to give a total;
each 3-bit output D will be added to and stored in an n-bit accumulator register A.
i Using a high-level block diagram, show how an n-bit ripple-carry adder circuit is constructed from
full-adder cells.
ii If m = 8 throws of the dice are to be summed, what value for n should be selected?
iii Imagine that instead of D, we want to add 2 · D to A. Doubling D can be achieved by computing
either D + D or D ≪ 1 (i.e., a left-shift of D by 1 bit). Carefully state which method is preferable, and
why.
Q77. Consider a simple component called C that compares two inputs x and y (both are unsigned 8-bit integers) in
order to produce their maximum and minimum as two outputs:
[diagram: x enters C from the left and y from the top; min(x, y) leaves to the right and max(x, y) leaves below]
Instances of C can be connected in a mesh to sort integers: the input is fed into the top and left-hand edges of
the mesh, the sorted output appears on the bottom and right-hand edges. An example is given below:
[diagram: a 4 × 4 mesh of C components. The values 5, 2, 4, 1 enter along the top edge and 3, 2, 6, 7 enter
along the left-hand edge; the sorted values 1, 2, 2, 3 leave along the right-hand edge and 7, 6, 5, 4 leave
along the bottom edge, with intermediate values passing between neighbouring components.]
a Using standard building blocks (e.g., adder, multiplexer etc.) rather than individual logic gates, draw a
block diagram that implements the component C.
b Imagine that an n × n mesh of components is created. Based on your design for C and clearly stating any
assumptions you need to make, write down an expression for the critical path of such a mesh.
c Algorithms for sorting integers can clearly be implemented on a general-purpose processor. Explain two
advantages and two disadvantages of using such a processor versus using a mesh like that above.
Q78. Imagine you are working for a company developing the “Pee”, a portable games console. The user interface is
a fancy controller that has a number of inputs, including the fire button F0 [the full list is not reproduced here].
a The fire button inputs are described as level triggered and active high; explain what this means (in
comparison to the alternatives in each case).
b Some customers want an “autofire” feature that will automatically and repeatedly press the F0 fire button
for them. The autofire can operate in four modes, selected by a switch called M: off (where the fire button
F0 works as normal), slow, fast or very fast (where the fire button F0 is turned on and off repeatedly at the
selected speed). Stating any assumptions and showing your working where appropriate, design a circuit
that implements such a feature.
c In an attempt to prevent counterfeiting, each controller can only be used with the console it was sold
with. This protocol is used:
P                                   C
c ←$ {0, 1}³
            ───── c ────▶
                                    r = T(c)
            ◀──── r ─────
r ≟ T(c)
i There is some debate as to whether the protocol should be synchronous or asynchronous; explain
what your recommendation would be and why.
ii The function T is simply a look-up table. For example

T(x) = 2 if x = 0,  6 if x = 1,  7 if x = 2,  1 if x = 3,
       4 if x = 4,  0 if x = 5,  5 if x = 6,  3 if x = 7.
Each pair of console and controller has such a T fixed inside them during the manufacturing process.
Stating any assumptions and showing your working where appropriate, explain how this T might
be implemented as a circuit.
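In software, a fixed look-up table like T is just an array indexed by x; in hardware it corresponds to a small ROM, i.e., a 3-bit address decoder selecting one of eight hard-wired outputs. A C model of the example T above (a sketch only; the exam answer should be a circuit):

```c
/* The example T as an 8-entry ROM: one hard-wired 3-bit output per 3-bit address. */
static const unsigned T[8] = { 2, 6, 7, 1, 4, 0, 5, 3 };

/* Look up T(x); the mask models the 3-bit address width. */
static unsigned T_lookup(unsigned x) {
    return T[x & 7];
}
```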
Q79. Imagine you have three Boolean values x, y, and z. Given access to as many AND and OR gates as you want
but only two NOT gates, write a set of Boolean expressions to compute all three results ¬x, ¬y and ¬z.
Q80. SAT is the problem of finding an assignment to n Boolean variables which means a given Boolean expression
is satisfied, i.e., evaluates to 1. For example, given n = 3 and the expression
(x ∧ y) ∨ ¬z,
x = 1, y = 1, z = 0 is one assignment (amongst several) which solves the associated SAT problem.
The ability to solve SAT can be used to test whether or not two n-input, 1-output combinatorial circuits C1
and C2 are equivalent. Show how this is possible.
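The equivalence check can be sketched for small n by brute force: form the "miter" C1(x) ⊕ C2(x) and ask whether any assignment satisfies it; the circuits are equivalent iff the miter is unsatisfiable. The two circuits below are hypothetical stand-ins for illustration:

```c
#include <stdbool.h>

/* Two hypothetical 3-input circuits: different expressions for the same function. */
static bool c1(bool x, bool y, bool z) { return (x && y) || !z; }
static bool c2(bool x, bool y, bool z) { return !(!(x && y) && z); }  /* De Morgan form */

/* Equivalent iff the miter c1 XOR c2 is unsatisfiable, i.e., no assignment
 * over all 2^3 inputs makes the two circuits disagree. */
static bool equivalent(void) {
    for (int i = 0; i < 8; i++) {
        bool x = i & 1, y = (i >> 1) & 1, z = (i >> 2) & 1;
        if (c1(x, y, z) != c2(x, y, z)) return false;  /* miter satisfied */
    }
    return true;
}
```

A real SAT solver does the same search without enumerating all 2^n assignments.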
Q81. Consider the following combinatorial circuit, which is the composition of four parts (labelled A, B, C and REG):
each part is annotated with a name and an associated critical path. The circuit computes an output r = f (x)
from the corresponding input x.
x A B C REG r = f (x)
10ns 30ns 20ns 10ns
b explain how and why you would expect use of pipelining to influence both metrics.
Q82. The figure below shows a block of combinatorial logic built from seven parts; the name and latency of each
part is displayed inside it. Note that the last part is a register which stores the result:
x A B C D E F REG r = f (x)
40ns 10ns 30ns 10ns 50ns 10ns 10ns
It is proposed to pipeline the block of logic using two stages such that there is a pipeline register in between
parts D and E:
a Explain the terms latency and throughput in relation to the idea of pipelining.
b Calculate the overall latency and throughput of the initial circuit described above.
c Calculate the overall latency and throughput of the circuit after the proposed change.
d Calculate the number of extra pipeline registers required to maximise the circuit throughput; state this
new throughput and the associated latency. Explain the advantages and disadvantages of this change.
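Calculations like these follow a fixed recipe: each pipeline stage's delay is its parts plus one register delay, the clock period is set by the slowest stage, latency is the number of stages times the period, and throughput is one result per period. The C sketch below encodes that recipe for the part delays in the figure; it illustrates the method rather than presenting the model answers:

```c
/* Delays of the combinatorial parts A..F from the figure, in ns; the final
 * register REG (and any inserted pipeline register) adds reg_ns. */
static const int part_ns[6] = { 40, 10, 30, 10, 50, 10 };
static const int reg_ns = 10;

/* Latency of the unpipelined block: all parts in series, then REG. */
static int unpipelined_ns(void) {
    int t = reg_ns;
    for (int i = 0; i < 6; i++) t += part_ns[i];
    return t;
}

/* Clock period if one pipeline register is inserted after part `cut`
 * (0-based, so cut = 3 places it between parts D and E): each stage is
 * its parts plus one register, and the slowest stage sets the clock. */
static int period_ns(int cut) {
    int s1 = reg_ns, s2 = reg_ns;
    for (int i = 0; i < 6; i++) {
        if (i <= cut) s1 += part_ns[i];
        else          s2 += part_ns[i];
    }
    return s1 > s2 ? s1 : s2;
}
```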
This is a (large) set of example Boolean minimisation questions: each asks you to transform some truth table
describing an n-input Boolean function into a Boolean expression. Each solution includes
1. a reference implementation (produced by forming a SoP expression with a full term for each minterm,
i.e., row where r = 1), and
2. a Karnaugh map annotated with sensible groups, and an optimised implementation based on those
groups.
The goal is to focus on producing the latter, since the former is somewhat easier. Keep the following in mind:
• There are 2^(2^n) Boolean functions with n inputs (or 3^(2^n) if you include don’t-care as a valid output);
for small n a complete set of functions is included, whereas for large n there is only a random sub-set.
• No real effort is made to order the questions, and only minor effort to avoid duplicates. That said, there
should be no trivial (in the sense r = 1 or r = 0 for all inputs, e.g., tautological) cases.
• The questions and solutions are generated automatically, meaning a small but real chance of bugs in the
associated implementation!
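The reference implementation described in point 1 can be produced mechanically: emit one full product term per row where r = 1. A C sketch for the 2-input case (notation assumed: `.` for AND, `+` for OR, `!` for NOT; rows ordered yz = 00, 01, 10, 11 as in the tables below):

```c
#include <string.h>

/* Write the full Sum of Products expression for a 2-input truth table into
 * out: one complete product term (minterm) per row where r = 1. */
static void sop2(const int r[4], char *out) {
    const char *minterm[4] = { "!y.!z", "!y.z", "y.!z", "y.z" };
    out[0] = '\0';
    for (int row = 0; row < 4; row++) {
        if (r[row] != 1) continue;               /* skip 0 and don't-care rows */
        if (out[0] != '\0') strcat(out, " + ");  /* join terms with OR */
        strcat(out, minterm[row]);
    }
}
```

This yields the unoptimised starting point; the Karnaugh map groups are then used to simplify it.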
Q83.
y z r
0 0 1
0 1 0
1 0 1
1 1 1
Q84.
y z r
0 0 1
0 1 1
1 0 0
1 1 1
Q85.
y z r
0 0 1
0 1 0
1 0 0
1 1 0
Q86.
y z r
0 0 1
0 1 1
1 0 0
1 1 0
Q87.
y z r
0 0 0
0 1 0
1 0 0
1 1 1
Q88.
y z r
0 0 1
0 1 0
1 0 0
1 1 1
Q89.
y z r
0 0 0
0 1 0
1 0 1
1 1 0
Q90.
y z r
0 0 0
0 1 1
1 0 1
1 1 0
Q91.
y z r
0 0 0
0 1 1
1 0 0
1 1 0
Q92.
y z r
0 0 1
0 1 0
1 0 1
1 1 0
Q93.
y z r
0 0 0
0 1 ?
1 0 1
1 1 0
Q94.
y z r
0 0 ?
0 1 ?
1 0 0
1 1 1
Q95.
y z r
0 0 0
0 1 1
1 0 ?
1 1 1
Q96.
y z r
0 0 1
0 1 ?
1 0 1
1 1 0
Q97.
y z r
0 0 1
0 1 1
1 0 0
1 1 ?
Q98.
x y z r
0 0 0 0
0 0 1 0
0 1 0 0
0 1 1 0
1 0 0 1
1 0 1 1
1 1 0 1
1 1 1 0
Q99.
x y z r
0 0 0 1
0 0 1 1
0 1 0 1
0 1 1 0
1 0 0 1
1 0 1 0
1 1 0 1
1 1 1 1
Q100.
x y z r
0 0 0 1
0 0 1 1
0 1 0 0
0 1 1 1
1 0 0 0
1 0 1 1
1 1 0 0
1 1 1 0
Q101.
x y z r
0 0 0 0
0 0 1 0
0 1 0 1
0 1 1 1
1 0 0 0
1 0 1 1
1 1 0 0
1 1 1 1
Q102.
x y z r
0 0 0 0
0 0 1 0
0 1 0 0
0 1 1 0
1 0 0 0
1 0 1 1
1 1 0 0
1 1 1 1
Q103.
x y z r
0 0 0 0
0 0 1 1
0 1 0 0
0 1 1 0
1 0 0 1
1 0 1 1
1 1 0 1
1 1 1 0
Q104.
x y z r
0 0 0 1
0 0 1 1
0 1 0 0
0 1 1 0
1 0 0 1
1 0 1 1
1 1 0 1
1 1 1 1
Q105.
x y z r
0 0 0 1
0 0 1 0
0 1 0 0
0 1 1 0
1 0 0 0
1 0 1 0
1 1 0 1
1 1 1 0
Q106.
x y z r
0 0 0 0
0 0 1 0
0 1 0 0
0 1 1 1
1 0 0 0
1 0 1 0
1 1 0 1
1 1 1 1
Q107.
x y z r
0 0 0 0
0 0 1 0
0 1 0 1
0 1 1 1
1 0 0 1
1 0 1 1
1 1 0 0
1 1 1 1
Q108.
x y z r
0 0 0 1
0 0 1 ?
0 1 0 0
0 1 1 ?
1 0 0 0
1 0 1 ?
1 1 0 1
1 1 1 0
Q109.
x y z r
0 0 0 1
0 0 1 ?
0 1 0 0
0 1 1 0
1 0 0 0
1 0 1 1
1 1 0 0
1 1 1 ?
Q110.
x y z r
0 0 0 0
0 0 1 ?
0 1 0 1
0 1 1 1
1 0 0 ?
1 0 1 1
1 1 0 ?
1 1 1 ?
Q111.
x y z r
0 0 0 ?
0 0 1 0
0 1 0 ?
0 1 1 ?
1 0 0 ?
1 0 1 ?
1 1 0 0
1 1 1 1
Q112.
x y z r
0 0 0 1
0 0 1 1
0 1 0 0
0 1 1 ?
1 0 0 ?
1 0 1 ?
1 1 0 0
1 1 1 0
Q113.
x y z r
0 0 0 ?
0 0 1 0
0 1 0 0
0 1 1 ?
1 0 0 1
1 0 1 ?
1 1 0 1
1 1 1 1
Q114.
x y z r
0 0 0 ?
0 0 1 1
0 1 0 1
0 1 1 1
1 0 0 1
1 0 1 0
1 1 0 1
1 1 1 1
Q115.
x y z r
0 0 0 0
0 0 1 1
0 1 0 ?
0 1 1 ?
1 0 0 1
1 0 1 0
1 1 0 0
1 1 1 ?
Q116.
x y z r
0 0 0 1
0 0 1 ?
0 1 0 0
0 1 1 1
1 0 0 0
1 0 1 ?
1 1 0 ?
1 1 1 1
Q117.
x y z r
0 0 0 0
0 0 1 0
0 1 0 0
0 1 1 0
1 0 0 1
1 0 1 1
1 1 0 0
1 1 1 ?
Q118.
w x y z r
0 0 0 0 1
0 0 0 1 1
0 0 1 0 0
0 0 1 1 0
0 1 0 0 1
0 1 0 1 0
0 1 1 0 0
0 1 1 1 1
1 0 0 0 1
1 0 0 1 0
1 0 1 0 0
1 0 1 1 0
1 1 0 0 0
1 1 0 1 0
1 1 1 0 0
1 1 1 1 0
Q119.
w x y z r
0 0 0 0 1
0 0 0 1 0
0 0 1 0 1
0 0 1 1 0
0 1 0 0 0
0 1 0 1 1
0 1 1 0 0
0 1 1 1 1
1 0 0 0 1
1 0 0 1 1
1 0 1 0 0
1 0 1 1 1
1 1 0 0 1
1 1 0 1 0
1 1 1 0 1
1 1 1 1 0
Q120.
w x y z r
0 0 0 0 1
0 0 0 1 0
0 0 1 0 0
0 0 1 1 0
0 1 0 0 0
0 1 0 1 1
0 1 1 0 1
0 1 1 1 0
1 0 0 0 0
1 0 0 1 0
1 0 1 0 0
1 0 1 1 0
1 1 0 0 1
1 1 0 1 0
1 1 1 0 0
1 1 1 1 1
Q121.
w x y z r
0 0 0 0 0
0 0 0 1 1
0 0 1 0 0
0 0 1 1 0
0 1 0 0 1
0 1 0 1 1
0 1 1 0 0
0 1 1 1 0
1 0 0 0 1
1 0 0 1 1
1 0 1 0 1
1 0 1 1 1
1 1 0 0 1
1 1 0 1 0
1 1 1 0 1
1 1 1 1 0
Q122.
w x y z r
0 0 0 0 1
0 0 0 1 0
0 0 1 0 0
0 0 1 1 1
0 1 0 0 1
0 1 0 1 0
0 1 1 0 0
0 1 1 1 1
1 0 0 0 1
1 0 0 1 0
1 0 1 0 1
1 0 1 1 1
1 1 0 0 1
1 1 0 1 0
1 1 1 0 1
1 1 1 1 0
Q123.
w x y z r
0 0 0 0 0
0 0 0 1 1
0 0 1 0 0
0 0 1 1 0
0 1 0 0 1
0 1 0 1 0
0 1 1 0 1
0 1 1 1 0
1 0 0 0 0
1 0 0 1 1
1 0 1 0 0
1 0 1 1 0
1 1 0 0 1
1 1 0 1 1
1 1 1 0 1
1 1 1 1 1
Q124.
w x y z r
0 0 0 0 0
0 0 0 1 0
0 0 1 0 1
0 0 1 1 1
0 1 0 0 1
0 1 0 1 1
0 1 1 0 1
0 1 1 1 1
1 0 0 0 1
1 0 0 1 0
1 0 1 0 1
1 0 1 1 0
1 1 0 0 1
1 1 0 1 0
1 1 1 0 0
1 1 1 1 0
Q125.
w x y z r
0 0 0 0 0
0 0 0 1 0
0 0 1 0 1
0 0 1 1 0
0 1 0 0 1
0 1 0 1 1
0 1 1 0 0
0 1 1 1 1
1 0 0 0 0
1 0 0 1 1
1 0 1 0 0
1 0 1 1 1
1 1 0 0 1
1 1 0 1 1
1 1 1 0 1
1 1 1 1 0
Q126.
w x y z r
0 0 0 0 1
0 0 0 1 0
0 0 1 0 1
0 0 1 1 1
0 1 0 0 1
0 1 0 1 0
0 1 1 0 1
0 1 1 1 1
1 0 0 0 1
1 0 0 1 1
1 0 1 0 1
1 0 1 1 1
1 1 0 0 1
1 1 0 1 1
1 1 1 0 0
1 1 1 1 0
Q127.
w x y z r
0 0 0 0 0
0 0 0 1 1
0 0 1 0 0
0 0 1 1 0
0 1 0 0 0
0 1 0 1 0
0 1 1 0 1
0 1 1 1 0
1 0 0 0 1
1 0 0 1 1
1 0 1 0 0
1 0 1 1 1
1 1 0 0 0
1 1 0 1 0
1 1 1 0 0
1 1 1 1 1
Q128.
w x y z r
0 0 0 0 0
0 0 0 1 ?
0 0 1 0 ?
0 0 1 1 0
0 1 0 0 0
0 1 0 1 ?
0 1 1 0 ?
0 1 1 1 ?
1 0 0 0 0
1 0 0 1 0
1 0 1 0 1
1 0 1 1 ?
1 1 0 0 0
1 1 0 1 1
1 1 1 0 1
1 1 1 1 ?
Q129.
w x y z r
0 0 0 0 ?
0 0 0 1 ?
0 0 1 0 0
0 0 1 1 1
0 1 0 0 1
0 1 0 1 1
0 1 1 0 0
0 1 1 1 0
1 0 0 0 1
1 0 0 1 1
1 0 1 0 1
1 0 1 1 1
1 1 0 0 0
1 1 0 1 ?
1 1 1 0 0
1 1 1 1 0
Q130.
w x y z r
0 0 0 0 ?
0 0 0 1 ?
0 0 1 0 1
0 0 1 1 0
0 1 0 0 0
0 1 0 1 0
0 1 1 0 0
0 1 1 1 0
1 0 0 0 1
1 0 0 1 0
1 0 1 0 ?
1 0 1 1 0
1 1 0 0 1
1 1 0 1 ?
1 1 1 0 0
1 1 1 1 1
Q131.
w x y z r
0 0 0 0 0
0 0 0 1 1
0 0 1 0 0
0 0 1 1 0
0 1 0 0 0
0 1 0 1 ?
0 1 1 0 0
0 1 1 1 1
1 0 0 0 0
1 0 0 1 0
1 0 1 0 1
1 0 1 1 1
1 1 0 0 1
1 1 0 1 1
1 1 1 0 0
1 1 1 1 ?
Q132.
w x y z r
0 0 0 0 0
0 0 0 1 0
0 0 1 0 0
0 0 1 1 ?
0 1 0 0 0
0 1 0 1 1
0 1 1 0 1
0 1 1 1 ?
1 0 0 0 ?
1 0 0 1 1
1 0 1 0 0
1 0 1 1 ?
1 1 0 0 ?
1 1 0 1 1
1 1 1 0 ?
1 1 1 1 0
Q133.
w x y z r
0 0 0 0 0
0 0 0 1 ?
0 0 1 0 0
0 0 1 1 ?
0 1 0 0 1
0 1 0 1 0
0 1 1 0 0
0 1 1 1 0
1 0 0 0 ?
1 0 0 1 ?
1 0 1 0 ?
1 0 1 1 1
1 1 0 0 1
1 1 0 1 ?
1 1 1 0 0
1 1 1 1 0
Q134.
w x y z r
0 0 0 0 0
0 0 0 1 0
0 0 1 0 1
0 0 1 1 ?
0 1 0 0 0
0 1 0 1 0
0 1 1 0 ?
0 1 1 1 1
1 0 0 0 ?
1 0 0 1 0
1 0 1 0 0
1 0 1 1 0
1 1 0 0 ?
1 1 0 1 0
1 1 1 0 1
1 1 1 1 1
Q135.
w x y z r
0 0 0 0 ?
0 0 0 1 1
0 0 1 0 ?
0 0 1 1 0
0 1 0 0 1
0 1 0 1 ?
0 1 1 0 1
0 1 1 1 ?
1 0 0 0 ?
1 0 0 1 1
1 0 1 0 ?
1 0 1 1 ?
1 1 0 0 ?
1 1 0 1 ?
1 1 1 0 0
1 1 1 1 1
Q136.
w x y z r
0 0 0 0 ?
0 0 0 1 0
0 0 1 0 ?
0 0 1 1 1
0 1 0 0 ?
0 1 0 1 ?
0 1 1 0 1
0 1 1 1 1
1 0 0 0 1
1 0 0 1 ?
1 0 1 0 ?
1 0 1 1 ?
1 1 0 0 1
1 1 0 1 ?
1 1 1 0 ?
1 1 1 1 1
Q137.
w x y z r
0 0 0 0 ?
0 0 0 1 0
0 0 1 0 1
0 0 1 1 0
0 1 0 0 1
0 1 0 1 1
0 1 1 0 0
0 1 1 1 1
1 0 0 0 ?
1 0 0 1 0
1 0 1 0 ?
1 0 1 1 0
1 1 0 0 ?
1 1 0 1 ?
1 1 1 0 0
1 1 1 1 ?
A.3 Chapter 3
Q138. Mike Rowchip was an engineer, working on an ALU design for a new processor: he had completed the design
and implementation of most but not all modules before he was, unfortunately, run over by a bus. You have
inherited his work, which includes the following C function:

int f( int x ) {
  int m = 1;

  while( x & m ) {
    x = x & ~m;
    m = m << 1;
  }

  return x | m;
}
However, Mike left no documentation beyond this. Thanks Mike. What do you think it does?
A: add x to y
B: compute the Hamming weight of x
C: left-rotate x by 1 bit
D: increment x
E: decrement x
Q139. Consider a C while loop of the form

while( c ) {
  ...
}

wherein c is a placeholder for the condition expression; the statement body (i.e., the continuation dots) is
executed iff. evaluating the condition expression yields a non-zero result. C represents signed integers using
two’s-complement: for an unsigned, 32-bit integer x, imagine we want to execute the statement body if either
every bit of x is 0 or every bit of x is 1. Which of the following choices for the condition expression would
achieve this?
A: ( x == 0 ) || ( x == -1 )
B: !x || !(~x)
C: ( x + 1 ) < 2
D: All of the above
E: None of the above
Q140. Figure ?? captures the design of an n-bit ripple-carry adder, constructed using n full-adder instances connected
by a carry chain denoted c. If c0 = ci (the carry-in) and cn = co (the carry-out), then ci would more generally
denote the carry into the i-th full-adder instance. If n = 4 and ci = 0, which of the following options
A x = 0000(2) y = 0000(2)
B x = 1100(2) y = 0001(2)
C x = 0100(2) y = 0100(2)
D x = 1011(2) y = 1001(2)
E x = 0110(2) y = 0101(2)
would produce c2 = 1?
Q141. Consider two integers x and y, whose sum r = x + y is computed using a ripple-carry adder; x, y, and r are all
8-bit signed integers, represented using two’s-complement. The associated flag
is used to signal whether an overflow occurred during computation of r. Which of the following
A: f = r8
B: f = (x7 ∧ y7 ∧ ¬r7 ) ∨ (¬x7 ∧ ¬y7 ∧ r7 )
C: f = (x7 ∨ y7 ∨ ¬r7 ) ∧ (¬x7 ∨ ¬y7 ∨ r7 )
D: f = x7 ∧ y7 ∧ r7
E: f = x7 ⊕ y7 ⊕ r7
is the correct Boolean expression for f ?
Q142. An n-bit ripple-carry adder has a critical path that can be described as O(n) gate delays. Explain intuitively
why this is the case, and name an alternative whose critical path is shorter.
Q143. Give a single-line C expression to test if a non-zero integer x is an exact power-of-two; i.e., if x = 2^n for some n
then the expression should evaluate to a non-zero value, otherwise it evaluates to zero.
Q144. Imagine you are writing a C program that includes a variable called x. If x has the type char and a current
value of 127, what is the new value after
the variable?
Q145. Imagine x represents a two’s-complement, signed integer using 4 bits; xi denotes the i-th bit of x. Write a
human-readable description (i.e., the meaning) of what the Boolean function
computes arithmetically.
Q146. Given an n-bit input x, draw a block diagram of an efficient (i.e., with a short critical path) combinatorial circuit
that can compute r = 7 · x (i.e., multiply x by the constant 7). Take care to label each component, and the size
(in bits) of each input and output.
Q147. Let xi and yi denote the i-th bit of two unsigned, 2-bit integers x and y (meaning that 0 ≤ i < 2). Design a
(2 × 2)-bit combinatorial multiplier circuit that can compute the 4-bit product r = x · y.
Q148. a Comparison operations for a given processor take two 16-bit operands and return zero if the comparison
is false or non-zero if it is true. By constructing some of the comparisons using combinations of other
operations, show that implementing all of =, ≠, <, ≤, > and ≥ is wasteful. State the smallest set of
comparisons that need dedicated hardware such that all the standard comparisons can be executed.
b The ALU in the same processor design does not include a multiply instruction. So that programmers can
still multiply numbers, write an efficient C function to multiply two 16-bit inputs together and return the
16-bit lower half of the result. You can assume the inputs are always positive.
c The population count or Hamming weight of x, denoted by HW(x) say, is the number of bits in the binary
expansion of x that equal one. Some processors have a dedicated instruction to do this but the proposed
one does not; write an efficient C function to compute the population count of 16-bit inputs.
Q149. Imagine we want to compute the result of multiplying two n-bit numbers x and y together, i.e., r = x · y, where
n is even. One can adopt a divide-and-conquer approach to this computation by splitting x and y into two
parts each of size n/2 bits
x = x1 · 2^(n/2) + x0
y = y1 · 2^(n/2) + y0

and then computing the full result

r = r2 · 2^n + r1 · 2^(n/2) + r0

via the parts

r2 = x1 · y1
r1 = x1 · y0 + x0 · y1
r0 = x0 · y0.
The naive approach above uses four multiplications of (n/2)-bit values. The Karatsuba-Ofman method reduces
this to three multiplications (and some extra low-cost operations); show how this is achieved.
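The essential observation is that (x1 + x0) · (y1 + y0) expands to x1·y1 + x1·y0 + x0·y1 + x0·y0, so subtracting r2 and r0 from it leaves exactly r1. A C sketch of one level of the method for n = 16, i.e., 8-bit halves (offered as an illustration of the identity, not the requested derivation):

```c
#include <stdint.h>

/* One level of Karatsuba-Ofman for 16-bit operands split into 8-bit halves:
 * three multiplications instead of four. */
static uint32_t karatsuba16(uint16_t x, uint16_t y) {
    uint32_t x0 = x & 0xff, x1 = x >> 8;
    uint32_t y0 = y & 0xff, y1 = y >> 8;
    uint32_t r2 = x1 * y1;                 /* multiplication 1 */
    uint32_t r0 = x0 * y0;                 /* multiplication 2 */
    uint32_t m  = (x1 + x0) * (y1 + y0);   /* multiplication 3 */
    uint32_t r1 = m - r2 - r0;             /* = x1*y0 + x0*y1, no multiply needed */
    return (r2 << 16) + (r1 << 8) + r0;    /* recombine the parts */
}
```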
Q150. a What is the result of using a normal 4-bit adder circuit to compute the sum 10 + 12?
b A saturating (or clamped) adder is such that if an overflow occurs, i.e., the result does not fit into 4 bits,
the highest possible result is returned instead. With a clamped 4-bit addition denoted by ⊎, we have that
10 ⊎ 12 = 15 for example. In general, for an n-bit clamped adder

x ⊎ y = x + y if x + y < 2^n, or x ⊎ y = 2^n − 1 otherwise.
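The clamped behaviour can be modelled directly in C, here as a sketch for n = 4 with unsigned operands:

```c
/* 4-bit saturating (clamped) add: returns x + y, or 2^4 - 1 = 15 on overflow. */
static unsigned clamped_add4(unsigned x, unsigned y) {
    unsigned s = (x & 0xf) + (y & 0xf);  /* masks keep operands within 4 bits */
    return s < 16 ? s : 15;              /* clamp when the sum does not fit */
}
```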
Q151. A software application needs 8-bit, unsigned modular multiplication, i.e., it needs to compute
x·y (mod N)
A.4 Chapter 4
Q152. From the following list
A: the design of a DRAM cell includes more transistors than an SRAM cell
B: an SRAM cell can store more information than a DRAM cell
C: SRAM cells can be accessed more quickly than DRAM cells
D: DRAM cells require a mechanism to refresh their content
identify each statement that correctly describes SRAM and DRAM cells.
Q153. Figure A.11 illustrates the design of a DRAM memory. The labels on four components in the block diagram
have been blanked-out, then replaced with the symbols α, β, γ, and δ: which of the following mappings
[The candidate mappings and the block diagram are not reproduced here. The diagram shows the row and
column address buffers, row and column decoders, refresh counter and controller, sense amplifiers, I/O gating,
and clock generators of the device, with four component labels blanked out and replaced by α, β, γ, and δ.]

Figure A.11: A 4Mbit DRAM block diagram (source: http://www.micross.com/pdf/MT4C4001J.pdf).
Q155. Consider a DRAM-based memory device with a capacity of 65536 addressable bytes. Of the following options
A: 8 address pins, 65536 cells
B: 16 address pins, 65536 cells
C: 8 address pins, 524288 cells
D: 16 address pins, 524288 cells
E: none of the above
which offers the most likely description of said device?
Q156. Consider an SRAM-based memory device, which has a 4-bit data bus and 12-bit address bus. If your goal is to
construct a 32KiB memory and only have such devices available, how many will you need?
A: 1
B: 2
C: 4
D: 8
E: 16
Q157. Imagine you are using a 1kB, byte-addressable memory within some larger system. In doing so, you make
a mistake which means the 4-th address wire A4 is not correctly connected: it therefore has the fixed value
A4 = 0. Which of the following options
A: 1
B: 4
C: 256
D: 512
E: 1024

reflects the number of addresses now accessible if the memory is
a an SRAM, or
b a DRAM.

[Figure A.12 is not reproduced here: it shows an address decoder mapping the address bus A to enable
signals en0, en1, en2, and en3 for four memory devices.]
Figure A.12: A diagrammatic description of an 8-bit micro-processor and associated memory system.
Q158. Consider an 8-bit micro-processor, connected to a memory system via an 18-bit address bus: let A denote said
bus, such that Ai for 0 ≤ i < 18 is the i-th bit. The memory system is comprised of 4 separate memory devices
(either RAMs or ROMs) denoted MEM0 , MEM1 , MEM2 , and MEM3 . An address decoder maps addresses to
memory devices by controlling a set of associated enable (or chip select) signals, i.e., en0 , en1 , en2 , and en3 .
Figure A.12 offers a diagrammatic version of the same description, noting various extraneous control signals
are omitted for clarity.
If the enable signals are
en0 = ¬A17 ∧ ¬A16 ∧ ¬A15
en1 = ¬A17 ∧ ¬A16 ∧ A15 ∧ ¬A14
en2 = ¬A17 ∧ A16
en3 = A17 ∧ A16 ∧ A15 ∧ A14 ∧ A13 ∧ A12 ∧ A11 ∧ A10 ∧ A9 ∧ A8 ∧ A7 ∧ A6 ∧ A5
which memory device is address A = 48350 mapped to?
A: MEM0
B: MEM1
C: MEM2
D: MEM3
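Address-decoder expressions like these are easy to mis-read, so it can help to evaluate them in software for a candidate address. The helper below (function names are ours) extracts the relevant bits of an 18-bit address and reports which device is enabled:

```c
/* Extract bit i of an address a. */
static int bit(unsigned a, int i) { return (a >> i) & 1; }

/* Evaluate the four enable signals for address a, returning the index of
 * the first active memory device, or -1 if none is enabled. */
static int device(unsigned a) {
    int en[4];
    en[0] = !bit(a,17) && !bit(a,16) && !bit(a,15);
    en[1] = !bit(a,17) && !bit(a,16) &&  bit(a,15) && !bit(a,14);
    en[2] = !bit(a,17) &&  bit(a,16);
    en[3] = 1;
    for (int i = 5; i <= 17; i++)      /* en3 requires A5 .. A17 all set */
        en[3] = en[3] && bit(a, i);
    for (int i = 0; i < 4; i++)
        if (en[i]) return i;
    return -1;
}
```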
Q160. Consider a 1Mbit SRAM memory device (i.e., housing a total of ~10^6 SRAM memory cells, each holding a 1-bit
value), and a DRAM-based alternative with the same capacity: you are tasked with deciding which device to
use within some larger system. After reading the data sheets, it seems that
a the DRAM-based device might be harder to integrate into the system, and
b the SRAM-based device should have a lower access latency.
Briefly explain why each statement is accurate.
Q161. At a high level, a DRAM memory device could be described as an array (or matrix) of 1-bit cells with an
interface including a data pin, address pins and control pins (e.g., chip select, output and write enable, row
and column strobes). Carefully explain the purpose of
a row and column buffers, and
b row and column decoders
which represent components in such a device.
A.5 Chapter 5
Q162. Consider a Finite State Machine (FSM) whose concrete implementation is as follows:
[circuit diagram not reproduced: four D-type latches with input x, output r, and clock phases Φ1 and Φ2]

Notice that the implementation is based on use of four D-type latches, and a 2-phase clock supplied via Φ1 and
Φ2; one additional input x plus one output r are also evident. To function correctly, a clock generator ensures
Φ1 and Φ2 are driven as follows:

[waveform for Φ1 and Φ2 not reproduced]

a From the associated list of options (not reproduced here), identify each property the clock generator must
guarantee is true for the implementation to function correctly.
b Consider the two D-type latches at the bottom of the diagram, which form a 2-bit register. Imagine
the value stored in this register is expressed as a 2-bit integer: when the implementation is initially
powered-on, is this value equal to
A: 00(2)
B: 01(2)
C: 10(2)
D: 11(2)
E: any of the above
c Any FSM specification will include a transition function, often denoted δ, which can be described in
diagrammatic form.

[The five candidate state transition diagrams, labelled A to E, are not reproduced here; each shows states
S0, S1, . . . with a start state and transitions annotated x = 0 or x = 1.]
A: Mealy
B: Moore
A: 1.0kHz
B: 5.9MHz
C: 9.0MHz
D: 9.5MHz
E: 1.0GHz
A: set r = 1 iff. the current value of x is different from the previous value of x,
B: act as a modulo 4 counter that is incremented by the value of x, and set r = 1 if the current counter
value is zero,
C: compute the Hamming weight of a sequence fed as input bit-by-bit via x, and set r = 1 once this is
equal to 3
D: count the number of consecutive times x = 1, and set r = 1 once this is equal to 3
E: inspect the sequence fed as input bit-by-bit via x, and set r = 1 iff. this sequence, when interpreted
as an unsigned integer, is odd
Q163. Figure A.13 and Figure A.14 describe an FSM implementation and an associated waveform. When read left-
to-right, the waveform captures how values of Φ1 and Φ2 (a 2-phase clock), and rst (a reset signal) change over
time; the other input s maintains the value A6(16) throughout. Note that the waveform is annotated with some
instances and periods in time (e.g., ρ, and each ti ).
A: 0
B: 1
C: undefined
A: 0
B: 1
C: undefined
A: 0
B: 1
C: undefined
relating to components used within Figure ??. The waveform is annotated with ρ, which illustrates the
clock period. If a 2-input NAND gate imposes a gate delay of Tnand = 10ns, which value most closely
reflects the maximum possible clock frequency?
A: 1.0MHz
B: 1.2GHz
C: 3.8MHz
D: 5.9MHz
E: 6.6MHz
Q164. Consider the design as shown in Figure A.18, which implements a simple Finite State Machine (FSM) using
D-type latches and a 2-phase clock. Note that the r output reflects whether the FSM is in an accepting state, the
rst input resets the FSM into the start state, and the Xi input drives transitions between states: the idea is that
the i-th element of a sequence
X = ⟨X0 , X1 , . . . , Xn−1 ⟩
is provided as input, via Xi , in the i-th step. Assuming the entirety of X is consumed, which of the following
A: r = X0 ⊕ X1 ⊕ · · · ⊕ Xn−1
B: r = X0 ∧ X1 ∧ · · · ∧ Xn−1
C: r = X0 ∧ X1 ∧ · · · ∧ Xn−1
D: r = X0 ∨ X1 ∨ · · · ∨ Xn−1
E: r = X0 ∨ X1 ∨ · · · ∨ Xn−1
best describes the output from, or functionality of the FSM?
[circuit diagram not reproduced: a chain of eight stages labelled s7 down to s0, each built from D-type latches
and a 2-input multiplexer, driven by Φ1, Φ2, rst, and the bits of s, and producing the output r]
Figure A.13: An FSM implementation, which has 4 inputs (1-bit Φ1 , Φ2 and rst on the left-hand side; 8-bit s spread
within the design) and 1 output (1-bit r on the right-hand side).
[waveform not reproduced: traces for Φ2, Φ1, and rst, annotated with the clock period ρ and the instants t0,
t1, and t2]
Figure A.14: A waveform describing behaviour of Φ1 , Φ2 , and rst within Figure A.13.
[circuit diagram not reproduced: a latch built from NAND gates with inputs D and en, internal signals S0 and
R0, and outputs Q and ¬Q]
Figure A.15: A NAND-based implementation of a D-type latch.
[circuit diagram not reproduced: a multiplexer whose data inputs are selected by control input c to produce
output r]
Figure A.17: A NAND-based implementation of a 2-input, 1-bit multiplexer.
[circuit diagram not reproduced: two D-type latches clocked by ϕ2 and ϕ1 respectively, with reset input rst,
sequence input Xi, and output r]
Figure A.18: Implementation of a simple FSM, using D-type latches and a 2-phase clock.
Q165. Consider the design as shown in Figure A.19, which implements a simple Finite State Machine (FSM) using
D-type latches and a 2-phase clock; note that it includes one output labelled r, and one input labelled x. Which
of the following options
A: 2
B: 3
C: 5
D: 6
E: 9
reflects
Q166. The parity function f accepts an n-bit sequence X as input, and yields f (X) = 1 iff. X has an odd number
of elements equal to 1. If f (X) = 1 (resp. f (X) = 0), we say the parity of X is odd (resp. even). Using a
combinatorial circuit, one can compute this as
f (X) = X0 ⊕ X1 ⊕ · · · ⊕ Xn−1
since XOR can be thought of as addition modulo two. However, how could we design a Finite State Machine
(FSM) to compute f (X) when supplied with X one element at a time? Explain step-by-step how you would
solve this challenge: start with a high-level design for any FSM then fill in detail required for this FSM. Are there
any features or requirements you can add to this basic description so the FSM is deemed “better” somehow?
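The key observation for such an FSM is that a single state bit suffices: it holds the parity of the elements consumed so far, and each incoming element toggles it. A C model of this behaviour (a sketch; the answer itself should be a state diagram plus definitions of δ and ω):

```c
/* FSM with one state bit: state = parity of the elements consumed so far.
 * The transition function is delta(state, x) = state XOR x, and the output
 * function is simply the state itself. */
static int parity_fsm(const int *X, int n) {
    int state = 0;                 /* start state: even parity */
    for (int i = 0; i < n; i++)
        state ^= X[i] & 1;         /* one transition per consumed element */
    return state;
}
```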
Q167. Imagine you are asked to build a simple DNA matching hardware circuit as part of a research project. The
circuit will be given DNA strings which are sequences of tokens that represent chemical building blocks. The
goal is to search a large input sequence of DNA tokens for a small sequence indicative of some feature.
The circuit will receive one token per clock cycle as input; the possible tokens are adenine (A), cytosine
(C), guanine (G) and thymine (T). The circuit should, given the input sequence, set an output flag to 1 when
the matching sequence ACT is found somewhere in the input or 0 otherwise. You can assume the inputs are
infinitely long, i.e., the circuit should just keep searching forever and set the flag when the match is a success.
a Design a circuit to perform the required task, show all your working and explain any design decisions
you make.
[circuit diagram not reproduced: two pairs of D-type latches, whose outputs are labelled Q0 and Q1 (and Q′0,
Q′1), clocked by ϕ1 and ϕ2 and producing output r]
Figure A.19: Implementation of a simple FSM, using D-type latches and a 2-phase clock.
b Now imagine you are asked to build two new matching circuits which should detect the sequences CAG
and TTT respectively. It is proposed that instead of having three separate circuits, they are combined into a
single circuit that matches the input sequence against one matching sequence selected with an additional
input. Describe one advantage and one disadvantage you can think of for the two implementation
options.
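A design for part a can be checked against a software model of the intended behaviour. The sketch below is one possible FSM, offered as a reference model rather than the circuit itself: states record how much of ACT the most recent tokens match, and the flag is sticky once a match occurs.

```c
/* Software model of the matcher FSM. States: 0 = no prefix of "ACT"
 * matched, 1 = "A" seen, 2 = "AC" seen, 3 = match found (sticky). */
static int matches_act(const char *tokens) {
    int s = 0;
    for (int i = 0; tokens[i] != '\0' && s != 3; i++) {
        char t = tokens[i];
        if      (s == 0) s = (t == 'A') ? 1 : 0;
        else if (s == 1) s = (t == 'C') ? 2 : (t == 'A') ? 1 : 0;
        else if (s == 2) s = (t == 'T') ? 3 : (t == 'A') ? 1 : 0;
        /* note: from state 2, a 'C' or 'G' falls back to state 0, since
         * neither is a prefix of the pattern */
    }
    return s == 3;
}
```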
Q168. A revolutionary, ecologically sound washing machine is under development by your company. When turned
on, the machine starts in the idle state awaiting input. The washing cycle consists of three stages: fill (when
it fills with water), wash (when the wash occurs), and spin (when spin drying occurs); the machine then returns to
idle when it is finished. Two buttons control the machine: pressing B0 starts the washing cycle, pressing B1
cancels the washing cycle at any stage and returns the machine to idle; if both buttons are pressed at the same
time, the machine continues as normal as if neither were pressed.
a You are asked to design a circuit to control the washing machine. Draw a diagram illustrating states the
washing machine can be in, and valid transitions between them.
b Translate your diagram from above into a corresponding, tabular description of the transition function.
c Using an appropriate technique, derive Boolean expressions which allow computation of the transition
function; note that because the washing machine is ecologically sound, minimising the overall gate count
is important.
Q169. Recall that an n-bit Gray code is a cyclic, 2^n-element sequence S where each i-th element Si is itself an
n-element binary sequence, and the Hamming distance between adjacent elements is one, i.e.,
b Consider a D-type flip-flop, capable of storing a 1-bit value, realised using CMOS-based transistors
arranged into logic gates. Using a gate-level circuit diagram, describe the design of such a component
(clearly explaining the purpose of each part).
c Imagine successive elements of a 3-bit Gray code sequence are stored, one after another, in a register
realised using flip-flops of the type described above. The fact only one bit changes each time the register
is updated could be viewed as advantageous: explain why.
d Using a block diagram, draw a generic Finite State Machine (FSM) framework, including for example
δ, ω and any input and output; clearly explain the purpose of each component in the framework.
e Using the framework outlined above, design a concrete FSM which has
and whose behaviour is as follows: at each positive edge of the clock signal clk, if rst = 0 then r should be
updated with the next element of a 3-bit Gray code, otherwise r should be reset to the first element.
Note that your answer should provide enough detail to fully specify each component in the framework
(e.g., Boolean expressions for δ).
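For reference when constructing δ: the standard reflected binary Gray code maps index i to i ⊕ (i ≫ 1), and successive elements (including the wrap-around from the last element back to the first) differ in exactly one bit. A C sketch of this conventional construction (the question permits any valid Gray code):

```c
/* i-th element of the reflected binary Gray code. */
static unsigned gray(unsigned i) {
    return i ^ (i >> 1);
}

/* Hamming distance between two values: count differing bits. */
static int hamming(unsigned a, unsigned b) {
    int d = 0;
    for (unsigned x = a ^ b; x != 0; x >>= 1)
        d += x & 1;
    return d;
}
```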
Q170. An electronic security system, designed to prevent unauthorised use of a door, is attached to a mains electricity
supply. The system has the following components:
• Three buttons, say Bi for 0 ≤ i < 3, whose value is initially 0; when pressed, a button remains pressed and
the value changes to 1.
• A door handle modelled by

H = 1 when the handle is turned, or H = 0 when the handle is unturned.
If the door handle is turned after the order of button presses matches a 3-element password sequence P, the
door should be unlocked; if there is a mismatch, it should remain locked. The mechanism is reset (and all
buttons released) whenever the handle is turned (whether or not the door is unlocked). If P = ⟨B1 , B0 , B2 ⟩, then
for example
• B1 then B0 then B2 is pressed, then the handle is turned, the door is unlocked, i.e., L is set to 0, and the
mechanism is reset,
• B0 then B1 then B2 is pressed, then the handle is turned, the door remains locked, i.e., L is set to 1, and the
mechanism is reset,
• B1 then B0 is pressed, then the handle is turned, the door remains locked, i.e., L is set to 1, and the
mechanism is reset.
a Using a block diagram, draw a generic Finite State Machine (FSM) framework, including for example
the transition and output functions (i.e., δ and ω) and any input and output; clearly explain the purpose
of each component in the framework.
b Imagine the password is fixed to P = ⟨B2 , B0 , B1 ⟩. Using the framework outlined above, design a concrete
FSM which can be used to control the security system as required.
Note that your answer should provide enough detail to fully specify each component in the framework
(e.g., Boolean expressions for the transition function).
c After inspecting your design, someone claims they can avoid the need for a clock signal: explain how
this is possible.
d The same person suggests an alternative approach whereby P is not fixed, but rather stored in an SRAM
memory device. Although this approach could be more useful, explain one reason it could be viewed as
disadvantageous.
e Before being sold, each physical system needs to be tested to ensure it functions as advertised. Explain a
suitable testing strategy for your design, and any alterations required to facilitate it.
Q171. Imagine you are John Connor in the film Terminator II: your aim is to design a device that guesses ATM (or
cash machine) Personal Identification Numbers (PINs) using brute-force search. The ATM uses 4-digit decimal
PINs, examples being 1234 and 9876. The device stores a current PIN denoted P: it performs each guess in
sequence by first checking whether P is correct, then incrementing P ready for the next step. The process
concludes when P is deemed correct.
a The PIN may be stored using either
• a decimal representation in which the PIN is stored as a sequence of four unsigned integers, i.e., P =
⟨P0 , P1 , P2 , P3 ⟩, with each 0 ≤ Pi < 10, or
• a binary representation in which the PIN is stored as a single unsigned integer, i.e., P, with 0 ≤ P < 10000.
State one advantage of each option, and explain which you think is more appropriate.
b A combinatorial component within the device should take the current PIN P as input, and produce two
outputs:
• the guess sent to the ATM, i.e., G = ⟨G0 , G1 , G2 , G3 ⟩, where each 0 ≤ Gi < 10 is the i-th decimal digit
of the current PIN, and
• the incremented PIN P′ ready for the next guess.
Produce a design for this component; include a block diagram and enough detail to fully specify how a
gate-level implementation could be performed.
c The device is controlled by a simple Finite State Machine (FSM) which can be described diagrammatically:
[State-transition diagram elided: start → S0 ; S0 → S0 on b = 0; S0 → S1 on b = 1; S1 → S2 on ϵ; S2 → S3 on ϵ; S3 → S1 on r = 0; S3 → S4 on r = 1; S4 → S4 on ϵ.]
• The device starts in state S0 , in which P is initialised; once the start button b is pressed, it moves into
state S1 .
• In state S1 , P is driven as input into the combinatorial component and the device moves into state S2 .
• In state S2 , G is sent to the ATM and P′ is latched to form the new value of P; the device moves into
state S3 .
• In state S3 the device checks the ATM response r. If r = 1 then G was the correct guess and the device
moves into state S4 where it halts (i.e., remains in S4 ); otherwise, the device moves into state S1 and
the process repeats.
Focusing on the diagram above only, produce a design for the FSM; include a block diagram, and enough
detail to fully specify how a gate-level implementation could be performed.
Q172. A given counter machine has r = 4 registers, and supports an instruction set detailed in Figure A.20. Consider
two configurations of this counter machine:
C0 = (l = 0, v0 = 0, v1 = 2, v2 = 1, v3 = 0),
Li : if Raddr = 0 then goto Ltarget else goto Li+1 ↦ 010 addr target (fields in bits 8 . . . 0)
[Remaining instruction encodings elided.]
Figure A.20: The instruction set for an example 4-register counter machine.
C1 = (l = 0, v0 = 0, v1 = 3, v2 = 2, v3 = 1).
For each configuration, a) produce a trace of execution, then b) decide which of the following options
A: Compare the values in R1 and R2 , setting R3 to reflect the result
B: Add the values in R1 and R2 , setting R3 to reflect the result
C: Swap the values in R1 and R2
D: Copy the value in R1 into R2 , retaining the value in R1
E: Copy the value in R1 into R2 , clearing the value in R1
is the best description of what the associated program does.
Q173. Figure A.20 describes the instruction set of an example 4-register counter machine. Consider some i-th encoded,
machine code instruction 0A5(16) expressed in hexadecimal. Which of the following
A: halt computation
B: if register 2 equals 0 then goto instruction 5, else goto instruction i + 1
C: if register 10 equals 0 then goto instruction 5, else goto instruction i + 1
D: increment register 2, then goto instruction i + 1
E: decrement register 10, then goto instruction i + 1
best describes the instruction semantics?
Q174. Figure A.21 outlines, at a high level, a 4-register counter machine implementation; Figure A.22 completes said
implementation, detailing internals of the decoder component. Note that the multiplexer inputs should be
read left-to-right, and use zero-based indexing. Using the left-most multiplexer in the decoder as an example,
if the 3-bit control-signal derived from inst is 001(2) = 1(10) then the 1-st input is selected; this means the output
is 2(10) . Which of the following
A: Li : if R3 = 0 then goto L9 else goto Li+1
B: Li : if R3 + 1 = 0 then goto L9 else goto Li+1
C: Li : R3 ← R3 + 1 then goto Li+1
D: Li : R3 ← 0 then goto Li+1
E: None of the above
describes the semantics of a machine code instruction 100111001(2) for this counter machine?
A.6 Chapter ??
Q175. Write the following as Verilog declarations:
[Figure A.21 diagram elided: registers R0 , R1 , . . . , Rr−1 , plus PC, MEM, and IR (each built from D-type components with rst and en inputs), functional units for the 0, +1, −1, =0, and cmp operations, and a decoder producing the op, wr, addr, target, jmp, and halt control signals; the two register groups are enabled by ¬halt ∧ Φ1 and ¬halt ∧ Φ2 respectively.]
Figure A.21: The high-level data- and control-path for an example 4-register counter machine.
[Figure A.22 diagram elided: multiplexers controlled by the instruction fields inst8,...,6 , inst5,...,4 , and inst3,...,0 , whose outputs merge to form the halt, jmp, target, addr, wr, and op signals; one multiplexer, controlled by inst8,...,6 , has the input sequence 1 1 0 0 1 0 0 0, and another the input sequence 1 2 0 0 0 0 0 0.]
Figure A.22: The low-level decoder implementation for an example 4-register counter machine.
© Daniel Page ⟨dan@phoo.org⟩
wire [ 3 : 0 ] a;
wire [ 3 : 0 ] b;
wire [ 1 : 0 ] c;
wire [ 3 : 0 ] d;
wire           e;
Q176. Given the declarations above, plus the assignments

a = 4'b1101;
b = 4'b01XX;

consider each of the following:

a assign c = a[ 1 : 0 ];
b assign c = a[ 3 : 2 ];
c assign d = a & b;
d assign d = a ^ b;
g assign c = { 2{ b[ 1 ] } };
h assign c = { 2{ b[ 2 ] } };
i assign e = &a;
j assign e = ^a;
Q177. a Consider the following Verilog processes for appropriately defined 1-bit wires a, b, x, y, p and q:
Given that p and q are independent and may change at any time, write down one potential problem with
this design and outline one potential solution.
b Consider the following Verilog process, for appropriately defined 1-bit wire clk and 2-bit wire vector
state, which implements a state machine with three states:
If this process constitutes the entirety of the design, write down one potential problem with it and outline
one potential solution.
Q178. A Decimal Digit Detector (DDD) is a device that accepts a 1-bit input at each positive clock edge, and waits
until four such bits have been received. At this point, it sets a 1-bit output signal to true if the 4-bit accumulated
input (interpreted in little-endian form) is a valid Binary Coded Decimal (BCD) digit and false if it is not; it
then repeats the process for the next set of four input bits.
Design a Verilog module to model the DDD device; your design should incorporate a reset signal that is
able to initialise the DDD.
Q179. A comparator C is a function which takes two unsigned n-bit integers x and y as input and produces max(x, y)
and min(x, y) as outputs. One can think of C as sorting the 2-element sequence (x, y) into the resulting sequence
(min(x, y), max(x, y)). Design a Verilog module to model a single comparator for n-bit numbers.
Q180. An n-bit shift register Q is a register (say n D-type flip-flops) whose content is right-shifted on each positive
edge of a shared clock. This means if Qi refers to the i-th bit of Q,
A Linear Feedback Shift Register (LFSR) is a type of pseudo-random number generator based on this compo-
nent: at each positive clock edge
For example, if the tap sequence is T = ⟨0, 3, 4, 5, 7⟩ then the new bit that replaces Qn−1 is
t = ⊕t∈T Qt = Q0 ⊕ Q3 ⊕ Q4 ⊕ Q5 ⊕ Q7 .
Design a Verilog module to model an 8-bit LFSR with the tap sequence above; your design should incorporate
a reset signal which initialises the LFSR with a seed value given as input.
APPENDIX
B
EXAMPLE EXAM-STYLE SOLUTIONS
B.1 Chapter 1
S1. The question essentially just demands application of
x̂ ↦ ∑0≤i<n xi · b^i ,
which defines a mapping between the representation (on the LHS) and value (on the RHS) of x. In this case,
setting b = 3 allows computation of a (decimal) value for each representation (i.e., literal); we need to select the
one whose value turns out to be 123(10) . As such, it should be clear that
↦ 0 · 1 + 2 · 3 + 1 · 9 + 1 · 27 + 1 · 81
↦ 123(10)
↦ 1 + 2 + 16 + 32
↦ 51(10)
ŷ = ⟨y0 , y1 , . . . , yn−1 ⟩
= ⟨1, 1, 0, 0, 1, 1, 0, 0⟩
↦ −yn−1 · 2^(n−1) + ∑0≤i≤n−2 yi · 2^i
↦ 1 · (2^0 + 2^1 + 2^4 + 2^5)
↦ 1 + 2 + 16 + 32
↦ 51(10)
i.e., both x and y are represented by the binary literal 00110011, which yields the decimal value 51(10) in both
↦ 1 + 2 + 16 + 32 − 128
↦ −77(10)
↦ −1 − 2 − 16 − 32
↦ −51(10)
so −77 and −51 are the correct answers.
S3. In 16 bits, the largest possible positive value we can represent using two’s-complement is
which is somewhat counter-intuitive given both operands are negative, but stems from the fact that in
two’s-complement more negative values can be represented than positive.
b The largest possible negative product is given by
or
x · y = 32767 · −32768 = −1073709056.
S5. Given
x = 9(10)
= 00001001(2)
0x97 = 97(16)
= 10010111(2)
the expression ( ~x << 4 ) | 0x97 evaluates to give
( ~x ) = 11110110(2)
( ~x << 4 ) = 01100000(2)
( ~x << 4 ) | 0x97 = 11110111(2)
= −9(10)
st. r = −9 is the correct answer.
S6. For a signed, n-bit integer represented using two's-complement we know that
−2^(n−1) ≤ x ≤ 2^(n−1) − 1
and that negation can be computed as
r = −x = ¬x + 1.
Considering x = −128 with n = 8, we have
−x = ¬x +1
= ¬ 10000000(2) + 1
= 01111111(2) + 1
= 10000000(2)
= x
This means x = −128 is a fixed point, i.e., that abs fails to work correctly for this value (since we cannot
represent 128 in 8 bits using two’s-complement). As such, 129 is the correct answer: the 128 positive values,
plus the 1 negative value from above.
i.e., each i-th digit in the (left-hand side) literal is weighted by b^i and accumulated to yield the (right-hand side)
represented value. In this case the literal has n = 2 digits, so, given x0 = 0 and x1 = 1, we can see that
x̂ = ⟨x0 , x1 ⟩ ↦ x0 · b^0 + x1 · b^1 = 0 · 1 + 1 · b = 0 + b = b.
This fact holds for any b, and of course if we knew b one of the other options might be correct (e.g., if b = 10 then
x = 10 would also be correct).
0(10) ↦ 00000000
−1(10) ↦ 11111111
yield n-bit “all 0” and “all 1” sequences for any n. Now consider an arbitrary x, and each option stemming
from it:
x = 01101010 ↦ 106(10)
¬x = 10010101 ↦ −107(10)
x ∧ ¬x = 00000000 ↦ 0(10)
x ∨ ¬x = 11111111 ↦ −1(10)
x ⊕ ¬x = 11111111 ↦ −1(10)
x + ¬x = 11111111 ↦ −1(10)
x − ¬x = 11010101 ↦ −43(10)
Note that if the i-th bit of x is 0 then that of ¬x will be 1; if the i-th bit of x is 1 then that of ¬x will be 0. Based
on this, the AND case will always yield the “all 0” sequence, i.e., 0, because the per-bit computation will be
0 ∧ 1 or 1 ∧ 0 (both yielding 0). Likewise, the OR case will always yield the “all 1” sequence, i.e., −1, because
the per-bit computation will be 0 ∨ 1 or 1 ∨ 0 (both yielding 1); the same is true for both the XOR case and the
addition case, with the latter stemming from the absence of carries. In fact, all these cases will apply for any
n and x based on the same reasoning. So the subtraction case is incorrect; even so, the result is slightly confusing
in that this example overflows (the actual result 213(10) cannot be represented in 8 bits, at least when using
two’s-complement).
S11. a i |A| = 3.
ii A ∪ B = {1, 2, 3, 4, 5}.
iii A ∩ B = {3}.
iv A − B = {1, 2}.
v A = {4, 5, 6, 7, 8}.
vi {x | 2 · x ∈ U} = {1, 2, 3, 4}.
b i +0 in sign-magnitude is 00000000, in two’s-complement is 00000000.
ii −0 in sign-magnitude is 10000000, in two’s-complement is 00000000.
iii +72 in sign-magnitude is 01001000, in two’s-complement is 01001000.
iv −34 in sign-magnitude is 10100010, in two’s-complement is 11011110.
v −8 in sign-magnitude is 10001000, in two’s-complement is 11111000.
vi This is a trick question: one cannot represent 240 in 8-bit sign-magnitude or two’s-complement; the
incorrect guess of 11111000 in two’s-complement for example is actually −8.
S12. The population count or Hamming weight of x, denoted by H(x) say, is the number of bits in the binary
expansion of x that equal one. Using an unsigned 32-bit integer x for example, an implementation might be
written as follows:
int H( uint32_t x ) {
  int t = 0;

  for( int i = 0; i < 32; i++ ) {
    t += ( x >> i ) & 1;
  }

  return t;
}
S13. Writing
t0 = (x ∧ y) ⊕ z
t1 = (¬x ∨ y) ⊕ z
t2 = (x ∨ ¬y) ⊕ z
t3 = ¬(x ∨ y) ⊕ z
t4 = ¬¬(x ∨ y) ⊕ z
for brevity, we can write the following truth table:
x y z t0 t1 t2 t3 t4
0 0 0 0 1 1 1 0
0 0 1 1 0 0 0 1
0 1 0 0 1 0 0 1
0 1 1 1 0 1 1 0
1 0 0 0 0 1 0 1
1 0 1 1 1 0 1 0
1 1 0 1 1 1 0 1
1 1 1 0 0 0 1 0
Looking at the row where x = 0, y = 0 and z = 1, it is clear that t0 = 1 and t4 = 1, so (x ∧ y) ⊕ z and ¬¬(x ∨ y) ⊕ z
are the correct answers.
S14. You may be able to just spot which one is incorrect, but looking at each case exhaustively (via a truth table for
the LHS and RHS of the supposed equivalence), we see that
x y z (x ∧ y) ∧ z x ∧ (y ∧ z) x∨1 x x ∨ ¬x 1 ¬(x ∨ y) ¬x ∧ ¬y ¬¬x x
0 0 0 0 0 1 0 1 1 1 1 0 0
0 0 1 0 0 1 0 1 1 1 1 0 0
0 1 0 0 0 1 0 1 1 0 0 0 0
0 1 1 0 0 1 0 1 1 0 0 0 0
1 0 0 0 0 1 1 1 1 0 0 1 1
1 0 1 0 0 1 1 1 1 0 0 1 1
1 1 0 0 0 1 1 1 1 0 0 1 1
1 1 1 1 1 1 1 1 1 0 0 1 1
and identify x ∨ 1 ≡ x as the incorrect case: it probably should be x ∨ 1 ≡ 1, or maybe x ∧ 1 ≡ x.
S15. By writing
t0 = x ∨ (z ∨ y)
t1 = ¬(¬y ∧ ¬z)
we can shorten the expression to
t2 = t0 ∧ t1 .
Then, you can see either by enumeration, i.e.,
x y z t0 t1 t2 y∨z
0 0 0 0 0 0 0
0 0 1 1 1 1 1
0 1 0 1 1 1 1
0 1 1 1 1 1 1
1 0 0 1 0 0 0
1 0 1 1 1 1 1
1 1 0 1 1 1 1
1 1 1 1 1 1 1
or via the derivation
(x ∨ (z ∨ y)) ∧ ¬(¬y ∧ ¬z)
= (x ∨ (z ∨ y)) ∧ (y ∨ z) (de Morgan)
= (x ∨ (y ∨ z)) ∧ (y ∨ z) (commutativity)
= (x ∨ (y ∨ z)) ∧ ((y ∨ z) ∨ 0) (identity)
= ((y ∨ z) ∨ x) ∧ ((y ∨ z) ∨ 0) (commutativity)
= (y ∨ z) ∨ (x ∧ 0) (distribution)
= (y ∨ z) ∨ (0) (null)
= (y ∨ z) (identity)
that the correct answer is y ∨ z.
S16. By writing
t0 = x∨y
t1 = x∧z
we can shorten the expression to
t2 = t0 ∨ t1 .
Then, you can see either by enumeration, i.e.,
x y z t0 t1 t2 x∨y
0 0 0 0 0 0 0
0 0 1 0 0 0 0
0 1 0 1 0 1 1
0 1 1 1 0 1 1
1 0 0 1 0 1 1
1 0 1 1 1 1 1
1 1 0 1 0 1 1
1 1 1 1 1 1 1
or via the derivation
(x ∨ y) ∨ (x ∧ z)
= (y ∨ x) ∨ (x ∧ z) (commutativity)
= y ∨ (x ∨ (x ∧ z)) (association)
= y∨x (absorption)
= x∨y (commutativity)
that the correct answer is x ∨ y.
S17. In the same way as NAND, we know NOR (written here as ↓) is functionally complete: this can be shown via
¬x ≡ x ↓ x
x ∧ y ≡ ¬x ↓ ¬y ≡ (x ↓ x) ↓ (y ↓ y)
x ∨ y ≡ ¬(x ↓ y) ≡ (x ↓ y) ↓ (x ↓ y)
Then, since x ↓ y ≡ ¬(x ∨ y), clearly {∨, ¬} is functionally complete: this set can be rewritten directly as { ↓ }.
We can harness these facts to show that in fact all other options are functionally complete.
• Given the truth table
x y ⊕
0 0 0
0 1 1
1 0 1
1 1 0
we have ¬x ≡ x ⊕ 1, or, put another way, we can construct ¬ using ⊕. Overall then,
{⊕, ∨} ⇝ {¬, ∨},
i.e., we have constructed the RHS from the LHS; we know the RHS is functionally complete, so the LHS
is as well.
• This option is somewhat more difficult: using the same strategy as above, we now need to construct both
¬ and ∨.
– Given the truth table
x y ≡ ≢
0 0 1 0
0 1 0 1
1 0 0 1
1 1 1 0
it should be clear that since x ≢ y ≡ x ⊕ y, we have ¬x ≡ x ≢ 1. Alternatively, given the truth table
x y ⇒
0 0 1
0 1 1
1 0 0
1 1 1
we have x ⇒ 0 ≡ ¬x.
Overall then,
{⇒, ≢} ⇝ {¬, ∨},
i.e., we have constructed the RHS from the LHS; we know the RHS is functionally complete, so the LHS
is as well.
{⇒} ⇝ {¬, ∨}
because we can construct both ¬ and ∨ using it alone; in the above, ≢ was potentially redundant in fact,
which is also true of ⇏ here. Alternatively, given the truth table
x y ⇒ ⇏
0 0 1 0
0 1 1 0
1 0 0 1
1 1 1 0
we could write 1 ⇏ y ≡ ¬y, and use ⇏ as a replacement for . and so ¬ as required in the above. Either
way, it is clearly the case that
{⇒, ⇏} ⇝ {¬, ∨}
i.e., we have constructed the RHS from the LHS; we know the RHS is functionally complete, so the LHS
is as well.
S18. First, note that each input can obviously be assigned one of two values, namely 0 or 1, so there are 2^n possible
assignments to n inputs. For example, if we have 1 input, say x, there are 2^1 = 2 possible assignments because
x can either be 0 or 1. In the same way, for 2 inputs, say x and y, there are 2^2 = 4 possible assignments: we can
have
x=0 y=0
x=0 y=1
x=1 y=0
x=1 y=1
This is why a truth table for n inputs will have 2^n rows: each row details one assignment to the inputs, and the
associated output.
So how many functions are there? A function with n inputs means a truth table with 2^n rows; each row
includes an output that can either be 0 or 1 (depending on exactly which function the truth table describes). So
to count how many functions there are, we can just count how many possible assignments there are to the 2^n
outputs. The correct answer is 2^(2^n).
S20. These are all (or close to) Boolean axioms, which potentially can be identified by just looking at them: for
example, the first one is the association axiom. Taking a more systematic approach, the following, exhaustive
truth-table
demonstrates that
x ∧ (x ∨ y) ≢ y.
This is an incorrect version of the absorption axiom, which could be corrected to read
x ∧ (x ∨ y) ≡ x.
(x ∧ x) ∨ (x ∧ ¬x)
= x ∧ (x ∨ ¬x) (distribution)
(x ∧ x) ∨ (x ∧ ¬x)
= x ∧ (x ∨ ¬x) (distribution)
= (x ∨ ¬x) ∧ x (commutativity)
(x ∧ x) ∨ (x ∧ ¬x)
= x ∨ (x ∧ ¬x) (idempotency)
= x∨0 (inverse)
= x (identity)
(x ∧ x) ∨ (x ∧ ¬x)
= ¬(¬(x ∧ x) ∧ ¬(x ∧ ¬x)) (deMorgan)
= ¬((¬x ∨ ¬x) ∧ (¬x ∨ x)) (deMorgan)
S22. At first glance, this question looks like a lot of work. However, we can immediately rule out several options
because the associated Karnaugh maps are clearly invalid:
• option A is invalid because the dimensions do not match the truth table: it ignores z, so is for a 3-input
rather than 4-input function,
• option B is invalid because the content does not match the truth table: the truth table has 6 entries equal
to 1 whereas the Karnaugh map has 5,
• option D is invalid because the 3-element red group is invalid: groups must be rectangular, but this is
L-shaped.
So only options C and E remain. Even just looking at them, we can guess that option C will yield a more efficient
expression because it uses fewer, larger groups (option E uses unit-sized groups only). In more detail
• Option C yields
r = f (w, x, y, z) = ( ¬x ∧ ¬z ) ∨
( w ∧ ¬y ∧ ¬z ) ∨
( w ∧ ¬x ∧ ¬y )
and thus 5 AND, 2 OR, and 6 NOT operators.
• Option E yields
r = f (w, x, y, z) = ( w ∧ ¬x ∧ y ∧ ¬z ) ∨
( ¬w ∧ ¬x ∧ y ∧ ¬z ) ∨
( ¬w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨
( w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨
( w ∧ x ∧ ¬y ∧ ¬z ) ∨
( w ∧ ¬x ∧ ¬y ∧ z )
and thus 18 AND, 5 OR, and 16 NOT operators.
Even considering the (significant) potential for applying common sub-expression elimination, e.g., computing
and sharing the result of ¬x once versus using one operator for each instance, option C will clearly involve
fewer operators.
S24. One can show that there are 2^(2^n) possible Boolean functions with n inputs. In this case n = 1 so we know there
are 2^(2^1) = 2^2 = 4 such functions, which we can enumerate as follows:
x f0 f1 f2 f3
0 0 0 1 1
1 0 1 0 1
In essence, f0 is the constant 0 function (noting f0 ( f0 (x)) = 0 = f0 (x)), f1 is the identity function (noting
f1 ( f1 (x)) = x = f1 (x)), f2 is the complement function (noting f2 ( f2 (x)) = ¬¬x = x ≠ ¬x = f2 (x)), and f3 is the
constant 1 function (noting f3 ( f3 (x)) = 1 = f3 (x)). As such, only 1 of the 4 possible functions, namely f2 , is not
idempotent.
S25. One can show that there are 2^(2^n) possible Boolean functions with n inputs. In this case n = 2 so we know there
are 2^(2^2) = 2^4 = 16 such functions, which we can enumerate as follows:
x y f0 f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 f11 f12 f13 f14 f15
0 0 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
0 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
1 0 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1
1 1 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
The cases where x = y = 0 and x = y = 1 are naturally symmetric, so the question is basically whether or not
the remaining cases are also symmetric, i.e., whether or not fi (0, 1) = fi (1, 0) holds for a given i. By inspection
of the truth table above, one can see that the relationship holds for each
i ∈ S = {0, 1, 6, 7, 8, 9, 14, 15},
noting that |S| = 8 possible functions are therefore symmetric. As an aside, symmetric Boolean functions are
known as Boolean counting functions in some contexts; one can prove that there are 2^(n+1) such functions with
n inputs, which in our case means 2^(n+1) = 2^(2+1) = 2^3 = 8 as expected.
(x ∧ (¬x ∨ y)) ∨ y
= (x ∧ ¬x) ∨ (x ∧ y) ∨ y (distribution)
= 0 ∨ (x ∧ y) ∨ y (inverse)
(x ∧ (¬x ∨ y)) ∨ y
= (x ∧ ¬x) ∨ (x ∧ y) ∨ y (distribution)
= 0 ∨ (x ∧ y) ∨ y (inverse)
= (x ∧ y) ∨ 0 ∨ y (commutativity)
= (x ∧ y) ∨ y ∨ 0 (commutativity)
= (x ∧ y) ∨ y (identity)
= y ∨ (x ∧ y) (commutativity)
= y ∨ (y ∧ x) (commutativity)
= y (absorption)
(x ∧ (¬x ∨ y)) ∨ y
= ¬(¬(x ∧ (¬x ∨ y)) ∧ ¬y) (deMorgan)
= ¬((¬x ∨ ¬(¬x ∨ y)) ∧ ¬y) (deMorgan)
= ¬((¬x ∨ (x ∧ ¬y)) ∧ ¬y) (deMorgan)
or, failing that, enumeration
x y x ∧ (¬x ∨ y) ∨ y x ∧ ¬x ∨ y ∧ (1 ∨ x) 0∨x∧y∨y x∧y y ¬((¬x ∨ (x ∧ ¬y)) ∧ ¬y)
0 0 0 0 0 0 0 0
0 1 1 1 1 0 1 1
1 0 0 0 0 0 0 0
1 1 1 1 1 1 1 1
means we can conclude that
x ∧ (¬x ∨ y) ∨ y ≢ x ∧ y.
Since there are n = 3 input variables, there are clearly 2^n = 2^3 = 8 input combinations; three of these
produce 1 as the output from the function.
b The truth table for this function is as follows
a b c d f (a, b, c, d)
0 0 0 0 0
0 0 0 1 0
0 0 1 0 0
0 0 1 1 0
0 1 0 0 0
0 1 0 1 1
0 1 1 0 0
0 1 1 1 0
1 0 0 0 0
1 0 0 1 0
1 0 1 0 0
1 0 1 1 0
1 1 0 0 0
1 1 0 1 0
1 1 1 0 0
1 1 1 1 0
so since there is only one case where f (a, b, c, d) = 1, the only assignment given which matches the criteria
is a = 0, b = 1, c = 0 and d = 1.
This hints at a general principle: when we have an expression like this, a term such as ¬x can be read as
“x should be 0” and x as “x should be 1”. So the expression as a whole is read as “a should be 0 and b
should be 1 and c should be 0 and d should be 1”. Since we have basically fixed all four inputs, only one
entry of the truth table matches. On the other hand, if we instead had
f (a, b, c, d) = ¬a ∧ b ∧ ¬c
for example, we would be saying “a should be 0 and b should be 1 and c should be 0, and d can be
anything” which gives two possible assignments (i.e., a = 0, b = 1, c = 0 and either d = 0 or d = 1).
c Informally, SoP form means there are say n terms in the expression: each term is the conjunction of some
variables (or their complement), and the expression is the disjunction of the terms. As conjunction and
disjunction basically means the AND and OR operators, and AND and OR act sort of like multiplication
and addition, the SoP name should make some sense: the expression is sort of like the sum of terms
which are themselves each a product of variables. The second option is correct as a result; the first and
last violate the form described above somehow (e.g., the first case is in the opposite, PoS form).
d One can easily make a comparison using a truth table such as
from which it should be clear that all the equations are correct except for the first one. That is, a ∨ 1 ≠ a,
but rather a ∨ 1 = 1.
a b ¬a ¬b a ∧ b ¬(a ∧ b) ¬a ∨ ¬b
0 0 1 1 0 1 1
0 1 1 0 0 1 1
1 0 0 1 0 1 1
1 1 0 0 1 0 0
a b ¬a ¬b ¬a ∧ b a ∧ ¬b
0 0 1 1 0 0
0 1 1 0 1 0
1 0 0 1 0 1
1 1 0 0 0 0
S29. a The dual of any expression is constructed by using the principle of duality, which informally means
swapping each AND with OR (and vice versa) and each 0 with 1 (and vice versa); this means, for
example, we can take the OR form of each axiom and produce the AND form (and vice versa).
So in this case, we start with an OR form: this means the dual will be the corresponding AND form. Making
the swaps required means we end up with
x∧0≡0
so the second option is correct.
b This question is basically asking for the complement of f , since the options each have ¬ f on the left-
hand side: this means using the principle of complements, a generalisation of the de Morgan axiom, by
swapping each variable with the complement (and vice versa), each AND with OR (and vice versa), and
each 0 with 1 (and vice versa). If we apply these rules (taking care with the parenthesis) to
f = ¬a ∧ ¬b ∨ ¬c ∨ ¬d ∨ ¬e,
we end up with
¬ f = (a ∨ b) ∧ c ∧ d ∧ e
which matches the last option.
c The de Morgan axiom, which can be generalised by the principle of complements, says that
¬(x ∧ y) ≡ ¬x ∨ ¬y
or conversely that
¬(x ∨ y) ≡ ¬x ∧ ¬y
You can think of either form as “pushing” the NOT operator on the left-hand side into the parentheses:
this acts to complement each variable, and swap the AND to an OR (or vice versa). Writing ↑ for NAND and
↓ for NOR, we know that
x ↑ y ≡ ¬(x ∧ y)
x ↓ y ≡ ¬(x ∨ y)
So pattern matching against the options, it is clear the first one is correct, for example, because
x ↓ y ≡ ¬(x ∨ y) ≡ ¬x ∧ ¬y
where the right-hand side matches the description of an AND whose two inputs are complemented.
Likewise, the second one is correct because
x ↑ y ≡ ¬(x ∧ y) ≡ ¬x ∨ ¬y.
S30. a The third option, i.e., ¬a ∧ ¬b is the correct one; the three simplification steps, via two axioms, are as
follows:
¬ (a ∨ b) ∧ ¬ (c ∨ d ∨ e) ∨ ¬ (a ∨ b)
= (¬a ∧ ¬b) ∧ ¬ (c ∨ d ∨ e) ∨ (¬a ∧ ¬b) (de Morgan)
= (¬a ∧ ¬b) ∧ (¬c ∧ ¬d ∧ ¬e) ∨ (¬a ∧ ¬b) (de Morgan)
= ¬a ∧ ¬b (absorption)
b We can clearly see that
(a ∨ b ∨ c) ∧ ¬(d ∨ e) ∨ (a ∨ b ∨ c) ∧ (d ∨ e)
= (a ∨ b ∨ c) ∧ (¬(d ∨ e) ∨ (d ∨ e)) (distribution)
= (a ∨ b ∨ c) ∧ ((d ∨ e) ∨ ¬(d ∨ e)) (commutativity)
= (a ∨ b ∨ c) ∧ 1 (inverse)
= a∨b∨c (identity)
meaning the first option is the correct one.
c We can clearly see that
a ∧ c ∨ c ∧ (¬a ∨ a ∧ b)
= (a ∧ c) ∨ (c ∧ (¬a ∨ (a ∧ b))) (precedence)
= (c ∧ a) ∨ (c ∧ (¬a ∨ (a ∧ b))) (commutativity)
= c ∧ (a ∨ ¬a ∨ (a ∧ b)) (distribution)
= c ∧ (1 ∨ (a ∧ b)) (inverse)
= c ∧ ((a ∧ b) ∨ 1) (commutativity)
= c∧1 (null)
= c (identity)
meaning the last option is the correct one: none of the above is correct, since the correct simplification is
actually just c.
d The fourth option, i.e., a ∧ b is correct. This basically stems from repeated application of the absorption
axiom, the AND form of which states
x ∨ (x ∧ y) ≡ x.
Applying it from left-to-right, we find that
a∧b∨a∧b∧c∨a∧b∧c∧d∨a∧b∧c∧d∧e∨a∧b∧c∧d∧e∧ f
= (a ∧ b) ∨ (a ∧ b) ∧ (c) ∨ (a ∧ b) ∧ (c ∧ d) ∨ (a ∧ b) ∧ (c ∧ d ∧ e) ∨ (a ∧ b) ∧ (c ∧ d ∧ e ∧ f ) (precedence)
= (a ∧ b) ∨ (a ∧ b) ∧ (c ∧ d) ∨ (a ∧ b) ∧ (c ∧ d ∧ e) ∨ (a ∧ b) ∧ (c ∧ d ∧ e ∧ f ) (absorption)
= (a ∧ b) ∨ (a ∧ b) ∧ (c ∧ d ∧ e) ∨ (a ∧ b) ∧ (c ∧ d ∧ e ∧ f ) (absorption)
= (a ∧ b) ∨ (a ∧ b) ∧ (c ∧ d ∧ e ∧ f ) (absorption)
= (a ∧ b) (absorption)
at which point there is nothing else that can be done: we end up with 2 operators (an AND and an OR),
so the second option is correct.
f Working from the right-hand side toward the left, we have that
¬x ∨ ¬y
= (¬x ∧ 1) ∨ ¬y (identity)
= (¬x ∧ 1) ∨ (¬y ∧ 1) (identity)
= (¬x ∧ (y ∨ ¬y)) ∨ (¬y ∧ 1) (inverse)
= (¬x ∧ (y ∨ ¬y)) ∨ (¬y ∧ (x ∨ ¬x)) (inverse)
= (¬x ∧ y) ∨ (¬x ∧ ¬y) ∨ (¬y ∧ (x ∨ ¬x)) (distribution)
= (¬x ∧ y) ∨ (¬x ∧ ¬y) ∨ (¬y ∧ x) ∨ (¬y ∧ ¬x) (distribution)
= (¬x ∧ y) ∨ (¬y ∧ x) ∨ (¬x ∧ ¬y) ∨ (¬x ∧ ¬y) (commutativity)
= (¬x ∧ y) ∨ (¬y ∧ x) ∨ (¬x ∧ ¬y) (idempotency)
g By writing
t0 = x ∧ y
t1 = y ∧ z
t2 = y ∨ z
t3 = x ∨ z
t4 = t1 ∧ t2
x y z t0 t1 t2 t3 t4 f g
0 0 0 0 0 0 0 0 0 0
0 0 1 0 0 1 1 0 0 0
0 1 0 0 0 1 0 0 0 0
0 1 1 0 1 1 1 1 1 1
1 0 0 0 0 0 1 0 0 0
1 0 1 0 0 1 1 0 0 0
1 1 0 1 0 1 1 0 1 1
1 1 1 1 1 1 1 1 1 1
to demonstrate that f = g, i.e., the equivalence holds. Note that this approach is not as robust if the
intermediate steps are not shown; simply including f and g in the truth table does not give much more
confidence than simply writing the equivalence!
To prove the equivalence using an axiomatic approach, the following steps can be applied:
(x ∧ y) ∨ (y ∧ z ∧ (y ∨ z))
= (x ∧ y) ∨ (y ∧ z ∧ y) ∨ (y ∧ z ∧ z) (distribution)
= (x ∧ y) ∨ (y ∧ y ∧ z) ∨ (y ∧ z ∧ z) (commutativity)
= (x ∧ y) ∨ (y ∧ z) ∨ (y ∧ z) (idempotency)
= (x ∧ y) ∨ (y ∧ z) (idempotency)
= (y ∧ x) ∨ (y ∧ z) (commutativity)
= y ∧ (x ∨ z) (distribution)
h Using four simplification steps, via three axioms and the AND operator, as follows
we get a form that contains zero operators (which by definition must be the fewest).
B.2 Chapter 2
S31. We can deal with the first two statements in one go: an N-MOSFET (or N-type MOSFET) has N-type
semiconductor terminals and a P-type body; if the types of semiconductor were swapped, we would have a
P-MOSFET (or P-type MOSFET), not an N-MOSFET.
A CMOS cell is a pairing of two transistors. However, it depends on use of complementary types, namely
one N-type and one P-type, rather than the same type as suggested. As such, this statement is incorrect
(although subtly so).
A given N-MOSFET is deemed active (resp. inactive) when current is allowed (resp. disallowed) to flow.
The flow is, in some sense, controlled by the voltage level applied: it acts to widen or narrow the associated
depletion region. At some threshold, “enough” voltage is applied for there to be “enough” current flowing for
source and drain to be deemed connected and hence the MOSFET active. So this statement is true, although
arguably some detail is glossed over (e.g., the fact a leakage current will always exist, so a transistor is always a
little bit active).
S32. This is a NAND gate, so clearly there are two inputs and one output: by inspecting the circuit we can see them
labelled x, y and r, plus identify the two power rails labelled Vdd and Vss . The output r is “pulled-up” to the Vdd
voltage level iff. a connection is made via one or other of the top two transistors. Their parallel arrangement
gives a hint at their type, even if the system is not recognisable. If we look at the truth table for NAND, i.e.,
NAND
x y r
0 0 1
0 1 1
1 0 1
1 1 0
then if either x = 0 or y = 0 then r = 1, where Vss ≡ 0 and Vdd ≡ 1: these must be P-MOSFETS, therefore,
because we want a connection to be formed between r and Vdd if x = Vss or y = Vss . There is, of course,
a companion pull-down network allowing r to be “pulled-down” to the Vss voltage level. However, this is
constructed from a sequential arrangement of N-MOSFETS: we did not study BJT transistors, which represent
a different technology from MOSFETs.
Finally, note that it is highly unlikely you will see a flux capacitor in a circuit other than during Back To The
Future!
S33. Provided you know what the behaviour of the pull-up network (top, consisting of P-MOSFETs) and pull-down
network (bottom, consisting of N-MOSFETs) is, it is reasonably easy (if long winded) to answer the question by
looking at a case-by-case analysis. A more efficient way is to spot the sequential and parallel organisation of
MOSFETs:
• If x = Vss , y and z are irrelevant because a connection between r and Vdd is formed (via the bottom-left
P-MOSFET), while a connection between r and Vss is impossible (due to the top N-MOSFET).
• If x = Vdd , y and z are relevant:
– If y = Vss , z = Vss then a path between r and Vss is impossible; in this case, however, a path between
r and Vdd is formed (via the top P-MOSFETs).
– If y = Vss , z = Vdd then a path between r and Vss is formed (via the top and bottom-right N-MOSFETs).
– If y = Vdd , z = Vss then a path between r and Vss is formed (via the top and bottom-left N-MOSFETs).
– If y = Vdd , z = Vdd then a path between r and Vdd is impossible; in this case, however, a path between
r and Vss is formed (via the bottom N-MOSFETs).
which, translated into an expression, is r = ¬(x ∧ (y ∨ z)), or ¬x ∨ (¬y ∧ ¬z) if you prefer: basically r = 1 if x = 0
or both y = 0 and z = 0, otherwise r = 0.
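The case analysis can be cross-checked by enumeration; this sketch mirrors the pull-up reasoning and compares it against the stated expression:

```python
# Verify, by enumeration, that the case-by-case analysis matches
# r = ¬(x ∧ (y ∨ z)), i.e., ¬x ∨ (¬y ∧ ¬z).
from itertools import product

def by_cases(x, y, z):
    # r = 1 iff x = 0, or both y = 0 and z = 0
    return 1 if (x == 0 or (y == 0 and z == 0)) else 0

def by_expression(x, y, z):
    return int(not (x and (y or z)))

for x, y, z in product((0, 1), repeat=3):
    assert by_cases(x, y, z) == by_expression(x, y, z)
```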
S34. This question is tricky, in the sense there are lots of ways an XOR gate can be constructed using logic gate
instances. One can show, for example, that x ⊕ y is equivalent to
a (¬x ∧ y) ∨ (x ∧ ¬y),
b (x ∨ y) ∧ ¬(x ∧ y),
c ¬(t0 ∧ t1 ) where t0 = ¬(¬(x ∧ x) ∧ y) and t1 = ¬(¬(y ∧ y) ∧ x), i.e., using only NAND gates,
d ¬(t1 ∧ t2 ) where t0 = ¬(x ∧ y), t1 = ¬(x ∧ t0 ), and t2 = ¬(y ∧ t0 ), again using only NAND gates, or
e ¬(¬(x ∨ y) ∨ (x ∧ y)), i.e., using two NOR gates plus an AND gate,
and potentially more besides: note that the question rules out the otherwise viable option of directly using
transistors, for example.
To compare the options, the starting point is how each logic gate we could use will be realised using
transistors. In reality this may increase the range of possible answers even further, but to provide an answer
imagine that we only consider the options above, and assume that
NOT { 2 transistors
NAND { 4 transistors
NOR { 4 transistors
AND { 6 transistors
OR { 6 transistors
i.e., since we know we can construct NOT, NAND and NOR using the stated number of MOSFET-based
transistors, the best way to form AND and OR is simply to append a NOT to a NAND or NOR. This means we
can just count the logic gate instances, and translate: for example, the NOR-based option uses 2 NOR gates
plus 1 AND gate, i.e., 4 + 4 + 6 = 14 transistors.
Based on this 14 is the correct answer, although keep in mind it might be possible to do better based on using a
different set of options (for XOR) and assumptions.
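A sketch that checks several of the constructions by enumeration, and tallies their transistor cost under the assumptions above (the NOR-based option reaches the quoted 14):

```python
# Check three XOR constructions against x ^ y, and tally transistor costs
# under the assumption NOT=2, NAND=4, NOR=4, AND=6, OR=6.
from itertools import product

nand = lambda a, b: 1 - (a & b)
nor  = lambda a, b: 1 - (a | b)

def opt_b(x, y):                 # (x OR y) AND NOT (x AND y)
    return (x | y) & (1 - (x & y))

def opt_d(x, y):                 # four NAND gates
    t0 = nand(x, y)
    return nand(nand(x, t0), nand(y, t0))

def opt_e(x, y):                 # two NOR gates plus one AND gate
    return nor(nor(x, y), x & y)

for x, y in product((0, 1), repeat=2):
    assert opt_b(x, y) == opt_d(x, y) == opt_e(x, y) == x ^ y

print('b:', 6 + 6 + 4, 'd:', 4 * 4, 'e:', 4 + 4 + 6)  # 16, 16, and 14 transistors
```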
S35. This is quite a tricky question, in the sense there are several plausible answers: selecting between them really
needs some justification, which of course is impossible with a multiple choice question! It is important to
keep in mind that the question focuses on design of some behaviour based on transistors, not precision wrt.
manufacture of the design. Put another way, the question is intentionally pitched at a high-level, with the
various caveats attempting to limit the possible answers:
a It might seem possible to use 0 transistors, in that one can just connect x directly to r. This may realise
the pass through behaviour required, but does not impose a delay (and more generally does not satisfy
use-cases for current or voltage buffers) so is not really valid.
b One could implement a buffer using 1 N-MOSFET transistor, say, in series with a pull-down resistor: the
idea is that when x = Vdd the transistor connects r to Vdd , otherwise the resistor pulls r down to Vss .
However, the material provided does not cover use of resistors: the question caters for this by asking for
an implementation using only transistors, and not listing 1 as an answer! This is justified further by the
emphasis placed on CMOS: given the argument that N- and P-MOSFETs occur in pairs, as part of a pull-up
and pull-down network respectively, using only an N-MOSFET might seem confusing. So
although this is arguably the best answer, it is not the expected one!
c One could start by considering a NOT gate implementation, which will clearly invert x. It does so via
a P-MOSFET that connects Vdd to r, and an N-MOSFET that connects Vss to r. As such, when x = Vss
(resp. x = Vdd ) the P-MOSFET will be connected and N-MOSFET disconnected (resp. vice versa) meaning
that r = Vdd ≃ ¬x (resp. r = Vss ≃ ¬x). A somewhat simple observation is that if we swap the P- and
N-MOSFETs, we end up with a buffer. That is, if the P-MOSFET connects Vss to r and the N-MOSFET
connects Vdd to r, then the behaviour swaps st. if x = Vdd (resp. x = Vss ) then r = Vdd ≃ x (resp. r = Vss ≃ x).
So 2 transistors is a reasonable answer if we assume it is possible to organise them this way. In reality,
this is debatable: it is the opposite of normal pull-down and pull-up networks connected to Vdd and Vss , so
may not be reasonable under the constraints of a given manufacturing process. However, the question is
careful to ask for an unconstrained organisation for transistors, so the assumption is allowed.
d Finally, one could simply use two NOT gate implementations in series with each other: this inverts x
twice, computing r = ¬¬x = x using 4 transistors. This approach needs no assumptions, but on the other
hand is obviously not very efficient wrt. the number of transistors used.
In summary then, 2 transistors is a correct (or the expected) answer in this case (although you could quite
reasonably argue for other answers).
S36. From the truth table, one can form a Karnaugh map

                  x z
             00  01  11  10
      y = 0   1   1   1   0
      y = 1   0   1   1   ?

which includes two groups: unlike some other examples, the don’t care in this case is uncovered (i.e., we treat
it as 0) since we cannot make fewer or larger groups by covering it (i.e., treating it as 1). Even so, translating
each group into a term yields the SoP expression

r = (¬x ∧ ¬y) ∨ z
which is the correct answer: the fact this is the only one in SoP form at least hints at the correct answer even
without going through the above.
S37. Recall that a 2-input, 1-bit multiplexer computes

r = x if c = 0
    y otherwise

i.e., it selects the input x (i.e., connects the output r to x) if the control signal c = 0, and selects the input y (i.e.,
connects the output r to y) if the control signal c = 1.
Imagine the 8, 1-bit inputs are named s, t, u, v, w, x, y and z and there is a 3-bit control signal c (to select
between 2^3 = 8 inputs). The cascade of 2-input, 1-bit multiplexers is constructed as follows

t0 = s  if c0 = 0, t  otherwise
t1 = u  if c0 = 0, v  otherwise
t2 = w  if c0 = 0, x  otherwise
t3 = y  if c0 = 0, z  otherwise
t4 = t0 if c1 = 0, t1 otherwise
t5 = t2 if c1 = 0, t3 otherwise
r  = t4 if c2 = 0, t5 otherwise
noting there are 3 layers (i.e., they form a tree of depth 3). Using a table such as
c c2 c1 c0 t0 t1 t2 t3 t4 t5 r
0 0 0 0 s u w y s w s
1 0 0 1 t v x z t x t
2 0 1 0 s u w y u y u
3 0 1 1 t v x z v z v
4 1 0 0 s u w y s w w
5 1 0 1 t v x z t x x
6 1 1 0 s u w y u y y
7 1 1 1 t v x z v z z
makes it clear the c-th input is selected as the output r. Given such a component, we then just replicate it 8
times: the same control signal is used for each i-th replication, which then produces the i-th bit of r by selecting
between the i-th bits of the 8-bit inputs s through to z.
We use 7 instances of the 2-input, 1-bit multiplexer for each of the cascades; there are 8 replicated cascades,
so the correct answer is that we need 7 · 8 = 56 instances.
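The cascade above can be sketched directly (t0 through t5 named as in the solution):

```python
# An 8-input, 1-bit multiplexer built as a depth-3 tree of 2-input
# multiplexers, as described above.

def mux2(a, b, c):
    return a if c == 0 else b    # select a if c = 0, b otherwise

def mux8(inputs, c):
    s, t, u, v, w, x, y, z = inputs
    c0, c1, c2 = (c >> 0) & 1, (c >> 1) & 1, (c >> 2) & 1
    t0 = mux2(s, t, c0)          # layer 1
    t1 = mux2(u, v, c0)
    t2 = mux2(w, x, c0)
    t3 = mux2(y, z, c0)
    t4 = mux2(t0, t1, c1)        # layer 2
    t5 = mux2(t2, t3, c1)
    return mux2(t4, t5, c2)      # layer 3

inputs = ('s', 't', 'u', 'v', 'w', 'x', 'y', 'z')
for c in range(8):
    assert mux8(inputs, c) == inputs[c]   # the c-th input is selected
# 7 mux2 instances per cascade; 8 replicated cascades gives 7 * 8 = 56
```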
S38. First, notice that the critical path (or longest sequential path) runs through the 4 full-adder instances (as a
result of the carry chain): the 3-rd (most-significant) instance cannot produce either co = c4 or r3 until the carry
propagates from the 0-th (least-significant) instance, and so is dependent on ci = c0 , x0 , and y0 .
Next, we need some detail about each full-adder instance: the i-th such instance will compute the sum and
carry-out
ri = xi ⊕ yi ⊕ ci
ci+1 = (xi ∧ yi ) ∨ (xi ∧ ci ) ∨ (yi ∧ ci )
= (xi ∧ yi ) ∨ ((xi ⊕ yi ) ∧ ci )
from summands xi and yi , plus the carry-in ci (where c0 = ci and co = c4 ). There are two different options for
ci+1 : for the former, the critical path from ci to ci+1 passes through one AND and two OR gates, whereas for
the latter it passes through one AND and one OR gate (since the gate computing xi ⊕ yi can operate in parallel,
before ci is available).
The correct answer will clearly differ depending on the option we select for ci+1 , but imagine we select the
latter: irrespective of whether it is better or worse, it matches the lecture slide(s). As such, we can deduce the
following:
• For the 0-th full-adder instance, it takes 100ns to generate c1 from c0 , x0 , and y0 .
• For the 1-st full-adder instance, it takes 40ns to generate c2 from c1 , x1 , and y1 . The reason it is not 100ns,
as you might expect, is because the gate computing x1 ⊕ y1 can produce an output before c1 is available; this
means it does not contribute to the critical path.
• For the 2-nd full-adder instance, it takes 40ns to generate c3 from c2 , x2 , and y2 . The reason it is not 100ns,
as you might expect, is because the gate computing x2 ⊕ y2 can produce an output before c2 is available;
this means it does not contribute to the critical path.
• For the 3-rd instance, it takes 40ns to generate c4 from c3 , x3 , and y3 . The reason it is not 100ns, as you might expect,
is because the gate computing x3 ⊕ y3 can produce an output before c3 is available; this means it does not
contribute to the critical path. Likewise, it takes 60ns to generate r3 from c3 , x3 , and y3 ; for the same
reason as above, this is not 120ns as you might expect.
So the critical path is 220ns wrt. c4 and 240ns wrt. r3 , and therefore 240ns overall. It turns out applying similar
reasoning to the former option yields a slightly longer critical path of 240ns wrt. c4 because
• For the 0-th full-adder instance, it takes 60ns to generate c1 from c0 , x0 , and y0 .
• For the 1-st full-adder instance, it takes 60ns to generate c2 from c1 , x1 , and y1 .
• For the 2-nd full-adder instance, it takes 60ns to generate c3 from c2 , x2 , and y2 .
• For the 3-rd instance, it takes 60ns to generate c4 from c3 , x3 , and y3 , and 60ns to generate r3 from c3 , x3 ,
and y3 . The reason it is not 120ns, as you might expect, is because the gate computing x3 ⊕ y3 can produce
an output before c3 is available; this means it does not contribute to the critical path.
but this has no impact on the answer, which is still 240ns overall.
S39. (The circuit in question: a chain of four half-adder cells, each with inputs x and y and outputs s and co;
the i-th cell takes xi on its x input and the previous cell’s carry-out on its y input, with ci feeding the 0-th cell
and ri produced as the s output.)
Put another way, their idea implies the half-adders used no longer have a carry-in. This is not a problem in
terms of computing an addition: we can still use the two available inputs (vs. three for a full-adder) to provide
one operand (i.e., xi , the i-th bit of x) plus a carry-in from the previous half-adder instance. However, clearly
this is not the same addition as previously, since the input we would use to provide the other operand is no
longer available. That is, we can select x as normal, but can no longer provide a y input (the half-adders use
that input to propagate the carry).
Other than x, the only other input we can control is ci which acts as the overall carry-in to the addition.
Since ci ∈ {0, 1}, this means y = 0 and y = 1 are the correct answers.
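The resulting behaviour — an incrementer rather than a general adder — can be sketched as follows:

```python
# A chain of half-adder cells: each takes x_i and the previous carry, so
# the only controllable inputs are x and the overall carry-in ci; the
# chain therefore computes x + ci, i.e., x + 0 or x + 1.

def half_adder(x, y):
    return x ^ y, x & y               # (sum, carry-out)

def half_adder_chain(x, ci, n=4):
    carry, r = ci, 0
    for i in range(n):
        s, carry = half_adder((x >> i) & 1, carry)
        r |= s << i
    return r

for x in range(16):
    for ci in (0, 1):
        assert half_adder_chain(x, ci) == (x + ci) % 16
```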
S40. Recall that a 2-input XOR operator can be described via the following truth table:
XOR
x y r
0 0 0
0 1 1
1 0 1
1 1 0
and note that

x ⊕ x ≡ 0 and x ⊕ 0 ≡ x,

and hence

x ⊕ x ⊕ y ≡ y.

Tracing the circuit, we therefore find
t0 = x0 ⊕ x2
t1 = x1 ⊕ x3
r0 = t2 = x0 ⊕ t0 = x0 ⊕ x0 ⊕ x2 = x2
r1 = t3 = x1 ⊕ t1 = x1 ⊕ x1 ⊕ x3 = x3
r2 = t4 = t0 ⊕ t2 = x0 ⊕ x2 ⊕ x2 = x0
r3 = t5 = t1 ⊕ t3 = x1 ⊕ x3 ⊕ x3 = x1
st. it becomes clear r0 = x2 , r1 = x3 , r2 = x0 , and r3 = x1 , i.e., the most- and least-significant 2-bit halves of x are
swapped over to produce r.
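The chain of XORs can be traced mechanically:

```python
# Verify that the XOR network above swaps the 2-bit halves of x.
from itertools import product

def network(x0, x1, x2, x3):
    t0 = x0 ^ x2
    t1 = x1 ^ x3
    r0 = x0 ^ t0     # = x2, since x0 ^ x0 ^ x2 = x2
    r1 = x1 ^ t1     # = x3
    r2 = t0 ^ r0     # = x0
    r3 = t1 ^ r1     # = x1
    return (r0, r1, r2, r3)

for bits in product((0, 1), repeat=4):
    x0, x1, x2, x3 = bits
    assert network(*bits) == (x2, x3, x0, x1)
```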
S41. a Although alternative organisations of the variables may alter the form, a representative Karnaugh map
would be:
r = f (w, x, y, z)        w x
                     00  01  11  10
                00    0   1   1   1
                01    1   1   1   1
          y z   11    1   0   0   1
                10    ?   1   1   0
The most efficient approach would attempt to form the fewest groups possible, and the largest groups
possible: these will combine to minimise the number and complexity of each term in the resulting
expression. As such, the 1 entries can be covered by 4 groups per
r = f (w, x, y, z)        w x
                     00  01  11  10
                00    0   1   1   1
                01    1   1   1   1
          y z   11    1   0   0   1
                10    ?   1   1   0
noting that a) each group covers 4 entries (with certain entries covered more than once), b) the don’t care
is assumed to be 0 and therefore remains uncovered, and c) 2 of the groups, in center rows and columns,
wrap around the left and right, and top and bottom edges respectively.
b Performing the translation as is, we find that
r = f (w, x, y, z) = ( ¬x ∧ z ) ∨
( x ∧ ¬z ) ∨
( x ∧ ¬y ) ∨
( w ∧ ¬y )
This expression requires 4 NOT, 4 AND, and 3 OR operators and so 4 + 4 + 3 = 11 in total. To further
reduce the number of operators, we can apply various optimisation steps. First, it is possible to show
that (¬x ∧ z) ∨ (x ∧ ¬z) ≡ x ⊕ z, meaning we collapse two terms into one term (involving one XOR).
Second, the term ¬y is used in two terms; we can compute the result once, using one NOT, and share it
between the terms. In combination, we produce the alternative
r = f (w, x, y, z) = ( x ⊕ z ) ∨
                     ( x ∧ t ) ∨
                     ( w ∧ t )
where t = ¬y. However, finally, it is possible to apply the distribution axiom to the latter two terms: by
rewriting it as
r = f (w, x, y, z) = ( x ⊕ z ) ∨
                     ( t ∧ ( x ∨ w ) )
we produce an expression that requires 1 NOT, 1 AND, 2 OR and 1 XOR operators and so 1 + 1 + 2 + 1 = 5
in total.
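The optimisation can be checked by enumeration:

```python
# Verify that the 11-operator SoP expression and the 5-operator rewrite
# agree for all 16 inputs.
from itertools import product

def sop(w, x, y, z):
    return ((not x) and z) or (x and (not z)) or (x and (not y)) or (w and (not y))

def optimised(w, x, y, z):
    t = not y
    return (x != z) or (t and (x or w))   # x != z plays the role of x XOR z

for w, x, y, z in product((False, True), repeat=4):
    assert bool(sop(w, x, y, z)) == bool(optimised(w, x, y, z))
```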
S42. For SR-type latch and flip-flop components, we expect at least S, R and en inputs and a Q output; in contrast, for
the D-type (resp. T-type) latch and flip-flop components we expect D (resp. T) and en inputs and a Q output
which match. Even based on the argument that we might have Q and ¬Q from one or other, this at least makes
the former less likely. Additionally, we might expect S and R to be used in a controlled way so as to avoid a
problematic meta-stable state; there are instances where x = y = z = 1 and x = y = z = 0, so this control is not
clearly being applied.
Narrowing down the choices further requires interpretation of the signal labelling and behaviour. While
somewhat tricky, it should be clear that the value of z changes to match y while x = 1 and is unchanged when
x = 0. Crucially, it is not the case that z changes to match y only at the point where x transitions from 0 to 1 (or
1 to 0): as shown in the right-hand portion of the waveform, z changes at any point that x = 1. As such, we
infer the component is likely to be a level-triggered latch not an edge triggered flip-flop. Additionally, z does
not toggle between 0 and 1 while x = 1; it matches whatever y is.
In summary therefore, i.e., having ruled out a) SR-type components, b) flip-flop components, and c) a T-type
flip-flop, we conclude that the correct answer is a D-type latch: x represents the enable signal en, y represents
the input D and z represents the output Q.
S43. Note that when S = R = 1, i.e., in the storage state, there are two self-consistent possibilities: either
i the top NAND gate outputs 0, meaning the bottom NAND gate outputs ¬(0 ∧ 1) = 1 (which is
consistent with the top gate computing ¬(1 ∧ 1) = 0), or
ii the bottom NAND gate outputs 0, meaning the top NAND gate outputs ¬(0 ∧ 1) = 1 (which is
consistent with the bottom gate computing ¬(1 ∧ 1) = 0).
Put another way, S = R = 0 is the meta-stable state (which is invalid in the sense that Q = ¬Q = 1, whereas they
should differ) and S = R = 1 is the storage state (which retains whatever values Q and ¬Q already have); S = 0
and R = 1 sets Q = 1, while S = 1 and R = 0 resets Q = 0, whatever the current value of Q is. As such, the
second excitation table is correct (the first one is for a NOR-based SR-latch), noting this is sort of the inverse of
a NOR-based SR-latch wrt. the meaning of S and R.
S44. This is quite a tricky question, but the central feature to note, for both multiplexers in the circuit, is the loop
between output and (an) input:
• in the left-hand case the loop connects the multiplexer output, say t0 , to the y input, whereas
• in the right-hand case the loop connects the multiplexer output, say t1 , to the x input.
In both cases, this allows a 1-bit value to be stored by “holding” it in the loop. Put another way, when b = 0 then
t0 = a since the x input is selected; when b = 1, however, whatever value t0 has is fed back into the multiplexer.
This is conceptually similar to how SRAM cells work, for example. In that case we had a loop through two
NOT gates that acted to refresh (or reinforce) the stored value, plus extra access transistors. More formally,
in each case the loop results in bistability. Focusing on the left-hand multiplexer as an example, if b = 1 then
either of the states t0 = 0 or t0 = 1 (meaning y = 0 and y = 1) is stable (meaning it will not transition into the
other state without a stimulus), so the value is retained until updated (or lost if the power supply is removed).
In this case, the left-hand and right-hand multiplexers are organised in a primary-secondary form. The idea
is basically that b acts as an enable signal (typically it will be a clock), operating both primary and secondary
multiplexer in one of two modes. Per the above,
• when b = 0 the primary multiplexer passes a through to t0 , whereas the secondary multiplexer is in
storage mode, and
• when b = 1 the primary multiplexer is in storage mode, whereas the secondary multiplexer passes t0
through to t1 .
To understand their combined behaviour, focus on the instant in time when a positive edge occurs on b and
imagine that a = α for a value α ∈ {0, 1}. Before the edge, because b = 0, the primary and secondary multiplexers
will be in pass-through and storage mode respectively. This means t0 = a = α and c = t1 . At the instant the
edge occurs on b, t0 = α. Since the primary multiplexer flips into storage mode, this value is retained (i.e., fed
back around the loop into the y input) because b = 1: changes to a are irrelevant. Simultaneously, the secondary
multiplexer flips into pass-through mode st. c = α. Then, at some point, there is a negative edge on b meaning
both primary and secondary multiplexers flip back to the opposite mode. Given c = t1 = α at this instant, the
fact the secondary multiplexer is now in storage mode (again) means it retains the value of α. Likewise, since
the primary multiplexer is in pass-through mode, any change to a is reflected in t0 (but since the secondary
multiplexer is in storage mode, this is irrelevant to the value, namely α, it retains). Diagrammatically, this can
be viewed as in Figure B.1.
A somewhat reasonable analogy is that the primary and secondary multiplexers act as latches (each being
level triggered) in isolation, but as a flip-flop once combined. α can be viewed as being “passed along” a 2-step
“conveyor belt”. First, at a positive edge on b, the primary multiplexer will store whatever α is passed as
input by the user. Then, at the subsequent negative edge on b, that α is passed on to the secondary multiplexer
which stores it; in a sense, the primary multiplexer “protects” the stored α from subsequent changes to a until
another positive edge on b occurs.
So the correct answer is that the circuit represents a flip-flop, i.e., an edge triggered storage cell: a is the
flip-flop input, c is the flip-flop output (i.e., the stored value), and b is the flip-flop enable signal.
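The primary-secondary behaviour can be simulated step-by-step (a sketch: each call evaluates the pair combinationally for given a and b, with the loops modelled as held state):

```python
# Simulate the primary-secondary multiplexer pair: b is the enable (clock),
# a the data input; c (the secondary multiplexer's output) is returned.

class MuxFlipFlop:
    def __init__(self):
        self.t0 = 0              # value held in the primary's loop
        self.t1 = 0              # value held in the secondary's loop

    def evaluate(self, a, b):
        if b == 0:
            self.t0 = a          # primary in pass-through, secondary holds
        else:
            self.t1 = self.t0    # primary holds, secondary in pass-through
        return self.t1           # c

ff = MuxFlipFlop()
trace = [ff.evaluate(a, b) for a, b in
         [(1, 0), (1, 1), (0, 1), (0, 0), (0, 1)]]
# c captures a = 1 at the positive edge on b, and ignores the later change
# to a = 0 until the next positive edge
print(trace)
```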
S45. Given an l-bit control signal c, the demultiplexer can select between at most 2^l outputs: we treat c as an
unsigned, l-bit integer which will clearly range in value between 0 and 2^l − 1. In general, we want an l st.
2^l ≥ m so each output can be specified; typically m is a power-of-two, since this matches the maximum number
of outputs that can be specified. However, in this case we have m = 5.
Since 2^2 = 4 < 5 and 2^3 = 8 > 5 we know a 2-bit control signal is not enough (it cannot select r4 since
0 ≤ c < 4), but a 3-bit control signal is (although it could cope with up to m = 8, and since 0 ≤ c < 8 select r5 , r6
and r7 if they existed). In summary then, l = 3 is the correct answer.
S46. a Note that if a given ri is not connected to either Vdd or Vss , it is deemed to have the high impedance value
Z. This suggests the correct truth table is
x y r0 r1
0 0 1 Z
0 1 Z Z
1 0 Z Z
1 1 Z 0
The reason is because C0 is st. r0 connects to Vdd via two (pull-up) P-type MOSFETs; since these MOSFETs
only connect source to drain if the gate is Vss , we can say that r0 = 1 if x = y = 0 and r0 = Z (i.e.,
disconnected) otherwise. Conversely, C1 is st. r1 connects to Vss via two (pull-down) N-type MOSFETs;
since these MOSFETs only connect source to drain if the gate is Vdd , we can say that r1 = 0 if x = y = 1
and r1 = Z (i.e., disconnected) otherwise.
b Note that the option using 1 instance of C0 and 1 instance of C1 sort of makes sense: one can implement a
NAND gate using 2 P-type and 2 N-type MOSFETS, matching those that exist within instances of C0 and
C1 . However, the question explicitly says we need to use instances of C0 and C1 : we cannot, for example,
“merge” their internal implementation to make this option viable. So, as a first step, we implement a
NOT gate as follows:
(circuit diagram: the input x drives a C0 instance with both inputs x, producing t0 , and a C1 instance with
both inputs x, producing t1 ; t0 and t1 are connected together to drive r)
This is useful because we can reuse it when implementing a NAND gate, but also because it explains the
design approach involved: the idea is basically that the output is driven by one instance of C0 or C1 at
a time, with all the others producing the high impedance value (which is “overridden” by the driving
value). The behaviour can be described as follows:
x t0 t1 r
0 1 Z 1
1 Z 0 0
Using the same design approach, we can now attempt to implement a NAND gate, proposing

(circuit diagram: r is driven by a C1 instance with inputs x and y, producing t0 , together with a C0 instance
with both inputs y, producing t1 , and a C0 instance with both inputs x, producing t2 )
as a solution. There is no corresponding option in the table, however: the reason for this is that it violates
the stated design strategy. This can be seen by considering the (hypothetical) truth table
x y t0 t1 t2 r
0 0 Z 1 1 1
0 1 Z Z 1 1
1 0 Z 1 Z 1
1 1 0 Z Z 0
which shows the case where x = 0 and y = 0 means both t1 = 1 and t2 = 1: in such a case r is driven by
two non-Z values.
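The 3-state design style can be sketched with an explicit Z value and a wire-resolution rule:

```python
# Model C0 (series P-MOSFET pull-up) and C1 (series N-MOSFET pull-down)
# with an explicit high-impedance value Z, plus a rule that a wire may be
# driven by at most one non-Z value at a time.

Z = 'Z'

def c0(x, y):
    return 1 if (x == 0 and y == 0) else Z   # drives 1 iff x = y = 0

def c1(x, y):
    return 0 if (x == 1 and y == 1) else Z   # drives 0 iff x = y = 1

def resolve(*drivers):
    driven = [d for d in drivers if d != Z]
    assert len(driven) <= 1, 'bus conflict: wire driven by two values'
    return driven[0] if driven else Z

def not_gate(x):                 # one C0 plus one C1, inputs tied to x
    return resolve(c0(x, x), c1(x, x))

assert not_gate(0) == 1 and not_gate(1) == 0

# the rejected NAND candidate: at x = y = 0 two instances drive the wire
try:
    resolve(c1(0, 0), c0(0, 0), c0(0, 0))
except AssertionError as e:
    print(e)                     # bus conflict, as argued above
```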
S47. The first three options are all related to the fact that the number of transistors is increasing; the rate of increase
is important in that the associated limits are reached quicker, but is not so relevant beyond that.
• The fact there are more transistors in a fixed unit of area implies the transistors are smaller, and so, as
a result, that their feature size (i.e., the size of components in their design, such as channel length or
layer thicknesses) is also smaller. Limits clearly exist wrt. how small feature sizes can shrink. Even
if manufacturing processes keep pace, at some point the feature size would be measured in some small
number of atoms: at this scale it is plausible the transistor cannot operate correctly, and beyond it one
would need to consider a (radically) different approach.
• Dennard scaling states, roughly, that as transistors become smaller, their power density remains constant.
At face value then, power consumption should not represent a limit. However, Dennard scaling has
now started to break down: with such small feature sizes, otherwise insignificant factors (e.g., static
vs. dynamic power consumption) become significant, and thus lead to increased power consumption
if the number of transistors is increased. An indefinite increase in the amount of power supplied is not
plausible, meaning it acts to limit the number of transistors one can house per unit of area.
• With a small enough feature size, the channel allowing electrical current to flow through the transistor
will not always be able to “contain” it, i.e., there is leakage current. This manifests itself as heat, which
must be dissipated away from the transistors to ensure their correct operation. So, with a fixed capacity
to dissipate heat, this will act to limit the number of transistors one can house per unit of area.
In summary then, all the first three options plausibly constrain or limit Moore’s Law.
S48. A logical approach to this question would likely use two steps: 1) we need to assess which option(s) will toggle
s, then 2) find the shortest path from the inputs to s, which is sort of the opposite of the critical path, and use this
to decide the correct option.
Note that the truth table for this component (including the various annotated intermediate variables) is as
follows:
ci x y t0 t1 t2 co s
0 0 0 0 0 0 0 0
0 0 1 0 1 1 0 1
0 1 0 0 1 1 0 1
0 1 1 0 1 0 1 0
1 0 0 0 1 1 0 1
1 0 1 0 1 0 1 0
1 1 0 0 1 0 1 0
1 1 1 1 1 0 1 1
So, as a first step, we can see from the truth table that all options bar the last one will toggle s (either from 0
to 1, or from 1 to 0). Next, imagine TNOT , TAND , and
TOR denote the gate delay of a NOT gate, and 2-input AND and OR gate respectively. We can consider five
paths from the inputs to s, which each pass through one of the gates organised in a column on the left-hand
side of the diagram.
top     3-input AND { 2 · TAND + 1 · TOR
        3-input OR  { 1 · TAND + 3 · TOR
middle  2-input AND { 2 · TAND + 3 · TOR + 1 · TNOT
        2-input AND { 2 · TAND + 3 · TOR + 1 · TNOT
bottom  2-input AND { 2 · TAND + 3 · TOR + 1 · TNOT
Given it is likely TAND ≃ TOR , it seems clear the top path will be the shortest. Put another way, given s = t0 ∨ t2 ,
we can toggle s by controlling t0 or t2 ; having identified the top path as the shortest, controlling t0 will allow
control over s within the shortest period of time. Of the options, the second to last one, i.e., (1, 1, 1) → (1, 1, 0),
is the only one that toggles t0 , and would therefore be deemed correct.
S49. Since this is a cyclic counter, we know selecting n = 4 means the output r will step through values
0, 1, . . . , 2^4 − 1 = 15, 0, 1, . . . .
r r3 r2 r1 r0
0 0 0 0 0
1 0 0 0 1
2 0 0 1 0
3 0 0 1 1
4 0 1 0 0
5 0 1 0 1
6 0 1 1 0
7 0 1 1 1
8 1 0 0 0
9 1 0 0 1
10 1 0 1 0
11 1 0 1 1
12 1 1 0 0
13 1 1 0 1
14 1 1 1 0
15 1 1 1 1
0 0 0 0 0
1 0 0 0 1
.. .. .. .. ..
. . . . .
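The cyclic behaviour tabulated above is simply counting modulo 2^n:

```python
# A cyclic counter for n = 4: the output steps 0, 1, ..., 15, then wraps.
def cyclic_counter(n, steps):
    return [i % (2 ** n) for i in range(steps)]

seq = cyclic_counter(4, 18)
assert seq[15] == 15 and seq[16] == 0 and seq[17] == 1
```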
S50. CMOS combines complementary transistor types: one N-MOSFET, and one P-MOSFET. Since these transistors
behave in a complementary manner, it is always the case that one will be active and one inactive. This produces
an attractive feature wrt. power, in that static consumption (i.e., when there is no change in state) is very low.
The dynamic power consumption (i.e., when there is a change of state) is of course higher, but occurs only
when the inputs cause switching activity.
So, at a high-level at least, we could argue the highest consumption is likely when the initial value differs the
most from the stored value. That is, we argue that the highest switching activity occurs when the largest number
of bits stored by the register change. We can measure this using the Hamming distance between initial and
stored value: given
t0 = DEAD(16) = 1101111010101101(2)
t1 = BEEF(16) = 1011111011101111(2)
t2 = F00D(16) = 1111000000001101(2)
t3 = 1234(16) = 0001001000110100(2)
t4 = FFFF(16) = 1111111111111111(2)
t5 = 0000(16) = 0000000000000000(2)
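The Hamming distances between the values above can be computed directly (which pair the question intends as initial versus stored value is left to the question itself):

```python
# Hamming distance = number of differing bits = popcount of the XOR.
def hamming(a, b):
    return bin(a ^ b).count('1')

values = [0xDEAD, 0xBEEF, 0xF00D, 0x1234, 0xFFFF, 0x0000]
for v in values:
    print('DEAD vs %04X -> %d' % (v, hamming(0xDEAD, v)))
```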
S51. Using a component set with some number of AND, OR, and NOT gates is clearly a more familiar approach.
However, all the component sets can be used to implement f . Writing out the truth table
x y z r
0 0 0 1
0 0 1 0
0 1 0 1
0 1 1 1
1 0 0 0
1 0 1 0
1 1 0 1
1 1 1 1
and, from it, the Karnaugh map

                  y z
             00  01  11  10
      x = 0   1   0   1   1
      x = 1   0   0   1   1
a Using 2-input, 1-bit multiplexers for clarity, component set 1 can be used to implement f as follows:
(circuit diagrams: two alternatives, each a tree of 2-input, 1-bit multiplexer instances whose data inputs are
constants or variables and whose control signals are drawn from x, y, and z)
b Using 2-input, 1-bit multiplexers for clarity, component set 2 can be used to implement f as follows:
(circuit diagrams: two alternatives, again trees of 2-input, 1-bit multiplexer instances, but now with data
inputs drawn from {0, 1, y, ¬z})
S52. (circuit diagram: a 2-input, 1-bit multiplexer whose control signal is x, whose top input is x, and whose
bottom input is y, with output r)
We can show why the implementation is valid (i.e., produces a result matching AND) by inspection:
x y r
0 0 x=0
0 1 x=0
1 0 y=0
1 1 y=1
Notice that x = 0 implies the multiplexer selects the top input and hence r = x, whereas x = 1 implies the
multiplexer selects the bottom input and hence r = y; overall, r clearly matches AND in the sense r = 1 if x = 1 and
y = 1. Using the same approach, we can implement OR as follows
(circuit diagram: a 2-input, 1-bit multiplexer whose control signal is x, whose top input is y, and whose
bottom input is x, with output r)
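Both multiplexer-based gates can be checked by enumeration:

```python
# AND and OR from a single 2-input, 1-bit multiplexer, with x as control.
def mux2(a, b, c):
    return a if c == 0 else b

def and_gate(x, y):
    return mux2(x, y, x)    # x = 0 selects x (= 0); x = 1 selects y

def or_gate(x, y):
    return mux2(y, x, x)    # x = 0 selects y; x = 1 selects x (= 1)

for x in (0, 1):
    for y in (0, 1):
        assert and_gate(x, y) == (x & y)
        assert or_gate(x, y) == (x | y)
```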
S53. This question can be approached in several ways. First, one could employ basic pattern matching: read
from left-to-right, the three dominant structures can be matched against known NAND, NOR, and NOT gate
implementations. As such, the design implements the expression

r = ¬(¬((¬(x ∧ y)) ∨ z))

which can be simplified via

¬(¬((¬(x ∧ y)) ∨ z))
= (¬(x ∧ y)) ∨ z (involution)
= ¬(x ∧ y ∧ ¬z) (deMorgan)

such that

r = ¬(x ∧ y ∧ ¬z).
Second, although it involves more work, one can enumerate the transistor and signal states for each input
combination. For example, using + (resp. −) to denote where a given transistor is connected or activated (resp.
disconnected or deactivated), we can write
x y z m0 m1 m2 m3 t0 m4 m5 m6 m7 t1 m8 m9 r
0 0 0 + + − − 1 − + + − 0 + − 1
0 0 1 + + − − 1 − − + + 0 + − 1
0 1 0 − + + − 1 − + + − 0 + − 1
0 1 1 − + + − 1 − − + + 0 + − 1
1 0 0 + − − + 1 − + + − 0 + − 1
1 0 1 + − − + 1 − − + + 0 + − 1
1 1 0 − − + + 0 + + − − 1 − + 0
1 1 1 − − + + 0 + − − + 0 + − 1
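Both approaches can be cross-checked by enumerating the gate network:

```python
# Enumerate the NAND -> NOR -> NOT network and compare against the
# simplified expression r = ¬(x ∧ y ∧ ¬z).
from itertools import product

for x, y, z in product((0, 1), repeat=3):
    t0 = 1 - (x & y)            # NAND
    t1 = 1 - (t0 | z)           # NOR
    r  = 1 - t1                 # NOT
    assert r == 1 - (x & y & (1 - z))
```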
S54. To start with, keep in mind that this design uses flip-flops: these are edge-triggered (versus latches, which are
level-triggered). By focusing on and inspecting the left-hand flip-flop, we infer that the state will be updated
to reflect D = 1 ⊕ Q on each positive edge of clk. Given the truth table
x y r
0 0 0
0 1 1
1 0 1
1 1 0
we find that D = 1 ⊕ Q ≡ ¬Q, suggesting, therefore, that this is a toggle flip-flop constructed by using a
D-type flip-flop: on each positive edge of clk, the state will toggle either from 0 to 1 or from 1 to 0. Note that the
right-hand flip-flop has a similar construction, but that the lower input of the XOR comes from the left-hand
flip-flop.
Imagine that both flip-flops are reset, so their initial state is 0. We can draw a waveform which describes
each signal:
clk
t0
t1
t2
t3
r
Put simply, this suggests that each toggle flip-flop acts to halve the frequency: t1 toggles at half the frequency
of clk and t3 toggles at a quarter of the frequency of clk. Given that clk has a frequency of 400MHz, we therefore
expect r = t3 to toggle with a frequency of 400/4 = 100MHz.
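The frequency-halving argument can be simulated over a few clock edges:

```python
# Two cascaded toggle flip-flops, as above: t1 is the left state, t3 the
# right state; both sample their D inputs on each positive clock edge.

t1, t3 = 0, 0                    # both flip-flops reset
trace = []
for _ in range(8):               # 8 positive edges of clk
    t1, t3 = t1 ^ 1, t3 ^ t1     # D = 1 XOR Q, and D = Q XOR t1
    trace.append(t3)
print(trace)                     # t3 completes one period every 4 edges
```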
S55. We know that the multiplexer will select the top input if c = 0, or the bottom input if c = 1. This means the
design can be expressed as
r = (¬q ∧ ¬p) ∨ (q ∧ p),
i.e., if q = 0 then r = ¬p, whereas if q = 1 then r = p. As such, we can produce a truth table
p q r
0 0 1
0 1 0
1 0 0
1 1 1
S56. If the current input is S = R = 0, then the latch must be in the invalid state Q = 1, ¬Q = 1. The question is
focused on instantaneously setting S = R = 1, which means the latch is in storage mode: the question is, what
state does it end up in? The answer is that it does not remain in the invalid state, due to the imbalanced gate
delays. Let Tt and Tb denote the top and bottom gate delay respectively:
a If Tt = x > x − δ = Tb , the output of the bottom gate, i.e., ¬Q, will change state first: it changes from 1 to
0, at which point the only valid (eventual) output from the top gate is 1 (i.e., it will stay the same). So,
the bottom gate “winning” by changing first is like we reset the latch via the bottom, R input.
b If Tt = x < x + δ = Tb , the output of the top gate, i.e., Q, will change state first: it changes from 1 to 0, at
which point the only valid (eventual) output from the bottom gate is 1 (i.e., it will stay the same). So, the
top gate “winning” by changing first is like we set the latch via the top, S input.
S57. Since the question is ultimately about Boolean expressions, we use 0 and 1 in place of Vss and Vdd for convenience.
Recall that an N-type MOSFET is connected or activated (resp. disconnected or deactivated) if the gate
terminal is 1 (resp. 0), whereas a P-type MOSFET is connected or activated (resp. disconnected or
deactivated) if the gate terminal is 0 (resp. 1).
By inspection, t0 is clearly connected to 1 when either 1) x = 0 and y = 0 (through m0 and m1 ), or 2) z = 0
(through m2 ). Note that the connectives “and” and “or” used here reflect the sequential and parallel way
the P-type MOSFETS are organised. But, either way, and assuming a matching pull-down network, we can
write
t0 = (¬x ∧ ¬y) ∨ ¬z.
The output r is then produced via m3 and m4 , which form a NOT gate: this means
r = (x ∨ y) ∧ z.
S58. This is an N-type MOSFET, used here as an enable gate: if en = 0 there is no connection between x and r, but
if en = 1 there is a connection between x and r. The former case is the more interesting, in the sense that r is
disconnected from any driving signal. This situation is modelled using 3-state logic, wherein an additional
high impedance value Z is considered. Using Z, we can therefore model the transistor using this truth-table:
x en r
0 0 Z
1 0 Z
0 1 0
1 1 1
Put simply, if en = 0 then r = Z because there is no driving signal, but if en = 1 then r = x ∈ {0, 1}. Based on the
potential values of x and en we conclude that r ∈ {0, 1, Z}, i.e., it can potentially take 3 different values.
S59. The simplest approach to producing a solution for this question is brute-force enumeration, i.e., just inspecting
the truth table:
x1 x0 y1 y0 r
0 0 0 0 0 ⇒ 0≯0
0 0 0 1 0 ⇒ 0≯1
0 0 1 0 0 ⇒ 0≯2
0 0 1 1 0 ⇒ 0≯3
0 1 0 0 1 ⇒ 1>0
0 1 0 1 0 ⇒ 1≯1
0 1 1 0 0 ⇒ 1≯2
0 1 1 1 0 ⇒ 1≯3
1 0 0 0 1 ⇒ 2>0
1 0 0 1 1 ⇒ 2>1
1 0 1 0 0 ⇒ 2≯2
1 0 1 1 0 ⇒ 2≯3
1 1 0 0 1 ⇒ 3>0
1 1 0 1 1 ⇒ 3>1
1 1 1 0 1 ⇒ 3>2
1 1 1 1 0 ⇒ 3≯3
we can see that r = 1 in 6 cases.
• The adder inputs are PC and 4: whereas PC can be any 32-bit value, so the input is general-purpose, 4 is
a fixed, special-purpose input.
• The PC is aligned to a 4-byte boundary, meaning the two least-significant bits are always 0 and
PC ≡ PC + 4 ≡ 0 (mod 4).
• For a general-purpose adder, the carry-in input and carry-out output are useful in various situations.
Here, however, neither is useful and so they remain unused; as a result, we can optimise the adder by
eliminating them, e.g., simplifying the associated full- or half-adder cell.
Rather than a general-purpose ripple-carry adder, which uses 32 full-adder cells, we can use the facts above to
compute the same result by using 30 half-adder cells. That is, the adder design is of the form
[diagram: r0 and r1 are wired directly to 0; a chain of half-adder cells then computes r2 up to r31 , with the
first cell taking inputs x2 and y = 1, and each carry-out co feeding the x input of the next cell]
where the half-adder that generates r31 is optimised even further by considering that the carry-out is unused.
Under the same assumptions, the resulting area is
S61. Saying that the cell contains a “duplicate” makes no sense, and it is not true that we do not know what the cell
content, or output, is. Rather, we do not care what the cell content is: the input is impossible, so the output is
irrelevant to f .
Of the two remaining options, the correct one mirrors the associated approach. That is, the don’t care cell
can be treated as either 0 or 1; we select the option which will most effectively simplify the resulting term. Since
the input is impossible, that selection has no impact on the functionality of f for the possible inputs.
S62. Just by inspection, there is no way we can use either XOR, AND, or OR gate types to achieve the required
functionality. So if NAND and NOR are the only viable options, the question is asking which of them has been
used. The definitions of NAND and NOR are as follows:
x y ¬(x ∧ y) ¬(x ∨ y)
0 0 1 1
0 1 1 0
1 0 1 0
1 1 0 0
This allows us to reason about the relationship between inputs (i.e., S and R) and outputs (i.e., Q and ¬Q).
For example, if the current state is Q = 0, ¬Q = 1 and we have S = 1 and R = 1 then the top component is
computing 0 = S ⊙ ¬Q = 1 ⊙ 1, and the bottom component is computing 1 = R ⊙ Q = 1 ⊙ 0; in this case the
top component is consistent with either NAND or NOR, but the bottom component is consistent with only
NAND. So, although we could test other cases to gain more confidence, this case alone acts as a strong enough
hint that x ⊙ y = ¬(x ∧ y), i.e., NAND gates have been used.
S63. Note that the additional inputs are often termed preset and clear, matching the labels used here.
a The output of a synchronous circuit depends on the input in a discrete manner, i.e., it can change only
at a specific point in time; this is achieved using a clock signal, which acts to control when components
in the circuit can change, e.g., by gating them. In contrast, the output of an asynchronous circuit depends on the
input in a continuous manner, i.e., it can change at any time; there is no control over when changes in the
output take effect. The same terminology can be applied to the inputs themselves, or even signals more
generally. S and R are most accurately classified as synchronous. The reason is that they can influence
the output only when en = 1: we can see this by noting ¬(en ∧ x) = 1 if en = 0, whereas ¬(en ∧ x) = ¬x if en = 1.
As such, en can be said to gate S and R (and hence control their influence). P and C are most accurately
classified as asynchronous. The reason is that they can influence the output at any point in time, whether
en = 1 or en = 0 and so irrespective of en: in a sense this is obvious, because they do not interact with en
(they feed directly into the two NAND gates towards the right-hand side).
b When used to describe a control signal x, the terms active low and active high relate to when that signal
exerts control, i.e., whether that is when x = 0 or x = 1; in the former case, this is often highlighted by
writing ¬x or x̄ rather than x as the input label. S and R are most accurately classified as active high: we
can show that, when en = 1, S = 1 will set Q = 1 and R = 1 will reset Q = 0. P and C are
most accurately classified as active low: we can show that P = 0 will override S and set Q = 1, whereas
C = 0 will override R and reset Q = 0.
S64. Using a Karnaugh map, for example, one can produce the result
-------+
| +---+
+---+ +->| | +---+
| | |AND|-- S' -->| |
|NOR|---->| | |NOR|-- ~Q --+
| | +---+ +->| | |
+---+ | +---+ |
+--|---------------+
| |
| +---------------+
| +---+ |
+---->| | |
|NOR|-- Q --+
----------------- R' -->| |
+---+
S67. A wide range of answers are clearly possible. Obvious examples include physical size, and power consumption
or heat dissipation. Other variants include worst-case versus average-case versions of each metric, for example
in the case of efficiency.
S68. a MOSFET transistors work by sandwiching together N-type and P-type semiconductor layers. The dif-
ferent types of layer are doped with different substances to create more holes or more electrons. For
example, in an N-type MOSFET the layers are constructed as follows
gate
+-------+
| metal |
==== source ========= drain ==== silicon oxide layer
+--+--------+---------+--------+--+
| | N-type | | N-type | |
| +--------+ +--------+ |
| P-type |
+---------------------------------+
with additional layers of silicon oxide and metal. There are three terminals on the transistor. Roughly
speaking, applying a voltage to the gate creates a channel between the source and drain through which
charge can flow. Thus the device acts like a switch: when the gate voltage is high, there is a flow of charge
but when it is low there is little flow of charge. A P-type MOSFET swaps the roles of N-type and P-type
semiconductor and hence implements the opposite switching behaviour.
b One can construct a NAND gate, computing r = ¬(x ∧ y), from such transistors as follows:
V_dd
|
+-------+-------+
| |
v v
+--------+ +--------+
x -->| P-type | y -->| P-type |
+--------+ +--------+
| |
+---------------+---> r
|
+--------+
x -->| N-type |
+--------+
|
+--------+
y -->| N-type |
+--------+
^
|
+-------+
|
VSS
If x and y are both connected to Vss then both top P-type transistors will be connected, and both bottom
N-type transistors will be disconnected; r will be connected to Vdd . If x and y are connected to Vdd
and Vss respectively then the right-most P-type transistor will be connected, and the lower-most N-
type transistor will be disconnected; r will be connected to Vdd . If x and y are connected to Vss and
Vdd respectively then the left-most P-type transistor will be connected, and the upper-most N-type
transistor will be disconnected; r will be connected to Vdd . If x and y are both connected to Vdd then both top
P-type transistors will be disconnected, and both bottom N-type transistors will be connected; r will be
connected to Vss . In short, the behaviour we get is described by
x y r
Vss Vss Vdd
Vss Vdd Vdd
Vdd Vss Vdd
Vdd Vdd Vss
which, if we substitute 0 and 1 for Vss and Vdd , matches that of the NAND operation.
S69. This question is a lot easier than it sounds; basically we just add two extra transistors (one P-MOSFET and one
N-MOSFET) to implement a similar high-level approach. That is, we want r connected to Vss only when each of
x, y and z are connected to Vdd ; this means the bottom, N-MOSFETs are in series. If any of x, y or z are connected
to Vss , we want r connected to Vdd ; this means the top, P-MOSFETs are in parallel. Diagrammatically, the result
is as follows:
[diagram: three P-type MOSFETs in parallel between Vdd and r, and three N-type MOSFETs in series
between r and Vss , with each of x, y and z driving one P-type and one N-type gate]
S70. This is quite an open-ended question, but basically it asks for high-level explanations only. As such, some
example answers include the following:
a CMOS transistors are constructed from atomic-level understanding and manipulation; the immutable
size of atoms therefore acts as a fundamental limit on the size of any CMOS-based transistor.
b Feature scaling improves the operational efficiency of transistors, simply because smaller features reduce
delay. Beyond this however, one must utilise the extra transistors to achieve some useful task if compu-
tational efficiency is to scale as well: improvements to an architecture or design are often required, for
instance, to exploit parallelism and so on.
c Even assuming the transistors available can be harnessed to improve computational efficiency, this has
implications: more transistors within a fixed size area will increase power consumption and also heat
dissipation for example, both of which act as limits even if managed (e.g., via aggressive forms of cooling).
d On one hand, smaller transistors mean a lower cost per transistor: with a fixed number of transistors, their
area and manufacturing cost will decrease. With a fixed-size area and hence more transistors in it,
however, this probably means an increased defect rate during manufacture. The resulting cost implication
could act as an economic limit on transistor size.
S71. a The most basic interpretation (i.e., not really doing any grouping using Karnaugh maps but just picking
out each cell with a 1 in it) generates the following SoP equations
b From the basic SoP equations, we can use the don’t care states to eliminate some of the terms to get
e = (¬a ∧ ¬b ∧ c) ∨ (a ∧ ¬c ∧ ¬d) ∨ (b ∧ d)
f = (¬a ∧ ¬b ∧ d) ∨ (b ∧ ¬c ∧ ¬d) ∨ (a ∧ c)
then, we can share both the terms ¬a ∧ ¬b and ¬c ∧ ¬d since they occur in e and f .
S72. Simply transcribing the truth table into a suitable Karnaugh map gives
y
z
00 01 11 10
0 1 1 0 1
0 1 5 4
x 1 0 1 ? 0
2 3 7 6
x y ¬(x ∧ y)
0 0 1
0 1 1
1 0 1
1 1 0
¬x = ¬(x ∧ x)
x ∧ y = ¬(¬(x ∧ y) ∧ ¬(x ∧ y))
x ∨ y = ¬(¬(x ∧ x) ∧ ¬(y ∧ y))
To prove this works, we can construct truth tables for the expressions and compare the results with what we
would expect; for NOT we have:
x ¬(x ∧ x) ¬x
0 1 1
1 0 0
while for AND we have:
x y ¬(x ∧ y) ¬(¬(x ∧ y) ∧ ¬(x ∧ y)) x ∧ y
0 0 1 0 0
0 1 1 0 0
1 0 1 0 0
1 1 0 1 1
and finally for OR we have:
S74. Conventionally a 4-input, 1-bit multiplexer might be described using a truth table such as the following:
c1 c0 w x y z r
0 0 0 ? ? ? 0
0 0 1 ? ? ? 1
0 1 ? 0 ? ? 0
0 1 ? 1 ? ? 1
1 0 ? ? 0 ? 0
1 0 ? ? 1 ? 1
1 1 ? ? ? 0 0
1 1 ? ? ? 1 1
This assumes that there are four inputs, namely w, x, y and z, with two further control signals c1 and c0 deciding
which of them provides the output r. However, another valid way to write the same thing would be
c1 c0 r
0 0 w
0 1 x
1 0 y
1 1 z
This reformulation describes a 2-input, 1-output Boolean function whose behaviour is selected by fixing w, x,
y and z, i.e., connecting each of them directly to either 0 or 1. For instance, if w = x = y = 0 and z = 1 then the
truth table becomes
c1 c0 r
0 0 w=0
0 1 x =0
1 0 y =0
1 1 z =1
which is of course the same as AND. So depending on how w, x, y and z are fixed (on a per-instance basis) we
can form any 2-input, 1-output Boolean function; this includes NAND and NOR, which we know are universal,
meaning the multiplexer is also universal.
a
b
00 01 11 10
00 1 0 0 1
0 1 5 4
01 1 1 1 1
2 3 7 6
d
11 0 1 1 0
10 11 15 14
c
10 1 1 1 1
8 9 13 12
and from which we can derive a simplified SoP form for e, namely
e = (b ∧ d) ∨ (¬b ∧ ¬c) ∨ (c ∧ ¬d)
b The advantages of this expression over the original are that it is simpler, i.e., contains fewer terms and
hence needs fewer gates for implementation, and shows that the input a is essentially redundant. We have
probably also reduced the critical path through the circuit, since it is shallower. The disadvantages
are that we still potentially have some glitching due to the differing delays through paths in the circuit
(although these existed before as well), and the propagation delay remains large.
c The longest sequential path through the circuit goes through a NOT gate, two AND gates and two OR
gates; the critical path is thus 90ns long. This time bounds how fast we can use it in a clocked system,
since the clock period must be at least 90ns. So the shortest clock period would be 90ns, meaning the
clock ticks roughly 11,111,111 times per second (i.e., at about 11MHz).
S76. a Examining the behaviour required, we can construct the following truth table:
D2 D1 D0 L8 L7 L6 L5 L4 L3 L2 L1 L0
0 0 0 ? ? ? ? ? ? ? ? ?
0 0 1 0 0 0 0 1 0 0 0 0
0 1 0 0 1 0 0 0 0 0 1 0
0 1 1 1 0 0 0 1 0 0 0 1
1 0 0 1 0 1 0 0 0 1 0 1
1 0 1 1 0 1 0 1 0 1 0 1
1 1 0 1 1 1 0 0 0 1 1 1
1 1 1 ? ? ? ? ? ? ? ? ?
Note that
L3 = 0
L5 = 0
L6 = L2
L7 = L1
L8 = L0
so actually we only need expressions for L0...2 and L4 ; note also that don't care states are used to capture the
idea that D = 0 and D = 7 never occur. The resulting four Karnaugh maps
D1 D1
D0 D0
L0 00 01 11 10 L1 00 01 11 10
0 ? 0 1 0 0 ? 0 0 1
0 1 5 4 0 1 5 4
D2 1 1 1 ? 1 D2 1 0 0 ? 1
2 3 7 6 2 3 7 6
D1 D1
D0 D0
L2 00 01 11 10 L4 00 01 11 10
0 ? 0 0 0 0 ? 1 1 0
0 1 5 4 0 1 5 4
D2 1 1 1 ? 1 D2 1 0 1 ? 0
2 3 7 6 2 3 7 6
L0 = D2 ∨ (D1 ∧ D0 )
L1 = (D1 ∧ ¬D0 )
L2 = D2
L4 = D0
b All the LEDs can be driven in parallel, i.e., the critical path relates to the single expression whose critical
path is the most. L2...6 have no logic involved, so we can discount them immediately. Of the two remaining
LEDs, we find
L0 : 20ns + 20ns = 40ns
L1 : 10ns + 20ns = 30ns
hence L0 represents the critical path of 40ns. Thus if one throw takes 40ns, we can perform
1s / 40ns = (1 · 10^9 ns) / 40ns = 25000000
throws per second, which is quite a lot, and certainly too many to actually see with the human eye!
ii If we sum 8 values 1 ≤ xi ≤ 6, where xi is the i-th throw (or i-th value of D supplied), then the
maximum total is 8 · 6 = 48. We can represent this in 6 bits, hence n = 6.
iii Using the left-shift method, we compute D′ = 2 · D by simply relabelling the bits in D. That is, D′0 = 0
and D′i+1 = Di for 0 ≤ i < 3. For example, given D = 6(10) = 110(2) we have
D′0 = 0
D′1 = D0 = 0
D′2 = D1 = 1
D′3 = D2 = 1
and hence D′ = 1100(2) = 12(10) . Since there is no need for any logic gates to implement this
method, the critical path is essentially nil: the only propagation delay relates to (small) wire delays.
In comparison to the larger critical path of a suitable n-bit adder, this clearly means the left-shift
approach is preferable.
• lth_8bit compares two 8-bit inputs a and b and produces a 1-bit result r, where r = 1 if a < b and
r = 0 if a ≥ b:
a b
| |
v v
+-----------+
| lth_8bit |
+-----------+
|
v
r
• mux2_8bit selects between two 8-bit inputs; if the inputs are a and b, the output r = a if the control
signal s = 0, or r = b if s = 1:
a b
| |
v v
+-----------+
| mux2_8bit |<-- s
+-----------+
|
v
r
Based on these building blocks, one can describe the component C as follows:
x y
| |
v v
+-----------+
| lth_8bit |
+-----------+
y x | x y
| | | | |
v v | v v
+-----------+ | +-----------+
| mux2_8bit |<--+-->| mux2_8bit |
+-----------+ r +-----------+
| |
v v
min(x,y) max(x,y)
From a functional perspective, C compares x and y using an instance of the lth_8bit building block, and
then uses the result r as a control signal for two instances of mux2_8bit. The left-hand instance selects y
as the output if r = 0 and x if r = 1; that is, if x < y then the output is x = min(x, y) otherwise the output is
y = min(x, y). The right-hand instance swaps the inputs so it selects x as the output if r = 0 and y if r = 1;
that is, if x < y then the output is y = max(x, y) otherwise the output is x = max(x, y).
b The short answer (which gets about half the marks) is that the longest path through the mesh will go
through 2n − 1 of the C components: this is the path from the top-left corner down along one edge to the
bottom-left and then along another edge to the bottom-right. So in a sense, if we write the propagation
delay associated with each instance of C as TC then the overall critical path is
(2n − 1) · TC .
In a bit more detail, the critical path through C is through one instance of lth_8bit and one instance of
mux2_8bit. So we could write the overall critical path as
To be more detailed than this, we need to think about individual logic gates. Imagine we assume
TXOR = 50ns, TAND = 20ns, TOR = 20ns and TNOT = 10ns.
• mux2_8bit is simply eight mux2_1bit instances placed in parallel with each other; that is, the i-th
such instance produces the i-th bit of the output based on the i-th bit of the inputs (but all using the
same control signal). Assuming that the propagation delay of AND and OR gates dominates that of
a NOT gate, the critical path through mux2_1bit will be TAND + TOR .
• lth_8bit is a combination of eight sub-components:
Each of these sub-components is placed in series so that ti−1 is an input from the previous sub-
component and ti is an output provided to the next.
Based on simple circuits derived from their truth tables, the critical paths for lth_1bit and equ_1bit
are TAND + TNOT and TXOR + TNOT respectively. Thus the critical path of the whole sub-component
is TXOR + TNOT + TAND + TOR (since the critical path of equ_1bit is longer). Overall, the critical path
of lth_8bit is
8 · (TXOR + TNOT + TAND + TOR ),
or more exactly
7 · (TXOR + TNOT + TAND + TOR ) + TAND + TNOT
because the 0-th sub-component is “special”: there is no input from the previous sub-component.
Using this we can write the overall critical path for the mesh as
S78. a Imagine a component which is enabled (i.e., “turned on”) using the input en:
• The idea of the component being level triggered is that the value of en is important, not a change in
en: the component is enabled when en has a particular value, rather than at an edge when the value
changes.
• The fact en is active high means that the component is enabled when en = 1 (rather than en = 0, which
would make it active low). Though active high might seem the more logical choice, this is just part
of the component specification: as long as everything is consistent, i.e., uses the right semantics to
“turn on” the component, there is often no major benefit of one approach over the other.
b Assume that M is a 4-state switch represented by a 2-bit value M = ⟨M0 , M1 ⟩: ⟨0, 0⟩ means off, ⟨1, 0⟩ means
slow, ⟨0, 1⟩ means fast and ⟨1, 1⟩ means very fast. Also assume there is a clock signal called clk available,
for example supplied by an oscillator of some form.
One approach would basically be to take clk and divide it to create two new clock signals c0 and c1 which
have a longer period: each of the clock signals could then satisfy the criteria of toggling the fire button
on and off at various speeds. A clock divider is fairly simple: the idea is to have a counter c clocked by
clk and to sample the (i − 1)-th bit of the counter: this behaves like clk divided by 2i . For example the 0-th
bit acts like clk but with twice the period.
A circuit to do this is fairly simple: we need some D-type flip-flops to hold the counter state, and some
full-adders to increment the counter:
+-----+ +-----+
|co ci|<----------|co ci|<-- 0
+--|s y|<-- 0 +--|s y|<-- 1
| | x|<-+ | | x|<-+
| +-----+ | | +-----+ |
| | | |
| c_1 --+ | c_0 --+
| | | |
| +-----+ | | +-----+ |
+->|D Q|--+ +->|D Q|--+
| <|<-+ | <|<-+
| | | | | |
+-----+ | +-----+ |
| |
+-----------------+-- clk
Given such a component which runs freely as long as it is driven by clk, we want to feed the original
fire button F0 through to form the new fire button input F′0 when M = 0, and c1 , c0 or clk through when
M = 1, M = 2 or M = 3 (meaning a slow, fast or very fast toggling behaviour). We can describe this as the
following truth table:
M1 M0 F′0
0 0 F0
0 1 c1
1 0 c0
1 1 clk
This is essentially a multiplexer controlled by M, and permits us to write
c i A synchronous protocol demands that the console and controller share a clock signal which acts
to synchronise their activity, e.g., ensures each one sends and receives data at the right time. The
problem with this is ensuring that the clock is not skewed for either component: since they are
physically separate, this might be hard and hence this is not such a good option.
An asynchronous protocol relies on extra connections between the components, e.g., “request” and
“acknowledge”, that allow them to engage in a form of transaction: the extra connections essentially
signal when data has been sent or received on the associated bus. This is more suitable given the
scenario: the extra connections could potentially be shared with those that already exist (e.g., F0 , F1 ,
F2 and D) thereby reducing the overhead, plus performance is not a big issue here (the protocol will
presumably only be executed once when the components are turned on or plugged in).
Both approaches have an issue in that an attacker could
• once the protocol has run, simply plug in another, fake controller, or
• intercept c and T(c) pairs until they recover the whole look-up table and then “imitate”
it using a fake controller,
so neither is particularly robust from a security point of view!
ii The temptation here is to say that the use of a 3-bit memory (or register) is the right way to go.
Although this allows some degree of flexibility, that flexibility is not required since the function is fixed; the
main disadvantage is retention of the content when the controller or console is turned off: some
form of non-volatile memory is therefore needed.
However, we can easily construct some dedicated logic to do the same thing. If we say that y = T(x),
then we can describe the behaviour of T using the following truth table:
x2 x1 x0 y2 y1 y0
0 0 0 0 1 0
0 0 1 1 1 0
0 1 0 1 1 1
0 1 1 0 0 1
1 0 0 1 0 0
1 0 1 0 0 0
1 1 0 1 0 1
1 1 1 0 1 1
This can be transformed into the following Karnaugh maps for y0 , y1 and y2
x1 x1 x1
x0 x0 x0
y2 00 01 11 10 y1 00 01 11 10 y0 00 01 11 10
0 0 1 0 1 0 1 1 0 1 0 0 0 1 1
0 1 5 4 0 1 5 4 0 1 5 4
x2 1 1 0 0 1 x2 1 0 0 1 0 x2 1 0 0 1 1
2 3 7 6 2 3 7 6 2 3 7 6
y2 = ( x1 ∧ ¬x0 ) ∨
( x2 ∧ ¬x0 ) ∨
( ¬x2 ∧ ¬x1 ∧ x0 )
y1 = ( ¬x2 ∧ ¬x1 ) ∨
( ¬x2 ∧ ¬x0 ) ∨
( x2 ∧ x1 ∧ x0 )
y0 = ( x1 )
which are enough to implement the look-up table: we pass x as input, and it produces the right y
(for this fixed T) as output.
S79. This is a classic “puzzle” question in digital logic. There are a few ways to describe the strategy, but the one
used here is based on counting the number of inputs which are 1. In short, we start by computing
t1 = ¬(x ∧ y ∨ y ∧ z ∨ x ∧ z)
t2 = ¬((x ∧ y ∧ z) ∨ t1 ∧ (x ∨ y ∨ z))
which use our quota of NOT gates. The idea is that t1 = 1 iff. one or zero of x, y and z are 1, and in the same
way t2 = 1 iff. two or zero of x, y and z are 1. This can be hard to see, so consider the truth table
x y z x∧y∨y∧z∨x∧z t1 (x ∧ y ∧ z) ∨ t1 ∧ (x ∨ y ∨ z) t2
0 0 0 0 1 0 1
0 0 1 0 1 1 0
0 1 0 0 1 1 0
0 1 1 1 0 0 1
1 0 0 0 1 1 0
1 0 1 1 0 0 1
1 1 0 1 0 0 1
1 1 1 1 0 1 0
and hence t1 and t2 are as required. Now, we can generate the three results as
S80. Imagine that for some n-bit input x, we let yi = Ci (x) denote the evaluation of Ci to get an output yi . As such,
the equivalence of C1 and C2 can be stated as a test of whether y1 = y2 for all values of x; another way to say the
same thing is to test whether an x exists such that y1 ≠ y2 , which would distinguish the circuits, i.e., imply they
are not equivalent.
Using the second formulation, we can write the test as y1 ⊕ y2 since the XOR will produce 1 when y1 differs
from y2 and 0 otherwise. As such, we have n Boolean variables (the bits of x) and want an assignment that
implies the expression C1 (x) ⊕ C2 (x) will evaluate to 1. This is the same form as a SAT instance: if the instance is
satisfiable then the circuits are not equivalent, whereas if it is unsatisfiable then they are equivalent.
S81. a The latency of the circuit is the time taken to perform the computation, i.e., to compute some r given x.
For this circuit, the latency is simply the sum of the critical paths.
b The throughput is the number of operations performed per unit time period. This is essentially the
number of operations we can start (resp. that finish) within that time period.
By pipelining the circuit, using say 3 stages, one might expect the latency to increase slightly (by virtue of
having to add pipeline registers between each stage) but the throughput to increase (by virtue of decreasing
the overall critical path to the longest stage, and hence increasing the maximum clock frequency). The trade-off
is strongly influenced by the number of and balance between stages, meaning careful analysis of the circuit
before applying the optimisation is important.
S82. a The latency of a circuit is the time elapsed between when a given operation starts and when it finishes.
The throughput of a circuit is the number of operations that can be started in each time period; that is,
how long it takes between when two subsequent operations can be started.
b The latency of the circuit is the sum of all the latencies of the parts,i.e.,
The throughput relates to the length of the longest pipeline stage; the circuit is not pipelined, so more
1
specifically we can say it is 160·10−9 .
c The new latency is still the sum of all the parts, but now includes the extra pipeline register:
However, the throughput is now greater because the longest pipeline stage only has a latency of 100ns
(including the extra register). Specifically, the throughput increases to 1/(100 · 10^−9 ), which essentially means we
We can achieve this by creating a 4-stage pipeline: adding two more pipeline registers, between parts B
and C and parts E and F, ensures the stages have latencies of
z
0 1
0 1 0
0 1
y 1 1 1
2 3
r = ( ¬z ) ∨
( y )
z
0 1
0 1 1
0 1
y 1 0 1
2 3
r = ( ¬y ) ∨
( z )
z
0 1
0 1 0
0 1
y 1 0 0
2 3
r = ( ¬y ∧ ¬z )
z
0 1
0 1 1
0 1
y 1 0 0
2 3
r = ( ¬y )
z
0 1
0 0 0
0 1
y 1 0 1
2 3
r = ( y ∧ z )
z
0 1
0 1 0
0 1
y 1 0 1
2 3
r = ( y ∧ z ) ∨
( ¬y ∧ ¬z )
z
0 1
0 0 0
0 1
y 1 1 0
2 3
r = ( y ∧ ¬z )
z
0 1
0 0 1
0 1
y 1 1 0
2 3
r = ( y ∧ ¬z ) ∨
( ¬y ∧ z )
z
0 1
0 0 1
0 1
y 1 0 0
2 3
r = ( ¬y ∧ z )
z
0 1
0 1 0
0 1
y 1 1 0
2 3
r = ( ¬z )
z
0 1
0 0 ?
0 1
y 1 1 0
2 3
r = ( y ∧ ¬z )
z
0 1
0 ? ?
0 1
y 1 0 1
2 3
r = ( z )
z
0 1
0 0 1
0 1
y 1 ? 1
2 3
r = ( z )
z
0 1
0 1 ?
0 1
y 1 1 0
2 3
r = ( ¬z )
z
0 1
0 1 1
0 1
y 1 0 ?
2 3
r = ( ¬y )
x
z
00 01 11 10
0 0 0 1 1
0 1 5 4
y 1 0 0 0 1
2 3 7 6
r = ( x ∧ ¬z ) ∨
( x ∧ ¬y )
x
z
00 01 11 10
0 1 1 0 1
0 1 5 4
y 1 1 0 1 1
2 3 7 6
x
z
00 01 11 10
0 1 1 1 0
0 1 5 4
y 1 0 1 0 0
2 3 7 6
x
z
00 01 11 10
0 0 0 1 0
0 1 5 4
y 1 1 1 1 0
2 3 7 6
x
z
00 01 11 10
0 0 0 1 0
0 1 5 4
y 1 0 0 1 0
2 3 7 6
x
z
00 01 11 10
0 0 1 1 1
0 1 5 4
y 1 0 0 0 1
2 3 7 6
x
z
00 01 11 10
0 1 1 1 1
0 1 5 4
y 1 0 0 1 1
2 3 7 6
x
z
00 01 11 10
0 1 0 0 0
0 1 5 4
y 1 0 0 0 1
2 3 7 6
x
z
00 01 11 10
0 0 0 0 0
0 1 5 4
y 1 0 1 1 1
2 3 7 6
x
z
00 01 11 10
0 0 0 1 1
0 1 5 4
y 1 1 1 1 0
2 3 7 6
x
z
00 01 11 10
0 1 ? ? 0
0 1 5 4
y 1 0 ? 0 1
2 3 7 6
r = ( ¬x ∧ ¬y ) ∨
( x ∧ y ∧ ¬z )
x
z
00 01 11 10
0 1 ? 1 0
0 1 5 4
y 1 0 0 ? 0
2 3 7 6
r = ( ¬x ∧ ¬y ) ∨
( x ∧ z )
x
z
00 01 11 10
0 0 ? 1 ?
0 1 5 4
y 1 1 1 ? ?
2 3 7 6
r = ( y ) ∨
( z )
x
z
00 01 11 10
0 ? 0 ? ?
0 1 5 4
y 1 ? ? 1 0
2 3 7 6
r = ( y ∧ z )
x
z
00 01 11 10
0 1 1 ? ?
0 1 5 4
y 1 0 ? 0 0
2 3 7 6
r = ( ¬y )
x
z
00 01 11 10
0 ? 0 ? 1
0 1 5 4
y 1 0 ? 1 1
2 3 7 6
r = ( x )
x
z
00 01 11 10
0 ? 1 0 1
0 1 5 4
y 1 1 1 1 1
2 3 7 6
r = ( ¬z ) ∨
( ¬x ) ∨
( y )
x
z
00 01 11 10
0 0 1 0 1
0 1 5 4
y 1 ? ? ? 0
2 3 7 6
r = ( ¬x ∧ z ) ∨
( x ∧ ¬y ∧ ¬z )
x
z
00 01 11 10
0 1 ? ? 0
0 1 5 4
y 1 0 1 1 ?
2 3 7 6
r = ( ¬x ∧ ¬y ) ∨
( z )
x
z
00 01 11 10
0 0 0 1 1
0 1 5 4
y 1 0 0 ? 0
2 3 7 6
r = ( x ∧ ¬y )
r = ( ¬w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨
( ¬w ∧ ¬x ∧ ¬y ∧ z ) ∨
( ¬w ∧ x ∧ ¬y ∧ ¬z ) ∨
( ¬w ∧ x ∧ y ∧ z ) ∨
( w ∧ ¬x ∧ ¬y ∧ ¬z )
x
z
00 01 11 10
00 1 1 0 1
0 1 5 4
01 0 0 1 0
2 3 7 6
y
11 0 0 0 0
10 11 15 14
w
10 1 0 0 0
8 9 13 12
r = ( ¬w ∧ ¬x ∧ ¬y ) ∨
( ¬x ∧ ¬y ∧ ¬z ) ∨
( ¬w ∧ x ∧ y ∧ z ) ∨
( ¬w ∧ ¬y ∧ ¬z )
r = ( ¬w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨
( ¬w ∧ ¬x ∧ y ∧ ¬z ) ∨
( ¬w ∧ x ∧ ¬y ∧ z ) ∨
( ¬w ∧ x ∧ y ∧ z ) ∨
( w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨
( w ∧ ¬x ∧ ¬y ∧ z ) ∨
( w ∧ ¬x ∧ y ∧ z ) ∨
( w ∧ x ∧ ¬y ∧ ¬z ) ∨
( w ∧ x ∧ y ∧ ¬z )
x
z
00 01 11 10
00 1 0 1 0
0 1 5 4
01 1 0 1 0
2 3 7 6
y
11 0 1 0 1
10 11 15 14
w
10 1 1 0 1
8 9 13 12
r = ( w ∧ x ∧ ¬z ) ∨
( w ∧ ¬y ∧ ¬z ) ∨
( ¬w ∧ x ∧ z ) ∨
( w ∧ ¬x ∧ z ) ∨
( ¬w ∧ ¬x ∧ ¬z )
r = ( ¬w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨
( ¬w ∧ x ∧ ¬y ∧ z ) ∨
( ¬w ∧ x ∧ y ∧ ¬z ) ∨
( w ∧ x ∧ ¬y ∧ ¬z ) ∨
( w ∧ x ∧ y ∧ z )
x
z
00 01 11 10
00 1 0 1 0
0 1 5 4
01 0 0 0 1
2 3 7 6
y
11 0 0 1 0
10 11 15 14
w
10 0 0 0 1
8 9 13 12
r = ( ¬w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨
( w ∧ x ∧ y ∧ z ) ∨
( ¬w ∧ x ∧ y ∧ ¬z ) ∨
( ¬w ∧ x ∧ ¬y ∧ z ) ∨
( w ∧ x ∧ ¬y ∧ ¬z )
r = ( ¬w ∧ ¬x ∧ ¬y ∧ z ) ∨
( ¬w ∧ x ∧ ¬y ∧ ¬z ) ∨
( ¬w ∧ x ∧ ¬y ∧ z ) ∨
( w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨
( w ∧ ¬x ∧ ¬y ∧ z ) ∨
( w ∧ ¬x ∧ y ∧ ¬z ) ∨
( w ∧ ¬x ∧ y ∧ z ) ∨
( w ∧ x ∧ ¬y ∧ ¬z ) ∨
( w ∧ x ∧ y ∧ ¬z )
x
z
00 01 11 10
00 0 1 1 1
0 1 5 4
01 0 0 0 0
2 3 7 6
y
11 1 1 0 1
10 11 15 14
w
10 1 1 0 1
8 9 13 12
r = ( ¬x ∧ ¬y ∧ z ) ∨
( w ∧ ¬z ) ∨
( ¬w ∧ x ∧ ¬y ) ∨
( w ∧ ¬x )
x
z
00 01 11 10
00 1 0 0 1
0 1 5 4
01 0 1 1 0
2 3 7 6
y
11 1 1 0 1
10 11 15 14
w
10 1 0 0 1
8 9 13 12
x
z
00 01 11 10
00 0 1 0 1
0 1 5 4
01 0 0 0 1
2 3 7 6
y
11 0 0 1 1
10 11 15 14
w
10 0 1 1 1
8 9 13 12
r = ( ¬w ∧ ¬x ∧ y ∧ ¬z ) ∨
( ¬w ∧ ¬x ∧ y ∧ z ) ∨
( ¬w ∧ x ∧ ¬y ∧ ¬z ) ∨
( ¬w ∧ x ∧ ¬y ∧ z ) ∨
( ¬w ∧ x ∧ y ∧ ¬z ) ∨
( ¬w ∧ x ∧ y ∧ z ) ∨
( w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨
( w ∧ ¬x ∧ y ∧ ¬z ) ∨
( w ∧ x ∧ ¬y ∧ ¬z )
x
z
00 01 11 10
00 0 0 1 1
0 1 5 4
01 1 1 1 1
2 3 7 6
y
11 1 0 0 0
10 11 15 14
w
10 1 0 0 1
8 9 13 12
r = ( w ∧ ¬x ∧ ¬z ) ∨
( ¬w ∧ x ) ∨
( x ∧ ¬y ∧ ¬z ) ∨
( ¬w ∧ y )
r = ( ¬w ∧ ¬x ∧ y ∧ ¬z ) ∨
( ¬w ∧ x ∧ ¬y ∧ ¬z ) ∨
( ¬w ∧ x ∧ ¬y ∧ z ) ∨
( ¬w ∧ x ∧ y ∧ z ) ∨
( w ∧ ¬x ∧ ¬y ∧ z ) ∨
( w ∧ ¬x ∧ y ∧ z ) ∨
( w ∧ x ∧ ¬y ∧ ¬z ) ∨
( w ∧ x ∧ ¬y ∧ z ) ∨
( w ∧ x ∧ y ∧ ¬z )
x
z
00 01 11 10
00 0 0 1 1
0 1 5 4
01 1 0 1 0
2 3 7 6
y
11 0 1 0 1
10 11 15 14
w
10 0 1 1 1
8 9 13 12
r = ( ¬w ∧ ¬x ∧ y ∧ ¬z ) ∨
( x ∧ ¬y ) ∨
( w ∧ x ∧ ¬z ) ∨
( w ∧ ¬x ∧ z ) ∨
( ¬w ∧ x ∧ z )
r = ( ¬w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨
( ¬w ∧ ¬x ∧ y ∧ ¬z ) ∨
( ¬w ∧ ¬x ∧ y ∧ z ) ∨
( ¬w ∧ x ∧ ¬y ∧ ¬z ) ∨
( ¬w ∧ x ∧ y ∧ ¬z ) ∨
( ¬w ∧ x ∧ y ∧ z ) ∨
( w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨
( w ∧ ¬x ∧ ¬y ∧ z ) ∨
( w ∧ ¬x ∧ y ∧ ¬z ) ∨
( w ∧ ¬x ∧ y ∧ z ) ∨
( w ∧ x ∧ ¬y ∧ ¬z ) ∨
( w ∧ x ∧ ¬y ∧ z )
x
z
00 01 11 10
00 1 0 0 1
0 1 5 4
01 1 1 1 1
2 3 7 6
y
11 1 1 0 0
10 11 15 14
w
10 1 1 1 1
8 9 13 12
r = ( ¬w ∧ ¬z ) ∨
( w ∧ ¬x ) ∨
( ¬w ∧ y ) ∨
( w ∧ ¬y )
r = ( ¬w ∧ ¬x ∧ ¬y ∧ z ) ∨
( ¬w ∧ x ∧ y ∧ ¬z ) ∨
( w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨
( w ∧ ¬x ∧ ¬y ∧ z ) ∨
( w ∧ ¬x ∧ y ∧ z ) ∨
( w ∧ x ∧ y ∧ z )
x
z
00 01 11 10
00 0 1 0 0
0 1 5 4
01 0 0 0 1
2 3 7 6
y
11 0 1 1 0
10 11 15 14
w
10 1 1 0 0
8 9 13 12
r = ( w ∧ ¬x ∧ ¬y ) ∨
( ¬x ∧ ¬y ∧ z ) ∨
( w ∧ y ∧ z ) ∨
( ¬w ∧ x ∧ y ∧ ¬z )
r = ( w ∧ ¬x ∧ y ∧ ¬z ) ∨
( w ∧ x ∧ ¬y ∧ z ) ∨
( w ∧ x ∧ y ∧ ¬z )
x
z
00 01 11 10
00 0 ? ? 0
0 1 5 4
01 ? 0 ? ?
2 3 7 6
y
11 1 ? ? 1
10 11 15 14
w
10 0 0 1 0
8 9 13 12
r = ( w ∧ y ) ∨
( x ∧ z )
r = ( ¬w ∧ ¬x ∧ y ∧ z ) ∨
( ¬w ∧ x ∧ ¬y ∧ ¬z ) ∨
( ¬w ∧ x ∧ ¬y ∧ z ) ∨
( w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨
( w ∧ ¬x ∧ ¬y ∧ z ) ∨
( w ∧ ¬x ∧ y ∧ ¬z ) ∨
( w ∧ ¬x ∧ y ∧ z )
x
z
00 01 11 10
00 ? ? 1 1
0 1 5 4
01 0 1 0 0
2 3 7 6
y
11 1 1 0 0
10 11 15 14
w
10 1 1 ? 0
8 9 13 12
r = ( ¬x ∧ z ) ∨
( w ∧ ¬x ) ∨
( ¬w ∧ ¬y )
r = ( ¬w ∧ ¬x ∧ y ∧ ¬z ) ∨
( w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨
( w ∧ x ∧ ¬y ∧ ¬z ) ∨
( w ∧ x ∧ y ∧ z )
x
z
00 01 11 10
00 ? ? 0 0
0 1 5 4
01 1 0 0 0
2 3 7 6
y
11 ? 0 1 0
10 11 15 14
w
10 1 0 ? 1
8 9 13 12
r = ( w ∧ x ∧ ¬y ) ∨
( w ∧ x ∧ z ) ∨
( ¬x ∧ ¬z )
r = ( ¬w ∧ ¬x ∧ ¬y ∧ z ) ∨
( ¬w ∧ x ∧ y ∧ z ) ∨
( w ∧ ¬x ∧ y ∧ ¬z ) ∨
( w ∧ ¬x ∧ y ∧ z ) ∨
( w ∧ x ∧ ¬y ∧ ¬z ) ∨
( w ∧ x ∧ ¬y ∧ z )
x
z
00 01 11 10
00 0 1 ? 0
0 1 5 4
01 0 0 1 0
2 3 7 6
y
11 1 1 ? 0
10 11 15 14
w
10 0 0 1 1
8 9 13 12
r = ( w ∧ x ∧ ¬y ) ∨
( ¬w ∧ ¬y ∧ z ) ∨
( w ∧ ¬x ∧ y ) ∨
( x ∧ z )
r = ( ¬w ∧ x ∧ ¬y ∧ z ) ∨
( ¬w ∧ x ∧ y ∧ ¬z ) ∨
( w ∧ ¬x ∧ ¬y ∧ z ) ∨
( w ∧ x ∧ ¬y ∧ z )
         xz=00  xz=01  xz=11  xz=10
wy=00      0      0      1      0
wy=01      0      ?      ?      1
wy=11      0      ?      0      ?
wy=10      ?      1      1      ?
r = ( x ∧ y ∧ ¬z ) ∨
( w ∧ ¬y ) ∨
( ¬w ∧ x ∧ z )
r = ( ¬w ∧ x ∧ ¬y ∧ ¬z ) ∨
( w ∧ ¬x ∧ y ∧ z ) ∨
( w ∧ x ∧ ¬y ∧ ¬z )
         xz=00  xz=01  xz=11  xz=10
wy=00      0      ?      0      1
wy=01      0      ?      0      0
wy=11      ?      1      0      0
wy=10      ?      ?      ?      1
         xz=00  xz=01  xz=11  xz=10
wy=00      0      0      0      0
wy=01      1      ?      1      ?
wy=11      0      0      1      1
wy=10      ?      0      0      ?
         xz=00  xz=01  xz=11  xz=10
wy=00      ?      1      ?      1
wy=01      ?      0      ?      1
wy=11      ?      ?      1      0
wy=10      ?      1      ?      ?
r = ( x ∧ z ) ∨
( ¬w ∧ x ) ∨
( ¬y )
r = ( ¬w ∧ ¬x ∧ y ∧ z ) ∨
( ¬w ∧ x ∧ y ∧ ¬z ) ∨
( ¬w ∧ x ∧ y ∧ z ) ∨
( w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨
( w ∧ x ∧ ¬y ∧ ¬z ) ∨
( w ∧ x ∧ y ∧ z )
         xz=00  xz=01  xz=11  xz=10
wy=00      ?      0      ?      ?
wy=01      ?      1      1      1
wy=11      ?      ?      1      ?
wy=10      1      ?      ?      1
r = ( y ) ∨
( w )
r = ( ¬w ∧ ¬x ∧ y ∧ ¬z ) ∨
( ¬w ∧ x ∧ ¬y ∧ ¬z ) ∨
( ¬w ∧ x ∧ ¬y ∧ z ) ∨
( ¬w ∧ x ∧ y ∧ z )
         xz=00  xz=01  xz=11  xz=10
wy=00      ?      0      1      1
wy=01      1      0      1      0
wy=11      ?      0      ?      0
wy=10      ?      0      ?      ?
r = ( x ∧ z ) ∨
( x ∧ ¬y ) ∨
( ¬x ∧ ¬z )
B.3 Chapter 3
S138. First, notice that the function does not use y at all, so the function cannot add x to y. All other options are
plausible: to assess which is correct, one could take a brute-force approach and execute it via
#include <stdio.h>

int main( int argc, char* argv[] ) {
  for( int i = 0; i < 256; i++ ) {
    printf( "%3d %3d\n", i, f( i ) );
  }

  return 0;
}
Doing so shows that the function increments x. But why is this? Consider some specific examples:
• For x = 14(10) = 00001110(2) , the loop performs 0 iterations then terminates: the value
x | m = 00001110(2) ∨ 00000001(2)
= 00001111(2)
= 15(10)
i.e., x + 1 is returned.
More generally, the idea is that, as a result of initialising m to 1 and then left-shifting it in each iteration, the
while loop iterates over each of the least-significant bits of x that equal 1; while doing so, x is assigned the value
x & ~m, which clears the current bit. Once the loop terminates, the value x | m is returned: this sets the current
bit of x, which we know is 0 due to the loop condition, to 1.
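The question's listing of f is not reproduced above, but a sketch consistent with the behaviour just described (the 8-bit unsigned type is an assumption for illustration) is:

```c
#include <stdint.h>

// A sketch of an increment-by-one function with the behaviour described
// above; the name f and the uint8_t type are assumptions.
uint8_t f( uint8_t x ) {
  uint8_t m = 1;

  while( x & m ) {   // while the current bit of x is 1 ...
    x = x & ~m;      // ... clear it ...
    m = m << 1;      // ... and move on to the next bit
  }

  return x | m;      // set the first 0 bit, yielding x + 1
}
```

Note that for x = 255 the shift of m wraps to 0 in 8 bits, so f( 255 ) = 0, i.e., the increment itself wraps around as expected.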
• This option is arguably trickier, in the sense that there is only one term; it is therefore harder to see how it
captures the two cases. To see why it does, you could apply roughly the opposite reasoning to the above. If
every bit of x is 0, then x = 0 and so x+1 = 1; ( x + 1 ) < 2 evaluates to non-zero in this case. If every bit
of x is 1, then x = −1 ≡ 2^32 − 1 (mod 2^32) and so x+1 = 0; ( x + 1 ) < 2 evaluates to non-zero in this case. Given x is
unsigned, any other value it could take implies 1 ≤ x ≤ 2^32 − 2 and so 2 ≤ x + 1 ≤ 2^32 − 1;
in such cases we conclude that ( x + 1 ) < 2 evaluates to zero.
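The case analysis can be confirmed directly; a minimal sketch (the function name is ours):

```c
#include <stdint.h>

// Returns non-zero exactly when every bit of the 32-bit value x is 0 or
// every bit of x is 1, per the case analysis above.
uint32_t all_zeros_or_ones( uint32_t x ) {
  return ( x + 1 ) < 2;   // wraps to 0 when x is all-ones
}
```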
S140. Although one could select the correct option by inspection, the easiest approach is simply to work through each
one. Doing that via a complete trace (e.g., of each intermediate computation, by each full-adder instance) is
overly verbose, so in the below, we capture the pertinent details only (noting the sequence representing the
carry-chain is read left-to-right; this matches the ripple-carry diagram, but might seem odd wrt. the right-to-left
order of digits in the literals):
x = 0000(2) = 0(10)
y = 0000(2) = 0(10)
r = 0000(2) = 0(10)
c = ⟨0, 0, 0, 0, 0⟩
c2 = 0
x = 1100(2) = 12(10)
y = 0001(2) = 1(10)
r = 1101(2) = 13(10)
c = ⟨0, 0, 0, 0, 0⟩
c2 = 0
x = 0100(2) = 4(10)
y = 0100(2) = 4(10)
r = 1000(2) = 8(10)
c = ⟨0, 0, 0, 1, 0⟩
c2 = 0
x = 1011(2) = 11(10)
y = 1001(2) = 9(10)
r = 0100(2) = 4(10)
c = ⟨0, 1, 1, 0, 1⟩
c2 = 1
x = 0110(2) = 6(10)
y = 0101(2) = 5(10)
r = 1011(2) = 11(10)
c = ⟨0, 0, 0, 1, 0⟩
c2 = 0
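The traces above can be reproduced by simulating the ripple-carry structure directly; a minimal sketch:

```c
#include <stdint.h>

// Simulate a 4-bit ripple-carry adder, recording the carry chain
// c = <c_0, ..., c_4>, where c_0 is the carry-in and c_4 the carry-out.
void ripple_add( uint8_t x, uint8_t y, uint8_t* r, uint8_t c[ 5 ] ) {
  *r = 0; c[ 0 ] = 0;

  for( int i = 0; i < 4; i++ ) {
    uint8_t x_i = ( x >> i ) & 1, y_i = ( y >> i ) & 1;

    *r |= ( uint8_t )( ( x_i ^ y_i ^ c[ i ] ) << i );                 // sum bit
    c[ i + 1 ] = ( x_i & y_i ) | ( x_i & c[ i ] ) | ( y_i & c[ i ] ); // carry-out
  }
}
```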
S141. In general, an overflow condition occurs when the correct result of some arithmetic operation cannot be
represented (meaning the result is then incorrect). Within the context outlined by the question, two instances of
this can occur: either 1) x is positive and y is positive, but r is negative, or 2) x is negative and y is negative, but
r is positive. In both instances, the sign of r is incorrect because the correct value of r has too large a magnitude
to represent in 8 bits.
In two’s-complement the MSB indicates the sign, meaning that x7 , y7 , and r7 indicate whether x, y, and r are
positive or negative respectively. Using this information we can translate each condition above into a Boolean
expression, i.e.,
¬x7 ∧ ¬y7 ∧ r7
indicates that x is positive, y is positive, and r is negative, while
x7 ∧ y7 ∧ ¬r7
indicates that x is negative, y is negative, and r is positive. As such, simply OR’ing these expressions together
produces the flag we require, i.e.,
( ¬x7 ∧ ¬y7 ∧ r7 ) ∨ ( x7 ∧ y7 ∧ ¬r7 )
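Expressed in C, this sign-bit check can be sketched as follows (the function name is ours):

```c
#include <stdint.h>

// Overflow flag for the 8-bit two's-complement addition r = x + y,
// computed from the sign bits x7, y7, and r7 alone.
uint8_t overflow( uint8_t x, uint8_t y, uint8_t r ) {
  uint8_t x7 = ( x >> 7 ) & 1, y7 = ( y >> 7 ) & 1, r7 = ( r >> 7 ) & 1;

  // positive + positive -> negative, or negative + negative -> positive
  return ( ( ~x7 & ~y7 & r7 ) | ( x7 & y7 & ~r7 ) ) & 1;
}
```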
S142. O(n) implies the critical path is proportional to the number of bits (including some constant factor) required
to represent each of the operands. The reason is the carry chain which runs through all n full-adders in the
design: each i-th full-adder produces a carry-out used as a carry-in to the (i + 1)-th full-adder. This means each
i-th bit of the result depends on, and cannot be computed before, all j-th bits for 0 ≤ j < i.
An alternative, carry look-ahead design separates computation of carries from the full-adder cells them-
selves; this allows an organisation whose critical path can be described as O(log n), although the number of
logic gates required is less attractive.
( x & ( x - 1 ) ) == 0
This works because if x is an exact power-of-two then x − 1 sets all bits less-significant than the n-th to one; when
this is AND’ed with x (which only has the n-th bit set to one) the result is zero. If x is not an exact power-of-two
then there will be bits in x other than the n-th set to one; in this case x − 1 only alters bits up to and including the
least-significant set bit, so more-significant set bits are left intact and, when AND’ed with x, yield a non-zero result.
Note that the expression fails for x = 0, i.e., it is zero and so incorrectly classifies 0 as a power-of-two, but this is
allowed since the question says x ≠ 0.
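Packaged as a function, and illustrating the x = 0 caveat directly:

```c
#include <stdint.h>

// The check above as a predicate; note it (incorrectly) reports x = 0 as
// a power-of-two, which the question rules out via x != 0.
int is_pow2( uint32_t x ) {
  return ( x & ( x - 1 ) ) == 0;
}
```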
S144. x is of type char, so is therefore represented using two’s-complement in 8 bits; values for such a representation
range between 2^(n−1) − 1 = 2^(8−1) − 1 = 127 and −2^(n−1) = −2^(8−1) = −128 inclusive. This means that by
incrementing x we get the value after 127, which is −128: the reason for this is that the representation of
127 is 01111111(2) , but the next value 10000000(2) is the largest negative value possible. That is, there has
been an overflow with the result “wrapping around”.
S145. The expression computes the comparison 0 < x. This is because if x < 0 then x3 = 1, and if x = 0 then
x3 = x2 = x1 = x0 = 0. Therefore, x > 0 if both x3 = 0 and xi ≠ 0 for some i ∈ {2, 1, 0}. Strictly speaking, it tests
whether 0 < x ≤ 7 but the upper bound is implied by the representation of x: it cannot take a value greater
than 7 by definition.
S146.
The initial temptation is to use six adder components to compute
r = 7 · x = x + x + x + x + x + x + x
where the size of inputs and outputs increases as one progresses through the computation; a considered
approach might utilise carry save adders to reduce the critical path associated with the multiple summands,
but here we consider ripple-carry designs only.
A more efficient alternative would use three adders to compute
r = 7 · x = 4 · x + 2 · x + 1 · x = 2^2 · x + 2^1 · x + 2^0 · x
noting that the multiplications by powers-of-two are “free” since they can be achieved by simply relabelling
bits rather than computation. This approach can be further refined to compute
r = 7 · x = 8 · x − 1 · x = 2^3 · x − 2^0 · x
using just one adder (assuming addition and subtraction can be realised using the same component). Clearly
this will produce the shortest critical path, and relates to the following diagram:
              +------------------+               +-----+
x -- n-bit -->| 3-bit left-shift |-- (n+3)-bit ->|     |
              +------------------+               | sub |-- (n+4)-bit --> r
x -- n-bit ------------------------------------->|     |
                                                 +-----+
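The shift-and-subtract identity is easy to check in software; here with a 32-bit x for illustration (the function name is ours):

```c
#include <stdint.h>

// 7 * x computed via the single-subtractor approach: 7x = 8x - x.
uint32_t mul7( uint32_t x ) {
  return ( x << 3 ) - x;
}
```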
x1 x0 y1 y0 r3 r2 r1 r0
0 0 0 0 0 0 0 0
0 0 0 1 0 0 0 0
0 0 1 0 0 0 0 0
0 0 1 1 0 0 0 0
0 1 0 0 0 0 0 0
0 1 0 1 0 0 0 1
0 1 1 0 0 0 1 0
0 1 1 1 0 0 1 1
1 0 0 0 0 0 0 0
1 0 0 1 0 0 1 0
1 0 1 0 0 1 0 0
1 0 1 1 0 1 1 0
1 1 0 0 0 0 0 0
1 1 0 1 0 0 1 1
1 1 1 0 0 1 1 0
1 1 1 1 1 0 0 1
Using four Karnaugh maps to produce each ri is overkill, since we can easily derive expressions for r0 and r3
by inspection. Therefore, transcribing the truth table into suitable Karnaugh maps for just r1 and r2 gives
r1:      x1x0=00  x1x0=01  x1x0=11  x1x0=10
y1y0=00     0        0        0        0
y1y0=01     0        0        1        1
y1y0=11     0        1        0        1
y1y0=10     0        1        1        0

r2:      x1x0=00  x1x0=01  x1x0=11  x1x0=10
y1y0=00     0        0        0        0
y1y0=01     0        0        0        0
y1y0=11     0        0        0        1
y1y0=10     0        0        1        1
r0 = ( x0 ∧ y0 )
r1 = ( x1 ∧ ¬ y1 ∧ y0 ) ∨
( x0 ∧ y1 ∧ ¬ y0 ) ∨
( ¬ x1 ∧ x0 ∧ y1 ) ∨
( x1 ∧ ¬ x0 ∧ y0 )
r2 = ( x1 ∧ ¬ x0 ∧ y1 ) ∨
( x1 ∧ y1 ∧ ¬ y0 )
r3 = ( x1 ∧ x0 ∧ y1 ∧ y0 )
S148. a Clearly we can implement ≠ by negating the result of =, and likewise for < and ≥, and > and ≤.
Furthermore, we can build ≥ from > and =, and ≤ from < and =. So essentially we only need two
comparisons, say = and < to be able to compute the rest so long as we have the logic operations as well.
The choice of which three is simply a matter of which ones you want to go faster: the ones built from a
combination of other comparison and logic instructions will take longer to execute. One might take the
approach of looking at C programs and selecting the set most used. For example = and < are used a lot
to program typical loops; one might select them for this reason.
b You can be as fancy as you want with any optimisations or special cases, for example checking for
multiplication by zero, one or a power-of-two might be a good idea. But basically, the easiest way to do
this is as follows:
int H( uint16_t x ) {
  int t = 0;

  for( int i = 0; i < 16; i++ ) {
    t += ( x >> i ) & 1;
  }

  return t;
}
but this has a number of drawbacks. First, the overhead of operating the loop is quite high in comparison
to the content; for example the loop body needs only a few instructions, while it takes nearly as many
again to test and increment i during each iteration. Second, the number of branches in the code means
that pipelined processors might not execute them efficiently at all. An improvement is to use some form
of divide-and-conquer approach where we split the problem into 2-bit then 4-bit chunks and so on. The
result might look like:
int H( uint16_t x ) {
  x = ( x & 0x5555 ) + ( ( x >> 1 ) & 0x5555 );
  x = ( x & 0x3333 ) + ( ( x >> 2 ) & 0x3333 );
  x = ( x & 0x0F0F ) + ( ( x >> 4 ) & 0x0F0F );
  x = ( x & 0x00FF ) + ( ( x >> 8 ) & 0x00FF );
  return ( int )( x );
}
S149. First, note that the result via a naive method would be
r2 = x1 · y1
r1 = x1 · y0 + x0 · y1
r0 = x0 · y0 .
However, we can write down three intermediate values using only three multiplications as
t2 = x1 · y1
t1 = (x0 + x1 ) · (y0 + y1 )
t0 = x0 · y0 .
The original result can then be expressed in terms of these intermediate values via
r2 = t2 = x1 · y1
r1 = t1 − t0 − t2 = x0 · y0 + x0 · y1 + x1 · y0 + x1 · y1 − x0 · y0 − x1 · y1
= x1 · y0 + x0 · y1
r0 = t0 = x0 · y0 .
So roughly speaking, overall we use three (n/2)-bit multiplications and four (n/2)-bit additions.
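The identity r1 = t1 − t0 − t2 can be checked numerically; a small sketch, using 64-bit arithmetic to avoid overflow (the function name is ours):

```c
#include <stdint.h>

// Compute the cross term x1*y0 + x0*y1 using only the three products
// t0, t1, t2 defined above.
uint64_t cross_term( uint64_t x0, uint64_t x1, uint64_t y0, uint64_t y1 ) {
  uint64_t t2 = x1 * y1;
  uint64_t t1 = ( x0 + x1 ) * ( y0 + y1 );
  uint64_t t0 = x0 * y0;

  return t1 - t0 - t2;   // equals x1*y0 + x0*y1
}
```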
10(10) = 1010(2)
12(10) = 1100(2) +
10110(2)
where 10110(2) = 22(10) . In 4 bits this value is 0110(2) = 6(10) however, which is wrong.
Essentially the idea is that if a carry-out occurs from the most-significant adder, this turns all the output
bits to 1 via the additional OR gates. That is, if the carry-out occurs then we get 1111(2) = 15(10) as the
result, i.e., the largest 4-bit result possible.
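The saturating behaviour can be modelled in C, with a conditional mask playing the role of the OR gates; a sketch, assuming 4-bit inputs:

```c
#include <stdint.h>

// 4-bit saturating addition: if the carry-out of the most-significant
// adder occurs, force every result bit to 1, i.e., produce 15.
uint8_t sat_add4( uint8_t x, uint8_t y ) {
  uint8_t t    = ( uint8_t )( x + y ) & 0x1F;  // 5-bit raw sum
  uint8_t c    = ( t >> 4 ) & 1;               // carry-out of the MSB adder
  uint8_t mask = c ? 0x0F : 0x00;              // models the extra OR gates

  return ( t | mask ) & 0x0F;
}
```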
S151. Since we know nothing about N, there is no obvious short-cut to performing the modular reduction after the
multiplication. Instead, the most simple way to approach the design is to recall that
x · y = x + x + · · · + x + x,
i.e., the sum of y copies of x.
So to compute x · y (mod N), we just have to make sure that each of the additions is modulo N; then we can
use whatever method we want. A circuit for modular addition is actually quite simple:
+-----+ +-----+
x ->| |---+------------->| |
| add | | | sub |---> r
y ->| | | +-->| |
+-----+ | | +-----+
v |
+-----+ +-----+
| | | |
N ->| lth |--->| mux |
| | | |
+-----+ +-----+
^ ^
| |
N 0
In short, we add x and y together, and then compare the result t with N: if t is smaller, we select 0 as the output
from the multiplexer otherwise we select N. Then, we subtract the value we selected from t. The end result is
that we get x + y − 0 = x + y (mod N) if x + y < N, and x + y − N = x + y (mod N) if x + y ≥ N.
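The circuit translates directly into C; a minimal sketch, assuming x, y < N (the function name is ours):

```c
#include <stdint.h>

// Modular addition per the circuit above: add, compare the sum t with N,
// then subtract either 0 or N depending on the comparison.
uint32_t mod_add( uint32_t x, uint32_t y, uint32_t N ) {
  uint32_t t = x + y;

  return t - ( ( t < N ) ? 0 : N );   // the mux selects 0 or N
}
```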
Recall that an 8-bit, bit-serial multiplier would compute the product x · y as follows:
which then simply demands eight iterations, under control of a clock, over the circuit
Notice that we first perform the operation t + t (mod N), then use a multiplexer to decide if we take t + t
(mod N) or t + t + x (mod N) as the next value of t. So each iterated use of the circuit represents an iteration
of the algorithm loop. Of course, one could construct a combinatorial multiplier using the same approach, i.e.,
replacing any standard adder circuits with modular alternatives.
B.4 Chapter 4
S152. We can deal with the statements one-by-one:
• DRAM cells are based on use of a capacitor, so only use one transistor (to access the capacitor), while an
SRAM cell uses only transistors: a 6T SRAM cell design requires six for example, so certainly more than
one.
• Both SRAM and DRAM cells store one bit of information: larger memories are constructed by replicating
the cells, but it is not true that one or other cell can store more information.
• Their transistor-based design makes the access latency of SRAM cells low, i.e., they can be accessed
quickly. In contrast, the need to (dis)charge the capacitor limits a DRAM cell in this respect; it is a fair
assumption that DRAM cell access latency will be greater as a result.
• Their need to retain charge in the capacitor means DRAM cells need to be refreshed, since over time that
charge will naturally leak (st. the stored value will “degrade” in some sense).
S153. Various clues should (in combination) be strong enough to hint that
• α is provided input from Ai (the address pins), and is controlled (indirectly) by RAS: this is the row
address strobe. As such, this is likely to be the row address buffer.
• β is provided input from Ai (the address pins), and is controlled (indirectly) by CAS: this is the column
address strobe. As such, this is likely to be the column address buffer.
• γ is taking the content of α and controlling signals on the left-hand side (horizontal orientation) of the
memory array, suggesting it computes the row address: it is likely to be the row address decoder.
• δ is taking the content of β and controlling signals on the top side (vertical orientation) of the memory
array, suggesting it computes the column address: it is likely to be the column address decoder.
S154. SRAMs have a lower access latency in part because of their design: using only transistors means their
operation is very fast. Therefore, the first statement is true. On the other hand, SRAMs are larger than
DRAMs since their design includes more components (typically six or so transistors versus one transistor and a
capacitor); as a result, their density (i.e., how many one can fit into unit area) is lower, and the second statement
is true as well. The third statement is false, and basically nonsense: the access latency should not depend on
the order. Finally, the fourth statement is also false. Rather, a stored program or von Neumann architecture
holds both instructions and data in the same memory: a Harvard architecture segregates them into separate
memories.
S155. There are 2^16 addressable bytes, meaning a 16-bit address needs to be supplied. However, in contrast to an
SRAM memory, a DRAM memory will normally use a 2-step (or more, potentially) approach: half the address
is supplied by each of the steps (under control of row and column address strobe signals), which requires only
half the number of address pins.
The memory stores bytes, i.e., 8-bit elements, so we expect there to be 8 duplicated arrays each consisting of
65536 cells. Overall, there will be 8 · 65536 = 524288 cells. So, in summary, an answer of 8-bit address pins, and
524288 cells is correct; the alternative of 16-bit address pins, and 524288 cells is not wrong per se, but certainly
less likely in practice.
S156. We want a 32KiB memory, i.e., 32 · 1024 = 32768 addressable words each of 8 bits (or 1 byte). The memory devices
we have use a 4-bit data bus and 12-bit address bus: this implies that each one has 2^12 = 4096 addressable
words each of 4 bits. Two such devices could be combined to support an 8-bit word size: we simply take the
LSBs of each byte from one device, and the MSBs from the other. Therefore, to construct the memory required
we need
( 8 / 4 ) · ( 32768 / 4096 ) = 2 · 8 = 16
devices.
S157. a A 1kB, byte-addressable SRAM would usually require n = 10 address wires. The n-bit A means addresses
between 0 and 2^n − 1 = 2^10 − 1 = 1023 are accessible.
Consider a small(er) example of an SRAM where n = 3: the addresses
A2 A1 A0 A
0 0 0 000(2) ≡ 0(10)
0 0 1 001(2) ≡ 1(10)
0 1 0 010(2) ≡ 2(10)
0 1 1 011(2) ≡ 3(10)
1 0 0 100(2) ≡ 4(10)
1 0 1 101(2) ≡ 5(10)
1 1 0 110(2) ≡ 6(10)
1 1 1 111(2) ≡ 7(10)
are accessible. Now imagine the m-th address wire is misconnected where m = 1, meaning A1 = 0: this
yields
A2 A1 A0 A
0 0 0 000(2) ≡ 0(10)
0 0 1 001(2) ≡ 1(10)
0 0 0 000(2) ≡ 0(10)
0 0 1 001(2) ≡ 1(10)
1 0 0 100(2) ≡ 4(10)
1 0 1 101(2) ≡ 5(10)
1 0 0 100(2) ≡ 4(10)
1 0 1 101(2) ≡ 5(10)
so now only addresses 0(10) , 1(10) , 4(10) , and 5(10) are accessible. Put another way, 1/2 of the originally
accessible addresses will remain accessible. The same fact applies for any m, so for n = 10 we conclude
that 1024/2 = 512 addresses are accessible.
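The same conclusion follows from a brute-force enumeration; a sketch:

```c
// Count the distinct cell addresses that remain accessible when address
// wire m is stuck at 0 in an n-wire SRAM: any requested address a maps
// to a & ~(1 << m), so exactly the addresses with bit m clear survive.
int reachable( int n, int m ) {
  int count = 0;

  for( int a = 0; a < ( 1 << n ); a++ ) {
    if( ( a & ( 1 << m ) ) == 0 ) {   // address a is still reachable
      count++;
    }
  }

  return count;
}
```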
b A 1kB, byte-addressable DRAM would usually require n = 5 address wires. The n-bit A means addresses
between 0 and 2^(2·n) − 1 = 2^(2·5) − 1 = 2^10 − 1 = 1023 are accessible, because the address is communicated via
A in two steps: each communicates n bits, so producing a (2 · n)-bit address overall.
Consider a small(er) example of a DRAM where n = 2: if A_i^j denotes the i-th address wire as used in the
j-th step, the addresses
A11 A10 A01 A00 A
0 0 0 0 0000(2) ≡ 0(10)
0 0 0 1 0001(2) ≡ 1(10)
0 0 1 0 0010(2) ≡ 2(10)
0 0 1 1 0011(2) ≡ 3(10)
0 1 0 0 0100(2) ≡ 4(10)
0 1 0 1 0101(2) ≡ 5(10)
0 1 1 0 0110(2) ≡ 6(10)
0 1 1 1 0111(2) ≡ 7(10)
1 0 0 0 1000(2) ≡ 8(10)
1 0 0 1 1001(2) ≡ 9(10)
1 0 1 0 1010(2) ≡ 10(10)
1 0 1 1 1011(2) ≡ 11(10)
1 1 0 0 1100(2) ≡ 12(10)
1 1 0 1 1101(2) ≡ 13(10)
1 1 1 0 1110(2) ≡ 14(10)
1 1 1 1 1111(2) ≡ 15(10)
are accessible. Now imagine the m-th address wire is misconnected where m = 1, meaning A_1^j = 0 in both steps: this
yields
A11 A10 A01 A00 A
0 0 0 0 0000(2) ≡ 0(10)
0 0 0 1 0001(2) ≡ 1(10)
0 0 0 0 0000(2) ≡ 0(10)
0 0 0 1 0001(2) ≡ 1(10)
0 1 0 0 0100(2) ≡ 4(10)
0 1 0 1 0101(2) ≡ 5(10)
0 1 0 0 0100(2) ≡ 4(10)
0 1 0 1 0101(2) ≡ 5(10)
0 0 0 0 0000(2) ≡ 0(10)
0 0 0 1 0001(2) ≡ 1(10)
0 0 0 0 0000(2) ≡ 0(10)
0 0 0 1 0001(2) ≡ 1(10)
0 1 0 0 0100(2) ≡ 4(10)
0 1 0 1 0101(2) ≡ 5(10)
0 1 0 0 0100(2) ≡ 4(10)
0 1 0 1 0101(2) ≡ 5(10)
so now only addresses 0(10) , 1(10) , 4(10) , and 5(10) are accessible.
Put another way, 1/4 of the originally accessible addresses will remain accessible. The same fact applies
for any m, so for n = 5 we conclude that 1024/4 = 256 addresses are accessible.
Since
A = 48350(10) = 0BCDE(16) ,
this address therefore maps to memory device MEM1 because
08000(16) ≤ 0BCDE(16) ≤ 0BFFF(16) .
Of course, doing a similar search by hand is very time consuming; a manual solution would therefore use the
form of each eni as a short-cut. For example, we know
en0 = ¬A17 ∧ ¬A16 ∧ ¬A15 ,
i.e., en0 = 1 when the 3 MSBs of A are 0. This fact leads to the range
A ∈ {00000(16) , . . . , 07FFF(16) }
fairly directly, because it captures all 18-bit values whose 3 MSBs are 0, i.e.,
000000000000000000(2) = 00000(16)
000000000000000001(2) = 00001(16)
..
.
000111111111111111(2) = 07FFF(16)
S160. a Two main answers are clear. First, use of the DRAM device could imply a somewhat more involved, 2-step
access algorithm: it is common to latch the row and column buffers (under control of two dedicated row
and column strobes) in two steps, hence allowing half the number of pins to address the same number of
cells. Second, the DRAM cells need to be refreshed periodically since their content will decay. Typically
a mechanism to do this might be built into the device, but if not then the system itself will need to be
responsible for doing so.
b The obvious reason an SRAM device might have a lower access latency is because the individual cells
have a lower access latency: since SRAM cells are constructed from transistors, they can be accessed (read
from or written to) more quickly than a capacitor-based DRAM cell (which takes longer to charge and
discharge).
S161. a As the number of cells grows larger, providing enough address pins to identify each physical cell can
become impractical. One way to combat this problem is to multiplex the address pins; roughly this means
using fewer pins in more steps, e.g., n′/2 pins in two steps rather than n′ pins in one step. Thus, under
control of the row and column strobes, two steps will see the (n′ /2)-bit row and column addresses latched
into the row and column buffer: once latched, the combined content forms a usable n′ -bit address. So in
short, the buffers are required to retain the row and column addresses during this process.
b Once the row and column buffers are latched with the address, the device is ready to access the identified
cell. However, depending on the geometry, i.e., number of rows and columns, a translation needs to
be made: this is the task of the row and column decoders. Essentially they implement the translation
between logical address and physical cell, activating said cell to perform the required operation (which
is either a read or a write).
B.5 Chapter 5
S162. Throughout the following, keep in mind that three main component groups can be identified in the implemen-
tation, read from bottom-to-top.
We know this is an FSM, so we expect the input register to hold the current state and the combinatorial logic to
compute both the transition and output functions. Specifically, the input register and x are provided as input
to combinatorial logic (in the middle-left and -center) that represents the transition function. It computes the
next state, then stored in the output register; the rest of this logic (in the middle-right) clearly represents the
output function, since it computes r.
a i Saying a signal is digital is the same as saying it takes the values 0 and 1 only; for a clock signal, this
is the same as saying its form is a perfect square wave.
In practice this is difficult to achieve since transitions between 0 and 1 cannot be perfectly instanta-
neous. This implies each edge has a slope, however shallow this is. Even so, the same issue is true
for all digital logic components: if the inputs to an AND gate are neither 0 nor 1 (or their associated
voltage levels), the output is undefined. So it is also fair to say this is a requirement for Φ1 and Φ2 ,
at least as far as is practical.
ii Φ1 and Φ2 are said to be non-overlapping in the sense a positive level on one always occurs at the
same time as a negative level on the other.
If this were not true, e.g., Φ1 = Φ2 , then a “loop” would form during the overlap: the output register
would be updated with whatever was computed by the combinatorial logic, which is fed by the
input register also being updated at the same time by the output register. This would likely result in
a malfunction of some sort: the input register could not settle into a stable state, for example.
iii To gate any signal, Φ1 and Φ2 included, means to (conditionally) disable them. This is typically
realised by adding extra logic, e.g., an AND gate, so the clock signal can be forced to 0.
This might be useful; it allows one to disable latch updates and so “pause” the FSM (e.g., to save
power when idle). However, doing so is not a requirement and not evident in this implementation.
iv In general, the concept of skew describes a situation where the clock signal arrives at two components
at different times; with a 2-phase clock, we might also find cases where Φ1 and Φ2 can arrive at the
same component at different times.
This is clearly undesirable, since the clock is meant to synchronise the component. If they were not
synchronised then malfunction is the likely result: one latch might be updated at a different time,
and hence with an unrelated value, than another, for example.
v The duty cycle of a clock (signal) is the percentage of each clock period in which it has a positive
level. For example, saying that Φ1 has a duty cycle of 33% is the same as saying Φ1 = 1 for a third of
the time (and hence Φ1 = 0 for two thirds of the time).
Here, there is no reason the duty cycle of Φ1 and Φ2 must be 33%. If it was 40%, for example, the
implementation would still function correctly. Assuming Φ1 and Φ2 have the same form, then of
course it must be true their duty cycles are less than 50% otherwise they would have to overlap.
Other than that, however, getting closer to 50% just means less separation between their positive
levels.
b This is not a trick question per se, but the correct answer is that the register might hold any 2-bit value
when powered-on. Although the register must settle into some state, it is not clear how we could predict
what this will be: the stored value is basically random, or more precisely determined by physics of the
underlying implementation and fabrication process.
As an aside, this FSM has a related, unattractive design feature: there is no reset input. This means
there is no way to enforce a start state, i.e., whatever value is held in the register at power-on is used as
the start state: the only way to alter this, and hence make the FSM function as required, is to power-cycle
the implementation and hope the initial stored value is the required start state!
c Partly as a result of the multiple-choice format, this question might seem odd. Exactly the same concepts
are involved, but the steps are the opposite way around: rather than derive an implementation from a
given specification, it asks you to reverse engineer a specification from a given implementation.
Each of the registers comprises 2 D-type latches, so the FSM can be in at most 2^2 = 4 possible states.
Denoting the current (resp. next) state Q = ⟨Q0 , Q1 ⟩ (resp. Q′ = ⟨Q′0 , Q′1 ⟩), we can write an expression for
the next state and output in terms of the current state and input: this basically just means translating the
logic gate symbols into a mathematical form. That is, we can write
Q′1 = (Q1 ∧ Q0 ) ∨ (x ∧ Q1 ) ∨ (x ∧ Q0 )
Q′0 = (Q1 ∧ Q0 ) ∨ (x ∧ ¬Q0 )
r = (Q1 ∧ Q0 ) ∨ (x ∧ Q1 )
x Q1 Q0 Q′1 Q′0 r
0 0 0 0 0 0
0 0 1 0 0 0
0 1 0 0 0 0
0 1 1 1 1 1
1 0 0 0 1 0
1 0 1 1 0 0
1 1 0 1 1 1
1 1 1 1 1 1
i.e., we reverse engineer the transition and output functions, expressed as a truth table. For example, if
the current state is Q = ⟨0, 0⟩ and the input is x = 1 then we can see the next state will be Q′ = ⟨1, 0⟩ and
the output will be r = 0.
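The reverse-engineered functions can be simulated directly; a sketch in C (names are ours) which reproduces rows of the truth table:

```c
// State bits Q1, Q0 of the FSM; fsm_step applies the transition and
// output functions derived above, returning the output r.
typedef struct {
  int q1, q0;
} state_t;

int fsm_step( state_t* s, int x ) {
  int q1 = s->q1, q0 = s->q0;

  int r = ( q1 & q0 ) | ( x & q1 );              // output function
  s->q1 = ( q1 & q0 ) | ( x & q1 ) | ( x & q0 ); // next Q1
  s->q0 = ( q1 & q0 ) | ( x & ( q0 ^ 1 ) );      // next Q0 (uses NOT Q0)

  return r;
}
```

Feeding three consecutive x = 1 inputs from state ⟨0, 0⟩ produces r = 1 on the third step, matching the consecutive-ones behaviour discussed later.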
The truth table encodes the same information as a diagrammatic alternative. The only difference is the
use of a concrete representation, rather than an abstract label for each state. We need the former, because
of course the implementation stores and computes Boolean values: it cannot deal with a label such as S0
or S3 unless we give that label a value. So imagine we make such a (reverse) assignment, namely
⟨0, 0⟩ 7→ S0
⟨1, 0⟩ 7 → S1
⟨0, 1⟩ 7 → S2
⟨1, 1⟩ 7 → S3
Now we can say, for example that if the current state is S0 and the input is x = 1 then the next state will
be S1 and the output will be r = 0. This makes drawing the diagrammatic alternative a little easier: if
we draw 4 nodes for the four states, we join them with edges based on rows of the truth table. The end
result, and correct answer, is the corresponding state transition diagram.
d The central difference between Mealy- and Moore-type FSMs stems from how the output function is
defined. In the former, the output is a function of the current state and input; in the latter, only the current
state is relevant. For a set of states S and input and output alphabets Σ and Γ, this means for a Mealy-type
FSM we have
ω:S×Σ→Γ
whereas for a Moore-type FSM we have
ω : S → Γ.
For this FSM, we already know from the previous question that the output is described by
r = (Q1 ∧ Q0 ) ∨ (x ∧ Q1 ).
As such, it should be obvious r = ω(Q, x) is a function of both Q (the current state) and x (the input): this
is therefore a Mealy-type FSM.
e The behaviour of this FSM can be described as repeated iteration over two steps under control of the
clock. That is, it repeatedly does
• Step #1:
– the combinatorial logic compute the next state Q′ = δ(Q, x) and output r = ω(Q, x), and
– the output register latches Q′ .
Note that the critical path of this step is that from the Q output of the input register to the Q
output of the output register.
• Step #2:
– the input register latches Q′ as Q.
Note that the critical path of this step is that from the Q output of the output register to the Q
output of the input register.
f For example, the first step occurs during the period when Φ1 = 1 and the second when Φ2 = 1. Within
every clock period, i.e., within the “time limit” represented by ρ, both steps must be completed. Therefore,
we can say
ρ ≥ (Tlogic + Tlatch ) + (Tlatch )
where Tlatch and Tlogic are the critical paths associated with a D-type latch and the combinatorial logic
respectively.
Note the critical path runs through the middle-left or middle-center of the combinatorial logic: although
the former includes a NOT gate, the latter includes a 3- versus 2-input OR gate. Either way the delay is
50ns, so we can write
ρ ≥ 50 + 60 + 60 ns = 170 ns
then compute the maximum clock frequency as
fmax = 1/ρ
= 1/170ns
≃ 5.9MHz
g From the definition of the transition function (above), it should be clear the FSM will progress from left-
to-right as consecutive inputs x = 1 are encountered. Moreover, encountering an input of x = 0 means
restarting in state S0 , and when eventually the FSM reaches state S3 it stays in that state (whether x = 1
or x = 0). So it basically counts the number of consecutive times x = 1 until that count is 3 (if it is in state
Si , then the count is i). This already provides the correct answer, but is further confirmed by inspecting
the output function: r = 1 when the FSM is in state S3 , i.e., when the count is 3.
S163. a Before t0 , we can see a pulse on rst at the same time as Φ2 = 1; this acts as a reset, storing s (as a result
of the multiplexers) into the top register. Then, at t0 we find that Φ1 = 1: during this period, the design
stores, in the bottom register, the value provided by the top register (which, at that point, is fixed since Φ2 = 0).
As such, at t0 we expect the bottom register to store s and hence r to be the MSB of s, i.e., r = s7 = 1.
b At t1 the design has performed one cycle relative to t0 : the value stored in the bottom register at t0 is
updated by the middle of the design, then stored in the top register, and finally stored back in the bottom
register (ready for the next cycle). The middle of the design is fairly simple. Ignoring the less-significant
end since this does not impact r (yet), it basically just shifts the bits toward the more-significant end. At
t1 , we therefore expect the bottom register to be st. r = s6 = 0.
c This design is a Linear Feedback Shift Register (LFSR); such a design might be used to support a variety of
use-cases, with a common example being the generation of (pseudo-)random bits. As the name suggests,
an LFSR is essentially an n-bit shift register. After initialising (or seeding) the register state with s,
successive updates are performed; each such update a) shifts-out an output bit (wlog. the MSB), which
forms the LFSR output, and b) shifts-in an input bit (wlog. the LSB), which is computed using a linear
function of the state. A set T captures the tap bits, which specify the function of x used to compute the
input bit; given n, T is selected to maximise the period of the LFSR, noting that x = 0 should be disallowed
to avoid trivial behaviour.
Both Fibonacci- and Galois-form LFSR designs are possible; in this case, we have an example of the
former, with n = 8 and T = {3, 4, 5, 7}. Given a state x, the update process, yielding an output bit r and a
next state x′ , can be formalised as

r  = x7
x′ = ( x ≪ 1 ) ∥ ( ⊕i∈T xi )
   = ( x6 ∥ x5 ∥ · · · ∥ x0 ) ∥ ( x3 ⊕ x4 ⊕ x5 ⊕ x7 )
As such, we can use a table to trace the state and output as it is updated:
i   x        x′       r
−   −        A6(16)   −   seed x with s
0   A6(16)   4C(16)   1   generate 0-th output bit
1   4C(16)   99(16)   0   generate 1-st output bit
2   99(16)   33(16)   1   generate 2-nd output bit
3   33(16)   66(16)   0   generate 3-rd output bit
4   66(16)   CD(16)   0   generate 4-th output bit
5   CD(16)   9A(16)   1   generate 5-th output bit
6   9A(16)   35(16)   1   generate 6-th output bit
7   35(16)   6A(16)   0   generate 7-th output bit
8   6A(16)   D4(16)   0   generate 8-th output bit
⋮   ⋮        ⋮        ⋮
Using this table, we can infer that at time t2 (where the 8-th output bit is generated, this being the first bit
computed from the updated state x rather than read directly from the seed s), r = 0.
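The update and trace above can be checked with a short C sketch; here lfsr_step and lfsr_output are our own (illustrative) names, with the taps T = {3, 4, 5, 7} hard-wired:

```c
#include <assert.h>
#include <stdint.h>

/* One Fibonacci LFSR update for n = 8 and T = {3,4,5,7}: shift the state
   toward the more-significant end, with the feedback bit x3 ^ x4 ^ x5 ^ x7
   shifted in as the new LSB.                                              */
static uint8_t lfsr_step( uint8_t x ) {
  uint8_t fb = ( ( x >> 3 ) ^ ( x >> 4 ) ^ ( x >> 5 ) ^ ( x >> 7 ) ) & 1;
  return ( uint8_t )( ( x << 1 ) | fb );
}

/* The i-th output bit, i.e., the MSB of the state once i updates have been
   applied to the seed s.                                                  */
static int lfsr_output( uint8_t s, int i ) {
  uint8_t x = s;
  for( int j = 0; j < i; j++ ) {
    x = lfsr_step( x );
  }
  return ( x >> 7 ) & 1;
}
```

For example, lfsr_step( 0xA6 ) reproduces the first row of the trace, and lfsr_output( 0xA6, 8 ) reproduces the answer for t2.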
d Within the clock period (i.e., within the “time limit” which ρ dictates), two steps must be completed; those
steps are completed when Φ1 = 1 and Φ2 = 1 respectively, and can be described as 1) the top register must
be updated with a value computed by the middle of the design (i.e., the combinatorial logic) from the
value in the bottom register, then 2) the bottom register must be updated with the value in the top register.
So if Tlatch and Tlogic are the critical paths associated with a D-type latch and said combinatorial logic
respectively, then we can write
ρ ≥ (Tlogic + Tlatch ) + (Tlatch ).
Adding more detail, we could then reflect the critical path of components constituting the combinatorial
logic: writing
Tlogic = Txor + Txor + Tmux
then reflects the fact that the critical path includes two XOR gates and one multiplexer. Overall then, we
have
ρ ≥ (Txor + Txor + Tmux + Tlatch ) + (Tlatch )
  = 2 · Tlatch + 2 · Txor + Tmux
Since we have the design of each component, we can, as a next step, be more concrete about each term
above: inspecting the NAND based designs, we can deduce
Tlatch = 4 · Tnand = 40ns
Txor = 3 · Tnand = 30ns
Tmux = 3 · Tnand = 30ns
and thus
ρ ≥ 2 · Tlatch + 2 · Txor + Tmux
  = 2 · 40ns + 2 · 30ns + 30ns
  = 80ns + 60ns + 30ns
  = 170ns
Tlatch arguably represents the more tricky case, noting that the cross-coupled right-hand side means the
path is through 4 NAND gates. Finally, the maximum clock frequency is the reciprocal of this minimum
clock period, so we find

fmax = 1/ρ = 1/170ns ≃ 5.9MHz.
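The arithmetic can be summarised by a trivial C sketch (delays in ns, with Tnand = 10ns per the NAND-based designs; the names are illustrative):

```c
#include <assert.h>

/* Component critical paths, in ns, derived from the NAND-based designs:
   a latch is 4 NAND gates deep, an XOR and a multiplexer 3 gates each. */
enum { T_NAND  = 10                };
enum { T_LATCH = 4 * T_NAND,
       T_XOR   = 3 * T_NAND,
       T_MUX   = 3 * T_NAND       };

/* Minimum clock period: (logic + latch) in one phase, (latch) in the other. */
static int rho_ns( void ) {
  return ( T_XOR + T_XOR + T_MUX + T_LATCH ) + T_LATCH;
}

/* Maximum clock frequency in Hz, i.e., the reciprocal of rho. */
static long fmax_hz( void ) {
  return 1000000000L / rho_ns( );
}
```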
S164. Basically this question is asking us to reverse engineer the FSM implementation into a design and hence
functionality; to do that, we can step backwards through the process that would normally step forwards.
The first step is therefore to inspect the implementation and extract pertinent features: 1) the bottom and
top D-type latches capture 1-bit current and next states, i.e., Q and Q′ , respectively, 2) between the two we can
identify an output function r = ω(Q) = ¬Q and a transition function Q′ = δ(Q, rst, Xi ) = (¬rst) ∧ (¬Xi ∨ Q). Note
that we can classify this as a Moore-type FSM, since the output r is determined by the current state Q alone.
The next step is to reconstruct a concrete, tabular description of the FSM, i.e., a truth table, using ω and δ:

                     δ    ω
rst  Xi  Q   |   Q′    r
 0   0   0   |   1     1
 0   0   1   |   1     0
 0   1   0   |   0     1
 0   1   1   |   1     0
 1   0   0   |   0     1
 1   0   1   |   0     0
 1   1   0   |   0     1
 1   1   1   |   0     0
Because Q and Q′ are each represented by a single D-type latch, we can infer the FSM has (at most) two states.
Other assignments are possible provided we are consistent, but the most natural would be to say Q = 0 ↦ S0
and Q = 1 ↦ S1 . Given that rst = 1 forces Q′ = 0, we can infer that S0 is an initial state; given that r = ¬Q and
so r = 1 iff. Q = 0, we can infer that S0 is an accepting state.
The next step is to reconstruct an abstract, diagrammatic description of the FSM:

               Xi = 1               Xi = 0, Xi = 1
                +--+                     +--+
                |  v       Xi = 0        |  v
  start ----> ( S0 ) --------------->  ( S1 )
The final step demands some creativity, in the sense that we need to interpret the functionality realised: although
doing so is not trivial, we can approach it by trying to explain in words what the FSM does step-by-step. For
example, note that the FSM starts in state S0 and stays there while the input is Xi = 1. However, as soon as it
encounters an input st. Xi = 0 it will transition to state S1 : it stays there whether the input is Xi = 0 or Xi = 1.
So, put another way, the FSM computes

r = X0 ∧ X1 ∧ · · · ∧ Xn−1 ,

i.e., the AND of all inputs processed so far.
Q′0 = x ∧ ( Q0 ∨ Q1 )
Q′1 = x ∧ ( ¬Q0 ∨ Q1 )
r = ¬x ∧ ( Q0 ∧ Q1 )
where the expressions for Q′0 and Q′1 constitute the transition function δ, and the expression for r constitutes
the output function ω. This means:
a 3 gates are involved in the output function implementation (2 AND gates, and 1 NOT gate), and
b 5 gates are involved in the transition function implementation (2 AND gates, 2 OR gates, and 1 NOT
gate).
+---------+
+-->| \delta |
| +---------+
| ^ |
| Q | | Q'
| | v
| +---------+
input --+ | state |<-- clock
| +---------+
| |
| Q |
| v
| +---------+
+-->| \omega |--> output
+---------+
st.
• An n-bit register (middle component) holds Q, the current state of the FSM.
• Within a given clock period, the current state is provided as input to δ, the transition function: based on
Q and any input, this computes the next state Q′ .
• At the same time that δ is computing the next state, the output function ω computes any output from the
FSM; depending on the type of FSM, this might be based on Q only, or on Q and any input.
• A positive edge of the clock signal causes the state to be updated with the output from δ. That is, the FSM
advances from the current to next state; computation by δ and ω is performed in the same way during
the subsequent clock period, once Q has been updated with Q′ .
Note that this framework is assumed in any of the following questions that ask for it.
This FSM can be in one of two states: either the bits of X processed so far have an even or an odd number of
elements equal to 1; we give each of the states a label, so in this case Seven and Sodd for example. Next we can
describe how the FSM transitions from some current state to a next state, i.e., how the transition function δ
works: based on an input Xi provided at each step, we might draw
     Xi = 0                       Xi = 0
      +--+          Xi = 1         +--+
      |  v      ------------->     |  v
   ( Seven )                    ( Sodd )
                <-------------
                    Xi = 1
or equivalently say
                  δ
Q        |  Q′ (Xi = 0)   Q′ (Xi = 1)
Seven    |  Seven         Sodd
Sodd     |  Sodd          Seven
where Q is the current state and Q′ is the next state.
Given the FSM has two states only, we can store the current state using a 1-bit register. Based on a natural
mapping of the abstract to concrete state labels (i.e., Seven 7→ 0 and Sodd 7→ 1), we can rewrite the transition
function as a truth table:
Xi Q Q ′
0 0 0
0 1 1
1 0 1
1 1 0
and see clearly that Q′ = Q ⊕ Xi . Inspecting Q directly provides the output: if Q = 0 we have (so far) even
parity, and in contrast if Q = 1 we have odd parity. So in a sense the output function ω is simply the identity
function. In short, the low-level detail filled into the high-level design is very simple (in this case at least) once
the question has been digested.
One obvious addition would be some form of mechanism to reset the FSM: as stated above, we assume it
starts in the state Seven when powered-on but clearly this may not be true (the content in Q will essentially be
random initially).
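The resulting design is easily simulated; the C sketch below (names illustrative) maintains the 1-bit state Q and applies Q′ = Q ⊕ Xi per input:

```c
#include <assert.h>

/* Parity FSM: Q = 0 models Seven, Q = 1 models Sodd; each input bit is
   folded in via the transition function Q' = Q xor Xi, with the output
   function omega simply the identity on Q.                             */
static int parity_run( const char* xs ) {
  int q = 0; /* reset into Seven */
  for( int i = 0; xs[ i ] != '\0'; i++ ) {
    q = q ^ ( xs[ i ] - '0' );
  }
  return q;  /* 0 => even parity, 1 => odd parity */
}
```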
S167. a There are several approaches to solving this problem. Possibly the easiest, but perhaps not the most
obvious, is to simply build a shift-register: the register stores the last three inputs, when a new input is
available the register shifts the content along by one which means the oldest input drops off one end and
the new input is inserted into the other end. One can then build a simple circuit to test the current state
of the shift-register to see if the last three inputs match what is required.
Alternatively, one can take a more heavy-weight approach and formulate the solution as a state machine.
First we need to decide on an encoding for our state: when searching through the input we can have
matched zero through three correct tokens, which we denote by the integer S stored in two bits using Q1
and Q0 as the most-significant and least-significant respectively. We also need an encoding of the actual
input tokens I which are being passed to the matching circuit. Arbitrarily we might select A = 0, C = 1,
G = 2 and T = 3, although other encodings are valid and might actually simplify things; we use I1 and
I0 to denote the most- and least-significant bits of the input token I. From this we can now create a table
describing the mapping between current state S and input I to next state S′ , which can be roughly written
as
I1 I0 Q1 Q0 Q′1 Q′0
0 0 0 0 0 1
0 1 0 0 0 0
1 0 0 0 0 0
1 1 0 0 0 0
0 0 0 1 0 1
0 1 0 1 1 0
1 0 0 1 0 0
1 1 0 1 0 0
0 0 1 0 0 1
0 1 1 0 0 0
1 0 1 0 0 0
1 1 1 0 1 1
0 0 1 1 0 1
0 1 1 1 0 0
1 0 1 1 0 0
1 1 1 1 0 0
Thus, if we are in state (Q1 , Q0 ) = (0, 0) = 0 and see an A input, we move to state (Q′1 , Q′0 ) = (0, 1) = 1
otherwise we stay in state (Q′1 , Q′0 ) = (0, 0) = 0. Now we can define the transition function from current
state to next state as
Q′0 = (¬Q1 ∧ ¬Q0 ∧ ¬I1 ∧ ¬I0 )∨
      (¬Q1 ∧ Q0 ∧ ¬I1 ∧ ¬I0 )∨
      (Q1 ∧ ¬Q0 ∧ ¬I1 ∧ ¬I0 )∨
      (Q1 ∧ Q0 ∧ ¬I1 ∧ ¬I0 )∨
      (Q1 ∧ ¬Q0 ∧ I1 ∧ I0 )
Q′1 = (¬Q1 ∧ Q0 ∧ ¬I1 ∧ I0 )∨
      (Q1 ∧ ¬Q0 ∧ I1 ∧ I0 )
with simplifications as appropriate. Finally, the output flag F will be set only according to
F = Q1 ∧ Q0
to signal when we have matched three characters. As such, we can realise the FSM framework described
in Solution 166 by filling each component with the associated implementation above.
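We can convince ourselves the transition function is correct via a quick simulation; the C sketch below transcribes the table directly (the function names, and the target sequence ACT implied by the table, are our reading of the question):

```c
#include <assert.h>

/* Next-state function transcribed from the truth table: q is the number of
   matched tokens (0..3), i is the input token with A = 0, C = 1, G = 2, T = 3. */
static int dna_step( int q, int i ) {
  static const int next[ 4 ][ 4 ] = {
    /*            A  C  G  T */
    /* q = 0 */ { 1, 0, 0, 0 },
    /* q = 1 */ { 1, 2, 0, 0 },
    /* q = 2 */ { 1, 0, 0, 3 },
    /* q = 3 */ { 1, 0, 0, 0 },
  };
  return next[ q ][ i ];
}

/* Process a string of tokens, returning the flag F = 1 iff. the FSM ends in
   state 3, i.e., the last three inputs matched.                             */
static int dna_match( const char* s ) {
  int q = 0;
  for( int k = 0; s[ k ] != '\0'; k++ ) {
    int i = ( s[ k ] == 'A' ) ? 0 : ( s[ k ] == 'C' ) ? 1
          : ( s[ k ] == 'G' ) ? 2 : 3;
    q = dna_step( q, i );
  }
  return q == 3;
}
```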
b Making a general-purpose matching circuit will probably use less logic than having three separate circuits;
this will reduce the space required. As an extension one might consider implementing the transition and
output functions as a look-up table instead of hard-wiring them; this will mean the circuit could be used
to match any sequence providing the tables were correctly initialised. Introducing a more complex circuit
design could have the disadvantage of increasing the critical path (the longest sequential path though
the entire circuit). If the critical path is longer, the design will have to be clocked slower and hence will
not perform the matching function as quickly.
S168. a A basic diagram should show the four states and the transitions between them, capturing both movement
from one state to the next as a result of the washing cycle, and movement as a result of input from the buttons;
for example a (very) basic diagram would be:
b Since there are four states, we can encode them using two bits; we assign the following encoding:
idle = 00, fill = 01, wash = 10 and spin = 11. We use Q1 and Q0 to represent the current state, and Q′1 and
Q′0 to represent the next state; B1 and B0 are the input buttons. Using this notation, we can construct the
following state transition table which encodes the state machine diagram:
B1 B0 Q1 Q0 Q′1 Q′0
0 0 0 0 0 0
0 1 0 0 0 1
1 0 0 0 0 0
1 1 0 0 0 0
0 0 0 1 1 0
0 1 0 1 1 0
1 0 0 1 0 0
1 1 0 1 1 0
0 0 1 0 1 1
0 1 1 0 1 1
1 0 1 0 0 0
1 1 1 0 1 1
0 0 1 1 0 0
0 1 1 1 0 0
1 0 1 1 0 0
1 1 1 1 0 0
so that if, for example, the machine is in the wash state (i.e., Q1 = 1 and Q0 = 0) and no buttons are pressed
then the next state is spin (i.e., Q′1 = 1 and Q′0 = 1); however if button B1 is pressed to cancel the cycle, the
next state is idle (i.e., Q′1 = 0 and Q′0 = 0).
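As a cross-check, the table collapses to a very small next-state function; the C sketch below (names illustrative) encodes it and reproduces the examples above:

```c
#include <assert.h>

/* Washing machine next-state function, transcribed from the table: states
   are idle = 0, fill = 1, wash = 2, spin = 3. From idle, b0 (alone) starts
   a cycle; thereafter b1 (alone) cancels back to idle, and otherwise the
   cycle advances fill -> wash -> spin -> idle automatically.              */
static int wash_step( int q, int b1, int b0 ) {
  if( q == 0 ) return ( !b1 && b0 ) ? 1 : 0;
  if( q == 3 ) return 0;
  return ( b1 && !b0 ) ? 0 : q + 1;
}
```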
c From the state transition table, we can easily extract the two Karnaugh maps:

Q′1 :
              Q1 Q0
              00  01  11  10
  B1 B0 00     0   1   0   1
        01     0   1   0   1
        11     0   1   0   1
        10     0   0   0   0

Q′0 :
              Q1 Q0
              00  01  11  10
  B1 B0 00     0   0   0   1
        01     1   0   0   1
        11     0   0   0   1
        10     0   0   0   0
i The Hamming weight of X is the number of bits in X that are equal to 1, i.e., the number of times
Xi = 1. This can be computed as

HW(X) = ∑_{i=0}^{n−1} Xi .

ii The Hamming distance between X and Y is the number of bits in X that differ from the corresponding
bit in Y, i.e., the number of times Xi ≠ Yi :

HD(X, Y) = ∑_{i=0}^{n−1} Xi ⊕ Yi .
b There are two main approaches to constructing a flip-flop of this type; since both start with an SR-latch,
the difference is mainly in how the edge-triggered behaviour is realised. Use of a primary-secondary
organisation is probably the more complete solution, but a simpler alternative would be to use a pulse
generator. The overall design can be described roughly as follows:
+---+ +---+
D--+---------------------------->| |---S-->| |
| |AND| |NOR|-->r_0 = ~Q
v +-->| | r_1 -->| |
+---+ +------------+ | +---+ +---+
| | | | |
|NOT| en -->| pulse gen. |--+
| | | | |
+---+ +------------+ | +---+ +---+
| +-->| |---R-->| |
| |AND| |NOR|-->r_1 = Q
+---------------------------->| | r_0 -->| |
+---+ +---+
i An SR-latch has two inputs S and R, and two outputs Q and ¬Q. When
• S = 0, R = 0 the component retains Q,
• S = 1, R = 0 the component updates to Q = 1,
• S = 0, R = 1 the component updates to Q = 0,
• S = 1, R = 1 the component is meta-stable.
The component is level-triggered in the sense that Q is updated within the period of time when
S = 1 or R = 1 (rather than when they transition to said values).
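This behaviour can be modelled by iterating a cross-coupled NOR pair to a fixed point; the C sketch below is an abstract model (not a timing-accurate one, and with names of our choosing) of a NOR-based SR-latch:

```c
#include <assert.h>

/* One settled evaluation of a NOR-based SR-latch: starting from the stored
   value q, iterate the cross-coupled NOR gates until they stabilise, then
   return the new Q. The S = R = 1 case is excluded (it is meta-stable).   */
static int sr_latch( int q, int s, int r ) {
  int nq = !q;
  for( int i = 0; i < 4; i++ ) { /* 4 iterations suffice to settle */
    int t = !( r | nq );
    nq    = !( s | t  );
    q     = t;
  }
  return q;
}
```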
ii To provide more fine-grained control over the component, the two inputs are typically gated using
(i.e., AND’ed with) an enable signal en: when en = 0, the latch inputs are always zero and hence it
retains the same state, when en = 1 it can be updated as normal.
iii In order to change from the current level-triggered behaviour into an edge-triggered alternative,
one approach is to use a pulse generator. The idea here is to intentionally create a mismatch in
propagation delay into the inputs of an AND gate: each time en changes, the result is that we see a
small pulse on the output of the AND gate. Provided this is small enough, one can argue it acts like
an edge rather than a level.
iv Finally, the gated S and R inputs are tied together and controlled by one input D meaning S = D and
R = ¬D. This prevents the component being used erroneously: it can only retain or update the state.
c The power consumed by CMOS transistors can be decomposed into two parts: the static part (which
relates to leakage) and the dynamic part (which relates to power consumed when the transistor switches).
In short, a value switching (i.e., changing from one value to another) consumes much more power than
staying the same. In this case, we clearly have an advantage in that all but one of the n bits in the register
will stay the same; hence in terms of power consumption, storing elements of the Gray code (versus
some other sequence, for example) is an advantage.
e As an aside, a potentially neat approach here is to use a Johnson counter. This is basically an n-bit register
(initialised to zero) whose content is shifted by one place on each clock edge. The new incoming, 0-th bit
is computed as the NOT of the outgoing, (n − 1)-th bit and every other bit is shifted up by one place (i.e.,
each i-th bit for 0 ≤ i < n − 1 becomes the (i + 1)-th bit). For n = 3, this produces the sequence
⟨0, 0, 0⟩
⟨1, 0, 0⟩
⟨1, 1, 0⟩
⟨1, 1, 1⟩
⟨0, 1, 1⟩
⟨0, 0, 1⟩
⋮
which satisfies the Hamming distance property, but does not include all possible values: for example,
⟨1, 0, 1⟩ is not included. So this does not really answer the question in the sense that we require a
component that cycles through the full 2ⁿ-element sequence, an example of which is
⟨0, 0, 0⟩
⟨1, 0, 0⟩
⟨1, 1, 0⟩
⟨0, 1, 0⟩
⟨0, 1, 1⟩
⟨1, 1, 1⟩
⟨1, 0, 1⟩
⟨0, 0, 1⟩
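Note that this sequence is in fact the standard binary-reflected Gray code, for which the i-th element is i ⊕ (i ≫ 1); a short C sketch (names ours) makes the Hamming distance property easy to verify exhaustively:

```c
#include <assert.h>

/* The i-th element of the binary-reflected Gray code. */
static unsigned gray( unsigned i ) {
  return i ^ ( i >> 1 );
}

/* Check that, for an n-bit code, consecutive elements (wrapping around)
   differ in exactly one bit position.                                  */
static int gray_ok( int n ) {
  unsigned m = 1u << n;
  for( unsigned i = 0; i < m; i++ ) {
    unsigned d = gray( i ) ^ gray( ( i + 1 ) % m );
    if( d == 0 || ( d & ( d - 1 ) ) != 0 ) {
      return 0; /* zero, or more than one, differing bit */
    }
  }
  return 1;
}
```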
As a result, we can use an FSM-based approach based on the framework in the question above. For n = 3
there are 2³ = 8 elements in the Gray code, and so a 3-bit state Q = ⟨Q0 , Q1 , Q2 ⟩ is enough to store the
current element. The output function ω is basically free: we simply provide the current state Q as output,
which is also the current element in the Gray code sequence. Based on the inputs Q and rst, the state
transition function δ can be described as follows:
From this truth table we can (more easily than usual perhaps) extract Karnaugh maps for each bit of the
next state Q′
Q′2 :
              Q1 Q0
              00  01  11  10
  rst Q2 00    0   0   0   1
         01    0   1   1   1
         11    0   0   0   0
         10    0   0   0   0

Q′1 :
              Q1 Q0
              00  01  11  10
  rst Q2 00    0   1   1   1
         01    0   0   0   1
         11    0   0   0   0
         10    0   0   0   0

Q′0 :
              Q1 Q0
              00  01  11  10
  rst Q2 00    1   1   0   0
         01    0   0   1   1
         11    0   0   0   0
         10    0   0   0   0
Q′2 = ( ¬rst ∧ Q2 ∧ Q0 ) ∨
      ( ¬rst ∧ Q1 ∧ ¬Q0 )
Q′1 = ( ¬rst ∧ ¬Q2 ∧ Q0 ) ∨
      ( ¬rst ∧ Q1 ∧ ¬Q0 )
Q′0 = ( ¬rst ∧ ¬( Q1 ⊕ Q2 ) )
Placing the associated combinatorial logic and a 3-bit, D-type flip-flop based register to store Q into the
generic framework, we end up with a component that cycles through our 3-bit Gray code sequence under
control of a clock signal.
b There are a few different ways to interpret some parts of the problem definition, but one reasonable
approach is as follows:
             B2 = 1         B0 = 1         B1 = 1
            and H = 0      and H = 0      and H = 0
  start --> S0 --------> S1 --------> S2 --------> S3
            ^            |            |            |
            |   H = 1    |   H = 1    |   H = 1    |
            +------------+------------+------------+
Essentially, the idea is that by pressing buttons we advance from the starting state S0 toward the final state
S3 (as long as the handle is not turned, which means we go back to the start): when in S3 the door is
unlocked, otherwise it remains locked. In particular, if the buttons are pressed in the wrong order we
get “stuck” half way along the sequence and never reach S3 . For example, if B1 is pressed while in state
S1 , the FSM does not (and cannot ever) transition into S2 since the button stays pressed: the only way to
“unstick” the FSM is to turn the handle, reset the mechanism and start again.
There are four states in total; since 2² = 4 we can represent the current state Q as a 2-bit integer, making
the concrete assignment
S0 ↦ ⟨0, 0⟩
S1 ↦ ⟨1, 0⟩
S2 ↦ ⟨0, 1⟩
S3 ↦ ⟨1, 1⟩
The FSM diagram can be expressed as a truth table, particular to this P, which captures the various
transitions:
H B2 B1 B0 Q1 Q0 Q′1 Q′0
0 0 ? ? 0 0 0 0
0 1 ? ? 0 0 0 1
1 ? ? ? 0 0 0 0
0 ? ? 0 0 1 0 1
0 ? ? 1 0 1 1 0
1 ? ? ? 0 1 0 0
0 ? 0 ? 1 0 1 0
0 ? 1 ? 1 0 1 1
1 ? ? ? 1 0 0 0
0 ? ? ? 1 1 1 1
1 ? ? ? 1 1 0 0
Implementing this truth table via a 6-input Karnaugh map is a little more tricky than with fewer inputs;
instead, we simply derive the expressions by inspection (i.e., by forming a term for each 1 entry in a given
output) to yield
Q′1 = ( ¬H ∧ B0 ∧ ¬Q1 ∧ Q0 ) ∨
      ( ¬H ∧ ¬B1 ∧ Q1 ∧ ¬Q0 ) ∨
      ( ¬H ∧ B1 ∧ Q1 ∧ ¬Q0 ) ∨
      ( ¬H ∧ Q1 ∧ Q0 )
Q′0 = ( ¬H ∧ B2 ∧ ¬Q1 ∧ ¬Q0 ) ∨
      ( ¬H ∧ ¬B0 ∧ ¬Q1 ∧ Q0 ) ∨
      ( ¬H ∧ B1 ∧ Q1 ∧ ¬Q0 ) ∨
      ( ¬H ∧ Q1 ∧ Q0 )
with minor optimisation possible thereafter. Returning to the framework, the idea is then that we
i instantiate the middle box with a 2-bit register, using D-type flip-flops for example, to store Q,
ii instantiate the top box to implement δ using the equations above,
iii instantiate the bottom box to implement ω using the equation
L = ¬(Q1 ∧ Q0 )
c The purpose of a clock signal is to control the FSM, advancing it through steps (i.e., transitions) with all
components synchronised. However, the only updates of state occur on positive transitions of Bi or H.
That is, the FSM only changes state when one of the buttons is pressed, or the handle turned: in each
case, this means the associated value transitions from 0 to 1. As a result, one could argue the expression

H ∨ B0 ∨ B1 ∨ B2

can be used to advance the FSM (i.e., latch the next state produced by the transition function), rather than
“polling” the buttons and handle at each clock edge to see if their value has changed.
i The content stored in an SRAM memory is lost if the power supply is removed: such devices depend
on a power supply so transistors used to maintain the stored content can operate. In the context of
the proposed approach, this means if a power cut occurs, for example, then the password will be
“forgotten” by the lock.
ii When the power supply comes back online the password might be essentially random due to the
way SRAMs work. If this is not true however, and the SRAM is initialised into a predictable value
(e.g., all zero), this could offer an attractive way to bypass the security offered!
iii Given physical access to the lock, one might simply read the password out of the SRAM. With an FSM
hard-wired to a single password, the analogue is arguably harder: one would need to (invasively)
reverse engineer the gate layout and connectivity, then the FSM design.
Less attractive answers include degradation of performance (e.g., as a result of SRAM access latency) or
increase in cost: given the constraints of the application, neither seems particularly important. For example,
the access latency of SRAM memory is measured in small fractions of a second; although arguably true
in general, from the perspective of a human user of the door lock the delay will be imperceptible.
e This is quite open-ended, but one reasonable approach would be as follows:
i This is a slightly loaded question in that it implies some alteration is needed; as such, marks might
typically be given for identifying the underlying reason, and explaining each aspect of the proposed
alteration.
The crucial point to realise is testing implementations of δ and ω, for example, depends on being
able to set (and possibly inspect) the state Q which acts as input to both. An example technology to
allow this would be JTAG, which requires an additional interface (inc. TDI, TDO, TCLK, TMS and
TRST pins) and also injection of a scan chain to access all flip-flops. This allows the test process to
scan a value into Q one bit at a time, run the system normally, then scan out Q to test it.
ii The idea would be to place each system under the control of a test stimulus that automates a series of
tests: the test stimulus has access to all inputs (i.e., the JTAG interface, each button and the handle)
and outputs (e.g., the JTAG interface, and the lock mechanism), and is tasked with making sure the
overall behaviour matches some reference.
In this context, the number of states, inputs and outputs is small enough that a brute force approach
is reasonable; this is also motivated by the fact there are no obvious boundary cases and so on. The
strategy would therefore be: for each entry in the truth table
• put the device in test mode,
• scan-in the state Q and drive each Bi with the associated values,
• put the device in normal mode, and force an update of the FSM using the clock signal,
• put the device in test mode,
• check the value of L matches that expected,
• scan-out and check the value of Q matches that expected.
An alternative answer might focus on some form of BIST, but in essence this just places all the above
inside the system rather than viewing it as something done externally.
S171. a At least three advantages (or disadvantages, depending on which way around you view the options) are
evident:
• With option one, extracting each digit of the current PIN to form a guess is trivial; with option
two this is much harder, in that we need to take the integer P and decompose it into a decimal
representation (through repeated division and modular reduction).
• With option one, incrementing the current PIN is harder (since the addition is in decimal); with
option two this is much easier, in that we can simply use a standard integer adder.
• With option one, the total storage requirement is 4 · 4 = 16 bits; with option two this is only 14 bits,
since 2¹⁴ = 16384 > 9999.
Based on this, and reading ahead to the next question, the decimal representation seems more attractive:
designing a decimal adder is significantly easier than a binary divider.
b Given the choice, and although both options are viable, we focus on a design for the first, decimal
representation: this is simpler by some way, so the expected answer. At a high-level, the component can
be described as follows:
Pi = Gi so production of the guess is trivial; the other output is a little harder. The basic idea is to use
something similar to a ripple-carry adder. Each i-th cell takes a decimal digit Pi and a carry-in from the
previous, (i − 1)-th cell; it produces a decimal digit P′i and a carry-out into the next (i + 1)-th cell. The
difference from a binary ripple-carry adder then is that it only accepts one digit rather than two as input
(since it increments P rather than computes a general-purpose addition), plus it obviously works with
decimal rather than binary digits.
There are various ways to approach the design of each decimal adder cell, but perhaps the most straight-
forward uses two stages:
The first stage computes an integer sum r′ = x + ci. Although this could be realised using a standard
ripple-carry adder, we can make a more problem-specific improvement: a ripple-carry adder normally
uses full-adder cells that compute x + y + ci, but we lack the second input y. Thus we can use half-adder
cells instead, which use half the number of gates; we assume such a half-adder is available as a standard
component. The second stage takes r′ = x + ci as input, and produces the outputs r and co, implementing
the modular reduction. The range of each input means 0 ≤ r′ < 11, or equivalently that cases where
r′ > 10 are impossible. We can describe the behaviour of the stage using the following truth table:
r3 :
              r′1 r′0
              00  01  11  10
  r′3 r′2 00   0   0   0   0
          01   0   0   0   0
          11   ?   ?   ?   ?
          10   1   1   ?   0

r2 :
              r′1 r′0
              00  01  11  10
  r′3 r′2 00   0   0   0   0
          01   1   1   1   1
          11   ?   ?   ?   ?
          10   0   0   ?   0

r1 :
              r′1 r′0
              00  01  11  10
  r′3 r′2 00   0   0   1   1
          01   0   0   1   1
          11   ?   ?   ?   ?
          10   0   0   ?   0

r0 :
              r′1 r′0
              00  01  11  10
  r′3 r′2 00   0   1   1   0
          01   0   1   1   0
          11   ?   ?   ?   ?
          10   0   1   ?   0
r3 = r′3 ∧ ¬r′1
r2 = r′2
r1 = ¬r′3 ∧ r′1
r0 = r′0
co = r′3 ∧ r′1
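The extracted expressions for r can be verified exhaustively against the intended behaviour, namely r = r′ mod 10 for 0 ≤ r′ ≤ 10; a C sketch (names illustrative):

```c
#include <assert.h>

/* Bit-level implementation of the second stage, transcribing the equations
   r3 = r'3 & ~r'1, r2 = r'2, r1 = ~r'3 & r'1 and r0 = r'0.                */
static int stage2_r( int rp ) {
  int b3 = ( rp >> 3 ) & 1, b2 = ( rp >> 2 ) & 1;
  int b1 = ( rp >> 1 ) & 1, b0 = ( rp >> 0 ) & 1;
  int r3 =  b3 & !b1;
  int r2 =  b2;
  int r1 = !b3 &  b1;
  int r0 =  b0;
  return ( r3 << 3 ) | ( r2 << 2 ) | ( r1 << 1 ) | r0;
}

/* Check r = r' mod 10 for every possible input 0 <= r' <= 10. */
static int stage2_ok( void ) {
  for( int rp = 0; rp <= 10; rp++ ) {
    if( stage2_r( rp ) != rp % 10 ) {
      return 0;
    }
  }
  return 1;
}
```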
c The FSM maintains a current state Q. Given there are five states, we can represent the current state as

Q = ⟨Q0 , Q1 , Q2 ⟩

i.e., three bits (since 2³ = 8 > 5), so the device could store it in a register comprised of three D-type
flip-flops; doing so accepts there are three unused state representations.
We can represent the states as follows
S0 = ⟨0, 0, 0⟩
S1 = ⟨1, 0, 0⟩
S2 = ⟨0, 1, 0⟩
S3 = ⟨1, 1, 0⟩
S4 = ⟨0, 0, 1⟩
and therefore formulate a tabular transition function δ:
Turning these into Karnaugh maps and then Boolean expressions is a little tricky due to the need for five
inputs. To cope, we assume there are no transitions from S0 and ignore b, then patch the equation for Q′0
(the only bit of the next state influenced by moving out of S0 ) appropriately. That is, we get the following
Q′2 :
             Q1 Q0
             00  01  11  10
  r Q2  00    0   0   0   0
        01    1   ?   ?   ?
        11    1   ?   ?   ?
        10    0   0   1   0

Q′1 :
             Q1 Q0
             00  01  11  10
  r Q2  00    0   1   0   1
        01    0   ?   ?   ?
        11    0   ?   ?   ?
        10    0   1   0   1

Q′0 :
             Q1 Q0
             00  01  11  10
  r Q2  00    0   0   1   1
        01    0   ?   ?   ?
        11    0   ?   ?   ?
        10    0   0   0   1
Q′2 = ( r∧ Q1 ∧ Q0 ) ∨
( Q2 )
Q′1 = ( ¬ Q1 ∧ Q0 ) ∨
( Q1 ∧ ¬ Q0 )
Q′0 = ( b ∧ ¬ Q2 ∧ ¬ Q1 ∧ ¬ Q0 ) ∨
( ¬r ∧ Q1 ) ∨
( Q1 ∧ ¬ Q0 )
S172. a First we need to decode the machine code program: using Figure A.20, we find that
Next we can produce a trace of execution for the program: starting with the initial configuration given,
we find that
C0 = (0, 0, 2, 1, 0)
L0 { if R2 = 0 then goto L3 else goto L1
C1 = (1, 0, 2, 1, 0)
L1 { R2 ← R2 − 1 then goto L2
C2 = (2, 0, 2, 0, 0)
L2 { if R0 = 0 then goto L0 else goto L3
C3 = (0, 0, 2, 0, 0)
L0 { if R2 = 0 then goto L3 else goto L1
C4 = (3, 0, 2, 0, 0)
L3 { if R1 = 0 then goto L7 else goto L4
C5 = (4, 0, 2, 0, 0)
L4 { R1 ← R1 − 1 then goto L5
C6 = (5, 0, 1, 0, 0)
L5 { R2 ← R2 + 1 then goto L6
C7 = (6, 0, 1, 1, 0)
L6 { if R0 = 0 then goto L3 else goto L7
C8 = (3, 0, 1, 1, 0)
L3 { if R1 = 0 then goto L7 else goto L4
C9 = (4, 0, 1, 1, 0)
L4 { R1 ← R1 − 1 then goto L5
C10 = (5, 0, 0, 1, 0)
L5 { R2 ← R2 + 1 then goto L6
C11 = (6, 0, 0, 2, 0)
L6 { if R0 = 0 then goto L3 else goto L7
C12 = (3, 0, 0, 2, 0)
L3 { if R1 = 0 then goto L7 else goto L4
C13 = (7, 0, 0, 2, 0)
L7 { halt
As a result, stating that the program will “copy the value in R1 into R2 , clearing the value in R1 ” is the
best match. Note that the program itself is in two parts: L0 to L2 clear (or zero) R2 , and L3 to L6 move R1
into R2 . Also note that it depends on having R0 = 0, allowing the construction of unconditional branches
in L2 and L6 .
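The decoded program can be transcribed into a small C interpreter to confirm the trace; the function name and signature here are illustrative:

```c
#include <assert.h>

/* Interpreter for the decoded program: labels L0 to L7 become cases, and
   the registers are R0 (fixed at 0), R1 and R2; returns the final R2.   */
static int run_copy( int r1, int r2 ) {
  int R0 = 0, R1 = r1, R2 = r2;
  int pc = 0;
  while( 1 ) {
    switch( pc ) {
      case 0: pc = ( R2 == 0 ) ? 3 : 1; break; /* L0 */
      case 1: R2 = R2 - 1; pc = 2;      break; /* L1 */
      case 2: pc = ( R0 == 0 ) ? 0 : 3; break; /* L2 */
      case 3: pc = ( R1 == 0 ) ? 7 : 4; break; /* L3 */
      case 4: R1 = R1 - 1; pc = 5;      break; /* L4 */
      case 5: R2 = R2 + 1; pc = 6;      break; /* L5 */
      case 6: pc = ( R0 == 0 ) ? 3 : 7; break; /* L6 */
      case 7: return R2;                       /* L7: halt */
    }
  }
}
```

For example, run_copy( 2, 1 ) reproduces the trace above, halting with R2 = 2.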
b First we need to decode the machine code program: using Figure A.20, we find that
Next we can produce a trace of execution for the program: starting with the initial configuration given,
we find that
C0 = (0, 0, 3, 2, 1)
L0 { if R3 = 0 then goto L3 else goto L1
C1 = (1, 0, 3, 2, 1)
L1 { R3 ← R3 − 1 then goto L2
C2 = (2, 0, 3, 2, 0)
L2 { if R0 = 0 then goto L0 else goto L3
C3 = (0, 0, 3, 2, 0)
L0 { if R3 = 0 then goto L3 else goto L1
C4 = (3, 0, 3, 2, 0)
L3 { if R1 = 0 then goto L7 else goto L4
C5 = (4, 0, 3, 2, 0)
L4 { R3 ← R3 + 1 then goto L5
C6 = (5, 0, 3, 2, 1)
L5 { R1 ← R1 − 1 then goto L6
C7 = (6, 0, 2, 2, 1)
L6 { if R0 = 0 then goto L3 else goto L7
C8 = (3, 0, 2, 2, 1)
L3 { if R1 = 0 then goto L7 else goto L4
C9 = (4, 0, 2, 2, 1)
L4 { R3 ← R3 + 1 then goto L5
C10 = (5, 0, 2, 2, 2)
L5 { R1 ← R1 − 1 then goto L6
C11 = (6, 0, 1, 2, 2)
L6 { if R0 = 0 then goto L3 else goto L7
C12 = (3, 0, 1, 2, 2)
L3 { if R1 = 0 then goto L7 else goto L4
C13 = (4, 0, 1, 2, 2)
L4 { R3 ← R3 + 1 then goto L5
C14 = (5, 0, 1, 2, 3)
L5 { R1 ← R1 − 1 then goto L6
C15 = (6, 0, 0, 2, 3)
L6 { if R0 = 0 then goto L3 else goto L7
C16 = (3, 0, 0, 2, 3)
L3 { if R1 = 0 then goto L7 else goto L4
C17 = (7, 0, 0, 2, 3)
L7 { if R2 = 0 then goto L11 else goto L8
C18 = (8, 0, 0, 2, 3)
L8 { R3 ← R3 + 1 then goto L9
C19 = (9, 0, 0, 2, 4)
L9 { R2 ← R2 − 1 then goto L10
C20 = (10, 0, 0, 1, 4)
L10 { if R0 = 0 then goto L7 else goto L11
C21 = (7, 0, 0, 1, 4)
L7 { if R2 = 0 then goto L11 else goto L8
C22 = (8, 0, 0, 1, 4)
L8 { R3 ← R3 + 1 then goto L9
C23 = (9, 0, 0, 1, 5)
L9 { R2 ← R2 − 1 then goto L10
C24 = (10, 0, 0, 0, 5)
L10 { if R0 = 0 then goto L7 else goto L11
C25 = (7, 0, 0, 0, 5)
L7 { if R2 = 0 then goto L11 else goto L8
C26 = (11, 0, 0, 0, 5)
L11 { halt
where the final configuration halts execution.
As a result, stating that the program will “add the values in R1 and R2 , setting R3 to reflect the result” is
the best match. Note that the program itself is in three parts: L0 to L2 clear (or zero) R3 , L3 to L6 add R1
to R3 , L7 to L10 add R2 to R3 . Also note that it depends on having R0 = 0, allowing the construction of
unconditional branches in L2 , L6 , and L10 .
More specifically, the (green) register address and the (blue) branch target address mean the instruction
semantics are
if R2 = 0 then goto L5 else goto Li+1 ,
i.e., if register 2 equals 0 then goto instruction 5, else goto instruction i + 1.
S174. Once fetched, the instruction inst = 100111001(2) is provided as input to the decoder: based on the implementation given, the decoder will therefore produce
as output. Looking then at the data- and control-path to assess how these outputs are used to execute the
instruction, we conclude that
In general then, this instruction writes 0 into register Raddr ; given addr = 3 here, the result is that 0 is
written into R3 .
B.6 Chapter ??
S175. a wire [ 7 : 0 ] a;
b wire [ 0 : 4 ] b;
c reg [ 31 : 0 ] c;
d reg signed [ 15 : 0 ] d;
e reg [ 7 : 0 ] e[ 0 : 1023 ];
f genvar f;
S176. a c = 2'b01
b c = 2'b11
c d = 4'b010X
d d = 4'b10XX
e d = 4'b1101
f d = 4'b0111
g c = 2'bXX
h c = 2'b11
i e = 1'b0
j e = 1'b1
S177. a One potential problem is that if p and q can change at any time, and hence trigger execution of the processes
at any time, the two might change at the exact same time. In this case, it is not clear which values x and y
will be assigned. Maybe the top assignment to x beats the bottom one, but the bottom assignment to y
beats the top one. Any combination is possible; since it is not clear which will occur, it is possible that x
and y do not get assigned the values one would expect.
As an attempt at a solution, we can try to exert some control over which block of assignments is executed
first. For example, we might try to place a guard around the assignments:
since now the process is at least deterministic: if p is equal to 1 then the first block executes, if q is equal
to 1 then the second block executes and if both are equal to 1 we execute the first block as the default.
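As a sketch, the combined, guarded process might look as follows; the bodies of the two assignment blocks come from the original listing, which is not reproduced here, so placeholder comments stand in for them:

```verilog
always @ ( * ) begin
  if( p == 1 ) begin
    // first block of assignments (to x and y) goes here
  end
  else if( q == 1 ) begin
    // second block of assignments (to x and y) goes here
  end
end
```

Using an if/else-if chain is what makes the choice deterministic: when both p and q equal 1, the first branch always wins.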
b The problem with this is that the state signal is not initialised: to start with, it could hold any value, which
might either result in the state machine operating in the wrong sequence or, since case is used rather than
casex or casez, in none of the cases being matched at all. A slightly more minor issue is that we have to
assume that no other process assigns to state. For example, if another process sets state to 3, the state
machine process will malfunction.
The best way to rectify this problem is by introducing a reset signal called rst and initialising the state
variable whenever it is equal to 1:
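A sketch of the repaired process follows; the clock name clk and the concrete state encoding are assumptions, since the original listing is not reproduced here:

```verilog
always @ ( posedge clk ) begin
  if( rst == 1 ) begin
    state <= 0;             // force a known initial state
  end
  else begin
    case( state )
      0       : state <= 1;
      1       : state <= 2;
      2       : state <= 0;
      default : state <= 0; // recover even if, e.g., state is set to 3 elsewhere
    endcase
  end
end
```

Note that the default arm also mitigates the second problem: even if another process corrupts state, the machine falls back into the known sequence.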
S178. This is a bit of a vague question; it does not mention styles of Verilog and so you can assume any valid style is
allowed. With this in mind, a rough solution might look something like this:
reg [ 3 : 0 ] t;
endmodule
S179. Each cell C takes inputs x and y and routes them to its outputs so that one yields min(x, y) and the other max(x, y):
y
|
v
+-----+
x-->| C |-->min(x,y)
+-----+
|
V
max(x,y)
We therefore need some sort of comparison inside each cell; we already know how to design a less-than comparator, which
is good enough. Thus, the Verilog module could look something like this:
module comparator_lt( r, x, y ); // the module header was missing, so the name here is assumed

parameter n = 8;

output wire r;
input wire [ n - 1 : 0 ] x;
input wire [ n - 1 : 0 ] y;

wire [ n - 1 : 0 ] w0 = ~( x ^ y ); // per-bit equality
wire [ n - 1 : 0 ] w1 = ( ~x & y ); // per-bit less-than

wire [ n - 1 : 0 ] w2;

genvar i;

assign w2[ 0 ] = w1[ 0 ]; // base case of the ripple

generate
for( i = 1; i < n; i = i + 1 ) begin : ripple
assign w2[ i ] = w1[ i ] | ( w0[ i ] & w2[ i - 1 ] );
end
endgenerate

assign r = w2[ n - 1 ];

endmodule
S180. In software the task is fairly simple: we have an 8-bit variable called Q which maintains the content of
the shift register, plus two functions to initialise and update the value (i.e., to clock the register) as follows:
#include <stdint.h>

uint8_t Q = 0;

void lfsr_init( uint8_t s ) {
  Q = s;
}

uint8_t lfsr () {
  // compute outgoing bit
  uint8_t r = Q & 1;
  // compute incoming bit as the XOR of the tap bits (tap positions assumed)
  uint8_t x = ( ( Q >> 0 ) ^ ( Q >> 4 ) ) & 1;
  // update state
  Q = ( Q >> 1 ) |
      ( x << 7 ) ;
  return r;
}
Following this approach, in Verilog the design is also simple: we just need to deal with the required action
each time a positive clock edge triggers an update. For example, the following does more or less the same
thing as the C version:
input wire [ 7 : 0 ] s,
output reg r );
reg [ 7 : 0 ] Q;
endmodule
That is, we again have an 8-bit register called Q; each time a positive edge occurs on clk we make a choice:
• If rst is true, then we initialise the LFSR by setting Q equal to the seed value s.
• If rst is false, then we update the LFSR by first setting the output r equal to the 0-th bit of Q, then updating
Q via a concatenation expression (which performs the shift, meaning the (n − 1)-th bit of the result is the
XOR of the tap bits).
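Pulling the above together, a complete module might look as follows; the module name, the clk and rst ports, and the tap positions (bits 0 and 4) are assumptions, since the original listing does not fix them:

```verilog
module lfsr( input  wire           clk,
             input  wire           rst,
             input  wire [ 7 : 0 ] s,
             output reg            r   );

  reg [ 7 : 0 ] Q;

  always @ ( posedge clk ) begin
    if( rst ) begin
      Q <= s;                               // initialise with the seed
    end
    else begin
      r <= Q[ 0 ];                          // outgoing bit
      Q <= { Q[ 4 ] ^ Q[ 0 ], Q[ 7 : 1 ] }; // shift, inserting XOR of taps at bit 7
    end
  end

endmodule
```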