GHION TECHNOLOGY COLLEGE
Department of COMPUTER SCIENCE
Computer Organization
and
Architecture
(ITec381)
Module
August 2011
Debre Markos
Table of Contents
Course Objectives
Learning Outcomes
References
Course Objectives
The overall objective of this course is to provide students with a fundamental understanding of the functional components of a computer system and how they are organized. The emphasis of the module is on the hardware aspects of a system and on how the hardware is used during the execution of software. The course introduces students to the architecture (design) and organization of computer components.
Learning Outcomes
By the end of the module the student will be able to:
Chapter 1
Boolean Algebra and Digital Logic Circuits
Objectives:
After completing this chapter you will be able to:
Understand Boolean algebra and its applications in circuit analysis
Construct the truth table for a given Boolean function
Be familiar with logic gates
Simplify Boolean functions using different techniques
Implement the circuit design of Boolean functions
Be familiar with combinational circuits
Be familiar with Sequential circuits
Boolean algebra makes use of logical variables and logical operators. The possible values for a logical variable are either TRUE or FALSE. For ease of use, these values are conventionally represented by 1 and 0 respectively. A system in which the only possible values are 0 and 1 is the binary number system. These two values correspond directly to the two states of digital electronics, which is why Boolean algebra is used to analyze digital circuits. The logical operators of Boolean algebra are AND, OR, and NOT, which are symbolically represented by a dot (∙), a plus sign (+), and an overbar ( ¯ ). Often the dot is omitted in a Boolean expression; hence, A∙B is written as AB.
The operation AND yields true (binary value 1) if and only if both of its operands are true. The operation OR yields true if either or both of its operands are true. The unary operation NOT inverts the value of its operand.
Other useful derived operators are NAND, NOR, and XOR. NAND (which stands for NOT-AND) is a combination of AND and NOT; it is the opposite of AND. NOR (NOT-OR) is formed by combining OR and NOT; NOR is the opposite of OR. Similarly, XOR is equivalent to the expression X∙Y̅ + X̅∙Y. NOT is a unary operator; the others are binary operators. XOR yields 1 if and only if exactly one of its operands has the value 1. Except for NOT, all of these operators can be used with more than two variables. Table 1-1 shows the truth tables for these Boolean operators. A truth table shows the result of an operation for every possible combination of values of its variables.
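Because these operators map directly onto Python's bitwise operators, a truth table such as Table 1-1 can be generated and checked programmatically. The following short sketch is ours, for illustration only; it is not part of the module's own material.

from itertools import product

def NOT(a):     return 1 - a
def AND(a, b):  return a & b
def OR(a, b):   return a | b
def NAND(a, b): return NOT(AND(a, b))
def NOR(a, b):  return NOT(OR(a, b))
def XOR(a, b):  return a ^ b

# Print one row per combination of input values, as a truth table does.
print("A B | AND OR NAND NOR XOR | NOT A")
for a, b in product((0, 1), repeat=2):
    print(f"{a} {b} |  {AND(a, b)}   {OR(a, b)}   {NAND(a, b)}    {NOR(a, b)}   {XOR(a, b)}  |   {NOT(a)}")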
Table 1-2 shows important identities of Boolean algebra. These identities are useful in simplifying
Boolean functions in order to find simple circuit designs.
Table 1-1: Truth tables of the Boolean operators
A B | A∙B | A+B | NOT A | A NAND B | A NOR B | A XOR B
0 0 |  0  |  0  |   1   |    1     |    1    |    0
0 1 |  0  |  1  |   1   |    1     |    0    |    1
1 0 |  0  |  1  |   0   |    1     |    0    |    1
1 1 |  1  |  1  |   0   |    0     |    0    |    0
Each gate shown in Figure 1-1 has one or two inputs and one output. However, as already stated, all of the gates except NOT can have more than two inputs, so they can be implemented with three or more inputs. When one or more of the values at the input are changed, the correct output signal appears almost instantaneously, delayed only by the propagation time of signals through the gate (known as the gate delay). In some cases, a gate is implemented with two outputs, one output being the negation of the other output.
Here we introduce a common term: to assert a signal is to cause the signal line to make a transition from its logically false (0) state to its logically true (1) state. The true (1) state is either a high or a low voltage state, depending on the type of electronic circuitry.
Figure 1-1: Basic logic gates
Typically, not all gate types are used in implementation. Design and fabrication are simpler if only one or two types of gates are used. Thus, it is important to identify functionally complete sets of gates. This means that any Boolean function can be implemented using only the gates in the set. The following are functionally complete sets:
AND, OR, NOT
AND, NOT
OR, NOT
NAND
NOR
It should be clear that AND, OR, and NOT gates constitute a functionally complete set, because they represent the three operations of Boolean algebra. For the AND and NOT gates to form a functionally complete set, there must be a way to synthesize the OR operation from the AND and NOT operations. Similarly, the OR and NOT operations are functionally complete because they can be used to synthesize the AND operation.
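As an aside, this synthesis can be checked exhaustively in a few lines of Python. The sketch below is ours (the function names are not from the module): it builds NOT, AND, and OR out of NAND alone and verifies them against the built-in operators.

def nand(a, b):
    return 1 - (a & b)

def not_(a):            # NOT A = A NAND A
    return nand(a, a)

def and_(a, b):         # A AND B = NOT(A NAND B)
    return not_(nand(a, b))

def or_(a, b):          # A OR B = (NOT A) NAND (NOT B), by DeMorgan's theorem
    return nand(not_(a), not_(b))

# Exhaustive check over all input combinations
for a in (0, 1):
    for b in (0, 1):
        assert not_(a) == 1 - a
        assert and_(a, b) == (a & b)
        assert or_(a, b) == (a | b)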
Figure 1-2(a) shows how the AND, OR, and NOT functions can be implemented solely with NAND gates, and Figure 1-2(b) shows the same thing for NOR gates. For this reason, digital circuits can be, and frequently are, implemented solely with NAND gates or solely with NOR gates. Although this may not be the minimum-gate implementation, it has the advantage of regularity, which can simplify the manufacturing process.
With gates, we have reached the most primitive circuit level of computer hardware. An examination of the transistor combinations used to construct gates departs from that realm and enters the realm of electrical engineering. For our purposes, however, we are content to describe how gates can be used as building blocks to implement the essential logical circuits of a digital computer.
Figure 1-2: (a) The use of NAND gates; (b) the use of NOR gates
Truth table: For each of the 2ⁿ possible combinations of input signals, the binary value of each of the m output signals is listed.
Graphical symbols: The interconnected layout of gates is depicted.
Boolean equations: Each output signal is expressed as a Boolean function of its input signals.
F = A̅BC̅ + A̅BC + ABC̅    (1.1)
There are three combinations of input values that cause F to be 1, and if any one of these combinations occurs, the result is 1. This form of expression, for self-evident reasons, is known as the sum of products (SOP) form. Figure 1-3 shows a straightforward implementation with AND, OR, and NOT gates.
A B C | F
0 0 0 | 0
0 0 1 | 0
0 1 0 | 1
0 1 1 | 1
1 0 0 | 0
1 0 1 | 0
1 1 0 | 1
1 1 1 | 0
Another form can also be derived from the truth table. The SOP form expresses that the output is 1 if any of the input combinations that produce 1 is true. We can also say that the output is 1 if none of the input combinations that produce 0 is true. Thus,

F̅ = A̅B̅C̅ + A̅B̅C + AB̅C̅ + AB̅C + ABC

Taking the complement of both sides and applying DeMorgan's theorem gives

F = (A + B + C)∙(A + B + C̅)∙(A̅ + B + C)∙(A̅ + B + C̅)∙(A̅ + B̅ + C̅)    (1.2)
This is the product of sums (POS) form, which is illustrated in Figure 1-4. For clarity, NOT gates are not shown; rather, it is assumed that each input signal and its complement are available. This simplifies the logic diagram and makes the inputs to the gates more readily apparent.
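Deriving the SOP and POS terms from a truth table is a mechanical process, as the following Python sketch illustrates. The table is Table 1-3 above; the helper names are ours, and the overbar is rendered with a combining character.

# Table 1-3 as a mapping (A, B, C) -> F
table = {
    (0, 0, 0): 0, (0, 0, 1): 0, (0, 1, 0): 1, (0, 1, 1): 1,
    (1, 0, 0): 0, (1, 0, 1): 0, (1, 1, 0): 1, (1, 1, 1): 0,
}

OVERBAR = "\u0305"

def minterm(bits):   # product term: plain literal for a 1, complemented for a 0
    return "".join(v if b else v + OVERBAR for v, b in zip("ABC", bits))

def maxterm(bits):   # sum term: literals complemented relative to the 0-row
    return "(" + " + ".join(v + OVERBAR if b else v for v, b in zip("ABC", bits)) + ")"

sop = " + ".join(minterm(k) for k, f in table.items() if f == 1)   # one product per 1
pos = "".join(maxterm(k) for k, f in table.items() if f == 0)      # one sum per 0
print("SOP:", sop)   # matches Equation (1.1)
print("POS:", pos)   # matches Equation (1.2)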
Thus, a Boolean function can be realized in either SOP or POS form. At this point, it would seem that the choice would depend on whether the truth table contains more 1s or 0s for the output function: the SOP has one term for each 1, and the POS has one term for each 0. However, there are other considerations:
It is often possible to derive a simpler Boolean expression from the truth table than either SOP or POS.
It may be preferable to implement the function with a single gate type (NAND or NOR).
The significance of the first point is that, with a simpler Boolean expression, fewer gates will be needed
to implement the function. Three methods that can be used to achieve simplification are
Algebraic simplification
Karnaugh maps
Quine–McCluskey tables
F = A̅B + BC̅    (1.3)
Or even simpler as
F = B(A̅ + C̅)
This expression can be implemented as shown in Figure 1-5. The simplification of Equation (1.1) was done essentially by observation. For more complex expressions, a more systematic approach is needed.
1.3.3 Karnaugh Maps
The map can be used to represent any Boolean function in the following way. Each square corresponds to a unique product in the sum-of-products form, with a 1 value corresponding to the variable and a 0 value corresponding to the NOT of that variable. Thus, the product AB corresponds to the fourth square in Figure 1-6a. For each such product in the function, 1 is placed in the corresponding square. Thus, the map in Figure 1-6a corresponds to the two-variable example. Given the truth table of a Boolean function, it is an easy matter to construct the map: for each combination of values of variables that produces a result of 1 in the truth table, fill in the corresponding square of the map with 1. Figure 1-6b shows the result for the truth table of Table 1-3. To convert from a Boolean expression to a map, it is first necessary to put the expression into what is referred to as canonical form: each term in the expression must contain each variable. So, for example, if we have Equation (1.3), we must first expand it into the full form of Equation (1.1) and then convert this to a map.
The labeling used in Figure 1-6d emphasizes the relationship between variables and the rows and
columns of the map. Here the two rows embraced by the symbol A are those in which the variable A has
the value 1; the rows not embraced by the symbol A are those in which A is 0; similarly for B, C, and D.
Once the map of a function is created, we can often write a simple algebraic expression for it by noting the arrangement of the 1s on the map. The principle is as follows. Any two squares that are adjacent differ in only one of the variables. If two adjacent squares both have an entry of 1, then the corresponding product terms differ in only one variable. In such a case, the two terms can be merged by eliminating that variable. For example, in Figure 1-7a, the two adjacent squares correspond to the two terms A̅BC̅D and A̅BCD. Thus, the function expressed is

A̅BC̅D + A̅BCD = A̅BD
This process can be extended in several ways. First, the concept of adjacency can be extended to include wrapping around the edge of the map. Thus, the top square of a column is adjacent to the bottom square, and the leftmost square of a row is adjacent to the rightmost square. These conditions are illustrated in Figures 1-7b and c. Second, we can group not just 2 squares but 2ⁿ adjacent squares (that is, 2, 4, 8, etc.). The next three examples in Figure 1-7 show groupings of 4 squares. Note that in this case, two of the variables can be eliminated. The last three examples show groupings of 8 squares, which allow three variables to be eliminated.
1. Among the marked squares (squares with a 1), find those that belong to a unique largest block of 1, 2, 4, or 8 and circle those blocks.
2. Select additional blocks of marked squares that are as large as possible and as few in number as possible, but include every marked square at least once. The results may not be unique in some cases. For example, if a marked square combines with exactly two other squares, and there is no fourth marked square to complete a larger group, then there is a choice to be made as to which of the two groupings to choose. When you are circling groups, you are allowed to use the same 1 value more than once.
3. Continue to draw loops around single marked squares, or pairs of adjacent marked squares, or groups of four, eight, and so on, in such a way that every marked square belongs to at least one loop; then use as few of these blocks as possible to include all marked squares.
Figure 1-8a, based on Table 1-3, illustrates the simplification process. If any isolated 1s remain after the groupings, then each of these is circled as a group of 1s. Finally, before going from the map to a simplified Boolean expression, any group of 1s that is completely overlapped by other groups can be eliminated. This is shown in Figure 1-8b. In this case, the horizontal group is redundant and may be ignored in creating the Boolean expression.
One additional feature of Karnaugh maps needs to be mentioned. In some cases, certain combinations of values of variables never occur, and therefore the corresponding output never occurs. These are referred to as "don't care" conditions. For each such condition, the letter "d" is entered into the corresponding square of the map. In doing the grouping and simplification, each "d" can be treated as a 1 or a 0, whichever leads to the simplest expression.
For more than four variables, the Karnaugh map method becomes increasingly cumbersome. With five variables, two maps are needed, with one map considered to be on top of the other in three dimensions to achieve adjacency. Six variables require the use of four tables in four dimensions! An alternative approach is a tabular technique, referred to as the Quine–McCluskey method. The method is suitable for programming on a computer to give an automatic tool for producing minimized Boolean expressions.
Let us assume that this expression was derived from a truth table. We would like to produce a minimal expression suitable for implementation with gates.

The first step is to construct a table in which each row corresponds to one of the product terms of the expression. The terms are grouped according to the number of complemented variables. That is, we start with the term with no complements, if it exists, then all terms with one complement, and so on. Table 1-4 shows the list for the example expression, with horizontal lines used to indicate the grouping. For clarity, each term is represented by a 1 for each uncomplemented variable and a 0 for each complemented variable. Thus, we group terms according to the number of 1s they contain.
The index column is simply the decimal equivalent and is useful in what follows. The next step is to find all pairs of terms that differ in only one variable, that is, all pairs of terms that are the same except that one variable is 0 in one of the terms and 1 in the other. Because of the way in which we have grouped the terms, we can do this by starting with the first group and comparing each term of the first group with every term of the second group. Then compare each term of the second group with all of the terms of the third group, and so on. Whenever a match is found, place a check next to each term, combine the pair by eliminating the variable that differs in the two terms, and add that to a new list. Thus, for example, the terms ABCD̅ and ABCD are combined to produce ABC. This process continues until the entire original table has been examined. The result is a new table with the following entries:
ACD  ABC  ABD
BCD  ACD
ABC  BCD
ABD
The new table is organized into groups, as indicated, in the same fashion as the first table. The second table is then processed in the same manner as the first. That is, terms that differ in only one variable are checked and a new term produced for a third table. In this example, the third table that is produced contains only one term: BD. In general, the process would proceed through successive tables until a table with no matches was produced. In this case, this has involved three tables.
Once the process just described is completed, we have eliminated many of the possible terms of the expression. Those terms that have not been eliminated are used to construct a matrix, as illustrated in Table 1-5. Each row of the matrix corresponds to one of the terms that has not been eliminated (has no check) in any of the tables used so far. Each column corresponds to one of the terms in the original expression. An X is placed at each intersection of a row and a column such that the row element is "compatible" with the column element. That is, the variables present in the row element have the same value as the variables present in the column element. Next, circle each X that is alone in a column. Then place a square around each X in any row in which there is a circled X. If every column now has either a squared or a circled X, then we are done, and those row elements whose Xs have been marked constitute the minimal expression. Thus, in our example, the final expression is
In cases in which some columns have neither a circle nor a square, additional processing is required.
Essentially, we keep adding row elements until all columns are covered.
Let us summarize the Quine–McCluskey method to try to justify intuitively why it works. The first phase of the operation is reasonably straightforward. The process eliminates unneeded variables in product terms. Thus, the expression ABC + ABC̅ is equivalent to AB, because

ABC + ABC̅ = AB(C + C̅) = AB
After the elimination of variables, we are left with an expression that is clearly equivalent to the original expression. However, there may be redundant terms in this expression, just as we found redundant groupings in Karnaugh maps. The matrix layout assures that each term in the original expression is covered and does so in a way that minimizes the number of terms in the final expression.
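Since the method is suitable for programming, a small sketch of its first phase may help make it concrete. The code below is our own illustration, not the module's: terms are strings over {'0', '1', '-'}, with '-' marking an eliminated variable, and terms differing in exactly one position are combined until no more combinations are possible; the unchecked terms are the prime implicants.

def combine(t1, t2):
    """Return the merged term if t1 and t2 differ in exactly one variable."""
    diff = [i for i, (a, b) in enumerate(zip(t1, t2)) if a != b]
    if len(diff) == 1 and t1[diff[0]] != '-' and t2[diff[0]] != '-':
        i = diff[0]
        return t1[:i] + '-' + t1[i + 1:]
    return None

def prime_implicants(terms):
    terms = set(terms)
    primes = set()
    while terms:
        used, nxt = set(), set()
        for t1 in terms:
            for t2 in terms:
                merged = combine(t1, t2)
                if merged:
                    used.update((t1, t2))   # both terms get a check
                    nxt.add(merged)
        primes |= terms - used              # unchecked terms are prime implicants
        terms = nxt
    return primes

# The example from the text: ABC and ABC' combine into AB ('11-')
print(prime_implicants(['111', '110']))     # {'11-'}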
In this section, we examine sequential circuits, taking flip-flops as examples. As will be seen, sequential circuits make use of combinational circuits.
1.4.1 Flip-Flops
The simplest form of sequential circuit is the flip-flop. There are a variety of flip-flops, all of which share two properties:
The flip-flop is a bistable device, i.e., it has two stable states. It exists in one of two states and, in the absence of input, remains in that state. Thus, the flip-flop can function as a 1-bit memory.
The flip-flop has two outputs, which are always the complements of each other. These are generally labeled Q and Q̅.
First, let us show that the circuit is bistable. Assume that both S and R are 0 and that Q is 0. The inputs to the lower NOR gate are Q = 0 and S = 0. Thus, the output Q̅ = 1 means that the inputs to the upper NOR gate are Q̅ = 1 and R = 0, which has the output Q = 0. Thus, the state of the circuit is internally consistent and remains stable as long as S = R = 0. A similar line of reasoning shows that the state Q = 1, Q̅ = 0 is also stable for R = S = 0.
Thus, this circuit can function as a 1-bit memory. We can view the output Q as the "value" of the bit. The inputs S and R serve to write the values 1 and 0, respectively, into memory. To see this, consider the state Q = 0, Q̅ = 1, S = 0, R = 0. Suppose that S changes to the value 1. Now the inputs to the lower NOR gate are S = 1, Q = 0. After some time delay Δt, the output of the lower NOR gate will be Q̅ = 0 (see Figure 1-14). So, at this point in time, the inputs to the upper NOR gate become R = 0, Q̅ = 0. After another gate delay of Δt, the output Q becomes 1. This is again a stable state. The inputs to the lower gate are now S = 1, Q = 1, which maintain the output Q̅ = 0. As long as S = 1 and R = 0, the outputs will remain Q = 1, Q̅ = 0. Furthermore, if S returns to 0, the outputs will remain unchanged.
The R input performs the opposite function. When R goes to 1, it forces Q = 0, Q̅ = 1 regardless of the previous state of Q and Q̅. Again, a time delay of 2Δt occurs before the final state is established (Figure 1-10).
The S–R latch can be defined with a table similar to a truth table, called a characteristic table, which shows the next state or states of a sequential circuit as a function of current states and inputs. In the case of the S–R latch, the state can be defined by the value of Q. Table 1-6a shows the resulting characteristic table. Observe that the inputs S = 1, R = 1 are not allowed, because these would produce an inconsistent output (both Q and Q̅ equal 0). The table can be expressed more compactly, as in Table 1-6b. An illustration of the behavior of the S–R latch is shown in Table 1-6c.
t 0 1 2 3 4 5 6 7 8 9
S 1 0 0 0 0 0 0 0 1 0
R 0 0 0 1 0 0 1 0 0 0
Qn+1 1 1 1 0 0 0 0 0 1 1
The output of the S–R latch changes, after a brief time delay, in response to a change in the input. This is
referred to as asynchronous operation. More typically, events in the digital computer are synchronized to
a clock pulse, so that changes occur only when a clock pulse occurs. Figure 1-11 shows this
arrangement. This device is referred to as a clocked S–R flip-flop. Note that the R and S inputs are
passed to the NOR gates only during the clock pulse.
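The settling behaviour described above can be reproduced by stepping a pair of cross-coupled NOR gates one gate delay at a time. The following Python sketch is our own illustration; it assumes the upper gate (inputs R and Q̅) produces Q and the lower gate (inputs S and Q) produces Q̅, as in the latch described above.

def nor(a, b):
    return 1 - (a | b)

def step(s, r, q, q_bar):
    """One gate delay: both NOR outputs are recomputed from the current values."""
    return nor(r, q_bar), nor(s, q)      # new (Q, Q-bar)

q, q_bar = 0, 1          # start in the stable state Q = 0
s, r = 1, 0              # assert S to write a 1
for delay in range(3):   # the latch settles within two gate delays
    q, q_bar = step(s, r, q, q_bar)
    print(f"after {delay + 1} gate delay(s): Q={q}  Q_bar={q_bar}")
# S can now return to 0 and the state Q = 1, Q_bar = 0 persists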
D Flip-Flop
One problem with the S–R flip-flop is that the condition S = 1, R = 1 must be avoided. One way to do this is to allow just a single input. The D flip-flop accomplishes this. Figure 1-12 shows a gate implementation and the characteristic table of the D flip-flop. By using an inverter, the non-clock inputs to the two AND gates are guaranteed to be the opposite of each other.
The D flip-flop is sometimes referred to as the data flip-flop because it is, in effect, storage for one bit of
data. The output of the D flip-flop is always equal to the most recent value applied to the input. Hence, it
remembers and produces the last input. It is also referred to as the delay flip-flop, because it delays a 0
or 1 applied to its input for a single clock pulse. We can capture the logic of the D flip-flop in the
following truth table:
D Qn+1
0 0
1 1
J–K Flip-Flop
Like the S–R flip-flop, it has two inputs. However, in this case all possible combinations of input values are valid. Figure 1-13 shows a gate implementation of the J–K flip-flop, and Figure 1-14 shows its characteristic table (along with those for the S–R and D flip-flops). Note that the first three combinations are the same as for the S–R flip-flop. With no input asserted, the output is stable. If only the J input is asserted, the result is a set function, causing the output to be 1; if only the K input is asserted, the result is a reset function, causing the output to be 0. When both J and K are 1, the function performed is referred to as the toggle function: the output is reversed. Thus, if Q is 1 and 1 is applied to J and K, then Q becomes 0.
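The characteristic behaviour of the J–K flip-flop is easy to state as a next-state function. The sketch below is ours, for illustration:

def jk_next(j, k, q):
    """Next output of a J-K flip-flop given inputs j, k and current output q."""
    if j == 0 and k == 0:
        return q          # no input asserted: hold
    if j == 0 and k == 1:
        return 0          # reset
    if j == 1 and k == 0:
        return 1          # set
    return 1 - q          # j = k = 1: toggle

for q in (0, 1):
    print(q, "->", [jk_next(j, k, q) for j, k in ((0, 0), (0, 1), (1, 0), (1, 1))])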
b) F = X+Y + XY
2. Simplify each of the following Boolean functions using algebraic properties, Karnaugh maps, and the Quine–McCluskey method.
a) F = X+Y + X(XY) + Y(XY)
b) F = X(Y+Z) + XY+ XZ + YZ + X + Y + Z
3. Draw the logic diagram for the Boolean functions given in question 1 and the simplified functions in
question 2.
4. Construct the SOP and POS diagrams of the Boolean functions of question 1.
5. Explain the difference between combinational and sequential circuits.
6. List the different types of flip-flops and draw the block diagram of each.
Chapter 2
Common Digital Components
Objective:
After completing this chapter you will be:
Familiar with common combinational circuits (Adder, multiplexer, and decoder)
Familiar with common sequential circuits (registers and binary counters)
Familiar with memory circuits (RAM and ROM)
Able to appreciate the characteristics and applications of different digital circuitry
Familiar with integrated circuit (IC)
As the technology of ICs has improved, the number of gates that can be put in a single chip has
increased. Small-scale integration (SSI) devices contain several (usually less than 10) independent gates
in a single package. Medium-scale integration (MSI) devices contain approximately 10 to 200 gates in a
single package. They usually perform specific elementary digital functions such as decoders, adders, and
registers. Large-scale integration (LSI) devices contain between 200 and a few thousand gates in a
single package. They include digital systems such as processors, memory chips, and programmable
modules. Very-large-scale integration (VLSI) devices contain thousands of gates in a single package.
Examples are large memory arrays and complex microcomputer chips.
Digital integrated circuits are classified not only by their logic operation but also by the specific circuit
technology to which they belong. The circuit technology is referred to as a digital logic family. Each
logic family has its own basic electronic circuit upon which more complex digital circuits and functions
are developed. The basic circuit in each technology is either a NAND, a NOR, or an inverter gate. The
electronic components that are employed in the construction of the basic circuit usually give the technology its name. Many different logic families of integrated circuits have been introduced commercially. The most popular are:
Because of their many advantages, integrated circuits are used exclusively to provide various digital
components needed in the design of computer systems. To understand the organization and design of
digital computers it is very important to be familiar with the various components encountered in
integrated circuits. For this reason, the most basic components are introduced in this chapter with an
explanation of their logical properties. These components provide a catalog of elementary digital
functional units commonly used as basic building blocks in the design of digital computers.
Binary addition differs from Boolean algebra in that the result includes a carry term; in binary, 1 + 1 = 10. However, addition can still be dealt with in Boolean terms. In Table 2-1a, we show the logic for adding two input bits to produce a 1-bit sum and a carry bit. This truth table could easily be implemented in digital logic. A digital arithmetic circuit that carries out the addition of a pair of bits is called a half adder. However, we are not interested in performing addition on just a single pair of bits. Rather, we wish to add two n-bit numbers along with a carry from a previous bitwise addition. Such a digital circuit is called a full adder, and it can be built from two half adders. A multiple-bit adder can then be produced by putting together a set of full adders so that the carry from one adder is provided as input to the next. Figure 2-1 shows the block diagram and the logic diagram for a half adder, and Figure 2-2 shows those of a full adder.
You can notice that the sum and carry columns of the half adder truth table are the same as the truth tables of the XOR and AND operations, respectively. Therefore, the sum and carry of a pair of bits can be implemented with an XOR and an AND gate, as shown in Figure 2-1 and Figure 2-2.
A B Cin | Sum Cout
0 0  0  |  0   0
0 0  1  |  1   0
0 1  0  |  1   0
0 1  1  |  0   1
1 0  0  |  1   0
1 0  1  |  0   1
1 1  0  |  0   1
1 1  1  |  1   1
By combining a number of full adders, we can have the necessary logic to implement a multiple-bit
adder such as shown in Figure 2-3. Note that because the output from each adder depends on the carry
from the previous adder, there is an increasing delay from the least significant to the most significant bit.
Each single-bit adder experiences a certain amount of gate delay, and this gate delay accumulates. For
larger adders, the accumulated delay can become unacceptably high.
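This construction, a full adder from two half adders and a multiple-bit adder from a chain of full adders, can be mirrored directly in code. The following Python sketch is ours, with bits held least significant first:

def half_adder(a, b):
    return a ^ b, a & b                 # (sum, carry) = (XOR, AND)

def full_adder(a, b, c_in):
    s1, c1 = half_adder(a, b)
    s2, c2 = half_adder(s1, c_in)
    return s2, c1 | c2                  # carry out if either stage carried

def ripple_add(x_bits, y_bits):
    """Add two equal-length bit lists, least significant bit first."""
    carry, out = 0, []
    for a, b in zip(x_bits, y_bits):
        s, carry = full_adder(a, b, carry)   # the carry ripples to the next stage
        out.append(s)
    return out, carry

# 1011 (11) + 0110 (6) = 10001 (17)
print(ripple_add([1, 1, 0, 1], [0, 1, 1, 0]))   # ([1, 0, 0, 0], 1)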
2.2.2 Multiplexers
The multiplexer connects multiple inputs to a single output. At any time, one of the inputs is selected to
be passed to the output. A general block diagram representation is shown in Figure 2-4. This represents
a 4-to-1 multiplexer. There are four input lines, labeled D0, D1, D2, and D3. One of these lines is
selected to provide the output signal F. To select one of the four possible inputs, a 2-bit selection code is
needed, and this is implemented as two select lines labeled S1 and S2.
Figure 2-4: 4-to-1 multiplexer block diagram (data inputs D0–D3, select lines S1 and S2, output F)

Table 2-2: 4-to-1 multiplexer truth table
S2 S1 | F
0  0  | D0
0  1  | D1
1  0  | D2
1  1  | D3
An example 4-to-1 multiplexer is defined by the truth table in Table 2-2. This is a simplified form of a
truth table. Instead of showing all possible combinations of input variables, it shows the output as data
from line D0, D1, D2, or D3. Figure 2-5 shows an implementation using AND, OR, and NOT gates. S1
and S2 are connected to the AND gates in such a way that, for any combination of S1 and S2, three of
the AND gates will output 0. The fourth AND gate will output the value of the selected line, which is
either 0 or 1. Thus, three of the inputs to the OR gate are always 0, and the output of the OR gate will
equal the value of the selected input gate. Using this regular organization, it is easy to construct
multiplexers of size 8-to-1, 16-to-1, and so on.
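The AND–OR structure of Figure 2-5 translates directly into code: for each select combination, three AND terms are 0 and the fourth passes its data line through to the OR. A sketch of ours:

def mux4(d, s2, s1):
    """4-to-1 multiplexer: d is (D0, D1, D2, D3), s2 s1 is the select code."""
    return ((1 - s2) & (1 - s1) & d[0] |
            (1 - s2) & s1       & d[1] |
            s2       & (1 - s1) & d[2] |
            s2       & s1       & d[3])

data = (1, 0, 1, 1)
for s2 in (0, 1):
    for s1 in (0, 1):
        print(f"S2={s2} S1={s1} -> F = {mux4(data, s2, s1)}")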
Multiplexers are used in digital circuits to control signal and data routing. An example is the loading of
the program counter (PC). The value to be loaded into the program counter may come from one of
several different sources:
These various inputs could be connected to the input lines of a multiplexer, with the PC connected to the
output line. The select lines determine which value is loaded into the PC. Because the PC contains
multiple bits, multiple multiplexers are used, one per bit. Figure 2-6 illustrates this for 16-bit addresses.
2.2.3 Decoders
A decoder is a combinational circuit with a number of output lines, only one of which is selected at any
time, depending on the pattern of input lines. In general, adecoder has n inputs and 2 noutputs. Figure 2-7
shows a decoder with three inputs and eight outputs.
Decoders find many uses in digital computers. One example is address decoding. Suppose we wish to construct a 1K-byte memory using four 256 × 8-bit RAM chips. We want a single unified address space, which can be broken down as follows:
Address Chip
0000 – 00FF 0
0100 – 01FF 1
0200 – 02FF 2
0300 – 03FF 3
Each chip requires 8 address lines, and these are supplied by the lower-order 8 bits of the address. The
higher-order 2 bits of the 10-bit address are used to select one of the four RAM chips. For this purpose, a
2-to-4 decoder is used whose output enables one of the four chips, as shown in Figure 2-8.
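In arithmetic terms, the decoder splits the address into a high part that selects the chip and a low part that addresses a word within it. A small illustrative sketch (our own naming):

def decode(address):
    """Split a 10-bit address: top 2 bits select the chip, low 8 bits the word."""
    chip = (address >> 8) & 0b11     # the 2-to-4 decoder input
    offset = address & 0xFF          # the 8 address lines shared by all chips
    return chip, offset

for addr in (0x000, 0x0FF, 0x100, 0x2AB, 0x3FF):
    chip, offset = decode(addr)
    print(f"address {addr:03X} -> chip {chip}, offset {offset:02X}")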
With an additional input line, a decoder can be used as a demultiplexer. The demultiplexer performs the inverse function of a multiplexer; it connects a single input to one of several outputs. This is shown in Figure 2-9. As before, n inputs are decoded to produce a single one of 2ⁿ outputs. All of the output lines are ANDed with a data input line. Thus, the n inputs act as an address to select a particular output line, and the value on the data input line (0 or 1) is routed to that output line.
The configuration in Figure 2-9 can be viewed in another way. Change the label on the new line from Data Input to Enable. This allows for control of the timing of the decoder. The decoded output appears only when the encoded input is present and the enable line has a value of 1.
As an example of the use of flip-flops, let us first examine one of the essential elements of the CPU: the
register. As we know, a register is a digital circuit used within the CPU to store one or more bits of data.
Two basic types of registers are commonly used: parallel registers and shift registers.
Parallel Registers
A parallel register consists of a set of 1-bit memories that can be read or written simultaneously. It is
used to store data. The 8-bit register of Figure 2-10 illustrates the operation of a parallel register using D
flip-flops. A control signal, labeled load, controls writing into the register from signal lines, D11
through D18. These lines might be the output of multiplexers, so that data from a variety of sources can
be loaded into the register.
Figure 2-10: 8-bit parallel register
Shift Registers
A shift register accepts and/or transfers information serially. Consider, for example, Figure 2-11, which
shows a 5-bit shift register constructed from clocked D flip-flops. Data are input only to the leftmost
flip-flop. With each clock pulse, data are shifted to the right one position, and the rightmost bit is
transferred out.
Shift registers can be used to interface to serial I/O devices. In addition, they can be used within the
ALU to perform logical shift and rotate functions. In this latter capacity, they need to be equipped with
parallel read/write circuitry as well as serial.
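A shift register is straightforward to model as a list of D flip-flop states. The sketch below is our own illustration of the 5-bit register of Figure 2-11:

class ShiftRegister:
    def __init__(self, width=5):
        self.bits = [0] * width          # one D flip-flop per position

    def clock(self, data_in):
        """One clock pulse: shift right one position, emit the rightmost bit."""
        out = self.bits[-1]
        self.bits = [data_in] + self.bits[:-1]
        return out

sr = ShiftRegister()
for bit in (1, 0, 1, 1, 0, 0, 0, 0):
    out = sr.clock(bit)
    print(sr.bits, "out:", out)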
Another useful category of sequential circuit is the counter. A counter is a register whose value is easily incremented by 1 modulo the capacity of the register; that is, after the maximum value is achieved the next increment sets the counter value to 0. Thus, a register made up of n flip-flops can count up to 2ⁿ − 1. An example of a counter in the CPU is the program counter.
Counters can be designated as asynchronous or synchronous, depending on the way in which they
operate. Asynchronous counters are relatively slow because the output of one flip-flop triggers a change
in the status of the next flip-flop. In a synchronous counter, all of the flip-flops change state at the same
time. Because the latter type is much faster, it is the kind used in CPUs.
Ripple Counter
An asynchronous counter is also referred to as a ripple counter, because the change that occurs to
increment the counter starts at one end and “ripples” through to the other end. Figure 2-12 shows an
implementation of a 4-bit counter using J–K flip-flops, together with a timing diagram that illustrates its
behavior. The timing diagram is idealized in that it does not show the propagation delay that occurs as
the signals move down the series of flip-flops. The output of the leftmost flip-flop (Q0) is the least
significant bit. The design could clearly be extended to an arbitrary number of bits by cascading more
flip-flops.
In the illustrated implementation, the counter is incremented with each clock pulse. The J and K inputs
to each flip-flop are held at a constant 1. This means that, when there is a clock pulse, the output at Q
will be inverted (1 to 0; 0 to 1). Note that the change in state is shown as occurring with the falling edge
of the clock pulse; this is known as an edge-triggered flip-flop. Using flip-flops that respond to the
transition in a clock pulse rather than the pulse itself provides better timing control in complex circuits.
If one looks at patterns of output for this counter, it can be seen that it cycles through 0000, 0001, . . .,
1110, 1111, 0000, and so on.
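The rippling can be modelled by toggling stages in turn until a stage produces a rising output, i.e. there is no falling edge left to clock the next stage. An illustrative sketch of ours:

def ripple_tick(stages):
    """One clock pulse on a ripple counter; stages[0] is the least significant bit."""
    i = 0
    while i < len(stages):
        stages[i] ^= 1        # J = K = 1, so this stage toggles
        if stages[i] == 1:    # rising edge: the ripple stops here
            break
        i += 1                # falling edge (1 -> 0) clocks the next stage
    return stages

count = [0, 0, 0, 0]
for _ in range(6):
    ripple_tick(count)
    print("".join(map(str, reversed(count))))   # 0001, 0010, ..., 0110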
Synchronous Counters
The ripple counter has the disadvantage of the delay involved in changing value, which is proportional
to the length of the counter. To overcome this disadvantage, CPUs make use of synchronous counters, in
which all of the flip-flops of the counter change at the same time.
For a 3-bit counter, three flip-flops will be needed. Let us use J–K flip-flops. Label the uncomplemented
output of the three flip-flops A, B, C respectively, with C representing the least significant bit. The first
step is to construct a truth table that relates the J–K inputs and outputs, to allow us to design the overall
circuit. Such a truth table is shown in Table 2-3. The first three columns show the possible combinations
of outputs A, B, and C. They are listed in the order that they will appear as the counter is incremented.
Each row lists the current value of A, B, C and the inputs to the three flip-flops that will be required to
reach the next value of A, B, C.
To understand the way in which the truth table of Table 2-3 is constructed, it may be helpful to recast
the characteristic table for the J–K flip-flop. Recall that this table was presented as follows:
In this form, the table shows the effect that the J and K inputs have on the output. Now consider the
following organization of the same information:
In this form, the table provides the value of the next output when the inputs and the present output are
known. This is exactly the information needed to design the counter or, indeed, any sequential circuit. In
this form, the table is referred to as an excitation table.
Let us return to Table 2-3. Consider the first row. We want the value of A to remain 0, the value of B to
remain 0, and the value of C to go from 0 to 1 with the next application of a clock pulse. The excitation
table shows that to maintain an output of 0, we must have inputs of J = 0 and don’t care for K. To effect
a transition from 0 to 1, the inputs must be J = 1 and K = d. These values are shown in the first row of
the table. By similar reasoning, the remainder of the table can be filled in.
Having constructed the truth table of Table 2-3, we see that the table shows the required values of all of
the J and K inputs as functions of the current values of A, B, and C. With the aid of Karnaugh maps, we
can develop Boolean expressions for these six functions. This is shown in part b of Figure 2-13. For
example, the Karnaugh map for the variable Ja (the J input to the flip-flop that produces the A output)
yields the expression Ja = BC. When all six expressions are derived, it is a straightforward matter to
design the actual circuit, as shown in part c of Figure 2-13.
The two operations that a RAM can perform are the write and read operations. The write signal specifies
a transfer-in operation and the read signal specifies a transfer-out operation. On accepting one of these
control signals, the internal circuits inside the memory provide the desired function. The steps that must be taken for the purpose of transferring a new word to be stored into memory are as follows:
1. Apply the binary address of the desired word into the address line.
2. Apply the data bits that must be stored in memory into the data input lines.
3. Activate the write input.
The memory unit will then take the bits presently available in the input data lines and store them in the
word specified by the address lines.
Figure: Block diagram of a memory unit with k address lines, read and write control inputs, and 2ᵏ words of n bits per word
The steps that must be taken for the purpose of transferring a stored word out of memory are as follows:
1. Apply the binary address of the desired word into the address lines.
2. Activate the read input.
The memory unit will then take the bits from the word that has been selected by the address and apply
them into the output data lines. The content of the selected word does not change after reading.
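These steps map naturally onto a tiny software model of a RAM. The class below is our own sketch, for illustration:

class RAM:
    def __init__(self, k, n):
        self.n = n
        self.words = [0] * (2 ** k)     # 2^k words of n bits each

    def write(self, address, data):
        """Steps 1-3: apply address and data lines, activate the write input."""
        self.words[address] = data & ((1 << self.n) - 1)

    def read(self, address):
        """Steps 1-2: apply address lines, activate the read input (non-destructive)."""
        return self.words[address]

ram = RAM(k=4, n=8)            # 16 words, 8 bits per word
ram.write(0b1010, 0xC3)
print(hex(ram.read(0b1010)))   # 0xc3; the stored word is unchanged by the read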
Combinational circuits are often referred to as “memoryless” circuits, because their output depends only
on their current input and no history of prior inputs is retained. However, there is one sort of memory
that is implemented with combinational circuits, namely read-only memory (ROM).
Recall that a ROM is a memory unit that performs only the read operation. This implies that the binary
information stored in a ROM is permanent and was created during the fabrication process. Thus, a given
input to the ROM (address lines) always produces the same output (data lines). Because the outputs are a
function only of the present inputs, the ROM is in fact a combinational circuit. Figure 2-15 shows the
block diagram for ROM.
A ROM can be implemented with a decoder and a set of OR gates. As an example, consider Table 2-4.
This can be viewed as a truth table with four inputs and four outputs. For each of the 16 possible input
values, the corresponding set of values of the outputs is shown. It can also be viewed as defining the
contents of a 64-bit ROM consisting of 16 words of 4 bits each. The four inputs specify an address, and
the four outputs specify the contents of the location specified by the address. Figure 2-16 shows how this
memory could be implemented using a 4-to-16 decoder and four OR gates.
Figure 2-15: Block diagram of an m × n ROM (m = 2ᵏ), with k address input lines and n data output lines

Table 2-4: Truth table for a ROM
Input   | Output
0 0 0 0 | 0 0 0 0
0 0 0 1 | 0 0 0 1
0 0 1 0 | 0 0 1 1
0 0 1 1 | 0 0 1 0
0 1 0 0 | 0 1 1 0
0 1 0 1 | 0 1 1 1
0 1 1 0 | 0 1 0 1
0 1 1 1 | 0 1 0 0
1 0 0 0 | 1 1 0 0
1 0 0 1 | 1 1 0 1
1 0 1 0 | 1 1 1 1
1 0 1 1 | 1 1 1 0
1 1 0 0 | 1 0 1 0
1 1 0 1 | 1 0 1 1
1 1 1 0 | 1 0 0 1
1 1 1 1 | 1 0 0 0
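Viewed as software, this ROM is simply a lookup table: the address plays the role of the 4-to-16 decoder and the stored words are the OR-gate outputs. The sketch below is ours; note that the contents of Table 2-4 happen to be the Gray code of each address.

ROM = [0b0000, 0b0001, 0b0011, 0b0010, 0b0110, 0b0111, 0b0101, 0b0100,
       0b1100, 0b1101, 0b1111, 0b1110, 0b1010, 0b1011, 0b1001, 0b1000]

def rom_read(address):
    """A combinational circuit: the same address always yields the same data."""
    return ROM[address & 0xF]

for addr in range(16):
    print(f"{addr:04b} -> {rom_read(addr):04b}")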
Types of ROM
The required paths in a ROM may be programmed in three different ways. The first, mask
programming, is done by the semiconductor company during the last fabrication process of the unit. The
procedure for fabricating a ROM requires that the customer fill out the truth table that he wishes the
ROM to satisfy. The truth table may be submitted in a special form provided by the manufacturer or in a
specified format on a computer output medium. The manufacturer makes the corresponding mask for the
paths to produce the 1’s and 0’s according to the customer’s truth table. This procedure is costly because the vendor charges the customer a special fee for custom masking the particular ROM. For this reason,
mask programming is economical only if a large quantity of the same ROM configuration is to be
ordered.
For small quantities it is more economical to use a second type of ROM called a programmable read-only memory or PROM. When ordered, PROM units contain all the fuses intact, giving all 1’s in the bits of the stored words.
of the stored words. The fuses in the PROM are blown by application of current pulses through the
output terminals for each address. A blown fuse defines a binary 0 state, and an intact fuse gives a
binary 1 state. This allows users to program PROMs in their own laboratories to achieve the desired
relationship between input addresses and stored words. Special instruments called PROM programmers
are available commercially to facilitate this procedure. In any case, all procedures for programming
ROMs are hardware procedures even though the word “programming” is used.
The hardware procedure for programming ROMs or PROMs is irreversible, and once programmed, the
fixed pattern is permanent and cannot be altered. Once a bit pattern has been established, the unit must
be discarded if the bit pattern is to be changed. A third type of ROM available is called erasable PROM or EPROM. The EPROM can be restored to its initial state even after it has been programmed. When the EPROM is placed under a special ultraviolet light for a given period of time, the
shortwave radiation discharges the internal gates that serve as fuses. After erasure the EPROM returns to
its initial state and can be reprogrammed to a new set of words. Certain PROMs can be erased with
electrical signals instead of ultraviolet light. These PROMs are called electrically erasable PROM or
EEPROM.
2. Draw the block diagram for the digital circuits listed in question 1.
3. Classify the following digital circuits as combinational or sequential:
Half Adder, Full Adder, Multiplexer, Decoder, Binary Counter, Register, RAM, ROM
4. If the input lines of the multiplexer in Figure 2-4 accept input as D0 = 1, D1 = 1, D2 = 0, and D3 = 1,
and the select lines are set as S1 = 1, and S2 = 1, which one of the input lines is selected?
5. If the 4 input lines (A, B, C, and D) of a 4-to-16 decoder are 1001, then which of the output lines (D0
– D15) is selected?
Chapter 3
Data Representation
Objectives:
After completing this chapter you will be able to:
Understand binary, octal, and hexadecimal number system
Perform conversions among the decimal, binary, octal, and hexadecimal number systems
Understand different integer representations
Understand floating point number representations
Perform computer arithmetic
Understand representation of characters
Data in computers is represented in binary form. The represented data can be a number, text, a movie, a color (picture), sound, or anything else; it is up to the application software that presents the data to portray it accordingly. We enter data into a computer using letters, digits, and special symbols, but inside the computer system unit there is no color, letter, digit, or any other character.
Just like any other electrical device, computers understand and respond only to the flow of electrical charge. They also have storage devices that work based on magnetism. This shows that the overall structure of a computer works only with binary conditions (a semiconductor is conducting or not conducting, a switch is closed or open, a magnetic spot is magnetized or demagnetized). Hence, data must be represented in the form of a binary code that has a corresponding electrical signal.
The form of binary data representation we are seeking is similar to the binary number system in mathematics. Nevertheless, we humans are not accustomed to the use of binary numbers. The main focus of this chapter is on how data is represented in computers. Since an understanding of the binary number system is essential to understanding binary data representation, conversion of numbers from one number base to another is also discussed here. The number systems (bases) we will discuss are decimal, binary, octal, and hexadecimal.
I. Non-positional number system: The value of a symbol (digit) does not depend on its position in the number; a symbol has the same value wherever it appears.
II. Positional number system: The value of a symbol in the number is determined by its position, the symbol itself, and the base of the number system. In all positional number systems, the base has the following properties:
1. It determines the number of different symbols the system has. For example, there are 10 different symbols in the base 10 (decimal) number system and 2 symbols in the base 2 (binary) number system.
2. The maximum value of a single digit is one less than the base value. For example, the largest single-digit number in the decimal (base 10) number system is 9.
3. The positional value of each symbol is expressed by a power of the base. For example, the value of the symbol 7 in the decimal number 75 is 70 (7 × 10¹), while the value of 7 in the decimal number 756 is 700 (7 × 10²). The source of the variation is the position of the digit in the number, and this value is expressed as a multiple of powers of the base.
In a positional number system, the base of a number is indicated by writing it as a subscript at the right side of the number. For instance, the number 251₁₀ is the decimal number two hundred fifty-one, and 251₈ is two-five-one octal (note that it is not read as two hundred fifty-one octal). Often the base of a decimal number is not indicated; if the base of a number is not shown, it is assumed to be a decimal number.
The decimal number system, also called the base 10 number system, is the number system we use in our day-to-day life. The preference for this number system by humans is attributed to the fact that humans have ten fingers and are believed to have started counting with them.
The decimal number system has 10 different symbols identified as 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9. All
decimal numbers are written as a combination of these 10 digits.
Although the number system is easily understood by humans, it cannot be used to represent data in
computers because there are only two (binary) states in a computer system. On the other hand, the
binary number system, also known as base 2 number system, has two digits 0 and 1. This makes it useful
for data representation in computers. The two digits of the binary number system correspond to the two
distinct states of the digital electronics. A binary digit is referred to as a bit.
We can associate the two digits of the binary number system with two states of electrical systems,
magnetic systems, and switches. The following table shows the conventional association. It is also
possible to exchange the association but it becomes confusing since it is the opposite of the convention
(what is agreed on).
             0              1
Electronic   No current     There is current
Magnetic     Demagnetized   Magnetized
Switch       Off            On
Data representation using the binary number system results in a large string of 0s and 1s. This makes the
represented data large and difficult to read. Writing such a binary string becomes tedious as well. For the
sake of writing the binary strings in a short hand form and make them readable, the octal and the
hexadecimal number systems are used.
Octal number system, also called base 8 number system, has 8 different symbols: 0, 1, 2, 3, 4, 5, 6, and
7. The octal number system is used to write binary numbers in short form. An octal number has about
one-third of the digits in its binary equivalent.
The hexadecimal number system, also called base 16 number system, has 16 different symbols 0, 1, 2, 3,
4, 5, 6, 7, 8, 9, A, B, C, D, E, and F. The hexadecimal number system is usually referred as hex for short.
It is used to write binary numbers in short form. A hex number has about one-fourth of the digits in its
binary equivalent. Memory addresses and MAC addresses are usually written in hex.
Step 1: Divide the given decimal number by m (the desired base). The result will have a quotient and a
remainder.
Step 2: Divide the quotient by m. Still you get a quotient and a remainder.
Step 3: Repeat step 2 until the quotient becomes 0. You should note that we are conducting integer
division. In integer division n/m, the quotient is 0 whenever n < m.
Step 4: Collect and arrange the remainders in such a way that the first remainder is the least significant digit and the last remainder is the most significant digit (i.e., Rₙ Rₙ₋₁ … R₂ R₁).
Example: Convert the decimal number 47 into binary, octal, and hexadecimal.
a. Conversion to binary
In order to convert the given decimal number into binary (base 2), it is divided by 2.
Quotient Remainder
47 ÷ 2 23 1
23 ÷ 2 11 1
11 ÷ 2 5 1
5÷2 2 1
2÷2 1 0
1÷2 0 1
Since the quotient becomes 0 at the last division, the division stops and we collect the remainders starting from the last one. Hence the result is 101111₂. Note that, starting from the second division, at each consecutive division the quotient of the previous division is used as the dividend.
b. Conversion to octal
Here the number is divided by 8 because the required base is octal (base 8).
Quotient Remainder
47 ÷ 8 5 7
5÷8 0 5
Therefore, 47 = 57₈
c. Conversion to hexadecimal
Since the conversion now is into hexadecimal (base 16), the given decimal number is divided by 16.
Quotient Remainder
47 ÷ 16 2 15
2 ÷ 16 0 2
Remember that the remainders are all in decimal; when you write the result in the required base, each remainder has to be converted into that base. The hexadecimal equivalent of the decimal 15 is F, and that of 2 is 2. For conversion of decimal numbers into binary or octal, there is no need to look for the equivalent of each remainder; such conversion is needed only when a remainder is a double-digit number. Therefore, 47 = 2F₁₆.
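The repeated-division procedure of Steps 1 to 4 is easy to express in code. The helper below is our own sketch; it reproduces the three conversions of 47.

DIGITS = "0123456789ABCDEF"

def to_base(n, m):
    """Convert a non-negative decimal integer n to base m (2 <= m <= 16)."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:                  # stop when the quotient becomes 0
        n, r = divmod(n, m)       # integer quotient and remainder
        digits.append(DIGITS[r])  # the first remainder is the least significant digit
    return "".join(reversed(digits))

print(to_base(47, 2))    # 101111
print(to_base(47, 8))    # 57
print(to_base(47, 16))   # 2F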
Therefore, 110001₂ = 49.
It is evident that the products for digits of 0 can be skipped, since each such product is 0 and contributes nothing to the final result. However, you should remember to advance the positional value for the skipped digits as well.
Therefore, 22₈ = 18
D1₁₆ = (13 × 16¹) + (1 × 16⁰); you should be aware that the calculations are in decimal, so the hex digit D must first be converted into its decimal equivalent (13).
= (13 × 16) + (1 × 1)
= 208 + 1
= 209
It is possible to use decimal number system as an intermediate base to convert from any base to any
other base. However, for conversion from binary to octal or vice versa, there is a very simple method.
Step 1: Group the binary digits (bits) starting from the rightmost digit. Each group should contain 3 bits.
If the remaining bits at the leftmost position are fewer than 3, add 0s at the front.
Step 2: For each 3-bit binary string, find the corresponding octal number. Table 3-1 shows the
conversion equivalents.
A. 110011
110 011
 6   3
The bits are grouped in threes, with the equivalent octal digit given below each three-bit group. Thus, 110011₂ = 63₈.
B. 1101111
001 101 111
 1   5   7
Since we are left with a single bit at the leftmost position, two 0s are added at the front to create a three-bit group. The result shows that 1101111₂ = 157₈.
Step 1: For each octal digit, find the equivalent three digit binary number.
Step 2: If there are leading 0s for the binary equivalent of the leftmost octal digit, remove them.
Example: Find the binary equivalent for the octal numbers 73 and 160.
A. 73
7 3
111 011
Since there are no leading 0s at the leftmost position (the bits for octal 7), there is no 0 to remove. Therefore, 73₈ = 111011₂.
B. 160
1 6 0
001 110 000
The binary equivalent of the leftmost octal digit 1 has two leading 0s. To get the final result, remove them and concatenate the rest. Thus, 160₈ = 1110000₂.
One possible way to convert a binary number to hexadecimal is first to convert the binary number to decimal and then from decimal to hex. Nevertheless, the simpler way to convert binary numbers to hex is by grouping, as used in conversion to octal. Here a single group has 4 bits.
Step 1: Starting from the rightmost bit, group the bits in 4. If the remaining bits at the leftmost position
are fewer than 4, add 0s at the front.
Step 2: For each 4-bit group, find the corresponding hexadecimal number. You can use Table 3-1 to find
conversion equivalents.
A. 1110110001
0011 1011 0001
  3    B    1
Two 0s are added at the front to complete the leftmost four-bit group. Thus, 1110110001₂ = 3B1₁₆.
B. 10011110
1001 1110
  9    E
Thus, 10011110₂ = 9E₁₆.
Step 1: For each hexadecimal digit, find the equivalent four digit binary number.
Step 2: If there are leading 0s for the binary equivalent of the leftmost hexadecimal digit, remove them.
Example: Find the binary equivalents for the hexadecimal numbers 1C and 823.
A. 1C
1 C
0001 1100
After removing the leading 0s from the binary equivalent of the leftmost hexadecimal digit 1, the result becomes 11100. Thus, 1C₁₆ = 11100₂.
B. 823
8 2 3
1000 0010 0011
Since there are no leading 0s in the binary equivalent of the hexadecimal digit 8, we simply concatenate the binary digits to get the final result. Hence, 823₁₆ = 100000100011₂.
The decimal number system can be used as an intermediate conversion base. As shown in the sections above, however, the binary number system is more convenient for conversion to or from octal and hexadecimal. To convert an octal number to a hexadecimal number or vice versa, the binary number system is used as an intermediate base.
Example 1: Find the hexadecimal equivalent of the octal number 647
Convert 647₈ to binary: 110 100 111
Regroup the bits in fours: 0001 1010 0111
 1    A    7
Thus, 647₈ = 1A7₁₆.
Example 2: Find the octal equivalent of the hexadecimal number 3D5
Convert 3D5₁₆ to binary:
 3    D    5
0011 1101 0101
Regroup the bits in threes: 001 111 010 101
 1   7   2   5
Thus, 3D5₁₆ = 1725₈.
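The grouping procedures above can likewise be expressed compactly. The helpers below are our own sketch; they reproduce the 1101111₂ = 157₈ and 3D5₁₆ = 1725₈ examples.

HEX_DIGITS = "0123456789ABCDEF"

def binary_to_base(bits, group):
    """group = 3 for octal, 4 for hexadecimal."""
    width = -(-len(bits) // group) * group          # round up to a full group
    bits = bits.zfill(width)                        # pad with leading 0s
    chunks = [bits[i:i + group] for i in range(0, len(bits), group)]
    return "".join(HEX_DIGITS[int(c, 2)] for c in chunks)

def base_to_binary(digits, group):
    bits = "".join(format(int(d, 16), f"0{group}b") for d in digits)
    return bits.lstrip("0") or "0"                  # drop the leading 0s

print(binary_to_base("1101111", 3))                 # 157
print(binary_to_base(base_to_binary("3D5", 4), 3))  # 1725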
Computer storage has a limited capacity to hold data. The number of bits available for data representation determines the range of integers we can represent. With 4 bits, it is possible to represent a total of 16 integers. If the number of available bits increases to 5, we can represent 32 integers. In fact, with every bit added, the number of possible integers we can represent is doubled. In general, the number of integers we can represent with n bits is 2ⁿ. Signed integers include positive integers, negative integers, and zero. The 2ⁿ bit patterns are partitioned among the negative integers, the positive integers, and zero. For example, with 8 bits it is possible to represent 256 different integers. Typically, 128 of them are the positive integers and zero, while the remaining 128 are negative integers.
Signed integer representations discussed in this section are sign magnitude, 1’s complement, 2’s
complement, and excess-N. We assume the number of bits available for representation is 8 unless
explicitly specified.
In mathematics, positive integers are indicated by a preceding + sign (although usually it is omitted) and a preceding − sign identifies an integer as negative. In computers, there is no place for a + or a − sign; there are only 0s and 1s. A similar way of representing + and − signs is to treat the most significant bit as a sign bit, while the remaining bits are used to represent the magnitude of the integer. By convention, a 0 in the sign bit indicates that the integer is positive and a 1 indicates that it is negative.
A problem of this representation method is that there are two representations for 0; a positive 0 and a
negative 0. An integer with all the magnitude bits and the sign bit set to 0 is a positive 0 while an integer
with all the magnitude bits set to 0 and a 1 on its sign bit is a negative 0. This type of ambiguous
representation is undesirable and as a solution, the negative zero can be ignored. The biggest problem of
this representation method is, however, its inconvenience in binary arithmetic. Early computers, for
instance IBM 7090, used this representation.
As an example, the sign-magnitude representations of 79 and −79 in 8 bits are 01001111 and 11001111 respectively. The only difference is in the sign (first) bit: for 79 it is 0, while for −79 it is 1. Figure 3-1 shows the representation scheme with 8 bits. In sign-magnitude, the lower half of the range of bit patterns represents the positive integers starting at positive 0, while the upper half represents the negative integers starting at negative 0.
Figure 3-1: Sign-magnitude representation number line; (a) the unsigned decimal value of the binary number, (b) the binary numbers, (c) the actual value the binary numbers represent in sign-magnitude representation
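A short sketch (our own functions, for illustration) that reproduces the 79 and −79 example in 8 bits:

def to_sign_magnitude(x, bits=8):
    sign = 0 if x >= 0 else 1
    return (sign << (bits - 1)) | abs(x)      # sign bit followed by magnitude bits

def from_sign_magnitude(pattern, bits=8):
    sign = pattern >> (bits - 1)
    magnitude = pattern & ((1 << (bits - 1)) - 1)
    return -magnitude if sign else magnitude

print(f"{to_sign_magnitude(79):08b}")     # 01001111
print(f"{to_sign_magnitude(-79):08b}")    # 11001111
print(from_sign_magnitude(0b10000000))    # the negative 0 pattern decodes to 0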
Every number system has two complement systems. For a given base n, the complements are the n’s complement and the (n−1)’s complement. Thus, in the decimal number system (base 10), the complement systems are the 10’s complement and the 9’s complement. Similarly, in the binary number system, the complements are the 2’s complement and the 1’s complement. Using complements in the binary number system makes subtraction, logical negation, and other arithmetic operations very simple, and calculating the complement of a binary integer is trivial.
The one's complement of a binary integer is found by inverting all 0s to 1s and all 1s to 0s. This makes the complementing operation simple at the hardware level. In one's complement integer representation, the negative of an integer is represented by its complement. For example, the one's complement representations of 16 and -16 in 8 bits are 00010000 and 11101111 respectively.
As with sign-magnitude representation, there are two representations for 0. A one's complement representation with all 0s is a positive 0, while one with all 1s is a negative 0. Representations of positive integers still begin with 0, while those of negative integers start with 1. The number line starts with positive 0, continues with the positive numbers starting at 1, then arranges the negative numbers beginning with the one of largest magnitude, and ends with negative 0. Figure 3-2 shows the arrangement of integers with one's complement representation on the number line.
[Figure: the same 00000000 through 11111111 number line, annotated with (a) the unsigned decimal values 0 through 255, (b) the binary numbers, and (c) the one's complement values +0 up to +127 followed by -127 up to -0.]
Figure 3-2: One's complement number line; (a) the unsigned decimal value of the binary number, (b) the binary numbers, (c) the actual value the binary numbers represent in one's complement representation
The two’s complement of an integer is found by adding 1 to its one’s complement. As a reminder, in
binary arithmetic, 0+1 = 1 and 1+1 = 0 with a carry of 1 to the next higher significant bit. A shortcut
method to find the two’s complement of a number is to keep all the bits up to and including the first 1
from the right and invert all the others. Two’s complement representation of 19 and -19 in 8 bits are
00010011 and 11101101 respectively.
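In C, the ~ operator inverts every bit, so both complements of an 8-bit value can be computed directly. This illustrative sketch reproduces the 19 and -19 patterns used above:

    #include <stdio.h>

    int main(void) {
        unsigned char x    = 0x13;                    /* 19 = 00010011 */
        unsigned char ones = (unsigned char)~x;       /* 11101100: one's complement of 19 */
        unsigned char twos = (unsigned char)(~x + 1); /* 11101101: two's complement of 19 */
        printf("%02X %02X\n", ones, twos);            /* prints: EC ED */
        return 0;
    }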
There is only one representation for 0, where all the bits are set to 0. The consequence is that the number of negative integers is one more than the number of positive integers. The arrangement of two's complement integers on the number line is similar to that of one's complement, except that the last value is -1 rather than -0. Figure 3-3 shows this.
[Figure: the same 00000000 through 11111111 number line, annotated with (a) the unsigned decimal values 0 through 255, (b) the binary numbers, and (c) the two's complement values 0 up to +127 followed by -128 up to -1.]
Figure 3-3: Two's complement number line; (a) the unsigned decimal value of the binary number, (b) the binary numbers, (c) the actual value the binary numbers represent in two's complement representation
Another name for excess-N representation is biased representation. Excess-N uses a pre-specified number N as a biasing value; that is, N shifts the position of the representation of 0 so that values less than the bias represent negative integers. An integer x is represented by the unsigned number x + N. Thus 0 is represented by N, and -N is represented by a bit pattern of all 0s. The value of N is usually chosen as 2ᵏ⁻¹ - 1, where k is the number of bits used for the representation.
As an example, let us represent 34 and -34 in excess-127 using 8 bits (the value 127 is chosen because 2⁸⁻¹ - 1 = 127). 34 is represented by the binary equivalent of 161, which is 10100001, because 34 + 127 = 161. By a similar argument, -34 is represented by 01011101, the binary equivalent of 93, since -34 + 127 = 93. In general, in excess-127 representation, all values less than 127 represent negative integers, 127 represents 0, and all values greater than 127 represent positive integers. On the excess-N number line, the lowest value represents -N and the largest value represents N + 1. Figure 3-4 shows the number line for the excess-127 representation.
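The biased encoding is a single addition, as the following illustrative C fragment shows (the function name excess127 is ours):

    #include <stdio.h>

    /* Excess-127 encoding of an integer in the range -127..128. */
    unsigned char excess127(int x) {
        return (unsigned char)(x + 127);  /* shift the number line by the bias */
    }

    int main(void) {
        printf("%d %d\n", excess127(34), excess127(-34)); /* prints: 161 93 */
        return 0;
    }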
Excess-N integer representation is mainly used in the representation of floating point numbers. The IEEE floating-point standard defines the exponent field of a single precision number as an 8-bit excess-127 field and that of a double precision number as an 11-bit excess-1023 field. Floating point number representation is discussed next.
[Figure: the same 00000000 through 11111111 number line, annotated with (a) the unsigned decimal values 0 through 255, (b) the binary numbers, and (c) the excess-127 values -127 up to 128, with 01111111 representing 0.]
Figure 3-4: Excess-127 number line; (a) the unsigned decimal value of the binary number, (b) the binary numbers, (c) the actual value the binary numbers represent in excess-127 representation
In mathematics, numbers are written in two common ways: either the radix point is placed at a fixed location, or the point is allowed to appear anywhere in the number with extra data indicating its actual location. With integers, the radix point is implicitly known to be at the right end of the number; this is an example of representation with the radix point at a fixed location. In science, very large and very small numbers are quite common, and writing them with the radix point at its actual position is inconvenient. The common and convenient way of writing such numbers is scientific notation. In scientific notation, a number is written with the radix point after the first nonzero digit, and the actual position of the radix point is indicated by the exponent of the number. Scientific notation thus allows the radix point to "float" anywhere in the number. This makes the representation of very small and very large numbers effective and efficient, and it allows numbers over a large range of magnitudes to be represented. Manipulation of fixed point representations is less costly than that of floating point numbers, but the range of numbers that can be represented in fixed point is quite narrow.
In computers, a representation similar to scientific notation is available and is known as floating point. Unlike scientific notation, there is no radix point character in floating point numbers. For the representation of integers, the only required information is the magnitude and the sign of the number. To represent numbers in floating point or in scientific notation we need the following information:
1. the sign of the number
2. the magnitude of the number (the significand)
3. the sign of the exponent
4. the magnitude of the exponent
5. the base of the number
Of these required pieces of information, only the first four are necessary to represent floating point numbers in computers. The base of the number is not necessary for representation because it is always 2, as the number system of computers is binary.
Among the variety of floating point number representations in computers, the IEEE 754 standard is the most widely used. It is commonly referred to as the IEEE floating point. The two formats of IEEE floating point are:
Single precision: a 32-bit wide representation with a 23-bit significand, one sign bit, and 8 bits of exponent. The accuracy of a binary number with a 23-bit significand is equivalent to the accuracy of 7 decimal digits.
Double precision: a 64-bit wide representation of which 52 bits are for the significand, one bit for the sign of the number, and 11 bits for the exponent. Double precision numbers are accurate to 16 decimal digits.
Modern high level languages have numerical data types for both the single and double precision IEEE floating point formats. For example, the float data type in C and C++ is for the single precision representation, and the data type double is for the double precision representation. Figure 3-5 shows the layout of the single and double precision formats of the IEEE floating point representation. The two formats are exactly the same except for the number of bits in each field. Notice the layout of the bits for the sign, exponent, and significand.
Figure 3-5: Single and double precision IEEE floating point formats
[Figure: single precision: sign (1 bit), exponent (8 bits), significand (23 bits); double precision: sign (1 bit), exponent (11 bits), significand (52 bits).]
It was already mentioned that to represent a floating point number, we need representations for the magnitude of the number (the significand), the sign of the number, the magnitude of the exponent, and the sign of the exponent. Of these four items, only three have a place in the IEEE floating point format: the magnitude of the exponent and its sign are combined and placed within the exponent bits. The magnitude of the number has its own place within the representation; the set of bits for the magnitude is commonly referred to as the significand (the word mantissa is also used as a synonym). The sign of the number (the sign of the significand) has a separate bit. A 0 in the sign bit shows the number is positive, while a 1 is used for negative numbers.
In line with the above discussion, sign-magnitude representation is used for the significand, while for the exponent a representation system without a separate sign bit is used. One's complement, two's complement, and excess-N all represent signed numbers without a separate sign bit. Of these three alternatives, excess-N is used in the IEEE floating point representation: the single precision format uses excess-127 and the double precision format uses excess-1023 for the exponent.
The IEEE 754 standard requires the significand to be in normal form. A number is in normal form when the first digit of the number is non-zero (to be exact, the first digit must be 1, since the only non-zero digit in the binary number system is 1). This implies that we need a special representation for 0. Since the first digit of the significand is always 1, we need not represent it explicitly, which gives us one extra bit of data for the significand. Thus, with the 23 bits for the significand in single precision, the actual number of bits of the significand's magnitude is 24, and that of double precision is 53. In other words, there is an implied 1 at the beginning of every floating point number.
The range of numbers that can be represented in single precision is roughly from 10⁻³⁸ to 10³⁸, as the range of the exponent of single precision is [-127, 127]. 2⁻¹²⁷ is roughly 10⁻³⁸ since -127 × log₁₀2 ≈ -38, and 2¹²⁷ is approximately 10³⁸. By a similar argument, the range of values of double precision is roughly from 10⁻³⁰⁸ to 10³⁰⁸. However, this range does not include all possible numbers, because some numbers cannot be represented exactly. A good example is the real number 2/3.
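The field layout can be inspected from a running program. The C sketch below (an illustration, not part of the module) copies the bits of a float into an integer and splits them into the sign, excess-127 exponent, and 23-bit significand fields:

    #include <stdio.h>
    #include <string.h>
    #include <stdint.h>

    int main(void) {
        float f = -6.5f;                 /* -110.1 in binary = -1.101 x 2^2 */
        uint32_t bits;
        memcpy(&bits, &f, sizeof bits);  /* reinterpret the 32 bits of the float */
        uint32_t sign = bits >> 31;             /* 1 bit                       */
        uint32_t exp  = (bits >> 23) & 0xFF;    /* 8 bits, excess-127          */
        uint32_t frac = bits & 0x7FFFFF;        /* 23 bits, implied leading 1  */
        printf("sign=%u exponent=%u (actual %d) significand=0x%06X\n",
               sign, exp, (int)exp - 127, frac);
        /* prints: sign=1 exponent=129 (actual 2) significand=0x500000 */
        return 0;
    }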
For systems with much I/O and little numerical computation, it is efficient to use a mechanism in which each digit of a number is represented by its binary equivalent. The BCD (Binary Coded Decimal) representation, also called packed decimal, is based on this idea. In order to have representations for the ten digits of the decimal number system, we need a four-bit string: 0 = 0000, 1 = 0001, 2 = 0010, …, and 9 = 1001. In BCD, multiples of 8 bits, in which the bits are grouped in fours, are used to represent decimal numbers. Thus the decimal number 461 is represented as 0000 0100 0110 0001.
Although BCD avoids complex conversions of binary numbers to decimal and vice versa, it is inefficient in memory usage. Of the 16 unique combinations of a 4-bit string, only 10 are used to represent the digits. Two more bit strings can be used to represent the positive and negative signs, and the remaining bit strings are wasted. Performing arithmetic in BCD is similar to that of other representations, but the circuitry needed is much more complex. Sign-magnitude as well as complement systems can be used in BCD; the complement system is either the 9's complement or the 10's complement. Since complements have the advantage of simplicity in arithmetic, they are preferred.
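Packing a decimal number into BCD is a loop of divisions by 10, as in this illustrative C function (the name to_bcd is ours):

    #include <stdio.h>
    #include <stdint.h>

    /* Illustrative packed-BCD encoder: one decimal digit per 4 bits. */
    uint32_t to_bcd(unsigned n) {
        uint32_t bcd = 0;
        int shift = 0;
        while (n > 0) {
            bcd |= (uint32_t)(n % 10) << shift;  /* place the next decimal digit */
            n /= 10;
            shift += 4;
        }
        return bcd;
    }

    int main(void) {
        /* The hex digits of the result mirror the BCD groups 0100 0110 0001. */
        printf("%03X\n", (unsigned)to_bcd(461)); /* prints: 461 */
        return 0;
    }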
3.4.2 Characters
Text documents contain strings of characters. Characters include the letters of the alphabet, the ten digits (0 through 9), punctuation marks, characters that format the layout of text on pages such as the newline, space, and tab characters, and other characters useful for communication. The most widely used character code is the International Reference Alphabet (IRA). The American version of IRA is called the American Standard Code for Information Interchange (ASCII). Even though IRA characters are stored in 8 bits, each character is represented by 7 bits; hence a total of 128 characters can be represented. The eighth bit is used for parity (error detection). The ASCII character set is given in the Appendix.
Another character encoding system, EBCDIC (Extended Binary Coded Decimal Interchange Code), is used on IBM mainframes. It uses 8 bits per character (and a ninth parity bit), and thus represents 256 characters. As with IRA, EBCDIC is compatible with BCD: in EBCDIC, the codes 11110000 through 11111001 represent the digits 0 through 9.
ASCII is a standard for use in the United States, and many countries have adapted their own versions of it. There are also 8-bit extensions of ASCII which provide 128 additional places for more characters, especially for languages based on the Latin alphabet. To allow encoding of the characters of all the languages in the world, a character set known as Unicode was devised. Unicode has encoding variants known as UTF-8, UTF-16, and UTF-32. UTF-8 encodes the first 128 characters exactly as ASCII does; UTF-16 and UTF-32 use 16-bit and 32-bit code units per character respectively, and thus incorporate many more characters. For backward compatibility with ASCII, the first 128 characters of UTF-16 and UTF-32 also correspond to those of ASCII.
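Since each character is just a small integer code, a few lines of C can print the ASCII values of a word, which is also a convenient way to check exercise 8 below. This is an illustrative sketch only:

    #include <stdio.h>

    int main(void) {
        const char *word = "Markos";
        for (const char *p = word; *p != '\0'; ++p)
            printf("%c = %d (0x%02X)\n", *p, *p, (unsigned char)*p); /* e.g., M = 77 (0x4D) */
        return 0;
    }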
2. Convert each of the following binary numbers to decimal, octal, and hexadecimal.
a) 111000 b) 1010101 c) 1110111
3. Convert each of the following octal numbers to binary, decimal, and hexadecimal.
a) 567 b) 23 c) 400
4. Convert each of the following hexadecimal numbers to binary, octal, and decimal.
a) BAC b) 24 c) 7 d) D0
5. Convert the following decimal numbers into binary and write the sign-magnitude, one's complement,
and two's complement representations. Use 16 bits for the representation.
a) 78 b) -19 c) -134 d) 62
6. Write the single precision and double precision IEEE floating point representations for the following
binary real numbers.
a) 100001000111.11110001
b) 0.00000001111
c) - 0.0010010001100000001
d) - 10000100000001111.0011
7. Write the BCD representation of the following decimal numbers.
a) 346 b) 2879
8. Write the ASCII code for the following words. Notice the differences in capitalization of the words.
▪ MARKOS
▪ Markos
▪ markos
Chapter 4
Basic Computer Organization and Design
Objective:
After completing this chapter, you will be able to understand:
Instruction codes
Computer registers and their application
Computer instructions
Timing and control
The instruction cycle
Memory-reference instructions
Input and output interrupt
Design of basic computer
Depending on the complexity and the functions that the computer performs, a computer may be classified as:
Instruction codes together with the data are stored in memory. The computer reads each instruction from
memory and places it in a control register. The control then interprets the binary code of the instruction
and proceeds to execute it by issuing a sequence of micro-operations.
An Instruction Code is a group of bits that instructs the computer to perform a specific operation. It is
usually divided into parts, each having its own particular interpretation. The most basic part of an
instruction code is its operation part.
The operation code of an instruction is a group of bits that defines such operations as Add, Subtract,
Multiply, Shift and Complement.
The operation part of an instruction code specifies the operation to be performed. This operation must be performed on some data stored in processor registers or in memory. An instruction must therefore specify not only the operation but also the registers or the memory words where the operands are to be found.
Note: The number of bits required for the operation code of an instruction depends on the total number
of operations available in the computer.
Note: Computers that have a single processor register usually call it the accumulator (AC).
Example: For a memory unit with 4096 words, we need 12 bits to specify an address (since 2¹² = 4096). If we store each instruction in one 16-bit memory word, we have 4 bits available for the operation code (opcode) to specify one out of 16 possible operations, and 12 bits to specify the address of an operand. The control reads a 16-bit instruction from the program portion of memory. It uses the 12-bit address part of the instruction to read a 16-bit operand from the data portion of memory, and it then executes the operation specified by the operation code.
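The 16-bit instruction word is thus a pair of bit fields, and it can be packed and unpacked with shifts and masks. The following C fragment is only an illustrative sketch of that layout (the helper names are ours):

    #include <stdio.h>
    #include <stdint.h>

    /* 4-bit opcode in bits 15-12, 12-bit address in bits 11-0. */
    uint16_t encode(unsigned opcode, unsigned address) {
        return (uint16_t)(((opcode & 0xF) << 12) | (address & 0xFFF));
    }
    unsigned opcode_of(uint16_t ins)  { return ins >> 12; }
    unsigned address_of(uint16_t ins) { return ins & 0xFFF; }

    int main(void) {
        uint16_t ins = encode(0x1, 0x457);  /* e.g., an ADD with address 457 (hex) */
        printf("opcode=%X address=%03X\n", opcode_of(ins), address_of(ins));
        return 0;
    }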
It is sometimes convenient to use the address bits of an instruction code not as an address but as an
actual operand. When the second part of an instruction code specifies an operand, the instruction is said
to have an immediate operand. When the second part specifies the address of an operand, the instruction
is said to have direct address. This is in contrast to a third possibility called indirect address, where
the bits in the second part of the instruction designate an address of memory word in which the address
of the operand is found.
Figure 4-2 illustrates the direct and indirect addressing modes. Note that one bit of the instruction code
is used to distinguish between a direct and an indirect address.
[Figure 4-2: Demonstration of direct and indirect addressing. The instruction format holds the indirect bit I in bit 15, the opcode in bits 14-12, and the address in bits 11-0. Direct address: the instruction ADD 457 (I = 0) stored at address 22 adds the operand found at address 457 to AC. Indirect address: the instruction ADD 300 (I = 1) stored at address 35 first reads memory word 300, which contains 1350, and then adds the operand found at address 1350 to AC.]
A register is needed in the control unit for storing the instruction code after it is read from memory. The computer also needs processor registers for manipulating data and a register for holding a memory address. These requirements dictate the register configuration shown in Figure 4-3. The registers are also listed in Table 4-1.
The memory unit has a capacity of 4096 words, and each word contains 16 bits. Twelve bits of an instruction word are needed to specify the address of an operand. This leaves three bits for the operation part of the instruction and one bit to specify a direct or indirect address. The data register (DR) holds the operand read from memory. The accumulator (AC) is a general purpose processing register. Instructions read from memory are placed in the instruction register (IR). The temporary register (TR) is used for holding temporary data during processing.
The memory address register (AR) has 12 bits since this is the width of a memory address. The program
counter (PC) also has 12 bits and it holds the address of the next instruction to read from memory after
the current instruction is executed. The PC goes through a counting sequence and causes the computer to
read sequential instructions previously stored in memory. Instruction words are read and executed in
sequence unless a branch instruction is encountered. A branch instruction calls for a transfer to a
nonconsecutive instruction in the program. The address part of a branch instruction is transferred to PC
to become the address of the next instruction. To read an instruction, the content of PC is taken as the address for memory and a memory read cycle is initiated. PC is then incremented by one, so that it holds the address of the next instruction in sequence.
[Figure 4-3: Basic computer registers and memory. AR and PC are 12 bits wide; IR, TR, DR, and AC are 16 bits; INPR and OUTR are 8 bits; the memory unit holds 4096 words of 16 bits each.]
The basic computer has eight registers, a memory unit, and a control unit. Paths must be provided to transfer information from one register to another and between memory and the registers. A separate path between each and every register would require too many wires running around the registers. A better solution is the use of a common bus. A bus is a set of common lines, one for each bit of a register, through which binary information is transferred one at a time.
Control signals determine which register is connected to the bus at a given time. The connection of the
registers and memory of the basic computer to a common bus system is illustrated in Figure 4-4.
In Figure 4-4, the outputs of seven registers and the memory are connected to a common bus. The specific output that is selected for the bus lines at any given time is determined by the binary value of the selection variables S2, S1, and S0. The number along each output shows the decimal equivalent of the required binary selection. For example, the number along the output of DR is 3; the 16-bit output of DR is placed on the bus lines when S2S1S0 = 011, since this is the binary value of the decimal number 3. The lines from the common bus are connected to the inputs of each register and to the data inputs of the memory. The particular register whose LD (load) input is enabled receives the data from the bus during the next clock pulse transition. The memory receives the content of the bus when its write input is activated, and it places its 16-bit output onto the bus when its read input is activated and S2S1S0 = 111.
[Figure 4-4: Basic computer registers connected to a common 16-bit bus. The selection inputs S2, S1, S0 determine which source drives the bus: AR = 1, PC = 2, DR = 3, AC = 4, IR = 5, TR = 6, memory = 7. Each register has LD, INR, and CLR control inputs (IR and OUTR have LD only), the 4096 x 16 memory unit has read and write inputs and receives its address from AR, and the adder and logic circuit feeds AC and the E flip-flop.]
An instruction with its three opcode bits (14-12) equal to 111 is not a memory-reference instruction. The I bit (bit 15) then distinguishes two formats, and bits 11-0 specify the operation itself:
Bits 15-12 = 0111: register operation (opcode = 111, I = 0)
Bits 15-12 = 1111: I/O operation (opcode = 111, I = 1)
The instructions for the basic computer are listed in Table 4-2. The symbol designation is a three-letter
word and represents an abbreviation intended for programmers.
Before investigating the operations performed by the instructions let us discuss the types of instructions
that must be included in a computer. A computer should have a set of instructions so that the user can
construct machine language program to evaluate any function that is known to be computable. The set of
instructions are said to be complete if the computer includes a sufficient number of instructions in each
of the following categories:
1. Arithmetic, logical, and shift instructions
2. Instructions for moving information to and from memory and processor registers
3. Program control instructions together with instructions that check status conditions
4. Input and output instructions
Table 4-2: Basic computer instructions
Symbol Hexadecimal code (I = 0) Hexadecimal code (I = 1) Description
AND 0xxx 8xxx AND memory word to AC
ADD 1xxx 9xxx Add memory word to AC
LDA 2xxx Axxx Load memory word to AC
STA 3xxx Bxxx Store content of AC in memory
BUN 4xxx Cxxx Branch unconditionally
BSA 5xxx Dxxx Branch and save return address
ISZ 6xxx Exxx Increment and skip if zero
CLA 7800 Clear AC
CLE 7400 Clear E
CMA 7200 Complement AC
CME 7100 Complement E
CIR 7080 Circulate right AC and E
CIL 7040 Circulate left AC and E
INC 7020 Increment AC
SPA 7010 Skip next instruction if AC positive
SNA 7008 Skip next instruction if AC negative
SZA 7004 Skip next instruction if AC zero
SZE 7002 Skip next instruction if E is 0
HLT 7001 Halt computer
INP F800 Input character to AC
OUT F400 Output character from AC
SKI F200 Skip on input flag
SKO F100 Skip on output flag
ION F080 Interrupt on
IOF F040 Interrupt off
Arithmetic, logical, and shift instructions provide computational capabilities for processing the type of data that the user may wish to employ. The bulk of the binary information in a digital computer is stored in memory, but all computations are done in processor registers. Therefore, the user must have the capability of moving information between these two units.
Remark: Although the set of instructions for the basic computer is complete, it is not efficient, because frequently used operations are not performed rapidly. An efficient instruction set would also include instructions such as subtract, OR, and XOR.
Each instruction requires a specified number of time steps to complete its sequence of micro-operations. Each step of the sequence is marked by a count value in the sequence counter SC. The SC outputs a string of bits whose value is in the range 0 to 2ᴸ - 1 (e.g., for L = 3, from 0 to 7). We need a way of converting this bit-string value into single-bit outputs labeled T0, T1, T2, T3, and so on, up to Tx (where x = 2ᴸ - 1). A decoder serves this purpose; recall that the output of a decoder is 1 on only one line (the rest are 0s).
[Figure: the sequence counter SC, with INC and CLR inputs, feeds an L-to-2ᴸ decoder whose outputs are the single-bit timing signals T0, T1, T2, ..., Tx.]
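In software terms, the decoder is a one-hot function of the counter value, as this illustrative C fragment suggests (the names are ours, and the counter width of 4 bits is an assumption for the example):

    #include <stdio.h>

    /* One-hot decode: bit t of the result is the timing signal Tt. */
    unsigned decode(unsigned sc) {
        return 1u << sc;   /* exactly one output line is 1 */
    }

    int main(void) {
        unsigned sc = 0;
        for (int step = 0; step < 4; ++step) {
            printf("SC=%u decoder output=%04X\n", sc, decode(sc)); /* T0, T1, T2, T3 */
            sc = (sc + 1) % 16;  /* the INR input; CLR would set sc back to 0 */
        }
        return 0;
    }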
The timing and control unit is the component that determines what the ALU should do at a given instant.
There are two kinds of control organization:
1. Hardwired control
2. Microprogrammed control
Hardwired control: The control logic is implemented with digital circuits (decoders, flip-flops, etc.). It has the advantage that it can be optimized to produce a fast mode of operation, but it requires changes in the wiring among the various components if the design has to be modified or changed.
X and Y are control variables. The load, clear and increment lines of the registers are a function of these
variables.
LD(R1) = XY (this expression can also be further reduced).
a. The control inputs of the registers LD, INR, CLR must be known
b. The bus system must have control inputs to determine which component to select
c. The current instruction must be decoded to determine what is to be done.
Upon the completion of step 4, the control goes back to step 1 to fetch, decode and execute the next
instruction. This process continues indefinitely unless a HALT instruction is encountered.
Initially, the program counter PC is loaded with the address of the first instruction in the program and
SC is cleared to 0, providing a decoded timing signal T0. After each clock pulse, SC is incremented by
one, so that the timing signals go through a sequence T0, T1, T2 and so on. Since the address lines of the
memory unit are hardwired to AR, address of the instruction to be fetched must first be placed in AR.
Thus the first register transfer in an instruction cycle should be transferring the content of PC (address of
the instruction to be brought in for execution) to AR
T0: AR ← PC
Bus selects PC, LD(AR) = 1. Next the current instruction is brought from memory into IR and PC is
incremented
T1: IR ← M[AR], PC ← PC + 1. Bus selects RAM, Memory read is set, LD(IR)=1,INR(PC)=1
Therefore, micro-operations for the fetch and decodephases can be specified by the following register
transferstatements.
T0: AR ← PC
T1: IR ← M[AR], PC ← PC + 1
T2: D0, …, D7 ← Decode IR(12-14), AR ← IR(0-11), I ← IR(15)
The decoder output Di that belongs to each memory-reference instruction is included in the table. The effective address of the instruction is in the address register AR; it was placed there during timing signal T2 when I = 0, or during timing signal T3 when I = 1. The execution of a memory-reference instruction starts with timing signal T4. The symbolic description of each instruction is specified in the table in terms of register transfer notation. The actual execution of the instruction in the bus system requires a sequence of micro-operations, because data stored in memory cannot be processed directly: the data must be read from memory into a register where they can be operated on with logic circuits.
LDA D2: AC ← M[AR]
STA D3: M[AR] ← AC
BUN D4: PC ← AR
AND to AC: This instruction performs the AND logic operation on pairs of bits in AC and the memory word specified by the effective address. The result of the operation is transferred to AC. The micro-operations that execute this instruction are:
D0T4: DR ← M[AR]
D0T5: AC ← AC ∧ DR, SC ← 0
The control function for this instruction uses the operation decoder output D0, since this output of the decoder is active when the instruction has an AND operation, whose binary opcode value is 000.
ADD to AC: This instruction adds the content of the memory word specified by the effective address to the value of AC. The sum is transferred into AC and the output carry Cout is transferred into the E (extended accumulator) flip-flop:
D1T4: DR ← M[AR]
D1T5: AC ← AC + DR, E ← Cout, SC ← 0
LDA: Load to AC: This instruction transfers the memory word specified by the effective address to AC.
D2T4: DR ← M[AR]
D2T5: AC ← DR, SC ← 0
STA: Store AC: This instruction stores the content of AC into the memory word specified by the effective address. Since the output of AC is applied to the bus and the data input of memory is connected to the bus, we can execute this instruction with one micro-operation:
D3T4: M[AR] ← AC, SC ← 0
BUN: Branch Unconditionally: This instruction transfers the program to the instruction specified by
the effective address. Remember that PC holds the address of the instruction to be read from memory in
the next instruction cycle. PC is incremented at T1 to prepare it for the address of the next instruction in
the program sequence. The BUN instruction allows the programmer to specify an instruction out of
sequence and we say that the program branches (jumps) unconditionally. The instruction is executed
with one microoperation:
D4T4: PC ← AR, SC ← 0
The effective address from AR is transferred through the common bus to PC. Resetting SC to 0 transfers
control to T0. The next instruction is then fetched and executed from the memory address given by the
new value in PC.
BSA: Branch and save Return Address: This instruction is useful for branching to a portion of the
program called subroutine or procedure. When executed, the BSA instruction stores the address of the
next instruction in sequence (which is available in PC) into a memory location specified by the
effective-address. The effective address plus one is then transferred to PC to serve as the address of the
first instruction in the subroutine.
Timing signal T4 initiates a memory write operation, places the content of PC onto the bus, and enables the INR input of AR. The memory write operation is completed and AR is incremented by the time the next clock transition occurs. The bus is then used at T5 to transfer the content of AR to PC:
D5T4: M[AR] ← PC, AR ← AR + 1
D5T5: PC ← AR, SC ← 0
ISZ: Increment and Skip if Zero: This instruction increments the word specified by the effective
address, and if the incremented value is equal to 0, PC is incremented by 1. Since it is not possible to
increment a word inside the memory, it is necessary to read the word into DR, increment DR, and store
the word back into memory. This is done with the following sequence of micro-operations:
D6T4: DR ← M[AR]
D6T5: DR ← DR + 1
D6T6: M[AR] ← DR, if (DR = 0) then (PC ← PC + 1), SC ← 0
Input-Output Configuration
The terminal sends and receives serial information. Each quantity of information has eight bits of an alphanumeric code. The serial information from the keyboard is shifted into the input register INPR, and the serial information for the printer is stored in the output register OUTR. These two registers communicate with a communication interface serially and with the AC in parallel. The input-output configuration is shown in Figure 4-7.
[Figure 4-7: Input-output configuration. A serial computer terminal (keyboard and printer) connects through transmitter and receiver interfaces to the registers INPR and OUTR, which communicate with AC in parallel; the flip-flops FGI and FGO flag the input and output status.]
The input register INPR consists of eight bits and holds alphanumeric input information. The 1-bit input
flag FGI is a control flip-flop. The flag bit is set to 1 when the new information is available in the input
device and is cleared to 0 when the information is accepted by the computer. The flag is needed to
synchronize the timing rate difference between the input devices and the computer. The process of
information transfer is as follows. Initially, the input flag (FGI) is cleared to 0. When a key is struck in
the keyboard, an 8-bit alphanumeric code is shifted into INPR and FGI is set to 1. As long as the flag is
set, the information in INPR cannot be changed by striking another key. The computer checks the flag
bit; if it is 1, the information from INPR is transferred in parallel into AC and FGI is cleared to 0. Once
the flag is cleared, new information can be shifted into INPR by striking another key. The output register
OUTR works similarly but the direction of information flow is reversed.
Input-output Instructions
Input and output instructions are needed for transferring information to and from the AC register, for checking the flag bits, and for controlling the interrupt facility. Input-output instructions have the operation code 1111 and are recognized by the control when D7 = 1 and I = 1. The remaining bits of the instruction specify the particular operation. The control functions and micro-operations for the input-output instructions are listed in Table 4-4. These operations are executed with the clock transition associated with timing signal T3. Each control function needs the Boolean relation D7IT3, which we designate for convenience by the symbol p. The control function is distinguished by one of the bits in IR(6-11). By assigning the symbol Bi to bit i of IR, all control functions can be denoted by pBi, for i = 6 through 11.
INP pB11: AC(0-7) ← INPR, FGI ← 0 (transfer input from INPR into the eight least significant bits of AC and clear FGI)
OUT pB10: OUTR ← AC(0-7), FGO ← 0 (transfer the eight least significant bits of AC to OUTR and clear FGO)
SKI pB9: if (FGI = 1) then (PC ← PC + 1) (check if the input flag FGI is set; if so, skip the next instruction)
SKO pB8: if (FGO = 1) then (PC ← PC + 1) (check if the output flag FGO is set; if so, skip the next instruction)
ION pB7: IEN ← 1 (set the interrupt enable flag IEN on)
IOF pB6: IEN ← 0 (set the interrupt enable flag IEN off)
The input-output communication described above is known as programmed control transfer. Because of the difference in the rate of information flow between I/O devices and the computer, this type of transfer is inefficient. An alternative to programmed control transfer is to use the interrupt facility. With interrupts, while the computer is running a program it does not check the status of the flags. Instead, when a flag is set, the computer is momentarily interrupted from proceeding with the current program and is informed that a flag has been set. The computer deviates momentarily from what it is doing to take care of the input or output transfer, and after serving the interrupt it returns to the task it was doing before the interrupt arrived.
When the interrupt enable flip-flop IEN is cleared to 0 by the IOF instruction, the flags cannot interrupt
the computer. When IEN is set to 1 by the ION instruction, the computer can be interrupted. The
interrupt operation is done through the branch and save return address operation. The return address
available in PC is stored in a specific location where it can be found later when the program returns to
the instruction at which it was interrupted. This location can be a processor register, a memory stack, or
a specific memory address.
The interrupt cycle is initiated if the interrupt flip-flop R is equal to 1 after the last execute phase. This flip-flop is set to 1 if IEN = 1 and either FGI or FGO is equal to 1. This can happen with any clock transition except when timing signals T0, T1, or T2 are active. The condition for setting flip-flop R to 1 can be expressed with the following register transfer statement:
T0'T1'T2'(IEN)(FGI + FGO): R ← 1
The memory unit is a standard component that can be obtained readily from a commercial source. The registers are of the types shown in the previous lesson.
C. How many bits are there in the data and address inputs of the memory?
Chapter 5
Central Processing unit
Objective:
After completing this chapter, you will be able to understand:
General register organization
Stack organization
Instruction format
Addressing modes
Data transfer and manipulations
Program control
Reduced instruction set computers
5.1 Introduction
The part of the computer that performs the bulk of data processing operations is called the central processing unit and is referred to as the CPU. The CPU is made up of three major parts, as shown in
Figure 5-1. The register set stores intermediate data used during the execution of the instructions. The
Arithmetic and Logic unit (ALU) performs the required micro-operations for executing the instructions.
The control unit supervises the transfer of information among the registers and instructs the arithmetic
and logic units as to which operation to perform.
The CPU performs a variety of functions dictated by the type of instructions incorporated in the computer. Computer architecture is sometimes defined as the structure and behavior of the computer as seen by the programmer who uses machine language instructions. This includes the instruction formats, the addressing modes, the instruction set, and the general organization of the CPU registers.
One boundary where the computer designer and the computer programmer see the same machine is the part of the CPU associated with the instruction set. From the designer's point of view, the computer instruction set provides the specifications for the design of the CPU.
[Figure 5-1: Major components of the CPU: the register set, the control unit, and the arithmetic logic unit (ALU).]
The design of a CPU is a task that in large part involves choosing the hardware for implementing the machine instructions. The user who programs the computer in machine or assembly language must be aware of the register set, the memory structure, the types of data supported by the instructions, and the function that each instruction performs.
A bus organization for seven CPU registers is shown in Figure 5-2. The output of each register is connected to two multiplexers (MUX) to form the two buses A and B. The selection lines of each multiplexer select one register, or the input data, for the particular bus. The A and B buses form the inputs to a common arithmetic logic unit.
The operation selected in the ALU determines the arithmetic or logic micro-operation to be performed. The result of the micro-operation is available as output data and also goes into the inputs of all the registers. The register that receives the information from the output bus is selected by a decoder. The decoder activates one of the register load inputs, thus providing a transfer path between the data on the output bus and the inputs of the selected destination register.
The control unit that operates the CPU bus system directs the information flow through the registers and ALU by selecting the various components in the system. For example, to perform the operation
R1 ← R2 + R3,
the control must provide binary selection variables to the following selector inputs:
1. MUX A selector (SELA): to place the content of R2 onto bus A
2. MUX B selector (SELB): to place the content of R3 onto bus B
3. ALU operation selector (OPR): to provide the arithmetic addition A + B
4. Decoder destination selector (SELD): to transfer the content of the output bus into R1
The four control selection variables are generated in the control unit and must be available at the
beginning of a clock cycle. The data from the two source registers propagates through the gates in the
multiplexers and the ALU, to the output bus, and into the inputs of the destination register, all during the
clock cycle interval. Then, when the next clock transition occurs, the binary information from the output
bus is transferred into R1. To achieve a fast response time, the ALU is constructed with high-speed circuits. The buses are implemented with multiplexers or three-state gates.
Control Word
There are 14 binary selection inputs in the unit, and their combined value specifies a control word. The 14-bit control word is illustrated in Figure 5-2(b). It consists of four fields: three fields of three bits each, and one field of five bits. The three bits of SELA select a source register for the A input of the ALU, and the three bits of SELB select a register for the B input of the ALU. The three bits of SELD select a destination register via the decoder and its seven load outputs, and the five bits of OPR select one of the operations in the ALU. The 14-bit control word, when applied to the selection inputs, specifies a particular micro-operation. The encoding of the register selection fields is specified in Table 5-1.
Table 5-1: Encoding of register selection fields
Binary code SELA SELB SELD
000 Input Input None
001 R1 R1 R1
010 R2 R2 R2
011 R3 R3 R3
100 R4 R4 R4
101 R5 R5 R5
110 R6 R6 R6
111 R7 R7 R7
The 3-bit binary code listed in the first column of Table 5-1 specifies the binary code for each of the
three fields. The register selected by fields SELA, SELB, and SELD is the one whose decimal number is
equivalent to the binary number in the code. When SELA or SELB is 000, the corresponding
multiplexer selects the external input data. When SELD=000, no destination register is selected but the
contents of the output bus are available in the external output.
The ALU provides arithmetic and logic operations. In addition, the CPU must provide shift operations. The shifter may be placed at the input of the ALU to provide a pre-shift capability, or at the output of the ALU to provide a post-shift capability. In some cases, the shift operations are included within the ALU. The encoding of the ALU operations for the CPU is specified in Table 5-2.
A control word of 14 bits is needed to specify a micro-operation in the CPU. The control word for a given micro-operation can be derived from the selection variables. For example, the subtract micro-operation given by the statement R1 ← R2 - R3 specifies R2 for the A input of the ALU, R3 for the
B input of the ALU, R1 for the destination register, and an ALU operation to subtract A - B. Thus the control word is specified by the four fields, and the corresponding binary value for each field is obtained from the encoding lists in Table 5-1 and Table 5-2. The binary control word for the subtract micro-operation is 010 011 001 00101 and is obtained as follows:
Field: SELA SELB SELD OPR
Symbol: R2 R3 R1 SUB
Control word: 010 011 001 00101
The control word for this micro-operation and a few others are listed in Table 5-3. The increment and
transfer micro-operations do not use the B input of the ALU. For these cases, the B field is marked with
a dash.
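The four fields can be assembled into the 14-bit word with shifts, as in this illustrative C sketch (the field widths follow Figure 5-2(b); the function name is ours, and the SUB code 00101 is the one used in the example above):

    #include <stdio.h>

    /* SELA (3 bits) | SELB (3 bits) | SELD (3 bits) | OPR (5 bits) */
    unsigned control_word(unsigned sela, unsigned selb, unsigned seld, unsigned opr) {
        return ((sela & 7u) << 11) | ((selb & 7u) << 8) | ((seld & 7u) << 5) | (opr & 0x1Fu);
    }

    int main(void) {
        /* R1 <- R2 - R3: SELA = R2 = 010, SELB = R3 = 011, SELD = R1 = 001, OPR = SUB = 00101 */
        printf("%04X\n", control_word(2, 3, 1, 5)); /* prints 1325, i.e. 01 0011 0010 0101 */
        return 0;
    }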
The stack in digital computers is essentially a memory unit with an address register that can only count (after an initial value is loaded into it). The register that holds the address for the stack is called the stack pointer (SP), because its value always points at the top of the stack.
The two operations on a stack are the insertion and deletion of items. The insertion operation is called PUSH (or push-down) because it can be thought of as the result of pushing a new item onto the top. The deletion operation is called POP (or pop-up) because it can be thought of as the result of removing one item so that the stack pops up. In a computer stack, however, nothing is physically pushed or popped; these operations are simulated by incrementing or decrementing the stack pointer register.
Register Stack
A stack can be placed in a portion of a large memory or it can be organized as a collection of a finite
number of memory words or registers. Figure 5-3 shows the organization of a 64-word register stack.
The stack pointer register SP contains a binary number whose value is equal to the address of the word
that is currently on top of the stack. Three items are placed in the stack: A, B and C in that order. Item C
is on top of the stack so that the content of SP is now 3. To remove the top item, the stack is popped by
reading the memory word at address 3 and decrementing the content of SP. Item B is now on top of the
stack since SP holds address 2. To insert a new item, the stack is pushed by incrementing SP and writing
a word in the next higher location in the stack. Note that item C has been read out but not physically
removed. This does not matter because when the stack is pushed, a new item is written in its place.
[Figure 5-3: A 64-word register stack with addresses 0 through 63. SP contains 3 and points at item C; item B is at address 2 and item A at address 1. The flip-flops FULL and EMTY mark the stack status, and DR holds the data to be written into or read out of the stack.]
In a 64-word stack, the stack pointer contains 6 bits, because 2⁶ = 64. Since SP has only 6 bits, it cannot hold a number greater than 63 (111111 in binary). When 63 is incremented by 1 the result is 0, since 111111 + 1 = 1000000 in binary but SP can accommodate only the six least significant bits. Similarly, when 000000 is decremented by 1 the result is 111111. The 1-bit register FULL is set to 1 when the stack is full, and the 1-bit register EMTY is set to 1 when the stack is empty. DR is the data register that holds the binary data to be written into or read out of the stack.
Initially, SP is cleared to 0, EMTY is set to 1, and FULL is cleared to 0, so that SP points to the word at address 0 and the stack is marked empty and not full. If the stack is not full (FULL = 0), a new item is inserted with the PUSH operation, which is implemented with the following sequence of micro-operations:
SP ← SP + 1
M[SP] ← DR
if (SP = 0) then (FULL ← 1)
EMTY ← 0
The stack pointer is incremented so that it points to the address of the next-higher word, and a memory write operation inserts the word from DR into the top of the stack. Note that SP holds the address of the top of the stack and that M[SP] denotes the memory word specified by the address presently held in SP. The first item stored in the stack is at address 1, and the last item is stored at address 0. If SP reaches 0, the stack is full, so FULL is set to 1. This condition is reached if the top item prior to the last push was in location 63: after incrementing SP, the last item is stored in location 0. Once an item is stored in location 0, there are no more empty registers in the stack. Since an item has just been written, the stack cannot be empty, so EMTY is cleared to 0.
An item is deleted from the stack if the stack is not empty (EMTY = 0). The POP operation consists of the following sequence of micro-operations:
DR ← M[SP]
SP ← SP - 1
if (SP = 0) then (EMTY ← 1)
FULL ← 0
The top item is read from the stack into DR. The stack pointer is then decremented. If its value reaches
0, the stack is empty, so EMTY is set to 1. This condition is reached if the item read was in location 1.
Once this item is read out, SP is decremented and reaches the value of 0, which is the initial value of SP.
Note that if a pop operation reads the item from location 0 and then SP is decremented, SP changes to
111111, which is equivalent to decimal 63. In this configuration the word in address 0 receives the last
item in the stack. Note also that an erroneous operation will result if the stack is pushed when FULL=1
or popped when EMTY=1.
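The push and pop sequences, including the FULL and EMTY updates and the 6-bit wrap-around of SP, can be mirrored in C as follows (an illustrative sketch, not hardware):

    #include <stdio.h>

    /* Illustrative 64-word register stack with FULL/EMTY flags. */
    static int stack[64];
    static unsigned sp = 0;         /* 6-bit stack pointer: arithmetic wraps mod 64 */
    static int full = 0, emty = 1;

    void push(int dr) {
        if (full) return;           /* pushing when FULL = 1 is erroneous */
        sp = (sp + 1) & 63;         /* SP <- SP + 1 (wraps like a 6-bit counter) */
        stack[sp] = dr;             /* M[SP] <- DR */
        if (sp == 0) full = 1;      /* stack is now full */
        emty = 0;
    }

    int pop(void) {
        if (emty) return 0;         /* popping when EMTY = 1 is erroneous */
        int dr = stack[sp];         /* DR <- M[SP] */
        sp = (sp - 1) & 63;         /* SP <- SP - 1 */
        if (sp == 0) emty = 1;      /* stack is now empty */
        full = 0;
        return dr;
    }

    int main(void) {
        push(10); push(20);
        printf("%d %d\n", pop(), pop()); /* prints: 20 10 */
        return 0;
    }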
Memory Stack
A stack can exist as a standalone unit, as in Figure 5-3, or it can be implemented in the random access memory attached to a CPU. The implementation of a stack in the CPU is done by assigning a portion of memory to stack operations and using a processor register as the stack pointer. Figure 5-4 shows a portion of computer memory partitioned into three segments: program, data, and stack. The program counter PC points at the address of the next instruction in the program, the address register AR points at an array of data, and the stack pointer SP points at the top of the stack.
The three registers are connected to a common address bus, and any one of them can provide an address for memory. PC is used during the fetch phase to read an instruction, AR is used during the execute phase to read an operand, and SP is used to push or pop items into or from the stack.
[Figure: a memory unit with the program segment (instructions) starting at address 1000 and pointed to by PC, the data segment (operands) starting at address 2000 and pointed to by AR, and the stack segment occupying addresses 3000 through 4001 with SP pointing at its top and DR holding the data.]
Figure 5-4: Computer memory with program, data, and stack segments
As shown in Figure 5-4, the initial value of SP is 4001 and the stack grows with decreasing addresses.
Thus the first item stored in the stack is at address 4000, the second item is stored at address 3999, and
the last address that can be used for the stack is 3000. No provisions are available for stack limit checks.
We assume that the items in the stack communicate with a data register DR. A new item is inserted with
the push operation as follows:
SP ← SP - 1
M[SP] ← DR
The stack pointer is decremented so that it points at the address of the next word. A memory write
operation inserts the word from DR into the top of the stack. A new item is deleted with the pop
operation as follows:
DR ← M[SP]
SP ← SP + 1
The top item is read from the stack into DR. The stack pointer is then incremented to point at the next
item in the stack.
Most computers do not provide hardware to check for stack overflow (a full stack) or underflow (an empty stack). The stack limits can be checked by using two processor registers: one to hold the upper limit (3000 in our example) and the other to hold the lower limit (4001 in the example). After a push operation, SP is compared with the upper-limit register, and after a pop operation, SP is compared with the lower-limit register.
The two micro-operations needed for either the push or pop are:
1. An access to memory through SP and
2. Updating SP.
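A memory stack that grows downward, together with the two-register limit check just described, looks like this in C (again only an illustrative sketch; the addresses follow the example above):

    #include <stdio.h>

    static int mem[4096];
    static int sp = 4001;                  /* stack grows toward lower addresses */
    static const int UPPER_LIMIT = 3000;   /* last usable stack address          */
    static const int LOWER_LIMIT = 4001;   /* initial SP value                   */

    int push(int dr) {
        if (sp - 1 < UPPER_LIMIT) return -1;  /* overflow: stack is full  */
        sp = sp - 1;                          /* SP <- SP - 1             */
        mem[sp] = dr;                         /* M[SP] <- DR              */
        return 0;
    }

    int pop(int *dr) {
        if (sp >= LOWER_LIMIT) return -1;     /* underflow: stack is empty */
        *dr = mem[sp];                        /* DR <- M[SP]               */
        sp = sp + 1;                          /* SP <- SP + 1              */
        return 0;
    }

    int main(void) {
        int dr;
        push(7);                              /* first item goes to address 4000 */
        if (pop(&dr) == 0) printf("%d\n", dr);
        return 0;
    }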
Computers may have instructions of several lengths containing a varying number of addresses. The number of address fields in the instruction format of a computer depends on the internal organization of its registers. Most computers fall into one of three types of CPU organizations:
1. Single accumulator organization
2. General register organization
3. Stack organization
In an accumulator-type organization, all operations are performed with an implied accumulator register. The instruction format in this type of computer uses one address field. For example, the instruction that specifies an arithmetic addition is defined by an assembly language instruction such as ADD X, where X is the address of the operand. The ADD instruction in this case results in the operation AC ← AC + M[X], where AC is the accumulator register and M[X] symbolizes the memory word located at address X.
The instruction format in a computer with a general register organization needs three register address fields. Thus, the instruction for an arithmetic addition may be written in assembly language as ADD R1, R2, R3 to denote the operation R1 ← R2 + R3. The number of address fields in the instruction can be reduced from three to two if the destination register is the same as one of the source registers. Thus the instruction ADD R1, R2 would denote the operation R1 ← R1 + R2; only the register addresses of R1 and R2 need be specified in this instruction.
Computers with multiple processor registers use a move instruction with the mnemonic MOV to symbolize a transfer instruction. Thus the instruction MOV R1, R2 denotes the transfer R1 ← R2 (or R2 ← R1, depending on the particular computer). Transfer-type instructions need two address fields to specify the source and the destination.
General register-type computers employ two or three address fields in their instruction formats. Each address field may specify a processor register or a memory word. An instruction symbolized by ADD R1, X would specify the operation R1 ← R1 + M[X]; it has two address fields, one for register R1 and the other for the memory address X.
Computers with a stack organization have PUSH and POP instructions which require an address field. Thus, the instruction PUSH X pushes the word at address X onto the top of the stack, and the stack pointer is updated automatically. Operation-type instructions do not need an address field in stack-organized computers, because the operation is performed on the two items that are on top of the stack. The instruction ADD in a stack-organized computer consists of an operation code only, with no address field. It has the effect of popping the two top numbers from the stack, adding them, and pushing the sum onto the stack. There is no need to specify operands with an address field, since all operands are implied to be on the stack.
Most computers fall into one of the three types of organizations just described, but some computers combine features from more than one organizational structure. For example, the Intel 8080 microprocessor has seven CPU registers, one of which is an accumulator register; as a consequence, the processor has some characteristics of a general register type and some of an accumulator type. All arithmetic and logic instructions, as well as the load and store instructions, use the accumulator register, so these instructions have only one address field. On the other hand, instructions that transfer data among the seven processor registers have a format that contains two register address fields.
To illustrate the influence of the number of addresses on computer programs, we will evaluate the arithmetic statement X = (A + B) ∗ (C + D) using zero-, one-, two-, and three-address instructions. We will use the symbols ADD, SUB, MUL, and DIV for the four arithmetic operations, and LOAD and STORE for transfers between memory and the AC register. We will assume that the operands are at memory addresses A, B, C, and D, and that the result must be stored in memory at address X.
Three-address Instruction
Computers with three-address instruction formats can use each address field to specify either a processor
register or a memory operand. The program in assembly language that evaluates X = (A+B) (C+D) is
shown below, together with comments that explain the register transfer operation of each instruction.
ADD R1, A, B    R1 ← M[A] + M[B]
ADD R2, C, D    R2 ← M[C] + M[D]
MUL X, R1, R2   M[X] ← R1 ∗ R2
It is assumed that the computer has two processor registers, R1 and R2. The symbol M[A] denotes the
operand at memory address symbolized by A. The advantage of three-address format is that it results in
short programs when evaluating arithmetic expressions. The disadvantage is that the binary-coded
instructions require too many bits to specify three addresses.
Two-Address Instruction
Two-address instructions are common in commercial computers. Here again each address field can specify either a processor register or a memory word. The program to evaluate X = (A + B) ∗ (C + D) is:
MOV R1, A    R1 ← M[A]
ADD R1, B    R1 ← R1 + M[B]
MOV R2, C    R2 ← M[C]
ADD R2, D    R2 ← R2 + M[D]
MUL R1, R2   R1 ← R1 ∗ R2
MOV X, R1    M[X] ← R1
The MOV instruction moves or transfers the operands to and from memory and processor registers. The first symbol listed in an instruction is assumed to be both a source and the destination where the result of the operation is transferred.
One-Address Instruction
One-address instructions use an implied accumulator (AC) register for all data manipulations. For
multiplication and division there is a need for a second register. However, here we will neglect the
second register and assume that the AC contains the result of all operations. The program to evaluate X
= (A+B) (C+D) is:
LOAD A     AC ← M[A]
ADD B      AC ← AC + M[B]
STORE T    M[T] ← AC
LOAD C     AC ← M[C]
ADD D      AC ← AC + M[D]
MUL T      AC ← AC ∗ M[T]
STORE X    M[X] ← AC
All operations are done between the AC register and a memory operand. T is the address of a temporary
memory location required for storing the intermediate result.
Zero-Address Instruction
A stack-organized computer does not use an address field for the instructions ADD and MUL. The PUSH and POP instructions, however, need an address field to specify the operand that communicates with the stack. The following program shows how X = (A + B) ∗ (C + D) is written for a stack-organized computer (TOS stands for top of stack):
PUSH A    TOS ← A
PUSH B    TOS ← B
ADD       TOS ← (A + B)
PUSH C    TOS ← C
PUSH D    TOS ← D
ADD       TOS ← (C + D)
MUL       TOS ← (C + D) ∗ (A + B)
POP X     M[X] ← TOS
RISC Instructions
RISC stands for reduced instruction set computer. The instruction set of a typical RISC processor is restricted to the use of load and store instructions when communicating between memory and the CPU; all other instructions are executed within the registers of the CPU without referring to memory. A program for a RISC-type CPU consists of LOAD and STORE instructions that have one memory and one register address, and computational-type instructions that have three addresses, all specifying processor registers. The following is a program to evaluate X = (A + B) ∗ (C + D):
LOAD R1, A       R1 ← M[A]
LOAD R2, B       R2 ← M[B]
LOAD R3, C       R3 ← M[C]
LOAD R4, D       R4 ← M[D]
ADD R1, R1, R2   R1 ← R1 + R2
ADD R3, R3, R4   R3 ← R3 + R4
MUL R1, R1, R3   R1 ← R1 ∗ R3
STORE X, R1      M[X] ← R1
The load instructions transfer the operands from memory to CPU registers. The add and multiply
operations are executed with the data in the registers without accessing memory. The result of the
computations is stored in memory with the store instruction.
Computers use addressing mode techniques to accommodate one or both of the following provisions:
1. To give programming versatility to the user by providing such facilities as pointers to memory, counters for loop control, indexing of data, and program relocation.
2. To reduce the number of bits in the addressing field of the instruction.
The availability of addressing modes gives the experienced assembly language programmer flexibility for writing programs that are more efficient with respect to the number of instructions and execution time. To understand the various addressing modes presented in this section, it is important to understand the basic operation cycle of the computer. The control unit of a computer is designed to go through an instruction cycle that is divided into three major phases:
1. Fetch the instruction from memory.
2. Decode the instruction.
3. Execute the instruction.
There is one register in the computer called the program counter (PC) that keeps track of the instructions in the program stored in memory. PC holds the address of the instruction to be executed next and is incremented each time an instruction is fetched from memory. The decoding done in step 2 determines the operation to be performed, the addressing mode of the instruction, and the location of the operand. The computer then executes the instruction and returns to step 1 to fetch the next instruction in sequence.
In some computers the addressing mode of the instruction is specified with a distinct binary code, just as the operation code is. Other computers use a single binary code that designates both the operation and the mode of the instruction. Instructions may be defined with a variety of addressing modes, and sometimes two or more addressing modes are combined in one instruction.
An example of an instruction format with a distinct addressing mode field is shown in Figure 5-6. The
operation code specifies the operation to be performed. The mode field is used to locate the operands
needed for the operation. There may or may not be an address field in the instruction. If there is an
address field, it may designate a memory address or a processor register. Moreover, the instruction may
have more than one address field, and each address field may be associated with its own particular
addressing mode.
Although most addressing modes make use of the address field of the instruction, there are two modes
that need no address field at all: the implied and immediate modes.
Implied Mode: In this mode the operands are specified implicitly in the definition of the instruction. For
example, the instruction “complement accumulator” is an implied mode instruction because the operand
in the accumulator register is implied in the definition of the instruction. In fact, all register reference
instructions that use an accumulator are implied mode instructions. Zero-address instructions in stack-
organized computers are implied mode instructions, since the operands are implied to be on top of the stack.
Immediate Mode: In this mode the operand is specified in the instruction itself. In other words, an
immediate mode instruction has an operand field rather than an address field. The operand field contains
the actual operand to be used in conjunction with the operation specified in the instruction. Immediate
mode instructions are useful for initializing registers to a constant value. When the address field
specifies a processor register, the instruction is said to be in the register mode.
Register Mode: In this mode the operands are in registers that reside within the CPU. The particular
register is selected from a register field in the instruction. A k-bit field can specify any of the 2^k
registers.
Register Indirect Mode: In this mode the instruction specifies a register in the CPU whose contents
give the address of the operand in memory. In other words, the selected register contains the address of
the operand rather than the operand itself. Before using a register indirect mode instruction, the
programmer must ensure that the memory address of the operand is placed in the processor register with
a previous instruction. A reference to the register is then equivalent to specifying a memory address. The
advantage of the register indirect mode instruction is that the address field of the instruction uses fewer
bits to select a register than would have been required to specify a memory address directly.
Autoincrement and Autodecrement Mode: This is similar to the register indirect mode except that
the register is incremented or decremented after (or before) its value is used to access memory. When
the address stored in the register refers to a table of data in memory, it is necessary to increment or
decrement the register after every access to the table. The address field of an instruction is used by the
control unit in the CPU to obtain the operand from memory.
Sometimes the value given in the address field is the address of the operand, but sometimes it is just an
address from which the address of the operand is calculated. To differentiate among the various
addressing mode it is necessary to distinguish between the address part of the instruction and the
effective address used by the control when executing the instruction. The effective address is defined to
be the memory address obtained from the computation dictated by the given addressing mode. The
effective address is the address of the operand in a computational type instruction. It is the address
where control branches in response to a branch type instruction.
Direct Addressing Mode: In this mode the effective address is equal to the address part of the
instruction. The operand resides in memory and its address is given directly by the address field of the
instruction. In a branch type of instruction the address field specifies the actual branch address.
Indirect Addressing Mode: In this mode the address field of the instruction gives the address where
the effective address is stored in memory. Control fetches the instruction from memory and uses its
address part to access memory again to read the effective address.
Relative Addressing Mode: In this mode the content of the program counter is added to the address part
of the instruction in order to obtain the effective address. The address part of the instruction is usually a
signed number (in 2’s complement representation) which can be either negative or positive. When this
number is added to the content of the program counter, the result produces an effective address whose
position in memory is relative to the address of the next instruction.
Indexed Addressing Mode: In this mode the content of an index register is added to the address part of
the instruction to obtain the effective address. The address field of the instruction defines the beginning
address of a data array in memory. Each operand in the array is stored in memory relative to the
beginning address. The distance (offset) between the beginning address and the address of the operand is
the index value stored in the index register. In computers with a dedicated index register, the index
register is involved implicitly in index mode instructions.
Base Register Addressing Mode: In this mode the content of a base register is added to the address part
of the instruction to obtain the effective address. It is similar to the indexed addressing. The difference is
that the base register contains the base address and the address field of the instruction contains the
displacement (offset) relative to this base address. In contrast, in indexing mode, the index register
contains the offset relative to the address part of the instruction.
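As an illustration of how the effective address is formed under different modes, the short C sketch below
computes it for the direct, indirect, relative, and indexed modes. The memory array, register values, and
function names are illustrative assumptions only, not part of any real instruction set.

#include <stdint.h>
#include <stdio.h>

enum Mode { DIRECT, INDIRECT, RELATIVE, INDEXED };

uint16_t memory[65536];   /* word-addressable main memory */
uint16_t pc = 0x0200;     /* program counter (address of the next instruction) */
uint16_t xr = 0x0010;     /* index register */

uint16_t effective_address(enum Mode mode, int16_t addr_field) {
    switch (mode) {
    case DIRECT:   return (uint16_t)addr_field;          /* EA = address field */
    case INDIRECT: return memory[(uint16_t)addr_field];  /* EA = M[address field] */
    case RELATIVE: return (uint16_t)(pc + addr_field);   /* EA = PC + address field */
    case INDEXED:  return (uint16_t)(xr + addr_field);   /* EA = XR + address field */
    }
    return 0;
}

int main(void) {
    memory[0x0300] = 0x0456;   /* a pointer stored at location 0x0300 */
    printf("direct   EA = %04X\n", effective_address(DIRECT,   0x0300)); /* 0300 */
    printf("indirect EA = %04X\n", effective_address(INDIRECT, 0x0300)); /* 0456 */
    printf("relative EA = %04X\n", effective_address(RELATIVE, 0x0020)); /* 0220 */
    printf("indexed  EA = %04X\n", effective_address(INDEXED,  0x0300)); /* 0310 */
    return 0;
}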
The load instruction has been used mostly to designate a transfer from memory to a processor register,
usually an accumulator. The store instruction designates a transfer from a processor register into
memory. The move instruction has been used in computers with multiple CPU registers to designate a
transfer from one register into another. It has been used for data transfers between CPU registers and
memory or between two memory words. The exchange instruction swaps information between two
registers or a register and a memory word. The Input and output instructions transfer data among
processor register and input or output terminals. The push and pop instructions transfer data between
processor registers and memory stack. Some assembly language conventions modify the mnemonic
symbol to differentiate between the different addressing modes. For example the mnemonic for the
loadimmediate becomes LDI. Other assembly language conventions use a special character to designate
the addressing mode. For example, the immediate mode is recognized from pound sign # placed before
the operand. In any case, the important thing to recognize is that each instruction can occur with a
variety of addressing modes. Table 5-6 lists the addressing mode for the load instruction.
Register      LD R1      AC ← R1
Data Manipulation
Data manipulation instructions perform operations on data and provide the computational capabilities
for the computer. The data manipulation instructions in a typical computer are usually divided into three
basic types.
1. Arithmetic instructions
2. Logical and bit manipulation instructions
3. Shift instructions
Arithmetic Instruction
The four basic arithmetic operations are addition, subtraction, multiplication and division. Most
computers provide instructions for all four operations. Some small computers have only addition and
possibly subtraction instructions. The multiplication and division must then be generated by means of
software subroutines. The four basic arithmetic operations are sufficient for formulating solution to
scientific problems when expressed in terms of numerical analysis methods.
A list of typical arithmetic instructions is given in Table 5-7. The increment instruction adds 1 to the
value stored in a register or memory word. One common characteristic of the increment operations when
executed in processor registers is that a binary number of all 1’s when incremented produces a result of
all 0’s. The decrement instruction subtracts 1 from a value stored in a register or memory word. A
number with all 0’s, when decremented, produces a number with all 1’s. The add, subtract, multiply and
divide instructions may be available for different types of data. The data type assumed to be in processor
registers during the execution of these arithmetic operations is included in the definition of the operation
code.
Table 5-7: Typical arithmetic instructions
Name                        Mnemonic
Increment INC
Decrement DEC
Add ADD
Subtract SUB
Multiply MUL
Divide DIV
Add with Carry ADDC
Subtract with borrow SUBB
Negate(2’s complement) NEG
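The wraparound behavior of increment and decrement described above can be demonstrated with a short
C fragment, a minimal sketch assuming an 8-bit register modeled as uint8_t:

#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint8_t r = 0xFF;        /* a register holding all 1's */
    r = r + 1;               /* increment: all 1's wraps to all 0's */
    printf("%02X\n", r);     /* prints 00 */
    r = r - 1;               /* decrement: all 0's wraps to all 1's */
    printf("%02X\n", r);     /* prints FF */
    return 0;
}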
The clear instruction causes the specified operand to be replaced by 0's. The complement instruction
produces the 1's complement by inverting all the bits of the operand. The AND, OR, and XOR
instructions produce the corresponding logical operations on individual bits of the operands. Although
they perform Boolean operations, when used in computer instructions, the logical instructions should be
considered as performing bit manipulation operations. There are three bit manipulations: a selected bit
can be cleared to 0, set to 1, or complemented.
The AND instruction is used to clear a bit or a selected group of bits of an operand. For any Boolean
variable x, the relationships x ∙ 0 = 0 and x ∙ 1 = x dictate that a binary variable ANDed with 0 produces a
0, but the variable does not change in value when ANDed with a 1. Therefore, the AND instruction
can be used to clear bits of an operand selectively by ANDing the operand with another operand that
has 0's in the bit positions that must be cleared. The AND instruction is also called a mask because it
masks or inserts 0's in a selected portion of the operand.
The OR instruction is used to set a bit or a selected group of bits of an operand. For any Boolean
variable x, the relationships x + 1 = 1 and x + 0 = x dictate that a binary variable ORed with a 1
produces a 1, but the variable does not change when ORed with a 0. Therefore, the OR
instruction can be used to selectively set the bits of an operand by ORing it with another operand that
has 1's in the bit positions that must be set to 1.
Similarly, the XOR instruction is used to selectively complement bits of an operand. This is because of
the Boolean relationships x ⊕ 1 = x′ and x ⊕ 0 = x. Thus a binary variable is complemented when
XORed with a 1 but does not change in value when XORed with a 0.
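The three bit-manipulation uses of AND, OR, and XOR can be seen in the following C sketch; the operand
value and masks are illustrative:

#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint8_t op = 0xB5;                 /* 1011 0101 */
    printf("%02X\n", op & 0x0F);       /* AND with a mask clears the upper 4 bits: 05 */
    printf("%02X\n", op | 0xF0);       /* OR sets the upper 4 bits: F5 */
    printf("%02X\n", op ^ 0xFF);       /* XOR with 1's complements every bit: 4A */
    return 0;
}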
Shift Instruction
Instructions to shift the content of an operand are quite useful and are often provided in several
variations. Shifts are operations in which the bits of a word are moved to the left or right. The bit shifted
in at the end of the word determines the type of shift used. Shift instructions may specify either logical
shifts, arithmetic shifts, or rotate type operations. In either case the shift may be to the right or to the left.
Table 5-9 lists four types of shift instructions. The logical shift inserts 0 to the end bit position. The end
position is the leftmost bit for shift right and the rightmost bit position for the shift left. Arithmetic shifts
usually conform with the rules for signed 2’s complement. The arithmetic shift right instruction must
preserve the sign bit in the left most position. The sign bit is shifted to the right together with the rest of
the number, but the sign bit itself remains unchanged. This is a shift right operation with the end bit
remaining the same. The arithmetic shift-left instruction inserts 0 to the end position and is identical to
the logical shift-left instruction. For this reason many computers do not provide a distinct arithmetic
shift-left instruction when the logical shift-left instruction is already available.
The rotate instructions produce a circular shift. Bits shifted out at one end of the word are not lost as in
logical shift but are circulated back into the other end. The rotate through carry instruction treats a carry
bit as an extension of the register whose word is being rotated. Thus a rotate–left through carry
instruction transfers the carry bit into the right most bit position of the register, transfers the left most bit
position into the carry, and at the same time shifts the entire register to the left.
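The following C sketch models the shift and rotate types just described on an 8-bit word; the carry flag is
modeled as a separate variable, and all names are illustrative:

#include <stdint.h>
#include <stdio.h>

uint8_t shl(uint8_t w)  { return (uint8_t)(w << 1); }        /* logical shift left: 0 enters LSB */
uint8_t shr(uint8_t w)  { return w >> 1; }                   /* logical shift right: 0 enters MSB */
uint8_t ashr(uint8_t w) { return (w & 0x80) | (w >> 1); }    /* arithmetic shift right: sign preserved */
uint8_t rol(uint8_t w)  { return (uint8_t)((w << 1) | (w >> 7)); } /* rotate left: MSB wraps to LSB */

/* rotate left through carry: old carry enters LSB, old MSB becomes the new carry */
uint8_t rolc(uint8_t w, int *carry) {
    int new_carry = (w >> 7) & 1;
    uint8_t r = (uint8_t)((w << 1) | (*carry & 1));
    *carry = new_carry;
    return r;
}

int main(void) {
    int c = 1;
    printf("%02X\n", shl(0x81));   /* 02: MSB is lost, 0 enters at the right */
    printf("%02X\n", shr(0x81));   /* 40: 0 enters at the left */
    printf("%02X\n", ashr(0xF0));  /* F8: sign bit replicated */
    printf("%02X\n", rol(0x81));   /* 03: MSB wraps around to the LSB */
    uint8_t r = rolc(0x80, &c);    /* carry enters LSB; MSB becomes the new carry */
    printf("%02X carry=%d\n", r, c); /* 01 carry=1 */
    return 0;
}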
Some computers have a multiple field format for the shift instructions. One field contains the operation
code and the others specify a type of shift and the number of times that an operand is to be shifted. A
possible instruction code format of a shift instruction may include five fields as follows:
OP | REG | TYPE | RL | COUNT
Here OP is the operation code field; REG is a register address that specifies the location of an operand;
TYPE is a 2-bit field specifying the four different types of shifts; RL is a 1-bit field specifying a shift-
right or shift-left; and COUNT is a k-bit field specifying up to 2^k − 1 shifts. With such a format, it is
possible to specify the type of shift, the direction, and the number of shifts, all in one instruction.
CISC and RISC
Many computers provide instruction sets that contain a large number of
instructions. These computers also employ a variety of data types and a large number of addressing
modes. The trend toward computer hardware complexity was influenced by various factors, such as
upgrading existing models to provide more customer applications, adding instructions that facilitate the
translation from high-level languages into machine language programs, and striving to develop machines
that move functions from software implementation into hardware implementation. A computer with a
large number of instructions is classified as a complex instruction set computer, abbreviated as CISC.
In the early 1980’s, a number of computer designers recommended that computers use fewer
instructions with simple constructs so they can be executed much faster within the CPU without having
to use memory as often. This type of computer is classified as a Reduced Instruction Set Computer
(RISC).
CISC characteristics
large number of instructions – typically from 100 to 250 instructions
some instructions that perform specialized tasks and are used infrequently
a large variety of addressing modes – typically from 5 to 20 different modes
variable length instruction formats
instructions that manipulate operands in memory
RISC characteristics
relatively few instructions
relatively few addressing modes
memory access limited to load and store instructions
all operations done within the registers of the CPU
fixed length, easily decoded instruction format
single cycle instruction execution
hard-wired rather than micro-programmed control
a relatively large number of registers in the processor
use overlapped register windows to speed up procedure call and return
efficient instruction pipeline
compiler support for efficient translation of high-level language programs into machine language
programs
Review Questions
a) How many multiplexers are there in the A bus, and what is the size of each
multiplexer?
b) How many selection inputs are needed for a MUX-A and MUX-B?
c) How many inputs and outputs are there in the decoder?
d) How many inputs and outputs are there in the ALU for data, including inputs and
outputs carries?
e) Formulate your control word for the system assuming that the ALU has 35
operations.
2. A computer has 32-bit instructions and 12-bit addresses. If there are 250 2-address instructions,
how many one-address instructions can be formulated?
3. Give examples of RISC instructions that will perform the following operations:
A. Decrement a register C. Negate a register E. Divide a signed number by 4
B. Complement a register D. Clear a register to 0 F. No operation
Chapter 6
Parallel Processing
Objective:
After completing this chapter you will be familiar with:
The concept of parallel processing
Pipelining
Variety of ways through which pipelining processing can be accomplished
Vector processing
Array processor
A symmetric multiprocessor (SMP) consists of multiple similar processors within the same computer,
interconnected by a bus or some sort of switching arrangement. Each processor has its own cache, and so
it is possible for a given line of data to be present in more than one cache.
When more than one processor is implemented on a single chip, the configuration is referred to as a
multiprocessor system. This design scheme replicates some of the components of a single
processor so that the processor can execute multiple threads/processes/tasks concurrently.
If this view of the computer is taken down to the micro-operation level, we see that multiple control
signals are generated at the same time. Instruction pipelining, at least to the extent of overlapping fetch and
execute operations, has been around for a long time. Both of these are examples of performing
functions in parallel. This approach is taken further with superscalar organization, which exploits
instruction-level parallelism. With a superscalar machine, there are multiple execution units within a
single processor, and these may execute multiple instructions from the same program in parallel. As
computer technology has evolved, and as the cost of computer hardware has dropped, computer
designers have sought more and more opportunities for parallelism.
Parallel processing or parallel computing is a form of computation in which many calculations are
carried out simultaneously, operating on the principle that large problems can often be divided into
smaller ones, which are then solved concurrently ("in parallel").
Figure 6-1 shows one possible way of separating the execution unit into eight functional units operating
in parallel. The operands in the registers are applied to one of the units depending on the operation
specified by the instruction associated with the operands. The operation performed in each functional
unit is indicated in each block of the diagram. The adder and integer multiplier perform the arithmetic
operations with integer numbers. The floating point operations are separated into three circuits operating
in parallel.
Figure 6-1: Processor with multiple functional units operating in parallel (adder-subtractor, integer
multiply, logic unit, incrementer, floating-point add-subtract, floating-point multiply, floating-point
divide)
The remainder of this chapter discusses:
o Pipeline processing
o Instruction Pipeline
o Arithmetic pipeline
o RISC pipeline
o Vector processing
o Array processors
6.2 Pipelining
Pipelining is an implementation technique where multiple instructions are overlapped in execution. The
computer pipeline is divided into stages. Each stage completes a part of an instruction in parallel. The
stages are connected one to the next to form a pipe: instructions enter at one end, progress through the
stages, and exit at the other end.
Pipelining refers to the technique in which a given task is divided into a number of subtasks that need to
be performed in sequence. Each subtask is performed by a given functional unit. The units are connected
in a serial fashion and all of them operate simultaneously. The use of pipelining improves the
performance compared to the traditional sequential execution of tasks.
In general, pipelining is a technique of decomposing a sequential process into suboperations, with each
subprocess being executed in a special dedicated segment that operates concurrently with all other
segments. Each segment performs partial processing dictated by the way the task is partitioned. The result
obtained from the computation in each segment is transferred to the next segment in the pipeline. The
final result is obtained after the data have passed through all segments.
Example: Suppose that we want to perform the combined multiply and add operations with a stream of
numbers. In Figure 6-2, R1 through R5 are registers that receive new data with every clock pulse. The
task is to compute
Ai × Bi + Ci , for i = 1, 2, 3, …, 7
The suboperations performed in each segment are:
R1 ← Ai , R2 ← Bi         Input Ai and Bi
R3 ← R1 × R2, R4 ← Ci     Multiply and input Ci
R5 ← R3 + R4              Add Ci to product
Figure 6-2: Example of pipeline processing (Ai and Bi enter R1 and R2, which feed a multiplier into R3;
Ci enters R4; an adder combines R3 and R4 into R5)
Table 6-2 shows the effect of each clock. The first clock pulse transfers A1 and B1 into R1 and R2. The
second clock pulse transfers the product of R1 and R2 into R3 and C1 into R4. The same clock pulse
transfers A2 and B2 into R1 and R2. The third clock pulse operates on all three segments simultaneously:
it places A3 and B3 into R1 and R2, transfers the product of R1 and R2 into R3, transfers C2 into R4, and
places the sum of R3 and R4 into R5.
Any operation that can be decomposed into a sequence of suboperations of about the same complexity
can be implemented by a pipeline processor. The technique is efficient for those applications that need to
repeat the same task many times with different sets of data. A task is the total operation performed going
through all segments of a pipeline.
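A small C simulation of the three-segment pipeline of Figure 6-2 makes the overlap concrete. The data
values are arbitrary, and the segments are updated back to front so that each segment uses the previous
clock pulse's register contents:

#include <stdio.h>

int main(void) {
    int A[] = {1,2,3,4,5,6,7}, B[] = {1,2,3,4,5,6,7}, C[] = {1,1,1,1,1,1,1};
    int n = 7;
    int R1 = 0, R2 = 0, R3 = 0, R4 = 0, R5 = 0;
    /* n + 2 clock pulses: two extra pulses are needed to drain the pipe */
    for (int t = 0; t < n + 2; t++) {
        if (t >= 2) R5 = R3 + R4;                          /* segment 3: add */
        if (t >= 1 && t - 1 < n) { R3 = R1 * R2; R4 = C[t-1]; } /* segment 2: multiply, input Ci */
        if (t < n) { R1 = A[t]; R2 = B[t]; }               /* segment 1: input Ai and Bi */
        if (t >= 2) printf("clock %d: R5 = %d\n", t + 1, R5); /* first result on the third pulse */
    }
    return 0;
}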
6.2.1 Arithmetic Pipeline
Pipeline organization is applicable for arithmetic operations and fetching instructions. The principles
used in instruction pipelining can be used in order to improve the performance of computers in
performing arithmetic operations such as add, subtract, and multiply. In this case, these principles will
be used to realize the arithmetic circuits inside the ALU. In this section, we will elaborate on the use of
arithmetic pipeline as a means to speed up arithmetic operations. We will start with fixed-point
arithmetic operations and then discuss floating-point operations.
Pipeline arithmetic units are usually found in very high speed computers. They are used to implement
floating-point operations, multiplication of fixed-point numbers, and similar computations encountered
in scientific problems. An example for floating-point addition and subtraction is shown below. The inputs
are two normalized floating-point binary numbers X = A × 2^a and Y = B × 2^b, where A and B are the
mantissas and a and b are the exponents. The operation is carried out in four segments: segment 1
compares the exponents, segment 2 aligns the mantissas, segment 3 adds or subtracts the mantissas, and
segment 4 normalizes the result.
For example, with X = 0.9504 × 10^3 and Y = 0.8200 × 10^2, segment 1 finds the exponent difference
3 − 2 = 1, segment 2 aligns the mantissa of Y to give 0.0820 × 10^3, and segment 3 adds the mantissas to
give Z = 1.0324 × 10^3. Segment 4 normalizes the result by shifting the mantissa once to the right and
incrementing the exponent by one to obtain Z = 0.10324 × 10^4.
6.2.2 Instruction Pipeline
Pipeline processing can occur not only in the data stream but also in the instruction stream. An
instruction pipeline reads consecutive instructions from memory while previous instructions are being
executed in other segments. This causes the instruction fetch and execute phases to overlap and perform
simultaneous operations.
Consider a computer with an instruction fetch unit and an instruction execution unit forming a two-
segment pipeline. A FIFO (First In First Out) buffer can be used for the fetch segment. Thus, an
instruction stream can be placed in a queue, waiting for decoding and processing by the execution
segment. This reduces the average access time to memory for reading instructions. Whenever there is
space in the buffer, the control unit initiates the next instruction fetch phase. The following steps are
needed to process each instruction:
1. Fetch the instruction from memory
2. Decode the instruction
3. Calculate the effective address
4. Fetch the operands from memory
5. Execute the instruction
6. Store the result in the proper place
Assume that most of the instructions store the result in a register so that the execution and storing of the
result can be combined in one segment. This reduces the instruction pipeline into four segments. Figure
6-4 shows how the instruction cycle in the CPU can be processed with a four segment pipeline. While an
instruction is being executed in segment 4, the next instruction in sequence is busy fetching an operand
from memory in segment 3. Up to four suboperations in the instruction cycle can overlap and up to four
different instructions can be in progress of being processed at the same time.
Figure 6-4: Four-segment instruction pipeline (Segment 1: fetch instruction from memory; Segment 2:
decode instruction and calculate effective address; Segment 3: fetch operand from memory; Segment 4:
execute instruction; with interrupt handling, PC update, and emptying of the pipe on an interrupt)
There are three major difficulties that cause an instruction pipeline to deviate from its normal operation:
1. Resource conflicts caused by access to memory by two segments at the same time.
2. Data dependency conflicts arise when an instruction depends on the result of a previous
instruction, but this result is not yet available.
3. Branch difficulties arise from program control instructions that may change the value of PC.
Methods to handle data dependency:
Hardware interlocks are circuits that detect instructions whose source operands are destinations of
prior instructions. Detection causes the hardware to insert the required delays without altering the
program sequence.
Operand forwarding uses special hardware to detect a conflict and then avoid it by routing the
data through special paths between pipeline segments. This requires additional hardware paths
through multiplexers as well as the circuit to detect the conflict.
Delayed load is a procedure that gives the responsibility for solving data conflicts to the compiler.
The compiler is designed to detect a data conflict and reorder the instructions as necessary to delay
the loading of the conflicting data by inserting no-operation instructions.
Methods to handle branch instructions:
Prefetching the target instruction in addition to the next instruction allows either instruction to
be available.
A branch target buffer is an associative memory included in the fetch segment of the pipeline
that stores the target instruction for a previously executed branch. It also stores the
next few instructions after the branch target instruction. This way, branch instructions that
have occurred previously are readily available in the pipeline without interruption.
Branch prediction uses some additional logic to guess the outcome of a conditional branch
instruction before it is executed. The pipeline then begins prefetching instructions from the
predicted path.
Delayed branch is used in most RISC processors so that the compiler rearranges the instructions
to delay the branch.
6.2.3 RISC Pipeline
A RISC (Reduced Instruction Set Computer) processor pipeline operates in much the same way,
although the stages in the pipeline are different. While different processors have different numbers of
steps, they are basically variations of these five steps:
1. fetch instructions from memory
2. read registers and decode the instruction
3. execute the instruction or calculate an address
4. access an operand in data memory
5. write the result into a register
The length of the pipeline is dependent on the length of the longest step. Because RISC instructions are
simpler than those used in pre-RISC processors (now called CISC, or Complex Instruction Set
Computer), they are more conducive to pipelining. While CISC instructions varied in length, RISC
instructions are all the same length and can be fetched in a single operation. Ideally, each of the stages in
a RISC processor pipeline should take 1 clock cycle so that the processor finishes an instruction each
clock cycle and averages one cycle per instruction (CPI).
Among the characteristics attributed to RISC is its ability to use an efficient instruction pipeline. The
simplicity of the instruction set can be utilized to implement an instruction pipeline using small number
of suboperations, with each being executed in one clock cycle.
6.3 Vector Processing
The part of a computer that allows it to function, carrying out the instructions of various programs, is the
central processing unit (CPU). The CPU, also called a processor, receives a program's instructions;
decodes those instructions, breaking them into individual parts; executes those instructions; and reports
the results, writing them back into memory. The format for that processor comes in one of two primary
types: vector and scalar. The difference between the two is that scalar processors operate on only one
data point at a time, while vector processors operate on an array of data.
Many scientific problems require arithmetic operations on large arrays of numbers. These numbers are
usually formulated as vectors and matrices of floating point numbers. A vector is an ordered set of a one
dimensional array of data items. A vector V of length n is represented as a row vector by V = [V1, V2,
V3, …, Vn]. It may be represented as a column vector if the data items are listed in a column.
A computer capable of vector processing eliminates the overhead associated with the time it takes to
fetch and execute the instructions in the program. Vector processing allows operations to be specified
with a single vector instruction of the form:
C (1:100) = A (1:100) + B (1:100)
This vector instruction includes the initial address of the operands, the length of the vectors, and the
operation to be performed, all in one composite instruction. The addition is done with a pipelined
floating-point adder. Matrix multiplication is one of the most computation-intensive operations
performed in computers with vector processors.
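The effect of the vector statement C(1:100) = A(1:100) + B(1:100) can be written in C as a plain loop.
On a scalar processor each element requires a separate instruction, whereas a vector processor specifies
the whole loop in one composite instruction (the function name below is illustrative):

#include <stddef.h>

void vector_add(const float *a, const float *b, float *c, size_t n) {
    for (size_t i = 0; i < n; i++)   /* a scalar CPU issues one add per element; */
        c[i] = a[i] + b[i];          /* a vector CPU specifies all n adds at once */
}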
In general, vector processors operate on an array of data points. This means that rather than handling
each item individually, multiple items that all have the same instruction can be handled at once. This can
save time over scalar processing, but also adds complexity to a system, which can slow other functions.
Vector processing works best when there is a large amount of data to be processed, groups of which can
be handled by one instruction.
6.4 Array Processor
An array processor is a CPU design that is able to run mathematical operations on multiple data elements
simultaneously. Array processors were common in the scientific computing area, where they formed the
simultaneously. Array processors were common in the scientific computing area, where they formed the
basis of most supercomputers. Today almost all commodity CPU designs include some array processing
instructions, typically known as SIMD.
An array processor is a processor that performs computations on large arrays of data. The term is used to
refer to two different types of processors. An attached array processor is an auxiliary processor attached
to a general-purpose computer. It is intended to improve the performance of the host computer in specific
numerical computation tasks. An SIMD array processor is a processor that has a Single Instruction
Multiple Data organization. It manipulates vector instructions by means of multiple functional units
responding to a common instruction.
Chapter 7
Input/Output Organization
Objectives:
By the end of this chapter, you should be able to:
Discuss the classification of external devices
Discuss I/O problems and functions of I/O modules
Explain the concept of programmed I/O.
Explain the various I/O instructions
Explain the concept of Interrupt driven I/O.
Explain the DMA function, operation and configurations
7.1 External Devices
The input/output subsystem of a computer, referred to as I/O, provides an efficient mode of
communication between the central system and the outside environment. Programs and data must be
entered into computer memory for processing and results obtained from computations must be recorded
or displayed for users. A computer serves no useful purpose without the ability to receive information
from an outside source and to transmit results in a meaningful form.
I/O operations are accomplished through a wide assortment of external devices that provide a means of
exchanging data between the external environment and the computer. An external device attaches to the
computer by a link to an I/O module as shown in figure below. The link is used to exchange control,
status, and data between the I/O module and the external device. An external device connected to an I/O
module (also called an interface) is often referred to as a peripheral device or simply a peripheral.
Figure 7-1: Connection of peripherals (keyboard and display terminal, printer, magnetic disk, magnetic
tape) to the processor through the I/O bus (data, address, and control lines)
Input/Output Problems
The major functions or requirements for an I/O module fall into the following five categories:
Control and Timing
CPU Communication
Device Communication
Data Buffering
Error Detection
During any period of time, the CPU may communicate with one or more external devices in
unpredictable patterns that depend on the program's need for I/O. The internal resources, main memory
and the CPU, must be shared among a number of activities, including handling data I/O. Thus the I/O
module includes a control and timing requirement to coordinate the flow of traffic between internal
resources and external devices. The transfer of data from an external device to the CPU might involve
a sequence of operations like:
CPU checks I/O module device status
I/O module returns device status
If ready, CPU requests data transfer
I/O module gets data from device
I/O module transfers data to CPU
Variations for output, DMA, etc.
The I/O module must have the capability to engage in communication with the CPU and the external
device. Thus CPU communication involves:
Command decoding: The I/O module accepts commands from the CPU carried on the control
bus.
Data: data are exchanged between the CPU and the I/O module over the data bus.
Status reporting: because peripherals are slow, it is important to know the status of the I/O device.
The I/O module can report its status with status signals; commonly used status signals are BUSY
and READY. Various other status signals may be used to report various error conditions.
Address recognition: just as each memory word has an address, there is an address associated with
every I/O device. Thus the I/O module must recognize one unique address for each
peripheral it controls.
The I/O module must also be able to perform device communication. This communication involves
commands, status information, and data. Some of the essential tasks are listed below:
Error detection: I/O module is often responsible for error detection and subsequently reporting
errors to the CPU.
Data buffering: the transfer rate into and out of main memory or the CPU is quite high, and the rate
is much lower for most peripherals. Data are buffered in the I/O module and then sent to
the peripheral device at its rate. In the opposite direction, data are buffered so as not to tie up the
memory in a slow transfer operation. Thus the I/O module must be able to operate at both device and
memory speeds.
There are three possible techniques for I/O operations:
Programmed I/O
Interrupt driven I/O
Direct Memory Access (DMA)
With Programmed I/O, data are exchanged between the CPU and the I/O module. The CPU executes a
program that gives it direct control of the I/O operation, including sensing device status, sending a read
or write command and transferring data. When CPU issues a command to I/O module, it must wait until
I/O operation is complete. If the CPU is faster than I/O module, there is wastage of CPU time.
The I/O module does not take any further action to alert the CPU; that is, it does not interrupt the CPU.
Hence it is the responsibility of the CPU to periodically check the status of the I/O module until it finds
that the operation is complete.
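A minimal C sketch of this polling loop is shown below, assuming memory-mapped STATUS and DATA
registers; the addresses, names, and ready bit are hypothetical:

#include <stdint.h>

#define STATUS_REG (*(volatile uint8_t *)0x4000) /* hypothetical device status register */
#define DATA_REG   (*(volatile uint8_t *)0x4001) /* hypothetical device data register */
#define READY_BIT  0x01

uint8_t read_byte_polled(void) {
    while ((STATUS_REG & READY_BIT) == 0)
        ;                    /* busy-wait: CPU time is wasted in this loop */
    return DATA_REG;         /* transfer one item once the device reports ready */
}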
The sequence of actions that takes place with programmed I/O is:
The CPU requests an I/O operation
The I/O module performs the operation and sets the status bits
The CPU checks the status bits periodically until the operation is complete
I/O Commands
To execute an I/O related instruction, the CPU issues an address, specifying the particular I/O module
and external device and an I/O command. Four types of I/O commands can be received by the I/O
module when it is addressed by the CPU. They are
A control command: is used to activate a peripheral and tell it what to do. Example: a magnetic
tape may be directed to rewind or move forward a record.
A test command: is used to test various status conditions associated with an I/O module and its
peripherals. The CPU wants to know whether the peripheral of interest is ready for use. It also wants
to know whether the most recent I/O operation has completed and if any errors have occurred.
A read command: it causes the I/O module to obtain an item of data from the peripheral and
place it in an internal buffer. The CPU then gets the data items by requesting I/O module to place
it on the data bus.
A write command: it causes the I/O module to take an item of data from the data bus and
subsequently transmit the data item to the peripheral.
I/O Mapping
When the CPU, main memory, and I/O module share a common bus, two modes of addressing are
possible.
1. Memory-mapped I/O
Devices and memory share a single address space
No special commands for I/O; the machine instructions that access memory can also access I/O
2. Isolated I/O
Separate address spaces
Need I/O or memory select lines
Special commands for I/O
Limited set
Using program-controlled I/O requires continuous involvement of the processor in the I/O activities. It
is desirable to avoid wasting processor execution time. An alternative is for the CPU to issue an I/O
command to a module and then go on with other work. The I/O module will then interrupt the CPU
requesting service when it is ready to exchange data with the CPU. The CPU will then execute the data
transfer and then resumes its former processing. Based on the use of interrupts, this technique improves
the utilization of the processor.
With interrupt-driven I/O, the CPU issues a command to the I/O module and does not wait until the I/O
operation is complete; instead, it continues to execute other instructions. When the I/O module has
completed its work, it interrupts the CPU.
An interrupt is more than a simple mechanism for coordinating I/O transfers. In a general sense,
interrupts enable transfer of control from one program to another to be initiated by an event that is
external to a computer. Execution of the interrupted program resumes after completion of execution of
the interrupt service routine. The concept of interrupts is useful in operating systems and in many
control applications where processing of certain routines has to be accurately timed relative to the
external events.
Using the interrupt-driven I/O technique, the CPU issues a read command, and the I/O module gets data
from the peripheral while the CPU does other work. When the I/O module interrupts, the CPU checks the
status; if there is no error, that is, the device is ready, the CPU requests the data and the I/O module
transfers it. The CPU then reads the data and stores it in main memory.
An interrupt is an exception condition in a computer system caused by an event external to the CPU.
Interrupts are commonly used in I/O operations by a device interface (or controller) to notify the CPU
that it has completed an I/O operation.
An interrupt is indicated by a signal sent by the device interface to the CPU via an interrupt request line
(on an external bus). This signal notifies the CPU that the signaling interface needs to be serviced. The
signal is held until the CPU acknowledges or otherwise services the interface from which the interrupt
originated.
The CPU checks periodically to determine if an interrupt signal is pending. This check is usually done at
the end of each instruction, although some modern machines allow for interrupts to be checked for
several times during the execution of very long instructions.
When the CPU detects an interrupt, it then saves its current state (at least the PC and the Processor
Status Register containing condition codes); this state information is usually saved in memory. After the
interrupt has been serviced, this state information is restored in the CPU and the previously executing
software resumes execution as if nothing had happened.
Figure 7-2: Connections between devices and interrupt controller (the connections actually use the bus
lines instead of dedicated wires)
Direct Memory Access (DMA) is a capability provided by some computer bus architectures that allows
data to be sent directly from an attached device (such as a disk drive) to the memory on the computer's
motherboard. The microprocessor (CPU) is freed from involvement with the data transfer, thus speeding
up overall computer operation.
Consider an example of input of a block of data using DMA access. The flow chart is given in Figure 7-3.
When the CPU wishes to read or write a block of data, it issues a command to the DMA module and
gives it the following information:
Whether a read or a write is requested
The address of the I/O device involved
The starting location in memory to read from or write to
The number of words to be read or written
DMA Configurations
The DMA mechanism can be configured in variety of ways. Some of the common configurations are
discussed here.
Single Bus, Detached DMA Controller
In this configuration all modules share the same system bus. The block diagram of single bus, detached
DMA is as shown in Figure 7-4. The DMA module, mimicking the CPU, uses programmed I/O
to exchange data between the memory and the I/O module through the DMA module. This scheme
may be inexpensive but is clearly inefficient: each transfer uses the bus twice (I/O module to DMA
module, then DMA module to memory), and the CPU is suspended twice per transfer.
Single Bus, Integrated DMA
Here, there is a path between the DMA module and one or more I/O modules that does not include the
system bus. The block diagram of single bus, integrated DMA is as shown in Figure 7-5. The DMA logic
may actually be a part of an I/O module, or there may be a separate module that controls one or
more I/O modules. The features of this configuration are:
The bus is used only for the transfer from the DMA module to memory
The CPU is suspended only once per transfer
Figure 7-5: Block diagram of single bus, integrated DMA
One further step beyond the integrated DMA concept is to connect the I/O modules to the DMA controller
using a separate bus called the I/O bus. This reduces the number of I/O interfaces in the DMA module to
one and provides for an easily expandable configuration. The block diagram of DMA using the I/O bus
is as shown
in Figure 7-6. Here the system bus that the DMA module shares with the CPU and main memory is used
by the DMA module only to exchange data with memory. The exchange of data between the DMA module
and the I/O modules takes place off the system bus, that is, through the I/O bus.
Figure 7-6: Diagram of DMA using I/O bus
With both programmed I/O and interrupt-driven I/O, the CPU is responsible for extracting data from
main memory for output and storing data in main memory for input. Table 7-1 indicates the relationship
among the three techniques.
Advantages of DMA
DMA has several advantages over polling and interrupts. DMA is fast because a dedicated piece of
hardware transfers data from one computer location to another and only one or two bus read/write cycles
are required per piece of data transferred. In addition, DMA is usually required to achieve maximum
data transfer speed, and thus is useful for high speed data acquisition devices. DMA also minimizes
latency in servicing a data acquisition device because the dedicated hardware responds more quickly
than interrupts and transfer time is short. Minimizing latency reduces the amount of temporary storage
(memory) required on an I/O device. DMA also off-loads the processor, which means the processor does
not have to execute any instructions to transfer data. Therefore, the processor is not used for handling
the data transfer activity and is available for other processing activity. Also, in systems where the
processor primarily operates out of its cache, data transfer is actually occurring in parallel, thus
increasing overall system utilization.
Chapter 8
Memory Organization
Objectives:
By the end of this chapter, you should be able to:
Define the different types of memory /storage.
Explain the various accessing methods
Explain memory hierarchy.
With a neat diagram, explain the cache memory and its interaction with the processor.
Explain the cache read operation
Memory systems are classified according to their key characteristics. The most important are listed
below:
Location
The classification of memory is done according to the location of the memory as:
Registers: The CPU requires its own local memory in the form of registers, and the control unit
also requires fast-accessible local memory.
Internal (main): is often equated with the main memory (RAM)
External (secondary): consists of peripheral storage devices like Hard disks, magnetic tapes, etc.
Capacity
Storage capacity is one of the important aspects of memory. It is measured in bytes. Since the
capacity of a typical memory is very large, the prefixes kilo (K), mega (M), and giga (G) are used. A
kilobyte is 2^10 bytes, a megabyte is 2^20 bytes, and a gigabyte is 2^30 bytes.
Unit of Transfer
The unit of transfer is the number of bits read out of or written into memory at a time.
Word: For internal memory, the unit of transfer is equal to the number of data lines into and out of
the memory module.
Block: For external memory, data are often transferred in much larger units than a word, and
these are referred to as blocks.
Access Method
Sequential: Tape units have sequential access. Data are generally stored in units called
“records”. Data are accessed sequentially; records are passed over (or rejected) until the record
that is sought is found.
Random: Each addressable location in memory has a unique addressing mechanism. The time to
access a given location is independent of the sequence of prior accesses and constant. Any
location can be selected at random and directly addressed and accessed. Main memory and cache
systems are random access.
Performance
Access time: For random-access memory, this is the time it takes to perform a read or write
operation: that is, the time from the instant that an address is presented to the memory to the
instant that data have been stored or made available for use. For non-random-access memory,
access time is the time it takes to position the read-write mechanism at the desired location.
Transfer rate: This is the rate at which data can be transferred into or out of a memory unit.
Physical Type
Semiconductor: main memory and cache (RAM, ROM).
Magnetic: Magnetic disks (hard disks), magnetic tape units.
Optical: CD, DVD.
Physical Characteristics
Volatile/nonvolatile: In a volatile memory, information decays naturally or is lost when
electrical power is switched off.
Erasable/nonerasable: nonerasable memory cannot be altered (except by destroying the storage
unit). ROMs are nonerasable.
The main memory (RAM) stores data and instructions. Main memories are usually built from dynamic
IC’s known as dynamic RAMs. These semiconductor ICs can also implement static memories referred
to as static RAMs (SRAMs). SRAMs are faster but cost per bit is higher. These are often used to build
caches.
Semiconductor memories in which the storage cells are small transistor circuits are used for high speed
CPU registers. Single chip RAMs can be manufactured in sizes ranging from a few hundred bits to 1 GB
or more. Semiconductor memories fall into two categories: SRAMs (static RAMs) and DRAMs
(dynamic RAMs). Both bipolar and MOS (Metal Oxide Semiconductor) transistors are used
for designing RAMs, but MOS is the dominant technology for designing large RAMs.
A semiconductor memory constructed using bipolar or MOS transistors stores information in
the form of flip-flop voltage levels. These voltage levels are not likely to be discharged. Such
memories are called static memories; in them, information remains constant for a long period
of time. A semiconductor memory designed using a MOS transistor with a capacitor stores information
in the form of charge on the capacitor. The stored charge has a tendency to leak away: a stored '1' will
become '0' if no precautions are taken. These types of memory are called dynamic memories.
Static RAM: built from flip-flop logic gates. Applying power is enough (no need for
refreshing).
Dynamic RAM is simpler and hence smaller than the static RAM. Therefore it is denser and less
expensive. But it requires supporting refresh circuitry.
From the system standpoint, the memory unit can be viewed as a “black box”. Data transfer between the
main memory and the CPU register takes place through two registers namely MAR (memory address
register) and MDR (memory data register). If MAR is k bits long and MDR is n bits long, the main
memory unit can contain up to 2^k addressable locations. During a 'memory cycle', n bits of data are
transferred between main memory and CPU. This transfer takes place over the processor bus, which has
k address lines and n data lines.
The CPU initiates a memory operation by loading the appropriate data into registers MDR and MAR,
and setting either Read or Write memory control line to 1. When the required operation is completed the
memory control circuitry sends Memory Function Completed (MFC) signal to CPU.
The time that elapses between the initiation of an operation and the completion of that operation is called
the memory access time. The minimum time delay between two successive memory operations is called
the memory cycle time. The cycle time is usually slightly longer than the access time.
Cache Memory
The CPU is much faster than main memory, so the processor would be slowed down on every memory
access. The solution is to exploit the principle of locality by providing a small, fast memory between the
CPU and the main memory. This memory is known as cache memory. Thus the intention of using cache
memory is to give a memory speed approaching the speed of the fastest memories available, and at the
same time provide a large memory size at the price of less expensive types of semiconductor memory.
The concept is illustrated in Figure 8-3. The cache memory contains a copy of portions of main
memory. When the CPU attempts to read a word from main memory, a check is made to find out whether
the word is in the cache. If so, the word is delivered to the CPU; if not, the block of main memory
containing the word is read into the cache and the word is then delivered to the CPU. The purpose of
reading a whole block from main memory is that it is likely that future references will be to other words
in the block.
The structure of the cache and main memory is as shown in Figure 8-4. The figure shows a main
memory consisting of 2^n addressable words, each word having a unique n-bit address. This memory is
considered to consist of a number of fixed-length blocks of K words each; that is, M = 2^n / K blocks.
The cache consists of C lines of K words each. The number of lines (C) of the cache is much less than
the number of main memory blocks (M).
Figure 8-4: Structure of (a) the cache (C slots, each holding a tag and a block of K words) and (b) main
memory (addresses 0 to 2^n − 1, grouped into blocks of K words)
Now let us see how cache read operation is executed. The flow chart illustrating the steps of cache read
operation is illustrated in Figure 8-5. The processor generates the address ‘RA’ of a word to be read. If
the word is contained in the cache, it is delivered to the processor. Otherwise, the block containing that
word is loaded into the cache and simultaneously the word is delivered to the processor from main
memory. These last two operations occur in parallel.
Transferring data in blocks between the main memory and the cache enables an interleaved memory to
operate at its maximum possible speed.
Line size: a greater line size gives more hits (+) but also more line replacements (−); too small a line
size gives less chance of a hit for some parts of the block.
Number of caches: two levels of cache are common. Cache internal to the processor is called Level-1
(L1) cache; external cache is called Level-2 (L2) cache.
Mapping Functions
The correspondence between main memory blocks and cache lines is specified by a mapping function.
There are three standard mapping functions, namely:
Direct mapping
Associative mapping
Block set associative mapping
In order to discuss these methods consider a cache consisting of 128 blocks of 16 words each. Assume
that main memory is addressable by a 16 bit address. For mapping purpose main memory is viewed as
composed of 4K blocks.
Direct Mapping
This is the simplest mapping technique. In this case, block k of main memory maps onto block k
modulo 128 of the cache. Since more than one main memory block is mapped onto a given cache block
position, contention may arise for that position even when the cache is not full. This is overcome by
allowing the new block to overwrite the currently resident block.
A main memory address can be divided into three fields, TAG, BLOCK, and WORD, as shown in Figure
8-6. The 5-bit TAG field is required to identify a main memory block when it is resident in the cache.
When a new block enters the cache, the 7-bit cache block field determines the cache position in which
this block must be stored. The tag field of that block is compared with the tag field of the address. If they
match, then the desired word is present in that block of the cache. If there is no match, then the block
containing the required word must first be read from the main memory and then loaded into the cache.
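The address split used by this direct-mapped cache (4-bit word field, 7-bit block field, 5-bit tag field for
a 16-bit address) can be sketched in C as follows; the data types and function names are illustrative:

#include <stdint.h>
#include <stdbool.h>

#define NUM_LINES 128
#define WORDS_PER_BLOCK 16

typedef struct {
    bool valid;
    uint8_t tag;                           /* 5-bit tag */
    uint16_t data[WORDS_PER_BLOCK];        /* one block of K = 16 words */
} CacheLine;

static CacheLine cache[NUM_LINES];

/* returns true on a hit and stores the addressed word in *word */
bool cache_lookup(uint16_t addr, uint16_t *word) {
    uint16_t word_off = addr & 0xF;        /* bits 3..0  : word within the block */
    uint16_t line     = (addr >> 4) & 0x7F;/* bits 10..4 : cache block position */
    uint8_t  tag      = (addr >> 11) & 0x1F; /* bits 15..11 : tag */
    if (cache[line].valid && cache[line].tag == tag) {
        *word = cache[line].data[word_off];
        return true;                       /* hit */
    }
    return false; /* miss: the block must be read from main memory into this line */
}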
Associative Mapping
This is a much more flexible mapping technique. Here any main memory block can be loaded into any
cache block position. Associative mapping is illustrated in Figure 8-7. In this case 12 tag bits are
required to identify a main memory block when it is resident in the cache.
The tag bits of an address received from the CPU are compared with the tag bits of each cache block to
see if the desired block is present in the cache. Here we need to search all 128 tag patterns to determine
whether a given block is in the cache. This type of search is called an associative search; therefore the
cost of implementation is higher. Because of the complete freedom in positioning, a wide range of
replacement algorithms can be used.
Block Set Associative Mapping
This is a combination of the two techniques discussed above. In this case the blocks of the cache are
grouped into sets, and the mapping allows a block of main memory to reside in any block of a particular
set. Block set associative mapping is illustrated in Figure 8-8.
Figure 8-8: Block set associative mapping cache with two blocks per set
Consider, for example, a cache with two blocks per set. The 6-bit set field of the address determines
which set of the cache might contain the addressed block. The tag field of the address must be
associatively compared to the tags of the two blocks of the set to check if the desired block is present.
The number of blocks per set can be selected according to the requirements of a particular
computer system. Four blocks per set can be accommodated by a 5 bit set field, eight blocks per
set by a 4-bit set field and so on. The extreme conditions are one block per set (direct mapping
method) and 128 blocks per set (fully associative technique).
Each block must be provided with a valid bit, which indicates whether the block contains valid
data. The valid bit of a particular cache block is set to 1 the first time the block is loaded from main
memory, and it stays at 1 unless main memory is updated by a source that bypasses the cache (for
example, when new programs or data are loaded from mass storage devices directly into main memory).
In that case, a check is made to determine whether the block being updated is currently in the cache.
If it is, its valid bit is set to 0.
Magnetic Disk
A disk is a circular platter constructed of metal or of plastic coated with a magnetic material. Data are
recorded on and later retrieved from the disk via a conducting coil named the head. During a read or
write operation, the head is stationary while the platter rotates beneath it. Writing is achieved by
producing a magnetic field which records a magnetic pattern on the magnetic surface.
Figure 8-9 depicts the data layout of a disk. The head is capable of reading or writing from a portion of
the platter rotating beneath it. This gives rise to organization of data on the platter in a concentric set of
rings called Tracks. Each track is the same width as the head. Adjacent tracks are separated by gaps that
minimize errors due to misalignment of the head. Data are transferred to and from the disk in blocks, and
the block is smaller than the capacity of a track. Data are stored in block-size regions; each such region
is an angular part of a track and is referred to as a sector. Typically there are 10-100 sectors per track.
These may be of either fixed or variable length.
Optical Memory & Magnetic Tape are other two external memories (Recall your Introduction to IT
(ITec201) course).
Virtual Memory
Virtual (or logical) memory is a concept that, when implemented by a computer and its operating
system, allows programmers to use a very large range of memory or storage addresses for stored data.
The computing system maps the programmer’s virtual addresses to real hardware storage addresses.
Usually, the programmer is freed from having to be concerned about the availability of data storage.
In addition to managing the mapping of virtual storage addresses to real storage addresses, a computer
implementing virtual memory or storage also manages storage swapping between active storage (RAM)
and hard disk or other high volume storage devices. Data is read in units called “pages” of sizes ranging
from a thousand bytes (actually 1,024 decimal bytes) up to several megabytes in size. This reduces the
amount of physical storage access that is required and speeds up overall system performance.
An address specified by the program is called a logical address or virtual address. The memory control
circuitry translates this address into an address that can be used to access the physical memory. A set of
virtual addresses constitutes the virtual address space.
The mechanism that translates virtual addresses into physical address is usually implemented by a
combination of hardware and software components. If a virtual address refers to a part of the program or
data space that is currently in the physical memory, then the contents of the appropriate physical
location in the main memory are accessed. Otherwise its contents must be brought into a suitable
location in the main memory. The mapping function is implemented by a special memory control unit
called the memory management unit (MMU). The mapping function can be changed during program
execution according to the system requirements.
The simplest method of translation assumes that all programs and data are composed of fixed-length
units called pages. Each page consists of a block of words that occupy contiguous locations in the main
memory or in the secondary storage. Pages normally range from 1K to 8K bytes in length. They form
the basic unit of information that is transferred between main memory and secondary storage devices.
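A minimal sketch of this translation, assuming a single-level page table held here as a hypothetical Python dictionary from virtual page numbers to physical frame numbers (real MMUs use dedicated hardware tables, of course):

    PAGE_SIZE = 4096
    OFFSET_BITS = 12

    # Hypothetical page table: virtual page number -> physical frame number.
    page_table = {0x402: 0x019, 0x403: 0x07A}

    def translate(va):
        vpn = va >> OFFSET_BITS
        offset = va & (PAGE_SIZE - 1)
        if vpn not in page_table:
            raise LookupError("page fault: page not in physical memory")
        frame = page_table[vpn]
        return (frame << OFFSET_BITS) | offset   # physical address

    print(hex(translate(0x00402ABC)))            # 0x19abc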
Virtual memory increases the effective size of the main memory. Only the active part of the virtual
address space is mapped onto locations in the physical main memory, whereas the remaining virtual
addresses are mapped onto the bulk storage devices used. During a memory cycle, the address-translation
mechanism (hardware or software) determines whether the addressed information is in the physical main
memory unit. If it is, the proper information is accessed and execution proceeds. If it is not, a
contiguous block of words containing the desired information is transferred from the bulk storage to
main memory, displacing some block that is currently inactive.
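The displacement just described is demand paging. Below is a rough sketch, assuming a fixed pool of four frames and a naive first-in, first-out replacement policy (a real system would use a smarter policy; the pool size and policy are assumptions for the example):

    from collections import deque

    NUM_FRAMES = 4
    resident = {}                       # virtual page -> frame it occupies
    free_frames = deque(range(NUM_FRAMES))
    fifo = deque()                      # resident pages, oldest first

    def access(page):
        if page in resident:
            return resident[page]       # hit: already in main memory
        if not free_frames:             # memory full: displace the oldest page
            victim = fifo.popleft()
            free_frames.append(resident.pop(victim))
        frame = free_frames.popleft()   # bring the page in from bulk storage
        resident[page] = frame
        fifo.append(page)
        return frame

    for p in [1, 2, 3, 4, 1, 5]:
        print(f"page {p} -> frame {access(p)}")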
The memory management routines are part of the computer's operating system. The virtual address
space is divided into two parts: the system space and the user space. Operating system routines reside
in the system space, and user application programs reside in the user space.
In a multiuser environment, each user has a separate user space with a separate page table. The
memory management unit uses a page table base register to determine the address of the page table to
be used in the translation process. Hence, by changing the contents of this register, the operating system
can switch from one space to another. The physical main memory is thus shared by the active pages of
the system space and several user spaces. However, only the pages that belong to one of these spaces
are accessible at any given time. In a multiuser environment, no program should be allowed to modify
either the data or the instructions of other programs in the main memory. Hence, protection must be
provided.
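The role of the page table base register can be sketched as follows (the per-user tables and values are hypothetical). Switching spaces amounts to nothing more than reloading one register, after which the same virtual page maps to a different physical frame:

    # One hypothetical page table per user space.
    page_tables = {
        "user_a": {0x0: 0x10, 0x1: 0x11},
        "user_b": {0x0: 0x20},          # same virtual page, different frame
    }
    ptbr = page_tables["user_a"]        # page table base register (simplified)

    def switch_space(user):
        global ptbr
        ptbr = page_tables[user]        # the only state the OS must change

    print(hex(ptbr[0x0]))               # 0x10: user A's frame
    switch_space("user_b")
    print(hex(ptbr[0x0]))               # 0x20: user B's frame for the same page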
Such protection can be provided in several ways. Let us first consider the most basic form of protection.
Recall that in the simplest case the processor has two states, namely the supervisor state and the user
state. The processor is placed in the supervisor state while operating system routines are being executed,
and in the user state to execute user programs. In the user state, certain machine instructions cannot be
executed. These privileged instructions, which include operations such as modifying the page table base
register, can only be executed in the supervisor state. Hence a user program is prevented from accessing
the page tables of other user spaces or of the system space.
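A toy model of this check (the states, names, and values are invented for illustration): the processor refuses a privileged operation, such as loading the page table base register, unless it is in the supervisor state:

    SUPERVISOR, USER = "supervisor", "user"
    mode = USER
    ptbr = 0

    def load_ptbr(value):
        global ptbr
        # Modifying the page table base register is a privileged operation.
        if mode != SUPERVISOR:
            raise PermissionError("privileged instruction executed in user state")
        ptbr = value

    try:
        load_ptbr(0x8000)               # a user program attempts the instruction
    except PermissionError as e:
        print(e)                        # trapped: the operating system takes over

    mode = SUPERVISOR                   # OS routines run in supervisor state
    load_ptbr(0x8000)                   # now permitted
    print(hex(ptbr))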
It is sometimes desirable for one application program to have access to certain pages belonging to
another program. The operating system can arrange this by causing these pages to appear in both spaces.
The shared pages will therefore have entries in two different page tables. The control bits in each table
entry can be set differently to control the Read/Write access privileges granted to each program. Memory
can also be managed by partitioning; the main partitioning methods are fixed-size partitions, unequal-size
partitions, and dynamic partitioning. Paging is more efficient than partitioning: programs are divided
into "logical" chunks known as "pages", which are assigned to available chunks of memory known as
"frames" or "page frames".
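To illustrate the shared-page arrangement (the page table entries and permissions below are invented for the example), the same physical frame can appear in two page tables with different Read/Write control bits:

    SHARED_FRAME = 0x55

    # The shared frame has an entry in both tables, with different control bits.
    page_table_a = {0x7: {"frame": SHARED_FRAME, "write": True}}
    page_table_b = {0x3: {"frame": SHARED_FRAME, "write": False}}

    def write_page(table, vpn):
        entry = table[vpn]
        if not entry["write"]:
            raise PermissionError("read-only mapping")
        print(f"wrote to frame {entry['frame']:#x}")

    write_page(page_table_a, 0x7)       # allowed: program A has write access
    try:
        write_page(page_table_b, 0x3)   # denied: program B may only read
    except PermissionError as e:
        print(e)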
8.5 Review Questions
1. Explain cache replacement algorithms.
2. What is the reason for having so many types (hierarchy) of memory?
3. Discuss the physical characteristics of a disk.
4. What is RAID? What is its advantage over ordinary disks?
5. What is virtual memory? Why is it important?
References
1. M. Morris Mano; Computer System Architecture (3rd Edition), Prentice Hall
2. William Stallings; Computer Organization and Architecture: Designing for Performance (8th Edition),
Prentice Hall, 2010