0% found this document useful (0 votes)
6 views422 pages

Department of Computer Science: Computer Architecture and Organization

computer graphics about third-generation computers

Uploaded by

yeabsamuelz25
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
6 views422 pages

Department of Computer Science: Computer Architecture and Organization

computer graphics about third-generation computers

Uploaded by

yeabsamuelz25
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 422

KIBUR COLLEGE

Department of Computer Science

Computer Architecture and


Organization

Lecture by: Netsanet Getnet


1. INTRODUCTION
 Organization and Architecture
 Structure and Function
 Computer Evolution and Performance

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


2 B.Sc. Electrical Eng.
Organization and Architecture
 Computer architecture refers to those attributes of a system
visible to a programmer (or those attributes that have a direct
impact on the logical execution of a program).
 Examples of architectural attributes:
 the instruction set,
 the number of bits used to represent various data types (e.g.,
numbers, characters),
 I/O mechanisms, and techniques for addressing memory.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


3 B.Sc. Electrical Eng.
Organization and Architecture (2)
 Computer organization refers to the operational units
and their interconnections that realize the architectural
specifications.
 Examples of organizational attributes:
 hardware details transparent to the programmer, such as control
signals; interfaces between the computer and peripherals; and
the memory technology used.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


4 B.Sc. Electrical Eng.
Organization and Architecture (3)

 For example, it is an architectural design issue whether a


computer will have a multiply instruction.
 It is an organizational issue whether that instruction will be
implemented by a special multiply unit or by a mechanism that
makes repeated use of the add unit of the system.
 The organizational decision may be based on the anticipated
frequency of use of the multiply instruction, the relative speed of
the two approaches, and the cost and physical size of a special
multiply unit.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


5 B.Sc. Electrical Eng.
Structure and Function
 Structure: The way in which the components are
interrelated.
 Function: The operation of each individual component as part
of the structure.
 In general terms, there are only four functions:
 Data processing
 Data storage
 Data movement
 Control

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


6 B.Sc. Electrical Eng.
Fig. 1.1: A Functional
View of the Computer

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


7 B.Sc. Electrical Eng.
Fig 1.2a. Data movement: Fig 1.2b Data storage: data transferred
transferring data from one from the external environment to
peripheral or communication computer storage (read) and vice versa
8
lineBy:toNetsanet
another.Getnet, Lecturer, M.Sc. Computer Eng., (write).
B.Sc. Electrical Eng.
Fig 1.2c. Data processing, Fig 1.2d. Data processing, on data
on data in Storage. en route between storage and the
By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,external environment.
9 B.Sc. Electrical Eng.
Structure

Fig 1.3. The simplest possible


depiction of a computer.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


10 B.Sc. Electrical Eng.
Structure (2)
 There are four main structural (internal) components:
 Central processing unit (CPU): Controls the operation of the
computer and performs its data processing functions; often simply
referred to as processor.
 Main memory: Stores data for the processor.
 I/O: Moves data between the computer and its external
environment.
 System interconnection: Mechanism that provides for
communication among CPU, main memory, and I/O.
 A common example of system interconnection is by means of a
system bus, consisting of a number of conducting wires to which all
the other components attach.
 There may be one or more of these components
Examples
By:Netsanet Getnet,uniprocessor/multiprocessor
Lecturer, M.Sc. Computer Eng.,
11 B.Sc. Electrical Eng.
Structure (3)
 The most interesting and the most complex component is the
CPU.
 Its major structural components are:
 Control unit: Controls the operation of the CPU and hence
the computer.
 Arithmetic and logic unit (ALU): Performs the
computer’s data processing functions.
 Registers: Provides storage internal to the CPU.
 CPU interconnection: Some mechanism that provides for
communication among the control unit, ALU, and registers.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


12 B.Sc. Electrical Eng.
Fig 1.4. The Computer:
Top-Level Structure
By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
13 B.Sc. Electrical Eng.
Performance and Evolution
 Reading Assignment: Evolution
 THE VON NEUMANN MACHINE
 Suppose a program could be represented in a form suitable
for storing in memory alongside the data.
 Then, a computer could get its instructions by reading them
from memory, and a program could be set or altered by
setting the values of a portion of memory.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


14 B.Sc. Electrical Eng.
Performance and Evolution (2)
 This idea is known as the stored-program concept,
is usually attributed to the mathematician John von
Neumann.
 In 1946, von Neumann and his colleagues began the
design of a new stored program computer, referred to as
the IAS computer, at the Princeton Institute for
Advanced Studies.
 The IAS computer, although not completed until 1952,
is the prototype of all subsequent general-purpose
computers.
By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
15 B.Sc. Electrical Eng.
Performance and Evolution (3)

Fig.By:1.6. Structure
Netsanet of the
Getnet, Lecturer, M.Sc. IAS Computer
Computer Eng., (compare to middle part in slide
16 B.Sc. Electrical Eng. 13)
Performance and Evolution (4)
 The IAS computer consists of
 A main memory, which stores both data and instructions
 An arithmetic and logic unit (ALU) capable of operating
on binary data
 A control unit, which interprets the instructions in memory
and causes them to be executed
 Input/output (I/O) equipment operated by the control unit

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


17 B.Sc. Electrical Eng.
Performance and Evolution (5)
 With rare exceptions, all of today’s computers have this same
general structure and function and are thus referred to as
von Neumann machines.
 The memory of the IAS consists of 1000 storage locations,
called words, of 40 binary digits (bits) each.
 Both data and instructions are stored there.
 Numbers are represented in binary form, and each
instruction is a binary code.
 Figure 1.7 below illustrates these formats.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


18 B.Sc. Electrical Eng.
Performance and Evolution (6)

Fig. 1.7. IAS Memory Formats

• A word may also contain two 20-bit instructions, with each


19
instruction
By: Netsanet Getnet, consisting of:
Lecturer, M.Sc. Computer Eng.,
B.Sc. Electrical Eng.
Performance and Evolution (7)
 an 8-bit operation code (opcode) specifying the operation
to be performed, and
 a 12-bit address designating one of the words in memory
(numbered from 0 to 999).
 The control unit operates the IAS by fetching instructions
from memory and executing them one at a time.
 Both the control unit and the ALU contain storage
locations, called registers, defined as follows:
 Memory buffer register (MBR): Contains a word to be
stored in memory or sent to the I/O unit, or is used to receive a
word from memory or from the I/O unit.
By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
20 B.Sc. Electrical Eng.
Performance and Evolution (8)
 Memory address register (MAR): Specifies the address in memory of
the word to be written from or read into the MBR.
 Instruction register (IR): Contains the 8-bit opcode instruction being
executed.
 Instruction buffer register (IBR): Employed to hold temporarily the
righthand instruction from a word in memory.
 Program counter (PC): Contains the address of the next instruction pair
to be fetched from memory.
 Accumulator (AC) and multiplier quotient (MQ): Employed to
hold temporarily operands and results of ALU operations.
 For example, the result of multiplying two 40-bit
numbers is an 80-bit number; the most significant 40
bits are stored in the AC and the least significant in the
MQ.
By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
21 B.Sc. Electrical Eng.
Fig. 1.8. Expanded
Structure of IAS
Computer

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


22 B.Sc. Electrical Eng.
Performance and Evolution (9)
 The IAS operates by repetitively performing an
instruction cycle, as shown in Fig. 1.9.
• Each instruction cycle consists of two subcycles.
 The fetch cycle
 The execute cycle
 During the fetch cycle, the opcode of the next instruction
is loaded into the IR and the address portion is loaded into
the MAR.
 This instruction may be taken from the IBR, or it can be
obtained from memory by loading a word into the MBR,
and then down to the IBR, IR, and MAR.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


23 B.Sc. Electrical Eng.
Performance and Evolution (10)
 Once the opcode is in the IR, the execute cycle is
performed.
 Control circuitry interprets the opcode and
executes the instruction by sending out the
appropriate control signals to cause data to be moved
or an operation to be performed by the ALU.
 The IAS computer had a total of 21 instructions,
which are listed in Table 1.1.
 These can be grouped as follows:
 Data transfer: Move data between memory and ALU registers or
between two ALU registers.
By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
24 B.Sc. Electrical Eng.
Performance and Evolution (11)
 Unconditional branch: Normally, the control unit
executes instructions in sequence from memory. This sequence
can be changed by a branch instruction, which facilitates
repetitive operations.
 Conditional branch: The branch can be made dependent on
a condition, thus allowing decision points.
 Arithmetic: Operations performed by the ALU.
 Address modify: Permits addresses to be computed in the
ALU and then inserted into instructions stored in memory.
This allows a program considerable addressing flexibility.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


25 B.Sc. Electrical Eng.
Fig. 1.9. Expanded
Structure of IAS
Computer

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


26 B.Sc. Electrical Eng.
The IAS Instruction Set

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


27 B.Sc. Electrical Eng.
Performance and Evolution (12)
 Each instruction must conform to the format of one on slide 19.
 The opcode portion (first 8 bits) specifies which of the 21
instructions is to be executed.
 The address portion (remaining 12 bits) specifies which of the
1000 memory locations is to be involved in the execution of the
instruction.
 Note that each operation requires several steps. Some of these
are quite elaborate.
 The multiplication operation requires 39 suboperations, one for
each bit position except that of the sign bit.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


28 B.Sc. Electrical Eng.
Computer System
 Program Concept
 Hardwired systems are inflexible
 General purpose hardware can do different tasks,
given correct control signals
 Instead of re-wiring, supply a new set of control
signals

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


29 B.Sc. Electrical Eng.
What is a program?
 A sequence of steps
 For each step, an arithmetic or logical operation is done
 For each operation, a different set of control signals is needed

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


30 B.Sc. Electrical Eng.
Function of Control Unit
 For each operation a unique code is provided
 e.g. ADD, MOVE
 A hardware segment accepts the code and issues the control
signals

 We have a computer!

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


31 B.Sc. Electrical Eng.
Components
 The Control Unit and the Arithmetic and Logic Unit
constitute the Central Processing Unit
 Data and instructions need to get into the system and results
out
 Input/output
 Temporary storage of code and results is needed
 Main memory

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


32 B.Sc. Electrical Eng.
Computer Components: Top Level View

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


33 B.Sc. Electrical Eng.
Instruction Cycle
 Two steps:
 Fetch
 Execute

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


34 B.Sc. Electrical Eng.
Fetch Cycle
 Program Counter (PC) holds address of next instruction to
fetch
 Processor fetches instruction from memory location pointed
to by PC
 Increment PC
 Unless told otherwise
 Instruction loaded into Instruction Register (IR)
 Processor interprets instruction and performs required
actions

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


35 B.Sc. Electrical Eng.
Execute Cycle
 Processor-memory
 data transfer between CPU and main memory
 Processor I/O
 Data transfer between CPU and I/O module
 Data processing
 Some arithmetic or logical operation on data
 Control
 Alteration of sequence of operations
 e.g. jump
 Combination of above

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


36 B.Sc. Electrical Eng.
0101 Add to AC from memory Example of Program Execution
0010 Store AC to memory
0001 Load AC from memory

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


37 B.Sc. Electrical Eng.
Instruction Cycle State Diagram

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


38 B.Sc. Electrical Eng.
Interrupts
 Mechanism by which other modules (e.g. I/O) may interrupt
normal sequence of processing
 Program
 e.g. overflow, division by zero
 Timer
 Generated by internal processor timer
 Used in pre-emptive multi-tasking
 I/O
 from I/O controller
 Hardware failure
 e.g. memory parity error

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


39 B.Sc. Electrical Eng.
Program Flow Control

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


40 B.Sc. Electrical Eng.
Interrupt Cycle
 Added to instruction cycle
 Processor checks for interrupt
 Indicated by an interrupt signal
 If no interrupt, fetch next instruction
 If interrupt pending:
 Suspend execution of current program
 Save context
 Set PC to start address of interrupt handler routine
 Process interrupt
 Restore context and continue interrupted program

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


41 B.Sc. Electrical Eng.
Transfer of Control via Interrupts

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


42 B.Sc. Electrical Eng.
Instruction Cycle with Interrupts

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


43 B.Sc. Electrical Eng.
Program Timing: Short I/O Wait

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


44 B.Sc. Electrical Eng.
Program Timing: Long I/O Wait

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


45 B.Sc. Electrical Eng.
Instruction Cycle (with Interrupts) - State Diagram

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


46 B.Sc. Electrical Eng.
2. The Central Processing Unit

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


47 B.Sc. Electrical Eng.
Computer Arithmetic

•The ALU is that part of the computer that performs


arithmetic and logic operations on data.
By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
48 B.Sc. Electrical Eng.
 All the other elements: registers, control unit, memory and
I/O, are there mainly to bring data into the ALU for it to
process and then to take the results back out.
 Operands for arithmetic and logic operations are presented to
the ALU in registers and the results of an operation are stored in
registers.
 These registers are temporary storage locations within the
processor that are connected by signal paths to the ALU.
 The ALU may also set flags as the result of an operation.
 Example: an overflow flag: set to 1 if the result of a computation
exceeds the length of the register into which it is to be stored.
 The flag values are also stored in registers within the processor.
 The processor provides signals that control the operation of the
ALU and the movement of the data into and out of the ALU.
By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
49 B.Sc. Electrical Eng.
Integer Representation
 In the binary number system, numbers can be represented
with just the digits zero and one, the minus sign (for negative
numbers), and the period, or radix point (for numbers with
a fractional component).
 For purposes of computer storage and processing, we do not
have the benefit of special symbols for the sign and radix
point.
 Only binary digits (0 and 1) may be used to represent numbers.
 If we are limited to nonnegative integers, the representation
is straightforward.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


50 B.Sc. Electrical Eng.
 In general, if an n-bit sequence of binary digits an-1an-2…a1a0 is
interpreted as an unsigned integer A, its value is:

 There are several alternative conventions used to represent


negative as well as positive integers:
 Sign-Magnitude Representation
By:  Twos
Netsanet Complement
Getnet, Representation
Lecturer, M.Sc. Computer Eng.,
51 B.Sc. Electrical Eng.
Sign-Magnitude Representation
 The simplest form of representation that employs a sign bit.
 In an n-bit word, the rightmost n – 1 bits hold the magnitude of
the integer.

 In general,

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


52 B.Sc. Electrical Eng.
Instruction Sets: Characteristics
 What is an Instruction Set?
 The operation of the processor is determined by the instructions
it executes, referred to as machine instructions or computer
instructions.
 The collection of different instructions that the processor can
execute is referred to as the processor’s instruction set.
 Elements of a Machine Instruction
 Each instruction must contain the information required by the
processor for execution including:
 Operation code (opcode): a binary code that specifies the operation
to be performed (e.g., ADD, I/O).
 Source operand reference: the operation may involve one or more
source operands (operands that are inputs for the operation).
By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
53 B.Sc. Electrical Eng.
 Result operand reference: the operation may produce a result. This
tells where to put the result produced.
 Next instruction reference: this tells the processor where to fetch
the next instruction after the execution of this instruction is complete.
 Source and result operands can be in one of four areas:
 Main or virtual memory: as with next instruction references, the
main or virtual memory address must be supplied.
 Processor register: with rare exceptions, a processor contains one or
more registers that may be referenced by machine instructions.
 If only one register exists, reference to it may be implicit. If more than one
register exists, then each register is assigned a unique name or number, and the
instruction must contain the number of the desired register.
 Immediate: the value of the operand is contained in a field in the
instruction being executed.
 I/O device: the instruction must specify the I/O module and device
for the operation. If memory-mapped I/O is used, this is just another
main or virtual memory address.
By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
54 B.Sc. Electrical Eng.
 Instruction Representation
 Within the computer, each instruction is represented by a sequence of
bits.
 The instruction is divided into fields, corresponding to the constituent
elements of the instruction.
 A simple example of an instruction format is shown in Figure 1.
 As another example, the IAS instruction format is shown in Figure 2.
 With most instruction sets, more than one format is used.
 During instruction execution, an instruction is read into an instruction
register (IR) in the processor.
 The processor must be able to extract the data from the various
instruction fields to perform the required operation.
 It is difficult for both the programmer and the reader of textbooks to
deal with binary representations of machine instructions.
 Thus, it has become common practice to use a symbolic representation of
machine instructions. An example of this was used for the IAS instruction set,
in Table 1.
By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
55 B.Sc. Electrical Eng.
Table 1: The IAS Instruction Set

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


56 B.Sc. Electrical Eng.
 Opcodes are represented by abbreviations, called mnemonics, that
indicate the operation.
 Common examples include:
 ADD Add
 SUB Subtract
 MUL Multiply
 DIV Divide
 LOAD Load data from memory
 STOR Store data to memory
 Operands are also represented symbolically.
 For example, the instruction ADD R,Y may mean add the value
contained in data location Y to the contents of register R.
 In this example,Y refers to the address of a location in memory,
and R refers to a particular register. Note that the operation is
performed on the contents of a location, not on its address.
By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
57 B.Sc. Electrical Eng.
Figure 1. A Simple Instruction Format

Figure 2. IAS Instruction Format

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


58 B.Sc. Electrical Eng.
Instruction Types
 Consider a high-level language instruction that could be expressed in a
language such as C++ or Java.
 For example, X = X + Y.
 This statement instructs the computer to add the value stored in Y to the
value stored in X and put the result in X.
 How might this be accomplished with machine instructions? Let us assume
that the variables X and Y correspond to locations 513 and 514.
 If we assume a simple set of machine instructions, this operation could be
accomplished with three instructions:
1. Load a register with the contents of memory location 513.
2. Add the contents of memory location 514 to the register.
3. Store the contents of the register in memory location 513.
 As can be seen, the single BASIC instruction may require three machine
instructions.
 This is typical of the relationship between a high-level language and a
machine language.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


59 B.Sc. Electrical Eng.
 A high-level language expresses operations in a concise algebraic
form, using variables.
 A machine language expresses operations in a basic form involving
the movement of data to or from registers.
 A computer should have a set of instructions that allows the user
to formulate any data processing task.
 Any program written in a high-level language must be translated
into machine language to be executed.
 Thus, the set of machine instructions must be sufficient to express
any of the instructions from a high-level language.
 With this in mind we can categorize instruction types as follows:
 Data processing: Arithmetic and logic instructions
 Data storage: Movement of data into or out of register and or memory
locations
 Data movement: I/O instructions
 Control: Test and branch instructions

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


60 B.Sc. Electrical Eng.
 Arithmetic instructions: processing of numeric data.
 Logic (Boolean) instructions: operate on the bits of a word as bits
rather than as numbers; thus, they provide capabilities for processing
any other type of data the user may wish to employ.
 Data Storage instructions: arithmetic and Boolean operations are
performed primarily on data in processor registers.
 Therefore, there must be memory instructions for moving data between
memory and the registers.
 I/O instructions: are needed to transfer programs and data into
memory and the results of computations back out to the user.
 Test instructions: are used to test the value of a data word or the
status of a computation.
 Branch instructions: are then used to branch to a different set of
instructions depending on the decision made.
By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
61 B.Sc. Electrical Eng.
Number of Addresses
 One way of describing processor architecture is in terms of the number of
addresses contained in each instruction.
 What is the maximum number of addresses one might need in an
instruction?
 Arithmetic and logic instructions will require the most operands.
 All arithmetic and logic operations are either unary (one source operand)
or binary (two source operands).
 Thus, we would need a maximum of two addresses to reference source
operands.
 The result of an operation must be stored, suggesting a third address, which
defines a destination operand.
 Finally, after completion of an instruction, the next instruction must be
fetched, and its address is needed.
 Hence an instruction could contain four address references: two source
operands, one destination operand, and the address of the next instruction.
 In most architectures, most instructions have one, two, or three operand
addresses, with the address of the next instruction being implicit (obtained
from the Getnet,
By: Netsanet program counter).
Lecturer, M.Sc. Computer Eng.,
62 B.Sc. Electrical Eng.
Comparison of one-, two- and three-address instructions to compute:
Y = (A – B)/[C + (D X E)].
By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
63 B.Sc. Electrical Eng.
Exercise: Repeat the above using 0-
address Instructions
Y = (A – B)/[C + (D X E)]

PUSH A
PUSH B
SUB
PUSH C
PUSH D
PUSH E
MUL
ADD
DIV
POP Y
By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
64 B.Sc. Electrical Eng.
 Three-address instruction formats are not common because they
require long instruction format to hold the three address references.
 Two-address instruction format: one address doubles as operand and
result.
 ADD a, b: carries out the calculation a+b and stores the result in a.
 Reduces length of instruction
 Requires some extra work (move instruction)
 Temporary storage to hold some results
 One-address instruction format: Implicit second address
 Usually a register (accumulator, AC). Also holds the result.
 Common on early machines
 Zero-address instruction format: applicable to a special memory
organization called a stack.
 A stack is a last-in-first-out set of locations.
 The stack is in a known location and, often, at least the top two elements
are in processor registers.
 Thus, zero-address instructions would reference the top two stack
elements.
By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
65 B.Sc. Electrical Eng.
Table 2. Utilization of Instruction Addresses

More addresses
• More complex instructions
• More registers
 Inter-register operations are quicker
• Fewer instructions per program
Fewer addresses
• Less complex (powerful?) instructions
• More instructions per program
• Netsanet
By: Faster Getnet,
fetch/execution of instructions
Lecturer, M.Sc. Computer Eng.,
66 B.Sc. Electrical Eng.
Instruction Set Design
 One of the most interesting, and most analyzed, aspects of
computer design
 Very complex because it affects so many aspects of the
computer system.
 The instruction set defines many of the functions performed by
the processor and thus has a significant effect on the
implementation of the processor.
 The instruction set is the programmer’s means of controlling
the processor.
 Thus, programmer requirements must be considered in
designing the instruction set.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


67 B.Sc. Electrical Eng.
 The most important of the fundamental instruction set design issues
include:
 Operation repertoire: How many and which operations to provide,
and how complex operations should be.
 Data types: The various types of data upon which operations are
performed.
 Instruction format: Instruction length (in bits), number of addresses,
size of various fields, and so on.
 Registers: Number of processor registers that can be referenced by
instructions, and their use.
 Addressing modes: The mode or modes by which the address of an
operand is specified.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


68 B.Sc. Electrical Eng.
TYPES OF OPERANDS
 Machine instructions operate on data.
 The most important general categories of data are
 Addresses
 Numbers
 Characters
 Logical data
 All machine languages include numeric data types.
 Even in nonnumeric data processing, numbers to act as counters,
field widths, and so forth.
 Distinction between numbers used in ordinary mathematics and
numbers stored in a computer is that the latter are limited.
 a limit to the magnitude of numbers representable on a machine
 In the case of floating-point numbers, a limit to their precision.
 Thus, the programmer is faced with understanding the consequences
of rounding, overflow, and underflow.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


69 B.Sc. Electrical Eng.
 Three types of numerical data are common in computers:
 Binary integer or binary fixed point
 Binary floating point
 Decimal
 The human users deal with decimal numbers. Thus, there is a
necessity to convert from decimal to binary on input and from binary
to decimal on output.
 For applications in which there is a great deal of I/O and
comparatively little and simple computation, it is preferable to store
and operate on the numbers in decimal form.
 The most common representation for this purpose is packed
decimal.
 With packed decimal, each decimal digit is represented by a 4-bit
code, in the obvious way, with two digits stored per byte.
 Thus, 0 = 0000, 1 = 0001, 8 = 1000, and 9 = 1001.
By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
70 B.Sc. Electrical Eng.
 To form numbers, 4-bit codes are strung together, in multiples
of 8 bits.
 Example: the code for 246 is 0000 0010 0100 0110.
 Less compact than a straight binary representation.
 But avoids the conversion overhead.
 Negative numbers can be represented by including a 4-bit sign
digit at either the left or right end of a string of packed decimal
digits.
 Standard sign values are 1100 for positive (+) and 1101 for
negative (-).

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


71 B.Sc. Electrical Eng.
Characters
 A number of codes have been devised to represent characters by a
sequence of bits.
 The earliest common example of this is the Morse code.
 The most commonly used character code in the International
Reference Alphabet (IRA)/referred also as the American Standard
Code for Information Interchange (ASCII)
 Each character in this code is represented by a unique 7-bit pattern;
 128 different characters can be represented.
 represent printable characters and control characters.
 ASCII-encoded characters are almost always stored and transmitted
using 8 bits per character.
 The eighth bit may be set to 0 or used as a parity bit for error
detection. (odd parity/even parity)

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


72 B.Sc. Electrical Eng.
Logical Data
 Normally, each word or other addressable unit (byte,
halfword, and so on) is treated as a single unit of data.
 Sometimes an n-bit unit can be considered as consisting of n
1-bit items.
 When data are viewed this way, they are considered to be
logical data.
 There are two advantages to the bit-oriented view.
 To store an array of Boolean or binary data items, in which
each item can take on only the values 1 (true) and 0 (false).
 To manipulate the bits of a data item. For example, if
floating-point operations are implemented in software, we
need to be able to shift significant bits in some operations.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


73 B.Sc. Electrical Eng.
x86 numeric data formats

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


74 B.Sc. Electrical Eng.
Types of Operations
 The number of different opcodes varies widely from machine to
machine.
 However, the same general types of operations are found on all
machines.
 A typical categorization is the following:
 Data transfer
 Arithmetic
 Logical
 Conversion
 I/O
 System control
 Transfer of control
By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
75 B.Sc. Electrical Eng.
Common Instruction Set Operations

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


76 B.Sc. Electrical Eng.
Common Instruction Set Operations …

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


77 B.Sc. Electrical Eng.
Processor actions for various types of operations

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


78 B.Sc. Electrical Eng.
Data Transfer
 The most fundamental type of machine instruction.
 Must specify
 Location of source and destination operands:
 Each location could be memory, a register, or the top of the stack
 The length of data to be transferred.
 The mode of addressing for each operand
 May be different instructions for different movements or one
instruction and different addresses
 If one or both operands are in memory, then the processor performs
the following:
 Calculate the memory address, based on the address mode
 If the address refers to virtual memory, translate from virtual to real
memory address.
 Determine whether the addressed item is in cache.
 If not, issue a command to the memory module.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


79 B.Sc. Electrical Eng.
80 Examples of IBM EAS/390 Data Transfer Operations
By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
B.Sc. Electrical Eng.
Arithmetic
 Most machines provide the basic arithmetic operations:
 add, subtract, multiply, and divide.
 These are invariably provided for signed integer (fixed-point) numbers.
 Often they are also provided for floating-point and packed decimal
numbers.
 Other operations include single-operand instructions;
 Absolute: Take the absolute value of the operand.
 Negate: Negate the operand.
 Increment: Add 1 to the operand.
 Decrement: Subtract 1 from the operand.
 Execution of an arithmetic instruction may involve data transfer
operations
 Bring operands for input to the ALU
 Deliver the output of the ALU
By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
81 B.Sc. Electrical Eng.
Logical
 Operations for manipulating individual bits of a word or other
addressable units.
 They are based upon Boolean operations.
 Some of the basic logical operations that can be performed on
Boolean or binary data are shown in Table below.
 The NOT operation inverts a bit.
 AND, OR, and Exclusive-OR (XOR):
 The most common logical functions with two operands.
 EQUAL is a useful binary test.
 Can be applied bitwise to n-bit logical data units.
 If two registers contain the data:
(R1) = 10100101
(R2) = 00001111
(R1) AND (R2) = 00000101
By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
82 B.Sc. Electrical Eng.
Shifting and Rotating Operations
Logical shift:
 the bits of a word are shifted left or right. On one end, the bit
shifted out is lost.
 On the other end, a 0 is shifted in.
 Logical shifts are used to isolate fields within a word.
 The 0s that are shifted into a word displace unwanted information
that is shifted off the other end

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


83 B.Sc. Electrical Eng.
 Example: We want to transmit characters of data to an I/O
device 1 character at a time.
 Assume each memory word is 16 bits in length containing two
characters.
 To send the two characters in a word: first the left-hand
character
1. Load the word into a register.
2. Shift to the right eight times. This shifts the remaining
character to the right half of the register.
3. Perform I/O. The I/O module reads the lower-order 8
bits from the data bus.
 To send the right-hand character,
1. Load the word again into the register.
2. AND with 0000000011111111. (Masking)
3. Perform I/O.
By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
84 B.Sc. Electrical Eng.
Arithmetic shift
 Treats the data as a signed integer and does not shift the sign
bit.
 On a right arithmetic shift, the sign bit is replicated into the bit
position to its right.
 On a left arithmetic shift, a logical left shift is performed on all bits
but the sign bit, which is retained.
 These operations can speed up certain arithmetic operations.
 With numbers in twos complement notation, a right arithmetic shift
corresponds to a division by 2, with truncation for odd numbers.
 Both an arithmetic left shift and a logical left shift correspond to a
multiplication by 2 when there is no overflow.
 If overflow occurs, arithmetic and logical left shift operations
produce different results, but the arithmetic left shift retains the sign
of the number.
By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
85 B.Sc. Electrical Eng.
Rotate, or cyclic shift
 preserve all of the bits being operated on.
 One use of a rotate is to bring each bit successively into the
leftmost bit, where it can be identified by testing the sign of the
data.
Examples of Shift and Rotate Operations

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


86 B.Sc. Electrical Eng.
Conversion
 Conversion instructions are those that change the format or
operate on the format of data. An example is converting from
decimal to binary.
Input/Output
 May be specific instructions
 May be done using data movement instructions (memory mapped)
 May be done by a separate controller (DMA) or I/O processor
System Control
 Those instructions that can be executed only while the processor is
in a certain privileged state or is executing a program in a special
privileged area of memory.
 Instructions reserved for the use of the operating system.
 Some examples:
 A system control instruction may read or alter a control register.
 An instruction to read or modify a storage protection key.
 Access
By: Netsanet toLecturer,
Getnet, process M.Sc.control blocks in a multiprogramming system.
Computer Eng.,
87 B.Sc. Electrical Eng.
Transfer of Control
 For these instructions, processor updates the program counter
to contain the address of some instruction in memory.
 Why we need them?
 To execute each instruction more than once and perhaps many
thousands of times. If a table or a list of items is to be processed, a
program loop is needed.
 Decision making. do one thing if one condition holds, and another
thing if another condition holds.
 Breaking a bigger program up into smaller pieces that can be
worked on one at a time.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


88 B.Sc. Electrical Eng.
 Branch
 e.g. branch to x if result is zero
 Conditional and unconditional branches
 Unconditional: in which the branch is always taken
 Conditional: 1 or multiple bits condition code.
 Example: Consider an arithmetic operation (ADD, SUBTRACT,
…) could set a 2-bit condition code with one of the four values:
 zero, positive, negative, overflow.
 On such a machine, there could be four different conditional
branch instructions:
 BRP X Branch to location X if result is positive.
 BRN X Branch to location X if result is negative.
 BRZ X Branch to location X if result is zero.
 BRO X Branch to location X if overflow occurs.
By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
89 B.Sc. Electrical Eng.
 Another approach that can be used with a three-address instruction
format is to perform a comparison and specify a branch in the same
instruction.
 Example:
 BRE R1, R2, X
 Branch to X if contents of R1 = contents of R2.\

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


90 B.Sc. Electrical Eng.
 SKIP INSTRUCTIONS
 Another form of transfer-of-control instruction;
 includes an implied address;
 the skip implies that one instruction be skipped;
 the implied address = address of the next instruction plus one
instruction length.
 Because the skip instruction does not require a destination address
field, it is free to do other things.
 Example: increment-and-skip-if-zero (ISZ)

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


91 B.Sc. Electrical Eng.
 In this fragment, the two transfer-of-control instructions are
used to implement an iterative loop.
 R1 is set with the negative of the number of iterations to be
performed.
 At the end of the loop, R1 is incremented.
 If it is not 0, the program branches back to the beginning of the
loop.
 Otherwise, the branch is skipped, and the program continues
with the next instruction after the end of the loop.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


92 B.Sc. Electrical Eng.
 PROCEDURE CALL INSTRUCTIONS
 Perhaps the most important innovation in the development of
programming languages is the procedure.
 A procedure is a selfcontained computer program that is incorporated
into a larger program.
 At any point in the program the procedure may be invoked, or called.
 The processor is instructed to go and execute the entire procedure and
then return to the point from which the call took place.
 Two reasons for the use of procedures are economy and modularity.
 A procedure allows the same piece of code to be used many times.
 This is important for
 economy in programming effort and
 making the most efficient use of storage space in the system (the
program must be stored).
By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
93 B.Sc. Electrical Eng.
 Procedures also allow large programming tasks to be subdivided into
smaller units.
 This use of modularity greatly eases the programming task.
 The procedure mechanism involves two basic instructions:
 a call instruction that branches from the present location to the
procedure, and
 a return instruction that returns from the procedure to the place from
which it was called.
 Both of these are forms of branching instructions.
 Figure below illustrates the use of procedures to construct a
program.
 There is a main program starting at location 4000.
 This program includes a call to procedure PROC1, starting at
location 4500.
 When this call instruction is encountered, the processor suspends
execution of the main program and begins execution of PROC1 by
fetching the next instruction from location 4500.
By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
94 B.Sc. Electrical Eng.
By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
95 B.Sc. Electrical Eng.
 Within PROC1, there are two calls to PROC2 at location 4800.
 In each case, the execution of PROC1 is suspended and PROC2
is executed.
 The RETURN statement causes the processor to go back to the
calling program and continue execution at the instruction after
the corresponding CALL instruction.
 Three points to note:
1. A procedure can be called from more than one location.
2. A procedure call can appear in a procedure. This allows the
nesting of procedures to an arbitrary depth.
3. Each procedure call is matched by a return in the called
program.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


96 B.Sc. Electrical Eng.
 The processor must save the return address so that the return can
take place appropriately.
 There are three common places for storing the return address:
 Register
 Start of called procedure
 Top of stack
 Consider a machine-language instruction CALL X, (call procedure at
location X).
 If the register approach is used, CALL X causes the following actions:
RN  PC + 1
PC  X
 RN = a register always used for this purpose,
 PC = is the program counter
  = the instruction length.
 The called procedure can use the contents of RN later return.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


97 B.Sc. Electrical Eng.
 A second option is to store the return address at the start of the
procedure.
 In this case, CALL X causes
X  PC
PC  X + 1
 Limitation: both these approaches complicate the use of
reentrant procedures.
 A reentrant procedure is one in which it is possible to have
several calls open to it at the same time. (E.g. recursive
procedure (one that calls itself) is reentrant.
 If parameters are passed via registers/memory, for a reentrant
procedure, some code must be responsible for saving the
parameters so that the registers or memory space are available
for other procedure calls.
By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
98 B.Sc. Electrical Eng.
 Stacks
 A powerful approach to store return address and parameters
 When a call is executed, return address is kept on the stack.
 When a return is executed, the address on the stack is used.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


99 B.Sc. Electrical Eng.
 For parameters passing, registers or memory could be used
 For example by keeping the parameters in memory just after CALL/ or directly by passing in registers
 But the approaches are not flexible enough for variable number of parameters.
 Again, stacks are more efficient and flexible ways to pass parameters.
 Example- Consider the following:

100 A(15, 10) 200 SUB A(x, y) ;defn of A


101 … 201 B(x) //call to B
202 END SUB
300 SUB B(p)
301 C(p + 5)
15 302 E(10)
202 303 END SUB
15 400 SUB C(t)

10 404 END SUB
101 500 SUB E(a)
501 ...
502 END SUB

Assignment I:
1. Transmitting characters in GOLD to an I/O device, one character at a time.
2. Implement a loop using the skip instruction (ISZ)
3. Show the stack contents for return address of the above program

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


100 B.Sc. Electrical Eng.
ADDRESSING MODES
 Relatively small address field in instructions
 But a need to reference a large range of memory locations
 A variety of addressing techniques:
1. Immediate
2. Direct
3. Indirect
4. Register
5. Register indirect
6. Displacement
7. Stack

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


101 B.Sc. Electrical Eng.
By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
102 B.Sc. Electrical Eng.
By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
103 B.Sc. Electrical Eng.
• All computer architectures support more than one of
these addressing modes.
• How does the process determine the addressing mode
used in an instruction?
• Different opcodes use different modes
• One or more bits used in the instruction format (mode field)
By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
104 B.Sc. Electrical Eng.
 Immediate Addressing
 The simplest form of addressing is immediate addressing
 The operand value is present in the instruction
 Operand = A
 Can be used to define and use constants or set initial values of
variables.
 Typically, the number will be stored in twos complement form;
the leftmost bit of the operand field is used as a sign bit.
 Advantage of immediate addressing:
 No memory reference other than the instruction fetch is required
to obtain the operand
 saving one memory or cache cycle in the instruction cycle.
 Disadvantage: the size of the number is limited to the size of the
address field, which is small compared with the word length.
By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
105 B.Sc. Electrical Eng.
 Direct Addressing
 A very simple form of addressing
 The address field contains the effective address of the operand:
 EA = A
 Disadvantage: provides only a limited address space.
 Indirect Addressing
 With direct addressing:
 length of the address field < word length => limiting the address range
 One solution is to have the address field refer to the address of a
word in memory,
 This in turn contains a full-length address of the operand.
 This is known as indirect addressing:
 EA = (A), ( ) stands for “contents of ”
By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
106 B.Sc. Electrical Eng.
 Advantage: for a word length of N, an address space of 2N is
available.
 Disadvantage: instruction execution requires two memory
references to fetch the operand:
 one to get its address and another to get its value.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


107 B.Sc. Electrical Eng.
 Register Addressing
 similar to direct addressing.
 Difference: address field refers to a register rather than a main
memory address:
 EA = R
 That is: if the contents of a register address field in an
instruction is 5, then the operand value is contained in R5.
 If n-bits are used for the address field that references registers
 2n registers can be referenced.
 Typically n is 3 to 5 bits
 Hence a total of 8 to 32 general-purpose registers can be
referenced.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


108 B.Sc. Electrical Eng.
 Registers are used for operands that remain in use for multiple
operations. (for example for intermediate results)
 It would be wasteful if every operand is brought into a register
from main memory, operated on once, and then returned to
main memory.
 Register Indirect Addressing
 Analogous to indirect addressing.
 Difference: address field refers to a register.
 EA = (R)
 The advantages and limitations of register indirect addressing are the
same as for indirect addressing.
 In both cases, the address space limitation (limited range of
addresses) of the address field is overcome by having that field refer
to a wordlength location containing an address.
 In addition, register indirect addressing uses one less memory
109
reference than
By: Netsanet Getnet, indirect
Lecturer, addressing.
M.Sc. Computer Eng.,
B.Sc. Electrical Eng.
 Displacement Addressing
 A very powerful mode of addressing combines the capabilities
of direct addressing and register indirect addressing.
 It is known by a variety of names depending on the context of
its use, but the basic mechanism is the same.
 We will refer to this as displacement addressing:
 EA = A + (R)
 Displacement addressing requires that the instruction have two
address fields, at least one of which is explicit.
 The value contained in one address field (value = A) is used
directly.
 The other address field, or an implicit reference based on
opcode, refers to a register whose contents are added to A to
produce the effective address.
By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
110 B.Sc. Electrical Eng.
 Displacement Addressing
 A very powerful mode of addressing
 Combines the capabilities of direct addressing and
register indirect addressing.
 EA = A + (R)
 Requires the instruction to have two address fields:
 The value contained in one address field (value = A) is used directly.
 The other address field, or an implicit reference based on
opcode, refers to a register whose contents are added to A to
produce the effective address.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


111 B.Sc. Electrical Eng.
 Three common uses of displacement addressing:
 Relative addressing
 Base-register addressing
 Indexing
 RELATIVE ADDRESSING (also called PC-relative
addressing)
 The implicitly referenced register is the program counter (PC).
 That is, the next instruction address is added to the address field to
produce the EA.
 Typically, the address field is treated as a twos complement
number for this operation.
 Thus, the effective address is a displacement relative to the address
of the current instruction.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


112 B.Sc. Electrical Eng.
 BASE-REGISTER ADDRESSING
 Interpreted as follows:
 The referenced register contains a main memory address
 The address field contains a displacement (usually an unsigned
integer representation) from that address.
 The register reference may be explicit or implicit.
 Either a single segment-base register is employed and is used
implicitly or the programmer has to choose a register to hold the
base address of a segment, and the instruction must reference it
explicitly.
 In this latter case, if the length of the address field is K and the
number of possible registers is N, then one instruction can reference
any one of N areas of 2K words.
 (N addresses each from N registers) x (2K addresses from the
address field of k bits) = N x 2K addresses.
By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
113 B.Sc. Electrical Eng.
 INDEXING
 Interpreted as follows:
 The address field references a main memory address, and
 the referenced register contains a positive displacement from that
address.
 Just the opposite of the interpretation for base-register addressing.
 Because the address field is considered to be a memory address in
indexing, it generally contains more bits than an address field in a
comparable base-register instruction.
 But the method of calculating the EA is the same for both base-
register addressing and indexing.
 And, in both cases, the register reference is sometimes explicit and
sometimes implicit (for different processor types).

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


114 B.Sc. Electrical Eng.
 An important use of indexing is to provide an efficient
mechanism for performing iterative operations.
 For example:
 Consider a list of numbers stored starting at location A.
 Suppose that we would like to add 1 to each element on the list.
 We need to fetch each value, add 1 to it, and store it back.
 The sequence of effective addresses that we need is A, A + 1, A +
2, . . . , up to the last location on the list.
 With indexing, this is easily done.
 The value A is stored in the instruction’s address field, and the
chosen register, called an index register, is initialized to 0.
 After each operation, the index register is incremented by 1.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


115 B.Sc. Electrical Eng.
 Because index registers are commonly used for such iterative
tasks, it is typical that there is a need to increment or
decrement the index register after each reference to it.
 Because this is such a common operation, some systems will
automatically do this as part of the same instruction cycle.
 This is known as autoindexing.
 If certain registers are devoted exclusively to
indexing, then autoindexing can be invoked implicitly and
automatically.
 If general-purpose registers are used, the autoindex operation
may need to be signaled by a bit in the instruction.
 Autoindexing using increment can be depicted as follows.
 EA = A + (R)
 (R)  (R) + 1
By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
116 B.Sc. Electrical Eng.
 In some machines, both indirect addressing and indexing are
provided, and it is possible to employ both in the same
instruction.
 There are two possibilities: the indexing is performed either
before or after the indirection.
 If indexing is performed after the indirection, it is termed
postindexing:
 EA = (A) + (R)
 First, the contents of the address field are used to access a
memory location containing a direct address.
 This address is then indexed by the register value.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


117 B.Sc. Electrical Eng.
 With preindexing, the indexing is performed before
the indirection:
 EA = (A + (R))
 An address is calculated as with simple indexing.
 In this case, the calculated address contains not the operand, but
the address of the operand.
 An example of the use of this technique is to construct a
multiway branch table.
 At a particular point in a program, there may be a branch to
one of a number of locations depending on conditions.
 A table of addresses can be set up starting at location A.
 By indexing into this table, the required location can be found.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


118 B.Sc. Electrical Eng.
 Stack Addressing
 A stack is a linear array of locations.
 It is also referred to as a pushdown list or last-in-first-out queue.
 The stack is a reserved block of locations.
 Items are appended to the top of the stack so that, at any given
time, the block is partially filled.
 Associated with the stack is a pointer whose value is the address of
the top of the stack.
 Alternatively, the top two elements of the stack may be in
processor registers.
 In this case, the stack pointer references the third element of the
stack.
 The stack pointer is maintained in a register.
 Hence, references to stack locations in memory are register
indirect addresses.
By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
119 B.Sc. Electrical Eng.
 The stack mode of addressing is a form of implied addressing.
 The machine instructions need not include a memory reference
but implicitly operate on the top of the stack.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


120 B.Sc. Electrical Eng.
ASSEMBLY LANGUAGE
 A processor can understand and execute machine instructions.
 Such instructions are simply binary numbers stored in the
computer.
 To program directly in machine language, then the program has
to be entered as binary data.
 Consider the statement
N=I+J+K
 Suppose we wished to program this statement in machine
language and to initialize I, J, and K to 2, 3, and 4, respectively.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


121 B.Sc. Electrical Eng.
ASSEMBLY LANGUAGE …

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


122 B.Sc. Electrical Eng.
ASSEMBLY LANGUAGE …
 The program starts in location 101 (hexadecimal).
 Memory is reserved for the four variables starting at location
201.
 The program consists of four instructions:
 Load the contents of location 201 into the AC.
 Add the contents of location 202 to the AC.
 Add the contents of location 203 to the AC.
 Store the contents of the AC in location 204.
 This is clearly a tedious and very error-prone process.
 A slight improvement is to write the program in hexadecimal
rather than binary notation. Shown in (b).

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


123 B.Sc. Electrical Eng.
ASSEMBLY LANGUAGE …
 We could write the program as a series of lines.
 Each line contains the address of a memory location and the
hexadecimal code of the binary value to be stored in that
location.
 Then we need a program that will accept this input, translate
each line into a binary number, and store it in the specified
location.
 For more improvement, we can make use of the symbolic name
or mnemonic of each instruction.
 This results in the symbolic program shown in (c).
 Each line of input still represents one memory location.
 Each line consists of three
By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
124 B.Sc. Electrical Eng.
Memory
Systems

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


125 B.Sc. Electrical Eng.
Characteristics
 We classify memory systems according to their characteristics.
 The most important of these are:
 Location: whether memory is internal or external to the
computer
 Internal memory is often equated with main memory.
 But there are other forms of internal memory. The processor
requires its own local memory, in the form of registers. Further, as
we shall see, the control unit portion of the processor may also
require its own internal memory.
 Cache is another form of internal memory.
 External memory: peripheral storage devices, such as disk and
tape, that are accessible to the processor via I/O controllers.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


126 B.Sc. Electrical Eng.
Characteristics …
 Capacity: For internal memory, this is typically expressed in
terms of bytes (1 byte = 8 bits) or words.
 Common word lengths are 8, 16, and 32 bits.
 External memory capacity is typically expressed in terms of bytes.
 Unit of transfer: For internal memory, this is equal to the
number of electrical lines into and out of the memory module.
 This may be equal to the word length, but is often larger, such as
64, 128, or 256 bits.
 To clarify this point, consider three related concepts for internal
memory:
 Word: The “natural” unit of organization of memory.
equal to the number of bits
 The size of a word is typically
used to represent an integer and to the instruction
length.
By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
127 B.Sc. Electrical Eng.
Characteristics …
 Addressable units: In some systems, the addressable unit is the
word.
 However, many systems allow addressing at the byte level.
 In any case, the relationship between the length in bits A of an address
and the number N of addressable units is 2A = N.
 Unit of transfer: For main memory, this is the number of bits read
out of or written into memory at a time.
 The unit of transfer need not equal a word or an addressable unit.
 For external memory, data are often transferred in much larger units
than a word, called blocks.
 Method of accessing:
 Sequential Access: Memory is organized into units of data,
called records.
 Access must be made in a specific linear sequence.
By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
128 B.Sc. Electrical Eng.
Characteristics …
 A shared read–write mechanism is used, and this must be moved
from its current location to the desired location, passing and
rejecting each intermediate record.
 Thus, the time to access an arbitrary record is highly variable.
 E.g. Tape
 Direct access: As with sequential access, direct access involves
a shared read–write mechanism.
 However, individual blocks or records have a unique address based
on physical location.
 Access is accomplished by direct access to reach a general vicinity
plus sequential searching, counting, or waiting to reach the final
location.
 Access time is variable.
 E.g. Disks
By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
129 B.Sc. Electrical Eng.
Characteristics …
 Random access: Each addressable location in memory has a
unique, physically wired-in addressing mechanism.
 Access time is independent of the sequence of prior accesses and is
constant.
 Thus, any location can be selected at random and directly addressed and
accessed.
 E.g. Main memory and some cache systems are random access.
 Associative: This is a random access type of memory that enables
one to make a comparison of desired bit locations within a word for a
specified match, and to do this for all words simultaneously.
 Thus, a word is retrieved based on a portion of its contents rather than
its address.
 As with ordinary random-access memory, each location has its own
addressing mechanism, and retrieval time is constant independent of
location or prior access patterns.
 E.g. Cache memories.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


130 B.Sc. Electrical Eng.
Characteristics …
 Here
 Performance: Three performance parameters are used:
 Access time (latency): For random-access memory, this is the time it
takes to perform a read or write operation;
 That is, the time from the instant that an address is presented to the memory to
the instant that data have been stored or made available for use.
 For non-random-access memory, access time is the time it takes to position the
read–write mechanism at the desired location.
 Memory cycle time: This concept is primarily applied to random-access
memory
 Consists of: access time + any additional time required before a second access
can commence.
 This additional time may be required for transients to die out on signal lines or
to regenerate data if they are read destructively.
 Note that memory cycle time is concerned with the system bus, not the
processor.
By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
131 B.Sc. Electrical Eng.
Characteristics …
 Transfer rate: This is the rate at which data can be transferred
into or out of a memory unit.
 For random-access memory, it is equal to 1/(cycle time).
 For non-random-access memory, the following relationship holds:

 Where :
 Tn = Average time to read or write n bits
 TA = Average access time
 n = Number of bits
 R = Transfer rate, in bits per second (bps)

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


132 B.Sc. Electrical Eng.
Characteristics …

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


133 B.Sc. Electrical Eng.
Types of Memory
 Physical types:
 The most common are:
 Semiconductor memory
 Magnetic surface memory, used for disk and tape, and
 Optical and magneto-optical.
 Physical Characteristics:
 Volatile vs non-volatile
 Magnetic-surface memories are nonvolatile.
 Semiconductor memory (memory on integrated circuits) may be either
volatile or nonvolatile.
 Non-erasable (ROM)
 Cannot be altered (except by destroying the storage unit).
 Semiconductor memory of this type is known as read-only memory (ROM).

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


134 B.Sc. Electrical Eng.
The Memory Hierarchy
 The design constraints on a computer’s memory:
 How much?
 How fast?
 How expensive?
 Trade-off among these key characteristics.
 The following relationships hold for a variety of technologies used to
implement memory systems:
 Faster access time, greater cost per bit
 Greater capacity, smaller cost per bit
 Greater capacity, slower access time
 Dilemma:
 Use large-capacity memory: capacity is needed and cost per bit is low.
 But, for performance requirements, we need to use expensive, lower-capacity
memory.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


135 B.Sc. Electrical Eng.
Memory Hierarchy …
 Way out of this dilemma:
 Instead of relying on a single memory component or technology,
employ a memory hierarchy. A typical one shown below.
 As one goes down the hierarchy:
a. Decreasing cost per bit
b. Increasing capacity
c. Increasing access time
d. Decreasing frequency of access of the memory by the processor
 Smaller, more expensive, faster memories are supplemented by larger,
cheaper, slower memories.
 The key to the success of this organization is item (d):
 Decreasing frequency of access.
 The use of two levels of memory reduces average access time.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


136 B.Sc. Electrical Eng.
Memory Hierarchy …

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


137 B.Sc. Electrical Eng.
Memory Hierarchy …

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


138 B.Sc. Electrical Eng.
Memory Hierarchy …

Typical values of the memory hierarchy parameters

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


139 B.Sc. Electrical Eng.
Memory Hierarchy …
 The basis for the validity of condition (d) is a principle known
as locality of reference.
 The effectiveness of a memory hierarchy depends on the
principle of moving information into the fast memory
infrequently and accessing it many times before replacing it
with new information.
 During execution of programs, memory references by the
processor, for both instructions and data, tend to cluster.
 That is, within a given period of time, programs tend to
reference a relatively confined area of memory repeatedly.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


140 B.Sc. Electrical Eng.
Memory Hierarchy …
 Programs contain a number of iterative loops and subroutines.
 Repeated references to a small set of instructions once a loop or subroutine is entered.
 Similarly, operations on tables and arrays involve access to a clustered set of data words.
 Over a long period of time, the clusters in use change;
 But over a short period of time, the processor is primarily working with fixed clusters of
memory references.
 There exist two forms of locality: spatial and temporal locality.
 Accordingly, it is possible to organize data across the hierarchy such that:
 The percentage of accesses to each successively lower level is substantially less than that of the
level above.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


141 B.Sc. Electrical Eng.
Memory Hierarchy …
 Spatial locality refers to the phenomenon that when a given address has been referenced, it
is most likely that addresses near it will be referenced within a short period of time.
 Example: consecutive instructions in a straight-line program.
 Temporal locality refers to the phenomenon that once a particular memory item has been
referenced, it is most likely that it will be referenced next,
 Example: an instruction in a program loop. (refer to lmc loop)
 When the processor makes a request for an item:
 First, the item is sought in the first memory level of the memory hierarchy.
 The probability of finding the requested item in the first level is called the hit ratio, h1.
 The probability of not finding (missing) the requested item in the first level of the memory
hierarchy is called the miss ratio, (1-h1).

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


142 B.Sc. Electrical Eng.
Memory Hierarchy …
 When the requested item causes a “miss,” it is sought in the next subsequent memory level.
 The probability of finding the requested item in the second memory level, the hit ratio of the second level, is
h2.
 The miss ratio of the second memory level is (1-h2).
 The process is repeated until the item is found.
 Upon finding the requested item, it is brought and sent to the processor.
 In a memory hierarchy that consists of three levels, the average memory access time can be expressed as
follows:
 tA = h1 ⅹ t1 + (1-h1)[t1 + h2 ⅹ t2 + (1 - h2)(t2 + t3)]
= t1 + (1 - h1)[t2 + (1 - h2)t3]
 In general,
 t
 The average access time of a memory level is defined as the time required to access one word in that level.
 In this equation, t1, t2, t3 represent, respectively, the access times of the three levels.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


143 B.Sc. Electrical Eng.
Memory Hierarchy …
 Example:
 Two levels of memory: L1 and L2;
 L1: 1000 words, t1= 0.01µs; L2: 100,000 words, t2 = 0.1µs
 Assume: if word is in L1, access directly; if it is in L2, word first transferred to L1, and then accessed by the
processor.
 Hit ratio, h1: the fraction of all memory accesses in the faster memory (e.g. cache).
 t1=access time for L1, t2 = access time for L2.
 Hence, average access time to access a word:
tA = h1 x t1 + (1-h1) x (t1 + t2)
 In our example: Lets say the processor finds the words it needs 95% of the time from level 1. Hence,
tA = 0.95(0.01µs) + 0.05(0.01µs + 0.1µs)
= 0.0095 + 0.0055 = 0.015µs
 The average access time is much closer to 0.01µs, as desired.
 In general, for high percentages of level 1 access, the average access time is much closer to that of level 1
than that of level 2.
 That is,
lim [H x T1 + (1-H) x (T1 + T2)] = T1

H1

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


144 B.Sc. Electrical Eng.
Performance of Accesses Involving only Level 1 (hit ratio)

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


145 B.Sc. Electrical Eng.
Cache Memory
• It is a small, very fast memory that retains copies of recently used
information from main memory.
• It operates transparently to the programmer, automatically
deciding which values to keep and which to replace (overwrite).

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


146 B.Sc. Electrical Eng.
Cache Memory …
 A small high-speed memory that is near the CPU
 Sits between normal main memory and CPU
 May be located on CPU chip or module
 The idea behind using a cache as the first level of the memory
hierarchy is to keep the information expected to be used more
frequently by the CPU in the cache.
 At any given time some active portion of the main memory is
duplicated in the cache.
 Therefore, when the processor makes a request for a memory
reference, the request is first sought in the cache.
 If the request corresponds to an element that is currently in the
cache, we call that a cache hit.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


147 B.Sc. Electrical Eng.
Cache Memory …
 On the other hand, if the request corresponds to an element that is not currently in
the cache, we call that a cache miss.
 A cache hit ratio, hc: the probability of finding the requested element in the cache.
 A cache miss ratio (1 - hc): the probability of not finding the requested element in
the cache.
 During cache miss, a block of main memory, consisting of some fixed number of
words, is read into the cache
 Then the word is delivered to the processor.
 Because of the phenomenon of locality of reference, when a block of data is fetched
into the cache to satisfy a single memory reference, it is likely that there will be future
references to that same memory location or to other words in the block.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


148 B.Sc. Electrical Eng.
Cache and Main Memory

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


149 B.Sc. Electrical Eng.
Instruction Cache/Data Cache
• It is common to split cache memory into cache
dedicated to data and cache dedicated to instructions

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


150 B.Sc. Electrical Eng.
Cache/Main Memory Structure

M = 2n/K blocks

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


151 B.Sc. Electrical Eng.
Cache …
 For mapping purposes, main memory is considered to consist of a number of
fixed-length blocks of K words each.
M = 2n/K blocks in main memory.
 The cache consists of m blocks, called lines: m << M
 The term line (not block) is used for cache:
 To differentiate it from memory blocks
 It contains additional fields as tag and control bits
 Control bits, such as a bit to indicate whether a line has been modified since being loaded into
the cache.
 Each line contains K words
 The length of a line (not including tag and control bits) is called line size.
 Line size could be as small as 32 bits
 with each word = a byte
 A line size = 4 bytes

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


152 B.Sc. Electrical Eng.
Cache …
 More blocks in memory than cache lines: m << M
 A line cannot be dedicated to a particular block.
 Each line has a tag that identifies which particular block is
currently being stored.
 A tag is usually a portion of the main memory address

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


153 B.Sc. Electrical Eng.
Cache Read Operation
 CPU generates read address (RA) of a word to be read.
 If word is in cache, it is delivered to the processor (fast).
 If not present, read required block from main memory to cache and
then deliver it to CPU
 These two operations occur in parallel. (A typical organization)
 In an alternative organization,
 The cache is physically interposed between the processor and the
main memory for all data, address, and control lines.
 In this case, for a cache miss, the desired word is first read into the
cache and then transferred from cache to processor.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


154 B.Sc. Electrical Eng.
Cache Read Operation

Another
Organization

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


155 B.Sc. Electrical Eng.
Typical Cache Organizations
 Cache connects to the processor via data, control and address
lines
 The data and address lines also attach to data and address
buffers, which attach to the system bus
 Through which main memory is reached
 When cache hit occurs:
 Data and address buffers are disabled
 Communication is only between processor and cache
 When cache miss occurs:
 Desired address is loaded onto the system bus
 Data are copied to both the cache and processor

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


156 B.Sc. Electrical Eng.
Typical Cache Organization

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


157 B.Sc. Electrical Eng.
Cache Design
 Addressing
 Size
 Mapping Function
 Replacement Algorithm
 Write Policy
 Block Size
 Number of Caches

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


158 B.Sc. Electrical Eng.
Cache Addressing
 Where does cache sit?
 Between processor and virtual memory management unit
 Between MMU and main memory
 Logical cache (virtual cache) stores data using virtual addresses
 Processor accesses cache directly, not thorough physical cache
 Cache access faster, before MMU address translation
 Virtual addresses use same address space for different applications
 Must flush cache on each context switch
 Physical cache stores data using main memory physical addresses

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


159 B.Sc. Electrical Eng.
Size does matter
 Cost
 More cache is expensive
 Speed
 More cache is faster (up to a point)
 Checking cache for data takes time

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


160 B.Sc. Electrical Eng.
Comparison of Cache Sizes

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


161 B.Sc. Electrical Eng.
Mapping Function
 Cache of 64kByte
 Cache block of 4 bytes
 i.e. cache is 16k (214) lines of 4 bytes
 16MBytes main memory
 24 bit address
 (224=16M)

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


162 B.Sc. Electrical Eng.
Mapping …
 Algorithm is needed for mapping main memory blocks into
cache lines.
 And a means is needed for determining which main memory
block currently occupies a cache line.
 The choice of the mapping function dictates how the cache is
organized.
 Three techniques:
 Direct
 Associative, and
 Set associative

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


163 B.Sc. Electrical Eng.
Direct Mapping
 The simplest technique of mapping.
 Maps each block of main memory into only one possible
cache line
 i.e. if a block is in cache, it must be in one specific place
 The mapping is expressed as:
i = j modulo m
where
o i = cache line number
o j = main memory block number
o m = number of lines in the cache

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


164 B.Sc. Electrical Eng.
Direct Mapping …
 Figure below shows the mapping for the first m blocks of main
memory.
 Each block of main memory maps into one unique line of the
cache.
 The next m blocks of main memory map into the cache in the
same fashion;
 That is, block Bm of main memory maps into line L0 of cache,
block Bm+1 maps into line L1, and so on.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


165 B.Sc. Electrical Eng.
Direct Mapping from Cache to Main Memory

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


166 B.Sc. Electrical Eng.
Direct Mapping …
 Each main memory address consists of three fields:
 The least significant w bits identify a unique word or byte
within a block of main memory.
 The remaining s bits specify one of the 2s blocks of main
memory.
 The cache logic interprets these s bits as a tag of s - r bits (most
significant portion) and a line field of r bits.
 This latter field identifies one of the m = 2r lines of the cache.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


167 B.Sc. Electrical Eng.
Direct Mapping …
To summarize:
 Address length = (s + w) bits
 Number of addressable units = 2s+w words or bytes
 Block size = line size = 2w words or bytes
 Number of blocks in main memory = 2s+w/2w = 2s
 Number of lines in cache = m = 2r
 Size of cache = 2r+w words or bytes
 Size of tag = (s - r) bits

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


168 B.Sc. Electrical Eng.
Direct Mapping Address Structure
Tag s-r Line or Slot r Word w

8 14 2

 24 bit address
 2 bit word identifier (4 byte block)
 22 bit block identifier
 8 bit tag (=22-14)
 14 bit slot or line
 No two blocks that map into the same line have the same Tag field.
 Check contents of cache by finding line and checking Tag
Direct Mapping Cache Line Table

Cache line Main Memory blocks held

0 0, m, 2m, 3m…2s-m

1 1,m+1, 2m+1…2s-m+1

m-1 m-1, 2m-1,3m-1…2s-1


Direct Mapping Cache Organization

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


171 B.Sc. Electrical Eng.
Example: Direct Mapping
• Refer to figure on next slide
• m = 16K = 214 and i = j modulo 214.
• The mapping becomes:
Cache Line Starting Memory Address of Block
0 000000, 010000, …, FF0000
1 000004, 010004, …, FF0004
… …
214 - 1 00FFFC, 01FFFC, …, FFFFFC
• Note that no two blocks that map into the same line number have
the same tag number.
• Thus, blocks with starting addresses 000000, 010000, …, FF0000
have tag numbers 00, 01, …, FF, respectively.
Multiple blocks from
memory that will map
to cache tag of 00. (Each
memory line is a block.)

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


173 B.Sc. Electrical Eng.
Example: Direct Mapping …
 A read operation works as follows.
 The cache system is presented with a 24-bit address.
 The 14-bit line number is used as an index into the cache to access a
particular line.
 If the 8-bit tag number matches the tag number currently stored in that
line, then the 2-bit word number is used to select one of the 4 bytes in
that line.
 Otherwise, the 22-bit tag-plus-line field is used to fetch a block from
main memory.
 The actual address that is used for the fetch is the 22-bit tag-plus-line
concatenated with two 0 bits (the right most 2-bits on address), so that 4
bytes are fetched starting on a block boundary.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


174 B.Sc. Electrical Eng.
Direct Mapping pros & cons
 Simple
 Inexpensive
 Fixed location for given block
 If a program accesses 2 blocks that map to the same line
repeatedly, cache misses are very high

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


175 B.Sc. Electrical Eng.
Victim Cache
 Lower miss penalty
 Remember what was discarded
 Already fetched
 Use again with little penalty
 Fully associative
 4 to 16 cache lines
 Between direct mapped L1 cache and next memory level

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


176 B.Sc. Electrical Eng.
Associative Mapping
 A main memory block can load into any line of cache
 Memory address is interpreted as tag and word
 Tag uniquely identifies block of memory
 Every line’s tag is examined for a match
 Cache searching gets expensive (better in direct mapping
because we know where a memory block is in the cache!)

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


177 B.Sc. Electrical Eng.
Associative Mapping from
Cache to Main Memory

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


178 B.Sc. Electrical Eng.
Fully Associative Cache Organization

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


179 B.Sc. Electrical Eng.
Associative
Mapping
Example

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


180 B.Sc. Electrical Eng.
Associative Mapping Address Structure

Word
Tag 22 bit 2 bit

 22 bit tag stored with each 32 bit block of data


 Compare tag field with tag entry in cache to check for hit
 Least significant 2 bits of address identify which 16 bit word is
required from 32 bit data block
 e.g.
 Address Tag Data Cache line
 FFFFFC FFFFFC 24682468 3FFF

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


181 B.Sc. Electrical Eng.
Associative Mapping Summary
 Address length = (s + w) bits
 Number of addressable units = 2s+w words or bytes
 Block size = line size = 2w words or bytes
 Number of blocks in main memory = 2s+ w/2w = 2s
 Number of lines in cache = undetermined
 Size of tag = s bits

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


182 B.Sc. Electrical Eng.
Set Associative Mapping
 Cache is divided into a number of sets
 Each set contains a number of lines
 A given block maps to any line in a given set
 e.g. Block B can be in any line of set i
 e.g. 2 lines per set
 2 way associative mapping
 A given block can be in one of 2 lines in only one set

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


183 B.Sc. Electrical Eng.
Set Associative Mapping
Example
 13 bit set number
 Block number in main memory is modulo 213
 000000, 00A000, 00B000, 00C000 … map to same set

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


184 B.Sc. Electrical Eng.
Mapping From Main Memory to Cache: v Associative

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


185 B.Sc. Electrical Eng.
Mapping From Main Memory to Cache:
k-way Associative

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


186 B.Sc. Electrical Eng.
K-Way Set Associative Cache Organization

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


187 B.Sc. Electrical Eng.
Set Associative Mapping Address Structure

Word
Tag 9 bit Set 13 bit 2 bit

 Use set field to determine cache set to look in


 Compare tag field to see if we have a hit
 e.g
 Address Tag Data Set number
 1FF 7FFC 1FF 12345678 1FFF
 001 7FFC 001 11223344 1FFF

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


188 B.Sc. Electrical Eng.
Two Way Set Associative Mapping Example

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


189 B.Sc. Electrical Eng.
Set Associative Mapping Summary
 Address length = (s + w) bits
 Number of addressable units = 2s+w words or bytes
 Block size = line size = 2w words or bytes
 Number of blocks in main memory = 2s+w / 2w = 2s
 Number of lines in set = k
 Number of sets = v = 2d
 Number of lines in cache = kv = k * 2d
 Size of cache = k * 2d+w words or bytes
 Size of tag = (s – d) bits
 (i.e. Total # of blocks per set = Total MM blocks / # of sets)
=2s / 2d = 2s-d

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


190 B.Sc. Electrical Eng.
Direct and Set Associative Cache Performance Differences

 Significant up to at least 64kB for 2-way


 Difference between 2-way and 4-way at 4kB much less than
4kB to 8kB
 Cache complexity increases with associativity
 Not justified against increasing cache to 8kB or 16kB
 Above 32kB gives no improvement
 (simulation results)

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


191 B.Sc. Electrical Eng.
Figure 4.16 Varying Associativity over Cache Size

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


192 B.Sc. Electrical Eng.
Replacement Algorithms (1)
Direct mapping
 No choice
 Each block only maps to one line
 Replace that line

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


193 B.Sc. Electrical Eng.
Replacement Algorithms (2)
Associative & Set Associative
 To achieve high speed, such an algorithm must be
implemented in hardware.
 A number of algorithms have been tried.
 Least Recently used (LRU)
 e.g. in 2 way set associative
 Which of the 2 block is lru?
 First in first out (FIFO)
 replace block that has been in cache longest
 Least frequently used
 replace block which has had fewest hits
 Random
By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
194 B.Sc. Electrical Eng.
Write Policy
 Two cases to consider.
 If the old block in the cache has not been altered, then it may be overwritten with a new block
without first writing out the old block.
 If at least onewrite operation has been performed on a word in that line of the cache, then
main memory must be updated by writing the line of cache out to the block of memory before
bringing in the new block.
 Must not overwrite a cache block unless main memory is up to date
 Multiple CPUs may have individual caches
 If a word is altered in one cache, it could invalidate a word in other caches.
 I/O may address main memory directly
 If a word has been altered only in the cache, then the corresponding memory word is invalid.
 Further, if the I/O device has altered main memory, then the cache word is invalid.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


195 B.Sc. Electrical Eng.
Write through
 The simplest technique.
 All writes go to main memory as well as cache
 Multiple CPUs can monitor main memory traffic to keep local
(to CPU) cache up to date
 Lots of traffic
 Slows down writes

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


196 B.Sc. Electrical Eng.
Write back
 Updates initially made in cache only
 Update bit (dirty bit) for cache slot is set when update occurs
 If block is to be replaced, write to main memory only if update
bit is set
 Other caches get out of sync
 I/O must access main memory through cache
 This makes for complex circuitry and a potential bottleneck.
(why?)

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


197 B.Sc. Electrical Eng.
Multilevel Caches
 High logic density enables caches on chip
 Faster than bus access
 Frees bus for other transfers
 Common to use both on and off chip cache
 L1 on chip, L2 off chip in static RAM
 L2 access much faster than DRAM or ROM
 L2 often uses separate data path
 L2 may now be on chip
 Resulting in L3 cache
 Bus access or now on chip…

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


198 B.Sc. Electrical Eng.
Unified v Split Caches
 One cache for data and instructions or two, one for data and
one for instructions
 Advantages of unified cache
 Higher hit rate
 Balances load of instruction and data fetch
 Only one cache to design & implement

 Advantages of split cache


 Eliminates cache contention between instruction fetch/decode
unit and execution unit
 Important in pipelining

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


199 B.Sc. Electrical Eng.
Input/Output Systems

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


200 B.Sc. Electrical Eng.
Introduction
 Input/Output Problems
 Why one does not connect peripherals directly to the system bus:
 Peripherals often use different data formats and word lengths than the computer to
which they are attached
 Wide variety of peripherals
 Delivering different amounts of data
 At different speeds
 Use different data formats
 All slower than CPU and RAM
 Need I/O modules: have two major functions
 Interface to the processor and memory via the system bus or central switch.
 Interface to one or more peripheral devices.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


201 B.Sc. Electrical Eng.
I/O Module Generic Model

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


202 B.Sc. Electrical Eng.
I/O Module …
 Each module interfaces to the system bus or central switch
and controls one or more peripheral devices.
 An I/O module is not simply a set of mechanical
connectors that wire a device into the system bus.
 Rather, the I/O module contains logic for performing a
communication function between the peripheral and the
bus.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


203 B.Sc. Electrical Eng.
EXTERNAL DEVICES
 Provide a means of exchanging data between the external
environment and the computer.
 An external device attaches to the computer by a link to an
I/O module.
 The link is used to exchange control, status, and data
between the I/O module and the external device.
 An external device connected to an I/O module is referred
to as a peripheral device.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


204 B.Sc. Electrical Eng.
External Devices …
 Three broad categories of external devices:
 Human readable: Suitable for communicating with
the computer user.
 display terminals and printers.
 Machine readable: Suitable for communicating with
equipment.
 magnetic disk and tape systems, and sensors and actuators
 Communication: Suitable for communicating with
remote devices.
 NIC, Modem

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


205 B.Sc. Electrical Eng.
External Devices …

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


206 B.Sc. Electrical Eng.
External Devices …
 The interface to the I/O module is in the form of control, data, and
status signals.
 Control signals determine the function that the device will perform, such as:
 send data to the I/O module (INPUT or READ),
 accept data from the I/O module (OUTPUT or WRITE),
 report status, or perform some control function particular to the
device (e.g., position a disk head).
 Data are in the form of a set of bits to be sent to or received from the I/O
module.
 Status signals indicate the state of the device.
 E.g. READY/NOT-READY to show whether the device is ready for
data transfer.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


207 B.Sc. Electrical Eng.
External Devices …
 Control logic associated with the device controls the device’s operation
in response to direction from the I/O module.
 The transducer converts data from electrical to other forms of
energy during output and from other forms to electrical during
input.
 A buffer is associated with the transducer to temporarily hold
data being transferred between the I/O module and the
external environment;
 A buffer size of 8 to 16 bits is common.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


208 B.Sc. Electrical Eng.
I/O Module Function
 The major functions or requirements for an I/O module are:
 Control and timing
 Processor communication
 Device communication
 Data buffering
 Error detection
 Processor may communicate with one or more external
devices in unpredictable patterns, depending on the program’s
need for I/O.
 The internal resources, such as main memory and the system
bus, must be shared among a number of activities, including
data I/O.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


209 B.Sc. Electrical Eng.
I/O Module Function …
 Thus, the I/O module function includes a control and timing, to coordinate the
flow of traffic between internal resources and external devices.
 For example, the control of the transfer of data from an external device to the
processor might involve the following sequence of steps:
1. The processor interrogates the I/O module to check the status of the attached
device.
2. The I/O module returns the device status.
3. If the device is ready to transmit, the processor requests the transfer of data, by
means of a command to the I/O module.
4. The I/O module obtains a unit of data (e.g., 8 or 16 bits) from the external
device.
5. The data are transferred from the I/O module to the processor.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


210 B.Sc. Electrical Eng.
I/O Module Function …
 The I/O module must communicate with the processor and
with the external device.
 Processor communication involves:
 Command decoding: The I/O module accepts commands
from the processor, typically sent as signals on the control
bus. For example, an I/O module for a disk drive might
accept the following commands: READ SECTOR,
WRITE SECTOR, SEEK track number, and SCAN
record ID. The latter two commands each include a
parameter that is sent on the data bus.
 Data: Data are exchanged between the processor and the I/O
module over the data bus.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


211 B.Sc. Electrical Eng.
I/O Module Function …
 Status reporting: Because peripherals are so slow, it is
important to know the status of the I/O module. For
example, if an I/O module is asked to send data to the
processor (read), it may not be ready to do so because
it is still working on the previous I/O command. This
fact can be reported with a status signal. Common
status signals are BUSY and READY. There may also be
signals to report various error conditions.
 Address recognition: Just as each word of memory has an
address, so does each I/O device. Thus, an I/O module must
recognize one unique address for each peripheral it controls.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


212 B.Sc. Electrical Eng.
I/O Module Function …
 The I/O module on the other side must be able to perform device
communication.
 This communication involves commands, status information, and data.
 Data buffering:
 The transfer rate into and out of main memory or the processor is quite high;
 The rate is orders of magnitude lower for peripheral devices.
 Data coming from main memory are sent to an I/O module in a rapid burst.
 The data are buffered in the I/O module and then sent to the peripheral device at
its data rate.
 In the opposite direction, data are buffered so as not to tie up the memory in a slow
transfer operation.
 Thus, the I/O module must be able to operate at both device and memory speeds.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


213 B.Sc. Electrical Eng.
I/O Module Function …
 Error detection:
 Class 1: Mechanical and electrical malfunctions reported by the
device (e.g., paper jam, bad disk track).
 Class 2: Unintentional changes to the bit pattern as it is
transmitted from device to I/O module.
 Some form of error-detecting code is often used to
detect transmission errors.
 A simple example is the use of a parity bit on each
character of data.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


214 B.Sc. Electrical Eng.
IO Module Structure

215 B.Sc. Electrical Eng.


Block Diagram of an I/O Module
By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
I/O Module Structure …
 The module connects to the rest of the computer through a set
of signal lines (e.g., system bus lines).
 Data transferred to and from the module are buffered in one or
more data registers.
 There may also be one or more status registers that provide
current status information.
 A status register may also function as a control register, to
accept detailed control information from the processor.
 The logic within the module interacts with the processor via a
set of control lines.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


216 B.Sc. Electrical Eng.
I/O Module Structure …
 The processor uses the control lines to issue commands to the
I/O module.
 Some of the control lines may be used by the I/O module
(e.g., for arbitration and status signals).
 The module must also be able to recognize and generate
addresses associated with the devices it controls.
 Each I/O module has a unique address or, if it controls more
than one external device, a unique set of addresses.
 Finally, the I/O module contains logic specific to the interface
with each device that it controls.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


217 B.Sc. Electrical Eng.
I/O Module Structure …
 Different I/O Modules based on capabilities:
 An I/O module that takes on most of the detailed processing
burden, presenting a high-level interface to the processor, is
usually referred to as an I/O channel or I/O processor.
 An I/O module that is quite primitive and requires detailed
control is usually referred to as an I/O controller or device
controller.
 I/O controllers are commonly seen on microcomputers, whereas
I/O channels are used on mainframes.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


218 B.Sc. Electrical Eng.
Techniques for I/O Operations
 When the processor is executing a program and encounters an
instruction relating to I/O, it executes that instruction by
issuing a command to the appropriate I/O module.
 Three techniques are possible for I/O operations.
 Programmed I/O
 Interrupt-Driven I/O
 Direct Memory Access (DMA)

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


219 B.Sc. Electrical Eng.
PROGRAMMED I/O
 Data are exchanged between the processor and the I/O
module.
 The processor executes a program that gives it direct control of
the I/O operation, including sensing device status, sending a
read or write command, and transferring the data.
 When the processor issues a command to the I/O module, it
must wait until the I/O operation is complete.
 This is wasteful of processor time.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


220 B.Sc. Electrical Eng.
PROGRAMMED I/O …
 The I/O module will perform the requested action and then
 Set the appropriate bits
 CPU requests I/O operation
 I/O module performs operation
 I/O module sets status bits in the I/O status register.
 I/O module takes no further action to alert the processor directly
(does not interrupt CPU).
 CPU checks status bits periodically.
 CPU may wait or come back later.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


221 B.Sc. Electrical Eng.
I/O Commands
 To execute an I/O-related instruction:
 The processor issues an address, specifying:
 The particular I/O module and external device, and
 An I/O command.

 There are four types of I/O commands that an I/O module may receive:
 Control - telling module what to do
 e.g. spin up disk, rewind or to move forward one record on a magnetic-tape.
 Test - check status of I/O Module and its peripherals.
 e.g. power? Error?
 Read - Module transfers data via buffer from peripheral device and place it in an internal
buffer
 The processor can then obtain the data item by requesting that the I/O module place it
on the data bus.
 Write - take an item of data (byte or word) from the data bus and transmit that data item
to the peripheral.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


222 B.Sc. Electrical Eng.
Keeps the processor busy needlessly

Programmed I/O
By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
223 B.Sc. Electrical Eng.
I/O Instructions
 With programmed I/O: I/O-related instructions that the
processor fetches from memory are mapped into I/O
commands the processor issues to an I/O module to execute
the instructions.
 And there is often a simple one-to-one relationship.
 data transfer is very like memory access (CPU viewpoint)
 Each device given unique identifier (address)
 When CPU issues an I/O command, the command contains
the address of the desired device.
 When the processor, main memory, and I/O share a
common bus, two modes of addressing are possible: memory
mapped and isolated.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


224 B.Sc. Electrical Eng.
Memory Mapped I/O
 Memory mapped I/O
 Devices and memory share an address space
 I/O looks just like memory read/write
 No special commands for I/O
 Large selection of memory access commands available

 Isolated I/O
 Separate address spaces
 Need I/O or memory select lines
 Special commands for I/O
 Limited set

 The bus may be equipped with memory read and write plus
input and output command lines.
By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
225 B.Sc. Electrical Eng.
Isolated I/O
 The bus may be equipped with memory read and write plus
input and output command lines.
 Now, the command line specifies whether the address refers to a
memory location or an I/O device.
 The full range of addresses may be available for both.
 Again, with 10 address lines, the system may now support both
1024 memory locations and 1024 I/O addresses.
 Because the address space for I/O is isolated from that for
memory, this is referred to as isolated I/O.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


226 B.Sc. Electrical Eng.
Example
 Assume a 10-bit address, with a 512 memory (locations 0–511)
and up to 512 I/O addresses (locations 512–1023).
 Two addresses are dedicated to keyboard input from a
particular terminal.
 Address 516 refers to the data register and address 517 refers to
the status register, which also functions as a control register for
receiving processor commands.
 The program shown will read 1 byte of data from the keyboard
into an accumulator register in the processor. (Note the loop!)

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


227 B.Sc. Electrical Eng.
Example …

200 Load AC "1"


//load accum. with
value 00000001, so
start read

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


228 B.Sc. Electrical Eng.
Isolated …
 The bus may be equipped with memory read and write plus
input and output command lines.
 Now, the command line specifies whether the address refers to
a memory location or an I/O device.
 The full range of addresses may be available for both.
 Again, with 10 address lines, the system may now support both
1024 memory locations and 1024 I/O addresses.
 Because the address space for I/O is isolated from that for
memory, this is referred to as isolated I/O.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


229 B.Sc. Electrical Eng.
Example …
 Isolated I/O

 For most types of processors, there is a relatively large set of different instructions for
referencing memory.
 If isolated I/O is used, there are only a few I/O instructions.
 Thus, an advantage of memory-mapped I/O is that this large repertoire of instructions
can be used, allowing more efficient programming.
 A disadvantage is that valuable memory address space is used up.
 Both memory-mapped and isolated I/O are in common use.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


230 B.Sc. Electrical Eng.
INTERRUPT-DRIVEN I/O
 Problem with Programmed I/O:
 CPU waiting until the I/O finishes (or is ready)
 CPU must repeatedly check for status of I/O module
 Alternative:
 CPU issue an I/O command to module
 Go on to do some useful work.
 I/O module will interrupt the CPU when ready to exchange data.
 CPU executed data transfer, and then resumes its former process.

 When an interrupt request is encountered, a transfer to an interrupt handling


routine takes place.
 Interrupt handling routines are programs that are invoked to collect the state of the
currently executing program, correct the cause of the interrupt, and restore the
state of the program.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


231 B.Sc. Electrical Eng.
INTERRUPT-DRIVEN I/O …
 From I/O module point of view:
 I/O module receives a READ command from the processor.
 The I/O module then proceeds to read data from associated
peripheral.
 Once the data are in the module’s data register, the module signals
an interrupt to the processor over a control line.
 The module then waits until its data are requested by the
processor.
 When the request is made, the module places its data on the data
bus and is then ready for another I/O operation.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


232 B.Sc. Electrical Eng.
INTERRUPT-DRIVEN I/O …
 From CPU module point of view:
 CPU issues a READ command.
 It then goes off and does something else.
 At the end of each instruction cycle, CPU checks for
interrupts
 When the interrupt from the I/O module occurs, CPU saves
the context (e.g., program counter and processor registers)
of the current program and handles the interrupt.
 In this case, the CPU reads the word of data from the I/O
module and stores it in memory.
 It then restores the context of the program it was working on
(or some other program) and resumes execution.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


233 B.Sc. Electrical Eng.
Interrupt-Driven I/O
By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
234 B.Sc. Electrical Eng.
Instruction Cycle With Interrupts

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


235 B.Sc. Electrical Eng.
Simple Interrupt Processing
 The occurrence of an interrupt triggers a number of events,
both in the processor hardware and in software.
 Steps:
 Device issues interrupt signal to CPU.
 CPU finishes execution of the current instruction before
responding to the interrupt, (Figure in previous slide).
 CPU tests for existence of an interrupt, sends an
acknowledgment signal to the device.
 The acknowledgment allows the device to remove its interrupt
signal.
 The CPU now prepares to transfer control to the interrupt
routine.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


236 B.Sc. Electrical Eng.
Simple Interrupt Processing …
 First it needs to save information needed to resume the current
program at the point of interrupt.
 Minimum information required:
 The status of the CPU, contained in a register called the program
status word (PSW), and
 The location of the next instruction to be executed, contained in PC
 These can be pushed onto the system control stack.
 The CPU now loads the PC with the entry location of the interrupt-
handling program that will respond to this interrupt.
 Once the PC has been loaded, the CPU proceeds to the next instruction
cycle, which begins with an instruction fetch.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


237 B.Sc. Electrical Eng.
Simple Interrupt Processing …
 At this point, the PC and PSW relating to the interrupted program have been saved on
the system stack.
 There is also other information that is considered part of the “state” of the executing
program.
 That is, the contents of the CPU registers need to be saved, because these registers may
be used by the interrupt handler.
 So, all of these values, plus any other state information, need to be saved.
 See part (a) in Figure below.
 In this case, a user program is interrupted after the instruction at location N.
 The contents of all of the registers plus the address of the next instruction (N + 1) are
pushed onto the stack.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


238 B.Sc. Electrical Eng.
Simple Interrupt Processing …

(a) Interrupt after (b) Return from interrupt


By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
239 B.Sc. Electrical Eng.
Instruction at N
Simple Interrupt Processing …
 The stack pointer: updated to point to the new top of stack
 PC: updated to point to the beginning of the interrupt service routine.
 The interrupt handler next processes the interrupt.
 This includes an examination of status information relating to the I/O operation or
other event that caused an interrupt.
 When interrupt processing is complete, the saved register values are retrieved
from the stack and restored to the registers. See part (b).
 The final act is to restore the PSW and program counter values from the stack.
 As a result, the next instruction to be executed will be from the previously
interrupted program.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


240 B.Sc. Electrical Eng.
Simple Interrupt Processing …

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


241 B.Sc. Electrical Eng.
DIRECT MEMORY ACCESS (DMA)
 Drawbacks of Programmed I/O and Interrupt-Driven I/O
 The I/O transfer rate is limited by the speed with which the processor can test and service a
device.
 The processor is tied up in managing an I/O transfer; a number of instructions must be
executed for each I/O transfer.
 Programmed: faster I/O transfer at the cost of doing nothing else.
 The processor is dedicated to the task of the I/O
 Interrupt-Driven: frees up CPU at the cost of I/O transfer rate.
 There is somewhat of a trade-off between these two drawbacks.
 both methods have an adverse impact on both a processor activity and I/O transfer
rate.
 DMA is a more efficient technique to move large volumes of data.
 DMA involves an additional module (hardware) on the system bus.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


242 B.Sc. Electrical Eng.
DMA …
 Both the DMA and CPU use memory bus
 Only one of them can use the memory at a time.
 The DMA controller sends a request to the CPU asking its
permission to use the bus.
 The CPU returns an acknowledgment to the DMA controller
granting it bus access.
 The DMA can now take control of the bus to independently
conduct memory transfer.
 When the transfer is complete the DMA hands over its control of
the bus to the CPU.
 Processors that support DMA provide
 One or more input signals that the bus requester can assert to gain
control of the bus, and
 One or more output signals that the CPU asserts to indicate it has
handed over the bus
By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
243 B.Sc. Electrical Eng.
Typical DMA Module Diagram

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


244 B.Sc. Electrical Eng.
DMA Function
 The DMA controller/Module is a piece of hardware that controls one or more
peripheral devices.
 It allows devices to transfer data to or from the system’s memory without the
help/intervention of the processor.
 It is capable of mimicking the processor and taking over control of system from the
processor.
 To transfer data to and from memory over the system bus.
 For this, either:
 The DMA module must use the bus only when the processor does not need it, or
 It must force the processor to suspend operation temporarily (cycle stealing)
 When the processor wishes to read or write a block of data, it issues a command to the
DMA module, by sending to the DMA module the following information:
 Whether a read or write is requested, using the read or write control line between the
processor and the DMA module

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


245 B.Sc. Electrical Eng.
DMA Function …
 The address of the I/O device involved, communicated on the data
lines
 The starting location in memory to read from or write to,
communicated on the data lines and stored by the DMA module in
its address register
 The number of words to be read or written, again communicated via
the data lines and stored in the data count register
 CPU then carries on with other work.
 It has delegated DMA module to deal with transfer.
 DMA module then sends interrupt when transfer is complete.
 Hence, Processor is involved only at the beginning and end of the
transfer.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


246 B.Sc. Electrical Eng.
DMA Function …

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


247 B.Sc. Electrical Eng.
DMA Function …
 Both the DMA and CPU use memory bus
 Only one of them can use the memory at a time.
 The DMA controller sends a request to the CPU asking its
permission to use the bus.
 The CPU returns an acknowledgment to the DMA controller
granting it bus access.
 The DMA can now take control of the bus to independently
conduct memory transfer.
 When the transfer is complete the DMA hands over its control of
the bus to the CPU.
 Processors that support DMA provide
 One or more input signals that the bus requester can assert to gain
control of the bus, and
 One or more output signals that the CPU asserts to indicate it has
handed over the bus
By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
248 B.Sc. Electrical Eng.
BUS Sharing Among CPU and DMA

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


249 B.Sc. Electrical Eng.
DMA Transfer Cycle Stealing
 How does the DMA use the bus for transfer to/from memory?
 Processor is suspended
 DMA controller takes over bus for a cycle
 Transfer of one word of data and returns control to the processor
 This is not an interrupt (CPU does not switch context and do something else)
 Rather, it pauses for one bus cycle
 CPU suspended just before it accesses bus
 i.e. before an operand or data fetch or a data write
 Overall effect: CPU executes more slowly
 But, not as slow as CPU doing transfer (as in case of programmed
or interrupt-driven I/O).

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


250 B.Sc. Electrical Eng.
DMA and Interrupt Breakpoints During an Instruction Cycle

At any of memory
access points by
CPU
Only at the end of an
instruction cycle
By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
251 B.Sc. Electrical Eng.
DMA Configurations
 The DMA mechanism can be configured in a variety of ways.
 Configuration 1

Single Bus, Detached DMA controller


 All modules share the same system bus
 DMA uses programmed I/O to exchange data between memory and I/O module, through
the DMA module
 Disadvantage:
 Each transfer uses bus twice for each transfer of a word (two bus cycles) (Similar to processor
controlled programmed I/O)
 I/O to DMA then DMA to memory
 CPU is suspended twice

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


252 B.Sc. Electrical Eng.
DMA Configurations …
 Configuration 2

Single-bus, integrated DMA-I/O


 Integrate the DMA and I/O functions
 There is a path between the DMA and one or more I/O modules (that does not
include the system bus)
 DMA may be part of I/O module or may be a separate module that controls
one or more I/O modules
 Controller may support > 1 device
 Each transfer uses bus once
 DMA to memory
 CPU is suspended once
By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
253 B.Sc. Electrical Eng.
DMA Configurations …

Separate I/O bus


 Connects I/O modules to the DMA module using an I/O bus
 Reduces the number of interfaces in the DMA to one
 Easily expandable configuration
 In both cases above, the system bus the DMA shares with the CPU and memory is used by the
DMA module only to exchange data with memory.
 Bus supports all DMA enabled devices
 Each transfer uses bus once
 DMA to memory
 CPU is suspended once

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


254 B.Sc. Electrical Eng.
Processor Unit Design
 We have seen basics of CPU
 Major components: Register set, ALU, CU
 The register set differs between architectures
 It is usually a combination of general purpose and special purpose (such as
PC, IR)
 Memory access registers
 MAR, MDR (MBR)
 Instruction Fetch Registers
 PC, IR
 Condition registers (flags)
 PSW register (in some architectures)
 Bits in the PSW are set by the CPU to
indicate the current status of an executing
program
 Indicators typically for arithmetic
operations, interrupts, memory
protection information, or CPU status.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


255 B.Sc. Electrical Eng.
Processor …
 Special-Purpose Address Registers
 Index register, Segment pointer, Stack pointer
 CPU
 Fetches instructions from memory
 Reads/writes data from/to memory
 Transfers data from/to I/O devices
 A typical execution cycle:
1. Fetch next instruction from memory (using address contained in PC) and store it in
IR
2. Decode instruction
3. Fetch operands from memory and store in CPU registers
4. Execute instruction
5. Transfer (store) results from CPU registers to memory
 The execution cycle is repeated as long as there are more instructions

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


256 B.Sc. Electrical Eng.
Processor …
 Interrupts checking is usually included in the cycle
 When interrupt occurs, a transfer to an interrupt handling routine
takes place
 Interrupt handling routines are programs that are invoked
 to collect the state of the currently executing program,
 handle the interrupt, and
 restore the state of the program
 The actions of the CPU during an execution cycle are defined by
micro-orders issued by the control unit.
 These micro-orders are individual control signals sent over
dedicated control lines

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


257 B.Sc. Electrical Eng.
Processor …
 Example: To execute an instruction that moves contents of register
X to register Y
 Assume both registers are connected to the data bus, D (recall registers with
parallel load from DLD course!)
 Control unit issues control signal to tell register X to place its contents on D
 After some delay, control unit issues another control signal to tell register Y
to read data from D
 The activation of the control signals is determined using either:
 Hardwired control or
 Microprogramming

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


258 B.Sc. Electrical Eng.
DATAPATH
 CPU can be divided into a data section and a control section
 Data section, also called datapath,
 Contains the registers and ALU
 Is capable of performing certain operations on data items
 Control section (control unit)
 Issues control signals to the datapath
 Internal to the CPU
 Data move from one register to another and between ALU and registers
 These movements are performed via local buses (which may carry data,
instructions, addresses)
 External to CPU,
 Data move from registers to memory and I/O devices
 Via a system bus

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


259 B.Sc. Electrical Eng.
DATAPATH …
 The internal data movement (among registers and between ALU
and registers) may be carried out using different organizations
 One-bus organization
 Two-bus organization
 Three-bus organization
 Dedicated datapath may also be used between components that
transfer data between themselves more frequently.
 E.g. At the beginning of each instruction cycle, fetch is done:
 MAR  PC
 Hence, a dedicated datapath from PC to MAR could be useful
 Speeds up this stage of the instruction cycle

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


260 B.Sc. Electrical Eng.
One-Bus Organization
 Registers and ALU use a single bus to move outgoing and
incoming data
 Since the bus can handle only one data movement within one
clock cycle, two-operand operations will need two cycles to
fetch the operands
 Additional registers may also be needed to buffer data for ALU
 Simplest and least expensive, but least efficient because of
limited amount of data transfer in the same cycle

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


261 B.Sc. Electrical Eng.
One-Bus Organization

Fig. One-bus datapath

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


262 B.Sc. Electrical Eng.
Two-Bus Organization
 Using two buses is a faster organization than one-bus
 General purpose registers are connected to both the buses
 Data can be transferred from two registers to the input of ALU at the same time
 Thus, a two-operand operation can fetch both operands in the same clock cycle
 Additional buffer register may be needed to hold the output of the ALU when the two
buses are busy carrying the two operands
 In some cases, one of the buses may be dedicated for moving data into registers (in-
bus), while the other is dedicated for transferring data out of the registers (out-bus)
 In such cases, the additional buffer register may be used to hold one of the operands
 The ALU output can be connected directly to the in-bus, which will transfer the result
into one of the registers

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


263 B.Sc. Electrical Eng.
Two-Bus Organization

Fig. Two-bus datapath

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


264 B.Sc. Electrical Eng.
Two-Bus Organization: in-bus & out-bus

Fig. Two-bus datapath with in-bus and out-bus

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


265 B.Sc. Electrical Eng.
Three-Bus Organization
 Two buses used as source buses and third bus as destination
 The source buses move data out of registers (out-bus), and
 The destination bus may move data into a register (in-bus)
 Each of the two out-buses is connected to an ALU input
 Output of ALU is connected directly to the in-bus
 More buses, more data move within a single clock cycle
 But, more buses increase the complexity of the hardware

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


266 B.Sc. Electrical Eng.
Three-Bus Organization

Fig. Three-bus datapath

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


267 B.Sc. Electrical Eng.
CPU Instruction Cycle
 As long as there are instructions, they are fetched and executed
 At the end of execution, check is made for occurrence of an interrupt
 Interrupt routine is invoked in case of an interrupt
 Basic actions during fetching an instruction, executing an instruction, or
handling an interrupt are defined by a sequence of micro-operations
 A group of control signals must be enabled in a prescribed sequence to
trigger the execution of a micro-operation
 Lets see the micro-operations for implementing instruction fetch,
execution of simple arithmetic instructions, and interrupt handling.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


268 B.Sc. Electrical Eng.
By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
269 B.Sc. Electrical Eng.
Micro-Operations
Fetch
t1: MAR  (PC) t1: MAR  (PC)
t2: MBR  Memory t2: MBR  Memory
PC  (PC) + I t3: PC  (PC) + I
t3: IR  (MBR) IR  (MBR)
// I = instruction length
 We assume that a clock is available for timing purposes and it emits
regularly spaced clock pulses
 Each clock pulse defines a time unit
 All time units are of equal duration
 Each micro-operation can be performed within the time of a single
time unit (t1, t2, t3, ..)

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


270 B.Sc. Electrical Eng.
Micro-Operations …
Rules for grouping micro-operations:
1. Proper sequence of events must be followed
 MAR  (PC) must precede MBR  Memory
2. Conflicts must be avoided
 Avoid reading from and writing to the same register in one time
unit
 Example: (MBR  Memory) and (IR  MBR)

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


271 B.Sc. Electrical Eng.
Micro-Operations …
Indirect Cycle
 Once fetched, next step is to fetch source operands
 If an indirect addressing is specified, then indirect cycle must
precede the execute cycle
Indirect cycle micro-operations (assume one-address instruction)
t1: MAR  (IR(Address)) //address field in IR to MAR
t2: MBR  Memory // fetch address of operand
t3: IR(Address)  (MBR(Address)) //update address field of IR so //that
it contains a direct address
 IR is now in the same state as if indirect addressing had not been used and
is ready for execute cycle

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


272 B.Sc. Electrical Eng.
Micro-Operations …
Interrupt Cycle
 If test for interrupt at completion of execute cycle determines
occurrence of an interrupt, then interrupt cycle occurs
 This cycle may vary from machine to machine
t1: MBR  (PC) //Return address to MBR
t2: MAR  Save_Address //Address where return address will be saved
PC  Routine_Address //PC is loaded with starting address of routine
t3: Memory  (MBR) //store old PC value to memory

 The processor is now ready to begin the next instruction cycle

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


273 B.Sc. Electrical Eng.
Micro-Operations …
Execute Cycle
 Fetch, Indirect and Interrupt cycles are predictable
 Fixed sequence of micro-operations, same micro-operations repeated
 For execute, it varies with the opcodes
 Example 1: Add instruction
ADD R1, X //Add content of loc. X to register R1
t1: MAR  (IR(Address)) //Address portion of IR to MAR
t2: MBR  Memory //Operand to MBR
t3: R1 (R1) + (MBR)

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


274 B.Sc. Electrical Eng.
Micro-Operations …
Execute Cycle …
 Example 2: ISZ instruction (increment and skip if zero)
ISZ X //Content of X is incremented by 1.
//If result is zero, next instruction is skipped

t1: MAR  (IR(Address)) //Address portion of IR to MAR


t2: MBR  Memory //Operand to MBR
t3: MBR  (MBR) + 1
t4: Memory  (MBR)
If ((MBR) = 0 then PC  (PC) + I)

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


275 B.Sc. Electrical Eng.
Micro-Operations …
E.g. Add R1, R2, R0 //Add R1 and R2 and put result in R0
1. The registers R0 , R1 , R2 , are extracted from the IR.
2. The contents of R1 and R2 are passed to the ALU for addition.
3. The output of the ALU is transferred to R.
 Using one-bus datapath
t0: A  (R1)
t1: B  (R2)
t2: R0  (A) + (B)
 Using the two-bus datapath
t0: A  (R1) + (R2)
t1: R0  (A)

 Using the three-bus datapath


t0: R0  (R1) + (R2)

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


276 B.Sc. Electrical Eng.
The Control Unit
 So Far we have:
 Defined the basic elements of the processor.
 Described the micro-operations that the processor performs.
 Now what?
 Determine the functions that the control unit must perform to cause the micro-operations to
be performed.
 Implementation of control unit can be one of two types
 Hardwired
 The control unit is a state machine circuit.
 Its input logic signals are transformed into a set of output logic signals, which
are the control signals

 Microprogrammed
 Store the control signals associated with the implementation of a certain
instruction as a microprogram

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


277 B.Sc. Electrical Eng.
The Control Unit …
 The control unit is the main component that directs the system operations by sending
control signals to the datapath.
 These signals control the flow of data within the CPU and between the CPU and external
units such as memory and I/O.
 Control buses generally carry signals between the control unit and other computer
components in a clock-driven manner.
 The system clock produces a continuous sequence of pulses in a specified duration and
frequency.
 A sequence of steps t0 , t1 , t2 , . . . , (t0 < t1 < t2 < . . .) is used to execute a certain
instructions.
 Op-code of a fetched instruction decoded to provide the control signal generator with
information about the instruction to be executed.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


278 B.Sc. Electrical Eng.
Control Unit …

Fig. Inputs to control unit


By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
279 B.Sc. Electrical Eng.
Control Unit …
 The key inputs are
 The instruction register
 The clock
 Flags
 Control bus signals
 The instruction register.
 The control unit makes use of the opcode and will perform
different actions (issue a different combination of control signals)
for different instructions.
 To simplify the control unit logic, there should be a unique logic
input for each opcode.
 This function can be performed by a decoder, which takes an
encoded input and produces a single output.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


280 B.Sc. Electrical Eng.
Timing of Control Signals

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


281 B.Sc. Electrical Eng.
The Control Unit …
 Step information generated by a logic circuit module is used
with other inputs to generate control signals.
 The signal generator can be specified simply by a set of Boolean
equations for its output in terms of its inputs.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


282 B.Sc. Electrical Eng.
The Control Unit … Examples
 Add the contents of source registers R1 , R2 , and store the results in
destination register R0.
 Using 3-bus organization
 Step Instruction MOP Control
t0 Inst-x R0  (R1) + (R2) Select R1 as source 1
on out-bus1
Select R2 as source 2
on out-bus2
Select R0 as destination
on in-bus
Select the ALU function
ADD
By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
283 B.Sc. Electrical Eng.
The Control Unit … Examples (1)
• Signals generated to execute Inst-x during time period t0 are shown.
• The AND gate ensures that these signals will be issued when the op-code is decoded
into Inst-x and during time period t0 .
• The signals (R1 out-bus 1), (R2 out-bus2), (R0 in-bus), and (Add) will select R1 as a
source on out-bus1, R2 as a source on outbus2, R0 as destination on in-bus, and
select the ALUs add function

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


284 B.Sc. Electrical Eng.
Microprogrammed Control
 As we have already seen, an instruction is implemented using a set of micro-operations.
 Associated with each micro-operation is a set of control lines that must be activated to
carry out the corresponding micro-operation.
 The idea of mircroprogrammed control is to store the control signals associated with
implementation of a certain instruction as a microprogram in a special memory called a
control memory (CM) or control store, which might be a ROM / RAM
 This memory is inaccessible by the programmer.
 A microprogram consists of a sequence of microinstructions.
 A microinstruction is a vector of bits, where each bit is a control signal, condition code
or the address of the next microinstruction.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


285 B.Sc. Electrical Eng.
Microprogrammed Control …
 Microinstructions are fetched from CM the same way program
instructions are fetched from main memory.
 When an instruction is fetched from memory, the op-code field
of the instruction determines the microprogram to be
executed.
 In other words, the op-code is mapped to a microinstruction
address in the control memory.
 The microinstruction processor uses that address to fetch the
first microinstruction in the microprogram.
 After fetching each microinstruction, the appropriate control
lines will be enabled.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


286 B.Sc. Electrical Eng.
Microprogrammed Control …
 Every control line that corresponds to a “1” bit should be turned on.
 Every control line that corresponds to a “0” bit should be left off.
 After completing the execution of one microinstruction, a new microinstruction will
be fetched and executed.
 If the condition code bits indicate that a branch must be taken, the next
microinstruction is specified in the address bits of the current microinstruction.
 Otherwise, the next microinstruction in the sequence will be fetched and executed.
 A control word is a microinstruction that specifies one or more micro-operations.
 A sequence of microinstructions is called a microprogram, stored in CM which might
be a ROM or RAM.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


287 B.Sc. Electrical Eng.
Microprogrammed Control… Example
 Consider a computer with three-bus datapath, as follows:
 Registers: PC, IR, MAR, and MDR, and 16 general-purpose registers (R0
– R15).
 ALU supported operations (Add, Subtract, Multiply, Divide, AND, OR,
shift-left, and shift-right).
 Consider the Add Operation: adds contents of source registers R1 and R2
and store the results in destination register R0
 The format of the microinstruction should have control bits for the
following:
 ALU Operations
 Registers that output to out-bus1
 Registers that output to out-bus2
 Registers that input from in-bus (destination)
 Other operations that are not shown here.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


288 B.Sc. Electrical Eng.
Microprogrammed Control… Example …

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


289 B.Sc. Electrical Eng.
Microprogrammed Control… Example …
Table: The number of bits needed for ALU, Source 1, Source 2,
and destination

Microinstruction for: Add R1, R2, R0


By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
290 B.Sc. Electrical Eng.
Vertical Vs Horizontal Microinstructions
 Microinstructions can be classified as vertical or horizontal.
 In horizontal, individual bits correspond to individual control lines.
 Horizontal microinstructions are long and allow maximum parallelism since
each bit controls a single control line. (refer to previous example)
 In vertical microinstructions, control lines are coded into specific fields within
the microinstruction.
 Decoders are needed to map a field of k bits to 2k possible combinations of
control lines.
 For example, a 3-bit field in a microinstruction could be used to specify any
one of eight possible lines.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


291 B.Sc. Electrical Eng.
Vertical Vs Horizontal Microinstructions …
 Because of the encoding, vertical microinstructions are much shorter
than horizontal.
 Control lines encoded in the same field cannot be activated
simultaneously.
 Hence, vertical microinstructions allow limited parallelism.
 Example: Consider the same system as the previous example …
 Assume there are 16 general purpose registers and that the ALU
supports 8 functions.
 The tables show the encoding for ALU functions, registers connected to
out-bus 1 (Source 1), registers connected to out-bus 2 (Source 2), &
registers connected to in-bus (Destination).

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


292 B.Sc. Electrical Eng.
Vertical Microinstructions … Example

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


293 B.Sc. Electrical Eng.
By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
294 B.Sc. Electrical Eng.
Microinstruction for Add R1, R2, R0

•As another example, let us find vertical microinstructions used in fetching an


instruction.
•MAR← PC :
•Select PC as source 1 by using “10000” for source 1 field.
• Select MAR as our destination by using “10010” in the destination field.
•Use “0000” for the ALU field, which will be decoded to “NONE”.
•“NONE” means that out-bus1 will be connected to in-bus.
•Set field source 2 to “10000”, which means none of the registers will be
selected.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


295 B.Sc. Electrical Eng.
Vertical Microinstructions … Example …

Microinstruction for MAR← PC


 Memory Read and Write Memory operations can easily be
accommodated by adding 1 bit for read and another for write.
 Fetch Fetching an instruction can be done using the three
microinstructions shown at last figure.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


296 B.Sc. Electrical Eng.
Vertical Microinstructions … Example …

Microinstructions for memory read and write

Microinstructions for fetching an instruction


By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
297 B.Sc. Electrical Eng.
Hardwired Control
 Fixed logic circuits that correspond directly to the Boolean expressions
are used to generate the control signals.
 Clearly hardwired control is faster than microprogrammed control.
 However, hardwired control could be very expensive and complicated
for complex systems.
 Hardwired control is more economical for small control units.
 Microprogrammed control could adapt easily to changes in the system
design.
 We can easily add new instructions without changing hardware.
 Hardwired control will require a redesign of the entire systems in the
case of any change.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


298 B.Sc. Electrical Eng.
Hardwired …
 In hardwired, direct implementation using logic circuits.
 For each control line, one must find the Boolean expression in
terms of inputs to the control signal generator.
 Example: Assume a 3-instructions instruction set for a machine:
 Inst-x, Inst-y, Inst-z
 And A, B, C, D, E, F, G and H are control lines.
 The table below shows the control lines to be activated for the
three instructions at the three steps t0 , t1 , and t2.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


299 B.Sc. Electrical Eng.
Hardwired …
 The Boolean expressions for control lines A, B, and C can be
obtained as follows:
 A = Inst-x . t1 + Inst-z . t1 = (Inst-x + Inst-z) . t1
 B = Inst-x . t0 + Inst-y . t2
 C = Inst-x . t1 + Inst-x . t2 + Inst-y . t2 + Inst-z . t1
 = (Inst-x + Inst-z) . t1 + (Inst-x + Inst-y) . t2
 Logic circuits for control lines A, B, C and the Instruction
execution state digaram are shown in the next slides.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


300 B.Sc. Electrical Eng.
Hardwired …

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


301 B.Sc. Electrical Eng.
Hardwired …

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


302 B.Sc. Electrical Eng.
PIPELINING

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


303 B.Sc. Electrical Eng.
PIPELINING …
 Instruction pipelining is similar to the use of an assembly line in a
manufacturing plant.
 An assembly line takes advantage of the fact that a product goes through
various stages of production.
 By laying the production process out in an assembly line, products at
various stages can be worked on simultaneously.
 This process is also referred to as pipelining, because, as in a pipeline, new
inputs are accepted at one end before previously accepted inputs appear as
outputs at the other end.
 A common analogy used to illustrate pipelining is the Laundry Problem
 Assume we have four loads of dirty laundry that need to be
washed, dried, and folded.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


304 B.Sc. Electrical Eng.
PIPELINING …

No Pipelining:
• Put the first load in the washer for 30 minutes
• Dry it for 40 minutes, and
• Take 20 minutes to fold the clothes.
• Then pick up the second load and wash, dry, and fold, and repeat for the third
and fourth loads.
305 •By: Netsanet
Total Getnet,4Lecturer,
time:
B.Sc. Electrical Eng.
× (30 M.Sc.
+ 40Computer
+20) Eng.,
= 6 hours!
PIPELINING …

Smarter Approach - Pipelining:


• Put the second load in the washer while the first is in dryer
• While the first is being folded, the second will be in the dryer and the third in the washer
• While the second is being folded, the third will be in the dryer and the fourth in the washer
• Total time: 30 + 4 × 40 +20 = Three and half hours!

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


306 B.Sc. Electrical Eng.
INSTRUCTION PIPELINING …
 To apply this concept to instruction execution, we must recognize
that, an instruction has a number of stages.
 The instruction cycle is composed of 10 tasks, which occur in
sequence.
 Clearly, there should be some opportunity for pipelining.
 Consider subdividing instruction processing into two stages:
fetch instruction and execute instruction.
 There are times during the execution of an instruction when main
memory is not being accessed.
 This time could be used to fetch the next instruction in parallel
with the execution of the current.
 The pipeline has two independent stages.
 The first stage fetches an instruction and buffers it.
By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
307 B.Sc. Electrical Eng.
INSTRUCTION PIPELINING …

308
Two-Stage Instruction Pipeline
By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
B.Sc. Electrical Eng.
INSTRUCTION PIPELINING …
 When the second stage is free, the first stage passes the buffered
instruction.
 While the second stage is executing the instruction, the first stage
takes advantage of any unused memory cycles to fetch and buffer
the next instruction.
 This is called instruction prefetch or fetch overlap.
 Note that this approach, which involves instruction buffering,
requires more registers.
 In general, pipelining requires registers to store data between
stages.
 This process will speed up instruction execution.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


309 B.Sc. Electrical Eng.
INSTRUCTION PIPELINING …
 If the fetch and execute stages were of equal duration, the instruction
cycle time would be halved.
 However, if we look more closely (part b of figure), we will see that this
doubling of execution rate is unlikely for two reasons:
1. The execution time will generally be longer than the fetch time.
 Execution will involve reading and storing operands and the performance of
some operation.
 Thus, the fetch stage may have to wait for some time before it can empty its
buffer.
2. A conditional branch instruction makes the address of the next instruction
to be fetched unknown.
 Thus, the fetch stage must wait until it receives the next instruction address
from the execute stage.
 The execute stage may then have to wait while the next instruction is
fetched.
By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
310 B.Sc. Electrical Eng.
INSTRUCTION PIPELINING …
 Guessing can reduce the time loss from the second reason. A
simple rule is the following:
 When a conditional branch instruction is passed on from the fetch to
the execute stage, the fetch stage fetches the next instruction in
memory after the branch instruction.
 Then, if the branch is not taken, no time is lost.
 If the branch is taken, the fetched instruction must be discarded and a
new instruction fetched.
 To gain further speedup, the pipeline must have more stages.
 Let us consider the following decomposition of the instruction
processing.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


311 B.Sc. Electrical Eng.
INSTRUCTION PIPELINING …
 Fetch instruction (FI): Read the next expected instruction into a
buffer.
 Decode instruction (DI): Determine the opcode and the operand
specifiers.
 Calculate operands (CO): Calculate the effective address of each
source operand.
 This may involve displacement, register indirect, indirect, or other forms of
address calculation.
 Fetch operands (FO): Fetch each operand from memory. Operands
in registers need not be fetched.
 Execute instruction (EI): Perform the indicated operation and store
the result, if any, in the specified destination operand location.
 Write operand (WO): Store the result in memory
By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
312 B.Sc. Electrical Eng.
INSTRUCTION PIPELINING …

Figure P0: Sequential Processing (No pipelining)

• Without pipelining, we need 18 time units for execution of 3


instructions

Figure
By: Netsanet Getnet,P1: Pipelining:
Lecturer, we
hereEng.,
M.Sc. Computer need just 8 time units for execution of 3
313 instructions
B.Sc. Electrical Eng.
INSTRUCTION PIPELINING …
 With this decomposition, the various stages will be of more nearly
equal duration.
 For the sake of illustration, let us assume equal duration.
 Figure below shows that a six-stage pipeline can reduce the
execution time for 9 instructions from 54 time units to 14 time
units.
 Assumptions for the figure (figure P2):
1. Each instruction goes through all six stages of the pipeline.
 Not always the case (e.g. a load instruction does not need WO)
2. All stages can be performed in parallel (e.g. no memory conflicts)
 E.g. FI, FO, and WO stages involve memory access. The figure
assumes that all these accesses can occur simultaneously, which
mostGetnet,
By: Netsanet memory
Lecturer,systems doEng.,
M.Sc. Computer not permit.
314 B.Sc. Electrical Eng.
INSTRUCTION PIPELINING …

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


315 Figure
B.Sc. Electrical Eng.P2: Timing Diagram for Instruction Pipeline Operation
INSTRUCTION PIPELINING …
 But, desired value may be in cache or the FO or WO stages may be null.
 Thus, much of the time memory conflicts will not slow down the pipeline.
3. All stages are of equal duration.
 There will be some waiting involved at various stages
 Another problem is conditional branch instruction, which can
invalidate several instruction fetches.
 Same is true for interrupt which is unpredictable.
 Figure P3 below illustrates the effects of the conditional branch (from
instruction 3 to instruction 15).
 Until instruction 3 is executed, there is no way to know which
instruction will come next.
 The pipeline simply loads the next instruction in sequence
(instruction 4) and proceeds.
By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
316 B.Sc. Electrical Eng.
INSTRUCTION PIPELINING …

Figure P3: The Effect of a Conditional Branch on


By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
317 Instruction
B.Sc. Electrical Eng. Pipeline Operation
INSTRUCTION PIPELINING …
 In figure P2, the branch is not taken and we get the full performance
benefit of pipelining.
 In figure P3, the branch is taken, and this not determined until time
unit 7.
 At this point, the pipeline must be cleared of instructions that are not
useful.
 During time unit 8, instruction 15 enters the pipeline
 No instructions complete during time units 9 through 12.
 Performance penalty incurred because we could not anticipate the branch.
 Other problems:
 CO stage may depend on contents of a register that could be altered by a
previous instruction that is still in the pipeline.
 Other such register and memory conflicts could occur.
 The system must contain to account for this type of conflict.
By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
318 B.Sc. Electrical Eng.
Pipelining Performance Measures
1. Speed-up S(n)
 For execution of m tasks (instructions) using n-stages
 Hence, using pipelining, total time required is:

As the number of instructions increases, what happens?

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


319 B.Sc. Electrical Eng.
Pipelining Performance Measures …
2. Throughput U(n): number of tasks executed per unit time

U(n) =

= 1 (assuming t = 1 unit time)

3. Efficiency E(n): Ratio of the actual speed-up to the maximum speed-up

E(n) = /n=

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


320 B.Sc. Electrical Eng.
Pipelining Problems
1. Pipeline Stall
 A situation where one unit (stage) requires more time to
perform its function, thus forcing the other stages to be
idle.
 Example: during cache miss for instruction fetch!
 Figure below shows the effect of instruction 3 incurring a cache
miss
 Assume that a cache miss requires three extra time units
 This delays the other stages
 Normally, recall that 14 time units were required to execute the 9
instructions
 Now, 17 time units are required!
 This is just for a single cache miss!
By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
321 B.Sc. Electrical Eng.
Pipelining Problems …

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


322 B.Sc. Electrical Eng.
Pipelining Problems: Hazards
 Hazard: a situation that prevents the next instruction
from executing in the designated clock cycle
 There are 3 classes of hazards:
1. Structural Hazards:
o Arise from resource conflicts (Caused by limitations in
hardware that don’t allow concurrent execution of different
instructions)
o Where two different instructions (execution overlapped)
want to use the same piece of hardware.
 Examples
 Bus
 Single ALU
 Single Memory for instructions and data
 Single IR
323 By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
Remedy:
B.Sc. Electrical Eng.
to add additional elements to datapath to eliminate hazard
Structural Hazard
 Example: Memory Conflict
Data access

Instruction fetch
• Solution
• Stall one of the instructions until required unit is
available
By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
324 B.Sc. Electrical Eng.
Structural Hazard …

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


325 B.Sc. Electrical Eng.
Pipelining Problems: Hazards …
2. Data Hazards: (data dependency problem)
o Occur when given instruction depends on data from an
instruction ahead of it in pipeline
o When the pipeline changes the order of read/write accesses
to operands;
o So that the order differs from the order seen by sequentially
executing instructions on an un-pipelined processor.
o Comes about as a result of overlapping (or changing the
order of) the execution of data-dependent instructions.
 Example: Consider the following instructions
o I1: ADD R2, R3, R4 //R2 = R3 + R4
o I2: ADD R5, R2, R1 //R5 = R2 + R1
 Instruction I2 has a data dependency on I1 because it uses the result
of I1
By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
326 B.Sc. Electrical Eng.
Pipelining Problems: Data Hazards

• If the instructions were sent to a pipeline in the normal


manner, I2 would be in the FO stage before I1 passed through
the WO stage.
• This would result in using the old contents of R2 for
computing a new value for R5, leading to an invalid result.
• To have a valid result, I2 must not enter the FO stage until I1
has passed through the WO stage.
By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
327 B.Sc. Electrical Eng.
Pipelining Problems: Data Hazards …

Solutions:
1. Delay the FI or FO of I2 by two cycles (special HW required)
2. Create delays by inserting NOP instructions in between (from
By: compiler)
Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
328 B.Sc. Electrical Eng.
Pipelining Problems: Hazards …
3. Control Hazards:
o Result from branch, jump or other instructions that change
flow of program (i.e. change PC)
o Subsequent expression may or may not be executed and what
happens cannot be determined until some operand has been
evaluated
o Example: Already illustrated in figure P3 on a previous slide!

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


329 B.Sc. Electrical Eng.
Operating System Support
(Memory Management)
Operating System Overview

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


331 B.Sc. Electrical Eng.
Objectives and Functions
 Convenience
 Making the computer easier to use
 Efficiency
 Allowing better use of computer resources

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


332 B.Sc. Electrical Eng.
Layers and Views of a Computer
System

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


333 B.Sc. Electrical Eng.
Operating System Services
 Program creation
 Program execution
 Access to I/O devices
 Controlled access to files
 System access
 Error detection and response
 Accounting

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


334 B.Sc. Electrical Eng.
O/S as a Resource Manager

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


335 B.Sc. Electrical Eng.
Types of Operating System
 Interactive
 Batch
 Single program (Uni-programming)
 Multi-programming (Multi-tasking)

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


336 B.Sc. Electrical Eng.
Early Systems
 Late 1940s to mid 1950s
 No Operating System
 Programs interact directly with hardware
 Two main problems:
 Scheduling
 Setup time

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


337 B.Sc. Electrical Eng.
Simple Batch Systems
 Resident Monitor program
 Users submit jobs to operator
 Operator batches jobs
 Monitor controls sequence of events to process batch
 When one job is finished, control returns to Monitor which
reads next job
 Monitor handles scheduling

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


338 B.Sc. Electrical Eng.
Memory Layout for Resident Monitor

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


339 B.Sc. Electrical Eng.
Desirable Hardware Features

 Memory protection
 To protect the Monitor
 Timer
 To prevent a job monopolizing the system
 Privileged instructions
 Only executed by Monitor
 e.g. I/O
 Interrupts
 Allows for relinquishing and regaining control

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


340 B.Sc. Electrical Eng.
Multi-programmed Batch Systems

 I/O devices very slow


 When one program is waiting for I/O, another can use the
CPU

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


341 B.Sc. Electrical Eng.
Single Program

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


342 B.Sc. Electrical Eng.
Multi-Programming with Two Programs

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


343 B.Sc. Electrical Eng.
Multi-Programming with Three Programs

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


344 B.Sc. Electrical Eng.
Utilization

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


345 B.Sc. Electrical Eng.
Time Sharing Systems
 Allow users to interact directly with the computer
 i.e. Interactive
 Multi-programming allows a number of users to interact
with the computer

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


346 B.Sc. Electrical Eng.
Scheduling
 Key to multi-programming
 Long term
 Medium term
 Short term
 I/O

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


347 B.Sc. Electrical Eng.
Long Term Scheduling
 Determines which programs are submitted for processing
 i.e. controls the degree of multi-programming
 Once submitted, a job becomes a process for the short term
scheduler
 (or it becomes a swapped out job for the medium term
scheduler)

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


348 B.Sc. Electrical Eng.
Medium Term Scheduling
 Part of the swapping function (later…)
 Usually based on the need to manage multi-programming
 If no virtual memory, memory management is also an issue

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


349 B.Sc. Electrical Eng.
Short Term Scheduler
 Dispatcher
 Fine grained decisions of which job to execute next
 i.e. which job actually gets to use the processor in the next
time slot

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


350 B.Sc. Electrical Eng.
Process States

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


351 B.Sc. Electrical Eng.
Process Control Block
 Identifier
 State
 Priority
 Program counter
 Memory pointers
 Context data
 I/O status
 Accounting information

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


352 B.Sc. Electrical Eng.
PCB Diagram

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


353 B.Sc. Electrical Eng.
Key Elements of O/S

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


354 B.Sc. Electrical Eng.
Process Scheduling

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


355 B.Sc. Electrical Eng.
Memory Management
 Uni-program
 Memory split into two
 One for Operating System (monitor)
 One for currently executing program
 Multi-program
 “User” part is sub-divided and shared among active processes

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


356 B.Sc. Electrical Eng.
Swapping
 Problem: I/O is so slow compared with CPU that even in
multi-programming system, CPU can be idle most of the
time
 Solutions:
 Increase main memory
 Expensive
 Leads to larger programs
 Swapping

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


357 B.Sc. Electrical Eng.
What is Swapping?
 Long term queue of processes stored on disk
 Processes “swapped” in as space becomes available
 As a process completes it is moved out of main memory
 If none of the processes in memory are ready (i.e. all I/O
blocked)
 Swap out a blocked process to intermediate queue
 Swap in a ready process or a new process
 But swapping is an I/O process...

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


358 B.Sc. Electrical Eng.
Partitioning
 Splitting memory into sections to allocate to processes
(including Operating System)
 Fixed-sized partitions
 May not be equal size
 Process is fitted into smallest hole that will take it (best fit)
 Some wasted memory
 Leads to variable sized partitions

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


359 B.Sc. Electrical Eng.
Fixed
Partitioning

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


360 B.Sc. Electrical Eng.
Variable Sized Partitions (1)
 Allocate exactly the required memory to a process
 This leads to a hole at the end of memory, too small to use
 Only one small hole - less waste
 When all processes are blocked, swap out a process and bring
in another
 New process may be smaller than swapped out process
 Another hole

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


361 B.Sc. Electrical Eng.
Variable Sized Partitions (2)
 Eventually have lots of holes (fragmentation)
 Solutions:
 Coalesce - Join adjacent holes into one large hole
 Compaction - From time to time go through memory and move
all hole into one free block (c.f. disk de-fragmentation)

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


362 B.Sc. Electrical Eng.
Effect of Dynamic Partitioning (1)

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


363 B.Sc. Electrical Eng.
Effect of Dynamic Partitioning (2)

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


364 B.Sc. Electrical Eng.
Relocation
 No guarantee that process will load into the same place in
memory
 Instructions contain addresses
 Locations of data
 Addresses for instructions (branching)
 Logical address - relative to beginning of program
 Physical address - actual location in memory (this time)
 Automatic conversion using base address

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


365 B.Sc. Electrical Eng.
Paging
 Split memory into equal sized, small chunks -page frames
 Split programs (processes) into equal sized small chunks -
pages
 Allocate the required number page frames to a process
 Operating System maintains list of free frames
 A process does not require contiguous page frames
 Use page table to keep track

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


366 B.Sc. Electrical Eng.
Logical and Physical Addresses - Paging

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


367 B.Sc. Electrical Eng.
Virtual Memory
 Demand paging
 Do not require all pages of a process in memory
 Bring in pages as required
 Page fault
 Required page is not in memory
 Operating System must swap in required page
 May need to swap out a page to make space
 Select page to throw out based on recent history

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


368 B.Sc. Electrical Eng.
Thrashing
 Too many processes in too little memory
 Operating System spends all its time swapping
 Little or no real work is done
 Disk light is on all the time

 Solutions
 Good page replacement algorithms
 Reduce number of processes running
 Fit more memory

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


369 B.Sc. Electrical Eng.
Bonus
 We do not need all of a process in memory for it to run
 We can swap in pages as required
 So - we can now run processes that are bigger than total
memory available!

 Main memory is called real memory


 User/programmer sees much bigger memory - virtual
memory

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


370 B.Sc. Electrical Eng.
Page Table Structure

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


371 B.Sc. Electrical Eng.
Translation Lookaside Buffer
 Every virtual memory reference causes two physical memory
access
 Fetch page table entry
 Fetch data
 Use special cache for page table
 TLB

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


372 B.Sc. Electrical Eng.
TLB Operation

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


373 B.Sc. Electrical Eng.
TLB and Cache Operation

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


374 B.Sc. Electrical Eng.
Segmentation
 Paging is not (usually) visible to the programmer
 Segmentation is visible to the programmer
 Usually different segments allocated to program and data
 May be a number of program and data segments

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


375 B.Sc. Electrical Eng.
Advantages of Segmentation
 Simplifies handling of growing data structures
 Allows programs to be altered and recompiled
independently, without re-linking and re-loading
 Lends itself to sharing among processes
 Lends itself to protection
 Some systems combine segmentation with paging

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


376 B.Sc. Electrical Eng.
PARALLEL PROCESSING

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


377 B.Sc. Electrical Eng.
Multiple Processor Organization
 Single instruction, single data stream - SISD
 Single instruction, multiple data stream - SIMD
 Multiple instruction, single data stream - MISD
 Multiple instruction, multiple data stream- MIMD

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


378 B.Sc. Electrical Eng.
Single Instruction, Single Data Stream -
SISD
 Single processor
 Single instruction stream
 Data stored in single memory
 Uni-processor

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


379 B.Sc. Electrical Eng.
Single Instruction, Multiple Data
Stream - SIMD
 Single machine instruction
 Controls simultaneous execution
 Number of processing elements
 Each processing element has associated data memory
 Each instruction executed on different set of data by different
processors
 Vector and array processors

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


380 B.Sc. Electrical Eng.
Multiple Instruction, Single Data
Stream - MISD
 Sequence of data
 Transmitted to set of processors
 Each processor executes different instruction sequence
 Never been implemented

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


381 B.Sc. Electrical Eng.
Multiple Instruction, Multiple Data
Stream- MIMD
 Set of processors
 Simultaneously execute different instruction sequences
 Different sets of data
 SMPs, clusters and NUMA systems

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


382 B.Sc. Electrical Eng.
Taxonomy of Parallel Processor
Architectures

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


383 B.Sc. Electrical Eng.
MIMD - Overview
 General purpose processors
 Each can process all instructions necessary
 Further classified by method of processor communication

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


384 B.Sc. Electrical Eng.
Tightly Coupled - SMP
 Processors share memory
 Communicate via that shared memory
 Symmetric Multiprocessor (SMP)
 Share single memory or pool
 Shared bus to access memory
 Memory access time to given area of memory is approximately the
same for each processor

Tightly Coupled - NUMA


 Nonuniform memory access
 Access times to different regions of memroy may differ
By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
385 B.Sc. Electrical Eng.
Loosely Coupled - Clusters
 Collection of independent uniprocessors or SMPs
 Interconnected to form a cluster
 Communication via fixed path or network connections

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


386 B.Sc. Electrical Eng.
Parallel Organizations - SISD

Parallel Organizations - SIMD

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


387 B.Sc. Electrical Eng.
Parallel Organizations - MIMD Shared Memory

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


388 B.Sc. Electrical Eng.
Parallel Organizations - MIMD
Distributed Memory

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


389 B.Sc. Electrical Eng.
Symmetric Multiprocessors
 A stand alone computer with the following characteristics
 Two or more similar processors of comparable capacity
 Processors share same memory and I/O
 Processors are connected by a bus or other internal connection
 Memory access time is approximately the same for each
processor
 All processors share access to I/O
 Either through same channels or different channels giving paths
to same devices
 All processors can perform the same functions (hence
symmetric)
 System controlled by integrated operating system
 providing interaction between processors

390
By: Interaction
Netsanet at job,
Getnet, Lecturer, task, file
M.Sc. Computer Eng.,and data element levels
B.Sc. Electrical Eng.
Multiprogramming and Multiprocessing

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


391 B.Sc. Electrical Eng.
SMP Advantages
 Performance
 If some work can be done in parallel
 Availability
 Since all processors can perform the same functions, failure of a
single processor does not halt the system
 Incremental growth
 User can enhance performance by adding additional processors
 Scaling
 Vendors can offer range of products based on number of processors

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


392 B.Sc. Electrical Eng.
Block Diagram of Tightly Coupled Multiprocessor

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


393 B.Sc. Electrical Eng.
Organization Classification
 Time shared or common bus
 Multiport memory
 Central control unit

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


394 B.Sc. Electrical Eng.
Time Shared Bus
 Simplest form
 Structure and interface similar to single processor system
 Following features provided
 Addressing - distinguish modules on bus
 Arbitration - any module can be temporary master
 Time sharing - if one module has the bus, others must wait and may
have to suspend
 Now have multiple processors as well as multiple I/O modules

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


395 B.Sc. Electrical Eng.
Symmetric Multiprocessor Organization

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


396 B.Sc. Electrical Eng.
Time Share Bus - Advantages
 Simplicity
 Flexibility
 Reliability

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


397 B.Sc. Electrical Eng.
Time Share Bus - Disadvantage
 Performance limited by bus cycle time
 Each processor should have local cache
 Reduce number of bus accesses
 Leads to problems with cache coherence
 Solved in hardware

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


398 B.Sc. Electrical Eng.
MULTICORE COMPUTERS

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


399 B.Sc. Electrical Eng.
Multicore Computers
 A multicore computer, also known as a chip
multiprocessor, combines two or more processors (called
cores) on a single piece of silicon (called a die).
 Typically, each core consists of all of the components of an
independent processor such as:
 Registers
 ALU
 Pipeline hardware, and
 control unit,
 plus L1 instruction and data caches.
 In addition to the multiple cores, contemporary multicore chips
also include L2 cache and, increasingly, L3 cache.
By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
400 B.Sc. Electrical Eng.
Hardware Performance Issues
 Increase in Parallelism and Complexity:
 The organizational changes in processor design have primarily been
focused on increasing instruction-level parallelism.
 So that more work could be done in each clock cycle.
 These changes include (in chronological order):
 Pipelining: multiple instructions executing in various stages of the
pipeline. There is limit to number of stages though.
 Superscalar: Multiple pipelines are constructed by replicating
execution resources. This enables parallel execution of instructions in
parallel pipelines. Again, there is a limit as to the number of pipelines
 Simultaneous Multithreading (SMT): Register banks are replicated so
that multiple threads can share the use of pipeline resources.
 Complexity of managing multiple threads over a set of pipelines limits the number
of threads
By: Netsanet Getnet,and number
Lecturer, M.Sc.of pipelines
Computer Eng.,that can be effectively utilized.
401 B.Sc. Electrical Eng.
Hardware Performance …
 Power Consumption
 Power requirements have grown exponentially as chip density and
clock frequency have risen.
 One way to control power density is to use more of the chip area
for cache memory.
 Memory transistors are smaller and have a power density an order
of magnitude lower than that of logic.
 As chip transistor density has increased, the percentage of chip
area devoted to memory has grown (now well over half the chip
area).
 How to use the logic transistors is a key design issue.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


402 B.Sc. Electrical Eng.
Hardware Performance …

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


403 B.Sc. Electrical Eng.
Multicore Organization
 Main variables in multicore organization:
 The number of core processors on the chip
 The number of levels of cache memory
 Amount of cache memory that is shared
 The following are multicore organization alternatives.
 The first one is an organization where there is only L1 as on-chip
cache.
 Each core has its own dedicated L1 cache.
 Which is divided into instruction (L1-I) and data (L1-D) caches.
 Found on earlier multicore computer chips and is still seen in
embedded chips. (example: ARM11 MPCore.)
 The second one is also one where there is no on-chip sharing of
cache.
 Additional on-chip cache (L2). (Example: AMD Opteron).
By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
404 B.Sc. Electrical Eng.
Multicore Organization …

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


405 B.Sc. Electrical Eng.
Multicore Organization …

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


406 B.Sc. Electrical Eng.
Multicore Organization …
 The third one shows a similar organization but here with a
shared L2 cache. (Example: Intel Core Dou.)
 The last one shows dedicated L1 and L2 caches for each core
processor while a separate, shared cache (L3) is used. (Example:
Intel Core i7).
 Advantage of using a shared L2 cache over dedicated caches:
1. Constructive interference can reduce overall miss rates.
 That is, access to a memory location by a thread on one core
brings the block containing the referenced location into the share
cache.
 If a thread on another core requires the same memory block, those
locations will already be available in the shared on-chip cache.
By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
407 B.Sc. Electrical Eng.
Multicore Organization …
2. Data shared by multiple cores is not replicated at shared cache
level.
3. The amount of shared cache allocated to each core is dynamic,
thereby threads that have less locality can employ more cache.
4. Interprocess communication is easy to implement, via shared
memory locations.
5. Use of shared L2 cache confines the cache coherence problem to
the L1 cache level.
 An advantage for dedicated L2 cache: Each core accesses its
private l2 cache faster
 Advantageous for threads that exhibit strong locality.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


408 B.Sc. Electrical Eng.
RISC and CISC

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


409 B.Sc. Electrical Eng.
What is CISC?
 CISC is an acronym for Complex Instruction Set Computer and are
chips that are easy to program and which make efficient use of memory.
Since the earliest machines were programmed in assembly language and
memory was slow and expensive, the CISC philosophy made sense, and
was commonly implemented in such large computers as the PDP-11
and the DECsystem 10 and 20 machines.
 Most common microprocessor designs such as the Intel 80x86 and
Motorola 68K series followed the CISC philosophy.
 But recent changes in software and hardware technology have forced a
re-examination of CISC and many modern CISC processors are hybrids,
implementing many RISC principles.
 CISC was developed to make compiler development simpler. It shifts
most of the burden of generating machine instructions to the processor.
For example, instead of having to make a compiler write long machine
instructions to calculate a square-root, a CISC processor would have a
410 built-in ability to do this.
By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
B.Sc. Electrical Eng.
CISC Attributes
 The design constraints that led to the development of CISC (small
amounts of slow memory and fact that most early machines were
programmed in assembly language) give CISC instructions sets some
common characteristics:
 A 2-operand format, where instructions have a source and a
destination. Register to register, register to memory, and memory to
register commands. Multiple addressing modes for memory,
including specialized modes for indexing through arrays
 Variable length instructions where the length often varies according
to the addressing mode
 Instructions which require multiple clock cycles to execute.

 E.g. Pentium is considered a modern CISC processor

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


411 B.Sc. Electrical Eng.
 Most CISC hardware architectures have several
characteristics in common:
 Complex instruction-decoding logic, driven by the need for a single
instruction to support multiple addressing modes.
 A small number of general purpose registers. This is the direct result
of having instructions which can operate directly on memory and the
limited amount of chip space not dedicated to instruction decoding,
execution, and microcode storage.
 Several special purpose registers. Many CTSC designs set aside
special registers for the stack pointer, interrupt handling, and so on.
This can simplify the hardware design somewhat, at the expense of
making the instruction set more complex.
 A 'Condition code" register which is set as a side-effect of most
instructions. This register reflects whether the result of the last
operation is less than, equal to, or greater than zero and records if
certain error conditions occur.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


412 B.Sc. Electrical Eng.
 At the time of their initial development, CISC machines used available
technologies to optimize computer performance.
 Microprogramning is as easy as assembly language to implement, and
much less expensive than hardwiring a control unit.
 The ease of microcoding new instructions allowed designers to make
CISC machines upwardly compatible: a new computer could run the
same programs as earlier computers because the new computer would
contain a superset of the instructions of the earlier computers.
 As each instruction became more capable, fewer instructions could be
used to implement a given task. This made more efficient use of the
relatively slow main memory.
 Because microprogram instruction sets can be written to match the
constructs of high-level languages, the compiler does not have to be as
complicated.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


413 B.Sc. Electrical Eng.
CISC Disadvantages
Designers soon realized that the CISC philosophy had its own problems,
including:
 Earlier generations of a processor family generally were contained as a
subset in every new version - so instruction set & chip hardware
become more complex with each generation of computers.
 So that as many instructions as possible could be stored in memory
with the least possible wasted space, individual instructions could be of
almost any length - this means that different instructions will take
different amounts of clock time to execute, slowing down the overall
performance of the machine.
 Many specialized instructions aren't used frequently enough to justify
their existence -approximately 20% of the available instructions are
used in a typical program.
 CISC instructions typically set the condition codes as a side effect of
the instruction. Not only does setting the condition codes take time,
but programmers have to remember to examine the condition code
bits before a subsequent instruction changes them.
By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
414 B.Sc. Electrical Eng.
What is RISC?
 RISC?
RISC, or Reduced Instruction Set Computer. is a type of microprocessor
architecture that utilizes a small, highly-optimized set of instructions,
rather than a more specialized set of instructions often found in other
types of architectures.
 History
The first RISC projects came from IBM, Stanford, and UC-Berkeley
in the late 70s and early 80s. The IBM 801, Stanford MIPS, and
Berkeley RISC 1 and 2 were all designed with a similar philosophy
which has become known as RISC. Certain design features have been
characteristic of most RISC processors:
 one cycle execution time: RISC processors have a CPI (clock per instruction) of one
cycle. This is due to the optimization of each instruction on the CPU and a
technique called PIPELINING
 pipelining: a technique that allows for simultaneous execution of parts, or stages,
of instructions to more efficiently process instructions;
 large number of registers: the RISC design philosophy generally incorporates a
larger number of registers to prevent in large amounts of interactions with
By: memory
Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
415 B.Sc. Electrical Eng.
RISC Attributes
The main characteristics of CISC microprocessors are:
 Extensive instructions.
 Complex and efficient machine instructions.
 Microencoding of the machine instructions.
 Extensive addressing capabilities for memory operations.
 Relatively few registers.
In comparison, RISC processors are more or less the opposite of the
above:
 Reduced instruction set.
 Less complex, simple instructions.
 Hardwired control unit and machine instructions.
 Few addressing schemes for memory operands with only two basic
instructions, LOAD and
 STORE
 Many symmetric registers which are organized into a register file.
By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
416 B.Sc. Electrical Eng.
Pipelining
RISC Pipelines
A RISC processor pipeline operates in much the same way, although
the stages in the pipeline are different. While different processors have
different numbers of steps, they are basically variations of these five,
used in the MIPS R3000 processor:
- fetch instructions from memory
- read registers and decode the instruction
- execute the instruction or calculate an address
- access an operand in data memory
- write the result into a register

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


417 B.Sc. Electrical Eng.
RISC Disadvantages
 There is still considerable controversy among experts about the
ultimate value of RISC architectures. Its proponents argue that RISC
machines are both cheaper and faster, and are therefore the machines
of the future.
 However, by making the hardware simpler, RISC architectures put a
greater burden on the software. Is this worth the trouble because
conventional microprocessors are becoming increasingly fast and
cheap anyway?

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


418 B.Sc. Electrical Eng.
CISC versus RISC
CISC RISC

Emphasis on hardware Emphasis on software


Includes multi-clock Single-clock,
complex instructions reduced instruction only
Memory-to-memory: Register to register:
"LOAD" and "STORE" "LOAD" and "STORE"
incorporated in are independent
instructions instructions
Small code sizes, Low cycles per second,
high cycles per second large code sizes
Transistors used for storing Spends more transistors
complex instructions on memory registers
By: Netsanet Getnet, Lecturer, M.Sc.
Computer Eng., B.Sc. Electrical Eng. 419
Summary
 As memory speed increased, and high-level languages displaced
assembly language, the major reasons for CISC began to disappear, and
computer designers began to look at ways computer performance could
be optimized beyond just making faster hardware.
 One of their key realizations was that a sequence of simple instructions
produces the same results as a sequence of complex instructions, but
can be implemented with a simpler (and faster) hardware design.
(Assuming that memory can keep up.) RISC (Reduced Instruction Set
Computers) processors were the result.
 CISC and RISC implementations are becoming more and more alike.
Many of today’s RISC chips support as many instructions as yesterday's
CISC chips. And today's CISC chips use many techniques formerly
associated with RISC chips.
 To some extent, the argument is becoming moot because CISC and
RISC implementations are becoming more and more alike. Many of
today's RISC chips support as many instructions as yesterday's CISC
chips. And today's CISC chips use many techniques formerly associated
By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,
420 with RISC chips.
B.Sc. Electrical Eng.
Modern Day Advancement
 CISC and RISC Convergence
State of the art processor technology has changed significantly since
RISC chips were first introduced in the early '80s.
 Because a number of advancements are used by both RISC and CISC
processors, the lines between the two architectures have begun to
blur.
 In fact, the two architectures almost seem to have adopted the
strategies of the other.
 Because processor speeds have increased, CISC chips are now able to
execute more than one instruction within a single clock.
 This also allows CISC chips to make use of pipelining.
 With other technological improvements, it is now possible to fit
many more transistors on a single chip.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


421 B.Sc. Electrical Eng.
 This gives RISC processors enough space to incorporate more
complicated, CISC-like commands. RISC chips also make use of more
complicated hardware, making use of extra function units for
superscalar execution.
 All of these factors have led some groups to argue that we are now in a
"post-RISC" era, in which the two styles have become so similar that
distinguishing between them is no longer relevant.
 However, it should be noted that RISC chips still retain some
important traits.
 RISC chips strictly utilize uniform, single-cycle instructions.
 They also retain the register-to-register, load/store architecture.
 And despite their extended instruction sets, RISC chips still have a
large number of general purpose registers.

By: Netsanet Getnet, Lecturer, M.Sc. Computer Eng.,


422 B.Sc. Electrical Eng.

You might also like