Architecture CA

The document discusses various addressing modes used in computer architecture, including immediate, direct, indirect, register, register indirect, displacement, and stack addressing, along with their definitions and examples. It also covers instruction formats, arithmetic operations, integer representation, and the classification of computer architectures based on instruction and data streams, such as SISD, SIMD, MISD, and MIMD. Additionally, it explains pipelining techniques, their advantages and disadvantages, and the challenges faced due to hazards in pipelining.

Addressing Modes and Instruction Format
Addressing Modes
The most common addressing modes are
1. Immediate
2. Direct
3. Indirect
4. Register
5. Register Indirect
6. Displacement
7. Stack
Addressing Mode
• All computer architectures provide more than one of
these addressing modes
• The question arises as to how the control unit can
determine which addressing mode is being used in a
particular instruction
• Several approaches are used. Often, different
opcodes will use different addressing modes
• Also, one or more bits in the instruction format can
be used as a mode field
• The value of the mode field determines which
addressing mode is to be used
In order to explain the addressing modes, we use the following notation:

➢ A = contents of an address field in the instruction that refers to a memory address
➢ R = contents of an address field in the instruction that refers to a register
➢ EA = actual (effective) address of the location containing the referenced operand
➢ (X) = contents of location X

1. Immediate Addressing
The simplest form of addressing is immediate addressing in
which the operand value is present in the instruction.
Operand = A
2. Direct Addressing
A very simple form of addressing is direct addressing, in which the
address field contains the effective address of the operand.
EA = A
3. Indirect Addressing
With direct addressing, the length of the address field is
usually less than the word length, thus limiting the address
range. One solution is to have the address field refer to the
address of a word in memory, which in turn contains a
full-length address of the operand. This is known as indirect
addressing.
EA = (A)
4. Register Addressing
It is similar to direct addressing. The only difference is
that the address field refers to a register rather than a main
memory address.
EA = R
5. Register Indirect Addressing
Just as Register addressing is analogous to Direct addressing,
Register Indirect addressing is analogous to indirect
addressing. In both cases, the only difference is whether the
address field refers to a memory location or a register. Thus,
for register indirect addressing:
EA = (R)
6. Displacement Addressing
A very powerful mode of addressing combines the
capabilities of direct addressing and register indirect
addressing. It is known by a variety of names depending on
the context of its use, but the basic mechanism is the same.
EA = A + (R)
7. Stack Addressing
A Stack is a linear array of locations.
It is sometimes referred to as a pushdown list or
Last-in-first-out queue. The stack is a reserved
block of locations.
The Stack pointer is maintained in a register. Thus,
references to stack locations in memory are in fact
register indirect addresses.
The stack mode of addressing is a form of implied
addressing.
The machine instructions need not include a
memory reference but implicitly operate on the
top of the stack.
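
The mode equations above can be made concrete in code. Below is a minimal
Python sketch (ours, not from the slides; the toy "memory" and "registers"
arrays are invented for illustration) that resolves an effective address
under each mode:

    memory    = [0] * 32           # toy main memory
    registers = [0] * 8            # toy register file

    def effective_address(mode, A=None, R=None):
        if mode == "immediate":           # operand is inside the instruction
            return None                   # no EA; Operand = A
        if mode == "direct":              # EA = A
            return A
        if mode == "indirect":            # EA = (A)
            return memory[A]
        if mode == "register":            # EA = R (names a register, not memory)
            return ("reg", R)
        if mode == "register_indirect":   # EA = (R)
            return registers[R]
        if mode == "displacement":        # EA = A + (R)
            return A + registers[R]
        raise ValueError(mode)

Stack addressing needs no EA computation of its own: the operands are
implicitly at the top of the stack, reached through the stack pointer
register (register indirect addressing in disguise).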
Instruction Format
• An instruction format defines the layout of the bits
of an instruction in terms of its constituent parts
• An instruction format must include an opcode and
implicitly or explicitly zero or more operands
• Each explicit operand is referenced using one of
the addressing modes that is available for that
machine
• The format must implicitly or explicitly indicate the
addressing mode of each operand
• For most instruction sets, more than one
instruction format is used
Instruction Format
• Four common instruction formats are shown in
the figure
Instruction Length
• Three-address instruction formats are not common
because they require a relatively long instruction
format to hold the three address references.

• With two-address instructions, for binary operations
one address must do double duty as both an operand
and a result.

• In a one-address instruction format, a second
address must be implicit for a binary operation.
For the implicit reference a processor register is
used; it is termed the accumulator (AC).
Consider a simple arithmetic expression to evaluate:
Y = (A + B) / (C * D)
[Slide fragment: of the accompanying instruction sequence, only the line
"MUL R3, K" survived extraction.]
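
To illustrate how the number of addresses changes the program, here is a
hedged Python sketch (the instruction sequences and the temporary names T1,
T2, T are invented for illustration, not the lost slide content) contrasting
a three-address and a one-address (accumulator) encoding of the expression:

    # Three-address form: each instruction names a result and two operands.
    three_address = [
        ("ADD", "T1", "A", "B"),     # T1 <- A + B
        ("MUL", "T2", "C", "D"),     # T2 <- C * D
        ("DIV", "Y",  "T1", "T2"),   # Y  <- T1 / T2
    ]

    # One-address form: the accumulator (AC) is implicit in every instruction.
    one_address = [
        ("LOAD", "C"), ("MUL", "D"), ("STORE", "T"),
        ("LOAD", "A"), ("ADD", "B"), ("DIV", "T"), ("STORE", "Y"),
    ]

    def run_one_address(program, memory):
        ac = 0                                  # the implicit accumulator
        for op, addr in program:
            if op == "LOAD":    ac = memory[addr]
            elif op == "ADD":   ac += memory[addr]
            elif op == "MUL":   ac *= memory[addr]
            elif op == "DIV":   ac /= memory[addr]
            elif op == "STORE": memory[addr] = ac
        return memory

    mem = {"A": 4, "B": 2, "C": 3, "D": 1}
    print(run_one_address(one_address, mem)["Y"])   # (4+2)/(3*1) = 2.0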
Arithmetic Operations
• ALU: Arithmetic and Logic Unit
• The ALU is the part of the computer that actually
performs arithmetic and logical operations on data.
• All of the other elements of the computer system
(control unit, registers, memory, I/O) mainly bring
data into the ALU for processing and then take the
results back out.
Integer Representation
• Computers understand binary language
• Numbers are represented using the digits ‘0’ and ‘1’
Sign-Magnitude Representation
• Treat the most significant (leftmost) bit in the
word as a sign.
• If the sign bit is ‘0’, the number is positive; if the
sign bit is ‘1’, the number is negative.
• In an n-bit word, the rightmost n-1 bits hold the
magnitude of the integer.
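
A small sketch of this encoding in Python (the helper name is ours):

    def sign_magnitude(value, n=8):
        sign = 1 if value < 0 else 0
        magnitude = abs(value)
        assert magnitude < 2 ** (n - 1), "magnitude must fit in n-1 bits"
        return (sign << (n - 1)) | magnitude    # sign bit, then magnitude

    print(format(sign_magnitude(+18), "08b"))   # 00010010
    print(format(sign_magnitude(-18), "08b"))   # 10010010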
Addition
• Addition in two’s complement is shown in the figure.
• The first four examples illustrate successful
operations. If the result of the operation is positive,
we get a positive number in two’s complement form,
which is the same as in unsigned-integer form.
• OVERFLOW RULE: If two numbers are added, and they
are both positive or both negative, then overflow
occurs if and only if the result has the opposite sign.
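
A minimal sketch of n-bit two’s-complement addition with the overflow rule
above (function names are ours):

    def add_twos_complement(x, y, n=8):
        mask = (1 << n) - 1
        result = (x + y) & mask                  # keep only the low n bits
        sign = lambda v: (v >> (n - 1)) & 1      # extract the sign bit
        # overflow iff both operands share a sign and the result differs
        overflow = sign(x) == sign(y) != sign(result)
        return result, overflow

    # 01111111 (+127) + 00000001 (+1) overflows in 8 bits:
    print(add_twos_complement(0b01111111, 0b00000001))   # (128, True)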
Subtraction
• To subtract one number (subtrahend) from another
(minuend), take the two’s complement (negation)
of the subtrahend and add it to the minuend.
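
A sketch of exactly that rule, negate then add, for n-bit words:

    def negate(v, n=8):
        return ((~v) + 1) & ((1 << n) - 1)       # invert all bits, add 1

    def subtract(minuend, subtrahend, n=8):
        return (minuend + negate(subtrahend, n)) & ((1 << n) - 1)

    print(subtract(7, 5))                   # 2
    print(format(subtract(5, 7), "08b"))    # 11111110, i.e. -2 in two's complement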
Multiplication
• Multiplication involves the generation of partial
products, one for each digit in the multiplier. These
partial products are then summed to produce the
final product.
• The partial products are easily defined. When the
multiplier bit is 0, the partial product is 0. When the
multiplier bit is 1, the partial product is the multiplicand.
• The total product is produced by summing the partial
products. For this operation, each successive partial
product is shifted one position to the left relative to
the preceding partial product.
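
A sketch of this unsigned shift-and-add scheme: one partial product per
multiplier bit, each shifted one position left relative to the previous one:

    def shift_add_multiply(multiplicand, multiplier):
        product = 0
        shift = 0
        while multiplier:
            if multiplier & 1:                    # multiplier bit is 1:
                product += multiplicand << shift  # partial product = multiplicand
            multiplier >>= 1                      # examine the next bit
            shift += 1                            # next partial product shifts left
        return product

    print(shift_add_multiply(11, 13))   # 143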
Multiplication of negative numbers
• If the multiplier or multiplicand is negative, we
cannot multiply directly using the simple
shift-and-add method.
• One approach: first convert the negative numbers
into positive ones using the two’s complement,
multiply, and then fix the sign of the product.
• A better approach: Booth’s Algorithm.
Booth’s Algorithm
• The right shift is such that the leftmost bit of A,
namely A(n-1), not only is shifted into A(n-2) but
also remains in A(n-1). This is required to preserve
the sign of the number in A and Q. It is known as an
arithmetic shift, because it preserves the sign bit.
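
A sketch of Booth’s algorithm for n-bit two’s-complement multiplication
(register names follow the description above: A accumulates the high half of
the product, Q holds the multiplier, q1 is the extra Q(-1) bit):

    def booth_multiply(multiplicand, multiplier, n=8):
        mask = (1 << n) - 1
        M, Q, A, q1 = multiplicand & mask, multiplier & mask, 0, 0
        for _ in range(n):
            pair = (Q & 1, q1)
            if pair == (1, 0):
                A = (A - M) & mask       # bits 10: subtract multiplicand
            elif pair == (0, 1):
                A = (A + M) & mask       # bits 01: add multiplicand
            # arithmetic right shift of the combined A, Q, q1
            q1 = Q & 1
            Q = ((Q >> 1) | ((A & 1) << (n - 1))) & mask
            A = (A >> 1) | (A & (1 << (n - 1)))    # sign bit of A is preserved
        product = (A << n) | Q                     # 2n-bit two's-complement result
        if product & (1 << (2 * n - 1)):
            product -= 1 << (2 * n)                # reinterpret as signed
        return product

    print(booth_multiply(7, -3))   # -21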
Division
• Division is somewhat more complex than
multiplication but based on the same general
principles.
• The algorithm involves repetitive Shifting and
Addition or Subtraction.
Division algorithm for unsigned
• The divisor is placed in the M register,
the dividend in the Q register.
• At each step, the A and Q registers
together are shifted to the left 1 bit.
• M is subtracted from A to determine whether A
divides the partial remainder. If it does, then
Q0 gets a 1 bit.
• Otherwise Q0 gets a 0 bit and M must be added
back to A to restore the previous value.
• The count is then decremented,
and the process continues.
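
A sketch of this restoring division algorithm for unsigned n-bit integers,
with the register roles as above (divisor in M, dividend in Q, partial
remainder in A):

    def restoring_divide(dividend, divisor, n=8):
        A, Q, M = 0, dividend, divisor
        for _ in range(n):
            # shift A,Q left one bit together
            A = (A << 1) | ((Q >> (n - 1)) & 1)
            Q = (Q << 1) & ((1 << n) - 1)
            A -= M                      # trial subtraction
            if A < 0:
                A += M                  # restore: subtraction failed, Q0 stays 0
            else:
                Q |= 1                  # Q0 gets a 1 bit
        return Q, A                     # quotient, remainder

    print(restoring_divide(13, 4))      # (3, 1)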
Floating Point Representation
• With a fixed-point notation it is possible to represent a range
of positive and negative integers centred on or near 0. By
assuming a fixed binary or radix point, this format allows the
representation of numbers with a fractional component as well.
• For very large and very small numbers we use an exponent notation.
• Such a number can be stored in a binary word with three fields:
  • Sign: plus or minus
  • Significand S
  • Exponent E
IEEE Standard for Binary Floating-Point Representation
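
A sketch (using only Python’s standard library) of pulling the three IEEE 754
single-precision fields out of a float; the helper name is ours:

    import struct

    def ieee754_fields(x):
        bits = struct.unpack(">I", struct.pack(">f", x))[0]
        sign        = bits >> 31              # 1 bit
        exponent    = (bits >> 23) & 0xFF     # 8 bits, biased by 127
        significand = bits & 0x7FFFFF         # 23 bits (hidden leading 1 omitted)
        return sign, exponent, significand

    print(ieee754_fields(-6.5))   # (1, 129, 5242880) -> -1.625 x 2^(129-127)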
PROCESSORS AND CONTROL UNIT
Flynn’s Classification of Computer Architectures
Flynn’s Classification
Based on notions of instruction and data streams (1972):
➢ SISD (Single Instruction stream over a Single Data stream)
➢ SIMD (Single Instruction stream over Multiple Data streams)
➢ MISD (Multiple Instruction streams over a Single Data stream)
➢ MIMD (Multiple Instruction streams over Multiple Data streams)
Popularity: MIMD > SIMD > MISD
SISD (Single Instruction Stream over a Single Data Stream)
➢ Conventional sequential machines
[Figure: SISD architecture. The CU sends the instruction stream (IS) to the PU, which exchanges a data stream (DS) with the MU.]
CU: Control Unit   PU: Processing Unit   MU: Memory Unit
IS: Instruction Stream   DS: Data Stream
SIMD (Single Instruction Stream over Multiple Data Streams)
➢ Vector computers
➢ Special purpose computations
[Figure: SIMD architecture with distributed memory. One CU broadcasts the instruction stream to many PEs, each paired with its own LM.]
PE: Processing Element   LM: Local Memory
MISD (Multiple Instruction Streams over a Single Data Stream)
➢ Processor arrays, systolic arrays
➢ Special purpose computations
[Figure: MISD architecture (the systolic array).]
MIMD (Multiple Instruction Streams over Multiple Data Streams)
➢ General purpose parallel computers
[Figure: MIMD architecture with shared memory.]
Pipelining Technique
➢A technique used in advanced microprocessors
where the microprocessor begins executing a
second instruction before the first has been
completed.
➢A pipeline is a series of stages, where some work is
done at each stage. The work is not finished until it
has passed through all stages.
➢With pipelining, the computer architecture allows
the next instructions to be fetched while the
processor is performing other operations.
Working of pipeline
➢The pipeline is divided into segments and each
segment can execute its operation concurrently
with the other segments.
➢Once a segment completes an operation, it
passes the result to the next segment in the
pipeline and fetches the next operation from the
preceding segment.
Example
➢Four pipelined instructions, each passing through five
stages of operation, are illustrated by the laundry example below:
• Laundry Example
• Ann, Brian, Cathy, Dave each have one load of
clothes to wash, dry, and fold
• Washer takes 30 minutes
• Dryer takes 40 minutes
• “Folder” takes 20 minutes
[Figure: sequential laundry timeline from 6 PM to midnight. Tasks A, B, C, D run back to back, each taking 30 + 40 + 20 minutes.]
Sequential laundry takes 6 hours for 4 loads.
If they learned pipelining, how long would laundry take? Start work ASAP.
[Figure: pipelined laundry timeline from 6 PM to 9:30 PM. A new load enters the washer every 40 minutes.]
• Pipelined laundry takes 3.5 hours for 4 loads
Pipelining Lessons
• Pipelining doesn’t help the latency of a single task; it helps the throughput of the entire workload
• Pipeline rate is limited by the slowest pipeline stage
• Multiple tasks operate simultaneously
• Potential speedup = number of pipe stages
• Unbalanced lengths of pipe stages reduce speedup
• Time to “fill” the pipeline and time to “drain” it reduce speedup
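
A quick arithmetic check of the laundry numbers (a sketch, ours): each load
takes 30+40+20 = 90 minutes sequentially; pipelined, once the first load
finishes, a new load completes every 40 minutes (the slowest stage, the dryer):

    stages = [30, 40, 20]          # washer, dryer, folder (minutes)
    loads = 4

    sequential = sum(stages) * loads                         # 360 min
    pipelined  = sum(stages) + (loads - 1) * max(stages)     # 210 min
    print(sequential / 60, pipelined / 60)                   # 6.0 3.5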
The Basic Pipeline Structure
[Figure: the classic five-stage pipeline. Four instructions in program order each pass through Ifetch, Reg, ALU, DMem and Reg (write back); a new instruction enters the pipeline every cycle across cycles 1-7.]
IF: The Instruction Fetch (IF) stage is responsible
for obtaining the requested instruction from
memory.

ID: The Instruction Decode (ID) stage is
responsible for decoding the instruction and
sending out the various control lines to the other
parts of the processor.

EX: The Execution (EX) stage is where any
calculations are performed. The main component
in this stage is the ALU.

MEM: The Memory and I/O (MEM) stage is
responsible for storing and loading values to
and from memory. It is also responsible for
input and output from the processor.

WB: The Write Back (WB) stage is
responsible for writing the result of a
calculation, memory access or input into the
register file.
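
A minimal sketch (ours) of how these five stages overlap in an ideal pipeline
with no hazards: instruction i occupies stage s during cycle i + s:

    STAGES = ["IF", "ID", "EX", "MEM", "WB"]

    def schedule(n_instructions):
        for cycle in range(n_instructions + len(STAGES) - 1):
            row = [STAGES[cycle - i] if 0 <= cycle - i < len(STAGES) else "--"
                   for i in range(n_instructions)]
            print(f"cycle {cycle + 1}: " + "  ".join(row))

    schedule(4)   # 4 instructions finish in 8 cycles instead of 4 x 5 = 20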
Advantages:
➢ More efficient use of the processor
➢ Quicker execution of a large number of instructions

Disadvantages:
➢ Pipelining involves adding hardware to the chip
➢ Inability to continuously run the pipeline at full
speed, because pipeline hazards disrupt the
smooth execution of the pipeline
Pipeline Hurdles
Limits to pipelining: Hazards prevent next instruction
from executing during its designated clock cycle
• Structural hazards: HW cannot support this
combination of instructions (single person to fold
and put clothes away)
• Data hazards: Instruction depends on result of
prior instruction still in the pipeline
• Control hazards: Pipelining of branches & other
instructions that change the PC
• Common solution is to stall the pipeline until the
hazard is resolved, inserting one or more
“bubbles” in the pipeline
Pipeline Hurdles
Definition
• Conditions that lead to incorrect behavior if not fixed
• Structural hazard
• two different instructions use same h/w in same cycle
• Data hazard
• two different instructions use same storage
• must appear as if the instructions execute in correct order
• Control hazard
• one instruction affects which instruction is next
Resolution
• Pipeline interlock logic detects hazards and fixes them
• simple solution: stall
• better solution: partial stall
Structural Hazards
When two or more different instructions want to use the same hardware
resource in the same cycle; e.g., the MEM stage uses the same memory port
as IF, as shown in this slide.
[Figure: a Load followed by Instr 1-4 in the pipeline; the Load’s DMem access in cycle 4 contends with Instr 3’s Ifetch for the single memory port.]
Structural Hazards
This is another way of looking at the effect of a stall:
[Figure: Load, Instr 1 and Instr 2 proceed normally; the stall inserts a bubble into every stage for one cycle, and Instr 3 starts one cycle late.]
Structural Hazards
Structural hazards are reduced with these rules:
• Each instruction uses a resource at most once
• Always use the resource in the same pipeline stage
• Use the resource for one cycle only
Many RISC processors are designed with this in mind.
Sometimes it is very hard to do: for example,
memory of necessity is used in both the IF and MEM stages.
Some common Structural Hazards:
• Memory - we’ve already mentioned this one.
• Floating point - Since many floating point instructions
require many cycles, it’s easy for them to interfere with
each other.
• Starting up more of one type of instruction than there are
resources.
Data Hazards
These occur when, at any time, there are instructions
active that need to access the same data (memory or
register) locations.

Where there’s real trouble is when we have:

  instruction A
  instruction B

and B manipulates (reads or writes) data before A does.
This violates the order of the instructions, since the
architecture implies that A completes entirely before B
is executed.
Data Hazards
Read After Write (RAW)
Execution order is: InstrI, then InstrJ.
InstrJ tries to read an operand before InstrI writes it.

  I: add r1,r2,r3
  J: sub r4,r1,r3

• Caused by a “dependence” (in compiler
nomenclature). This hazard results from an
actual need for communication.
Data Hazards
Write After Read (WAR)
Execution order is: InstrI, then InstrJ.
InstrJ tries to write an operand before InstrI reads it.
• Gets the wrong operand

  I: sub r4,r1,r3
  J: add r1,r2,r3
  K: mul r6,r1,r7

• Called an “anti-dependence” by compiler writers.
This results from reuse of the name “r1”.
Data Hazards
Write After Write (WAW)
Execution order is: InstrI, then InstrJ.
InstrJ tries to write an operand before InstrI writes it.
• Leaves the wrong result (InstrI’s, not InstrJ’s)

  I: sub r1,r4,r3
  J: add r1,r2,r3
  K: mul r6,r1,r7

• Called an “output dependence” by compiler writers.
This also results from the reuse of the name “r1”.
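
A sketch (ours) of classifying the hazard between two instructions from
their register read/write sets, matching the RAW/WAR/WAW definitions above.
Instructions are (writes, reads) pairs, with I before J in program order:

    def data_hazards(I, J):
        I_writes, I_reads = I
        J_writes, J_reads = J
        hazards = []
        if I_writes & J_reads:  hazards.append("RAW")   # true dependence
        if I_reads & J_writes:  hazards.append("WAR")   # anti-dependence
        if I_writes & J_writes: hazards.append("WAW")   # output dependence
        return hazards

    # I: add r1,r2,r3 (writes r1, reads r2,r3)
    # J: sub r4,r1,r3 (writes r4, reads r1,r3)
    print(data_hazards(({"r1"}, {"r2", "r3"}),
                       ({"r4"}, {"r1", "r3"})))         # ['RAW']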
Control Hazard
➢Control hazard :
➢Also called branch hazard.
➢When a planned instruction cannot execute
in the proper pipeline clock cycle because the
instruction that was fetched is not the
intended one
➢A control hazard is when we need to find the
destination of a branch, and can’t fetch any
new instructions until we know that
destination.
Exceptions
• An exception is an event which occurs during the
execution of a program and disrupts the normal flow
of the program’s instructions.

• When an error occurs within a method, the method
creates an exception object and hands it off to the
runtime system; the block of code that deals with it
is called an exception handler.

• There are mainly two types of exceptions:
  • Checked
  • Unchecked
Basic MIPS Implementation
Five steps in MIPS instruction execution are as follows
1. Fetch instruction from memory.
2. Read registers while decoding the instruction. The
regular format of MIPS instructions allows reading
and decoding to occur simultaneously.
3. Execute the operation or calculate an address.
4. Access an operand in data memory.
5. Write results into a register.
An Overview of the Implementation
• For most instructions: fetch instruction, fetch
operands, execute, store.
• The simplified overview omits the multiplexers and
some control lines for read and write; they are added below.
• The program counter gives the instruction address to the
instruction memory.
• After the instruction is fetched, the register operands
required by an instruction are specified by fields of that
instructions.
• Once the register operands have been fetched, they can be
used to compute a memory address, to compute an
arithmetic result or a compare.
• If the instruction is an arithmetic-logical instruction, the
results from the ALU must be written to register.
• If the operation is a load or store, the ALU result is used as
an address either to store a value from a register into
memory or to load a value from memory into a register.
The result from the ALU or memory is written
back into the register file.
• Branches require the use of the ALU output to determine
the next instruction address which comes either from the
ALU or from an adder that increments the current PC by 4.
• The basic implementation of the MIPS subset including
the necessary multiplexers and control lines.
• Multiplexers select from among several inputs based on
the setting of its control lines. The control lines are set
based on information taken from the instruction being
executed.
Super Scalar Architecture
• Superscalar is a computer designed to improve the
performance of the execution of scalar instructions.
• A scalar is a variable that can hold only one atomic value
at a time, e.g., an integer or a real.
• A scalar architecture processes one data item at a time.
• By contrast, examples of non-scalar (structured) variables:
  • Arrays
  • Matrices
  • Records
Super Scalar Architecture
• In a superscalar architecture (SSA), several scalar
instructions can be initiated simultaneously and
executed independently.

• Pipelining also allows several instructions to be
executed at the same time, but they have to be in
different pipeline stages at a given moment.

• In an SSA, by contrast, there can be several instructions
executing simultaneously in the same pipeline stage.
Basic Concept of Super Scalar
• SSA allows several instructions to be issued and
completed per clock cycle.

• It consists of a number of pipelines that work
in parallel.

• Depending on the number and kind of parallel units
available, a certain number of instructions can be
executed in parallel.

• Each unit is also pipelined and can execute several
operations in different pipeline stages.
Working of Super Scalar Processor
• An SSA processor fetches multiple instructions at a
time, and attempts to find nearby instructions that
are independent of each other and can therefore be
executed in parallel.

• Based on the dependency analysis, the processor
may issue and execute instructions in an order that
differs from that of the original machine code.

• The processor may eliminate some unnecessary
dependencies by the use of additional registers and
renaming of register references.
RISC vs CISC Processor

RISC                                         CISC
Simple instructions taking one cycle         Complex instructions taking multiple cycles
Instructions executed by a hardwired         Instructions executed by a microprogrammed
control unit                                 control unit
Few instructions                             Many instructions
Fixed-format instructions                    Variable-format instructions
Few addressing modes; most instructions      Many addressing modes
use register-to-register addressing
Multiple register sets                       Single register set
Highly pipelined                             Not pipelined or less pipelined
Problems with conventional approach
• Limits to conventional exploitation of ILP:
1) pipelined clock rate: at some point, each increase
in clock rate has corresponding CPI increase
(branches, other hazards)
2) instruction fetch and decode: at some point, it’s
hard to fetch and decode more instructions per clock
cycle
3) cache hit rate: some long-running (scientific)
programs have very large data sets accessed with
poor locality.
others have continuous data streams (multimedia)
and hence poor locality
Alternative Model: Vector Processing
• Vector processors have high-level operations that work on
linear arrays of numbers: "vectors"

  SCALAR (1 operation):   add r3, r1, r2      r3 <- r1 + r2
  VECTOR (N operations):  add.vv v3, v1, v2   v3[i] <- v1[i] + v2[i], i = 1..vector length
Vector Processing
• There is a class of computational problems that
are beyond the capabilities of a conventional
computer.

• These problems require a vast number of
computations that will take a conventional
computer days or even weeks to complete.

• Vector processors have high-level operations
that work on linear arrays of numbers:
"vectors".
Properties of Vector Processors
• Each result is independent of the previous result
  • long pipeline, compiler ensures no dependencies
  • high clock rate
• Vector instructions access memory with a known pattern
  • highly interleaved memory
  • memory latency amortized over, e.g., 64 elements
  • no (data) caches required! (Do use instruction cache)
• Reduces branches and branch problems in pipelines
• A single vector instruction implies lots of work (an entire loop)
  • fewer instruction fetches
Vector Processing Applications
• Problems that can be efficiently formulated in
terms of vectors and matrices:
• Long range weather forecasting
• Petroleum explorations
• Seismic data analysis
• Medical diagnosis
• Aerodynamics and space flight simulations
• Artificial intelligence and expert systems
• Mapping the human genome
• Image processing
• Vector processor computer has the ability to process
vectors and matrices much faster than conventional
computers
Styles of Vector Architectures

• Memory-memory vector processors: all vector
operations are memory to memory
• Vector-register processors: all vector operations
between vector registers (except load and store)
  • Vector equivalent of load-store architectures
  • Includes all vector machines since the late 1980s:
    Cray, Convex, Fujitsu, Hitachi, NEC
Components of Vector Processor
• Vector Register: fixed-length bank holding a single vector
  • has at least 2 read and 1 write ports
  • typically 8-32 vector registers
• Vector Functional Units (FUs): fully pipelined, start a new
operation every clock
  • typically 4 to 8 FUs: FP add, FP multiply, FP reciprocal (1/X),
    integer add, logical, shift; may have multiple of the same unit
• Vector Load-Store Units (LSUs): fully pipelined unit to
load or store a vector; may have multiple LSUs
• Scalar registers: single element for FP scalar or address
• Cross-bar to connect FUs, LSUs, registers
“DLXV” Vector Instructions
Instr.   Operands   Operation             Comment
ADDV     V1,V2,V3   vector + vector       V1 = V2 + V3
ADDSV    V1,F0,V2   scalar + vector       V1 = F0 + V2
MULTV    V1,V2,V3   vector x vector       V1 = V2 x V3
MULSV    V1,F0,V2   scalar x vector       V1 = F0 x V2
MOV      VLR,R1     Vec. Len. Reg. = R1   set vector length
MOV      VM,R1      Vec. Mask = R1        set vector mask
Memory operations
• Load/store operations move groups of data between
registers and memory
• Three types of addressing:
  • Unit stride
    • Fastest
  • Non-unit (constant) stride
  • Indexed (gather-scatter)
    • Vector equivalent of register indirect
    • Good for sparse arrays of data
    • Increases the number of programs that vectorize
DAXPY (Y = a * X + Y)
Assuming vectors X, Y are of length 64. Scalar vs. vector code:

Vector (DLXV) code:
        LD    F0,a        ;load scalar a
        LV    V1,Rx       ;load vector X
        MULTS V2,F0,V1    ;vector-scalar mult.
        LV    V3,Ry       ;load vector Y
        ADDV  V4,V2,V3    ;add
        SV    Ry,V4       ;store the result

Scalar (DLX) code:
        LD    F0,a
        ADDI  R4,Rx,#512  ;last address to load
loop:   LD    F2,0(Rx)    ;load X(i)
        MULTD F2,F0,F2    ;a*X(i)
        LD    F4,0(Ry)    ;load Y(i)
        ADDD  F4,F2,F4    ;a*X(i) + Y(i)
        SD    F4,0(Ry)    ;store into Y(i)
        ADDI  Rx,Rx,#8    ;increment index to X
        ADDI  Ry,Ry,#8    ;increment index to Y
        SUB   R20,R4,Rx   ;compute bound
        BNZ   R20,loop    ;check if done
One Example
FORTRAN language:
      DO 20 I = 1, 100
   20 C(I) = B(I) + A(I)
Conventional computer (machine language):
      Initialize I = 0
   20 Read A(I)
      Read B(I)
      Store C(I) = A(I) + B(I)
      Increment I = I + 1
      If I ≤ 100 goto 20
One Example
Vector computer:
Vector instruction format:
      ADD  A  B  C  100
Matrix Multiplication:
3 x 3 matrix multiplication:
  [a11 a12 a13]   [b11 b12 b13]   [c11 c12 c13]
  [a21 a22 a23] x [b21 b22 b23] = [c21 c22 c23]
  [a31 a32 a33]   [b31 b32 b33]   [c31 c32 c33]
c11 = a11*b11 + a12*b21 + a13*b31
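
A quick Python sketch of both examples (ours; the sample values are invented):
the single vector instruction behaves like the 100-iteration loop, and c11 is
the dot product of row 1 of A with column 1 of B:

    A = list(range(100))
    B = list(range(100))
    C = [a + b for a, b in zip(A, B)]     # C(I) = A(I) + B(I), I = 1..100

    a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    b = [[9, 8, 7], [6, 5, 4], [3, 2, 1]]
    c11 = sum(a[0][k] * b[k][0] for k in range(3))   # a11*b11 + a12*b21 + a13*b31
    print(C[:3], c11)                                # [0, 2, 4] 30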
Array Processors
How Do Array Processors Help?
Classification of Array Processors
An array processor is a processor that performs computations
on a large array of data. (It executes one instruction at a
time, but on an array of data.)
Attached array processor:
➢ Auxiliary processor attached to a general purpose
computer to improve numerical computation performance.
SIMD array processor:
➢ Computer with multiple processing units operating in
parallel.
Example: vector addition C = A + B, where ci = ai + bi.
Attached array processor
➢ The processor is designed as a peripheral for complex scientific
applications, attached to a conventional host computer.
➢ The peripheral is treated like an external interface. Data
are transferred from main memory to local memory through a
high-speed bus.
➢ The general-purpose computer without the attached processor
serves the users that need conventional data processing.
SIMD array processor
➢ In this processor, scalar and program control instructions are
directly executed within the master control unit.
➢ Vector instructions are broadcast to all processing elements (PEs)
simultaneously.
Example: C = A + B
➢ The master control unit first stores the i-th components ai and bi
in local memory Mi for i = 1, 2, …, n.
➢ It broadcasts the floating-point add instruction ci = ai + bi to all PEs.
➢ The components of ci are stored in fixed locations in each local
memory.
Performance and Scalability of Array Processors
Advantages of Array Processors
ST Questions
Q1. Discuss the paging technique in detail with an adequate
example. Explain in detail the techniques which can be
used for RISC and CISC computers, with suitable examples.
Distinguish the addressing schemes used during I/O data
transfer, with examples. [4+4+2]

Q2. Discuss Flynn’s classification of computer architectures in
detail with a neat diagram. Explain the address translation
procedure in virtual memory with an example. Explain the
term “instructions are independent” in the vector
computer. [4+4+2]

❖ Distribution of questions:
Col. 1: Q1   Col. 2: Q2   Col. 3: Q1   Col. 4: Q2   Col. 5: Q1   Col. 6: Q2   Col. 7: Q1   Col. 8: Q2
The Control Unit
• The Control Unit state machine has a very simple structure:
  1) Fetch: ask the RAM for the instruction whose address
     is stored in IP.
  2) Execute: there are only a small number of possible
     instructions; depending on which it is, do what is
     necessary to execute it.
  3) Repeat: add 1 to the address stored in IP, and go
     back to Step 1!
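
A toy Python fetch-execute loop matching the three steps above (ours; the
tiny instruction set is hypothetical, just enough to run the c = a + b
program shown next):

    def run(program, memory, registers):
        IP = 0
        while IP < len(program):
            op, *args = program[IP]            # 1) Fetch
            if op == "load":                   # 2) Execute
                registers[args[0]] = memory[args[1]]
            elif op == "add":
                registers[args[0]] = registers[args[1]] + registers[args[2]]
            elif op == "store":
                memory[args[1]] = registers[args[0]]
            IP += 1                            # 3) Repeat

    memory, registers = {"a": 2, "b": 3, "c": 1}, {}
    run([("load", "r1", "a"), ("load", "r3", "b"),
         ("add", "r2", "r1", "r3"), ("store", "r2", "c")],
        memory, registers)
    print(memory["c"])   # 5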
The Control Unit is a State Machine
[Figure: state diagram. From the Fetch state the machine branches to an Exec state for each opcode (Add, Store, Load, …, Goto); all paths rejoin at "Add 1 to IP" and return to Fetch.]
A Simple Program
• We want to add the values of variables a and b (assumed
to be in memory), and put the result in variable c in
memory, i.e. c ← a + b
• Instructions in the program:
  • Load a into register r1
  • Load b into register r3
  • r2 ← r1 + r3
  • Store r2 in c
Running the Program
[Figure series: five CPU/memory snapshots while the program runs. The four instructions are stored at addresses 2005-2008; memory initially holds a = 2, b = 3, c = 1.]

  Step | IP   | IR (instruction executed) | Effect
  -----+------+---------------------------+------------------
  1    | 2005 | Load a into r1            | r1 = 2
  2    | 2006 | Load b into r3            | r3 = 3
  3    | 2007 | r2 ← r1 + r3              | r2 = 5
  4    | 2008 | Store r2 into c           | c = 5 in memory
Component of Control Unit
➢The control unit uses fixed logic circuits to
interpret instructions and generate control
signals from them.
➢The fixed logic circuit block includes a
combinational circuit that generates the required
control outputs for decoding and encoding
functions.
➢There are two types of CU used in computer
systems:
  ❖HCU (Hardwired Control Unit)
  ❖MCU (Microprogrammed Control Unit)
Hardwired Control Unit (HCU)
Principle of operation (HCU)
Instruction decoder
➢ It decodes the instruction loaded in the IR.
➢ If the IR is an 8-bit register, then the instruction
decoder generates 256 lines, one for each instruction.
➢ According to the code in the IR, only one line
amongst all the output lines of the decoder goes high
(it is set to 1 and all other lines are set to 0).

Step decoder
➢ It provides a separate signal line for each step,
or time slot, in a control sequence.
Principle of operation (HCU)
Encoder
➢ It gets its inputs from the instruction decoder,
step decoder, external inputs and condition
codes.

➢ It uses all these inputs to generate the individual
control signals.

➢ After execution of each instruction an End signal is
generated; this resets the control step counter and
makes it ready for generation of the control steps for
the next instruction.
Characteristics of HCU

1. It uses flags, decoders, logic gates and other
digital circuits.
2. Output is generated on the basis of the input
signals.
3. Difficult to design, test and implement.
4. Inflexible to modify.
5. Faster mode of operation.
6. Expensive and more error-prone.
7. Used in RISC processors.
Pros and cons of HCU
Pros of HCU
1. Faster than a microprogrammed control unit.
2. Can be optimized to produce a fast mode of
operation.

Cons of HCU
1. The instruction set and the control logic are tied
together directly.
2. Requires a change in wiring if the design has to
be modified.
Microprogrammed Control Unit (MCU)
MCU
➢ Microprogramming is a method of control unit design
in which the control signals are stored in a control
signal memory, the control memory (CM).

➢ The control signals to be activated at any time are
specified by a microinstruction, which is fetched from
CM.

➢ A sequence of one or more micro-operations designed
to control a specific operation, such as addition or
multiplication, is called a microprogram.

➢ The microprograms for all instructions are stored in the
control memory.
Sequencer
➢ The address where these microinstructions are
stored in CM is generated by the microprogram
sequencer (microprogram controller).
➢ The microprogram sequencer generates the
address for the microinstruction according to the
instruction stored in the IR.
Microinstruction
➢ A simple way to structure microinstructions is to
assign one bit position to each control signal
required in the CPU.
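
A sketch (ours) of that one-bit-per-control-signal idea; the control line
names here are hypothetical, invented for illustration:

    CONTROL_LINES = ["PC_out", "MAR_in", "MDR_out", "IR_in", "R0_out", "ALU_add"]

    def decode_microinstruction(word):
        """Return the names of the control signals this microinstruction asserts."""
        return [name for bit, name in enumerate(CONTROL_LINES) if word & (1 << bit)]

    # assert PC_out and MAR_in together (start of an instruction fetch):
    print(decode_microinstruction(0b000011))   # ['PC_out', 'MAR_in']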
Principle of operation (MCU)
➢ The control address register holds the address of the
next microinstruction to be read.
➢ When address is available in control address register,
the sequencer issues READ command to the control
memory.
➢ After issue of READ command, the word from the
addressed location is read into the microinstruction
register.
➢ Now the content of the microinstruction register
generates the control signals and next-address
information for the sequencer.
➢ The sequencer loads a new address into the control
address register based on the next address information.
Characteristics of MCU
1. It uses sequences of micro-instructions written
in a microprogramming language.
2. It is midway between hardware and software.
3. It generates a set of control signals on the
basis of each microinstruction.
4. Easy to design, test and implement.
5. Flexible to modify.
6. Slower mode of operation.
7. Cheaper and less error-prone.
8. Used in CISC processors.
Pros and cons of MCU
Pros of MCU
1. It simplifies the design of the control unit. Thus it is both cheaper and
less error-prone to implement.
2. Control functions are implemented in software rather than
hardware.
3. The design process is orderly and systematic.
4. More flexible: can be changed to accommodate new system
specifications or to correct design errors quickly and cheaply.
5. Complex functions such as floating point arithmetic can be realized
efficiently.
Cons of MCU
1. A microprogrammed control unit is somewhat slower than a
hardwired control unit, because time is required to access the
microinstructions from CM.
2. The flexibility is achieved at some extra hardware cost due to the
control memory and its access circuitry.
