
Digital Design & Computer Organization (BCS302), Module 4

MODULE 4
INPUT/OUTPUT ORGANIZATION

A number of input/output (I/O) devices can be connected to a computer. Input may
come from a keyboard, a sensor, a switch, a mouse, etc. Similarly, output may go to
a speaker, monitor, printer, digital display, etc.

These I/O devices exchange information in varied formats, with different word
lengths and different transfer speeds, yet all are connected to the same system and
exchange information with the same computer. The computer must therefore be
capable of handling this wide variety of devices.

ACCESSING I/O DEVICES
A single bus structure can be used for connecting I/O devices to a computer. The
simple arrangement of connecting a set of I/O devices to the memory and processor
by means of the system bus is as shown in the figure. Such an arrangement is called
a Single Bus Organization.

Fig: A Single Bus structure

 The single bus organization consists of
o Memory
o Processor
o System bus
o I/O device
 The system bus consists of 3 types of buses:

o Address bus (Unidirectional)


o Data bus (Bidirectional)
o Control bus (Bidirectional)

 The system bus enables all the devices connected to it to take part in data
transfer operations.
 The system bus establishes data communication between the I/O devices and the
processor.
 Each I/O device is assigned a unique set of addresses.
 When the processor places an address on the address lines, the intended device
responds to the command.
 The processor requests either a read or a write operation.
 The requested data are transferred over the data lines.

Steps for an input operation:

 The address bus of the system bus holds the address of the input device.
 The control unit of the CPU generates the IORD control signal.
 When this control signal is activated, the processor reads the data from
the input device (DATAIN) into a CPU register.
Steps for an output operation:
 The address bus of the system bus holds the address of the output device.
 The control unit of the CPU generates the IOWR control signal.
 When this control signal is enabled, the CPU transfers the data from a
processor register to the output device (DATAOUT).


There are 2 schemes available to connect I/O devices to the CPU:

1. Memory-mapped I/O:
 In this technique, both memory and I/O devices use the common bus to
transfer data to the CPU.
 The same address space is used for both memory and the I/O interface. There
is only one set of read and write signals.
 All memory-related instructions can be used for data transfer between I/O
and the processor.
 With memory-mapped I/O, an input operation can be implemented as

MOVE DATAIN, R0
(source: DATAIN, destination: R0)

This instruction moves the contents of location DATAIN into register R0.

 Similarly, an output operation can be implemented as

MOVE R0, DATAOUT
(source: R0, destination: DATAOUT)

 The data is written from R0 to location DATAOUT (the address of the output buffer).


2. I/O-mapped I/O:
 In this technique, I/O devices are addressed separately from memory; the
address spaces for memory and I/O devices are different.
 Hence two sets of instructions are used for data transfer:
one set for memory operations and another set for I/O operations.
The whole address space is available to the program.
 Eg – IN AL, DX
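The contrast between the two schemes can be sketched in Python. This is a toy model, not from the source: the I/O base address, register contents, and helper names are invented for illustration.

```python
# Toy model of memory-mapped I/O: one address space and one set of
# read/write primitives serve both memory and device registers.
IO_BASE = 0xFF00                 # hypothetical start of the I/O region

ram = {}                         # ordinary memory locations
io_regs = {0xFF00: 0x41}         # hypothetical DATAIN register holding 'A'

def read(addr):
    """A single read operation serves memory and I/O alike."""
    if addr >= IO_BASE:
        return io_regs[addr]     # address decodes to a device register
    return ram.get(addr, 0)      # address decodes to ordinary memory

def write(addr, value):
    """Likewise, one write operation covers both."""
    if addr >= IO_BASE:
        io_regs[addr] = value
    else:
        ram[addr] = value

write(0x0010, 7)                 # a plain memory access
r0 = read(0xFF00)                # the same MOVE-style access reads DATAIN
```

Under I/O-mapped I/O, by contrast, `read`/`write` would exist in two separate flavours (memory vs. I/O, as with the x86 `IN AL, DX` shown above), and the `addr >= IO_BASE` test would be replaced by which instruction was used.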


I/O INTERFACE

The hardware arrangement for connecting an input device to the system bus is as
shown in the fig.

Fig: I/O interface for an input device

This hardware arrangement is called an I/O interface. The I/O interface consists of
the following functional blocks:

1) Address Decoder:
o Its function is to decode the address, in order to recognize the input device
whose address is available on the unidirectional address bus.
o The input device is recognized first, and then the control and data
registers become active.
o The unidirectional address bus of the system bus is connected to the input of
the address decoder, as shown in the figure.
2) Control Circuit:
o The control bus of the system bus is connected to the control circuit, as shown
in the fig.
o The processor sends commands to the I/O system through the control bus.
o It controls the read and write operations with respect to the I/O device.

3) Status Register:

o It indicates the type of operation (read or write) to be performed on the I/O
device and the current status of that operation.

4) Data Register:

o The data bus carries data between the I/O devices and the processor.
The data bus is connected to the data/status registers.
o The data register stores the data read from an input device, or the data to be
written to an output device. There are 2 types:
DATAIN - Input buffer associated with the keyboard.
DATAOUT - Output data buffer of a display/printer.

Data buffering is an essential task of an I/O interface. The data transfer rates of
the processor and memory are high compared with those of the I/O devices; hence
the data are buffered in the I/O interface circuit and then forwarded to the output
device, or forwarded to the processor in the case of input devices.


Input Device → DATAIN Buffer → Processor

Processor → DATAOUT Buffer → Output Device

Input & Output registers –

Various registers in keyboard and display devices -

DATAIN register: part of the input device; it stores the ASCII characters read from
the keyboard.
DATAOUT register: part of the output device; it stores the ASCII characters to be
displayed on the output device.
STATUS register: stores the working status of the I/O devices –
 SIN flag – This flag is set to 1 when the DATAIN buffer contains data
from the keyboard. The flag is cleared to 0 after the data is passed from
the DATAIN buffer to the processor.
 SOUT flag – This flag is set to 1 when the DATAOUT buffer is empty and
data can be placed in it by the processor. The flag is cleared to 0 when
the DATAOUT buffer holds data to be displayed.
 KIRQ (Keyboard Interrupt Request) – By setting this flag to 1, the keyboard
requests the processor's service, and an interrupt is sent to the
processor. It is used along with the SIN flag.

 DIRQ (Display Interrupt Request) – The output device requests the
processor's service for an output operation by setting this flag to 1.
Control registers
KEN (Keyboard Enable) – Enables the keyboard for input operations.
DEN (Display Enable) – Enables the output device for output operations.
Program-Controlled I/O
 In this technique the CPU itself executes the instructions that read input data
into memory and send data from memory to the output device; this is
programmed I/O.
 Drawback of programmed I/O: the CPU has to monitor the I/O units the whole
time the program is executing. Thus, the CPU stays in a program loop until
the I/O unit indicates that it is ready for data transfer.
 This is a time-consuming process, and much CPU time is wasted in this
continuous monitoring.

 It is the process of controlling input and output operations by executing 2
sets of instructions: one set for the input operation and the other for the
output operation.

 The program checks the status of the I/O register and reads or displays data.
Here the I/O operation is controlled by the program.
WAITK   TestBit #0, STATUS    (checks the SIN flag)
        Branch=0 WAITK
        Move DATAIN, R0       (reads a character from DATAIN into R0)

This code checks the SIN flag; if it is 0 (i.e. there is no character in the DATAIN
buffer), control branches back to the WAITK label. This loop continues until the SIN
flag is set to 1. When SIN is 1, the data is moved from DATAIN to register R0. Thus
the program continuously polls for the input operation.

Similarly, the code for the output operation:

WAITD   TestBit #0, STATUS    (checks the SOUT flag)
        Branch=0 WAITD
        Move R0, DATAOUT      (sends a character for display)

This code checks the SOUT flag; if it is 0 (i.e. the DATAOUT buffer still holds a
character awaiting display), control branches back to the WAITD label. This loop
continues until the SOUT flag is set to 1. When SOUT is 1 (the buffer is empty),
the character is moved from register R0 to DATAOUT (i.e. sent by the processor).
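The busy-wait loop above can be mirrored in a small Python simulation. The keyboard model below is an assumption made for illustration, not part of the source.

```python
# Simulated keyboard: SIN = 1 while DATAIN holds an unread character.
class Keyboard:
    def __init__(self, chars):
        self.pending = list(chars)   # characters the device will deliver
        self.DATAIN = None
        self.SIN = False
        self._advance()              # first character "arrives"

    def _advance(self):
        if self.pending:
            self.DATAIN = self.pending.pop(0)
            self.SIN = True
        else:
            self.SIN = False         # buffer empty

    def read_datain(self):
        ch = self.DATAIN             # reading DATAIN clears SIN ...
        self._advance()              # ... until the next character arrives
        return ch

kbd = Keyboard("HI")
received = []
while kbd.SIN:                       # TestBit #0, STATUS ; Branch=0 WAITK
    received.append(kbd.read_datain())   # Move DATAIN, R0
```

A real polling loop would spin while SIN is 0 rather than exit; the simulation simply stops once the device has nothing more to send.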


Interrupt
 It is an event which suspends the execution of one program and begins the
execution of another program.
 In program-controlled I/O, a program must continuously check whether the
I/O device is free. This continuous checking wastes processor execution time.
The waste can be avoided by having the I/O device send an ‘interrupt’ to the
processor when it is ready.
 The interrupt invokes a subroutine called Interrupt Service Routine (ISR),
which resolves the cause of interrupt.
 The occurrence of interrupt causes the processor to transfer the execution
control from user program to ISR.

Fig: Transfer of control between Program 1 and the ISR


The following steps take place when an interrupt request is accepted:

 The execution of the current instruction stream is suspended.
 Execution control is transferred from the main program to the subprogram (ISR).
 The content of the PC is incremented by 4, so that it points to the next
instruction.
 The SP is decremented by 4 (one memory location).
 The contents of the PC are pushed onto the stack, at the address held
in the SP.
 The PC is loaded with the address of the first instruction of the subprogram.

The following steps take place when the ‘return’ instruction is executed in the ISR -
 Execution control is transferred from the ISR back to the user program.
 The content of the stack memory location whose address is stored in the SP
is retrieved into the PC.
 After retrieving the return address from the stack into the PC, the content
of the SP is incremented by 4 (one memory location).
Interrupt latency / interrupt response time is the delay between receiving an
interrupt request and the start of execution of the ISR.
Generally, a long interrupt latency is unacceptable.
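The entry and return steps can be traced on a toy machine; all the addresses below are arbitrary illustrative values.

```python
MEM = {}                 # stack memory
PC, SP = 100, 2000       # program counter and stack pointer
ISR_ADDR = 500           # address of the ISR's first instruction

def interrupt(pc, sp):
    pc += 4              # PC now points past the interrupted instruction
    sp -= 4              # decrement SP by one (4-byte) location
    MEM[sp] = pc         # push the return address onto the stack
    return ISR_ADDR, sp  # load PC with the ISR entry point

def return_from_isr(sp):
    pc = MEM[sp]         # retrieve the return address into PC
    sp += 4              # increment SP back by one location
    return pc, sp

PC, SP = interrupt(PC, SP)        # enter the ISR: PC = 500, SP = 1996
PC, SP = return_from_isr(SP)      # 'return' restores PC = 104, SP = 2000
```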

INTERRUPT HARDWARE
 The external (I/O) device sends an interrupt request to the processor by
activating a bus line called the interrupt-request line.
 All I/O devices use the same single interrupt-request line.
 One end of this interrupt-request line is connected to the power supply
through a pull-up resistor.
 The other end of the interrupt-request line is connected to the INTR
(Interrupt Request) input of the processor, as shown in the fig.


 Each I/O device is connected to the interrupt-request line through a switch
to ground, as shown in the fig.
 When all the switches are open, the voltage on the interrupt-request line is
equal to VDD, and the INTR value at the processor is 0.
 This state is called the inactive state of the interrupt-request line.
 An I/O device interrupts the processor by closing its switch.

 When a switch is closed, the voltage on the interrupt-request line drops to
zero (since the switch is grounded); hence the line signal is 0 and the
processor's INTR becomes 1.
 The signal on the interrupt-request line is the logical OR of the requests from the
several I/O devices. Therefore, INTR = INTR1 + INTR2 + … + INTRn
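This wired-OR behaviour is captured by a one-line sketch:

```python
def intr(requests):
    """INTR = INTR1 + INTR2 + ... + INTRn (logical OR of all request bits)."""
    return int(any(requests))
```

For example, `intr([0, 1, 0])` is 1: one closed switch is enough to activate the line.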

ENABLING AND DISABLING THE INTERRUPTS

The arrival of an interrupt request from an external device, or from within a
process, causes the suspension of the ongoing execution and starts the execution
of another program.

 An interrupt can arrive at any time and alters the sequence of execution.
Hence the interrupt to be executed must be selected carefully.
 All computers can enable and disable interrupts as desired.
 While one interrupt is being serviced, other interrupts should not be
invoked. This is enforced in a system in different ways.
 A problem of infinite looping can occur if an active INTR signal keeps
causing successive interruptions.


 There are 3 mechanisms to solve the problem of infinite looping:

1) The processor ignores further interrupts until the first instruction of
the ISR has been executed.
2) The processor automatically disables interrupts before starting the
execution of the ISR.
3) The processor has a special INTR line whose interrupt-handling circuit
responds only to the leading edge of the signal. Such a line is said to be
edge-triggered.
• Sequence of events involved in handling an interrupt-request:

1) The device raises an interrupt-request.

2) The processor interrupts the program currently being executed.

3) Interrupts are disabled by changing the control bits in the processor status
register (PS).
4) The device is informed that its request has been recognized; in
response, the device deactivates the interrupt-request signal.
5) The action requested by the interrupt is performed by the interrupt-service
routine.
6) Interrupts are enabled and execution of the interrupted program is
resumed.

HANDLING MULTIPLE DEVICES


While handling multiple devices, the issues concerned are:
 How can the processor recognize the device requesting an interrupt?
 How can the processor obtain the starting address of the appropriate ISR?
 Should a device be allowed to interrupt the processor while
another interrupt is being serviced?
 How should 2 or more simultaneous interrupt requests be handled?

VECTORED INTERRUPT
• A device requesting an interrupt identifies itself by sending a special code to
the processor over the bus.
• Then, the processor starts executing the corresponding ISR.

• The special code indicates the starting address of the ISR.

• The special code is typically 4 to 8 bits long.

• The location pointed to by the interrupting device is used to store the starting
address of the ISR.


• The starting address of the ISR is called the interrupt vector.

• The processor
→ loads the interrupt vector into the PC &
→ executes the appropriate ISR.
• When the processor is ready to receive the interrupt-vector code, it activates the
INTA line.
• Then, the I/O device responds by sending its interrupt-vector code & turning off
the INTR signal.
• The interrupt vector may also include a new value for the processor status register.
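A minimal sketch of the dispatch (the device codes and ISR addresses below are hypothetical):

```python
vector_table = {       # special code sent by the device -> ISR start address
    0x4: 0x1000,       # e.g. keyboard ISR
    0x5: 0x2000,       # e.g. display ISR
}

def accept_interrupt(device_code):
    """The processor loads the interrupt vector into the PC."""
    return vector_table[device_code]     # new PC value

pc = accept_interrupt(0x4)               # execution continues in that ISR
```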

INTERRUPT NESTING
• A multiple-priority scheme is implemented by using separate INTR & INTA lines
for each device.
• Each INTR line is assigned a different priority level, as shown in the figure.

Fig: Devices connected to the processor through individual interrupt-request
(INTR1 … INTRp) and interrupt-acknowledge (INTA1 … INTAp) lines, ordered by priority

• Each device has a separate interrupt-request and interrupt-acknowledge line.

• Each interrupt-request line is assigned a different priority level.

• Interrupt requests received over these lines are sent to a priority arbitration circuit
in the processor.
• If the interrupt request has a higher priority level than the priority of the processor,
then the request is accepted.
• Priority-level of processor is the priority of program that is currently being executed.

• The processor accepts interrupts only from devices that have a higher priority than
its own.

• While the ISR for a device is being executed, the priority of the processor is raised
to that of the device.
• Thus, interrupts from devices at the same or a lower priority level are disabled.
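The acceptance rule above can be sketched as:

```python
def try_accept(device_priority, processor_priority):
    """Accept only requests of strictly higher priority; on acceptance,
    the processor's priority is raised to the device's level."""
    if device_priority > processor_priority:
        return True, device_priority
    return False, processor_priority     # same level or lower: disabled

ok, level = try_accept(3, 1)         # higher-priority device: accepted
ok2, level2 = try_accept(3, level)   # same level while its ISR runs: refused
```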

Privileged Instruction
• The processor's priority is encoded in a few bits of the PS word (PS = Processor
Status).
• These encoded bits can be changed by privileged instructions that write into the PS.

• Privileged instructions can be executed only while the processor is running in
supervisor mode.
• The processor is in supervisor mode only when executing operating-system routines.

Privileged Exception
• A user program cannot
→ accidentally or intentionally change the priority of the processor &
→ disrupt the system operation.
• An attempt to execute a privileged instruction while in user mode leads to a
Privileged Exception.
SIMULTANEOUS REQUESTS
DAISY CHAIN
• The daisy chain with multiple priority levels is as shown in the figure.
 The interrupt-request line INTR is common to all devices, as shown in the fig.

 The interrupt-acknowledge line (INTA) is connected in a daisy-chain fashion, as
shown in the figure.

 The INTA signal propagates serially from one device to the next.

 Several devices may raise an interrupt by activating the INTR signal. In response
to the signal, the processor activates the INTA signal.

 This signal is first received by device 1. Device 1 blocks the propagation of the
INTA signal to device 2 when it needs processor service.

 Device 1 passes the INTA signal on to the next device when it does not require
processor service.

 In the daisy-chain arrangement, device 1 has the highest priority.

Advantage: It requires fewer wires than individual connections.
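A sketch of how INTA propagates through the chain (devices indexed from 1, with device 1 nearest the processor):

```python
def daisy_chain_grant(pending):
    """pending[i-1] is True if device i has asserted INTR.
    INTA enters at device 1; the first requesting device absorbs it."""
    for i, wants_service in enumerate(pending, start=1):
        if wants_service:
            return i          # this device blocks INTA from going further
    return None               # no device was requesting

winner = daisy_chain_grant([False, True, True])   # devices 2 and 3 request
```

Device 2 wins over device 3 because it sees INTA first, illustrating why position in the chain fixes priority.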


ARRANGEMENT OF PRIORITY GROUPS

• In this technique, devices are organized into groups, and each group is connected
to the processor at a different priority level.

• Within a group, devices are connected in a daisy-chain fashion, as shown in the
figure.

Direct Memory Access (DMA)

 DMA is the process of transferring a block of data at high speed between
main memory and an external device (I/O device) without continuous
intervention by the CPU.
 The DMA operation is performed by a control circuit that is part of the I/O
interface.
 This control circuit is called the DMA controller. Hence the DMA transfer
operation is performed by the DMA controller.
 To initiate a direct data transfer between main memory and an external device,
the DMA controller needs parameters from the CPU.
 These 3 parameters are:

1) Starting address of the memory block.
2) Number of words to be transferred.
3) Type of operation (Read or Write).
After receiving these 3 parameters from the CPU, the DMA controller carries out
the direct data transfer between main memory and the external device
without the involvement of the CPU.
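A simplified register-level model of this setup can be sketched as follows; the class, method names, and transfer loop are illustrative assumptions, not the controller's real interface.

```python
class DMAController:
    def __init__(self):
        self.start_addr = 0     # starting-address register
        self.word_count = 0     # word-count register
        self.rw = 0             # R/W bit: 1 = device -> memory (read)
        self.done = 0           # DONE bit of the status register

    def program(self, start, count, rw):
        """The CPU supplies the three parameters, then steps aside."""
        self.start_addr, self.word_count, self.rw = start, count, rw
        self.done = 0

    def run(self, memory, device_words):
        """The controller moves the whole block without CPU involvement."""
        if self.rw == 1:        # read: external device into main memory
            for i in range(self.word_count):
                memory[self.start_addr + i] = device_words[i]
        self.done = 1           # completion reported via the DONE bit

mem = {}
dma = DMAController()
dma.program(start=0x100, count=3, rw=1)
dma.run(mem, [10, 20, 30])
```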


• Registers of the DMA Controller:

It contains 3 types of registers:

Starting-address register:

The format of the starting-address register is as shown in the fig. It stores
the starting address of the memory block.

Word-count register:
The format of the word-count register is as shown in the fig. It stores the
number of words to be transferred between main memory and the external device.

Status and control register:

The format of the status and control register is as shown in the fig.

a) DONE bit:
 The DMA controller sets this bit to 1 when it completes the direct data
transfer between main memory and the external device.
 This completion is signalled to the CPU by means of the DONE bit.
b) R/W (Read or Write):
 This bit differentiates between a memory-read and a memory-write
operation.
 R/W = 1 for a read operation and
= 0 for a write operation.
 When this bit is set to 1, the DMA controller transfers a block of data
from the external device to main memory.
 When this bit is set to 0, the DMA controller transfers a block of data
from main memory to the external device.

c) IE (Interrupt Enable) bit:
 When set, this bit allows the DMA controller to raise an interrupt after
the completion of the DMA operation.
d) Interrupt Request (IRQ):
 The DMA controller requests service from the CPU (for example, to set
up the transfer of a new block of data) by activating this bit.

The computer with a DMA controller is as shown in the fig.:

 One DMA controller connects two external devices, disk 1 and disk 2, to the
system bus, as shown in the above fig.
 Another DMA controller connects a high-speed network device to the system
bus, as shown in the above fig.
 Consider a direct data transfer, performed by the DMA controller without the
involvement of the CPU, between main memory and disk 1, as indicated by
the dotted lines (in the fig.).
 To establish the direct data transfer between main memory and disk 1, the
DMA controller obtains 3 parameters from the processor, namely:
1) Starting address of the memory block.
2) Number of words to be transferred.
3) Type of operation (Read or Write).

 After receiving these 3 parameters from the processor, the DMA controller
directly transfers the block of data between main memory and the external
device (disk 1).


 The completion is reported to the CPU by setting the respective bits in the
status and control register of the DMA controller.
There are 2 types of requests with respect to the system bus:
1) CPU request.
2) DMA request.
Highest priority is given to the DMA request.

 Normally, the CPU generates memory cycles to perform read and write
operations. The DMA controller "steals" memory cycles from the CPU to
perform its read and write operations. This approach is called "cycle
stealing".

 Alternatively, the DMA controller may be given exclusive use of the bus to
transfer a block of data between the external devices and main memory.
This technique is called the "burst mode" of operation.

BUS ARBITRATION
 Any device that can initiate data transfer operations on the bus at a given
instant is called a bus master.
 When the bus mastership is transferred from one device to another, the
next device must be ready to take over the bus.
 The bus mastership is transferred from one device to another on the basis
of a priority system. There are two types of bus-arbitration techniques:

a) Centralized bus arbitration:

In this technique the CPU, or some control unit connected to the bus, acts as the
arbiter that grants bus mastership.

The schematic diagram of centralized bus arbitration is as shown in the fig.:


The following steps are necessary to transfer the bus mastership from the CPU to
one of the DMA controllers:
 A DMA controller requests the bus mastership by activating the BR (Bus
Request) signal.
 In response to this signal, the CPU transfers the bus mastership to the
requesting device (DMA controller 1) by asserting BG (Bus Grant).
 Having obtained the bus mastership from the CPU, DMA controller 1 blocks the
propagation of the bus-grant signal to the next device.
 The BG signal is passed from DMA controller 1 to DMA controller 2 in a
daisy-chain fashion, as shown in the figure.
 If DMA controller 1 has not issued a BR request, it passes the bus
mastership on to DMA controller 2 by letting the bus-grant signal through.
 When DMA controller 1 receives the bus-grant signal and needs the bus, it blocks
the signal from passing to DMA controller 2 and asserts the BBSY (Bus Busy)
signal. While BBSY is set to 1, no other device connected to the system bus may
obtain the bus mastership from the CPU.

b) Distributed bus arbitration:

 In this technique, 2 or more devices trying to access the system bus at the
same time participate in the arbitration process themselves.
 The schematic diagram of distributed bus arbitration is as shown in the figure.

 A device requests bus mastership by enabling the start-arbitration signal.
 A 4-bit code is assigned to each device, with which it competes for bus
mastership.
 Two or more devices may place their 4-bit codes on the arbitration lines at
the same time.
 The signals on the lines combine, and interpreting the resulting 4-bit pattern
determines the winner.
 When the input of one driver is 1 and the input of another driver is 0 on
the same bus line, the line settles in the low-level voltage state of the bus.
 Consider 2 devices, A & B, trying to obtain bus mastership at the same
time. Let the codes assigned to devices A & B be 5 (0101) & 6 (0110)
respectively.
 Device A places the pattern 0101 and device B places the pattern 0110 on the
lines. Interpreting the combined 4-bit pattern on the lines produces device B
as the winner.
 Device B then obtains the bus mastership to initiate the direct data transfer
between the external devices and main memory.
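The outcome of the A-versus-B example can be reproduced with a bitwise sketch, resolving the lines from the most significant bit down; the helper below is illustrative, not the actual bus logic.

```python
def arbitrate(codes):
    """Each contender drives its 4-bit ID onto the shared lines.
    A device drops out at the first (most significant) bit position
    where the lines carry a 1 but its own bit is 0."""
    active = list(codes)
    for bit in (0b1000, 0b0100, 0b0010, 0b0001):
        if any(c & bit for c in active):         # some survivor drives a 1
            active = [c for c in active if c & bit]
    return active[0]                             # the winning device's code

winner = arbitrate([0b0101, 0b0110])    # A = 5, B = 6: B wins
```

The survivor is always the device with the numerically largest code, matching the text's conclusion that device B (6) beats device A (5).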
The Memory System
Speed, Size and Cost
The block diagram of the memory hierarchy is as shown in the figure below.


 Registers: The fastest access is to data held in processor registers; hence registers
form the top of the memory hierarchy. They offer the highest speed but a small
size, and the cost per bit is the highest.
 At the next level of the hierarchy, a small amount of memory can be
implemented directly on the processor chip.
 This memory is called the processor cache. It holds copies of recently accessed
data and instructions.
There are 2 levels of caches, viz. level 1 and level 2.
The level-1 cache is part of the processor, and the level-2 cache is placed
between the level-1 cache and main memory.
 The level-2 cache is implemented using SRAM chips.

 The next level in the memory hierarchy is the main memory. It is
implemented using dynamic memory components (DRAM). Main
memory is larger but slower than cache memory; its access time is
about ten times longer than that of the cache memory.

 The next level in the memory hierarchy is the secondary memory.
It holds huge amounts of data.

• Main memory is built with DRAM.

• SRAMs are used in cache memory, where speed is essential.

• Cache memory is of 2 types:

1) Primary/Processor Cache (Level 1 or L1 cache)

 It is located on the processor chip.

2) Secondary Cache (Level 2 or L2 cache)

 It is placed between the primary cache and the rest of the memory.

• Main memory is implemented using dynamic memory modules (SIMMs, DIMMs).

The access time for main memory is about 10 times longer than the
access time for the L1 cache.
Cache Memory
It is a fast-access memory located between the processor and main memory,

Processor → Cache Memory → Main Memory

as shown in the fig. It is designed to reduce the access time.

The cache memory holds copies of recently accessed data and instructions.
 The processor needs less time to read data and instructions from the
cache memory than from main memory.
 Hence, by placing a cache memory between the processor and main
memory, it is possible to enhance the performance of the system.
• The effectiveness of the cache mechanism is based on the property of

"Locality of Reference".

Locality of Reference
• Many instructions in localized areas of a program are executed repeatedly during
some period of execution.
• The remainder of the program is accessed relatively infrequently.

• There are 2 types of locality of reference:

1) Temporal
 Recently executed instructions are likely to be executed again soon.

 Eg – instructions in loops, nested loops, and frequently called functions.

2) Spatial
 Instructions in close proximity to a recently executed instruction are likely to be
executed soon (nearby instructions).


• If the active segments of a program are placed in cache memory, the total
execution time can be reduced.
• A cache block / cache line refers to a set of contiguous address locations of some
size.

• The cache memory stores a reasonable number of blocks at a given time.

• This number of blocks is small compared with the total number of blocks
available in main memory.
• The correspondence between main-memory blocks and cache-memory blocks is
specified by a mapping function.
• If the cache memory is full, one of the blocks must be removed to create space
for a new block; which block to remove is decided by the cache control hardware.

• The collection of rules for selecting the block to be removed is called the
Replacement Algorithm.
• The cache control circuit determines whether the requested word currently exists
in the cache.
• If the data is available, for a read operation the data is read from the cache.

• The write operation (writing to memory) is done in 2 ways:

1) Write-through protocol &
2) Write-back protocol.
Write-Through Protocol
 Here the cache location and the main-memory location are updated
simultaneously.
Write-Back Protocol
 This technique is to
→ update only the cache location &
→ mark the cache location with a flag bit called the Dirty/Modified Bit.

 The word in memory is updated later, when the marked block is
removed from the cache.
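The two policies can be contrasted in a few lines; this is a toy model, with dictionaries standing in for the cache and main memory.

```python
cache, memory = {}, {}
dirty = set()                      # addresses whose dirty bit is set

def write_through(addr, value):
    cache[addr] = value
    memory[addr] = value           # both updated simultaneously

def write_back(addr, value):
    cache[addr] = value
    dirty.add(addr)                # memory is NOT updated yet

def evict(addr):
    """On removal, a marked (dirty) block is written back to memory."""
    if addr in dirty:
        memory[addr] = cache[addr]
        dirty.discard(addr)
    del cache[addr]

write_through(1, 10)   # memory sees the new value at once
write_back(2, 20)      # only the cache holds 20 for now
evict(2)               # the marked block is flushed; memory sees 20
```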

During a Read operation
• If the requested word does not currently exist in the cache, a read miss
occurs.
• To reduce the read-miss penalty, the Load-through / Early-restart protocol is used.
Load-Through Protocol
 The block of words that contains the requested word is copied from

memory into the cache.

 The requested word is forwarded to the processor as soon as it is read,
without waiting for the entire block to be loaded.
Dr. Ajay V G, Dept. of CSE


During a Write operation
• If the requested word does not exist in the cache, a write miss occurs.

1) If the Write-Through Protocol is used, the information is

written directly into main memory.

2) If the Write-Back Protocol is used,

→ the block containing the addressed word is first brought

into the cache &
→ then the desired word in the cache is overwritten
with the new information.

Mapping functions
There are 3 techniques to map main-memory blocks into cache memory –
1. Direct mapping
2. Associative mapping
3. Set-associative mapping
DIRECT MAPPING
• The simplest way to determine the cache location in which to store a memory block

is the direct-mapping technique, as shown in the figure.

• Cache block number = (block j of main memory) mod 128.
• If there are 128 blocks in the cache, block j of main memory maps onto block
(j modulo 128) of the cache. When memory blocks 0, 128, & 256 are loaded into
the cache, each is stored in cache block 0. Similarly, memory blocks 1, 129, 257
are stored in cache block 1 (e.g. 1 mod 128 = 1, 129 mod 128 = 1).
• Contention may arise

1) even when the cache is not full,

2) because more than one memory block is mapped onto a given cache-block position.

• The contention is resolved by allowing the new block to overwrite the currently
resident block.

• The memory address determines the placement of a block in the cache.



main memory block has to be placed in particular


cache block number by using below formula
Cache block number=main memory block number
% number of blocks present in cache memory.

For eg: main memory block 129 has to be placed in


cache block number 1 by using above formula i.e
Cache block number=129 % 128 (consider
remainder that is 1).
Cache block number=258 % 128 (consider
remainder that is 2).
Main memory block 258 has to be placed in cache
block 2

 The main memory block is loaded into cache block by means of memory address. The main memory
address consists of 3 fields as shown in the figure.
 Each block consists of 16 words. Hence least significant 4 bits are used to select one of the 16
words.

 The 7bits of memory address are used to specify the position of the cache block, location. The most
significant 5 bits of the memory address are stored in the tag bits. The tag bits are used to map one of
25 = 32 blocks into cache block location (tag bit has value 0-31).
 The higher order 5 bits of memory address are compared with the tag bits associated with cache
location. If they match, then the desired word is in that block of the cache.
 If there is no match, then the block containing the required word must first be read from the main memory
and loaded into the cache. It is very easy to implement, but not flexible.
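The direct-mapped address split above can be sketched in Python. The field widths (4-bit word, 7-bit block, 5-bit tag) come from the figure; treating addresses as word addresses is an illustrative assumption:

```python
def split_direct(addr):
    """Split a 16-bit word address into (tag, block, word) fields for a
    direct-mapped cache of 128 blocks, each holding 16 words."""
    word = addr & 0xF           # least significant 4 bits: word within the block
    block = (addr >> 4) & 0x7F  # next 7 bits: cache-block position
    tag = (addr >> 11) & 0x1F   # most significant 5 bits: tag (0-31)
    return tag, block, word

# Main-memory block 129 starts at word address 129 * 16:
print(split_direct(129 * 16))  # (1, 1, 0): block 129 maps to cache block 1, tag 1
```

Note that blocks 1, 129, and 257 all produce block field 1 but different tags, which is exactly how the cache distinguishes them.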

2. Associative Mapping:
 It is also called an associative-mapped cache. It is much more flexible.
 In this technique, a main-memory block can be placed into any cache-block
position.
 In this case, 12 tag bits are required to identify a memory block when it is
resident in the cache.
 The Associative Mapping technique is illustrated as shown in the fig.

 In this technique 12 bits of address generated by the processor are compared with
the tag bits of each block of the cache to see if the desired block is present. This
is called as associative mapping technique.

3. Set Associative Mapping:


 It is a combination of the direct and associative mapping techniques.
 The blocks of the cache are divided into several groups; such groups are called
sets.
 Each set consists of a number of cache blocks. A memory block is loaded into one
of the blocks of a particular cache set.
 The main memory address consists of three fields, as shown in the figure.
 The lower 4 bits of the memory address are used to select one of the 16 words
in a block.
 A cache consists of 64 sets as shown in the figure. Hence 6 bit set field is used
to select a cache set from 64 sets.
 As there are 64 sets, the main memory is divided into groups of 64 blocks,
where each group is given a tag number.
 The most significant 6 bits of the memory address are compared with the tag
fields of the blocks in the selected set to determine whether the memory block is present.
 The following figure clearly describes the working principle of Set
Associative Mapping technique.
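The set-associative address split (6-bit tag, 6-bit set, 4-bit word, per the figure) can be sketched the same way; word addressing is again an illustrative assumption:

```python
def split_set_assoc(addr):
    """Split a 16-bit word address for a 64-set cache with 16-word blocks:
    low 4 bits select the word, next 6 bits the set, top 6 bits the tag."""
    word = addr & 0xF
    set_no = (addr >> 4) & 0x3F   # 6-bit set field: one of 64 sets
    tag = (addr >> 10) & 0x3F     # 6-bit tag, compared within the chosen set
    return tag, set_no, word

# Main-memory block 129 maps to set 129 % 64 = 1, with tag 129 // 64 = 2:
print(split_set_assoc(129 * 16))  # (2, 1, 0)
```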
• A cache that has k blocks per set is called a k-way set-associative cache.

• Each block contains a control bit called the valid bit.

• The valid bit indicates whether the block contains valid data.

• The dirty bit indicates whether the block has been modified during its cache
residency.
Valid bit = 0 - when power is initially applied to the system.

Valid bit = 1 - when the block is loaded from main memory for the first time.


• If a main-memory block is updated by a source (such as a DMA device) and that
block already exists in the cache, then the valid bit is cleared to 0.
• If the processor and DMA end up working with different copies of the same data,
the situation is called the Cache Coherence Problem.
• Advantages:
1) The contention problem of direct mapping is eased by having a few choices for block
placement.

2) The hardware cost is reduced by decreasing the size of the associative search.


Digital Design and Computer Organization (BCS302)
Module V

MODULE 5:
Basic Processing Unit and Pipelining

Basic Processing Unit: Some Fundamental Concepts: Register Transfers, Performing ALU
operations, fetching a word from Memory, Storing a word in memory. Execution of a Complete
Instruction.
Pipelining: Basic concepts, Role of Cache memory, Pipeline Performance.

SOME FUNDAMENTAL CONCEPTS

The processing unit that executes machine instructions and coordinates the
activities of the other units of the computer is called the Instruction Set
Processor (ISP), or simply the processor or Central Processing Unit (CPU).

The primary function of a processor is to execute the instructions stored
in memory. Instructions are fetched from successive memory locations
and executed in the processor, until a branch instruction occurs.

• To execute an instruction, the processor has to perform the following 3 steps:

1. Fetch the contents of the memory location pointed to by the PC. The content of this
location is the instruction to be executed; it is loaded into the IR. Symbolically:
IR ← [[PC]]
2. Increment the PC by 4:
PC ← [PC] + 4
3. Carry out the actions specified by the instruction (in the IR).

Steps 1 and 2 are referred to as the Fetch Phase.

Step 3 is referred to as the Execution Phase.
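The fetch phase can be sketched as a small Python function. The word-addressable memory dictionary and the register names are illustrative assumptions, not part of the architecture:

```python
def fetch_phase(memory, pc):
    """Fetch phase of instruction execution:
    step 1: IR <- [[PC]]  (load the instruction pointed to by PC)
    step 2: PC <- [PC] + 4 (advance PC to the next instruction)."""
    ir = memory[pc]
    pc = pc + 4
    return ir, pc

mem = {0: "Move (R1),R2", 4: "Add (R3),R1"}
ir, pc = fetch_phase(mem, 0)
print(ir, pc)  # Move (R1),R2 4
```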

SINGLE BUS ORGANIZATION

• Here the processor contain only a single bus for the movement of data,
address andinstructions.
• ALU and all the registers are interconnected via a Single Common Bus
(Figure 7.1).
• The data and address lines of the external memory bus are connected to
the internal processor bus via the MDR and MAR respectively.
(MDR = Memory Data Register, MAR = Memory Address Register)
• The MDR has 2 inputs and 2 outputs. Data may be loaded
→ into the MDR either from the memory bus (external) or
→ from the processor bus (internal).
• The MAR's input is connected to the internal bus; its output is connected to the
external bus. (Addresses are sent from the processor to memory only.)

• Instruction Decoder & Control Unit is responsible for


→ Decoding the instruction and issuing the control-signals to all the units
inside the processor.
→ implementing the actions specified by the instruction (loaded in the IR).
• Processor Registers - Registers R0 through R(n-1) are also called
General Purpose Registers.
The programmer can access these registers for general-purpose use.
• Temporary Registers - There are 3 temporary registers in the processor:
Y, Z, and Temp, used for temporary storage during program
execution. The programmer cannot access these 3 registers.
• In the ALU: 1) the A input gets its operand from the output of the multiplexer (MUX);
2) the B input gets its operand directly from the processor bus.
• There are 2 options provided for the A input of the ALU.
• The MUX is used to select one of the 2 inputs.
• The MUX selects either
→ the output of Y, or
→ the constant value 4 (which is used to increment the PC content).
• An instruction is executed by performing one or more of the following
operations:
Digital Design and Computer Organization (BCS302)
Module V

1) Transfer a word of data from one register to another or to the ALU.


2) Perform arithmetic or a logic operation and store the result in a register.
3) Fetch the contents of a given memory-location and load them into a register.
4) Store a word of data from a register into a given memory-location.
• Disadvantage: Only one data word can be transferred over the bus in a
clock cycle. Solution: Provide multiple internal paths, which allow
several data transfers to take place in parallel.
REGISTER TRANSFERS
• Instruction execution involves a sequence of steps in which data are
transferred from one register to another.
• For each register Ri, two control signals are used: Riin and Riout. These are
called Gating Signals.
• Riin = 1: the data on the bus are loaded into Ri.
• Riout = 1: the contents of Ri are placed on the bus.
• Riout = 0: the bus can be used for transferring data from other registers.
Suppose we wish to transfer the contents of register R1 to register R4. This
can be accomplished as follows:
1. Enable the output of register R1 by setting R1out to 1 (Figure 7.2).
This places the contents of R1 on the processor bus.
2. Enable the input of register R4 by setting R4in to 1. This loads data from
the processor bus into register R4.
• All operations and data transfers within the processor take place
within time- periods defined by the processor-clock.

• The control signals that govern a particular transfer are asserted at the
start of the clock cycle.
Input & Output Gating for one Register Bit
Implementation for one bit of register Ri (as shown in Fig. 7.3):
• All operations and data transfers are controlled by the processor clock.
• A 2-input multiplexer is used to select the data applied to the input
of an edge-triggered D flip-flop.
 Riin = 1: the multiplexer selects the data on the bus. This data is loaded into the
flip-flop at the rising edge of the clock.
 Riin = 0: the multiplexer feeds back the value currently stored in the flip-flop.
 The Q output of the flip-flop is connected to the bus via a tri-state gate.
 When Riout = 0, the gate's output is in the high-impedance state.
 When Riout = 1, the gate drives the bus to 0 or 1, depending on the value
of Q.
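The gating logic for one register bit can be modelled behaviourally, one clock edge at a time. This is a sketch of the MUX/flip-flop/tri-state behaviour, with high impedance represented as None (an assumption of the model, not of the hardware):

```python
def register_bit_edge(q, bus, ri_in):
    """One rising clock edge for a single register bit: a 2-input MUX feeds
    an edge-triggered D flip-flop. Riin=1 loads the bus; Riin=0 recirculates Q."""
    return bus if ri_in else q

def register_bit_output(q, ri_out):
    """Tri-state gate on the Q output: drives the bus when Riout=1,
    otherwise the output is high-impedance (modelled as None)."""
    return q if ri_out else None

q = register_bit_edge(0, 1, ri_in=1)  # load 1 from the bus
print(q, register_bit_output(q, 1))   # 1 1
```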

PERFORMING AN ARITHMETIC OR LOGIC OPERATION(refer fig:7.2)


 The ALU is a combinational circuit that has no internal storage.
• The ALU performs arithmetic and logic operations on the 2 operands
applied to its A and B inputs.
• The ALU gets the two operands, one from the MUX and the other from the bus. The
result is temporarily stored in register Z.
• Therefore, the sequence of operations for [R3] ← [R1] + [R2] is:
1) R1out, Yin
2) R2out, Select Y, Add, Zin
3) Zout, R3in
Instruction execution proceeds as follows:
Step 1 --> Contents from register R1 are loaded into register Y.
Step2 --> Contents from Y and from register R2 are applied to the A and
B inputs of ALU; Addition is performed & Result is stored in the Z register.
Step 3 --> The contents of Z register is stored in the R3 register.
• The signals are activated for the duration of the clock cycle
corresponding to that step. All other signals are inactive.
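The three control steps above can be sketched with one statement per clock cycle. The register dictionary is an illustrative stand-in for the datapath:

```python
def add_r1_r2_into_r3(regs):
    """Control-step sequence for [R3] <- [R1] + [R2] on the single-bus datapath."""
    regs['Y'] = regs['R1']              # step 1: R1out, Yin
    regs['Z'] = regs['Y'] + regs['R2']  # step 2: R2out, SelectY, Add, Zin
    regs['R3'] = regs['Z']              # step 3: Zout, R3in
    return regs

print(add_r1_r2_into_r3({'R1': 5, 'R2': 7})['R3'])  # 12
```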
FETCHING A WORD FROM MEMORY
 To fetch an instruction or data from memory, the processor has to specify
the address of the memory location where the information is stored, and
request a Read operation.
 The processor transfers the required address to the MAR. At the same time, the
processor issues a Read signal on the control lines of the memory bus.
 When the requested data are received from memory, they are stored in the MDR.
From the MDR, they are transferred to other registers in the processor.
The connections for register MDR are shown in Fig. 7.4.

CONTROL-SIGNALS OF MDR
• The MDR register has 4 control signals (Figure 7.4):
1) MDRin and MDRout control the connection to the internal processor data bus.
2) MDRinE and MDRoutE control the connection to the external memory data bus.
• Similarly, the MAR register has 2 control signals:
1) MARin controls the connection to the internal processor address bus.
2) MARout controls the connection to the memory address bus.

The response time of each memory access varies. To accommodate this, the
MFC signal is used (MFC = Memory Function Completed).
MFC = 1 indicates that the contents of the specified location have been read and are
available on the data lines of the memory bus.
• Consider the instruction Move (R1),R2, which loads register R2 from the memory
location pointed to by R1.

The sequence of steps is (Figure 7.5):

1) R1out, MARin, Read ; the desired address is loaded into MAR and a Read command is
issued.
2) MDRinE, WMFC ; load MDR from the memory bus and wait for the MFC response
from memory.
3) MDRout, R2in ; load R2 from MDR.
where WMFC is the control signal that causes the processor's control circuitry
to wait for the arrival of the MFC signal.
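These three steps can be sketched in Python. The memory is a dictionary that responds instantly, so the WMFC wait of step 2 is satisfied in the same statement (an assumption of this sketch):

```python
def move_indirect_to_r2(regs, memory):
    """Sketch of Move (R1),R2 on the single-bus datapath."""
    mar = regs['R1']   # step 1: R1out, MARin, Read
    mdr = memory[mar]  # step 2: MDRinE, WMFC (data arrives in MDR)
    regs['R2'] = mdr   # step 3: MDRout, R2in
    return regs

print(move_indirect_to_r2({'R1': 100}, {100: 42})['R2'])  # 42
```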

Storing a Word in Memory


• Consider the instruction Move R2,(R1). This requires the following sequence:
1) R1out, MARin ; the desired address is loaded into MAR.
2) R2out, MDRin, Write ; the data to be written are loaded into MDR and a Write
command is issued.
3) MDRoutE, WMFC ; the data are transferred from MDR to the memory location
pointed to by R1, and the processor waits for the MFC response.

EXECUTION OF A COMPLETE INSTRUCTION


• Consider the instruction Add (R3),R1 which adds the contents of a
memory-location pointed by R3 to register R1.
• Executing this instruction requires the following actions:
1) Fetch the instruction.
2) Fetch the first operand.
3) Perform the addition
4) Load the result into R1.
Fig. 7.6 gives the sequence of control steps required to perform these operations for the
single-bus architecture.

 Step 1 --> The instruction-fetch operation is initiated by loading the contents of the PC
into MAR and sending a Read request to memory. The Select signal is set to
Select4, which causes the MUX to select the constant 4. This value is added to the
operand at input B (the PC's content), and the result is stored in Z.
 Step 2 --> The updated value in Z is moved to the PC. This completes the PC-increment
operation, and the PC now points to the next instruction.
 Step 3 --> The fetched instruction is moved into the MDR and then to the IR. Steps 1
through 3 constitute the Fetch Phase.
 At the beginning of step 4, the instruction decoder interprets the contents of
the IR. This enables the control circuitry to activate the control signals for steps
4 through 7, which constitute the Execution Phase.
 Step 4 --> The contents of R3 are loaded into MAR and a memory Read signal is issued.
 Step 5 --> The contents of R1 are transferred to Y to prepare for the addition.
 Step 6 --> When the Read operation is completed, the memory operand is available in MDR.
 Step 7 --> The sum is stored in Z, then transferred to R1. The End signal causes a new
instruction-fetch cycle to begin by returning to step 1.
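The seven control steps can be sketched as one statement per step. As before, the memory dictionary responds immediately (so WMFC waits collapse into single statements), and the variable names for MAR, Y, Z, and IR are illustrative:

```python
def execute_add_indirect(memory, regs, pc):
    """Seven control steps for Add (R3),R1, in the style of Fig. 7.6."""
    mar = pc; z = pc + 4   # step 1: PCout, MARin, Read, Select4, Add, Zin
    pc = z                 # step 2: Zout, PCin, Yin, WMFC
    ir = memory[mar]       # step 3: MDRout, IRin (instruction now in IR)
    mar = regs['R3']       # step 4: R3out, MARin, Read
    y = regs['R1']         # step 5: R1out, Yin, WMFC
    z = y + memory[mar]    # step 6: MDRout, SelectY, Add, Zin
    regs['R1'] = z         # step 7: Zout, R1in, End
    return regs, pc

regs, pc = execute_add_indirect({0: 'Add (R3),R1', 200: 10}, {'R3': 200, 'R1': 5}, 0)
print(regs['R1'], pc)  # 15 4
```

In a real processor the decoder would act on the instruction loaded into IR in step 3; here the operation is hard-wired for the sake of the sketch.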

Pipelining:
Basic Concepts:
The speed of execution of programs is influenced by many factors.
 One way to improve performance is to use faster circuit technology to build the
processor and the main memory. Another possibility is to arrange the hardware so that
more than one operation can be performed at the same time. In this way, the number
of operations performed per second is increased even though the elapsed time needed
to perform any one operation is not changed.
 Pipelining is a particularly effective way of organizing concurrent activity in a
computer system.
 Pipelining is the technique of decomposing a sequential process into sub-operations,
with each sub-operation executed in a dedicated segment.
 Because the stages overlap in this way, pipelining is commonly likened to an
assembly-line operation.

Consider how the idea of pipelining can be used in a computer. The processor executes
a program by fetching and executing instructions, one after the other.
Let Fi and Ei refer to the fetch and execute steps for instruction Ii . Execution of a
program consists of a sequence of fetch and execute steps, as shown in Figure a.

Now consider a computer that has two separate hardware units, one for fetching
instructions and another for executing them, as shown in Figure b. The instruction
fetched by the fetch unit is deposited in an intermediate storage buffer, B1. This buffer
is needed to enable the execution unit to execute the instruction while the fetch unit is
fetching the next instruction. The results of execution are deposited in the destination
location specified by the instruction.
The computer is controlled by a clock, and each fetch or execute step is
completed in one clock cycle.
Operation of the computer proceeds as in Figure c.
In the first clock cycle, the fetch unit fetches an instruction I1 (step F1) and
stores it in buffer B1 at the end of the clock cycle.
In the second clock cycle, the instruction fetch unit proceeds with the fetch
operation for instruction I2 (step F2). Meanwhile, the execution unit performs the
operation specified by instruction I1, which is available to it in buffer B1 (step E1).
By the end of the second clock cycle, the execution of instruction I1 is completed
and instruction I2 is available. Instruction I2 is stored in B1, replacing I1, which is
no longer needed.
Step E2 is performed by the execution unit during the third clock cycle, while
instruction I3 is being fetched by the fetch unit. In this manner, both the fetch and
execute units are kept busy all the time. If the pattern in Figure 8.1c can be
sustained for a long time, the completion rate of instruction execution will be twice
that achievable by the sequential operation depicted in Figure a.
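The doubling claim can be checked with a small completion-time formula for an ideal, stall-free pipeline (an illustrative calculation, not from the text):

```python
def pipelined_cycles(n, stages):
    """Cycles to complete n instructions on an ideal `stages`-stage pipeline:
    the first instruction takes `stages` cycles, then one completes per cycle."""
    return stages + n - 1

sequential = 100 * 2                  # 100 instructions, F then E, no overlap
pipelined = pipelined_cycles(100, 2)  # two-stage fetch/execute pipeline
print(sequential, pipelined)          # 200 101 -> roughly twice the completion rate
```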

Idea of Pipelining in a computer

a pipelined processor may process each instruction in four steps, as follows:


F (Fetch): read the instruction from the memory.
D (Decode): decode the instruction and fetch the source operand(s).
E (Execute): perform the operation specified by the instruction.
W (Write): store the result in the destination location.

The sequence of events for this case is shown in Figure a. Four instructions are in
progress at any given time. This means that four distinct hardware units are
needed, as shown in Figure b. These units must be capable of performing their
tasks simultaneously and without interfering with one another. Information is
passed from one unit to the next through a storage buffer. As an instruction
progresses through the pipeline, all the information needed by the stages
Digital Design and Computer Organization (BCS302)

downstream must be passed along. For example, during clock cycle 4, the
information in the buffers is as follows:
 Buffer B1 holds instruction I3, which was fetched in cycle 3 and is being
decoded by the instruction-decoding unit.
 Buffer B2 holds both the source operands for instruction I2 and the
specification of the operation to be performed. This is the information
produced by the decoding hardware in cycle 3. The buffer also holds the
information needed for the write step of instruction I2 (step W2). Even though
it is not needed by stage E, this information must be passed on to stage W
in the following clock cycle to enable that stage to perform the required Write
operation.
 Buffer B3 holds the results produced by the execution unit and the
destination information for instruction I1.

Role of Cache Memory


Each stage in a pipeline is expected to complete its operation in one clock
cycle. Hence, the clock period should be sufficiently long to complete
the task being performed in any stage. If different units require different
amounts of time, the clock period must allow the longest task to be
completed. A unit that completes its task early is idle for the remainder of
the clock period. Hence, pipelining is most effective in improving
performance if the tasks being performed in different stages require about
the same amount of time. This consideration is particularly important for the
instruction fetch step, which is assigned one clock period in Figure a. The
clock cycle has to be equal to or greater than the time needed to complete a
fetch operation. However, the access time of the main memory may be as
much as ten times greater than the time needed to perform basic pipeline
stage operations inside the processor, such as adding two numbers. Thus, if
each instruction fetch required access to the main memory, pipelining
would be of little value.
The use of cache memories solves the memory access problem. In
particular, when a cache is included on the same chip as the processor,
access time to the cache is usually the same as the time needed to perform
other basic operations inside the processor. This makes it possible to divide
instruction fetching and processing into steps that are more or less equal in
duration. Each of these steps is performed by a different pipeline stage, and
the clock period is chosen to correspond to the longest one.
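The rule that the clock period must accommodate the slowest stage can be stated in one line; the stage times below are assumed values in nanoseconds, chosen only to illustrate the roughly ten-fold gap the text describes:

```python
def clock_period(stage_times):
    """The pipeline clock period must accommodate the slowest stage."""
    return max(stage_times)

# With an on-chip cache, the fetch stage takes about as long as the others;
# fetching from main memory would dominate the cycle time instead.
with_cache = clock_period([1, 1, 1, 1])
without_cache = clock_period([10, 1, 1, 1])
print(with_cache, without_cache)  # 1 10
```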

Pipeline Performance:
 The potential increase in performance resulting from pipelining is
proportional to the number of pipeline stages.
 However, this increase would be achieved only if pipelined operation as
depicted in Figure a could be sustained without interruption throughout
program execution.
 Unfortunately, this is not always the case.
 For example, a floating-point operation may take many clock cycles.
 For a variety of reasons, one of the pipeline stages may not be able to
complete its processing task for a given instruction in the time allotted. For
example, stage E in the four stage pipeline of Figure b is responsible for
arithmetic and logic operations, and one clock cycle is assigned for this task.
Although this may be sufficient for most operations, some operations, such
as divide, may require more time to complete. Figure shows an example in
which the operation specified in instruction I2 requires three cycles to
complete, from cycle 4 through cycle 6. Thus, in cycles 5 and 6, the Write
stage must be told to do nothing, because it has no data to work with.
Meanwhile, the information in buffer B2 must remain intact until the
Execute stage has completed its operation. This means that stage 2 and, in
turn, stage 1 are blocked from accepting new instructions because the
information in B1 cannot be overwritten. Thus, steps D4 and F5 must be
postponed as shown.

Eg: for Data Hazard

Pipelined operation in Figure 8.3 is said to have been stalled for two clock
cycles. Normal pipelined operation resumes in cycle 7. Any condition that
causes the pipeline to stall is called a hazard. We have just seen an example
of a data hazard.
1) A data hazard is any condition in which either the source or the
destination operands of an instruction are not available at the time
expected in the pipeline. As a result some operation has to be
delayed, and the pipeline stalls.
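The stall accounting for the divide example can be sketched as follows: each Execute cycle beyond the first adds one stall cycle to the ideal completion time (a simplified model that assumes the Execute stage is the only source of stalls):

```python
def total_cycles(exec_cycles, stages=4):
    """Completion time for a 4-stage pipeline where instruction i's Execute
    stage takes exec_cycles[i] cycles; each extra cycle stalls the pipeline."""
    ideal = stages + len(exec_cycles) - 1
    stalls = sum(c - 1 for c in exec_cycles)
    return ideal + stalls

# I2's Execute takes 3 cycles (cycles 4-6), stalling the pipeline for 2 cycles:
print(total_cycles([1, 3, 1, 1]))  # 9 instead of the ideal 7
```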

2) Control hazards (or instruction hazards): the pipeline may also be stalled
because of a delay in the availability of an instruction.
For example, this may be the result of a miss in the cache.
3) A third type of hazard is known as a structural hazard: this is the
situation when two instructions require the use of a given hardware
resource at the same time.

The effect of a cache miss on pipelined operation is illustrated in Figure.


Instruction I1 is fetched from the cache in cycle 1, and its execution proceeds
normally. However, the fetch operation for instruction I2, which is started in
cycle 2, results in a cache miss. The instruction fetch unit must now suspend
any further fetch requests and wait for I2 to arrive. We assume that
instruction I2 is received and loaded into buffer B1 at the end of cycle 5. The
pipeline resumes its normal operation at that point.

Eg: for Instruction Hazard


An alternative representation of the operation of a pipeline in the case of a
cache miss is shown in Figure b. This figure gives the function performed by
each pipeline stage in each clock cycle. Note that the Decode unit is idle in
cycles 3 through 5, the Execute unit is idle in cycles 4 through 6, and the
Write unit is idle in cycles 5 through 7. Such idle periods are called stalls.
They are also often referred to as bubbles in the pipeline.

If instructions and data reside in the same cache unit, only one instruction can
proceed and the other instruction is delayed. Many processors use separate
instruction and data caches to avoid this delay.
An example of a structural hazard is shown in the figure, which illustrates how the
load instruction
Load X(R1),R2
proceeds through the pipeline.
 The memory address, X+[R1], is computed in step E2 in cycle 4; the memory
access then takes place in cycle 5. The operand read from memory is written into
register R2 in cycle 6. This means that the execution step of this instruction
takes two clock cycles (cycles 4 and 5). It causes the pipeline to stall for one
cycle, because both instructions I2 and I3 require access to the register file
in cycle 6.
 Even though the instructions and their data are all available, the pipeline is
stalled because one hardware resource, the register file, cannot handle two
operations at once. If the register file had two input ports, that is, if it allowed
two simultaneous write operations, the pipeline would not be stalled. In
general, structural hazards are avoided by providing sufficient hardware
resources on the processor chip.

It is important to understand that pipelining does not result in
individual instructions being executed faster; rather, it is the
throughput that increases, where throughput is measured by the rate
at which instruction execution is completed.
Pipeline stalls cause degradation in pipeline performance.
We need to identify all the hazards that may cause the pipeline to stall
and find ways to minimize their impact.
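The throughput-versus-latency distinction can be made concrete with a quick calculation (illustrative numbers: an ideal 4-stage pipeline and 1000 instructions):

```python
stages, n = 4, 1000
latency = stages            # one instruction still takes 4 cycles start to finish
non_pipelined = n * stages  # one instruction at a time: 4000 cycles
pipelined = stages + n - 1  # ideal overlap: 1003 cycles
print(latency, non_pipelined, pipelined)  # 4 4000 1003
# Latency is unchanged; throughput improves by nearly a factor of `stages`.
```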
