Computer I/O Systems Explained
MODULE 4
INPUT/OUTPUT ORGANIZATION
These I/O devices exchange information in varied formats, with different word lengths and different transfer speeds, yet all are connected to the same system and exchange information with the same computer. The computer must therefore be capable of handling this wide variety of devices.
ACCESSING I/O-DEVICES
A single bus-structure can be used for connecting I/O-devices to a computer. The
simple arrangement of connecting set of I/O devices to memory and processor by
means of system bus is as shown in the figure. Such an arrangement is called as Single
Bus Organization.
Digital Design &Computer Organization(BCS302) Module -4
The single bus organization consists of
o Memory
o Processor
o System bus
o I/O device
The system bus consists of 3 types of buses: the address bus, the data bus, and the control bus.
The system bus enables all the devices connected to it to take part in data-transfer operations. It establishes data communication between the I/O devices and the processor.
Each I/O device is assigned a unique set of addresses.
When the processor places an address on the address-lines, the intended device responds to the command.
The processor requests either a read or a write operation, and the requested data are transferred over the data-lines.
In memory-mapped I/O, the same address space is used for both memory and the I/O interface, and there is only one set of read and write signals. All memory-related instructions can be used for data transfer between the I/O devices and the processor.
In the case of memory-mapped I/O, an input operation can be implemented as:

MOVE DATAIN, R0    (source: DATAIN, destination: R0)
I/O INTERFACE
The hardware arrangement for connecting an input device to the system bus is as shown in the figure. This hardware arrangement is called an I/O interface. The I/O interface consists of 3 functional blocks, namely:
1) Address Decoder:
o Its function is to decode the address, in order to recognize the input device whose address is available on the unidirectional address bus.
o The input device is recognized first, and then the control and data registers become active.
o The unidirectional address bus of the system bus is connected to the input of the address decoder, as shown in the figure.
2) Control Circuit:
o The control bus of the system bus is connected to the control circuit as shown in the figure.
o The processor sends commands to the I/O system through the control bus.
o It controls the read write operations with respect to I/O device.
3) Data Register:
o The data bus carries data between the I/O devices and the processor. The data bus is connected to the data/status registers.
o The data register stores the data read from an input device, or the data to be written to an output device. There are 2 types:
DATAIN - input buffer associated with the keyboard.
DATAOUT - output data buffer of a display/printer.
DATAIN register: part of the input device; it stores the ASCII characters read from the keyboard.
DATAOUT register: part of the output device; it stores the ASCII characters to be displayed on the output device.
The STATUS register stores the working status of the I/O devices:
SIN flag – This flag is set to 1 when the DATAIN buffer contains data from the keyboard. The flag is set to 0 after the data is passed from the DATAIN buffer to the processor.
SOUT flag – This flag is set to 1 when the DATAOUT buffer is empty and the processor can write new data into it. The flag is set to 0 while the DATAOUT buffer holds data still to be displayed.
KIRQ (Keyboard Interrupt Request) – By setting this flag to 1, the keyboard requests the processor's service and an interrupt is sent to the processor. It is used along with the SIN flag.
DIRQ (Display Interrupt Request) – The output device requests the processor to obtain its service for an output operation by setting this flag to 1.
Control registers:
KEN (Keyboard Enable) – enables the keyboard for input operations.
DEN (Display Enable) – enables the output device for output operations.
Program Controlled I/O
In this technique CPU is responsible for executing data from the memory for
output and storing data in memory for executing of Programmed I/O
Drawback of the Programmed I/O: was that the CPU has to monitor the units
all the times when the program is executing. Thus, the CPU stays in a
program loop until the I/O unit indicates that it is ready for data transfer.
This is a time-consuming process and the CPU time is wasted a lot in keeping
an eye to the executing of program.
The program checks the status of the I/O register and then reads or displays the data. Here the I/O operation is controlled by the program.
WAITK   TestBit #0, STATUS     (checks the SIN flag)
        Branch=0 WAITK         (if SIN = 0, keep waiting)
        Move DATAIN, R0        (read character from DATAIN into R0)
This code checks the SIN flag; if it is 0 (i.e., no character in the DATAIN buffer), control branches back to the WAITK label. The loop continues until SIN becomes 1, at which point the data is moved from DATAIN to register R0. Thus the program continuously checks for the input operation.
Interrupt
It is an event which suspends the execution of one program and begins the
execution of another program.
In program-controlled I/O, the program must continuously check whether the I/O device is free, and this continuous checking wastes processor execution time. It can be avoided by having the I/O device send an 'interrupt' to the processor when it is free.
The interrupt invokes a subroutine called Interrupt Service Routine (ISR),
which resolves the cause of interrupt.
The occurrence of interrupt causes the processor to transfer the execution
control from user program to ISR.
[Figure: transfer of control between Program 1 and the ISR]
The following steps take place when the interrupt-related instruction is executed:
It decrements the content of SP by 4 and saves the return address (the content of the PC) into the stack memory location pointed to by SP. It then loads the starting address of the ISR into the PC.
The following steps take place when the 'return' instruction is executed in the ISR:
It transfers execution control from the ISR back to the user program.
It retrieves the content of the stack memory location whose address is stored in SP into the PC.
After retrieving the return address from the stack into the PC, it increments the content of SP by 4.
Interrupt latency (interrupt response time) is the delay between receiving an interrupt request and the start of execution of the ISR. Generally, a long interrupt latency is unacceptable.
INTERRUPT HARDWARE
The external (I/O) device sends an interrupt request to the processor by activating a bus line called the interrupt-request line.
All I/O devices use the same single interrupt-request line.
One end of this interrupt-request line is connected to the power supply through a resistor (pull-up).
The other end of the interrupt-request line is connected to the INTR (Interrupt Request) input of the processor, as shown in the figure.
When a switch is closed, the voltage on the interrupt-request line drops to zero because the switch is grounded; hence the line signal INTR̅ = 0 and the request INTR = 1.
The signal on the interrupt-request line is the logical OR of the requests from the several I/O devices. Therefore, INTR = INTR1 + INTR2 + … + INTRn.
The arrival of an interrupt request from an external device, or from within a process, causes the suspension of the on-going execution and starts the execution of another program.
An interrupt may arrive at any time and it alters the sequence of execution. Hence the interrupt to be serviced must be selected carefully.
All computers can enable and disable interrupts as desired.
While an interrupt is being serviced, other interrupts should not be invoked; different systems achieve this in different ways.
An infinite loop can occur if the INTR signal remains active and causes successive interruptions.
1) The device raises an interrupt request.
2) The processor interrupts the program currently being executed.
3) Interrupts are disabled by changing the control bits in the processor status register (PS).
4) The device is informed that its request has been recognized, and in response it deactivates the interrupt-request signal.
5) The action requested by the interrupt is performed by the interrupt-service routine.
6) Interrupts are enabled and execution of the interrupted program is resumed.
VECTORED INTERRUPT
• A device requesting an interrupt identifies itself by sending a special code to the processor over the bus.
• Then, the processor starts executing the ISR.
• The processor activates the interrupt-acknowledge (INTA) line.
• Then, the I/O device responds by sending its interrupt-vector code and turning off the INTR signal.
• The interrupt vector also includes a new value for the processor status register.
INTERRUPT NESTING
• A multiple-priority scheme is implemented by using separate INTR & INTA lines for
each device
• Each INTR line is assigned a different priority-level as shown in Figure.
[Figure: processor receiving interrupt-request lines INTR1 … INTRp and returning acknowledge lines INTA1 … INTAp through a priority arbitration circuit]
• Interrupt requests received over these lines are sent to a priority arbitration circuit
in the processor.
• If the interrupt request has a higher priority level than the priority of the processor,
then the request is accepted.
• Priority-level of processor is the priority of program that is currently being executed.
• Processor accepts interrupts only from devices that have higher-priority than its
own.
• At the time of execution of ISR for some device, priority of processor is raised to
that of the device.
• Thus, interrupts from devices at the same level of priority or lower are disabled.
Privileged Instruction
• The processor's priority is encoded in a few bits of the PS word (PS = Processor Status), and it can be changed only by a privileged instruction executed in supervisor mode.
Privileged Exception
• A user program cannot change the processor's priority; an attempt to execute a privileged instruction while in user mode raises a privileged exception.
The interrupt-acknowledge line is connected in a daisy-chain fashion as shown in the figure. The INTA signal is received first by device 1. Device 1 blocks the propagation of the INTA signal to device 2 when it needs the processor's service; when it does not require service, device 1 passes the INTA signal on to the next device.
• In this technique, devices are organized in groups, and each group is connected to the processor at a different priority level.
• Within a group, devices are connected in a daisy-chain fashion as shown in the figure.
Word-Count register:
The format of the word-count register is as shown in the figure. It is used to store the number of words to be transferred between main memory and the external device.
a) DONE bit:
The DMA controller sets this bit to 1 when it completes the direct data transfer between main memory and the external device; this is how completion is reported to the CPU.
b) R/W (Read or Write):
This bit differentiates between a memory-read and a memory-write operation:
R/W = 1 for a read operation, and
R/W = 0 for a write operation.
When this bit is set to 1, the DMA controller transfers one block of data from the external device to main memory.
When this bit is set to 0, the DMA controller transfers one block of data from main memory to the external device.
c) IE (Interrupt Enable) bit:
When this bit is set to 1, the DMA controller raises an interrupt after the data transfer is completed, requesting the CPU to set up the transfer of a new block of data from source to destination.
The DMA controller connects two external devices, disk 1 and disk 2, to the system bus as shown in the figure above. The DMA controller can also interconnect high-speed network devices to the system bus, as shown in the figure.
Let us consider a direct data-transfer operation between main memory and disk 1, carried out by the DMA controller without CPU involvement, as indicated by the dotted lines in the figure. To establish this transfer, the DMA controller obtains 3 parameters from the processor, namely:
1) the starting address of the memory block,
2) the number of words to be transferred, and
3) the type of operation (read or write).
Normally the CPU generates the memory cycles that perform read and write operations. The DMA controller "steals" memory cycles from the CPU to perform its own reads and writes; this approach is called cycle stealing.
Alternatively, the DMA controller may be given exclusive use of the bus to transfer a complete block of data between an external device and main memory. This technique is called the burst mode of operation.
BUS ARBITRATION
Any device that initiates a data-transfer operation on the bus at any instant of time is called a bus master.
When the bus mastership is transferred from one device to another, the next device must be ready to take over.
Bus mastership is transferred from one device to another on the basis of a priority system. There are two types of bus-arbitration techniques:
The following steps transfer the bus mastership from the CPU to one of the DMA controllers:
The DMA controller requests the bus mastership by activating the BR (Bus Request) signal.
In response, the CPU transfers the bus mastership to the requesting device (DMA controller 1) by asserting BG (Bus Grant).
Once it has obtained the bus mastership from the CPU, DMA controller 1 blocks the propagation of the bus-grant signal to the next device.
The BG signal is connected from DMA controller 1 to DMA controller 2 in a daisy-chain fashion, as shown in the figure.
If DMA controller 1 has not issued a BR request, it passes the bus mastership on to DMA controller 2 by letting the bus-grant signal through.
When DMA controller 1 receives the bus-grant signal and needs the bus, it blocks the signal from passing to DMA controller 2 and asserts the BBSY (bus busy) signal. While BBSY is 1, no other device connected to the system bus may obtain the bus mastership from the CPU.
Registers: The fastest access is to data held in processor registers, so registers form the top level of the memory hierarchy. They offer the most speed and the smallest size, but their cost per bit is also the highest.
At the next level of the hierarchy, a small amount of memory can be implemented directly on the processor chip. This memory is called the processor cache; it holds copies of recently accessed data and instructions.
There are 2 levels of cache, level-1 and level-2: the level-1 cache is part of the processor, and the level-2 cache is placed between the level-1 cache and main memory.
The level-2 cache is implemented using SRAM chips.
The access time of main memory is about 10 times longer than the access time of the L1 cache.
Cache Memory
It is a fast-access memory located between the processor and main memory. Its effectiveness rests on a property of programs called the "locality of reference".
Locality of Reference
1) Temporal
• Many instructions in localized areas of a program are executed repeatedly during some period of execution, while the remainder of the program is accessed relatively infrequently.
2) Spatial
• Instructions in close proximity to a recently executed instruction are also likely to be executed soon.
By exploiting locality, the average memory-access time can be reduced.
• A cache block (cache line) refers to a set of contiguous address locations of some size.
• The number of blocks in the cache is small compared to the total number of blocks in main memory.
• The correspondence between main-memory blocks and cache blocks is specified by a mapping function.
• If the cache is full, one of its blocks must be removed to create space for the new block.
When a write occurs, the cache and main memory are kept consistent by one of two protocols:
1) Write-through protocol and
2) Write-back protocol.
Write-Through Protocol
Here the cache location and the main-memory location are updated simultaneously.
Write-Back Protocol
This technique is to
→ update only the cache location, and
→ mark the cache location with a flag bit called the dirty/modified bit; main memory is updated later, when the block is removed from the cache.
During a Read operation
• If the requested word does not currently exist in the cache, a read miss occurs.
• To reduce the read-miss penalty, the load-through (early restart) protocol is used.
Load-Through Protocol
The block of words that contains the requested word is copied from main memory into the cache, and the requested word is forwarded to the processor as soon as it is read, without waiting for the whole block.
During a Write operation
• If the requested word does not exist in the cache, a write miss occurs.
Mapping functions
There are 3 techniques to map main memory blocks into cache memory –
1. Direct mapped cache
2. Associative Mapping
3. Set-Associative Mapping
DIRECT MAPPING
• This is the simplest way to determine the cache location in which to store a memory block:
1) Block j of main memory maps onto cache block (j modulo 128), for a cache of 128 blocks.
2) More than one memory block is therefore mapped onto a given cache-block position.
• The contention is resolved by allowing the new block to overwrite the currently resident block.
A main-memory block is loaded into a cache block according to the memory address. The main-memory address consists of 3 fields, as shown in the figure.
Each block consists of 16 words, so the least significant 4 bits select one of the 16 words. The next 7 bits of the memory address specify the position of the cache block, and the most significant 5 bits are stored as the tag bits. The tag bits map one of 2^5 = 32 memory blocks onto each cache-block position (the tag has a value 0-31).
The higher-order 5 bits of the memory address are compared with the tag bits associated with the cache location. If they match, the desired word is in that block of the cache. If there is no match, the block containing the required word must first be read from main memory and loaded into the cache. Direct mapping is very easy to implement, but not flexible.
2. Associative Mapping:
This is also called an associative-mapped cache; it is much more flexible. In this technique, a main-memory block can be placed into any cache-block position.
In this case, 12 tag bits are required to identify a memory block when it is resident in the cache. The associative-mapping technique is illustrated in the figure: the 12 tag bits of the address generated by the processor are compared with the tag bits of every block of the cache to see whether the desired block is present.
• The valid bit indicates whether the block contains valid (up-to-date) data.
• The dirty bit indicates whether the block has been modified during its cache residency.
Valid bit = 0 when power is initially applied to the system.
If new data is loaded into a main-memory block (for example by a DMA transfer) and that block already exists in the cache, its valid bit is cleared to 0.
• Keeping the processor's cached copy and the DMA-visible main-memory copy of data consistent is known as the cache-coherence problem.
Set-Associative Mapping
• Advantages:
1) The contention problem of direct mapping is solved by having a few choices for block placement.
MODULE 5:
Basic Processing Unit and Pipelining
Basic Processing Unit: Some Fundamental Concepts: Register Transfers, Performing ALU
operations, fetching a word from Memory, Storing a word in memory. Execution of a Complete
Instruction.
Pipelining: Basic concepts, Role of Cache memory, Pipeline Performance.
• Here the processor contains only a single bus for the movement of data, addresses, and instructions.
• The ALU and all the registers are interconnected via a Single Common Bus (Figure 7.1).
• The data and address lines of the external memory bus are connected to the internal processor bus via MDR and MAR respectively
(MDR -> Memory Data Register, MAR -> Memory Address Register).
• MDR has 2 inputs and 2 outputs. Data may be loaded
→ into MDR either from the memory bus (external) or
→ from the processor bus (internal).
• MAR's input is connected to the internal bus; MAR's output is connected to the external bus (an address is sent from processor to memory only).
Digital Design and Computer Organization (BCS302)
Module V
CONTROL-SIGNALS OF MDR
• The MDR register has 4 control signals (Figure 7.4):
1) MDRin & MDRout control the connection to the internal processor data bus, and
2) MDRinE & MDRoutE control the connection to the external memory data bus.
• Similarly, MAR register has 2 control-signals.
1) MARin: controls the connection to the internal processor address bus &
2) MARout: controls the connection to the memory address bus.
Pipelining:
Basic Concepts:
The speed of execution of programs is influenced by many factors.
One way to improve performance is to use faster circuit technology to build the
processor and the main memory. Another possibility is to arrange the hardware so that
more than one operation can be performed at the same time. In this way, the number
of operations performed per second is increased even though the elapsed time needed
to perform any one operation is not changed.
Pipelining is a particularly effective way of organizing concurrent activity in a
computer system.
Pipelining is the technique of decomposing a sequential process into sub-operations, with each sub-operation executed in a dedicated segment. Pipelining is commonly likened to an assembly-line operation.
Consider how the idea of pipelining can be used in a computer. The processor executes
a program by fetching and executing instructions, one after the other.
Let Fi and Ei refer to the fetch and execute steps for instruction Ii . Execution of a
program consists of a sequence of fetch and execute steps, as shown in Figure a.
Now consider a computer that has two separate hardware units, one for fetching
instructions and another for executing them, as shown in Figure b. The instruction
fetched by the fetch unit is deposited in an intermediate storage buffer, B1. This buffer
is needed to enable the execution unit to execute the instruction while the fetch unit is
fetching the next instruction. The results of execution are deposited in the destination
location specified by the instruction.
The computer is controlled by a clock whose period is chosen so that a fetch or execute step can be completed in one clock cycle.
Operation of the computer proceeds as in Figure 8.1c.
In the first clock cycle, the fetch unit fetches an instruction I1 (step F1) and
stores it in buffer B1 at the end of the clock cycle.
In the second clock cycle, the instruction fetch unit proceeds with the fetch
operation for instruction I2 (step F2). Meanwhile, the execution unit performs the
operation specified by instruction I1, which is available to it in buffer B1 (step E1).
By the end of the second clock cycle, the execution of instruction I1 is completed
and instruction I2 is available. Instruction I2 is stored in B1, replacing I1, which is
no longer needed.
Step E2 is performed by the execution unit during the third clock cycle, while
instruction I3 is being fetched by the fetch unit. In this manner, both the fetch and
execute units are kept busy all the time. If the pattern in Figure 8.1c can be
sustained for a long time, the completion rate of instruction execution will be twice
that achievable by the sequential operation depicted in Figure a.
The sequence of events for this case is shown in Figure a. Four instructions are in
progress at any given time. This means that four distinct hardware units are
needed, as shown in Figure b. These units must be capable of performing their
tasks simultaneously and without interfering with one another. Information is
passed from one unit to the next through a storage buffer. As an instruction
progresses through the pipeline, all the information needed by the stages
downstream must be passed along. For example, during clock cycle 4, the
information in the buffers is as follows:
Buffer B1 holds instruction I3, which was fetched in cycle 3 and is being
decoded by the instruction-decoding unit.
Buffer B2 holds both the source operands for instruction I2 and the
specification of the operation to be performed. This is the information
produced by the decoding hardware in cycle 3. The buffer also holds the
information needed for the write step of instruction I2 (step W2). Even though
it is not needed by stage E, this information must be passed on to stage W
in the following clock cycle to enable that stage to perform the required Write
operation.
Buffer B3 holds the results produced by the execution unit and the
destination information for instruction I1.
Pipeline Performance:
The potential increase in performance resulting from pipelining is
proportional to the number of pipeline stages.
However, this increase would be achieved only if pipelined operation as
depicted in Figure a could be sustained without interruption throughout
program execution.
Unfortunately, this cannot always be sustained. For example, a floating-point operation may take many clock cycles to complete.
For a variety of reasons, one of the pipeline stages may not be able to
complete its processing task for a given instruction in the time allotted. For
example, stage E in the four stage pipeline of Figure b is responsible for
arithmetic and logic operations, and one clock cycle is assigned for this task.
Although this may be sufficient for most operations, some operations, such
as divide, may require more time to complete. Figure shows an example in
which the operation specified in instruction I2 requires three cycles to
complete, from cycle 4 through cycle 6. Thus, in cycles 5 and 6, the Write
stage must be told to do nothing, because it has no data to work with.
Meanwhile, the information in buffer B2 must remain intact until the
Execute stage has completed its operation. This means that stage 2 and, in
turn, stage 1 are blocked from accepting new instructions because the
information in B1 cannot be overwritten. Thus, steps D4 and F5 must be
postponed as shown.
Pipelined operation in Figure 8.3 is said to have been stalled for two clock
cycles. Normal pipelined operation resumes in cycle 7. Any condition that
causes the pipeline to stall is called a hazard. We have just seen an example
of a data hazard.
1) A data hazard is any condition in which either the source or the
destination operands of an instruction are not available at the time
expected in the pipeline. As a result some operation has to be
delayed, and the pipeline stalls.
If instructions and data reside in the same cache unit, only one instruction can
proceed and the other instruction is delayed. Many processors use separate
instruction and data caches to avoid this delay.
An example of a structural hazard is shown in the figure, which shows how the load instruction

Load X(R1), R2

proceeds through the pipeline. The memory address, X + [R1], is computed in step E2 in cycle 4; then the memory access takes place in cycle 5. The operand read from memory is written into
register R2 in cycle 6. This means that the execution step of this instruction
takes two clock cycles (cycles 4 and 5). It causes the pipeline to stall for one
cycle, because both instructions I2 and I3 require access to the register file
in cycle 6.
Even though the instructions and their data are all available, the pipeline is
stalled because one hardware resource, the register file, cannot handle two
operations at once. If the register file had two input ports, that is, if it allowed
two simultaneous write operations, the pipeline would not be stalled. In
general, structural hazards are avoided by providing sufficient hardware
resources on the processor chip.