UNIT – V (I/O Interface)
The I/O subsystem of a computer provides an efficient mode of communication between the
central system and the outside environment. It handles all the input-output operations of the computer system. Input or output devices connected to the computer are called peripheral devices.
The Input-Output Interface is the method by which information is transferred between the internal storage devices, i.e. memory, and the external peripheral devices. A peripheral device is one that provides input to or accepts output from the computer; such devices are also called input-output devices. For example, a keyboard and a mouse, which provide input to the computer, are called input devices, while a monitor and a printer, which present output from the computer, are called output devices. Some peripheral devices, such as external hard drives, can provide both input and output.
Input-Output Interface :
In a microcomputer-based system, peripheral devices require special communication links for interfacing with the CPU. These links are needed to resolve the differences between the peripheral devices and the CPU.
The major differences are as follows:
1. Peripheral devices are electromagnetic and electromechanical in nature, whereas the CPU is electronic. There is therefore a large difference in the mode of operation of peripheral devices and the CPU.
2. A synchronization mechanism is needed because the data transfer rate of peripheral devices is slower than that of the CPU.
3. The data codes and formats of peripheral devices differ from those used in the CPU and memory.
4. The operating modes of peripheral devices differ from one another, and each must be controlled so as not to disturb the operation of the other peripheral devices connected to the CPU.
Functions of Input-Output Interface:
1. It synchronizes the operating speed of the CPU with that of the input-output devices.
2. It selects the input-output device appropriate to the interpretation of the input-output signal.
3. It provides control and timing signals.
4. It performs data buffering through the data bus.
5. It provides error detection.
6. It converts serial data into parallel data and vice versa.
7. It converts digital data into analog signals and vice versa.
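Function 6 above, serial-to-parallel conversion, can be sketched in Python. This is only an illustration of the idea; the function names and the MSB-first bit order are assumptions, not part of any real interface chip:

```python
def serial_to_parallel(bits):
    """Assemble a list of bits (MSB first) arriving one at a time
    on a serial line into a single integer word."""
    word = 0
    for bit in bits:
        word = (word << 1) | bit
    return word

def parallel_to_serial(word, width=8):
    """Split an integer word into a list of bits, MSB first,
    as they would be shifted out on a serial line."""
    return [(word >> i) & 1 for i in range(width - 1, -1, -1)]

# A byte arriving bit-by-bit is assembled into a parallel word:
bits = [0, 1, 0, 0, 0, 0, 0, 1]        # ASCII 'A' = 0x41
assert serial_to_parallel(bits) == 0x41
assert parallel_to_serial(0x41) == bits
```

Real interfaces do this with shift registers in hardware; the sketch only mirrors the bit ordering logic.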
Mode of Transfer:
The binary information that is received from an external device is usually stored in the memory
unit. The information that is transferred from the CPU to the external device is originated from
the memory unit. CPU merely processes the information but the source and target is always the
memory unit. Data transfer between CPU and the I/O devices may be done in different modes.
Data transfer to and from the peripherals may be done in any of the three possible ways
1. Programmed I/O.
2. Interrupt- initiated I/O.
3. Direct memory access (DMA).
Programmed I/O: This mode results from I/O instructions written in the computer program. Each data-item transfer is initiated by an instruction in the program. Usually the transfer is between a CPU register and the peripheral, with memory as the ultimate source or destination. In this case the CPU must constantly monitor the peripheral device.
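The constant monitoring that programmed I/O requires can be illustrated with a small Python sketch; the `Device` class and its methods are hypothetical stand-ins for a status flag and a data register:

```python
class Device:
    """Hypothetical peripheral with a status flag and a data register."""
    def __init__(self, data):
        self.buffer = list(data)
    def ready(self):               # status bit the CPU must poll
        return bool(self.buffer)
    def read(self):                # data register
        return self.buffer.pop(0)

def programmed_io_read(device, count):
    """CPU busy-waits on the status flag before every single transfer."""
    received = []
    while len(received) < count:
        while not device.ready():  # constant monitoring by the CPU
            pass                   # CPU cycles are wasted here
        received.append(device.read())
    return received

assert programmed_io_read(Device([10, 20, 30]), 3) == [10, 20, 30]
```

The inner polling loop is exactly the wasted CPU time that interrupt-initiated I/O, described next, avoids.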
Interrupt-initiated I/O: In the programmed I/O case above, the CPU is kept busy unnecessarily. This situation can be avoided by using an interrupt-driven method of data transfer: using the interrupt facility, special commands inform the interface to issue an interrupt request signal whenever data is available from a device. In the meantime the CPU can proceed with the execution of another program while the interface keeps monitoring the device. Whenever the interface determines that the device is ready for data transfer, it issues an interrupt request signal to the computer. Upon detecting the external interrupt signal, the CPU momentarily stops the task it is performing, branches to a service program to process the I/O transfer, and then returns to the task it was originally performing.
The I/O transfer rate is limited by the speed with which the processor can test and service a
device.
The processor is tied up in managing an I/O transfer; a number of instructions must be
executed for each I/O transfer.
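The division of labour described above, where the interface monitors the device while the CPU does other work, can be sketched in Python. The class and method names are illustrative; real interrupts are hardware signals, not method calls:

```python
class CPU:
    def __init__(self):
        self.received = []
        self.work_done = 0
    def service_interrupt(self, data):
        # ISR: suspend the current task, process the transfer, resume.
        self.received.append(data)
    def run(self):
        self.work_done += 1        # CPU is free to do other work

class Interface:
    """Monitors the device and raises an interrupt when data is ready."""
    def __init__(self, cpu):
        self.cpu = cpu
    def device_ready(self, data):
        self.cpu.service_interrupt(data)   # interrupt request signal

cpu = CPU()
iface = Interface(cpu)
cpu.run()                     # CPU executes another program...
iface.device_ready("byte0")   # ...until the interface interrupts it
cpu.run()
iface.device_ready("byte1")
assert cpu.received == ["byte0", "byte1"]
assert cpu.work_done == 2     # useful work got done between transfers
```

Unlike the polling sketch, the CPU here never waits on the device; it is notified only when data is actually available.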
Terms:
Hardware Interrupts: Interrupts delivered through hardware pins.
Software Interrupts: These are the instructions used in the program
whenever the required functionality is needed.
Vectored interrupts: These interrupts are associated with the static vector
address.
Non-vectored interrupts: These interrupts are associated with the dynamic
vector address.
Maskable Interrupts: These interrupts can be enabled or disabled explicitly.
Non-maskable interrupts: These are always in the enabled state; we cannot
disable them.
External interrupts: Generated by external devices such as I/O.
Internal interrupts: These are generated by the internal components of the
processor, such as a power failure, an illegal instruction, a temperature
sensor, etc.
Synchronous interrupts: These interrupts are controlled by a fixed time
interval; all interval-timer interrupts are called synchronous interrupts.
Asynchronous interrupts: These are initiated based on events such as the
completion of previous instructions; all external interrupts are called
asynchronous interrupts.
Direct Memory Access: The data transfer between a fast storage medium such as a magnetic disk and the memory unit is limited by the speed of the CPU. Thus we can allow the peripherals to communicate directly with the memory using the memory buses, removing the intervention of the CPU. This data transfer technique is known as DMA, or direct memory access. During DMA the CPU is idle and has no control over the memory buses. The DMA controller takes over the buses to manage the transfer directly between the I/O devices and the memory unit.
Working of DMA Controller
The DMA controller has three registers, as follows.
Address register – It contains the address to specify the desired location in memory.
Word count register – It contains the number of words to be transferred.
Control register – It specifies the transfer mode.
Note: All registers in the DMA appear to the CPU as I/O interface registers. Therefore, the
CPU can both read and write into the DMA registers under program control via the data bus.
The figure below shows the block diagram of the DMA controller. The unit communicates with the CPU through the data bus and control lines. The CPU selects a register within the DMA through the address bus by enabling the DS (DMA select) and RS (register select) inputs. RD (read) and WR (write) are bidirectional inputs. When the BG (bus grant) input is 0, the CPU can communicate with the DMA registers. When the BG input is 1, the CPU has relinquished the buses and the DMA can communicate directly with the memory.
Working Diagram of DMA Controller
Explanation: The CPU initializes the DMA by sending the following information through the data bus:
1. The starting address of the memory block where the data is available (for a read) or where the data is to be stored (for a write).
2. The word count, i.e. the number of words in the memory block to be read or written.
3. Control bits to specify the mode of transfer, such as read or write.
4. A control bit to start the DMA transfer.
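As a rough sketch (not a model of any real DMA chip), the three registers and a block transfer can be simulated in Python; the register and method names are illustrative:

```python
class DMAController:
    """Toy DMA controller with the three registers described above."""
    def __init__(self):
        self.address = 0      # address register
        self.word_count = 0   # word count register
        self.control = None   # control register: 'read' or 'write'

    def program(self, address, word_count, control):
        # The CPU writes the DMA registers over the data bus.
        self.address, self.word_count, self.control = address, word_count, control

    def transfer(self, memory, device):
        # With bus grant asserted, the DMA moves words without the CPU.
        while self.word_count > 0:
            if self.control == "write":   # device -> memory
                memory[self.address] = device.pop(0)
            else:                          # memory -> device
                device.append(memory[self.address])
            self.address += 1
            self.word_count -= 1

memory = [0] * 8
device = [11, 22, 33]
dma = DMAController()
dma.program(address=2, word_count=3, control="write")
dma.transfer(memory, device)
assert memory == [0, 0, 11, 22, 33, 0, 0, 0]
```

Note how the CPU's only role is the `program` step; the transfer loop itself runs without it, which is the whole point of DMA.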
Modes of Data Transfer in DMA
There are three modes of data transfer in DMA, described below.
Burst Mode: In burst mode, the DMA keeps the buses and hands them back to the CPU only after the whole block of data has been transferred, not before.
Cycle Stealing Mode: In cycle-stealing mode, the DMA returns the buses to the CPU after transferring each byte and then issues a fresh bus request for the next byte. Because the CPU regains the buses between transfers, higher-priority CPU work can proceed.
Transparent Mode: In transparent mode, the DMA transfers data only during cycles in which the CPU is not using the buses, so the transfer is invisible to the CPU.
Advantages of Input-Output Interface:
Standardization: I/O interfaces provide a standard way of communicating with external
devices. This means that different devices can be connected to a computer using the same
interface, which makes it easier to swap out devices and reduces the need for specialized
hardware.
Modularity: With I/O interfaces, different devices can be added or removed from a computer
without affecting the other components. This makes it easier to upgrade or replace a faulty
device without affecting the rest of the system.
Efficiency: I/O interfaces can transfer data between the computer and the external devices at
high speeds, which allows for faster data transfer and processing times.
Compatibility: I/O interfaces are designed to be compatible with a wide range of devices, which
means that users can choose from a variety of devices that are compatible with their computer’s
I/O interface.
Parallel processing
Parallel processing can be described as a class of techniques that enables a system to carry out simultaneous data-processing tasks in order to increase the computational speed of a computer system.
A parallel processing system can carry out simultaneous data-processing to achieve faster
execution time. For instance, while an instruction is being processed in the ALU component of
the CPU, the next instruction can be read from memory.
The primary purpose of parallel processing is to enhance the computer processing capability and
increase its throughput, i.e. the amount of processing that can be accomplished during a given
interval of time.
A parallel processing system can be achieved by having a multiplicity of functional units that
perform identical or different operations simultaneously. The data can be distributed among
various multiple functional units.
Parallel processing is used to increase the computational speed of computer systems by performing multiple data-processing operations simultaneously. For example, while an instruction is being executed in the ALU, the next instruction can be read from memory. The system can have two or more ALUs and so be able to execute multiple instructions at the same time. Processing capacity increases with parallel processing, and with it the cost of the system increases. However, technological development has reduced hardware costs to the point where parallel processing methods are economically feasible. Parallel processing can be implemented at multiple levels of complexity. At the lowest level, parallel and serial operations are distinguished by the type of registers used.
Instruction Pipeline:
In an instruction pipeline, a stream of instructions is executed by overlapping the fetch, decode, and execute phases of the instruction cycle. This technique is used to increase the throughput of the computer system. An instruction pipeline reads instructions from memory while previous instructions are being executed in other segments of the pipeline. Thus multiple instructions can be executed simultaneously. The pipeline is more efficient if the instruction cycle is divided into segments of equal duration.
In the most general case, the computer needs to process each instruction in the following sequence of steps:
1. Fetch the instruction from memory (FI)
2. Decode the instruction (DA)
3. Calculate the effective address
4. Fetch the operands from memory (FO)
5. Execute the instruction (EX)
6. Store the result in the proper place
The flowchart for instruction pipeline is shown below.
An instruction is fetched in the first clock cycle and decoded in the next; then its operands are fetched and finally the instruction is executed. The fetch and decode phases overlap due to pipelining: by the time the first instruction is being decoded, the next instruction is fetched by the pipeline.
In the case of the third instruction, we see that it is a branch instruction. While it is being decoded, the fourth instruction is fetched simultaneously. But as it is a branch instruction, it may point to some other instruction once it is decoded. The fourth instruction is therefore kept on hold until the branch instruction is executed; once the branch is resolved, the correct next instruction is fetched and the remaining phases continue as usual.
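The overlap described above can be illustrated with a simple Python schedule for a four-segment pipeline (FI, DA, FO, EX), assuming no branches or stalls. With k segments and n instructions, completion takes k + n - 1 cycles instead of n * k:

```python
def pipeline_schedule(n_instructions, stages=("FI", "DA", "FO", "EX")):
    """Return, per clock cycle, which (instruction, stage) pairs overlap."""
    timeline = {}
    for i in range(n_instructions):
        for s, stage in enumerate(stages):
            cycle = i + s   # instruction i enters stage s at cycle i+s
            timeline.setdefault(cycle, []).append((i + 1, stage))
    return timeline

t = pipeline_schedule(3)
# Cycle 0: only instruction 1 is fetched.
assert t[0] == [(1, "FI")]
# Cycle 1: instruction 1 is decoded WHILE instruction 2 is fetched.
assert t[1] == [(1, "DA"), (2, "FI")]
# 3 instructions finish in 3 + 4 - 1 = 6 cycles, not 3 * 4 = 12.
assert max(t) + 1 == 6
```

A real pipeline would also have to model branch stalls, as discussed above; this sketch shows only the ideal overlapped case.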
RISC is a way to make the hardware simpler, whereas CISC uses single instructions that perform multiple operations. Below, RISC and CISC are discussed in detail, along with the difference between them, starting with RISC.
Reduced Instruction Set Architecture (RISC)
The main idea behind RISC is to simplify the hardware by using an instruction set composed of a few basic steps for loading, evaluating, and storing operations; for example, a load instruction loads data and a store instruction stores it.
Characteristics of RISC
Simpler instructions, hence simple instruction decoding.
Instructions fit within one word.
An instruction takes a single clock cycle to execute.
More general-purpose registers.
Simple addressing modes.
Fewer data types.
Pipelining can be achieved.
Advantages of RISC
Simpler instructions: RISC processors use a smaller set of simple instructions, which
makes them easier to decode and execute quickly. This results in faster processing times.
Faster execution: Because RISC processors have a simpler instruction set, they can
execute instructions faster than CISC processors.
Lower power consumption: RISC processors consume less power than CISC processors,
making them ideal for portable devices.
Disadvantages of RISC
More instructions required: RISC processors require more instructions to perform
complex tasks than CISC processors.
Increased memory usage: RISC processors require more memory to store the additional
instructions needed to perform complex tasks.
Higher cost: Developing and manufacturing RISC processors can be more expensive than
CISC processors.
Arithmetic Pipeline:
An arithmetic pipeline divides an arithmetic problem into sub-problems that are executed in different pipeline segments. It is used for floating-point operations, multiplication, and various other computations. The flowchart of the arithmetic pipeline for floating-point addition is shown in the diagram.
Floating point addition using arithmetic pipeline :
The following sub operations are performed in this case:
1. Compare the exponents.
2. Align the mantissas.
3. Add or subtract the mantissas.
4. Normalise the result
First the two exponents are compared, and the larger of the two is chosen as the result exponent. The difference between the exponents then determines how many positions the mantissa of the number with the smaller exponent must be shifted to the right. After this shift, the two mantissas are aligned. Finally the mantissas are added (or subtracted), followed by normalisation of the result in the last segment.
Example:
Let us consider two numbers,
X=0.3214*10^3 and Y=0.4500*10^2
Explanation:
First the two exponents are subtracted to give 3-2=1. Thus 3 becomes the exponent of the result, and the mantissa of the number with the smaller exponent is shifted one position to the right to give
Y=0.0450*10^3
Finally the two numbers are added to produce
Z=0.3664*10^3
As the result is already normalized the result remains the same.
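The four sub-operations can be traced in Python on the example above. The decimal (mantissa, exponent) representation is a simplification of real binary floating point, and the function handles addition only:

```python
def fp_add(m1, e1, m2, e2):
    """Four-segment floating point addition on (mantissa, exponent) pairs."""
    # Segment 1: compare exponents; the larger becomes the result exponent.
    if e1 < e2:
        m1, e1, m2, e2 = m2, e2, m1, e1
    # Segment 2: align mantissas by shifting the smaller number right.
    m2 = m2 / (10 ** (e1 - e2))
    # Segment 3: add the mantissas.
    m = m1 + m2
    # Segment 4: normalise so that the mantissa is less than 1.
    while abs(m) >= 1.0:
        m /= 10
        e1 += 1
    return m, e1

# X = 0.3214 * 10^3, Y = 0.4500 * 10^2  (the worked example above)
m, e = fp_add(0.3214, 3, 0.4500, 2)
assert abs(m - 0.3664) < 1e-9 and e == 3
```

In a real arithmetic pipeline each segment is separate hardware, so four different additions can occupy the four segments at once; here they simply run in sequence.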
Vector Processing
A vector processor is a central processing unit that can operate on an entire vector with a single instruction. It is a complete unit of hardware resources that processes a sequential set of similar data elements in memory using a single instruction.
Scientific and research computations involve many calculations that require extensive, high-power computers. Run on a conventional computer, these computations may take days or weeks to complete. With vector processing, science and engineering problems can be specified in terms of vectors and matrices.
Types of Array Processor :
Array processors perform computations on large arrays of data. There are two types of array processors: the attached array processor and the SIMD array processor. These are explained below.
There are various features of Vector Processing which are as follows −
A vector is a structured set of elements. The elements in a vector are scalar quantities. A
vector operand includes an ordered set of n elements, where n is known as the length of the
vector.
During each clock period, two successive pairs of elements are processed: the dual vector pipes and the dual sets of vector functional units allow two pairs of elements to be handled per clock period.
As each pair of operations completes, the results are delivered to the appropriate elements of the result register. The operation continues until the number of elements processed equals the count specified by the vector length register.
In parallel vector processing, more than two results are generated per clock cycle. The
parallel vector operations are automatically started under the following two circumstances −
o When successive vector instructions use different functional units and multiple
vector registers.
o When successive vector instructions use the resulting flow from one vector register as
the operand of another operation utilizing a different functional unit. This phase is
known as chaining.
A vector processor performs better with longer vectors because the start-up delay of
the pipeline is amortized over more elements.
Vector processing decreases the overhead related to maintaining loop-control variables,
which makes it more efficient than scalar processing.
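The contrast between per-element scalar loops and chunked vector operation can be sketched in Python; the `vector_length` parameter stands in for the hardware vector register length, and the names are illustrative:

```python
def scalar_add(a, b):
    """Scalar processing: one loop iteration, with its loop-control
    overhead, per element pair."""
    c = []
    i = 0
    while i < len(a):        # loop-control variable maintained each pass
        c.append(a[i] + b[i])
        i += 1
    return c

def vector_add(a, b, vector_length=4):
    """Vector processing: one 'vector instruction' per vector-length
    chunk; the pipeline streams through the chunk's element pairs."""
    c = []
    for start in range(0, len(a), vector_length):
        chunk_a = a[start:start + vector_length]
        chunk_b = b[start:start + vector_length]
        c.extend(x + y for x, y in zip(chunk_a, chunk_b))
    return c

a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [10, 20, 30, 40, 50, 60, 70, 80]
assert scalar_add(a, b) == vector_add(a, b) == [11, 22, 33, 44, 55, 66, 77, 88]
```

Both produce the same result; the vector version issues one operation per chunk of elements, which is where the loop-control savings come from on real hardware.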
1. Attached Array Processor :
To improve the performance of the host computer in numerical computational tasks, an auxiliary processor is attached to it.
Attached array processor has two interfaces:
1. Input output interface to a common processor.
2. Interface with a local memory.
Here the local memory interconnects with the main memory. The host computer is a general-purpose computer, and the attached processor is a back-end machine driven by the host computer.
The array processor is connected through an I/O controller to the computer, and the computer treats it as an external interface.
Working:
Suppose we are executing a vast number of instructions. It may not be feasible to execute them all on the host computer alone; doing so could take days or weeks. So, in order to enhance the speed and performance of the host computer, an attached array processor is used, as shown in the diagram above.
An I/O interface is used to connect the host and the attached processor and to resolve the differences between them. An attached array processor is normally used to enhance the performance of the host computer. An array processor is a group of processing units used together to perform a computation.
2. SIMD Array Processor:
This is a computer with multiple processing units operating in parallel. Both types of array processors manipulate vectors, but their internal organization is different.
SIMD is a computer with multiple processing units operating in parallel.
The processing units are synchronized to perform the same operation under the control of a
common control unit, thus providing a single instruction stream, multiple data stream (SIMD) organization. As shown in the figure, the SIMD processor contains a set of identical processing elements (PEs), each having a local memory M.
Working:
Here the array processor is built into the computer, unlike the attached array processor, which is attached externally to the host computer. The master control unit decodes each instruction, generates the control signals, and passes them to all the processing elements (PEs). Upon receiving the control signals from the master control unit, all the processing elements know what operation needs to be performed. The data on which the operation is performed is brought from main memory into the local memory of the respective PEs. The SIMD array processor is normally used to compute vector data; all PEs execute the same instruction simultaneously on different data.
For example, to execute the vector instruction Ci = Ai + Bi (an addition operation), the master control unit generates the control signals and passes them to all processing elements; each PE then adds its own pair of operands Ai and Bi to produce Ci.
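The broadcast of one instruction to many PEs, each with its own local memory, can be sketched in Python. This is a sequential simulation of the idea, not real parallel hardware, and the class names are illustrative:

```python
class ProcessingElement:
    """One PE with its own local memory M."""
    def __init__(self, a, b):
        self.local = {"A": a, "B": b, "C": None}
    def execute(self, op):
        # Every PE receives the SAME control signal from the master unit...
        if op == "ADD":
            # ...but applies it to its OWN data: single instruction,
            # multiple data.
            self.local["C"] = self.local["A"] + self.local["B"]

A = [1, 2, 3, 4]
B = [10, 20, 30, 40]
pes = [ProcessingElement(a, b) for a, b in zip(A, B)]
for pe in pes:                 # master control unit broadcasts ADD
    pe.execute("ADD")
C = [pe.local["C"] for pe in pes]
assert C == [11, 22, 33, 44]   # Ci = Ai + Bi, computed per-PE
```

On real SIMD hardware the loop over PEs does not exist; all elements execute the broadcast instruction in the same clock cycle.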
Multiprocessor:
A multiprocessor is a computer system in which two or more central processing units (CPUs) share full access to a common RAM. The main objective of using a multiprocessor is to boost the system's execution speed, with other objectives being fault tolerance and application matching. There are two types of multiprocessors: the shared-memory multiprocessor and the distributed-memory multiprocessor. In a shared-memory multiprocessor, all the CPUs share the common memory, but in a distributed-memory multiprocessor, every CPU has its own private memory.
The interconnection between two or more processors and shared memory is done with three
methods:
1) Time shared common bus
2) Multiport memories
3) Crossbar switch network
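A shared-memory multiprocessor can be loosely imitated in Python with threads standing in for CPUs and an ordinary list standing in for the common RAM. This is an analogy only: Python threads do not execute truly in parallel, and each "CPU" here works on a disjoint slice to avoid contention:

```python
import threading

def worker(shared_memory, start, end):
    # Each "CPU" updates its own slice of the common memory.
    for i in range(start, end):
        shared_memory[i] *= 2

# Common RAM shared by every processor.
shared_memory = [1, 2, 3, 4, 5, 6, 7, 8]
# Two "CPUs" with full access to the shared memory.
cpus = [
    threading.Thread(target=worker, args=(shared_memory, 0, 4)),
    threading.Thread(target=worker, args=(shared_memory, 4, 8)),
]
for t in cpus:
    t.start()
for t in cpus:
    t.join()
assert shared_memory == [2, 4, 6, 8, 10, 12, 14, 16]
```

In a real shared-memory multiprocessor, overlapping accesses to the same locations would need synchronization (locks, or hardware cache coherence), which this disjoint-slice sketch deliberately sidesteps.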
Some characteristics of multiprocessors include :
Parallel computing
Multiprocessors use multiple processors to achieve asynchronous parallelism, which allows
parallel execution of different processes.
Shared resources
Multiprocessors share resources like memory, a system bus, and peripherals among their
CPUs. This can simplify communication between processors.
Reliability
If one processor fails, the other processors can continue to function, though the system might
slow down.
High throughput
Multiprocessors can achieve higher throughput because they can use more than one processor
at a time.
Pipelining
Multiprocessors can use pipelining to execute multiple instructions simultaneously, which can
increase system performance.
Uniform memory access
In a uniform memory access (UMA) model, all processors share the physical memory uniformly, with the same access time to every memory location.
A multiprocessor is a computer that has many CPUs sharing the main memory, a computer bus, and peripherals to process programs simultaneously. These systems are also known as tightly coupled systems. Multiprocessors offer the advantages of higher throughput, increased dependability, and economies of scale.