Unit 6
CONTROL UNIT
Control Unit and Peripheral Devices
• F1, F2, F3 are the micro-operation fields; they determine the micro-operations for the computer.
• CD is the condition field for branching; it selects the status bit conditions.
• AD is the address field; it is 7 bits long.
• The micro-operations are divided into three fields of three bits each. The three bits in each field can define seven different micro-operations, so in total there are 21 operations, as displayed in the table.
Micro-programmed control unit
• The control signals associated with operations are stored as Control Words in special memory units that are not accessible to the programmer.
• Control signals are generated by a program that is similar to machine-language programs.
• The micro-programmed control unit is slower because of the time it takes to fetch microinstructions from the control memory.
Micro-programmed control unit
• Some Important Terms
• Control Word: A control word is a word whose individual bits represent
various control signals.
• Micro-routine: A sequence of control words corresponding to the control
sequence of a machine instruction constitutes the micro-routine for that
instruction.
• Micro-instruction: Individual control words in this micro-routine are
referred to as microinstructions.
• Micro-program: A sequence of micro-instructions is called a micro-
program, which is stored in a ROM or RAM called a Control Memory (CM).
• Control Store: The micro-routines for all instructions in the instruction set of a computer are stored in a special memory called the Control Store.
Microprogrammed Control
Micro-operation, Micro-instruction, Micro program, Microcode
• Micro-operations:
• Micro-operations are detailed low-level instructions used in some designs to implement complex machine instructions (sometimes termed macro-instructions in this context).
• Micro instruction:
• A symbolic microprogram can be translated into its binary equivalent by means of
an assembler.
• Each line of the assembly-language microprogram defines a symbolic microinstruction.
• Each symbolic microinstruction is divided into five fields: label, micro-operations,
CD, BR, and AD.
Micro instruction format and applications of microprogramming
• The microinstruction format for the control memory is shown in figure 4.5. The 20 bits of the microinstruction are divided into four functional parts as follows:
• 1. The three fields F1, F2, and F3 specify microoperations for the computer. The microoperations are subdivided into three fields of three bits each. The three bits in each field are encoded to specify seven distinct microoperations. This gives a total of 21 microoperations.
• 2. The CD field selects status bit conditions.
• 3. The BR field specifies the type of branch to be used.
• 4. The AD field contains a branch address. The address field is seven bits wide, since the control memory has 128 = 2^7 words.
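To make the field layout concrete, here is a minimal C sketch that unpacks a 20-bit microinstruction with shifts and masks. The bit ordering (F1 in the top bits, AD in the bottom seven) and the sample word are assumptions for illustration; the text fixes only the field widths.

    #include <stdio.h>
    #include <stdint.h>

    /* Assumed layout of the 20-bit word: F1 F2 F3 (3 bits each),
       CD (2 bits), BR (2 bits), AD (7 bits). */
    typedef struct { unsigned f1, f2, f3, cd, br, ad; } MicroInstr;

    MicroInstr decode(uint32_t w) {
        MicroInstr m;
        m.ad = w & 0x7F;           /* bits 0-6  : branch address     */
        m.br = (w >> 7) & 0x3;     /* bits 7-8  : branch type        */
        m.cd = (w >> 9) & 0x3;     /* bits 9-10 : condition select   */
        m.f3 = (w >> 11) & 0x7;    /* bits 11-13: micro-operation F3 */
        m.f2 = (w >> 14) & 0x7;    /* bits 14-16: micro-operation F2 */
        m.f1 = (w >> 17) & 0x7;    /* bits 17-19: micro-operation F1 */
        return m;
    }

    int main(void) {
        MicroInstr m = decode(0xA5A5A);   /* an arbitrary 20-bit pattern */
        printf("F1=%u F2=%u F3=%u CD=%u BR=%u AD=%u\n",
               m.f1, m.f2, m.f3, m.cd, m.br, m.ad);
        return 0;
    }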
Micro-programmed Control Unit
• Types of Micro-programmed Control Unit –
• Based on the type of Control Word stored in the Control Memory (CM), it is classified into two types:
• 1. Horizontal Micro-programmed Control Unit:
• The control signals are represented in decoded binary format, that is, 1 bit per control signal.
• Example: If 53 control signals are present in the processor, then 53 bits are required. More than one control signal can be enabled at a time.
• It supports longer control words.
• It is used in parallel processing applications.
• It allows a higher degree of parallelism; if the degree is n, n control signals are enabled at a time.
• It requires no additional hardware (decoders). It is faster than the vertical micro-programmed control unit.
• 2. Vertical Micro-programmed Control Unit:
• The control signals are represented in encoded binary format. For N control signals, log2(N) bits are required.
• It supports shorter control words.
• It supports easy implementation of new control signals and is therefore more flexible.
• It allows a low degree of parallelism, i.e., the degree of parallelism is either 0 or 1.
• It requires additional hardware (decoders) to generate the control signals, which makes it slower than the horizontal micro-programmed control unit.
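To make the width trade-off concrete, the small C sketch below computes the control-word sizes for the 53-signal example from the text; strictly, the vertical format needs ceil(log2(N)) bits. The numbers are illustrative only.

    #include <stdio.h>
    #include <math.h>

    int main(void) {
        int n_signals = 53;                     /* example from the text */
        int horizontal_bits = n_signals;        /* 1 bit per control signal */
        int vertical_bits = (int)ceil(log2(n_signals)); /* encoded form */
        printf("Horizontal control word: %d bits\n", horizontal_bits);
        printf("Vertical control word:   %d bits\n", vertical_bits);
        /* Horizontal: any subset of signals can be active at once, e.g. a
           53-bit word with bits 3 and 17 set enables two signals in parallel.
           Vertical: the 6-bit code names exactly one signal, which a decoder
           then asserts, so the degree of parallelism is 0 or 1. */
        return 0;
    }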
• Typical functions of a micro-program sequencer are incrementing the control address register by
one, loading into the control address register an address from control memory, transferring an
external address, or loading an initial address to start the control operations.
• The control data register holds the present microinstruction while the next address is computed
and read from memory.
• The data register is sometimes called a pipeline register.
• It allows the execution of the microoperations specified by the control word simultaneously with
the generation of the next microinstruction.
• This configuration requires a two-phase clock, with one clock applied to the address register and
the other to the data register.
• The main advantage of microprogrammed control is the fact that once the hardware configuration is established, there should be no need for further hardware or wiring changes.
• If we want to establish a different control sequence for the system, all we need to do is specify a different set of microinstructions for control memory.
Microprogrammed Control
• The Control memory address register specifies the address of the micro-
instruction.
• The Control memory is assumed to be a ROM, within which all control
information is permanently stored.
• The control register holds the microinstruction fetched from the memory.
• The micro-instruction contains a control word that specifies one or more micro-
operations for the data processor.
• While the micro-operations are being executed, the next address is computed in
the next address generator circuit and then transferred into the control address
register to read the next microinstruction.
• The next address generator is often referred to as a micro-program sequencer, as
it determines the address sequence that is read from control memory.
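The fetch cycle just described can be pictured as a loop: read the word addressed by the CAR, latch it into the control register, and compute the next address. A minimal C simulation follows; the memory contents, sizes, and the purely sequential next-address rule are assumptions for illustration.

    #include <stdio.h>
    #include <stdint.h>

    #define CM_SIZE 128   /* 2^7 words, matching the 7-bit address field */

    int main(void) {
        uint32_t control_memory[CM_SIZE] = {0x12345, 0x0A0B0, 0x7FF00};
        unsigned car = 0;      /* control address register */
        uint32_t cr;           /* control register (holds the fetched word) */

        for (int step = 0; step < 3; step++) {
            cr = control_memory[car];         /* fetch the microinstruction */
            printf("CAR=%3u  word=%05X\n", car, (unsigned)cr);
            /* ...the bits of cr would drive the data processor here... */
            car = (car + 1) % CM_SIZE;        /* next-address generator:
                                                 sequential case only */
        }
        return 0;
    }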
Selection of address for control memory
1. Incrementing of the control address register.
2. Unconditional branch or conditional branch, depending on status bit conditions.
3. A mapping process from the bits of the instruction to an address for control memory.
4. A facility for subroutine call and return.
Selection of address for control memory
• Above figure 4.2 shows a block diagram of a control memory and the associated hardware
needed for selecting the next microinstruction address.
• The microinstruction in control memory contains a set of bits to initiate microoperations in
computer registers and other bits to specify the method by which the next address is obtained.
• The diagram shows four different paths from which the control address register (CAR) receives
the address.
• The incrementer increments the content of the control address register by one, to select the next
microinstruction in sequence.
• Branching is achieved by specifying the branch address in one of the fields of the
microinstruction.
• Conditional branching is obtained by using part of the microinstruction to select a specific status
bit in order to determine its condition.
Selection of address for control memory
• An external address is transferred into control memory via a mapping logic circuit.
• The return address for a subroutine is stored in a special register whose value is then used when the micro-program wishes to return from the subroutine.
Microprogrammed Control
• The branch logic of figure 4.2 provides decision-making capabilities in the control unit.
• The status conditions are special bits in the system that provide parameter information
such as the carry-out of an adder, the sign bit of a number, the mode bits of an
instruction, and input or output status conditions.
• The status bits, together with the field in the microinstruction that specifies a branch
address, control the conditional branch decisions generated in the branch logic.
• A 1 output in the multiplexer generates a control signal to transfer the branch address
from the microinstruction into the control address register.
• A 0 output in the multiplexer causes the address register to be incremented.
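The multiplexer decision described above amounts to a two-way choice, sketched here in C. The set of four status bits and the CD encoding are assumptions for illustration.

    #include <stdio.h>

    /* Branch logic: a status-bit multiplexer decides whether the CAR is
       loaded with the branch address or simply incremented. */
    unsigned next_car(unsigned car, unsigned branch_addr,
                      unsigned cd,            /* which status bit to test */
                      const int status[4]) {  /* e.g. carry, sign, zero...*/
        int mux_out = status[cd];   /* select the status bit named by CD */
        if (mux_out)                /* 1: load branch address into CAR   */
            return branch_addr;
        return car + 1;             /* 0: increment CAR                  */
    }

    int main(void) {
        int status[4] = {1, 0, 0, 1};                /* sample status bits */
        printf("%u\n", next_car(10, 40, 0, status)); /* taken -> 40 */
        printf("%u\n", next_car(10, 40, 1, status)); /* not taken -> 11 */
        return 0;
    }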
I/O modules- Programmed I/O
• This is the simplest technique for exchanging data between external devices and the processor. In this technique, the processor or Central Processing Unit (CPU) executes a program that gives it direct control of the I/O operations.
• The processor issues a command to the I/O module and waits for the operation to complete. While waiting, the processor keeps checking the I/O module's status until it finds that the operation has completed.
• The processor's time is wasted, as the processor is faster than the I/O module, which is a comparatively slow module.
• It is used in certain low-end microcomputers. It provides a single output instruction and a single input instruction.
• Each instruction selects one I/O device by number and transfers a single character (byte).
• Four registers are involved in this technique: output status, output character, input status, and input character.
I/O modules- Programmed I/O…..
• Its disadvantage is busy waiting, which means the processor spends most of its time in a tight loop waiting for the I/O device to become ready. The program checks, or polls, an I/O hardware component, device, or item.
• For example: a computer mouse polled within a loop.
• It is easy to understand and easy to program, but it is slow and inefficient.
• The system's performance is degraded severely. It does not require initializing the stack.
• The system's throughput decreases as the number of I/O devices connected to the system increases. The best-known example is the PC's Advanced Technology Attachment (ATA) interface using programmed I/O.
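A busy-wait loop of this kind looks like the C sketch below. The device is simulated with plain variables (a real one would expose status and character registers in hardware), so the register names and timing are invented.

    #include <stdio.h>

    static int status_reg = 0;   /* 0 = busy, 1 = ready (simulated) */
    static char data_reg;

    static void simulated_device_tick(void) {
        static int cycles = 0;
        if (++cycles == 5) {       /* device finishes after 5 polls */
            data_reg = 'X';
            status_reg = 1;
        }
    }

    int main(void) {
        int polls = 0;
        while (!status_reg) {      /* the tight polling loop: CPU time wasted */
            simulated_device_tick();
            polls++;
        }
        printf("got '%c' after %d polls\n", data_reg, polls);
        return 0;
    }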
I/O modules-Interrupt Driven I/O
• It is similar to the programmed I/O technique, except that the processor does not wait until the I/O operation is completed; it performs other tasks while the I/O operation is being performed.
• When the I/O operation is completed, the I/O module interrupts the processor, letting it know the operation is completed. This technique is faster than programmed I/O.
• The processor starts the I/O device and instructs it to generate and send an interrupt signal when the operation is finished. This is achieved by setting an interrupt-enable bit in the status register.
• This technique requires an interrupt for each character that is written or read. Interrupting a running process is an expensive business, as it requires saving context.
• It requires additional hardware, such as an interrupt controller chip. It is fast and efficient.
• It becomes difficult to code if the programmer is using a low-level programming language, and it can be difficult to get the various pieces to work well together. This is done by the OS developer, for example Microsoft, or by the hardware manufacturer.
• The system's performance is enhanced. It requires initializing the stack.
• The system's throughput is not affected as the number of I/O devices connected to the system increases, since the throughput does not rely on that number.
• For example: the computer mouse triggers an interrupt and sends a signal to the program for processing the mouse event.
• Interrupt-driven I/O is the better technique, as it is fast and efficient and the system's performance is improved.
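The contrast with polling can be seen in the C sketch below, where the "interrupt" is simulated by a function pointer that the device calls on completion while the main loop keeps doing other work. All names and timings are invented for illustration.

    #include <stdio.h>

    static volatile int io_done = 0;
    static char data_reg;

    static void io_interrupt_handler(void) {  /* runs on device completion */
        io_done = 1;
        printf("interrupt: received '%c'\n", data_reg);
    }

    /* stand-in for an interrupt vector entry */
    static void (*interrupt_vector)(void) = io_interrupt_handler;

    static void simulated_device(int cycle) {
        if (cycle == 3) {            /* device finishes on cycle 3 */
            data_reg = 'X';
            interrupt_vector();      /* raise the interrupt */
        }
    }

    int main(void) {
        /* the CPU does useful work instead of polling */
        for (int cycle = 0; cycle < 6 && !io_done; cycle++) {
            printf("cycle %d: CPU doing useful work\n", cycle);
            simulated_device(cycle);
        }
        return 0;
    }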
Memory Mapped I/O and I/O Mapped I/O
• In Memory Mapped Input Output −
• We allocate a memory address to an Input-Output device.
• The device can then be accessed by any memory-related instruction.
• The Input-Output device data can also be given to the Arithmetic Logic Unit.
• Input-Output Mapped Input Output −
• We give an Input-Output address to an Input-Output device.
• Such devices can be accessed only by the IN and OUT instructions.
• ALU operations are not directly applicable to such Input-Output data.
Memory Mapped I/O and I/O Mapped I/O
• I/O is any general-purpose port used by a processor/controller to handle the peripherals connected to it.
• I/O-mapped I/Os have an address space separate from memory. So the total addressing capacity is the number of I/O devices connected plus the connected memory. Separate I/O-related instructions are used to access the I/Os, and a separate signal is used for addressing an I/O device.
• Memory-mapped I/Os share the address space with external memory. So the total addressing capacity is only that of the memory. This is underutilisation of resources if your processor supports I/O-mapped I/O. In this case, the instructions used to access I/Os are the same as those used for memory.
• Let's take the example of the 8085 processor. It has 16 address lines, i.e., an addressing capacity of 64 KB of memory. It supports I/O-mapped I/O and can address up to 256 I/Os.
• If we connect I/O devices to it as I/O-mapped I/O, it can address 256 I/Os + 64 KB of memory, and the special instructions IN and OUT are used to access the peripherals. Here we fully utilise the addressing capacity of the processor.
• If the peripherals are connected in memory-mapped fashion, then memory and devices together can occupy only 64 KB. This is underutilisation of the resource, and only memory-accessing instructions such as MVI, MOV, LDA, and STA are used to access the I/O devices.
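The 8085 numbers can be modelled with two arrays standing in for the two address spaces, as in the C sketch below. All names are illustrative; on real hardware the routing is done by address-decoding logic, not software.

    #include <stdio.h>
    #include <stdint.h>

    static uint8_t memory[65536]; /* 2^16 bytes, shared with memory-mapped I/O */
    static uint8_t io_ports[256]; /* separate space, reached only via IN/OUT   */

    /* I/O-mapped access: a distinct address space and distinct instructions */
    static void out_port(uint8_t port, uint8_t v) { io_ports[port] = v; }
    static uint8_t in_port(uint8_t port)          { return io_ports[port]; }

    int main(void) {
        /* Memory-mapped style: the device register is just a memory address,
           so ordinary loads/stores (MOV, MVI, ...) reach it, but it uses up
           part of the 64 KB memory space. */
        memory[0xF000] = 0x42;

        /* I/O-mapped style: the full 64 KB stays available for memory, and
           the 256 ports come on top of it. */
        out_port(0x10, 0x42);

        printf("mem[0xF000]=%02X  port[0x10]=%02X\n",
               memory[0xF000], in_port(0x10));
        return 0;
    }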
Memory Mapped I/O and I/O Mapped I/O
DMA (Direct Memory Access)
• The data transfer between a fast storage medium such as a magnetic disk and the memory unit is limited by the speed of the CPU.
• Thus we can allow the peripherals to communicate directly with the memory over the memory buses, removing the intervention of the CPU.
• This type of data transfer technique is
known as DMA or direct memory access.
• During DMA the CPU is idle and it has no
control over the memory buses.
• The DMA controller takes over the buses
to manage the transfer directly between
the I/O devices and the memory unit.
DMA (Direct Memory Access)
• Bus Request : It is used by the DMA controller to request the CPU to relinquish the control of the buses.
• Bus Grant : It is activated by the CPU to inform the external DMA controller that the buses are in a high-impedance state and the requesting DMA controller can take control of them. Once the DMA controller has taken control of the buses, it transfers the data. This transfer can take place in many ways.
• Types of DMA transfer using DMA controller:
• Burst Transfer :
DMA returns the bus after the complete data transfer. A register is used as a byte count; it is decremented for each byte transferred, and when the byte count reaches zero, the DMAC releases the bus. When the DMAC operates in burst mode, the CPU is halted for the duration of the data transfer.
Steps involved are:
• Bus grant request time.
• Transfer the entire block of data at the transfer rate of the device, because the device is usually slower than the speed at which data can be transferred to the CPU.
• Release control of the bus back to the CPU.
So, the total time taken to transfer N bytes
= bus grant request time + N * (transfer time per byte) + bus release time.
(A timing sketch comparing this with cycle stealing follows the cycle-stealing discussion below.)
DMA (Direct Memory Access)
• Cycle Stealing :
An alternative method in which the DMA controller transfers one word at a time, after which it must return control of the buses to the CPU. The CPU delays its operation for only one memory cycle, to allow the direct memory I/O transfer to "steal" one memory cycle.
Steps involved are:
• Buffer the byte into the buffer.
• Inform the CPU that the device has 1 byte to transfer (i.e., bus grant request).
• Transfer the byte (at system bus speed).
• Release control of the bus back to the CPU.
• Before moving on to transfer the next byte of data, the device performs step 1 again, so that the bus isn't tied up and the transfer doesn't depend on the transfer rate of the device.
So, if T is the time taken to transfer 1 byte of data in cycle-stealing mode,
T = time required for bus grant + 1 bus cycle to transfer data + time required to release the bus,
and the total time for N bytes is N x T.
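Plugging invented numbers into the two formulas shows the trade-off: burst mode pays the grant/release cost once, while cycle stealing pays it per byte. A C sketch, with all timing values assumed:

    #include <stdio.h>

    int main(void) {
        double grant = 2.0, release = 1.0, cycle = 0.5; /* assumed costs, us */
        int n = 1024;                                   /* bytes to move     */

        /* Burst mode: one grant/release pair around the whole block */
        double burst = grant + n * cycle + release;

        /* Cycle stealing: grant + one bus cycle + release *per byte*,
           i.e. total = N x T where T is the per-byte cost */
        double t_per_byte = grant + cycle + release;
        double stealing = n * t_per_byte;

        printf("burst: %.1f us, cycle stealing: %.1f us\n", burst, stealing);
        return 0;
    }

With these numbers, burst mode finishes sooner overall, but cycle stealing spreads the cost so the CPU keeps running between the stolen cycles, which is the point of the mode.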
DMA (Direct Memory Access)
• In cycle-stealing mode we follow a pipelining concept: while one byte is being transferred, the device is preparing the next byte in parallel.
• If "the fraction of CPU time lost to the data transfer" is asked for, cycle-stealing mode is the case being considered.
• Interleaved mode:
In this technique, the DMA controller takes over the system bus when the microprocessor is not using it. The bus alternates every half cycle, i.e., half cycle DMA + half cycle processor.
I/O processors and channels
• There has been an increased focus on GPGPUs since DirectX 10 included unified shaders in its shader core specifications for Windows Vista.
• Higher-level languages are being developed all the time to ease programming for computations on the GPU.
• Both AMD/ATI and Nvidia have their own approaches to GPGPU (AMD backing the open OpenCL standard and Nvidia its proprietary CUDA).
General Purpose GPUs
• The history of general-purpose GPUs
• Nvidia’s GeForce 3 was the first GPU that featured programmable shaders.
• At that time, the purpose was making rasterized 3D graphics more realistic; the new GPU capabilities enabled 3D
transform, bump mapping, specular mapping and lighting computations.
• ATI’s Radeon 9700, the first DirectX 9-capable card, approached the programming flexibility of CPUs, although few general-purpose calculations were done at the time.
• With the introduction of Windows Vista, bundled with DirectX 10, unified shader cores were specified as part of the
standard.
• The GPU’s new-found potential was demonstrated by performance increases of several orders of magnitude over CPU-based calculations.
• GPGPUs and the future of computer graphics
• GPUs that were originally developed to speed up rasterized 3D (as raytracing was too expensive calculation-wise) have surpassed the performance of CPUs for ray-traced pre-rendered graphics.
• Although raytracing is not yet used in games, there have been real-time demonstrations.
• The advances of GPGPUs mean that in the not-too-distant future, computer graphics should be capable of the same kind
of intensive geometry and lighting as 3D movies.
GPU applications, synchronization, coherence
• It goes without saying that if the data you work with is already in graphical form—as would be the case in
projects that involve computer vision, like the development of autonomous vehicles or facial recognition
systems—you would do well to have GPGPUs in your servers. Other possible applications include:
• Earth science: Whether it's meteorological or astronomical research, the addition of GPGPUs can speed up the process considerably. For example, Japan's Waseda University used GIGABYTE's G221-Z30 to assemble a computing cluster that can run simulations on how climate change will affect coastal regions. Lowell Observatory in Arizona, USA uses GIGABYTE's G482-Z50 to filter out "stellar noise" in the search for habitable exoplanets.
• Scientific knowledge: Academic institutes and research centers are always pushing the envelope of human understanding. GPU servers outfitted with GPGPUs can accelerate the effort. CERN, the European Organization for Nuclear Research, uses GIGABYTE's G482-Z51 to analyze data produced by the Large Hadron Collider (LHC) in search of the elusive "beauty" quark. The Institute of Theoretical and Computational Chemistry at the University of Barcelona expanded the power of its data center by about 40% with GIGABYTE's G292-Z42 and other servers.
• A better tomorrow: As the world's brightest minds pit advanced processing power against problems faced by humanity, servers equipped with GPGPUs have a clear role to play. IFISC, Spain's Institute for Cross-Disciplinary Physics and Complex Systems, uses GIGABYTE's G482-Z54 for important research that covers climate change, green energy, and ways of combating COVID-19. The National Taiwan Normal University has built a Center for Cloud Computing with G190-H44 and other servers. As long as the data can be converted to graphical form, GPGPUs are a natural fit in the server solution.
GPU_extra
• What Does a GPU Do?
• The graphics processing unit, or GPU, has become one of the most important types of computing technology, both for
personal and business computing. Designed for parallel processing, the GPU is used in a wide range of applications,
including graphics and video rendering. Although they’re best known for their capabilities in gaming, GPUs are becoming
more popular for use in creative production and artificial intelligence (AI).
• GPUs were originally designed to accelerate the rendering of 3D graphics. Over time, they became more flexible and
programmable, enhancing their capabilities. This allowed graphics programmers to create more interesting visual effects
and realistic scenes with advanced lighting and shadowing techniques. Other developers also began to tap the power of
GPUs to dramatically accelerate additional workloads in high performance computing (HPC), deep learning, and more.
• GPU and CPU: Working Together
• The GPU evolved as a complement to its close cousin, the CPU (central processing unit). While CPUs have continued to deliver
performance increases through architectural innovations, faster clock speeds, and the addition of cores, GPUs are specifically
designed to accelerate computer graphics workloads. When shopping for a system, it can be helpful to know the role of the CPU
vs. GPU so you can make the most of both.