
St. Joseph's Institute of Technology


Department of CSE
CS8491 – COMPUTER ARCHITECTURE / Class Notes
UNIT V - MEMORY AND I/O SYSTEMS
Memory hierarchy-Memory technologies-Cache memory-Measuring and
improving cache-performance-Virtual memory-TLBs-Accessing I/O Devices-
Interrupts-Direct memory access-Bus structure-Bus operation-Arbitration-
Interface circuits-USB.

1. MEMORY HIERARCHY:
• An ideal memory has to be fast, large, and inexpensive.
• A very fast memory can be implemented using static RAM chips. But these
chips are not suitable for implementing large memories, because their basic cells
are larger and consume more power than dynamic RAM cells.
• Dynamic memory can be implemented at an affordable cost, but even so its size
falls short of the demands of large programs.
• A solution is provided by using secondary storage, mainly magnetic disks, to
provide the required memory space.
• Disks are available at a reasonable cost and are used extensively in the computer
system.
• A large yet affordable main memory can be built using dynamic RAM
technology.
• Static RAM technology is used in smaller units where speed is of the essence,
such as caches.
• Different types of memory units are deployed in the computer system.
• The entire computer memory can be viewed as a hierarchy
• The fastest access is to data held in processor registers.
• The processor registers are at the top in terms of speed of access.
• The next level of hierarchy is a relatively small amount of memory that can be
implemented directly on the processor chip called PROCESSOR CACHE
• The processor cache holds copies of the instructions and data stored in the much
larger main memory.
• There are often two or more levels of cache.
• A primary cache is always located on the processor chip and is referred to as the
LEVEL 1 (L1) cache. This cache is small, and its access time is comparable to
that of processor registers.
• A larger and slower secondary cache is placed between the primary cache
and the rest of the memory; it is referred to as the LEVEL 2 (L2) cache.
• Some computers have a LEVEL 3 (L3) cache of even larger size, also
implemented using SRAM technology.
• The next level of hierarchy is the main memory. It is implemented using
dynamic memory components.
• Main memory is larger and slower than cache memories.
• Disk devices provide a very large amount of inexpensive storage and are widely
used as secondary storage devices. They are very slow compared to the main
memory and sit at the bottom of the hierarchy.
• During program execution, the speed of memory access is of utmost
importance.
2. MEMORY TECHNOLOGIES:
• Semiconductor random – access memories (RAM) are available in a wide
range of speeds.

INTERNAL ORGANIZATION OF MEMORY CHIPS:


• Memory cells are usually organized in the form of an array, where each cell
is capable of storing one bit of information.
• Each row of cells constitutes a memory word, and all cells of a row are
connected to a common line called the word line.
• The cells in each column are connected to a Sense/Write circuit by two bit
lines.
• During a Read operation, these circuits sense or read the information stored in
the cells selected by a word line and place this information on the output data
lines.
• During a Write operation, the Sense/Write circuits receive input data and store
them in cells of the selected word.

Organization of bit cells in a memory chip


STATIC MEMORIES:
• Memories that consist of circuits capable of retaining their state as long as
power is applied are known as static memories or static RAM (SRAM).
• Two inverters are cross-connected to form a latch, which is connected to two
bit lines by transistors T1 and T2.
• These transistors act as switches that can be opened or closed under control of
the word line.
• When the word line is at ground level, the transistors are turned off and the
latch retains its state.

A Static RAM Cell


DYNAMIC RAMs:
• Static RAMs are fast, but their cells require several transistors.
• Less expensive, higher-density RAMs can be implemented with simpler
cells. However, such cells do not retain their state for a long period, unless
accessed frequently for Read or Write operations. Memories that use such
cells are called dynamic RAMs (DRAMs).
• Information is stored in a dynamic memory cell in the form of a charge on a
capacitor, but this charge can be maintained for only tens of milliseconds.
• If the cell is required to store information for a much longer time, its
contents must be periodically refreshed by restoring the capacitor charge to its
full value.
• This occurs when the contents of the cell are read or when new information is
written
• An example of a dynamic memory cell consists of a capacitor C and a
transistor T.
• To store information, transistor T is turned on and an appropriate voltage is
applied to the bit line, causing a known amount of charge to be stored in the
capacitor.

A single-transistor dynamic memory cell


• After the transistor is turned off, the charge remains stored in the capacitor,
but not for long; it begins to discharge.
• Information stored in the cell can be retrieved correctly only if it is read before
the charge in the capacitor drops below some threshold value.
SYNCHRONOUS DRAMs
• DRAMs whose operation is synchronized with a clock signal are called
synchronous DRAMs (SDRAMs)
• SDRAMs have built – in refresh circuitry, with a refresh counter to provide
the addresses of the rows to be selected for refreshing.
• The address and data connections of an SDRAM can be buffered by means of
registers
• The Sense/Write circuits function as latches.
• A Read operation causes the contents of all cells in the selected row to be
loaded into these latches.
• The data in the latches of the selected column are then transferred into the data
registers and made available on the data output pins.
• The buffer registers are useful when transferring large blocks of data at very
high speed.
• SDRAMs have different modes of operation, which can be selected by writing
control information into a mode register.
Synchronous DRAM

READ-ONLY MEMORIES (ROM):


• Both static and dynamic RAM chips are volatile, which means that they retain
information only while power is turned on
• There are many applications requiring memory devices that retain the stored
information when power is turned off
• A ROM allows only reading of the stored data; information can be written
into it only once, at the time of manufacture.
• A logic value 0 is stored in the cell if the transistor is connected to ground at
point P, otherwise 1 is stored.
• The bit line is connected through a resistor to the power supply
• To read the state of the cell, the word line is activated to close the transistor
switch.
A ROM cell
PROM:
• Some ROM designs allow the data to be loaded by the user, thus providing a
programmable ROM (PROM).
• Programmability is achieved by inserting a fuse at point P.
• Before it is programmed, the memory contains all 0s.
• The user can insert 1s at the required locations by burning out the fuses at
these locations.
• PROMs provide flexibility and convenience at a lower cost.

EPROM:
• An EPROM allows the stored data to be erased and new data to be written
into it; hence the name erasable, reprogrammable ROM (EPROM).
• It provides a higher level of convenience and is capable of retaining
information for a long time.
• In an EPROM cell, the connection to ground at point P is made through a
special transistor.
• The transistor is normally turned off, creating an open switch.

EEPROM:
• A PROM that can be programmed, erased, and reprogrammed electrically is
called an electrically erasable PROM (EEPROM).
• It is possible to erase selected contents.
• Its disadvantage is that it requires different voltages for erasing, reading, and
writing data.
FLASH MEMORY:
• Flash memory is a type of electrically erasable programmable read-only
memory (EEPROM).
• Wear levelling is a process designed to extend the life of solid-state
storage devices.
• Solid-state storage is made up of microchips that store data in blocks.
• Each block can tolerate a finite number of program/erase cycles before
becoming unreliable.

3.CACHE MEMORY:
• The cache is a small, very fast memory between the processor and the main
memory.
• Its purpose is to make the main memory appear faster to the processor.
• The effectiveness of this approach is based on locality of reference.
• Locality of reference has two aspects: temporal and spatial.
• The temporal aspect means that a recently executed instruction is likely to be
executed again very soon.
• The spatial aspect means that instructions close to a recently executed
instruction are also likely to be executed soon.
• Temporal locality suggests that whenever an item is fetched, it should be
brought into the cache, because it is likely to be needed again soon.
• Spatial locality suggests that instead of bringing one item at a time from the
main memory to the cache, it is better to fetch items at adjacent addresses as
well.
• The term cache block refers to a set of contiguous address locations of some
size.
• Cache block is also called cache line.
• When the processor issues a Read request, the block of memory words
containing the specified location is transferred into the cache.
• On subsequent references to that block, the data are obtained directly from the
cache.
• Cache memory can store reasonable amount of data but small when compared
to main memory.
• The correspondence between the main memory and cache is specified by
mapping function
• When the cache is full and a word not in the cache is referenced, the cache
control hardware decides which block to remove, using a replacement
algorithm.

Use of a cache memory


CACHE HITS:
• The processor does not need to know about the existence of the cache.
• It simply issues Read and Write requests using addresses that refer to locations
in the memory.
• The cache control circuitry determines whether the requested word currently
exists in the cache.
• If it does, the access is a read hit or a write hit.
• For a write operation, the system can proceed in two ways.
• The first is the write-through protocol, in which both the cache location and
the main memory location are updated.
• The second is to update only the cache location and to mark the block
containing it with an associated flag, called the dirty or modified bit.
• When the block containing the marked word is removed to make room for a
new block, the main memory location of the word is updated. This is called
the write-back, or copy-back, protocol.
CACHE MISSES:
• If the addressed word is not in the cache when a Read operation is attempted,
a read miss occurs.
• When a write miss occurs in a computer that uses the write-through protocol,
the information is written directly into the main memory.
• With the write-back protocol, the block containing the addressed word is first
brought into the cache, and then the desired word in the cache is overwritten
with the new information.

MAPPING FUNCTIONS:
• There are several possible methods for determining where memory blocks are
placed in the cache
• Consider a cache consisting of 128 blocks of 16 words each, for a total of 2048
(2K) words, and assume that the main memory is addressable by a 16-bit
address.
• The main memory has 64K words, which we will view as 4K blocks of 16
words each.

DIRECT MAPPING:
• The simplest way to determine cache locations in which to store
memory blocks is the direct-mapping technique.
• In this technique, block j of the main memory maps onto block j
modulo 128 of the cache. Thus, whenever one of the main
memory blocks 0, 128, 256, . . . is loaded into the cache, it is
stored in cache block 0.
• Blocks 1, 129, 257, . . . are stored in cache block 1, and so on. Since more
than one memory block is mapped onto a given cache block position,
contention may arise for that position even when the cache is not full. For
example, instructions of a program may start in block 1 and continue in block
129, possibly after a branch.
Direct-mapped cache
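The address decomposition implied by this mapping can be made concrete with a short sketch. The following Python fragment is an illustration (not part of the original notes), using the cache parameters given above: 16 words per block give a 4-bit word field, 128 cache blocks give a 7-bit block field, and the remaining 5 bits of the 16-bit address form the tag.

# Direct-mapped address decomposition: 5-bit tag, 7-bit block, 4-bit word.
def split_direct(address):
    word  = address & 0xF          # low 4 bits: word within the block
    block = (address >> 4) & 0x7F  # next 7 bits: cache block index (j modulo 128)
    tag   = address >> 11          # high 5 bits: stored and compared on each access
    return tag, block, word

# Memory blocks 0, 128, 256, ... all map to cache block 0, with different tags:
for j in (0, 128, 256):
    tag, block, word = split_direct(j * 16)   # address of the first word of block j
    print("memory block", j, "-> cache block", block, ", tag", tag)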
ASSOCIATIVE MAPPING:
• The most flexible mapping method, in which a main memory block can be
placed into any cache block position.
• Tag bits are required to identify a memory block when it is resident in the
cache.
• The tag bits of an address received from the processor are compared to the tag
bits of each block of the cache to see if the desired block is present. This is
called the associative-mapping technique.
Associative-mapped cache
SET-ASSOCIATIVE MAPPING:
• Another approach is to use a combination of the direct- and associative-
mapping techniques.
• The blocks of the cache are grouped into sets, and the mapping allows a block
of the main memory to reside in any block of a specific set.
• The contention problem of the direct method is eased by having a few choices
for block placement.
• At the same time, the hardware cost is reduced by decreasing the size of the
associative search.
• An example of this set-associative-mapping technique is shown in the figure
below for a cache with two blocks per set. In this case, memory blocks 0, 64, 128, . . . ,
4032 map into cache set 0, and they can occupy either of the two block
positions within this set. Having 64 sets means that the 6-bit set field of the
address determines which set of the cache might contain the desired block.
• The tag field of the address must then be associatively compared to the tags of
the two blocks of the set to check if the desired block is present. This two-way
associative search is simple to implement.

Set-associative mapped cache with two blocks per set
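Under the same cache parameters, the set-associative decomposition can be sketched in the same way (again an illustrative Python fragment, not from the original notes): 64 sets give a 6-bit set field, leaving a 6-bit tag above the 4-bit word field.

# Two-way set-associative decomposition: 6-bit tag, 6-bit set, 4-bit word.
def split_set_associative(address):
    word    = address & 0xF           # low 4 bits: word within the block
    set_idx = (address >> 4) & 0x3F   # next 6 bits: set index (j modulo 64)
    tag     = address >> 10           # high 6 bits: compared with both blocks of the set
    return tag, set_idx, word

# Memory blocks 0, 64, 128, ..., 4032 all map into set 0:
for j in (0, 64, 128, 4032):
    tag, set_idx, word = split_set_associative(j * 16)
    print("memory block", j, "-> set", set_idx, ", tag", tag)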


Stale Data
• When power is first turned on, the cache contains no valid data.
• A control bit, usually called the valid bit, must be provided for each cache
block to indicate whether the data in that block are valid.
• This bit should not be confused with the modified, or dirty, bit mentioned
earlier.
• The valid bits of all cache blocks are set to 0 when power is initially applied
to the system. Some valid bits may also be set to 0 when new programs or data
are loaded from the disk into the main memory.
• Data transferred from the disk to the main memory using the DMA mechanism
are usually loaded directly into the main memory, bypassing the cache.
• If the memory blocks being updated are currently in the cache, the valid bits
of the corresponding cache blocks are set to 0. As program execution proceeds,
the valid bit of a given cache block is set to 1 when a memory block is loaded
into that location.
• The processor fetches data from a cache block only if its valid bit is equal to
1. The use of the valid bit in this manner ensures that the processor will not
fetch stale data from the cache.
• Under the write-back protocol, data in the memory do not always reflect the
changes that may have been made in the cached copy. It is important to ensure
that such stale data in the memory are not transferred to the disk. One solution
is to flush the cache, by forcing all dirty blocks to be written back to the
memory before performing the transfer.
• The need to ensure that two different entities use identical copies of the data
is referred to as a cache-coherence problem.
REPLACEMENT ALGORITHMS
• In a direct-mapped cache, the position of each block is predetermined by its
address
• When a new block is to be brought into the cache and all the positions that it
may occupy are full, the cache controller must decide which of the old blocks
to overwrite. This is an important issue, because the decision can be a strong
determining factor in system performance.
• In general, the objective is to keep blocks in the cache that are likely to be
referenced in the near future. However, it is not easy to determine which blocks
are about to be referenced.
• The property of locality of reference in programs gives a clue to a reasonable
strategy. Because program execution usually stays in localized areas for
reasonable periods of time, there is a high probability that the blocks that have
been referenced recently will be referenced again soon. Therefore, when a block
is to be overwritten, it is sensible to overwrite the one that has gone the longest
time without being referenced. This block is called the least recently used
(LRU) block, and the technique is called the LRU replacement algorithm.
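A minimal sketch of LRU replacement for one set of a cache is shown below (illustrative Python; the OrderedDict simply records recency of use, standing in for the hardware tracking circuitry).

from collections import OrderedDict

class LRUSet:
    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.blocks = OrderedDict()           # block number -> cached data

    def access(self, block):
        if block in self.blocks:
            self.blocks.move_to_end(block)    # hit: mark as most recently used
            return "hit"
        if len(self.blocks) >= self.num_blocks:
            self.blocks.popitem(last=False)   # full: evict the least recently used block
        self.blocks[block] = "data"           # bring the new block in
        return "miss"

cache_set = LRUSet(4)
for b in [1, 2, 3, 4, 1, 5]:                  # block 2 is the LRU victim when 5 arrives
    print(b, cache_set.access(b))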

4.MEASURING AND IMPROVING CACHE PERFORMANCE


• Two key factors in the commercial success of a computer are performance and
cost; the best possible performance for a given cost is the objective. A common
measure of success is the price/performance ratio.
• Performance depends on how fast machine instructions can be brought into the
processor and how fast they can be executed.
• The main purpose of this hierarchy is to create a memory that the processor sees
as having a short access time and a large capacity. When a cache is used, the
processor is able to access instructions and data more quickly when the data
from the referenced memory locations are in the cache. Therefore, the extent to
which caches improve performance is dependent on how frequently the
requested instructions and data are found in the cache.

HIT RATE AND MISS PENALTY


• An excellent indicator of the effectiveness of a particular implementation of the
memory hierarchy is the success rate in accessing information at various levels
of the hierarchy.
• A successful access to data in a cache is called a hit. The number of hits stated
as a fraction of all attempted accesses is called the hit rate, and the miss rate is
the number of misses stated as a fraction of attempted accesses.
• Ideally, the entire memory hierarchy would appear to the processor as a single
memory unit that has the access time of the cache on the processor chip and the
size of the magnetic disk. High hit rates over 0.9 are essential for high-
performance computers.
• Performance is adversely affected by the actions that need to be taken when a
miss occurs.
• A performance penalty is incurred because of the extra time needed to bring a
block of data from a slower unit in the memory hierarchy to a faster unit.
• During that period, the processor is stalled waiting for instructions or data. The
waiting time depends on the details of the operation of the cache.
• The total access time seen by the processor when a miss occurs is referred to
as the miss penalty.
• Consider a system with only one level of cache. In this case, the miss penalty
consists almost entirely of the time to access a block of data in the main
memory.
• Let h be the hit rate, M the miss penalty, and C the time to access information
in the cache. Thus, the average access time experienced by the processor is
tavg = hC + (1 − h)M
• One possibility is to make the cache larger, but this entails increased cost.
• Another possibility is to increase the cache block size while keeping the total
cache size constant, to take advantage of spatial locality
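A worked example of the average access time formula tavg = hC + (1 − h)M given above, with assumed illustrative numbers (a 0.95 hit rate, a 1-cycle cache access time, and a 100-cycle miss penalty):

def avg_access_time(h, C, M):
    return h * C + (1 - h) * M         # tavg = hC + (1 - h)M

print(avg_access_time(0.95, 1, 100))   # 5.95 cycles on average
print(avg_access_time(0.99, 1, 100))   # 1.99 cycles

Raising the hit rate from 0.95 to 0.99 cuts the average access time from 5.95 to 1.99 cycles, which is why hit rates over 0.9 are considered essential.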

CACHES ON THE PROCESSOR CHIP
• When information is transferred between different chips, considerable delays
occur. It is therefore best to implement the cache on the processor chip. Most
processor chips include at least one L1 cache.
• Often there are two separate L1 caches, one for instructions and another for
data.

• In high-performance processors, two levels of caches are normally used:
separate L1 caches for instructions and data, and a larger L2 cache.
• These caches are often implemented on the processor chip.
• The L1 caches must be very fast, as they determine the memory access time
seen by the processor.
• The L2 cache can be slower, but it should be much larger than the L1 caches to
ensure a high hit rate.
• Its speed is less critical because it only affects the miss penalty of the L1 caches.
• A typical computer may have L1 caches with capacities of tens of kilobytes and
an L2 cache of hundreds of kilobytes or possibly several megabytes.
• Thus, the average access time experienced by the processor in such a system is:
tavg = h1C1 + (1 − h1)(h2C2 + (1 − h2)M )

where
h1 is the hit rate in the L1 caches,
h2 is the hit rate in the L2 cache,
C1 is the time to access information in the L1 caches,
C2 is the miss penalty to transfer information from the L2 cache to
an L1 cache, and
M is the miss penalty for accessing the main memory.
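A worked example of the two-level formula above, with assumed values (h1 = 0.96, h2 = 0.90, C1 = 1 cycle, C2 = 10 cycles, M = 100 cycles):

def avg_access_time_two_level(h1, h2, C1, C2, M):
    return h1 * C1 + (1 - h1) * (h2 * C2 + (1 - h2) * M)

print(avg_access_time_two_level(0.96, 0.90, 1, 10, 100))   # 1.72 cycles

Even with a modest L2 hit rate, the L2 cache greatly reduces how often the full main-memory penalty M is paid.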
OTHER ENHANCEMENTS:
• Several other possibilities exist for enhancing performance; three are described
below: the write buffer, prefetching, and the lockup-free cache.

Write Buffer:
• When the write-through protocol is used, each Write operation results in writing
a new value into the main memory.
• If the processor must wait for the memory function to be completed then the
processor is slowed down by all Write requests.
• To improve performance, a Write buffer can be included for temporary storage
of Write requests.
• The processor places each Write request into this buffer and continues
execution of the next instruction.
• The Write requests stored in the Write buffer are sent to the main memory
whenever the memory is not responding to Read requests.
• It is important that the Read requests be serviced quickly, because the processor
usually cannot proceed before receiving the data being read from the memory.
Hence, these requests are given priority over Write requests.
• The Write buffer may hold a number of Write requests. Thus, it is possible that
a subsequent Read request may refer to data that are still in the Write buffer.
• To ensure correct operation, the addresses of data to be read from the memory
are always compared with the addresses of the data in the Write buffer. In the
case of a match, the data in the Write buffer are used.
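The address comparison described above can be sketched as follows (illustrative Python, with the buffer modeled as a simple list of pending writes):

write_buffer = []                # pending (address, data) pairs, oldest first

def buffered_write(address, data):
    write_buffer.append((address, data))   # processor continues without waiting

def read(address, main_memory):
    # Search newest-to-oldest so the most recent pending write to the address wins.
    for addr, data in reversed(write_buffer):
        if addr == address:
            return data          # match: use the data still in the Write buffer
    return main_memory[address]  # no match: read from the main memory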

Prefetching:
• In the normal cache mechanism, new data are brought into the cache when they
are first needed. Following a Read miss, the processor has to pause until the new
data arrive, thus incurring a miss penalty.
• To avoid stalling the processor, it is possible to prefetch the data into the cache
before they are needed.
• The simplest way to do this is through software. A special prefetch instruction
may be provided in the instruction set of the processor.
• Executing this instruction causes the addressed data to be loaded into the cache,
as in the case of a Read miss.
• A prefetch instruction is inserted in a program to cause the data to be loaded in
the cache shortly before they are needed in the program.
• Then, the processor will not have to wait for the referenced data as in the case
of a Read miss.
• The hope is that prefetching will take place while the processor is busy
executing instructions that do not result in a Read miss, thus allowing accesses
to the main memory to be overlapped with computation in the processor.
• Prefetch instructions can be inserted into a program either by the programmer
or by the compiler.

Lockup-Free Cache:
• Software prefetching does not work well if it interferes significantly with the
normal execution of instructions. This is the case if the action of prefetching
stops other accesses to the cache until the prefetch is completed.
• While servicing a miss, the cache is said to be locked. This problem can be
solved by modifying the basic cache structure to allow the processor to access
the cache while a miss is being serviced.
• It is possible to have more than one outstanding miss, and the hardware must
accommodate such occurrences.
• A cache that can support multiple outstanding misses is called lockup-free. Such
a cache must include circuitry that keeps track of all outstanding misses.
• This may be done with special registers that hold the pertinent information
about these misses.

5. VIRTUAL MEMORY AND TLBs


• In most modern computer systems, the physical main memory is not as large as
the address space of the processor.
• If a program does not completely fit into the main memory, the parts of it not
currently being executed are stored on a secondary storage device, typically a
magnetic disk.
• As these parts are needed for execution, they must first be brought into the main
memory, possibly replacing other parts that are already in the memory.
• These actions are performed automatically by the operating system, using a
scheme known as virtual memory.
• Application programmers need not be aware of the limitations imposed by the
available main memory. They prepare programs using the entire address space
of the processor.
• The binary addresses that the processor issues for either instructions or data are
called virtual or logical addresses.
• These addresses are translated into physical addresses by a combination of
hardware and software actions.
• If a virtual address refers to a part of the program or data space that is currently
in the physical memory, then the contents of the appropriate location in the main
memory are accessed immediately.
• Otherwise, the contents of the referenced address must be brought into a
suitable location in the memory before they can be used.
• A special hardware unit, called the Memory Management Unit (MMU), keeps
track of which parts of the virtual address space are in the physical memory.
• When the desired data or instructions are in the main memory, the MMU
translates the virtual address into the corresponding physical address

Virtual memory organization


ADDRESS TRANSLATION
• A simple method for translating virtual addresses into physical addresses is to
assume that all programs and data are composed of fixed-length units called
pages, each of which consists of a block of words that occupy contiguous
locations in the main memory.
• They constitute the basic unit of information that is transferred between the
main memory and the disk whenever the MMU determines that a transfer is
required.
• Pages should not be too small, because the access time of a magnetic disk is
much longer than the access time of the main memory.
• If pages are too large, it is possible that a substantial portion of a page may not
be used, yet this unnecessary data will occupy valuable space in the main
memory.
• The virtual-memory mechanism bridges the size and speed gaps between the
main memory and secondary storage and is usually implemented in part by
software techniques.
• The virtual address generated for an instruction fetch or an operand load/store
operation is interpreted as a virtual page number followed by an offset that
specifies the location of a particular byte (or word) within a page.
• Information about the main memory location of each page is kept in a page
table.
• This information includes the main memory address where the page is stored
and the current status of the page.
• An area in the main memory that can hold one page is called a page frame.
• The starting address of the page table is kept in a page table base register.
• By adding the virtual page number to the contents of this register, the address
of the corresponding entry in the page table is obtained. The contents of this
location give the starting address of the page if that page currently resides in the
main memory.
• Each entry in the page table also includes some control bits that describe the
status of the page while it is in the main memory
Virtual-memory address translation
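The translation steps can be summarized in a short sketch (illustrative Python, assuming 4 KB pages, i.e. a 12-bit offset, and a page table indexed by virtual page number; the particular entries are invented for the example):

PAGE_OFFSET_BITS = 12
PAGE_SIZE = 1 << PAGE_OFFSET_BITS

# page_table[virtual page number] = (valid bit, page frame number)
page_table = {0: (1, 7), 1: (1, 3), 2: (0, None)}   # page 2 currently on disk

def translate(virtual_address):
    vpn    = virtual_address >> PAGE_OFFSET_BITS    # virtual page number
    offset = virtual_address & (PAGE_SIZE - 1)      # byte within the page
    valid, frame = page_table[vpn]
    if not valid:
        raise RuntimeError("page fault: the OS must bring the page from disk")
    return (frame << PAGE_OFFSET_BITS) | offset     # physical address

print(hex(translate(0x1A24)))   # virtual page 1 -> frame 3, giving 0x3A24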

Translation Lookaside Buffer


• The page table information is used by the MMU for every read and write access.
• Ideally, the page table should be situated within the MMU. Unfortunately, the
page table may be rather large. Since the MMU is normally implemented as part
of the processor chip, it is impossible to include the complete table within the
MMU.
• Instead, a copy of only a small portion of the table is accommodated within the
MMU, and the complete table is kept in the main memory.
• The portion maintained within the MMU consists of the entries corresponding
to the most recently accessed pages.
• They are stored in a small table, usually called the Translation Lookaside Buffer
(TLB).
• The TLB functions as a cache for the page table in the main memory
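Building on the translation sketch above, the TLB's role as a cache for the page table can be illustrated as follows (again Python, reusing the same assumed names):

tlb = {}   # virtual page number -> page frame number; a small subset of the page table

def translate_with_tlb(virtual_address):
    vpn    = virtual_address >> PAGE_OFFSET_BITS
    offset = virtual_address & (PAGE_SIZE - 1)
    if vpn not in tlb:                      # TLB miss: consult the full page table
        valid, frame = page_table[vpn]
        if not valid:
            raise RuntimeError("page fault")
        tlb[vpn] = frame                    # keep the entry for subsequent accesses
    return (tlb[vpn] << PAGE_OFFSET_BITS) | offset   # the TLB hit path is fast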

Page Faults
• When a program generates an access request to a page that is not in the main
memory, a page fault is said to have occurred.
• The entire page must be brought from the disk into the memory before access
can proceed.
• When it detects a page fault, the MMU asks the operating system to intervene
by raising an exception (interrupt).
• Processing of the program that generated the page fault is interrupted, and
control is transferred to the operating system.
• The operating system copies the requested page from the disk into the main
memory.
• Since this process involves a long delay, the operating system may begin
execution of another program whose pages are in the main memory.
• When page transfer is completed, the execution of the interrupted program is
resumed.
6.Accessing I/O Devices
• The components of a computer system communicate with each other
through an interconnection network
• The interconnection network consists of circuits needed to transfer
information between the processor, the memory unit, and a number of
I/O devices.
• Load and Store instructions use addressing modes to generate effective
addresses that identify the desired locations.
• The idea of using addresses to access various locations in the memory can
be extended to deal with the I/O devices.
• For this purpose, each I/O device must appear to the processor as consisting
of some addressable locations, just like the memory.
• Some addresses in the address space of the processor are assigned to
these I/O locations, rather than to the main memory.
• These locations are usually implemented as bit storage circuits
organized in the form of registers called I/O registers.
• Since the I/O devices and the memory share the same address space, this
arrangement is called memory-mapped I/O.
• With memory-mapped I/O, any machine instruction that can access
memory can be used to transfer data to or from an I/O device.

A computer system
• For example, if DATAIN is the address of a register in an input device,
the instruction
Load R2, DATAIN
reads the data from the DATAIN register and loads them into processor
register R2. Similarly, the instruction
Store R2, DATAOUT
sends the contents of register R2 to location DATAOUT, which is a
register in an output device.

I/O Device Interface:


• An I/O device is connected to the interconnection network by using a
circuit, called the device interface, which provides the means for data
transfer.
• The interface includes some registers that can be accessed by the
processor.
• One register may serve as a buffer for data transfers, another may hold
information about the current status of the device, and yet another may
store the information that controls the operational behavior of the
device.
• These data, status, and control registers are accessed by program
instructions as if they were memory locations.
• Typical transfers of information are between I/O registers and the
registers in the processor.

Program-Controlled I/O:
• Consider a task that reads characters typed on a keyboard, stores these
data in the memory, and displays the same characters on a display
screen.
• A simple way of implementing this task is to write a program that
performs all functions needed to realize the desired action. This method
is known as program-controlled I/O.
• In addition to transferring each character from the keyboard into the
memory, and then to the display, it is necessary to ensure that this
happens at the right time. An input character must be read in response
to a key being pressed. For output, a character must be sent to the display
only when the display device is able to accept it.
• The rate of data transfer from the keyboard to a computer is limited by
the typing speed of the user, which is unlikely to exceed a few characters
per second. The rate of output transfers from the computer to the display
is much higher.
• It is determined by the rate at which characters can be transmitted to and
displayed on the display device
• The difference in speed between the processor and I/O devices creates
the need for mechanisms to synchronize the transfer of data between
them.

• One solution to this problem involves a signaling protocol.


• Let KBD_DATA be the address label of an 8-bit register that holds the
generated character. Also, let a signal indicating that a key has been
pressed be provided by setting to 1 a flip-flop called KIN, which is part
of an 8-bit status register, KBD_STATUS.
• The processor can read the status flag KIN to determine when a character
code has been placed in KBD_DATA. When the processor reads the
status flag to determine its state, we say that the processor polls the I/O
device.
• The display includes an 8-bit register, which we will call DISP_DATA,
used to receive characters from the processor. It must also be able to
indicate that it is ready to receive the next character; this can be done by
using a status flag called DOUT, which is one bit in a status register,
DISP_STATUS.
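The polling protocol described above can be sketched as follows (illustrative Python; the register-access functions stand in for memory-mapped reads and writes of KBD_STATUS, KBD_DATA, DISP_STATUS, and DISP_DATA, and the flag bit positions are assumed for the example):

KIN  = 0x01   # assumed bit position of the KIN flag in KBD_STATUS
DOUT = 0x01   # assumed bit position of the DOUT flag in DISP_STATUS

def echo_characters(read_kbd_status, read_kbd_data,
                    read_disp_status, write_disp_data):
    while True:
        while (read_kbd_status() & KIN) == 0:    # poll until a key has been pressed
            pass
        char = read_kbd_data()                   # reading KBD_DATA clears KIN
        while (read_disp_status() & DOUT) == 0:  # poll until the display is ready
            pass
        write_disp_data(char)                    # writing DISP_DATA clears DOUT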

7.Interrupts:
• With program-controlled I/O, the program enters a wait loop in which it
repeatedly tests the device status.
• During this period, the processor is not performing any useful
computation.
• There are many situations where other tasks can be performed while
waiting for an I/O device to become ready.
• To allow this to happen, the I/O device should alert the processor when it
becomes ready.
• It can do so by sending a hardware signal called an interrupt request to
the processor.
• Since the processor is no longer required to continuously poll the
status of I/O devices, it can use the waiting period to perform other
useful tasks.
• The routine executed in response to an interrupt request is called the
interrupt-service routine (for example, a DISPLAY routine that sends
characters to the display).
• The return address must be saved either in a designated general-
purpose register or on the processor stack.
• The processor must inform the device that its request has been
recognized so that the device may remove its interrupt-request signal.
• This can be accomplished by means of a special control signal, called
interrupt acknowledge.
• The treatment of an interrupt-service routine is very similar to that of a
subroutine.
• A subroutine performs a function required by the program from which
it is called.
• The task of saving and restoring information can be done
automatically by the processor or by program instructions.
• Most modern processors save only the minimum amount of
information needed to maintain the integrity of program execution.
• This is because the process of saving and restoring registers involves
memory transfers that increase the total execution time, and hence
represent execution overhead. Saving registers also increases the delay
between the time an interrupt request is received and the start of
execution of the interrupt-service routine. This delay is called interrupt
latency.
• Some computers provide two types of interrupts: one saves all register
contents, and the other does not. A particular I/O device may use either
type, depending upon its response-time requirements.
• Another approach is to provide duplicate sets of processor registers.
• A different set of registers can then be used by the interrupt-service
routine, thus eliminating the need to save and restore registers. The
duplicate registers are sometimes called the shadow registers.

Enabling and Disabling Interrupts:


• The facilities provided in a computer must give the programmer
complete control over the events that take place during program
execution.
• The arrival of an interrupt request from an external device causes the
processor to suspend the execution of one program and start the
execution of another.
• Because interrupts can arrive at any time, they may alter the sequence
of events from that envisaged by the programmer.
• Hence, the interruption of program execution must be carefully
controlled. A fundamental facility found in all computers is the ability
to enable and disable such interruptions as desired.
• A commonly used mechanism to achieve this is to use some control bits
in registers that can be accessed by program instructions.
• The processor has a status register (PS), which contains information
about its current state of operation, including an interrupt-enable bit, IE.
The typical sequence of events in handling an interrupt request is:
1. The device raises an interrupt request.
2. The processor interrupts the program currently being executed
and saves the contents of the PC and PS registers.
3. Interrupts are disabled by clearing the IE bit in the PS to 0.
4. The action requested by the interrupt is performed by the
interrupt-service routine, during which time the device is
informed that its request has been recognized, and in response, it
deactivates the interrupt-request signal.
5. Upon completion of the interrupt-service routine, the saved
contents of the PC and PS registers are restored and execution of
the interrupted program is resumed.

Handling Multiple Devices:


• Consider the situation where a number of devices capable of initiating
interrupts are connected to the processor.
• When an interrupt request is received it is necessary to identify the
particular device that raised the request.
• If two devices raise interrupt requests at the same time, it must be
possible to break the tie and select one of the two requests for service.
• When the interrupt-service routine for the selected device has been
completed, the second request can be serviced.
• The information needed to determine whether a device is requesting an
interrupt is available in its status register.
• When the device raises an interrupt request, it sets to 1 a bit in its status
register

Vectored Interrupts:
• To reduce the time involved in the polling process, a device requesting
an interrupt may identify itself directly to the processor.
• Then, the processor can immediately start executing the corresponding
interrupt-service routine.
• The term vectored interrupts refers to interrupt-handling schemes based
on this approach.
• A commonly used scheme is to allocate permanently an area in the
memory to hold the addresses of interrupt-service routines.
• These addresses are usually referred to as interrupt vectors, and they are
said to constitute the interrupt-vector table
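Dispatch through an interrupt-vector table can be sketched as follows (illustrative Python; the vector numbers and routines are invented for the example):

def keyboard_isr():
    print("servicing the keyboard")

def disk_isr():
    print("servicing the disk")

# Permanently allocated table of interrupt-service routine addresses.
interrupt_vector_table = {4: keyboard_isr, 5: disk_isr}

def handle_interrupt(vector):
    # The device supplies its vector, so no polling is needed to identify it.
    interrupt_vector_table[vector]()

handle_interrupt(4)   # the keyboard identified itself with vector 4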

Controlling I/O Device Behavior:


• It is important to ensure that interrupt requests are generated only by
those I/O devices that the processor is currently willing to recognize.
• Hence, a mechanism is needed in the interface circuits of individual
devices to control whether a device is allowed to interrupt the processor.
• The control needed is usually provided in the form of an interrupt-enable
bit in the device’s interface circuit.
Processor Control Registers:
• To deal with interrupts it is useful to have some control registers.
• There are four processor control registers. The status register, PS,
includes the interrupt-enable bit, IE.
• The processor will accept interrupts only when this bit is set to 1. The
IPS register is used to automatically save the contents of PS when an
interrupt request is received and accepted.
• At the end of the interrupt-service routine, the previous state of the
processor is automatically restored by transferring the contents of IPS
into PS.
• Since there is only one register available for storing the previous status
information, it becomes necessary to save the contents of IPS on the
stack if nested interrupts are allowed.
• The IENABLE register allows the processor to selectively respond to
individual I/O devices.
• The IPENDING register indicates the active interrupt requests
Exceptions:
• An interrupt is an event that causes the execution of one program to be
suspended and the execution of another program to begin.
• The term exception is often used to refer to any event that causes an
interruption.

Recovery from Errors:


• Computers use a variety of techniques to ensure that all hardware
components are operating properly.
• Many computers include an error-checking code in the main memory,
which allows detection of errors in the stored data.
• If an error occurs, the control hardware detects it and informs the
processor by raising an interrupt.
• The processor may also interrupt a program if it detects an error or an
unusual condition while executing the instructions of this program.

Debugging:

• System software usually includes a program called a debugger, which
helps the programmer find errors in a program.
• The debugger uses exceptions to provide two important facilities: trace
mode and breakpoints

Use of Exceptions in Operating Systems:


• The operating system (OS) software coordinates the activities within a
computer.
• It uses exceptions to communicate with and control the execution of user
programs.
• It uses hardware interrupts to perform I/O operations
8.DIRECT MEMORY ACCESS:
• Blocks of data are often transferred between the main memory and I/O devices
• Data are transferred from an I/O device to the memory by first reading them
from the I/O device using an instruction such as
Load R2, DATAIN
which loads the data into a processor register.
• The data read are stored into a memory location.
• The reverse process takes place for transferring data from the memory to an
I/O device.
• An instruction to transfer input or output data is executed only after the
processor determines that the I/O device is ready, either by polling its status
register or by waiting for an interrupt request.
• When transferring a block of data, instructions are needed to increment the
memory address and keep track of the word count.
• The use of interrupts involves operating system routines, which incur additional
overhead to save and restore processor registers and the program counter.
• An alternative approach is used to transfer blocks of data directly between the
main memory and I/O devices
• A special control unit is provided to manage the transfer, without continuous
intervention by the processor. This approach is called direct memory access, or
DMA.
• The unit that controls DMA transfers is referred to as a DMA controller. It may
be part of the I/O device interface, or it may be a separate unit shared by a
number of I/O devices.
• The DMA controller performs the functions that would normally be carried out
by the processor when accessing the main memory.
• Although a DMA controller transfers data without intervention by the
processor, its operation must be under the control of a program executed by the
processor, usually an operating system routine.
• The DMA controller then proceeds to perform the requested operation. When
the entire block has been transferred, it informs the processor by raising an
interrupt.
• In the DMA controller, two registers are used for storing the starting address
and the word count. A third register contains status and control flags.
• The R/W bit determines the direction of the transfer. When this bit is set to 1
by a program instruction, the controller performs a Read operation, that is, it
transfers data from the memory to the I/O device. Otherwise, it performs a
Write operation.
• To start a DMA transfer of a block of data from the main memory to one
of the disks, an OS routine writes the address and word count information
into the registers of the disk controller.
• The DMA controller proceeds independently to implement the specified
operation.
• When the transfer is completed, this fact is recorded in the status and
control register of the DMA channel by setting the Done bit.
• At the same time, if the IE bit is set, the controller sends an interrupt
request to the processor and sets the IRQ bit.
• The status register may also be used to record other information, such as
whether the transfer took place correctly or errors occurred
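The register usage described above can be summarized in a sketch (illustrative Python; the bit positions in the status and control register are assumed for the example):

R_W, DONE, IE, IRQ = 1 << 0, 1 << 1, 1 << 2, 1 << 3   # assumed flag positions

class DMAController:
    def __init__(self):
        self.starting_address = 0
        self.word_count = 0
        self.status_control = 0

    def start_transfer(self, address, count, read, enable_interrupt):
        # An OS routine writes the address and word-count information ...
        self.starting_address = address
        self.word_count = count
        self.status_control = (R_W if read else 0) | (IE if enable_interrupt else 0)

    def transfer_complete(self):
        # ... and the controller sets Done, plus IRQ if interrupts are enabled.
        self.status_control |= DONE
        if self.status_control & IE:
            self.status_control |= IRQ   # interrupt request raised to the processor

dma = DMAController()
dma.start_transfer(address=0x4000, count=128, read=True, enable_interrupt=True)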
9.BUS STRUCTURE:
• The bus consists of three sets of lines used to carry address, data,
and control signals. I/O device interfaces are connected to these
lines
• Each I/O device is assigned a unique set of addresses for the
registers in its interface.
• When the processor places a particular address on the address
lines, it is examined by the address decoders of all devices on the
bus
• The device that recognizes this address responds to the
commands issued on the control lines.
• The processor uses the control lines to request either a Read
or a Write operation, and the requested data are transferred
over the data lines.

10.BUS OPERATION:
• A bus requires a set of rules, often called a bus protocol
• The bus protocol determines when a device may place information on the
bus, when it may load the data on the bus into one of its registers
• These rules are implemented by control signals
• One control line, usually labelled R/W, specifies whether a Read or a
Write operation is to be performed.
• It specifies Read when set to 1 and Write when set to 0.
• The control lines also carry timing information. They specify the times
at which the processor and the I/O devices may place data on or receive
data from the data lines.
• In any data transfer operation, one device plays the role of a master. This is the
device that initiates data transfers by issuing Read or Write commands on the
bus.
• The device addressed by the master is referred to as a slave.

SYNCHRONOUS BUS:
• On a synchronous bus, all devices derive timing information from a
control line called the bus clock
• The signal on this line has two phases: a high level followed by a low
level. The two phases constitute a clock cycle.
• The first half of the cycle between the low-to-high and high-to-low
transitions is often referred to as a clock pulse.

• During clock cycle 1, the master sends address and command information
on the bus, requesting a Read operation.
• The slave receives this information and decodes it. It begins to access the
requested data on the active edge of the clock at the beginning of clock
cycle 2.
• The data become ready and are placed on the bus during clock cycle 3.
• The slave asserts a control signal called Slave-ready at the same time.
• The master, which has been waiting for this signal, loads the data into its
register at the end of the clock cycle.
• The slave removes its data signals from the bus and returns its Slave-ready
signal to the low level at the end of cycle 3.
• The bus transfer operation is now complete, and the master may send new
address and command signals to start a new transfer in clock cycle 4.

ASYNCHRONOUS BUS:

• An alternative scheme for controlling data transfers on a bus is based on


the use of a handshake protocol between the master and the slave.
• A handshake is an exchange of command and response signals between
the master and the slave.
• A control line called Master-ready is asserted by the master to indicate
that it is ready to start a data transfer.
• The Slave responds by asserting Slave-ready.
• t0—The master places the address and command information on the bus,
and all devices on the bus decode this information.
• t1—The master sets the Master-ready line to 1 to inform the devices that
the address and command information is ready. The delay t1 − t0 is
intended to allow for any skew that may occur on the bus. Skew occurs
when two signals transmitted simultaneously from one source arrive at the
destination at different times.
• t2—The selected slave, having decoded the address and command
information, performs the required input operation by placing its data on
the data lines. At the same time, it sets the Slave-ready signal to 1.
• t3—The Slave-ready signal arrives at the master, indicating that the input
data are available on the bus.
• t4—The master removes the address and command information from the
bus.
• t5—When the device interface receives the 1-to-0 transition of the
Master-ready signal, it removes the data and the Slave-ready signal from
the bus. This completes the input transfer.
11.ARBITRATION:
• There are occasions when two or more entities contend for the use of a
single resource in a computer system.
• For example, two devices may need to access a given slave at the same time.
In such cases, it is necessary to decide which device will access the slave
first.
• The decision is usually made in an arbitration process performed by an
arbiter circuit.
• The arbitration process starts by each device sending a request to use the
shared resource.
• The arbiter associates priorities with individual requests. If it receives
two requests at the same time, it grants the use of the slave to the device
having the higher priority first.
• In the arbitration process, a single bus is the shared resource. The device
that initiates data transfer requests on the bus is the bus master.
• Since the bus is a single shared facility, it is essential to provide orderly
access to it by the bus masters.
• A device that wishes to use the bus sends a request to the arbiter.
• When multiple requests arrive at the same time, the arbiter selects one
request and grants the bus to the corresponding device. For some
devices, a delay in gaining access to the bus may lead to an error. Such
devices must be given high priority.
• If there is no particular urgency among requests, the arbiter may grant
the bus using a simple round-robin scheme.
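The round-robin scheme can be sketched as follows (illustrative Python): the arbiter grants the bus to the first requesting device found after the one most recently served, so that no device is starved.

def round_robin_grant(requests, num_devices, last_granted):
    # requests: set of device numbers currently requesting the bus.
    for i in range(1, num_devices + 1):
        candidate = (last_granted + i) % num_devices
        if candidate in requests:
            return candidate
    return None   # no requests pending

print(round_robin_grant({0, 2, 3}, num_devices=4, last_granted=2))   # grants device 3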

12.INTERFACE CIRCUITS:
• The I/O interface of a device consists of the circuitry needed to connect
that device to the bus.
• On one side of the interface are the bus lines for address, data, and control.
• On the other side are the connections needed to transfer data between the
interface and the I/O device. This side is called a port, and it can be either
a parallel or a serial port.
• A parallel port transfers multiple bits of data simultaneously to or from
the device.
• A serial port sends and receives data one bit at a time.
• An I/O interface does the following:
1. Provides a register for temporary storage of data
2. Includes a status register containing status information that can be
accessed by the processor
3. Includes a control register that holds the information governing the
behavior of the interface
4. Contains address-decoding circuitry to determine when it is being
addressed by the processor
5. Generates the required timing signals
6. Performs any format conversion that may be necessary to transfer
data between the processor and the I/O device, such as parallel-to-
serial conversion in the case of a serial port

PARALLEL INTERFACE
• Consider an interface circuit for an 8-bit input port that can be used for
connecting a simple input device, such as a keyboard, and an interface
circuit for an 8-bit output port, which can be used with an output device
such as a display.
• These interface circuits are connected to a 32-bit processor that uses
memory-mapped I/O and the asynchronous bus protocol.

Input Interface
• There are only two registers: a data register, KBD_DATA, and a status
register, KBD_STATUS. The latter contains the keyboard status flag,
KIN.
• A typical keyboard consists of mechanical switches that are normally
open. When a key is pressed, its switch closes and establishes a path for
an electrical signal
• A difficulty with such mechanical pushbutton switches is that the
contacts bounce when a key is pressed, resulting in the electrical
connection being made then broken several times before the switch
settles in the closed position
• The software detects that a key has been pressed when it observes that
the keyboard status flag, KIN, has been set to 1.
• The I/O routine can then introduce sufficient delay before reading the
contents of the input buffer, KBD_DATA, to ensure that bouncing has
subsided.
• When debouncing is implemented in hardware, the I/O routine can read
the input character as soon as it detects that KIN is equal to 1.
• The encoded keyboard output consists of one byte of data representing the
encoded character and one control signal called Valid. When a key is
pressed, the Valid signal changes from 0 to 1.
• The status flag is cleared to 0 when the processor reads the contents of
the KBD_DATA register.
• The interface circuit is connected to an asynchronous bus on which
transfers are controlled by the handshake signals Master-ready and
Slave-ready
• The bus has one other control line, R/W, which indicates a Read
operation when equal to 1.

Output Interface
• The output interface can be used to connect an output device such as a display.
• the display uses two handshake signals, New-data and Ready, in a
manner similar to the handshake between the bus signals Master-ready
and Slave-ready.
• When the display is ready to accept a character, it asserts its Ready
signal, which causes the DOUT flag in the DISP_STATUS register to be
set to 1.
• When the I/O routine checks DOUT and finds it equal to 1, it sends a
character to DISP_DATA.
• This clears the DOUT flag to 0 and sets the New-data signal to 1. In
response, the display returns Ready to 0 and accepts and displays the
character in DISP_DATA

SERIAL INTERFACE
• A serial interface is used to connect the processor to I/O devices that
transmit data one bit at a time.
• Data are transferred in a bit-serial fashion on the device side and in a bit-
parallel fashion on the processor side.
• The transformation between the parallel and serial formats is achieved
with shift registers that have parallel access capability.
• The input shift register accepts bit-serial input from the I/O device.
• When all 8 bits of data have been received, the contents of this shift
register are loaded in parallel into the DATAIN register.
• Output data in the DATAOUT register are transferred to the output shift
register, from which the bits are shifted out and sent to the I/O device.
• The part of the interface that deals with the bus is the same as in the
parallel interface described earlier.
• Two status flags, which we will refer to as SIN and SOUT, are
maintained by the Status and control block.
• The SIN flag is set to 1 when new data are loaded into DATAIN from
the shift register, and cleared to 0 when these data are read by the
processor.
• The SOUT flag indicates whether the DATAOUT register is available. It
is cleared to 0 when the processor writes new data into DATAOUT and
set to 1 when data are transferred from DATAOUT to the output shift
register.
• The double buffering used in the input and output paths is important. It
would be possible to implement DATAIN and DATAOUT themselves as
shift registers, thus obviating the need for separate shift registers, but
double buffering allows a new character to be shifted in or out while the
processor transfers the previous one.
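The serial-to-parallel input path can be sketched as follows (illustrative Python modeling the input shift register, DATAIN, and the SIN flag):

class SerialInput:
    def __init__(self):
        self.shift_register = 0
        self.bits_received = 0
        self.DATAIN = None
        self.SIN = 0                          # set when new data are loaded into DATAIN

    def receive_bit(self, bit):
        self.shift_register = ((self.shift_register << 1) | bit) & 0xFF
        self.bits_received += 1
        if self.bits_received == 8:           # all 8 bits received:
            self.DATAIN = self.shift_register # parallel load into DATAIN
            self.SIN = 1
            self.bits_received = 0

rx = SerialInput()
for bit in [0, 1, 0, 0, 0, 0, 0, 1]:          # the character 'A' (0x41), MSB first
    rx.receive_bit(bit)
print(chr(rx.DATAIN), rx.SIN)                 # prints: A 1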

13.UNIVERSAL SERIAL BUS:


• The Universal Serial Bus (USB) is the most widely used interconnection
standard.
• A large variety of devices are available with a USB connector, including
mice, memory keys, disk drives, printers, and cameras
• The commercial success of the USB is due to its simplicity and low cost.
• The original USB specification supports two speeds of operation, called
low-speed (1.5 Megabits/s) and full-speed (12 Megabits/s). Later versions
defined high-speed USB (480 Megabits/s), and USB 3.0 supports data
transfer rates up to 5 Gigabits/s.
• The USB has been designed to meet several key objectives:
o Provide a simple, low-cost, and easy to use interconnection system
o Accommodate a wide range of I/O devices and bit rates, including
Internet connections, and audio and video applications
o Enhance user convenience through a “plug-and-play” mode of
operation

DEVICE CHARACTERISTICS:
• The kinds of devices that may be connected to a computer cover a wide
range of functionality.
• The speed, volume, and timing constraints associated with data transfers
to and from these devices vary significantly.
• For example, a keyboard generates one byte of data every time a key is
pressed, which may happen at any time.
• These data should be transferred to the computer promptly.
• Since the event of pressing a key is not synchronized to any other event in
a computer system, the data generated by the keyboard are called
asynchronous.
• In contrast, when a signal such as voice is sampled at regular intervals, the
sampling process yields a continuous stream of digitized samples that
arrive at regular intervals, synchronized with the sampling clock.
• Such a data stream is called isochronous, meaning that successive events
are separated by equal periods of time.
• A signal must be sampled quickly enough to track its highest-frequency
components.

Plug-and-Play
• When an I/O device is connected to a computer, the operating system
needs some information about it. It needs to know what type of device it
is so that it can use the appropriate device driver.
• It also needs to know the addresses of the registers in the device’s
interface to be able to communicate with it.
• The USB standard defines both the USB hardware and the software that
communicates with it. Its plug-and-play feature means that when a new
device is connected, the system detects its existence automatically

USB Architecture
• The USB uses point-to-point connections and a serial transmission
format.
• When multiple devices are connected, they are arranged in a tree
structure
• Each node of the tree has a device called a hub, which acts as an
intermediate transfer point between the host computer and the I/O
devices.
• At the root of the tree, a root hub connects the entire tree to the host
computer.
• The leaves of the tree are the I/O devices: a mouse, a keyboard, a printer,
an Internet connection, a camera, or a speaker.
• The tree structure makes it possible to connect many devices using
simple point-to-point serial links

Electrical Characteristics
• USB connections consist of four wires, of which two carry power, +5 V
and Ground, and two carry data.
• I/O devices that do not have large power requirements can be powered
directly from the USB.
• Two methods are used to send data over a USB cable.
• When sending data at low speed, a high voltage relative to Ground is
transmitted on one of the two data wires to represent a 0 and on the other
to represent a 1.
• The Ground wire carries the return current in both cases. Such a scheme
in which a signal is injected on a wire relative to ground is referred to as
single-ended transmission.
• The speed at which data can be sent on any cable is limited by the amount
of electrical noise present.
• The term noise refers to any signal that interferes with the desired data
signal and hence could cause errors.
• Single-ended transmission is highly susceptible to noise.
• The voltage on the ground wire is common to all the devices connected to
the computer.
• Signals sent by one device can cause small variations in the voltage on the
ground wire, and hence can interfere with signals sent by another device.
• Interference can also be caused by one wire picking up noise from nearby
wires.
• The High-Speed USB uses an alternative arrangement known as
differential signaling.
• The data signal is injected between two data wires twisted together.
