CA Unit 5
1. MEMORY HIERARCHY:
• An ideal memory would be fast, large, and inexpensive.
• A very fast memory can be implemented using static RAM chips. However, these
chips are not suitable for implementing large memories, because their basic cells
are larger and consume more power than dynamic RAM cells.
• Dynamic memory can be implemented at an affordable cost, but even it is small
compared to the demands of large programs.
• A solution is provided by secondary storage, mainly magnetic disks, which
supply the required additional memory space.
• Disks are available at a reasonable cost and are used extensively in computer
systems.
• A large yet affordable main memory can be built using dynamic RAM
technology.
• Static RAM technology is then used in smaller units where speed is critical.
• Different types of memory units are deployed in the computer system.
• The entire computer memory can be viewed as a hierarchy
• The fastest access is to data held in processor registers.
• The processor registers are at the top in terms of speed of access.
• The next level of the hierarchy is a relatively small amount of memory that can
be implemented directly on the processor chip, called the PROCESSOR CACHE.
• The processor cache holds copies of the instructions and data stored in the much
larger main memory.
• There are often two or more levels of cache.
• A primary cache is always located on the processor chip and is referred to as
the LEVEL 1 (L1) cache. This cache is small, and its access time is comparable
to that of processor registers.
• A larger and slower secondary cache is placed between the primary cache
and the rest of the memory; it is referred to as the LEVEL 2 (L2) cache.
• Some computers have a LEVEL 3 (L3) cache of even larger size. The L3 cache
is also implemented using SRAM technology.
• The next level of hierarchy is the main memory. It is implemented using
dynamic memory components.
• Main memory is larger and slower than cache memories.
• Disk devices provide a very large amount of inexpensive memory and are widely
used as secondary storage devices. They are very slow compared to the main
memory and sit at the bottom of the hierarchy.
• During program execution, the speed of memory access is of utmost
importance.
2. MEMORY TECHNOLOGIES:
• Semiconductor random-access memories (RAMs) are available in a wide
range of speeds.
EPROM:
• An EPROM allows the stored data to be erased and new data to be written into
it; hence the name erasable, reprogrammable ROM (EPROM).
• It provides a higher level of convenience and is capable of retaining
information for a long time.
• In an EPROM cell, the connection to ground at point P is made through a
special transistor.
• The transistor is normally turned off, creating an open switch.
EEPROM:
• A PROM that can be programmed, erased, and reprogrammed electrically is
called an electrically erasable PROM (EEPROM).
• It is possible to erase selected contents.
• Its disadvantage is that different voltages are required for erasing, reading,
and writing data.
FLASH MEMORY:
• Flash memory is a type of electrically erasable programmable read-only
memory (EEPROM).
• Wear levelling is a process designed to extend the life of solid-state
storage devices.
• Solid state storage is made up of microchips that store data in blocks.
• Each block can tolerate a finite number of program/erase cycles before
becoming unreliable.
3. CACHE MEMORY:
• The cache is a small and very fast memory placed between the processor and the
main memory.
• Its purpose is to make the main memory appear faster to the processor.
• The effectiveness of this approach is based on locality of reference.
• Locality of reference has two aspects: temporal and spatial.
• The temporal aspect means that a recently executed instruction is likely to be
executed again very soon.
• The spatial aspect means that instructions close to a recently executed
instruction are also likely to be executed soon.
• Temporal locality suggests that whenever an item is first needed, it should be
brought into the cache, because it is likely to be needed again soon.
• Spatial locality suggests that, instead of bringing just one item at a time from
the main memory to the cache, it is useful to fetch items at adjacent addresses
as well.
• A cache block refers to a set of contiguous address locations of some size.
• A cache block is also called a cache line.
• When the processor issues a Read request, the block of memory words
containing the specified location is transferred into the cache.
• During subsequent references, the data are obtained directly from the cache.
• The cache can store a reasonable amount of data, but it is small compared
to the main memory.
• The correspondence between main memory blocks and cache blocks is specified
by a mapping function.
• When the cache is full and a word not in the cache is referenced, the cache
control hardware decides which block to remove, using a replacement algorithm.
MAPPING FUNCTIONS:
• There are several possible methods for determining where memory blocks are
placed in the cache
• Consider a cache consisting of 128 blocks of 16 words each, for a total of 2048
(2K) words, and assume that the main memory is addressable by a 16-bit
address.
• The main memory has 64K words, which we view as 4K blocks of 16
words each.
DIRECT MAPPING:
• The simplest way to determine cache locations in which to store
memory blocks is the direct-mapping technique.
• In this technique, block j of the main memory maps onto block j
modulo 128 of the cache. Thus, whenever one of the main
memory blocks 0, 128, 256, . . . is loaded into the cache, it is
stored in cache block 0.
• Blocks 1, 129, 257, . . . are stored in cache block 1, and so on. Since more
than one memory block is mapped onto a given cache block position,
contention may arise for that position even when the cache is not full. For
example, instructions of a program may start in block 1 and continue in block
129, possibly after a branch.
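The address breakdown implied by this example can be sketched in Python. The 5-bit tag / 7-bit block / 4-bit word split follows from the 16-bit address, 128 cache blocks, and 16 words per block given above; the function name is illustrative.

```python
# Address fields for the direct-mapped example in these notes:
# 16-bit addresses, 128 cache blocks, 16 words per block.
WORD_BITS = 4     # 2**4 = 16 words per block
BLOCK_BITS = 7    # 2**7 = 128 cache blocks; the remaining 5 bits are the tag

def split_address(addr):
    word = addr & (2**WORD_BITS - 1)
    block = (addr >> WORD_BITS) & (2**BLOCK_BITS - 1)
    tag = addr >> (WORD_BITS + BLOCK_BITS)
    return tag, block, word

# Memory blocks 0, 128, 256, ... all contend for cache block 0,
# distinguished only by their tags:
for mem_block in (0, 128, 256):
    first_word_addr = mem_block * 16
    print(split_address(first_word_addr))
```

Running the loop shows tags 0, 1, 2 paired with the same cache block 0, which is exactly the contention the text describes.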
Direct-mapped cache
ASSOCIATIVE MAPPING:
• The most flexible mapping method, in which a main memory block can be
placed into any cache block position.
• Tag bits are required to identify a memory block when it is resident in the
cache.
• The tag bits of an address received from the processor are compared to the tag
bits of each block of the cache to see if the desired block is present. This is
called the associative-mapping technique.
Associative-mapped cache
SET-ASSOCIATIVE MAPPING:
• Another approach is to use a combination of the direct- and associative-
mapping techniques.
• The blocks of the cache are grouped into sets, and the mapping allows a block
of the main memory to reside in any block of a specific set.
• The contention problem of the direct method is eased by having a few choices
for block placement.
• At the same time, the hardware cost is reduced by decreasing the size of the
associative search.
• An example of this set-associative-mapping technique is shown in Figure for
a cache with two blocks per set. In this case, memory blocks 0, 64, 128, . . . ,
4032 map into cache set 0, and they can occupy either of the two block
positions within this set. Having 64 sets means that the 6-bit set field of the
address determines which set of the cache might contain the desired block.
• The tag field of the address must then be associatively compared to the tags of
the two blocks of the set to check if the desired block is present. This two-way
associative search is simple to implement.
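The 6-bit set field of this two-way example can be sketched the same way as the direct-mapped case: with 64 sets, the 16-bit address splits into a 4-bit word field, a 6-bit set field, and a 6-bit tag. The function name is illustrative.

```python
# Two-way set-associative version of the same 128-block cache:
# 64 sets of 2 blocks each.
def split_set_assoc(addr):
    word = addr & 0xF              # 4-bit word field (16 words per block)
    set_idx = (addr >> 4) & 0x3F   # 6-bit set field (64 sets)
    tag = addr >> 10               # remaining 6 bits form the tag
    return tag, set_idx, word

# Memory blocks 0, 64, 128, ... all map into set 0, where either of the
# two block positions may hold them:
for mem_block in (0, 64, 128):
    print(split_set_assoc(mem_block * 16))
```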
The average memory access time experienced by the processor in a system with two
levels of caches is

t_avg = h1·C1 + (1 − h1)·h2·C2 + (1 − h1)(1 − h2)·M

where
h1 is the hit rate in the L1 caches,
h2 is the hit rate in the L2 cache,
C1 is the time to access information in the L1 caches,
C2 is the miss penalty to transfer information from the L2 cache to
an L1 cache, and
M is the miss penalty to transfer information from the main memory to
the L2 cache.
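Plugging illustrative (assumed) numbers into this formula shows how strongly the hit rates dominate the average access time:

```python
def avg_access_time(h1, h2, C1, C2, M):
    # t_avg = h1*C1 + (1-h1)*h2*C2 + (1-h1)*(1-h2)*M
    return h1*C1 + (1 - h1)*h2*C2 + (1 - h1)*(1 - h2)*M

# Assumed values: 95% L1 hit rate, 90% L2 hit rate on L1 misses,
# 1-cycle L1 access, 10-cycle L2 penalty, 100-cycle memory penalty.
t = avg_access_time(0.95, 0.90, 1, 10, 100)
print(t)   # about 1.9 cycles: 0.95 + 0.45 + 0.50
```

Even though main memory is 100 cycles away in this sketch, the caches keep the average access near two cycles.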
OTHER ENHANCEMENTS:
• Several other possibilities exist for enhancing performance; three of them are
described here: the write buffer, prefetching, and the lockup-free cache.
Write Buffer:
• When the write-through protocol is used, each Write operation results in writing
a new value into the main memory.
• If the processor must wait for the memory function to be completed then the
processor is slowed down by all Write requests.
• To improve performance, a Write buffer can be included for temporary storage
of Write requests.
• The processor places each Write request into this buffer and continues
execution of the next instruction.
• The Write requests stored in the Write buffer are sent to the main memory
whenever the memory is not responding to Read requests.
• It is important that the Read requests be serviced quickly, because the processor
usually cannot proceed before receiving the data being read from the memory.
Hence, these requests are given priority over Write requests.
• The Write buffer may hold a number of Write requests. Thus, it is possible that
a subsequent Read request may refer to data that are still in the Write buffer.
• To ensure correct operation, the addresses of data to be read from the memory
are always compared with the addresses of the data in the Write buffer. In the
case of a match, the data in the Write buffer are used.
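A minimal sketch of this behavior, with illustrative names, might look as follows: writes go into the buffer and the processor proceeds; reads are compared against buffered addresses first; the buffer drains to main memory when no Read is in progress.

```python
class WriteBuffer:
    """Toy model of a write buffer for a write-through cache."""
    def __init__(self, memory):
        self.memory = memory       # dict standing in for main memory
        self.pending = {}          # address -> data not yet written back

    def write(self, addr, data):
        self.pending[addr] = data  # processor continues immediately

    def read(self, addr):
        # Reads check the buffer first; on a match the buffered (newest)
        # data are used, ensuring correct operation.
        if addr in self.pending:
            return self.pending[addr]
        return self.memory.get(addr, 0)

    def drain(self):
        # Performed whenever memory is not servicing Read requests.
        self.memory.update(self.pending)
        self.pending.clear()

mem = {}
wb = WriteBuffer(mem)
wb.write(100, 7)
print(wb.read(100))   # 7, served from the buffer before reaching memory
wb.drain()
print(mem[100])       # 7, now in main memory
```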
Prefetching:
• In the basic cache mechanism, new data are brought into the cache when they are
first needed. Following a Read miss, the processor has to pause until the new
data arrive, thus incurring a miss penalty.
• To avoid stalling the processor, it is possible to prefetch the data into the cache
before they are needed.
• The simplest way to do this is through software. A special prefetch instruction
may be provided in the instruction set of the processor.
• Executing this instruction causes the addressed data to be loaded into the cache,
as in the case of a Read miss.
• A prefetch instruction is inserted in a program to cause the data to be loaded in
the cache shortly before they are needed in the program.
• Then, the processor will not have to wait for the referenced data as in the case
of a Read miss.
• The hope is that prefetching will take place while the processor is busy
executing instructions that do not result in a Read miss, thus allowing accesses
to the main memory to be overlapped with computation in the processor.
• Prefetch instructions can be inserted into a program either by the programmer
or by the compiler.
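A toy simulation can illustrate the benefit. The model below (all names assumed) counts processor-visible misses in a direct-mapped cache over a sequential access pattern, optionally prefetching the next block after each access; it idealizes timing by assuming every prefetch completes before the block is needed.

```python
def count_misses(addresses, prefetch=False, block_size=16, n_blocks=128):
    """Count processor-visible misses in a toy direct-mapped cache."""
    cache = {}                     # cache block index -> tag
    misses = 0
    def load(mem_block):
        cache[mem_block % n_blocks] = mem_block // n_blocks
    for addr in addresses:
        mem_block = addr // block_size
        if cache.get(mem_block % n_blocks) != mem_block // n_blocks:
            misses += 1            # processor stalls for this block
            load(mem_block)
        if prefetch:
            load(mem_block + 1)    # fetched in the background, overlapped
    return misses

seq = range(0, 2048)               # one sequential pass over 128 blocks
print(count_misses(seq), count_misses(seq, prefetch=True))
```

Without prefetching, the sequential pass misses once per block (128 misses); with ideal next-block prefetching only the very first access misses. A real machine would also pay for the memory bandwidth the prefetches consume, which this sketch ignores.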
Lockup-Free Cache:
• Software prefetching does not work well if it interferes significantly with the
normal execution of instructions. This is the case if the action of prefetching
stops other accesses to the cache until the prefetch is completed.
• While servicing a miss, the cache is said to be locked. This problem can be
solved by modifying the basic cache structure to allow the processor to access
the cache while a miss is being serviced.
• It is possible to have more than one outstanding miss, and the hardware must
accommodate such occurrences.
• A cache that can support multiple outstanding misses is called lockup-free. Such
a cache must include circuitry that keeps track of all outstanding misses.
• This may be done with special registers that hold the pertinent information
about these misses.
Page Faults
• When a program generates an access request to a page that is not in the main
memory, a page fault is said to have occurred.
• The entire page must be brought from the disk into the memory before access
can proceed.
• When it detects a page fault, the MMU asks the operating system to intervene
by raising an exception (interrupt).
• Processing of the program that generated the page fault is interrupted, and
control is transferred to the operating system.
• The operating system copies the requested page from the disk into the main
memory.
• Since this process involves a long delay, the operating system may begin
execution of another program whose pages are in the main memory.
• When page transfer is completed, the execution of the interrupted program is
resumed.
6. Accessing I/O Devices
• The components of a computer system communicate with each other
through an interconnection network
• The interconnection network consists of circuits needed to transfer
information between the processor, the memory unit, and a number of
I/O devices.
• Load and Store instructions use addressing modes to generate effective
addresses that identify the desired locations.
• The idea of using addresses to access various locations in the memory can
be extended to deal with the I/O devices.
• Each I/O device must appear to the processor as consisting of some
addressable locations, just like the memory.
• Some addresses in the address space of the processor are assigned to
these I/O locations, rather than to the main memory.
• These locations are usually implemented as bit storage circuits
organized in the form of registers called I/O registers.
• Since the I/O devices and the memory share the same address space, this
arrangement is called memory-mapped I/O.
• With memory-mapped I/O, any machine instruction that can access
memory can be used to transfer data to or from an I/O device.
A computer system
• For example, if DATAIN is the address of a register in an input device,
the instruction
Load R2, DATAIN
• reads the data from the DATAIN register and loads them into processor
register R2. Similarly, the instruction
Store R2, DATAOUT
• sends the contents of register R2 to location DATAOUT, which is a
register in an output device.
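The effect of memory-mapped I/O can be sketched as follows; the register addresses and class names are assumptions of this sketch, not part of any real machine.

```python
# Load/Store to certain addresses reach device registers instead of memory.
DATAIN  = 0x4000   # assumed address of the input-device data register
DATAOUT = 0x4004   # assumed address of the output-device data register

class AddressSpace:
    def __init__(self):
        self.mem = {}
        self.keyboard = [ord('A')]   # data waiting in the input device
        self.display = []
    def load(self, addr):
        if addr == DATAIN:           # Load R2, DATAIN reads the device
            return self.keyboard.pop(0)
        return self.mem.get(addr, 0)
    def store(self, addr, value):
        if addr == DATAOUT:          # Store R2, DATAOUT writes the device
            self.display.append(value)
        else:
            self.mem[addr] = value

bus = AddressSpace()
r2 = bus.load(DATAIN)    # Load R2, DATAIN
bus.store(DATAOUT, r2)   # Store R2, DATAOUT
print(chr(bus.display[0]))   # A
```

The same `load`/`store` pair serves both memory and devices, which is the whole point of the memory-mapped arrangement.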
Program-Controlled I/O:
• Consider a task that reads characters typed on a keyboard, stores these
data in the memory, and displays the same characters on a display
screen.
• A simple way of implementing this task is to write a program that
performs all functions needed to realize the desired action. This method
is known as program-controlled I/O.
• In addition to transferring each character from the keyboard into the
memory, and then to the display, it is necessary to ensure that this
happens at the right time. An input character must be read in response
to a key being pressed. For output, a character must be sent to the display
only when the display device is able to accept it.
• The rate of data transfer from the keyboard to a computer is limited by
the typing speed of the user, which is unlikely to exceed a few characters
per second. The rate of output transfers from the computer to the display
is much higher.
• It is determined by the rate at which characters can be transmitted to and
displayed on the display device
• The difference in speed between the processor and I/O devices creates
the need for mechanisms to synchronize the transfer of data between
them.
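A minimal model of program-controlled I/O, using status flags named after the KIN and DOUT flags that appear later in these notes, might look like this. The classes are illustrative stand-ins for device interfaces.

```python
class Keyboard:
    def __init__(self, typed):
        self.buffer = list(typed)
    @property
    def KIN(self):               # status flag: 1 when a character waits
        return 1 if self.buffer else 0
    def read_data(self):         # reading the data register clears the flag
        return self.buffer.pop(0)

class Display:
    DOUT = 1                     # always ready in this toy model
    def __init__(self):
        self.shown = []
    def write_data(self, ch):
        self.shown.append(ch)

def echo(kbd, disp):
    while kbd.KIN:               # poll the keyboard status flag
        ch = kbd.read_data()
        while not disp.DOUT:     # busy-wait until the display is ready
            pass
        disp.write_data(ch)

kbd, disp = Keyboard("hi"), Display()
echo(kbd, disp)
print("".join(disp.shown))   # hi
```

The two `while` loops are the wait loops the next section criticizes: the processor does no useful work while polling, which motivates interrupts.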
7.Interrupts:
• With program-controlled I/O, the program enters a wait loop in which it
repeatedly tests the device status.
• During this period, the processor is not performing any useful
computation.
• There are many situations where other tasks can be performed while
waiting for an I/O device to become ready.
• To allow this to happen, the I/O device should alert the processor when it
becomes ready.
• This can be done by sending a hardware signal called an interrupt request to
the processor.
• Since the processor is no longer required to continuously poll the
status of I/O devices, it can use the waiting period to perform other
useful tasks.
• The routine executed in response to an interrupt request is called the
interrupt-service routine (the DISPLAY routine in the textbook's example).
• The return address must be saved either in a designated general-
purpose register or on the processor stack.
• The processor must inform the device that its request has been recognized, so
that the device may remove its interrupt-request signal.
• This can be accomplished by means of a special control signal, called
interrupt acknowledge.
• The treatment of an interrupt-service routine is very similar to that of a
subroutine.
• A subroutine performs a function required by the program from which
it is called.
• The task of saving and restoring information can be done
automatically by the processor or by program instructions.
• Most modern processors save only the minimum amount of
information needed to maintain the integrity of program execution.
• This is because the process of saving and restoring registers involves
memory transfers that increase the total execution time, and hence
represent execution overhead. Saving registers also increases the delay
between the time an interrupt request is received and the start of
execution of the interrupt-service routine. This delay is called interrupt
latency.
• Some processors provide two types of interrupts: one saves all register
contents, and the other does not. A particular I/O device may use either type,
depending upon its response-time requirements.
• Another approach is to provide duplicate sets of processor registers.
• A different set of registers can be used by the interrupt-service routine,
thus eliminating the need to save and restore registers. The duplicate
registers are sometimes called shadow registers.
Vectored Interrupts:
• To reduce the time involved in the polling process, a device requesting
an interrupt may identify itself directly to the processor.
• Then, the processor can immediately start executing the corresponding
interrupt-service routine.
• The term vectored interrupts refers to interrupt-handling schemes based
on this approach.
• A commonly used scheme is to allocate permanently an area in the
memory to hold the addresses of interrupt-service routines.
• These addresses are usually referred to as interrupt vectors, and they are
said to constitute the interrupt-vector table
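Dispatch through an interrupt-vector table can be sketched as follows; the interrupt numbers and routine contents are illustrative.

```python
# The requesting device supplies an interrupt number, which indexes a
# fixed table of interrupt-service routines; no polling is needed.
vector_table = {}          # interrupt number -> interrupt-service routine
log = []

vector_table[3] = lambda: log.append("keyboard ISR")
vector_table[5] = lambda: log.append("timer ISR")

def interrupt(irq):
    isr = vector_table[irq]   # fetch the interrupt vector
    isr()                     # execute the interrupt-service routine

interrupt(5)
interrupt(3)
print(log)   # ['timer ISR', 'keyboard ISR']
```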
Debugging:
10. BUS OPERATION:
• A bus requires a set of rules, often called a bus protocol
• The bus protocol determines when a device may place information on the
bus and when it may load the data on the bus into one of its registers.
• These rules are implemented by control signals
• One control line, usually labelled R/W, specifies whether a Read or a
Write operation is to be performed.
• It specifies Read when set to 1 and Write when set to 0.
• The control lines also carry timing information. They specify the times at
which the processor and the I/O devices may place data on or receive data
from the data lines.
• one device plays the role of a master. This is the device that initiates data
transfers by issuing Read or Write commands on the bus.
• The device addressed by the master is referred to as a slave.
SYNCHRONOUS BUS:
• On a synchronous bus, all devices derive timing information from a
control line called the bus clock
• The signal on this line has two phases: a high level followed by a low
level. The two phases constitute a clock cycle.
• The first half of the cycle between the low-to-high and high-to-low
transitions is often referred to as a clock pulse.
• During clock cycle 1, the master sends address and command information
on the bus, requesting a Read operation.
• The slave receives this information and decodes it. It begins to access the
requested data on the active edge of the clock at the beginning of clock
cycle 2.
• The data become ready and are placed on the bus during clock cycle 3.
• The slave asserts a control signal called Slave-ready at the same time.
• The master, which has been waiting for this signal, loads the data into its
register at the end of the clock cycle.
• The slave removes its data signals from the bus and returns its Slave-ready
signal to the low level at the end of cycle 3.
• The bus transfer operation is now complete, and the master may send new
address and command signals to start a new transfer in clock cycle 4.
ASYNCHRONOUS BUS:
12.INTERFACE CIRCUITS:
• The I/O interface of a device consists of the circuitry needed to connect
that device to the bus.
• On one side of the interface are the bus lines for address, data, and control.
• On the other side are the connections needed to transfer data between the
interface and the I/O device. This side is called a port, and it can be either
a parallel or a serial port.
• A parallel port transfers multiple bits of data simultaneously to or from
the device.
• A serial port sends and receives data one bit at a time.
• An I/O interface does the following:
1. Provides a register for temporary storage of data
2. Includes a status register containing status information that can be
accessed by the processor
3. Includes a control register that holds the information governing the
behavior of the interface
4. Contains address-decoding circuitry to determine when it is being
addressed by the processor
5. Generates the required timing signals
6. Performs any format conversion that may be necessary to transfer
data between the processor and the I/O device, such as parallel-to-
serial conversion in the case of a serial port
PARALLEL INTERFACE
• Consider an interface circuit for an 8-bit input port that can be used for
connecting a simple input device, such as a keyboard, and an interface circuit
for an 8-bit output port, which can be used with an output device such as a
display.
• Assume that these interface circuits are connected to a 32-bit processor that
uses memory-mapped I/O and the asynchronous bus protocol.
Input Interface
• There are only two registers: a data register, KBD_DATA, and a status
register, KBD_STATUS. The latter contains the keyboard status flag,
KIN.
• A typical keyboard consists of mechanical switches that are normally
open. When a key is pressed, its switch closes and establishes a path for
an electrical signal
• A difficulty with such mechanical pushbutton switches is that the
contacts bounce when a key is pressed, resulting in the electrical
connection being made then broken several times before the switch
settles in the closed position
• The software detects that a key has been pressed when it observes that
the keyboard status flag, KIN, has been set to 1.
• The I/O routine can then introduce sufficient delay before reading the
contents of the input buffer, KBD_DATA, to ensure that bouncing has
subsided.
• When debouncing is implemented in hardware, the I/O routine can read
the input character as soon as it detects that KIN is equal to 1.
• The keyboard circuitry provides one byte of data representing the encoded
character and one control signal called Valid. When a key is pressed, the
Valid signal changes from 0 to 1.
• The status flag is cleared to 0 when the processor reads the contents of
the KBD_DATA register.
• The interface circuit is connected to an asynchronous bus on which
transfers are controlled by the handshake signals Master-ready and
Slave-ready
• The bus has one other control line, R/W, which indicates a Read
operation when equal to 1.
Output Interface
• The output interface can be used to connect an output device such as a display.
• the display uses two handshake signals, New-data and Ready, in a
manner similar to the handshake between the bus signals Master-ready
and Slave-ready.
• When the display is ready to accept a character, it asserts its Ready
signal, which causes the DOUT flag in the DISP_STATUS register to be
set to 1.
• When the I/O routine checks DOUT and finds it equal to 1, it sends a
character to DISP_DATA.
• This clears the DOUT flag to 0 and sets the New-data signal to 1. In
response, the display returns Ready to 0 and accepts and displays the
character in DISP_DATA
SERIAL INTERFACE
• A serial interface is used to connect the processor to I/O devices that
transmit data one bit at a time.
• Data are transferred in a bit-serial fashion on the device side and in a bit-
parallel fashion on the processor side.
• The transformation between the parallel and serial formats is achieved
with shift registers that have parallel access capability.
• The input shift register accepts bit-serial input from the I/O device.
• When all 8 bits of data have been received, the contents of this shift
register are loaded in parallel into the DATAIN register.
• Output data in the DATAOUT register are transferred to the output shift
register, from which the bits are shifted out and sent to the I/O device.
• The part of the interface that deals with the bus is the same as in the
parallel interface described earlier.
• Two status flags, which we will refer to as SIN and SOUT, are
maintained by the Status and control block.
• The SIN flag is set to 1 when new data are loaded into DATAIN from
the shift register, and cleared to 0 when these data are read by the
processor.
• The SOUT flag indicates whether the DATAOUT register is available. It
is cleared to 0 when the processor writes new data into DATAOUT and
set to 1 when data are transferred from DATAOUT to the output shift
register.
• The double buffering used in the input and output paths in Figure 7.15 is
important. It would be possible to implement DATAIN and DATAOUT
themselves as shift registers, thus obviating the need for separate shift
registers, but double buffering relaxes the constraints on when the processor
must read or write them.
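The serial-to-parallel conversion performed by the input shift register can be sketched as follows; the bit order (LSB first) is an assumption of this sketch.

```python
# Eight bits arrive one at a time from the device; once all have been
# shifted in, the assembled byte is loaded in parallel into DATAIN.
def shift_in(bits):
    reg = 0
    for i, b in enumerate(bits):
        reg |= (b & 1) << i       # place each arriving bit (LSB first)
    return reg

DATAIN = shift_in([1, 0, 0, 0, 0, 0, 1, 0])   # bits of ASCII 'A' (65)
print(chr(DATAIN))   # A
```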
DEVICE CHARACTERISTICS:
• The kinds of devices that may be connected to a computer cover a wide
range of functionality.
• The speed, volume, and timing constraints associated with data transfers
to and from these devices vary significantly.
• In the case of a keyboard, one byte of data is generated every time a key is
pressed, which may happen at any time.
• These data should be transferred to the computer promptly.
• Since the event of pressing a key is not synchronized to any other event in a
computer system, the data generated by the keyboard are called
asynchronous.
• In contrast, when an analog signal such as audio is digitized, the sampling
process yields a continuous stream of digitized samples that arrive at regular
intervals, synchronized with the sampling clock.
• Such a data stream is called isochronous, meaning that successive events
are separated by equal periods of time.
• A signal must be sampled quickly enough to track its highest-frequency
components.
Plug-and-Play
• When an I/O device is connected to a computer, the operating system
needs some information about it. It needs to know what type of device it
is so that it can use the appropriate device driver.
• It also needs to know the addresses of the registers in the device’s
interface to be able to communicate with it.
• The USB standard defines both the USB hardware and the software that
communicates with it. Its plug-and-play feature means that when a new
device is connected, the system detects its existence automatically.
USB Architecture
• The USB uses point-to-point connections and a serial transmission
format.
• When multiple devices are connected, they are arranged in a tree
structure
• Each node of the tree has a device called a hub, which acts as an
intermediate transfer point between the host computer and the I/O
devices.
• At the root of the tree, a root hub connects the entire tree to the host
computer.
• The leaves of the tree are the I/O devices: a mouse, a keyboard, a printer,
an Internet connection, a camera, or a speaker.
• The tree structure makes it possible to connect many devices using
simple point-to-point serial links
Electrical Characteristics
• USB connections consist of four wires, of which two carry power, +5 V
and Ground, and two carry data.
• I/O devices that do not have large power requirements can be powered
directly from the USB.
• Two methods are used to send data over a USB cable.
• When sending data at low speed, a high voltage relative to Ground is
transmitted on one of the two data wires to represent a 0 and on the other
to represent a 1.
• The Ground wire carries the return current in both cases. Such a scheme
in which a signal is injected on a wire relative to ground is referred to as
single-ended transmission.
• The speed at which data can be sent on any cable is limited by the amount
of electrical noise present.
• The term noise refers to any signal that interferes with the desired data
signal and hence could cause errors.
• Single-ended transmission is highly susceptible to noise.
• The voltage on the ground wire is common to all the devices connected to
the computer.
• Signals sent by one device can cause small variations in the voltage on the
ground wire, and hence can interfere with signals sent by another device.
• Interference can also be caused by one wire picking up noise from nearby
wires.
• The High-Speed USB uses an alternative arrangement known as
differential signaling.
• The data signal is injected between two data wires twisted together.