CO Unit 2
• Memories are made up of registers. Each register in the memory is one storage location, also called a memory location.
• Each memory location is identified by an address. The number of storage locations can vary from a few in some memories to hundreds of thousands in others. Each location can accommodate one or more bits.
• Generally, the total number of bits that a memory can store is its capacity. Most often the capacity is specified in terms of bytes (a group of eight bits).
• Each register consists of storage elements (flip-flops or capacitors in semiconductor memories and magnetic domains in magnetic storage), each of which stores one bit of data. A storage element is called a cell.
• Data are stored in a memory by a process called writing and are retrieved from the memory by a process called reading. Fig. 8.1.1 illustrates, in a very simplified way, the concepts of write, read, address and storage capacity for a generalized memory.
• As shown in the Fig. 8.1.1, a memory unit stores binary information in groups of bits called words. A word in memory is an entity of bits that moves in and out of storage as a unit. A word consisting of a group of 8 bits is called a byte. Most computer memories use words that are multiples of eight bits in length. Thus, a 16-bit word contains two bytes, and a 32-bit word is made up of four bytes.
• The communication between a memory and its environment is achieved through data lines, address selection
lines, and control lines that specify the direction of transfer.
• The Fig. 8.1.2 shows the block diagram of a memory unit. The n data lines provide the information to be stored in memory, and the k address lines specify the particular word chosen among the many available. The two control inputs specify the direction of transfer.
• When there are k address lines we can access 2^k memory words. For example, if k = 10 we can access 2^10 = 1024 memory words.
Illustrative Examples
Example 8.1.1 A bipolar RAM chip is arranged as 16 words of 8 bits each. How many bits are stored in the chip?
Solution: Since one word = 8 bits, the chip stores 16 × 8 = 128 bits.
Example 8.1.2 How many address bits are needed to operate a 2 K × 8 ROM ?
Solution: 2 K memory locations = 2048 locations.
Since 2^11 = 2048, we need 11 address lines.
Example 8.1.3 How many locations are addressed using 18 address bits?
Solution: The number of locations addressed = 2^18 = 262144
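The relation between address lines and addressable locations used in the examples above can be checked with a short Python sketch (an illustrative aid only; the function names are ours, not from the text):

import math

def address_bits(locations):
    # Number of address lines needed to select one of 'locations' storage locations
    return math.ceil(math.log2(locations))

def locations_addressed(bits):
    # Number of locations addressable with the given number of address lines
    return 2 ** bits

print(address_bits(2048))        # 11, as in Example 8.1.2 (2 K x 8 ROM)
print(locations_addressed(18))   # 262144, as in Example 8.1.3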
Characteristics of Memory
• The Table 8.1.1 lists the key characteristics of memory systems.
Physical characteristics :
Volatile/Nonvolatile: If a memory can hold data even when power is turned off, it is called nonvolatile memory; otherwise it is called volatile memory.
Erasable/Nonerasable: Memories in which data, once programmed, cannot be erased are called nonerasable memories. On the other hand, if the data in the memory can be erased, the memory is called erasable memory.
The Table 8.1.2 shows the characteristics of some common memory technologies.
Distinguish between volatile and non-volatile memories
• The processor of a computer can usually process instructions and data faster than they can be fetched from the memory unit. The memory cycle time, then, is the bottleneck in the system. One way to avoid this bottleneck is to use a cache memory. Cache memory is a small, fast memory that is inserted between the larger, slower main memory and the processor. It usually holds the currently active segments of a program and their data.
• In most modern computers, the physical main memory is not as large as the address space spanned by an address
issued by the processor. Here, the virtual memory technique is used to extend the apparent size of the physical
memory. It uses secondary storage such as disks, to extend the apparent size of the physical memory.
Classification of Primary Memory
• The memory devices can be classified based on following parameters:
• Principle of operation
• Physical characteristics
• Mode of access and
• Terminology used for fabrication.
• The Fig. 8.1.3 shows the classification of memory.
• Broadly semiconductor memories are classified as volatile memories and non-volatile memories. Volatile
memories can retain their state as long as power is applied. On the other hand, non-volatile memories can hold
data even if power is turned off.
• Read/Write Memories (RWMs) are those memories which allow both read and write operations. They are used in applications where data has to change continuously. They are also used for temporary storage of data. ROM memories allow only the read operation. They are used to store monitor programs and constants used in the program.
• The volatile memories which can hold data as long as power is ON are called Static RAMs (SRAMs). Dynamic RAMs (DRAMs) store data as a charge on a capacitor, and this charge needs refreshing every few milliseconds to hold the data even while power is ON.
• EPROM and EEPROM are erasable memories in which the stored data can be erased and new data can be
stored.
• The semiconductor memories are also classified as Bipolar and MOS memories depending upon the type of
transistors used to construct the individual cell.
Classification of Secondary Memory
• A primary memory is costly and has a limited size. This memory is mainly used for storing the currently
processing data.
• Secondary storage is used to store data and instructions (programs) when they are not being processed.
• The devices that are used as secondary storage are non-volatile and have a larger storage capacity. Also, they are less expensive as compared to primary storage devices.
• However, they are slower in comparison. Examples are hard disks, floppies, CD-ROMs, magnetic tapes etc.
This type of memory is also called secondary memory, auxiliary memory or peripheral storage.
• Fig. 8.1.4 shows the classification of secondary storage devices. They can be categorized broadly according to
their access types as sequential and random (direct).
Memory Hierarchy
• Ideally, computer memory should be fast, large and inexpensive. Unfortunately, it is impossible to meet all three of these requirements using one type of memory.
• Increased speed and size are achieved at increased cost.
• A very fast memory system can be achieved if SRAM chips are used. These chips are expensive, and for cost reasons it is impracticable to build a large main memory using SRAM chips. The only alternative is to use DRAM chips for large main memories.
• The processor fetches the code and data from the main memory to execute the program. The DRAMs which form the main memory are slower devices, so it is necessary to insert wait states in memory read/write cycles. This reduces the speed of execution.
• The solution to this problem comes from the fact that most computer programs work with only small sections of code and data at a particular time. In the memory system, a small section of SRAM, referred to as cache memory, is added along with the main memory.
• The program which is to be executed is loaded in the main memory, but the part of program (code) and data that
work at a particular time is usually accessed from the cache memory.
• This is accomplished by loading the active part of code and data from main memory to cache memory. The
cache controller looks after this swapping between main memory and cache memory with the help of DMA
controller.
• The cache memory just discussed is called secondary cache. Recent processors have the built-in cache memory
called primary cache.
• DRAMs along with cache memory allow main memories in the range of tens of megabytes to be implemented at a reasonable cost, with better speed performance. But the size of such a memory is still small compared to the demands of large programs with voluminous data. A solution is provided by using secondary storage, mainly magnetic disks and magnetic tapes, to implement large memory spaces. Very large disks are available at a reasonable price, sacrificing speed.
• From the above discussion, we can see that to build an efficient computer system it is not possible to rely on a single memory component; instead, a memory hierarchy must be employed. In a memory hierarchy, all of the different types of memory units are employed to give an efficient computer system. A typical memory hierarchy is shown in Fig. 8.2.1.
• In summary, we can say that a huge amount of cost-effective storage can be provided by magnetic disks. A large,
yet affordable, main memory can be built with DRAM technology along with the cache memory to achieve better
speed performance.
• The Fig. 8.2.2 shows how a memory hierarchy can be employed in a computer system. As shown in the Fig. 8.2.2, at the bottom of the hierarchy magnetic tapes and magnetic disks are used as secondary memory. This memory is also known as auxiliary memory.
• The main memory occupies a central position by being able to communicate directly with the CPU and with
auxiliary memory devices through an I/O processor.
Fig. 8.2.3 shows common memory hierarchies with two, three and four levels.
Memory Technologies
RAM (Random Access Memories)
• Memories that consist of circuits capable of retaining their state as long as power is applied are known as static memories.
• These are Random Access Memories (RAMs) and hence are called static RAM (SRAM) memories.
Static RAM Cell
• The Fig. 8.3.1 shows the implementation of a static RAM cell. It consists of two cross-coupled inverters forming a latch, and two transistors T1 and T2 which act as switches.
• The latch is connected to two bit lines by transistors T1 and T2. The word line controls the opening and closing
of transistors T1 and T2. When word line is at logic 0 level (Ground level), the transistors are off and the latch
retains its state.
Read operation
• For read operation, word line is made logic 1 (high) so that both transistors are ON. Now if the cell is in state
1, the signal on bit line b is high and the signal on bit line b' is low. The opposite is true if the cell is in state 0.
The b and b' are complements of each other. The sense/write circuits connected to the bit lines monitor the states
of b and b' and set the output accordingly.
Write operation
• For write operation, the state to be set is placed on the line b and its complement is placed on line b' and then
the word line is activated. This action forces the cell into the corresponding state and write operation is completed.
CMOS Cell
• The Fig. 8.3.2 shows the CMOS cell. Here, transistor pairs (T3, T5) and (T4, T6) form the cross-coupled inverters, and transistors T1 and T2 act as switches. The latch is connected to two bit lines by transistors T1 and T2.
• The word line controls the opening and closing of transistors T1 and T2. When the word line is at logic 0 level (ground level), the transistors T1 and T2 are off and the latch retains its state.
Read operation
• For a read operation, the word line is made high to switch on transistors T1 and T2. The cell is in state 1 if the voltage at point X is maintained high by having transistors T3 and T6 on, while T4 and T5 are off. The cell is in state 0 if the voltage at point X is maintained low by having transistors T3 and T6 off, while T4 and T5 are on.
Write operation
• For write operation, the state to be set is placed on the line b and its complement is placed on line b', and then
the word line is activated. This action forces the cell into the corresponding state.
DRAM (Dynamic RAMs)
• Dynamic RAM stores the data as a charge on the capacitor. Fig. 8.3.3 shows the dynamic RAM cell.
EPROM (Erasable Programmable Read Only Memory)
• It is not possible to erase selective information; when erased, the entire information is lost.
• The chip can be reprogrammed.
• This memory is ideally suitable for product development, experimental projects and college laboratories, since
this chip can be reused many times.
EPROM Programming :
• When erased, each cell in the EPROM contains 1. Data is introduced by selectively programming 0's into the desired bit locations. Although only 0's will be programmed, both 1's and 0's can be present in the data.
• During programming address and data are applied to address and data pins of the EPROM. When the address
and data are stable, program pulse is applied to the program input of the EPROM. The program pulse duration is
around 50 ms and its amplitude depends on EPROM IC. It is typically 5.5 V to 25 V.
• In EPROM, it is possible to program any location at any time - either individually, sequentially or at random.
EEPROM (Electrically Erasable Programmable Read Only Memory)
• Electrically erasable programmable ROMs also use MOS circuitry very similar to that of EPROM.
• Data is stored as charge or no charge on an insulated layer or an insulated floating gate in the device.
• The insulating layer is made very thin (<200 Å). Therefore, a voltage as low as 20 to 25 V can be used to move
charges across the thin barrier in either direction for programming or erasing.
• EEPROM allows selective erasing at the register level rather than erasing all the information since the
information can be changed by using electrical signals.
• The EEPROM memory also has a special chip erase mode by which the entire chip can be erased in 10 ms. This time is quite small compared to the time required to erase an EPROM. It can be erased and reprogrammed with the device right in the circuit.
Disadvantage
• EEPROMs are the most expensive and the least dense ROMs.
Flash Memory
Flash memories are read/write memories. In flash memories it is possible to read the contents of a single cell, but
it is only possible to write an entire block of cells. A flash cell is based on a single transistor controlled by trapped
charge.
Flash devices have greater density than EEPROM memory. Due to this, flash devices have higher capacity and a lower cost per bit. They require a single power supply voltage and consume less power in their operation.
The low power consumption of flash memory makes it suitable for portable equipment such as hand-held computers, cell phones, digital cameras, MP3 music players and so on. In hand-held computers and cell phones, flash memory is used to store the software needed to operate the equipment. In digital cameras, flash memory is used to store picture image data. In MP3 players, flash memory is used to store the sound data.
The flash memories are available in modules. These modules are implemented in two types: flash cards and flash
drives.
Flash Cards: In this, flash chips are mounted on a small card. Flash cards have a standard interface that makes
them usable in variety of applications. These cards are available in variety of memory sizes. The sizes are 8, 32
and 64 Mbytes.
Flash Drives: Flash drives are manufactured to replace hard-disk drives. These drives are designed to fully
emulate the hard disk, to the point that they can be fitted into standard disk drive bays. However, the storage
capacity of flash drives is significantly lower than that of hard disk drives. Currently, the capacity of flash drives
is less than 1 Gbyte. The Table 8.3.1 shows the comparison between flash drives and hard disk drives.
Synchronous DRAM (SDRAM)
• DRAMS whose operation is directly synchronised with a clock signal are known as synchronous DRAMS or
simply SDRAMs.
• In a typical DRAM, the processor has to wait through the delay of the access time, slowing system performance. To solve this problem, the SDRAM exchanges data with the processor synchronized to an external clock signal.
• Advantage: This allows the processor to read/write data with the memory without imposing wait states.
• In this scheme, the DRAM latches the address sent by the processor and then responds after a set number of clock cycles.
• Advantage: Meanwhile, the processor can do other tasks while the SDRAM is processing the request.
• The Fig. 8.3.7 shows the internal architecture of SDRAM. The cell organisation of SDRAM is same as
asynchronous DRAM. The address and data lines are buffered by means of buffer registers.
• It supports burst mode transfer.
• Advantage: In burst mode, after first access, no address setup or row/column line pre-charge time is needed.
This improves system performance.
• Its mode register allows burst length, burst type and latency to be customized to suit specific system needs.
• Advantage: The SDRAMs have built in refresh circuitry.
• The Fig. 8.3.8 shows a timing diagram for a typical burst read cycle of length 4.
• In the first clock cycle, the row address is latched by activating the RAS control signal. The memory typically takes 2 or 3 clock cycles to activate the selected row (Fig. 8.3.8 shows 2 clock cycles).
• Then, the column address is latched by activating the CAS control signal.
• After a delay of one clock cycle, the memory places the first set of data bits on the data lines.
• The SDRAM then automatically increments the column address to access the next three sets of data bits in the
selected row in three successive clock cycles.
Double-Data-Rate Series
• The standard SDRAM performs its operations on the rising edge of the clock signal. On the other hand, faster SDRAMs transfer data on both edges of the clock signal.
• The latency of these faster SDRAMs is the same as that of a standard SDRAM. However, since they transfer data on both edges of the clock signal, their bandwidth is effectively doubled for long burst transfers. Such faster SDRAMs are known as Double-Data-Rate SDRAMs (DDR SDRAMs).
• In DDR SDRAMs, the cell array is organized in two banks as shown in Fig. 8.3.9.
• As shown in Fig. 8.3.9 it has dual bank architecture to support on-chip parallelism.
• It also supports burst mode transfer. This allows the sequential reading of number of bits from same row,
eliminating the address setup time for each bit.
• As shown in the Fig. 8.3.10, there are 16 words of 8 bits each. A memory with 16 words needs four address lines. The four address inputs go through a 4 × 16 decoder to select one of the sixteen words. The decoder is enabled with a memory enable input. When the memory enable is 0, all outputs of the decoder are 0 and none of the memory words are selected. With the memory enable at 1, one of the sixteen words is selected, dictated by the value on the four address lines. Once a word has been selected, the read/write input determines the operation. During a write operation, the data available on the input lines are transferred into the eight memory cells of the selected word. The memory cells that are not selected are disabled and their previous binary values remain unchanged.
Example 8.3.1 Draw the organization of 4 K×1 memory cell and explain.
Solution: The Fig. 8.3.11 shows the organization of a 4 K × 1 memory cell array. Such an organization needs 12 address lines, one data line and two control signals, resulting in 15 external connections. Here, the 12-bit address is divided into two groups of 6 bits each to form the row and column addresses for the cell array. A row address selects a row of 64 cells, all of which are accessed in parallel. However, according to the column address, only one of these cells is connected to the external data line by the output multiplexer and input demultiplexer.
Cache Memories
• In a computer system the program which is to be executed is loaded in the main memory (DRAM). Processor
then fetches the code and data from the main memory to execute the program. The DRAMS which form the main
memory are slower devices. So it is necessary to insert wait states in memory read/write cycles. This reduces the
speed of execution.
• To speed up the process, high-speed memories such as SRAMs must be used. But considering the cost and space required for SRAMs, it is not desirable to use SRAMs to form the main memory. The solution to this problem comes from the fact that most microcomputer programs work with only small sections of code and data at a particular time.
• Definition: The part of program (code) and data that work at a particular time is usually accessed from the
SRAM memory. This is accomplished by loading the active part of code and data from main memory to SRAM
memory. This small section of SRAM memory added between processor and main memory to speed up execution
process is known as cache memory.
• Fig. 8.4.1 shows a simplest form of cache memory system.
• A cache memory system includes a small amount of fast memory (SRAM) and a large amount of slow memory (DRAM). This system is configured to simulate a large amount of fast memory.
• The cache controller implements the cache logic. If the processor finds that the addressed code or data is not available in the cache - the condition referred to as a cache miss - the desired memory block is copied from main memory to cache by the cache controller. The cache controller decides which memory block should be moved in or out of the cache and in or out of main memory, based on the requirements. (The cache block is also known as a cache slot or cache line.)
• The percentage of accesses where the processor finds the code or data word it needs in the cache memory is
called the hit rate/hit ratio. The hit rate is normally greater than 90 percent.
Example 8.4.1 The application program in a computer system with cache uses 1400 instruction acquisition bus cycles from cache memory and 100 from main memory. What is the hit rate? If the cache memory operates with zero wait states and the main memory bus cycles use three wait states, what is the average number of wait states experienced during the program execution?
Solution: Hit rate = 1400 / (1400 + 100) × 100 = 93.33 %
Total wait states = 1400 × 0 + 100 × 3 = 300
Average wait states = Total wait states / Number of memory bus cycles
= 300 / 1500 = 0.2
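• The same arithmetic can be expressed as a small Python sketch (an illustrative aid, not part of the original solution; the function names are assumed):

def hit_rate(cache_cycles, memory_cycles):
    # Percentage of bus cycles serviced by the cache
    return cache_cycles / (cache_cycles + memory_cycles) * 100

def average_wait_states(cache_cycles, cache_waits, memory_cycles, memory_waits):
    # Weighted average of wait states over all memory bus cycles
    total_waits = cache_cycles * cache_waits + memory_cycles * memory_waits
    return total_waits / (cache_cycles + memory_cycles)

print(hit_rate(1400, 100))                   # 93.33... %
print(average_wait_states(1400, 0, 100, 3))  # 0.2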
Most Commonly used Cache Organizations
• Two most commonly used system organizations for cache memory are :
• Look-aside and
• Look-through
Look-aside system organization
• The Fig. 8.4.2 shows the look-aside cache organization. Here, the cache and the main memory are directly connected to the system bus.
• In this system, the CPU initiates a memory access by placing a physical address on the memory address bus at
the start of read or write cycle.
• The cache memory M1 immediately compares the physical address with the tag addresses currently residing in its tag memory. If a match is found, i.e., in case of a cache hit, the access is completed by a read or write operation executed in the cache. The main memory is not involved in the read or write.
• If a match is not found, i.e., in case of a cache miss, the desired access is completed by a read or write operation directed to M2. In response to the cache miss, a block of data that includes the target address is transferred from M2 to M1. The system bus is used for this transfer and hence is unavailable for other uses such as I/O operations.
Look-through system organization
• The Fig. 8.4.3 shows look-through system of cache organization. Here, the CPU communicates with cache via
a separate (local) bus which is isolated from the main system bus. Thus during cache accesses, the system bus is
available for use by other units, such as I/O controllers, to communicate with main memory.
• Unlike the look-aside system, a look-through cache system does not automatically send all memory requests to main memory; it does so only after a cache miss.
• Look-through cache systems use a wider local bus to link M1 and M2, thus speeding up cache-main-memory transfers (block transfers).
• A look-through cache system is faster.
Disadvantages:
• It is complex.
• It is costly.
• It takes longer time for M2 to respond to the CPU when a cache miss occurs.
Cache Read Operation
• The Fig. 8.4.4 shows a small cache system. Here each cache block is 4 bytes and each memory address is 10 bits long. Due to this, the 8 high-order bits form the tag or block address and the 2 low-order bits define a displacement address within the block.
• When a block is assigned to the cache data memory, its tag is also placed in the cache tag memory.
• During read operation, the 8 high-order bits of an address are compared with stored tags in the cache tag memory
to find match (cache hit). The stored tag pinpoints the corresponding block in cache data memory and the 2-bit
displacement is used to read the target word.
Cache Write Operation
• The Fig. 8.4.5 shows execution of cache write operation. It uses same addressing technique as in case of read
operation.
• When a cache hit occurs, the new data, in this case E6, is stored at the location pointed to by the address in the cache data memory, thereby overwriting the old data 5A.
• Now the data in the cache data memory and the data in the main memory for the given address are different. This causes the cache consistency problem.
Program Locality
• In cache memory system, prediction of memory location for the next access is essential. This is possible because
computer systems usually access memory from the consecutive locations. This prediction of next memory address
from the current memory address is known as program locality.
• Program locality enables cache controller to get a block of memory instead of getting just a single address.
• The principle of program locality may not work properly when program executes JUMP and CALL instructions.
In case of these instructions, program code is not in sequence.
Locality of Reference
• We know that program may contain a simple loop, nested loops or a few procedures that repeatedly call each
other. The point is that many instructions in localized area of the program are executed repeatedly during some
time period and the remainder of the program is accessed relatively infrequently. This is referred to as locality of
reference.
• It manifests itself in two ways: temporal and spatial.
• The temporal means that a recently executed instruction is likely to be executed again very soon.
• The spatial means that instructions stored nearby to the recently executed instruction are also likely to be executed soon.
The temporal aspect of the locality of reference suggests that whenever an instruction or data is first needed, it
should be brought into the cache and it should remain there until it is needed again.
• The spatial aspect suggests that instead of bringing just one instruction or data item from the main memory to the cache, it is wise to bring several instructions and data items that reside at adjacent addresses as well. We use the term block to refer to a set of contiguous addresses of some size.
Block Fetch
• Block fetch technique is used to increase the hit rate of cache.
• A block fetch can retrieve the data located before the requested byte (look behind) or the data located after the requested byte (look ahead) or both.
• When CPU needs to access any byte from the block, entire block that contains the needed byte is copied from
main memory into cache.
• The size of the block is one of the most important parameters in the design of a cache memory system.
Block size
1. If the block size is too small, the look-ahead and look-behind are reduced and therefore the hit rate is reduced.
2. Larger blocks reduce the number of blocks that fit into a cache. As the number of blocks decreases, block rewrites from main memory become more likely.
3. Due to the large size of the block, the ratio of required data to useless data fetched becomes lower.
4. Bus size between the cache and the main memory increases with block size to accommodate larger data
transfers between main memory and the cache, which increases the cost of cache memory system.
Elements of Cache Design
• The cache design elements include cache size, mapping function, replacement algorithm, write policy, block size and number of caches.
Cache size: The size of the cache should be small enough so that the overall average cost per bit is close to that
of main memory alone and large enough so that the overall average access time is close to that of the cache alone.
Mapping function: The cache memory can store a reasonable number of blocks at any given time, but this
number is small compared to the total number of blocks in the main memory. Thus we have to use mapping
functions to relate the main memory blocks and cache blocks. There are two mapping functions commonly used: direct mapping and associative mapping. A detailed description of mapping functions is given in section 8.4.6.
Replacement algorithm: When a new block is brought into the cache, one of the existing blocks must be replaced by the new block.
• There are four most common replacement algorithms:
• Least-Recently-Used (LRU)
• First-In-First-Out (FIFO)
• Least-Frequently-Used (LFU)
• Random
• The cache design changes according to the choice of replacement algorithm. A detailed description of replacement algorithms is given in section 8.4.8.
Write policy: It is also known as the cache updating policy. In a cache system, two copies of the same data can exist at a time, one in the cache and one in main memory. If one copy is altered and the other is not, two different sets of data become associated with the same address. To prevent this, the cache system has updating policies such as: write-through, buffered write-through and write-back. The choice of cache write policy also changes the design of the cache.
Block size: It should be optimum for cache memory system.
Number of caches: When the on-chip cache is insufficient, a secondary cache is used. The cache design changes as the number of caches used in the system changes.
Mapping
• Usually, the cache memory can store a reasonable number of memory blocks at any given time, but this number
is small compared to the total number of blocks in the main memory. The correspondence between the main
memory blocks and those in the cache is specified by a mapping function.
• The mapping techniques are classified as :
1. Direct-mapping technique
2. Associative-mapping technique
• Fully-associative
• Set-associative techniques.
• To discuss these techniques of cache mapping we consider a cache consisting of 128 blocks of 16 words each, for a total of 2048 (2 K) words, and assume that the main memory has 64 K words. These 64 K words of main memory are addressable by a 16-bit address, and the memory can be viewed as 4 K blocks of 16 words each. A group of 128 blocks of 16 words each in the main memory forms a page.
Direct-Mapping
• It is the simplest mapping technique.
• In this technique, each block from the main memory has only one possible location in the cache organization. In this example, block i of the main memory maps onto block j (j = i modulo 128) of the cache, as shown in Fig. 8.4.6. Therefore, whenever one of the main memory blocks 0, 128, 256, ... is loaded into the cache, it is stored in cache block 0; blocks 1, 129, 257, ... are stored in cache block 1, and so on.
In general, the mapping expression is
j = i modulo m
where
i = Main memory block number
j = Cache block (line) number
m = Number of blocks (lines) in the cache
• To implement such cache system, the address is divided into three fields, as shown in Fig. 8.4.6.
• The lower-order 4 bits select one of the 16 words in a block. This field is known as the word field.
• The second field, known as the block field, is used to distinguish a block from other blocks. Its length is 7 bits since 2^7 = 128.
• When a new block enters the cache, the 7-bit cache block field determines the cache position in which this block
must be stored.
• The third field is the tag field. It is used to store the high-order 5 bits of the memory address of the block. These 5 bits (tag bits) are used to identify which of the 32 pages of the main memory the block mapped into the cache belongs to.
• When memory is accessed, the 7-bit cache block field of each address generated by the CPU points to a particular block location in the cache. The high-order 5 bits of the address are compared with the tag bits associated with that cache location. If they match, then the desired word is in that block of the cache. If there is no match, then the block containing the required word must first be read from the main memory and loaded into the cache.
• This means that to determine whether the requested word is in the cache, only the tag field needs to be compared. This needs only one comparison.
• The main drawback of a direct-mapped cache is that if the processor needs to frequently access the same memory location from two different pages of the main memory, the controller has to access main memory frequently, since only one of these locations can be in the cache at a time. For example, if the processor wants to access memory location 100 H from page 0 and then from page 2, the cache controller has to access page 2 of the main memory. Therefore, we can say that a direct-mapped cache is easy to implement; however, it is not very flexible.
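• The address split described above can be sketched in Python for the example cache (128 blocks of 16 words, 16-bit addresses). This is only an illustration of the field division; the constants and function name are ours, not from the text:

WORD_BITS = 4    # 16 words per block
BLOCK_BITS = 7   # 128 cache blocks
TAG_BITS = 5     # 16 - 7 - 4

def split_direct_mapped(address):
    # Decompose a 16-bit main memory address into (tag, cache block, word)
    word = address & 0xF
    block = (address >> WORD_BITS) & 0x7F
    tag = address >> (WORD_BITS + BLOCK_BITS)
    return tag, block, word

# Main memory blocks 0, 128, 256, ... all map onto cache block 0
print(split_direct_mapped(0 * 16))     # (0, 0, 0)
print(split_direct_mapped(128 * 16))   # (1, 0, 0) - same cache block, different tag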
Associative-Mapping (Fully-Associative Mapping)
• The Fig. 8.4.7 shows the associative-mapping technique. In this technique, a main memory block can be placed into any cache block position. As there is no fixed block position, the memory address has only two fields: word and tag. This technique is also referred to as fully-associative cache.
• Twelve tag bits are required to identify a memory block when it is resident in the cache. The high-order 12 bits of an address received from the CPU are compared with the tag bits of each block of the cache to see if the desired block is present.
• Once the desired block is found, the 4-bit word field is used to identify the necessary word from the cache.
• This technique gives complete freedom in choosing the cache location in which to place the memory block.
Thus, the memory space in the cache can be used more efficiently.
• A new block that has to be loaded into the cache has to replace (remove) an existing block only if the cache is full.
• In such situations, it is necessary to use one of the possible replacement algorithms to select the block to be replaced.
• Disadvantage: In an associative-mapped cache, it is necessary to compare the higher-order bits of the main memory address with all 128 tags, one for each block, to determine whether a given block is in the cache. This is the main disadvantage of the associative-mapped cache.
Set-Associative Mapping
• The set-associative mapping is a combination of both direct and associative mapping.
• It contains several groups of direct-mapped blocks that operate as several direct-mapped caches in parallel.
• A block of data from any page in the main memory can go into a particular block location of any direct-mapped
cache. Hence the contention problem of the direct-mapped technique is eased by having a few choices for block
placement.
• The required address comparisons depend on the number of direct-mapped caches in the cache system. These
comparisons are always less than the comparisons required in the fully-associative mapping.
• Fig. 8.4.8 shows two way set-associative cache. Each page in the main memory is organized in such a way that
the size of each page is same as the size of one directly mapped cache. It is called two-way set-associative cache
because each block from main memory has two choices for block placement.
• In this technique, blocks 0, 64, 128, ..., 4032 of main memory can map into either of the two blocks (block 0) of set 0; blocks 1, 65, 129, ..., 4033 of main memory can map into either of the two blocks (block 1) of set 1, and so on.
• As there are two choices, it is necessary to compare the memory address with the tag bits of the two corresponding block locations of the particular set. Thus for a two-way set-associative cache, we require two comparisons to determine whether a given block is in the cache.
• Since there are two direct-mapped caches, any two bytes having same offset from different pages can be in the
cache at a time. This improves the hit rate of the cache system.
• To implement set-associative cache system, the address is divided into three fields, as shown in Fig. 8.4.8.
• The 4-bit word field selects one of the 16 words in a block.
• The set field needs 6 bits to determine the desired set from among the 64 sets. However, there are now 64 pages. To identify which of the 64 pages a block belongs to, six tag bits are required.
Example 8.4.2 Consider a cache consisting of 256 blocks of 16 words each, for a total of 4096 (4 K) words, and assume that the main memory is addressable by a 16-bit address and consists of 4 K blocks. How many bits are there in each of the TAG, BLOCK/SET and WORD fields for the different mapping techniques?
Solution: We know that memory address is divided into three fields. We will now find the exact bits required for
each field in different mapping techniques.
a) Direct-mapping
Word bits: We know that each block consists of 16 words. Therefore, to identify each word we must have four bits (2^4 = 16) reserved for it.
Block bits: The cache memory consists of 256 blocks and, using the direct-mapped technique, block k of the main memory maps onto block k modulo 256 of the cache. It has a one-to-one correspondence and requires a unique address for each block. To address 256 blocks we require eight bits (2^8 = 256).
Tag bits: The remaining 4 (16 - 4 - 8) address bits are tag bits, which store the higher-order address bits of the main memory.
The main memory address for direct-mapping technique is divided as shown below:
b) Associative-mapping
Word bits: The word length will remain same i.e. 4 bits.
In the associative-mapping technique, each block in the main memory is identified by the tag bits and an address
received from the CPU is compared with the tag bits of each block of the cache to see if the desired block is
present. Therefore, this type of technique does not have block bits, but all remaining bits (except word bits) are
reserved as tag bits.
Block bits: 0
Tag bits: To address each block in the main memory, 12 bits (2^12 = 4096) are required and therefore there are 12 tag bits.
The main memory address for the associative-mapping technique is divided as shown below:
c) Set-associative mapping
Let us assume that there is a 2-way set-associative mapping. Here, the cache memory is mapped with two blocks per set. The set field of the address determines which set of the cache might contain the desired block.
Word bits: The word length will remain same i.e. 4 bits.
Set bits: There are 128 sets (256/2). To identify each set, seven bits (2^7 = 128) are required.
Tag bits: The remaining 5 (16 - 4 - 7) address bits are the tag bits, which store the higher-order address bits of the main memory.
The main memory address for 2-way set associative mapping technique is divided as shown below:
Example 8.4.3 A block set-associative cache consists of 64 blocks divided into four-block sets. The main memory contains 4096 blocks, each consisting of 128 words of 16 bits length:
i) How many bits are there in main memory?
ii) How many bits are there in each of the TAG, SET and WORD fields?
Solution: i) Number of bits in main memory:
= Number of blocks x Number of words per block x Number of bits per word
= 4096 × 128 × 16
= 8388608 bits
ii) Number of bits in word field: There are 128 words in each block. Therefore, to identify each word, 7 bits (2^7 = 128) are required.
Number of bits in set field: There are 64 blocks and each set consists of 4 blocks.
Therefore, there are 16 (64/4) sets. To identify each set, four bits (2^4 = 16) are required.
Number of bits in tag field: The total number of words in the memory is:
4096 × 128 = 524288
To address these words we require 19 address lines (2^19 = 524288). Therefore, the tag bits are eight (19 - 7 - 4).
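The field widths worked out in Examples 8.4.2 and 8.4.3 can also be computed with a small Python helper (an illustrative sketch; the function name and parameters are assumed, not from the text):

from math import log2

def field_widths(address_bits, words_per_block, cache_blocks, ways=1, fully_associative=False):
    # Returns (tag, block/set, word) field widths for the given cache geometry
    word = int(log2(words_per_block))
    index = 0 if fully_associative else int(log2(cache_blocks // ways))
    tag = address_bits - word - index
    return tag, index, word

print(field_widths(16, 16, 256))                          # Example 8.4.2 a) direct: (4, 8, 4)
print(field_widths(16, 16, 256, fully_associative=True))  # Example 8.4.2 b) associative: (12, 0, 4)
print(field_widths(16, 16, 256, ways=2))                  # Example 8.4.2 c) 2-way: (5, 7, 4)
print(field_widths(19, 128, 64, ways=4))                  # Example 8.4.3: (8, 4, 7)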
Example 8.4.4 A digital computer has a memory unit of 64 K × 16 and a cache memory of 1 K words. The cache uses direct mapping with a block size of four words. How many bits are there in the tag, block and word fields of the address format?
Solution: Word bits: Number of word bits = log2 4 = 2 bits
Block bits: Number of blocks = Cache size / Words in each block = 1 K / 4 = 256
Number of block bits = log2 256 = log2 2^8 = 8 bits
Tag bits: The 64 K main memory needs log2 64 K = 16 address bits, so tag bits = 16 - 8 - 2 = 6 bits.
Example 8.4.6 A direct-mapped cache has the following parameters: cache size = 1 K words, block size = 128 words and main memory size = 64 K words. Specify the number of bits in the TAG, BLOCK and WORD fields of the main memory address.
Solution : Word bits = log2 128 = 7 bits
Number of blocks = cache size / words in each block = 1K/128 = 8
Number of block bits = log2 8 = 3-bits
Number of address bits to address main memory
= log2 64 K = log2 2^16 = 16 bits
Tag bits = 16 – 3 – 7 = 6 bits
Example 8.4.7 How many total bits are required for a direct-mapped cache with 16 KB of data and 4-word blocks, assuming a 32-bit address?
Solution:
16 KB = 4 K words = 2^12 words
With a block size of 4 words there are 2^10 blocks
Each block has 4 × 32 = 128 bits of data + tag + valid bit
Tag + valid bit = (32 - 10 - 2 - 2) + 1 = 19
Total cache size = 2^10 × (128 + 19) = 2^10 × 147
Therefore, 147 Kbits are needed for the cache.
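The arithmetic of Example 8.4.7 can be reproduced with a short Python sketch (illustrative only; it assumes 32-bit words, byte addressing and one valid bit per block, as in the example, and the function name is ours):

def total_cache_bits(data_words, words_per_block, address_bits):
    blocks = data_words // words_per_block                # 2^10 blocks here
    index_bits = blocks.bit_length() - 1                  # 10
    offset_bits = (words_per_block.bit_length() - 1) + 2  # word-in-block bits + 2 byte-offset bits
    tag_bits = address_bits - index_bits - offset_bits    # 18
    bits_per_block = words_per_block * 32 + tag_bits + 1  # data + tag + valid bit = 147
    return blocks * bits_per_block

print(total_cache_bits(4 * 1024, 4, 32))   # 150528 bits = 2^10 x 147 = 147 Kbits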
Example 8.4.8 You have been asked to design a cache with the following properties:
1) Data words are 32 bits each.
2) A cache block will contain 2048 bits of data.
3) The cache is direct mapped.
4) The address supplied from the CPU is 32 bits long.
5) There are 2048 blocks in the cache.
6) Addresses are to the word.
In Fig. 8.4.11 below, there are 8 fields (labeled a, b, c, d, e, f, g and h); you need to indicate the proper name or number of bits for each portion of this cache configuration. AU: May-19, Marks 8
Solution:
f. (name) -You are being asked to show what part of a physical address form the index, offset, and tag. < f > refers
to the most significant bits of the address - so this is the tag.
g. (name) - It follows that the next part of the address is the index.
h. (name) - The least significant bits form the offset.
c. (number) - There are 2^11 bits / block and there are 2^5 bits / word. Thus there are 2^6 words / block, so we need 6 bits of offset.
b. (number) - There are 2^11 blocks and the cache is direct mapped (or "1-way set associative"). Therefore, we need 11 bits of index.
a. (number) - The remaining bits form the tag. Thus, 32 - 6 - 11 = 15 bits of tag.
d. (number) - Field < d > refers to the fact that a tag must be stored in each block. Thus, 15 bits are kept in each
block.
e. (number) - Field < e > asks you to specify the total number of bits / block. This is 2048.
We need to compare the valid bit associated with the block, the tag stored in the block, and the tag associated
with the physical address to determine if the cache entry is useable or not. The tags should be the same and the
valid bit should be 1.
Cache Size
There are 2048 blocks in the cache and there are 2048 bits / block. There are 8 bits/byte. Thus, there are 256 bytes
/ block
2048 blocks × 256 bytes / block =219 bytes (or 0.5 MB)
Comparison between Mapping Techniques
Cache Coherency
• In a single-CPU system, two copies of the same data, one in cache memory and another in main memory, may become different. This data inconsistency is called the cache coherence problem.
• Cache updating policies eliminate data inconsistency in the main memory caused by cache write operations.
• In multiprocessor systems, another bus master can take over control of the system bus. This bus master could write data into main memory blocks which are already held in the cache of another processor. When this happens, the data in the cache no longer match those held in main memory, creating inconsistency.
• The 80386 supports four different approaches to prevent data inconsistency, that is to protect cache coherency:
1. Bus watching (snooping) 2. Hardware transparency
3. Non-cacheable memory 4. Cache flushing
Bus watching: In bus watching, cache controller invalidates the cache entry, if another master writes to a location
in shared memory which also resides in the cache memory. Fig. 8.4.12 shows bus watching.
Hardware transparency: In hardware transparency, coherency is maintained by routing the accesses of all devices to the main memory through the same cache, or by copying all cache writes both to the main memory and to all other caches that share the same memory. Fig. 8.4.13 shows a hardware transparent system.
Non-cacheable memory: The 80386DX can partition its main memory into cacheable and non-cacheable memory. By designating shared memory as non-cacheable memory, cache coherency can be maintained, since shared memory is never copied into the cache.
Cache flushing: To avoid data inconsistency, a cache flush writes any altered data to the main memory, and all caches in the system are flushed before a device writes to shared memory.
Cache Updating Policies
• In a cache system, two copies of the same data can exist at a time, one in the cache and one in main memory. If one copy is altered and the other is not, two different sets of data become associated with the same address. To prevent this, the cache system has updating policies such as: write-through, buffered write-through and write-back.
Write through System
• The cache controller copies data to the main memory immediately after it is written to the cache. Due to this, the main memory always contains valid data, and thus any block in the cache can be overwritten immediately without data loss.
• The write-through approach is simple.
• However, it requires time to write the data in main memory, with an increase in bus traffic.
• This in effect reduces system performance.
Buffered Write through System
• In a buffered write-through system, the processor can start a new cycle before the write cycle to the main memory is completed. This means that the write accesses to the main memory are buffered.
• In such systems, a read access which is a "cache hit" can be performed simultaneously while the main memory is being updated.
• However, two consecutive write operations to the main memory, or a read operation with a cache "miss", require the processor to wait.
Write-Back System
• In a write-back system, the alter (update) bit in the tag field is used to keep track of modified data. If it is set, the controller copies the block to main memory before loading new data into the cache.
• Because blocks are written back only once, the number of write cycles is reduced in a write-back system. But this system has the following disadvantages:
• Write-back cache controller logic is more complex.
• It is necessary that all altered blocks be written to the main memory before another device accesses these blocks in main memory.
• In case of power failure, the data in the cache memory is lost, so there is no way to tell which locations of the main memory contain old data. Therefore, the main memory as well as the cache must be considered volatile, and provisions must be made to save the data in the cache.
Replacement Algorithms
• When a new block is brought into the cache, one of the existing blocks must be replaced by the new block.
• In the case of a direct-mapped cache, we know that each block from the main memory has only one possible location in the cache, hence there is no choice; the previous data is replaced by the data from the same memory location of the new page of the main memory.
• For the associative and set-associative techniques, there is a choice of which existing block to replace. The choice should be such that the probability of accessing the replaced block again is very small. The replacement algorithms do the task of selecting the existing block which must be replaced.
• There are four most common replacement algorithms:
• Least-Recently-Used (LRU)
• First-In-First-Out (FIFO)
• Least-Frequently-Used (LFU)
• Random
• Least-Recently-Used: In this technique, the block in the set which has been in the cache longest with no reference to it is selected for replacement, since we assume that more recently used memory locations are more likely to be referenced again. This technique can be easily implemented in the two-way set-associative cache organization.
• First-In-First-Out: In this technique, the block which was loaded into the cache first, among the blocks currently present in the cache, is selected for replacement.
• Least-Frequently-Used: In this technique, the block in the set which has the fewest references is selected for
the replacement.
• Random: Here, there are no specific criteria for replacement of a block; the existing blocks are replaced randomly. Simulation studies have shown that the random replacement algorithm provides performance only slightly inferior to the algorithms just discussed.
Example 8.4.9 Consider web browsing application. Assuming both client and server are involved in the process
of web browsing application, where can caches be placed to speed up the process? Design a memory hierarchy
for the system. Show the typical size and latency at various levels of the hierarchy. What is the relationship
between cache size and its access latency? What are the units-of data transfers between hierarchies? What is the
relationship between the data location, data size, and transfer latency? AU: Dec.-18, Marks 7
Solution: a) Assuming both client and server are involved in the process of web browsing application, caches
can be placed on both sides - web browser and server.
b) Memory hierarchy for the system is as follows:
1. Browser cache, size = fraction of the client computer's disk, latency = local disk latency.
2. Proxy cache, size = proxy disk, latency = LAN + proxy disk latencies.
3. Server-side cache, size = fraction of server disk, latency = WAN + server disk latencies.
4. Server storage, size = server storage, latency = WAN + server storage latencies.
Latency is not directly related to cache size.
c) The units of data transfers between hierarchies are pages.
d) Latency grows with page size as well as distance.
Virtual Memory
Virtual Memory Concept
• The virtual memory technique is used to extend the apparent size of the physical memory.
• It uses secondary storage such as disks, to extend the apparent size of the physical memory.
• Virtual memory is a concept that allows user to construct programs as large as auxiliary memory.
• When a program does not completely fit into the main memory, it is divided into segments. The segments which
are currently being executed are kept in the main memory and remaining segments are stored in the secondary
storage devices, such as a magnetic disk.
• If an executing program needs a segment which is not currently in the main memory, the required segment is
copied from the secondary storage device.
• When a new segment of a program is to be copied into a main memory, it must replace another segment already
in the memory.
• In virtual memory, the addresses that the processor issues to access either instructions or data are called virtual or logical addresses. The set of such addresses is called the address space. These addresses are translated into physical addresses by a combination of hardware and software components.
• The set of physical addresses or locations is called the memory space.
Concept of Paging
• Fig. 8.5.1 shows a typical memory organization that implements virtual memory. The memory management unit controls this virtual memory system. It translates virtual addresses into physical addresses.
• A simple method for translating virtual addresses into physical addresses is to assume that all programs and data are composed of fixed-length units called pages, as shown in the Fig. 8.5.2.
• Pages constitute the basic unit of information that is moved between the main memory and the disk whenever
the page translation mechanism determines that a swapping is required.
Virtual to Physical Address Translation
• Involves two phases: Segment translation and page translation.
Segment translation :
A logical address (also known as virtual address) consists of a selector and an offset. A selector is the contents of
a segment register.
• Every segment selector has a linear base address associated with it, and this is stored in the segment descriptor. A selector is used to point to a descriptor for the segment in a table of descriptors. The linear base address from the descriptor is then added to the offset to generate the linear address. This process is known as segmentation or segment translation. If the paging unit is not enabled, the linear address corresponds to the physical address. But if the paging unit is enabled, the paging mechanism translates the linear address space into the physical address space by page translation.
Page translation
• Page translation is the second phase of address translation. It transforms a linear address generated by segment
translation into a physical address.
• When paging is enabled, the linear address is broken into a virtual page number and a page offset.
Fig. 8.5.4 shows the translation of the virtual page number to a physical page number. The physical page number
constitutes the upper portion of the physical address, while the page offset, which is not changed, constitutes the
lower portion. The number of bits in the page offset field decides the page size.
• The page table is used to keep the information about the main memory location of each page. This information
includes the main memory address where the page is stored and the current status of the page.
• To obtain the address of the corresponding entry in the page table, the virtual page number is added to the contents of the page table base register, in which the starting address of the page table is stored. The entry in the page table gives the physical page number, to which the offset is appended to get the physical address in the main memory.
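• A minimal Python sketch of this translation (the page table contents and the 4 K-word page size are assumed for illustration; a real MMU performs this in hardware):

PAGE_OFFSET_BITS = 12                       # 4 K words per page

page_table = {0: 5, 1: 9, 2: 3}             # virtual page number -> physical page (frame) number

def translate(virtual_address):
    vpn = virtual_address >> PAGE_OFFSET_BITS
    offset = virtual_address & ((1 << PAGE_OFFSET_BITS) - 1)
    if vpn not in page_table:
        raise LookupError("page fault: page %d is not in main memory" % vpn)
    return (page_table[vpn] << PAGE_OFFSET_BITS) | offset

print(hex(translate(0x1ABC)))               # page 1 -> frame 9, so 0x9ABC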
Example 8.5.1 The logical address space in a computer system consists of 128 segments. Each segment can have
up to 32 pages of 4 K words each. Physical memory consists of 4 K blocks of 4 K words in each. Formulate the
logical and physical address formats.
Solution :
Number of bits for segment address = log2 128 = log2 2^7 = 7 bits
Number of bits for page address = log2 32 = log2 2^5 = 5 bits
Number of bits for word address = log2 4096 = log2 2^12 = 12 bits
Number of bits for block address = log2 4096 = log2 2^12 = 12 bits
Logical address format: segment (7 bits) + page (5 bits) + word (12 bits) = 24 bits
Physical address format: block (12 bits) + word (12 bits) = 24 bits
Example 8.5.2 An address space is specified by 32 bits and corresponding memory space by 24 bits.
i) How many words are there in the address space?
ii) How many words are there in the memory space?
iii) If a page consists of 4 K words, how many pages and blocks are there in the systems.
Solution :
i) Words in the address space = 2^32 = 4 G words
ii) Words in the memory space = 2^24 = 16 M words
iii) Number of pages = Words in address space / Words per page = 4 G words / 4 K words = 1 M pages
Number of blocks = Words in memory space / Words per block = 16 M words / 4 K words = 4 K blocks
Example 8.5.3 An address space is specified by 24 bits and the corresponding memory space by 16 bits. How
many words are there in the virtual memory and in the main memory ? AU May-13, Marks 2
Solution: Words in the address space, i.e., in the virtual memory
= 2^24 = 16 M words
Words in the memory space, i.e., in the main memory
= 2^16 = 64 K words
Translation Lookaside Buffer (TLB)
• To support demand paging and virtual memory, the processor has to access the page table, which is kept in the main memory. To reduce the access time and the degradation of performance, a small portion of the page table is accommodated in the memory management unit. This portion is called the Translation Lookaside Buffer (TLB).
• It is used to hold the page table entries that correspond to the most recently accessed pages. When the processor finds the page table entries in the TLB, it does not have to access the page table and saves substantial access time.
• The Fig. 8.5.6 shows the organization of a TLB where the associative mapping technique is used.
• As shown in the Fig. 8.5.6, a given virtual address is compared with the TLB entries for the reference page by
MMU. If the page table entry for this page is found in the TLB, the physical address is obtained immediately.
• If entry is not found, there is a miss in the TLB, then the required entry is obtained from the page table in the
main memory and the TLB is updated.
• It is important that the contents of the TLB be coherent with the contents of page tables in the memory. When
page table entries are changed by an operating system, it must invalidate the corresponding entries in the TLB.
One of the control bits in the TLB is provided for this purpose. When an entry is invalidated, the TLB acquires
the new information as a part of the MMU's normal response to access misses.
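• A rough Python sketch of a TLB lookup with fall-back to the page table (the structures, capacity and names are illustrative assumptions; real TLBs are associative hardware):

page_table = {0: 5, 1: 9, 2: 3}     # virtual page number -> frame number, held in main memory
tlb = {}                            # small cache of recently used page table entries
TLB_CAPACITY = 2

def lookup(vpn):
    if vpn in tlb:                  # TLB hit: no page table access is needed
        return tlb[vpn]
    frame = page_table[vpn]         # TLB miss: fetch the entry from the page table
    if len(tlb) == TLB_CAPACITY:    # make room (replacement policy not modelled here)
        tlb.pop(next(iter(tlb)))
    tlb[vpn] = frame                # update the TLB with the new entry
    return frame

print(lookup(1), lookup(1))         # the first call misses in the TLB, the second hits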
Page Size
• The page size has a great effect on both storage utilization and the effective memory data transfer rate. It is denoted by Sp.
• If Sp is too large, excessive internal fragmentation results, and if it is too small, the page tables become very large. Both of these tend to reduce space utilization (u). There should be a balance between these two effects.
• Let us denote the average segment size in words by Ss. If Ss >> Sp, the last page assigned to a segment should contain about Sp/2 words, so about Sp/2 words per segment are wasted to internal fragmentation. The size of the page table associated with each segment is approximately Ss/Sp words, assuming each entry in the table is a word. Hence the memory space overhead associated with each segment is,
S = Sp/2 + Ss/Sp ... (8.5.1)
and the space utilization is u = Ss / (Ss + S).
• We can derive the expression for the optimum page size SpOPT by finding the value of Sp that maximizes u or, equivalently, that minimizes S. Differentiating S with respect to Sp and equating the result to zero, we obtain
dS/dSp = 1/2 - Ss/Sp^2 = 0, which gives SpOPT = sqrt(2 Ss) ... (8.5.2)
• Substituting the value of SpOPT from equation (8.5.2), we get the optimum space utilization as,
uOPT = Ss / (Ss + sqrt(2 Ss)) = 1 / (1 + sqrt(2/Ss))
• The Fig. 8.5.7 shows the effect of page size and segment size on space utilization. It shows that the space utilization factor increases with increase in segment size and it is optimum when Sp = sqrt(2 Ss).
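• Under these assumptions, the overhead and utilization can be evaluated with a few lines of Python (the segment size used below is only an example value):

import math

def overhead(Sp, Ss):
    # Sp/2 words of internal fragmentation plus a page table of about Ss/Sp words
    return Sp / 2 + Ss / Sp

def utilization(Sp, Ss):
    return Ss / (Ss + overhead(Sp, Ss))

Ss = 64 * 1024                         # average segment size in words (example value)
Sp_opt = math.sqrt(2 * Ss)             # optimum page size, sqrt(2*Ss)
print(Sp_opt, utilization(Sp_opt, Ss))
print(utilization(128, Ss), utilization(8192, Ss))   # smaller or larger pages do worse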
4. It schedules a disk operation to read the desired page into the newly allocated frame.
5. When the disk read is complete, it modifies the internal table kept with the program and the page table to
indicate that the page is now in memory.
6. It restarts the instruction that was interrupted by the illegal address trap. The program can now access the page
as though it had always been in memory.
• The hardware to implement demand paging is the same as the hardware for paging and swapping:
Page table: This table has the ability to mark an entry invalid through a valid-invalid bit or special value of
protection bits.
Secondary memory: This memory holds those pages that are not present in main memory. The secondary
memory is usually a hard-disk. It is known as the swap device, and the section of disk used for this purpose is
known as swap space or backing store.
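The valid-invalid bit idea can be pictured with a minimal sketch (hypothetical structure; a real page table entry also carries protection, dirty and reference bits):

from dataclasses import dataclass
from typing import Optional

@dataclass
class PageTableEntry:
    frame: Optional[int] = None    # physical frame number when the page is resident
    valid: bool = False            # False -> page is on the swap device; an access traps

def access(page_table, page, load_from_disk):
    entry = page_table[page]
    if not entry.valid:                        # page fault : trap to the operating system
        entry.frame = load_from_disk(page)     # schedule a disk read into a free frame
        entry.valid = True                     # mark the entry valid, then restart the access
    return entry.frame

table = {0: PageTableEntry(), 1: PageTableEntry(frame=7, valid=True)}
print(access(table, 1, lambda p: 3))           # 7 (already resident, no fault)
print(access(table, 0, lambda p: 3))           # 3 (fault serviced, page brought in)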
Example 8.5.4: Calculate the effective access time if the average page-fault service time is 20 milliseconds and the memory access time is 80 nanoseconds. Assume the probability of a page fault is 10 %. Dec.-09, Marks 2
Solution: Effective access time is given as
= (1 - 0.1) × (80) + 0.1 × (20 milliseconds)
= (1 - 0.1) × 80 + 0.1 × 20,000,000 = 72 + 2,000,000 (nanoseconds)
= 2,000,072 nanoseconds
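The same calculation can be written in a few lines of Python (values taken from the example):

page_fault_rate  = 0.10
memory_access_ns = 80
fault_service_ns = 20e6              # 20 milliseconds = 20,000,000 nanoseconds

effective_ns = (1 - page_fault_rate) * memory_access_ns \
               + page_fault_rate * fault_service_ns
print(effective_ns)                  # 2000072.0 nanoseconds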
Page Replacement Algorithms
• If the required page is not found in the main memory, a page fault occurs and the required page is loaded into the main memory from the secondary memory.
• However, if there is no vacant space in the main memory to copy the required page, it is necessary to replace one of the existing pages in the main memory, one which is currently not in use, with the required page.
• Thus we can say that the page replacement is a basic requirement of demand paging.
• There are many different page-replacement algorithms used by various operating systems. These are discussed
in the following sections.
FIFO Algorithm
• It is the simplest page-replacement algorithm. A FIFO (First In First Out) replacement algorithm replaces the oldest page in the main memory with the new page.
• For our example reference string, our three frames are initially empty. The first three references (6, 0, 1) cause
page faults, and the required pages are brought into these empty frames.
The next reference (2) replaces page 6, because page 6 was brought in first. Since 0 is the next reference and 0 is
already in memory, we have no fault for this reference. The first reference to 3 results in page 0 being replaced,
since it was the first of the three pages in memory (0, 1 and 2) to be brought in. This process continues as shown
in Fig. 8.5.9.
• The FIFO page-replacement algorithm is easy to understand and program. However, its performance is not always good. It may replace a heavily used page simply because it is the oldest page in memory.
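• A minimal FIFO simulation is sketched below. The reference string itself is not reproduced in the text, so the 20-reference string from the figures (6, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 6, 0, 1) is assumed here; with 3 frames it gives the 15 faults mentioned later.

from collections import deque

def fifo_faults(references, frame_count):
    frames = deque()                   # oldest page at the left
    faults = 0
    for page in references:
        if page not in frames:         # page fault
            faults += 1
            if len(frames) == frame_count:
                frames.popleft()       # evict the page that was brought in first
            frames.append(page)
    return faults

refs = [6, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 6, 0, 1]
print(fifo_faults(refs, 3))            # 15 faults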
Optimal algorithm
• Advantage: An optimal page-replacement algorithm has the lowest page-fault rate of all algorithms. It has been
called OPT or MIN.
• It simply states that "Replace the page that will not be used for the longest period of time".
• For example, on our sample reference string, the optimal page replacement algorithm would yield nine page
faults, as shown in Fig. 8.5.10. The first three references cause faults that fill the three empty frames.
• The reference to page 2 replaces page 6, because 6 will not be used until reference 18, whereas page 0 will be
used at 5, and page 1 at 14. The reference to page 3 replaces page 1, as page 1 will be the last of the three pages
in memory to be referenced again. With only nine page faults, optimal replacement is much better than a FIFO
algorithm, which had 15 faults. (If we ignore the first three, which all algorithms must suffer, then optimal
replacement is twice as good as FIFO replacement.) In fact, no replacement algorithm can process this reference
string in three frames with less than nine faults.
• Disadvantage: Unfortunately, the optimal page replacement algorithm is difficult to implement, because it
requires future knowledge of the reference string.
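• A rough sketch of OPT is given below (same assumed reference string as before; a real system cannot use this directly, since the future references are unknown):

def opt_faults(references, frame_count):
    frames, faults = [], 0
    for i, page in enumerate(references):
        if page in frames:
            continue
        faults += 1
        if len(frames) < frame_count:
            frames.append(page)
            continue
        rest = references[i + 1:]
        # evict the resident page whose next use is farthest away (or never occurs)
        def next_use(p):
            return rest.index(p) if p in rest else float("inf")
        frames.remove(max(frames, key=next_use))
        frames.append(page)
    return faults

refs = [6, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 6, 0, 1]
print(opt_faults(refs, 3))             # 9 faults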
LRU algorithm
• The algorithm which replaces the page that has not been used for the longest period of time is referred to as the
Least Recently Used (LRU) algorithm.
• The result of applying LRU replacement to our example reference string is shown in Fig. 8.5.11.
• When the reference to page 4 occurs, LRU replacement sees that, of the three frames in memory, page 2 was used least recently. The most recently used page is page 0, and just before that page 3 was used. Thus, the LRU algorithm replaces page 2, not knowing that page 2 is about to be used again.
• When it then faults for page 2, the LRU algorithm replaces page 3 since, of the three pages in memory {0, 3,
4}, page 3 is the least recently used.
• LRU replacement with 12 faults is still much better than FIFO replacement with 15.
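• LRU can be sketched by keeping the resident pages ordered by recency of use (same assumed reference string; real implementations use hardware counters or a stack):

def lru_faults(references, frame_count):
    frames, faults = [], 0             # most recently used page kept at the end
    for page in references:
        if page in frames:
            frames.remove(page)        # refresh recency on a hit
            frames.append(page)
            continue
        faults += 1
        if len(frames) == frame_count:
            frames.pop(0)              # evict the least recently used page
        frames.append(page)
    return faults

refs = [6, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 6, 0, 1]
print(lru_faults(refs, 3))             # 12 faults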
Counting algorithms
• In the counting algorithms, a counter of the number of references made to each page is kept, and based on these counts the following two schemes work.
• LFU algorithm: The Least Frequently Used (LFU) page replacement algorithm requires that the page with the
smallest count be replaced.
• The reason for this selection is that an actively used page should have a large reference count.
• This algorithm suffers from the situation in which a page is used heavily during the initial phase of a process,
but then is never used again.
• MFU algorithm: The Most Frequently Used (MFU) page replacement algorithm is based on the argument that
the page with the smallest count was probably just brought in and has yet to be used.
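• Both counting schemes can be sketched with one shared routine (illustrative only; ties between pages with equal counts are broken arbitrarily here):

from collections import Counter

def counting_faults(references, frame_count, most_frequent=False):
    frames, counts, faults = set(), Counter(), 0
    for page in references:
        counts[page] += 1
        if page in frames:
            continue
        faults += 1
        if len(frames) == frame_count:
            # LFU evicts the page with the smallest count; MFU the one with the largest
            pick = max if most_frequent else min
            victim = pick(frames, key=lambda p: counts[p])
            frames.remove(victim)
        frames.add(page)
    return faults

refs = [6, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 6, 0, 1]
print(counting_faults(refs, 3), counting_faults(refs, 3, most_frequent=True))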
Example 8.5.5: Explain page replacement algorithms. Find out the page faults for the following reference string using the LRU method
6 0 1 2 0 3 0 4 2 3 0 3 2 1 2 0 1 5
Consider a page frame size of 3.
Solution:
The figure given above depicts the image of a process. This process image occupies a contiguous region of main memory. The OS would like to know several things, including the location of the process control information, the entry point of the code, and the execution stack. A program makes memory references in its various instructions, and these references are known as logical addresses.
After the program is loaded into the main memory, the OS and the processor must be able to translate the logical addresses into physical addresses. Branch instructions contain the address of the next instruction to be executed, and data reference instructions contain the address of the word or byte of data being referenced.
2. Protection
Whenever we are dealing with multiple programs at the same time, there always exists a danger that one program might write to the address space occupied by another program. Thus, every process has to be protected against unwanted interference from other processes that try to write into its space, whether accidental or intentional. A trade-off occurs between the requirements of relocation and protection: satisfying the relocation requirement makes it more difficult to satisfy the protection requirement.
The location of a program in the main memory cannot be predicted, so it is impossible to check absolute addresses at compilation time in order to assure protection. Moreover, a majority of programming languages allow addresses to be calculated dynamically at run time. The validity of a memory reference can therefore only be checked at run time, and the memory protection requirement has to be satisfied by the processor rather than by the OS, since the OS cannot anticipate every memory reference a process will make while it occupies the processor.
3. Sharing
Some protection mechanisms allow various processes to access the same section of main memory. This lets all the processes access a single shared copy of a program instead of each having its own separate copy, which is an advantage.
For instance, numerous processes may utilise the very same system file. It is natural to load a single copy of the given file in the main memory and to let it be shared by these processes. It is the memory management's task to allow controlled access to shared areas of memory without compromising protection. Various mechanisms are used to support this sharing capability together with relocation.
4. Logical Organization
The main memory is organized linearly, as a one-dimensional address space comprising a sequence of words or bytes. Most programs, however, are organized into modules, some of which are unmodifiable (execute-only or read-only) and some of which contain modifiable data. If the OS and the computer hardware can deal with user programs in the form of such modules, the required sharing and protection can be provided. This organization comes with the following advantages:
• Modules can be written and compiled independently. All references from one module to another are resolved by the system at run time.
• Different degrees of protection can be given to different modules.
• Various mechanisms exist by which modules can be shared among processes. Sharing can be provided at the module level, letting the user specify the desired sharing.
5. Physical Organization
A computer memory's structure has two levels, referred to as the main memory and the secondary memory. The main memory is relatively costly and much faster than the secondary memory, and it is volatile. Hence, the secondary memory is mainly used for long-term data storage, while the main memory holds the programs currently in use. The flow of information between the secondary memory and the main memory is a primary system concern, and it is impractical to leave this flow to the programmer for the following two reasons:
• The main memory available for a program and its data may be insufficient. In that case the programmer must engage in overlaying, in which different modules are assigned to the same region of memory. A primary disadvantage of this is that it is time-consuming for the programmer.
• In a multiprogramming environment, the programmer does not know at coding time how much space will be available or where that space will be located inside the memory.