
[CT 211]

COMPUTER ARCHITECTURE AND ORGANIZATION I

LECTURE #05

Computer Memory System


Layout of Lecture #05

• Introduction
• Principle of Locality
• Characteristics of Memory Systems
• Types of Memory
• Memory Hierarchy
• Cache Memory
• Virtual Memory


Introduction
Introduction (1)

Memory lies at the heart of the stored-program computer.

A stored-program computer is a type of computer where both program instructions and data are stored in the same memory unit (i.e. the von Neumann model).


Introduction (2)

Memory affects speed, capacity, and access times, making it an essential part of computer system performance.

It is essential to the storage of data and the operation of programs.


Principle of Locality
Principle of Locality (1)

• The “principle of locality” (or “locality of reference”) is one of the most important concepts related to computer systems.

• The principle reflects the observation that during the course of execution of a program, memory references by the processor, for both instructions and data, tend to cluster.


Principle of Locality (2)

Programs contain a number of iterative loops and subroutines.

Once a loop or subroutine is entered, there are repeated references to a small set of instructions.

Similarly, operations on tables and arrays involve access to a clustered set of data words.


Principle of Locality (3)

• There are three forms of locality:
1. Temporal locality – the tendency of a program to reference in the near future those units of memory referenced in the recent past.
E.g. When an iterative loop is executed, the processor executes the same set of instructions repeatedly.
2. Spatial locality – the tendency of a program to reference units of memory whose addresses are near one another.
E.g. If a unit of memory “x” is referenced at time “t”, it is likely that units in the range “x – k” through “x + k” will be referenced in the near future.


Principle of Locality (4)

3. Sequential locality – instructions tend to be accessed sequentially.
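To make these forms concrete, here is a minimal C sketch (illustrative only, not from the original slides) in which all three forms of locality appear:

    #include <stdio.h>
    #define N 1000

    int main(void) {
        int a[N];
        int sum = 0;

        /* Sequential locality: the instructions of these loops are
           fetched one after another. */
        for (int i = 0; i < N; i++) {
            a[i] = i;            /* Temporal locality: i is reused on
                                    every iteration of the loop. */
        }

        /* Spatial locality: a[0], a[1], a[2], ... occupy adjacent
           addresses, so a cache block fetched for one element also
           holds its neighbors. */
        for (int i = 0; i < N; i++) {
            sum += a[i];         /* Temporal locality: sum is reused. */
        }

        printf("sum = %d\n", sum);
        return 0;
    }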


Characteristics of Memory Systems
Characteristics of Memory Systems (1)

The complex subject of computer memory is made more manageable if we classify memory systems according to their key characteristics.

The most important of these characteristics are: location, capacity, unit of transfer, access method, performance, physical type, physical characteristics, and organization.
Characteristics of Memory Systems (2)

(1) Location: internal (e.g. registers, cache, main memory); external (e.g. optical disks, magnetic disks, tapes)
(2) Capacity: number of words; number of bytes
(3) Unit of transfer: word; block
(4) Access method: sequential; direct; random; associative
(5) Performance: access time; cycle time; transfer rate
(6) Physical type: semiconductor; magnetic; optical; magneto-optical
(7) Physical characteristics: volatile/non-volatile; erasable/non-erasable
(8) Organization: memory modules
Types of Memory
Types of Memory (1)

• Although a large number of memory technologies exist, there are only two basic types of memory, namely:
1. Random Access Memory (RAM)
2. Read-Only Memory (ROM)
• The term “random” in random access memory means that any memory location can be accessed in the same amount of time, regardless of its position in the memory.
• RAM, also called “read-write memory” or “main memory”, is used to store the programs and data that the computer needs when executing programs.
• However, RAM is “volatile”: it loses this information once the power is turned off.
Types of Memory (2)

There are two types of RAM, namely:
1. Dynamic RAM (DRAM)
2. Static RAM (SRAM)

DRAM is constructed from capacitors that slowly leak their charge over time. Thus, they must be refreshed every few milliseconds to prevent data loss.

DRAM is a “cheap” memory owing to its simple design.


Types of Memory (3)

• RAM chips that are based upon flip-flops (e.g. the D flip-flop) are referred to as static RAM (SRAM), because the contents of each location persist as long as power is applied to the chips.

• SRAM is faster and much more expensive than DRAM; however, designers use DRAM because it is much denser (it can store many bits per chip), uses less power, and generates less heat than SRAM.


Types of Memory (4)

For these reasons, both technologies are often used in combination: DRAM for main memory and SRAM for cache.

The basic operation of all DRAM memories is the same, but there are many flavors, including:
1. Multibank DRAM (MDRAM)
2. Fast-Page Mode (FPM) DRAM
3. Extended Data Out (EDO) DRAM
4. Burst EDO DRAM (BEDO DRAM)
5. Synchronous Dynamic Random Access Memory (SDRAM)
6. Synchronous-Link (SL) DRAM
7. Double Data Rate (DDR) SDRAM
8. Rambus DRAM (RDRAM)
9. Direct Rambus (DR) DRAM


Types of Memory (5)

In addition to RAM, most computers contain a small amount of ROM that stores critical information necessary to operate the system, such as the program necessary to boot the computer.

ROM is “non-volatile” and always retains its data.

This type of memory is also used in embedded systems or any systems where the programming does not need to change (e.g. calculators, automobiles, laser printers).


Types of Memory (6)

• There are five basic types of ROM, namely:
1. ROM
2. PROM (Programmable Read-Only Memory)
3. EPROM (Erasable PROM)
4. EEPROM (Electrically Erasable PROM)
5. Flash memory
• PROM can be programmed by the user with the appropriate equipment.
• Whereas ROMs are “hardwired”, PROMs have fuses that can be blown to program the chip.
• Once programmed, the data and instructions in PROM cannot be changed.


Types of Memory (7)

EPROM is programmable with the added advantage of being reprogrammable.

To reprogram an EPROM, the entire chip must first be erased using a special tool that emits ultraviolet light.

EEPROM requires no special tool for erasure (erasure is performed by applying an electric field, and portions of the chip can be erased one byte at a time).


Types of Memory (8)

• Flash memory is essentially an EEPROM with the added benefit that data can be written or erased in blocks, removing the one-byte-at-a-time limitation (making flash memory faster than EEPROM).

• Flash memory has become a very popular type of storage and is often used in many different devices, including cell phones, digital cameras, music players, and solid-state disk drives.


Memory Hierarchy


Memory Hierarchy (1)

In the past few decades, CPU processing speed, as measured by the number of instructions executed per second, has doubled every 18 months, for the same price.

Computer memory has experienced a similar increase along a different dimension (quadrupling in size every 36 months) for the same price.

Memory speed, however, has only increased at a rate of less than 10% per year.


Memory Hierarchy (2)

Therefore, although processing speed increases at the same rate that memory size increases, the gap between the speed of the processor and the speed of memory also increases.

As the gap between processor and memory speeds grows, architectural solutions help bridge it.

A typical computer contains several types of memory, ranging from fast, expensive internal registers to slow, inexpensive removable disks.


Memory Hierarchy (3)

The interplay between these different types of memory is exploited so that a computer behaves as if it has a single, large, fast memory, when in fact it contains a range of memory types that operate in a highly coordinated fashion.

Generally speaking, faster memory is more expensive than slower memory.

To provide the best performance at the lowest cost, memory is organized in a hierarchical fashion, referred to as a memory hierarchy.


Memory Hierarchy (4)

• The memory hierarchy can be thought of as a pyramid, with small, fast, expensive memory (registers, cache) at the top and large, slow, inexpensive storage (disk, removable media) at the base.

[Figure: memory hierarchy pyramid]


Memory Hierarchy (5)

Memory is classified based on its “distance” from the processor, where distance is measured by the number of machine cycles required for access.

The closer memory is to the processor, the faster it should be. As memory gets farther from the processor, we can afford longer access times. Thus, slower technologies are used for these memories, and faster technologies are used for memories closer to the CPU.


Memory Hierarchy (6)

We are most interested in the memory hierarchy that involves registers, cache, main memory, and virtual memory.

Registers are storage locations available on the processor itself.

Virtual memory is typically implemented using a hard drive; it extends the address space from RAM to the hard drive.

Virtual memory provides more space; cache memory provides speed.


Memory Hierarchy (7)

To access a particular piece of data, the CPU first sends a request to its nearest memory, usually cache.

If the data is not in cache, then main memory is queried. If the data is not in main memory, then the request goes to disk.

Once the data is located, the data and a number of its nearby data elements are fetched into cache memory.


Memory Hierarchy (8)

• This leads us to some definitions.
– A hit is when data is found at a given memory level.
– A miss is when it is not found.
– The hit rate is the percentage of time data is found at a given memory level.
– The miss rate is the percentage of time it is not. Miss rate = 1 – hit rate.
– The hit time is the time required to access data at a given memory level.
– The miss penalty is the time required to process a miss, including the time that it takes to replace a block of memory plus the time it takes to deliver the data to the processor.


Memory Hierarchy (9)

An entire block of data is copied once the data is located, because the principle of locality tells us that once a byte is accessed, it is likely that a nearby data element will be needed soon.


Cache Memory
Cache Memory (1)

A cache memory is a small but fast memory that the processor uses for information it is likely to need again in the very near future.

Therefore, the purpose of cache memory is to speed up accesses by storing recently used data closer to the CPU, instead of storing it in main memory.

Although cache is much smaller than main memory, its access time is a fraction of that of main memory.


Cache Memory (2)

A cache memory is faster than main memory for a number of reasons:
1) Faster electronics can be used, which also results in a greater expense in terms of money, size, and power requirements (since the cache is small, this increase in cost is relatively small).
2) A cache memory has fewer locations than a main memory, and as a result it has a shallow decoding tree, which reduces access time.
3) The cache is placed both physically closer and logically closer to the CPU than the main memory; this placement avoids communication delays over a shared bus.


Cache Memory (3)

• The following are non-computer examples of caching:
1. Grocery shopping
– You seldom, if ever, go to the grocery store to buy one single item. You buy any items you require immediately in addition to items you will most likely use in the future. The grocery store is similar to main memory, and your home is the cache.
2. Student doing research
– Suppose you are writing a paper on quantum computing. Would you go to the library, check out one book, return home, get the necessary information from that book, go back to the library, check out another book, return home, and so on? No; you would go to the library and check out all the books you might need and bring them all home. The library is analogous to main memory, and your home is, again, similar to cache.
Cache Memory (4)

• Cache memory works on the same principles as the preceding examples (1 & 2), copying frequently used data into the cache rather than requiring an access to main memory to retrieve the data.

• However, cache differs from our real-life examples in one important way: the computer really has no way to know, a priori, what data is most likely to be accessed, so it uses the locality principle and transfers an entire block from main memory into cache whenever it has to make a main memory access.


Cache Memory (5)

The cache location for this new block depends on two things:
1. The cache mapping policy
2. The cache size

• Unlike main memory, which is accessed by address, cache is typically accessed by content; hence, it is often called content-addressable memory (CAM).
Cache Memory (6)

Definition:
– A content-addressable memory (CAM) is a type of computer memory that enables high-speed data retrieval by allowing searches based on content rather than a specific memory address.
– CAM is also known as associative memory.
• Under most cache mapping schemes, the cache entries must be checked or searched to see if the value being requested is stored in cache.
• To simplify this process of locating the desired data, various cache mapping algorithms are used.


Cache Mapping Schemes (1)

When the CPU wants to access data or instructions, it first generates a main memory address.

However, if the data has been copied to cache, the address of the data in cache is NOT THE SAME as the main memory address.
• For example: data located at main memory address 0x2E3 could be located in the first location in cache.
• This raises an important question: how, then, does the CPU locate data when it has been copied into cache?


Cache Mapping Schemes (2)

• The answer is that the CPU uses a specific mapping scheme that “converts” the main memory address into a cache location.

• This address conversion is done by giving special significance to the bits in the main memory address.

• First, the bits are divided into distinct groups called fields.

• Depending on the mapping scheme, we may have two or three fields.

• How we use these fields depends on the particular mapping scheme being used.
Cache Mapping Schemes (3)

• The mapping scheme determines where the data is placed when it is originally copied into cache and also provides a method for the CPU to find previously copied data when searching cache.

• Before we discuss the cache mapping schemes, note the following:
1. Some computers are byte addressable, while others are word addressable.
2. If a machine is byte addressable, we are concerned with the number of bytes; if it is word addressable, we are concerned with the number of words, regardless of the size of those words.
3. Key in determining how the mapping schemes work is knowing how many addresses are contained in main memory, in cache, and in each block.


Cache Mapping Schemes (4)

The cache mapping schemes include the following:
1. Direct mapping
2. Fully associative mapping
3. Set associative mapping


Direct Mapped Cache (1)

• Direct mapped cache assigns cache locations using a “modular” approach.

• Main memory and cache are both divided into blocks of the same size (the size of these blocks varies).

• Since the number of main memory blocks is greater than the number of cache blocks, main memory blocks must compete for cache locations.

• Direct mapping maps block X of main memory to cache block Y = X mod N, where N is the total number of blocks in cache.
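As a minimal sketch (illustrative only), the mapping is a one-line computation; the value N = 4 here anticipates the 4-block example on the next slide:

    #include <stdio.h>

    /* Direct mapping: main memory block X maps to cache block X mod N. */
    static unsigned direct_map(unsigned x, unsigned n) {
        return x % n;
    }

    int main(void) {
        for (unsigned x = 0; x < 8; x++)   /* 8 memory blocks, N = 4 */
            printf("memory block %u -> cache block %u\n", x, direct_map(x, 4));
        return 0;
    }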
Direct Mapped Cache (2)

For example: if cache contains 4 blocks (i.e. N = 4), then main memory block 0 maps to cache block 0, main memory block 1 maps to cache block 1, main memory block 2 maps to cache block 2, main memory block 3 maps to cache block 3, main memory block 4 maps to cache block 0, and so on, as illustrated next.


Direct Mapped Cache (3)

[Figure: main memory blocks mapping to the four cache blocks in round-robin (modulo-4) fashion]


Direct Mapped Cache (4)

• To perform direct mapping, the binary main memory address is partitioned into three fields: tag, block, and offset.

• The format of a main memory address using direct mapping is:

| Tag | Block | Offset |

• The size of each field depends on the physical characteristics of main memory and cache.


Direct Mapped Cache (5)

• The offset field uniquely identifies an address within a specific block (i.e. it points to the desired data in the block).
– The number of bytes (for a byte-addressable machine) or words (for a word-addressable machine) in each block dictates the number of bits in the offset field.
• The block field selects a unique block of cache.
– The number of blocks in cache dictates the number of bits in the block field.
• Many blocks of main memory map to a single block of cache. A tag field in the cache block distinguishes one cached memory block from another.
• Note: The total number of bits in all three fields must equal the number of bits in a main memory address.


Direct Mapped Cache (6)

• In summary: to determine how direct mapping works, you must know the following:
1. How many bits are in the main memory address (which is determined by how many addresses exist in main memory).
2. How many blocks are in cache (which determines the size of the block field).
3. How many addresses are in a block (which determines the size of the offset field).
• Once you know these values, you can use the direct mapping address format to locate the block in cache to which a main memory block maps. Once you find the cache block, you can determine whether the block currently in cache is the one you want by checking the tag. If the tags match (the tag from the memory address field and the tag in cache), use the offset field to find the desired data in the cache block.
Direct Mapped Cache (7)

• EXAMPLE 1: Consider a byte-addressable main memory consisting of 4 blocks, and a cache with 2 blocks, where each block is 4 bytes.
– This means Blocks 0 and 2 of main memory map to Block 0 of cache, and Blocks 1 and 3 of main memory map to Block 1 of cache. (This is easy for us to determine simply by using modular arithmetic, because 0 mod 2 = 0, 1 mod 2 = 1, 2 mod 2 = 0, and 3 mod 2 = 1.) However, the computer must use the fields in the memory address itself.
– Using the tag, block, and offset fields, we can see how main memory maps to cache.


Direct Mapped Cache (8)

• EXAMPLE 1 cont’d
– First, we need to determine the address format for mapping. Each block is 4 bytes, so the offset field must contain 2 bits; there are 2 blocks in cache, so the block field must contain 1 bit; this leaves 1 bit for the tag (as a main memory address has 4 bits, because there are a total of 2^4 = 16 bytes).


Direct Mapped Cache (9)

• EXAMPLE 1 cont’d
– Suppose we need to access main memory address 3₁₆ (0011 in binary). If we partition 0011 using the address format above, we get: tag = 0, block = 0, offset = 11.
– Thus, the main memory address 0011 maps to cache block 0 (because the block field evaluates to 0).
– The tag (0) is stored in the cache block along with the data.

The next slide illustrates another mapping.
Direct Mapped Cache (10)

[Figure: another example of a main memory address mapping to a cache block]


Direct Mapped Cache (11)

• EXAMPLE 2: Assume a byte-addressable memory consisting of 2^14 bytes, a cache with 16 blocks, and blocks of 8 bytes.
– The number of main memory blocks is 2^14 / 2^3 = 2^11 = 2048.
– Each main memory address requires 14 bits. Of this 14-bit address field, the rightmost 3 bits form the offset field (we need 3 bits to uniquely identify one of the 8 bytes in a block).
– We need 4 bits to select a specific block in cache, so the block field consists of the middle 4 bits.
– The remaining 7 bits make up the tag field.
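The field widths of EXAMPLE 2 can be applied mechanically with shifts and masks. The following minimal C sketch (the sample address is arbitrary, chosen only for illustration) extracts the three fields from a 14-bit address:

    #include <stdio.h>

    #define OFFSET_BITS 3   /* 8 bytes per block  -> 3 offset bits */
    #define BLOCK_BITS  4   /* 16 blocks in cache -> 4 block bits  */
                            /* 14 - 4 - 3 = 7 tag bits             */
    int main(void) {
        unsigned addr   = 0x1A5C & 0x3FFF;  /* keep 14 bits of a sample address */
        unsigned offset =  addr & ((1u << OFFSET_BITS) - 1);
        unsigned block  = (addr >> OFFSET_BITS) & ((1u << BLOCK_BITS) - 1);
        unsigned tag    =  addr >> (OFFSET_BITS + BLOCK_BITS);
        printf("tag = %u, block = %u, offset = %u\n", tag, block, offset);
        return 0;
    }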


Fully Associative Cache (1)

Suppose that instead of placing memory blocks in specific cache locations based on the memory address, we allow a block to go anywhere in cache.

In this way, cache would have to fill up before any blocks are evicted.

This is how fully associative cache works.

A memory address is partitioned into only two fields: the tag and the offset.


Fully Associative Cache (2)

Suppose, as before (Example 2), we have 14-bit memory addresses, a cache with 16 blocks, and blocks of 8 bytes. The field format of a memory reference is then:

| Tag (11 bits) | Offset (3 bits) |

When the cache is searched, all tags are searched in parallel to retrieve the data quickly. This requires special, costly hardware.
Fully Associative Cache (3)

• You will recall that direct mapped cache evicts a block whenever another memory reference needs that cache location.

• With fully associative cache, we have no such mapping, so we must devise an algorithm to determine which block to evict from the cache.

• The block that is evicted is called the victim block.

• There are a number of ways to pick a victim; we will discuss them shortly under Replacement Policy.


Fully Associative Cache (4)

In summary:
– Fully associative cache allows a main memory block to be mapped to any block in cache. Once a main memory block resides in cache, to locate a specific byte, the computer compares the tag field of the main memory address to all tags stored in cache (in one comparison). Once the specific cache block is located, the offset field is used to locate the required data within that block. If the tag of the memory address doesn’t match any tags in cache, the memory block containing the desired data must be transferred into cache. This transfer may require picking a victim block to transfer out of cache.


Set Associative Cache (1)

Because of its speed and complexity, fully associative cache is very expensive.

On the other hand, although direct mapping is inexpensive, it is very restrictive.

Set associative cache combines the ideas of direct mapped cache and fully associative cache.

An N-way set associative cache mapping is like direct mapped cache in that a memory reference maps to a particular location in cache.


Set Associative Cache (2)

Unlike direct mapped cache, however, a memory reference maps to a set of several cache blocks, similar to the way in which fully associative cache works.

Instead of mapping anywhere in the entire cache, a memory reference can map only to the subset of cache slots in its set.


Set Associative Cache (3)

• The number of cache blocks per set in set associative cache varies according to overall system design.
– For example, a 2-way set associative cache can be conceptualized as a collection of sets, each containing two different memory blocks.

[Figure: logical and linear views of a 2-way set associative cache]


Set Associative Cache (4)

In set associative cache mapping, a memory reference is divided into three fields: tag, set, and offset.

As with direct mapped cache, the offset field chooses the word within the cache block, and the tag field uniquely identifies the memory address.

The set field determines the “set” to which the memory block maps.
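A minimal C sketch of a 2-way set associative lookup (the cache geometry here is an assumption for illustration): the set index is computed modulo the number of sets, and only the tags within that set are compared:

    #include <stdio.h>
    #include <stdbool.h>

    #define NUM_SETS 8   /* assumed: 16 blocks organized as 8 two-way sets */
    #define WAYS     2

    struct line { bool valid; unsigned tag; };
    static struct line cache[NUM_SETS][WAYS];

    /* A memory block maps only to set (block mod NUM_SETS); the WAYS lines
       of that set are the only ones whose tags must be compared. */
    static bool lookup(unsigned mem_block) {
        unsigned set = mem_block % NUM_SETS;
        unsigned tag = mem_block / NUM_SETS;
        for (int w = 0; w < WAYS; w++)
            if (cache[set][w].valid && cache[set][w].tag == tag)
                return true;
        return false;
    }

    int main(void) {
        cache[5][0] = (struct line){ true, 3 };  /* pretend block 29 is cached */
        printf("block 29: %s\n", lookup(29) ? "hit" : "miss");
        printf("block 30: %s\n", lookup(30) ? "hit" : "miss");
        return 0;
    }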
Cache Replacement Policy (1)

With fully associative and set associative cache, a replacement policy is invoked when it becomes necessary to evict a block from cache.

An optimal replacement policy would be able to look into the future to see which blocks won’t be needed for the longest period of time.

Although it is impossible to implement an optimal replacement algorithm, it is instructive to use it as a benchmark for assessing the efficiency of any other scheme we come up with.


Cache Replacement Policy (2)

• The replacement policy that we choose depends upon the locality that we are trying to optimize; usually, we are interested in temporal locality.

• There are several algorithms (i.e. replacement policies), including:
1. Least Recently Used (LRU)
2. First-In First-Out (FIFO)
3. Random


Cache Replacement Policy (3)

A least recently used (LRU) algorithm keeps track of the last time that a block was accessed and evicts the block that has been unused for the longest period of time.

The disadvantage of this approach is its complexity: LRU has to maintain an access history for each block, which ultimately slows down the cache.
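A minimal timestamp-based LRU sketch (illustrative only): each block records the logical time of its last access, and the victim is the block with the oldest timestamp. The per-block bookkeeping in touch() is exactly the overhead the slide refers to:

    #include <stdio.h>

    #define NBLOCKS 4
    static unsigned long last_used[NBLOCKS];  /* access history per block */
    static unsigned long now;                 /* logical clock */

    static void touch(int block) { last_used[block] = ++now; }

    static int pick_victim(void) {
        int victim = 0;
        for (int b = 1; b < NBLOCKS; b++)          /* smallest timestamp =  */
            if (last_used[b] < last_used[victim])  /* least recently used   */
                victim = b;
        return victim;
    }

    int main(void) {
        touch(0); touch(1); touch(2); touch(3); touch(0);
        printf("evict block %d\n", pick_victim());  /* block 1 is the LRU */
        return 0;
    }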


Cache Replacement Policy (4)

• First-in, first-out (FIFO) is a popular cache replacement policy.

In FIFO, the block that has been in the cache the longest is evicted, regardless of when it was last used.

A random replacement policy does what its name implies: it picks a block at random and replaces it with a new block.

Random replacement can certainly evict a block that will be needed often or needed soon, but it never thrashes (i.e. it does not constantly throw out a block, then bring it back, then throw it out, then bring it back, repeatedly).


Effective Access Time (EAT) and Hit Ratio (1)

• The performance of hierarchical memory is measured by its effective access time (EAT).

• EAT is a weighted average that takes into account the hit ratio and the relative access times of successive levels of memory.

• The EAT for a two-level memory is given by:

EAT = H × Access_C + (1 − H) × Access_MM

where H is the cache hit rate, and Access_C and Access_MM are the access times for cache and main memory, respectively.


Effective Access Time (EAT) and Hit Ratio (2)

For example, consider a system with a main memory access time of 200ns supported by a cache having a 10ns access time and a hit rate of 99%.

Suppose access to cache and main memory occurs concurrently/in parallel (i.e. the accesses overlap).

The EAT is:
0.99(10ns) + 0.01(200ns) = 9.9ns + 2ns = 11.9ns.


Effective Access Time (EAT) and Hit Ratio (3)

• Again, consider a system with a main memory access time of 200ns supported by a cache having a 10ns access time and a hit rate of 99%.

If the accesses do not overlap (i.e. they are sequential), the EAT is:

0.99(10ns) + 0.01(10ns + 200ns) = 9.9ns + 2.1ns = 12ns.

• The equation for determining the effective access time can be extended to any number of memory levels.
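As a quick check of the two worked examples above, this minimal C sketch evaluates both forms of the EAT equation with the lecture's numbers:

    #include <stdio.h>

    int main(void) {
        double h = 0.99, t_cache = 10.0, t_mm = 200.0;   /* times in ns */

        /* Overlapped (parallel) access: a miss costs only the memory time. */
        double overlapped = h * t_cache + (1 - h) * t_mm;

        /* Sequential access: a miss pays for the cache probe as well. */
        double sequential = h * t_cache + (1 - h) * (t_cache + t_mm);

        printf("overlapped: %.1f ns\n", overlapped);   /* 11.9 ns */
        printf("sequential: %.1f ns\n", sequential);   /* 12.0 ns */
        return 0;
    }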


When Does Caching Break Down? (1)

• Caching depends upon programs exhibiting good locality.
– Some object-oriented programs have poor locality owing to their complex, dynamic structures.
– Arrays stored in column-major rather than row-major order can be problematic for certain cache organizations.

• With poor locality, caching can actually cause performance degradation rather than performance improvement.


Cache Writing Policies (1)

Cache replacement policies must take into account dirty blocks: those blocks that have been updated while they were in the cache.

Dirty blocks must be written back to memory. A write policy determines how this will be done.

There are two types of write policies: write through and write back.

Write through updates cache and main memory simultaneously on every write.


Cache Writing Policies (2)

• Write back (also called copyback) updates memory only when the block is selected for replacement.

• The disadvantage of write through is that memory must be updated with each cache write, which slows down the access time on updates. This slowdown is usually negligible, because the majority of accesses tend to be reads, not writes.

• The advantage of write back is that memory traffic is minimized; its disadvantage is that memory does not always agree with the value in cache, causing problems in systems with many concurrent users.


Instruction and Data Cache (1)

• The cache we have been discussing is called a unified or integrated cache, where both instructions and data are cached.

• Many modern systems employ separate caches for data and instructions.
– This is called a Harvard cache.

The separation of data from instructions provides better locality, at the cost of greater complexity.
– Simply making the cache larger provides about the same performance improvement without the complexity.


Instruction and Data Cache (2)

• Cache performance can also be improved by adding a small associative cache to hold blocks that have been evicted recently.
– This is called a victim cache.

• A trace cache is a variant of an instruction cache that holds decoded instructions for program branches, giving the illusion that noncontiguous instructions are really contiguous.


Levels of Cache (1)

• Most of today’s small systems employ multilevel cache hierarchies.

• The levels of cache form their own small memory hierarchy.

• Level 1 (L1) cache (8KB to 64KB) is situated on the processor itself.
– Access time is typically about 4ns.

• Level 2 (L2) cache (64KB to 2MB) may be on the motherboard, or on an expansion card.
– Access time is usually around 15–20ns.


Levels of Cache (2)

In systems that employ three levels of cache, the Level 2 cache is placed on the same die as the CPU (reducing access time to about 10ns).

Accordingly, the Level 3 (L3) cache (2MB to 256MB) refers to cache that is situated between the processor and main memory.

Once the number of cache levels is determined, the next thing to consider is whether data (or instructions) can exist in more than one cache level.


Levels of Cache (3)

• If the cache system uses an inclusive cache, the same data may be present at multiple levels of cache.

• Strictly inclusive caches guarantee that all data in a smaller cache also exists at the next higher level.

• Exclusive caches permit only one copy of the data.

• The tradeoffs in choosing one over the other involve weighing the variables of access time, memory size, and circuit complexity.
Virtual Memory
Virtual Memory (1)

• Cache memory enhances performance by providing faster memory access speed.

• Virtual memory enhances performance by providing greater memory capacity, without the expense of adding main memory.

• Instead, a portion of a disk drive serves as an extension of main memory.

• If a system uses paging, virtual memory partitions main memory into individually managed page frames that are written (or paged) to disk when they are not immediately needed.


Virtual Memory (2)

A physical address is the actual address in physical memory.

Programs create virtual addresses that are mapped to physical addresses by the memory manager.

• Page faults occur when a logical address requires that a page be brought in from disk.

• Memory fragmentation occurs when the paging process results in the creation of small, unusable clusters of memory addresses.
Virtual Memory (3)

• Main memory and virtual memory are divided into equal-sized pages.

• The entire address space required by a process need not be in memory at once. Some parts can be on disk, while others are in main memory.

• Further, the pages allocated to a process do not need to be stored contiguously, either on disk or in memory.

• In this way, only the needed pages are in memory at any time; the unnecessary pages remain in slower disk storage.


Virtual Memory (4)

• Information concerning the location of each page, whether on disk or in memory, is maintained in a data structure called a page table.

• There is one page table for each active process.

[Figure: layout of a page table]


Virtual Memory (5)

• When a process generates a virtual address, the operating system translates it into a physical memory address.

• To accomplish this, the virtual address is divided into two fields: a page field and an offset field.

• The page field determines the page location of the address, and the offset indicates the location of the address within the page.

• The logical page number is translated into a physical page frame through a lookup in the page table.


Virtual Memory (6)

• If the valid bit is zero in the page table entry for the logical address, the page is not in memory and must be fetched from disk.
– This is a page fault.
– If necessary, a page is evicted from memory and is replaced by the page retrieved from disk, and the valid bit is set to 1.

• If the valid bit is 1, the virtual page number is replaced by the physical frame number.

• The data is then accessed by adding the offset to the physical frame number.


Virtual Memory (7)

• As an example, suppose a system has a virtual address space of 8K, a physical address space of 4K, and byte addressing.
– We have 2^13 / 2^10 = 2^3 = 8 virtual pages.
• A virtual address has 13 bits (8K = 2^13), with 3 bits for the page field and 10 for the offset, because the page size is 1024.
• A physical memory address requires 12 bits: the first two bits for the page frame and the trailing 10 bits for the offset.
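A minimal C sketch of this translation (the page table contents below are invented for illustration; the lecture's actual table is in the figure on the next slide):

    #include <stdio.h>

    #define PAGE_BITS   3    /* 8 virtual pages */
    #define OFFSET_BITS 10   /* 1024-byte pages */

    struct pte { int valid; unsigned frame; };  /* one entry per virtual page */

    /* Hypothetical page table: pages 0, 2, and 4 are resident. */
    static struct pte page_table[1 << PAGE_BITS] = {
        {1, 1}, {0, 0}, {1, 0}, {0, 0}, {1, 3}, {0, 0}, {0, 0}, {0, 0}
    };

    int main(void) {
        unsigned va     = 5459;                   /* 1010101010011 in binary */
        unsigned page   = va >> OFFSET_BITS;      /* page 5                  */
        unsigned offset = va & ((1u << OFFSET_BITS) - 1);  /* offset 339     */

        if (page_table[page].valid)
            printf("PA = %u\n", (page_table[page].frame << OFFSET_BITS) | offset);
        else
            printf("page fault on page %u\n", page);  /* here: a page fault */
        return 0;
    }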


Virtual Memory (8)

• Suppose we have the page table shown below.

[Figure: example page table with valid bits and frame numbers]

• What happens when the CPU generates address 5459₁₀ = 1010101010011₂ = 1553₁₆?


Virtual Memory (9)

• We said earlier that effective access time (EAT) takes all levels of memory into consideration.

• Thus, virtual memory is also a factor in the calculation, and we also have to consider page table access time.

• Suppose a main memory access takes 200ns, the page fault rate is 1%, and it takes 10ms to load a page from disk. We have:

EAT = 0.99(200ns + 200ns) + 0.01(10ms) = 100,396ns ≈ 100.4µs.


Virtual Memory (10)

Even if we had no page faults, the EAT would be 400ns, because memory is always read twice: first to access the page table, and second to load the page from memory.

Because page tables are read constantly, it makes sense to keep them in a special cache called a translation look-aside buffer (TLB).

TLBs are a special associative cache that stores the mapping of virtual pages to physical pages.


Virtual Memory (11)

• Another approach to virtual memory is the use of segmentation.

• Instead of dividing memory into equal-sized pages, virtual address space is divided into variable-length segments, often under the control of the programmer.

• A segment is located through its entry in a segment table, which contains the segment’s memory location and a bounds limit that indicates its size.

• After a fault, the operating system searches for a location in memory large enough to hold the segment that is retrieved from disk.


Virtual Memory (12)

• Both paging and segmentation can cause fragmentation.

• Paging is subject to internal fragmentation, because a process may not need the entire range of addresses contained within the page. Thus, there may be many pages containing unused fragments of memory.

• Segmentation is subject to external fragmentation, which occurs when contiguous chunks of memory become broken up as segments are allocated and deallocated over time.
Virtual Memory (13)

• Large page tables are cumbersome and slow, but with its uniform memory mapping, paging makes page operations fast. Segmentation allows fast access to the segment table, but segment loading is labor-intensive.

• Paging and segmentation can be combined to take advantage of the best features of both by assigning fixed-size pages within variable-sized segments.

• Each segment has a page table. This means that a memory address will have three fields: one for the segment, another for the page, and a third for the offset.
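A minimal sketch of splitting such a three-field address (the field widths are assumptions chosen only for illustration):

    #include <stdio.h>

    #define SEG_BITS    4    /* assumed widths: 4-bit segment, */
    #define PAGE_BITS   6    /* 6-bit page, 10-bit offset      */
    #define OFFSET_BITS 10

    int main(void) {
        unsigned va     = 0xABCDE;   /* a sample 20-bit virtual address */
        unsigned offset =  va & ((1u << OFFSET_BITS) - 1);
        unsigned page   = (va >> OFFSET_BITS) & ((1u << PAGE_BITS) - 1);
        unsigned seg    =  va >> (OFFSET_BITS + PAGE_BITS);

        /* The segment field selects that segment's own page table; the page
           field indexes it; the offset locates the byte within the frame. */
        printf("segment = %u, page = %u, offset = %u\n", seg, page, offset);
        return 0;
    }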


A Real-World Example (1)

• The Pentium architecture supports both paging and segmentation, and they can be used in various combinations, including unsegmented, unpaged memory; unsegmented, paged memory; segmented, unpaged memory; and segmented, paged memory.

• The processor supports two levels of cache (L1 and L2), both having a block size of 32 bytes.

• The L1 cache is next to the processor, and the L2 cache sits between the processor and memory.

• The L1 cache is in two parts: an instruction cache (I-cache) and a data cache (D-cache).

The next slide shows this organization schematically.
A Real-World Example (2)

[Figure: Pentium cache organization, showing the L1 I-cache and D-cache and the L2 cache between processor and memory]


Reading Assignment

Read Chapter 6: Memory, from the book “The Essentials of Computer Organization and Architecture” by Linda Null and Julia Lobur.
