Caching

The document discusses cache memory organization, including byte- and word-addressable systems, various types of caches (hardware and software), and design issues such as cache hits and misses. It explains the mapping techniques for cache memory (direct-mapped, associative, and set-associative), along with their advantages and disadvantages, and covers the structure of cache memory, addressing methods, and cache line tag size.

Byte addressable system:

If the block offset = 100 (binary) = 4 (decimal), it identifies the 5th byte inside the block.
But the CPU typically requests a whole word (4 bytes), not a single byte.
So the cache controller delivers the entire word that contains that byte (bytes 4–7).

Word addressable system:

If the block offset = 100 (binary) = 4 (decimal), it identifies the 5th word inside the block (i.e., bytes 16–19 if words are 4 bytes).
The cache returns that word directly.
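
A minimal C sketch of this distinction, assuming 4-byte words and the offset value above; the function names are illustrative, not from the slides:

    #include <stdint.h>

    #define WORD_BYTES 4u

    /* Byte-addressable: the offset counts bytes, so the word containing
     * byte `offset` starts at the offset rounded down to a word boundary. */
    static uint32_t byte_offset_to_word_start(uint32_t offset) {
        return offset & ~(WORD_BYTES - 1);  /* offset 4 -> word at bytes 4..7 */
    }

    /* Word-addressable: the offset counts words, so the word's first byte
     * is simply offset times the word size. */
    static uint32_t word_offset_to_byte_start(uint32_t offset) {
        return offset * WORD_BYTES;         /* offset 4 -> bytes 16..19 */
    }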

Caches

Dr. Sonali Chouhan


Embedded Systems (EE312)

© Sonali Chouhan
Many types of caches
• Examples
– H/W: L1, L2 CPU caches, translation
lookaside buffers (TLBs), ...
– S/W: virtual memory, FS (Filesystem)
buffers, web browser caches, …

Caches and CPUs

[Figure: the CPU exchanges addresses and data with the cache through
the cache controller, which also connects the cache to main memory
via address and data lines.]
Caches
• Many common design issues
  – each cached item has a “tag” (an ID) plus contents
  – need a mechanism to efficiently determine whether a
    given item is cached
    • combinations of indices and constraints on valid
      locations
  – on a miss, usually need to pick something to replace
    with the new item
    • called a “replacement policy”
  – on writes, need to either propagate the change or mark
    the item as “dirty”
    • write-through (L1 + main) vs. write-back (L1)
• Different solutions for different caches (a minimal sketch of
  these fields follows)


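As a concrete illustration of these fields, here is a minimal C sketch of one cache line's bookkeeping; the struct name, field widths, and block size are illustrative assumptions, not taken from the slides:

    #include <stdbool.h>
    #include <stdint.h>

    /* One cache line: bookkeeping bits plus the cached contents. */
    struct cache_line {
        bool     valid;    /* line holds live data                     */
        bool     dirty;    /* write-back only: modified since the fill */
        uint32_t tag;      /* ID of the memory block stored here       */
        uint8_t  data[16]; /* block contents (16-byte block assumed)   */
    };

    /* Write policies:
     * - write-through: update the line AND main memory on every store,
     *   so the dirty bit is never needed;
     * - write-back: update only the line and set dirty; main memory is
     *   updated later, when the line is evicted. */
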
Terms
• Cache hit: the required location is in
  the cache.
• Cache miss: the required location is
  not in the cache.
• Working set: the set of memory
  locations/addresses used by a
  program in a time interval.
Inserting an L1 Cache Between the CPU and Main Memory

[Figure: three-level hierarchy.]
• The tiny, very fast CPU register file has room for four 4-byte words.
• The transfer unit between the CPU register file and the cache is a
  4-byte word.
• The small, fast L1 cache has room for two 4-word blocks (lines 0
  and 1).
• The transfer unit between the cache and main memory is a 4-word
  block (16 bytes).
• The big, slow main memory has room for many 4-word blocks (e.g.,
  block 10 “abcd”, block 21 “pqrs”, block 30 “wxyz”, ...).
General Organization of a Cache
Cache is an array 1 valid bit t tag bits B = 2b bytes
of sets per line per line per cache block
Each set contains
valid tag 0 1 • • • B–1
one or more lines E lines
set 0: •••
per set
Each line holds a valid tag 0 1 • • • B–1
block of data
valid tag 0 1 • • • B–1
set 1: •••
S = 2s sets
valid tag 0 1 • • • B–1

•••
valid tag 0 1 • • • B–1
set S-1: •••
valid tag 0 1 • • • B–1

Cache size: C = B x E x S data bytes


© Sonali Chouhan
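
A quick sanity check in C, using the toy L1 from the figure above (two sets, one line per set, 16-byte blocks); the parameter values come from that figure:

    #include <stdio.h>

    int main(void) {
        unsigned B = 16; /* bytes per block (4 words x 4 bytes) */
        unsigned E = 1;  /* lines per set                       */
        unsigned S = 2;  /* number of sets                      */

        /* C = B x E x S data bytes */
        printf("cache size C = %u bytes\n", B * E * S); /* 32 */
        return 0;
    }
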
Addressing Caches

Address A (bits m-1 ... 0):
  <tag: t bits> <set index: s bits> <block offset: b bits>

The word at address A is in the cache
if the tag bits in one of the valid
lines in set <set index> match <tag>.

The word contents begin at offset
<block offset> bytes from the
beginning of the block.

These fields are not separate entities: they are simply bit fields of
the address generated by the CPU.

Addressing Caches

Address A (bits m-1 ... 0):
  <tag: t bits> <set index: s bits> <block offset: b bits>

1. Locate the set based on
   <set index>
2. Locate the line in the set based on
   <tag>
3. Check that the line is valid
4. Locate the data in the line based on
   <block offset>
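
A minimal C sketch of this address split; the function and struct names are illustrative assumptions, and s and b are left as parameters rather than fixed slide values:

    #include <stdint.h>

    /* Split a 32-bit address into <tag> <set index> <block offset>,
     * for a cache with 2^s sets and 2^b bytes per block. */
    struct addr_fields { uint32_t tag, set, offset; };

    static struct addr_fields split_address(uint32_t a, unsigned s, unsigned b) {
        struct addr_fields f;
        f.offset = a & ((1u << b) - 1);        /* low b bits          */
        f.set    = (a >> b) & ((1u << s) - 1); /* next s bits         */
        f.tag    = a >> (s + b);               /* remaining high bits */
        return f;
    }
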
Example: Direct-Mapped Cache

Simplest kind of cache, easy to build
(only 1 tag compare required per access).
Characterized by exactly one line per set (E = 1).

[Figure: sets 0 .. S-1, each holding a single line: valid bit, tag,
cache block.]

Cache size: C = B x S data bytes
Accessing Direct-Mapped Caches

Set selection
– Use the set index bits to determine the set of
  interest.

[Figure: the s set-index bits of the address (e.g., 00001) select one
set; the tag and block offset fields are used in the following steps.]
Accessing Direct-Mapped Caches

Line matching and word selection
– Line matching: find a valid line in the selected
  set with a matching tag.
– Word selection: then extract the word.

[Figure: selected set (i) holds a line with valid = 1, tag = 0110, and
bytes b0–b3 at block offsets 4–7. The address carries tag = 0110,
set index = i, block offset = 100.]
(1) The valid bit must be set.
(2) The tag bits in the cache line must match the tag bits in the
    address.
If (1) and (2), then cache hit.
Accessing Direct-Mapped Caches

Line matching and word selection
– Line matching: find a valid line in the selected
  set with a matching tag.
– Word selection: then extract the word.

[Figure: the same line as above, bytes b0–b3 at block offsets 4–7.]
(3) If cache hit, the block offset selects the starting byte: offset
    100 (binary) = 4, so the word occupies bytes 4–7 of the block.
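
Putting steps (1)–(3) together, a minimal C sketch of a direct-mapped lookup; the sizes and names are illustrative assumptions consistent with the earlier sketches:

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    #define NSETS 64   /* S = 2^s sets (illustrative)  */
    #define BLOCK 16   /* B = 2^b bytes per block      */

    struct line { bool valid; uint32_t tag; uint8_t data[BLOCK]; };
    static struct line cache[NSETS]; /* direct-mapped: E = 1 */

    /* Return a pointer to the byte at address `a` on a hit, else NULL. */
    static uint8_t *lookup(uint32_t a) {
        uint32_t offset = a % BLOCK;
        uint32_t set    = (a / BLOCK) % NSETS;
        uint32_t tag    = a / BLOCK / NSETS;

        struct line *ln = &cache[set];
        if (ln->valid && ln->tag == tag)  /* (1) valid and (2) tag match  */
            return &ln->data[offset];     /* (3) offset selects the byte  */
        return NULL;                      /* miss */
    }
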
Cache Memory: Placement Policy
• There are three commonly used
  methods to translate main memory
  addresses to cache memory
  addresses:
  • Associative mapped cache
  • Direct-mapped cache
  • Set-associative mapped cache
• The choice of cache mapping scheme
  affects cost and performance, and
  there is no single best method that is
  appropriate for all situations.
Associative Mapped Cache
• A block in the main memory can be mapped
  to any available (not already occupied)
  block in the cache memory.
• Advantage: flexibility. A main memory block
  can be mapped anywhere in cache memory.
• Disadvantage: slow or expensive. A search
  through all the cache memory blocks is
  needed to check whether the address can be
  matched to any of the tags.
Main Memory - Cache Structure

[Figure: a 4-line cache (lines 00–11, each holding a tag and a block
of k words) alongside a main memory of 16 blocks (0000–1111), each
block k words long.]
Associative Mapping - Example

[Figure: with associative mapping, any of the 16 main memory blocks
(0000..–1111..) may be placed in any cache line; each line stores the
block (k words) plus a tag identifying which memory block it holds.]
Direct Mapped Cache
• To avoid the search through all CM
  blocks needed by associative
  mapping, this method allows only

    (# blocks in main memory) / (# blocks in cache memory)

  blocks to be mapped to each cache
  memory block.
Direct Cache Mapping - Example

[Figure: a 4-line cache (sets 00–11, each with a tag and a block) and
a 16-block main memory (0000..–1111..). 16/4 = 4 blocks of MM map to
each CM block; the set field of the block address selects the line.]
Direct-Mapped Cache Advantages
• The tag memory is much smaller
  than in an associative mapped cache.
• No need for an associative search,
  since the set field directs the
  comparison to a single line.
Direct-Mapped Cache Disadvantage
• It lacks mapping flexibility. For
  example, if two MM blocks mapped
  to the same CM block are needed
  repeatedly (e.g., in a loop), they will
  keep replacing each other, even
  though all other CM blocks may be
  available.
Direct Cache Mapping - Disadvantage (Example)

[Figure: array a(i) resides in MM block 0000.. and array b(i) in MM
block 0100..; both map to cache set 00. In a loop computing
a(i) + b(i), the two blocks keep replacing each other even though the
other cache lines may be free.]
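
A minimal C sketch of this pathology, assuming a direct-mapped cache and two arrays placed so that corresponding elements fall in blocks mapping to the same set; the placement condition in the comment is that assumption:

    /* If a and b are separated by a multiple of (NSETS * BLOCK) bytes,
     * a[i] and b[i] fall in memory blocks that map to the SAME cache
     * set. In a direct-mapped cache each iteration then evicts the
     * other array's block, so every access misses even though the rest
     * of the cache may be idle. A set-associative cache (next slides)
     * removes this conflict. */
    float sum_conflicting(const float *a, const float *b, int n) {
        float s = 0.0f;
        for (int i = 0; i < n; i++)
            s += a[i] + b[i]; /* a[i] and b[i] keep replacing each other */
        return s;
    }
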
Set-Associative Mapping
• This is a trade-off between associative and
  direct mappings.
• The cache is broken into sets, where each
  set contains N cache lines, say 4.
  Each memory address is assigned to a
  set and can be cached in any one of those
  4 locations within the set it is
  assigned to. In other words, within each
  set the cache is associative, and thus the
  name.
Set Associative Mapping - Example

[Figure: an 8-line cache organized as 4 sets (00–11) of 2 lines each;
each of the 16 main memory blocks (0000..–1111..) maps to one set but
may occupy either line within it.]
Example: Set Associative Cache

Characterized by more than one line
per set.

[Figure: sets 0 .. S-1, each holding E = 2 lines (valid bit, tag,
cache block).]

This is called an E-way associative cache.
Accessing Set Associative Caches

Set selection
– identical to a direct-mapped cache.

[Figure: the s set-index bits of the address (e.g., 0001) select one
set; every line in that set is a candidate.]
Accessing Set Associative Caches

Line matching and word selection
– must compare the tag in each valid line in
  the selected set.

[Figure: selected set (i) holds two valid lines, one with tag 1001 and
one with tag 0110; the latter holds bytes b0–b3 at block offsets 4–7.
The address carries tag = 0110, set index = i, block offset = 100.]
(1) The valid bit must be set.
(2) The tag bits in one of the cache lines must match the tag bits in
    the address.
If (1) and (2), then cache hit.
Accessing Set Associative Caches

Line matching and word selection
– Word selection is the same as in a direct-mapped
  cache.
(3) If cache hit, the block offset selects the starting byte.
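
Extending the earlier direct-mapped sketch to E lines per set, a minimal C version of this search; the sizes and names remain illustrative assumptions:

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    #define NSETS 64
    #define NWAYS 2    /* E = 2-way set associative */
    #define BLOCK 16

    struct line { bool valid; uint32_t tag; uint8_t data[BLOCK]; };
    static struct line cache[NSETS][NWAYS];

    /* Search every line in the selected set; hit if any valid line's
     * tag matches the address tag. */
    static uint8_t *lookup(uint32_t a) {
        uint32_t offset = a % BLOCK;
        uint32_t set    = (a / BLOCK) % NSETS;
        uint32_t tag    = a / BLOCK / NSETS;

        for (int way = 0; way < NWAYS; way++) {
            struct line *ln = &cache[set][way];
            if (ln->valid && ln->tag == tag)
                return &ln->data[offset];
        }
        return NULL; /* miss */
    }
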
Cache Line’s Tag Size
• Depends on 3 factors:
  • size of cache memory;
  • associativity of cache memory;
  • cacheable range of operating memory.

  Stag = log2(Smemory x A / Scache)
       = log2(Smemory) - log2(Scache) + log2(A)

Here,
Stag — size of cache tag, in bits;
Smemory — cacheable range of operating memory, in bytes;
Scache — size of cache memory, in bytes;
A — associativity of cache memory, in ways.
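
A quick check of this formula in C, with illustrative values (4 GB cacheable range, 32 KB cache, 2-way associative); these numbers are assumptions, not from the slides:

    #include <stdio.h>

    /* Integer log2 for power-of-two inputs. */
    static unsigned ilog2(unsigned long long x) {
        unsigned n = 0;
        while (x >>= 1) n++;
        return n;
    }

    int main(void) {
        unsigned long long Smemory = 1ULL << 32; /* 4 GB cacheable range */
        unsigned long long Scache  = 32 * 1024;  /* 32 KB cache          */
        unsigned long long A       = 2;          /* 2-way associative    */

        /* Stag = log2(Smemory) - log2(Scache) + log2(A) */
        unsigned Stag = ilog2(Smemory) - ilog2(Scache) + ilog2(A);
        printf("tag size = %u bits\n", Stag);    /* 32 - 15 + 1 = 18 */
        return 0;
    }
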
Memory system
performance

Dr. Sonali Chouhan


Embedded Systems (EE312)

© Sonali Chouhan
Cache operation
• Many main memory locations are
mapped onto one cache entry.
• May have caches for:
– instructions;
– data;
– data + instructions (unified).
• Memory access time is no longer
deterministic.

Cache Performance Metrics

• Miss Rate
– Fraction of memory references not found
in cache (misses / accesses)
• 1 – hit rate
– Typical numbers (in percentages):
• 3-10% for L1
• can be quite small (e.g., < 1%) for L2,
depending on size, etc.

Cache Performance Metrics
• Hit Time
  – Time to deliver a line in the cache to the
    processor
    • includes time to determine whether the
      line is in the cache
  – Typical numbers:
    • 1-2 clock cycles for L1
    • 5-20 clock cycles for L2
• Miss Penalty
  – Additional time required because of a miss
    • typically 50-200 cycles for main memory
      (Trend: increasing!)
Let’s think about those numbers
• Huge difference between a hit and a miss
  – 100X, if just L1 and main memory
• Would you believe 99% hits is twice as
  good as 97%?
  – Consider these numbers:
    cache hit time of 1 cycle, miss penalty of 100
    cycles
  So, average access time is:
    97% hits: 1 cycle + 0.03 * 100 cycles = 4 cycles
    99% hits: 1 cycle + 0.01 * 100 cycles = 2 cycles
  This is why “miss rate” is used instead of “hit rate”.
Memory system performance
• h = cache hit rate.
• t_cache = cache access time, t_main = main
  memory access time.
• Average memory access time (AMAT):
  – AMAT = hit rate_L1 x access time_L1 +
    miss rate_L1 x miss penalty_L1
  – t_av = h x t_cache + (1 - h) x t_main

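A minimal C check of the slide's weighted form, using t_cache = 1 cycle and t_main = 100 cycles from the previous slide; the results (3.97 and 1.99 cycles) closely match the 4- and 2-cycle figures above, the small gap coming from whether the cache access time is also charged on misses:

    #include <stdio.h>

    /* t_av = h * t_cache + (1 - h) * t_main  (the slide's weighted form) */
    static double amat(double h, double t_cache, double t_main) {
        return h * t_cache + (1.0 - h) * t_main;
    }

    int main(void) {
        printf("h = 0.97: t_av = %.2f cycles\n", amat(0.97, 1.0, 100.0));
        printf("h = 0.99: t_av = %.2f cycles\n", amat(0.99, 1.0, 100.0));
        return 0;
    }
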
Reference

Computer Organization and Design: The Hardware/Software Interface,
by David A. Patterson and John L. Hennessy