Byte-Addressable System:
• If block offset = 100 (binary) = 4 (decimal), it means "the 5th byte" inside the block.
• But the CPU typically requests a whole word (4 bytes), not a single byte.
• So the cache controller delivers the entire word that contains that byte (bytes 4–7).
Word-Addressable System:
• If block offset = 100 (binary) = 4 (decimal), it means "the 5th word" inside the block (i.e., bytes 16–19 if words are 4 bytes).
• The cache returns that word directly.
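The byte- vs. word-addressable offset arithmetic above can be sketched as follows (a minimal illustration; the 4-byte word size is the slides' assumption):

```python
WORD_BYTES = 4  # word size assumed in the slides

def word_containing(byte_offset):
    """Byte-addressable: return the byte offsets of the whole word
    that contains byte_offset (the unit the cache delivers)."""
    start = (byte_offset // WORD_BYTES) * WORD_BYTES
    return list(range(start, start + WORD_BYTES))

def word_at(word_offset):
    """Word-addressable: the offset selects a whole word directly."""
    return list(range(word_offset * WORD_BYTES, (word_offset + 1) * WORD_BYTES))

print(word_containing(0b100))  # byte offset 4 -> bytes [4, 5, 6, 7]
print(word_at(0b100))          # word offset 4 -> bytes [16, 17, 18, 19]
```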
Caches
Dr. Sonali Chouhan
Embedded Systems (EE312)
© Sonali Chouhan
Many types of caches
• Examples
– H/W: L1, L2 CPU caches, translation lookaside buffers (TLBs), ...
– S/W: virtual memory, FS (filesystem) buffers, web browser caches, ...
Caches and CPUs
[Figure: the CPU sends an address to the cache controller; on a hit the cache returns data to the CPU, on a miss the controller passes the address to main memory and the returned data fills the cache.]
Caches
• Many common design issues
– Each cached item has a "tag" (an ID) plus contents
– Need a mechanism to efficiently determine whether a given item is cached
• combinations of indices and constraints on valid locations
– On a miss, usually need to pick something to replace with the new item
• called a "replacement policy"
– On writes, need to either propagate the change or mark the item as "dirty"
• Write-through (L1 + main) vs. write-back (L1)
• Different solutions for different caches
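The write-through vs. write-back bullet can be illustrated by counting main-memory writes for the same write stream (a sketch with a made-up trace; block numbers are hypothetical):

```python
def write_through(writes):
    # Every CPU write is propagated to main memory immediately.
    return len(writes)

def write_back(writes):
    # A write only marks the cached block dirty; main memory is updated
    # once per dirty block, when that block is eventually evicted.
    return len(set(writes))

trace = [0x10, 0x10, 0x10, 0x20]   # three writes to block 0x10, one to 0x20
print(write_through(trace))        # 4 memory writes
print(write_back(trace))           # 2 memory writes (one per dirty block)
```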
Terms
• Cache hit: the required location is in the cache.
• Cache miss: the required location is not in the cache.
• Working set: the set of memory locations/addresses used by a program in a given time interval.
Inserting an L1 Cache Between the CPU and Main Memory
• The tiny, very fast CPU register file has room for four 4-byte words.
• The transfer unit between the CPU register file and the cache is a 4-byte word.
• The small, fast L1 cache has room for two 4-word blocks (lines 0 and 1).
• The transfer unit between the cache and main memory is a 4-word block (16 bytes).
• The big, slow main memory has room for many 4-word blocks (e.g., block 10 "abcd", block 21 "pqrs", block 30 "wxyz").
General Organization of a Cache
• The cache is an array of S = 2^s sets.
• Each set contains one or more lines: E lines per set.
• Each line holds a block of data, with 1 valid bit and t tag bits per line, and B = 2^b bytes per cache block.
[Figure: sets 0 ... S-1, each containing E lines of the form | valid | tag | byte 0 ... byte B-1 |]
Cache size: C = B x E x S data bytes
Addressing Caches
Address A (m bits): <tag: t bits> <set index: s bits> <block offset: b bits>
• The word at address A is in the cache if the tag bits in one of the valid lines in set <set index> match <tag>.
• The word's contents begin at offset <block offset> bytes from the beginning of the block.
These fields are not separate entities: the CPU simply generates addresses, and the cache controller interprets each address as a tag, set index, and block offset.
Addressing Caches
Address A (m bits): <tag: t bits> <set index: s bits> <block offset: b bits>
1. Locate the set based on <set index>.
2. Locate the line in the set based on <tag>.
3. Check that the line is valid.
4. Locate the data in the line based on <block offset>.
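Step 1 of this lookup starts by splitting the address into its three fields. A minimal sketch, with assumed widths b = 4 (16-byte blocks), s = 8 (256 sets), and a 32-bit address:

```python
B_BITS, S_BITS = 4, 8   # assumed field widths, not from the slides

def split_address(addr):
    """Split an address into (tag, set index, block offset)."""
    offset = addr & ((1 << B_BITS) - 1)
    set_index = (addr >> B_BITS) & ((1 << S_BITS) - 1)
    tag = addr >> (B_BITS + S_BITS)
    return tag, set_index, offset

tag, set_index, offset = split_address(0x12345678)
print(hex(tag), hex(set_index), hex(offset))  # 0x12345 0x67 0x8
```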
Example: Direct-Mapped Cache
• The simplest kind of cache, easy to build (only 1 tag compare required per access).
• Characterized by exactly one line per set (E = 1).
[Figure: sets 0 ... S-1, each with a single | valid | tag | cache block | line]
Cache size: C = B x S data bytes
Accessing Direct-Mapped Caches
Set selection
– Use the set index bits to determine the set of interest.
[Figure: the set index bits (e.g., 00001) of the address <tag | set index | block offset> select one of sets 0 ... S-1.]
Accessing Direct-Mapped Caches
Line matching and word selection
– Line matching: find a valid line in the selected set with a matching tag.
– Word selection: then extract the word.
(1) The valid bit must be set.
(2) The tag bits in the cache line must match the tag bits in the address.
If (1) and (2), then cache hit.
[Figure: selected set (i) holds | valid = 1 | tag = 0110 | bytes 0-7 |, with the word b0 b1 b2 b3 in bytes 4-7; the address is <tag = 0110 | set index = i | block offset = 100>.]
Accessing Direct-Mapped Caches
Line matching and word selection (continued)
(3) If cache hit, the block offset selects the starting byte.
[Figure: block offset 100 = 4 selects byte b0 of the word b0 b1 b2 b3 stored at bytes 4-7 of the selected line.]
Cache Memory: Placement Policy
• There are three commonly used methods to translate main memory addresses to cache memory addresses:
– Associative Mapped Cache
– Direct-Mapped Cache
– Set-Associative Mapped Cache
• The choice of cache mapping scheme affects cost and performance; there is no single best method that is appropriate for all situations.
Associative Mapped Cache
• A block in main memory can be mapped to any available (not already occupied) block in cache memory.
• Advantage: flexibility. A main memory block can be mapped anywhere in cache memory.
• Disadvantage: slow or expensive. A search through all the cache memory blocks is needed to check whether the address can be matched to any of the tags.
Main Memory - Cache Structure
[Figure: a 4-line cache (lines 00-11, each holding a tag and a block of k words) alongside a 16-block main memory (blocks 0000-1111, each a block of k words).]
Associative Mapping - Example
[Figure: any of the 16 main memory blocks (0000..-1111..) may be placed in any cache line; each line holds a tag and a k-word block.]
Direct-Mapped Cache
• To avoid the search through all CM blocks needed by associative mapping, this method allows each main memory block to map to only one cache block; as a result,

    (# blocks in main memory) / (# blocks in cache memory)

main memory blocks are mapped to each cache memory block.
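The placement rule above can be sketched with the example sizes used in the slides (16 main memory blocks, 4 cache blocks):

```python
MM_BLOCKS, CM_BLOCKS = 16, 4

def cache_block_for(mm_block):
    # Direct mapping: each MM block may go to exactly one CM block.
    return mm_block % CM_BLOCKS

# MM_BLOCKS // CM_BLOCKS = 4 memory blocks share each cache block.
sharers = [b for b in range(MM_BLOCKS) if cache_block_for(b) == 0]
print(sharers)  # [0, 4, 8, 12]
```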
Direct Cache Mapping - Example
[Figure: a 4-line cache (sets 00-11, each with a tag and a block) and a 16-block main memory (0000..-1111..); 16/4 = 4 blocks of MM are mapped to each CM block.]
Direct-Mapped Cache Advantages
• The tag memory is much smaller than in an associative mapped cache.
• No need for an associative search, since the set field directs the comparison to a single line.
Direct-Mapped Cache Disadvantage
• It lacks mapping flexibility. For example, if two MM blocks mapped to the same CM block are needed repeatedly (e.g., in a loop), they will keep replacing each other, even though all other CM blocks may be available.
Direct Cache Mapping - Disadvantage (Example)
[Figure: a(i) lives in MM block 0000.. and b(i) in MM block 0100..; both map to cache set 00, so in a loop computing a(i) + b(i) the two blocks keep replacing each other.]
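The thrashing in the figure can be reproduced with a tiny direct-mapped simulation (a sketch; block numbers follow the figure, with a(i) in block 0000 and b(i) in block 0100):

```python
CM_BLOCKS = 4

def run_direct_mapped(accesses):
    """Count misses for a direct-mapped cache of CM_BLOCKS lines."""
    lines = {}        # cache line -> memory block currently held
    misses = 0
    for block in accesses:
        line = block % CM_BLOCKS
        if lines.get(line) != block:
            misses += 1
            lines[line] = block   # evict whatever was there
    return misses

trace = [0b0000, 0b0100] * 4      # loop alternating a(i), b(i)
print(run_direct_mapped(trace))   # 8 misses out of 8 accesses
```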
Set-Associative Mapping
• This is a trade-off between associative and direct mappings.
• The cache is broken into sets, where each set contains "N" cache lines, say 4. Each memory address is assigned a set and can be cached in any one of those 4 locations within the set it is assigned to. In other words, within each set the cache is associative, hence the name.
Set Associative Mapping - Example
[Figure: an 8-line cache organized as 4 sets (00-11) of 2 lines each; each of the 16 main memory blocks (0000..-1111..) maps to one set but may occupy either line within it.]
Example: Set Associative Cache
• Characterized by more than one line per set (here, E = 2 lines per set).
[Figure: sets 0 ... S-1, each with two | valid | tag | cache block | lines]
• Called an E-way associative cache.
Accessing Set Associative Caches
Set selection
– Identical to a direct-mapped cache.
[Figure: the set index bits (e.g., 0001) of the address <tag | set index | block offset> select one set; each set contains two lines.]
Accessing Set Associative Caches
Line matching and word selection
– Must compare the tag in each valid line in the selected set.
(1) The valid bit must be set.
(2) The tag bits in one of the cache lines must match the tag bits in the address.
If (1) and (2), then cache hit.
[Figure: selected set (i) holds two valid lines with tags 1001 and 0110; address tag 0110 matches the second line, whose bytes 4-7 hold the word b0 b1 b2 b3.]
Accessing Set Associative Caches
Line matching and word selection (continued)
– Word selection is the same as in a direct-mapped cache.
(3) If cache hit, the block offset selects the starting byte.
[Figure: block offset 100 = 4 selects byte b0 of the word b0 b1 b2 b3 in the matching line.]
Cache Line's Tag Size
• Depends on 3 factors:
– Size of cache memory;
– Associativity of cache memory;
– Cacheable range of operating memory.
• Stag = log2(Smemory x A / Scache)
Here,
Stag: size of cache tag, in bits;
Smemory: cacheable range of operating memory, in bytes;
Scache: size of cache memory, in bytes;
A: associativity of cache memory, in ways.
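From the definitions above, the tag size works out to Stag = log2(Smemory x A / Scache). A quick check with assumed numbers (4 GiB cacheable memory, 64 KiB cache, 4-way):

```python
from math import log2

def tag_bits(s_memory, s_cache, ways):
    # Stag = log2(Smemory * A / Scache); exact when all sizes are powers of two.
    return int(log2(s_memory * ways / s_cache))

print(tag_bits(4 * 2**30, 64 * 2**10, 4))  # 32 + 2 - 16 = 18 tag bits
```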
Memory system performance
Cache operation
• Many main memory locations are
mapped onto one cache entry.
• May have caches for:
– instructions;
– data;
– data + instructions (unified).
• Memory access time is no longer
deterministic.
Cache Performance Metrics
• Miss Rate
– Fraction of memory references not found
in cache (misses / accesses)
• 1 – hit rate
– Typical numbers (in percentages):
• 3-10% for L1
• can be quite small (e.g., < 1%) for L2,
depending on size, etc.
Cache Performance Metrics
• Hit Time
– Time to deliver a line in the cache to the processor
• includes time to determine whether the line is in the cache
– Typical numbers:
• 1-2 clock cycles for L1
• 5-20 clock cycles for L2
• Miss Penalty
– Additional time required because of a miss
• typically 50-200 cycles for main memory (Trend: increasing!)
Let's think about those numbers
• Huge difference between a hit and a miss
– 100X, if just L1 and main memory
• Would you believe 99% hits is twice as good as 97%?
– Consider these numbers: cache hit time of 1 cycle, miss penalty of 100 cycles.
So, the average access time is:
97% hits: 1 cycle + 0.03 x 100 cycles = 4 cycles
99% hits: 1 cycle + 0.01 x 100 cycles = 2 cycles
This is why "miss rate" is used instead of "hit rate".
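The arithmetic above as a small helper (hit time and miss penalty default to the slide's 1 and 100 cycles):

```python
def avg_access_cycles(hit_rate, hit_time=1, miss_penalty=100):
    """Average access time = hit time + miss rate x miss penalty."""
    return hit_time + (1 - hit_rate) * miss_penalty

print(avg_access_cycles(0.97))  # ~4 cycles
print(avg_access_cycles(0.99))  # ~2 cycles
```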
Memory system performance
• h = cache hit rate.
• t_cache = cache access time, t_main = main memory access time.
• Average memory access time (AMAT):
– AMAT = hit rate_L1 x access time_L1 + miss rate_L1 x miss penalty_L1
– t_av = h x t_cache + (1 - h) x t_main
Reference
Computer Organization and Design: The Hardware/Software Interface,
by David A. Patterson and John L. Hennessy