Large and Fast: Exploiting
Memory Hierarchy
Memories: Review
DRAM (Dynamic Random Access Memory):
value is stored as a charge on a capacitor that must be periodically
refreshed, which is why it is called dynamic
very small – 1 transistor per bit – but a factor of 5 to 10 slower than
SRAM
used for main memory
SRAM (Static Random Access Memory):
value is stored on a pair of inverting gates and persists indefinitely
as long as there is power, which is why it is called static
very fast but takes up more space than DRAM – 4 to 6 transistors
per bit
used for cache

Speed     Size       Cost ($/bit)
Fastest   Smallest   Highest       (memory closest to the CPU)
Slowest   Biggest    Lowest        (memory farthest from the CPU)
Memory Hierarchy
Users want large and fast memories…
but large, fast memories are expensive and users don’t like to pay…
A memory hierarchy makes it seem like they have what they want
the hierarchy is inclusive: every level is a subset of the level below it
performance depends on hit rates
[Figure: levels in the memory hierarchy. The processor (CPU) sits at Level 1; Level 2 through Level n grow in size and in distance (access time) from the CPU. Data are transferred between adjacent levels in blocks (the unit of data copy). The size of the memory grows at each level.]
Locality
Locality is a principle that makes having a memory hierarchy a
good idea
If an item is referenced, then because of
temporal locality: it will tend to be referenced again soon
spatial locality: nearby items will tend to be referenced soon
Why does code have locality? Consider both instruction and data references.
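As a quick illustration (mine, not from the slides), a simple summation loop exhibits both kinds of locality:

```python
# Illustrative sketch: a summation loop shows both kinds of locality.
a = list(range(1024))
total = 0
for i in range(len(a)):
    # temporal locality: the loop's instructions and the variable
    # `total` are referenced over and over again
    # spatial locality: a[0], a[1], a[2], ... occupy consecutive
    # memory locations and are accessed in order
    total += a[i]
```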
Hit and Miss
Focus on any two adjacent levels – called upper (closer to CPU)
and lower (farther from CPU) – in the memory hierarchy,
because each block copy is always between two adjacent levels
Terminology:
block: minimum unit of data to move between levels
hit: data requested is in upper level
miss: data requested is not in upper level
hit rate: fraction of memory accesses that are hits (i.e., found at
upper level)
miss rate: fraction of memory accesses that are not hits
miss rate = 1 – hit rate
hit time: time to determine if the access is indeed a hit + time to
access and deliver the data from the upper level to the CPU
miss penalty: time to determine if the access is a miss + time to
replace block at upper level with corresponding block at lower level
+ time to deliver the block to the CPU
Caches
A simple example:
assume block size = one word of data
[Figure: cache contents before and after a reference to Xn. Before the reference, the cache holds X1, X2, X3, X4, …, Xn−2, Xn−1. The reference to Xn causes a miss, so Xn is fetched from memory and added to the cache.]
Issues:
how do we know if a data item is in the cache?
if it is, how do we find it?
if not, what do we do?
Solution depends on cache addressing scheme…
Direct Mapped Cache
Addressing scheme in a direct mapped cache:
cache block address = memory block address mod cache size (a unique location)
if cache size = 2^m blocks, cache index = lower m bits of the n-bit memory block address
the remaining upper n−m bits are kept as tag bits at each cache block
also need a valid bit to recognize a valid entry
[Figure: an 8-block direct mapped cache (indices 000–111) and memory block addresses 00001, 00101, 01001, 01101, 10001, 10101, 11001, 11101, which map to cache indices 001 and 101 (their lower 3 bits).]
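A minimal sketch of this address split (the helper name is illustrative, not from the slides):

```python
def split_block_address(block_addr, m):
    """Split a memory block address for a direct mapped cache with 2**m blocks."""
    index = block_addr & ((1 << m) - 1)   # lower m bits -> cache index
    tag = block_addr >> m                 # remaining upper bits -> tag
    return index, tag

# e.g., 5-bit block address 10110 in an 8-block cache (m = 3):
index, tag = split_block_address(0b10110, 3)   # index = 0b110, tag = 0b10
```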
Accessing Cache
Example (5-bit memory block addresses: 2-bit tag + 3-bit index):

(0) Initial state: all entries invalid (V = N).
(1) Address 10110 referred (miss): entry 110 is filled with V = Y, Tag = 10, Data = Mem(10110).
(2) Address 11010 referred (miss): entry 010 is filled with V = Y, Tag = 11, Data = Mem(11010).
(3) Address 10110 referred (hit): entry 110 is valid and its tag 10 matches, so Mem(10110) is delivered to the CPU.
(4) Address 10010 referred (miss): entry 010 is valid but its tag 11 does not match 10, so the block is replaced with Tag = 10, Data = Mem(10010).

Final state:
Index  V  Tag  Data
000    N
001    N
010    Y  10   Mem(10010)
011    N
100    N
101    N
110    Y  10   Mem(10110)
111    N
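The same walk-through can be replayed in a few lines of Python (a hedged sketch; names are illustrative):

```python
def access(cache, block_addr, m=3):
    """Return 'hit' or 'miss' for one reference to a direct mapped cache."""
    index = block_addr & ((1 << m) - 1)
    tag = block_addr >> m
    if cache.get(index) == tag:
        return "hit"
    cache[index] = tag                 # allocate or replace on a miss
    return "miss"

cache = {}
for ref in (0b10110, 0b11010, 0b10110, 0b10010):
    print(f"{ref:05b}: {access(cache, ref)}")
# prints miss, miss, hit, miss - matching steps (1) through (4) above
```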
Direct Mapped Cache
MIPS style:
[Figure: a 32-bit address (bit positions 31–0) splits into a 20-bit tag (bits 31–12), a 10-bit index (bits 11–2), and a 2-bit byte offset (bits 1–0). The index selects one of 1024 entries (valid bit, 20-bit tag, 32-bit data word); comparing the stored tag with the address tag produces the hit signal and the data output.]
Cache with 1024 1-word blocks: the byte offset (2 least significant bits) is ignored and the next 10 bits are used to index into the cache.
What kind of locality are we taking advantage of?
Cache Read Hit/Miss
Cache read hit: no action needed
Instruction cache read miss:
1. Send original PC value (current PC – 4, as PC has already been
incremented in first step of instruction cycle) to memory
2. Instruct main memory to perform read and wait for memory to
complete access – stall on read
3. After the read completes, write the cache entry
4. Restart instruction execution at first step to refetch instruction
Data cache read miss:
similar to instruction cache read miss
To reduce the data miss penalty, allow the processor to execute
instructions while waiting for the read to complete, stalling only
when the missed word is actually required – stall on use (why won’t
this work for instruction misses?)
DECStation 3100 Cache
(MIPS R2000 processor)
[Figure: a 32-bit address (bit positions 31–0) splits into a 16-bit tag (bits 31–16), a 14-bit index (bits 15–2), and a 2-bit byte offset (bits 1–0). The index selects one of 16K entries (valid bit, 16-bit tag, 32-bit data word); a tag comparison produces the hit signal and the data output.]
Cache with 16K 1-word blocks: the byte offset (2 least significant bits) is ignored and the next 14 bits are used to index into the cache.
Cache Write Hit/Miss
Write-through scheme
on write hit: write the data into both cache and memory on every write
hit to avoid inconsistency
on write miss: write the word into both cache and memory – obviously
no need to read the missed word from memory!
Write-through is slow because every write requires a memory write
performance is improved with a write buffer where words are stored
while waiting to be written to memory – the processor can continue
execution until the write buffer is full
when a word in the write buffer completes writing into main memory,
that buffer slot is freed and becomes available for future writes
DEC 3100 write buffer has 4 words
Write-back scheme
write the data only into the cache, and write the block back to
main memory only when it is replaced in the cache
more efficient than write-through, more complex to implement
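A minimal sketch contrasting the two policies (illustrative names; not the DEC 3100 implementation):

```python
def write_through(cache, memory, addr, value):
    cache[addr] = value
    memory[addr] = value              # memory updated on every write

def write_back(cache, dirty, memory, addr, value, evict=None):
    if evict is not None and dirty.pop(evict, False):
        memory[evict] = cache.pop(evict)   # write block back only on eviction
    cache[addr] = value
    dirty[addr] = True                # mark dirty; memory updated lazily
```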
Direct Mapped Cache: Taking
Advantage of Spatial Locality
Taking advantage of spatial locality with larger blocks:
[Figure: a 32-bit address splits into a 16-bit tag (bits 31–16), a 12-bit index (bits 15–4), a 2-bit block offset (bits 3–2), and a 2-bit byte offset (bits 1–0). The index selects one of 4K entries (valid bit, 16-bit tag, 128 bits of data); the tag comparison produces the hit signal, and the block offset drives a multiplexor that selects one of the four 32-bit words.]
Cache with 4K 4-word blocks: the byte offset (2 least significant bits) is ignored, the next 2 bits are the block offset, and the next 12 bits are used to index into the cache.
Direct Mapped Cache: Taking
Advantage of Spatial Locality
Cache replacement in large (multiword) blocks:
word read miss: read entire block from main memory
word write miss: cannot simply write word and tag! Why?!
writing in a write-through cache:
if write hit, i.e., the tag of the requested address and the cache
entry are equal, continue as for 1-word blocks by replacing the word
and writing the block to both cache and memory
if write miss, i.e., tags are unequal, fetch block from memory, replace
word that caused miss, and write block to both cache and memory
therefore, unlike case of 1-word blocks, a write miss with a multiword
block causes a memory read
Direct Mapped Cache: Taking
Advantage of Spatial Locality
Miss rate falls at first with increasing block size, as expected,
but as block size becomes a large fraction of total cache size,
miss rate may go up because
there are few blocks
competition for blocks increases
blocks get ejected before most of their words are accessed
(thrashing in the cache)
[Figure: miss rate (0–40%) vs. block size (4, 16, 64, 256 bytes) for cache sizes 1 KB, 8 KB, 16 KB, 64 KB, and 256 KB. Miss rate falls with block size at first, then rises again for the smaller caches.]
Example Problem
How many total bits are required for a direct-mapped cache
with 128 KB of data and 1-word block size, assuming a 32-bit
address?
Cache data = 128 KB = 2^17 bytes = 2^15 words = 2^15 blocks
Cache entry size = block data bits + tag bits + valid bit
= 32 + (32 − 15 − 2) + 1 = 48 bits
Therefore, cache size = 2^15 × 48 bits = 2^15 ×
(1.5 × 32) bits = 1.5 × 2^20 bits = 1.5 Mbits
data bits in cache = 128 KB × 8 = 1 Mbit
total cache size / actual cache data = 1.5
Example Problem
How many total bits are required for a direct-mapped cache with
128 KB of data and 4-word block size, assuming a 32-bit
address?
Cache data = 128 KB = 2^17 bytes = 2^15 words = 2^13 blocks
Cache entry size = block data bits + tag bits + valid bit
= 128 + (32 − 13 − 2 − 2) + 1 = 144 bits
Therefore, cache size = 2^13 × 144 bits =
2^13 × (1.125 × 128) bits = 1.125 × 2^20 bits = 1.125 Mbits
data bits in cache = 128 KB × 8 = 1 Mbit
total cache size / actual cache data = 1.125
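Both sizing results can be checked with a short computation (an assumed helper, 32-bit addresses):

```python
def cache_bits(data_bytes, words_per_block):
    """Total bits for a direct mapped cache: data + tag + valid per block."""
    blocks = data_bytes // (4 * words_per_block)
    index_bits = blocks.bit_length() - 1                  # log2(# blocks)
    offset_bits = 2 + (words_per_block.bit_length() - 1)  # byte + block offset
    tag_bits = 32 - index_bits - offset_bits
    return blocks * (32 * words_per_block + tag_bits + 1)

print(cache_bits(128 * 1024, 1) / 2**20)   # 1.5   Mbits (1-word blocks)
print(cache_bits(128 * 1024, 4) / 2**20)   # 1.125 Mbits (4-word blocks)
```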
Example Problem
Consider a cache with 64 blocks and a block size of 16 bytes. What
block number does byte address 1200 map to?
As block size = 16 bytes:
byte address 1200 → block address ⌊1200/16⌋ = 75
As cache size = 64 blocks:
block address 75 → cache block (75 mod 64) = 11
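The same mapping in two lines:

```python
block_addr = 1200 // 16         # 16-byte blocks -> block address 75
cache_block = block_addr % 64   # 64-block cache -> cache block 11
```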
Improving Cache Performance
Use split caches for instruction and data because there is more
spatial locality in instruction references
Make reading multiple words (higher bandwidth) possible by increasing
the physical or logical width of the system…

Program   Block size in words   Instruction miss rate   Data miss rate   Effective combined miss rate
gcc       1                     6.1%                    2.1%             5.4%
gcc       4                     2.0%                    1.7%             1.9%
spice     1                     1.2%                    1.3%             1.2%
spice     4                     0.3%                    0.6%             0.4%

Miss rates for gcc and spice on a MIPS R2000
with one- and four-word block sizes
Improving Cache Performance
by Increasing Bandwidth
Assume:
cache block of 4 words
1 clock cycle to send address to memory address buffer (1 bus trip)
15 clock cycles for each memory data access
1 clock cycle to send data to memory data buffer (1 bus trip)

[Figure: three memory organizations. a. One-word-wide memory organization: processor, cache, bus, and memory are all one word wide. b. Wide memory organization: 4-word-wide memory and bus, with a multiplexor between cache and CPU. c. Interleaved memory organization: four 1-word-wide memory banks (bank 0 – bank 3) that overlap their accesses but compete for the bus to return data.]

Miss penalties:
a. one-word-wide: 1 + 4×15 + 4×1 = 65 cycles
b. wide (4-word-wide memory and bus): 1 + 1×15 + 1×1 = 17 cycles
c. interleaved (4-word-wide memory only): 1 + 1×15 + 4×1 = 20 cycles
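The three miss penalties follow directly from the assumptions (a quick check):

```python
addr, access, transfer, words = 1, 15, 1, 4   # cycles per step, 4-word block
one_word_wide = addr + words * access + words * transfer   # 1 + 60 + 4 = 65
wide          = addr + 1 * access + 1 * transfer           # 1 + 15 + 1 = 17
interleaved   = addr + 1 * access + words * transfer       # 1 + 15 + 4 = 20
```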
Performance
Simplified model assuming equal read and write miss penalties:
CPU time = (execution cycles + memory stall cycles) × cycle time
memory stall cycles = memory accesses × miss rate × miss penalty
Therefore, two ways to improve performance in cache:
decrease miss rate
decrease miss penalty
what happens if we increase block size?
Example Problems
Assume for a given machine and program:
instruction cache miss rate 2%
data cache miss rate 4%
miss penalty always 40 cycles
CPI of 2 without memory stalls
frequency of load/stores 36% of instructions
1. How much faster is a machine with a perfect cache that never
misses?
2. What happens if we speed up the machine by reducing its CPI to 1
without changing the clock rate?
3. What happens if we speed up the machine by doubling its clock rate,
but the absolute time for a miss penalty remains the same?
Solution
1.
Assume instruction count = I
Instruction miss cycles = I × 2% × 40 = 0.8 I
Data miss cycles = I × 36% × 4% × 40 = 0.576 I
So, total memory-stall cycles = 0.8 I + 0.576 I = 1.376 I
in other words, 1.376 stall cycles per instruction
Therefore, CPI with memory stalls = 2 + 1.376 = 3.376
Assuming instruction count and clock rate remain the same for a
perfect cache and a cache that misses:
CPU time with stalls / CPU time with perfect cache
= 3.376 / 2 = 1.688
Performance with a perfect cache is better by a factor of 1.688
Solution (cont.)
2.
CPI without stall = 1
CPI with stall = 1 + 1.376 = 2.376
(clock has not changed, so stall cycles per instruction remain the same)
CPU time with stalls / CPU time with perfect cache
= CPI with stall / CPI without stall
= 2.376
Performance with a perfect cache is better by a factor of 2.376
Conclusion: with higher CPI cache misses “hurt more” than with
lower CPI
Solution (cont.)
3.
With doubled clock rate, miss penalty = 2 × 40 = 80 clock cycles
Stall cycles per instruction = (I × 2% × 80) + (I × 36% × 4% × 80)
= 2.752 I
So, faster machine with cache miss has CPI = 2 + 2.752 = 4.752
CPU time with stalls / CPU time with perfect cache
= CPI with stall / CPI without stall
= 4.752 / 2 = 2.376
Performance with a perfect cache is better by a factor of 2.376
Conclusion: with higher clock rate cache misses “hurt more” than
with lower clock rate
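All three answers can be reproduced with the stall-cycle formula above (a quick check):

```python
miss_i, miss_d, loads, base_cpi = 0.02, 0.04, 0.36, 2.0

stalls = miss_i * 40 + loads * miss_d * 40     # 1.376 stall cycles/instruction
print((base_cpi + stalls) / base_cpi)          # 1. perfect cache: 1.688x
print((1.0 + stalls) / 1.0)                    # 2. CPI = 1:       2.376x
stalls2 = miss_i * 80 + loads * miss_d * 80    # doubled clock: penalty 80
print((base_cpi + stalls2) / base_cpi)         # 3. 2x clock:      2.376x
```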
Decreasing Miss Rates with
Associative Block Placement
Direct mapped: one unique cache location for each memory block
cache block address = memory block address mod cache size
Fully associative: each memory block can be placed anywhere in cache
all cache entries are searched (in parallel) to locate block
Set associative: each memory block maps to a unique set of
cache locations – if the set is of size n it is n-way set-associative
cache set address = memory block address mod number of sets in
cache
all cache entries in the corresponding set are searched (in parallel) to
locate block
Increasing degree of associativity
reduces miss rate
increases hit time because of the parallel search and then fetch
Decreasing Miss Rates with
Associative Block Placement
[Figure: placing memory block 12 in an 8-block cache. Direct mapped: block 12 can go only in cache block 12 mod 8 = 4; one tag is searched. 2-way set associative (4 sets): block 12 can go in either block of set 12 mod 4 = 0; two tags are searched. Fully associative: block 12 can go in any of the 8 blocks; all 8 tags are searched.]
Location of a memory block with address 12 in a cache with 8 blocks
with different degrees of associativity
Decreasing Miss Rates with
Associative Block Placement
[Figure: configurations of an 8-block cache. One-way set associative (direct mapped): 8 sets (blocks 0–7) of one block each, each block with its own tag and data. Two-way set associative: 4 sets (0–3) of two blocks. Four-way set associative: 2 sets (0–1) of four blocks. Eight-way set associative (fully associative): a single set of eight blocks.]
Configurations of an 8-block cache with different degrees of associativity
Example Problems
Find the number of misses for a cache with four 1-word blocks given
the following sequence of memory block accesses:
0, 8, 0, 6, 8,
for each of the following cache configurations
1. direct mapped
2. 2-way set associative (use LRU replacement policy)
3. fully associative
Note about LRU replacement
in a 2-way set associative cache, LRU replacement can be
implemented with one bit per set whose value indicates the
most recently referenced block in the set
Solution
1 (direct-mapped)
Block address   Cache block
0               0 (= 0 mod 4)
6               2 (= 6 mod 4)
8               0 (= 8 mod 4)
Block address translation in a direct-mapped cache

Address of memory   Hit or   Contents of cache blocks after reference
block accessed      miss     Block 0      Block 1   Block 2      Block 3
0                   miss     Memory[0]
8                   miss     Memory[8]
0                   miss     Memory[0]
6                   miss     Memory[0]              Memory[6]
8                   miss     Memory[8]              Memory[6]
Cache contents after each reference (a new entry is added on each miss)
5 misses
Solution (cont.)
2 (two-way set-associative)
Block address   Cache set
0               0 (= 0 mod 2)
6               0 (= 6 mod 2)
8               0 (= 8 mod 2)
Block address translation in a two-way set-associative cache

Address of memory   Hit or   Contents of cache blocks after reference
block accessed      miss     Set 0        Set 0        Set 1   Set 1
0                   miss     Memory[0]
8                   miss     Memory[0]    Memory[8]
0                   hit      Memory[0]    Memory[8]
6                   miss     Memory[0]    Memory[6]
8                   miss     Memory[8]    Memory[6]
Cache contents after each reference (a new entry is added on each miss)
4 misses
Solution (cont.)
3 (fully associative)
Address of memory   Hit or   Contents of cache blocks after reference
block accessed      miss     Block 0      Block 1      Block 2      Block 3
0                   miss     Memory[0]
8                   miss     Memory[0]    Memory[8]
0                   hit      Memory[0]    Memory[8]
6                   miss     Memory[0]    Memory[8]    Memory[6]
8                   hit      Memory[0]    Memory[8]    Memory[6]
Cache contents after each reference (a new entry is added on each miss)
3 misses
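All three configurations can be simulated with one small function (a hedged sketch; LRU is kept as an ordered list per set):

```python
def count_misses(refs, num_blocks, ways):
    sets = num_blocks // ways
    cache = {s: [] for s in range(sets)}   # per set: tags in LRU order
    misses = 0
    for block in refs:
        s, tag = block % sets, block // sets
        if tag in cache[s]:
            cache[s].remove(tag)           # hit: refresh LRU position
        else:
            misses += 1
            if len(cache[s]) == ways:
                cache[s].pop(0)            # evict least recently used
        cache[s].append(tag)
    return misses

refs = [0, 8, 0, 6, 8]
print(count_misses(refs, 4, 1))   # direct mapped         -> 5
print(count_misses(refs, 4, 2))   # 2-way set associative -> 4
print(count_misses(refs, 4, 4))   # fully associative     -> 3
```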
Implementation of a Set-Associative Cache
[Figure: a 32-bit address splits into a 22-bit tag (bits 31–10), an 8-bit index (bits 9–2), and a 2-bit byte offset. The index selects one of 256 sets (0–255), each holding four entries (valid bit, 22-bit tag, 32-bit data). Four comparators check the four tags in parallel, and a 4-to-1 multiplexor selects the data of the matching way, producing the hit signal and data output.]
4-way set-associative cache with 4 comparators and one 4-to-1 multiplexor:
size of cache is 1K blocks = 256 sets × 4-block set size
Performance with Set-Associative Caches
[Figure: miss rate (0–15%) vs. associativity (one-way, two-way, four-way, eight-way) for eight cache sizes: 1 KB, 2 KB, 4 KB, 8 KB, 16 KB, 32 KB, 64 KB, 128 KB.]
Miss rates for each of eight cache sizes
with increasing associativity:
data generated from SPEC92 benchmarks
with 32-byte block size for all caches
Decreasing Miss Penalty with
Multilevel Caches
Add a second-level cache
primary cache is on the same chip as the processor
use SRAMs to add a second-level cache, sometimes off-chip, between
main memory and the first-level cache
if a miss occurs in the primary cache, the second-level cache is accessed
if the data is found in the second-level cache, the miss penalty is the
access time of the second-level cache, which is much less than main
memory access time
if the access misses again at the second level, then a main memory access
is required and a large miss penalty is incurred
Design considerations using two levels of caches:
try to optimize the hit time on the 1st-level cache to reduce the clock cycle
try to optimize the miss rate on the 2nd-level cache to reduce memory
access penalties
In other words, 2nd level allows 1st level to go for speed without
“worrying” about failure…
Example Problem
Assume a 500 MHz machine with
base CPI 1.0
main memory access time 200 ns.
miss rate 5%
How much faster will the machine be if we add a second-level
cache with 20ns access time that decreases the miss rate to 2%?
Solution
Miss penalty to main memory = 200 ns / (2 ns / clock cycle) = 100 clock cycles
Effective CPI with one level of cache
= Base CPI + Memory-stall cycles per instruction
= 1.0 + 5% × 100 = 6.0
With two levels of cache, miss penalty to second-level cache
= 20 ns / (2 ns / clock cycle) = 10 clock cycles
Effective CPI with two levels of cache
= Base CPI + Primary stalls per instruction
+ Secondary stalls per instruction
= 1 + 5% × 10 + 2% × 100 = 3.5
Therefore, machine with secondary cache is faster by a factor of
6.0 / 3.5 = 1.71
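A quick check of the arithmetic (500 MHz gives a 2 ns cycle):

```python
cycle_ns = 2
main_penalty = 200 // cycle_ns   # 100 cycles
l2_penalty = 20 // cycle_ns      # 10 cycles

cpi_l1_only = 1.0 + 0.05 * main_penalty                     # 6.0
cpi_l1_l2 = 1.0 + 0.05 * l2_penalty + 0.02 * main_penalty   # 3.5
print(cpi_l1_only / cpi_l1_l2)                              # 1.71
```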
Virtual Memory
Motivation: main memory acts as cache for secondary storage,
e.g., magnetic disk
Virtual address space, i.e., the space addressable by a program, is
determined by the ISA
e.g., 64-bit MIPS address space size is 2^64 – recall the jr instruction
typically: main memory size ≤ disk size ≤ virtual address space size
Program can “pretend” it has main memory the size of the disk
– which is smaller than the virtual memory (= whole virtual
address space), but bigger than the actual physical memory
(= DRAM main memory)
Page table (as we shall see) transparently converts a virtual memory
address to a physical memory address, if the data is already in main
memory; if not, it issues a call to the OS to fetch the data from disk
into main memory
Virtual memory is organized in fixed-size (power of 2, typically at
least 4 KB) blocks, called pages. Physical memory is also
considered a collection of pages of the same size.
the unit of data transfer between disk and physical memory is a page
Virtual Memory
[Figure: virtual addresses map through address translation to physical addresses in main memory, or to disk addresses in secondary storage; the unit of mapping is the page.]
Mapping of pages from a virtual address to a
physical address or disk address
Page Table Implements Virtual to Physical Address Translation
[Figure: the page table register points to the start of the page table. A 32-bit virtual address (bits 31–0) splits into a 20-bit virtual page number (bits 31–12) and a 12-bit page offset (bits 11–0). The virtual page number indexes the page table; each entry holds a valid bit and an 18-bit physical page number (if the valid bit is 0, the page is not present in memory). The physical page number concatenated with the page offset forms the 30-bit physical address.]
Page table: page size 4 KB, virtual address space 4 GB,
physical memory 1 GB
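A minimal sketch of the translation in the figure (4 KB pages, so a 12-bit offset; names are illustrative):

```python
OFFSET_BITS = 12   # 4 KB pages

class PageFault(Exception):
    pass

def translate(vaddr, page_table):
    vpn = vaddr >> OFFSET_BITS                   # virtual page number
    offset = vaddr & ((1 << OFFSET_BITS) - 1)    # page offset passes through
    entry = page_table.get(vpn)
    if entry is None or not entry["valid"]:
        raise PageFault(vpn)                     # OS fetches page from disk
    return (entry["ppn"] << OFFSET_BITS) | offset
```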
Page Faults
Page fault: page is not in memory, must retrieve it from disk
enormous miss penalty = millions of cycles
therefore, page size should be large (e.g., 32 or 64 KB)
to make one trip to disk worth a lot
reducing page faults is critical
LRU replacement policy – implemented approximately by setting a use
bit each time a page is accessed, and then periodically clearing all
these bits so that pages accessed in a fixed time period are known
fully associative page placement – consequence of page table
handle faults in software instead of hardware
as software overhead is still small compared to disk access time
using write-through is too expensive, so always use write-back
Resolving Page Faults using
the Page Table to Access Disk
There is a data structure, either part of or auxiliary to the
page table, which records where each virtual page is stored
on disk (cylinder, sector, block, etc.)
[Figure: the page table maps each virtual page number either to a physical page in physical memory (valid bit = 1) or to a disk address in disk storage (valid bit = 0).]
Page table maps virtual page to
either physical page or disk page
Making Address Translation
Fast with the
Translation-lookaside Buffer
A cache for address translations – translation-lookaside buffer (TLB):
[Figure: the TLB caches a subset of the page table’s translations; each TLB entry holds a valid bit, a tag (virtual page number), and a physical page address. The full page table maps every virtual page to a physical page in physical memory or to a disk address in disk storage.]
On a page reference, first look up the virtual page number in the TLB; if
there is a TLB miss, look up the page table; if that also misses, it is a
true page fault
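A minimal sketch of that lookup order (illustrative names, not a real OS interface):

```python
class PageFault(Exception):
    pass

def lookup(vpn, tlb, page_table):
    if vpn in tlb:                        # TLB hit: translation is cached
        return tlb[vpn]
    entry = page_table.get(vpn)           # TLB miss: consult the page table
    if entry is not None and entry["valid"]:
        tlb[vpn] = entry["ppn"]           # refill the TLB for next time
        return entry["ppn"]
    raise PageFault(vpn)                  # true page fault: go to disk
```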