Computer Architecture
ELE 475 / COS 475
Slide Deck 3: Cache Review
David Wentzlaff
Department of Electrical Engineering
Princeton University
Agenda
• Memory Technology
• Motivation for Caches
• Classifying Caches
• Cache Performance
Naive Register File
[Diagram: register file built from an address decoder and storage cells, with write data/address ports, read data/address ports, and a clock]
Memory Arrays: Register File
Memory Arrays: SRAM
Memory Arrays: DRAM
Relative Memory Sizes of SRAM vs. DRAM
[Figure: on-chip SRAM on a logic chip vs. DRAM on a dedicated memory chip, drawn to relative scale]
[From Foss, R.C. “Implementing Application-Specific Memory”, ISSCC 1996]
Memory Technology Trade-offs
A spectrum, from Latches/Registers through Register File and SRAM to DRAM:
• Latches/Registers: low capacity, low latency, high bandwidth (more and wider ports)
• DRAM: high capacity, high latency, low bandwidth
Agenda
• Memory Technology
• Motivation for Caches
• Classifying Caches
• Cache Performance
CPU-Memory Bottleneck
[Diagram: Processor connected to Main Memory]
• Performance of high-speed computers is usually limited by
memory bandwidth and latency
• Latency is the time for a single access
– Main memory latency is usually much greater than the processor cycle time
• Bandwidth is the number of accesses per unit time
– If a fraction m of instructions are loads/stores, each instruction makes
1 + m memory accesses on average, so sustaining CPI = 1 requires at
least 1 + m memory accesses per cycle
• Bandwidth-Delay Product is the amount of data that can be in
flight at the same time (Little’s Law)
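A quick worked instance (the bandwidth and latency figures here are assumed for illustration):

  bandwidth-delay product = bandwidth × latency
                          = 16 GB/s × 100 ns = 1600 bytes

So roughly 1600 bytes (about 25 64-byte cache lines) must be in flight at once to keep such a memory channel busy.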
Processor-DRAM Latency Gap
[Plot: processor vs. DRAM performance over time, diverging; Hennessy & Patterson 2011]
• Four-issue 2 GHz superscalar accessing 100 ns DRAM could execute
800 instructions during the time for one memory access! (arithmetic below)
• Long latencies mean large bandwidth-delay products which can be
difficult to saturate, meaning bandwidth is wasted
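Checking the arithmetic from the first bullet:

  100 ns × 2 GHz = 200 cycles per DRAM access
  200 cycles × 4 instructions/cycle = 800 instructions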
From Hennessy and Patterson Ed. 5 Image Copyright © 2011, Elsevier Inc. All rights Reserved.
Physical Size Affects Latency
[Diagram: a small memory close to the processor vs. a big memory spread over a larger area]
• Signals have further to travel
• Fan out to more locations
Memory Hierarchy
[Diagram: Processor ↔ Small Fast Memory (RF, SRAM) ↔ Big Slow Memory (DRAM)]
• Capacity: Register << SRAM << DRAM
• Latency: Register << SRAM << DRAM
• Bandwidth: on-chip >> off-chip
• On a data access:
– if data is in fast memory -> low-latency access to SRAM
– if data is not in fast memory -> long-latency access to DRAM
• Memory hierarchies only work if the small, fast memory
actually stores data that is reused by the processor
Common And Predictable Memory
Reference Patterns
[Figure: memory address vs. time; instruction fetches trace n loop iterations, stack accesses follow subroutine calls/returns and argument accesses, data accesses show scalar accesses]
Temporal Locality: if a location is referenced, it is likely to be referenced again in the near future.
Spatial Locality: if a location is referenced, it is likely that locations near it will be referenced in the near future.
Real Memory Reference Patterns
[Plot: memory address vs. time, one dot per access; regions show spatial locality, temporal locality, and combined temporal & spatial locality]
[From Donald J. Hatfield, Jeanette Gerald: Program Restructuring for Virtual Memory. IBM Systems Journal 10(3): 168-192 (1971)]
Caches Exploit Both Types of Locality
[Diagram: Processor ↔ Small Fast Memory (RF, SRAM) ↔ Big Slow Memory (DRAM)]
• Exploit temporal locality by remembering the
contents of recently accessed locations
• Exploit spatial locality by fetching blocks of
data around recently accessed locations
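As a concrete illustration (a generic C fragment, not from the original deck), both kinds of locality show up in a simple reduction loop:

/* Hypothetical example: both kinds of locality in one loop. */
double sum_array(const double *a, int n) {
    double sum = 0.0;      /* touched every iteration: temporal locality */
    for (int i = 0; i < n; i++) {
        sum += a[i];       /* consecutive addresses: spatial locality,
                              so one block fetch serves several elements */
    }
    return sum;
}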
Agenda
• Memory Technology
• Motivation for Caches
• Classifying Caches
• Cache Performance
Inside a Cache
[Diagram: Processor ↔ CACHE ↔ Main Memory, linked by address and data lines; the cache holds copies of main-memory locations (e.g., the bytes at addresses 100, 101, 304), and each cache line pairs an address tag with a multi-byte data block]
Basic Cache Algorithm for a Load
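A minimal C sketch of the load flow (the cache_line_t layout, the geometry, the write-back handling, and the two helper functions are illustrative assumptions):

typedef struct {
    int      valid;         /* line holds useful data                  */
    int      dirty;         /* modified since fill (write-back policy) */
    unsigned tag;           /* high-order address bits                 */
    unsigned char data[64]; /* one 64-byte block (size assumed)        */
} cache_line_t;

/* Hypothetical helpers standing in for the memory interface. */
void writeback_block(cache_line_t *line, unsigned index);
void fetch_block(cache_line_t *line, unsigned addr);

unsigned char load_byte(cache_line_t cache[256], unsigned addr) {
    unsigned offset = addr % 64;          /* byte within the block     */
    unsigned index  = (addr / 64) % 256;  /* which line (256 assumed)  */
    unsigned tag    = addr / (64 * 256);  /* remaining high bits       */
    cache_line_t *line = &cache[index];

    if (!line->valid || line->tag != tag) {    /* miss                 */
        if (line->valid && line->dirty)
            writeback_block(line, index);      /* evict dirty victim   */
        fetch_block(line, addr);               /* refill from memory   */
        line->valid = 1;
        line->dirty = 0;
        line->tag   = tag;
    }
    return line->data[offset];                 /* hit: return the byte */
}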
Classifying Caches
[Diagram: Processor ↔ CACHE ↔ Main Memory]
• Block Placement: Where can a block be
placed in the cache?
• Block Identification: How is a block found if it
is in the cache?
• Block Replacement: Which block should be
replaced on a miss?
• Write Strategy: What happens on a write?
Block Placement:
Where Can a Block Be Placed in the Cache?
[Figure: 32 memory blocks (0-31) mapping into an 8-block cache organized three ways]
Block 12 can be placed:
• Fully Associative: anywhere
• (2-way) Set Associative: anywhere in set 0 (12 mod 4)
• Direct Mapped: only into block 4 (12 mod 8)
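The mod arithmetic from the figure, as a tiny C check:

#include <stdio.h>

/* Where may memory block 12 go in an 8-block cache? (figure's example) */
int main(void) {
    unsigned block = 12;
    printf("direct mapped:   block %u\n", block % 8); /* 12 mod 8 = 4   */
    printf("2-way set assoc: set %u\n",   block % 4); /* 12 mod 4 = 0   */
    /* fully associative: any of the 8 blocks */
    return 0;
}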
Block Identification: How to find a block
in the cache?
• Cache uses the index and offset to find the
potential match, then checks the tag
• The tag check only includes the higher-order bits
• In this example (direct-mapped, 8 B blocks, 4 lines):
3 offset bits, 2 index bits, and the remaining
high-order address bits form the tag
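The address split for this geometry, as a small C sketch (the example address is assumed):

#include <stdio.h>

/* Direct-mapped, 8 B blocks, 4 lines: 3 offset bits, 2 index bits. */
int main(void) {
    unsigned addr   = 0x6B;               /* example address (assumed) */
    unsigned offset =  addr       & 0x7;  /* low 3 bits: byte in block */
    unsigned index  = (addr >> 3) & 0x3;  /* next 2 bits: cache line   */
    unsigned tag    =  addr >> 5;         /* remaining high-order bits */
    printf("tag=%u index=%u offset=%u\n", tag, index, offset);
    return 0;
}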
Block Identification: How to find a block
in the cache?
• Cache checks all potential blocks in the set with a
parallel tag check
• In this example (2-way set associative, 8 B blocks,
4 lines): 2 sets, so 3 offset bits, 1 index bit, and
both ways of the indexed set are checked at once
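The same check sketched in C (hardware compares all ways in parallel; the loop merely models that):

#define WAYS 2

typedef struct { int valid; unsigned tag; } tag_entry_t;

/* Return the matching way within a set, or -1 on a miss. */
int find_way(const tag_entry_t set[WAYS], unsigned tag) {
    for (int way = 0; way < WAYS; way++)       /* parallel in hardware */
        if (set[way].valid && set[way].tag == tag)
            return way;                        /* hit in this way      */
    return -1;                                 /* miss                 */
}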
Block Replacement: Which block to
replace?
• No choice in a direct mapped cache
• In an associative cache, which block from set should be
evicted when the set becomes full?
• Random
• Least Recently Used (LRU)
– LRU state must be updated on every access
– A true implementation is only feasible for low associativity (e.g., 2-way)
– A pseudo-LRU binary tree is often used for 4- to 8-way caches (see the sketch below)
• First In, First Out (FIFO) aka Round-Robin
– Used in highly associative caches
• Not Most Recently Used (NMRU)
– FIFO with exception for most recently used block(s)
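A minimal sketch of tree pseudo-LRU for one 4-way set (the bit convention and struct layout are assumptions for illustration):

/* Tree pseudo-LRU, 4 ways, 3 bits per set. Each bit points toward the
   less recently used half of its subtree: 0 = left, 1 = right.        */
typedef struct { unsigned char b0, b1, b2; } plru_t;

/* Record an access to way w (0..3): flip bits to point away from w.   */
void plru_touch(plru_t *p, int w) {
    if (w < 2) { p->b0 = 1; p->b1 = (w == 0); }  /* LRU is right half  */
    else       { p->b0 = 0; p->b2 = (w == 2); }  /* LRU is left half   */
}

/* Pick a victim by following the bits down from the root.             */
int plru_victim(const plru_t *p) {
    if (p->b0 == 0) return p->b1 == 0 ? 0 : 1;
    else            return p->b2 == 0 ? 2 : 3;
}

Three bits per set approximate what exact LRU over four ways would need about five bits (24 possible orderings) to track.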
Write Strategy: How are writes
handled?
• Cache Hit
– Write Through – write both cache and memory,
generally higher traffic but simpler to design
– Write Back – write cache only, memory is written
when evicted, dirty bit per block avoids unnecessary
write backs, more complicated
• Cache Miss
– No Write Allocate – only write to main memory
– Write Allocate – fetch block into cache, then write
• Common Combinations
– Write Through & No Write Allocate
– Write Back & Write Allocate
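The two common combinations, contrasted in a short C sketch (all helper functions here are hypothetical stand-ins for the cache and memory interfaces):

/* Hypothetical helpers for the cache/memory interfaces. */
int  cache_hit(unsigned addr);
void cache_update(unsigned addr, unsigned char v);
void cache_fill(unsigned addr);
void cache_mark_dirty(unsigned addr);
void memory_write(unsigned addr, unsigned char v);

/* Write through + no write allocate: memory is always updated;
   a write miss bypasses the cache entirely.                     */
void store_wt(unsigned addr, unsigned char v) {
    if (cache_hit(addr))
        cache_update(addr, v);   /* keep the cached copy consistent */
    memory_write(addr, v);       /* every store reaches memory      */
}

/* Write back + write allocate: a miss first fetches the block; the
   store then dirties the cached copy, written back only on eviction. */
void store_wb(unsigned addr, unsigned char v) {
    if (!cache_hit(addr))
        cache_fill(addr);        /* write allocate: fetch the block */
    cache_update(addr, v);
    cache_mark_dirty(addr);      /* deferred write back             */
}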
Agenda
• Memory Technology
• Motivation for Caches
• Classifying Caches
• Cache Performance
Average Memory Access Time
[Diagram: Processor → CACHE; hits are served by the cache, misses go to Main Memory]
• Average Memory Access Time = Hit Time + ( Miss Rate * Miss Penalty )
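A worked instance with assumed numbers (1-cycle hit time, 5% miss rate, 100-cycle miss penalty):

  AMAT = 1 + 0.05 × 100 = 6 cycles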
Categorizing Misses: The Three C’s
• Compulsory – first reference to a block; these misses
occur even with an infinite cache
• Capacity – the cache is too small to hold all the data the
program needs; these misses occur even under a perfect
replacement policy (e.g., looping over 5 cache lines in a
4-line cache)
• Conflict – misses caused by collisions due to less-than-full
associativity (e.g., looping over 3 cache lines that map to
the same set; see the sketch after this list)
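The two parenthetical loop examples, made concrete in C (the 4-line, 64-byte-line geometry is an assumption carried over from the earlier examples):

#define LINE 64                     /* assumed line size */
unsigned char buf[5 * LINE];

/* Capacity: a 5-line working set in a 4-line cache misses on every
   access in the steady state, even with perfect (LRU) replacement. */
void capacity_demo(void) {
    for (int pass = 0; pass < 10; pass++)
        for (int i = 0; i < 5; i++)
            buf[i * LINE] += 1;
}

/* Conflict: three lines that map to the same set of a direct-mapped
   cache evict one another although the cache could hold all three.  */
void conflict_demo(unsigned char *a, unsigned char *b, unsigned char *c) {
    for (int pass = 0; pass < 10; pass++) {
        a[0] += 1; b[0] += 1; c[0] += 1;
    }
}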
Reduce Hit Time: Small & Simple Caches
[Plot: cache access time vs. size and associativity]
Plot from Hennessy and Patterson Ed. 4
Image Copyright © 2007-2012 Elsevier Inc. All rights Reserved.
Reduce Miss Rate: Large Block Size
Benefits:
• Less tag overhead
• Exploit fast burst transfers from DRAM
• Exploit fast burst transfers over wide on-chip busses
Costs:
• Can waste bandwidth if data is not used
• Fewer blocks -> more conflicts
Plot from Hennessy and Patterson Ed. 5 Image Copyright © 2011, Elsevier Inc. All rights Reserved.
Reduce Miss Rate: Large Cache Size
Empirical Rule of Thumb:
If the cache size is doubled, the miss rate usually drops by a factor of about √2
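For instance, starting from an assumed 2% miss rate, doubling the cache would be expected to bring it near 2% / √2 ≈ 1.4%.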
Plot from Hennessy and Patterson Ed. 5 Image Copyright © 2011, Elsevier Inc. All rights Reserved.
Reduce Miss Rate: High Associativity
Empirical Rule of Thumb:
A direct-mapped cache of size N has about the same miss rate
as a two-way set-associative cache of size N/2
Plot from Hennessy and Patterson Ed. 5 Image Copyright © 2011, Elsevier Inc. All rights Reserved.
Agenda
• Memory Technology
• Motivation for Caches
• Classifying Caches
• Cache Performance
Acknowledgements
• These slides contain material developed and copyrighted by:
– Arvind (MIT)
– Krste Asanovic (MIT/UCB)
– Joel Emer (Intel/MIT)
– James Hoe (CMU)
– John Kubiatowicz (UCB)
– David Patterson (UCB)
– Christopher Batten (Cornell)
• MIT material derived from course 6.823
• UCB material derived from course CS252 & CS152
• Cornell material derived from course ECE 4750
Copyright © 2013 David Wentzlaff