Cache coherence in
shared-memory architectures
Adapted from a lecture by Ian Watson, University of Manchester
                     Overview
 We have talked about optimizing performance on
  single cores
    Locality
    Vectorization
 Now let us look at optimizing programs for a
  shared-memory multiprocessor.
 Two architectures:
    Bus-based shared-memory machines (small-scale)
    Directory-based shared-memory machines (large-scale)
        Bus-based Shared Memory
              Organization
Basic picture is simple :-
[Diagram: several CPUs, each with its own cache, connected by a shared bus to a shared memory]
             Organization
 Bus is usually a simple physical connection
  (wires)
 Bus bandwidth limits the number of CPUs
 There could be multiple memory elements
 For now, assume that each CPU has only a
  single level of cache
 Problem of Memory Coherence
 Assume just single level caches and main
  memory
 Processor writes to location in its cache
 Other caches may hold shared copies - these
  will be out of date
 Updating main memory alone is not enough
                       Example
[Diagram: CPUs 1, 2 and 3, each with a cache, on a shared bus with shared memory. CPU 1's cache holds X = 32 (updated from 24), CPU 2's cache holds X = 24, shared memory holds X = 24]
 Processor 1 reads X: obtains 24 from memory and caches it
 Processor 2 reads X: obtains 24 from memory and caches it
 Processor 1 writes 32 to X: its locally cached copy is updated
 Processor 3 reads X: what value should it get?
    Memory and processor 2 think it is 24
    Processor 1 thinks it is 32
 Notice that having write-through caches is not good enough: memory would then hold 32, but processor 2's cached copy would still be the stale 24
               Bus Snooping
 Each CPU (cache system) snoops (i.e. watches
  continually) for write activity concerned with data
  addresses which it has cached.
 This assumes a bus structure which is global, i.e.
  all communication can be seen by all.
 A more scalable solution: directory-based
  coherence schemes
         Snooping Protocols
 Write Invalidate
   A CPU wanting to write to an address grabs a bus
    cycle and sends a write-invalidate message
   All snooping caches invalidate their copy of the
    appropriate cache line
   The CPU writes to its cached copy (assume for now
    that it also writes through to memory)
   Any read of that data in other CPUs will now miss
    in the cache and re-fetch the new data (see the sketch below)
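To make the write-invalidate steps above concrete, here is a minimal C sketch (not from the lecture): a handful of one-line write-through caches share a bus, and a write first invalidates every other copy. All structure and function names are invented for illustration.

/* Toy write-invalidate sketch: N_CPUS write-through caches share a bus.
   A write invalidates every other cached copy, then updates the local
   copy and main memory. */
#define N_CPUS 4

typedef struct { unsigned long addr; int valid; int value; } toy_line_t;

static toy_line_t cache[N_CPUS];   /* one-line "cache" per CPU */
static int        memory_value;    /* one shared memory word   */

void write_invalidate(int cpu, unsigned long addr, int value)
{
    /* Bus broadcast: every other snooping cache drops its copy. */
    for (int i = 0; i < N_CPUS; i++)
        if (i != cpu && cache[i].valid && cache[i].addr == addr)
            cache[i].valid = 0;

    /* Local write, written through to main memory. */
    cache[cpu].addr  = addr;
    cache[cpu].value = value;
    cache[cpu].valid = 1;
    memory_value     = value;
}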
         Snooping Protocols
 Write Update
   A CPU wanting to write grabs a bus cycle and
    broadcasts the new data as it updates its own copy
    (sketched below)
   All snooping caches update their copy
 Note that in both schemes, the problem of
  simultaneous writes is taken care of by bus
  arbitration - only one CPU can use the bus
  at any one time.
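For contrast, the same toy setup under write-update: the writer broadcasts the new value and every snooping cache holding a copy takes it. Again, this is purely an illustrative sketch with invented names.

/* Toy write-update sketch: instead of invalidating, the writer broadcasts
   the new data and every cache holding the line updates its copy. */
#define N_CPUS 4

typedef struct { unsigned long addr; int valid; int value; } toy_line_t;

static toy_line_t cache[N_CPUS];
static int        memory_value;

void write_update(int cpu, unsigned long addr, int value)
{
    /* Bus broadcast: snooping caches with a copy take the new value. */
    for (int i = 0; i < N_CPUS; i++)
        if (i != cpu && cache[i].valid && cache[i].addr == addr)
            cache[i].value = value;

    cache[cpu].addr  = addr;
    cache[cpu].value = value;
    cache[cpu].valid = 1;
    memory_value     = value;     /* written through to memory as well */
}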
        Update or Invalidate?
 Update looks the simplest, most obvious
  and fastest, but:-
    Multiple writes to the same word (with no intervening
     read) need only one invalidate message but
     would require an update for each write
    Writes to different words of the same (usually
     multi-word) cache block require only one
     invalidate but would require multiple updates.
        Update or Invalidate?
 Due to both spatial and temporal locality,
  previous cases occur often.
 Bus bandwidth is a precious commodity in
  shared memory multi-processors
 Experience has shown that invalidate
  protocols use significantly less bandwidth.
 We will consider implementation details only
  of the invalidate approach.
        Implementation Issues
 In both schemes, knowing whether a cached value is
  shared (i.e. whether a copy exists in another cache)
  lets us avoid sending any messages when it is not.
 The invalidate description assumed that a cache value
  update was written through to memory. If we used
  a copy-back (write-back) scheme, other processors could
  re-fetch the old value from memory on a cache miss.
 We need a protocol to handle all this.
            MESI Protocol (1)
 A practical multiprocessor invalidate protocol
  which attempts to minimize bus usage.
 Allows usage of a write back scheme - i.e. main
  memory not updated until dirty cache line is
  displaced
 An extension of the usual cache tags, i.e. the invalid
  and dirty tags of a normal write-back cache.
           MESI Protocol (2)
Any cache line can be in one of 4 states (2 bits)
 Modified - cache line has been modified, is
  different from main memory - is the only cached
  copy. (multiprocessor dirty)
 Exclusive - cache line is the same as main
  memory and is the only cached copy
 Shared - Same as main memory but copies may
  exist in other caches.
 Invalid - Line data is not valid (as in simple
  cache)
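The four states above might be encoded in C as follows. This is a sketch only: the field names and the 64-byte line size are assumptions, not part of the lecture.

/* One possible encoding of the four MESI states (2 bits) and the
   per-line tag information a snooping cache would keep. */
typedef enum {
    INVALID,     /* line data is not valid                        */
    SHARED,      /* same as memory; copies may exist elsewhere    */
    EXCLUSIVE,   /* same as memory; only cached copy              */
    MODIFIED     /* differs from memory; only cached copy (dirty) */
} mesi_state_t;

typedef struct {
    unsigned long tag;          /* address tag, as in an ordinary cache */
    mesi_state_t  state;        /* replaces the usual valid/dirty bits  */
    unsigned char data[64];     /* line payload (64-byte line assumed)  */
} mesi_line_t;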
           MESI Protocol (3)
 Cache line changes state as a function of
  memory access events.
 Event may be either
   Due to local processor activity (i.e. cache
    access)
   Due to bus activity - as a result of snooping
 A cache line has its own state affected only if
  the event's address matches
            MESI Protocol (4)
 Operation can be described informally by
  looking at the actions of the local processor
     Read Hit
     Read Miss
     Write Hit
     Write Miss
 More formally by state transition diagram
        MESI Local Read Hit
 Line must be in one of M, E or S
 This must be the correct local value (if M, it
  must have been modified locally)
 Simply return value
 No state change
       MESI Local Read Miss (1)
 No other copy in caches
    Processor makes bus request to memory
    Value read to local cache, marked E
 One cache has E copy
      Processor makes bus request to memory
      Snooping cache puts copy value on the bus
      Memory access is abandoned
      Local processor caches value
      Both lines set to S
    MESI Local Read Miss (2)
 Several caches have S copy
   Processor makes bus request to memory
   One cache puts copy value on the bus
    (arbitrated)
   Memory access is abandoned
   Local processor caches value
   Local copy set to S
   Other copies remain S
      MESI Local Read Miss (3)
 One cache has M copy
     Processor makes bus request to memory
     Snooping cache puts copy value on the bus
     Memory access is abandoned
     Local processor caches value
     Local copy tagged S
     Source (M) value copied back to memory
     Source value M -> S
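The local read cases above might be sketched in C as follows. bus_read() and addr_tag() are invented placeholders for the bus transaction and tag extraction; the return value of bus_read() stands in for the "shared" signal raised by snooping caches.

/* Sketch of MESI local read handling: hit (no change) and the miss
   cases, where a snooping cache may supply the data. */
typedef enum { INVALID, SHARED, EXCLUSIVE, MODIFIED } mesi_state_t;
typedef struct { unsigned long tag; mesi_state_t state; unsigned char data[64]; } mesi_line_t;

extern unsigned long addr_tag(unsigned long addr);
extern int bus_read(unsigned long addr, unsigned char *line_buf);

void local_read(mesi_line_t *l, unsigned long addr)
{
    if (l->state != INVALID && l->tag == addr_tag(addr))
        return;                     /* read hit: M, E or S - no state change */

    /* Read miss: put a memory read on the bus.  A snooping cache with an
       E, S or M copy supplies the data instead of memory; an M owner also
       copies its line back to memory, and all copies end up in S. */
    int shared = bus_read(addr, l->data);

    l->tag   = addr_tag(addr);
    l->state = shared ? SHARED : EXCLUSIVE;
}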
      MESI Local Write Hit (1)
Line must be in one of M, E or S
 M
   line is exclusive and already dirty
   Update local cache value
   no state change
 E
   Update local cache value
   State E -> M
       MESI Local Write Hit (2)
 S
     Processor broadcasts an invalidate on bus
     Snooping processors with S copy change S->I
     Local cache value is updated
     Local state change S->M
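A corresponding sketch of the write-hit cases, with bus_invalidate() and write_word() as invented placeholders:

/* Sketch of MESI local write hit: S broadcasts an invalidate first,
   then all three cases end up in M. */
typedef enum { INVALID, SHARED, EXCLUSIVE, MODIFIED } mesi_state_t;
typedef struct { unsigned long tag; mesi_state_t state; unsigned char data[64]; } mesi_line_t;

extern void bus_invalidate(unsigned long addr);
extern void write_word(mesi_line_t *l, unsigned long addr, int value);

void local_write_hit(mesi_line_t *l, unsigned long addr, int value)
{
    if (l->state == SHARED)
        bus_invalidate(addr);       /* other caches change their S copies to I   */

    write_word(l, addr, value);     /* update the local copy (no write-through)  */
    l->state = MODIFIED;            /* S -> M, E -> M; M stays M                 */
}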
    MESI Local Write Miss (1)
Detailed action depends on copies in other
 processors
 No other copies
   Value read from memory to local cache (?)
   Value updated
   Local copy state set to M
    MESI Local Write Miss (2)
 Other copies exist, either one in state E or
  one or more in state S
   Value read from memory to local cache - bus
    transaction marked RWITM (read with intent to
    modify)
   Snooping processors see this and set their copy
    state to I
   Local copy updated & state set to M
      MESI Local Write Miss (3)
Another copy in state M
 Processor issues bus transaction marked
  RWITM
 Snooping processor sees this
     Blocks RWITM request
     Takes control of bus
     Writes back its copy to memory
     Sets its copy state to I
    MESI Local Write Miss (4)
Another copy in state M (continued)
 Original local processor re-issues RWITM
  request
 Is now simple no-copy case
   Value read from memory to local cache
   Local copy value updated
   Local copy state set to M
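A sketch of the write-miss handling built on the RWITM transaction. bus_rwitm() is an invented placeholder that reports whether an M-state owner blocked the request (wrote back and invalidated its copy), in which case the requester simply re-issues it.

/* Sketch of MESI local write miss.  RWITM fetches the line and
   invalidates any E/S copies; an M owner blocks the first attempt,
   writes back, and we retry as the simple no-copy case. */
typedef enum { INVALID, SHARED, EXCLUSIVE, MODIFIED } mesi_state_t;
typedef struct { unsigned long tag; mesi_state_t state; unsigned char data[64]; } mesi_line_t;

extern unsigned long addr_tag(unsigned long addr);
extern int  bus_rwitm(unsigned long addr, unsigned char *line_buf);
extern void write_word(mesi_line_t *l, unsigned long addr, int value);

void local_write_miss(mesi_line_t *l, unsigned long addr, int value)
{
    while (!bus_rwitm(addr, l->data))
        ;                              /* retry after the owner's write back */

    l->tag = addr_tag(addr);
    write_word(l, addr, value);        /* update the newly fetched line      */
    l->state = MODIFIED;
}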
        Putting it all together
 All of this information can be described
  compactly using a state transition diagram
 Diagram shows what happens to a cache
  line in a processor as a result of
   memory accesses made by that processor (read
    hit/miss, write hit/miss)
    memory accesses made by other processors that
     result in bus transactions observed by this
     snoopy cache (Mem Read, RWITM, Invalidate)
MESI - locally initiated accesses
[State transition diagram, locally initiated accesses: Read Miss(sh) takes Invalid to Shared and Read Miss(ex) takes Invalid to Exclusive (both via a Mem Read bus transaction); Write Miss takes Invalid to Modified (via RWITM); Write Hit takes Shared to Modified (via Invalidate) and Exclusive to Modified; Read Hits, and Write Hits on Modified, leave the state unchanged. Mem Read, RWITM and Invalidate are bus transactions.]
MESI - remotely initiated accesses
[State transition diagram, remotely initiated accesses: a snooped Mem Read takes Exclusive to Shared and Modified to Shared (Modified copies the line back to memory first); a snooped RWITM takes Modified to Invalid (again with copy back) and Exclusive to Invalid; a snooped Invalidate (or RWITM) takes Shared to Invalid; Invalid lines are unaffected.]
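The remotely initiated transitions in the diagram can be sketched as a single snoop function. This assumes the snooped address has already been matched against the cached line, and copy_back_to_memory() is an invented placeholder.

/* Sketch of the snoop-side MESI transitions for a matching line. */
typedef enum { INVALID, SHARED, EXCLUSIVE, MODIFIED } mesi_state_t;
typedef enum { BUS_MEM_READ, BUS_RWITM, BUS_INVALIDATE } bus_event_t;

extern void copy_back_to_memory(void);

mesi_state_t snoop(mesi_state_t state, bus_event_t event)
{
    if (state == INVALID)
        return INVALID;                    /* nothing cached: ignore          */

    switch (event) {
    case BUS_MEM_READ:                     /* another CPU's read miss         */
        if (state == MODIFIED)
            copy_back_to_memory();         /* supply data and update memory   */
        return SHARED;                     /* M -> S, E -> S, S stays S       */

    case BUS_RWITM:                        /* another CPU's write miss        */
        if (state == MODIFIED)
            copy_back_to_memory();         /* block the request, write back   */
        return INVALID;                    /* then drop our copy              */

    case BUS_INVALIDATE:                   /* another CPU's write hit on S    */
        return INVALID;
    }
    return state;
}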
                MESI notes
 There are minor variations (particularly to
  do with write miss)
 The normal write back when a cache line is
  evicted is done only if the line state is M
 Multi-level caches
    If caches are inclusive, only the lowest level
     cache (the one closest to the bus) needs to snoop on the bus
              Directory Schemes
 Snoopy schemes do not scale because they rely on
  broadcast
 Directory-based schemes allow scaling.
   avoid broadcasts by keeping track of all PEs caching a
    memory block, and then using point-to-point messages to
    maintain coherence
   they allow the flexibility to use any scalable point-to-point
    network
Basic Scheme (Censier & Feautrier)
[Diagram: processors with caches connected by an interconnection network to memory; alongside memory sits the directory, holding presence bits and a dirty bit per block]
 Assume "k" processors.
 With each cache block in memory: k presence bits and 1 dirty bit
 With each cache block in cache: 1 valid bit and 1 dirty (owner) bit
   Read from main memory by PE-i:
      If dirty-bit is OFF then { read from main memory; turn p[i] ON; }
      If dirty-bit is ON then { recall line from dirty PE (cache state to
       shared); update memory; turn dirty-bit OFF; turn p[i] ON; supply
       recalled data to PE-i; }
   Write to main memory by PE-i:
      If dirty-bit is OFF then { send invalidations to all PEs caching that block;
       turn dirty-bit ON; turn p[i] ON; ... }
      ...
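A C sketch of the directory actions just described, for a single memory block and an assumed k = 16 processors; invalidate_pe() and recall_from_pe() are invented stand-ins for the point-to-point coherence messages.

/* Sketch of a Censier & Feautrier style directory entry and the
   read/write actions above (the dirty-write case is elided, as in
   the slides). */
#define K 16                          /* number of processors (assumed)     */

typedef struct {
    unsigned char presence[K];        /* p[i] = 1 if PE-i caches the block  */
    unsigned char dirty;              /* 1 if exactly one PE holds it dirty */
} dir_entry_t;

extern void invalidate_pe(int pe);    /* point-to-point invalidate          */
extern void recall_from_pe(int pe);   /* fetch dirty copy, demote to shared */

/* Read request for this block from PE-i. */
void dir_read(dir_entry_t *d, int i)
{
    if (d->dirty) {
        /* Recall the line from the owning PE, update memory, clear dirty. */
        for (int j = 0; j < K; j++)
            if (d->presence[j]) recall_from_pe(j);
        d->dirty = 0;
    }
    d->presence[i] = 1;               /* PE-i now caches the (clean) block  */
}

/* Write request for this block from PE-i (clean case only, as above). */
void dir_write(dir_entry_t *d, int i)
{
    if (!d->dirty) {
        /* Invalidate every other cached copy, then hand ownership to PE-i. */
        for (int j = 0; j < K; j++)
            if (d->presence[j] && j != i) { invalidate_pe(j); d->presence[j] = 0; }
        d->dirty = 1;
        d->presence[i] = 1;
    }
    /* Dirty case elided here too ("..."). */
}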
                         Key Issues
 Scaling of memory and directory bandwidth
    Cannot have main memory or directory memory centralized
   Need a distributed memory and directory structure
 Directory memory requirements do not scale well
   Number of presence bits grows with number of PEs
   Many ways to get around this problem
      limited pointer schemes of many flavors (one flavor is sketched below)
 Industry standard
   SCI: Scalable Coherent Interface
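One flavor of the limited-pointer idea, sketched with invented names: rather than k presence bits per block, the directory keeps a small fixed number of PE pointers and falls back to imprecise tracking (e.g. broadcast invalidation) once it overflows.

/* Sketch of a limited-pointer directory entry: at most MAX_PTRS sharers
   are tracked exactly; beyond that an overflow flag is set and the
   directory must behave conservatively (one of several possible
   overflow policies). */
#define MAX_PTRS 4

typedef struct {
    unsigned short ptr[MAX_PTRS];   /* PE numbers of up to MAX_PTRS sharers */
    unsigned char  count;           /* how many pointers are in use         */
    unsigned char  overflow;        /* set once more than MAX_PTRS share    */
    unsigned char  dirty;           /* as in the full presence-bit scheme   */
} limited_dir_entry_t;

/* Record that PE `pe` now caches the block. */
void add_sharer(limited_dir_entry_t *d, unsigned short pe)
{
    if (d->count < MAX_PTRS)
        d->ptr[d->count++] = pe;
    else
        d->overflow = 1;            /* too many sharers: track imprecisely  */
}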