Cache Coherence

The document discusses cache coherence in multiprocessors. It describes snooping and directory-based cache coherence protocols. The snooping protocol uses bus broadcasting to maintain coherence while the directory-based protocol uses a centralized directory to track shared data blocks.

Uploaded by

20bec004


Cache Coherence in Multiprocessors

Dr. Dhaval Shah

Fig. 1. Symmetric multiprocessor (UMA model)
Fig. 2. Distributed-memory multiprocessor (NUMA model)
Different Organizations of SMPs

 Processors and caches on separate extension boards (1980)
 Plugged into the backplane
 Integrated on the main board (1990)
 4 or 6 processors placed per board
 Integrated on the same chip (multi-core) (2000)
 Dual-core (IBM, Intel, AMD)
 Quad-core
Why not more cores on chip?

 Clock Skew
 Temperature/Power dissipation
Multicore for Low Power

 For the same performance, power dissipation is reduced.
 Thread-level parallelism can be exploited to increase the performance of a multicore processor.
 Private cache vs. shared cache

[Figure panels: Shared cache, Private cache]

L2 Organizations

 Advantages of a shared L2 cache:
 Efficient dynamic use of space by each core
 Data shared by multiple cores is not replicated
 Every block has a fixed "home", hence it is easy to find the latest copy
 Advantages of a private L2 cache:
 Quick access to the private L2 cache
 Private bus to the private L2 cache, so less contention
Shared Memory: Coherence

 When shared data are cached:
 Caching allows migration and replication
 Shared data are replicated in multiple caches
 Reduces the latency of accessing shared data
 Reduces bandwidth demand on the shared memory
 Data in the caches of different processors may become inconsistent (write-back policy)
 How to enforce cache coherency?
 How does a processor learn of changes in the caches of other processors?
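The inconsistency described above can be sketched in a few lines, assuming two private write-back caches with no coherence mechanism at all. The class name PrivateCache and the address "A" are hypothetical, chosen only for illustration.

```python
class PrivateCache:
    """A toy write-back cache with no coherence mechanism (illustrative only)."""

    def __init__(self, memory):
        self.memory = memory   # shared main memory, modelled as a dict
        self.lines = {}        # address -> locally cached value

    def read(self, addr):
        if addr not in self.lines:           # miss: fetch the block from memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value             # write-back: memory is not updated yet


memory = {"A": 0}
p1, p2 = PrivateCache(memory), PrivateCache(memory)

p2.read("A")           # P2 caches the old value of A
p1.write("A", 10)      # P1 updates only its own cached copy
stale = p2.read("A")   # P2 hits in its cache and never sees the new value

print(p1.read("A"), stale, memory["A"])
```

P1 sees the new value while P2 and main memory still hold the old one, which is exactly the situation a coherence protocol has to prevent.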
Possible Solutions

 Software solutions:
 Avoid additional hardware
 Rely on the compiler and OS to deal with the problem
 Compile-time overhead
 The compiler analyzes the code to detect which data items may become unsafe for caching
 Prevents non-cacheable items (shared data) from being cached
 Being conservative, this approach does not make effective use of the cache

Possible Solutions

 Hardware solutions:
 Allow dynamic recognition of potential inconsistency at run time
 More effective use of caches and better performance than software-based approaches
 Reduce the software development burden
 Two basic approaches:
 Snoopy protocol
 Directory protocol
 Snooping protocol:
 Each cache controller "snoops" the bus to find out which data is being used by whom.
 Directory-based protocol:
 Keeps track of the sharing state of each data block in a directory.
 A directory is a centralized register for all memory blocks.
 Allows the coherence protocol to avoid broadcast.

 Snooping coherence on a bus was first described by Goodman (1983).
 Each core snoops the bus to find out which data is being used (updated) by which processor.
 Reduces memory traffic.
 All transmissions on a bus are broadcast.
Fig. 5. Snooping
SNOOPING (CONT.)

 Snooping protocols are basically of two types:
 Write-invalidate: when a local cache block is updated, all remote copies of that block are invalidated.
 Write-update: when a local cache block is updated, the new data block is broadcast to all caches containing a copy of the block so that they can update it.
WRITE INVALIDATE PROTOCOL

 Handling a write to shared data:
 An invalidate command is sent on the bus; all caches snoop it and invalidate any copies they have.
 Handling a read miss:
 Write-through: memory is always up to date.
 Write-back: snooping finds the most recent copy.
Write Invalidate vs Write Update

 Invalidate exploits spatial locality:
 Only one bus transaction is needed for any number of writes to the same block.
 More efficient.
 Update (broadcast) has lower latency between a write and a subsequent read by another processor, compared to invalidate.
 Write-invalidate is the winner:
 It has been adopted in the Pentium 4 and PowerPC.
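A rough sketch of the bus-traffic argument above, under simplifying assumptions that are not in the slides: one processor writes repeatedly to a block, no other processor touches the block in between, and every broadcast counts as one bus transaction. The function name bus_transactions is hypothetical.

```python
def bus_transactions(policy, writes, sharers):
    """Count bus transactions for `writes` consecutive writes by one processor
    to a block that `sharers` other caches hold at the start."""
    if policy not in ("invalidate", "update"):
        raise ValueError(f"unknown policy: {policy}")
    transactions = 0
    remote_copies = sharers
    for _ in range(writes):
        if remote_copies == 0:
            continue                  # block is exclusive: the write stays local
        transactions += 1             # one broadcast on the bus
        if policy == "invalidate":
            remote_copies = 0         # remote copies are gone after the first write
        # under "update" the remote copies stay valid, so every write broadcasts
    return transactions


print(bus_transactions("invalidate", writes=8, sharers=3))
print(bus_transactions("update", writes=8, sharers=3))
```

Under these assumptions invalidate needs a single transaction for the whole run of writes, while update needs one per write. The price of invalidate is that the next read by another processor misses and must fetch the block again.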
Example

 Assume:
 Invalidate protocol, write-back cache
 Each block of memory is in one of the following states:
 Modified/Exclusive: the line in the cache has been modified.
 Shared: clean in all caches and up to date in memory; the block can be read.
 Invalid: the data present in the block is obsolete and cannot be used.

3 STATE MSI PROTOCOL

 Modified: this is the only valid copy in any cache, and its value is different from that in memory.
 Shared: this is a valid copy, but other caches may also contain it, and its value is the same as in memory.
 Invalid: this copy is out of date and cannot be used.

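3 STATE MSI PROTOCOL (CONT.)

Step              | P1                 | P2                 | Bus                     | Memory
                  | State   Add. Val.  | State   Add. Val.  | Action Pro. Add. Val.   | Val.
P1 writes 10 to A | Excl    A    10    |                    | Wr Mi  P1   A           |
P1 reads A        | Excl    A    10    |                    |                         |
P2 reads A        |                    | Share   A    -     | Rd Mi  P2   A           |
                  | Share   A    10    |                    | Wr Bk  P1   A    10     | 10
                  |                    | Share   A    10    | Da Rd  P2   A    10     | 10
P2 writes 20 to A | Invalid A    -     | Excl    A    20    | Wr Mi  P2   A           | 10

Wr Mi = write miss, Rd Mi = read miss, Wr Bk = write back, Da Rd = data returned for the read.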
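The trace above can be replayed with a small simulator. This is a minimal sketch of a write-invalidate, write-back MSI protocol, not a definitive implementation: the class SnoopySystem, its method names, and the initial memory value 0 are assumptions made for illustration, a write to a Shared block is treated as a write miss (as in the trace), and the data-reply bus message is not modelled.

```python
M, S, I = "Modified", "Shared", "Invalid"


class SnoopySystem:
    """Toy bus-based MSI model: every miss is broadcast and snooped by all caches."""

    def __init__(self, n_procs, memory):
        self.memory = dict(memory)
        self.caches = [dict() for _ in range(n_procs)]  # address -> [state, value]
        self.bus_log = []                               # (action, processor, address)

    def _state(self, p, addr):
        return self.caches[p].get(addr, [I, None])[0]

    def _snoop(self, requester, addr, invalidate):
        """Let every other cache react to a miss seen on the bus."""
        for p, cache in enumerate(self.caches):
            if p == requester or addr not in cache:
                continue
            line = cache[addr]
            if line[0] == M:                  # owner holds the only valid copy
                self.memory[addr] = line[1]   # write it back to memory
                self.bus_log.append(("write back", p, addr))
                line[0] = S
            if invalidate:
                line[0] = I                   # remote write: drop the local copy

    def read(self, p, addr):
        if self._state(p, addr) == I:
            self.bus_log.append(("read miss", p, addr))
            self._snoop(p, addr, invalidate=False)
            self.caches[p][addr] = [S, self.memory[addr]]
        return self.caches[p][addr][1]

    def write(self, p, addr, value):
        if self._state(p, addr) != M:         # Invalid or Shared: needs the bus
            self.bus_log.append(("write miss", p, addr))
            self._snoop(p, addr, invalidate=True)
        self.caches[p][addr] = [M, value]


system = SnoopySystem(2, {"A": 0})   # index 0 is P1, index 1 is P2
system.write(0, "A", 10)             # P1 writes 10 to A
system.read(0, "A")                  # P1 reads A: hit, no bus traffic
system.read(1, "A")                  # P2 reads A: P1 writes back, both Shared
system.write(1, "A", 20)             # P2 writes 20 to A: P1 is invalidated

for entry in system.bus_log:
    print(entry)
print(system.caches, system.memory)
```

At the end P1 holds A as Invalid, P2 holds A as Modified with value 20, and memory still contains 10, which matches the last row of the trace (where the Modified state is labelled Excl).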
STATE DIAGRAM
[Figures: state transition diagrams spanning several slides; images not recovered]
Limitations of SMPs

 Centralized resources in the system become a bottleneck, in particular the bus.
 The bus must support both normal and coherence traffic.
 As processor speed increases, the number of processors that can be supported decreases.
 How can a designer increase memory bandwidth?
 Use multiple buses or interconnection networks.
 Use multiple physical memory banks.
 A directory keeps the state of every block that may be cached:
 Which caches have copies of the block
 Whether it is dirty
 In a directory-based system, information about the data being shared is kept in a common directory that maintains the coherence between caches.
 The directory acts as a filter through which a processor must ask permission to load an entry from primary memory into its cache.
 When an entry is changed, the directory either updates or invalidates the other caches holding that entry.
DIRECTORY BASED PROTOCOL

 NUMA computers:
 Messages have long latency
 Broadcast is also inefficient: all messages have explicit responses
 The main memory controller keeps track of:
 Which processors have cached copies of which memory locations
 On a write: only the users of the block need to be informed, not everyone
 On a dirty read: forward the request to the owner
DIRECTORY BASED PROTOCOL (CONT.)

 Shared - One or more processors have the block cached, and the value in memory is up to date (as well as in all the caches).
 Uncached - No processor has a copy of the cache block.
 Modified - Exactly one processor has a copy of the cache block, and it has written the block, so the memory copy is out of date. That processor is called the owner of the block.
 Must track which processors have the data when a block is in the shared state
 Usually implemented using a bit vector: a bit is 1 if the corresponding processor has a copy
 Writes to non-exclusive data are treated as write misses
 The processor blocks until the access completes
 Assume messages are received and acted upon in the order sent
Step              | P1                 | P2                 | Bus                       | Directory         | Memory
                  | State   Add. Val.  | State   Add. Val.  | Action     Pro. Add. Val. | Add. State Pro.   | Val.
P1 writes 10 to A | Excl    A    10    |                    | Wr Mi      P1   A         | A    Excl  P1     |
P1 reads A        | Excl    A    10    |                    |                           |                   |
P2 reads A        |                    | Share   A    -     | Rd Mi      P2   A         |                   |
                  | Share   A    10    |                    | Fetch      P1   A    10   |                   | 10
                  |                    | Share   A    10    | Da Rd      P2   A    10   | A    Share P1, P2 | 10
P2 writes 20 to A |                    | Excl    A    20    | Wr Mi      P2   A         |                   | 10
                  | Invalid            |                    | Invalidate P1   A         | A    Excl  P2     | 10
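The directory trace above can be sketched with a per-block bit vector of sharers. This is an illustrative model only: the class Directory, its method names, and the initial memory value 0 are assumptions, cache hits are not modelled, and messages are assumed to be handled in the order sent.

```python
UNCACHED, SHARED, MODIFIED = "Uncached", "Shared", "Modified"


class Directory:
    """Toy directory: one state and one sharer bit vector per memory block."""

    def __init__(self, n_procs, memory):
        self.n_procs = n_procs
        self.memory = dict(memory)
        self.state = {addr: UNCACHED for addr in memory}
        self.sharers = {addr: 0 for addr in memory}     # bit p set: processor p has a copy
        self.caches = [dict() for _ in range(n_procs)]  # address -> cached value
        self.messages = []                              # (message, target processor, address)

    def _holders(self, addr):
        return [p for p in range(self.n_procs) if (self.sharers[addr] >> p) & 1]

    def read_miss(self, p, addr):
        if self.state[addr] == MODIFIED:                # dirty read: go to the owner
            owner = self._holders(addr)[0]
            self.messages.append(("fetch", owner, addr))
            self.memory[addr] = self.caches[owner][addr]
        self.state[addr] = SHARED
        self.sharers[addr] |= 1 << p
        self.caches[p][addr] = self.memory[addr]
        return self.caches[p][addr]

    def write_miss(self, p, addr, value):
        for q in self._holders(addr):                   # inform only the holders
            if q == p:
                continue
            if self.state[addr] == MODIFIED:            # old owner writes back first
                self.memory[addr] = self.caches[q][addr]
            self.messages.append(("invalidate", q, addr))
            del self.caches[q][addr]
        self.state[addr] = MODIFIED
        self.sharers[addr] = 1 << p                     # requester becomes the owner
        self.caches[p][addr] = value


d = Directory(2, {"A": 0})    # index 0 is P1, index 1 is P2
d.write_miss(0, "A", 10)      # P1 writes 10 to A
d.read_miss(1, "A")           # P2 reads A: fetch from owner P1
d.write_miss(1, "A", 20)      # P2 writes 20 to A: invalidate P1 only

print(d.state["A"], bin(d.sharers["A"]), d.memory["A"], d.messages)
```

Unlike the snooping case, no message is broadcast: the directory consults the bit vector and contacts only the processors that actually hold the block. The final state matches the last row of the trace (Excl in the table corresponds to Modified here).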
STATE DIAGRAM
[Figures: state transition diagrams spanning several slides; images not recovered]
