Multiprocessors and Multicomputers
        - Prajwala T R
        Dept. of CSE
        PESIT
Multiprocessor system interconnects
        Network characteristics
• Timing
  – Synchronous
  – Asynchronous
• Switching
  – Circuit switching
  – Packet switching
• Control
  – Centralized
  – Distributed
Hierarchical bus systems
• Local bus
  – Buses implemented within a processor chip or on a printed-circuit board (PCB)
  – Provide a communication path among the components mounted on the board
• Memory bus
• Data bus
• Backplane bus
  – A printed circuit board on which many connectors are used to plug in functional boards
  – Examples: VME bus, Multibus II, Futurebus+
Backplane bus
• I/O bus
  – SCSI (Small Computer System Interface) bus
  – Made of a coaxial cable with taps connecting to disks, printers, and other devices through interface logic
  – Ex: the Encore bus consists of a 32-bit address bus, a 64-bit data path, and a 14-bit vector bus
  – Clock speed: 12.5 MHz
SCSI bus cable
Encore Ultramax multiprocessor architecture
             Crossbar switch
• Single-stage networks
• Multistage networks
  – Blocking, ex: Omega and Baseline networks
  – Non-blocking (all possible connections between inputs and outputs can be made)
• Crossbar networks
Single stage
          Crossbar networks
• A single-stage, permutation, non-blocking network
• Each crosspoint is a unary switch that can be set open or closed, establishing a point-to-point connection
• N x M (N processors to M memory modules), or N = M for interprocessor connections
• All processors can send requests asynchronously and independently
Design
•   Multiplexer at each memory module
•   Arbitration logic to resolve conflicting requests
•   Acknowledgement signal returned to the granted processor
•   Memory read or write control
•   With 16 processors, 4-bit control (select) lines are needed (a control sketch follows this slide)
• Advantages
  – High bandwidth
  – Simple, cheap interface per processor
  – A single processor can send requests to multiple memory modules
• Disadvantages
  – Cost-effective only for a small number of processors
  – Not expandable once built
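The slides do not fix an arbitration policy, so the following is only a minimal sketch of one possible crosspoint controller: a per-memory-module round-robin arbiter grants one of the 16 requesting processors per cycle, and the winner's index becomes the 4-bit multiplexer select code mentioned above. The names (arbitrate, N_PROC) are illustrative, not from any real design.

```c
#include <stdio.h>

#define N_PROC 16   /* 16 processors -> log2(16) = 4-bit select lines */

/* One crossbar column: the memory module's arbiter picks a single
 * requesting processor per cycle (round-robin here); its index is the
 * 4-bit control code driven onto the module's multiplexer. */
static int arbitrate(unsigned request_mask, int last_grant)
{
    for (int i = 1; i <= N_PROC; i++) {
        int cand = (last_grant + i) % N_PROC;   /* rotating priority */
        if (request_mask & (1u << cand))
            return cand;                        /* grant: ack goes back to this CPU */
    }
    return -1;                                  /* no requests this cycle */
}

int main(void)
{
    unsigned requests = (1u << 2) | (1u << 5);  /* CPUs 2 and 5 request this module */
    int grant = arbitrate(requests, 2);
    printf("granted CPU %d -> 4-bit mux select = 0x%X\n", grant, (unsigned)grant);
    return 0;
}
```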
           Multiport memory
• A solution intermediate between a shared bus and a crossbar switch
• Each memory module has multiple ports; only one of the n processor requests to a module is honored at a time
• Drawbacks
  – Not scalable
  – Requires a large number of interconnection cables and connectors
Multiport memory
 Multistage and combining networks
• Omega network
• Baseline network
• Hot-spot problem
  – Ex: a memory module receiving a disproportionate share of the traffic, such as one holding a shared semaphore
  – Degrades the performance of the whole network
       Fetch and add primitive
• Atomically increments the content of a memory location
• Fetch&Add(x, e) returns the old value of x and adds the increment e to it as one atomic operation (a C sketch follows this slide)
• In a multiprocessor, while one process is making the change, no other process can access the intermediate result
• The network switches combine concurrent requests by adding their increments
• Disadvantage
  – Requires additional switch cycles to make the entire operation atomic
• Real system example: IBM RP3
  – 512 processors
  – Omega network with 128 ports
  – Bandwidth 13 Gbps
  – 50 MHz clock
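A minimal C11 sketch of the Fetch&Add(x, e) semantics described above, using atomic_fetch_add to stand in for the hardware (or combining-network) implementation; the wrapper name fetch_and_add is illustrative.

```c
#include <stdio.h>
#include <stdatomic.h>

/* Fetch&Add(x, e): atomically return the old value of x and add the
 * increment e to it.  In the machines above this is done in the memory
 * module or combined inside the network switches; C11 atomics stand in
 * for that hardware here. */
static int fetch_and_add(atomic_int *x, int e)
{
    return atomic_fetch_add(x, e);   /* old value comes back, x += e */
}

int main(void)
{
    atomic_int x = 10;
    int a = fetch_and_add(&x, 3);    /* a = 10, x becomes 13 */
    int b = fetch_and_add(&x, 3);    /* b = 13, x becomes 16 */
    printf("a=%d b=%d x=%d\n", a, b, atomic_load(&x));
    return 0;
}
```

With x = 10 and two increments of e = 3, the two callers observe 10 and 13 (in some order when run concurrently) and x ends at 16; no caller ever sees a partially updated value, which is the atomicity property the slide refers to.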
• Two methods to solve the cache coherence problem
  – Snoopy protocols: caches monitor (snoop on) the bus to observe writes to shared blocks
  – Directory-based protocols: no broadcasting of values; a directory records which caches hold copies of a block and what modifications have been made
            Snoopy protocols
• Snoopy protocols are used to ensure cache coherence over a shared bus
• The mechanisms are
  – Write-invalidate
  – Write-update
  – Write-through caches
  – Write-back caches
  – Write-once protocol
        Snoopy protocols contd…
• Write-invalidate protocol
  – Invalidates all remote copies when a local cache block is updated
• Write-update policy
  – Broadcasts the new data to all caches containing a copy of the block
         Snoopy protocols contd…
• Write-through caches
  – Processors i and j: i is the local processor, j any remote processor
  – Block states: VALID or INVALID
• Possible operations:
  – Read by the same processor: R(i)
  – Read by a different processor: R(j)
  – Write by the same processor: W(i); write by a different processor: W(j)
  – Replace by the same processor: Z(i); replace by a different processor: Z(j)
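A small sketch of the block-state transitions these operations imply for a write-through, write-invalidate cache (assuming write-allocate on a local write miss; the enum and function names are illustrative).

```c
typedef enum { INVALID, VALID } wt_state;

typedef enum { R_i, W_i, Z_i,      /* read / write / replace by the local processor i  */
               R_j, W_j, Z_j } op; /* the same operations observed from a remote processor j */

/* Next state of a block in cache i under the two-state write-through
 * write-invalidate snoopy protocol. */
wt_state next_state(wt_state s, op o)
{
    switch (o) {
    case R_i:
    case W_i:          /* the write goes through to memory, our copy stays usable */
        return VALID;
    case W_j:          /* remote write: the bus snoop invalidates our copy */
    case Z_i:          /* we replaced the block ourselves */
        return INVALID;
    case R_j:          /* remote read or remote replacement: no effect here */
    case Z_j:
    default:
        return s;
    }
}
```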
               Write back caches
• Data item states:
  – RO: Read-Only (valid state)
  – RW: Read-Write (valid state)
  – INV: Invalid state
• Possible operations:
  – Read by the same processor: R(i)
  – Read by a different processor: R(j)
  – Write by the same processor: W(i)
  – Write by a different processor: W(j)
  – Replace by the same processor: Z(i); replace by a different processor: Z(j)
Write back cache
             Snoopy protocols contd..
• Write-once protocol
  – The first write uses the write-through policy
  – Subsequent writes use the write-back policy
  – In both cases, the copy of the data item in remote caches is invalidated
• Data item states:
  – Valid: cache block is consistent with the main-memory copy
  – Reserved: data has been written exactly once and is consistent with the main-memory copy
  – Dirty: data has been written more than once and is not consistent with the main-memory copy
  – Invalid: block is not found in the cache or is inconsistent with the main-memory copy
Read hit, read miss, write hit, write miss
• Read hit: the information is supplied by the local cache
• Read miss: the data is read from main memory, after checking whether another cache holds the block in the Dirty or Reserved state (a Dirty copy supplies the data and is written back first)
• Write hit: if the block is in the Dirty or Reserved state, the write is performed locally and the state is updated to Dirty (a first write to a Valid block is written through and moves it to Reserved)
• Write miss: remote copies are invalidated; the block is brought into the cache and written
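A compact sketch of the write-once state machine described on the last few slides, split into the local-processor side and the bus-snooping side; the bus and memory actions (write-through, write-back, acknowledgements) are summarized in comments rather than modeled.

```c
typedef enum { INV, VALID_S, RESERVED, DIRTY } wo_state;

/* Local-processor side of the write-once protocol (sketch). */
wo_state on_local_write(wo_state s)
{
    switch (s) {
    case VALID_S:   /* first write: write-through once, remote copies invalidated */
        return RESERVED;
    case RESERVED:  /* subsequent writes stay local (write-back policy) */
    case DIRTY:
        return DIRTY;
    case INV:       /* write miss: fetch block, invalidate others, then write */
    default:
        return DIRTY;
    }
}

wo_state on_local_read(wo_state s)
{
    return (s == INV) ? VALID_S : s;   /* a read miss loads the block as Valid */
}

/* Bus-snooping side: what this cache does when it observes another
 * processor's transaction for the same block. */
wo_state on_remote_write(wo_state s)  { (void)s; return INV; }   /* invalidate our copy */
wo_state on_remote_read(wo_state s)
{
    /* a Dirty copy supplies the data / is written back; a Reserved copy is
     * already consistent; either way the block drops to Valid */
    return (s == DIRTY || s == RESERVED) ? VALID_S : s;
}
```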
     Multilevel cache coherence
• A write-invalidate is propagated vertically up the hierarchy in order to invalidate copies in the shared caches at higher levels
• Higher-level caches keep track of the dirty blocks beneath them
Protocol Performance issues
       Directory based protocols
• Snoopy protocols rely on broadcasting writes/invalidations
• In a large network this broadcasting is expensive
• The write-invalidate protocol leads to heavy bus traffic
• With the write-update protocol, the updated data may never be used by the remote processors
• Hence directory-based protocols are used in large-scale systems
• Cache directories store information on where the copies of each cache block reside
• A directory entry contains
  – A list of cached locations (pointers to the caches holding copies of the block)
  – A dirty bit (whether the block has been overwritten in some cache)
• Central directory scheme
  – Duplicates all the individual cache directories in one place
  – Consistency between the central and cache directories must be maintained
  – Drawbacks
     • Contention
     • Long search times
• Distributed directory scheme
  – Each memory module maintains its own directory
  – State information about a block is local to the memory module that holds it
  – On a read miss in cache C2, a request is sent to the memory module; the memory controller retransmits the request to cache C1, which holds the copy and supplies the data
  – On a write hit in C1, the controller sends invalidation requests to all caches holding a copy
           Types of directories
• Full-map directories
  – Each directory entry has n presence bits (pointers), where n is the number of processors
  – Each cache keeps 2 bits per block: a valid bit and a dirty bit (whether the block has been overwritten and may be written locally)
• Steps when cache C3 writes x, with copies also in C1 and C2 (a sketch follows this list)
  – Cache C3 finds that the block containing x is valid
  – C3 issues a write request to the memory module containing x
  – The memory module sends invalidation requests to C1 and C2
  – C1 and C2 reset the bit indicating x is valid (their copies are no longer valid) and acknowledge
  – The memory module sends write permission to C3
  – Cache C3 updates the value of x
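A minimal sketch of a full-map directory entry and the write sequence just listed. The names dir_entry and send_invalidate and the bit layout are hypothetical; acknowledgements and the write-permission reply to C3 are only noted in comments.

```c
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

#define N_CACHES 4                       /* one presence bit per cache (C1..C3 used here) */

/* Full-map directory entry for one memory block. */
typedef struct {
    uint32_t presence;                   /* bit i set => cache Ci holds a copy */
    bool     dirty;                      /* one cache holds a modified copy    */
} dir_entry;

/* Stand-in for the network transaction that tells cache c to invalidate. */
static void send_invalidate(int c) { printf("invalidate cache C%d\n", c); }

/* Write request from cache `writer` (the C3-writes-x example above):
 * invalidate every other copy (acks not modeled), then record that only
 * the writer holds the block and that it is dirty (write permission). */
static void directory_write(dir_entry *e, int writer)
{
    for (int c = 0; c < N_CACHES; c++)
        if (c != writer && (e->presence & (1u << c)))
            send_invalidate(c);          /* C1 and C2 drop their copies */

    e->presence = 1u << writer;          /* only C3 keeps a pointer     */
    e->dirty = true;                     /* C3 may now update x         */
}

int main(void)
{
    /* x is cached by C1, C2 and C3; C3 now issues a write request */
    dir_entry x = { .presence = (1u << 1) | (1u << 2) | (1u << 3), .dirty = false };
    directory_write(&x, 3);
    printf("presence=0x%X dirty=%d\n", (unsigned)x.presence, (int)x.dirty);
    return 0;
}
```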
            Limited directories
• The directory size problem is solved by keeping pointer entries only for the caches that actually hold a copy of block X, and only a limited number of them
• Dir_i X notation
  – i: number of pointers per directory entry
  – X = NB: no-broadcast scheme
• Dir_n NB: the full-map scheme without broadcast
• A limited directory uses i < n pointers
• Dir_2 NB: when a third cache requests a copy, one of the two existing pointers is replaced (eviction of that copy)
• Pointer replacement in the directory is analogous to set-associative cache replacement
• Limited directories give scalable protocols (directory storage grows slowly with n)
• Dir_i B: broadcast scheme; allows more than i copies of each block of data to exist
           Chained directories
• Singly linked chain (see the sketch after this slide)
  – Initially there are no shared copies of x
  – P1 reads x: memory supplies the copy to C1 along with a chain-termination (CT) pointer
  – When P2 reads x, it obtains the data and a pointer to C1, the previous head of the chain
  – Memory then keeps a pointer to C2
  – These are called gossip protocols: information is passed from individual to individual
• Doubly linked chain
  – Two pointers per copy: backward and forward chain pointers
  – Needs more memory because two pointers are stored per copy
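A minimal sketch of the singly linked chained directory, assuming the chain-termination (CT) pointer is modeled as NULL and that each new reader is linked in at the head of the chain; the struct names are illustrative.

```c
#include <stdio.h>
#include <stddef.h>

/* Singly linked chained directory: memory keeps one pointer to the head
 * of the sharing chain; each cached copy points to the next sharer, and
 * the last copy holds the chain-termination (CT) pointer, NULL here. */
typedef struct copy {
    int          cache_id;     /* which cache holds this copy of x */
    struct copy *next;         /* next sharer, or NULL = CT        */
} copy;

typedef struct {
    copy *head;                /* memory's single pointer          */
} mem_dir;

/* A new cache reads x: its copy receives a pointer to the previous head
 * of the chain, and memory now points to the new copy. */
void read_block(mem_dir *d, copy *new_copy)
{
    new_copy->next = d->head;  /* C2's copy points at C1's copy (or CT) */
    d->head = new_copy;        /* memory keeps a pointer to C2          */
}

int main(void)
{
    mem_dir dir = { NULL };            /* initially no shared copies of x */
    copy c1 = { 1, NULL }, c2 = { 2, NULL };
    read_block(&dir, &c1);             /* P1 reads x: C1 -> CT            */
    read_block(&dir, &c2);             /* P2 reads x: C2 -> C1 -> CT      */
    for (copy *p = dir.head; p; p = p->next)
        printf("C%d -> ", p->cache_id);
    printf("CT\n");                    /* prints: C2 -> C1 -> CT          */
    return 0;
}
```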
• Cache design alternatives
• Shared caches: no private caches, or a shared second-level cache; reduces main-memory access time
• Shared data can be made non-cacheable
• Cache flushing: caches are flushed at synchronization points, I/O operations, and process migration
             Atomic operation
• Synchronization primitives
  – Test&Set lock (a C sketch follows this slide)
  – Lock = 1: set (locked)
  – Lock = 0: reset (free)
  – Spin lock: busy-wait until the lock is acquired
• Wired barrier synchronization
  – Wired-NOR logic
  – Control vector X in each processor
  – Common monitor vector Y observed by all processors
  – Each bit X_i drives an input of a wired-NOR barrier line; Y_i is that line's output
• X_i = 1 indicates process i has been initiated (has not yet reached the barrier)
• The barrier line going to 1 signals that synchronization is complete
• Only one barrier line is needed to initiate and complete a single synchronization operation
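A minimal C11 sketch of the Test&Set spin lock from the first bullet group above, with atomic_flag_test_and_set standing in for the hardware Test&Set instruction.

```c
#include <stdatomic.h>

/* Spin lock built on a test-and-set primitive: lock = 1 (set) means
 * held, lock = 0 (reset) means free. */
typedef struct { atomic_flag flag; } spinlock;

void spin_lock(spinlock *l)
{
    /* busy-wait (spin) until the old value comes back 0 (= lock was free) */
    while (atomic_flag_test_and_set_explicit(&l->flag, memory_order_acquire))
        ;                               /* keep spinning */
}

void spin_unlock(spinlock *l)
{
    atomic_flag_clear_explicit(&l->flag, memory_order_release);  /* reset to 0 */
}
```

A lock is declared and initialized as `spinlock l = { ATOMIC_FLAG_INIT };`; the acquire/release memory orders mirror the usual lock-entry and lock-exit semantics.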
3 generations of multicomputers
      Message passing schemes
• Message formats
  – Fixed-length packets
  – Each packet carries the destination address and a sequence number
  – Packets are further divided into flits (flow control digits)
  – Store-and-forward routing operates on packets
  – Wormhole routing operates on flits
  – Typical packet size: 64-512 bits
      Store and forward routing
• The basic units of transfer are packets
• A packet is transmitted through a series of intermediate nodes
• Buffers at each node store the entire packet before it is forwarded to an output channel
• Latency is proportional to the number of hops (see the latency sketch after the wormhole slide)
           Wormhole routing
• Flits are the unit of transfer
• Transmission from source to destination is done through routers
• All flits of a packet are transmitted in order, as inseparable companions
• A header flit leads, followed by the data flits
• Latency is almost independent of the distance (number of hops)
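To make the latency contrast concrete, a sketch of the usual first-order models (assumed formulas that ignore startup time and contention): store-and-forward pays the full packet time L/W at every hop, while wormhole pays only one flit time Lf/W per hop plus one packet time.

```c
#include <stdio.h>

/* Rough latency models:
 *   T_sf = (L / W) * D           -- whole packet buffered at each hop
 *   T_wh = (Lf / W) * D + L / W  -- only the header pays the per-hop cost
 * L = packet length, Lf = flit length, W = channel bandwidth, D = hops. */
double t_store_forward(double L, double W, int D)        { return (L / W) * D; }
double t_wormhole(double L, double Lf, double W, int D)  { return (Lf / W) * D + L / W; }

int main(void)
{
    double L = 512, Lf = 16, W = 1.0;   /* bits, bits, bits per cycle */
    for (int D = 1; D <= 8; D *= 2)
        printf("D=%d  T_sf=%.0f  T_wh=%.0f cycles\n",
               D, t_store_forward(L, W, D), t_wormhole(L, Lf, W, D));
    return 0;
}
```

With L = 512 bits, Lf = 16 bits and D = 8 hops the model gives 4096 versus 640 cycles, which is why wormhole latency is nearly independent of distance.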
       Asynchronous pipelining
• A handshaking protocol is used between routers
• A 1-bit ready/request line is used between adjacent routers for each channel
• No global clock is needed
             virtual channels
• A virtual channel is a logical link between two nodes
• It consists of a flit buffer in the source node, a physical channel between the nodes, and a flit buffer in the receiver node
• One source buffer is paired with one receiver buffer to form a virtual channel
• The physical channel is time-shared by all the virtual channels
                   deadlocks
• Why do deadlocks occur?
  – Ex: flits from four messages each occupy one of four channels and wait for the next, forming a circular wait
• How is deadlock detected?
  – A cycle in the channel-dependence graph indicates deadlock
• Deadlock avoidance
  – Add virtual channels so that the channel-dependence graph becomes cycle-free
  – Virtual channels can be unidirectional or bidirectional
        Flow control strategies
• Packet collisions occur when two packets need the same channel
• Elements involved: the source buffer and destination buffer holding the flit, and the channel between them
• Packet collision resolution must decide
  – Which packet will be allocated the channel?
  – What will be done with the packet that is denied the channel?
                   Solution 1
• Virtual cut-through routing
  – The blocked packet (packet 2) is temporarily stored in a buffer at the current node
  – Advantage: channel resources are not wasted
  – Disadvantages: requires large packet buffers and adds storage delay
  – The packet buffers must not introduce cycles
                 Solution 2
• Blocking flow control
  – Second packet is blocked but not abandoned
                  Solution 3
• Discard and retransmit
  – The blocked packet is dropped
  – Disadvantages: wastes resources and gives an unstable delivery rate
  – Requires packet retransmission and acknowledgement
                  Solution 4
• Detour after being blocked
  – The blocked packet is rerouted along a longer path, which can waste channel resources
  – Offers flexibility in routing
  – A rerouted packet can enter a livelock, which wastes resources
       Dimension order routing
• Deterministic routing
  – The communication path is completely determined by the source and destination addresses
  – Dimension-order routing: X-Y routing (2-D mesh), E-cube routing (hypercube)
• Adaptive routing chooses the path depending on network conditions
              E cube routing
• N = 2^n nodes in an n-dimensional hypercube
• Source s, destination d, intermediate node v; bits are numbered 1, ..., n
1. Compute the direction bits r_i = s_(i-1) XOR d_(i-1) for i = 1, 2, ..., n
2. In dimension i: if r_i = 1, route from the current node v to v XOR 2^(i-1); if r_i = 0, skip this dimension
3. Move to dimension i+1 and repeat until the destination is reached
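A short C sketch of the three E-cube steps (node addresses are plain unsigned integers; the loop index i here runs 0 .. n-1, i.e. one less than the i used in the steps above).

```c
#include <stdio.h>

/* E-cube routing in an n-dimensional hypercube (N = 2^n nodes):
 * r = s XOR d gives the direction bits; the route flips one address bit
 * per hop, lowest dimension first. */
void ecube_route(unsigned s, unsigned d, int n)
{
    unsigned v = s;
    unsigned r = s ^ d;                     /* direction bits r_i        */
    printf("%u", v);
    for (int i = 0; i < n; i++) {           /* dimension i+1 in the text */
        if (r & (1u << i)) {                /* r_i = 1: cross this link  */
            v ^= 1u << i;                   /* v = v XOR 2^(i-1)         */
            printf(" -> %u", v);
        }                                   /* r_i = 0: skip dimension   */
    }
    printf("\n");                           /* v == d at this point      */
}

int main(void)
{
    ecube_route(6, 13, 4);   /* node 0110 to node 1101 in a 4-cube: 6 -> 7 -> 5 -> 13 */
    return 0;
}
```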
Example
Adaptive routing
    Multicast routing algorithms
• Communication patterns
  – Unicast
  – Broadcast
  – Multicast
           Routing efficiency
• Efficiency criteria: channel bandwidth (traffic generated) and communication delay (latency)
• Multicast is implemented by replicating the packet at intermediate nodes so that copies of the packet reach every destination
Virtual networks
Network partitioning