Parallel Architecture
Sathish Vadhiyar
Motivations of Parallel Computing
   Faster execution times
       From days or months to hours or seconds
       E.g., climate modelling, bioinformatics
   Large amounts of data dictate parallelism
   Parallelism is more natural for certain kinds of problems, e.g., climate modelling
   Due to computer architecture trends
       CPU speeds have saturated
       Slow memory bandwidths
Classification of Architectures – Flynn's classification
   In terms of parallelism in the instruction and data streams
   Single Instruction Single Data (SISD): serial computers
   Single Instruction Multiple Data (SIMD)
     - Vector processors and processor arrays
     - Examples: CM-2, Cray C90, Cray Y-MP, Hitachi 3600
Courtesy: http://www.llnl.gov/computing/tutorials/parallel_comp/
Classification of Architectures – Flynn's classification
   Multiple Instruction Single Data (MISD): not popular
   Multiple Instruction Multiple Data (MIMD)
     - Most popular
     - IBM SP and most other supercomputers, clusters, computational grids, etc.
Courtesy: http://www.llnl.gov/computing/tutorials/parallel_comp/
Classification of Architectures – Based on Memory
   Shared memory
   2 types – UMA and NUMA
       NUMA examples: HP-Exemplar, SGI Origin, Sequent NUMA-Q
[Figure: UMA and NUMA shared memory organizations]
Courtesy: http://www.llnl.gov/computing/tutorials/parallel_comp/
Classification 2: Shared Memory vs Message Passing
   Shared memory machine: the n processors share the physical address space
       Communication can be done through this shared memory
[Figure: a shared memory machine – processors P connected by an interconnect to a common main memory; and a message passing machine – each processor P paired with its own memory M, with the nodes connected by an interconnect]
   The alternative is sometimes referred to as a message passing machine or a distributed memory machine
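To make the contrast concrete, here is a minimal message passing sketch in C with MPI (a sketch only; the ranks, tag, and value are illustrative). With no shared address space, processor 0 must communicate x to processor 1 with an explicit send/receive pair.

/* Message passing: communicate x from rank 0 to rank 1 explicitly.
 * Build with mpicc, run with e.g. mpirun -np 2 ./a.out */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int rank, x;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        x = 42;                       /* value to communicate */
        MPI_Send(&x, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&x, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received x = %d\n", x);
    }
    MPI_Finalize();
    return 0;
}

On a shared memory machine the same exchange needs no messages: the second processor could simply read the shared variable, which is what makes the shared address space convenient (and what raises the caching questions below).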
Shared Memory Machines
The shared memory could itself be distributed among the processor nodes
    Each processor might have some portion of the shared physical address space that is physically close to it and therefore accessible in less time
    Terms: NUMA vs UMA architecture
        Non-Uniform Memory Access
        Uniform Memory Access
SHARED MEMORY AND CACHES
Shared Memory Architecture: Caches
[Figure: P1 and P2 both cache X = 0; P1 writes X = 1 (memory now holds X = 1), then P2 reads X, hits in its own cache, and gets the stale value X = 0 – wrong data!]
Cache Coherence Problem
   If each processor in a shared memory multiprocessor machine has a data cache
       Potential data consistency problem: the cache coherence problem
       A shared variable may be modified in one processor's private cache
   Objective: processes shouldn't read "stale" data
   Solutions
       Hardware: cache coherence mechanisms
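To see what is at stake, here is a minimal producer/consumer sketch in C (pthreads plus a C11 atomic flag; the variable names are illustrative). Hardware cache coherence is what entitles the consumer to see the freshly written data = 42 rather than a stale 0 from its own cache.

#include <stdio.h>
#include <pthread.h>
#include <stdatomic.h>

int data = 0;                                  /* shared variable */
atomic_int ready = 0;                          /* publication flag */

void *producer(void *arg) {
    (void)arg;
    data = 42;                                 /* modify shared variable */
    atomic_store(&ready, 1);                   /* publish */
    return NULL;
}

void *consumer(void *arg) {
    (void)arg;
    while (!atomic_load(&ready))               /* wait for publication */
        ;
    printf("consumer read data = %d\n", data); /* must not be stale */
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, producer, NULL);
    pthread_create(&t2, NULL, consumer, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}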
Cache Coherence Protocols
   Write update – on every write, propagate the updated cache line to the other processors holding a copy
   Write invalidate – on a write, invalidate other processors' copies; a processor fetches the updated cache line when it next reads the (stale) data
   Which is better?
Invalidation Based Cache Coherence
[Figure: P1 and P2 both cache X = 0; P1 writes X = 1 and sends an invalidate to P2; P2's next read of X misses in its cache and fetches the updated value X = 1]
Cache Coherence using invalidate protocols
    3 states associated with data items
      Shared – the data item is present in two or more caches
      Invalid – another processor (say P0) has updated the data item, so this copy is stale
      Dirty – the state of the updated data item in P0 (memory not yet updated)
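A minimal sketch of these three states as a little state machine in C, assuming a simplified MSI-style invalidate protocol (the event set and function names are illustrative, not any specific hardware design):

#include <stdio.h>

typedef enum { INVALID, SHARED, DIRTY } State;

/* Transition for the local processor's own reads and writes. */
State on_cpu(State s, int is_write) {
    if (is_write) return DIRTY;            /* write: exclusive, dirty copy */
    return (s == INVALID) ? SHARED : s;    /* read: fetch line if invalid */
}

/* Transition when a write by another processor is observed. */
State on_remote_write(State s) {
    (void)s;
    return INVALID;                        /* our copy becomes stale */
}

int main(void) {
    State s = INVALID;
    s = on_cpu(s, 0);                      /* local read   -> SHARED  */
    s = on_cpu(s, 1);                      /* local write  -> DIRTY   */
    s = on_remote_write(s);                /* remote write -> INVALID */
    printf("final state: %d\n", s);
    return 0;
}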
Implementations of cache coherence protocols
    Snoopy
         For bus-based architectures
         A shared bus interconnect where all cache controllers monitor all bus activity
         There is only one operation on the bus at a time; cache controllers can be built to take corrective action and enforce coherence in the caches
         Memory operations are propagated over the bus and snooped
Implementations of cache coherence protocols
    Directory-based
         Instead of broadcasting memory operations to all processors, propagate coherence operations only to the relevant processors
         A central directory maintains the states of cache blocks and their associated processors
Implementation of Directory Based Protocols
   Presence bits record which processors hold a copy of a cache line
   Two schemes:
   Full bit vector scheme – O(M×P) storage for P processors and M cache lines
   But full storage is not necessary in practice
   Modern-day processors use a sparse or tagged directory scheme
   Only a limited number of cache lines is tracked, with a limited number of presence bits
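A minimal sketch of a full bit vector directory entry in C, assuming P <= 32 so one unsigned int holds the presence bits (sizes and names are illustrative):

#include <stdio.h>

#define P 8                       /* processors */
#define M 4                       /* directory entries (cache lines) */

typedef struct {
    unsigned presence;            /* bit i set => processor i has a copy */
    int      dirty;               /* line modified by a single owner?    */
} DirEntry;                       /* O(M x P) presence bits in total     */

DirEntry directory[M];

void record_read(int line, int proc) { directory[line].presence |= 1u << proc; }

/* On a write, coherence messages go only to processors whose bits are set. */
void record_write(int line, int proc) {
    unsigned sharers = directory[line].presence & ~(1u << proc);
    for (int p = 0; p < P; p++)
        if (sharers & (1u << p))
            printf("invalidate line %d in cache of P%d\n", line, p);
    directory[line].presence = 1u << proc;   /* writer is now sole owner */
    directory[line].dirty = 1;
}

int main(void) {
    record_read(0, 1);
    record_read(0, 3);
    record_write(0, 2);           /* invalidates copies in P1 and P3 */
    return 0;
}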
False Sharing
   Cache coherence occurs at the granularity of cache lines – an entire cache line is invalidated at a time
   Modern-day cache lines are typically 64 bytes in size
   Consider a Fortran program dealing with a matrix
   Assume each thread or process accesses a row of the matrix; since Fortran stores matrices in column-major order, elements of different rows sit next to each other in memory and land in the same cache lines
   This leads to false sharing – threads invalidate each other's cache lines without actually sharing data (see the sketch after this list)
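Here is that sketch: a minimal C/OpenMP illustration of false sharing (thread count, iteration count, and the assumed 64-byte line size are illustrative). The per-thread counters are adjacent in memory, so they fall in one cache line and every increment invalidates the other threads' copies.

/* False sharing: each thread increments its own counter, but the
 * counters are adjacent longs and share one cache line, so every
 * write invalidates the line in the other threads' caches. */
#include <stdio.h>
#include <omp.h>

#define NTHREADS 4
#define NITERS   10000000L

long counters[NTHREADS];              /* adjacent: one shared cache line */

int main(void) {
    double t0 = omp_get_wtime();
    #pragma omp parallel num_threads(NTHREADS)
    {
        int me = omp_get_thread_num();
        for (long i = 0; i < NITERS; i++)
            counters[me]++;           /* writes ping-pong the cache line */
    }
    printf("time with false sharing: %.3f s\n", omp_get_wtime() - t0);
    return 0;
}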
False sharing: Solutions
   Reorganize the code so that each processor accesses a set of rows
   This can still lead to overlapping cache lines if the matrix size is not divisible by the number of processors
   In such cases, employ padding
   Padding: dummy elements added to make the matrix size divisible
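A minimal sketch of the padding fix, under the same assumptions as the previous example: each counter is padded to a full (assumed) 64-byte cache line, so no two threads ever write to the same line.

/* Padding: each counter occupies its own (assumed) 64-byte cache
 * line, so the threads' writes no longer invalidate each other. */
#include <stdio.h>
#include <omp.h>

#define NTHREADS 4
#define NITERS   10000000L
#define LINESIZE 64

struct padded { long value; char pad[LINESIZE - sizeof(long)]; };
struct padded counters[NTHREADS];     /* one cache line per thread */

int main(void) {
    double t0 = omp_get_wtime();
    #pragma omp parallel num_threads(NTHREADS)
    {
        int me = omp_get_thread_num();
        for (long i = 0; i < NITERS; i++)
            counters[me].value++;     /* each thread owns its line */
    }
    printf("time without false sharing: %.3f s\n", omp_get_wtime() - t0);
    return 0;
}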
INTERCONNECTION NETWORKS
Interconnects
   Used in both shared memory and distributed memory architectures
   In shared memory: used to connect processors to memory
   In distributed memory: used to connect different processors
   Components
       Interface (PCI or PCI-e): for connecting a processor to a network link
       Network link connected to a communication network (network of connections)
Communication network
   Consists of switching elements to which processors are connected through ports
   Switch: a network of switching elements
   Switching elements are connected to each other using a pattern of connections
   The pattern defines the network topology
   In shared memory systems, memory units are also connected to the communication network
Parallel Architecture: Interconnections
   Routing techniques: how the route taken by a message from source to destination is decided
   Network topologies
       Static – point-to-point communication links among processing nodes
       Dynamic – communication links are formed dynamically by switches
    Network Topologies
   Static
       Bus
       Completely connected
       Star
       Linear array, Ring (1-D torus)
       Mesh
       k-d mesh: d dimensions with k nodes in each dimension
       Hypercubes – a 2-(log p) mesh, i.e., log p dimensions with 2 nodes in each dimension
       Trees – our campus network
   Dynamic – communication links are formed dynamically by switches
       Crossbar
       Multistage
   For more details and evaluation of topologies, refer to the book by Grama et al.
Network Topologies
   Bus, ring – used in small-scale shared memory systems
   Crossbar switch – used in some small-scale shared memory machines, and small or medium-scale distributed memory machines
Crossbar Switch
   Consists of a 2D grid of switching elements
   Each switching element consists of 2 input ports and 2 output ports
   An input port is dynamically connected to an output port through switching logic
Multistage network – Omega network
   To reduce switching complexity
   Omega network – consists of log p stages, each with p/2 switching elements
   Contention
       In a crossbar – none; the network is nonblocking
       In an Omega network – contention can occur even for multiple communications between disjoint source–destination pairs
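A minimal sketch of destination-tag routing through an Omega network in C, assuming p = 8 and the classic perfect-shuffle wiring between stages; at each stage the 2x2 switch sets the low bit of the shuffled address to the next bit of the destination (straight if it already matches, crossed otherwise):

#include <stdio.h>

#define LOGP 3
#define P    (1 << LOGP)

/* Perfect shuffle: left-rotate the LOGP address bits. */
int shuffle(int addr) {
    return ((addr << 1) | (addr >> (LOGP - 1))) & (P - 1);
}

void route(int src, int dst) {
    int at = src;
    printf("route %d -> %d:", src, dst);
    for (int stage = 0; stage < LOGP; stage++) {
        at = shuffle(at);
        int bit = (dst >> (LOGP - 1 - stage)) & 1;
        at = (at & ~1) | bit;         /* switch output: straight or cross */
        printf(" %d", at);
    }
    printf("\n");                     /* after LOGP stages, at == dst */
}

int main(void) {
    route(2, 7);
    route(6, 1);
    return 0;
}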
Mesh, Torus, Hypercubes, Fat-tree
   Commonly used network topologies in distributed memory architectures
   Hypercubes are d-dimensional networks with two nodes along each dimension
Mesh, Torus, Hypercubes
[Figure: a 2-D mesh, a torus, and hypercubes (binary n-cubes) for n = 2 and n = 3]
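A minimal sketch of hypercube connectivity in C (d = 3 is illustrative): node labels are d-bit numbers, two nodes are neighbours iff their labels differ in exactly one bit, and XOR flips that bit.

#include <stdio.h>

#define D 3                    /* 3-cube: 8 nodes */

int main(void) {
    for (int node = 0; node < (1 << D); node++) {
        printf("node %d neighbours:", node);
        for (int dim = 0; dim < D; dim++)
            printf(" %d", node ^ (1 << dim));   /* flip one bit */
        printf("\n");
    }
    return 0;
}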
Fat Tree Networks
   Binary tree
   Processors arranged at the leaves
   Other nodes correspond to switches
   Fundamental property: the number of links from a node to its children equals the number of links from the node to its parent
   Edges become fatter as we traverse up the tree
Fat Tree Networks
   Any pair of processors can communicate without contention: a non-blocking network
   Constant Bisection Bandwidth (CBB) networks
   A two-level fat tree has a diameter of four
Evaluating Interconnection topologies
   Diameter – maximum distance between any two processing nodes
     Fully connected – 1
     Star – 2
     Ring – p/2
     Hypercube – log p
   Connectivity – multiplicity of paths between 2 nodes; the minimum number of arcs that must be removed from the network to break it into two disconnected networks
     Linear array – 1
     Ring – 2
     2-D mesh – 2
     2-D mesh with wraparound – 4
     d-dimensional hypercube – d
Evaluating Interconnection topologies
   Bisection width – minimum number of links to be removed from the network to partition it into 2 equal halves
       Ring – 2
       p-node 2-D mesh – √p
       Tree – 1
       Star – 1
       Completely connected – p²/4
       Hypercube – p/2
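A minimal sketch in C tabulating these metrics for a p-node ring, 2-D mesh (without wraparound), and hypercube, assuming p = 16 (an even power of two and a perfect square, so all of the formulas above apply):

#include <stdio.h>
#include <math.h>

int log2i(int p) { int d = 0; while (p > 1) { p >>= 1; d++; } return d; }

int main(void) {
    int p  = 16;
    int sq = (int)sqrt((double)p);    /* side of the 2-D mesh */
    printf("topology    diameter  connectivity  bisection width\n");
    printf("ring        %8d  %12d  %15d\n", p / 2, 2, 2);
    printf("2-D mesh    %8d  %12d  %15d\n", 2 * (sq - 1), 2, sq);
    printf("hypercube   %8d  %12d  %15d\n", log2i(p), log2i(p), p / 2);
    return 0;
}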
Evaluating Interconnection topologies
   Channel width – number of bits that can be communicated simultaneously over a link, i.e., the number of physical wires between 2 nodes
   Channel rate – performance of a single physical wire
   Channel bandwidth – channel rate times channel width
   Bisection bandwidth – minimum volume of communication allowed between any two halves of the network, i.e., bisection width times channel bandwidth
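A worked example of these definitions in C with illustrative numbers: 16-wire links, a channel rate of 1 Gbit/s per wire, and the p/2 bisection width of a 64-node hypercube.

#include <stdio.h>

int main(void) {
    int    width  = 16;              /* channel width: wires per link   */
    double rate   = 1e9;             /* channel rate: bits/s per wire   */
    int    bwidth = 64 / 2;          /* hypercube bisection width, p/2  */
    double channel_bw   = rate * width;         /* channel bandwidth    */
    double bisection_bw = channel_bw * bwidth;  /* bisection bandwidth  */
    printf("channel bandwidth:   %.1e bits/s\n", channel_bw);
    printf("bisection bandwidth: %.1e bits/s\n", bisection_bw);
    return 0;
}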