UNIT - IV
Parallel and Scalable Architectures, Multiprocessors and Multicomputers, Multiprocessor
system interconnects, cache coherence and synchronization mechanism, Three Generations of
Multicomputers, Message-passing Mechanisms, Multivector and SIMD computers.
                        Parallel and Scalable Architectures (10 Marks)
1. Introduction
       Parallel architecture refers to computer systems that use multiple processing
        elements (PEs) to perform tasks simultaneously.
       Scalable architecture means the system can increase performance proportionally
        as processors are added, without major redesign.
Such architectures are the backbone of supercomputers, data centers, AI/ML workloads,
and scientific simulations.
2. Need for Parallel and Scalable Architectures
       Increasing demand for high-performance computing (HPC).
       Limitations of single-core sequential processing (Von Neumann bottleneck).
       To support large-scale data processing, graphics, AI, weather forecasting,
        simulations.
3. Characteristics
    1. Multiple Processing Elements: Can be CPUs, GPUs, or vector processors.
    2. Levels of Parallelism:
           o Instruction-level (ILP) → pipelining.
           o Data-level (DLP) → SIMD, vector.
           o Task-level (TLP) → independent processes in parallel.
           o Thread-level (Multithreading).
    3. Interconnection Network: Provides communication between processors (Bus,
       Crossbar, Mesh, Hypercube).
    4. Scalability: System must maintain efficiency as processors grow.
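A minimal sketch of the data- and thread-level parallelism listed above, using an OpenMP parallel loop (assumes a compiler with OpenMP support, e.g. gcc -fopenmp; the array names and sizes are illustrative only):

    #include <stdio.h>
    #include <omp.h>

    #define N 1000000

    int main(void) {
        static double a[N], b[N], c[N];
        /* Initialise the input vectors sequentially. */
        for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2.0 * i; }

        /* Data-level parallelism: the iterations are independent,
           so OpenMP splits them across the available threads. */
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            c[i] = a[i] + b[i];

        printf("c[10] = %f, threads available = %d\n",
               c[10], omp_get_max_threads());
        return 0;
    }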
4. Types of Parallel Architectures
    1. Shared Memory Multiprocessors (SMP):
          o All processors share a common memory.
          o  Easy programming but limited scalability.
          o  Example: Intel Xeon multiprocessor.
   2. Distributed Memory Multicomputers:
         o Each processor has its own local memory.
         o Communication via message passing (MPI, PVM).
         o Scalable but harder to program.
         o Example: IBM SP2, Beowulf clusters.
   3. Hybrid Architectures:
         o Combine shared + distributed memory.
         o Used in modern supercomputers.
5. Scalability
      A system is scalable if:
          o Performance grows as processors are added.
          o Communication overhead is minimal.
          o Memory system supports large-scale data access.
      Scalability Models:
          o Amdahl’s Law: pessimistic (fixed workload).
          o Gustafson’s Law: optimistic (scaled workload).
          o Sun–Ni Law: considers workload + memory constraints.
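A small worked sketch of the first two scalability laws in plain C, using the formulas as commonly stated (the parallel fraction f = 0.9 is only an illustrative value):

    #include <stdio.h>

    /* Amdahl's law (fixed workload): speedup = 1 / ((1 - f) + f/N) */
    double amdahl(double f, int n)    { return 1.0 / ((1.0 - f) + f / n); }

    /* Gustafson's law (scaled workload): speedup = (1 - f) + f * N */
    double gustafson(double f, int n) { return (1.0 - f) + f * n; }

    int main(void) {
        double f = 0.9;          /* fraction of the work that is parallelizable */
        for (int n = 2; n <= 1024; n *= 4)
            printf("N=%4d  Amdahl=%6.2f  Gustafson=%7.2f\n",
                   n, amdahl(f, n), gustafson(f, n));
        return 0;
    }

The output shows Amdahl's speedup saturating near 1/(1 - f) = 10, while Gustafson's speedup keeps growing with N.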
6. Advantages
      Increased speedup and throughput.
      Handles large problem sizes efficiently.
      Supports parallel programming models (OpenMP, CUDA, MPI).
      Flexible for scientific, AI, and real-time applications.
7. Challenges
      Synchronization among processors.
      Cache coherence problem in shared memory.
      Communication overhead in distributed memory.
      Load balancing across processors.
8. Diagram
         +-----------+     +-----------+     +-----------+
         | Processor |     | Processor | ... | Processor |
         +-----------+     +-----------+     +-----------+
               \               |                /
             ----------- Interconnection -----------
                   Shared / Distributed Memory
                             Multiprocessor System Interconnects
1. Introduction
       In multiprocessor systems, multiple CPUs need to communicate with each other
        and with memory & I/O devices.
       The interconnection network provides this communication path.
       A good interconnect should be fast, reliable, scalable, and cost-effective.
2. Requirements of Interconnects
   1.   High Bandwidth – To support many processors.
   2.   Low Latency – Fast data transfer between processors.
   3.   Scalability – Should work efficiently as the system grows.
   4.   Fault Tolerance – Ability to reroute if a link fails.
3. Types of Interconnection Networks
A. Bus-based Interconnect
       All processors share a common communication bus.
       Advantages: Simple, low cost.
       Disadvantages: Bus contention → performance drops with more CPUs.
       Example: Early multiprocessors.
   CPU1 --\
   CPU2 ----> [ Shared Bus ] ---> Memory / I/O
   CPU3 --/
B. Crossbar Switch
       Every processor has a direct path to every memory module via switches.
       Advantages: High speed, no contention (if enough switches).
       Disadvantages: Very expensive for large systems.
   CPU1 ---|X|--- M1
   CPU2 ---|X|--- M2
   CPU3 ---|X|--- M3
C. Multistage Networks (Indirect Interconnects)
       Use multiple switching stages (e.g., Omega, Butterfly, Clos).
       Advantages: Cheaper than crossbar, scalable.
       Disadvantages: Possible blocking (two requests may collide).
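A minimal sketch of destination-tag (self-routing) through an Omega network, assuming an 8x8 network with three stages; at each stage one bit of the destination address (most significant bit first) selects the upper or lower output of a 2x2 switch:

    #include <stdio.h>

    #define STAGES 3                 /* log2(8) stages for an 8x8 Omega network */

    /* Print the switch outputs used when routing from 'src' to 'dst'. */
    void omega_route(unsigned src, unsigned dst) {
        printf("route %u -> %u:", src, dst);
        for (int s = STAGES - 1; s >= 0; s--) {
            unsigned bit = (dst >> s) & 1;    /* destination tag bit for this stage */
            printf(" stage%d:%s", STAGES - 1 - s, bit ? "lower" : "upper");
        }
        printf("\n");
    }

    int main(void) {
        omega_route(2, 5);
        omega_route(6, 1);
        return 0;
    }

Blocking occurs when two simultaneous routes need the same switch output at some stage.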
D. Topology-based Interconnects (for Multicomputers)
   1.   Ring: Each processor connected to two neighbors. (Simple but slow).
   2.   Mesh / Torus: Processors in grid form, scalable, used in clusters.
   3.   Hypercube: Each node connected to log2(N) neighbors; very scalable.
   4.   Tree / Fat-tree: Hierarchical, good for large systems.
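A small sketch showing why a hypercube node has log2(N) neighbours: each neighbour address differs from the node's address in exactly one bit (illustrative C, assuming N is a power of two):

    #include <stdio.h>

    /* List the neighbours of 'node' in a hypercube of 'dim' dimensions
       (N = 2^dim nodes); neighbour k is reached by flipping bit k. */
    void hypercube_neighbors(unsigned node, int dim) {
        printf("node %u:", node);
        for (int k = 0; k < dim; k++)
            printf(" %u", node ^ (1u << k));   /* flip one address bit */
        printf("\n");
    }

    int main(void) {
        for (unsigned n = 0; n < 8; n++)       /* 3-D hypercube, 8 nodes */
            hypercube_neighbors(n, 3);
        return 0;
    }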
                          Cache Coherence and Synchronization Mechanisms
1. Introduction
       In multiprocessor systems, each CPU may have its own cache to reduce memory
        access time.
       Problem: Inconsistency occurs when multiple caches store different copies of the
        same memory location.
       Solution: Cache coherence protocols + synchronization mechanisms ensure data
        consistency and orderly access.
2. Cache Coherence Problem
       Example:
            o CPU1 and CPU2 both cache variable X (say X = 2).
            o CPU1 updates X to 5, but CPU2’s cache still holds the old value X = 2.
            o Result: the two processors see an inconsistent view of memory.
Conditions for Cache Coherence
   1. Write Propagation: Updates to a variable must be visible to all processors.
   2. Transaction Serialization: All processors must see memory operations in the same
      order.
3. Cache Coherence Protocols
A. Write Policies
      Write-through: Update both cache and memory (slower, but simple).
      Write-back: Update only cache, write to memory later (efficient, but complex).
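A toy sketch contrasting the two write policies on a single cached word (purely illustrative, not a real cache model):

    #include <stdio.h>
    #include <stdbool.h>

    /* Toy model of one cache line backed by one memory word. */
    int  memory_word = 0;          /* main-memory copy            */
    int  cache_word  = 0;          /* cached copy                 */
    bool dirty       = false;      /* used only by write-back     */

    void write_through(int v) {
        cache_word  = v;           /* update cache ...            */
        memory_word = v;           /* ... and memory immediately  */
    }

    void write_back(int v) {
        cache_word = v;            /* update cache only           */
        dirty = true;              /* memory updated on eviction  */
    }

    void evict(void) {
        if (dirty) { memory_word = cache_word; dirty = false; }
    }

    int main(void) {
        write_through(5);
        printf("write-through: cache=%d memory=%d\n", cache_word, memory_word);
        write_back(9);
        printf("write-back before eviction: cache=%d memory=%d\n", cache_word, memory_word);
        evict();
        printf("write-back after eviction:  cache=%d memory=%d\n", cache_word, memory_word);
        return 0;
    }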
B. Protocol Types
   1. Directory-based Protocols
          o A directory keeps a central record of which caches hold a copy of each
            memory block.
          o Updates need only point-to-point messages to the listed caches (no bus
            broadcast) → scalable for large systems.
   2. Snoopy Protocols
         o All caches monitor (snoop) a common bus.
         o If one CPU updates, others invalidate or update their cache copies.
         o Suitable for bus-based multiprocessors.
4. Popular Snoopy Protocols
      MSI (Modified, Shared, Invalid)
      MESI (Modified, Exclusive, Shared, Invalid) → common in Intel processors.
      MOESI and MESIF → advanced variants.
5. Synchronization Mechanism
Ensures orderly and mutually exclusive access to shared data/resources.
A. Hardware Mechanisms
   1. Locks/Atomic Instructions:
         o Test-and-Set, Compare-and-Swap, Fetch-and-Add → atomic updates.
   2. Barriers: All processors wait until every processor reaches a synchronization point.
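A minimal sketch of a test-and-set style spin lock built on C11 atomics; atomic_flag is the portable counterpart of a hardware test-and-set instruction. This is illustrative only (no back-off, not production locking code):

    #include <stdatomic.h>
    #include <stdio.h>

    static atomic_flag lock = ATOMIC_FLAG_INIT;
    static int shared_counter = 0;

    void critical_increment(void) {
        /* Spin until the atomic test-and-set reports the flag was clear. */
        while (atomic_flag_test_and_set(&lock))
            ;                                   /* busy-wait */
        shared_counter++;                       /* critical section */
        atomic_flag_clear(&lock);               /* release the lock */
    }

    int main(void) {
        for (int i = 0; i < 5; i++)
            critical_increment();
        printf("counter = %d\n", shared_counter);
        return 0;
    }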
B. Software Mechanisms
   1. Semaphores & Mutexes: Control access to critical sections.
   2. Monitors & Condition Variables: High-level synchronization constructs.
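A small sketch of a mutex protecting a critical section with POSIX threads (standard pthread calls; the thread count and loop bound are illustrative):

    #include <pthread.h>
    #include <stdio.h>

    static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    static long counter = 0;

    /* Each thread increments the shared counter under the mutex. */
    void *worker(void *arg) {
        (void)arg;
        for (int i = 0; i < 100000; i++) {
            pthread_mutex_lock(&m);     /* enter critical section */
            counter++;
            pthread_mutex_unlock(&m);   /* leave critical section */
        }
        return NULL;
    }

    int main(void) {
        pthread_t t[4];
        for (int i = 0; i < 4; i++) pthread_create(&t[i], NULL, worker, NULL);
        for (int i = 0; i < 4; i++) pthread_join(t[i], NULL);
        printf("counter = %ld (expected 400000)\n", counter);
        return 0;
    }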
6. Example (MESI Protocol Flow)
        CPU1 writes to a block → changes state to Modified.
        Other caches mark block as Invalid.
        Next time another CPU reads, it fetches updated value → coherence maintained.
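A toy sketch of these transitions as a state enum and transition function. This is a simplification, not the full MESI protocol (the Exclusive-state transitions and bus transactions are omitted):

    #include <stdio.h>

    typedef enum { INVALID, SHARED, EXCLUSIVE, MODIFIED } mesi_t;

    /* Next state of a line in THIS cache for a given event.
       Events: 'R' local read, 'W' local write, 'S' snooped remote write. */
    mesi_t mesi_next(mesi_t s, char event) {
        switch (event) {
        case 'W': return MODIFIED;                     /* local write: take ownership  */
        case 'S': return INVALID;                      /* another CPU wrote: invalidate */
        case 'R': return (s == INVALID) ? SHARED : s;  /* local read: fetch as Shared   */
        }
        return s;
    }

    int main(void) {
        const char *name[] = { "Invalid", "Shared", "Exclusive", "Modified" };
        mesi_t cpu2 = SHARED;
        cpu2 = mesi_next(cpu2, 'S');   /* CPU1 writes the block -> CPU2 invalidates   */
        cpu2 = mesi_next(cpu2, 'R');   /* CPU2 reads again -> fetches the updated copy */
        printf("CPU2 line state: %s\n", name[cpu2]);
        return 0;
    }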
7. Diagram (Exam Sketch – Snoopy Protocol)
       +---------+          +---------+
       |  CPU1   |          |  CPU2   |
       |  Cache  |          |  Cache  |
       +---------+          +---------+
             \                  /
              \--- Shared Bus ---/
                       |
                  Main Memory
                               Three Generations of Multicomputers
1. Introduction
        A multicomputer is a parallel computer system consisting of multiple processors,
         each with its own local memory, connected by an interconnection network.
        Unlike multiprocessors (shared memory), multicomputers use message passing to
         communicate.
        The development of multicomputers can be classified into three generations, based
         on technology, interconnects, programming models, and performance goals.
2. First Generation (1980s – Early Multicomputers)
        Architecture:
            o Experimental designs with static interconnection topologies such as mesh,
                hypercube, ring, or torus.
            o Each node = processor + local memory.
        Communication:
            o Store-and-forward packet switching.
            o High latency, limited bandwidth.
        Programming Model:
            o Low-level message passing (send/receive calls).
            o No standard libraries. Programmer handled data distribution manually.
        Applications: Scientific and research computing.
        Examples: Intel iPSC (1985), nCUBE-10, Cosmic Cube.
👉 Limitations: Hard to program, lack of standards, limited scalability (hundreds of
processors at most).
3. Second Generation (1990s – Cluster-based Multicomputers)
      Architecture:
          o Clusters of workstations or PCs connected with high-speed networks.
          o Used commodity hardware (cheap processors + network cards).
      Communication:
          o Faster interconnects (Myrinet, Fast Ethernet).
          o Introduction of wormhole routing → lower latency than store-and-forward.
      Programming Model:
          o Standardized libraries: MPI (Message Passing Interface), PVM (Parallel
              Virtual Machine).
          o Easier programming with support for collective communication.
      Applications:
          o Weather forecasting, fluid dynamics, financial modeling, military simulations.
      Examples: IBM SP2, Intel Paragon, Beowulf clusters.
👉 Improvements: More scalable (thousands of processors), portable software, better
cost-performance ratio.
4. Third Generation (2000s – Present: HPC Clusters, Grids, and Clouds)
      Architecture:
          o Large-scale superclusters and supercomputers with thousands to millions
              of cores.
          o Integration of GPUs, accelerators, and multicore CPUs.
          o Support for grid computing and cloud computing.
      Communication:
          o Ultra-fast interconnects like InfiniBand, 10/40/100 Gigabit Ethernet, Omni-
              Path.
          o Low latency and high bandwidth with advanced routing + virtual channels.
      Programming Model:
           o Hybrid models: MPI + OpenMP (message passing between nodes + shared
              memory within a node); see the sketch after this section.
          o Support for distributed shared memory (DSM) and parallel programming
              frameworks.
          o Integration with cloud-based models for scalability.
      Applications:
          o AI/ML, Big Data Analytics, Molecular modeling, Astrophysics, Climate
              simulations, Quantum computing.
      Examples: IBM Blue Gene series, Cray XT, Tianhe-2 (China), Summit (USA),
       Fugaku (Japan).
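A minimal sketch of the hybrid MPI + OpenMP model referred to above: one MPI process per node, several shared-memory threads inside it. Compile with an MPI wrapper and OpenMP enabled, e.g. mpicc -fopenmp (illustrative only):

    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int provided, rank, size;
        /* Ask MPI for thread support so OpenMP threads can coexist with it. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        #pragma omp parallel
        {
            /* Each MPI process runs several shared-memory threads. */
            printf("rank %d of %d, thread %d of %d\n",
                   rank, size, omp_get_thread_num(), omp_get_num_threads());
        }

        MPI_Finalize();
        return 0;
    }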
                                  Message-Passing Mechanism
1. Introduction
      In multicomputers, each processor has its own local memory → no shared memory.
      Processors must communicate by sending and receiving messages over an
       interconnection network.
      This is known as the Message-Passing Mechanism.
👉 Used in parallel computing, clusters, and distributed systems.
2. Features
   1. Explicit Communication – Processes exchange data via send/receive.
   2. Synchronization – Communication ensures coordination among processes.
   3. Portability – Standard APIs like MPI (Message Passing Interface) make programs
      portable.
   4. Scalability – Well-suited for large-scale systems (clusters, supercomputers).
3. Basic Operations
   1. Send (destination, message) – Transmits message to another process.
   2. Receive (source, message) – Accepts incoming message.
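A minimal point-to-point sketch of these two operations using standard MPI_Send and MPI_Recv calls (run with at least two processes, e.g. mpirun -np 2; the value sent is illustrative):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, value;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            value = 42;
            /* Send(destination = 1, message = value) */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            /* Receive(source = 0, message = value) */
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("process 1 received %d from process 0\n", value);
        }

        MPI_Finalize();
        return 0;
    }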
4. Types of Message Passing
   1. Synchronous vs Asynchronous
         o Synchronous: Sender waits until receiver acknowledges.
         o Asynchronous: Sender continues without waiting.
   2. Buffered vs Unbuffered
         o Buffered: Messages stored temporarily in system buffer.
         o Unbuffered: Direct handoff between sender and receiver.
   3. Direct vs Indirect
         o Direct: Sender specifies the exact receiver.
         o Indirect: Messages go via mailboxes/queues.
5. Message-Passing Models
      Point-to-Point – One sender ↔ one receiver.
      Broadcast / Multicast – One sender → multiple receivers.
      Collective Communication – Group operations (scatter, gather, reduce).
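A short sketch of collective communication: process 0 broadcasts a value to everyone, then MPI_Reduce sums each process's rank back at process 0 (illustrative; works for any number of processes):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, size, sum = 0;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Broadcast: process 0 shares a value with all processes. */
        int seed = (rank == 0) ? 7 : 0;
        MPI_Bcast(&seed, 1, MPI_INT, 0, MPI_COMM_WORLD);

        /* Reduce: sum every process's rank into 'sum' on process 0. */
        MPI_Reduce(&rank, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("seed=%d, sum of ranks 0..%d = %d\n", seed, size - 1, sum);

        MPI_Finalize();
        return 0;
    }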
6. Advantages
      Works in systems without shared memory.
      Scalable to thousands of processors.
      Standardized libraries (MPI, PVM) make programming easier.
7. Limitations
      Overhead due to copying messages and communication delays.
      More complex programming than shared memory.
8. Examples
      MPI (Message Passing Interface) – De facto standard for HPC.
      PVM (Parallel Virtual Machine) – Early library for cluster computing.
      Sockets – Used in distributed applications.
9. Diagram (Exam Sketch)
   Processor A + Memory --(Send)--> Interconnection Network --(Receive)--> Processor B + Memory