
UNIT - IV

Parallel and Scalable Architectures, Multiprocessors and Multicomputers, Multiprocessor system interconnects, cache coherence and synchronization mechanisms, Three Generations of Multicomputers, Message-passing Mechanisms, Multivector and SIMD computers.

Parallel and Scalable Architectures (10 Marks)

1. Introduction

 Parallel architecture refers to computer systems that use multiple processing elements (PEs) to perform tasks simultaneously.
 Scalable architecture means the system can increase performance proportionally as processors are added, without major redesign.

Such architectures are the backbone of supercomputers, data centers, AI/ML workloads,
and scientific simulations.

2. Need for Parallel and Scalable Architectures

 Increasing demand for high-performance computing (HPC).
 Limitations of single-core sequential processing (Von Neumann bottleneck).
 To support large-scale data processing, graphics, AI, weather forecasting, simulations.

3. Characteristics

1. Multiple Processing Elements: Can be CPUs, GPUs, or vector processors.
2. Levels of Parallelism:
o Instruction-level (ILP) → pipelining.
o Data-level (DLP) → SIMD, vector.
o Task-level (TLP) → independent processes in parallel.
o Thread-level (Multithreading).
3. Interconnection Network: Provides communication between processors (Bus, Crossbar, Mesh, Hypercube).
4. Scalability: System must maintain efficiency as processors grow.

4. Types of Parallel Architectures

1. Shared Memory Multiprocessors (SMP) (see the shared-memory code sketch after this list):
o All processors share a common memory.
o Easy programming but limited scalability.
o Example: Intel Xeon multiprocessor.
2. Distributed Memory Multicomputers:
o Each processor has its own local memory.
o Communication via message passing (MPI, PVM).
o Scalable but harder to program.
o Example: IBM SP2, Beowulf clusters.
3. Hybrid Architectures:
o Combine shared + distributed memory.
o Used in modern supercomputers.
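As a rough illustration of the shared-memory model in item 1, the following C sketch uses OpenMP so that all threads operate on one common array in a single address space; the array name a, its size, and the loop bounds are illustrative only, not taken from any particular system. In a distributed-memory multicomputer (item 2) the same work would instead be split across processes that exchange partial results through explicit messages, as in the MPI sketches later in this unit.

#include <stdio.h>
#include <omp.h>

#define N 1000000

int main(void) {
    static double a[N];
    double sum = 0.0;

    /* All threads share the same array 'a' in one address space (SMP model). */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        a[i] = i * 0.5;

    /* The reduction clause handles the synchronization on 'sum'. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += a[i];

    printf("sum = %f, threads available = %d\n", sum, omp_get_max_threads());
    return 0;
}

Compiled with something like gcc -fopenmp, every thread sees the same array because memory is shared; the scalability limit is the shared memory/bus bandwidth, which is exactly the SMP limitation noted above.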

5. Scalability

 A system is scalable if:
o Performance grows as processors are added.
o Communication overhead is minimal.
o Memory system supports large-scale data access.
 Scalability Models (a worked calculation follows this list):
o Amdahl’s Law: pessimistic (fixed workload).
o Gustafson’s Law: optimistic (scaled workload).
o Sun–Ni Law: considers workload + memory constraints.
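As a worked example of the first two models: Amdahl's speedup for a serial fraction s on p processors is S(p) = 1 / (s + (1 - s)/p), while Gustafson's scaled speedup is S(p) = p - s*(p - 1). The small C sketch below simply evaluates both; the serial fraction of 0.1 is an assumed example value.

#include <stdio.h>

/* Amdahl: fixed workload, serial fraction s limits speedup. */
double amdahl(double s, int p)    { return 1.0 / (s + (1.0 - s) / p); }

/* Gustafson: workload scales with p, so the serial part matters less. */
double gustafson(double s, int p) { return p - s * (p - 1); }

int main(void) {
    double s = 0.1;                      /* example: 10% serial code */
    int procs[] = {2, 8, 64, 1024};
    for (int i = 0; i < 4; i++) {
        int p = procs[i];
        printf("p=%4d  Amdahl=%6.2f  Gustafson=%7.2f\n",
               p, amdahl(s, p), gustafson(s, p));
    }
    return 0;
}

For large p the Amdahl curve flattens near 1/s (10x here), while the Gustafson curve keeps growing, which is why the notes label them pessimistic and optimistic.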

6. Advantages

 Increased speedup and throughput.
 Handles large problem sizes efficiently.
 Supports parallel programming models (OpenMP, CUDA, MPI).
 Flexible for scientific, AI, and real-time applications.

7. Challenges

 Synchronization among processors.
 Cache coherence problem in shared memory.
 Communication overhead in distributed memory.
 Load balancing across processors.

8. Diagram
+-----------+   +-----------+       +-----------+
| Processor |   | Processor |  ...  | Processor |
+-----------+   +-----------+       +-----------+
      \               |                 /
       ------- Interconnection --------
        Shared / Distributed Memory

Multiprocessor System Interconnects

1. Introduction

 In multiprocessor systems, multiple CPUs need to communicate with each other and with memory & I/O devices.
 The interconnection network provides this communication path.
 A good interconnect should be fast, reliable, scalable, and cost-effective.

2. Requirements of Interconnects

1. High Bandwidth – To support many processors.
2. Low Latency – Fast data transfer between processors.
3. Scalability – Should work efficiently as the system grows.
4. Fault Tolerance – Ability to reroute if a link fails.

3. Types of Interconnection Networks


A. Bus-based Interconnect

 All processors share a common communication bus.
 Advantages: Simple, low cost.
 Disadvantages: Bus contention → performance drops with more CPUs.
 Example: Early multiprocessors.

CPU1 --\
CPU2 ----> [ Shared Bus ] ---> Memory / I/O
CPU3 --/

B. Crossbar Switch

 Every processor has a direct path to every memory module via switches.
 Advantages: High speed, no contention (if enough switches).
 Disadvantages: Very expensive for large systems.

CPU1 ---|X|--- M1
CPU2 ---|X|--- M2
CPU3 ---|X|--- M3

C. Multistage Networks (Indirect Interconnects)

 Use multiple switching stages (e.g., Omega, Butterfly, Clos).
 Advantages: Cheaper than crossbar, scalable.
 Disadvantages: Possible blocking (two requests may collide).

D. Topology-based Interconnects (for Multicomputers)

1. Ring: Each processor connected to two neighbors (simple but slow).
2. Mesh / Torus: Processors in grid form, scalable, used in clusters.
3. Hypercube: Each node connected to log2(N) neighbors; very scalable (see the neighbor-listing sketch after this list).
4. Tree / Fat-tree: Hierarchical, good for large systems.
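To make the hypercube property concrete: if node numbers are written in binary, node i is directly connected to every node that differs from i in exactly one bit, which gives log2(N) neighbors. A minimal C sketch (the dimension 3 and node number 5 are arbitrary example values):

#include <stdio.h>

int main(void) {
    int dim = 3;                  /* 3-dimensional hypercube -> N = 8 nodes */
    int node = 5;                 /* example node, binary 101               */

    /* Flipping each of the 'dim' bits yields one directly connected neighbor. */
    printf("Neighbors of node %d in a %d-cube:", node, dim);
    for (int k = 0; k < dim; k++)
        printf(" %d", node ^ (1 << k));
    printf("\n");                 /* prints: 4 7 1                          */
    return 0;
}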

Cache Coherence and Synchronization Mechanism

1. Introduction

 In multiprocessor systems, each CPU may have its own cache to reduce memory
access time.
 Problem: Inconsistency occurs when multiple caches store different copies of the
same memory location.
 Solution: Cache coherence protocols + synchronization mechanisms ensure data
consistency and orderly access.

2. Cache Coherence Problem

 Example:
o CPU1 and CPU2 cache variable X.
o CPU1 updates X = 5, but CPU2’s cache still has X = 2.
o ❌ → Inconsistent view of memory.

Conditions for Cache Coherence

1. Write Propagation: Updates to a variable must be visible to all processors.
2. Transaction Serialization: All processors must see memory operations in the same order.

3. Cache Coherence Protocols


A. Write Policies

 Write-through: Update both cache and memory (slower, but simple).
 Write-back: Update only cache, write to memory later (efficient, but complex).

B. Protocol Types

1. Directory-based Protocols
o A directory keeps track of which caches store each memory block.
o Centralized control → scalable for large systems.
2. Snoopy Protocols
o All caches monitor (snoop) a common bus.
o If one CPU updates, others invalidate or update their cache copies.
o Suitable for bus-based multiprocessors.

4. Popular Snoopy Protocols

 MSI (Modified, Shared, Invalid)
 MESI (Modified, Exclusive, Shared, Invalid) → common in Intel processors.
 MOESI and MESIF → advanced variants.

5. Synchronization Mechanism

Ensures orderly and mutually exclusive access to shared data/resources.

A. Hardware Mechanisms

1. Locks/Atomic Instructions:
o Test-and-Set, Compare-and-Swap, Fetch-and-Add → atomic updates (a spinlock sketch follows this list).
2. Barriers: All processors wait until every processor reaches a synchronization point.
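As a sketch of how a Test-and-Set style atomic instruction becomes a lock, the C11 code below spins on atomic_flag_test_and_set until it acquires the flag; the names my_lock, my_unlock, and counter are illustrative, and a real system would add back-off to reduce bus traffic.

#include <stdatomic.h>
#include <pthread.h>
#include <stdio.h>

static atomic_flag lock = ATOMIC_FLAG_INIT;
static long counter = 0;

/* Spin until the test-and-set returns 0, i.e. this thread set the flag first. */
static void my_lock(void)   { while (atomic_flag_test_and_set(&lock)) ; }
static void my_unlock(void) { atomic_flag_clear(&lock); }

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        my_lock();                /* critical section: exclusive access */
        counter++;
        my_unlock();
    }
    return NULL;
}

int main(void) {
    pthread_t t[4];
    for (int i = 0; i < 4; i++) pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < 4; i++) pthread_join(t[i], NULL);
    printf("counter = %ld (expected 400000)\n", counter);
    return 0;
}

Build with -pthread; the final count is correct only because every increment happens inside the lock.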

B. Software Mechanisms

1. Semaphores & Mutexes: Control access to critical sections.
2. Monitors & Condition Variables: High-level synchronization constructs (see the sketch below).

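A comparable software-level sketch using POSIX threads: a mutex guards the shared flag (item 1) and a condition variable provides the monitor-style wait/signal (item 2); the flag name ready is illustrative.

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  c = PTHREAD_COND_INITIALIZER;
static int ready = 0;             /* shared flag protected by the mutex */

static void *producer(void *arg) {
    (void)arg;
    pthread_mutex_lock(&m);
    ready = 1;                    /* critical section: update shared state */
    pthread_cond_signal(&c);      /* wake up the waiting consumer          */
    pthread_mutex_unlock(&m);
    return NULL;
}

int main(void) {
    pthread_t t;
    pthread_create(&t, NULL, producer, NULL);

    pthread_mutex_lock(&m);
    while (!ready)                /* monitor-style wait on a condition     */
        pthread_cond_wait(&c, &m);
    pthread_mutex_unlock(&m);

    printf("consumer saw ready = %d\n", ready);
    pthread_join(t, NULL);
    return 0;
}
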
6. Example (MESI Protocol Flow)

 CPU1 writes to a block → changes state to Modified.
 Other caches mark block as Invalid.
 Next time another CPU reads, it fetches updated value → coherence maintained (a toy state-table sketch follows).
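The flow above can be mimicked with a toy state table; this is only a didactic C sketch of the invalidation and read-miss steps, not a real coherence controller, and the two-entry cache array is an assumption made for illustration.

#include <stdio.h>

typedef enum { INVALID, SHARED, EXCLUSIVE, MODIFIED } State;
static const char *name[] = { "Invalid", "Shared", "Exclusive", "Modified" };

int main(void) {
    State cache[2] = { SHARED, SHARED };   /* CPU1 and CPU2 both cache block X */

    /* CPU1 writes X: its copy becomes Modified, the snooped copy is invalidated. */
    cache[0] = MODIFIED;
    cache[1] = INVALID;
    printf("after CPU1 write : CPU1=%s  CPU2=%s\n", name[cache[0]], name[cache[1]]);

    /* CPU2 reads X: CPU1 supplies the updated block, both end up Shared. */
    cache[0] = SHARED;
    cache[1] = SHARED;
    printf("after CPU2 read  : CPU1=%s  CPU2=%s\n", name[cache[0]], name[cache[1]]);
    return 0;
}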

7. Diagram (Exam Sketch – Snoopy Protocol)


+---------+              +---------+
|  CPU1   |              |  CPU2   |
|  Cache  | <----------> |  Cache  |
+---------+              +---------+
      \                      /
       \----- Shared Bus ---/
                |
           Main Memory

Three Generations of Multicomputers

1. Introduction

 A multicomputer is a parallel computer system consisting of multiple processors, each with its own local memory, connected by an interconnection network.
 Unlike multiprocessors (shared memory), multicomputers use message passing to communicate.
 The development of multicomputers can be classified into three generations, based on technology, interconnects, programming models, and performance goals.

2. First Generation (1980s – Early Multicomputers)

 Architecture:
o Experimental designs with static interconnection topologies such as mesh,
hypercube, ring, or torus.
o Each node = processor + local memory.
 Communication:
o Store-and-forward packet switching.
o High latency, limited bandwidth.
 Programming Model:
o Low-level message passing (send/receive calls).
o No standard libraries. Programmer handled data distribution manually.
 Applications: Scientific and research computing.
 Examples: Intel iPSC (1985), nCUBE-10, Cosmic Cube.

👉 Limitations: Hard to program, lack of standards, limited scalability (hundreds of processors at most).

3. Second Generation (1990s – Cluster-based Multicomputers)

 Architecture:
o Clusters of workstations or PCs connected with high-speed networks.
o Used commodity hardware (cheap processors + network cards).
 Communication:
o Faster interconnects (Myrinet, Fast Ethernet).
o Introduction of wormhole routing → lower latency than store-and-forward.
 Programming Model:
o Standardized libraries: MPI (Message Passing Interface), PVM (Parallel
Virtual Machine).
o Easier programming with support for collective communication.
 Applications:
o Weather forecasting, fluid dynamics, financial modeling, military simulations.
 Examples: IBM SP2, Intel Paragon, Beowulf clusters.

👉 Improvements: More scalable (thousands of processors), portable software, better cost-performance ratio.

4. Third Generation (2000s – Present: HPC Clusters, Grids, and Clouds)

 Architecture:
o Large-scale superclusters and supercomputers with thousands to millions
of cores.
o Integration of GPUs, accelerators, and multicore CPUs.
o Support for grid computing and cloud computing.
 Communication:
o Ultra-fast interconnects like InfiniBand, 10/40/100 Gigabit Ethernet, Omni-Path.
o Low latency and high bandwidth with advanced routing + virtual channels.
 Programming Model:
o Hybrid models: MPI + OpenMP (message passing + shared memory).
o Support for distributed shared memory (DSM) and parallel programming
frameworks.
o Integration with cloud-based models for scalability.
 Applications:
o AI/ML, Big Data Analytics, Molecular modeling, Astrophysics, Climate
simulations, Quantum computing.
 Examples: IBM Blue Gene series, Cray XT, Tianhe-2 (China), Summit (USA),
Fugaku (Japan).

Message-Passing Mechanism

1. Introduction

 In multicomputers, each processor has its own local memory → no shared memory.
 Processors must communicate by sending and receiving messages over an
interconnection network.
 This is known as the Message-Passing Mechanism.

👉 Used in parallel computing, clusters, and distributed systems.

2. Features

1. Explicit Communication – Processes exchange data via send/receive.
2. Synchronization – Communication ensures coordination among processes.
3. Portability – Standard APIs like MPI (Message Passing Interface) make programs portable.
4. Scalability – Well-suited for large-scale systems (clusters, supercomputers).

3. Basic Operations

1. Send (destination, message) – Transmits message to another process.
2. Receive (source, message) – Accepts incoming message (a minimal MPI sketch follows).
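A minimal MPI sketch of these two primitives, assuming an MPI installation (mpicc, mpirun) and exactly two processes; the message value 42 and tag 0 are arbitrary.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);   /* Send(destination, message) */
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);                           /* Receive(source, message)   */
        printf("process 1 received %d from process 0\n", value);
    }

    MPI_Finalize();
    return 0;
}

Run with something like mpirun -np 2 ./a.out; rank 0 plays the sender and rank 1 the receiver.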

4. Types of Message Passing

1. Synchronous vs Asynchronous (a non-blocking MPI sketch follows this list)
o Synchronous: Sender waits until receiver acknowledges.
o Asynchronous: Sender continues without waiting.
2. Buffered vs Unbuffered
o Buffered: Messages stored temporarily in system buffer.
o Unbuffered: Direct handoff between sender and receiver.
3. Direct vs Indirect
o Direct: Sender specifies the exact receiver.
o Indirect: Messages go via mailboxes/queues.
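To illustrate the synchronous/asynchronous distinction in MPI terms: a blocking send holds the sender until the message is handed over, whereas the non-blocking MPI_Isend returns immediately and is completed later with MPI_Wait. A two-process sketch under the same assumptions as before; the data value is illustrative.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, data = 7, recvbuf;
    MPI_Request req;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* Asynchronous: returns at once, so the sender may keep computing ... */
        MPI_Isend(&data, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
        /* ... and only here does it wait for the transfer to complete.        */
        MPI_Wait(&req, MPI_STATUS_IGNORE);
    } else if (rank == 1) {
        MPI_Recv(&recvbuf, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("received %d\n", recvbuf);
    }
    MPI_Finalize();
    return 0;
}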

5. Message-Passing Models

 Point-to-Point – One sender ↔ one receiver.
 Broadcast / Multicast – One sender → multiple receivers.
 Collective Communication – Group operations (scatter, gather, reduce); see the sketch after this list.
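A short sketch of two collective operations from the list above, a broadcast followed by a reduction onto rank 0; the root rank and the value 100 are assumed for illustration.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size, n = 0, sum = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) n = 100;                         /* root decides the value   */
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);   /* broadcast: one -> all    */

    int local = rank * n;                           /* every process contributes */
    MPI_Reduce(&local, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("reduced sum over %d processes = %d\n", size, sum);
    MPI_Finalize();
    return 0;
}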

6. Advantages

 Works in systems without shared memory.
 Scalable to thousands of processors.
 Standardized libraries (MPI, PVM) make programming easier.

7. Limitations

 Overhead due to copying messages and communication delays.
 More complex programming than shared memory.

8. Examples

 MPI (Message Passing Interface) – De facto standard for HPC.
 PVM (Parallel Virtual Machine) – Early library for cluster computing.
 Sockets – Used in distributed applications.

9. Diagram (Exam Sketch)

Processor A + Memory ----> Interconnection Network ----> Processor B + Memory
      (Send)                                                   (Receive)
