PDC: MCQs and Short Questions on Parallel and Distributed Computing

1. Introduction to Parallel Computing

Q1. What is the main goal of parallel computing?

A. To reduce memory usage

B. To reduce input/output operations

C. To increase processing speed by dividing tasks

D. To reduce the number of processors used

Answer: C

Q2. Which of the following is NOT a type of parallel computing architecture?

A. SIMD

B. MIMD

C. MISD

D. BIOS

Answer: D

Q3. Which of the following is a benefit of parallel computing?

A. Increased energy consumption

B. Longer execution time

C. Efficient use of multiple processors

D. Less hardware cost

Answer: C

Q4. Which component is responsible for distributing tasks in parallel computing?

A. Compiler

B. Scheduler
C. Loader

D. Decoder

Answer: B

Q5. In parallel computing, what does "granularity" refer to?

A. The size of the processor

B. The number of bits processed

C. The amount of computation between communication

D. The voltage of the system

Answer: C

2. History and Evolution of Parallel Computing

Q6. Who first proposed the idea of parallel computing?

A. Alan Turing

B. Seymour Cray

C. Michael Flynn

D. John von Neumann

Answer: C

Q7. Flynn's Taxonomy is used to classify:

A. Processor clock speeds

B. Memory types

C. Computer architectures based on instruction and data streams

D. Data storage systems

Answer: C
Q8. Which of the following represents the earliest form of parallel computing?

A. Supercomputers

B. Multicore processors

C. Vector processors

D. Punch card systems

Answer: C

Q9. Which development significantly advanced parallel computing in the 2000s?

A. Quantum computing

B. Multicore CPUs

C. Vacuum tubes

D. Mechanical computers

Answer: B

Q10. What role did supercomputers like the Cray-1 play in the history of parallel computing?

A. First computer to use vacuum tubes

B. Introduced single-core technology

C. One of the earliest successful vector processors

D. First laptop with parallel ports

Answer: C

In parallel computing, granularity refers to the size of the computational tasks being performed
concurrently. It essentially measures the ratio of computation to communication, with coarse-grained
parallelism involving larger tasks and less frequent communication, and fine-grained parallelism
involving smaller tasks and more frequent communication.

More Details:

Coarse-grained parallelism:
This involves breaking a problem into a relatively small number of large, independent tasks that can be
executed concurrently. Communication and synchronization between these large tasks are less
frequent.

Fine-grained parallelism:

This involves breaking a problem into a large number of very small, independent tasks that can be
executed concurrently. Communication and synchronization between these smaller tasks are more
frequent.

Impact on performance:

The choice of granularity can significantly impact performance. Fine-grained parallelism can lead to
higher communication overhead and potential performance bottlenecks, while coarse-grained
parallelism may result in underutilization of resources or load imbalance.

Finding the right balance:

The ideal granularity for a given problem depends on factors such as the algorithm's characteristics, the
communication overhead, and the available resources. Finding the optimal balance between
computation and communication is crucial for maximizing parallel performance.
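To make the two ends of the granularity spectrum concrete, here is a minimal OpenMP sketch (array size and chunk sizes are illustrative): schedule(static) hands each thread one large contiguous block (coarse-grained), while schedule(dynamic, 1) hands out one iteration at a time (fine-grained, with more scheduling overhead).

#include <stdio.h>
#include <omp.h>

#define N 1000000

int main(void) {
    static double a[N];

    /* Coarse-grained: each thread receives one large contiguous chunk,
       so scheduling overhead is paid only once per thread. */
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < N; i++)
        a[i] = i * 2.0;

    /* Fine-grained: iterations are handed out one at a time, which
       balances load better but adds per-chunk scheduling overhead. */
    #pragma omp parallel for schedule(dynamic, 1)
    for (int i = 0; i < N; i++)
        a[i] += 1.0;

    printf("a[N-1] = %f\n", a[N - 1]);
    return 0;
}

Compile with an OpenMP-capable compiler, e.g. gcc -fopenmp.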

A vector processor, also known as an array processor, is a central processing unit (CPU) designed to
efficiently perform operations on large one-dimensional arrays of data called vectors. It's a type of
parallel processor that executes instructions on multiple data elements simultaneously, unlike scalar
processors which operate on individual data points.

Key Characteristics of Vector Processors:

Parallelism:

Vector processors exploit data-level parallelism: a single instruction operates on many data
elements at once, rather than relying on several independent processors working concurrently.

Vector Instructions:

They have instructions designed to operate on entire vectors, treating them as a single unit.

Single Instruction, Multiple Data (SIMD):

A core principle of vector processing is SIMD, where a single instruction operates on multiple data
elements simultaneously.

Pipeline:

Vector processors often employ pipelining to achieve fine-grained parallelism, latency hiding, and
amortized control overheads.
Applications:

They are well-suited for data-intensive applications like image processing, scientific simulations, and
artificial intelligence.

How they differ from scalar processors:

Scalar processors operate on individual data elements, while vector processors operate on entire
vectors.

Scalar instructions are designed for single-data-element operations, while vector instructions are
optimized for parallel operations on vectors.

Scalar processors typically have a simpler architecture, whereas vector processors include
specialized hardware for vector processing.

Examples of Vector Processor Applications:

Image processing:

Tasks like filtering, transformation, and analysis of large image datasets.

Scientific simulations:

Modeling physical phenomena, weather forecasting, and other complex simulations.

Artificial intelligence:

Machine learning tasks involving large datasets and matrix operations.

Supercomputers:

Early vector processors were common in supercomputers designed for high-performance computing.

Advantages of Vector Processing:

Increased Performance:

Vector processing can significantly improve performance for data-intensive tasks by leveraging
parallelism.

Efficiency:

They can execute operations on large datasets more efficiently than scalar processors.
Simplicity:

Vector instructions can often simplify code and reduce the number of instructions needed.

Disadvantages of Vector Processing:

Complexity: Programming for vector processors can be complex, requiring specialized knowledge.

Cost: Specialized hardware for vector processing can be expensive.

Limited Applicability: Not all applications are suitable for vector processing, as some tasks are inherently
sequential.
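The scalar/vector contrast shows up even in ordinary C. The loop below is a plain SAXPY kernel; because its iterations are independent, a vectorizing compiler (e.g. gcc -O3) can map it to SIMD/vector instructions that process several elements per operation instead of one. This is an illustrative sketch, not tied to any specific vector machine.

#include <stddef.h>

/* SAXPY: y = a*x + y. A scalar processor handles one element per
   operation; a vector (SIMD) unit applies the multiply-add to a whole
   group of elements at once. */
void saxpy(size_t n, float a, const float *x, float *y) {
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}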

A multicore processor is a CPU chip with two or more independent processing units, called cores,
on a single chip. Each core can execute instructions independently, allowing the processor to
handle multiple tasks or threads concurrently, improving performance and multitasking compared
to single-core processors.

Elaboration:

• Multiple cores: a multicore processor contains multiple physical processing units (cores) on a single chip.
• Independent execution: each core can execute instructions independently, enabling the processor to perform multiple tasks or threads simultaneously.
• Improved performance: this simultaneous execution allows faster processing of multiple tasks, improving overall performance and responsiveness.
• Multithreading and parallel processing: multicore processors suit workloads that can be broken into smaller tasks and processed in parallel by different cores, such as video editing, rendering, or scientific simulations.
• Efficiency: although an individual core may be no faster than a single-core processor, the ability to process multiple tasks concurrently makes multicore processors more efficient for many workloads.
In parallel computing, a scheduler manages and distributes tasks across
multiple processors or cores to improve processing speed and efficiency. It
determines which tasks are executed, when, and on which processor, aiming
to minimize overall execution time and optimize resource utilization. Different
scheduling algorithms exist, each with its strengths and weaknesses,
impacting performance and fairness.

Key aspects of scheduling in parallel computing:

• Task allocation: the scheduler decides which tasks (or jobs) are assigned to which processors.
• Execution order: the scheduler determines the sequence in which tasks are executed on each processor.
• Resource management: the scheduler manages resources like processors, memory, and communication channels.
• Optimization: the scheduler aims to minimize overall execution time, resource contention, and other performance costs.

Types of scheduling algorithms:

• First-Come, First-Served (FCFS): tasks are processed in the order they arrive.
• Shortest Job First (SJF): tasks with the shortest processing time are executed first.
• Round Robin: each task is given a fixed time slice on a processor before being returned to the back of the queue (a minimal simulation follows this list).
• Priority-based scheduling: tasks are assigned priorities, and higher-priority tasks are executed first.
• List scheduling: tasks are ordered in a priority list and dispatched from the head of the list as processors become free.
• Adaptive scheduling: the algorithm adjusts to changes in workload and system conditions at runtime.

Benefits of parallel computing scheduling:

• Reduced execution time: distributing tasks across multiple processors can significantly reduce overall execution time.
• Improved resource utilization: schedulers keep processors busy and use resources efficiently.
• Increased throughput: more tasks can be processed within a given time period.
• Enhanced scalability: systems can handle larger workloads by adding more processors.
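The round-robin policy above can be shown in a few lines of C. This is a toy simulation with hypothetical task lengths and a fixed quantum; a real scheduler would also model arrival times, priorities, and context-switch cost.

#include <stdio.h>

#define NTASKS  4
#define QUANTUM 3   /* fixed time slice per turn */

int main(void) {
    int remaining[NTASKS] = {7, 4, 9, 2};  /* hypothetical run times */
    int clock = 0, done = 0;

    /* Cycle through the task queue, giving each unfinished task at most
       QUANTUM time units before moving on to the next one. */
    while (done < NTASKS) {
        for (int t = 0; t < NTASKS; t++) {
            if (remaining[t] == 0) continue;
            int slice = remaining[t] < QUANTUM ? remaining[t] : QUANTUM;
            clock += slice;
            remaining[t] -= slice;
            printf("t=%2d: task %d ran %d unit(s), %d left\n",
                   clock, t, slice, remaining[t]);
            if (remaining[t] == 0) done++;
        }
    }
    return 0;
}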

1. Introduction to Parallel Computing (Advanced MCQs)

Q1. In a shared memory parallel system, which of the following problems is most likely to occur
when multiple threads access the same memory location?
A. Deadlock
B. Starvation
C. Race condition
D. Paging fault
Answer: C

Q2. Which model of parallelism is best suited for problems with large data sets but relatively
simple computations on each data element?
A. Task Parallelism
B. Instruction-Level Parallelism
C. Data Parallelism
D. Bit-Level Parallelism
Answer: C

Q3. Which of the following scenarios would benefit the most from fine-grained parallelism?
A. Performing I/O operations in a database
B. Rendering frames in a video game engine
C. Weather prediction models using matrix computations
D. Batch processing of large logs
Answer: C

Q4. Amdahl’s Law assumes that:


A. The speedup increases linearly with the number of processors.
B. The problem size remains constant regardless of processor count.
C. There is no serial portion in a parallel algorithm.
D. Communication overhead is negligible.
Answer: B

Q5. What is a limitation of Amdahl’s Law in real-world parallel systems?


A. It doesn’t account for task dependencies.
B. It underestimates serial portions.
C. It ignores scalability with increasing problem sizes.
D. It assumes all processors are identical.
Answer: C
(This limitation is addressed by Gustafson's Law.)

2. History and Evolution of Parallel Computing (Advanced MCQs)

Q6. Which of the following best describes the evolution from vector processors to massively
parallel processors (MPPs)?
A. From control-driven to data-driven execution
B. From MIMD to SIMD models
C. From centralized memory to cache-based memory
D. From data-parallel to instruction-level parallelism
Answer: A

Q7. Flynn’s Taxonomy categorizes architectures using which two parameters?


A. Number of CPUs and instruction sets
B. Instruction stream and data stream
C. Clock speed and bus width
D. Thread count and memory layout
Answer: B

Q8. Which architectural advancement in the early 2000s led to widespread use of parallel
computing in consumer devices?
A. Introduction of FPGAs
B. Launch of GPU co-processors for AI
C. Advent of multicore CPUs
D. Rise of cloud computing
Answer: C
Q9. The introduction of GPU computing as a parallel computing paradigm is associated with
which company and architecture?
A. Intel and Itanium
B. IBM and PowerPC
C. AMD and Bulldozer
D. NVIDIA and CUDA
Answer: D

Q10. Which of the following statements about MIMD architecture is FALSE?


A. It supports asynchronous execution.
B. Each processor may execute a different instruction on different data.
C. It is suitable for task parallelism.
D. All processors share a single instruction stream.
Answer: D
(MIMD has multiple instruction streams.)

1. Introduction to Parallel Computing (Advanced Level Continued)

Q11. In parallel computing, which of the following primarily affects the efficiency of
synchronization between threads?
A. Cache coherency
B. Network topology
C. Thread stack size
D. Instruction pipelining
Answer: A
Cache coherence ensures consistent view of shared memory, crucial during synchronization.

Q12. Which technique is used to overlap computation and communication in parallel programs?
A. Vectorization
B. Asynchronous communication
C. Instruction pipelining
D. Thread contention
Answer: B

Q13. Which metric best measures the scalability of a parallel program?


A. Instruction Per Cycle (IPC)
B. Speedup ratio
C. Efficiency
D. Parallel overhead
Answer: C
Efficiency = Speedup / Number of processors — reflects scalability.

Q14. Which type of parallelism is commonly used in GPU programming models like CUDA?
A. Task parallelism
B. Thread-level parallelism
C. Data-level parallelism
D. Memory-level parallelism
Answer: C
CUDA is built on SIMD-like data-parallel execution.

Q15. In multithreaded parallel programming, which method is commonly used to avoid race
conditions?
A. Fork-Join Model
B. Locks and Mutexes
C. Memory paging
D. Static Scheduling
Answer: B

✅ 2. History and Evolution of Parallel Computing (Advanced Level Continued)

Q16. Which of the following early computers first implemented pipelining, a concept critical to
later parallel architectures?
A. ENIAC
B. Cray-1
C. IBM System/360
D. CDC 6600
Answer: D
CDC 6600 introduced pipelining for instruction execution.

Q17. Which statement best describes Flynn’s MISD architecture?


A. Rarely used in practice; theoretical concept
B. Suitable for parallel matrix multiplication
C. Commonly found in web servers
D. Dominant model for modern GPUs
Answer: A
Q18. The Von Neumann bottleneck affects which aspect of traditional computer architecture?
A. Power consumption in arithmetic units
B. Memory bandwidth and instruction throughput
C. GPU compute cores
D. Control unit scheduling
Answer: B
The Von Neumann bottleneck limits performance due to sequential access between memory and
CPU.

Q19. Which of the following is an example of a loosely coupled parallel system?


A. Multi-core CPU
B. Thread pool in shared memory
C. Distributed system using message passing
D. GPU cluster using shared bus
Answer: C

Q20. What major shift allowed parallel computing to enter the mainstream consumer market?
A. Transition from mechanical to electronic computers
B. Inclusion of parallelism in OS scheduling
C. Development of VLIW architectures
D. Integration of multicore CPUs in personal computers
Answer: D

3. Types of Parallelism

🔹 3.1 Bit-Level Parallelism

Q1. Bit-level parallelism improves performance by:


A. Adding more processors to the system
B. Executing multiple instructions simultaneously
C. Using wider word sizes to process more bits per operation
D. Reducing memory access latency
Answer: ✅ C

Q2. Which hardware improvement most directly contributes to bit-level parallelism?


A. Increased cache memory
B. Larger register width (e.g., 32-bit to 64-bit)
C. GPU integration
D. Instruction pipelining
Answer: ✅ B
🔹 3.2 Instruction-Level Parallelism (ILP)

Q3. ILP is commonly achieved through techniques like:


A. Thread-level scheduling
B. Vectorization
C. Pipelining and out-of-order execution
D. Message passing
Answer: ✅ C

Q4. In ILP, what limits the extent to which instructions can be executed in parallel?
A. Bit width
B. Data dependencies
C. Clock frequency
D. RAM size
Answer: ✅ B

🔹 3.3 Data Parallelism

Q5. Data parallelism involves:


A. Running the same operation on different parts of a data set
B. Executing different tasks in parallel
C. Using vector instructions for serial operations
D. Sharing data between sequential threads
Answer: ✅ A

Q6. Which of the following applications is most suited for data parallelism?
A. Web crawling
B. Sorting distributed files
C. Image processing (e.g., applying a filter to all pixels)
D. Compiling source code
Answer: ✅ C

🔹 3.4 Task Parallelism


Q7. Task parallelism is characterized by:
A. Same operation on multiple data sets
B. Different tasks running concurrently on different processors
C. Bit-level operations within ALU
D. Executing one instruction per cycle
Answer: ✅ B

Q8. Which of the following is an example of task parallelism?


A. Processing different bank transactions in parallel
B. Applying the same transformation to multiple images
C. Multiplying elements of a vector
D. Increasing the size of memory pages
Answer: ✅ A

🔹 3.5 Pipeline Parallelism

Q9. Pipeline parallelism is best described as:


A. Executing multiple unrelated tasks in parallel
B. Executing the same instruction across different processors
C. Breaking a task into subtasks and executing them in sequence but overlapping in time
D. Scheduling instruction execution in software
Answer: ✅ C

Q10. Which real-world analogy best describes pipeline parallelism?


A. One chef cooking a dish alone
B. Several workers performing different tasks on an assembly line
C. A classroom where everyone studies the same chapter
D. Sending a single file to one printer
Answer: ✅ B

✅ 4. Flynn’s Taxonomy in Parallel Computing

🔹 4.1 SISD

Q11. SISD architecture corresponds to:


A. One processor executing multiple instructions on one data stream
B. One processor executing one instruction at a time on one data stream
C. Multiple processors working on the same data
D. Vector processing
Answer: ✅ B

🔹 4.2 SIMD

Q12. In SIMD architecture:


A. Each processor runs a separate program
B. All processors execute the same instruction on different data elements
C. Different tasks run on different machines
D. Memory is shared across instruction streams
Answer: ✅ B

Q13. Which of the following best matches the SIMD model?


A. Multi-core CPU
B. GPU-based matrix multiplication
C. Web server cluster
D. Instruction pipelining
Answer: ✅ B

🔹 4.3 MISD

Q14. MISD is rarely used in practice because:


A. It requires vectorization
B. Multiple instructions acting on the same data is inefficient
C. It increases instruction throughput
D. It lacks shared memory support
Answer: ✅ B

Q15. A possible real-world example of MISD is:


A. A CPU core running multiple threads
B. Redundant systems checking the same sensor data for faults
C. Image segmentation using neural networks
D. File compression using parallel algorithms
Answer: ✅ B
🔹 4.4 MIMD

Q16. MIMD architecture allows:


A. Single instruction on multiple data streams
B. Different processors executing different instructions on different data
C. Execution of only serial code
D. Multithreading without any parallel hardware
Answer: ✅ B

Q17. Which of the following systems is an example of MIMD architecture?


A. Classic Von Neumann Machine
B. SIMD GPU execution
C. A multicore processor running independent threads
D. A calculator
Answer: ✅ C

3. Types of Parallelism – Advanced MCQs

Q1. Which type of parallelism would be least effective in a scenario involving heterogeneous,
independent tasks with minimal data overlap?
A. Data Parallelism
B. Task Parallelism
C. Bit-Level Parallelism
D. Instruction-Level Parallelism
Answer: ✅ A
Data parallelism assumes the same operation on chunks of data — not suitable for
heterogeneous tasks.

Q2. Which factor is most critical in achieving effective instruction-level parallelism in modern
processors?
A. Clock speed
B. Instruction pipelining with hazard detection and branch prediction
C. Size of shared memory
D. Number of threads
Answer: ✅ B

Q3. In pipeline parallelism, which of the following scenarios would most likely reduce
throughput significantly?
A. All stages of the pipeline are equally balanced
B. One stage takes significantly longer than others (bottleneck)
C. Static scheduling of tasks
D. Increasing instruction cache
Answer: ✅ B

Q4. Bit-level parallelism can result in significant performance improvement only when:
A. Tasks are independent
B. The CPU supports parallel ALUs
C. Operations are inherently word-based (e.g., arithmetic on large integers)
D. Multiple processors are available
Answer: ✅ C

Q5. Which combination of parallelism types is typically used in modern GPUs for maximum
performance?
A. Bit-level and instruction-level
B. Task-level and pipeline
C. Data-level and instruction-level
D. Data-level and task-level
Answer: ✅ C
GPUs use SIMD-like data parallelism and deep instruction pipelines.

✅ 4. Flynn’s Taxonomy – Advanced MCQs

Q6. Which Flynn category would best describe a distributed simulation system where each
node executes a different part of the simulation using its own code and data?
A. SISD
B. SIMD
C. MISD
D. MIMD
Answer: ✅ D

Q7. A radar signal processing system that applies multiple filters (algorithms) on the same
stream of incoming data is closest to:
A. SIMD
B. MISD
C. MIMD
D. SISD
Answer: ✅ B
This is one of the very rare real-world examples that approximate MISD.

Q8. In Flynn’s Taxonomy, SIMD systems can be limited by:


A. High memory overhead
B. Instruction-level hazards
C. The need for divergent instruction execution across threads
D. Hardware-level task scheduling
Answer: ✅ C
SIMD struggles with divergence—when threads want to execute different instructions.

Q9. Which of the following is a fundamental assumption in SISD architecture that does not
apply to MIMD?
A. Uniform instruction stream
B. Multiple ALUs
C. Multi-core processor
D. Synchronization primitives
Answer: ✅ A

Q10. Flynn’s Taxonomy is limited in scope because it:


A. Ignores the memory hierarchy
B. Only applies to shared-memory systems
C. Does not account for hybrid systems like SIMD + MIMD
D. Assumes dynamic scheduling
Answer: ✅ C
Modern systems may mix models (e.g., GPUs = SIMD inside MIMD).

5. Parallel Architectures: SISD, SIMD, MISD, MIMD

Q1. Which architecture in Flynn's taxonomy is most suitable for applications with highly
regular data structures and operations (e.g., vector operations)?
A. SISD
B. SIMD
C. MISD
D. MIMD
Answer: ✅ B
Q2. Which of the following is a valid difference between SIMD and MIMD?
A. SIMD supports instruction-level concurrency, while MIMD does not
B. SIMD executes different instructions across all cores, MIMD executes the same
C. SIMD requires data to be partitioned identically, MIMD does not
D. MIMD cannot scale beyond 4 processors
Answer: ✅ C

Q3. Which Flynn architecture is considered mostly theoretical with very few practical
implementations?
A. SISD
B. SIMD
C. MISD
D. MIMD
Answer: ✅ C

Q4. A multicore CPU where each core executes a different thread independently follows which
Flynn category?
A. SISD
B. SIMD
C. MISD
D. MIMD
Answer: ✅ D

Q5. Which architecture is not suitable for divergent task execution?


A. SISD
B. SIMD
C. MISD
D. MIMD
Answer: ✅ B
(Because all processing elements must follow the same instruction flow.)

✅ 6. Shared vs Distributed Memory Systems

🔹 6.1 Shared Memory Systems


Q6. In a shared memory system, all processors:
A. Have independent memories
B. Share a single global memory address space
C. Do not require synchronization
D. Use message passing to communicate
Answer: ✅ B

Q7. What is a key advantage of shared memory systems?


A. Scalability to hundreds of nodes
B. Low communication overhead due to direct memory access
C. No need for synchronization
D. No cache coherence issues
Answer: ✅ B

Q8. A challenge in shared memory systems is:


A. Increased message latency
B. Load balancing
C. Cache coherence and synchronization
D. Hardware fault tolerance
Answer: ✅ C

🔹 6.2 Distributed Memory Systems

Q9. In distributed memory systems, processors communicate primarily through:


A. Shared variables
B. Cache lines
C. Message passing
D. DMA channels
Answer: ✅ C

Q10. Which of the following best describes the memory in a distributed memory system?
A. All processors have access to a common physical memory
B. Each processor accesses all memory locations uniformly
C. Each processor has its own private local memory
D. All memory operations are atomic
Answer: ✅ C
Q11. A disadvantage of distributed memory systems is:
A. Cache coherence
B. Synchronization primitives
C. Programming complexity due to explicit communication
D. Limited memory bandwidth
Answer: ✅ C

Q12. Which parallel computing system would be most scalable for extremely large data sets and
thousands of processors?
A. SISD
B. Shared memory system
C. Distributed memory system
D. SIMD-based GPU
Answer: ✅ C

A distributed memory system is a multiprocessor architecture where each processor has its own
private memory, with no single shared memory space. Communication between processors happens
explicitly through message passing: processors exchange data via a network, such as Ethernet or a
dedicated interconnect. This contrasts with shared memory systems, where all processors access a
single address space.

🔹 Advanced MCQs – Parallel Architectures

Q1. Which architecture type would suffer most from branch divergence in control flow?
A. SISD
B. SIMD
C. MIMD
D. MISD
Answer: ✅ B
SIMD requires all processing elements to follow the same instruction path; divergence reduces
efficiency.

Q2. In MIMD systems, synchronization mechanisms like barriers are essential because:
A. All processors execute the same instruction
B. Processors share a single clock
C. Tasks may progress at different speeds
D. Memory is distributed equally
Answer: ✅ C
Independent instruction streams lead to timing mismatches that must be synchronized.
Q3. Which parallel architecture model provides the highest flexibility in heterogeneous task
execution with asynchronous processing?
A. SIMD
B. MIMD
C. MISD
D. SISD
Answer: ✅ B

Q4. MISD systems, though rare, are best theoretically suited for:
A. Image processing
B. Signal redundancy and fault tolerance
C. Data-parallel matrix multiplication
D. Task-based load balancing
Answer: ✅ B

Q5. Which of the following systems can dynamically switch between SIMD and MIMD
modes, depending on workload?
A. Superscalar processors
B. Clustered GPUs
C. Heterogeneous hybrid architectures (e.g., modern CPUs + GPU cores)
D. VLIW systems
Answer: ✅ C

🔹 Advanced MCQs – Shared vs Distributed Memory Systems

Q6. In a shared memory multiprocessor system, false sharing occurs when:


A. Multiple processors access different variables on the same cache line
B. Two threads write to the same variable
C. Threads run on different physical CPUs
D. Memory is dynamically allocated
Answer: ✅ A
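A common fix for the situation in option A is padding, so that data touched by different threads lands on different cache lines. A hedged sketch (a 64-byte line size is assumed here; the real value is hardware-dependent):

#include <stdio.h>

#define CACHE_LINE 64  /* assumed line size; query the hardware in practice */

/* Without padding, counters belonging to different threads may share one
   cache line, so a write by one thread invalidates the line in another
   thread's cache (false sharing). Padding gives each counter its own line. */
struct padded_counter {
    long value;
    char pad[CACHE_LINE - sizeof(long)];
};

struct padded_counter counters[4];  /* one per thread, no shared lines */

int main(void) {
    printf("sizeof(struct padded_counter) = %zu bytes\n",
           sizeof(struct padded_counter));
    return 0;
}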

Q7. Which statement best highlights the primary scalability bottleneck in shared memory
systems?
A. High latency of message passing
B. Complex memory address translation
C. Contention for shared resources and synchronization overhead
D. Limited support for branch prediction
Answer: ✅ C

Q8. In distributed memory systems, why is data locality a crucial concern?


A. All memory is cached
B. Local memory is slower than remote memory
C. Memory operations across nodes require expensive communication
D. Cache coherence is automatic
Answer: ✅ C

Q9. A high-performance computing system uses multiple nodes, each with multiple cores and
private memory, communicating via MPI. This architecture is best described as:
A. Shared memory system
B. SIMD system
C. Distributed memory system
D. MISD system
Answer: ✅ C

Q10. Which of the following best differentiates shared and distributed memory systems in terms
of programming model complexity?
A. Shared memory systems require explicit synchronization, distributed do not
B. Distributed memory systems are generally easier to program
C. Shared memory uses automatic communication between threads
D. Distributed memory requires explicit communication and data partitioning
Answer: ✅ D

Q11. Hybrid systems combining shared and distributed memory (e.g., clusters of multicore
machines) are designed to:
A. Limit the use of synchronization
B. Reduce cost by avoiding interconnects
C. Combine fast intra-node communication with scalable inter-node communication
D. Prevent data races entirely
Answer: ✅ C
Q12. Which of the following is NOT an issue in distributed memory systems?
A. Deadlock due to message dependencies
B. Synchronization of shared variables
C. Load balancing across nodes
D. Communication overhead
Answer: ✅ B
(Shared variables don’t exist in distributed memory; all communication is explicit.)

Here’s a short and clear explanation of each parallel programming model:

✅ 1. Shared Memory Programming (e.g., OpenMP)

• Concept: all threads share a common memory space.
• Communication: via shared variables — no explicit message passing is needed.
• Example: OpenMP uses compiler directives (like #pragma omp) to parallelize loops and sections (a minimal sketch follows below).
• Used in: multi-core processors and systems with shared RAM.
• Pros: simple to use for loop-level parallelism.
• Cons: not scalable across distributed systems; risk of race conditions.
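A minimal OpenMP sketch of this model (array size is arbitrary): the directive splits the loop across threads, and all threads work on the same shared array with no explicit messages.

#include <stdio.h>
#include <omp.h>

#define N 1000

int main(void) {
    double a[N];

    /* The directive distributes iterations across threads; 'a' lives in
       the shared address space, the loop index is private per thread. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        a[i] = (double)i * i;

    printf("max threads: %d, a[10] = %f\n", omp_get_max_threads(), a[10]);
    return 0;
}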

✅ 2. Message Passing Programming (e.g., MPI)

• Concept: each process has its own local memory; processes communicate via explicit messages.
• Communication: through send/receive operations (e.g., MPI_Send, MPI_Recv), as in the sketch below.
• Used in: distributed memory systems (e.g., clusters, supercomputers).
• Pros: scales well across many nodes.
• Cons: requires careful handling of message synchronization and data distribution.
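A minimal MPI sketch (launch with, e.g., mpirun -np 2): rank 1 sends an integer to rank 0 through an explicit message; the processes share no memory.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int rank, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 1) {
        value = 42;
        /* Explicit communication: no shared variables exist. */
        MPI_Send(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    } else if (rank == 0) {
        MPI_Recv(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 0 received %d from rank 1\n", value);
    }

    MPI_Finalize();
    return 0;
}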

✅ 3. Data Parallel Programming (e.g., CUDA)

• Concept: the same operation is applied to multiple data elements in parallel.
• Communication: threads within a block can share memory; global memory holds large data.
• Example: CUDA runs thousands of threads on GPUs (a minimal kernel sketch follows below).
• Used in: graphics, scientific simulations, deep learning.
• Pros: high performance on massive data sets.
• Cons: requires knowledge of GPU architecture and the memory hierarchy.
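A minimal CUDA C sketch of the data-parallel model (compile as a .cu file with nvcc; sizes are arbitrary and error checking is omitted): one GPU thread handles one array element, launched with the kernel<<<blocks, threads>>> syntax.

#include <stdio.h>
#include <cuda_runtime.h>

/* Every thread applies the same operation to a different element. */
__global__ void add(const float *x, const float *y, float *z, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) z[i] = x[i] + y[i];
}

int main(void) {
    const int n = 1 << 20;
    float *x, *y, *z;
    cudaMallocManaged(&x, n * sizeof(float));  /* unified memory */
    cudaMallocManaged(&y, n * sizeof(float));
    cudaMallocManaged(&z, n * sizeof(float));
    for (int i = 0; i < n; i++) { x[i] = 1.0f; y[i] = 2.0f; }

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    add<<<blocks, threads>>>(x, y, z, n);  /* kernel launch */
    cudaDeviceSynchronize();

    printf("z[0] = %f\n", z[0]);
    cudaFree(x); cudaFree(y); cudaFree(z);
    return 0;
}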
✅ 4. Hybrid Programming Models

• Concept: combines two or more models (e.g., OpenMP + MPI or CUDA + MPI).
• Example: MPI is used across nodes, with OpenMP or CUDA used within each node (see the sketch below).
• Used in: HPC systems with multi-core CPUs + GPUs across a cluster.
• Pros: maximizes hardware utilization; scalable and efficient.
• Cons: complex to program and debug due to multiple layers of parallelism.
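A hedged MPI + OpenMP sketch of the hybrid idea: MPI provides one process per node, and each process spawns OpenMP threads for its cores. (If the threads themselves made MPI calls, MPI_Init_thread would be needed instead of MPI_Init.)

#include <stdio.h>
#include <mpi.h>
#include <omp.h>

int main(int argc, char **argv) {
    int rank;
    MPI_Init(&argc, &argv);               /* inter-node parallelism */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Intra-node parallelism: threads within each MPI process. */
    #pragma omp parallel
    printf("rank %d, thread %d of %d\n",
           rank, omp_get_thread_num(), omp_get_num_threads());

    MPI_Finalize();
    return 0;
}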

🔹 Advanced MCQs – Mixed Models

Q1. Which of the following best describes the primary challenge of combining MPI and CUDA
in a hybrid system?
A. Incompatibility of programming languages
B. Lack of GPU support in MPI
C. Managing data transfers between GPU memory and other nodes
D. Threads cannot be created inside CUDA
Answer: ✅ C

Q2. In OpenMP, which clause helps prevent race conditions by allowing each thread to maintain
its own copy of a variable?
A. shared
B. private
C. reduction
D. firstprivate
Answer: ✅ B

Q3. Which MPI function is typically used to gather data from all processes to a single
process?
A. MPI_Scatter
B. MPI_Broadcast
C. MPI_Reduce
D. MPI_Gather
Answer: ✅ D
Q4. What is the role of CUDA’s warp in data-parallel execution?
A. A synchronization barrier
B. A group of 32 threads executed in lock-step
C. A GPU memory space
D. A function for atomic operations
Answer: ✅ B

Q5. What distinguishes a hybrid parallel application from a traditional parallel program?
A. It only uses OpenMP for GPU parallelism
B. It runs on a single CPU
C. It combines two or more models, like OpenMP for shared memory and MPI for distributed
memory
D. It avoids synchronization completely
Answer: ✅ C

Q6. Which of the following best describes OpenMP's reduction clause?


A. Assigns a private copy of a variable to each thread
B. Combines values from all threads into a single result
C. Limits the number of threads
D. Synchronizes thread execution
Answer: ✅ B
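A minimal example of the reduction clause from Q6: each thread keeps a private partial sum, and OpenMP combines the copies with '+' when the loop ends.

#include <stdio.h>
#include <omp.h>

int main(void) {
    long sum = 0;

    /* Each thread accumulates into a private copy of 'sum'; the partial
       sums are combined into the shared variable at the end. */
    #pragma omp parallel for reduction(+:sum)
    for (long i = 1; i <= 1000000; i++)
        sum += i;

    printf("sum = %ld (expected 500000500000)\n", sum);
    return 0;
}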

Q7. In MPI, which function is used for synchronizing all processes?


A. MPI_Finalize
B. MPI_Barrier
C. MPI_Scatter
D. MPI_Init
Answer: ✅ B

Q8. Which of the following is a valid CUDA kernel launch syntax?


A. launch <<<...>>> (args);
B. kernel <<<blocks, threads>>> (args);
C. start <<<threads>>> kernel(args);
D. parallel <<<grid>>> function(args);
Answer: ✅ B
Q9. Why are hybrid models increasingly used in High Performance Computing (HPC)?
A. Single models are obsolete
B. All problems require only data parallelism
C. They allow efficient use of both node-level and cluster-level resources
D. Hybrid models are required by OpenMP
Answer: ✅ C

Q10. What is the main limitation of using OpenMP for large-scale distributed memory systems?
A. Too much code complexity
B. Limited compiler support
C. It only works on shared memory, not across nodes
D. It requires GPU support
Answer: ✅ C

🔹 MCQs – Parallel Algorithms & Synchronization

Q1. In a parallel divide-and-conquer algorithm, what is the primary challenge in achieving
scalability?
A. Too many shared variables
B. Overhead of merging subproblem results
C. Inability to divide problems
D. Using GPUs
Answer: ✅ B

Q2. Pipeline parallelism is most effective when:


A. Tasks are fully independent
B. All stages require equal computation time
C. There is a sequential loop dependency
D. Each stage can be parallelized separately and run concurrently
Answer: ✅ D

Q3. Which of the following is a correct advantage of the pipeline model in parallel computing?
A. It removes the need for task synchronization
B. It increases sequential bottlenecks
C. It improves throughput by overlapping computations
D. It works only in distributed memory systems
Answer: ✅ C
Q4. What is the main role of a barrier in a parallel program?
A. Allocate shared memory
B. Lock a shared variable
C. Prevent race conditions by restricting communication
D. Block threads until all have reached a certain point
Answer: ✅ D

Q5. Which synchronization mechanism allows multiple readers or one writer at a time?
A. Spinlock
B. Mutex
C. Semaphore
D. Read-Write Lock
Answer: ✅ D

Q6. Which of the following is true about semaphores?


A. They can only be used for thread creation
B. They ensure automatic deadlock recovery
C. They use a counter to control access to a resource
D. They replace barriers in CUDA
Answer: ✅ C

Q7. In shared memory communication, which of the following is the most common issue?
A. Lack of message delivery
B. Deadlock due to message order
C. Race conditions when accessing shared data
D. Data redundancy
Answer: ✅ C

Q8. In message passing communication, the communication overhead increases with:


A. Larger shared memory
B. Higher cache coherence
C. Increased number of processes and message size
D. Use of OpenMP
Answer: ✅ C
Q9. Which design strategy is most naturally mapped to a recursive parallel implementation?
A. Pipeline
B. Fork-Join
C. Divide and Conquer
D. Producer-Consumer
Answer: ✅ C

Q10. Locks are typically used to:


A. Speed up GPU computations
B. Divide memory into equal partitions
C. Prevent multiple threads from simultaneously modifying shared resources
D. Replace barriers and semaphores
Answer: ✅ C

Q1. In a parallel divide-and-conquer algorithm, the overhead of thread creation and
synchronization can be minimized using:
A. Recursive decomposition without limit
B. Task stealing and dynamic scheduling
C. Static assignment of all sub-tasks
D. Ignoring serial portions
Answer: ✅ B
Explanation: Dynamic task scheduling (like work stealing) helps balance load and reduce idle
cores.

Q2. Which of the following would most likely lead to a pipeline stall in pipeline parallelism?
A. All stages having equal workload
B. No inter-stage dependencies
C. One stage becoming a bottleneck
D. Perfect task balancing
Answer: ✅ C
Explanation: A bottleneck stage delays all following stages, reducing throughput.

Q3. In which scenario would a spinlock be preferred over a traditional mutex?


A. High contention on critical sections
B. Low-latency locks in multi-core shared memory systems
C. Distributed systems with message passing
D. Large-scale GPU kernels
Answer: ✅ B
Explanation: Spinlocks are faster when lock hold time is very short and context switching
overhead is high.
Q4. Which of the following problems can arise even when semaphores are used correctly?
A. Starvation
B. Race condition
C. Deadlock
D. Context switching
Answer: ✅ A
Explanation: If priorities or scheduling are unfair, some threads may never acquire the
semaphore (starvation).

Q5. In shared-memory systems, which of the following does NOT improve synchronization
efficiency?
A. Reducing critical section size
B. Replacing fine-grained locks with one global lock
C. Using lock-free data structures
D. Applying barriers only where required
Answer: ✅ B
Explanation: A global lock increases contention and reduces concurrency.

Q6. Message passing systems are less prone to race conditions than shared memory systems
because:
A. They don’t use locks
B. Each process has private memory, and explicit communication ensures isolation
C. They support atomic operations
D. They don’t require synchronization
Answer: ✅ B

Q7. Which of the following best describes the difference between a barrier and a semaphore?
A. A barrier blocks one thread; a semaphore blocks all
B. A barrier enforces collective synchronization, a semaphore regulates access to resources
C. A barrier is OS-level, semaphore is user-level
D. Semaphores are only used in shared memory, barriers in message-passing
Answer: ✅ B

Q8. In parallel divide-and-conquer algorithms, the critical path is:


A. The number of threads required
B. The longest sequential dependency chain
C. The portion with the most parallel speedup
D. The memory shared by all subproblems
Answer: ✅ B

Q9. When using message passing for communication, what factor most significantly affects
scalability?
A. Number of barriers
B. Instruction-level parallelism
C. Network bandwidth and latency
D. GPU core count
Answer: ✅ C

Q10. Which synchronization primitive allows bounded access to a resource (e.g., up to N
threads)?
A. Mutex
B. Spinlock
C. Semaphore
D. Barrier
Answer: ✅ C
Explanation: Semaphores can control concurrent access based on a counter.
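A POSIX-semaphore sketch of Q10 (written for Linux; N and the thread count are arbitrary): the counter starts at N, so at most N threads can hold the resource at once.

#include <stdio.h>
#include <pthread.h>
#include <semaphore.h>
#include <unistd.h>

#define N        3   /* at most N threads in the region at once */
#define THREADS  8

static sem_t slots;

static void *worker(void *arg) {
    long id = (long)arg;
    sem_wait(&slots);             /* counter--; blocks when counter == 0 */
    printf("thread %ld entered\n", id);
    sleep(1);                     /* simulate using the resource */
    printf("thread %ld leaving\n", id);
    sem_post(&slots);             /* counter++; wakes one waiter */
    return NULL;
}

int main(void) {
    pthread_t t[THREADS];
    sem_init(&slots, 0, N);       /* bounded access: N concurrent holders */
    for (long i = 0; i < THREADS; i++)
        pthread_create(&t[i], NULL, worker, (void *)i);
    for (int i = 0; i < THREADS; i++)
        pthread_join(t[i], NULL);
    sem_destroy(&slots);
    return 0;
}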

🔹 Advanced MCQs – Performance & Amdahl’s Law

Q1. If a program has a serial portion of 20%, what is the theoretical maximum speedup
according to Amdahl’s Law, regardless of processor count?
A. 4
B. 5
C. 10
D. ∞
Answer: ✅ B
Explanation: Max Speedup = 1 / Serial Fraction = 1 / 0.2 = 5
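The limit in Q1 falls straight out of Amdahl's formula, Speedup = 1 / (S + (1 - S)/P). The helper below (function name is illustrative) evaluates it for S = 0.2 and shows the speedup approaching the 1/S = 5 ceiling:

#include <stdio.h>

/* Amdahl's Law: speedup with serial fraction s on p processors. */
static double amdahl_speedup(double s, int p) {
    return 1.0 / (s + (1.0 - s) / p);
}

int main(void) {
    double s = 0.2;  /* 20% serial portion, as in Q1 */
    for (int p = 1; p <= 1024; p *= 4)
        printf("P = %4d -> speedup = %.2f\n", p, amdahl_speedup(s, p));
    printf("limit as P grows: %.2f\n", 1.0 / s);  /* = 5 */
    return 0;
}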

Q2. Which of the following is true for weak scaling?


A. The problem size remains fixed as the number of processors increases
B. The problem size increases with the number of processors
C. Efficiency decreases significantly with more processors
D. Only works in shared memory systems
Answer: ✅ B
Q3. If a program takes 100 seconds on 1 core and 20 seconds on 10 cores, what is the speedup
and efficiency?
A. Speedup = 5, Efficiency = 0.5
B. Speedup = 10, Efficiency = 1
C. Speedup = 5, Efficiency = 0.05
D. Speedup = 2, Efficiency = 0.2
Answer: ✅ A
Explanation:
Speedup = 100 / 20 = 5
Efficiency = 5 / 10 = 0.5 (or 50%)

Q4. Which of the following best explains why Amdahl’s Law limits scalability?
A. Because of hardware restrictions
B. Because some parts of the code cannot be parallelized
C. Because of cache misses
D. Because parallelism always reduces performance
Answer: ✅ B

Q5. Gustafson's Law addresses a limitation of Amdahl’s Law by:


A. Assuming perfect hardware
B. Keeping problem size constant
C. Scaling the problem size with the number of processors
D. Assuming all parts of code are serial
Answer: ✅ C

Q6. In parallel computing, efficiency is defined as:


A. Time taken per thread
B. Speedup divided by number of processors
C. Processor count divided by speedup
D. Speedup multiplied by processor count
Answer: ✅ B

Q7. What happens to efficiency in strong scaling when the number of processors increases but
the problem size remains fixed?
A. It remains constant
B. It always increases
C. It typically decreases due to overhead
D. It increases linearly
Answer: ✅ C

Q8. A program has 95% parallel code. What is the maximum theoretical speedup using
Amdahl’s Law?
A. 20
B. 10
C. 19
D. Cannot be determined without processor count
Answer: ✅ A
Explanation: Max speedup = 1 / 0.05 = 20

Q9. In a perfectly parallelizable task, adding more processors increases speedup linearly. Which
type of scalability does this represent?
A. Weak scalability
B. Strong scalability
C. Amdahl’s scalability
D. Temporal scalability
Answer: ✅ B

Q10. Which of the following statements is false?


A. Amdahl’s Law focuses on fixed-size problems
B. Gustafson’s Law assumes problem size grows with processor count
C. Speedup can exceed the number of processors
D. Efficiency helps measure resource utilization
Answer: ✅ C
Explanation: Speedup cannot exceed the number of processors in real-world scenarios (unless
superlinear speedup occurs due to caching or I/O effects, which is rare).

Q1. A program has a serial portion of 10%. If we double the number of processors from 4 to 8,
the speedup will:
A. Double
B. Increase slightly
C. Remain the same
D. Be exactly 8
Answer: ✅ B
Explanation: Due to the serial portion, doubling processors yields diminishing returns
(Amdahl's Law).
Q2. Which scenario demonstrates superlinear speedup, which seems to violate Amdahl’s Law?
A. When total execution time increases after parallelization
B. When processors utilize shared memory
C. When parallel execution fits entirely into processor cache
D. When tasks are perfectly load balanced
Answer: ✅ C
Explanation: Cache effects can lead to speedup > P, which Amdahl's Law does not account for.

Q3. In weak scaling, efficiency can remain constant only if:


A. Problem size is fixed
B. Workload per processor remains constant
C. Serial portion increases
D. Synchronization overhead increases
Answer: ✅ B

Q4. Suppose a program runs in 100 seconds on 1 processor, and 60 seconds on 4 processors.
What is the efficiency?
A. 0.25
B. 0.4
C. 0.6
D. 0.75
Answer: ✅ B
Explanation:
Speedup = 100 / 60 = 1.67
Efficiency = 1.67 / 4 = ~0.417

Q5. Which of the following statements is true regarding Amdahl’s Law in practice?
A. It assumes ideal scaling as processors increase
B. It favors increasing the problem size with processors
C. It highlights the impact of even a small serial portion
D. It allows infinite speedup with enough threads
Answer: ✅ C

Q6. Which of the following is not a reason why efficiency might decline as processors increase?
A. Increased communication overhead
B. Smaller workload per processor
C. Larger memory requirements
D. Decreased synchronization cost
Answer: ✅ D
Explanation: Decreased synchronization cost should improve efficiency.

Q7. Given a system where the parallel portion is 80%, how many processors are required to
achieve at least 4x speedup?
A. 5
B. 6
C. 8
D. 10
Answer: ✅ D
Explanation (using Amdahl's Law):
Speedup = 1 / (S + (1 - S)/P)
S = 0.2
Try P = 10
→ Speedup = 1 / (0.2 + 0.8/10) = 1 / (0.2 + 0.08) = 1 / 0.28 ≈ 3.57
To get speedup ≥ 4, you need more than 10 processors

(Answer is D because 10 is the closest option to the required threshold; an exact 4x speedup needs P = 16, since 1 / (0.2 + 0.8/16) = 4.)

Q8. A system achieves a speedup of 6 using 8 processors. What is the efficiency and what does
it suggest?
A. 0.6, excellent scaling
B. 0.8, possible overuse
C. 0.75, some parallel overhead
D. 1, perfect speedup
Answer: ✅ C
Explanation: Efficiency = Speedup / P = 6 / 8 = 0.75, which is good but not perfect and indicates
some parallel overhead.

Q9. If a system exhibits high efficiency but low speedup, what could be inferred?
A. System has high overhead
B. Problem is mostly parallel
C. Problem is mostly serial
D. Number of processors is high
Answer: ✅ C
Explanation: Low speedup suggests serial bottleneck; high efficiency shows minimal loss from
parallelization overhead.
Q10. Why does Gustafson’s Law suggest more optimistic scaling than Amdahl’s Law?
A. It assumes fixed hardware
B. It scales problem size with processor count
C. It ignores the serial portion
D. It assumes constant communication cost
Answer: ✅ B
