LP 5 Oral Answers: High Performance Computing (HPC)
Q. What is BFS? --> Answer: Breadth-First Search explores a graph level by level. It visits all neighbors
of a node before moving to the next level of neighbors. BFS uses a queue to keep track of the nodes
to visit.
Q. Example of BFS --> Answer: Finding the shortest path in an unweighted graph is a good example
of BFS. It systematically expands outwards from the starting node, guaranteeing the first path found
to a destination is the shortest in terms of edges.
Q. Concept of OpenMP --> Answer: OpenMP is an API for shared-memory parallel programming. It
allows you to parallelize code using compiler directives, making it run faster on multi-core processors
by dividing work among threads.
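A minimal sketch of this idea, assuming g++ with the -fopenmp flag (the array name and size are illustrative):

#include <omp.h>
#include <cstdio>

int main() {
    const int N = 1000000;
    double *a = new double[N];
    // The iterations of this loop are divided among the available threads.
    #pragma omp parallel for
    for (int i = 0; i < N; ++i)
        a[i] = 2.0 * i;
    printf("a[42] = %f\n", a[42]);
    delete[] a;
    return 0;
}

Compile with: g++ -fopenmp example.cpp -o example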
Q. How Does Parallel BFS Work? --> Answer: Parallel BFS explores different parts of the graph concurrently.
Multiple threads can process nodes at the current level, exploring their neighbors in parallel and
adding them to the next level's frontier.
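A minimal level-synchronous sketch in C++ with OpenMP; the adjacency list adj, the sample graph, and the critical-section update of the next frontier are illustrative simplifications (real implementations typically use atomics or per-thread buffers):

#include <omp.h>
#include <vector>
#include <cstdio>

// Threads expand the current frontier concurrently; the next frontier
// is built under a critical section for simplicity.
void parallel_bfs(const std::vector<std::vector<int>>& adj, int start) {
    std::vector<char> visited(adj.size(), 0);
    std::vector<int> frontier{start};
    visited[start] = 1;
    while (!frontier.empty()) {
        std::vector<int> next;
        #pragma omp parallel for
        for (int i = 0; i < (int)frontier.size(); ++i) {
            printf("%d ", frontier[i]);          // process the node
            for (int v : adj[frontier[i]]) {
                #pragma omp critical
                if (!visited[v]) { visited[v] = 1; next.push_back(v); }
            }
        }
        frontier.swap(next);                     // advance to the next level
    }
}

int main() {
    std::vector<std::vector<int>> adj = {{1, 2}, {0, 3}, {0, 3}, {1, 2}};
    parallel_bfs(adj, 0);
}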
Q. Simple Algorithm: BFS -->
Create an empty queue.
Add the starting node to the queue.
Create a set or array to keep track of visited nodes.
Mark the starting node as visited.
While the queue is not empty:
    Dequeue a node from the front of the queue.
    Process the dequeued node (e.g., print it).
    For each neighbor of the dequeued node:
        If the neighbor has not been visited:
            Mark the neighbor as visited.
            Enqueue the neighbor.
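A direct C++ translation of this pseudocode; the adjacency-list graph in main is a small illustrative example:

#include <queue>
#include <vector>
#include <cstdio>

void bfs(const std::vector<std::vector<int>>& adj, int start) {
    std::vector<bool> visited(adj.size(), false);
    std::queue<int> q;
    q.push(start);                       // add the starting node to the queue
    visited[start] = true;               // mark it visited
    while (!q.empty()) {
        int u = q.front(); q.pop();      // dequeue a node
        printf("%d ", u);                // process it (print)
        for (int v : adj[u])             // for each neighbor
            if (!visited[v]) {
                visited[v] = true;       // mark visited
                q.push(v);               // enqueue
            }
    }
}

int main() {
    std::vector<std::vector<int>> adj = {{1, 2}, {0, 3}, {0, 3}, {1, 2}};
    bfs(adj, 0);                         // prints 0 1 2 3
}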
Q. What is DFS? --> Answer: Depth-First Search explores a graph by going as deep as possible along
each branch. It backtracks when it reaches a dead end or a visited node to explore other branches.
DFS often uses a stack or recursion.
Q. Example of DFS --> Answer: Finding a path between two nodes or detecting cycles in a graph are
common applications of DFS. It explores one path fully before trying another.
Q. Concept of OpenMP --> Answer: OpenMP is an API for shared-memory parallel programming. It
enables parallel execution of code on multi-core systems through directives and library routines,
improving performance for computationally intensive tasks.
Q. Algorithm: DFS -->
Create an empty stack.
Push the starting node onto the stack.
Create a set or array to keep track of visited nodes.
Mark the starting node as visited.
While the stack is not empty:
    Pop a node from the top of the stack.
    Process the popped node (e.g., print it).
    For each neighbor of the popped node:
        If the neighbor has not been visited:
            Mark the neighbor as visited.
            Push the neighbor onto the stack.
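A direct C++ translation of this stack-based pseudocode (same illustrative graph as the BFS example):

#include <stack>
#include <vector>
#include <cstdio>

void dfs(const std::vector<std::vector<int>>& adj, int start) {
    std::vector<bool> visited(adj.size(), false);
    std::stack<int> s;
    s.push(start);                       // push the starting node
    visited[start] = true;
    while (!s.empty()) {
        int u = s.top(); s.pop();        // pop a node
        printf("%d ", u);                // process it (print)
        for (int v : adj[u])             // for each neighbor
            if (!visited[v]) {
                visited[v] = true;
                s.push(v);               // push the unvisited neighbor
            }
    }
}

int main() {
    std::vector<std::vector<int>> adj = {{1, 2}, {0, 3}, {0, 3}, {1, 2}};
    dfs(adj, 0);                         // prints 0 2 3 1
}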
Q. How Does Parallel DFS Work? --> Answer: Parallel DFS can explore different branches of the search tree
simultaneously using multiple threads. Task parallelism in OpenMP can be used to create tasks for
exploring different recursive calls or subtrees concurrently.
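A minimal task-based sketch; the global visited array and the critical-section check are simplifications for clarity (production code would use atomic flags):

#include <omp.h>
#include <vector>
#include <cstdio>

std::vector<char> visited;               // shared across tasks

// Each unvisited neighbor is claimed and explored in a new OpenMP task.
void dfs_task(const std::vector<std::vector<int>>& adj, int u) {
    printf("%d\n", u);                   // process the node
    for (int v : adj[u]) {
        bool claimed = false;
        #pragma omp critical
        if (!visited[v]) { visited[v] = 1; claimed = true; }
        if (claimed) {
            #pragma omp task
            dfs_task(adj, v);            // explore this subtree concurrently
        }
    }
}

int main() {
    std::vector<std::vector<int>> adj = {{1, 2}, {0, 3}, {0, 3}, {1, 2}};
    visited.assign(adj.size(), 0);
    visited[0] = 1;
    #pragma omp parallel
    #pragma omp single                   // one thread seeds the task tree
    dfs_task(adj, 0);
}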
Q. What is the advantage of using parallel programming in DFS? --> Answer: Parallel DFS can
significantly reduce the execution time for large graphs by exploring different parts of the search
space at the same time. This can lead to faster discovery of solutions or more efficient graph
traversal.
Q. What is Bubble Sort? Use of Bubble Sort --> Answer: Bubble Sort is a simple sorting algorithm
that repeatedly steps through the list, compares adjacent elements, and swaps them if they are in
the wrong order. Its primary use is for educational purposes to illustrate basic sorting concepts, as it
is not efficient for large datasets.
Q. Example of Bubble Sort --> Answer: Consider the list [5, 1, 4, 2, 8]. In the first pass, 5 and 1 are
swapped to get [1, 5, 4, 2, 8], then 5 and 4 to get [1, 4, 5, 2, 8], then 5 and 2 to get [1, 4, 2, 5, 8];
finally, 5 and 8 are compared but not swapped. The largest element 'bubbles' to its correct position at
the end. This process repeats until the list is sorted.
Q. Concept of OpenMP --> Answer: OpenMP (Open Multi-Processing) is an API that supports shared-
memory parallel programming. It uses compiler directives, library routines, and environment
variables to enable parallel execution of code on multi-core processors, improving performance.
Q. How Does Parallel Bubble Sort Work? --> Answer: The comparison and swap of adjacent elements
are inherently sequential, and the dependencies between adjacent comparisons make efficient
parallelization of standard Bubble Sort challenging and often not worthwhile. The usual variation is
odd-even transposition sort, which alternates between odd-even and even-odd indexed pairs; the
comparisons within each phase are independent and can run in parallel, as in the sketch below.
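A sketch of the odd-even transposition variant; within each phase the compared pairs do not overlap, so the inner loop parallelizes safely:

#include <omp.h>
#include <algorithm>
#include <cstdio>

void odd_even_sort(int a[], int n) {
    for (int phase = 0; phase < n; ++phase) {
        int start = phase % 2;           // even phase: pairs (0,1),(2,3),...; odd phase: (1,2),(3,4),...
        #pragma omp parallel for
        for (int i = start; i + 1 < n; i += 2)
            if (a[i] > a[i + 1])
                std::swap(a[i], a[i + 1]);
    }
}

int main() {
    int a[] = {5, 1, 4, 2, 8};
    odd_even_sort(a, 5);
    for (int x : a) printf("%d ", x);    // 1 2 4 5 8
}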
Q. Bubble Sort Algorithm -->
1. Start from the beginning of the list.
2. Compare the first element with the second. If the first is larger, swap them.
3. Move to the next pair (second and third) and repeat the comparison and swap if needed.
4. Continue this process until the end of the list. The largest element will now be at the end.
5. Repeat steps 1-4, but stop one element earlier each time (since the last elements are already in
their correct sorted positions).
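These steps translate directly into C++:

#include <algorithm>
#include <cstdio>

void bubble_sort(int a[], int n) {
    for (int pass = 0; pass < n - 1; ++pass)        // repeat steps 1-4
        for (int i = 0; i < n - 1 - pass; ++i)      // stop one element earlier each pass
            if (a[i] > a[i + 1])
                std::swap(a[i], a[i + 1]);          // swap if out of order
}

int main() {
    int a[] = {5, 1, 4, 2, 8};
    bubble_sort(a, 5);
    for (int x : a) printf("%d ", x);               // 1 2 4 5 8
}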
Q. How to measure the performance of sequential and parallel algorithms? --> Answer:
Performance is typically measured by execution time. For parallel algorithms, we also consider
speedup (ratio of sequential execution time to parallel execution time) and efficiency (speedup
divided by the number of processors). Profiling tools can help identify bottlenecks.
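A sketch of timing with OpenMP's wall-clock timer (the workload is illustrative; in practice you would time the sequential and parallel versions of the same algorithm and divide):

#include <omp.h>
#include <cstdio>

int main() {
    const int N = 10000000;
    double *a = new double[N];
    double t0 = omp_get_wtime();
    #pragma omp parallel for
    for (int i = 0; i < N; ++i)
        a[i] = 0.5 * i;
    double t1 = omp_get_wtime();
    printf("parallel time: %f s\n", t1 - t0);
    // speedup    = sequential_time / parallel_time
    // efficiency = speedup / number_of_threads
    delete[] a;
}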
Merge Sort
Q. What is Merge Sort? Use of Merge Sort --> Answer: Merge Sort is an efficient, comparison-based
sorting algorithm that follows a divide-and-conquer approach. It recursively divides the list into
sublists until each sublist contains only one element, and then it repeatedly merges the sublists to
produce new sorted sublists until there is only one sorted list. It's widely used for its stability and
guaranteed time complexity.
Q. Example of Merge Sort --> Answer: For the list [5, 1, 4, 2, 8], it is divided into [5], [1], [4], [2], [8].
These are merged pairwise into [1, 5], [2, 4], [8]; merging again gives [1, 2, 4, 5] and [8], and a final
merge produces [1, 2, 4, 5, 8].
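A compact C++ version of this process (function names are illustrative):

#include <vector>
#include <cstdio>

// Merge the sorted halves a[lo..mid] and a[mid+1..hi] via a buffer.
void merge(std::vector<int>& a, int lo, int mid, int hi) {
    std::vector<int> tmp;
    int i = lo, j = mid + 1;
    while (i <= mid && j <= hi)
        tmp.push_back(a[i] <= a[j] ? a[i++] : a[j++]);
    while (i <= mid) tmp.push_back(a[i++]);
    while (j <= hi)  tmp.push_back(a[j++]);
    for (int k = lo; k <= hi; ++k) a[k] = tmp[k - lo];
}

void merge_sort(std::vector<int>& a, int lo, int hi) {
    if (lo >= hi) return;                // a single element is already sorted
    int mid = lo + (hi - lo) / 2;
    merge_sort(a, lo, mid);              // sort the left half
    merge_sort(a, mid + 1, hi);          // sort the right half
    merge(a, lo, mid, hi);               // merge the sorted halves
}

int main() {
    std::vector<int> a = {5, 1, 4, 2, 8};
    merge_sort(a, 0, (int)a.size() - 1);
    for (int x : a) printf("%d ", x);    // 1 2 4 5 8
}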
Q. Concept of OpenMP --> Answer: OpenMP (Open Multi-Processing) is an API for shared-memory
parallel programming. It uses compiler directives, library routines, and environment variables to
enable parallel execution of code on multi-core processors, improving performance.
Q. How Does Parallel Merge Sort Work? --> Answer: Parallel Merge Sort can parallelize both the divide and
merge steps. The initial division into subproblems can happen sequentially, but the recursive sorting
of these subproblems can be done in parallel. Similarly, the merging of sorted sublists can also be
parallelized by dividing the merge operation into independent tasks that can be performed by
different threads.
Q. How to measure the performance of sequential and parallel algorithms? --> Answer: Same as
under Bubble Sort above: measure execution time, and for parallel algorithms also speedup and
efficiency; profiling tools help identify bottlenecks.
Parallel Merge Sort
Q. What is parallel Merge Sort? --> Answer: Parallel Merge Sort is a variation of the Merge Sort
algorithm that utilizes parallel processing techniques to sort a list of elements faster than a
traditional sequential Merge Sort. It leverages multiple processors or cores to perform the sorting
operations concurrently.
Q. How does Parallel Merge Sort work? --> Answer: Parallel Merge Sort typically follows these steps:
1. Divide: The initial list is recursively divided into smaller sublists, similar to sequential Merge Sort.
2. Parallel Sort: The sorting of these smaller sublists is performed in parallel by multiple threads or
processors.
3. Parallel Merge: The sorted sublists are then merged back together in parallel to produce the final
sorted list. The merging step itself can be parallelized by dividing the merging tasks among multiple
threads.
Q. How do you implement Parallel MergeSort using OpenMP? --> Answer: In OpenMP, you can
parallelize Merge Sort by using directives like #pragma omp parallel and #pragma omp task. The
recursive calls to sort sub-arrays can be made into independent tasks that can be executed in
parallel. Similarly, the merging of sub-arrays can be parallelized by assigning different parts of the
merge operation to different threads. Synchronization might be needed to ensure correct merging.
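A minimal sketch of this approach; the serial cutoff of 1000 elements is an illustrative threshold, and std::inplace_merge stands in for a hand-written parallel merge:

#include <omp.h>
#include <algorithm>
#include <vector>
#include <cstdio>

// Sort the half-open range [lo, hi): the two recursive calls run as
// independent tasks; taskwait synchronizes before the merge.
void parallel_merge_sort(std::vector<int>& a, int lo, int hi) {
    if (hi - lo < 1000) {                          // small ranges: sort serially
        std::sort(a.begin() + lo, a.begin() + hi);
        return;
    }
    int mid = lo + (hi - lo) / 2;
    #pragma omp task shared(a)
    parallel_merge_sort(a, lo, mid);
    #pragma omp task shared(a)
    parallel_merge_sort(a, mid, hi);
    #pragma omp taskwait                           // wait for both halves
    std::inplace_merge(a.begin() + lo, a.begin() + mid, a.begin() + hi);
}

int main() {
    std::vector<int> a(1 << 20);
    for (int i = 0; i < (int)a.size(); ++i) a[i] = (int)a.size() - i;  // reverse order
    #pragma omp parallel
    #pragma omp single                             // one thread seeds the task tree
    parallel_merge_sort(a, 0, (int)a.size());
    printf("sorted: %d\n", (int)std::is_sorted(a.begin(), a.end()));   // 1
}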
Q. What are the advantages of Parallel MergeSort? --> Answer: The primary advantage is reduced
execution time, especially for large datasets, due to the concurrent processing. It can achieve better
scalability on multi-core systems compared to sequential Merge Sort.
Q. Difference between serial Mergesort and parallel Mergesort --> Answer: Serial Merge Sort
performs all sorting and merging steps sequentially using a single thread of execution. Parallel Merge
Sort, on the other hand, utilizes multiple threads or processors to perform the sorting and merging of
sublists concurrently, leading to faster execution times for large inputs.
Parallel Reduction with OpenMP
Q. What are the benefits of using parallel reduction for basic operations on large arrays? -->
Answer: Parallel reduction efficiently combines elements of a large array using an associative and
commutative operation (like sum, product, min, max) by dividing the array into smaller chunks
processed in parallel. This significantly reduces the computation time compared to a sequential
approach, especially for large arrays, by leveraging the power of multi-core processors.
Q. How does OpenMP's "reduction" clause work in parallel reduction? --> Answer: The
reduction(operator: variable) clause in OpenMP performs a safe and efficient parallel accumulation.
Each thread creates a private copy of the variable. After the parallel region, these private copies are
combined using the specified operator (e.g., +, *, min, max) to produce a single result, which is then
assigned to the original variable. OpenMP handles the necessary synchronization to avoid race
conditions.
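A minimal example of the clause (summing an array; the variable names are illustrative):

#include <omp.h>
#include <vector>
#include <cstdio>

int main() {
    std::vector<double> data(1000000, 1.0);
    double sum = 0.0;
    // Each thread accumulates into a private copy of 'sum';
    // OpenMP combines the copies with '+' after the loop.
    #pragma omp parallel for reduction(+ : sum)
    for (int i = 0; i < (int)data.size(); ++i)
        sum += data[i];
    printf("sum = %f\n", sum);           // 1000000.0
}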
Q. How do you set up a C++ program for parallel computation with OpenMP? --> Answer: To use
OpenMP in C++, you need to:
1. Include the <omp.h> header file.
2. Compile your code with a compiler that supports OpenMP (e.g., g++ with the -fopenmp flag).
3. Use OpenMP directives (pragmas) in your code to specify parallel regions, work-sharing constructs
(like for, sections), and synchronization mechanisms.
Q. What are the performance characteristics of parallel reduction, and how do they vary based on
input size? --> Answer: Parallel reduction typically shows significant speedup over sequential
reduction, especially as the input size increases. The performance is influenced by factors like the
number of available cores, the overhead of thread creation and management, and the specific
reduction operation. For very small input sizes, the overhead might outweigh the benefits of
parallelism. As the input size grows, the parallel execution time tends to decrease significantly
compared to the linear increase in sequential time, up to a point limited by Amdahl's Law and
communication overhead.
Q. How can you modify the provided code example for more complex operations using parallel
reduction? --> Answer: To modify for more complex operations, you would change the operator in
the reduction clause and potentially the logic within the parallel loop. For example, to find the
product, you'd use reduction(*: result). For custom operations, you might need to use a more
involved approach, potentially involving combining partial results calculated by each thread in a
specific way after the parallel region. OpenMP's reduction clause is designed for standard associative
and commutative operations.
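For operations beyond the built-in ones, OpenMP 4.0+ offers declare reduction; a sketch computing the maximum absolute value (the reduction name 'absmax' is illustrative):

#include <omp.h>
#include <algorithm>
#include <cstdlib>
#include <cstdio>

// Define how partial results combine (omp_out, omp_in) and how each
// thread's private copy is initialized (omp_priv).
#pragma omp declare reduction(absmax : int : omp_out = std::max(omp_out, omp_in)) \
    initializer(omp_priv = 0)

int main() {
    int data[] = {3, -9, 1, -7, 5};
    int m = 0;
    #pragma omp parallel for reduction(absmax : m)
    for (int i = 0; i < 5; ++i)
        m = std::max(m, std::abs(data[i]));
    printf("max |x| = %d\n", m);         // 9
}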
CUDA
Q. What is CUDA --> Answer: CUDA (Compute Unified Device Architecture) is a parallel computing
platform and programming model developed by NVIDIA. It enables the use of NVIDIA GPUs (Graphics
Processing Units) for general-purpose parallel computation, significantly accelerating
computationally intensive tasks.
Q. Addition of Two Large Vectors --> Answer: In CUDA, adding two large vectors involves launching a
kernel function that is executed by many lightweight threads on the GPU. Each thread is responsible
for adding corresponding elements from the two input vectors and storing the result in an output
vector. The data is typically transferred from the CPU's main memory to the GPU's global memory
before the kernel execution and back to the CPU after the computation.
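A minimal CUDA sketch of this pattern (names and sizes are illustrative; error checking omitted for brevity):

#include <cstdio>
#include <cuda_runtime.h>

// Each thread adds one pair of elements; its index comes from the
// block and thread IDs.
__global__ void vecAdd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int N = 1 << 20;
    size_t bytes = N * sizeof(float);
    float *ha = new float[N], *hb = new float[N], *hc = new float[N];
    for (int i = 0; i < N; ++i) { ha[i] = (float)i; hb[i] = 2.0f * i; }

    float *da, *db, *dc;
    cudaMalloc(&da, bytes); cudaMalloc(&db, bytes); cudaMalloc(&dc, bytes);
    cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);   // host -> device
    cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (N + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(da, db, dc, N);          // launch the kernel
    cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);   // device -> host

    printf("c[100] = %f\n", hc[100]);                    // 300.0
    cudaFree(da); cudaFree(db); cudaFree(dc);
    delete[] ha; delete[] hb; delete[] hc;
}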
Q. How to run CUDA code --> Answer: To run CUDA code:
1. Ensure you have an NVIDIA GPU with CUDA drivers installed.
2. Write your CUDA code in a .cu file.
3. Compile the code using nvcc <your_code>.cu -o <executable_name>.
4. Execute the compiled executable from the command line: ./<executable_name>.
Matrix Multiplication
Q. Matrix Multiplication --> Answer: Matrix multiplication involves computing the product of two
matrices. If matrix A has dimensions m x k and matrix B has dimensions k x n, their product C has
dimensions m x n, where each element C_ij is the dot product of the i-th row of A and the j-th
column of B: C_ij = sum over l = 1..k of A_il * B_lj.
Q. Execution of CUDA Environment --> Answer: Executing CUDA code involves:
1. Writing the code in C/C++ with CUDA extensions (.cu files).
2. Compiling the code using the NVIDIA CUDA Compiler (nvcc), which separates the host (CPU) code
and the device (GPU) code.
3. The host code manages the device, including allocating memory on the GPU, transferring data
between host and device, launching kernels on the GPU, and synchronizing operations.
4. The compiled device code (the kernel) is executed in parallel across the GPU's processing cores.
For matrix multiplication, different blocks of threads on the GPU can be assigned to compute
different sub-blocks of the output matrix in parallel.
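A minimal CUDA sketch where each thread computes one element of C (square N x N matrices; sizes are illustrative and error checking is omitted):

#include <cstdio>
#include <cuda_runtime.h>

// Thread (row, col) computes C[row][col] as the dot product of a row
// of A and a column of B.
__global__ void matMul(const float* A, const float* B, float* C, int N) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < N && col < N) {
        float sum = 0.0f;
        for (int l = 0; l < N; ++l)
            sum += A[row * N + l] * B[l * N + col];
        C[row * N + col] = sum;
    }
}

int main() {
    const int N = 512;
    size_t bytes = (size_t)N * N * sizeof(float);
    float *hA = new float[N * N], *hB = new float[N * N], *hC = new float[N * N];
    for (int i = 0; i < N * N; ++i) { hA[i] = 1.0f; hB[i] = 2.0f; }

    float *dA, *dB, *dC;
    cudaMalloc(&dA, bytes); cudaMalloc(&dB, bytes); cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    dim3 threads(16, 16);                           // each block computes a 16x16 tile of C
    dim3 blocks((N + 15) / 16, (N + 15) / 16);
    matMul<<<blocks, threads>>>(dA, dB, dC, N);
    cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);

    printf("C[0] = %f (expect %f)\n", hC[0], 2.0f * N);  // N products of 1*2
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    delete[] hA; delete[] hB; delete[] hC;
}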