LP 5 Oral Answers: High Performance Computing (HPC)
Q. What is BFS? --> Answer: Breadth-First Search explores a graph level by level. It visits all neighbors
of a node before moving to the next level of neighbors. BFS uses a queue to keep track of the nodes
to visit.
Q. Example of BFS --> Answer: Finding the shortest path in an unweighted graph is a good example
of BFS. It systematically expands outwards from the starting node, guaranteeing the first path found
to a destination is the shortest in terms of edges.
Q. Concept of OpenMP --> Answer: OpenMP is an API for shared-memory parallel programming. It
allows you to parallelize code using compiler directives, making it run faster on multi-core processors
by dividing work among threads.
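A minimal sketch of this idea, assuming g++ with the -fopenmp flag (the array name and size are illustrative):

#include <omp.h>
#include <cstdio>

int main() {
    const int N = 1000000;
    double *a = new double[N];
    // The iterations of this loop are divided among the available threads.
    #pragma omp parallel for
    for (int i = 0; i < N; ++i)
        a[i] = 2.0 * i;
    printf("a[42] = %f\n", a[42]);
    delete[] a;
    return 0;
}

Compile with: g++ -fopenmp example.cpp -o example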
Q. How Does Parallel BFS Work? --> Answer: Parallel BFS explores different parts of the graph concurrently.
Multiple threads can process nodes at the current level, exploring their neighbors in parallel and
adding them to the next level's frontier.
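A minimal level-synchronous sketch in C++ with OpenMP; the adjacency list adj, the sample graph, and the critical-section update of the next frontier are illustrative simplifications (real implementations typically use atomics or per-thread buffers):

#include <omp.h>
#include <vector>
#include <cstdio>

// Threads expand the current frontier concurrently; the next frontier
// is built under a critical section for simplicity.
void parallel_bfs(const std::vector<std::vector<int>>& adj, int start) {
    std::vector<char> visited(adj.size(), 0);
    std::vector<int> frontier{start};
    visited[start] = 1;
    while (!frontier.empty()) {
        std::vector<int> next;
        #pragma omp parallel for
        for (int i = 0; i < (int)frontier.size(); ++i) {
            printf("%d ", frontier[i]);          // process the node
            for (int v : adj[frontier[i]]) {
                #pragma omp critical
                if (!visited[v]) { visited[v] = 1; next.push_back(v); }
            }
        }
        frontier.swap(next);                     // advance to the next level
    }
}

int main() {
    std::vector<std::vector<int>> adj = {{1, 2}, {0, 3}, {0, 3}, {1, 2}};
    parallel_bfs(adj, 0);
}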
Q. Simple Algorithm: BFS -->
Create an empty queue.
Add the starting node to the queue.
Create a set or array to keep track of visited nodes.
Mark the starting node as visited.
While the queue is not empty:
    Dequeue a node from the front of the queue.
    Process the dequeued node (e.g., print it).
    For each neighbor of the dequeued node:
        If the neighbor has not been visited:
            Mark the neighbor as visited.
            Enqueue the neighbor.
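A direct C++ translation of this pseudocode; the adjacency-list graph in main is a small illustrative example:

#include <queue>
#include <vector>
#include <cstdio>

void bfs(const std::vector<std::vector<int>>& adj, int start) {
    std::vector<bool> visited(adj.size(), false);
    std::queue<int> q;
    q.push(start);                       // add the starting node to the queue
    visited[start] = true;               // mark it visited
    while (!q.empty()) {
        int u = q.front(); q.pop();      // dequeue a node
        printf("%d ", u);                // process it (print)
        for (int v : adj[u])             // for each neighbor
            if (!visited[v]) {
                visited[v] = true;       // mark visited
                q.push(v);               // enqueue
            }
    }
}

int main() {
    std::vector<std::vector<int>> adj = {{1, 2}, {0, 3}, {0, 3}, {1, 2}};
    bfs(adj, 0);                         // prints 0 1 2 3
}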
Q. What is DFS? --> Answer: Depth-First Search explores a graph by going as deep as possible along
each branch. It backtracks when it reaches a dead end or a visited node to explore other branches.
DFS often uses a stack or recursion.
Q. Example of DFS --> Answer: Finding a path between two nodes or detecting cycles in a graph are
common applications of DFS. It explores one path fully before trying another.
Q. Concept of OpenMP --> Answer: OpenMP is an API for shared-memory parallel programming. It
enables parallel execution of code on multi-core systems through directives and library routines,
improving performance for computationally intensive tasks.
Q. Algorithm: DFS -->
Create an empty stack.
Push the starting node onto the stack.
Create a set or array to keep track of visited nodes.
Mark the starting node as visited.
While the stack is not empty:
    Pop a node from the top of the stack.
    Process the popped node (e.g., print it).
    For each neighbor of the popped node:
        If the neighbor has not been visited:
            Mark the neighbor as visited.
            Push the neighbor onto the stack.
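A direct C++ translation of this stack-based pseudocode (same illustrative graph as the BFS example):

#include <stack>
#include <vector>
#include <cstdio>

void dfs(const std::vector<std::vector<int>>& adj, int start) {
    std::vector<bool> visited(adj.size(), false);
    std::stack<int> s;
    s.push(start);                       // push the starting node
    visited[start] = true;
    while (!s.empty()) {
        int u = s.top(); s.pop();        // pop a node
        printf("%d ", u);                // process it (print)
        for (int v : adj[u])             // for each neighbor
            if (!visited[v]) {
                visited[v] = true;
                s.push(v);               // push the unvisited neighbor
            }
    }
}

int main() {
    std::vector<std::vector<int>> adj = {{1, 2}, {0, 3}, {0, 3}, {1, 2}};
    dfs(adj, 0);                         // prints 0 2 3 1
}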
Q. How Does Parallel DFS Work? --> Answer: Parallel DFS can explore different branches of the search tree
simultaneously using multiple threads. Task parallelism in OpenMP can be used to create tasks for
exploring different recursive calls or subtrees concurrently.
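A minimal task-based sketch; the global visited array and the critical-section check are simplifications for clarity (production code would use atomic flags):

#include <omp.h>
#include <vector>
#include <cstdio>

std::vector<char> visited;               // shared across tasks

// Each unvisited neighbor is claimed and explored in a new OpenMP task.
void dfs_task(const std::vector<std::vector<int>>& adj, int u) {
    printf("%d\n", u);                   // process the node
    for (int v : adj[u]) {
        bool claimed = false;
        #pragma omp critical
        if (!visited[v]) { visited[v] = 1; claimed = true; }
        if (claimed) {
            #pragma omp task
            dfs_task(adj, v);            // explore this subtree concurrently
        }
    }
}

int main() {
    std::vector<std::vector<int>> adj = {{1, 2}, {0, 3}, {0, 3}, {1, 2}};
    visited.assign(adj.size(), 0);
    visited[0] = 1;
    #pragma omp parallel
    #pragma omp single                   // one thread seeds the task tree
    dfs_task(adj, 0);
}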
Q. What is the advantage of using parallel programming in DFS? --> Answer: Parallel DFS can
significantly reduce the execution time for large graphs by exploring different parts of the search
space at the same time. This can lead to faster discovery of solutions or more efficient graph
traversal.
Q. What is Bubble Sort? Use of Bubble Sort --> Answer: Bubble Sort is a simple sorting algorithm
that repeatedly steps through the list, compares adjacent elements, and swaps them if they are in
the wrong order. Its primary use is for educational purposes to illustrate basic sorting concepts, as it
is not efficient for large datasets.
Q. Example of Bubble Sort --> Answer: Consider the list [5, 1, 4, 2, 8]. In the first pass, 5 and 1 are
swapped to get [1, 5, 4, 2, 8], then 5 and 4 to get [1, 4, 5, 2, 8], then 5 and 2 to get [1, 4, 2, 5, 8];
finally, 5 and 8 are compared but not swapped. The largest element 'bubbles' to its correct position at
the end. This process repeats until the list is sorted.
Q. Concept of OpenMP --> Answer: OpenMP (Open Multi-Processing) is an API that supports shared-
memory parallel programming. It uses compiler directives, library routines, and environment
variables to enable parallel execution of code on multi-core processors, improving performance.
Q. How Does Parallel Bubble Sort Work? --> Answer: The comparison and swap of adjacent elements
are inherently sequential, and the dependencies between adjacent comparisons make efficient
parallelization of standard Bubble Sort challenging and often not worthwhile. The usual variation is
odd-even transposition sort, which alternates between odd-even and even-odd indexed pairs; the
comparisons within each phase are independent and can run in parallel, as in the sketch below.
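A sketch of the odd-even transposition variant; within each phase the compared pairs do not overlap, so the inner loop parallelizes safely:

#include <omp.h>
#include <algorithm>
#include <cstdio>

void odd_even_sort(int a[], int n) {
    for (int phase = 0; phase < n; ++phase) {
        int start = phase % 2;           // even phase: pairs (0,1),(2,3),...; odd phase: (1,2),(3,4),...
        #pragma omp parallel for
        for (int i = start; i + 1 < n; i += 2)
            if (a[i] > a[i + 1])
                std::swap(a[i], a[i + 1]);
    }
}

int main() {
    int a[] = {5, 1, 4, 2, 8};
    odd_even_sort(a, 5);
    for (int x : a) printf("%d ", x);    // 1 2 4 5 8
}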
Q. Bubble Sort Algorithm -->
1. Start from the beginning of the list.
2. Compare the first element with the second. If the first is larger, swap them.
3. Move to the next pair (second and third) and repeat the comparison and swap if needed.
4. Continue this process until the end of the list. The largest element will now be at the end.
5. Repeat steps 1-4, but stop one element earlier each time (since the last elements are already in
their correct sorted positions).
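These steps translate directly into C++:

#include <algorithm>
#include <cstdio>

void bubble_sort(int a[], int n) {
    for (int pass = 0; pass < n - 1; ++pass)        // repeat steps 1-4
        for (int i = 0; i < n - 1 - pass; ++i)      // stop one element earlier each pass
            if (a[i] > a[i + 1])
                std::swap(a[i], a[i + 1]);          // swap if out of order
}

int main() {
    int a[] = {5, 1, 4, 2, 8};
    bubble_sort(a, 5);
    for (int x : a) printf("%d ", x);               // 1 2 4 5 8
}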
Q. How to measure the performance of sequential and parallel algorithms? --> Answer:
Performance is typically measured by execution time. For parallel algorithms, we also consider
speedup (ratio of sequential execution time to parallel execution time) and efficiency (speedup
divided by the number of processors). Profiling tools can help identify bottlenecks.
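A sketch of timing with OpenMP's wall-clock timer (the workload is illustrative; in practice you would time the sequential and parallel versions of the same algorithm and divide):

#include <omp.h>
#include <cstdio>

int main() {
    const int N = 10000000;
    double *a = new double[N];
    double t0 = omp_get_wtime();
    #pragma omp parallel for
    for (int i = 0; i < N; ++i)
        a[i] = 0.5 * i;
    double t1 = omp_get_wtime();
    printf("parallel time: %f s\n", t1 - t0);
    // speedup    = sequential_time / parallel_time
    // efficiency = speedup / number_of_threads
    delete[] a;
}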
Merge Sort
Q. What is Merge Sort? Use of Merge Sort --> Answer: Merge Sort is an efficient, comparison-based
sorting algorithm that follows a divide-and-conquer approach. It recursively divides the list into
sublists until each sublist contains only one element, and then it repeatedly merges the sublists to
produce new sorted sublists until there is only one sorted list. It's widely used for its stability and
guaranteed time complexity.
Q. Example of Merge Sort --> Answer: For the list [5, 1, 4, 2, 8], it is divided into [5], [1], [4], [2], [8].
These are merged pairwise into [1, 5], [2, 4], [8]; merging again gives [1, 2, 4, 5] and [8], and a final
merge produces [1, 2, 4, 5, 8].
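A compact C++ version of this process (function names are illustrative):

#include <vector>
#include <cstdio>

// Merge the sorted halves a[lo..mid] and a[mid+1..hi] via a buffer.
void merge(std::vector<int>& a, int lo, int mid, int hi) {
    std::vector<int> tmp;
    int i = lo, j = mid + 1;
    while (i <= mid && j <= hi)
        tmp.push_back(a[i] <= a[j] ? a[i++] : a[j++]);
    while (i <= mid) tmp.push_back(a[i++]);
    while (j <= hi)  tmp.push_back(a[j++]);
    for (int k = lo; k <= hi; ++k) a[k] = tmp[k - lo];
}

void merge_sort(std::vector<int>& a, int lo, int hi) {
    if (lo >= hi) return;                // a single element is already sorted
    int mid = lo + (hi - lo) / 2;
    merge_sort(a, lo, mid);              // sort the left half
    merge_sort(a, mid + 1, hi);          // sort the right half
    merge(a, lo, mid, hi);               // merge the sorted halves
}

int main() {
    std::vector<int> a = {5, 1, 4, 2, 8};
    merge_sort(a, 0, (int)a.size() - 1);
    for (int x : a) printf("%d ", x);    // 1 2 4 5 8
}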
Q. Concept of OpenMP --> Answer: OpenMP (Open Multi-Processing) is an API for shared-memory
parallel programming. It uses compiler directives, library routines, and environment variables to
enable parallel execution of code on multi-core processors, improving performance.
Q. How Does Parallel Merge Sort Work? --> Answer: Parallel Merge Sort can parallelize both the divide and
merge steps. The initial division into subproblems can happen sequentially, but the recursive sorting
of these subproblems can be done in parallel. Similarly, the merging of sorted sublists can also be
parallelized by dividing the merge operation into independent tasks that can be performed by
different threads.
Q. How to measure the performance of sequential and parallel algorithms? --> Answer: Same as
under Bubble Sort above: measure execution time, and for parallel algorithms also speedup and
efficiency; profiling tools help identify bottlenecks.
Parallel Merge Sort
Q. What is parallel Merge Sort? --> Answer: Parallel Merge Sort is a variation of the Merge Sort
algorithm that utilizes parallel processing techniques to sort a list of elements faster than a
traditional sequential Merge Sort. It leverages multiple processors or cores to perform the sorting
operations concurrently.
Q. How does Parallel Merge Sort work? --> Answer: Parallel Merge Sort typically follows these steps:
1. Divide: The initial list is recursively divided into smaller sublists, similar to sequential Merge Sort.
2. Parallel Sort: The sorting of these smaller sublists is performed in parallel by multiple threads or
processors.
3. Parallel Merge: The sorted sublists are then merged back together in parallel to produce the final
sorted list. The merging step itself can be parallelized by dividing the merging tasks among multiple
threads.
Q. How do you implement Parallel MergeSort using OpenMP? --> Answer: In OpenMP, you can
parallelize Merge Sort by using directives like #pragma omp parallel and #pragma omp task. The
recursive calls to sort sub-arrays can be made into independent tasks that can be executed in
parallel. Similarly, the merging of sub-arrays can be parallelized by assigning different parts of the
merge operation to different threads. Synchronization might be needed to ensure correct merging.
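A minimal sketch of this approach; the serial cutoff of 1000 elements is an illustrative threshold, and std::inplace_merge stands in for a hand-written parallel merge:

#include <omp.h>
#include <algorithm>
#include <vector>
#include <cstdio>

// Sort the half-open range [lo, hi): the two recursive calls run as
// independent tasks; taskwait synchronizes before the merge.
void parallel_merge_sort(std::vector<int>& a, int lo, int hi) {
    if (hi - lo < 1000) {                          // small ranges: sort serially
        std::sort(a.begin() + lo, a.begin() + hi);
        return;
    }
    int mid = lo + (hi - lo) / 2;
    #pragma omp task shared(a)
    parallel_merge_sort(a, lo, mid);
    #pragma omp task shared(a)
    parallel_merge_sort(a, mid, hi);
    #pragma omp taskwait                           // wait for both halves
    std::inplace_merge(a.begin() + lo, a.begin() + mid, a.begin() + hi);
}

int main() {
    std::vector<int> a(1 << 20);
    for (int i = 0; i < (int)a.size(); ++i) a[i] = (int)a.size() - i;  // reverse order
    #pragma omp parallel
    #pragma omp single                             // one thread seeds the task tree
    parallel_merge_sort(a, 0, (int)a.size());
    printf("sorted: %d\n", (int)std::is_sorted(a.begin(), a.end()));   // 1
}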
Q. What are the advantages of Parallel MergeSort? --> Answer: The primary advantage is reduced
execution time, especially for large datasets, due to the concurrent processing. It can achieve better
scalability on multi-core systems compared to sequential Merge Sort.
Q. Difference between serial Mergesort and parallel Mergesort --> Answer: Serial Merge Sort
performs all sorting and merging steps sequentially using a single thread of execution. Parallel Merge
Sort, on the other hand, utilizes multiple threads or processors to perform the sorting and merging of
sublists concurrently, leading to faster execution times for large inputs.
Parallel Reduction with OpenMP
Q. What are the benefits of using parallel reduction for basic operations on large arrays? -->
Answer: Parallel reduction efficiently combines elements of a large array using an associative and
commutative operation (like sum, product, min, max) by dividing the array into smaller chunks
processed in parallel. This significantly reduces the computation time compared to a sequential
approach, especially for large arrays, by leveraging the power of multi-core processors.
Q. How does OpenMP's "reduction" clause work in parallel reduction? --> Answer: The
reduction(operator: variable) clause in OpenMP performs a safe and efficient parallel accumulation.
Each thread creates a private copy of the variable. After the parallel region, these private copies are
combined using the specified operator (e.g., +, *, min, max) to produce a single result, which is then
assigned to the original variable. OpenMP handles the necessary synchronization to avoid race
conditions.
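A minimal example of the clause (summing an array; the variable names are illustrative):

#include <omp.h>
#include <vector>
#include <cstdio>

int main() {
    std::vector<double> data(1000000, 1.0);
    double sum = 0.0;
    // Each thread accumulates into a private copy of 'sum';
    // OpenMP combines the copies with '+' after the loop.
    #pragma omp parallel for reduction(+ : sum)
    for (int i = 0; i < (int)data.size(); ++i)
        sum += data[i];
    printf("sum = %f\n", sum);           // 1000000.0
}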
Q. How do you set up a C++ program for parallel computation with OpenMP? --> Answer: To use
OpenMP in C++, you need to:
1. Include the <omp.h> header file.
2. Compile your code with a compiler that supports OpenMP (e.g., g++ with the -fopenmp flag).
3. Use OpenMP directives (pragmas) in your code to specify parallel regions, work-sharing constructs
(like for, sections), and synchronization mechanisms.
Q. What are the performance characteristics of parallel reduction, and how do they vary based on
input size? --> Answer: Parallel reduction typically shows significant speedup over sequential
reduction, especially as the input size increases. The performance is influenced by factors like the
number of available cores, the overhead of thread creation and management, and the specific
reduction operation. For very small input sizes, the overhead might outweigh the benefits of
parallelism. As the input size grows, the parallel execution time tends to decrease significantly
compared to the linear increase in sequential time, up to a point limited by Amdahl's Law and
communication overhead.
Q. How can you modify the provided code example for more complex operations using parallel
reduction? --> Answer: To modify for more complex operations, you would change the operator in
the reduction clause and potentially the logic within the parallel loop. For example, to find the
product, you'd use reduction(*: result). For custom operations, you might need to use a more
involved approach, potentially involving combining partial results calculated by each thread in a
specific way after the parallel region. OpenMP's reduction clause is designed for standard associative
and commutative operations.
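For operations beyond the built-in ones, OpenMP 4.0+ offers declare reduction; a sketch computing the maximum absolute value (the reduction name 'absmax' is illustrative):

#include <omp.h>
#include <algorithm>
#include <cstdlib>
#include <cstdio>

// Define how partial results combine (omp_out, omp_in) and how each
// thread's private copy is initialized (omp_priv).
#pragma omp declare reduction(absmax : int : omp_out = std::max(omp_out, omp_in)) \
    initializer(omp_priv = 0)

int main() {
    int data[] = {3, -9, 1, -7, 5};
    int m = 0;
    #pragma omp parallel for reduction(absmax : m)
    for (int i = 0; i < 5; ++i)
        m = std::max(m, std::abs(data[i]));
    printf("max |x| = %d\n", m);         // 9
}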
CUDA
Q. What is CUDA --> Answer: CUDA (Compute Unified Device Architecture) is a parallel computing
platform and programming model developed by NVIDIA. It enables the use of NVIDIA GPUs (Graphics
Processing Units) for general-purpose parallel computation, significantly accelerating
computationally intensive tasks.
Q. Addition of Two Large Vectors --> Answer: In CUDA, adding two large vectors involves launching a
kernel function that is executed by many lightweight threads on the GPU. Each thread is responsible
for adding corresponding elements from the two input vectors and storing the result in an output
vector. The data is typically transferred from the CPU's main memory to the GPU's global memory
before the kernel execution and back to the CPU after the computation.
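A minimal CUDA sketch of this pattern (names and sizes are illustrative; error checking omitted for brevity):

#include <cstdio>
#include <cuda_runtime.h>

// Each thread adds one pair of elements; its index comes from the
// block and thread IDs.
__global__ void vecAdd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int N = 1 << 20;
    size_t bytes = N * sizeof(float);
    float *ha = new float[N], *hb = new float[N], *hc = new float[N];
    for (int i = 0; i < N; ++i) { ha[i] = (float)i; hb[i] = 2.0f * i; }

    float *da, *db, *dc;
    cudaMalloc(&da, bytes); cudaMalloc(&db, bytes); cudaMalloc(&dc, bytes);
    cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);   // host -> device
    cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (N + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(da, db, dc, N);          // launch the kernel
    cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);   // device -> host

    printf("c[100] = %f\n", hc[100]);                    // 300.0
    cudaFree(da); cudaFree(db); cudaFree(dc);
    delete[] ha; delete[] hb; delete[] hc;
}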
Q. How to run CUDA code --> Answer: To run CUDA code:
1. Ensure you have an NVIDIA GPU with CUDA drivers installed.
2. Write your CUDA code in a .cu file.
3. Compile the code using nvcc <your_code>.cu -o <executable_name>.
4. Execute the compiled executable from the command line: ./<executable_name>.
Matrix Multiplication
Q. Matrix Multiplication --> Answer: Matrix multiplication involves computing the product of two
matrices. If matrix A has dimensions m x k and matrix B has dimensions k x n, their product C has
dimensions m x n, where each element C_ij is the dot product of the i-th row of A and the j-th
column of B: C_ij = sum over l = 1..k of A_il * B_lj.
Q. Execution of CUDA Environment --> Answer: Executing CUDA code involves:
1. Writing the code in C/C++ with CUDA extensions (.cu files).
2. Compiling the code using the NVIDIA CUDA Compiler (nvcc), which separates the host (CPU) code
and the device (GPU) code.
3. The host code manages the device, including allocating memory on the GPU, transferring data
between host and device, launching kernels on the GPU, and synchronizing operations.
4. The compiled device code (the kernel) is executed in parallel across the GPU's processing cores.
For matrix multiplication, different blocks of threads on the GPU can be assigned to compute
different sub-blocks of the output matrix in parallel.
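A minimal CUDA sketch where each thread computes one element of C (square N x N matrices; sizes are illustrative and error checking is omitted):

#include <cstdio>
#include <cuda_runtime.h>

// Thread (row, col) computes C[row][col] as the dot product of a row
// of A and a column of B.
__global__ void matMul(const float* A, const float* B, float* C, int N) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < N && col < N) {
        float sum = 0.0f;
        for (int l = 0; l < N; ++l)
            sum += A[row * N + l] * B[l * N + col];
        C[row * N + col] = sum;
    }
}

int main() {
    const int N = 512;
    size_t bytes = (size_t)N * N * sizeof(float);
    float *hA = new float[N * N], *hB = new float[N * N], *hC = new float[N * N];
    for (int i = 0; i < N * N; ++i) { hA[i] = 1.0f; hB[i] = 2.0f; }

    float *dA, *dB, *dC;
    cudaMalloc(&dA, bytes); cudaMalloc(&dB, bytes); cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    dim3 threads(16, 16);                           // each block computes a 16x16 tile of C
    dim3 blocks((N + 15) / 16, (N + 15) / 16);
    matMul<<<blocks, threads>>>(dA, dB, dC, N);
    cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);

    printf("C[0] = %f (expect %f)\n", hC[0], 2.0f * N);  // N products of 1*2
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    delete[] hA; delete[] hB; delete[] hC;
}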