Unit 3 HPC


OpenMP in HPC

OpenMP (Open Multi-Processing) is a widely adopted API for parallel programming in shared-memory systems, playing a crucial
role in High-Performance Computing (HPC). It enables developers to parallelize applications efficiently, leveraging multi-core processors to
enhance performance.

🧠 OpenMP in High-Performance Computing (HPC)

In HPC, OpenMP is primarily utilized for parallelizing tasks within a single node, harnessing the power of multiple CPU cores. This is
particularly beneficial for applications that require intensive computation and can be divided into smaller, independent tasks.

Key Features:

 Shared Memory Model: Threads within a process share the same memory space, facilitating efficient data sharing and
communication.
 Compiler Directives: OpenMP uses compiler directives (e.g., #pragma omp parallel) to specify parallel regions in the code,
allowing for straightforward parallelization.
 Thread Management: It provides constructs for thread creation, synchronization, and management, simplifying the development of
parallel applications.
 Scalability: OpenMP allows applications to scale across multiple cores within a node, improving performance for parallel workloads.

⚙️ Hybrid Parallelism: Combining OpenMP with MPI

While OpenMP excels in shared-memory environments, many HPC applications require distributed-memory systems. To address this, a hybrid
parallelism model combining OpenMP with MPI (Message Passing Interface) is commonly employed.

Hybrid Model Overview:

 MPI: Manages parallelism across multiple nodes in a cluster, handling inter-process communication.
 OpenMP: Handles parallelism within each node, utilizing multiple cores to perform computations concurrently.

This hybrid approach allows applications to efficiently utilize both distributed and shared memory architectures, enhancing scalability and
performance. For instance, MPI manages communication between nodes, while OpenMP manages parallel computation within each node.
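
Below is a minimal sketch of the hybrid model, assuming an MPI installation and an OpenMP-capable compiler (built with something like mpicc -fopenmp); the output format is illustrative.

c
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv) {
    // MPI: one process per node, communicating across the cluster
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // OpenMP: a team of threads within each process/node
    #pragma omp parallel
    {
        printf("MPI rank %d, OpenMP thread %d\n", rank, omp_get_thread_num());
    }

    MPI_Finalize();
    return 0;
}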

Best Practices for OpenMP in HPC

To maximize the effectiveness of OpenMP in HPC applications, consider the following best practices:

 Efficient Thread Management: Use appropriate thread counts to match the hardware capabilities, avoiding oversubscription of
cores.
 Data Locality: Organize data to enhance cache performance and minimize memory latency.
 Synchronization Minimization: Reduce the use of synchronization constructs to avoid bottlenecks.
 Load Balancing: Ensure that work is evenly distributed among threads to prevent idle times.
 Profiling and Optimization: Regularly profile the application to identify performance bottlenecks and optimize accordingly.
By adhering to these practices, developers can enhance the performance and scalability of their HPC applications using OpenMP. A brief sketch illustrating two of these practices (matching the thread count to the hardware and using a schedule clause for load balancing) follows.
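
This is an illustrative sketch only, not a tuning recipe; the array size, chunk size, and per-iteration work are placeholders, and in practice the thread count is usually set via OMP_NUM_THREADS rather than in code.

c
#include <omp.h>
#include <stdio.h>

int main() {
    // Match the thread count to the available cores to avoid oversubscription
    omp_set_num_threads(omp_get_num_procs());

    double work[1000];
    // schedule(dynamic) helps balance the load when iteration costs vary
    #pragma omp parallel for schedule(dynamic, 16)
    for (int i = 0; i < 1000; i++) {
        work[i] = i * 0.5;   // placeholder for uneven per-iteration work
    }

    printf("first = %f, last = %f\n", work[0], work[999]);
    return 0;
}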

Basics of OpenMP

OpenMP (Open Multi-Processing) is a widely used API for parallel programming in shared-memory systems, enabling developers to
write parallel code in C, C++, and Fortran. It simplifies the process of parallelizing applications by providing compiler directives, runtime
routines, and environment variables.

🧠 Core Concepts of OpenMP

1. Shared Memory Model

OpenMP is designed for shared-memory architectures, where multiple processors or cores can access a common memory space. This model allows threads to communicate by reading and writing to shared variables, facilitating efficient data sharing.

2. Thread-Based Parallelism

OpenMP utilizes threads as the basic unit of execution. The master thread forks additional threads to execute parallel regions of code, and these
threads synchronize upon completion.

3. Fork-Join Execution Model

OpenMP programs begin with a single master thread. When a parallel region is encountered, the master thread forks a team of threads to execute the enclosed code in parallel. After completing the parallel region, the threads synchronize and terminate, returning control to the master thread.

4. Compiler Directives

OpenMP uses compiler directives (e.g., #pragma omp parallel) to specify parallel regions in the code. These directives are interpreted by
the compiler to generate parallel code.

Basic OpenMP Example

Here's a simple example demonstrating OpenMP in C:

c
#include <stdio.h>
#include <omp.h>

int main() {
    #pragma omp parallel
    {
        printf("Hello, World! from thread %d\n", omp_get_thread_num());
    }
    return 0;
}

To compile and run this program:


bash
gcc -fopenmp -o hello hello.c
./hello

This program will print "Hello, World!" from each thread, with each thread identifying itself by its thread number.

🔧 Key Features of OpenMP

 Parallel Regions: Sections of code that can be executed in parallel are enclosed within #pragma omp parallel directives.
 Work Sharing: Distributes loop iterations or blocks of code among threads using constructs like #pragma omp for.
 Synchronization: Ensures correct execution order using constructs like #pragma omp barrier and #pragma omp
critical.
 Data Environment: Manages data sharing attributes (e.g., shared, private, firstprivate) to control variable visibility
among threads (see the sketch after this list).
 Runtime Control: Environment variables like OMP_NUM_THREADS control the number of threads used during execution.
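
A minimal sketch of these data-sharing clauses; the variable names, the team size of 4, and the use of a critical section for the update are illustrative choices.

c
#include <omp.h>
#include <stdio.h>

int main() {
    int shared_total = 0;   // shared: one copy visible to all threads
    int seed = 42;          // firstprivate: each thread gets its own initialized copy

    #pragma omp parallel shared(shared_total) firstprivate(seed) num_threads(4)
    {
        // declared inside the region, so private to each thread
        int local = seed + omp_get_thread_num();
        #pragma omp critical
        shared_total += local;
    }

    printf("shared_total = %d\n", shared_total);
    return 0;
}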

Parallel Regions and Work-Sharing Constructs

OpenMP (Open Multi-Processing) is a widely used API for parallel programming in shared-memory systems. It enables developers to parallelize
applications efficiently by providing compiler directives, runtime routines, and environment variables.

🧠 Parallel Regions in OpenMP

A parallel region in OpenMP is a block of code that is executed by multiple threads in parallel. It is defined using the #pragma omp
parallel directive.

Syntax:

c
#pragma omp parallel
{
    // Code to be executed in parallel
}

When the program encounters a parallel region, it creates a team of threads. Each thread executes the code within the parallel region concurrently. Variables declared inside the parallel region are private to each thread, while variables declared outside are shared among all threads by default.

🔄 Work-Sharing Constructs in OpenMP

Work-sharing constructs divide the execution of a block of code among the threads in a team. These constructs do not launch new threads; they distribute the work among existing threads. There is no implied barrier upon entry to a work-sharing construct; however, there is an implied barrier at the end unless a nowait clause is specified, as illustrated in the sketch below.
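
A minimal sketch of the implied barrier and the nowait clause; the array sizes are arbitrary, and skipping the barrier is safe here only because the second loop does not read the first loop's results.

c
#include <omp.h>
#include <stdio.h>

int main() {
    int a[100], b[100];

    #pragma omp parallel
    {
        // nowait removes the implied barrier at the end of this loop,
        // so threads may start the next loop without waiting for the others
        #pragma omp for nowait
        for (int i = 0; i < 100; i++)
            a[i] = i;

        #pragma omp for
        for (int i = 0; i < 100; i++)
            b[i] = 2 * i;   // independent of a[], so no barrier is needed between the loops
    }

    printf("%d %d\n", a[99], b[99]);
    return 0;
}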

Types of Work-Sharing Constructs:

1. #pragma omp for / #pragma omp do:


o Distributes iterations of a loop across the threads in a team.
o Each thread executes a subset of the loop iterations.
o Suitable for loops with independent iterations.
o Example:

c
#pragma omp parallel for
for (int i = 0; i < N; i++) {
    // Loop body
}

2. #pragma omp sections:


o Divides a block of code into separate sections.
o Each section is executed by one thread.
o Useful for tasks that can be performed independently.
o Example:

c
#pragma omp parallel sections
{
    #pragma omp section
    {
        // Code for section 1
    }
    #pragma omp section
    {
        // Code for section 2
    }
}

3. #pragma omp single:


o Specifies that the enclosed code block is executed by only one thread.
o Other threads skip the block.
o Useful for initialization or tasks that should be done only once.
o Example:

c
#pragma omp parallel
{
    #pragma omp single
    {
        // Code to be executed by only one thread
    }
}

4. #pragma omp workshare:


o A Fortran-only construct that divides the execution of the enclosed code (e.g., array assignments, FORALL and WHERE constructs) among the threads in a team.
o Each unit of work is executed only once, by one thread.
o There is an implicit barrier at the end unless the nowait clause is specified.
o Example:

fortran
!$omp parallel
  ! Code before workshare
  !$omp workshare
    ! Code to be distributed among threads
  !$omp end workshare
  ! Code after workshare
!$omp end parallel

Synchronization in OpenMP: Critical Sections and Barriers

Critical Sections

The critical construct ensures that a specific section of code is executed by only one thread at a time, preventing race conditions when
multiple threads access shared resources.

Syntax:

 C/C++:

c
#pragma omp critical [(name)]
{
    // Code to be executed by one thread at a time
}

 Fortran:

fortran
!$OMP CRITICAL [name]
  ! Code to be executed by one thread at a time
!$OMP END CRITICAL

If the optional name is omitted, all unnamed critical sections share a single global lock. Giving critical sections different names gives them separate locks, so unrelated critical sections no longer block one another, reducing contention. A sketch with two independently named critical sections follows the basic example below.

Example:

c
#include <omp.h>
#include <stdio.h>

int main() {
    int x = 0;
    #pragma omp parallel
    {
        #pragma omp critical
        {
            x = x + 1;
        }
    }
    printf("Final value of x: %d\n", x);
    return 0;
}

In this example, the increment of x is protected by the critical directive, ensuring that only one thread modifies x at a time.
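
A minimal sketch of named critical sections, assuming two unrelated shared counters; the names hit_lock and miss_lock are illustrative.

c
#include <omp.h>
#include <stdio.h>

int main() {
    int hits = 0, misses = 0;   // two unrelated shared counters

    #pragma omp parallel for
    for (int i = 0; i < 1000; i++) {
        if (i % 2 == 0) {
            // protected by its own named lock, so it never blocks miss_lock
            #pragma omp critical(hit_lock)
            hits++;
        } else {
            #pragma omp critical(miss_lock)
            misses++;
        }
    }

    printf("hits = %d, misses = %d\n", hits, misses);
    return 0;
}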

⛔ Barriers

The barrier construct synchronizes all threads in a team. When a thread encounters a barrier, it waits until all other threads have reached the same point before proceeding.

Syntax:

 C/C++:

c
#pragma omp barrier

 Fortran:

fortran
!$OMP BARRIER

Example:

c
#include <omp.h>
#include <stdio.h>

int main() {
    #pragma omp parallel
    {
        printf("Thread %d before barrier\n", omp_get_thread_num());
        #pragma omp barrier
        printf("Thread %d after barrier\n", omp_get_thread_num());
    }
    return 0;
}

In this example, all threads print a message before and after the barrier. The barrier ensures that all threads reach the same point before any thread
proceeds beyond it.

Threading, Synchronization, and Critical Sections in OpenMP

Threading in OpenMP

OpenMP employs a fork-join model for parallel execution:

1. Fork: The master thread spawns a team of threads to execute a parallel region.
2. Join: Upon completion, threads synchronize and terminate, returning control to the master thread.

Threads are identified by unique IDs, accessible via omp_get_thread_num(). The number of threads can be controlled using the OMP_NUM_THREADS environment variable or the num_threads clause of the #pragma omp parallel directive, as in the sketch below.
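
A minimal sketch of controlling and inspecting the team size; the request for four threads is illustrative (the runtime may grant fewer).

c
#include <omp.h>
#include <stdio.h>

int main() {
    // Request four threads for this region via the num_threads clause;
    // OMP_NUM_THREADS applies to regions that do not specify a count.
    #pragma omp parallel num_threads(4)
    {
        int id = omp_get_thread_num();     // unique ID within the team
        int n  = omp_get_num_threads();    // actual team size
        printf("Thread %d of %d\n", id, n);
    }
    return 0;
}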

🔐 Synchronization Constructs

Synchronization ensures correct execution order and data consistency among threads. Key constructs include:
1. Critical Sections

The critical construct restricts access to a block of code, allowing only one thread to execute it at a time. This prevents race conditions when
multiple threads access shared resources.

Syntax:

 C/C++:

c
#pragma omp critical
{
    // Code to be executed by one thread at a time
}

 Fortran:

fortran
!$OMP CRITICAL
  ! Code to be executed by one thread at a time
!$OMP END CRITICAL

Example:

c
#include <omp.h>
#include <stdio.h>

int main() {
    int x = 0;
    #pragma omp parallel
    {
        #pragma omp critical
        {
            x = x + 1;
        }
    }
    printf("Final value of x: %d\n", x);
    return 0;
}

In this example, the increment of x is protected by the critical directive, ensuring that only one thread modifies x at a time.

2. Barriers

The barrier construct synchronizes all threads in a team. When a thread encounters a barrier, it waits until all other threads have reached the
same point before proceeding.

Syntax:

 C/C++:

c
#pragma omp barrier

 Fortran:

fortran
!$OMP BARRIER

Example:

c
#include <omp.h>
#include <stdio.h>

int main() {
    #pragma omp parallel
    {
        printf("Thread %d before barrier\n", omp_get_thread_num());
        #pragma omp barrier
        printf("Thread %d after barrier\n", omp_get_thread_num());
    }
    return 0;
}

In this example, all threads print a message before and after the barrier. The barrier ensures that all threads reach the same point before any thread
proceeds beyond it.

Parallel Loops and Work Sharing in OpenMP

Parallel Loops in OpenMP

OpenMP provides the #pragma omp parallel for directive to parallelize loops, enabling concurrent execution of loop iterations by
multiple threads.

Syntax:

c
#pragma omp parallel for [clause[ [,] clause] ...]
for (initialization; condition; increment) {
    // Loop body
}

This directive combines the parallel and for constructs, creating a team of threads that divide the loop iterations among themselves. Each
thread executes a subset of iterations, enhancing performance through parallelism.

Example:

c
#include <omp.h>
#include <stdio.h>

int main() {
    int sum = 0;
    int a[100];
    for (int i = 0; i < 100; i++) {
        a[i] = i + 1;
    }

    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < 100; i++) {
        sum += a[i];
    }

    printf("Total sum: %d\n", sum);
    return 0;
}

In this example, the reduction clause ensures that each thread maintains a private copy of sum; these copies are then combined at the end to produce the final result.

🔄 Work-Sharing Constructs

Work-sharing constructs in OpenMP allow for the division of work among threads without creating new threads. These constructs are used within
a parallel region to distribute tasks among the existing threads.

1. #pragma omp for / #pragma omp do


Distributes loop iterations across the threads in the team. Each thread executes a subset of iterations.

2. #pragma omp sections / #pragma omp section

Divides code into separate sections, each of which is executed by one thread.

3. #pragma omp single

Specifies that a block of code should be executed by only one thread.

4. #pragma omp workshare

Distributes the execution of a block of Fortran code (e.g., array assignments) among the threads in a team; this construct is available only in Fortran.

5. #pragma omp parallel for

A combination of parallel and for constructs, creating a team of threads and distributing loop iterations among them.

Loop-Level Parallelism in HPC

What Is Loop-Level Parallelism?

Loop-level parallelism entails executing multiple iterations of a loop simultaneously, leveraging multiple threads or processing units. This is
feasible when iterations are independent, meaning they do not share data that could lead to race conditions. By parallelizing loops, computational
tasks can be completed more quickly, making this method ideal for performance-critical applications.

🧵 Implementing Loop-Level Parallelism in HPC

In HPC, loop-level parallelism is typically achieved using parallel programming models like OpenMP, MPI, or hybrid approaches. These models
allow for the distribution of loop iterations across multiple processing units, enhancing computational efficiency.

Example with OpenMP:

c
#include <omp.h>
#include <stdio.h>

int main() {
    int sum = 0;
    int a[100];
    for (int i = 0; i < 100; i++) {
        a[i] = i + 1;
    }

    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < 100; i++) {
        sum += a[i];
    }

    printf("Total sum: %d\n", sum);
    return 0;
}

In this example, the reduction clause ensures that each thread maintains a private copy of the sum variable; these copies are then combined at the end to produce the final result.

⚙️ Types of Loop-Level Parallelism

Loop-level parallelism can be categorized based on how iterations are distributed and dependencies are managed:

1. DO-ALL Parallelism (Independent Multithreading): Each loop iteration is independent, allowing all iterations to be executed in
parallel without inter-thread communication. This is the simplest form of parallelism and is highly efficient when applicable.
2. DO-ACROSS Parallelism (Cyclic Multithreading): Iterations are assigned to threads in a round-robin manner. Dependencies between iterations are managed by delaying the start of each iteration until all dependencies from previous iterations are satisfied. This approach increases parallelism by overlapping the sequential portion of iterations with parallel execution (a sketch using OpenMP's ordered construct appears after this list).
3. DO-PIPE Parallelism (Pipelined Multithreading): The loop body is divided into stages, each assigned to a different thread. Each
iteration of the loop is distributed across all threads, with each thread executing its assigned stage. This method is effective for loops
with cross-iteration dependencies, allowing for parallel execution while maintaining data flow integrity.
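
A minimal sketch of handling a loop-carried dependency with OpenMP's ordered construct; this only approximates DO-ACROSS-style execution, and the running-sum computation is an illustrative stand-in for real dependent work.

c
#include <omp.h>
#include <stdio.h>

int main() {
    int prefix[16];
    int running = 0;

    // The running sum depends on the previous iteration, so only the dependent
    // statements are serialized with "ordered"; the independent work above
    // them can still execute in parallel.
    #pragma omp parallel for ordered
    for (int i = 0; i < 16; i++) {
        int value = i * i;          // independent work, runs in parallel
        #pragma omp ordered
        {
            running += value;       // dependent work, executed in iteration order
            prefix[i] = running;
        }
    }

    printf("prefix[15] = %d\n", prefix[15]);
    return 0;
}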

⚠️ Challenges and Considerations

 Data Dependencies: Loops with dependencies between iterations (e.g., one iteration's output is another's input) cannot be trivially
parallelized. Identifying and managing these dependencies is crucial to avoid race conditions and ensure correct execution.
 Synchronization Overhead: Introducing parallelism often requires synchronization mechanisms to manage shared resources, which
can introduce overhead and reduce performance gains.
 Load Balancing: Uneven distribution of iterations among threads can lead to some threads being idle while others are overloaded,
affecting overall performance. Proper scheduling strategies are needed to balance the workload effectively.
