Introduction to OpenMP
Sandeep Agrawal
C-DAC Pune
Parallelism
(Figure: pit-stop crew working in parallel. Source: https://en.wikipedia.org/wiki/Pit_stop)
Contents
General concepts
What is OpenMP
OpenMP Programming and Execution Model
OpenMP constructs
Data Locality
Granularity of Parallelization
Domain Decomposition
Advantages and Disadvantages of OpenMP
References
Basic System Architecture
(Figure: a single-core processor and a multi-core processor, each attached to memory)
Sequential Program Execution
When you run a sequential program on a multi-core processor:
• Instructions are executed serially on a single core
• The other cores are idle
Waste of available resources… We want all cores to be used to execute the program.
HOW?
(Figure: multi-core processor attached to memory, with only one core in use)
Process and Thread
• An executing instance of a program is called a process
• Process has its independent memory space
• A thread is a subset of the process – also called lightweight process allowing faster
context switching
• Threads share memory space within process’s memory
• Threads may have some (usually small) private data
• A thread is an independent instruction stream, thus allowing concurrent operation
• In OpenMP one usually wants no more than one thread per core
Shared Memory Model
Multiple threads operate independently but share the same memory resources
Data does not have to be explicitly distributed among threads
Changes made to a memory location by one thread are visible to all other threads
Communication is implicit
Synchronization is explicit
(Figure: multiple threads accessing a common memory)
Open Multi-Processing
(OpenMP)
OpenMP Introduction
Open Specification for Multi Processing
Provides multi-threaded parallelism
It is a specification for
o Directives
o Runtime Library Routines
o Environment Variables
OpenMP is an Application Program Interface (API) for writing multi-threaded, shared-memory parallel programs.
It is easy to create multi-threaded programs in C, C++ and Fortran.
Why Choose OpenMP ?
Portable
o Standardized for shared memory architectures
Simple and Quick
o Relatively easy to parallelize small parts of an application at a time
o Incremental parallelization
o Supports both fine grained and coarse grained parallelism
Compact API
o Simple and limited set of directives
o Parallelization is not automatic – the programmer inserts directives
OpenMP Consortia and Release History
https://www.openmp.org/
OpenMP Architecture Review Board (ARB) members come from academic, research and industrial organizations such as:
AMD, ARM, CRAY, IBM, Fujitsu, NEC, Intel, Red Hat …
ANL, LLNL, LBNL, ORNL, RWTH Aachen University, NASA …

OpenMP compilers for C/C++/Fortran: GNU, Intel, PGI, LLVM/Clang, IBM, Absoft …
From GCC 4.9.1, OpenMP 4.0 is fully supported for C/C++/Fortran
From GCC 6.1, OpenMP 4.5 is fully supported for C and C++
From GCC 7.1, OpenMP 4.5 is partially supported for Fortran
From GCC 9.1, OpenMP 5.0 is partially supported for C and C++

Release history:
Version       Year
Fortran 1.0   1997
C/C++ 1.0     1998
Fortran 1.1   1999
Fortran 2.0   2000
C/C++ 2.0     2002
OpenMP 2.5    2005
OpenMP 3.0    2008
OpenMP 3.1    2011
OpenMP 4.0    2013
OpenMP 4.5    2015
OpenMP 5.0    2018
Execution Model
An OpenMP program starts single-threaded
To create additional threads, the user starts a parallel region
additional threads are launched to form a team
the original (master) thread is part of the team
threads “go away” at the end of the parallel region
Repeat parallel regions as necessary
Fork-join model
OpenMP Basic Syntax
Header file: #include <omp.h>

C construct syntax:
#pragma omp construct [clauses ...]
{
   ...
}

Parallel regions:
main(..)
{
   #pragma omp parallel
   {
      ......
   }

   // .. do some work here

   #pragma omp parallel
   {
      ......
   }   // end of parallel region/block
}
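A minimal, complete version of the skeleton above; the file name hello.c is only an example:

#include <stdio.h>
#include <omp.h>

int main(void)
{
    #pragma omp parallel
    {
        printf("Hello from thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
    }   // the team of threads joins here
    return 0;
}

Compile and run (with GCC): gcc -fopenmp hello.c -o hello && ./hello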
Parallel Region
Fork a team of N threads {0.... N-1}
Without it, all code is executed sequentially
Parallel Directive
OpenMP directives are special comments (Fortran) or pragmas (C/C++) in the source code that specify parallelism
C/C++ compiler directives begin with the sentinel #pragma omp
Fortran compiler directives begin with one of the sentinels !$OMP, C$OMP, or *$OMP
use !$OMP for free-format F90

C/C++:
#pragma omp parallel
{
   work ...
}

Fortran:
!$OMP parallel
   work ...
!$OMP end parallel
How do Threads Interact ?
o Threads read and write shared variables
– hence communication is implicit
o Unintended sharing of data causes race conditions
– a race condition can lead to different outputs across different runs
o Use synchronization to protect against race conditions
o Synchronization is expensive
– change data storage attributes to minimize synchronization
and improve cache reuse
OpenMP Language Extensions
• Parallel control structures – govern the flow of control in the program (parallel directive)
• Work sharing – distributes work among threads (do/for, parallel do/for, section directives)
• Data handling – data scope of variables (shared, private clauses)
• Synchronization – coordinates thread execution (critical, barrier directives)
• Runtime functions and environment variables – control the runtime environment (omp_set_num_threads(), omp_get_thread_num(), OMP_NUM_THREADS, OMP_SCHEDULE)
OpenMP Constructs
Parallel region:   #pragma omp parallel
Data environment:  #pragma omp parallel shared/private (...)
Worksharing:       #pragma omp for
                   #pragma omp sections
Synchronization:   #pragma omp barrier
                   #pragma omp critical
Loop Constructs: Parallel for
In C/C++:
#pragma omp parallel for
for(i=0; i<n; i++)
{
a[i] = b[i] + c[i] ;
}
Scheduling of loop iterations
Schedule clause:
- specifies how loop iterations are divided among the team of threads
Supported scheduling types:
o static
o dynamic
o guided
o runtime

#pragma omp parallel for schedule(type, [chunk size])
for(i=0; i<n; i++)
{
   // ...some stuff
}
schedule Clause
schedule (static, [n])
• Each thread is assigned chunks of n iterations in “round robin” fashion, known as block-cyclic scheduling
• If n is not specified, the iterations are divided into chunks of approximately
CEILING(number_of_iterations / number_of_threads) iterations, one chunk per thread
• Deterministic
Example:
loop of length 16, with 3 threads, and chunk size of 2:
chunks of 2 consecutive iterations are handed out round robin, so thread 0 gets iterations 0–1, 6–7, 12–13; thread 1 gets 2–3, 8–9, 14–15; thread 2 gets 4–5, 10–11
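A small sketch that prints this mapping; the order of the output lines will vary from run to run, but the iteration-to-thread assignment is deterministic:

#include <stdio.h>
#include <omp.h>

int main(void)
{
    #pragma omp parallel for schedule(static, 2)
    for (int i = 0; i < 16; i++)
        printf("iteration %2d executed by thread %d\n", i, omp_get_thread_num());
    return 0;
}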
schedule Clause (cont…)
schedule(dynamic, [n])
o Iterations of the loop are divided into chunks containing n iterations each
o Default chunk size is 1
o Which iterations a thread picks up depends upon the relative speeds of thread execution
#pragma omp parallel for schedule (dynamic)
for(i=0; i<8; i++)
{
… (loop body)
}
schedule Clause (cont…)
schedule (guided, [n])
• If you specify n, that is the minimum chunk size that each thread should get
• The size of each successive chunk decreases
chunk size = max(num_of_iterations_remaining / (2 * num_of_threads), n)
- the exact formula may differ across compiler implementations
schedule (runtime)
Determine the scheduling type at run time by the OMP_SCHEDULE environment
variable
export OMP_SCHEDULE="static,4"
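A self-contained sketch of schedule(runtime); the scheduling type and chunk size are read from OMP_SCHEDULE when the program runs:

#include <stdio.h>

int main(void)
{
    int n = 16;
    double a[16];
    // e.g. export OMP_SCHEDULE="dynamic,2" before running
    #pragma omp parallel for schedule(runtime)
    for (int i = 0; i < n; i++)
        a[i] = 2.0 * i;
    printf("a[%d] = %f\n", n - 1, a[n - 1]);
    return 0;
}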
Data Scoping in OpenMP
#pragma omp parallel [data scope clauses ...]
o shared
o private
o firstprivate
o lastprivate
o default
shared Clause (Data Scope)
o Shared data among team of threads
o Each thread can modify shared variables
o Data corruption is possible when multiple threads attempt to update the same
memory location
o Data correctness is user’s responsibility
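A short sketch of safe use of a shared array: each thread writes only its own elements, so no two threads update the same location and no synchronization is needed:

#include <stdio.h>
#include <omp.h>

int main(void)
{
    int owner[8];                            // owner is shared by the team
    #pragma omp parallel for shared(owner)
    for (int i = 0; i < 8; i++)
        owner[i] = omp_get_thread_num();     // each element written by exactly one thread
    for (int i = 0; i < 8; i++)
        printf("owner[%d] = %d\n", i, owner[i]);
    return 0;
}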
private Clause (Data Scope)
The values of private data are undefined upon entry to and exit from the specific
construct.
Loop iteration variable is private by default
Example:
#pragma omp parallel for private(tid)
for(i=0; i<n; i++)
{
   tid = omp_get_thread_num();
   printf("My rank is %d\n", tid);
}
firstprivate Clause (Data Scope)
The clause combines the behavior of the private clause with automatic initialization of the variables in its list to their values just before the parallel region
Example:
int b=51, n=100;
printf("Before parallel loop: b=%d, n=%d\n", b, n);
#pragma omp parallel for private(i) firstprivate(b)
for(i=0; i<n; i++)
{
   a[i] = i + b;
}
lastprivate Clause (Data Scope)
Performs finalization of private variables: the value from the sequentially last loop iteration is copied back to the original variable after the loop
Each thread has its own copy during the loop
Example:
b=51; n=100;
printf("Before parallel loop: b=%d, n=%d\n", b, n);
#pragma omp parallel for private(i) firstprivate(b) lastprivate(a)
for(i=0; i<n; i++)
{
   a = i + b;
}
// After the parallel loop: a = 150 (value from the last iteration, i = 99)
default Clause (Data Scope)
o Defines the default data scope within the parallel region
o default (private | shared | none) – C/C++ allows only shared or none; Fortran also allows private
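A sketch of default(none), which forces every variable used inside the region to be scoped explicitly (the compiler reports an error if one is missed):

#include <stdio.h>
#include <omp.h>

int main(void)
{
    int n = 4, base = 10;
    #pragma omp parallel default(none) shared(n, base)
    {
        int tid = omp_get_thread_num();   // declared inside the region, so private
        printf("thread %d sees base=%d, n=%d\n", tid, base, n);
    }
    return 0;
}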
More clauses for parallel directive
#pragma omp parallel [clause, clause, ...]
o nowait
o if
o reduction
nowait Clause
#pragma omp for nowait
o By default there is an implicit barrier at the end of each worksharing construct (for, sections, single) inside a parallel region
o nowait allows threads that finish the construct earlier to proceed without waiting
o If specified, threads do not synchronize at the end of the parallel loop; the implicit barrier at the end of the enclosing parallel region still applies (see the sketch below)
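A sketch with two independent loops in one parallel region; nowait lets threads move on to the second loop without waiting for the whole team to finish the first (valid only because the second loop does not depend on the first):

#include <stdio.h>

int main(void)
{
    int n = 1000;
    static double a[1000], b[1000];
    #pragma omp parallel
    {
        #pragma omp for nowait      // no barrier after this loop
        for (int i = 0; i < n; i++)
            a[i] = 0.5 * i;

        #pragma omp for             // implicit barrier after this loop
        for (int i = 0; i < n; i++)
            b[i] = 2.0 * i;
    }
    printf("a[10]=%f b[10]=%f\n", a[10], b[10]);
    return 0;
}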
if Clause
#pragma omp parallel if (flag != 0)
{
// ...some stuff
}
if (integer expression)
o Determines if the region should be parallelized
o Useful when the amount of work is too small to benefit from parallelization
reduction Clause
o Performs a collective operation on variables according to the given operator
- built-in reduction operations such as +, *, -, max, min and logical operators
- users can also define their own reduction operations
o Makes the reduction variable private to each thread
- the private copy is initialized according to the reduction operator, e.g. 0 for addition
o Each thread performs the operation on its local copy
o Finally the local results are combined into a global result in the shared variable
#pragma omp parallel for reduction(+ : result)
for (i = 1; i <= N; i++)
{
result += i ;
}
Work sharing : Section Directive
One thread executes one section
Each section is executed exactly once, and different sections may be executed by different threads
#pragma omp parallel
#pragma omp sections
{
#pragma omp section
x_calculation();
#pragma omp section
y_calculation();
#pragma omp section
z_calculation();
}
Work sharing : Single Directive
Designated section is executed by single thread only.
#pragma omp single
{
// read value of “a” from file
}
#pragma omp for
for (i=0;i<N;i++)
b[i] = a;
Work sharing : Master
Similar to single, but code block will be executed by the master thread only
#pragma omp master
{
   // block of code, e.g. reading or writing data
}
Race condition
Problem: finding the largest element in a list of numbers

Max = 10;
#pragma omp parallel for
for (i=0; i<N; i++)
{
   if (a[i] > Max)
      Max = a[i];
}

Thread 0                            Thread 1
Read a[i], value = 12               Read a[i], value = 11
Read Max, value = 10                Read Max, value = 10
if (a[i] > Max)   (12 > 10)         if (a[i] > Max)   (11 > 10)
Max = a[i]  (i.e. 12)               Max = a[i]  (i.e. 11)

Depending on which thread writes last, Max ends up as 12 or 11.
Synchronization: Critical Section
Critical section restricts access to the enclosed code to only one thread at a
time
Max = 10;
#pragma omp parallel for
for (i=0; i<N; i++)
{
   .... other work ....
   #pragma omp critical
   {
      if (a[i] > Max)
         Max = a[i];
   }
   .... other work ....
}
Synchronization: Barrier Directive
Synchronizes all the threads in a team
int x=2;
#pragma omp parallel shared(x)
{
   int tid = omp_get_thread_num();
   if (tid == 0)
      x = 5;
   else
      printf("thread %d: x=%d\n", tid, x);   // some threads may still see x=2 here

   #pragma omp barrier                       // cache flush + thread synchronization

   printf("thread %d: x=%d\n", tid, x);      // all threads see x=5 here
}
Synchronization: Atomic Directive
o Mini critical section
o A specific memory location must be updated atomically
#pragma omp atomic
----- single update statement (e.g. x++ or x += expr) --
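A minimal sketch of atomic protecting one shared counter update:

#include <stdio.h>

int main(void)
{
    int n = 10000, count = 0;
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
    {
        if (i % 3 == 0)
        {
            #pragma omp atomic
            count++;            // single memory update, performed atomically
        }
    }
    printf("multiples of 3 below %d: %d\n", n, count);
    return 0;
}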
Some Runtime Library Routines
o Set number of threads for parallel region
omp_set_num_threads(integer)
o Get number of threads for parallel region
int omp_get_num_threads()
o Get thread ID / rank
omp_get_thread_num()
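A short sketch using these routines; note that omp_get_num_threads() returns 1 outside a parallel region, so it is called inside one here:

#include <stdio.h>
#include <omp.h>

int main(void)
{
    omp_set_num_threads(4);                    // request 4 threads for parallel regions
    #pragma omp parallel
    {
        int tid = omp_get_thread_num();        // this thread's rank, 0 .. team size - 1
        int nthreads = omp_get_num_threads();  // size of the current team
        printf("thread %d of %d\n", tid, nthreads);
    }
    return 0;
}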
Environment Variables
o To set number of threads during execution
export OMP_NUM_THREADS=4
o To allow run time system to determine the number of threads
export OMP_DYNAMIC=TRUE
o To allow nesting of parallel region
export OMP_NESTED=TRUE
Control the Number of Threads
In decreasing order of priority:
1. Parallel region clause: #pragma omp parallel num_threads(integer)
2. Runtime function: omp_set_num_threads(integer)
3. Environment variable: OMP_NUM_THREADS
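A sketch showing the num_threads clause overriding the runtime call (and OMP_NUM_THREADS) for one particular region:

#include <stdio.h>
#include <omp.h>

int main(void)
{
    omp_set_num_threads(8);                 // overrides OMP_NUM_THREADS ...
    #pragma omp parallel num_threads(2)     // ... but the clause wins for this region
    {
        #pragma omp single
        printf("team size = %d\n", omp_get_num_threads());
    }
    return 0;
}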
Data Locality
Uniform Memory Access (UMA) – all cores have equal access times to shared memory
Non-Uniform Memory Access (NUMA) – cores have higher access times to non-local shared memory
First-touch policy – data is placed in the memory local to the thread that first writes (“touches”) it, so initialize data with a parallel for loop rather than serially (see the sketch below):
int a[N];
#pragma omp parallel for
for loop to initialize the data
(Fig: NUMA system – each socket has its own local memory)
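A sketch of first touch: the loop that initializes the array and the loop that later uses it run with the same static schedule, so each thread mostly accesses pages that were placed in its local NUMA memory (actual placement depends on the operating system and runtime):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int n = 1000000;
    double *a = malloc(n * sizeof(double));

    // First touch: pages are placed near the threads that initialize them
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; i++)
        a[i] = 0.0;

    // Same schedule, so each thread reuses its locally placed pages
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; i++)
        a[i] += i;

    printf("a[n-1] = %f\n", a[n - 1]);
    free(a);
    return 0;
}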
CPU Pinning
The default thread placement policy depends upon the OpenMP implementation being used.
In the absence of a thread placement policy, threads may migrate across physical cores during execution
and therefore suffer data locality issues.
CPU pinning binds threads to cores.
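One common way to pin threads, assuming an OpenMP 4.0+ runtime that honours these variables (the exact placement is implementation defined):
export OMP_PLACES=cores       # one place per physical core
export OMP_PROC_BIND=close    # bind threads to places, packed close to the master thread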
Granularity of Parallelization
Coarse-grain parallelism vs. fine-grain parallelism

Fine grain – one parallel region per loop:
#pragma omp parallel for
for(i=0; i<n; i++)
{
   // work 1;
}

#pragma omp parallel for
for(i=0; i<n; i++)
{
   // work 2;
}

Coarse grain – one parallel region enclosing several worksharing loops:
#pragma omp parallel
{
   #pragma omp for
   for(i=0; i<n; i++)
   {
      // work 1;
   }

   #pragma omp for
   for(i=0; i<n; i++)
   {
      // work 2;
   }
}

Subroutines containing multiple independent DO/for loops are good candidates for coarse-grain parallelization
Domain Decomposition
1 domain → n threads → n sub-domains

#pragma omp parallel default(private) shared(N, nthreads, global)
{
   nthreads = omp_get_num_threads();
   iam      = omp_get_thread_num();
   ichunk   = N / nthreads;
   istart   = iam * ichunk;
   iend     = (iam + 1) * ichunk - 1;
   my_sum(istart, iend, local);    // each thread computes its partial sum into local

   #pragma omp atomic
   global = global + local;        // combine the partial results
}
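A complete sketch of the same idea in C; my_sum, istart and iend follow the slide, and the remainder iterations (when N is not divisible by the number of threads) are given to the last thread, a detail the slide omits:

#include <stdio.h>
#include <omp.h>

/* Sum the integers in [istart, iend] -- stands in for the slide's my_sum() */
static long my_sum(int istart, int iend)
{
    long s = 0;
    for (int i = istart; i <= iend; i++)
        s += i;
    return s;
}

int main(void)
{
    int N = 1000;
    long global = 0;

    #pragma omp parallel shared(N, global)
    {
        int nthreads = omp_get_num_threads();
        int iam      = omp_get_thread_num();
        int ichunk   = N / nthreads;
        int istart   = iam * ichunk;
        int iend     = (iam == nthreads - 1) ? N - 1 : (iam + 1) * ichunk - 1;

        long local = my_sum(istart, iend);   // work on this thread's sub-domain

        #pragma omp atomic
        global += local;                     // combine partial results
    }

    printf("sum 0..%d = %ld (expected %ld)\n", N - 1, global, (long)(N - 1) * N / 2);
    return 0;
}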
Some Tips
Identify loop-level parallelism: run the loop backwards and check whether the same results are produced
Load imbalance due to branching statements or sparse matrices: use schedule(dynamic)
For parallelization of less compute-intensive loops, use a small number of threads, e.g.
#pragma omp parallel num_threads(4)
Parallelize the initialization of input data – gives speedup and better data locality
Advantages and Disadvantages
Advantages
• Shared address space provides user-friendly programming
• Ease of programming
• Data sharing between threads is fast and uniform (low latency)
• Incremental parallelization of sequential code
• Leaves thread management to the compiler
• Directly supported by the compiler

Disadvantages
• Internal details are hidden
• Programmer is responsible for specifying synchronization, e.g. locks
• Cannot run across distributed memory
• Performance limited by the memory architecture
• Lack of scalability between memory and CPUs
• Requires a compiler which supports OpenMP
• Bigger shared-memory machines are heavy on budget
Executing OpenMP Program
Compilation:
gcc -fopenmp <program name> -o <executable>
gfortran -fopenmp <program name> -o <executable>
ifort <program name> -qopenmp -o <executable>
icc <program name> -qopenmp -o <executable>
Execution:
./<executable-name>
References
The contents of the presentation have been adapted from several sources.
Some of the sources are as following:
www.openmp.org/
https://computing.llnl.gov/tutorials/openMP/
http://wiki.scinethpc.ca/wiki/images/9/9b/Ds-openmp.pdf
http://openmp.org/sc13/OpenMP4.0_Intro_YonghongYan_SC13.pdf
A "Hands-on" Introduction to OpenMP (Part 1/2) | Tim Mattson, Intel
Introduction to Parallel Computing on Ranger, Steve Lantz, Cornell University
Thank You