Shared Memory Parallelism -
OpenMP
Sathish Vadhiyar
Credits/Sources:
OpenMP C/C++ standard (openmp.org)
OpenMP tutorial (http://www.llnl.gov/computing/tutorials/openMP/#Introduction)
OpenMP sc99 tutorial presentation (openmp.org)
Dr. Eric Strohmaier (University of Tennessee, CS594 class, Feb 9, 2000)
Introduction
A portable programming model and standard for
shared memory programming using compiler
directives
Directives: constructs or statements in the program
that apply some action to a block of code
A specification for a set of compiler directives, library
routines, and environment variables – standardizing
pragmas
Easy to program: a developer can convert a
sequential program into a parallel one by adding
directives
First version in 1997, development over the years till
the latest 4.5 in 2015
Fork-Join Model
Begins as a single thread
called master thread
Fork: When a parallel construct
is encountered, a team of
threads is created
Statements in the parallel
region are executed in parallel
Join: At the end of the parallel
region, the team threads
synchronize and terminate
OpenMP consists of…
Work-sharing constructs
Synchronization constructs
Data environment constructs
Library calls, environment variables
Introduction
Mainly supports loop-level parallelism
Specifies parallelism for a region of code: fine-
grained parallelism
The number of threads can be varied from one
region to another – dynamic parallelism
Speedup is limited by Amdahl’s law due to the
sequential portions of the code
Applications have varying phases of parallelism
Also supports
Coarse-level parallelism – sections and tasks
Executions on accelerators
SIMD vectorizations
task-core affinity
parallel construct
#pragma omp parallel [clause [, clause] …] new-line
structured-block
Clauses: e.g. if, num_threads, private, shared,
default, reduction
Parallel constructs can be nested to express
nested parallelism
Parallel construct - Example
#include <omp.h>
#include <stdio.h>
int main () {
  int nthreads, tid;
  #pragma omp parallel private(nthreads, tid)
  {
    tid = omp_get_thread_num();
    printf("Hello World from thread %d\n", tid);
  }
}
Work sharing construct
For distributing the execution among the threads
that encounter it
3 types of work sharing constructs – loops,
sections, single
for construct
For distributing the iterations among the threads
#pragma omp for [clause [, clause] …] new-line
for-loop
Clauses: schedule, private, firstprivate,
lastprivate, reduction, ordered, nowait
for construct
Restriction in the structure of the for
loop so that the compiler can
determine the number of iterations –
e.g. no branching out of loop
The assignment of iterations to
threads depends on the schedule
clause
Implicit barrier at the end of for if not
nowait
schedule clause
1. schedule(static, chunk_size) – iterations are
divided into chunks of chunk_size and
distributed to the threads in round-robin order
2. schedule(dynamic, chunk_size) – same
chunks, but each thread grabs the next chunk
dynamically as it finishes its current one
3. schedule(runtime) – decision deferred to run
time (e.g. via the OMP_SCHEDULE environment
variable); the default is implementation
dependent
for - Example
#include <omp.h>
#define CHUNKSIZE 100
#define N 1000
int main () {
  int i, chunk;
  float a[N], b[N], c[N];
  /* Some initializations */
  for (i=0; i < N; i++)
    a[i] = b[i] = i * 1.0;
  chunk = CHUNKSIZE;
  #pragma omp parallel shared(a,b,c,chunk) private(i)
  {
    #pragma omp for schedule(dynamic,chunk) nowait
    for (i=0; i < N; i++)
      c[i] = a[i] + b[i];
  } /* end of parallel region */
}
Coarse level parallelism – sections and
tasks
sections
tasks – dynamic mechanism
depend clause for task
Synchronization directives
flush directive
Point where consistent view of memory is
provided among the threads
Thread-visible variables (global variables,
shared variables etc.) are written to memory
If var-list is used, only variables in the list are
flushed
flush - Example
flush – Example (Contd…)
Data Scope Attribute Clauses
Most variables are shared by default
Data scopes explicitly specified by data scope attribute clauses
Clauses:
1. private
2. firstprivate
3. lastprivate
4. shared
5. default
6. reduction
7. copyin
8. copyprivate
threadprivate
• Variables in the global variable-list are made private to each thread
• Each thread gets its own copy
• Values persist between different parallel regions
#include <omp.h>
#include <stdio.h>
int alpha[10], beta[10], i;
#pragma omp threadprivate(alpha)
int main () {
  /* Explicitly turn off dynamic threads */
  omp_set_dynamic(0);

  /* First parallel region */
  #pragma omp parallel private(i,beta)
  for (i=0; i < 10; i++)
    alpha[i] = beta[i] = i;

  /* Second parallel region */
  #pragma omp parallel
  printf("alpha[3]= %d and beta[3]= %d\n", alpha[3], beta[3]);
}
private, firstprivate & lastprivate
private (variable-list)
variable-list private to each thread
A new object with automatic storage duration allocated for the
construct
firstprivate (variable-list)
The new object is initialized with the value of the old object that
existed prior to the construct
lastprivate (variable-list)
The value of the private object from the sequentially last iteration
(or lexically last section) is assigned to the original object
shared, default, reduction
shared(variable-list)
default(shared | none)
Specifies the sharing behavior of all of the variables visible in the
construct
reduction(op: variable-list)
A private copy of each variable is made for each thread
At the end of the region, the private copies are combined
with op into the original object
default - Example
Library Routines (API)
Querying function (number of threads etc.)
General purpose locking routines
Setting execution environment (dynamic
threads, nested parallelism etc.)
API
omp_set_num_threads(num_threads)
omp_get_num_threads()
omp_get_max_threads()
omp_get_thread_num()
omp_get_num_procs()
omp_in_parallel()
omp_set_dynamic(dynamic_threads)
omp_get_dynamic()
omp_set_nested(nested)
omp_get_nested()
API(Contd..)
omp_init_lock(omp_lock_t *lock)
omp_init_nest_lock(omp_nest_lock_t *lock)
omp_destroy_lock(omp_lock_t *lock)
omp_destroy_nest_lock(omp_nest_lock_t *lock)
omp_set_lock(omp_lock_t *lock)
omp_set_nest_lock(omp_nest_lock_t *lock)
omp_unset_lock(omp_lock_t *lock)
omp_unset_nest_lock(omp_nest_lock_t *lock)
omp_test_lock(omp_lock_t *lock)
omp_test_nest_lock(omp_nest_lock_t *lock)
omp_get_wtime()
omp_get_wtick()
omp_get_thread_num()
omp_get_num_procs()
omp_get_num_devices()
Lock details
Simple locks and nestable locks
A thread may not set a simple lock it already
owns (attempting to do so deadlocks)
Nestable locks can be set multiple times by the
owning thread; each set increments a nesting count
Simple locks are available if they are unlocked
Nestable locks are available if they are unlocked or
already owned by the calling thread
Example – Nested lock
Example – Nested lock (Contd..)
Example 1: Jacobi Solver
Example 2: BFS Version 1
(Nested Parallelism)
Example 3: BFS Version 3
(Using Task Construct)
Hybrid Programming – Combining MPI and
OpenMP benefits
MPI
- explicit parallelism; no shared-memory synchronization problems
- suitable for coarse grain
OpenMP
- easy to program, dynamic scheduling allowed
- only for shared memory; prone to data synchronization problems (races)
MPI/OpenMP Hybrid
- Can combine MPI data placement with OpenMP fine-grain
parallelism
- Suitable for cluster of SMPs (Clumps)
- Can implement hierarchical model
END
Definitions
Construct – statement containing directive and
structured block
Directive – Based on C #pragma directives
#pragma <omp id> <other text>
#pragma omp directive-name [clause [, clause] …]
new-line
Example:
#pragma omp parallel default(shared) private(beta,pi)
Parallel construct
Parallel region executed by multiple threads
If none of the num_threads clause, the
omp_set_num_threads() routine, or the
OMP_NUM_THREADS environment variable is
used, the number of created threads is
implementation dependent
Number of physical processors hosting the
thread also implementation dependent
Threads numbered from 0 to N-1
Nested parallelism by embedding one parallel
construct inside another