slides2-1
Chapter 2
Message-Passing Computing
Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M. Allen, 2004 Pearson Education Inc. All rights reserved.
slides2-2
Basics of Message-Passing Programming using
User-level Message Passing Libraries
Two primary mechanisms needed:
1. A method of creating separate processes for execution on
different computers
2. A method of sending and receiving messages
slides2-3
Multiple program, multiple data (MPMD) model
[Figure: separate source files, each compiled to suit its processor, producing executables that run on Processor 0 through Processor p - 1]
slides2-4
Single Program Multiple Data (SPMD) model
Different processes merged into one program. Within program,
control statements select different parts for each processor to
execute. All executables started together - static process creation.
[Figure: a single source file (the basic MPI way) compiled to suit each processor, producing executables that run on Processor 0 through Processor p - 1]
slides2-5
Multiple Program Multiple Data (MPMD) Model
Separate programs for each processor. Master-slave approach
usually taken. One processor executes master process. Other
processes started from within master process - dynamic process
creation.
[Figure: Process 1 executes spawn(), which starts execution of Process 2; time runs downward]
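As an illustration of dynamic process creation, PVM (covered later in this chapter) provides pvm_spawn(); a minimal hedged sketch of a master spawning two copies of a slave executable (the executable name "slave" is an assumption):

#include "pvm3.h"

int main()
{
    int tids[2];                  /* task IDs of the spawned slave processes */
    int numt;

    /* Spawn 2 copies of the executable "slave" anywhere in the virtual machine */
    numt = pvm_spawn("slave", (char **)0, PvmTaskDefault, "", 2, tids);

    /* ... exchange messages with the slaves ... */

    pvm_exit();                   /* leave the virtual machine */
    return 0;
}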
slides2-6
Basic point-to-point Send and Receive
Routines
Passing a message between processes using send() and recv()
library calls:
[Figure: Process 1 executes send(&x, 2); the data moves to Process 2, which executes recv(&y, 1)]
Generic syntax (actual formats later)
slides2-7
Synchronous Message Passing
Routines that return only when the message transfer has been completed.
Synchronous send routine
Waits until complete message can be accepted by the receiving
process before sending the message.
Synchronous receive routine
Waits until the message it is expecting arrives.
Synchronous routines intrinsically perform two actions: They
transfer data and they synchronize processes.
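As an MPI illustration, synchronous transfer corresponds to the synchronous send routine MPI_Ssend() paired with an ordinary receive; a minimal sketch (assumes MPI has been initialized and myrank obtained):

int x, msgtag = 1;
MPI_Status status;

if (myrank == 0)
    MPI_Ssend(&x, 1, MPI_INT, 1, msgtag, MPI_COMM_WORLD);     /* completes only when the matching receive has started */
else if (myrank == 1)
    MPI_Recv(&x, 1, MPI_INT, 0, msgtag, MPI_COMM_WORLD, &status);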
slides2-8
Synchronous send() and recv() library calls using 3-way protocol
[Figure: (a) When send() occurs before recv() - Process 1 issues a request to send and suspends; when Process 2 reaches recv(), an acknowledgment is returned, the message is transferred, and both processes continue. (b) When recv() occurs before send() - Process 2 suspends at recv(); when Process 1 reaches send(), the request to send is acknowledged, the message is transferred, and both processes continue.]
slides2-9
Asynchronous Message Passing
Routines that do not wait for actions to complete before returning.
Usually require local storage for messages.
More than one version depending upon the actual semantics for
returning.
In general, they do not synchronize processes but allow processes
to move forward sooner. Must be used with care.
slides2-10
MPI Definitions of Blocking and Non-Blocking
Blocking - return after their local actions complete, though the
message transfer may not have been completed.
Non-blocking - return immediately.
It is assumed that the data storage used for the transfer is not
modified by subsequent statements before it is used for the transfer,
and it is left to the programmer to ensure this.
These terms may have different interpretations in other systems.
slides2-11
How message-passing routines can return
before message transfer completed
Message buffer needed between source and destination to hold
message:
[Figure: Process 1 executes send() and continues; the message is held in a message buffer until Process 2 executes recv() and reads the message buffer]
slides2-12
Asynchronous (blocking) routines changing to
synchronous routines
Once local actions completed and message is safely on its way,
sending process can continue with subsequent work.
Buffers are only of finite length, and a point could be reached where
the send routine is held up because all available buffer space has
been exhausted. Then the send routine will wait until storage becomes
available again - i.e., the routine then behaves as a synchronous routine.
slides2-13
Message Tag
Used to differentiate between different types of messages being
sent.
Message tag is carried within message.
If special type matching is not required, a wild card message tag is
used, so that the recv() will match with any send().
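As an MPI illustration, the wild cards are MPI_ANY_TAG and (for the source) MPI_ANY_SOURCE; a hedged sketch:

int y;
MPI_Status status;

/* Accept a message with any tag from any source in the communicator */
MPI_Recv(&y, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);

/* The actual source and tag can then be read from the status object */
printf("message from %d with tag %d\n", status.MPI_SOURCE, status.MPI_TAG);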
slides2-14
Message Tag Example
To send a message, x, with message tag 5 from a source process,
1, to a destination process, 2, and assign to y:
[Figure: Process 1 executes send(&x,2,5); the data moves to Process 2, which executes recv(&y,1,5) and waits for a message from process 1 with a tag of 5]
slides2-15
Group message passing routines
Apart from point-to-point message-passing routines, there are
routines that send message(s) to a group of processes or receive
message(s) from a group of processes - higher efficiency than
separate point-to-point routines, although not absolutely necessary.
slides2-16
Broadcast
Sending same message to all processes concerned with problem.
Multicast - sending same message to defined group of processes.
[Figure: Process 0 (root) holds the data in buf; every process calls bcast() (the MPI form), after which the same data appears in buf on Process 0 through Process p - 1]
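A hedged MPI sketch of the broadcast above (every process, including the root, makes the same call; myrank assumed obtained as before):

int buf[4];                              /* same buffer on every process */

if (myrank == 0) {
    buf[0] = 1; buf[1] = 2; buf[2] = 3; buf[3] = 4;   /* only the root fills the buffer */
}
/* After the call, buf on every process holds the root's data */
MPI_Bcast(buf, 4, MPI_INT, 0, MPI_COMM_WORLD);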
slides2-17
Scatter
Sending each element of an array in root process to a separate
process. Contents of ith location of array sent to ith process.
[Figure: the root's buf on Process 0 is divided into parts; every process calls scatter() (the MPI form), and the ith part is delivered as data to Process i, up to Process p - 1]
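A hedged MPI sketch of the scatter above, assuming the root sends one integer to each process (MAX_PROCS is an assumed constant at least as large as p):

int sendbuf[MAX_PROCS];    /* significant only at the root */
int recvval;

/* Each process receives one element; the ith element goes to the process of rank i */
MPI_Scatter(sendbuf, 1, MPI_INT, &recvval, 1, MPI_INT, 0, MPI_COMM_WORLD);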
slides2-18
Gather
Having one process collect individual values from set of processes.
[Figure: each of Process 0 through Process p - 1 contributes data via gather() (the MPI form); the items are collected into buf on root Process 0]
slides2-19
Reduce
Gather operation combined with a specified arithmetic/logical operation.
Example
Values could be gathered and then added together by root:
[Figure: each of Process 0 through Process p - 1 contributes data via reduce() (the MPI form); the values are combined with + into buf on root Process 0]
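A hedged MPI sketch of the reduction above, summing one integer from each process into root process 0:

int myvalue = myrank;      /* illustrative local value on each process */
int sum;                   /* result, significant only at the root */

/* Combine the values from all processes with + and leave the result on process 0 */
MPI_Reduce(&myvalue, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);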
slides2-20
PVM (Parallel Virtual Machine)
Perhaps the first widely adopted attempt at using a workstation cluster
as a multicomputer platform, developed by Oak Ridge National
Laboratory. Available at no charge.
Programmer decomposes problem into separate programs (usually
a master program and a group of identical slave programs).
Each program compiled to execute on specific types of computers.
Set of computers used on a problem first must be defined prior to
executing the programs (in a hostfile).
slides2-21
Message routing between computers done by PVM daemon processes
installed by PVM on computers that form the virtual machine.
[Figure: several workstations, each running a PVM daemon and one or more application programs (executables); messages are sent between the computers through the network. Can have more than one process running on each computer.]
MPI implementation we use is similar.
slides2-22
MPI (Message Passing Interface)
Standard developed by group of academics and industrial partners
to foster more widespread use and portability.
Defines routines, not implementation.
Several free implementations exist.
slides2-23
MPI
Process Creation and Execution
Purposely not defined and will depend upon the implementation.
Only static process creation is supported in MPI version 1. All
processes must be defined prior to execution and started together.
Originally an SPMD model of computation.
MPMD also possible with static creation - each program to be
started together specified.
slides2-24
Communicators
Defines scope of a communication operation.
Processes have ranks associated with communicator.
Initially, all processes are enrolled in a universe called
MPI_COMM_WORLD, and each process is given a unique rank, a
number from 0 to p - 1, where there are p processes.
Other communicators can be established for groups of processes.
slides2-25
Using the SPMD Computational Model
int main(int argc, char *argv[])
{
    int myrank;

    MPI_Init(&argc, &argv);
    .
    .
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);   /* find process rank */
    if (myrank == 0)
        master();
    else
        slave();
    .
    .
    MPI_Finalize();
    return 0;
}
where master() and slave() are procedures to be executed by
master process and slave process, respectively.
slides2-26
Unsafe Message Passing
MPI specifically addresses unsafe message passing.
slides2-27
Unsafe message passing with libraries
[Figure: Process 0 and Process 1 each call a library routine lib() as well as their own send(,1,) and recv(,0,). (a) Intended behavior: the message from the user's send() reaches the user's recv() at the destination, and the message sent inside lib() reaches the recv() inside lib(). (b) Possible behavior: with matching based only on source and tag, the user's message may instead be received by the recv() inside lib(), and the library's message by the user's recv().]
slides2-28
MPI Solution
Communicators
A communication domain that defines a set of processes that are
allowed to communicate between themselves.
The communication domain of the library can be separated from
that of a user program.
Used in all point-to-point and collective MPI message-passing
communications.
slides2-29
Default Communicator
MPI_COMM_WORLD exists as the first communicator for all the
processes existing in the application.
A set of MPI routines exists for forming communicators.
Processes have a rank in a communicator.
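As a hedged illustration of forming a new communicator, MPI_Comm_split() partitions an existing communicator; here MPI_COMM_WORLD is split into two groups by rank parity:

MPI_Comm newcomm;
int newrank;
int color = myrank % 2;    /* processes with the same color join the same new communicator */

MPI_Comm_split(MPI_COMM_WORLD, color, myrank, &newcomm);
MPI_Comm_rank(newcomm, &newrank);    /* rank within the new, smaller communicator */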
slides2-30
Point-to-Point Communication
Uses send and receive routines with message tags (and
communicator). Wild card message tags are available.
slides2-31
Blocking Routines
Return when they are locally complete - when location used to hold
message can be used again or altered without affecting message
being sent.
A blocking send will send the message and return. This does not
mean that the message has been received, just that the process is
free to move on without adversely affecting the message.
slides2-32
Parameters of the blocking send
MPI_Send(buf, count, datatype, dest, tag, comm)
   buf      - address of send buffer
   count    - number of items to send
   datatype - datatype of each item
   dest     - rank of destination process
   tag      - message tag
   comm     - communicator
slides2-33
Parameters of the blocking receive
MPI_Recv(buf, count, datatype, src, tag, comm, status)
   buf      - address of receive buffer
   count    - maximum number of items to receive
   datatype - datatype of each item
   src      - rank of source process
   tag      - message tag
   comm     - communicator
   status   - status after operation
slides2-34
Example
To send an integer x from process 0 to process 1,
int x, msgtag = 1;                        /* tag value chosen by the programmer */
MPI_Status status;

MPI_Comm_rank(MPI_COMM_WORLD, &myrank);   /* find rank */
if (myrank == 0) {
    MPI_Send(&x, 1, MPI_INT, 1, msgtag, MPI_COMM_WORLD);
} else if (myrank == 1) {
    MPI_Recv(&x, 1, MPI_INT, 0, msgtag, MPI_COMM_WORLD, &status);
}
slides2-35
Nonblocking Routines
Nonblocking send - MPI_Isend(), will return immediately even
before source location is safe to be altered.
Nonblocking receive - MPI_Irecv(), will return even if there is no
message to accept.
slides2-36
Nonblocking Routine Formats
MPI_Isend(buf, count, datatype, dest, tag, comm, request)
MPI_Irecv(buf, count, datatype, source, tag, comm, request)
Completion detected by MPI_Wait() and MPI_Test().
MPI_Wait() waits until the operation has completed and then returns.
MPI_Test() returns with flag set indicating whether operation
completed at that time.
Need to know whether particular operation completed.
Determined by accessing the request parameter.
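A hedged sketch of polling for completion with MPI_Test(), where request comes from a preceding MPI_Isend() or MPI_Irecv() and do_some_work() is a hypothetical routine:

int flag = 0;
MPI_Status status;

while (!flag) {
    do_some_work();                        /* overlap useful work with the transfer */
    MPI_Test(&request, &flag, &status);    /* flag becomes nonzero once the operation has completed */
}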
slides2-37
Example
To send an integer x from process 0 to process 1 and allow process
0 to continue,
int x, msgtag = 1;
MPI_Request req1;
MPI_Status status;

MPI_Comm_rank(MPI_COMM_WORLD, &myrank);   /* find rank */
if (myrank == 0) {
    MPI_Isend(&x, 1, MPI_INT, 1, msgtag, MPI_COMM_WORLD, &req1);
    compute();                            /* overlap computation with the transfer */
    MPI_Wait(&req1, &status);
} else if (myrank == 1) {
    MPI_Recv(&x, 1, MPI_INT, 0, msgtag, MPI_COMM_WORLD, &status);
}
slides2-38
Four Send Communication Modes
Standard Mode Send
Not assumed that corresponding receive routine has started.
Amount of buffering not defined by MPI. If buffering provided, send
could complete before receive reached.
Buffered Mode
Send may start and return before a matching receive. Necessary to
specify buffer space via the routine MPI_Buffer_attach() (a sketch
follows below).
Synchronous Mode
Send and receive can start before each other but can only complete
together.
Ready Mode
Send can only start if matching receive already reached, otherwise
error. Use with care.
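A hedged sketch of buffered-mode send with user-supplied buffer space attached beforehand (the buffer size is illustrative):

int x, msgtag = 1;
int bufsize = 1000 + MPI_BSEND_OVERHEAD;    /* illustrative payload space plus required overhead */
char *buffer = (char *)malloc(bufsize);

MPI_Buffer_attach(buffer, bufsize);         /* make buffer space available for buffered sends */
MPI_Bsend(&x, 1, MPI_INT, 1, msgtag, MPI_COMM_WORLD);   /* may return before the matching receive */

MPI_Buffer_detach(&buffer, &bufsize);       /* blocks until buffered messages have been delivered */
free(buffer);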
slides2-39
Each of the four modes can be applied to both blocking and
nonblocking send routines.
Only the standard mode is available for the blocking and
nonblocking receive routines.
Any type of send routine can be used with any type of receive
routine.
slides2-40
Collective Communication
Involves set of processes, defined by an intra-communicator.
Message tags not present.
Broadcast and Scatter Routines
The principal collective operations operating upon data are:
MPI_Bcast()           - broadcast from root to all other processes
MPI_Gather()          - gather values for group of processes
MPI_Scatter()         - scatter buffer in parts to group of processes
MPI_Alltoall()        - send data from all processes to all processes
MPI_Reduce()          - combine values on all processes to single value
MPI_Reduce_scatter()  - combine values and scatter results
MPI_Scan()            - compute prefix reductions of data on processes
slides2-41
Example
To gather items from the group of processes into process 0, using
dynamically allocated memory in the root process, we might use
int data[10];                                       /* data to be gathered from processes */
int *buf, grp_size, myrank;
.
MPI_Comm_rank(MPI_COMM_WORLD, &myrank);             /* find rank */
if (myrank == 0) {
    MPI_Comm_size(MPI_COMM_WORLD, &grp_size);       /* find group size */
    buf = (int *)malloc(grp_size*10*sizeof(int));   /* allocate memory at root */
}
MPI_Gather(data, 10, MPI_INT, buf, 10, MPI_INT, 0, MPI_COMM_WORLD);
/* recvcount is the number of items received from each process (10), not the total */
Note that MPI_Gather() gathers from all processes, including the root.
slides2-42
Barrier
As in all message-passing systems, MPI provides a means of
synchronizing processes by stopping each one until they all have
reached a specific barrier call.
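In MPI this is MPI_Barrier(); a minimal sketch:

/* No process passes this point until every process in the communicator has reached it */
MPI_Barrier(MPI_COMM_WORLD);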
slides2-43
Sample MPI program
#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#define MAXSIZE 1000

int main(int argc, char *argv[])
{
    int myid, numprocs;
    int data[MAXSIZE], i, x, low, high, myresult = 0, result;
    char fn[255];
    FILE *fp;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    if (myid == 0) {                              /* Open input file and initialize data */
        strcpy(fn, getenv("HOME"));
        strcat(fn, "/MPI/rand_data.txt");
        if ((fp = fopen(fn, "r")) == NULL) {
            printf("Can't open the input file: %s\n\n", fn);
            exit(1);
        }
        for (i = 0; i < MAXSIZE; i++) fscanf(fp, "%d", &data[i]);
    }
    /* broadcast data */
    MPI_Bcast(data, MAXSIZE, MPI_INT, 0, MPI_COMM_WORLD);
    /* Add my portion of data */
    x = MAXSIZE/numprocs;
    low = myid * x;
    high = low + x;
    for (i = low; i < high; i++)
        myresult += data[i];
    printf("I got %d from %d\n", myresult, myid);
    /* Compute global sum */
    MPI_Reduce(&myresult, &result, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    if (myid == 0) printf("The sum is %d.\n", result);
    MPI_Finalize();
    return 0;
}
slides2-44
Evaluating Parallel Programs
slides2-45
Equations for Parallel Execution Time
First concern is how fast parallel implementation is likely to be.
Might begin by estimating execution time on a single computer, ts,
by counting computational steps of best sequential algorithm.
For a parallel algorithm, in addition to number of computational
steps, need to estimate communication overhead.
Parallel execution time, tp, composed of two parts: a computation
part, say tcomp, and a communication part, say tcomm; i.e.,
tp = tcomp + tcomm
slides2-46
Computational Time
Can be estimated in a similar way to that of a sequential algorithm,
by counting number of computational steps. When more than one
process being executed simultaneously, count computational steps
of most complex process. Generally, some function of n and p, i.e.
tcomp = f(n, p)
The time units of tp are those of a computational step.
Often break down computation time into parts. Then
tcomp = tcomp1 + tcomp2 + tcomp3 + ...
where tcomp1, tcomp2, tcomp3 are computation times of each part.
Analysis usually done assuming that all processors are same and
operating at same speed.
slides2-47
Communication Time
Will depend upon the number of messages, the size of each
message, the underlying interconnection structure, and the mode of
transfer. Many factors, including network structure and network
contention. For a first approximation, we will use
tcomm1 = tstartup + n * tdata
for communication time of message 1.
tstartup is the startup time, essentially the time to send a message
with no data. Assumed to be constant.
tdata is the transmission time to send one data word, also assumed
constant, and there are n data words.
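For example, with illustrative values tstartup = 100 computational steps and tdata = 1 step per word, a message of n = 1000 words would cost tcomm1 = 100 + 1000 * 1 = 1100 steps.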
slides2-48
Idealized Communication Time
[Figure: idealized communication time - a straight line starting at the startup time and rising linearly with the number of data items (n)]
slides2-49
Final communication time, tcomm will be the summation of
communication times of all the sequential messages from a
process, i.e.
tcomm = tcomm1 + tcomm2 + tcomm3 + ...
Typically, the communication patterns of all the processes are the
same and assumed to take place together so that only one process
need be considered.
Both startup and data transmission times, tstartup and tdata, are
measured in units of one computational step, so that we can add
tcomp and tcomm together to obtain the parallel execution time, tp.
slides2-50
Benchmark Factors
With ts, tcomp, and tcomm, can establish speedup factor and
computation/communication ratio for a particular algorithm/
implementation:
Speedup factor = ts / tp = ts / (tcomp + tcomm)

Computation/communication ratio = tcomp / tcomm
Both functions of number of processors, p, and number of data
elements, n.
Will give an indication of the scalability of the parallel solution with
increasing number of processors and problem size. Computation/
communication ratio will highlight effect of communication with
increasing problem size and system size.
slides2-51
Debugging and Evaluating Parallel Programs Empirically
Visualization Tools
Programs can be watched as they are executed in a space-time
diagram (or process-time diagram):
[Figure: space-time diagram for Process 1, Process 2, and Process 3, with time along the horizontal axis; shading distinguishes computing, waiting, and message-passing system routines, and arrows show messages between processes]
slides2-52
Implementations of visualization tools are available for MPI.
An example is the Upshot program visualization system.
slides2-53
Evaluating Programs Empirically
Measuring Execution Time
To measure the execution time between point L1 and point L2 in the
code, we might have a construction such as
.
L1: time(&t1);                   /* start timer */
.
.
L2: time(&t2);                   /* stop timer */
.
elapsed_time = difftime(t2, t1); /* elapsed_time = t2 - t1 */
printf("Elapsed time = %5.2f seconds", elapsed_time);
MPI provides the routine MPI_Wtime() for returning time (in
seconds).
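A hedged sketch using MPI_Wtime() in the same style:

double t1, t2;

t1 = MPI_Wtime();            /* start timer (wall-clock time in seconds) */
/* ... code being timed ... */
t2 = MPI_Wtime();            /* stop timer */
printf("Elapsed time = %5.2f seconds", t2 - t1);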
slides2-54
Parallel Programming Home Page
http://www.cs.uncc.edu/par_prog
Gives step-by-step instructions for compiling and executing
programs, and other information.
slides2-55
Basic Instructions for Compiling/Executing MPI
Programs
Preliminaries
Set up paths
Create required directory structure
Create a file (hostfile) listing machines to be used
(required)
Details described on home page.
slides2-56
Hostfile
Before starting MPI for the first time, need to create a hostfile
Sample hostfile
ws404
#is-sm1 //Currently not executing, commented
pvm1 //Active processors, UNCC sun cluster called pvm1 - pvm8
pvm2
pvm3
pvm4
pvm5
pvm6
pvm7
pvm8
slides2-57
Compiling/executing (SPMD) MPI program
For LAM MPI version 6.5.2. At a command line:
To start MPI:
First time:
lamboot -v hostfile
Subsequently:
lamboot
To compile MPI programs:
mpicc -o file file.c
or
mpiCC -o file file.cpp
To execute MPI program:
mpirun -v -np no_processors file
To remove processes for reboot:
lamclean -v
To terminate LAM:
lamhalt
If this fails:
wipe -v lamhost
slides2-58
Compiling/Executing Multiple MPI Programs
Create a file specifying programs:
Example
1 master and 2 slaves, appfile contains
n0 master
n0-1 slave
To execute:
mpirun -v appfile
Sample output
3292 master running on n0 (o)
3296 slave running on n0 (o)
412 slave running on n1