slides2-1
Chapter 2
Message-Passing Computing
Slides for Parallel Programming Techniques & Applications Using Networked Workstations & Parallel Computers 2nd ed., by B. Wilkinson & M. Allen, 2004 Pearson Education Inc. All rights reserved.
slides2-2
Basics of Message-Passing Programming using
User-level Message Passing Libraries
Two primary mechanisms needed:
1. A method of creating separate processes for execution on
different computers
2. A method of sending and receiving messages
slides2-3
Multiple program, multiple data (MPMD) model
[Figure: separate source files, each compiled to suit its processor, producing executables that run on Processor 0 through Processor p - 1]
slides2-4
Single Program Multiple Data (SPMD) model
Different processes merged into one program. Within program,
control statements select different parts for each processor to
execute. All executables started together - static process creation.
[Figure: a single source file (the basic MPI way) compiled to suit each processor, producing executables that run on Processor 0 through Processor p - 1]
slides2-5
Multiple Program Multiple Data (MPMD) Model
Separate programs for each processor. Master-slave approach
usually taken. One processor executes master process. Other
processes started from within master process - dynamic process
creation.
[Figure: Process 1 executes spawn(), which starts execution of Process 2; time runs downward]
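As an illustration of dynamic process creation, PVM (covered later in this chapter) provides pvm_spawn(); a minimal hedged sketch of a master spawning two copies of a slave executable (the executable name "slave" is an assumption):

#include "pvm3.h"

int main()
{
    int tids[2];                  /* task IDs of the spawned slave processes */
    int numt;

    /* Spawn 2 copies of the executable "slave" anywhere in the virtual machine */
    numt = pvm_spawn("slave", (char **)0, PvmTaskDefault, "", 2, tids);

    /* ... exchange messages with the slaves ... */

    pvm_exit();                   /* leave the virtual machine */
    return 0;
}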
slides2-6
Basic point-to-point Send and Receive
Routines
Passing a message between processes using send() and recv()
library calls:
[Figure: Process 1 executes send(&x, 2); the data moves to Process 2, which executes recv(&y, 1)]
Generic syntax (actual formats later)
slides2-7
Synchronous Message Passing
Routines that return only when the message transfer has been completed.
Synchronous send routine
Waits until complete message can be accepted by the receiving
process before sending the message.
Synchronous receive routine
Waits until the message it is expecting arrives.
Synchronous routines intrinsically perform two actions: They
transfer data and they synchronize processes.
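As an MPI illustration, synchronous transfer corresponds to the synchronous send routine MPI_Ssend() paired with an ordinary receive; a minimal sketch (assumes MPI has been initialized and myrank obtained):

int x, msgtag = 1;
MPI_Status status;

if (myrank == 0)
    MPI_Ssend(&x, 1, MPI_INT, 1, msgtag, MPI_COMM_WORLD);     /* completes only when the matching receive has started */
else if (myrank == 1)
    MPI_Recv(&x, 1, MPI_INT, 0, msgtag, MPI_COMM_WORLD, &status);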
slides2-8
Synchronous send() and recv() library calls using 3-way protocol
[Figure: (a) When send() occurs before recv() - Process 1 issues a request to send and suspends; when Process 2 reaches recv(), an acknowledgment is returned, the message is transferred, and both processes continue. (b) When recv() occurs before send() - Process 2 suspends at recv(); when Process 1 reaches send(), the request to send is acknowledged, the message is transferred, and both processes continue.]
slides2-9
Asynchronous Message Passing
Routines that do not wait for actions to complete before returning.
Usually require local storage for messages.
More than one version depending upon the actual semantics for
returning.
In general, they do not synchronize processes but allow processes
to move forward sooner. Must be used with care.
slides2-10
MPI Definitions of Blocking and Non-Blocking
Blocking - return after their local actions complete, though the
message transfer may not have been completed.
Non-blocking - return immediately.
It is assumed that the data storage used for the transfer is not
modified by subsequent statements before it is used for the transfer,
and it is left to the programmer to ensure this.
These terms may have different interpretations in other systems.
slides2-11
How message-passing routines can return
before message transfer completed
Message buffer needed between source and destination to hold
message:
[Figure: Process 1 executes send() and continues; the message is held in a message buffer until Process 2 executes recv() and reads the message buffer]
slides2-12
Asynchronous (blocking) routines changing to
synchronous routines
Once local actions completed and message is safely on its way,
sending process can continue with subsequent work.
Buffers are only of finite length, and a point could be reached where
the send routine is held up because all available buffer space has
been exhausted. Then the send routine will wait until storage becomes
available again - i.e., the routine then behaves as a synchronous routine.
slides2-13
Message Tag
Used to differentiate between different types of messages being
sent.
Message tag is carried within message.
If special type matching is not required, a wild card message tag is
used, so that the recv() will match with any send().
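As an MPI illustration, the wild cards are MPI_ANY_TAG and (for the source) MPI_ANY_SOURCE; a hedged sketch:

int y;
MPI_Status status;

/* Accept a message with any tag from any source in the communicator */
MPI_Recv(&y, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);

/* The actual source and tag can then be read from the status object */
printf("message from %d with tag %d\n", status.MPI_SOURCE, status.MPI_TAG);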
slides2-14
Message Tag Example
To send a message, x, with message tag 5 from a source process,
1, to a destination process, 2, and assign to y:
[Figure: Process 1 executes send(&x,2,5); the data moves to Process 2, which executes recv(&y,1,5) and waits for a message from process 1 with a tag of 5]
slides2-15
Group message passing routines
Apart from point-to-point message-passing routines, there are
routines that send message(s) to a group of processes or receive
message(s) from a group of processes - higher efficiency than
separate point-to-point routines, although not absolutely necessary.
slides2-16
Broadcast
Sending same message to all processes concerned with problem.
Multicast - sending same message to defined group of processes.
[Figure: Process 0 (root) holds the data in buf; every process calls bcast() (the MPI form), after which the same data appears in buf on Process 0 through Process p - 1]
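A hedged MPI sketch of the broadcast above (every process, including the root, makes the same call; myrank assumed obtained as before):

int buf[4];                              /* same buffer on every process */

if (myrank == 0) {
    buf[0] = 1; buf[1] = 2; buf[2] = 3; buf[3] = 4;   /* only the root fills the buffer */
}
/* After the call, buf on every process holds the root's data */
MPI_Bcast(buf, 4, MPI_INT, 0, MPI_COMM_WORLD);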
slides2-17
Scatter
Sending each element of an array in root process to a separate
process. Contents of ith location of array sent to ith process.
[Figure: the root's buf on Process 0 is divided into parts; every process calls scatter() (the MPI form), and the ith part is delivered as data to Process i, up to Process p - 1]
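A hedged MPI sketch of the scatter above, assuming the root sends one integer to each process (MAX_PROCS is an assumed constant at least as large as p):

int sendbuf[MAX_PROCS];    /* significant only at the root */
int recvval;

/* Each process receives one element; the ith element goes to the process of rank i */
MPI_Scatter(sendbuf, 1, MPI_INT, &recvval, 1, MPI_INT, 0, MPI_COMM_WORLD);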
slides2-18
Gather
Having one process collect individual values from set of processes.
[Figure: each of Process 0 through Process p - 1 contributes data via gather() (the MPI form); the items are collected into buf on root Process 0]
slides2-19
Reduce
Gather operation combined with a specified arithmetic/logical operation.
Example
Values could be gathered and then added together by root:
[Figure: each of Process 0 through Process p - 1 contributes data via reduce() (the MPI form); the values are combined with + into buf on root Process 0]
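A hedged MPI sketch of the reduction above, summing one integer from each process into root process 0:

int myvalue = myrank;      /* illustrative local value on each process */
int sum;                   /* result, significant only at the root */

/* Combine the values from all processes with + and leave the result on process 0 */
MPI_Reduce(&myvalue, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);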
slides2-20
PVM (Parallel Virtual Machine)
Perhaps the first widely adopted attempt at using a workstation cluster
as a multicomputer platform, developed by Oak Ridge National
Laboratory. Available at no charge.
Programmer decomposes problem into separate programs (usually
a master program and a group of identical slave programs).
Each program compiled to execute on specific types of computers.
Set of computers used on a problem first must be defined prior to
executing the programs (in a hostfile).
slides2-21
Message routing between computers done by PVM daemon processes
installed by PVM on computers that form the virtual machine.
[Figure: several workstations, each running a PVM daemon and one or more application programs (executables); messages are sent between the computers through the network. Can have more than one process running on each computer.]
MPI implementation we use is similar.
slides2-22
MPI (Message Passing Interface)
Standard developed by group of academics and industrial partners
to foster more widespread use and portability.
Defines routines, not implementation.
Several free implementations exist.
slides2-23
MPI
Process Creation and Execution
Purposely not defined and will depend upon the implementation.
Only static process creation is supported in MPI version 1. All
processes must be defined prior to execution and started together.
Originally an SPMD model of computation.
MPMD also possible with static creation - each program to be
started together specified.
slides2-24
Communicators
Defines scope of a communication operation.
Processes have ranks associated with communicator.
Initially, all processes are enrolled in a universe called
MPI_COMM_WORLD, and each process is given a unique rank, a
number from 0 to p - 1, where there are p processes.
Other communicators can be established for groups of processes.
slides2-25
Using the SPMD Computational Model
int main(int argc, char *argv[])
{
    int myrank;

    MPI_Init(&argc, &argv);
    .
    .
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);   /* find process rank */
    if (myrank == 0)
        master();
    else
        slave();
    .
    .
    MPI_Finalize();
    return 0;
}
where master() and slave() are procedures to be executed by
master process and slave process, respectively.
slides2-26
Unsafe Message Passing
MPI specifically addresses unsafe message passing.
slides2-27
Unsafe message passing with libraries
[Figure: Process 0 and Process 1 each call a library routine lib() as well as their own send(,1,) and recv(,0,). (a) Intended behavior: the message from the user's send() reaches the user's recv() at the destination, and the message sent inside lib() reaches the recv() inside lib(). (b) Possible behavior: with matching based only on source and tag, the user's message may instead be received by the recv() inside lib(), and the library's message by the user's recv().]
slides2-28
MPI Solution
Communicators
A communication domain that defines a set of processes that are
allowed to communicate between themselves.
The communication domain of the library can be separated from
that of a user program.
Used in all point-to-point and collective MPI message-passing
communications.
slides2-29
Default Communicator
MPI_COMM_WORLD exists as the first communicator for all the
processes existing in the application.
A set of MPI routines exists for forming communicators.
Processes have a rank in a communicator.
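As a hedged illustration of forming a new communicator, MPI_Comm_split() partitions an existing communicator; here MPI_COMM_WORLD is split into two groups by rank parity:

MPI_Comm newcomm;
int newrank;
int color = myrank % 2;    /* processes with the same color join the same new communicator */

MPI_Comm_split(MPI_COMM_WORLD, color, myrank, &newcomm);
MPI_Comm_rank(newcomm, &newrank);    /* rank within the new, smaller communicator */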
slides2-30
Point-to-Point Communication
Uses send and receive routines with message tags (and
communicator). Wild card message tags are available.
slides2-31
Blocking Routines
Return when they are locally complete - when location used to hold
message can be used again or altered without affecting message
being sent.
A blocking send will send the message and return. This does not
mean that the message has been received, just that the process is
free to move on without adversely affecting the message.
slides2-32
Parameters of the blocking send
MPI_Send(buf, count, datatype, dest, tag, comm)
   buf      - address of send buffer
   count    - number of items to send
   datatype - datatype of each item
   dest     - rank of destination process
   tag      - message tag
   comm     - communicator
slides2-33
Parameters of the blocking receive
MPI_Recv(buf, count, datatype, src, tag, comm, status)
   buf      - address of receive buffer
   count    - maximum number of items to receive
   datatype - datatype of each item
   src      - rank of source process
   tag      - message tag
   comm     - communicator
   status   - status after operation
slides2-34
Example
To send an integer x from process 0 to process 1,
int x, msgtag = 1;                        /* tag value chosen by the programmer */
MPI_Status status;

MPI_Comm_rank(MPI_COMM_WORLD, &myrank);   /* find rank */
if (myrank == 0) {
    MPI_Send(&x, 1, MPI_INT, 1, msgtag, MPI_COMM_WORLD);
} else if (myrank == 1) {
    MPI_Recv(&x, 1, MPI_INT, 0, msgtag, MPI_COMM_WORLD, &status);
}
slides2-35
Nonblocking Routines
Nonblocking send - MPI_Isend(), will return immediately even
before source location is safe to be altered.
Nonblocking receive - MPI_Irecv(), will return even if there is no
message to accept.
slides2-36
Nonblocking Routine Formats
MPI_Isend(buf, count, datatype, dest, tag, comm, request)
MPI_Irecv(buf, count, datatype, source, tag, comm, request)
Completion detected by MPI_Wait() and MPI_Test().
MPI_Wait() waits until the operation has completed and then returns.
MPI_Test() returns with flag set indicating whether operation
completed at that time.
Need to know whether particular operation completed.
Determined by accessing the request parameter.
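A hedged sketch of polling for completion with MPI_Test(), where request comes from a preceding MPI_Isend() or MPI_Irecv() and do_some_work() is a hypothetical routine:

int flag = 0;
MPI_Status status;

while (!flag) {
    do_some_work();                        /* overlap useful work with the transfer */
    MPI_Test(&request, &flag, &status);    /* flag becomes nonzero once the operation has completed */
}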
slides2-37
Example
To send an integer x from process 0 to process 1 and allow process
0 to continue,
int x, msgtag = 1;
MPI_Request req1;
MPI_Status status;

MPI_Comm_rank(MPI_COMM_WORLD, &myrank);   /* find rank */
if (myrank == 0) {
    MPI_Isend(&x, 1, MPI_INT, 1, msgtag, MPI_COMM_WORLD, &req1);
    compute();                            /* overlap computation with the transfer */
    MPI_Wait(&req1, &status);
} else if (myrank == 1) {
    MPI_Recv(&x, 1, MPI_INT, 0, msgtag, MPI_COMM_WORLD, &status);
}
slides2-38
Four Send Communication Modes
Standard Mode Send
Not assumed that corresponding receive routine has started.
Amount of buffering not defined by MPI. If buffering provided, send
could complete before receive reached.
Buffered Mode
Send may start and return before a matching receive. Necessary to
specify buffer space via the routine MPI_Buffer_attach() (a sketch
follows below).
Synchronous Mode
Send and receive can start before each other but can only complete
together.
Ready Mode
Send can only start if matching receive already reached, otherwise
error. Use with care.
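A hedged sketch of buffered-mode send with user-supplied buffer space attached beforehand (the buffer size is illustrative):

int x, msgtag = 1;
int bufsize = 1000 + MPI_BSEND_OVERHEAD;    /* illustrative payload space plus required overhead */
char *buffer = (char *)malloc(bufsize);

MPI_Buffer_attach(buffer, bufsize);         /* make buffer space available for buffered sends */
MPI_Bsend(&x, 1, MPI_INT, 1, msgtag, MPI_COMM_WORLD);   /* may return before the matching receive */

MPI_Buffer_detach(&buffer, &bufsize);       /* blocks until buffered messages have been delivered */
free(buffer);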
slides2-39
Each of the four modes can be applied to both blocking and
nonblocking send routines.
Only the standard mode is available for the blocking and
nonblocking receive routines.
Any type of send routine can be used with any type of receive
routine.
slides2-40
Collective Communication
Involves set of processes, defined by an intra-communicator.
Message tags not present.
Broadcast and Scatter Routines
The principal collective operations operating upon data are:
MPI_Bcast()           - broadcast from root to all other processes
MPI_Gather()          - gather values for group of processes
MPI_Scatter()         - scatter buffer in parts to group of processes
MPI_Alltoall()        - send data from all processes to all processes
MPI_Reduce()          - combine values on all processes to single value
MPI_Reduce_scatter()  - combine values and scatter results
MPI_Scan()            - compute prefix reductions of data on processes
slides2-41
Example
To gather items from the group of processes into process 0, using
dynamically allocated memory in the root process, we might use
int data[10];                                       /* data to be gathered from processes */
int *buf, grp_size, myrank;
.
MPI_Comm_rank(MPI_COMM_WORLD, &myrank);             /* find rank */
if (myrank == 0) {
    MPI_Comm_size(MPI_COMM_WORLD, &grp_size);       /* find group size */
    buf = (int *)malloc(grp_size*10*sizeof(int));   /* allocate memory at root */
}
MPI_Gather(data, 10, MPI_INT, buf, 10, MPI_INT, 0, MPI_COMM_WORLD);
/* recvcount is the number of items received from each process (10), not the total */
Note that MPI_Gather() gathers from all processes, including the root.
slides2-42
Barrier
As in all message-passing systems, MPI provides a means of
synchronizing processes by stopping each one until they all have
reached a specific barrier call.
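In MPI this is MPI_Barrier(); a minimal sketch:

/* No process passes this point until every process in the communicator has reached it */
MPI_Barrier(MPI_COMM_WORLD);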
slides2-43
Sample MPI program
#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#define MAXSIZE 1000

int main(int argc, char *argv[])
{
    int myid, numprocs;
    int data[MAXSIZE], i, x, low, high, myresult = 0, result;
    char fn[255];
    FILE *fp;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    if (myid == 0) {                              /* Open input file and initialize data */
        strcpy(fn, getenv("HOME"));
        strcat(fn, "/MPI/rand_data.txt");
        if ((fp = fopen(fn, "r")) == NULL) {
            printf("Can't open the input file: %s\n\n", fn);
            exit(1);
        }
        for (i = 0; i < MAXSIZE; i++) fscanf(fp, "%d", &data[i]);
    }
    /* broadcast data */
    MPI_Bcast(data, MAXSIZE, MPI_INT, 0, MPI_COMM_WORLD);
    /* Add my portion of data */
    x = MAXSIZE/numprocs;
    low = myid * x;
    high = low + x;
    for (i = low; i < high; i++)
        myresult += data[i];
    printf("I got %d from %d\n", myresult, myid);
    /* Compute global sum */
    MPI_Reduce(&myresult, &result, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    if (myid == 0) printf("The sum is %d.\n", result);
    MPI_Finalize();
    return 0;
}
slides2-44
Evaluating Parallel Programs
slides2-45
Equations for Parallel Execution Time
First concern is how fast parallel implementation is likely to be.
Might begin by estimating execution time on a single computer, ts,
by counting computational steps of best sequential algorithm.
For a parallel algorithm, in addition to number of computational
steps, need to estimate communication overhead.
Parallel execution time, tp, composed of two parts: a computation
part, say tcomp, and a communication part, say tcomm; i.e.,
tp = tcomp + tcomm
slides2-46
Computational Time
Can be estimated in a similar way to that of a sequential algorithm,
by counting number of computational steps. When more than one
process being executed simultaneously, count computational steps
of most complex process. Generally, some function of n and p, i.e.
tcomp = f(n, p)
The time units of tp are those of a computational step.
Often break down computation time into parts. Then
tcomp = tcomp1 + tcomp2 + tcomp3 + ...
where tcomp1, tcomp2, tcomp3 are computation times of each part.
Analysis usually done assuming that all processors are same and
operating at same speed.
slides2-47
Communication Time
Will depend upon the number of messages, the size of each
message, the underlying interconnection structure, and the mode of
transfer. Many factors, including network structure and network
contention. For a first approximation, we will use
tcomm1 = tstartup + n * tdata
for communication time of message 1.
tstartup is the startup time, essentially the time to send a message
with no data. Assumed to be constant.
tdata is the transmission time to send one data word, also assumed
constant, and there are n data words.
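For example, with illustrative values tstartup = 100 computational steps and tdata = 1 step per word, a message of n = 1000 words would cost tcomm1 = 100 + 1000 * 1 = 1100 steps.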
slides2-48
Idealized Communication Time
[Figure: idealized communication time - a straight line starting at the startup time and rising linearly with the number of data items (n)]
slides2-49
Final communication time, tcomm will be the summation of
communication times of all the sequential messages from a
process, i.e.
tcomm = tcomm1 + tcomm2 + tcomm3 + ...
Typically, the communication patterns of all the processes are the
same and assumed to take place together so that only one process
need be considered.
Both startup and data transmission times, tstartup and tdata, are
measured in units of one computational step, so that we can add
tcomp and tcomm together to obtain the parallel execution time, tp.
slides2-50
Benchmark Factors
With ts, tcomp, and tcomm, can establish speedup factor and
computation/communication ratio for a particular algorithm/
implementation:
Speedup factor = ts / tp = ts / (tcomp + tcomm)

Computation/communication ratio = tcomp / tcomm
Both functions of number of processors, p, and number of data
elements, n.
Will give an indication of the scalability of the parallel solution with
increasing number of processors and problem size. Computation/
communication ratio will highlight effect of communication with
increasing problem size and system size.
slides2-51
Debugging and Evaluating Parallel Programs Empirically
Visualization Tools
Programs can be watched as they are executed in a space-time
diagram (or process-time diagram):
[Figure: space-time diagram for Process 1, Process 2, and Process 3, with time along the horizontal axis; shading distinguishes computing, waiting, and message-passing system routines, and arrows show messages between processes]
slides2-52
Implementations of visualization tools are available for MPI.
An example is the Upshot program visualization system.
slides2-53
Evaluating Programs Empirically
Measuring Execution Time
To measure the execution time between point L1 and point L2 in the
code, we might have a construction such as
.
L1: time(&t1);                   /* start timer */
.
.
L2: time(&t2);                   /* stop timer */
.
elapsed_time = difftime(t2, t1); /* elapsed_time = t2 - t1 */
printf("Elapsed time = %5.2f seconds", elapsed_time);
MPI provides the routine MPI_Wtime() for returning time (in
seconds).
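A hedged sketch using MPI_Wtime() in the same style:

double t1, t2;

t1 = MPI_Wtime();            /* start timer (wall-clock time in seconds) */
/* ... code being timed ... */
t2 = MPI_Wtime();            /* stop timer */
printf("Elapsed time = %5.2f seconds", t2 - t1);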
slides2-54
Parallel Programming Home Page
http://www.cs.uncc.edu/par_prog
Gives step-by-step instructions for compiling and executing
programs, and other information.
slides2-55
Basic Instructions for Compiling/Executing MPI
Programs
Preliminaries
Set up paths
Create required directory structure
Create a file (hostfile) listing machines to be used
(required)
Details described on home page.
slides2-56
Hostfile
Before starting MPI for the first time, need to create a hostfile
Sample hostfile
ws404
#is-sm1 //Currently not executing, commented
pvm1 //Active processors, UNCC sun cluster called pvm1 - pvm8
pvm2
pvm3
pvm4
pvm5
pvm6
pvm7
pvm8
slides2-57
Compiling/executing (SPMD) MPI program
For LAM MPI version 6.5.2. At a command line:
To start MPI:
First time:
lamboot -v hostfile
Subsequently:
lamboot
To compile MPI programs:
mpicc -o file file.c
or
mpiCC -o file file.cpp
To execute MPI program:
mpirun -v -np no_processors file
To remove processes for reboot:
lamclean -v
To terminate LAM:
lamhalt
If this fails:
wipe -v lamhost
slides2-58
Compiling/Executing Multiple MPI Programs
Create a file specifying programs:
Example
1 master and 2 slaves, appfile contains
n0 master
n0-1 slave
To execute:
mpirun -v appfile
Sample output
3292 master running on n0 (o)
3296 slave running on n0 (o)
412 slave running on n1