Distributed Systems and Cloud Computing
Lecture 2
Message Passing Interface - Part 2
Message Passing Interface
Core MPI Functions
Most MPI programs can be written using just these six core functions:
● MPI_Init
○ int MPI_Init(int *argc, char ***argv)
○ initialize the MPI library (must be the first routine called)
● MPI_Comm_size
○ int MPI_Comm_size(MPI_Comm comm, int *size)
○ get the size of a communicator
● MPI_Comm_rank
○ int MPI_Comm_rank(MPI_Comm comm, int *rank)
○ get the rank of the calling process in the communicator
Core MPI Functions
● MPI_Send
○ int MPI_Send(const void *buf, int count, MPI_Datatype datatype,
int dest, int tag, MPI_Comm comm)
○ send a message to another process
● MPI_Recv
○ int MPI_Recv(void *buf, int count, MPI_Datatype datatype, int
source, int tag, MPI_Comm comm, MPI_Status *status)
○ receive a message from another process
● MPI_Finalize
○ int MPI_Finalize()
○ clean up all MPI state (must be the last MPI function called by a
process)
MPI Example
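A minimal sketch of a complete program built from only these six functions, in which rank 0 sends one integer to rank 1 (the message value and tag are illustrative assumptions, not taken from the lecture):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, size;

    MPI_Init(&argc, &argv);                  /* must be the first MPI call */
    MPI_Comm_size(MPI_COMM_WORLD, &size);    /* number of processes        */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* this process's rank        */

    if (rank == 0 && size > 1) {
        int value = 42;
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        printf("Process 0 sent %d\n", value);
    } else if (rank == 1) {
        int value;
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("Process 1 received %d\n", value);
    }

    MPI_Finalize();                          /* must be the last MPI call  */
    return 0;
}

With a typical MPI installation, such a program is compiled with mpicc and launched with mpirun -np 2 (or mpiexec); the exact commands depend on the MPI distribution.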
MPI Communication Functions
There are two types of communication functions in MPI:
● Blocking Communication Functions
● Non-blocking Communication Functions
Blocking Communication Functions
Blocking communication routines are those whose completion depends on
certain “events”.
For sends, the data must have been sent or safely copied to system
buffer space; for receives, the data must be safely stored in the
receive buffer.
Generally, this should be the go-to option for direct communication.
The functions are as follows:
● MPI_Send(void* buf, int count, MPI_Datatype datatype, int dest,
int tag, MPI_Comm comm)
● MPI_Recv(void* buf, int count, MPI_Datatype datatype, int
source, int tag, MPI_Comm comm, MPI_Status *status)
STATUS OBJECT
● The status object is used after completion of a receive to find the
actual length, source, and tag of a message.
● The status object is an MPI-defined type and provides information about:
○ The source process for the message (status.MPI_SOURCE)
○ The message tag (status.MPI_TAG)
○ Error status (status.MPI_ERROR)
● The number of elements received is given by:
MPI_Get_count(MPI_Status *status, MPI_Datatype datatype, int *count)
○ status - return status of receive operation
○ datatype - datatype of each receive buffer element (handle)
○ count - number of received elements (integer) (OUT)
Example Using the “status” object
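A minimal sketch of the status object in use: the receiver accepts a message of unknown source, tag, and length, then queries the status fields and MPI_Get_count (the buffer size and tag below are illustrative assumptions):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        int data[5] = {1, 2, 3, 4, 5};
        MPI_Send(data, 5, MPI_INT, 1, 7, MPI_COMM_WORLD);
    } else if (rank == 1) {
        int buf[100];          /* large enough for any expected message */
        MPI_Status status;
        int count;

        /* Accept a message from any source with any tag. */
        MPI_Recv(buf, 100, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                 MPI_COMM_WORLD, &status);

        /* The status object reveals what actually arrived. */
        MPI_Get_count(&status, MPI_INT, &count);
        printf("Received %d ints from rank %d with tag %d\n",
               count, status.MPI_SOURCE, status.MPI_TAG);
    }

    MPI_Finalize();
    return 0;
}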
Non-blocking Communication Functions
A communication routine is non-blocking if the call returns without
waiting for the communications to complete.
It is the programmer’s responsibility to ensure that the buffer is free
for reuse.
These are primarily used to increase performance by overlapping
computation with communication.
It is recommended to first get your program working using blocking
communication before attempting to use non-blocking functions.
The functions are as follows:
● MPI_Isend(void* buf, int count, MPI_Datatype datatype, int dest,
int tag, MPI_Comm comm, MPI_Request *request)
● MPI_Irecv(void* buf, int count, MPI_Datatype datatype, int
source, int tag, MPI_Comm comm, MPI_Request *request)
Non-blocking Communication Example
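A minimal sketch of non-blocking communication: both sides post their operation, are free to do independent work, and call MPI_Wait before touching the buffer again (the overlapped “work” is only a comment here; the values are illustrative assumptions):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, value = 0;
    MPI_Request request;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 99;
        MPI_Isend(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &request);
        /* ... computation that does not modify 'value' can overlap here ... */
        MPI_Wait(&request, MPI_STATUS_IGNORE);  /* safe to reuse 'value' now */
    } else if (rank == 1) {
        MPI_Irecv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &request);
        /* ... computation that does not read 'value' can overlap here ... */
        MPI_Wait(&request, MPI_STATUS_IGNORE);  /* 'value' is now valid */
        printf("Rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}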
Data movement (Collective Communication)
● MPI_Bcast: Broadcast sends a message from the process with rank
“root” to all other processes in the group.
● MPI_Scatter: The scatter operation performs a one-to-all
communication. It splits the message into n equal segments, with
the ith segment sent to the ith process in the group (a sketch
combining MPI_Scatter and MPI_Gather follows this list).
● MPI_Gather: This function is the logical opposite of MPI_Scatter; it
performs an all-to-one communication where a total of n data items
are collected at a single process from all of the other processes.
● MPI_Alltoall: This function is an all-to-all version of MPI_Scatter,
where every process is sending and receiving n data segments.
● MPI_Alltoallv: This is a generalization of MPI_Alltoall where each
process sends/receives a customizable amount of data to/from
each process.
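A minimal sketch combining MPI_Scatter and MPI_Gather: the root splits an array into one element per process, each process doubles its element, and the results are gathered back at the root (the data values and the choice of rank 0 as root are illustrative assumptions):

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int *sendbuf = NULL;
    if (rank == 0) {
        sendbuf = malloc(size * sizeof(int));
        for (int i = 0; i < size; i++) sendbuf[i] = i;   /* 0, 1, 2, ... */
    }

    int local;
    /* Each process receives the ith segment (here, one int) of the root's array. */
    MPI_Scatter(sendbuf, 1, MPI_INT, &local, 1, MPI_INT, 0, MPI_COMM_WORLD);

    local *= 2;   /* per-process work on its own segment */

    int *recvbuf = (rank == 0) ? malloc(size * sizeof(int)) : NULL;
    /* Collect one int from every process back at the root. */
    MPI_Gather(&local, 1, MPI_INT, recvbuf, 1, MPI_INT, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        for (int i = 0; i < size; i++) printf("%d ", recvbuf[i]);
        printf("\n");
        free(sendbuf);
        free(recvbuf);
    }

    MPI_Finalize();
    return 0;
}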
Broadcasting with MPI
● A broadcast is one of the standard collective communication
techniques.
● During a broadcast, one process sends the same data to all
processes in a communicator.
● One of the main uses of broadcasting is to send out user input to a
parallel program, or send out configuration parameters to all
processes.
MPI_Broadcast
● The function for broadcast in MPI is MPI_Bcast()
○ MPI_Bcast(void* data, int count, MPI_Datatype datatype, int
root, MPI_Comm communicator)
● Although the root process and receiver processes do different jobs,
they all call the same MPI_Bcast function.
● When the root process calls MPI_Bcast, the data variable will be
sent to all other processes.
● When all of the receiver processes call MPI_Bcast, the data variable
will be filled in with the data from the root process.
MPI_Broadcast Example
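A minimal sketch of MPI_Bcast: rank 0 sets a configuration value (a hard-coded int here, standing in for user input as an assumption), and every process, root and receivers alike, calls the same MPI_Bcast:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, config = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        config = 123;   /* e.g. user input or a configuration parameter */
    }

    /* On the root this sends 'config'; on all other ranks it fills 'config' in. */
    MPI_Bcast(&config, 1, MPI_INT, 0, MPI_COMM_WORLD);

    printf("Rank %d has config = %d\n", rank, config);

    MPI_Finalize();
    return 0;
}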
Synchronization (Collective Communication)
Processes wait until all members of the group have reached the
synchronization point.
The function used to do this is:
MPI_Barrier(comm)
This causes each process, when reaching the MPI_Barrier call, to block
until all tasks in the group reach the same MPI_Barrier call.
Synchronization (Collective Communication)
MPI_Barrier(MPI_Comm comm)
Synchronization Example
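A minimal sketch of MPI_Barrier: every process announces that it has reached the barrier, and only after all ranks have arrived does rank 0 print the final message (the printed strings are illustrative):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    printf("Rank %d: work before the barrier\n", rank);

    MPI_Barrier(MPI_COMM_WORLD);   /* block until every rank reaches this call */

    if (rank == 0) {
        printf("All processes have passed the barrier\n");
    }

    MPI_Finalize();
    return 0;
}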
Reductions (Collective computation)
One member of the group collects data from the other members and
performs an operation (min, max, add, multiply, etc.) on that data.
The function to do this is:
MPI_Reduce
One of the parameters of this function is the reduction operation to be
performed; examples include MPI_MAX, MPI_MIN, MPI_SUM, and MPI_PROD.
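A minimal sketch of MPI_Reduce: every process contributes its own rank, and rank 0 receives the sum computed with the MPI_SUM operation (the choice of rank 0 as root and of MPI_SUM is an illustrative assumption):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, sum = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Combine 'rank' from every process with MPI_SUM; the result lands on rank 0. */
    MPI_Reduce(&rank, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        printf("Sum of all ranks = %d\n", sum);
    }

    MPI_Finalize();
    return 0;
}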