Point-to-Point - IV
Lecture 7
      January 29, 2024
Performance of Send Modes
[Plot: performance of the three send modes]
• MPI_Send: Rendezvous
• MPI_Bsend: Forced buffering
• MPI_Ssend: Forced synchronization
Example
MPI_Bsend
The buffer size passed to MPI_Buffer_attach should be the sum of the sizes of all Bsends that may be
outstanding at the same time, plus MPI_BSEND_OVERHEAD bytes for each of those Bsends.
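As a worked instance of the sizing rule above, the sketch below adds the packed size of each outstanding buffered send to a per-message overhead. The helper name bsend_buffer_bytes and its parameters are hypothetical and not part of MPI. In a real program the per-message size would typically come from MPI_Pack_size, the overhead would be the constant MPI_BSEND_OVERHEAD from mpi.h, and the result would be the size passed to MPI_Buffer_attach.

```c
#include <stddef.h>

/* Hypothetical helper (not part of MPI): bytes to attach for buffered sends.
 * packed_bytes[i] is the packed size of the i-th outstanding MPI_Bsend
 * (in real code, obtained from MPI_Pack_size). The overhead argument stands
 * in for MPI_BSEND_OVERHEAD, which is defined in mpi.h. */
size_t bsend_buffer_bytes(const size_t *packed_bytes, size_t n_outstanding,
                          size_t overhead)
{
    size_t total = 0;
    for (size_t i = 0; i < n_outstanding; i++)
        total += packed_bytes[i] + overhead;  /* payload plus per-send overhead */
    return total;
}
```

For two outstanding sends of 8000 bytes each and an assumed overhead of 96 bytes, the helper returns 2 * (8000 + 96) = 16192 bytes. The value 96 is illustrative only; the actual overhead depends on the MPI implementation.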
Nearest Neighbor (NN) Exchange
[Figure: processes 0 to P-1 in a line, each exchanging data with its nearest neighbors]
Nearest Neighbor Pseudocode
Questions: Tags? Performance?
Option 1: Schedule right sends followed by left sends
if (myrank < P-1)
{
   // Send/recv right neighbor
   MPI_Send (data, myArraySize, MPI_DOUBLE, myrank+1, myrank+1, MPI_COMM_WORLD);
   MPI_Recv (recvbuf, myArraySize, MPI_DOUBLE, myrank+1, myrank, MPI_COMM_WORLD, &status);
}
if (myrank > 0)
{
   // Send/recv left neighbor
   MPI_Recv (recvbuf, myArraySize, MPI_DOUBLE, myrank-1, myrank, MPI_COMM_WORLD, &status);
   MPI_Send (data, myArraySize, MPI_DOUBLE, myrank-1, myrank-1, MPI_COMM_WORLD);
}
Output
 Nearest Neighbor Pseudocode
                   Option 2: Schedule odd and even ranks alternately
if (myrank % 2 == 0 && myrank < P-1)
{
   // Send/recv right neighbor from even ranks
   MPI_Send (data, myArraySize, MPI_DOUBLE, myrank+1, myrank+1, MPI_COMM_WORLD);
   MPI_Recv (recvbuf, myArraySize, MPI_DOUBLE, myrank+1, myrank, MPI_COMM_WORLD, &status);
}
else if (myrank % 2 != 0 && myrank > 0)
{
  // Send/recv left neighbor
  MPI_Recv (recvbuf, myArraySize, MPI_DOUBLE, myrank-1, myrank, MPI_COMM_WORLD, &status);
  MPI_Send (data, myArraySize, MPI_DOUBLE, myrank-1, myrank-1, MPI_COMM_WORLD);
}
Nearest Neighbor Pseudocode (Option 2, continued)
if (myrank % 2 != 0 && myrank < P-1)
{
   // Send/recv right neighbor from odd ranks
   MPI_Send (data, myArraySize, MPI_DOUBLE, myrank+1, myrank+1, MPI_COMM_WORLD);
   MPI_Recv (recvbuf, myArraySize, MPI_DOUBLE, myrank+1, myrank, MPI_COMM_WORLD, &status);
}
else if (myrank % 2 == 0 && myrank > 0)
{
  // Send/recv left neighbor
  MPI_Recv (recvbuf, myArraySize, MPI_DOUBLE, myrank-1, myrank, MPI_COMM_WORLD, &status);
  MPI_Send (data, myArraySize, MPI_DOUBLE, myrank-1, myrank-1, MPI_COMM_WORLD);
}
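The two-phase schedule of Option 2 can be summarized by asking, for each rank and phase, who its partner is. The sketch below is an illustration rather than part of the lecture code: the helper names nn_partner and nn_tag are hypothetical, phase 0 corresponds to the first pair of branches (even ranks send right) and phase 1 to the second (odd ranks send right), and -1 marks a rank that is idle in that phase.

```c
/* Hypothetical helper: partner of `rank` in the given phase of Option 2,
 * or -1 if the rank does not communicate in that phase.
 * Phase 0: even ranks send to (then receive from) their right neighbor.
 * Phase 1: odd ranks send to (then receive from) their right neighbor.
 * The other parity receives from (then sends to) its left neighbor. */
int nn_partner(int rank, int P, int phase)
{
    int sends_right_first = (rank % 2 == phase % 2);
    if (sends_right_first)
        return (rank < P - 1) ? rank + 1 : -1;  /* last rank has no right neighbor */
    return (rank > 0) ? rank - 1 : -1;          /* rank 0 has no left neighbor */
}

/* Tag convention used in the pseudocode: the tag is the receiver's rank. */
int nn_tag(int dest_rank)
{
    return dest_rank;
}
```

With P = 4, phase 0 pairs (0,1) and (2,3), and phase 1 pairs (1,2) while ranks 0 and 3 are idle. In each pair one side sends first and the other receives first, so no two neighbors are blocked in a send to each other.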
Same Host (Option 1 vs. 2)
for i in `seq 1 5` ; do mpirun -np 4 ./nn-1 1000000 ; done
0.006751
0.006896
0.006518
0.006310
0.006356
for i in `seq 1 5` ; do mpirun -np 4 ./nn-2 1000000 ; done
0.006183
0.017730
0.006718
0.006862
0.006701
Two Hosts (Option 1 vs. 2)
for i in `seq 1 5` ; do mpirun -np 4 -hosts csews1,csews10 ./nn-1 1000000 ; done
0.450281
0.426031
0.419316
0.445110
0.416786
for i in `seq 1 5` ; do mpirun -np 4 -hosts csews1,csews10 ./nn-2 1000000 ; done
0.405743
0.423926
0.410813
0.420823
0.430066
Timing Option 1 vs. Option 2
Timing NN
P2P Blocking – Performance Bottleneck
• MPI_Send (buf, count, datatype, dest, tag, comm)
• MPI_Recv (buf, count, datatype, source, tag, comm, status)
      Rank 0: MPI_Send (1)
      Rank 1: MPI_Recv (0)
      Safe, but may delay the sender
Computation Communication Overlap
      Rank 0     Rank 1
      compute
      Send       compute
      compute    Recv
      Wait       compute
      compute
      (time increases downward)
Non-blocking Point-to-Point
   • MPI_Isend (buf, count, datatype, dest, tag, comm, request)
   • MPI_Irecv (buf, count, datatype, source, tag, comm, request)
   • MPI_Wait (request, status)
   • MPI_Waitall (count, array_of_requests, array_of_statuses)
Many-to-one Non-blocking P2P
Output
Non-blocking Performance
• Standard does not require overlapping communication and
  computation
• Implementation may use a thread to move data in parallel
• Implementation can delay the initiation of data transfer until “Wait”
• MPI_Test – non-blocking, tests completion, starts progress
• MPIR_CVAR_ASYNC_PROGRESS (MPICH) – enables a separate thread for asynchronous progress
Asynchronous Communication Progress
Non-blocking Point-to-Point Safety
• MPI_Isend (buf, count, datatype, dest, tag, comm, request)
• MPI_Irecv (buf, count, datatype, source, tag, comm, request)
• MPI_Wait (request, status)
      Rank 0        Rank 1
      MPI_Isend     MPI_Isend
      MPI_Recv      MPI_Recv
      Safe
Homework: NN 1D using Non-blocking
[Figure: processes 0 to P-1 in a line (1D nearest neighbor layout)]
Process Mapping/Allocation
[Figure: ranks 0 to 11 in a line, mapped onto a 3 x 4 grid]
     0    1    2    3
     4    5    6    7
     8    9   10   11
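The grid in the figure can be read as a row-major placement of ranks. The sketch below shows that mapping in both directions under that assumption. The type GridPos and the functions rank_to_grid and grid_to_rank are hypothetical names introduced for illustration, and the actual placement of ranks on hosts depends on the launcher and its options.

```c
/* Hypothetical row-major mapping between a rank and its position in a
 * grid with `cols` columns (for the figure: 12 ranks, 3 rows, 4 columns). */
typedef struct {
    int row;
    int col;
} GridPos;

GridPos rank_to_grid(int rank, int cols)
{
    GridPos p = { rank / cols, rank % cols };  /* row index, column index */
    return p;
}

int grid_to_rank(GridPos p, int cols)
{
    return p.row * cols + p.col;
}
```

With 4 columns, rank 7 lands at row 1, column 3 and rank 8 at row 2, column 0, which matches the row ends shown in the figure.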
Attributes of Interconnects
      • Topology
      • Diameter
      • Cost
      • Anything else?