
Point-to-Point - IV

Lecture 7
January 29, 2024
Performance of Send Modes
MPI_Send
MPI_Bsend
MPI_Ssend

Rendezvous
Forced buffering
Forced synchronization

2
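A minimal sketch (not from the slides), assuming exactly two ranks, contrasting the modes: MPI_Ssend forces synchronization (it completes only after the matching receive has started), MPI_Bsend forces buffering into a user-attached buffer, and MPI_Send is free to use eager buffering or the rendezvous protocol, typically depending on message size.

#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, x = 42, y;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        // Standard send: eager or rendezvous, at the implementation's discretion
        MPI_Send (&x, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        // Synchronous send: returns only after the matching receive has started
        MPI_Ssend(&x, 1, MPI_INT, 1, 1, MPI_COMM_WORLD);
        // MPI_Bsend would force buffering, but requires an attached buffer (slide 4)
    } else if (rank == 1) {
        MPI_Recv(&y, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Recv(&y, 1, MPI_INT, 0, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    MPI_Finalize();
    return 0;
}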
Example

3
MPI_Bsend
The buffer size given to MPI_Buffer_attach should be the sum of the sizes of all the Bsends you intend to have outstanding at once, plus MPI_BSEND_OVERHEAD for each of those Bsends.

4
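A hedged sketch of this sizing rule, using MPI_Pack_size for a portable per-message byte count; the helper name and its parameters are illustrative, not from the lecture code.

#include <mpi.h>
#include <stdlib.h>

// Attach a buffer large enough for `nmsgs` outstanding Bsends of `count` doubles each.
void attach_bsend_buffer(int nmsgs, int count)
{
    int per_msg;
    MPI_Pack_size(count, MPI_DOUBLE, MPI_COMM_WORLD, &per_msg);   // packed bytes per message
    int bufsize = nmsgs * (per_msg + MPI_BSEND_OVERHEAD);         // plus overhead per Bsend
    MPI_Buffer_attach(malloc(bufsize), bufsize);
}

MPI_Buffer_detach blocks until all buffered messages have been delivered; detach (and free the returned pointer) only once you are done issuing Bsends.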
Nearest Neighbor (NN) Exchange

(Figure: ranks 0 through P-1 in a 1D chain; each rank exchanges data with its left and right neighbors.)

5
Nearest Neighbor Pseudocode
Tags? Performance?
Option 1: Schedule right sends followed by left sends

if (myrank < P-1)
{
// Send/recv right neighbor
MPI_Send (data, myArraySize, MPI_DOUBLE, myrank+1, myrank+1, MPI_COMM_WORLD);
MPI_Recv (recvbuf, myArraySize, MPI_DOUBLE, myrank+1, myrank, MPI_COMM_WORLD, &status);
}

if (myrank > 0)
{
// Send/recv left neighbor
MPI_Recv (recvbuf, myArraySize, MPI_DOUBLE, myrank-1, myrank, MPI_COMM_WORLD, &status);
MPI_Send (data, myArraySize, MPI_DOUBLE, myrank-1, myrank-1, MPI_COMM_WORLD);
}
6
Output

7
Nearest Neighbor Pseudocode
Option 2: Schedule odd and even ranks alternately

if (myrank % 2 == 0 && myrank < P-1)
{
// Send/recv right neighbor from even ranks
MPI_Send (data, myArraySize, MPI_DOUBLE, myrank+1, myrank+1, MPI_COMM_WORLD);
MPI_Recv (recvbuf, myArraySize, MPI_DOUBLE, myrank+1, myrank, MPI_COMM_WORLD, &status);
}

else if (myrank % 2 != 0 && myrank > 0)
{
// Send/recv left neighbor
MPI_Recv (recvbuf, myArraySize, MPI_DOUBLE, myrank-1, myrank, MPI_COMM_WORLD, &status);
MPI_Send (data, myArraySize, MPI_DOUBLE, myrank-1, myrank-1, MPI_COMM_WORLD);
}
8
Nearest Neighbor Pseudocode
if (myrank % 2 != 0 && myrank < P-1)
{
// Send/recv right neighbor from odd ranks
MPI_Send (data, myArraySize, MPI_DOUBLE, myrank+1, myrank+1, MPI_COMM_WORLD);
MPI_Recv (recvbuf, myArraySize, MPI_DOUBLE, myrank+1, myrank, MPI_COMM_WORLD, &status);
}

else if (myrank % 2 == 0 && myrank > 0)
{
// Send/recv left neighbor
MPI_Recv (recvbuf, myArraySize, MPI_DOUBLE, myrank-1, myrank, MPI_COMM_WORLD, &status);
MPI_Send (data, myArraySize, MPI_DOUBLE, myrank-1, myrank-1, MPI_COMM_WORLD);
}
9
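One way to read the comparison that follows: with blocking sends and the rendezvous protocol, Option 1's right-going sends can only complete once the right neighbor has worked through its own right exchange and posted the matching receive, so completions may chain across the ranks; Option 2 pairs even and odd ranks so that every exchange has a sender and a ready receiver at the same time, finishing in two phases regardless of P. Whether this shows up in practice depends on message size and the eager/rendezvous threshold.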
Same Host (Option 1 vs. 2)
for i in `seq 1 5` ; do mpirun -np 4 ./nn-1 1000000 ; done
0.006751
0.006896
0.006518
0.006310
0.006356

for i in `seq 1 5` ; do mpirun -np 4 ./nn-2 1000000 ; done
0.006183
0.017730
0.006718
0.006862
0.006701
10
Two Hosts (Option 1 vs. 2)
for i in `seq 1 5` ; do mpirun -np 4 -hosts csews1,csews10 ./nn-1 1000000 ; done
0.450281
0.426031
0.419316
0.445110
0.416786

for i in `seq 1 5` ; do mpirun -np 4 -hosts csews1,csews10 ./nn-2 1000000 ; done
0.405743
0.423926
0.410813
0.420823
0.430066
11
Timing Option 1 vs. Option 2

12
Timing NN

13
P2P Blocking – Performance Bottleneck

• MPI_Send (buf, count, datatype, dest, tag, comm)


• MPI_Recv (buf, count, datatype, source, tag, comm, status)

(Diagram: rank 0 calls MPI_Send to rank 1 while rank 1 calls MPI_Recv from rank 0; safe, but the blocking send may delay the sender until the matching receive is posted.)
14
Computation Communication Overlap

(Timeline for ranks 0 and 1: each rank issues its Send/Recv, keeps computing while the transfer is in flight, and calls Wait before using the result, overlapping communication with computation.)
15
Non-blocking Point-to-Point

• MPI_Isend (buf, count, datatype, dest, tag, comm, request)


• MPI_Irecv (buf, count, datatype, source, tag, comm, request)

• MPI_Wait (request, status)


• MPI_Waitall (count, request, status)

16
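A minimal sketch of the overlap pattern from the previous slides, reusing the buffer names from the NN pseudocode; the compute_interior/compute_boundary helpers and the single-neighbor setup are placeholders, not part of the lecture code.

#include <mpi.h>

void compute_interior(void);            // placeholder: work that touches neither buffer
void compute_boundary(double *halo);    // placeholder: work that needs the received data

void exchange_and_compute(double *data, double *recvbuf, int myArraySize, int neighbor)
{
    MPI_Request reqs[2];

    MPI_Irecv(recvbuf, myArraySize, MPI_DOUBLE, neighbor, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(data,    myArraySize, MPI_DOUBLE, neighbor, 0, MPI_COMM_WORLD, &reqs[1]);

    compute_interior();                          // overlap: compute while the transfer is (possibly) in flight

    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);   // both requests complete here

    compute_boundary(recvbuf);                   // recvbuf is valid, and data may be reused, only after the wait
}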
Many-to-one Non-blocking P2P

17
Output

18
Non-blocking Performance
• The standard does not require overlap of communication and computation
• An implementation may use a separate thread to move data in parallel
• An implementation may delay initiating the data transfer until the “Wait”
• MPI_Test – non-blocking; checks a request for completion and lets the library make progress
• MPIR_CVAR_ASYNC_PROGRESS (MPICH) enables asynchronous progress

19
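A hedged sketch of using MPI_Test to keep a pending transfer progressing between chunks of computation; compute_chunk is a placeholder for one slice of local work.

#include <mpi.h>

void compute_chunk(void);   // placeholder: one slice of the local computation

void compute_while_progressing(MPI_Request *req, int nchunks)
{
    int done = 0;
    for (int i = 0; i < nchunks; i++) {
        compute_chunk();
        if (!done)
            MPI_Test(req, &done, MPI_STATUS_IGNORE);   // non-blocking completion check; also drives progress
    }
    if (!done)
        MPI_Wait(req, MPI_STATUS_IGNORE);              // block for whatever remains
}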
Asynchronous Communication Progress

20
Non-blocking Point-to-Point Safety
• MPI_Isend (buf, count, datatype, dest, tag, comm, request)
• MPI_Irecv (buf, count, datatype, source, tag, comm, request)
• MPI_Wait (request, status)

(Diagram: ranks 0 and 1 each call MPI_Isend and then MPI_Recv; safe, because the non-blocking sends return immediately and both receives get posted.)

21
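A minimal sketch of the safe pattern in the diagram, reusing the variable names from the NN pseudocode and assuming exactly two ranks: start a non-blocking send, post the blocking receive, then wait on the send request before reusing the send buffer.

MPI_Request req;
MPI_Status status;
int other = 1 - myrank;   // assumes exactly two ranks, 0 and 1

MPI_Isend(data,    myArraySize, MPI_DOUBLE, other, 0, MPI_COMM_WORLD, &req);
MPI_Recv (recvbuf, myArraySize, MPI_DOUBLE, other, 0, MPI_COMM_WORLD, &status);
MPI_Wait (&req, MPI_STATUS_IGNORE);   // the send buffer may be reused only after this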
Homework: NN 1D using Non-blocking

(Figure: ranks 0 through P-1 in a 1D chain, as in the earlier NN exchange.)

22
Process Mapping/Allocation

(Figure: ranks 0–11 shown in a line and grouped as 0–3, 4–7, and 8–11, illustrating one possible mapping of processes onto hosts.)

23
Attributes of Interconnects

• Topology
• Diameter
• Cost
• Anything else?

24
