Tags: pmodels/mpich
TEMP: aggregated temp commits from aurora_test

REVERT before update.

temp: use yaksa engine in posix_rma for large messages

temp: downgrade MPI_VERSION and MPICH to 4

Suspect mpi4py is having trouble handling the version change.

temp: add MPIR_CVAR_CH4_PROGRESS_THROTTLE etc

Add MPIR_CVAR_CH4_PROGRESS_THROTTLE. The default is 0. Set it to 1 if you experience significant collective slowdowns at high PPN.

Add MPIR_CVAR_CH4_PROGRESS_THROTTLE_NO_PROGRESS_COUNT. The default is 4096. Tune it to minimize the side effects of turning on MPIR_CVAR_CH4_PROGRESS_THROTTLE.

temp: update alltoallv_intra_pairwise_sendrecv_replace

The naive linear pairing holds up the high ranks until the lower ranks get to them. Rank N-1 is blocked at its first exchange until rank 0 is nearly finished.

Slightly improve the algorithm, esp. for the high PPN case: do pairwise exchanges within each node first, then finish the rest with naive pairing over internode.

Also, the double loop followed by rank selection seems to be a roundabout way of writing a single loop.

TODO: fix the naive pairing order.

coll: perfect pairing by flipping bits in alltoallv

In the alltoallv_intra_pairwise_sendrecv_replace algorithm, order the pairing by first exchanging with self, then with the neighbor found by flipping bit 0x1, then by flipping bit 0x10, and so on. If each sendrecv takes roughly the same latency in order, this should minimize the delay due to imbalance.

Set MPIR_CVAR_ALLTOALLV_PAIRWISE_NEW=1 to select the new algorithm.

temp: default MPIR_CVAR_PMI_DISABLE_GROUP to 1

PMIx_Fence_nb is not stable on Aurora.

temp: default MPIR_CVAR_GPU_USE_IMMEDIATE_COMMAND_LIST to 1

temp: Replaced default algorithm tuning files with aurora-specific files
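
The pairing order described under "coll: perfect pairing by flipping bits in alltoallv" can be illustrated with a small sketch. It assumes that "flipping bits" means XOR-ing the rank with the step index, so step 0 pairs a rank with itself, step 1 flips the lowest bit, step 2 flips the next bit, and so on. The helper names xor_num_steps and xor_partner are hypothetical and are not taken from the MPICH source. How the actual implementation handles communicator sizes that are not a power of two is not stated in the commit message, so the sketch simply reports "no partner" for such steps.

```c
#include <assert.h>

/* Hypothetical helper: number of steps in the schedule, i.e. the
 * smallest power of two that is >= size. */
int xor_num_steps(int size)
{
    assert(size > 0);
    int steps = 1;
    while (steps < size)
        steps <<= 1;
    return steps;
}

/* Hypothetical helper: partner of `rank` at `step` under XOR pairing.
 * Returns -1 when the computed partner falls outside the communicator
 * (only possible when size is not a power of two); the rank idles for
 * that step in this sketch. */
int xor_partner(int rank, int step, int size)
{
    assert(rank >= 0 && rank < size);
    assert(step >= 0);
    int partner = rank ^ step;   /* flip the bits of rank selected by step */
    return (partner < size) ? partner : -1;
}
```

Because (rank ^ step) ^ step == rank, both ranks of a pair select each other in the same step, so no rank waits on a partner that is busy with a different exchange. Looping step from 0 to xor_num_steps(size) - 1 visits every peer exactly once.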