This repository is based on Mellanox/gpu_direct_rdma_access. Some bugs in the code have been fixed, some methods have been optimized, and some features have been added.
A training monitor at the framework layer that builds the training stream through eBPF and CUDA kernel injection, enabling slow node detection and silent anomaly diagnosis.
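The description above does not say how slow nodes are identified. As a rough illustration only, the sketch below flags nodes whose mean step time is well above the cluster median. The function name `find_slow_nodes`, the `threshold` parameter, and the sample timings are all hypothetical; in a real monitor the timings would come from the eBPF and CUDA instrumentation rather than a hard-coded dictionary.

```python
from statistics import median


def find_slow_nodes(step_times, threshold=1.5):
    """Return the nodes whose mean step time exceeds threshold x the cluster median.

    step_times maps a node name to a list of per-step durations in seconds.
    This heuristic is an assumption for illustration, not the project's method.
    """
    # Mean step time per node; nodes with no samples are skipped.
    means = {node: sum(ts) / len(ts) for node, ts in step_times.items() if ts}
    if not means:
        return []
    cluster_median = median(means.values())
    return sorted(n for n, m in means.items() if m > threshold * cluster_median)


# Hypothetical per-step timings (seconds) collected from four nodes.
samples = {
    "node-0": [0.50, 0.52, 0.51],
    "node-1": [0.49, 0.50, 0.51],
    "node-2": [0.95, 1.02, 0.99],
    "node-3": [0.51, 0.50, 0.52],
}
print(find_slow_nodes(samples))
```

Comparing against the median rather than the mean keeps a single straggler from shifting the baseline it is measured against.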