Stars
2 stars written in CUDA
This project accelerates multi-GPU machine learning training using fused gradient buffers, NCCL AllReduce, and CUDA C kernel-level optimizations including me…