# nccl
Here are 6 public repositories matching this topic...
CUDA programming practice project: hands-on CUDA kernels and performance optimization, covering GEMM, FlashAttention, Tensor Cores, CUTLASS, quantization, KV cache, NCCL, and profiling.
parallel-computing cuda high-performance-computing cuda-kernels quantization cutlass gemm performance-optimization nccl gpu-programming roofline-model tensor-core llm-inference flash-attention nsight-compute
Updated May 11, 2026 - Cuda
Uses ncclSend and ncclRecv to implement ncclSendrecv, ncclGather, ncclScatter, and ncclAlltoall.
Updated Mar 1, 2022 - Cuda
Library of multi-GPU matrix math operations using Nvidia NCCL.
Updated Sep 9, 2020 - Cuda
EUMaster4HPC student challenge group 7 - EuroHPC Summit 2024 Antwerp
optimization scalability openmp mpi cuda efficiency conjugate-gradient cg parallel-programming nccl meluxina
Updated Apr 14, 2024 - Cuda