nccl
Here are 67 public repositories matching this topic...
Experimental Explicit Communications API for Kokkos
Updated Apr 30, 2026 - C++
Real multi-node MPI benchmarks for AI infrastructure teams. By NYDUX — nydux.ai
Updated Apr 22, 2026
Complete setup guide for a 2-node NVIDIA DGX Spark cluster — distributed training, CUDA inference with EXO, NCCL tuning for Grace Blackwell, NVMe-TCP shared storage, and 200 Gb/s direct fabric networking.
Updated Apr 11, 2026 - Python
Mini Distributed Training Framework using NCCL
Updated May 9, 2026
EUMaster4HPC student challenge group 7 - EuroHPC Summit 2024 Antwerp
Updated Apr 14, 2024 - Cuda
Practical guide to multi-node NCCL over switched RoCE fabric on NVIDIA GB10 (DGX Spark class), documenting the gaps in NVIDIA's official playbooks
Updated Apr 20, 2026
Simple quick test to benchmark your pytorch + nccl/ncclx setup
Updated May 3, 2026 - Python
Experiments with low level communication patterns that are useful for distributed training.
Updated Nov 14, 2018 - Python
Single-node data parallelism in Julia with CUDA
Updated Nov 18, 2024 - Julia
A hybrid testbed for evaluating top open-source LLMs (like gpt-oss-20b and Llama 3.3) on local GPUs, cloud GPUs, and AWS Inferentia2/Trainium instances, focusing on vLLM optimization, capacity management, kernel bypass, and hardware-software co-design, along with supporting infrastructure such as NCCL, RDMA, and NVMeoF.
Updated Apr 21, 2026 - Python
Practical AI homelab setup guides for GB10, Mac Studio Ultra, RoCE/RDMA, MikroTik switching, NCCL, and heterogeneous workload experiments.
Updated May 10, 2026
A practical model (with math + Python) to tell if you’re compute-, memory-, or network-bound—and what to buy next
Updated Sep 4, 2025 - Jupyter Notebook
From-scratch RDMA-based PyTorch backend in C++; trained a char-level GPT on TinyShakespeare via DDP through code I wrote.
Updated May 1, 2026 - C++
Library of multi-GPU matrix math operations using NVIDIA NCCL.
Updated Sep 9, 2020 - Cuda
NCCL communication benchmarking and topology visualization on multi‑node GPU clusters.
Updated Feb 5, 2026 - Python
Blood Cell Simulation server
Updated Jan 29, 2024 - C++