Trending

See what the GitHub community is most excited about this week.

nerfstudio-project / gsplat

CUDA accelerated rasterization of gaussian splatting

Cuda · 3,631 stars · 546 forks

54 stars this week

Dao-AILab / causal-conv1d

Causal depthwise conv1d in CUDA, with a PyTorch interface

Cuda · 596 stars · 128 forks

19 stars this week

flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving

Cuda · 3,773 stars · 513 forks

38 stars this week

deepseek-ai / DeepEP

DeepEP: an efficient expert-parallel communication library

Cuda · 8,530 stars · 929 forks

25 stars this week

NVIDIA / nccl-tests

NCCL Tests

Cuda · 1,268 stars · 317 forks

7 stars this week

siboehm / SGEMM_CUDA

Fast CUDA matrix multiplication from scratch

Cuda · 852 stars · 127 forks

14 stars this week

NVlabs / instant-ngp

Instant neural graphics primitives: lightning fast NeRF and more

Cuda · 16,945 stars · 2,009 forks

26 stars this week

karpathy / llm.c

LLM training in simple, raw C/CUDA

Cuda · 27,641 stars · 3,187 forks

49 stars this week

NVIDIA / CUDALibrarySamples

CUDA Library Samples

Cuda · 2,104 stars · 409 forks

9 stars this week

Infatoshi / cuda-course

Cuda · 1,437 stars · 267 forks

11 stars this week

HazyResearch / ThunderKittens

Tile primitives for speedy kernels

Cuda · 2,709 stars · 175 forks

18 stars this week

ROCm / rccl-tests

RCCL Performance Benchmark Tests

Cuda · 76 stars · 58 forks

1 star this week

NVIDIA / nvbench

CUDA Kernel Benchmarking Library

Cuda · 724 stars · 89 forks

6 stars this week

thu-ml / SageAttention

Quantized Attention achieves a speedup of 2-5x and 3-11x compared to FlashAttention and xformers, without losing end-to-end metrics across language, image, and video models.

Cuda · 2,410 stars · 224 forks

25 stars this week

thu-ml / SpargeAttn

SpargeAttention: A training-free sparse attention that can accelerate any model inference.

Cuda · 721 stars · 57 forks

7 stars this week

NVIDIA / AMGX

Distributed multigrid linear solver library on GPU

Cuda · 596 stars · 160 forks

1 star this week