Stars
Instant neural graphics primitives: lightning-fast NeRF and more
DeepEP: an efficient expert-parallel communication library
📚 LeetCUDA: modern CUDA learning notes with PyTorch for beginners 🐑; 200+ CUDA kernels, Tensor Cores, HGEMM, FA-2 MMA 🎉
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
FlashInfer: Kernel Library for LLM Serving
CUDA accelerated rasterization of gaussian splatting
[ICLR 2025, ICML 2025, NeurIPS 2025 Spotlight] Quantized attention that achieves a 2-5x speedup over FlashAttention without losing end-to-end metrics across language, image, and video models
How to optimize algorithms in CUDA
Sample code for my CUDA programming book
Learn CUDA Programming, published by Packt
Flash Attention in ~100 lines of CUDA (forward pass only)
A simple high-performance CUDA GEMM implementation
A shifted-window-based transformer for 3D sparse tasks
The CUDA version of the RWKV language model (https://github.com/BlinkDL/RWKV-LM)
Matrix multiply-accumulate with CUDA and WMMA (Tensor Cores); see the sketch after this list
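As a minimal sketch of the technique the last entry names (not code from that repository), the kernel below has one warp compute a single 16x16 tile of C = A x B with the WMMA API on Tensor Cores. The kernel name and the row-/column-major layout choices are assumptions for illustration.

```cuda
#include <mma.h>
#include <cuda_fp16.h>

using namespace nvcuda;

// One warp computes one 16x16 output tile: C (float) = A (half) * B (half).
// A is 16x16 row-major, B is 16x16 col-major, both with leading dimension 16.
__global__ void wmma_tile_16x16(const half *a, const half *b, float *c) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag;

    wmma::fill_fragment(c_frag, 0.0f);           // start from a zero accumulator
    wmma::load_matrix_sync(a_frag, a, 16);       // cooperative, warp-wide loads
    wmma::load_matrix_sync(b_frag, b, 16);
    wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);  // Tensor Core multiply-accumulate
    wmma::store_matrix_sync(c, c_frag, 16, wmma::mem_row_major);
}
```

Launched as `wmma_tile_16x16<<<1, 32>>>(dA, dB, dC)` (exactly one warp) and compiled for compute capability 7.0 or higher, e.g. `nvcc -arch=sm_70`.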