ZihaoMu zihaomu

🎯 Focusing

Building efficient AI systems.

68 followers · 52 following

AMD
Shenzhen
16:13 (UTC +08:00)

Stars

11 stars written in Cuda

karpathy / llm.c

LLM training in simple, raw C/CUDA

Cuda · 29,765 stars · 3,565 forks · Updated Jun 26, 2025

xlite-dev / LeetCUDA

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda · 10,827 stars · 1,093 forks · Updated Apr 20, 2026

deepseek-ai / DeepEP

DeepEP: an efficient expert-parallel communication library

Cuda · 9,588 stars · 1,217 forks · Updated Apr 29, 2026

thu-ml / SageAttention

[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.

Cuda · 3,335 stars · 402 forks · Updated Jan 17, 2026

BBuf / how-to-optim-algorithm-in-cuda

How to optimize some algorithms in CUDA.

Cuda · 2,955 stars · 272 forks · Updated Apr 22, 2026

mirage-project / mirage

Mirage Persistent Kernel: Compiling LLMs into a MegaKernel

Cuda · 2,234 stars · 201 forks · Updated Apr 30, 2026

graphdeco-inria / diff-gaussian-rasterization

Cuda · 1,439 stars · 454 forks · Updated Oct 21, 2024

thu-ml / SpargeAttn

[ICML2025] SpargeAttention: A training-free sparse attention that accelerates any model inference.

Cuda · 991 stars · 91 forks · Updated Feb 25, 2026

Bruce-Lee-LY / cuda_hgemm

Several optimization methods of half-precision general matrix multiplication (HGEMM) using tensor core with WMMA API and MMA PTX instruction.

Cuda · 543 stars · 91 forks · Updated Sep 8, 2024

xlite-dev / ffpa-attn

FFPA: Extend FlashAttention-2 with Split-D, ~O(1) SRAM complexity for large headdim, 1.8x~3x↑🎉 vs SDPA.

Cuda · 276 stars · 16 forks · Updated Apr 30, 2026

leimao / CUDA-GEMM-Optimization

CUDA Matrix Multiplication Optimization

Cuda · 269 stars · 25 forks · Updated Jul 19, 2024

Lists (5)

creative

productive

tool

Top1

tutorial
