An undergraduate student at Tsinghua University, interested in Efficient Machine Learning
Stars: 6 repositories written in CUDA
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
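A minimal sketch of the fine-grained scaling idea behind FP8 GEMM: each 128-wide slice of the K dimension carries its own dequantization scale, so the partial dot product of every block is rescaled separately before accumulation. This is an illustrative naive kernel, not DeepGEMM's actual implementation; the scale layouts (per-row/per-128-column for A, per-128x128 block for B) and the assumption that K is a multiple of 128 are mine.

```cuda
#include <cuda_fp8.h>

// Naive FP8 GEMM with fine-grained scaling: one thread per output element,
// FP32 accumulation, scales applied once per 128-wide K block.
__global__ void fp8_gemm_finegrained(
    const __nv_fp8_e4m3* A,   // [M, K] row-major activations
    const __nv_fp8_e4m3* B,   // [K, N] row-major weights
    const float* sA,          // [M, K/128] per-row, per-128-column scales
    const float* sB,          // [K/128, N/128] per-128x128-block scales
    float* C,                 // [M, N] output
    int M, int N, int K)      // K assumed to be a multiple of 128
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= M || col >= N) return;

    float acc = 0.0f;
    for (int kb = 0; kb < K / 128; ++kb) {
        // Partial dot product over one 128-element K block.
        float partial = 0.0f;
        for (int k = kb * 128; k < (kb + 1) * 128; ++k)
            partial += float(A[row * K + k]) * float(B[k * N + col]);
        // Rescale this block's contribution with its own pair of scales.
        acc += partial * sA[row * (K / 128) + kb]
                       * sB[kb * (N / 128) + col / 128];
    }
    C[row * N + col] = acc;
}
```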
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized attention that achieves a 2-5x speedup over FlashAttention without losing end-to-end metrics across language, image, and video models.
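For intuition, a sketch of the quantized-attention idea: Q and K are stored as INT8 with per-row scales, the logits are computed with integer dot products, and the scales are multiplied back in afterwards. This is not SageAttention's kernel; the names (Qq, q_scale, etc.), the per-row quantization granularity, and the requirement that head_dim be a multiple of 4 are assumptions for illustration.

```cuda
#include <cuda_runtime.h>
#include <cstdint>

// One thread per (query, key) pair; uses __dp4a (sm_61+) for 4-way INT8 dot products.
__global__ void qk_int8_scores(
    const int8_t* Qq,       // [num_q, head_dim] INT8-quantized queries
    const int8_t* Kq,       // [num_k, head_dim] INT8-quantized keys
    const float*  q_scale,  // [num_q] per-row dequantization scales
    const float*  k_scale,  // [num_k] per-row dequantization scales
    float* S,               // [num_q, num_k] attention logits (pre-softmax)
    int num_q, int num_k, int head_dim, float softmax_scale)
{
    int qi = blockIdx.y * blockDim.y + threadIdx.y;
    int ki = blockIdx.x * blockDim.x + threadIdx.x;
    if (qi >= num_q || ki >= num_k) return;

    // head_dim assumed to be a multiple of 4 so rows can be read as packed int32.
    const int* q4 = reinterpret_cast<const int*>(Qq + qi * head_dim);
    const int* k4 = reinterpret_cast<const int*>(Kq + ki * head_dim);

    int acc = 0;
    for (int d = 0; d < head_dim / 4; ++d)
        acc = __dp4a(q4[d], k4[d], acc);   // accumulate 4 INT8 products per call

    // Dequantize: multiply the integer result by both rows' quantization scales.
    S[qi * num_k + ki] = float(acc) * q_scale[qi] * k_scale[ki] * softmax_scale;
}
```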
How to optimize various algorithms in CUDA.
[ICML2025] SpargeAttention: A training-free sparse attention that accelerates any model inference.
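A rough sketch of the sparse-attention pattern: a precomputed block mask marks which (query-block, key-block) pairs matter, and the kernel skips the math and memory traffic for everything else. This is illustrative only and not SpargeAttention's API; the mask layout, the BLOCK size of 64, and the assumption that num_k is a multiple of BLOCK are placeholders.

```cuda
#include <cuda_runtime.h>

#define BLOCK 64  // key-block granularity of the sparsity mask (assumed)

// One thread per query row; key blocks with mask == 0 are skipped entirely.
__global__ void sparse_attn_scores(
    const float* Q,                   // [num_q, head_dim]
    const float* K,                   // [num_k, head_dim]
    const unsigned char* block_mask,  // [num_q/BLOCK, num_k/BLOCK], 1 = keep, 0 = skip
    float* S,                         // [num_q, num_k] logits (pre-softmax)
    int num_q, int num_k, int head_dim)
{
    int qi = blockIdx.x * blockDim.x + threadIdx.x;
    if (qi >= num_q) return;
    int qb = qi / BLOCK;
    int num_kb = num_k / BLOCK;       // num_k assumed to be a multiple of BLOCK

    for (int kb = 0; kb < num_kb; ++kb) {
        if (!block_mask[qb * num_kb + kb]) {
            // Skipped block: fill with a large negative value so softmax gives ~0.
            for (int ki = kb * BLOCK; ki < (kb + 1) * BLOCK; ++ki)
                S[qi * num_k + ki] = -1e30f;
            continue;                  // no dot products for this block
        }
        for (int ki = kb * BLOCK; ki < (kb + 1) * BLOCK; ++ki) {
            float acc = 0.0f;
            for (int d = 0; d < head_dim; ++d)
                acc += Q[qi * head_dim + d] * K[ki * head_dim + d];
            S[qi * num_k + ki] = acc;
        }
    }
}
```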