nopeiyu's starred repositories

3 starred repositories written in Cuda

[ICLR2025, ICML2025, NeurIPS2025 Spotlight] SageAttention: quantized attention that achieves a 2-5x speedup over FlashAttention without losing end-to-end metrics across language, image, and video models.

Cuda · 2,622 stars · 258 forks · Updated Oct 28, 2025
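The core idea behind this kind of quantized attention is storing Q/K tiles in INT8 with one scale factor per block, so the expensive matmuls run in low precision and the scales restore the original range afterwards. A minimal, self-contained sketch of per-block INT8 quantization follows; it illustrates the general technique, not the repository's actual kernels, and the kernel name and block size are my own choices.

```cuda
// Sketch: per-block INT8 quantization (the general trick behind quantized
// attention). Each CUDA block finds its own max-abs value, derives one
// float scale, and quantizes its slice to [-127, 127]. Illustration only;
// not the repository's kernels.
#include <cmath>
#include <cstdio>
#include <cuda_runtime.h>

constexpr int BLOCK = 128;  // hypothetical choice: one block per 128 floats

__global__ void quantize_int8_block(const float* in, int8_t* out,
                                    float* scales, int n) {
    __shared__ float smax[BLOCK];
    int tid = threadIdx.x;
    int idx = blockIdx.x * BLOCK + tid;

    // 1) Max absolute value in this block (tree reduction in shared memory).
    smax[tid] = (idx < n) ? fabsf(in[idx]) : 0.0f;
    __syncthreads();
    for (int s = BLOCK / 2; s > 0; s >>= 1) {
        if (tid < s) smax[tid] = fmaxf(smax[tid], smax[tid + s]);
        __syncthreads();
    }

    // 2) One scale per block, then quantize to signed 8-bit.
    float scale = smax[0] > 0.0f ? smax[0] / 127.0f : 1.0f;
    if (tid == 0) scales[blockIdx.x] = scale;
    if (idx < n) out[idx] = (int8_t)roundf(in[idx] / scale);
}

int main() {
    const int n = 1 << 20;
    float *in, *scales;
    int8_t *out;
    cudaMallocManaged(&in, n * sizeof(float));
    cudaMallocManaged(&out, n * sizeof(int8_t));
    cudaMallocManaged(&scales, (n / BLOCK) * sizeof(float));
    for (int i = 0; i < n; ++i) in[i] = sinf(0.001f * i);

    quantize_int8_block<<<n / BLOCK, BLOCK>>>(in, out, scales, n);
    cudaDeviceSynchronize();

    // Spot-check the round trip: dequantized value should be close to input.
    printf("in=%f  back=%f\n", in[42], out[42] * scales[0]);
    return 0;
}
```

The per-block scale is what makes this cheap to undo: after an INT8 matmul, multiplying by the scales recovers values in the original range, which is how such schemes avoid degrading end-to-end metrics.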

Training materials associated with NVIDIA's CUDA Training Series (www.olcf.ornl.gov/cuda-training-series/)

Cuda · 886 stars · 321 forks · Updated Aug 19, 2024
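The series builds up from first-principles exercises, and the canonical starting point is an element-wise vector add with one thread per element. A minimal sketch in that spirit (my own, not copied from the course materials):

```cuda
// Classic first CUDA exercise: element-wise vector addition.
// One thread per element, with a bounds check for partial blocks.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void vadd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    float *a, *b, *c;
    cudaMallocManaged(&a, n * sizeof(float));
    cudaMallocManaged(&b, n * sizeof(float));
    cudaMallocManaged(&c, n * sizeof(float));
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    int threads = 256;
    int blocks = (n + threads - 1) / threads;  // round up to cover all n
    vadd<<<blocks, threads>>>(a, b, c, n);
    cudaDeviceSynchronize();

    printf("c[0] = %f (expect 3.0)\n", c[0]);
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```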

[ICML2025] SpargeAttention: a training-free sparse attention mechanism that accelerates inference for any model.

Cuda · 758 stars · 65 forks · Updated Oct 31, 2025
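Sparse attention of this kind saves work by computing only the score tiles that matter and skipping the rest. The toy kernel below shows just the tile-skipping mechanism using a precomputed mask; the fixed block-diagonal mask stands in for the online sparsity prediction the actual method performs, and all names and sizes are hypothetical.

```cuda
// Toy block-sparse score kernel: each CUDA block computes one TILE x TILE
// tile of S = Q K^T, but only if a mask marks that tile live. Skipped
// tiles cost nothing, which is where sparse attention gets its speedup.
#include <cstdio>
#include <cuda_runtime.h>

constexpr int TILE = 16;
constexpr int N = 64;   // sequence length (N/TILE tiles per side)
constexpr int D = 32;   // head dimension

__global__ void sparse_scores(const float* Q, const float* K,
                              const char* live, float* S) {
    int bt = blockIdx.y, bs = blockIdx.x;        // tile row / tile col
    if (!live[bt * (N / TILE) + bs]) return;     // skip pruned tile early
    int r = bt * TILE + threadIdx.y;             // global row in S
    int c = bs * TILE + threadIdx.x;             // global col in S
    float acc = 0.0f;
    for (int k = 0; k < D; ++k) acc += Q[r * D + k] * K[c * D + k];
    S[r * N + c] = acc;
}

int main() {
    float *Q, *K, *S; char* live;
    cudaMallocManaged(&Q, N * D * sizeof(float));
    cudaMallocManaged(&K, N * D * sizeof(float));
    cudaMallocManaged(&S, N * N * sizeof(float));
    int tiles = (N / TILE) * (N / TILE);
    cudaMallocManaged(&live, tiles);
    for (int i = 0; i < N * D; ++i) { Q[i] = 0.01f * (i % 7); K[i] = 0.02f; }
    // Hypothetical mask: keep only the block diagonal (stand-in for the
    // online sparsity prediction a real method would compute).
    for (int t = 0; t < tiles; ++t)
        live[t] = (t / (N / TILE) == t % (N / TILE));

    cudaMemset(S, 0, N * N * sizeof(float));
    dim3 grid(N / TILE, N / TILE), block(TILE, TILE);
    sparse_scores<<<grid, block>>>(Q, K, live, S);
    cudaDeviceSynchronize();
    printf("S[0][0]=%f  S[0][63]=%f (pruned tile stays 0)\n", S[0], S[63]);
    return 0;
}
```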