[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves a 2-5x speedup over FlashAttention with no loss in end-to-end metrics across language, image, and video models.
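A minimal PyTorch sketch of the idea behind quantized attention: quantize Q and K to INT8 with symmetric per-tensor scales, run the QK^T matmul at low precision, and dequantize before the softmax. All function and variable names here are illustrative assumptions, not this repository's API, and production kernels run the low-precision matmul on INT8 tensor cores rather than in the emulation shown here.

```python
import torch

def int8_quantize(x):
    # Symmetric per-tensor INT8 quantization: map max |x| to 127.
    scale = x.abs().amax().clamp(min=1e-8) / 127.0
    q = torch.clamp((x / scale).round(), -127, 127).to(torch.int8)
    return q, scale

def quantized_attention(q, k, v):
    # q, k, v: (batch, heads, seq, head_dim) floating-point tensors.
    d = q.shape[-1]
    q_i8, q_scale = int8_quantize(q)
    k_i8, k_scale = int8_quantize(k)
    # Low-precision QK^T, emulated via float casts for portability;
    # real kernels keep the operands in INT8 on tensor cores.
    scores = q_i8.float() @ k_i8.float().transpose(-2, -1)
    # Dequantize with the product of the two scales, then softmax.
    scores = scores * (q_scale * k_scale) / d ** 0.5
    # P @ V is kept in floating point for accuracy.
    return torch.softmax(scores, dim=-1) @ v.float()
```

Usage: `out = quantized_attention(q, k, v)` on float16/float32 tensors. Accuracy hinges on quantization granularity; the per-tensor scale used here is the simplest choice, and finer per-block scales track outliers better.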
[ICML2025] SpargeAttention: a training-free sparse attention method that accelerates inference for any model.
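A minimal sketch of the training-free block-sparse idea: pool queries and keys into block-level representatives, rank key blocks by pooled similarity, and run attention only over the top-scoring blocks. The block size, `keep_ratio`, and the mean-pooling selection rule are illustrative assumptions, not SpargeAttention's actual algorithm.

```python
import torch

def block_sparse_attention(q, k, v, block=64, keep_ratio=0.25):
    # q, k, v: (seq, head_dim); seq assumed divisible by block for brevity.
    s, d = q.shape
    nb = s // block
    q_blk = q.reshape(nb, block, d)
    k_blk = k.reshape(nb, block, d)
    v_blk = v.reshape(nb, block, d)
    # Rank key blocks per query block by similarity of mean-pooled representatives.
    importance = q_blk.mean(dim=1) @ k_blk.mean(dim=1).T  # (nb, nb)
    keep = max(1, int(keep_ratio * nb))
    top = importance.topk(keep, dim=-1).indices
    out = torch.empty_like(q)
    for i in range(nb):
        # Attend only to the selected key/value blocks; the rest are skipped.
        k_sel = k_blk[top[i]].reshape(-1, d)
        v_sel = v_blk[top[i]].reshape(-1, d)
        scores = (q_blk[i] @ k_sel.T) / d ** 0.5
        out[i * block:(i + 1) * block] = torch.softmax(scores, dim=-1) @ v_sel
    return out
```

The speedup comes from skipping whole blocks of the QK^T and PV matmuls; the quality question is whether the block-level scores reliably find the blocks that carry the attention mass.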