  • Tsinghua University
  • Beijing, China

Organizations

@thu-nics @thu-ml

jason-huang03/README.md

Hi there 👋

Pinned

  1. thu-ml/SageAttention (Public)

    [ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized attention that achieves a 2-5x speedup over FlashAttention without losing end-to-end metrics across language, image, and video models.

    CUDA · 2.5k stars · 238 forks

  2. thu-ml/SpargeAttn (Public)

    [ICML2025] SpargeAttention: a training-free sparse attention method that accelerates inference for any model.

    CUDA · 735 stars · 60 forks

  3. SPH_Project (Public)

    An SPH (Smoothed Particle Hydrodynamics) implementation of fluid simulation, featuring large-scale simulation, rigid-fluid coupling, and high-viscosity fluids.

    Python · 189 stars · 16 forks

  4. mit-han-lab/llm-awq (Public)

    [MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

    Python · 3.3k stars · 274 forks

  5. thu-nics/MoA (Public)

    [CoLM'25] The official implementation of the paper "MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression"

    Python · 146 stars · 8 forks

  6. mit-han-lab/omniserve (Public)

    [MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention

    C++ · 763 stars · 52 forks