  • Tsinghua University
  • Beijing, China

Organizations

@thu-nics @thu-ml

jason-huang03/README.md

Hi there 👋

Pinned

  1. thu-ml/SageAttention (Public)

    [ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized attention that achieves a 2-5x speedup over FlashAttention without losing end-to-end metrics across language, image, and video models.

    CUDA · 2.5k stars · 238 forks

  2. thu-ml/SpargeAttn (Public)

    [ICML2025] SpargeAttention: a training-free sparse attention method that accelerates inference for any model.

    CUDA · 735 stars · 60 forks

  3. SPH_Project (Public)

    An SPH (Smoothed Particle Hydrodynamics) implementation of fluid simulation, featuring large-scale simulation, rigid-fluid coupling, and high-viscosity fluids.

    Python · 189 stars · 16 forks

  4. mit-han-lab/llm-awq (Public)

    [MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

    Python · 3.3k stars · 274 forks

  5. thu-nics/MoA (Public)

    [CoLM'25] The official implementation of the paper "MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression"

    Python · 146 stars · 8 forks

  6. mit-han-lab/omniserve (Public)

    [MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention

    C++ · 763 stars · 52 forks