Stars
A Survey of Efficient Attention Methods: Hardware-efficient, Sparse, Compact, and Linear Attention
The official implementation for [NeurIPS2025 Oral] Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
[ICML2025] SpargeAttention: A training-free sparse attention that accelerates any model inference.
PyTorch implementation of "Oscillation-Reduced MXFP4 Training for Vision Transformers" for DeiT model pre-training
🎨 ML Visuals contains figures and templates which you can reuse and customize to improve your scientific writing.
Official implementation for "Pruning Large Language Models with Semi-Structural Adaptive Sparse Training" (AAAI 2025)
[ICLR2025] Codebase for "ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing", built on Megatron-LM; a toy sketch of the ReLU-routing idea appears after this list.
arXiv LaTeX Cleaner: Easily clean the LaTeX code of your paper to submit to arXiv
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized attention that achieves a 2-5x speedup over FlashAttention without losing end-to-end metrics across language, image, and video models.
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal AI, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Ongoing research on training transformer models at scale
Triton-based implementation of Sparse Mixture of Experts.
Development repository for the Triton language and compiler
[TMLR 2024] Efficient Large Language Models: A Survey
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper, Ada, and Blackwell GPUs, to provide better performance with lower memory utilization in both training and inference
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Official code for "Efficient Backpropagation with Variance Controlled Adaptive Sampling" (ICLR 2024)
Fast and memory-efficient exact attention; a minimal usage sketch appears after this list.
A visual no-code/code-free web crawler/spider, 易采集 (EasySpider): a visual browser automation, testing, data collection, and crawler tool that lets you design and execute scraping tasks graphically, without writing code. Also known as ServiceWrapper, an intelligent service-encapsulation system for web applications.
The simplest, fastest repository for training/finetuning medium-sized GPTs.
Tensors and Dynamic neural networks in Python with strong GPU acceleration
The JavaScript library that provides a program-friendly interface to the Tsinghua web portal
Guidance for courses in the Department of Computer Science and Technology, Tsinghua University (清华大学计算机系课程攻略)
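
For the ReMoE entry above, here is a toy sketch of what "ReLU routing" names, as the title reads: the usual softmax top-k gate is replaced by relu(router(x)), so most expert weights are exactly zero while the gate stays differentiable. The class name, expert shape, and dense per-expert loop are illustrative assumptions, not the ReMoE/Megatron-LM implementation.

```python
# Toy sketch of ReLU routing for a Mixture-of-Experts layer: the gate is
# relu(router(x)), so inactive experts receive an exact zero weight yet the
# gating remains differentiable end to end. Illustrative only, not ReMoE's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReluRoutedMoE(nn.Module):
    def __init__(self, d_model: int, d_ff: int, num_experts: int):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, d_model)
        gate = F.relu(self.router(x))      # (tokens, num_experts), sparse: exact zeros
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            w = gate[:, e].unsqueeze(-1)   # per-token weight for expert e
            active = gate[:, e] > 0        # tokens routed to this expert
            if active.any():
                out[active] += w[active] * expert(x[active])
        return out

moe = ReluRoutedMoE(d_model=16, d_ff=32, num_experts=4)
print(moe(torch.randn(8, 16)).shape)  # torch.Size([8, 16])
```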
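
For the FlashAttention entry, a minimal usage sketch assuming the flash-attn package's v2-style `flash_attn_func` API, which takes fp16/bf16 CUDA tensors shaped (batch, seqlen, nheads, headdim); treat it as an illustration under those assumptions rather than the repo's canonical example.

```python
# Minimal usage sketch for the flash-attn package, assuming its v2-style
# flash_attn_func API; requires a CUDA GPU and fp16/bf16 inputs.
import torch
from flash_attn import flash_attn_func

batch, seqlen, nheads, headdim = 2, 1024, 8, 64
q = torch.randn(batch, seqlen, nheads, headdim, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Exact (not approximate) attention, computed without materializing the
# seqlen x seqlen score matrix; causal=True applies an autoregressive mask.
out = flash_attn_func(q, k, v, causal=True)  # (batch, seqlen, nheads, headdim)
print(out.shape)
```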