Stars
A Survey of Efficient Attention Methods: Hardware-efficient, Sparse, Compact, and Linear Attention
The official implementation for [NeurIPS2025 Oral] Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
[ICML2025] SpargeAttention: A training-free sparse attention that accelerates any model inference.
PyTorch implementation of "Oscillation-Reduced MXFP4 Training for Vision Transformers" for DeiT model pre-training
🎨 ML Visuals contains figures and templates which you can reuse and customize to improve your scientific writing.
Official implementation for "Pruning Large Language Models with Semi-Structural Adaptive Sparse Training" (AAAI 2025)
[ICLR2025] Codebase for "ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing", built on Megatron-LM; a toy sketch of the ReLU-routing idea appears after this list.
arXiv LaTeX Cleaner: Easily clean the LaTeX code of your paper to submit to arXiv
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized attention that achieves a 2-5x speedup over FlashAttention without losing end-to-end metrics across language, image, and video models.
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal AI, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Ongoing research on training transformer models at scale
Triton-based implementation of Sparse Mixture of Experts.
Development repository for the Triton language and compiler
[TMLR 2024] Efficient Large Language Models: A Survey
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper, Ada, and Blackwell GPUs, to provide better performance with lower memory utilization in both training and inference
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Official code for "Efficient Backpropagation with Variance Controlled Adaptive Sampling" (ICLR 2024)
Fast and memory-efficient exact attention; a minimal usage sketch appears after this list.
A visual no-code/code-free web crawler/spider, 易采集 (EasySpider): a visual browser automation, testing, data collection, and crawler tool that lets you design and execute scraping tasks graphically, without writing code. Also known as ServiceWrapper, an intelligent service-encapsulation system for web applications.
The simplest, fastest repository for training/finetuning medium-sized GPTs.
Tensors and Dynamic neural networks in Python with strong GPU acceleration
The JavaScript library that provides a program-friendly interface to the Tsinghua web portal
Guidance for courses in the Department of Computer Science and Technology, Tsinghua University (清华大学计算机系课程攻略)
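
For the ReMoE entry above, here is a toy sketch of what "ReLU routing" names, as the title reads: the usual softmax top-k gate is replaced by relu(router(x)), so most expert weights are exactly zero while the gate stays differentiable. The class name, expert shape, and dense per-expert loop are illustrative assumptions, not the ReMoE/Megatron-LM implementation.

```python
# Toy sketch of ReLU routing for a Mixture-of-Experts layer: the gate is
# relu(router(x)), so inactive experts receive an exact zero weight yet the
# gating remains differentiable end to end. Illustrative only, not ReMoE's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReluRoutedMoE(nn.Module):
    def __init__(self, d_model: int, d_ff: int, num_experts: int):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, d_model)
        gate = F.relu(self.router(x))      # (tokens, num_experts), sparse: exact zeros
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            w = gate[:, e].unsqueeze(-1)   # per-token weight for expert e
            active = gate[:, e] > 0        # tokens routed to this expert
            if active.any():
                out[active] += w[active] * expert(x[active])
        return out

moe = ReluRoutedMoE(d_model=16, d_ff=32, num_experts=4)
print(moe(torch.randn(8, 16)).shape)  # torch.Size([8, 16])
```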
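
For the FlashAttention entry, a minimal usage sketch assuming the flash-attn package's v2-style `flash_attn_func` API, which takes fp16/bf16 CUDA tensors shaped (batch, seqlen, nheads, headdim); treat it as an illustration under those assumptions rather than the repo's canonical example.

```python
# Minimal usage sketch for the flash-attn package, assuming its v2-style
# flash_attn_func API; requires a CUDA GPU and fp16/bf16 inputs.
import torch
from flash_attn import flash_attn_func

batch, seqlen, nheads, headdim = 2, 1024, 8, 64
q = torch.randn(batch, seqlen, nheads, headdim, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Exact (not approximate) attention, computed without materializing the
# seqlen x seqlen score matrix; causal=True applies an autoregressive mask.
out = flash_attn_func(q, k, v, causal=True)  # (batch, seqlen, nheads, headdim)
print(out.shape)
```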