
Organizations: @thu-ml

Showing results

A Survey of Efficient Attention Methods: Hardware-efficient, Sparse, Compact, and Linear Attention

187 stars · 4 forks · Updated Aug 26, 2025

The official implementation of [NeurIPS 2025 Oral] "Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free"

Jupyter Notebook · 86 stars · 5 forks · Updated Sep 19, 2025

[ICML 2025] SpargeAttention: a training-free sparse attention method that accelerates inference for any model.

Cuda · 735 stars · 60 forks · Updated Sep 27, 2025

PyTorch implementation of "Oscillation-Reduced MXFP4 Training for Vision Transformers" for DeiT model pre-training

Python · 30 stars · 3 forks · Updated Jun 20, 2025

🎨 ML Visuals contains figures and templates which you can reuse and customize to improve your scientific writing.

15,828 stars · 1,488 forks · Updated Feb 13, 2023

Official implementation for "Pruning Large Language Models with Semi-Structural Adaptive Sparse Training" (AAAI 2025)

Python · 14 stars · 1 fork · Updated Jul 1, 2025

[ICLR 2025] Codebase for "ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing", built on Megatron-LM.

Python · 96 stars · 5 forks · Updated Dec 20, 2024

arXiv LaTeX Cleaner: Easily clean the LaTeX code of your paper to submit to arXiv

Python · 6,424 stars · 373 forks · Updated Jun 2, 2025

[ICLR 2025, ICML 2025, NeurIPS 2025 Spotlight] Quantized Attention achieves a 2-5x speedup over FlashAttention without losing end-to-end metrics across language, image, and video models.

Cuda · 2,496 stars · 238 forks · Updated Oct 8, 2025

A scalable generative AI framework for researchers and developers working on large language models, multimodal models, and speech AI (automatic speech recognition and text-to-speech)

Python · 15,834 stars · 3,126 forks · Updated Oct 9, 2025

Ongoing research training transformer models at scale

Python · 13,775 stars · 3,145 forks · Updated Oct 9, 2025

Triton-based implementation of Sparse Mixture of Experts.

Python · 243 stars · 21 forks · Updated Oct 3, 2025

Development repository for the Triton language and compiler

MLIR · 17,167 stars · 2,289 forks · Updated Oct 9, 2025

[TMLR 2024] Efficient Large Language Models: A Survey

1,220 stars · 97 forks · Updated Jun 23, 2025

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper, Ada and Blackwell GPUs, to provide better performance with lower memory…

Python · 2,766 stars · 516 forks · Updated Oct 9, 2025

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Python · 40,346 stars · 4,575 forks · Updated Oct 9, 2025

Official code for "Efficient Backpropagation with Variance Controlled Adaptive Sampling" (ICLR 2024)

Python · 8 stars · 2 forks · Updated Mar 8, 2024

Fast and memory-efficient exact attention

Python · 19,840 stars · 2,045 forks · Updated Oct 8, 2025

Low-bit optimizers for PyTorch

Python · 131 stars · 9 forks · Updated Oct 9, 2023

A visual no-code/code-free web crawler/spider (易采集): a visual browser-automation, testing, data-collection, and crawling tool that lets you design and run crawler tasks graphically, without writing code. Also known as ServiceWrapper, an intelligent service-wrapping system for web applications.

JavaScript · 42,816 stars · 5,252 forks · Updated Aug 23, 2025

The simplest, fastest repository for training/finetuning medium-sized GPTs.

Python · 44,911 stars · 7,654 forks · Updated Dec 9, 2024

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Python · 93,794 stars · 25,508 forks · Updated Oct 9, 2025

LaTeX Thesis Template for Tsinghua University

TeX · 4,999 stars · 1,123 forks · Updated Jul 8, 2025

The JavaScript library that provides a program-friendly interface to the Tsinghua web portal

TypeScript · 28 stars · 5 forks · Updated Sep 24, 2023

清华大学计算机系课程攻略: guidance for courses in the Department of Computer Science and Technology, Tsinghua University

HTML · 35,672 stars · 7,782 forks · Updated Sep 18, 2025