
Residual Context Diffusion (RCD): Repurposing discarded signals as structured priors for high-performance reasoning in dLLMs.

Python 57 2 Updated Mar 12, 2026

SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse–Linear Attention

Python 313 19 Updated Feb 24, 2026

Official repo for vidar and vidarc: video foundation model for robotics.

Python 42 1 Updated Dec 22, 2025

TurboDiffusion: 100–200× Acceleration for Video Diffusion Models

Python 3,535 265 Updated Jun 17, 2026

A Survey of Efficient Attention Methods: Hardware-efficient, Sparse, Compact, and Linear Attention

299 5 Updated Dec 1, 2025

The official implementation for [NeurIPS2025 Oral] Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free

Jupyter Notebook 963 61 Updated Dec 20, 2025

[ICML2025] SpargeAttention: A training-free sparse attention that accelerates any model inference.

Cuda 1,005 95 Updated Feb 25, 2026

PyTorch implementation of "Oscillation-Reduced MXFP4 Training for Vision Transformers" on DeiT Model Pre-training

Python 40 3 Updated May 4, 2026

🎨 ML Visuals contains figures and templates which you can reuse and customize to improve your scientific writing.

17,280 1,561 Updated Feb 13, 2023

Official implementation for "Pruning Large Language Models with Semi-Structural Adaptive Sparse Training" (AAAI 2025)

Python 19 2 Updated Jul 1, 2025

[ICLR2025] Codebase for "ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing", built on Megatron-LM.

Python 116 11 Updated Dec 20, 2024

arXiv LaTeX Cleaner: Easily clean the LaTeX code of your paper to submit to arXiv

Python 6,911 403 Updated Mar 27, 2026

[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.

Cuda 3,427 434 Updated Jan 17, 2026

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Python 17,415 3,439 Updated Jun 19, 2026

Ongoing research training transformer models at scale

Python 16,755 4,100 Updated Jun 19, 2026

Triton-based implementation of Sparse Mixture of Experts.

Python 278 29 Updated Oct 3, 2025

Development repository for the Triton language and compiler

MLIR 19,474 2,947 Updated Jun 19, 2026

[TMLR 2024] Efficient Large Language Models: A Survey

1,259 98 Updated Jun 23, 2025

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs, to provide better performance…

Python 3,397 752 Updated Jun 17, 2026

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Python 42,541 4,863 Updated Jun 18, 2026

Official code for "Efficient Backpropagation with Variance Controlled Adaptive Sampling" (ICLR 2024)

Python 8 2 Updated Mar 8, 2024

Fast and memory-efficient exact attention

Python 24,188 2,844 Updated Jun 19, 2026

Low-bit optimizers for PyTorch

Python 139 9 Updated Oct 9, 2023

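A visual no-code/code-free web crawler/spider. 易采集: a visual tool for browser automation testing, data collection, and web crawling that lets you design and run crawler tasks graphically, without writing code. Also known as ServiceWrapper, an intelligent service-wrapping system for web applications.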

JavaScript 44,121 5,383 Updated May 22, 2026

The simplest, fastest repository for training/finetuning medium-sized GPTs.

Python 59,871 10,328 Updated Nov 12, 2025

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Python 100,881 28,057 Updated Jun 19, 2026

LaTeX Thesis Template for Tsinghua University

TeX 5,392 1,163 Updated Jun 17, 2026

The JavaScript library that provides a program-friendly interface to the Tsinghua web portal

TypeScript 29 5 Updated Sep 24, 2023

Guidance for courses in the Department of Computer Science and Technology, Tsinghua University

HTML 37,187 7,838 Updated Jun 18, 2026