- Shanghai Jiao Tong University
- Shanghai (UTC +08:00)
- https://zhitengli.github.io/
Stars
gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI
Kimi K2 is the large language model series developed by the Moonshot AI team
[ICCV 2025] Official implementations for paper: VACE: All-in-One Video Creation and Editing
[ICCV 2025] From Reusing to Forecasting: Accelerating Diffusion Models with TaylorSeers
Wan: Open and Advanced Large-Scale Video Generative Models
[ICCV 2025] QuantCache: Adaptive Importance-Guided Quantization with Hierarchical Latent and Layer Caching for Video Generation
PyTorch code for our paper "AdaSVD: Adaptive Singular Value Decomposition for Large Language Models"
[ICLR'25] ARB-LLM: Alternating Refined Binarizations for Large Language Models
HunyuanVideo: A Systematic Framework For Large Video Generation Model
A high-throughput and memory-efficient inference and serving engine for LLMs
📚 Collection of awesome generation acceleration resources.
A unified inference and post-training framework for accelerated video generation.
Solve Visual Understanding with Reinforced VLMs
[ICLR'25] ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation
Unsloth Studio is a web UI for training and running open models like Gemma 4, Qwen3.6, DeepSeek, gpt-oss locally.
PyTorch code for our paper "Progressive Binarization with Semi-Structured Pruning for LLMs"
[MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention
Official implementation of Half-Quadratic Quantization (HQQ)
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…
Official Repo for CVPR 2025 paper "OSDFace: One-Step Diffusion Model for Face Restoration"
[TMLR 2024] Efficient Large Language Models: A Survey
BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.