Stars
HunyuanVideo-1.5: A leading lightweight video generation model
[Tutorial] Few-Step Distillation for Text-to-Image Generation: A Practical Guide
Lightweight Image Video Action Generation Inference Framework
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
High performance inference engine for diffusion models
A PyTorch-native inference engine with cache, parallelism, quantization and cpu offload for DiTs.
Selftok: Discrete Visual Tokens of Autoregression, by Diffusion, and for Reasoning
Wan: Open and Advanced Large-Scale Video Generative Models
[ICLR2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
[ICCV 2025][Few-Step Student Surpasses Teacher Diffusion] Learning Few-Step Diffusion Models by Trajectory Distribution Matching
A unified inference and post-training framework for accelerated video generation.
A parallel VAE that avoids OOM in high-resolution image generation
PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to it.
The official implementation of “Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training”
PKU-DAIR / Hetu
Forked from Hsword/Hetu
A high-performance distributed deep learning system targeting large-scale and automated distributed training.
Hackable and optimized Transformers building blocks, supporting a composable construction.
Fast and memory-efficient exact attention
This project is an accelerated dataset for PyTorch. The dataset stores compressed batched data that is cropped while loading, which reduces I/O operations and network traffic. Since the crop…
SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on PyTorch, TensorFlow, and ONNX Runtime
micronet, a model compression and deployment library. Compression: 1. quantization: quantization-aware training (QAT), High-Bit (>2b) (DoReFa/Quantization and Training of Neural Networks for Efficient Integer-…
TorchMultimodal is a PyTorch library for training state-of-the-art multimodal multi-task models at scale.