Starred repositories
MOVA: Towards Scalable and Synchronized Video–Audio Generation
Official inference code for SoulX-Singer: Towards High-Quality Zero-Shot Singing Voice Synthesis
Di♪♪Rhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion
A unified inference and post-training framework for accelerated video generation.
System which logs and analyzes academic reading habits
Contains many of my preliminary research ideas
DC-Gen: Post-Training Diffusion Acceleration with Deeply Compressed Latent Space
The most powerful local music generation model that outperforms most commercial alternatives
[CVPR 2025] MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis
A template for research projects in computer science/machine learning using Python and Julia
A curated list of engineering blogs
Wan: Open and Advanced Large-Scale Video Generative Models
[CVPR 2024] Official repo for "InteractDiffusion: Interaction-Control for Text-to-Image Diffusion Model".
🚀 Efficient implementations of state-of-the-art linear attention models
SDAR (Synergy of Diffusion and AutoRegression), a large diffusion language model (1.7B, 4B, 8B, 30B)
Train transformer language models with reinforcement learning.
Run SD1.x/2.x/3.x, SDXL, and FLUX.1 on your phone device
Run Stable Diffusion on Android Devices with Snapdragon NPU acceleration. Also supports CPU/GPU inference.
Official implementation of Magic Clothing: Controllable Garment-Driven Image Synthesis
[ICCV 2023] "TF-ICON: Diffusion-Based Training-Free Cross-Domain Image Composition" (Official Implementation)