Stars
[CVPR 2026] Video-as-Answer: Predict and Generate Next Video Event with Joint-GRPO
CastleHill: Separable Causal Diffusion / Variation Flow Maps for LTX-2 long-form video generation
Official implementation of "OmniForcing: Unleashing Real-time Joint Audio-Visual Generation"[arXiv:2603.11647]. OmniForcing is the first framework to distill bidirectional audio-visual diffusion mo…
🧂 Salt: Self-Consistent Distribution Matching with Cache-Aware Training for Fast Video Generation
[Tech Report] Alive: A Unified Audio-Video Generation Model
Official inference code for SoulX-LiveAct: Towards Hour-Scale Real-Time Human Animation with Neighbor Forcing and ConvKV Memory
Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer
Code Implementation of "WorldCam: Interactive Autoregressive 3D Gaming Worlds with Camera Pose as a Unifying Geometric Representation"
Try X-Dub to sync any character in a video with any audio you like | Official repository for "From Inpainting to Editing: Unlocking Robust Mask-Free Visual Dubbing via Generative Bootstrapping"
Unified Codebase for Advanced World Models.
Official PyTorch implementation of AvatarForcing: One-Step Streaming Talking Avatars via Local-Future Sliding-Window Denoising
Generate high-resolution videos with a custom voice and appearance, based on LTX-2/LTX-2.3 + Identity In-Context LoRA
Codebase for PrismMirror: Real-Time Human Frontal View Synthesis from a Single Image
Codebase for Flash-VAED: Plug-and-Play VAE Decoders for Efficient Video Generation
[ICLR 2026] LumiTex: Towards High-Fidelity PBR Texture Generation with Illumination Context
[ICLR 2025] GI-GS: Global Illumination Decomposition on Gaussian Splatting for Inverse Rendering
[CVPR26] RemedyGS: Defend 3D Gaussian Splatting Against Computation Cost Attacks
[ICLR'26] Code for the paper "Token-level Data Selection for Safe LLM Fine-tuning"
SoulX-Podcast is an inference codebase by the Soul AI team for generating high-fidelity podcasts from text.
SoulX-FlashHead: A unified 1.3B-parameter framework designed for high-fidelity, infinite-length, and real-time streaming portrait video generation.
SoulX-FlashTalk is the first 14B model to achieve sub-second start-up latency (0.87s) while maintaining a real-time throughput of 32 FPS on an 8xH800 node.
🏠 [ECCV 2024] The core gsplat component for GaussianImage
[IEEE TCSVT] Preprocessing Enhanced Image Compression for Machine Vision
🏠 [ECCV 2024] GaussianImage: 1000 FPS Image Representation and Compression by 2D Gaussian Splatting