Stars
Official codebase for "Causal Forcing: Autoregressive Diffusion Distillation Done Right for High-Quality Real-Time Interactive Video Generation"
[NeurIPS 2025 D&B🔥] OpenS2V-Nexus: A Detailed Benchmark and Million-Scale Dataset for Subject-to-Video Generation
VideoCoF: Unified Video Editing with Temporal Reasoner
EditScore: Unlocking Online RL for Image Editing via High-Fidelity Reward Modeling
📖 This is a repository for organizing papers, codes, and other resources related to unified multimodal models.
[NeurIPS 2025] The official repository of "Sekai: A Video Dataset towards World Exploration"
[ICLR 26 Oral] Stable Video Infinity: Infinite-Length Video Generation with Error Recycling
(CVPR 2025) From Slow Bidirectional to Fast Autoregressive Video Diffusion Models
Qwen-Image-Layered: Layered Decomposition for Inherent Editability
Implementation of "Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length"
[NeurIPS 2025] Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance
"MoCA: Mixture-of-Components Attention for Scalable Compositional 3D Generation"
A curated list of recent diffusion models for video generation, editing, and various other applications.
[ICLR 2026] LongLive: Real-time Interactive Long Video Generation
The repository provides code for running inference and finetuning with the Meta Segment Anything Model 3 (SAM 3), links for downloading the trained model checkpoints, and example notebooks that sho…
Kandinsky 5.0: A family of diffusion models for Video & Image generation
HunyuanVideo-1.5: A leading lightweight video generation model
PyTorch implementation of JiT https://arxiv.org/abs/2511.13720
My big collection of creative nano-banana use cases! Continuously updated!
Official Repo for Self-Forcing++ High Quality Long Video Generation
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
[ICLR 2026] OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling
[Unofficial] RF Inversion implemented for SD3 / SD3.5
Reference PyTorch implementation and models for DINOv3
Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing.
A unified inference and post-training framework for accelerated video generation.
4-step distilled version of Wan2.2-TI2V-5B