- Hong Kong
- http://fuxiao0719.github.io/
- @lemonaddie0909
Stars
Lyra: Generative 3D Scene Reconstruction via Video Diffusion Model Self-Distillation
Code for "FlashWorld: High-quality 3D Scene Generation within Seconds"
You can easily calculate FVD, PSNR, SSIM, LPIPS for evaluating the quality of generated or predicted videos.
ViPE: Video Pose Engine for Geometric 3D Perception
[CVPR 2024] RoMa: Robust Dense Feature Matching; RoMa is the robust dense feature matcher capable of estimating pixel-dense warps and reliable certainties for almost any image pair.
Official PyTorch Implementation of "Diffusion Transformers with Representation Autoencoders"
Official Code of "VideoCanvas: Unified Video Completion from Arbitrary Spatiotemporal Patches via In-Context Conditioning"
rCM: SOTA Diffusion Distillation & Few-Step Video Generation
LongLive: Real-time Interactive Long Video Generation
Cosmos-Transfer2.5, built on top of Cosmos-Predict2.5, produces high-quality world simulations conditioned on multiple spatial control inputs.
Cosmos-Predict2.5, the latest version of the Cosmos World Foundation Models (WFMs) family, specialized for simulating and predicting the future state of the world in the form of video.
Cosmos-Reason1 models understand the physical common sense and generate appropriate embodied decisions in natural language through long chain-of-thought reasoning processes.
[ICCV 2025 ⭐highlight⭐] Implementation of VMem: Consistent Interactive Video Scene Generation with Surfel-Indexed View Memory
Matrix-Game 2.0: An Open-Source, Real-Time, and Streaming Interactive World Model
Wan: Open and Advanced Large-Scale Video Generative Models
Long-RL: Scaling RL to Long Sequences (NeurIPS 2025)
UnrealZoo / unrealzoo-gym
Forked from zfw1226/gym-unrealcv
[ICCV 2025 Highlights] Large-scale photo-realistic virtual worlds for embodied AI
📖 This is a repository for organizing papers, codes and other resources related to unified multimodal models.
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
verl: Volcano Engine Reinforcement Learning for LLMs
[TPAMI'25] PanopticNeRF-360 | [3DV'22] Panoptic NeRF (3D-to-2D Label Transfer in Urban Scenes)
Cosmos-Predict2 is a collection of general-purpose world foundation models for Physical AI that can be fine-tuned into customized world models for downstream applications.
Official repository for the paper "Orientation Matters: Making 3D Generative Models Orientation-Aligned" (NeurIPS 2025)
Official codebase for "Self Forcing: Bridging Training and Inference in Autoregressive Video Diffusion" (NeurIPS 2025 Spotlight)
[CVPR 2025 Highlight] Generative Photography: Scene-Consistent Camera Control for Realistic Text-to-Image Synthesis
[NeurIPS 2025] WorldMem: Long-term Consistent World Simulation with Memory
[ARXIV’25] Learning Video Generation for Robotic Manipulation with Collaborative Trajectory Control