Lyra: Generative 3D Scene Reconstruction via Video Diffusion Model Self-Distillation

Python 569 31 Updated Oct 2, 2025

Native Multimodal Models are World Learners

Python 1,175 41 Updated Nov 7, 2025

Code for "FlashWorld: High-quality 3D Scene Generation within Seconds"

Python 506 35 Updated Oct 22, 2025

You can easily calculate FVD, PSNR, SSIM, LPIPS for evaluating the quality of generated or predicted videos.

Python 493 20 Updated Jan 6, 2025

ViPE: Video Pose Engine for Geometric 3D Perception

Python 1,487 116 Updated Oct 13, 2025

[CVPR 2024] RoMa: Robust Dense Feature Matching; RoMa is the robust dense feature matcher capable of estimating pixel-dense warps and reliable certainties for almost any image pair.

Python 1,032 104 Updated Oct 24, 2025

Official PyTorch Implementation of "Diffusion Transformers with Representation Autoencoders"

Python 1,495 40 Updated Oct 15, 2025

Official Code of "VideoCanvas: Unified Video Completion from Arbitrary Spatiotemporal Patches via In-Context Conditioning"

60 Updated Oct 10, 2025

rCM: SOTA Diffusion Distillation & Few-Step Video Generation

Python 266 13 Updated Nov 5, 2025

LongLive: Real-time Interactive Long Video Generation

Python 801 49 Updated Nov 3, 2025

Cosmos-Transfer2.5, built on top of Cosmos-Predict2.5, produces high-quality world simulations conditioned on multiple spatial control inputs.

Python 172 18 Updated Nov 7, 2025

Cosmos-Predict2.5, the latest version of the Cosmos World Foundation Models (WFMs) family, specialized for simulating and predicting the future state of the world in the form of video.

Python 346 27 Updated Nov 7, 2025

Cosmos-Reason1 models understand physical common sense and generate appropriate embodied decisions in natural language through long chain-of-thought reasoning processes.

Python 781 65 Updated Nov 7, 2025

[ICCV 2025 ⭐highlight⭐] Implementation of VMem: Consistent Interactive Video Scene Generation with Surfel-Indexed View Memory

Python 389 14 Updated Jul 25, 2025

Matrix-Game 2.0: An Open-Source, Real-Time, and Streaming Interactive World Model

Python 1,714 176 Updated Oct 4, 2025

Wan: Open and Advanced Large-Scale Video Generative Models

Python 11,477 1,279 Updated Oct 12, 2025

Cameras as Relative Positional Encoding

Python 606 10 Updated Oct 20, 2025

Long-RL: Scaling RL to Long Sequences (NeurIPS 2025)

Python 649 23 Updated Sep 24, 2025

[ICCV 2025 Highlights] Large-scale photo-realistic virtual worlds for embodied AI

Python 206 11 Updated Nov 5, 2025

📖 This is a repository for organizing papers, codes and other resources related to unified multimodal models.

731 41 Updated Oct 10, 2025

Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey

877 26 Updated Aug 26, 2025

verl: Volcano Engine Reinforcement Learning for LLMs

Python 15,220 2,445 Updated Nov 7, 2025

[TPAMI'25] PanopticNeRF-360 | [3DV'22] Panoptic NeRF (3D-to-2D Label Transfer in Urban Scenes)

219 21 Updated Jun 16, 2025

Cosmos-Predict2 is a collection of general-purpose world foundation models for Physical AI that can be fine-tuned into customized world models for downstream applications.

Python 658 89 Updated Oct 29, 2025

Official repository for the paper "Orientation Matters: Making 3D Generative Models Orientation-Aligned" (NeurIPS 2025)

Python 100 2 Updated Oct 4, 2025

Official codebase for "Self Forcing: Bridging Training and Inference in Autoregressive Video Diffusion" (NeurIPS 2025 Spotlight)

Python 2,803 198 Updated Sep 12, 2025

[CVPR 2025 Highlight] Generative Photography: Scene-Consistent Camera Control for Realistic Text-to-Image Synthesis

Python 156 8 Updated Oct 3, 2025

[NeurIPS 2025] WorldMem: Long-term Consistent World Simulation with Memory

Python 267 11 Updated Oct 25, 2025

Open-source unified multimodal model

Python 5,258 455 Updated Oct 27, 2025

[ARXIV’25] Learning Video Generation for Robotic Manipulation with Collaborative Trajectory Control

Python 82 Updated Jul 4, 2025