Lyra: Generative 3D Scene Reconstruction via Video Diffusion Model Self-Distillation

Python 568 31 Updated Oct 2, 2025

Native Multimodal Models are World Learners

Python 1,130 39 Updated Nov 5, 2025

Code for "FlashWorld: High-quality 3D Scene Generation within Seconds"

Python 489 34 Updated Oct 22, 2025

You can easily calculate FVD, PSNR, SSIM, LPIPS for evaluating the quality of generated or predicted videos.

Python 493 20 Updated Jan 6, 2025

ViPE: Video Pose Engine for Geometric 3D Perception

Python 1,481 116 Updated Oct 13, 2025

[CVPR 2024] RoMa: Robust Dense Feature Matching; RoMa is the robust dense feature matcher capable of estimating pixel-dense warps and reliable certainties for almost any image pair.

Python 1,031 104 Updated Oct 24, 2025

Official PyTorch Implementation of "Diffusion Transformers with Representation Autoencoders"

Python 1,479 40 Updated Oct 15, 2025

Official Code of "VideoCanvas: Unified Video Completion from Arbitrary Spatiotemporal Patches via In-Context Conditioning"

60 Updated Oct 10, 2025

rCM: SOTA Diffusion Distillation & Few-Step Video Generation

Python 262 13 Updated Nov 5, 2025

LongLive: Real-time Interactive Long Video Generation

Python 790 49 Updated Nov 3, 2025

Cosmos-Transfer2.5, built on top of Cosmos-Predict2.5, produces high-quality world simulations conditioned on multiple spatial control inputs.

Python 154 16 Updated Nov 5, 2025

Cosmos-Predict2.5, the latest version of the Cosmos World Foundation Models (WFMs) family, specialized for simulating and predicting the future state of the world in the form of video.

Python 306 25 Updated Nov 5, 2025

Cosmos-Reason1 models understand the physical common sense and generate appropriate embodied decisions in natural language through long chain-of-thought reasoning processes.

Python 777 64 Updated Oct 30, 2025

[ICCV 2025 ⭐highlight⭐] Implementation of VMem: Consistent Interactive Video Scene Generation with Surfel-Indexed View Memory

Python 388 14 Updated Jul 25, 2025

Matrix-Game 2.0: An Open-Source, Real-Time, and Streaming Interactive World Model

Python 1,714 176 Updated Oct 4, 2025

Wan: Open and Advanced Large-Scale Video Generative Models

Python 11,394 1,263 Updated Oct 12, 2025

Cameras as Relative Positional Encoding

Python 605 9 Updated Oct 20, 2025

Long-RL: Scaling RL to Long Sequences (NeurIPS 2025)

Python 647 23 Updated Sep 24, 2025

[ICCV 2025 Highlights] Large-scale photo-realistic virtual worlds for embodied AI

Python 206 11 Updated Nov 5, 2025

📖 This is a repository for organizing papers, codes and other resources related to unified multimodal models.

730 41 Updated Oct 10, 2025

Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey

875 25 Updated Aug 26, 2025

verl: Volcano Engine Reinforcement Learning for LLMs

Python 15,131 2,427 Updated Nov 5, 2025

[TPAMI'25] PanopticNeRF-360 | [3DV'22] Panoptic NeRF (3D-to-2D Label Transfer in Urban Scenes)

219 21 Updated Jun 16, 2025

Cosmos-Predict2 is a collection of general-purpose world foundation models for Physical AI that can be fine-tuned into customized world models for downstream applications.

Python 655 89 Updated Oct 29, 2025

Official repository for the paper "Orientation Matters: Making 3D Generative Models Orientation-Aligned" (NeurIPS 2025)

Python 99 2 Updated Oct 4, 2025

Official codebase for "Self Forcing: Bridging Training and Inference in Autoregressive Video Diffusion" (NeurIPS 2025 Spotlight)

Python 2,790 197 Updated Sep 12, 2025

[CVPR 2025 Highlight] Generative Photography: Scene-Consistent Camera Control for Realistic Text-to-Image Synthesis

Python 155 8 Updated Oct 3, 2025

[NeurIPS 2025] WorldMem: Long-term Consistent World Simulation with Memory

Python 266 11 Updated Oct 25, 2025

Open-source unified multimodal model

Python 5,250 454 Updated Oct 27, 2025

[ARXIV’25] Learning Video Generation for Robotic Manipulation with Collaborative Trajectory Control

Python 82 Updated Jul 4, 2025