Stars
PiD: Fast and High-Resolution Latent Decoding with Pixel Diffusion
Official implementation of Déjà View: Looping Transformers for Multi-View 3D Reconstruction
The repository provides code for running inference and finetuning with the Meta Segment Anything Model 3 (SAM 3), links for downloading the trained model checkpoints, and example notebooks that sho…
GLUEMAP: Global Structure-from-Motion Meets Feedforward Reconstruction
Generative Camera Dolly: Extreme Monocular Dynamic Novel View Synthesis (ECCV 2024 Oral) - Official Implementation
[SIGGRAPH 2026] Pixal3D: Pixel-Aligned 3D Generation from Images
[CVPR 2022] Rethinking Depth Estimation for Multi-View Stereo: A Unified Representation
Perception toolkit for sim2real training and validation in Unity
1K resolution vision transformers pretrained on 1B human images.
UnrealZoo / unrealzoo-gym
Forked from zfw1226/gym-unrealcv
[ICCV 2025 Highlights] Large-scale photo-realistic virtual worlds for embodied AI
Generate images of code and terminal output 📸
HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds
Official code base for LeWorldModel: Stable End-to-End Joint-Embedding Predictive Architecture from Pixels
A ~9M parameter LLM that talks like a small fish.
A tutorial and a set of tools to compute depth-from-stereo with Project Aria Gen2 devices. This includes stereo image rectification as well as disparity estimation.
Reimplementation of LoGeR: Long-Context Geometric Reconstruction with Hybrid Memory
Masked Depth Modeling for Spatial Perception
Official code for Zero-Shot Depth from Defocus (https://arxiv.org/abs/2603.26658)
Wan: Open and Advanced Large-Scale Video Generative Models
[NeurIPS 2025] Sekai: A Video Dataset towards World Exploration
WildWorld: A Large-Scale Dataset for Dynamic World Modeling with Actions and Explicit State toward Generative ARPG
[ECCV 2026] WAFT-Stereo: Warping-Alone Field Transforms for Stereo Matching