Stars
Code for the paper "Learning Generalizable Hand-Object Tracking from Synthetic Demonstrations"
A Foundation Model for Generalist Gaming Agents
End-to-end pipeline converting generative videos (Veo, Sora) to humanoid robot motions
Towards Scalable Pre-training of Visual Tokenizers for Generation
A paper list for spatial reasoning
TurboDiffusion: 100–200× Acceleration for Video Diffusion Models
Official code for the paper "N3D-VLM: Native 3D Grounding Enables Accurate Spatial Reasoning in Vision-Language Models"
Atom3d, atomising geometry, is a mesh processing toolbox specifically designed for 3D learning.
Unofficial implementation of Titans, SOTA memory for transformers, in PyTorch
RealSee3D: A multi-view RGB-D dataset combining real-world captures and procedurally generated scenes, with extensible annotations for diverse 3D vision research.
A SOTA open-source image editing model that aims to deliver performance comparable to closed-source models such as GPT-4o and Gemini 2 Flash.
Native and Compact Structured Latents for 3D Generation
Implementation of paper "SceneMaker: Open-set 3D Scene Generation with Decoupled De-occlusion and Pose Estimation Model"
Official Implementation of Particulate: Feed-Forward 3D Object Articulation
Detect Anything via Next Point Prediction (Based on Qwen2.5-VL-3B)
NANO3D: A Training-Free Approach for Efficient 3D Editing Without Masks
A Cross-Platform Backend for High-Performance Sparse Convolutions
Official implementation of Lotus-2: Advancing Geometric Dense Prediction with Powerful Image Generative Model
Public code for XFactor, the first geometry-free model to achieve true self-supervised, pose-free Novel View Synthesis (NVS) by learning transferable latent camera pose representations.
Matterport3D is a pretty awesome dataset for RGB-D machine learning tasks :)
Official inference repo for FLUX.2 models