Stars
The repository provides code for running inference with the Meta Segment Anything Audio Model (SAM-Audio), links for downloading the trained model checkpoints, and example notebooks that show how t…
Code accompanying our ECCV-2020 paper on 3D Neural Listeners.
Describe Anything, Anywhere, at Any Moment (DAAAM), a novel approach to real-time, large-scale, spatio-temporal memory
A curated list of state-of-the-art research in embodied AI, focusing on vision-language-action (VLA) models, vision-language navigation (VLN), and related multimodal learning approaches.
Official implementation of "Emergent Outlier View Rejection in Visual Geometry Grounded Transformers"
Official repository of "Multi-view Pyramid Transformer: Look Coarser to See Broader"
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
A simple state update rule to enhance length generalization for CUT3R
[CVPR 2025 Oral & Best Paper Finalist] Difix3D+: Improving 3D Reconstructions with Single-Step Diffusion Models
Official implementation of "Dynam3D: Dynamic Layered 3D Tokens Empower VLM for Vision-and-Language Navigation" (NeurIPS'25 Oral)
[NeurIPS 2024] Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding
Official repo and evaluation implementation of VSI-Bench
The repository provides code for running inference and finetuning with the Meta Segment Anything Model 3 (SAM 3), links for downloading the trained model checkpoints, and example notebooks that sho…
PyTorch implementation of JiT https://arxiv.org/abs/2511.13720
VGGT-SLAM: Dense RGB SLAM Optimized on the SL(4) Manifold
Code for ICCV'2025 (Best student paper honorable mention) "RayZer: A Self-supervised Large View Synthesis Model"
[NeurIPS DB 2025] PartNeXt: A Next-Generation Dataset for Fine-Grained and Hierarchical 3D Part Understanding
A unified model that seamlessly integrates multimodal understanding, text-to-image generation, and image editing within a single powerful framework.
[NeurIPS 2025] Pixel-Perfect Depth
[NeurIPS 2024] AV-Cloud: Spatial Audio Rendering Through Audio-Visual Cloud Splatting
Official implementation of DepthLM
[CVPR 2025] Towards In-the-wild 3D Plane Reconstruction from a Single Image
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
SpatialVID: A Large-Scale Video Dataset with Spatial Annotations