Stars
Awesome-LLM-3D: a curated list of resources on Multi-modal Large Language Models in the 3D world
A collection of papers on World Models for Autonomous Driving (and Robotics).
This is the repo of NeurIPS 2022 paper: "Pre-Trained Image Encoder for Generalizable Visual Reinforcement Learning"
[IEEE CAL 2025] Accelerating Page Migrations in Operating Systems with Intel DSA
[arXiv 2019] "Contrastive Multiview Coding", also contains implementations for MoCo and InstDis
Official implementation of DiffuseSlide
UFM: A Unified Dense Image Correspondence Estimator for both Optical Flow & Wide Baseline Matching Tasks. Matches any pair of images. (NeurIPS 2025)
Official Implementation of Puzzles: Unbounded Video-Depth Augmentation for Scalable, End-to-End 3D Reconstruction.
PreciseCam: Precise Camera Control for Text-to-Image Generation
Robot kinematics implemented in PyTorch
Generate 3D objects conditioned on text or images
A final sanity checklist to help your CS paper get accepted, not desk rejected.
[TMLR 2025] Monocular Dynamic Gaussian Splatting: Fast, Brittle, and Scene Complexity Rules
[SIGGRAPH 2025] LayerPano3D: Layered 3D Panorama for Hyper-Immersive Scene Generation
<Foundations of Computer Vision> Book
100 Days of GPU Challenge
[CVPR'25] DroneSplat: 3D Gaussian Splatting for Robust 3D Reconstruction from In-the-Wild Drone Imagery
Total Selfie: Generating Full-Body Selfies, CVPR 2024 (Highlight)
Code for RealmDreamer: Text-Driven 3D Scene Generation with Inpainting and Depth Diffusion [3DV 2025]
[CVPR 25] Vid2Sim: Realistic and Interactive Simulation from Video for Urban Navigation
Code release for CVPR'24 submission 'OmniGlue'
[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer
RGB-Only Gaussian Splatting SLAM for Unbounded Outdoor Scenes (ICRA 2025)