- Boston, USA
- https://udaysankar01.github.io/
Starred repositories
A curated list of papers & resources on anomaly detection foundation models using large language models, vision-language models, graph foundation models, time series foundation models, etc.
PyTorch code and models for VJEPA2 self-supervised learning from video.
OmniVGGT: Omni-Modality Driven Visual Geometry Grounded Transformer
Official release of "Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning"
Official repository for ToolScope: An Agentic Framework for Vision-Guided and Long-Horizon Tool Use
A fork to add multimodal model training to open-r1
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Code for Streaming 4D Visual Geometry Transformer
Code for FastVGGT: Training-Free Acceleration of Visual Geometry Transformer
Reference PyTorch implementation and models for DINOv3
Official PyTorch implementation of "Weakly Supervised Video Scene Graph Generation via Natural Language Supervision", accepted at ICLR 2025
Official repository for "AnyCam: Learning to Recover Camera Poses and Intrinsics from Casual Videos" (CVPR 2025)
[CVPR 2025 (Highlight)] Towards Unbiased and Robust Spatio-Temporal Scene Graph Generation and Anticipation
This is the official implementation of "DiffVsgg: Diffusion-Driven Online Video Scene Graph Generation" (Accepted at CVPR 2025).
A generative world for general-purpose robotics & embodied AI learning.
Code of π^3: Permutation-Equivariant Visual Geometry Learning
This is a repository for listing papers on scene graph generation and application.
[arXiv'25] Unseen 3D Geometry Reasoning from a Single Image.
VGGT-SLAM: Dense RGB SLAM Optimized on the SL(4) Manifold
Paper Survey for Transformer-based SLAM
PINGS: Gaussian Splatting Meets Distance Fields within a Point-Based Implicit Neural Map [RSS'25]
Code for "LiftFeat: 3D Geometry-Aware Local Feature Matching", ICRA2025
[ICRA 2025] Neural Surface Reconstruction and Rendering for LiDAR-Visual Systems
[ICCV 2025] A simple training-free approach adapting DUSt3R for dynamic scenes.