Stars
Mizhi - A lightweight vehicle detection and viewpoint classification model optimized for mobile browsers
[CVPR'25] DepthSplat: Connecting Gaussian Splatting and Depth
[NeurIPS 2025] SpatialLM: Training Large Language Models for Structured Indoor Modeling
A repo to train your own small sarvam-30b model
Hypernetworks that update LLMs to remember factual information
Official code for the paper "GenSeg-R1: RL-Driven Vision–Language Grounding for Fine-Grained Referring Segmentation"
🚀 Daily AI Research Digest: Tracking breakthroughs in AI/NLP/CV/Robotics with dynamic updates and paper navigation.
Masked Depth Modeling for Spatial Perception
Project page for "Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement"
Interactive visualizations of the geometric intuition behind diffusion models.
Full course on becoming an AI researcher from scratch
A Mixture Density RNN for generating musical touchscreen interactions.
Extending the Context of Pretrained LLMs by Dropping Their Positional Embedding
[NeurIPS 2025 Spotlight] "SANSA: Unleashing the Hidden Semantics in SAM2 for Few-Shot Segmentation."
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer
[TPAMI 2025] Towards Visual Grounding: A Survey
Morphology-adaptive muscle-driven locomotion via attention mechanisms
Solve Visual Understanding with Reinforced VLMs