Stars
pySLAM is a Python-based Visual SLAM pipeline that supports monocular, stereo, and RGB-D cameras. It offers a wide range of modern local and global features, multiple loop-closing strategies, a vol…
Official PyTorch implementation of the paper "Chapter-Llama: Efficient Chaptering in Hour-Long Videos with LLMs"
MiniCPM-V 4.5: A GPT-4o Level MLLM for Single Image, Multi Image and High-FPS Video Understanding on Your Phone
[CVPR 2025] Official PyTorch code of "Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation".
[ICCV 2023] UniVTG: Towards Unified Video-Language Temporal Grounding
A curated list of awesome discrete diffusion model resources.
[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer
Building simple diffusion models for image generation, mainly for understanding and learning.
[WACV'25] Temporal Instructional Diagram Grounding in Unconstrained Videos
A collection of my book notes on various subjects, mainly computer science
[ECCV2024] Gated Temporal Action Anticipation for Stochastic Long-Term Anticipation
Code and data release for the paper "Learning Fine-grained View-Invariant Representations from Unpaired Ego-Exo Videos via Temporal Alignment" (NeurIPS 2023)
React + Next.js template for research websites (for PhD students, researchers, etc.)
Visualizing the learned space-time attention using Attention Rollout
Collection of AWESOME vision-language models for vision tasks
It is my belief that you, the postgraduate students and job-seekers for whom the book is primarily meant will benefit from reading it; however, it is my hope that even the most experienced research…
A curated list of awesome prompt/adapter learning methods for vision-language models like CLIP.
A PyTorch implementation of NeRF (Neural Radiance Fields) that reproduces the results.
Code and models for the ICML 2024 paper "Tell, Don't Show!: Language Guidance Eases Transfer Across Domains in Images and Videos"
Code release for "Detecting Twenty-thousand Classes using Image-level Supervision".
X-MIC: Cross-Modal Instance Conditioning for Egocentric Action Generalization, CVPR 2024