Stars
[ICCV 2025] Code and datasets for "Switch-a-View: Few-Shot View Selection Learned from Unlabeled In-the-wild Videos"
Handwritten Digit Recognition Using a Convolutional Neural Network in Python
[NeurIPS 2025] Official Implementation of DINO-Foresight: Looking into the Future with DINO
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
[CVPR 2025 Highlight] Code and datasets for "Which Viewpoint Shows it Best? Language for Weakly Supervising View Selection in Multi-view Instructional Videos"
VideoRAG: Retrieval-Augmented Generation over Video Corpus
[CVPR2025] EasyHOI: Unleashing the Power of Large Models for Reconstructing Hand-Object Interactions in the Wild
[CVPR'25] How Do I Do That? Synthesizing 3D Hand Motion and Contacts for Everyday Interactions
[NeurIPS'25] EgoDTM: Towards 3D-Aware Egocentric Video-Language Pretraining
[ICCV 2025] A simple training-free approach adapting DUSt3R for dynamic scenes.
Official implementation of EgoHOD at ICLR 2025; 14 EgoVis Challenge Winners in CVPR 2024
Official Implementation of paper "MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion"
Official implementation of Continuous 3D Perception Model with Persistent State
Code for the paper "ShowHowTo: Generating Scene-Conditioned Step-by-Step Visual Instructions" published at CVPR 2025
Official repository for our work on micro-budget training of large-scale diffusion models.
The best OSS video generation models, created by Genmo
HunyuanVideo: A Systematic Framework For Large Video Generation Model
A concise but complete full-attention transformer with a set of promising experimental features from various papers
High-resolution models for human tasks.
[ECCV2024, Oral, Best Paper Finalist] This is the official implementation of the paper "LEGO: Learning EGOcentric Action Frame Generation via Visual Instruction Tuning".
CVPR and NeurIPS poster examples and templates
Code for the paper "GenHowTo: Learning to Generate Actions and State Transformations from Instructional Videos" published at CVPR 2024
[NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation
The Most Faithful Implementation of Segment Anything (SAM) in 3D