Starred repositories
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
[ICRA 2024] Chasing Day and Night: Towards Robust and Efficient All-Day Object Detection by an Event Camera
An open-source implementation for fine-tuning the Qwen-VL series by Alibaba Cloud.
Reference PyTorch implementation and models for DINOv3
Code repo for the paper "SpinQuant: LLM quantization with learned rotations"
Official Repo For "Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos"
MiniCPM-V 4.5: A GPT-4o Level MLLM for Single Image, Multi Image and High-FPS Video Understanding on Your Phone
ECCV2022 'DVS-Voltmeter: Stochastic Process-based Event Simulator for Dynamic Vision Sensors'
VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling
[NeurIPS 2024] An official implementation of "ShareGPT4Video: Improving Video Understanding and Generation with Better Captions"
The code for PixelRefer & VideoRefer
🔥🔥 First-ever hour-scale video understanding models
[ICLR 2022] TAda! Temporally-Adaptive Convolutions for Video Understanding. This codebase provides solutions for video classification, video representation learning and temporal detection.
UniMD: Towards Unifying Moment retrieval and temporal action Detection
This is an official implementation of TubeR: Tubelet Transformer for Video Action Detection
[ECCV2024] VideoMamba: State Space Model for Efficient Video Understanding
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted fo…
[NeurIPS 2022 Spotlight] VideoMAE for Action Detection
MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX.
Explore VLM-Eval, a framework for evaluating Video Large Language Models, enhancing your video analysis with cutting-edge AI technology.
BasicTAD: an Astounding RGB-Only Baseline for Temporal Action Detection
[ICCV 2023] Efficient Video Action Detection with Token Dropout and Context Refinement
[CVPR 2024] Adapting Short-Term Transformers for Action Detection in Untrimmed Videos
[TIP 2022] End-to-end Temporal Action Detection with Transformer
[CVPR 2022] An Empirical Study of End-to-end Temporal Action Detection
[NeurIPS 2023] Official implementation of the paper "Segment Everything Everywhere All at Once"