Stars
[CVPR 2026] This repository is the official implementation of MVGGT: Multimodal Visual Geometry Grounded Transformer for Multiview 3D Referring Expression Segmentation
[ICML2025 Oral] ReferSplat: Referring Segmentation in 3D Gaussian Splatting
A Survey on 3D Gaussian Splatting Applications: Segmentation, Editing, and Generation
Official repository for the AAAI2026 paper (Zooming In on Fakes: A Novel Dataset for Localized AI-Generated Image Detection with Forgery Amplification Approach)
[ICLR 2026] Official implementation of JavisDiT and JavisDiT++ series.
[Lumina Embodied AI Community] Embodied AI Technical Guide (Embodied-AI-Guide)
[CVPR'26] PE3R: Perception-Efficient 3D Reconstruction. Take 2-3 photos with your phone, upload them, wait a few minutes, and then start exploring your 3D world via text!
✨✨[AAAI 2026] This is the official implementation of our paper "QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Video Comprehension"
Official repository for the Pixel-LLM codebase: Sa2VA (arXiv 2025), SAMTok (CVPR 2026), VRT, and SaSaSa2VA (1st-place solution for LSVOS)
[ACL 2025] PruneVid: Visual Token Pruning for Efficient Video Large Language Models
【AAAI2025】MambaPro: Multi-Modal Object Re-Identification with Mamba Aggregation and Synergistic Prompt
【AAAI2025】DeMo: Decoupled Feature-Based Mixture of Experts for Multi-Modal Object Re-Identification
[NeurIPS'24] I2EBench: A Comprehensive Benchmark for Instruction-based Image Editing
✨✨[NeurIPS 2025] This is the official implementation of our paper "Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension"
TraDiffusion: Trajectory-Based Training-Free Image Generation
[ECCV 2024] Empowering 3D Visual Grounding with Reasoning Capabilities
INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model
[NeurIPS 2024] Repository for the paper "ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models"
"Dive into LLMs" (动手学大模型): a series of hands-on programming tutorials
PyTorch implementation of the paper "Toward Open-set Human Object Interaction Detection" (AAAI 2024)
[CVPR'2024 Highlight] Official PyTorch implementation of the paper "VTimeLLM: Empower LLM to Grasp Video Moments".
[ECCV 2024] Official implementation of the paper "Semantic-SAM: Segment and Recognize Anything at Any Granularity"