Stars
Code and implementation guidelines for the paper Counting Anything. Project Page: https://mengqi-lei.github.io/count-anything-projectpage/
[CVPR 2026 Oral] Official implementation for ChordEdit: One-Step Low-Energy Transport for Image Editing
Segment Any Concept via Meta-Reinforcement Learning
[DEIMv2] Real Time Object Detection Meets DINOv3
Gemma open-weight LLM library, from Google DeepMind
A Pocket-Sized MLLM for Ultra-Efficient Image and Video Understanding on Your Phone
We propose IAD-R1, a universal post-training framework that enhances Vision-Language Models for industrial anomaly detection through a two-stage training strategy.
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
100+ AI Agent & RAG apps you can actually run — clone, customize, ship.
[CVPR 2026] A Closer Look at Cross-Domain Few-Shot Object Detection: Fine-Tuning Matters and Parallel Decoder Helps
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
ADRepository: Real-world anomaly detection datasets, including tabular data (categorical and numerical data), time series data, graph data, image data, and video data.
[AAAI 2026] BREPS: Bounding-Box Robustness Evaluation of Promptable Segmentation
Medical SAM3: A Foundation Model for Universal Prompt-Driven Medical Image Segmentation
MedSAM3: Delving into Segment Anything with Medical Concepts
Decoupled Memory Selection for Multi-target Video Segmentation of SAM3
The first industrial-grade, full-pipeline AI film and video production platform. Industry-first professional AI Agent platform for controllable film & video production. From shorts to live-action with Hollywood-standard workflows.
Segment Your Ring (SYR) - Segment Anything model adapted with LoRA to segment rings.
The official implementation of Segment Any 3D GAussians (AAAI-25)
[ICLR 2026] The official implementation of "Efficient-SAM2: Accelerating SAM2 with Object-Aware Visual Encoding and Memory Retrieval"
[ICML 2026] Official Repo for Fast-SAM3D: 3Dfy Anything in Images but Faster