Stars
[NeurIPS 2024] OneRef: Unified One-tower Expression Grounding and Segmentation with Mask Referring Modeling.
[TPAMI 2025] Towards Visual Grounding: A Survey
YOLOv3 in PyTorch > ONNX > CoreML > TFLite
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
Qwen3-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud.