Stars
Evaluation code and datasets for the ACL 2024 paper, VISTA: Visualized Text Embedding for Universal Multi-Modal Retrieval. The original code and model can be accessed at FlagEmbedding.
Official implementation of the CIKM 2025 paper "UniECS: Unified Multimodal E-Commerce Search Framework with Gated Cross-modal Fusion"
(CVPR 2025 highlight✨) Official repository of paper "LLMDet: Learning Strong Open-Vocabulary Object Detectors under the Supervision of Large Language Models"
Official PyTorch Implementation of Correlation Verification for Image Retrieval, CVPR 2022 (Oral Presentation)
Effortless data labeling with AI support from Segment Anything and other awesome models.
[DEIMv2] Real-Time Object Detection Meets DINOv3
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
gguf (GPT-Generated Unified Format) connector
RepVGG: Making VGG-style ConvNets Great Again
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
Detect Anything via Next Point Prediction (Based on Qwen2.5-VL-3B)
Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.
OCR, layout analysis, reading order, table recognition in 90+ languages
Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.
PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT
This repository contains the official implementation of "FastVLM: Efficient Vision Encoding for Vision Language Models" - CVPR 2025
[Preprint] On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification.
Recipes for shrinking, optimizing, and customizing cutting-edge vision models. 💜
An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models
Some out-of-the-box hooks for pre-commit
A PyTorch-based knowledge distillation toolkit for natural language processing
Text-audio foundation model from Boson AI
verl: Volcano Engine Reinforcement Learning for LLMs