[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.

Python 956 55 Updated Aug 5, 2025

PaddlePaddle / PaddleOCR

Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.

Python 77,884 10,440 Updated May 14, 2026

facebookresearch / DiT

Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"

Python 8,572 788 Updated May 31, 2024

tyxsspa / AnyText

Official implementation code of the paper "AnyText: Multilingual Visual Text Generation And Editing"

Python 4,852 302 Updated Mar 7, 2025

X-PLUG / mPLUG-Owl

mPLUG-Owl: The Powerful Multi-modal Large Language Model Family

Python 2,543 190 Updated Apr 2, 2025

XingangPan / DragGAN

Official Code for DragGAN (SIGGRAPH 2023)

Python 35,864 3,427 Updated May 18, 2024

mlfoundations / open_flamingo

An open-source framework for training large multimodal models.

Python 4,098 319 Updated Aug 31, 2024

IDEA-Research / Grounded-Segment-Anything

Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything

Jupyter Notebook 17,573 1,592 Updated Sep 5, 2024

baaivision / Painter

Painter & SegGPT Series: Vision Foundation Models from BAAI

Python 2,587 180 Updated Dec 6, 2024

facebookresearch / segment-anything

The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.

Jupyter Notebook 54,155 6,337 Updated Sep 18, 2024

zai-org / ChatGLM-6B

ChatGLM-6B: An Open Bilingual Dialogue Language Model

Python 41,108 5,165 Updated Jun 27, 2024

microsoft / MM-REACT

Official repo for MM-REACT

Python 967 68 Updated Jan 31, 2024

huggingface / transformers

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

Python 160,641 33,219 Updated May 15, 2026

langchain-ai / langchain

The agent engineering platform.

Python 136,794 22,622 Updated May 14, 2026

google-research-datasets / vrdu

We identify the desiderata for a comprehensive benchmark and propose Visually Rich Document Understanding (VRDU). VRDU contains two datasets that represent several challenges: rich schema including…

82 4 Updated Feb 8, 2023

JianqiangWan

Stars

alibaba / OmniRL

NVIDIA / Megatron-LM

deepseek-ai / DeepSeek-R1

QwenLM / Qwen-Agent

QwenLM / Qwen3

QwenLM / Qwen3-VL

infinigence / Infini-Megrez

opendatalab / DocLayout-YOLO

mwilliamson / mammoth.js

mwilliamson / python-mammoth

Yuliang-Liu / Monkey

ZZZHANG-jx / DocRes

ParadoxZW / LLaVA-UHD-Better

ymy-k / Hi-SAM

modelscope / modelscope

mbzuai-oryx / groundingLMM