View JianqiangWan's full-sized avatar

Python 23 2 Updated Mar 21, 2025

Ongoing research training transformer models at scale

Python 16,344 3,955 Updated May 15, 2026

Agent framework and applications built upon Qwen>=3.0, featuring Function Calling, MCP, Code Interpreter, RAG, Chrome extension, etc.

Python 16,331 1,604 Updated Mar 4, 2026

Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.

Python 27,228 1,981 Updated Jan 9, 2026

Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Jupyter Notebook 19,174 1,760 Updated Jan 30, 2026

DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception

Python 2,155 158 Updated Apr 14, 2025

Convert Word documents (.docx files) to HTML

JavaScript 6,197 662 Updated Mar 13, 2026

Convert Word documents (.docx files) to HTML

Python 1,086 146 Updated Mar 13, 2026

Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models (CVPR 2024 Highlight)

Python 1,951 140 Updated Jan 24, 2026

[CVPR 2024] DocRes: A Generalist Model Toward Unifying Document Image Restoration Tasks

Python 602 67 Updated Aug 3, 2025

A bug-free and improved implementation of LLaVA-UHD, based on the code from the official repo

Python 35 3 Updated Aug 12, 2024

[IEEE TPAMI] Hi-SAM: Marrying Segment Anything Model for Hierarchical Text Segmentation

Python 357 20 Updated May 30, 2025

ModelScope: bring the notion of Model-as-a-Service to life.

Python 8,933 942 Updated May 15, 2026

[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.

Python 956 55 Updated Aug 5, 2025

Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.

Python 77,884 10,440 Updated May 14, 2026

Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"

Python 8,572 788 Updated May 31, 2024

Official implementation code of the paper <AnyText: Multilingual Visual Text Generation And Editing>

Python 4,852 302 Updated Mar 7, 2025

mPLUG-Owl: The Powerful Multi-modal Large Language Model Family

Python 2,543 190 Updated Apr 2, 2025

Official Code for DragGAN (SIGGRAPH 2023)

Python 35,864 3,427 Updated May 18, 2024

An open-source framework for training large multimodal models.

Python 4,098 319 Updated Aug 31, 2024

Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything

Jupyter Notebook 17,573 1,592 Updated Sep 5, 2024

Painter & SegGPT Series: Vision Foundation Models from BAAI

Python 2,587 180 Updated Dec 6, 2024

The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.

Jupyter Notebook 54,155 6,337 Updated Sep 18, 2024

ChatGLM-6B: An Open Bilingual Dialogue Language Model | 开源双语对话语言模型

Python 41,108 5,165 Updated Jun 27, 2024

Official repo for MM-REACT

Python 967 68 Updated Jan 31, 2024

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

Python 160,641 33,219 Updated May 15, 2026

The agent engineering platform.

Python 136,794 22,622 Updated May 14, 2026

We identify the desiderata for a comprehensive benchmark and propose Visually Rich Document Understanding (VRDU). VRDU contains two datasets that represent several challenges: rich schema including…

82 4 Updated Feb 8, 2023