xmu-xiaoma666

Starred repositories

Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster by executing routine tasks, explaining complex code, and handling git workflo…

Shell 111,116 18,522 Updated Apr 8, 2026
C++ 21 Updated Aug 30, 2025

Code for replicating Roboflow 100 benchmark results and programmatically downloading benchmark datasets

Jupyter Notebook 289 30 Updated Oct 26, 2024

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 75,729 15,336 Updated Apr 8, 2026

Grounded Language-Image Pre-training

Python 2,581 216 Updated Jan 24, 2024

Official repo of the Griffon series, including v1 (ECCV 2024), v2 (ICCV 2025), G, and R, as well as the RL tool Vision-R1.

Python 250 12 Updated Aug 12, 2025

Code for ChatRex: Taming Multimodal LLM for Joint Perception and Understanding

Python 212 9 Updated Oct 15, 2025

Multilingual Document Layout Parsing in a Single Vision-Language Model

Python 8,189 737 Updated Mar 24, 2026

The official repository of the dots.vlm1 instruct models proposed by rednote-hilab.

Dockerfile 285 8 Updated Sep 26, 2025

A Scientific Multimodal Foundation Model

765 40 Updated Mar 27, 2026

Code and dataset link for "DenseWorld-1M: Towards Detailed Dense Grounded Caption in the Real World"

128 2 Updated Oct 2, 2025

[IGARSS 2025 Oral] A Simple Aerial Detection Baseline of Multimodal Language Models.

Jupyter Notebook 93 6 Updated Feb 12, 2026

OpenMMLab Detection Toolbox and Benchmark

Python 32,586 9,850 Updated Aug 21, 2024

[NeurIPS 2025] Official code implementation of Perception R1: Pioneering Perception Policy with Reinforcement Learning

Python 291 12 Updated Jul 15, 2025

An open-source implementation for fine-tuning the Qwen-VL series by Alibaba Cloud.

Python 1,802 207 Updated Mar 25, 2026

The official repository for ERNIE 4.5 and ERNIEKit – its industrial-grade development toolkit based on PaddlePaddle.

Python 7,699 1,451 Updated Jan 4, 2026

Evaluation code for Ref-L4, a new REC benchmark in the LMM era

Python 59 1 Updated Dec 28, 2024

VisionReasoner: Unified Reasoning-Integrated Visual Perception via Reinforcement Learning

Python 332 15 Updated Feb 9, 2026

Includes the code for training and testing the CountGD model from the paper CountGD: Multi-Modal Open-World Counting.

Python 310 37 Updated Jun 25, 2025

[TPAMI 2025] Towards Visual Grounding: A Survey

Shell 302 26 Updated Nov 18, 2025

The official repository of the dots.llm1 base and instruct models proposed by rednote-hilab.

489 25 Updated Aug 20, 2025

All-in-One Development Tool based on PaddlePaddle

Python 6,094 1,178 Updated Apr 1, 2026
Python 1,187 73 Updated Nov 20, 2025

Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.

Jupyter Notebook 1,568 65 Updated Jun 14, 2025

Solve Visual Understanding with Reinforced VLMs

Python 5,929 378 Updated Mar 12, 2026

EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL

Python 4,824 367 Updated Apr 6, 2026

Kimi-VL: Mixture-of-Experts Vision-Language Model for Multimodal Reasoning, Long-Context Understanding, and Strong Agent Capabilities

1,175 77 Updated Jul 15, 2025

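LLM knowledge sharing that everyone can understand: a must-read before spring/autumn campus-recruitment LLM interviews, so you can talk confidently with your interviewer.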

Jupyter Notebook 6,187 582 Updated Apr 7, 2026