- Xiamen University
- Xiamen, China
- https://xmu-xiaoma666.github.io/
Starred repositories
Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster by executing routine tasks, explaining complex code, and handling git workflows.
Code for replicating Roboflow 100 benchmark results and programmatically downloading benchmark datasets
A high-throughput and memory-efficient inference and serving engine for LLMs
Official repo of the Griffon series, including v1 (ECCV 2024), v2 (ICCV 2025), G, and R, as well as the RL tool Vision-R1.
Code for ChatRex: Taming Multimodal LLM for Joint Perception and Understanding
Multilingual Document Layout Parsing in a Single Vision-Language Model
The official repository of the dots.vlm1 instruct models proposed by rednote-hilab.
Code and dataset link for "DenseWorld-1M: Towards Detailed Dense Grounded Caption in the Real World"
[IGARSS 2025 Oral] A Simple Aerial Detection Baseline of Multimodal Language Models.
OpenMMLab Detection Toolbox and Benchmark
[NeurIPS 2025] Official code implementation of Perception R1: Pioneering Perception Policy with Reinforcement Learning
An open-source implementation for fine-tuning the Qwen-VL series by Alibaba Cloud.
The official repository for ERNIE 4.5 and ERNIEKit – its industrial-grade development toolkit based on PaddlePaddle.
Evaluation code for Ref-L4, a new REC benchmark in the LMM era
VisionReasoner: Unified Reasoning-Integrated Visual Perception via Reinforcement Learning
Includes the code for training and testing the CountGD model from the paper CountGD: Multi-Modal Open-World Counting.
[TPAMI 2025] Towards Visual Grounding: A Survey
The official repository of the dots.llm1 base and instruct models proposed by rednote-hilab.
All-in-One Development Tool based on PaddlePaddle
Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.
Solve Visual Understanding with Reinforced VLMs
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
Kimi-VL: Mixture-of-Experts Vision-Language Model for Multimodal Reasoning, Long-Context Understanding, and Strong Agent Capabilities
LLM knowledge-sharing that anyone can understand; a must-read before spring/autumn recruitment LLM interviews, so you can converse confidently with interviewers.