- Alibaba Cloud
- Hangzhou, China
- https://gujiaqivadin.github.io/
- https://orcid.org/0000-0002-4644-6046
Starred repositories
MobileGym: A Verifiable and Highly Parallel Simulation Platform for Mobile GUI Agent Research · Browser-hosted Android Simulator · Verifiable Evaluation · Scalable Online RL Training
Memory Sparse Attention - A scalable, end-to-end trainable latent-memory framework for 100M-token contexts.
The largest open-source medical AI skills library for OpenClaw🦞.
论文X光机 (Paper X-Ray): a Claude Code Skill that deconstructs academic papers and distills their core ideas into napkin formulas
Ming - facilitating advanced multimodal understanding and generation capabilities built upon the Ling LLM.
SGLang is a high-performance serving framework for large language models and multimodal models.
[NeurIPS 2025]⭐️ Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning.
⚓️ Repository for the "Thought Anchors: Which LLM Reasoning Steps Matter?" paper.
A high-throughput and memory-efficient inference and serving engine for LLMs
An Arena-style Automated Evaluation Benchmark for Detailed Captioning
The Hugging Face implementation of Fine-grained Late-interaction Multi-modal Retriever.
A Comprehensive Survey on Continual Learning in Generative Models.
The open-source CapCut alternative
Our code for ICLR'25 paper "DataMan: Data Manager for Pre-training Large Language Models".
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3.6, DeepSeek-V4, GLM-5.1, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Gemma4, Llava, …
Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual information for complex reasoning, planning, and generation.
[ICLR 2026] VisionReasoner: Unified Reasoning-Integrated Visual Perception via Reinforcement Learning
Geologic models from Llama 4 language model + GemPy!
Ola: Pushing the Frontiers of Omni-Modal Language Model
OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement
An open-source implementation for fine-tuning the Qwen-VL series by Alibaba Cloud.
[NeurIPS 2025] SpatialLM: Training Large Language Models for Structured Indoor Modeling
Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning
[CVPR 2026] MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources
Collections of Papers and Projects for Multimodal Reasoning.
A Comprehensive Survey on Evaluating Reasoning Capabilities in Multimodal Large Language Models.
Explore the Multimodal “Aha Moment” on a 2B Model
Official repository of 'Visual-RFT: Visual Reinforcement Fine-Tuning' & 'Visual-ARFT: Visual Agentic Reinforcement Fine-Tuning'