Sichuan University
- https://qinyang-cs.github.io
Stars
Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual information for complex reasoning, planning, and generation.
⭐️ A cross-platform CLI All-in-One assistant tool for Claude Code, Codex & Gemini CLI.
A cross-platform desktop All-in-One assistant for Claude Code, Codex, OpenCode, OpenClaw, Gemini CLI & Hermes Agent. Only official website: ccswitch.io
[CVPR 2026 Highlight] DocSeeker: Structured Visual Reasoning with Evidence Grounding for Long Document Understanding
ARIS ⚔️ (Auto-Research-In-Sleep) — Lightweight Markdown-only skills for autonomous ML research: cross-model review loops, idea discovery, and experiment automation. No framework, no lock-in — works…
Lightweight, open-source AI agent for your tools, chats, and workflows.
100+ AI Agent & RAG apps you can actually run — clone, customize, ship.
[NeurIPS 2025] First SFT, Second RL, Third UPT: Continual Improving Multi-Modal LLM Reasoning via Unsupervised Post-Training
Tongyi Deep Research, the Leading Open-source Deep Research Agent
The paper list of the 86-page SCIS cover paper "The Rise and Potential of Large Language Model Based Agents: A Survey" by Zhiheng Xi et al.
UI-Venus is a native UI agent designed to perform precise GUI element grounding and effective navigation using only screenshots as input.
[ICLR 2026] This is the first paper to explore how to effectively use R1-like RL for MLLMs and introduce Vision-R1, a reasoning MLLM that leverages cold-start initialization and RL training to incen…
This repository provides a valuable reference for researchers in the field of multimodality. Start your exploration of RL-based reasoning MLLMs here!
[AAAI 2026] GUI-G²: Gaussian Reward Modeling for GUI Grounding
A library for advanced large language model reasoning
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
Trustworthy Visual-Textual Retrieval (TIP 2025 PyTorch Code)
[CVPR 2024 Highlight] Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding
[CVPR 2024 Highlight] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation
[CVPR 2025] Devils in Middle Layers of Large Vision-Language Models: Interpreting, Detecting and Mitigating Object Hallucinations via Attention Lens
[ICML 2025] Visual Attention Never Fades: Selective Progressive Attention ReCalibration for Detailed Image Captioning in Multimodal Large Language Models
📖 A curated list of resources dedicated to hallucination of multimodal large language models (MLLM).
😎 A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, Agent, and Beyond
GEA: Generation-Enhanced Alignment for Text-to-Image Person Retrieval
Qwen3 is the large language model series developed by the Qwen team, Alibaba Cloud.
Task Singular Vectors: Reducing Task Interference in Model Merging. Merge models avoiding task interference through separable models.