Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual information for complex reasoning, planning, and generation.

1,483 46 Updated Mar 9, 2026

⭐️ A cross-platform CLI All-in-One assistant tool for Claude Code, Codex & Gemini CLI.

Rust 3,583 205 Updated Jun 15, 2026

A cross-platform desktop All-in-One assistant for Claude Code, Codex, OpenCode, OpenClaw, Gemini CLI & Hermes Agent. Only official website: ccswitch.io

Rust 102,532 6,784 Updated Jun 16, 2026

[CVPR 2026 Highlight] DocSeeker: Structured Visual Reasoning with Evidence Grounding for Long Document Understanding

Python 15 1 Updated Jun 4, 2026

ARIS ⚔️ (Auto-Research-In-Sleep) — Lightweight Markdown-only skills for autonomous ML research: cross-model review loops, idea discovery, and experiment automation. No framework, no lock-in — works…

Python 12,185 1,119 Updated Jun 15, 2026

Lightweight, open-source AI agent for your tools, chats, and workflows.

Python 44,288 7,829 Updated Jun 16, 2026

100+ AI Agent & RAG apps you can actually run — clone, customize, ship.

Python 114,732 17,027 Updated Jun 15, 2026

[NeurIPS 2025] First SFT, Second RL, Third UPT: Continual Improving Multi-Modal LLM Reasoning via Unsupervised Post-Training

Python 87 2 Updated Oct 29, 2025
Python 54 6 Updated Oct 10, 2025

Tongyi Deep Research, the Leading Open-source Deep Research Agent

Python 19,417 1,489 Updated Feb 27, 2026

The paper list of the 86-page SCIS cover paper "The Rise and Potential of Large Language Model Based Agents: A Survey" by Zhiheng Xi et al.

8,149 493 Updated Sep 12, 2025

Deep Research

Python 303 10 Updated Aug 26, 2025

UI-Venus is a native UI agent designed to perform precise GUI element grounding and effective navigation using only screenshots as input.

Python 1,010 85 Updated May 11, 2026

[ICLR2026] This is the first paper to explore how to effectively use R1-like RL for MLLMs and introduce Vision-R1, a reasoning MLLM that leverages cold-start initialization and RL training to incen…

Python 1,380 27 Updated Mar 20, 2026

This repository provides a valuable reference for researchers in the field of multimodality. Start your exploration of RL-based reasoning MLLMs here!

1,421 62 Updated May 11, 2026

[AAAI 2026] GUI-G²: Gaussian Reward Modeling for GUI Grounding

Python 309 10 Updated Apr 15, 2026

A library for advanced large language model reasoning

Python 2,342 203 Updated Jun 10, 2025

EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL

Python 5,015 373 Updated Apr 6, 2026

Trustworthy Visual-Textual Retrieval (TIP 2025 PyTorch Code)

Python 9 Updated Jan 14, 2026

[CVPR 2024 Highlight] Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding

Python 406 25 Updated Oct 7, 2024

[CVPR 2024 Highlight] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation

Python 411 33 Updated Aug 24, 2024

[CVPR 2025] Devils in Middle Layers of Large Vision-Language Models: Interpreting, Detecting and Mitigating Object Hallucinations via Attention Lens

Python 81 11 Updated Oct 9, 2025

[ICML 2025] Visual Attention Never Fades: Selective Progressive Attention ReCalibration for Detailed Image Captioning in Multimodal Large Language Models

Python 13 Updated May 28, 2025

📖 A curated list of resources dedicated to hallucination of multimodal large language models (MLLM).

1,024 47 Updated Sep 27, 2025

😎 A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, Agent, and Beyond

355 13 Updated Jan 22, 2026

The code will come soon.

Python 11 3 Updated Aug 27, 2025

GEA: Generation-Enhanced Alignment for Text-to-Image Person Retrieval

6 Updated Jul 14, 2025

Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.

Python 27,304 1,992 Updated Jan 9, 2026

Task Singular Vectors: Reducing Task Interference in Model Merging. Merge models avoiding task interference through separable models.

Python 55 9 Updated Dec 15, 2025