Stars
Related code, checkpoints and project page for V-Reflection
Official implementation of Seeing with You: Perception-Reasoning Co-evolution for Multimodal Reasoning.
A Guided Reinforcement Learning framework enhancing MLLM reasoning via process-level verification and collaborative rollout strategies.
[CVPR 2026 Highlight] Thinking in Uncertainty: Mitigating Hallucinations in MLRMs with Latent Entropy-Aware Decoding
Official implementation (PyTorch) of "Representation Shift: Unifying Token Compression with FlashAttention", ICCV 2025
[ICLR'26] Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology
An implementation of the hallucination mitigation method REVIS, introduced in "Sparse Latent Steering to Mitigate Object Hallucination in Large Vision-Language Models".
v1: Learning to Point Visual Tokens for Multimodal Grounded Reasoning
Do LLMs and VLMs Share Neurons for Inference? Evidence and Mechanisms of Cross-Modal Transfer
Shaping capabilities with token-level pretraining data filtering
From $f(x)$ and $g(x)$ to $f(g(x))$: LLMs Learn New Skills in RL by Composing Old Ones
Code and data for the ICLR 2026 paper "LogicReward: Incentivizing LLM Reasoning via Step-Wise Logical Supervision"
[ICLR 2026] "VTool-R1: VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use"
Pixel-Level Reasoning Model trained with RL [NeurIPS 2025]
The official code of "VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning" [NeurIPS 2025]
[CVPR 2025 (Oral)] Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key
The official repo for "Where Do Large Vision-Language Models Look at When Answering Questions?"
Code for "Reducing Hallucinations in Vision-Language Models via Latent Space Steering"
Code and results accompanying the paper "Refusal in Language Models Is Mediated by a Single Direction".
[ACML 2025] Conformal Abstention for LLMs and VLMs
Conformal prediction for controlling monotonic risk functions. Simple accompanying PyTorch code for conformal risk control in computer vision and natural language processing.
A scikit-learn-compatible library for estimating prediction intervals and controlling risks, based on conformal predictions.
Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual information for complex reasoning, planning, and generation.
[ICLR 2025] MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation