- Shanghai Jiao Tong University
- https://scholar.google.com.hk/citations?user=_kAniL4AAAAJ&hl=zh-CN
Stars
[ACL 2024] Mitigating Hallucinations in Large Vision-Language Models with Instruction Contrastive Decoding
(CVPR 2025 highlight✨) Official repository of paper "LLMDet: Learning Strong Open-Vocabulary Object Detectors under the Supervision of Large Language Models"
Benchmarking Generalized Out-of-Distribution Detection
The official implementation of Delta Energy: Optimizing Energy Change During Vision-Language Alignment Improves both OOD Detection and OOD Generalization (NeurIPS2025)
The official implementation of InfoBound: A Provable Information-Bounds Inspired Framework for Both OoD Generalization and OoD Detection (T-PAMI 2025)
[ICLR2025] The official implementation of Less is More: Masking Elements in Image Condition Features Avoids Content Leakages in Style Transfer Diffusion Models
(CVPR2024) MeaCap: Memory-Augmented Zero-shot Image Captioning
[NeurIPS2023] LoCoOp: Few-Shot Out-of-Distribution Detection via Prompt Learning
This repo contains the code for the paper "Understanding and Mitigating Hallucinations in Large Vision-Language Models via Modular Attribution and Intervention, ICLR 2025".
[ICLR 2025] Mitigating Modality Prior-Induced Hallucinations in Multimodal Large Language Models via Deciphering Attention Causality
🔥 [NeurIPS 2025] Official implementation of "Generate, but Verify: Reducing Visual Hallucination in Vision-Language Models with Retrospective Resampling (REVERSE)"
Reference PyTorch implementation and models for DINOv3
An example of the correspondence between fine classes and superclasses (coarse classes) in ImageNet.
Adaptation of vision-language models (CLIP) to downstream tasks using local and global prompts.
Rex-Thinker: Grounded Object Referring via Chain-of-Thought Reasoning
[CVPR 2024] Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension
Transferable Decoding with Visual Entities for Zero-Shot Image Captioning, ICCV 2023
Official Code for "Painting with Words: Elevating Detailed Image Captioning with Benchmark and Alignment Learning" (ICLR 2025)
Data release for the ImageInWords (IIW) paper.
[NeurIPS'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning
[ICCV 2025] Implementation for Describe Anything: Detailed Localized Image and Video Captioning
[NeurIPS 2024] Conjugated Semantic Pool Improves OOD Detection with Pre-trained Vision-Language Models
[ICLR 2024 Spotlight] "Negative Label Guided OOD Detection with Pretrained Vision-Language Models"
PyTorch code for "Fine-grained Image Captioning with CLIP Reward" (Findings of NAACL 2022)
Solve Visual Understanding with Reinforced VLMs
The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.
[ICLR 2024] Test-Time RL with CLIP Feedback for Vision-Language Models.
[CVPR2025] The implementation of the paper "OODD: Test-time Out-of-Distribution Detection with Dynamic Dictionary".
[ICCV 2025] VisRL: Intention-Driven Visual Perception via Reinforced Reasoning