- Shanghai Jiao Tong University
- https://scholar.google.com.hk/citations?user=_kAniL4AAAAJ&hl=zh-CN
Stars
Stable Diffusion web UI
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
Solve Visual Understanding with Reinforced VLMs
[ICCV 2025] Implementation for Describe Anything: Detailed Localized Image and Video Captioning
Benchmarking Generalized Out-of-Distribution Detection
Official implementation of UnifiedReward & [NeurIPS 2025] UnifiedReward-Think & UnifiedReward-Flex
[IJCV] FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention
(CVPR 2025 highlight✨) Official repository of paper "LLMDet: Learning Strong Open-Vocabulary Object Detectors under the Supervision of Large Language Models"
[ICLR 2026] On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification.
[CVPR 2024 Highlight] Style Injection in Diffusion: A Training-free Approach for Adapting Large-scale Diffusion Models for Style Transfer
StyleShot: A SnapShot on Any Style. A model that can transfer any style onto any content, generating high-quality personalized stylized images without per-image fine-tuning!
[NeurIPS'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning
[ICLR 2025] HD-Painter: High-Resolution and Prompt-Faithful Text-Guided Image Inpainting with Diffusion Models
Subject-Diffusion: Open Domain Personalized Text-to-Image Generation without Test-time Fine-tuning
[ICLR 2025] Official implementation of MS-Diffusion: Multi-subject Zero-shot Image Personalization with Layout Guidance
[CVPR 2024] LAMP: Learn a Motion Pattern for Few-Shot Based Video Generation
PyTorch code for "Fine-grained Image Captioning with CLIP Reward" (Findings of NAACL 2022)
[AAAI 2025] Resolving Multi-Condition Confusion for Finetuning-Free Personalized Image Generation
Transferable Decoding with Visual Entities for Zero-Shot Image Captioning, ICCV 2023
[ICCVW 25] LLaVA-MORE: A Comparative Study of LLMs and Visual Backbones for Enhanced Visual Instruction Tuning
OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning
Rex-Thinker: Grounded Object Referring via Chain-of-Thought Reasoning
[ICLR 2025] MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation
[ICLR 2025] The official implementation of Less is More: Masking Elements in Image Condition Features Avoids Content Leakages in Style Transfer Diffusion Models
[NeurIPS 2023] LoCoOp: Few-Shot Out-of-Distribution Detection via Prompt Learning
[ICLR 2024] Test-Time RL with CLIP Feedback for Vision-Language Models.