Kevin Qinghong Lin

Postdoctoral Researcher

Torr Vision Group
University of Oxford

Email: kevin.qh.lin [at] gmail.com

[Scholar] [Github] [HF] [LinkedIn] [Twitter]

Biography

I am a Postdoctoral Researcher in TVG, University of Oxford, working with Prof. Philip Torr.

I obtained my PhD from Show Lab, National University of Singapore in three years, luckily advised by Prof. Mike Shou.

I was fortunate to intern at Tencent / Meta AI / Meta Reality Labs / Microsoft Research.

My research focuses on multimodal agents.

I’m on the job market. Feel free to email me if I might be a good fit.

Blogs

When Vision Meets Code, Feb 2026.
Code offers a new lens on the visual world. This blog discusses (i) Code as Visual Representation; (ii) Video Generation via Programming; (iii) Coder as World Model.

Selected Publications [Google Scholar]

† indicates equal contribution. Denotes student I mentored. ✉ indicates corresponding author.

Egocentric Video-Language Pretraining
Kevin QH. Lin, Alex JP. Wang, M. Soldan, M. Wray, R. Yan, Eric ZC. Xu, D. Gao, R. Tu, W. Zhao, W. Kong, C. Cai, H. Wang, D. Damen, B. Ghanemå, W. Liu, Mike Z. Shou.

NeurIPS 2022 Spotlight (1.7%)
[project] [paper] [EgoVLPv2] [code] [poster] [twitter] [media]
EgoVis Distinguished Paper Award.
PREMIA Best Student Paper Award, Gold Award.
Double champions in Ego4D & Epic-Kitchens CVPR 2022 challenges.

ShowUI: One Vision-Language-Action Model for GUI Visual Agent
Kevin QH. Lin, Linjie Li, Difei Gao, Zhengyuan Yang, Shiwei Wu, Zechen Bai, Stan WX. Lei, Lijuan Wang, Mike Z. Shou.

CVPR 2025
[paper] [code] [huggingface] [dataset] [demo] [twitter]
#1 Huggingface daily paper.
Oral talk and Outstanding Paper Award, NeurIPS Open-World Agents Workshop 2024.
The model has been downloaded for over 240,000 times. 1.6K github stars.

VLog: Video-Language Models by Generative Retrieval of Narration Vocabulary
Kevin QH. Lin, Mike Z. Shou.

CVPR 2025
[paper] [code] [twitter]
580+ github stars.

UniVTG: Towards Unified Video-Language Temporal Grounding
Kevin QH. Lin, Pengchuan Zhang, Joya Chen, Shraman Pramanick, Difei Gao, Alex JP. Wang, Rui Yan, Mike Z. Shou.

ICCV 2023
[paper] [code] [demo] [twitter]
370+ github stars.

Learning Video Context as Interleaved Multimodal Sequences
Kevin QH. Lin, Pengchuan Zhang, Difei Gao, Xide Xia, Joya Chen, Ziteng Gao, Jinheng Xie, Xuhong Xiao, Mike Z. Shou.

ECCV 2024
[paper] [code]

VideoGUI: A Benchmark for GUI Automation from Instructional Videos
Kevin QH. Lin, Linjie Li, Difei Gao, Qinchen Wu, Mingyi Yan, Zhengyuan Yang, Lijuan Wang, Mike Z. Shou.

NeurIPS 2025 Spotlight
[project] [paper] [code] [twitter]

VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning
Ye Liu†, Kevin QH. Lin†, Chang Wen Chen, Mike Z. Shou.

ICLR 2026
NeurIPS LAW workshop, 2025. Spotlight
[project] [paper] [code] [dataset] [demo] [twitter]

Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models
Jiaqi Wang†, Kevin QH. Lin†, James Cheng, Mike Z. Shou.

NeurIPS 2025
[paper] [code] [huggingface] [twitter]

Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers
Wei Pang†, Kevin QH. Lin†, Xiangru Jian†, Xi He, Philip Torr.

NeurIPS 2025
[project] [paper] [code] [datasets] [demo] [poster] [twitter]
3K github stars. 1.2K twitter likes.
Oral talk at ICML 2025 MAS workshop

Paper2Video: Automatic Video Generation from Scientific Papers
Zeyu Zhu†, Kevin QH. Lin†, Mike Z. Shou.

Preprint 2025
[project] [paper] [code] [dataset] [twitter]
#2 Huggingface daily paper.
1.9K github stars. 1M+ twitter views. Highlighted by YC Hacker News

Code2Video: A Code-centric Paradigm for Educational Video Generation
Yanzhe Chen†, Kevin QH. Lin†, Mike Z. Shou.

Preprint 2025
[project] [paper] [code] [dataset] [twitter]
1.4K github stars.

Show-o: One Single Transformer to Unify Multimodal Understanding and Generation
Jinheng Xie†, Weijia Mao†, Zechen Bai†, David JH. Zhang†, Weihao Wang, Kevin QH. Lin, Yuchao Gu, Zhijie Chen, Zhenheng Yang, Mike Z. Shou.

ICLR 2025
[project] [paper] [code] [huggingface] [demo] [twitter]
1.8K github stars.
Most Influential ICLR Papers #4

Honors

Tinker Research Grant, Thinking Machines Lab

2025
DAAD AINeT Fellowship

2025
CVPR Doctoral Consortium

2025
Outstanding Paper Award, NeurIPS Open-World Agents

2024
NeurIPS Top Reviewers

2024
Best Demo Paper Award, ACM Multimedia HCMA

2024
Egocentric Vision (EgoVis) Distinguished Paper Award

2024
CVPR Outstanding Reviewers (Top 2%)

2024
PREMIA Best Student Paper Awards, Gold Award

2023
NeurIPS Scholar Award

2022
Tencent Rhino-Bird Research Scholarship, Second Prize

2022
1st Place on Ego4D - Object State Change Classiﬁcation Challenge, CVPR

2022
1st Place on EPIC-Kitchens - Multi-Instance Retrieval Challenge, CVPR
2022
Show Lab Annual Award

2022, 2024
China National Scholarship

2018, 2021

Service

Area Chair: NeurIPS 2025.
Workshop Organizer: Open Multimodal Gathering @ NUS; Multimodal Video Agent @ CVPR 25.
Conference Reviewer: CVPR (2024 Outstanding Reviewers), ICCV, ECCV, NeurIPS (2024 Top Reviewers), ICML, ICLR, etc.
Journal Reviewer: TPAMI, IJCV, TMLR, TNNLS, TMM, etc.
Guest Lecture: Multimodal Agent @ NUS CS6212; Multimodal Agent @ NUS EE6934
Teaching Assistant: EE6934, EE6733, EE4212
Co-organizer of The AI Talks.