James Burgess

I am a Stanford PhD student working on computer vision and machine learning. I'm fortunate to be advised by Serena Yeung-Levy and to be supported by the Quad Fellowship.

My methods work focuses on vision-language models, agent-based systems, and evaluation. I also develop multimodal large language models for biology research.

Selected Publications

ArtifactLens: Detecting Image Artifacts with a VLM Scaffold

James Burgess, Rameen Abdal, Dan Stoddart, Sergey Tulyakov, Serena Yeung-Levy, Kuan-Chieh Jackson Wang

Preprint (to be released)

coming soon

Detecting artifacts in AI-generated images can help improve generators. Prior work fine-tunes VLMs, but we find that scaffolding a VLM with a small dataset can be enough. The key tools are in-context learning, prompt optimization, and multi-component design, and we improve each.

PaperSearchQA: Learning to Search and Reason over Scientific Papers with RLVR

James Burgess, Jan N. Hansen, Duo Peng, Yuhui Zhang, Alejandro Lozano, Min Woo Sun, Emma Lundberg, Serena Yeung-Levy

Preprint

preprint

LLM agents that search and reason over documents can be trained with reinforcement learning with verifiable rewards (RLVR), but they require training environments. We propose a scalable method to generate synthetic QA datasets from scientific papers.

MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research

James Burgess*, Jeffrey J Nirschl*, Laura Bravo-Sánchez*, Alejandro Lozano, Sanket Rajan Gupte, Jesus G. Galaz-Montoya, Yuhui Zhang, Yuchang Su, Disha Bhowmik, Zachary Coman, Sarina M. Hasan, Alexandra Johannesson, William D. Leineweber, Malvika G Nair, Ridhi Yarlagadda, Connor Zuraski, Wah Chiu, Sarah Cohen, Jan N. Hansen, Manuel D Leonetti, Chad Liu, Emma Lundberg, Serena Yeung-Levy

*co-first authorship

CVPR 2025

project page & blog / arxiv / benchmark / code

MicroVQA is an expert-curated benchmark for research-level reasoning in biological microscopy. We also propose a method called RefineBot for removing language shortcuts from multiple-choice VQA.

BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature

Alejandro Lozano*, Min Woo Sun*, James Burgess*, Liangyu Chen, Jeffrey J. Nirschl, Jeffrey Gu, Ivan Lopez, Josiah Aklilu, Anita Rau, Austin Wolfgang Katzer, Collin Chiu, Xiaohan Wang, Alfred Seunghoon Song, Robert Tibshirani, Serena Yeung-Levy

*co-first authorship

CVPR 2025

project page / arxiv / code / data

The BIOMEDICA dataset has 6 million scientific articles and 24 million image-text pairs for training vision-language models in biomedicine. We use it to train state-of-the-art embedding models for biomedical images.

Video Action Differencing

James Burgess, Xiaohan Wang, Yuhui Zhang, Anita Rau, Alejandro Lozano, Lisa Dunlap, Trevor Darrell, Serena Yeung-Levy

ICLR 2025

project page & blog / paper / benchmark / code

We propose Video Action Differencing (VidDiff), a new task for detecting subtle variations in how actions are performed between two videos. We release a benchmark spanning diverse skilled actions, and a baseline method that is a simple agentic workflow.

Viewpoint Textual Inversion: Discovering Scene Representations and 3D View Control in 2D Diffusion Models

James Burgess, Kuan-Chieh Wang, Serena Yeung-Levy

Outstanding Paper Award at the ECCV Workshop "Emergent Visual Abilities and Limits of Foundation Models"

ECCV 2024

project page / arXiv / code

We show that 2D diffusion models like Stable Diffusion support 3D view control through their text input space, using what we call '3D view tokens'.

Orientation-invariant autoencoders learn robust representations for shape profiling of cells and organelles

James Burgess, Jeffrey J. Nirschl, Maria-Clara Zanellati, Alejandro Lozano, Sarah Cohen, Serena Yeung-Levy

Nature Communications 2024

paper / code

Unsupervised shape representations of cells and organelles are erroneously sensitive to image orientation, which we mitigate with equivariant convolutional network encoders in our method, O2VAE.

Global organelle profiling reveals subcellular localization and remodeling at proteome scale

Hein et al. (including James Burgess)

Cell 2024

bioRxiv / code

A proteomics map of human subcellular architecture, led by the Chan Zuckerberg Biohub.

Other Publications

Squeezed Diffusion Models
Jyotirmai Singh, Samar Khanna, James Burgess
Preprint
paper / code

The Impact of Image Resolution on Biomedical Multimodal Large Language Models
Liangyu Chen, James Burgess, Jeffrey Nirschl, Orr Zohar, Serena Yeung-Levy
MLHC 2025
paper / code

Can Large Language Models Match the Conclusions of Systematic Reviews?
Christopher Polzak*, Alejandro Lozano*, Min Woo Sun*, James Burgess, Yuhui Zhang, Kevin Wu, Serena Yeung-Levy
Preprint
paper / benchmark / code

CellFlux: Simulating Cellular Morphology Changes via Flow Matching
Yuhui Zhang*, Yuchang Su*, Chenyu Wang, Tianhong Li, Zoe Wefers, Jeffrey Nirschl, James Burgess, Daisy Ding, Alejandro Lozano, Emma Lundberg, Serena Yeung-Levy
ICML 2025
project page / paper / code

Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation
Yuhui Zhang*, Yuchang Su*, Yiming Liu, Xiaohan Wang, James Burgess, Elaine Sui, Chenyu Wang, Josiah Aklilu, Alejandro Lozano, Anjiang Wei, Ludwig Schmidt, Serena Yeung-Levy
CVPR 2025
project page / paper / benchmark / code

Micro-Bench: A Vision-Language Benchmark for Microscopy Understanding
Alejandro Lozano*, Jeffrey Nirschl*, James Burgess, Sanket Rajan Gupte, Yuhui Zhang, Alyssa Unell, Serena Yeung-Levy
NeurIPS Datasets & Benchmarks 2024
project page / paper / benchmark / code

Teaching

Lecturer and teaching assistant, CS286/BIODS276 Advanced Topics in Computer Vision and Biomedicine, Stanford 2024.

Teaching assistant, CS271/BIODS220, Artificial Intelligence in Healthcare, Stanford 2022.