Detecting artifacts in AI-generated images can help improve generators. Prior work fine-tunes VLMs, but we find that scaffolding with small datasets can be enough. Key tools: in-context learning, prompt optimization, and multi-component design; we improve each.
James Burgess
I am a Stanford PhD student working on computer vision and machine learning. I'm fortunate to be advised by Serena Yeung-Levy and to be supported by the Quad Fellowship.
My methods work focuses on vision-language models, agent-based systems, and evaluation. I also develop multimodal large language models for biology research.
Selected Publications
LLM agents that search and reason over documents can be trained using reinforcement learning with verifiable rewards (RLVR), but this requires training environments. We propose a scalable method for generating synthetic QA datasets from scientific papers.
MicroVQA is an expert-curated benchmark for research-level reasoning in biological microscopy. We also propose a method called RefineBot for removing language shortcuts from multiple-choice VQA.
The BIOMEDICA dataset has 6 million scientific articles and 24 million image-text pairs for training vision-language models in biomedicine. We use it to train state-of-the-art embedding models for biomedical images.
We propose Video Action Differencing (VidDiff), a new task for detecting subtle variations in how actions are performed between two videos. We release a benchmark spanning diverse skilled actions, along with a baseline method built as a simple agentic workflow.
We show that 2D diffusion models like Stable Diffusion admit 3D viewpoint control in their text input space, through what we call '3D view tokens'.
Unsupervised shape representations of cells and organelles are spuriously sensitive to image orientation; our method, O2VAE, mitigates this with equivariant convolutional encoders.
A proteomics map of human subcellular architecture, led by the Chan Zuckerberg Biohub.
Other Publications
Jyotirmai Singh, Samar Khanna, James Burgess
Preprint
paper / code
Liangyu Chen, James Burgess, Jeffrey Nirschl, Orr Zohar, Serena Yeung-Levy
MLHC 2025
paper / code
Christopher Polzak*, Alejandro Lozano*, Min Woo Sun*, James Burgess, Yuhui Zhang, Kevin Wu, Serena Yeung-Levy
Preprint
paper / benchmark / code
Yuhui Zhang*, Yuchang Su*, Chenyu Wang, Tianhong Li, Zoe Wefers, Jeffrey Nirschl, James Burgess, Daisy Ding, Alejandro Lozano, Emma Lundberg, Serena Yeung-Levy
ICML 2025
project page / paper / code
Yuhui Zhang*, Yuchang Su*, Yiming Liu, Xiaohan Wang, James Burgess, Elaine Sui, Chenyu Wang, Josiah Aklilu, Alejandro Lozano, Anjiang Wei, Ludwig Schmidt, Serena Yeung-Levy
CVPR 2025
project page / paper / benchmark / code
Alejandro Lozano*, Jeffrey Nirschl*, James Burgess, Sanket Rajan Gupte, Yuhui Zhang, Alyssa Unell, Serena Yeung-Levy
NeurIPS Datasets & Benchmarks 2024
project page / paper / benchmark / code