Detecting artifacts in AI-generated images can help improve image generators. Prior work fine-tunes VLMs, but we find that scaffolding with small datasets (hundreds of examples) can be enough. Key tools: in-context learning, prompt optimization, and multi-component design.
James Burgess
I am a Stanford PhD student working on machine learning. I'm fortunate to be advised by Serena Yeung-Levy and to be supported by the Quad Fellowship.
I build and evaluate LLMs and VLMs: benchmarks, RL training, agentic systems, and synthetic data. My work covers both general ML methodology and applications to biology and medicine. I graduate in June 2026.
Selected Publications
LLM agents that search and reason over documents can be trained with reinforcement learning with verifiable rewards (RLVR), but this requires training environments. We propose a scalable method for generating synthetic QA data from scientific papers.
Benchmark and eval methodology. MicroVQA is a benchmark for PhD-level visual reasoning in biological microscopy, used internally at frontier labs. RefineBot is a method for removing language shortcuts from VQA.
The BIOMEDICA dataset has 6 million scientific articles and 24 million image-text pairs for training vision-language models in biomedicine. We use it to train state-of-the-art embedding models for biomedical images.
We introduce Video Action Differencing (VidDiff), a task for comparing actions between videos via natural language. We release a benchmark covering diverse skilled actions, and a baseline method implemented as a simple agentic workflow.
We show that 2D diffusion models like Stable Diffusion have 3D control in their text input space, which we call '3D view tokens'. We use them for view-controlled image generation and novel view synthesis from a single image.
Unsupervised shape representations of cells and organelles are erroneously sensitive to image orientation. O2VAE fixes this with an architectural change that guarantees orientation invariance.
A proteomics map of human subcellular architecture, led by the Chan Zuckerberg Biohub. I contributed embedding models for microscopy image classification.
Other Publications
Rameen Abdal, James Burgess, Sergey Tulyakov, Kuan-Chieh Jackson Wang
CVPR 2026
project page / paper
Christopher Polzak*, Alejandro Lozano*, Min Woo Sun*, James Burgess, Yuhui Zhang, Kevin Wu, Serena Yeung-Levy
ICLR 2026
paper / benchmark / code
Jyotirmai Singh, Samar Khanna, James Burgess
NeurIPSw 2025
paper / code
Min Woo Sun, Alejandro Lozano, Javier Gamazo Tejero, Vishwesh Nath, Xiao Xiao Sun, James Burgess, Yuhui Zhang, Kun Yuan, Robert Tibshirani, Sean Huver, Serena Yeung-Levy
ML4H 2025
paper
Yuhui Zhang, Yuchang Su, Zoe Wefers, Shiye Su, He Li, Tianhong Li, Chenyu Wang, James Burgess, Alejandro Lozano, Linqi Zhou, Daisy Ding, Jeffrey Nirschl, Emma Lundberg, Serena Yeung-Levy
ML4H 2025
paper
Liangyu Chen, James Burgess, Jeffrey Nirschl, Orr Zohar, Serena Yeung-Levy
MLHC 2025
paper / code
Yuhui Zhang*, Yuchang Su*, Chenyu Wang, Tianhong Li, Zoe Wefers, Jeffrey Nirschl, James Burgess, Daisy Ding, Alejandro Lozano, Emma Lundberg, Serena Yeung-Levy
ICML 2025
project page / paper / code
Yuhui Zhang*, Yuchang Su*, Yiming Liu, Xiaohan Wang, James Burgess, Elaine Sui, Chenyu Wang, Josiah Aklilu, Alejandro Lozano, Anjiang Wei, Ludwig Schmidt, Serena Yeung-Levy
CVPR 2025
project page / paper / benchmark / code
Alejandro Lozano*, Jeffrey Nirschl*, James Burgess, Sanket Rajan Gupte, Yuhui Zhang, Alyssa Unell, Serena Yeung-Levy
NeurIPS Datasets & Benchmarks 2024
project page / paper / benchmark / code