Note: This is a pre-release version of BioML-bench. Expect bugs and incomplete features.
A benchmark suite for evaluating LLM agents on biomedical machine learning tasks.
📄 Paper: BioML-bench: Evaluation of AI Agents for End-to-End Biomedical ML
BioML-bench is built on top of MLE-bench and provides a comprehensive framework for benchmarking LLM agents on biomedical machine learning tasks, including protein engineering, drug discovery, single-cell omics, medical imaging, and clinical biomarkers.
Agents autonomously read task descriptions, analyze biomedical data, design appropriate ML approaches, and implement complete solutions from scratch.
- 🔬 Diverse Biomedical Tasks: Protein engineering, drug discovery, single-cell omics, medical imaging, and clinical biomarkers
- 🤖 Agent-Agnostic Evaluation: Any LLM agent that can read task descriptions and produce file/folder submissions can be evaluated
- 👨‍⚕️ Human Baselines: Built-in human performance benchmarks for comparison
- 🔧 Extensible Framework: Easy to add new biomedical tasks
- 📚 Biomedical Libraries: Pre-installed RDKit, BioPython, and other domain-specific tools for use by agents
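The preinstalled-libraries point above can be checked from inside an environment. This is a hedged sketch (not part of the BioML-bench CLI) that only reports whether the listed modules are importable:

```shell
# Hedged sketch (not a BioML-bench command): report whether the domain
# libraries listed above are importable in the current Python environment.
python3 - <<'EOF'
import importlib.util

for module, name in [("rdkit", "RDKit"), ("Bio", "BioPython")]:
    status = "available" if importlib.util.find_spec(module) else "not installed"
    print(f"{name}: {status}")
EOF
```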
- Python 3.11+
- Docker - For containerized agent execution
- uv - Python package manager (installation guide)
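Before installing, it can help to confirm the prerequisites are actually present. The following is a small sketch (not a BioML-bench command) using only standard shell tools:

```shell
# Hedged sketch: sanity-check the prerequisites before installing.
command -v docker >/dev/null 2>&1 && echo "docker: ok" || echo "docker: MISSING (needed for containerized agent execution)"
command -v uv >/dev/null 2>&1 && echo "uv: ok" || echo "uv: MISSING (see the uv installation guide)"
python3 -c 'import sys; print("python: ok" if sys.version_info >= (3, 11) else "python: found, but 3.11+ is required")'
```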
# Clone the repository
git clone https://github.com/science-machine/biomlbench.git
cd biomlbench
# Install with uv (recommended)
uv sync
# Activate the environment
source .venv/bin/activate # Linux/macOS
# or .venv\Scripts\activate # Windows

# Pull prebuilt agent images (recommended - saves build time)
./scripts/pull_prebuilt_images.sh
# 1. Prepare a task dataset
biomlbench prepare -t polarishub/tdcommons-caco2-wang
# 2. Run an agent (example with dummy agent)
biomlbench run-agent --agent dummy --task-id polarishub/tdcommons-caco2-wang
# 3. Grade the results
biomlbench grade --submission <run-group-dir>/submission.jsonl --output-dir results/

NOTE: To run any real LLM agent, you will need to create a .env file at the root of the repository containing the relevant API keys:
OPENAI_API_KEY=sk-proj-1234567890
ANTHROPIC_API_KEY=sk-proj-1234567890
OPENROUTER_API_KEY=sk-proj-1234567890
GEMINI_API_KEY=sk-proj-1234567890
MEM0_API_KEY=sk-proj-1234567890
...

- 📖 Full Documentation - Complete guides and API reference
- ⚙️ Installation Guide - Detailed setup instructions
- 📝 Usage Guide - Comprehensive usage examples
- 🏗️ API Reference - Complete API documentation
- 🛠️ Developer Guide - Extending and contributing
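The .env file described above can be created directly from the shell. This is a convenience sketch; the key values are placeholders copied from the example, not real credentials:

```shell
# Hedged sketch: write a .env at the repository root with placeholder keys.
# The values below are NOT real credentials - replace them with your own.
cat > .env <<'EOF'
OPENAI_API_KEY=sk-proj-1234567890
ANTHROPIC_API_KEY=sk-proj-1234567890
EOF
chmod 600 .env   # the file holds secrets, so restrict it to the owner
```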
We welcome contributions! See our Contributing Guide for details on:
- Adding new biomedical tasks
- Adding new agents
- Extending data sources
- Improving documentation
- Adding new analyses (e.g., analysis of LLM impact on agent performance)
If you use BioML-bench in your research, please cite our paper:
@article{biomlbench2025,
  title={BioML-bench: Evaluation of AI Agents for End-to-End Biomedical ML},
  author={[Authors]},
  journal={bioRxiv},
  year={2025},
  doi={10.1101/2025.09.01.673319},
  url={https://www.biorxiv.org/content/10.1101/2025.09.01.673319v2}
}