The first bioinformatics-native AI agent skill library.
Built on OpenClaw (180k+ GitHub stars). Local-first. Privacy-focused. Reproducible.
A community contributor built a nutrigenomics skill and ran it, from raw genetic data to a personalised nutrition report with radar charts, heatmaps, and a reproducibility bundle:
https://github.com/ClawBio/ClawBio/releases/download/v0.2.0/david-nutrigx-demo.mp4
What just happened behind the scenes
- The AI agent read SKILL.md, a specification that encodes the correct bioinformatics decisions (40 SNPs, 13 nutrient domains, evidence-based risk thresholds)
- It ran the Python skill locally: no genetic data left the machine
- It produced a markdown report with figures, tables, and a reproducibility bundle (commands.sh, environment.yml, checksums.sha256)
- Anyone can re-run the exact same analysis and get identical results, SHA-256 verified
PharmGx Reporter: 12 genes, 51 drugs, under 1 second
Manuel Corpas introduces ClawBio to Peter Steinberger at the UK AI Agent Hack (1 March 2026):
You read a paper. You want to reproduce Figure 3. So you:
- Go to GitHub. Clone the repo.
- Wrong Python version. Fix dependencies.
- Need the reference data. Where is it?
- Download 2GB from Zenodo. Link is dead.
- Email the first author. Wait 3 weeks.
- Paths are hardcoded to /home/jsmith/data/.
- Two days later: still broken. You give up.
Now imagine the same paper published a skill:
python ancestry_pca.py --demo --output fig3
# Figure 3 reproduced. Identical. SHA-256 verified. 30 seconds.
That's ClawBio. Every figure in your paper should be one command away from reproduction.
A skill is a domain expert's knowledge, frozen into code, that an AI agent executes correctly every time.
ChatGPT / Claude = a smart generalist who guesses at bioinformatics
ClawBio skill = a domain expert's proven pipeline that the AI executes
- Local-first: Your genomic data never leaves your laptop. No cloud uploads, no data exfiltration.
- Reproducible: Every analysis exports commands.sh, environment.yml, and SHA-256 checksums. Anyone can reproduce it without the agent.
- Modular: Each skill is a self-contained directory (SKILL.md + Python scripts) that plugs into the orchestrator.
- MIT licensed: Open-source, free, community-driven.
Ask Claude to "profile my pharmacogenes from this 23andMe file." It'll write plausible Python. But:
- It hallucinates star allele calls and uses outdated CPIC guidelines
- It forgets CYP2D6 *4 is no-function (not reduced)
- You spend 45 minutes debugging its output
- No reproducibility bundle. No audit log. No checksums.
ClawBio encodes the correct bioinformatics decisions so the agent gets it right first time, every time.
Every ClawBio analysis ships with a reproducibility bundle, not as an afterthought but as part of the output:
report/
├── report.md          # Full analysis with figures and tables
├── figures/           # Publication-quality PNGs
├── tables/            # CSV data tables
├── commands.sh        # Exact commands to reproduce
├── environment.yml    # Conda environment snapshot
└── checksums.sha256   # SHA-256 of every input and output file
Why this matters: a reviewer can re-run your analysis in 30 seconds. A collaborator can reproduce your Figure 3 without emailing you. Future-you can regenerate results two years later from the same bundle.
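As an illustration of how the checksum part of the bundle could be verified without the agent, here is a minimal sketch. It assumes checksums.sha256 uses the common sha256sum layout (one "<hex digest>  <relative path>" entry per line, with paths relative to the bundle directory); the actual manifest format used by ClawBio may differ, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_bundle(checksum_file: Path) -> dict[str, bool]:
    """Check every entry of a sha256sum-style manifest.

    Assumption: each line is "<hex digest>  <relative path>", and paths
    are relative to the directory that holds the manifest.
    """
    results = {}
    root = checksum_file.parent
    for line in checksum_file.read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # sha256sum marks binary mode with '*'
        target = root / name
        # A missing file counts as a failed check rather than an error.
        results[name] = target.is_file() and sha256_of(target) == expected.lower()
    return results
```

Calling verify_bundle(Path("report/checksums.sha256")) would return a mapping from file name to pass/fail, so a reviewer can see at a glance which inputs or outputs have drifted.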
| Skill | Status | Description |
|---|---|---|
| Bio Orchestrator | MVP | Routes requests to the right skill |
| PharmGx Reporter | MVP | 12 genes, 51 drugs, CPIC guidelines |
| Equity Scorer | MVP | HEIM diversity metrics from VCF/ancestry |
| NutriGx Advisor | MVP | Personalised nutrigenomics (40 SNPs, 13 domains) |
| Metagenomics Profiler | MVP | Kraken2/RGI/HUMAnN3 taxonomy + resistome |
| Ancestry PCA | MVP | PCA vs SGDP (345 samples, 164 populations) |
| Semantic Similarity | MVP | Isolation Index from 13.1M PubMed abstracts |
| Genome Comparator | MVP | IBS vs George Church (PGP-1) + ancestry estimation |
| VCF Annotator | Planned | Variant annotation with VEP, ClinVar, gnomAD + ancestry context |
| Lit Synthesizer | Planned | PubMed/bioRxiv search with LLM summarisation and citation graphs |
| scRNA Orchestrator | Planned | Scanpy automation: QC, clustering, DE analysis, visualisation |
| Struct Predictor | Planned | AlphaFold/Boltz local structure prediction |
| Repro Enforcer | Planned | Export any analysis as Conda env + Singularity + Nextflow pipeline |
Generates a pharmacogenomic report from consumer genetic data (23andMe, AncestryDNA):
- Parses raw genetic data (auto-detects format)
- Extracts 31 pharmacogenomic SNPs across 12 genes (CYP2C19, CYP2D6, CYP2C9, VKORC1, SLCO1B1, DPYD, TPMT, UGT1A1, CYP3A5, CYP2B6, NUDT15, CYP1A2)
- Calls star alleles and determines metabolizer phenotypes
- Looks up CPIC drug recommendations for 51 medications
- Zero dependencies. Runs in < 1 second.
python pharmgx_reporter.py --input demo_patient.txt --output report
Demo result: CYP2D6 *4/*4 (Poor Metabolizer) → 10 drugs AVOID (codeine, tramadol, 7 TCAs, tamoxifen), 20 caution, 21 standard.
~7% of people are CYP2D6 Poor Metabolizers, for whom codeine gives little or no pain relief. ~0.5% carry DPYD variants where a standard 5-FU dose can be lethal. This skill catches both.
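The parsing and phenotype steps above can be sketched roughly as follows. This is not the PharmGx Reporter source. It assumes the usual 23andMe raw-data layout (tab-separated rsid, chromosome, position, genotype, with '#' comment lines) and a CPIC-style activity-score scheme for CYP2D6. The activity values and cut-offs are illustrative and should be checked against current CPIC tables; the rsID in the demo is a made-up placeholder.

```python
import io

# Illustrative CYP2D6 activity values (assumption: CPIC-style scoring,
# where *4 is a no-function allele; verify against current CPIC tables).
ACTIVITY = {"*1": 1.0, "*4": 0.0, "*10": 0.25}


def parse_23andme(handle):
    """Yield (rsid, chromosome, position, genotype) from a raw-data file."""
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # skip header comments and blank lines
        rsid, chrom, pos, genotype = line.rstrip("\n").split("\t")[:4]
        yield rsid, chrom, int(pos), genotype


def cyp2d6_phenotype(diplotype: str) -> str:
    """Map a diplotype such as '*4/*4' to a metabolizer phenotype."""
    score = sum(ACTIVITY[allele] for allele in diplotype.split("/"))
    if score == 0:
        return "Poor Metabolizer"
    if score < 1.25:
        return "Intermediate Metabolizer"
    if score <= 2.25:
        return "Normal Metabolizer"
    return "Ultrarapid Metabolizer"


# Tiny in-memory demo file; 'rs0000001' is a hypothetical identifier.
demo = io.StringIO(
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs0000001\t22\t12345\tAG\n"
)
records = list(parse_23andme(demo))
phenotype = cyp2d6_phenotype("*4/*4")
```

Alleles missing from the ACTIVITY table raise a KeyError here, which is deliberate: guessing a function class for an unknown star allele is exactly the failure mode the skill is meant to avoid.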
Runs principal component analysis on your cohort against the SGDP reference panel (345 samples, 164 global populations):
- Contig normalisation (chr1 vs 1)
- IBD removal (related individuals filtered)
- Common biallelic SNPs only
- Confidence ellipses per population
- Publication-quality 4-panel figure generated instantly
python ancestry_pca.py --demo --output ancestry_report
Demo result: 736 Peruvian samples across 28 indigenous populations. Amazonian groups (Matzes, Awajun, Candoshi) sit in genetic space that no SGDP population occupies: genuinely underrepresented, not just in GWAS but in the reference panels themselves.
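Two of the preprocessing steps listed above, contig normalisation and the common-biallelic-SNP filter, can be sketched in plain Python. The Variant record, the function names, and the 0.05 minor-allele-frequency cut-off are assumptions made for this illustration; the skill's own thresholds and data structures are not stated in this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    contig: str
    position: int
    ref: str
    alts: tuple
    alt_frequency: float  # frequency of the first ALT allele


def normalise_contig(name: str) -> str:
    """Strip a leading 'chr' so that 'chr1' and '1' compare equal."""
    return name[3:] if name.lower().startswith("chr") else name


def is_common_biallelic_snp(v: Variant, min_maf: float = 0.05) -> bool:
    """Keep single-base REF/ALT sites with exactly one ALT and MAF >= cut-off."""
    single_base = len(v.ref) == 1 and len(v.alts) == 1 and len(v.alts[0]) == 1
    maf = min(v.alt_frequency, 1.0 - v.alt_frequency)
    return single_base and maf >= min_maf


def shared_sites(cohort, reference, min_maf: float = 0.05):
    """Return sorted (contig, position) keys that pass the filter in both panels."""
    def keys(variants):
        return {
            (normalise_contig(v.contig), v.position)
            for v in variants
            if is_common_biallelic_snp(v, min_maf)
        }
    return sorted(keys(cohort) & keys(reference))
```

This sketch matches sites on position only. A fuller implementation would also compare REF/ALT alleles and handle strand flips and the chrM versus MT naming difference before running PCA on the shared sites.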
Computes a Semantic Isolation Index for diseases using 13.1M PubMed abstracts and PubMedBERT embeddings (768-dim):
- SII (Semantic Isolation Index): higher = more isolated in literature
- KTP (Knowledge Transfer Potential): higher = more cross-disease spillover
- RCC (Research Clustering Coefficient): diversity of research approaches
- Temporal Drift: how research focus evolves over time
- Publication-quality 4-panel figure
python semantic_sim.py --demo --output sem_report
Key finding: Neglected tropical diseases are 38% more semantically isolated (P < 0.0001, Cohen's d = 0.84). 14 of the 25 most isolated diseases are Global South priority conditions. Knowledge silos kill innovation: a malaria immunology breakthrough could help leishmaniasis, but the literatures don't talk to each other.
Corpas et al. (2026). HEIM: Health Equity Index for Measuring structural bias in biomedical research. Under review.
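For readers unfamiliar with the effect-size statistic quoted in the key finding, this is how Cohen's d with a pooled standard deviation is computed for two groups of scores. The sketch shows only the standard formula. It does not reproduce the study's data or its SII calculation, and the numbers in the usage lines are toy values.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b) -> float:
    """Standardised mean difference using the pooled sample variance."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


def relative_increase(group_a, group_b) -> float:
    """Percentage by which the mean of group_a exceeds the mean of group_b."""
    return 100.0 * (mean(group_a) - mean(group_b)) / mean(group_b)


# Toy values only, not taken from the paper.
toy_isolated = [2.0, 4.0, 6.0]
toy_other = [1.0, 3.0, 5.0]
d = cohens_d(toy_isolated, toy_other)
pct = relative_increase(toy_isolated, toy_other)
```

By the usual convention, values of d around 0.8 are read as a large effect, which is the range the reported 0.84 falls in.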
git clone https://github.com/ClawBio/ClawBio.git && cd ClawBio
pip install -r requirements.txt
python clawbio.py run pharmgx --demo
PharmGx demo runs in <2 seconds. Only needs Python 3.10+.
python clawbio.py list # See available skills
python clawbio.py run pharmgx --demo # Pharmacogenomics (1s)
python clawbio.py run equity --demo # Equity scoring (55s)
python clawbio.py run nutrigx --demo # Nutrigenomics (60s)
python clawbio.py run metagenomics --demo # Metagenomics (3s)
python clawbio.py run compare --demo # Manuel Corpas vs George Church (10s)
python clawbio.py run pharmgx --input my_23andme.txt --output results/
pip install pytest
python -m pytest
RoboTerri: ClawBio's Telegram agent, inspired by Prof. Teresa K. Attwood
ClawBio skills are also available through RoboTerri, our Telegram AI agent, named after Prof. Teresa K. Attwood, a pioneer of bioinformatics education and computational biology in the UK. Send a genetic data file or ask for a demo, and get back a summary, full report, and figures directly in Telegram.
You: [send 23andMe file]
RoboTerri: Running PharmGx Reporter...
CYP2D6 *4/*4 → Poor Metabolizer → 10 drugs AVOID
[report.md attached]
[3 figures attached]
RoboTerri auto-detects file type (23andMe .txt, AncestryDNA .csv, VCF, FASTQ) and routes to the right skill via the Bio Orchestrator. You can also ask explicitly:
- "run pharmgx demo": PharmGx with synthetic patient data
- "run equity demo": HEIM equity score with demo populations
- "run nutrigx demo": Nutrigenomics with synthetic genotypes
The integration uses the same clawbio.run_skill() API, so results are identical whether you run via CLI or Telegram. See 01-AGENTS/02-ROBOTERRI for the full agent source.
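The routing behaviour described above (explicit keywords first, file type otherwise) might look something like the sketch below. It is not the Bio Orchestrator's code. The function names, the extension-based detection, and in particular the mapping from file type to skill are assumptions made for illustration; only the skill names are taken from the CLI examples.

```python
from pathlib import Path
from typing import Optional

# Assumed routing tables; which skill handles which file type is a guess.
KEYWORD_ROUTES = {"pharmgx": "pharmgx", "equity": "equity", "nutrigx": "nutrigx"}
FILE_ROUTES = {
    "23andme": "pharmgx",
    "ancestrydna": "pharmgx",
    "vcf": "equity",
    "fastq": "metagenomics",
}


def detect_file_type(path: Path) -> str:
    """Classify a file by extension, ignoring a trailing .gz."""
    suffixes = [s.lower() for s in path.suffixes]
    if suffixes and suffixes[-1] == ".gz":
        suffixes = suffixes[:-1]
    ext = suffixes[-1] if suffixes else ""
    if ext == ".vcf":
        return "vcf"
    if ext in {".fastq", ".fq"}:
        return "fastq"
    if ext == ".csv":
        return "ancestrydna"
    if ext == ".txt":
        return "23andme"
    return "unknown"


def route(message: str = "", attachment: Optional[Path] = None) -> Optional[str]:
    """Pick a skill name: an explicit keyword wins, then the file type."""
    lowered = message.lower()
    for keyword, skill in KEYWORD_ROUTES.items():
        if keyword in lowered:
            return skill
    if attachment is not None:
        return FILE_ROUTES.get(detect_file_type(attachment))
    return None
```

Extension checks alone are fragile (a .txt file is not necessarily 23andMe data), so a real router would presumably also inspect the file header before dispatching.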
Telegram (RoboTerri)  CLI (clawbio.py)    Python (import clawbio)
       │                      │                      │
       └──────────┬──────────────────────────────────┘
                  │
           ┌──────▼──────┐
           │     Bio     │  ← routes by file type + keywords
           │ Orchestrator│
           └──────┬──────┘
                  │
┌─────────────────┼──────────────────────────────────────┐
│                 │                                      │
PharmGx    Equity    NutriGx    Metagenomics    Ancestry
Reporter   Scorer    Advisor    Profiler        PCA  ...
│                                                        │
└─────────────────┬──────────────────────────────────────┘
                  │
           ┌──────▼──────┐
           │  Markdown   │  ← report + figures + checksums
           │   Report    │    + reproducibility bundle
           └─────────────┘
Each skill is standalone: the orchestrator routes to the right one, but every skill also works independently. The clawbio.run_skill() API is importable by any agent (RoboTerri, RoboIsaac, Claude Code).
See docs/architecture.md for the full design.
We want skills from the bioinformatics community. If you work with genomics, proteomics, metabolomics, imaging, or clinical data, wrap your pipeline as a skill.
| Skill | What | Your expertise |
|---|---|---|
| claw-gwas | PLINK/REGENIE automation | Statistical genetics |
| claw-metagenomics | Kraken2/MetaPhlAn wrapper | Microbiome |
| claw-acmg | Clinical variant classification | Clinical genomics |
| claw-pathway | GO/KEGG enrichment | Functional genomics |
| claw-phylogenetics | IQ-TREE/RAxML automation | Evolutionary biology |
| claw-proteomics | MaxQuant/DIA-NN | Proteomics |
| claw-spatial | Visium/MERFISH | Spatial transcriptomics |
See CONTRIBUTING.md for the submission process and templates/SKILL-TEMPLATE.md for the skill template.
ClawBio was announced at the London Bioinformatics Meetup on 26 February 2026.
- Slides: clawbio.github.io/ClawBio/slides/
- Talk: 10 Tips for Becoming a Top 1% AI User, with live demos of all three MVP skills
If you use ClawBio in your research, please cite:
@software{clawbio_2026,
author = {Corpas, Manuel},
title = {ClawBio: An Open-Source Library of AI Agent Skills for Reproducible Bioinformatics},
year = {2026},
url = {https://github.com/ClawBio/ClawBio}
}
- Slides: clawbio.github.io/ClawBio/slides/
- OpenClaw: The agent platform
- ClawHub: Skill registry
- HEIM Index: Health Equity Index for Minorities
MIT: clone it, run it, build a skill, submit a PR.