
🦖 ClawBio

The first bioinformatics-native AI agent skill library.
Built on OpenClaw (180k+ GitHub stars). Local-first. Privacy-focused. Reproducible.



See It in Action

A community contributor built a nutrigenomics skill and ran it end to end, from raw genetic data to a personalised nutrition report with radar charts, heatmaps, and a reproducibility bundle:

https://github.com/ClawBio/ClawBio/releases/download/v0.2.0/david-nutrigx-demo.mp4

What just happened behind the scenes
  1. The AI agent read SKILL.md, a specification that encodes the correct bioinformatics decisions (40 SNPs, 13 nutrient domains, evidence-based risk thresholds)
  2. It ran the Python skill locally, so no genetic data left the machine
  3. It produced a markdown report with figures, tables, and a reproducibility bundle (commands.sh, environment.yml, checksums.sha256)
  4. Anyone can re-run the exact same analysis and get identical results, SHA-256 verified

ClawBio PharmGx Demo
PharmGx Reporter: 12 genes, 51 drugs, under 1 second


ClawBio at the UK AI Agent Hack, Imperial College London

Manuel Corpas introduces ClawBio to Peter Steinberger at the UK AI Agent Hack (1 March 2026):

ClawBio intro to Peter Steinberger
Click to watch on YouTube


The Problem

You read a paper. You want to reproduce Figure 3. So you:

  1. Go to GitHub. Clone the repo.
  2. Wrong Python version. Fix dependencies.
  3. Need the reference data. Where is it?
  4. Download 2GB from Zenodo. Link is dead.
  5. Email the first author. Wait 3 weeks.
  6. Paths are hardcoded to /home/jsmith/data/.
  7. Two days later: still broken. You give up.

Now imagine the same paper published a skill:

python ancestry_pca.py --demo --output fig3
# Figure 3 reproduced. Identical. SHA-256 verified. 30 seconds.

That's ClawBio. Every figure in your paper should be one command away from reproduction.


🦖 What Is ClawBio?

A skill is a domain expert's knowledge, frozen into code, that an AI agent executes correctly every time.

ChatGPT / Claude  = a smart generalist who guesses at bioinformatics
🦖 ClawBio skill  = a domain expert's proven pipeline that the AI executes
  • Local-first: Your genomic data never leaves your laptop. No cloud uploads, no data exfiltration.
  • Reproducible: Every analysis exports commands.sh, environment.yml, and SHA-256 checksums. Anyone can reproduce it without the agent.
  • Modular: Each skill is a self-contained directory (SKILL.md + Python scripts) that plugs into the orchestrator.
  • MIT licensed: Open-source, free, community-driven.

Why Not Just Use ChatGPT?

Ask Claude to "profile my pharmacogenes from this 23andMe file." It'll write plausible Python. But:

  • It hallucinates star allele calls and uses outdated CPIC guidelines
  • It forgets CYP2D6 *4 is no-function (not reduced)
  • You spend 45 minutes debugging its output
  • No reproducibility bundle. No audit log. No checksums.

ClawBio encodes the correct bioinformatics decisions so the agent gets it right first time, every time.


πŸ” Provenance & Reproducibility

Every ClawBio analysis ships with a reproducibility bundle, not as an afterthought but as part of the output:

report/
├── report.md              # Full analysis with figures and tables
├── figures/               # Publication-quality PNGs
├── tables/                # CSV data tables
├── commands.sh            # Exact commands to reproduce
├── environment.yml        # Conda environment snapshot
└── checksums.sha256       # SHA-256 of every input and output file

Why this matters: a reviewer can re-run your analysis in 30 seconds. A collaborator can reproduce your Figure 3 without emailing you. Future-you can regenerate results two years later from the same bundle.
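
As a rough illustration of how a checksums.sha256 file can be produced and checked, here is a minimal sketch using only the Python standard library. The function names and the `<digest>  <relative path>` line layout (the layout used by the `sha256sum` tool) are assumptions for this example; ClawBio's own bundle writer may differ.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(bundle_dir: Path, manifest_name: str = "checksums.sha256") -> Path:
    """Write one '<digest>  <relative path>' line per file in the bundle."""
    manifest = bundle_dir / manifest_name
    lines = []
    for path in sorted(bundle_dir.rglob("*")):
        if path.is_file() and path != manifest:
            rel_path = path.relative_to(bundle_dir).as_posix()
            lines.append(f"{sha256_of(path)}  {rel_path}")
    manifest.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return manifest


def verify_manifest(manifest: Path) -> list[str]:
    """Return the relative paths that are missing or whose digest changed."""
    mismatches = []
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        expected, rel_path = line.split(maxsplit=1)
        target = manifest.parent / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            mismatches.append(rel_path)
    return mismatches
```

Calling `verify_manifest(Path("report/checksums.sha256"))` returns an empty list when every listed file still matches its recorded digest.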


🦖 Skills

| Skill | Status | Description |
|---|---|---|
| Bio Orchestrator | MVP | Routes requests to the right skill |
| PharmGx Reporter | MVP | 12 genes, 51 drugs, CPIC guidelines |
| Equity Scorer | MVP | HEIM diversity metrics from VCF/ancestry |
| NutriGx Advisor | MVP | Personalised nutrigenomics (40 SNPs, 13 domains) |
| Metagenomics Profiler | MVP | Kraken2/RGI/HUMAnN3 taxonomy + resistome |
| Ancestry PCA | MVP | PCA vs SGDP (345 samples, 164 populations) |
| Semantic Similarity | MVP | Isolation Index from 13.1M PubMed abstracts |
| Genome Comparator | MVP | IBS vs George Church (PGP-1) + ancestry estimation |
| VCF Annotator | Planned | Variant annotation with VEP, ClinVar, gnomAD + ancestry context |
| Lit Synthesizer | Planned | PubMed/bioRxiv search with LLM summarisation and citation graphs |
| scRNA Orchestrator | Planned | Scanpy automation: QC, clustering, DE analysis, visualisation |
| Struct Predictor | Planned | AlphaFold/Boltz local structure prediction |
| Repro Enforcer | Planned | Export any analysis as Conda env + Singularity + Nextflow pipeline |

🦖 MVP Skills in Detail

PharmGx Reporter: Personal Scale

Generates a pharmacogenomic report from consumer genetic data (23andMe, AncestryDNA):

  • Parses raw genetic data (auto-detects format)
  • Extracts 31 pharmacogenomic SNPs across 12 genes (CYP2C19, CYP2D6, CYP2C9, VKORC1, SLCO1B1, DPYD, TPMT, UGT1A1, CYP3A5, CYP2B6, NUDT15, CYP1A2)
  • Calls star alleles and determines metabolizer phenotypes
  • Looks up CPIC drug recommendations for 51 medications
  • Zero dependencies. Runs in < 1 second.
python pharmgx_reporter.py --input demo_patient.txt --output report
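
To make the parsing step concrete, here is a minimal sketch of reading a 23andMe-style raw data export, assuming the usual tab-separated layout of rsid, chromosome, position, and genotype with `#` comment lines. The function name is hypothetical and the rsIDs in the demo string are placeholders, not real pharmacogenomic markers; the skill's actual parser also handles other vendors.

```python
import io
from typing import Dict, TextIO


def parse_23andme(handle: TextIO) -> Dict[str, str]:
    """Map rsid -> genotype from a 23andMe-style raw data export.

    Expects tab-separated columns: rsid, chromosome, position, genotype.
    Lines starting with '#' are comments; '--' marks a no-call and is skipped.
    """
    genotypes: Dict[str, str] = {}
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 4:
            continue  # not a 23andMe data row (for example another vendor's format)
        rsid, _chrom, _pos, genotype = fields
        if genotype != "--":
            genotypes[rsid] = genotype
    return genotypes


# Placeholder rows for illustration only.
demo = io.StringIO(
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs0000001\t1\t1000\tAG\n"
    "rs0000002\t2\t2000\t--\n"
)
print(parse_23andme(demo))  # {'rs0000001': 'AG'}
```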

Demo result: CYP2D6 *4/*4 (Poor Metabolizer) → 10 drugs AVOID (codeine, tramadol, 7 TCAs, tamoxifen), 20 caution, 21 standard.

~7% of people are CYP2D6 Poor Metabolizers, for whom codeine gives little or no pain relief. ~0.5% carry DPYD variants for which a standard 5-FU dose can be lethal. This skill catches both.
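
The step from a called diplotype to a metabolizer phenotype can be sketched with an activity-score lookup. This is an illustration, not the skill's implementation: the allele activity values and the score cut-offs below are assumed from the CPIC/DPWG consensus scoring for CYP2D6 and cover only a handful of alleles, so check them against the current CPIC tables before relying on them.

```python
# Assumed activity values for a small subset of CYP2D6 star alleles.
# *4 and *5 are no-function alleles (value 0), not reduced-function.
ACTIVITY = {"*1": 1.0, "*2": 1.0, "*4": 0.0, "*5": 0.0, "*10": 0.25, "*41": 0.5}


def cyp2d6_phenotype(diplotype: str) -> str:
    """Translate a diplotype such as '*4/*4' into a metabolizer phenotype."""
    allele_a, allele_b = diplotype.split("/")
    score = ACTIVITY[allele_a] + ACTIVITY[allele_b]
    if score == 0:
        return "Poor Metabolizer"
    if score < 1.25:
        return "Intermediate Metabolizer"
    if score <= 2.25:
        return "Normal Metabolizer"
    return "Ultrarapid Metabolizer"


print(cyp2d6_phenotype("*4/*4"))  # Poor Metabolizer
print(cyp2d6_phenotype("*1/*4"))  # Intermediate Metabolizer
```

Treating `*4` as a value of 0 rather than a reduced value is exactly the kind of decision a skill pins down so that it is not re-guessed on each run.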

Ancestry PCA: Population Scale

Runs principal component analysis on your cohort against the SGDP reference panel (345 samples, 164 global populations):

  • Contig normalisation (chr1 vs 1)
  • IBD removal (related individuals filtered)
  • Common biallelic SNPs only
  • Confidence ellipses per population
  • Publication-quality 4-panel figure generated instantly
python ancestry_pca.py --demo --output ancestry_report

Demo result: 736 Peruvian samples across 28 indigenous populations. Amazonian groups (Matzes, Awajun, Candoshi) sit in genetic space that no SGDP population occupies: they are genuinely underrepresented, not just in GWAS but in the reference panels themselves.
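
Two of the preprocessing steps listed above, contig normalisation and the biallelic SNP filter, are simple enough to sketch directly. The helper names are hypothetical and the rules are one reasonable reading of those bullets; allele-frequency filtering ("common") and IBD removal are not shown.

```python
def normalise_contig(name: str) -> str:
    """Strip a leading 'chr' so that 'chr1' and '1' compare equal."""
    stripped = name[3:] if name.lower().startswith("chr") else name
    # Mitochondrial contigs appear as 'chrM' or 'MT' depending on the reference.
    return "MT" if stripped in ("M", "m") else stripped


def is_biallelic_snp(ref: str, alt: str) -> bool:
    """True for a single-base REF with exactly one single-base ALT allele."""
    bases = {"A", "C", "G", "T"}
    return ref.upper() in bases and alt.upper() in bases


print(normalise_contig("chr1"), normalise_contig("1"))  # 1 1
print(is_biallelic_snp("A", "G"))    # True
print(is_biallelic_snp("A", "G,T"))  # False (multiallelic site)
print(is_biallelic_snp("AT", "A"))   # False (deletion, not a SNP)
```

Normalising contig names before intersecting a cohort with the reference panel avoids silently dropping every variant when one file uses `chr1` and the other uses `1`.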

Semantic Similarity Index: Systemic Scale

Computes a Semantic Isolation Index for diseases using 13.1M PubMed abstracts and PubMedBERT embeddings (768-dim):

  • SII (Semantic Isolation Index): higher = more isolated in literature
  • KTP (Knowledge Transfer Potential): higher = more cross-disease spillover
  • RCC (Research Clustering Coefficient): diversity of research approaches
  • Temporal Drift: how research focus evolves over time
  • Publication-quality 4-panel figure
python semantic_sim.py --demo --output sem_report

Key finding: Neglected tropical diseases are 38% more semantically isolated (P < 0.0001, Cohen's d = 0.84). 14 of the 25 most isolated diseases are Global South priority conditions. Knowledge silos kill innovation: a malaria immunology breakthrough could help leishmaniasis, but the literatures don't talk to each other.

Corpas et al. (2026). HEIM: Health Equity Index for Measuring structural bias in biomedical research. Under review.
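
For readers unfamiliar with the effect size quoted in the key finding, Cohen's d is the difference between two group means divided by a pooled standard deviation. The sketch below assumes the common pooled-variance form; the source does not say which variant was used, and the numbers are toy values, not HEIM data.

```python
import math
import statistics
from typing import Sequence


def cohens_d(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Cohen's d using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


# Toy values: means 4 and 3, both sample variances 4, so pooled SD is 2.
print(cohens_d([2, 4, 6], [1, 3, 5]))  # 0.5
```

By the usual rule of thumb, values around 0.8 and above are read as large effects.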


Quick Start

git clone https://github.com/ClawBio/ClawBio.git && cd ClawBio
pip install -r requirements.txt
python clawbio.py run pharmgx --demo

PharmGx demo runs in <2 seconds. Only needs Python 3.10+.

Try all skills

python clawbio.py list                          # See available skills
python clawbio.py run pharmgx --demo            # Pharmacogenomics (1s)
python clawbio.py run equity --demo             # Equity scoring (55s)
python clawbio.py run nutrigx --demo            # Nutrigenomics (60s)
python clawbio.py run metagenomics --demo       # Metagenomics (3s)
python clawbio.py run compare --demo            # Manuel Corpas vs George Church (10s)

Run with your own data

python clawbio.py run pharmgx --input my_23andme.txt --output results/
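
If you want to drive the same command from a script, one option is a thin wrapper around the CLI. This is a sketch under stated assumptions: it uses only the `run`, `--demo`, `--input`, and `--output` arguments shown in this README, must be run from the repository root, and the helper names `build_command` and `run_skill_cli` are hypothetical (they are not the `clawbio.run_skill()` API).

```python
import subprocess
import sys
from typing import List, Optional


def build_command(
    skill: str,
    demo: bool = False,
    input_path: Optional[str] = None,
    output_dir: Optional[str] = None,
) -> List[str]:
    """Assemble the argument list for 'python clawbio.py run <skill> ...'."""
    cmd = [sys.executable, "clawbio.py", "run", skill]
    if demo:
        cmd.append("--demo")
    if input_path:
        cmd += ["--input", input_path]
    if output_dir:
        cmd += ["--output", output_dir]
    return cmd


def run_skill_cli(skill: str, **kwargs) -> int:
    """Run the skill as a subprocess and return its exit code."""
    return subprocess.run(build_command(skill, **kwargs), check=False).returncode
```

For example, `run_skill_cli("pharmgx", input_path="my_23andme.txt", output_dir="results/")` issues the same command as the line above.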

Run tests

pip install pytest
python -m pytest

Run via Telegram (RoboTerri)


ClawBio skills are also available through RoboTerri, our Telegram AI agent, named after Prof. Teresa K. Attwood, a pioneer of bioinformatics education and computational biology in the UK. Send a genetic data file or ask for a demo and get back a summary, full report, and figures directly in Telegram.

You:        [send 23andMe file]
RoboTerri:  Running PharmGx Reporter...
            CYP2D6 *4/*4: Poor Metabolizer → 10 drugs AVOID
            [report.md attached]
            [3 figures attached]

RoboTerri auto-detects file type (23andMe .txt, AncestryDNA .csv, VCF, FASTQ) and routes to the right skill via the Bio Orchestrator. You can also ask explicitly:

  • "run pharmgx demo": PharmGx with synthetic patient data
  • "run equity demo": HEIM equity score with demo populations
  • "run nutrigx demo": Nutrigenomics with synthetic genotypes

The integration uses the same clawbio.run_skill() API, so results are identical whether you run via CLI or Telegram. See 01-AGENTS/02-ROBOTERRI for the full agent source.


🦖 Architecture

Telegram (RoboTerri)     CLI (clawbio.py)     Python (import clawbio)
         │                      │                       │
         └──────────┬───────────┘───────────────────────┘
                    │
             ┌──────▼──────┐
             │  Bio         │  ← routes by file type + keywords
             │  Orchestrator│
             └──────┬──────┘
                    │
  ┌─────────────────▼───────────────────────────────────────┐
  │                                                         │
  PharmGx    Equity     NutriGx    Metagenomics   Ancestry
  Reporter   Scorer     Advisor    Profiler        PCA    ...
  │                                                         │
  └─────────────────┬───────────────────────────────────────┘
                    │
             ┌──────▼──────┐
             │  Markdown    │  ← report + figures + checksums
             │  Report      │     + reproducibility bundle
             └─────────────┘

Each skill is standalone: the orchestrator routes to the right one, but every skill also works independently. The clawbio.run_skill() API is importable by any agent (RoboTerri, RoboIsaac, Claude Code).

See docs/architecture.md for the full design.
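
Routing "by file type + keywords" can be pictured with a small lookup. The sketch below is hypothetical: the routing tables, the keyword-before-extension priority, and the `route` function are assumptions for illustration, and the real Bio Orchestrator may use different rules. Only the skill names (`pharmgx`, `equity`, `nutrigx`, `metagenomics`) come from the CLI examples in this README.

```python
from pathlib import Path
from typing import Optional

# Assumed mappings, for illustration only.
KEYWORD_ROUTES = {
    "pharmgx": "pharmgx",
    "equity": "equity",
    "nutri": "nutrigx",
    "metagenom": "metagenomics",
}
EXTENSION_ROUTES = {".vcf": "equity", ".fastq": "metagenomics", ".fq": "metagenomics"}
CONSUMER_GENOTYPE_EXTENSIONS = {".txt", ".csv"}  # 23andMe / AncestryDNA exports


def route(filename: Optional[str] = None, message: str = "") -> Optional[str]:
    """Pick a skill name from an explicit keyword, else from the file extension."""
    text = message.lower()
    for keyword, skill in KEYWORD_ROUTES.items():
        if keyword in text:
            return skill
    if filename:
        name = filename.lower()
        if name.endswith(".gz"):
            name = name[:-3]  # look through gzip compression
        suffix = Path(name).suffix
        if suffix in EXTENSION_ROUTES:
            return EXTENSION_ROUTES[suffix]
        if suffix in CONSUMER_GENOTYPE_EXTENSIONS:
            return "pharmgx"
    return None  # nothing matched; a real agent would ask the user


print(route(message="run equity demo"))   # equity
print(route(filename="sample.fastq.gz"))  # metagenomics
print(route(filename="my_23andme.txt"))   # pharmgx
```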


Community Wanted Skills 🦖

We want skills from the bioinformatics community. If you work with genomics, proteomics, metabolomics, imaging, or clinical data, wrap your pipeline as a skill.

| Skill | What | Your expertise |
|---|---|---|
| claw-gwas | PLINK/REGENIE automation | Statistical genetics |
| claw-metagenomics | Kraken2/MetaPhlAn wrapper | Microbiome |
| claw-acmg | Clinical variant classification | Clinical genomics |
| claw-pathway | GO/KEGG enrichment | Functional genomics |
| claw-phylogenetics | IQ-TREE/RAxML automation | Evolutionary biology |
| claw-proteomics | MaxQuant/DIA-NN | Proteomics |
| claw-spatial | Visium/MERFISH | Spatial transcriptomics |

See CONTRIBUTING.md for the submission process and templates/SKILL-TEMPLATE.md for the skill template.
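
As a starting point before reading the template, a skill's entry script might look roughly like the sketch below. It mirrors the `--demo`, `--input`, and `--output` flags used by the existing skills in this README; everything else (function names, the default output directory, the report contents) is hypothetical, and templates/SKILL-TEMPLATE.md remains the authoritative reference.

```python
import argparse
from pathlib import Path
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Command-line interface with the flags used by the skills in this README."""
    parser = argparse.ArgumentParser(description="Hypothetical ClawBio-style skill")
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--demo", action="store_true", help="run on bundled demo data")
    source.add_argument("--input", type=Path, help="path to the user's data file")
    parser.add_argument("--output", type=Path, default=Path("report"),
                        help="directory for the report")
    return parser


def main(argv: Optional[List[str]] = None) -> Path:
    """Parse arguments, run the (placeholder) analysis, and write report.md."""
    args = build_parser().parse_args(argv)
    args.output.mkdir(parents=True, exist_ok=True)
    source = "demo data" if args.demo else str(args.input)
    report = args.output / "report.md"
    # A real skill would run its pipeline here and also write the
    # reproducibility bundle (commands.sh, environment.yml, checksums.sha256).
    report.write_text(f"# Example skill report\n\nInput: {source}\n", encoding="utf-8")
    return report
```

Calling `main(["--demo", "--output", "my_report"])` writes `my_report/report.md`; a real script would end with the usual `if __name__ == "__main__": main()` guard.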


Presentation

ClawBio was announced at the London Bioinformatics Meetup on 26 February 2026.


Citation

If you use ClawBio in your research, please cite:

@software{clawbio_2026,
  author = {Corpas, Manuel},
  title = {ClawBio: An Open-Source Library of AI Agent Skills for Reproducible Bioinformatics},
  year = {2026},
  url = {https://github.com/ClawBio/ClawBio}
}

License

MIT: clone it, run it, build a skill, submit a PR. 🦖
