hgbrian/biomodals

biomodals

Bioinformatics tools running on modal.

install and set up modal

pip install modal
python3 -m modal setup

Alternatively, run an app through uv without installing modal first, e.g.:

uv run --with modal modal run modal_minimap2.py

Apps

Sorted alphabetically.

  • AF2Rank — structure ranking via AF2 prediction
  • AFDesign — peptide/binder design via AlphaFold2
  • AlphaFold-Multimer — multimer structure prediction
  • ANARCI — antibody sequence annotation
  • BindCraft — protein binder design
  • Boltz — AF3-like open structure prediction
  • BoltzGen — generative structure model
  • Chai-1 — AF3-like open structure prediction
  • DiffDock — small molecule docking
  • ESM2 — masked amino acid prediction
  • ESMFold2 — single-sequence / complex structure prediction
  • ESMFold2 binder design — gradient-guided sequence-only binder design
  • FASPR — side-chain packing
  • Germinal — binder design
  • IgGM — antibody design
  • LigandMPNN — protein sequence design conditioned on ligands
  • mBER — VHH nanobody design
  • minimap2 — short-read alignment
  • nextflow — nextflow hello-world image
  • pdb2png — PDB → PNG rendering via pymol
  • Protenix — AF3 reproduction
  • RSO — Rejection Sampling Optimization binder design
  • SASA — solvent-accessible surface area
  • tmol — GPU Rosetta energy scoring
  • USalign — TM-score / RMSD structural alignment

AF2Rank

wget https://files.rcsb.org/download/1YWI.pdb
uv run --with modal modal run modal_af2rank.py --input-pdb 1YWI.pdb --run-name 1YWI

AFDesign

Create a cyclic peptide against a pdb file (using pdb-redo data by default)

uv run --with modal modal run modal_afdesign.py --pdb 4MZK --target-chain A

Set the first and last amino acids of the (cyclic) peptide to cysteine. This example uses only a few iterations for speed; use --soft-iters 30 --hard-iters 6 or more for better results.

uv run --with modal modal run modal_afdesign.py --pdb 1A00 --target-chain A --soft-iters 2 --hard-iters 2 --binder-len 6 --set-fixed-aas C....C
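
In the command above, the --set-fixed-aas pattern C....C has the same number of characters as --binder-len 6. Assuming the pattern is meant to match the binder length (the README does not state this explicitly), a small helper can build it for other lengths. cyclic_cys_pattern is a hypothetical name for illustration, not part of biomodals, and the final command is only printed, not run.

```shell
# cyclic_cys_pattern LEN: cysteine at both ends, '.' (free position) in between.
# Hypothetical helper; assumes the pattern length should equal --binder-len.
cyclic_cys_pattern() {
    len=$1
    if [ "$len" -lt 2 ]; then
        echo "binder length must be at least 2" >&2
        return 1
    fi
    inner=''
    i=2
    # Append one '.' for each of the len-2 interior positions.
    while [ "$i" -lt "$len" ]; do
        inner="${inner}."
        i=$((i + 1))
    done
    printf 'C%sC\n' "$inner"
}

BINDER_LEN=6
PATTERN=$(cyclic_cys_pattern "$BINDER_LEN")
echo "$PATTERN"   # prints C....C

# Print (not run) the matching modal command.
echo "uv run --with modal modal run modal_afdesign.py --pdb 1A00 --target-chain A --binder-len $BINDER_LEN --set-fixed-aas $PATTERN"
```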

Create a linear peptide against a local PDB file that has been manually edited. Manual editing is sometimes necessary, for example when a chain is too long or the file contains too many chains.

uv run --with modal modal run modal_afdesign.py --pdb in/afdesign/1igy_cropped.fixed.pdb --target-chain B

AlphaFold-Multimer

A very basic implementation.

wget https://www.rcsb.org/fasta/entry/3NIT -O 3NIT.faa
uv run --with modal modal run modal_alphafold.py --input-faa 3NIT.faa

ANARCI

A tool for annotating antibody sequences: https://github.com/oxpig/ANARCI

printf '>test_anarci\nDIQMTQSPSSLSASVGDRVTITCRASQDVNTAVAWYQQKPGKAPKLLIYSASFLESGVPSRFSGSRSGTDFTLTISSLQPEDFATYYCQQHYTTPPTFGQGTKVEIKRT\n' > test_anarci.faa
uv run --with modal modal run modal_anarci.py --input-faa test_anarci.faa

BindCraft

Basic PDL1 binder (example from https://github.com/martinpacesa/BindCraft)

wget https://raw.githubusercontent.com/martinpacesa/BindCraft/refs/heads/main/example/PDL1.pdb
GPU=A100 uv run --with modal modal run modal_bindcraft.py --input-pdb PDL1.pdb --number-of-final-designs 1

Boltz

Boltz, an open source AlphaFold 3-like model.

printf 'sequences:\n    - protein:\n        id: A\n        sequence: TDKLIFGKGTRVTVEP\n' > test_boltz.yaml
uv run --with modal modal run modal_boltz.py --input-yaml test_boltz.yaml --params-str "--seed 42"

BoltzGen

BoltzGen, a generative model for biomolecular structures.

wget https://raw.githubusercontent.com/HannesStark/boltzgen/refs/heads/main/example/vanilla_protein/1g13.cif
wget https://raw.githubusercontent.com/HannesStark/boltzgen/refs/heads/main/example/vanilla_protein/1g13prot.yaml
uv run --with modal modal run modal_boltzgen.py --input-yaml 1g13prot.yaml --num-designs 1

Chai-1

Chai-1, another open source AlphaFold 3-like model.

printf '>protein|name=insulin\nMAWTPLLLLLLSHCTGSLSQPVLTQPTSLSASPGASARFTCTLRSGINVGTYRIYWYQQKPGSLPRYLLRYKSDSDKQGSGVPSRFSGSKDASTNAGLLLISGLQSEDEADYYCAIWYSSTS\n>RNA|name=rna\nACUGACUGGAAGUCCCCCGUAGUACCCGACG\n>ligand|name=tyrosine\nN[C@@H](Cc1ccc(O)cc1)C(=O)O\n' > test_chai1.faa
uv run --with modal modal run modal_chai1.py --input-faa test_chai1.faa

DiffDock

WARNING: DiffDock's image build is very slow (downloads ~3GB of ESM2 + DiffDock models).

Dock a .mol2 file against a local pdb file. DiffDock may require an 80GB A100 to run for larger proteins.

wget https://files.rcsb.org/download/1IGY.pdb
wget https://gist.github.com/hgbrian/393ec799893cbf518f3084847c17cb2d/raw/1IGY_example.mol2
uv run --with modal modal run modal_diffdock.py --pdb-file 1IGY.pdb --mol2-file 1IGY_example.mol2

ESM2 (masked-position prediction)

Predict the amino acid at a masked position in a sequence.

printf '>1\nMA<mask>GMT\n' > test_esm2.faa
uv run --with modal modal run modal_esm2_predict_masked.py --input-faa test_esm2.faa
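
To mask a chosen position in a full sequence rather than typing <mask> by hand, something like the sketch below may help. mask_position is a hypothetical helper, not part of biomodals, and the unmasked input MAKGMT is an assumed example (the README only shows the already-masked form).

```shell
# mask_position SEQ POS: replace the 1-based residue POS of SEQ with <mask>.
# Hypothetical helper for illustration.
mask_position() {
    seq_in=$1
    pos=$2
    if [ "$pos" -lt 1 ] || [ "$pos" -gt "${#seq_in}" ]; then
        echo "position $pos is outside the sequence" >&2
        return 1
    fi
    # Keep everything before POS, insert the mask token, keep everything after.
    printf '%s\n' "$seq_in" |
        awk -v p="$pos" '{ print substr($0, 1, p - 1) "<mask>" substr($0, p + 1) }'
}

mask_position MAKGMT 3   # prints MA<mask>GMT

# Write a FASTA file in the same shape as the example above.
printf '>1\n%s\n' "$(mask_position MAKGMT 3)" > test_esm2.faa
```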

ESMFold2

ESMFold2 (Biohub) — single-sequence / complex structure prediction. No MSA required (PLM-only). Multi-entity FASTA: header type tags protein|, dna|, rna|, ligand| (SMILES for ligand) are honored.

printf '>protein|name=insulin\nGIVEQCCTSICSLYQLENYCN\n' > test_esmfold2.faa
uv run --with modal modal run modal_esmfold2.py --input-faa test_esmfold2.faa
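
For a complex, each entity gets its own tagged header. The sketch below writes a two-chain input (the insulin A chain from the example above plus the human insulin B chain) and checks that every header starts with one of the four type tags listed above. The file name, the name= values, and the check_entity_tags helper are assumptions for illustration; the helper is not part of biomodals.

```shell
# Two protein entities in one FASTA file, one header per chain.
cat > test_esmfold2_complex.faa << 'EOF'
>protein|name=insulin_A
GIVEQCCTSICSLYQLENYCN
>protein|name=insulin_B
FVNQHLCGSHLVEALYLVCGERGFFYTPKT
EOF

# check_entity_tags FILE: exit 0 only if every header line starts with
# protein|, dna|, rna| or ligand|. Hypothetical helper.
check_entity_tags() {
    awk 'BEGIN { bad = 0 }
         /^>/ && !/^>(protein|dna|rna|ligand)\|/ { bad = 1 }
         END { exit bad }' "$1"
}

if check_entity_tags test_esmfold2_complex.faa; then
    echo "headers look OK"
else
    echo "unrecognized entity tag in a header" >&2
fi

# Then, as above:
# uv run --with modal modal run modal_esmfold2.py --input-faa test_esmfold2_complex.faa
```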

ESMFold2 binder design

Gradient-guided binder sequence design using ESMFold2 (folding/distogram) + ESMC (LM regularization), adapted from Biohub's cookbook binder_design.py. Sequence-only — no target PDB or hotspots required. Built-in preset targets (cd45, ctla4, egfr, pd-l1, pdgfr) and binder scaffolds (minibinder, trastuzumab_framework_vhvl, atezolizumab_framework_vhvl, ocankitug_framework_vhvl).

# Minibinder against PD-L1 (preset target + preset scaffold)
uv run --with modal modal run modal_esmfold2_binder_design.py --target-name pd-l1 --binder-name minibinder

# Trastuzumab-framework antibody against CTLA-4
uv run --with modal modal run modal_esmfold2_binder_design.py \
    --target-name ctla4 --binder-name trastuzumab_framework_vhvl --is-antibody

# Custom target + custom template ('#' = designable position)
uv run --with modal modal run modal_esmfold2_binder_design.py \
    --target-sequence AFTVTVPKDLYVVEYGSNMTIECKFPVEKQLDLAALIVYWEMEDKNIIQFVHGEEDLKVQHSSYRQRARLLKDQLSLGNAALQITDVKLQDAGVYRCMISYGGADYKRITVKVNA \
    --binder-sequence "############################################################"

FASPR (side-chain packing)

FASPR — fast and accurate side-chain packing. Repacks side chains of a PDB (requires complete main-chain atoms) and can introduce mutations via a sequence file.

wget https://files.rcsb.org/download/1CRN.pdb
uv run --with modal modal run modal_faspr.py --input-pdb 1CRN.pdb

Germinal

Germinal took some serious hacking to get working. It seems to work ok but buyer beware. I recommend using BoltzGen instead. Unlike some other apps here, it creates a Volume to store params instead of storing them in the image.

# Get the PD-L1 structure from RCSB PDB
wget https://files.rcsb.org/download/5O45.pdb

# Extract chain A only
grep "^ATOM.*\ A\ " 5O45.pdb > 5O45_chainA.pdb

# Create target configuration file
cat > target_example.yaml << 'EOF'
target_name: "5O45"
target_pdb_path: "5O45_chainA.pdb"
target_chain: "A"
binder_chain: "B"
target_hotspots: "A19,A20,A21,A22"
length: 129
EOF

# Run minimal test (1 trajectory, 1 passing design)
uv run --with modal --with PyYAML modal run modal_germinal.py --target-yaml target_example.yaml --max-trajectories 1 --max-passing-designs 1
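
The grep pattern used above to extract chain A matches " A " anywhere on an ATOM line, so it could also pick up, for example, nucleotide records whose residue name is A in another chain. A stricter alternative reads the chain ID from column 22 of the fixed-width PDB format. extract_chain is a hypothetical helper for illustration, not part of biomodals.

```shell
# extract_chain PDB CHAIN: print ATOM records whose chain ID (column 22 of
# the fixed-width PDB format) equals CHAIN. Hypothetical helper.
extract_chain() {
    awk -v chain="$2" \
        'substr($0, 1, 6) == "ATOM  " && substr($0, 22, 1) == chain' "$1"
}

# Usage, if 5O45.pdb has already been downloaded as above.
if [ -f 5O45.pdb ]; then
    extract_chain 5O45.pdb A > 5O45_chainA.pdb
fi
```

Like the grep version, this keeps only ATOM records and drops HETATM, TER and header lines.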

IgGM

IgGM, an antibody design model.

wget https://files.rcsb.org/download/5O45.pdb
printf '>H\nEVQLVESGGGLVQPGGSLRLSCAASGFTFSSYAMSWVRQAPGKGLEWVSAISSGGSTYYADSVKGRFTISRDNSKNTLYLQMNSLRAEDTAVYYCAKDRLSITIRPRYYGLDVWGQGTTVTVSS\n>L\nDIQMTQSPSSLSASVGDRVTITCRASQDVNTAVAWYQQKPGKAPKLLIYSASFLYSGVPSRFSGSRSGTDFTLTISSLQPEDFATYYCQQHYTTPPTFGQGTKVEIKRT\n>A\n' > test_iggm.faa
uv run --with modal modal run modal_iggm.py --input-faa test_iggm.faa --antigen 5O45.pdb --epitope 19,20,21

LigandMPNN

wget https://files.rcsb.org/download/1IVO.pdb
uv run --with modal modal run modal_ligandmpnn.py --input-pdb 1IVO.pdb --extract-chains AC --params-str '--seed 1 --checkpoint_protein_mpnn "/LigandMPNN/model_params/proteinmpnn_v_48_020.pt"  --chains_to_design "C" --save_stats 1'

mBER (VHH nanobody design)

Design VHH nanobody binders against a target protein using mBER.

wget https://files.rcsb.org/download/7STF.pdb
uv run --with modal modal run modal_mber.py --target-pdb 7STF.pdb --target-name PDL1

With custom masked sequence (* = positions to design):

uv run --with modal modal run modal_mber.py --target-pdb target.pdb --target-name MyTarget \
    --masked-binder-seq "EVQLVESGGGLVQPGGSLRLSCAASG*********WFRQAPGKEREF***********NADSVKGRFTISRDNAKNTLYLQMNSLRAEDTAVYYC************WGQGTLVTVSS"
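
A masked sequence like the one above can also be generated from a full sequence plus a list of positions to design. The sketch below replaces 1-based inclusive ranges with '*'. mask_ranges is a hypothetical helper, not part of biomodals, and the toy sequence and ranges are placeholders rather than real CDR boundaries.

```shell
# mask_ranges SEQ "START-END ...": replace each 1-based inclusive range
# of SEQ with '*' characters. Hypothetical helper.
mask_ranges() {
    printf '%s\n' "$1" | awk -v ranges="$2" '{
        n = split(ranges, r, " ")
        out = $0
        for (i = 1; i <= n; i++) {
            split(r[i], b, "-")
            first = b[1] + 0
            last = b[2] + 0
            stars = ""
            for (j = first; j <= last; j++) stars = stars "*"
            # Masking keeps the length unchanged, so later ranges stay valid.
            out = substr(out, 1, first - 1) stars substr(out, last + 1)
        }
        print out
    }'
}

mask_ranges ABCDEFGHIJ "2-4 8-9"   # prints A***EFG**J
```

The result can then be passed as the value of --masked-binder-seq.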

minimap2 (short reads example)

Runs minimap2 -ax sr <fasta> <reads>

Just a simple example of running a binary on a powerful box.

wget https://gist.githubusercontent.com/hgbrian/56787d9b3ce2e68f698ac94d537340d8/raw/mito.fasta
wget https://gist.githubusercontent.com/hgbrian/802d8094bb4fed435bbb93a8c9092ee2/raw/mito_reads.fastq
uv run --with modal modal run modal_minimap2.py --input-ref-fasta mito.fasta --input-reads-fastq mito_reads.fastq

nextflow

Minimal hello world app, with conda and nextflow installed (not trivial!)

uv run --with modal modal run modal_nextflow_example.py

pdb2png

A simple pymol-based script to convert PDBs to PNGs for easy output viewing.

wget https://files.rcsb.org/download/1YWI.pdb
uv run --with modal modal run modal_pdb2png.py --input-pdb 1YWI.pdb --protein-zoom 0.8 --protein-color 240,200,190

Protenix

Protenix, an open-source PyTorch reproduction of AlphaFold 3.

printf '>protein|A\nMAWTPLLLLLLSHCTGSLSQPVLTQPTSLSASPGASARFTCTLRSGINVGTYRIYWYQQKPGSLPRYLLRYKSDSDKQQGSGVPSRFSGSKDASTNAGLLLISGLQSEDEADYYCAIWYSSTS\n' > test_protenix.faa
uv run --with modal modal run modal_protenix.py --input-faa test_protenix.faa --seeds 42 --no-use-msa

RSO (binder design)

Design binders using RSO (Rejection Sampling Optimization).

wget https://files.rcsb.org/download/5O45.pdb
grep "^ATOM.*\ A\ " 5O45.pdb > 5O45_chainA.pdb
uv run --with modal modal run modal_rso.py --input-pdb 5O45_chainA.pdb --num-designs 1 --traj-iters 10 --binder-len 30

SASA (solvent-accessible surface area)

dr_sasa — annotates a PDB (or CIF, auto-converted via openbabel) with per-atom SASA in the B-factor column.

wget https://files.rcsb.org/download/1CRN.pdb
uv run --with modal modal run modal_sasa.py --input-pdb 1CRN.pdb
# pymol out/sasa/<run>/1CRN.asa.pdb

tmol (Rosetta energy scoring)

tmol — GPU-accelerated Rosetta beta_nov2016 energy scoring. Runs an L-BFGS Cartesian relax and (optionally) a FASPR repack before scoring. Pass --input-dir to score every PDB in a directory in parallel.

Note: tmol requires a clean PDB (no chain breaks, no missing residues, no extra hydrogens). PDBs straight from the RCSB often need preprocessing.

modal deploy modal_faspr.py     # only needed if using --faspr (default)
wget https://files.rcsb.org/download/1CRN.pdb
uv run --with modal modal run modal_tmol.py --input-pdb 1CRN.pdb
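
Before scoring, a quick pre-check can flag likely chain breaks. The sketch below reports jumps in residue numbering between consecutive CA atoms of the same chain. This is only a heuristic: a numbering gap does not always mean a physical break, and the check does not cover missing atoms or extra hydrogens. find_numbering_gaps is a hypothetical helper, not part of biomodals or tmol.

```shell
# find_numbering_gaps PDB: report where consecutive CA atoms in the same
# chain skip residue numbers. Uses fixed PDB columns: atom name 13-16,
# chain ID 22, residue number 23-26. Hypothetical helper.
find_numbering_gaps() {
    awk 'substr($0, 1, 6) == "ATOM  " && substr($0, 13, 4) == " CA " {
        chain = substr($0, 22, 1)
        resnum = substr($0, 23, 4) + 0
        if (chain == prev_chain && resnum > prev_resnum + 1)
            printf "chain %s: gap between residues %d and %d\n", chain, prev_resnum, resnum
        prev_chain = chain
        prev_resnum = resnum
    }' "$1"
}

# Usage, if 1CRN.pdb has already been downloaded as above.
# No output means no numbering gaps were found.
if [ -f 1CRN.pdb ]; then
    find_numbering_gaps 1CRN.pdb
fi
```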

USalign (structural alignment)

US-align — universal structural alignment (successor to TM-align). Reports TM-score and RMSD between two or more structures (PDB or CIF, proteins/RNA/DNA, monomer or complex).

wget https://files.rcsb.org/download/5O45.pdb
grep "^ATOM.*\ A\ " 5O45.pdb > 5O45_chainA.pdb
uv run --with modal modal run modal_usalign.py --pdb 5O45.pdb --vs-pdbs 5O45_chainA.pdb

Score a single chain inside aligned complexes (two-step alignment):

uv run --with modal modal run modal_usalign.py --pdb 5O45.pdb --vs-pdbs 5O45_chainA.pdb --chain A

Testing

Build all images (no GPU, but slow):

uv run --with modal --with pytest pytest tests/test_build_images.py -v
uv run --with modal --with pytest pytest tests/test_build_images.py -v -k alphafold  # single app

Run all apps with minimal inputs (uses GPU, costs money):

uv run --with modal --with pytest pytest tests/test_quick_runs.py -v
uv run --with modal --with pytest pytest tests/test_quick_runs.py -v -k sasa  # single app
