Synthetic-Biology

Live window into the Zer0pa lab. Synthetic Biology / Metabolic Pathway Engineering — Pipeline 4 of 6.

What This Is

In silico metabolic-pathway-engineering pipeline (L1→L7 + L4.5 + L5_OED) producing predicted pathways, KPIs, and SBOL3-attested genetic-modification specs as research artifacts.

The pipeline trains a Zer0pa-owned Conditional Enzyme Kinetics Model (CEKM) on real BRENDA / EnzyExtract / GotEnzymes2 / ProteinGym corpora, runs the four-tool L4 kinetics ensemble (DLKcat / CatPred / TurNuP / CEKM) plus eQuilibrator MDF / COBRApy FBA / FluxGAT essentiality, scaffolds the L4.5 unknown-enzyme path with RFdiffusion2 + MACE-OFF + ESMFold + ProDy + Genie-CAT, and emits SBOL3-attested genetic-modification specifications via L6 host engineering. Every adapter emits a UniversalLayerEnvelope whose 23-falsifier registry, boundary-block sha256, and license-class enforcement are first-class audit invariants.
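
The envelope hashing described above can be illustrated with a minimal sketch. It assumes that "canonical JSON" means sorted keys, compact separators, and UTF-8 encoding. The `Envelope` dataclass is a hypothetical stand-in for the Pydantic-based UniversalLayerEnvelope, and the field names (`layer`, `payload`, `boundary`, `prev_sha256`) are illustrative rather than taken from the repository.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple


def canonical_json(obj: Any) -> bytes:
    # Assumption: canonical form = sorted keys, no extra whitespace, UTF-8.
    return json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def sha256_hex(obj: Any) -> str:
    return hashlib.sha256(canonical_json(obj)).hexdigest()


@dataclass(frozen=True)
class Envelope:
    """Hypothetical stand-in for UniversalLayerEnvelope (not the real schema)."""
    layer: str
    payload: dict
    boundary: dict
    prev_sha256: Optional[str] = None

    def digest(self) -> str:
        # The digest covers the payload, the boundary-block hash, and the link
        # to the previous envelope, so any change breaks the chain.
        return sha256_hex({
            "layer": self.layer,
            "payload": self.payload,
            "boundary_sha256": sha256_hex(self.boundary),
            "prev_sha256": self.prev_sha256,
        })


def build_chain(items: List[Tuple[str, dict]], boundary: dict) -> List[Envelope]:
    chain: List[Envelope] = []
    prev: Optional[str] = None
    for layer, payload in items:
        env = Envelope(layer, payload, boundary, prev)
        chain.append(env)
        prev = env.digest()
    return chain


def verify_chain(chain: List[Envelope], boundary: dict) -> bool:
    expected_boundary = sha256_hex(boundary)
    prev: Optional[str] = None
    for env in chain:
        if env.prev_sha256 != prev:
            return False  # link to the previous envelope is broken
        if sha256_hex(env.boundary) != expected_boundary:
            return False  # boundary block was altered
        prev = env.digest()
    return True
```

Because each digest includes the previous envelope's digest, editing any earlier payload invalidates every later link, which is the property a hash-chained audit trail relies on.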

The Human Milk Oligosaccharide (HMO) seed triple — 2'-fucosyllactose, 3'-sialyllactose, and disialyllacto-N-tetraose in E. coli iML1515 — is the validation triple. Pre-registered acceptance thresholds (validation/hmo-seed-evidence/<seed>/acceptance.yaml) are the binding numerical gates; structural envelope-chain conformance passes 3/3 today.

Pipeline Mechanics

| Field | Value |
| --- | --- |
| Architecture | METABOLIC_PATHWAY_PIPELINE (L1 ZPE → L2 LIRC → L3 retrosynthesis → L3.5 ranking → L4 deep-eval → L4.5 unknown-enzyme → L5 MFMO → L5_OED → L6 host engineering → L6_BUILD cell-free TX-TL → L7 dossier) |
| Substrate | UniversalLayerEnvelope (Pydantic v2 + canonical-JSON sha256), SBOL3-attested L6 spec, PROV-O JSON-LD chain, DuckDB audit + GraphML/Cypher/RDF KG export |
| Execution | Mac CPU + H100 SXM 80 GB (autonomous orchestrator on Runpod, 10-phase chain with resume sentinels) |
| Toolchain | torch 2.2/cu130 + transformers (ESM-2 650M / ESMFold) + RFdiffusion2 (BSD-3) + MACE-OFF (medium) + equilibrator-pathway (MDF LP) + COBRApy + ripser/persim (TDA) + BoTorch (Hamming kernel + qLogNEHVI) + selfies + RDKit |
| Discipline | 23 falsifiers across 3 tiers (Tier-A fast / Tier-B medium / Tier-C heavy) + cross-model disagreement first-class + GPL-subprocess isolation (Salis RBS Calculator v1.0) + RESISTANCE.md anti-corruption protocol |
| Compute Status | v0.1 H100 chain runs end-to-end at the structural-audit layer on Pod 1hx4ctwg1mpmxr, 2026-05-03 (CEKM 20,000 fp32 steps; loss 6.93 → ~3.0, best 2.73 at step-19850; HMO triple + L4.5 inference + 19.2 GB CEKM push to HF emitted in the same chain). Committed HMO evidence packets at validation/hmo-seed-evidence/{2pFL,3pSL,DSLNT}/RESULT.md report scientific_valid: False; scientific HMO validation is not claimed at v0.1 and is in claim-surface repair. |
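
The "10-phase chain with resume sentinels" pattern can be sketched as follows. This is a minimal illustration, not the repository's orchestrator: the function name `run_chain`, the `.done` file suffix, and the phase names in the usage note are assumptions chosen for the example.

```python
from pathlib import Path
from typing import Callable, Dict, List


def run_chain(phases: Dict[str, Callable[[], None]], sentinel_dir: Path) -> List[str]:
    """Run phases in insertion order, skipping any phase whose sentinel exists.

    A sentinel file is written only after its phase returns without raising,
    so a crashed or interrupted run resumes at the first incomplete phase.
    """
    sentinel_dir.mkdir(parents=True, exist_ok=True)
    executed: List[str] = []
    for name, phase in phases.items():
        sentinel = sentinel_dir / f"{name}.done"  # hypothetical naming scheme
        if sentinel.exists():
            continue  # completed in an earlier run
        phase()  # an exception here aborts the chain before the sentinel is written
        sentinel.touch()
        executed.append(name)
    return executed
```

A caller would pass an ordered mapping such as `{"preflight": ..., "install": ..., "stage": ...}` and re-invoke `run_chain` after a failure; phases that already completed are skipped on the second call.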

Key Metrics

| Metric | Value | Baseline |
| --- | --- | --- |
| CEKM_REAL_CORPUS_LOSS | 6.93 → ~3.0 (steps 0 → 20000; best 2.73 at step-19850) | total: 33,851 in-corpus rows + 5,961 held-out + 101,553 adversarial Tier α/β/γ negatives |
| AUTONOMOUS_CHAIN_PHASES | 10 / 10 complete | preflight → install → stage → CEKM train → eval → HF push → L4.5 inference → HMO triple → audit verify → finalize (Pod 1hx4ctwg1mpmxr 2026-05-03 → 3b9744e) |
| HMO_TRIPLE_AUDIT_VERIFY (structural) | 3/3 PASS | structural conformance only (envelope-schema valid, boundary-sha256 canonical, SBOL3 attestation present, license-class enforcement, falsifier registry loaded) per docs/synbio-audit-trail-v0.1-spec.md §10; DSLNT round-0 dossier envelope_count=11 from 2026-05-03 chain. Scientific-validity gates in validation/hmo-seed-evidence/{seed}/RESULT.md report scientific_valid: False (stub mode); not equivalent to scientific HMO validation. |
| CPU_PIPELINE_TESTS | 256 passing, 59 GPU-skipped | 0 regressions across CPU continuation A-H |

Source: PRD.md, FINAL-REPORT.md, FINAL-REPORT-RUNPOD.md, FINAL-REPORT-RUNPOD-AUTONOMOUS.md, validation/hmo-seed-evidence/, audit/runtime/runpod/.

Repo Identity

| Field | Value |
| --- | --- |
| Identifier | Synthetic-Biology |
| Repository | https://github.com/Zer0pa/Synthetic-Biology |
| Portfolio | Bio-Engineering |
| Visibility | PUBLIC |
| Default Branch | main |
| Authority Source | PRD.md (locked v1.0 decisions) |
| License | repository license file |

Readiness

| Field | Value |
| --- | --- |
| Evidence posture | v0.1 first full-budget H100 chain runs end-to-end at the structural-audit layer; scientific HMO validation is not claimed (committed evidence packets report scientific_valid: False); not a productized service |
| Checks | 256 passing tests + 23 falsifiers + 3/3 HMO seed audit-verify PASS |
| Custody boundary | 3 CEKM ckpts (step 1500 / 18000 / 19000, 19.2 GB total + audit JSONL) on HF Architect-Prime/synbio-cekm-v0.1; envelope chains + dossiers + L4.5 ESMFold PDBs + MACE-OFF binding ΔG JSONs in git |
| Confidence | scoped by Tier-A/B/C falsifier hierarchy; PathGym DBTL-holdout calibration deferred; CEKM calibration gate non-blocking by design (no BRENDA holdout in v0.1 corpus) |
| Authority | PRD.md (locked decisions); FINAL-REPORT-RUNPOD-AUTONOMOUS.md (chain receipts at 3b9744e); HANDOFF-CPU-CONTINUATION.md (CPU phase A-H record) |

Honest Blocker

  • The HMO triple passes structural audit conformance only; the committed HMO evidence packets at validation/hmo-seed-evidence/{2pFL,3pSL,DSLNT}/RESULT.md report scientific_valid: False and stub mode for downstream scientific predictions. Scientific HMO validation is not claimed at v0.1 and remains in claim-surface repair; the HMO_TRIPLE_AUDIT_VERIFY 3/3 PASS claim attests structural conformance only, not scientific validity.
  • CEKM v0.1 reached its 20,000-step target with checkpoints at step 1500 / 18000 / 19000 pushed to HF; this is a v0.1 research checkpoint, not a calibrated affinity predictor.
  • Wet-lab Phase 2 dispatch is triple-gated and never on the cutover path.
  • PathGym DBTL holdout calibration of TDA warning_score thresholds and L5 surrogate calibration scores is deferred to held-out post-experiment data.
  • Real RFdiffusion2 motif-conditional designs require curated TS-mimetic geometry, downstream of v0.1. The v0.1 RFD2 wrapper additionally errored on run_inference.py not found (upstream layout drift across the 3 candidate paths the wrapper probes); this is non-blocking since ESMFold + MACE-OFF outputs landed for all 3 HMO seeds.
  • BRENDA bulk download requires registration; v0.1 trains on EnzyExtract dark-matter + GotEnzymes2 + ProteinGym subsets, not full BRENDA core.
  • CEKM Phase 40 calibration gate is non-blocking by design (sentinel-touched after eval ran cleanly against step-19000 ckpt; tier α/β/γ AUCs return None because no BRENDA holdout exists in this corpus).

What We Prove

  • Real CEKM training on real corpus runs the full v0.1 budget end-to-end on H100 SXM (EnzyExtract 60K + GotEnzymes2 17K → 33K in-corpus + 6K held-out + 100K adversarial Tier α/β/γ negatives; loss curve 6.93 → ~3.0 over 20,000 fp32 steps, best 2.73 at step-19850; sustained 1.39 steps/s post-recovery; atomic-save + defensive _latest_checkpoint patches survived ~6 mfs-quota-induced partial-write events without losing checkpoint integrity).
  • Autonomous H100 chain runs all 10 phases (preflight → install → stage → CEKM train → eval → HF push → L4.5 inference → HMO triple → audit verify → finalize) end-to-end on Pod 1hx4ctwg1mpmxr 2026-05-03; phases 50–90 took 6m 32s wallclock after Phase 30's 3h training; emits real ESMFold PDBs for 7 enzymes across 3 HMO seeds + MACE-OFF binding ΔG JSONs + DSLNT round-0 dossier (envelope_count=11) + 19.2 GB CEKM checkpoint push to Hugging Face in 48s.
  • The HMO validation triple emits structurally complete L1→L7 envelope chains for 2'-fucosyllactose / 3'-sialyllactose / disialyllacto-N-tetraose; synbio audit verify passes 3/3 under the conformance verifier (envelope-schema valid, boundary-sha256 canonical, SBOL3 attestation present on every L6 envelope, Class C/D/E license-grants enforced, cross-model disagreement records emitted, falsifier registry loaded). This is structural conformance only, not scientific validation.
  • L4B real eQuilibrator MDF on HMO precursor pathway: 2'-FL MDF=+6.78, 3'-SL +11.84, DSLNT +11.41 kJ/mol via equilibrator_pathway.ThermodynamicModel.mdf_analysis() with per-compound optimal concentrations in the 1 μM – 10 mM physiological window.
  • L5 real BoTorch surrogate: GP per objective with custom Hamming-distance kernel + qLogNoisyExpectedHypervolumeImprovement + ASR-thermostable warm-starts (split-venv subprocess pattern; weights stay float32, autocast handles per-op casting; plug-replaceability invariant preserved across real-vs-stub paths).
  • TDA real fermentation simulator: 5-state Monod ODE via scipy.integrate.solve_ivp(LSODA) covering all five PRD §5.3 failure modes (oxygen-transfer collapse / byproduct buildup / growth stall / toxicity threshold / nutrient depletion) with multi-channel ripser bottleneck + late-vs-early rate-of-change hybrid early-warning.
  • Synbio Audit-Trail Specification v0.1 (CC BY 4.0, Zer0pa-published): SBOL3 + PROV-O extension + canonical-JSON sha256 hash chain + Class A/B/C/D/E license-class enforcement + GPL-subprocess-isolation pattern (Salis RBS Calculator v1.0 binary wrapper, no Python import of GPL modules).
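
The Hamming-distance kernel mentioned for the L5 surrogate can be illustrated in plain Python. The source does not give the kernel's exact form, so this sketch assumes an exponential kernel over the normalized Hamming distance, k(a, b) = exp(-(d_H(a, b) / L) / lengthscale), for equal-length sequences of length L. It does not use BoTorch or GPyTorch, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def hamming_distance(a: Sequence, b: Sequence) -> int:
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def hamming_kernel(a: Sequence, b: Sequence, lengthscale: float = 1.0) -> float:
    """Assumed form: exp(-(d_H / L) / lengthscale); equals 1.0 for identical inputs."""
    if lengthscale <= 0:
        raise ValueError("lengthscale must be positive")
    if len(a) == 0:
        raise ValueError("sequences must be non-empty")
    normalized = hamming_distance(a, b) / len(a)
    return math.exp(-normalized / lengthscale)


def gram_matrix(seqs: List[Sequence], lengthscale: float = 1.0) -> List[List[float]]:
    """Pairwise kernel values, as a GP surrogate would need for its covariance."""
    return [[hamming_kernel(s, t, lengthscale) for t in seqs] for s in seqs]
```

Sequences that differ at fewer positions get kernel values closer to 1, so a GP using such a kernel treats near-identical variants as strongly correlated; the lengthscale controls how quickly that correlation decays with the number of substitutions.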

What We Don't Claim

  • This is not a clinical or human-subject pipeline. No diagnostic, therapeutic, or device claims.
  • This is not a deployed industrial production system. No commercial titer guarantees.
  • The CEKM v0.1 checkpoint is not a calibrated affinity predictor; it is a v0.1 research checkpoint trained for the full 20,000-step budget with bounded loss-decline evidence on a held-out partition. Tier α/β/γ AUCs are None because v0.1 has no BRENDA holdout.
  • HMO predictions are advisory research artifacts, not regulatory submissions or product specifications. Wet-lab validation is operator-gated and never on the cutover path.
  • The L4.5 unknown-enzyme path emits Tier-1 / Tier-2 / Tier-3 advisories per PRD §6.6; these are research suggestions, not enzyme designs warranting downstream synthesis without independent verification.
  • No environmental release of GMOs. No human gene drive or eugenic application. Defence / weapons / dual-use bio applications excluded under operator policy.

Verification Status

| Surface | Status | Evidence |
| --- | --- | --- |
| Test suite | 256 passing, 59 GPU-skipped | pytest tests/ clean on Python 3.13 / macOS x86_64; CPU continuation A-H 0 regressions |
| Falsifier registry | 23 falsifiers across Tiers A/B/C, registry loads at module import | audit/falsifiers.yaml + src/zer0pa_synbio/falsifiers/checks.py (one CPU implementation per registry entry; deliberate-trigger test per falsifier) |
| HMO triple conformance | 3/3 PASS under synbio audit verify | validation/hmo-seed-evidence/{2pFL,3pSL,DSLNT}/RESULT.md + envelope chains 21/24/24 envelopes per seed |
| CEKM checkpoint custody | 3 ckpts on HF (step 1500 / 18000 / 19000, 19.2 GB total + audit JSONL + meta sha256-recorded) | https://huggingface.co/Architect-Prime/synbio-cekm-v0.1 (push 2026-05-03T03:46Z, 48s upload @ 3.43 GB/s) |
| Cutover invariance | 38 plug-replaceability / cutover-invariance tests | httpx.MockTransport golden-fixture suite forked from sibling-workstream Energy Wave 4 |
| Boundary discipline | Boundary block sha256-checked on every envelope; falsifier f000_boundary_violation enforces | src/zer0pa_synbio/boundary.py + BOUNDARY.md |
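
The "one implementation per registry entry, deliberate-trigger test per falsifier" discipline can be sketched as below. Only the id `f000_boundary_violation` and the field names `id`, `tier`, `severity`, `gate_action` come from this README; the field values, the envelope keys, and the helper names are assumptions made for illustration, and the real registry lives in audit/falsifiers.yaml.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Falsifier:
    id: str
    tier: str
    severity: str
    gate_action: str


# Hypothetical in-memory registry; tier/severity/gate_action values are assumed.
REGISTRY: List[Falsifier] = [
    Falsifier("f000_boundary_violation", tier="A", severity="critical", gate_action="block"),
]

CHECKS: Dict[str, Callable[[dict], bool]] = {}


def check(falsifier_id: str):
    """Register a check function; it returns True when the falsifier fires."""
    def decorator(fn: Callable[[dict], bool]) -> Callable[[dict], bool]:
        CHECKS[falsifier_id] = fn
        return fn
    return decorator


@check("f000_boundary_violation")
def boundary_violation(envelope: dict) -> bool:
    # Fires when the boundary hash is missing or differs from the expected one.
    actual = envelope.get("boundary_sha256")
    return actual is None or actual != envelope.get("expected_boundary_sha256")


def validate_registry() -> None:
    """Enforce one implementation per registry entry."""
    missing = [f.id for f in REGISTRY if f.id not in CHECKS]
    if missing:
        raise LookupError(f"falsifiers without an implementation: {missing}")


def run_falsifiers(envelope: dict, tiers: Tuple[str, ...] = ("A",)) -> List[Falsifier]:
    """Return the falsifiers in the selected tiers that fire on this envelope."""
    validate_registry()
    return [f for f in REGISTRY if f.tier in tiers and CHECKS[f.id](envelope)]
```

A deliberate-trigger test then feeds each check an envelope built to violate it and asserts that the falsifier fires, which guards against a check that silently always passes.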

Proof Anchors

  • PRD.md — locked v1.0 spec; controlling decisions, layer contracts, falsifier registry, license discipline.
  • audit/falsifiers.yaml — 23-falsifier registry with id, tier, severity, gate_action.
  • validation/hmo-seed-evidence/ — pre-registered acceptance thresholds + envelope chains + dossiers + audit-verify reports for the 2'-FL / 3'-SL / DSLNT validation triple.
  • docs/synbio-audit-trail-v0.1-spec.md — Zer0pa-published Synbio Audit-Trail Spec v0.1 (CC BY 4.0): SBOL3 + PROV-O + sha256 hash chain + license-class enforcement + GPL subprocess isolation.
  • src/zer0pa_synbio/cekm/train.py — CEKM training entrypoint (real corpus path, adversarial-negatives sampler, atomic-save checkpoint, defensive resume that skips zero-byte/truncated meta).
  • FINAL-REPORT-RUNPOD-AUTONOMOUS.md — chain receipts at commit 3b9744e: per-phase START/RETRY/DONE events, all 10 sentinels, HF push verification, L4.5 inference outputs.
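
The atomic-save and defensive-resume behaviour attributed to the CEKM training entrypoint can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code in src/zer0pa_synbio/cekm/train.py: the file naming (`step-N.pt`, `step-N.meta.json`), the meta fields, and the function names are hypothetical, and raw bytes stand in for serialized model weights.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Optional


def atomic_write_bytes(path: Path, data: bytes) -> None:
    """Write to a temp file in the same directory, then rename over the target."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
            fh.flush()
            os.fsync(fh.fileno())  # push bytes to disk before the rename
        os.replace(tmp, path)  # atomic when source and target share a filesystem
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def save_checkpoint(ckpt_dir: Path, step: int, weights: bytes) -> None:
    # Weights first, meta last: a valid meta implies the weights were written.
    atomic_write_bytes(ckpt_dir / f"step-{step}.pt", weights)
    meta = {"step": step, "size": len(weights)}
    atomic_write_bytes(ckpt_dir / f"step-{step}.meta.json", json.dumps(meta).encode("utf-8"))


def latest_checkpoint(ckpt_dir: Path) -> Optional[int]:
    """Return the highest step whose meta parses and whose weights match it."""
    best: Optional[int] = None
    for meta_path in ckpt_dir.glob("step-*.meta.json"):
        try:
            meta = json.loads(meta_path.read_text(encoding="utf-8"))
            step, size = int(meta["step"]), int(meta["size"])
        except (ValueError, KeyError, TypeError, OSError):
            continue  # zero-byte or truncated meta: skip instead of crashing
        weights = ckpt_dir / f"step-{step}.pt"
        if size == 0 or not weights.is_file() or weights.stat().st_size != size:
            continue  # missing or truncated weights file
        if best is None or step > best:
            best = step
    return best
```

The two halves are complementary: the rename keeps partial writes from ever appearing under the final name, and the resume scan still tolerates damaged files that arrive by other routes, such as a quota error during an earlier, non-atomic save.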

Repo Shape

  • src/zer0pa_synbio/ — adapters L1-L7, envelope, falsifiers, CEKM model + train + loaders, KG writer, audit writer, TDA simulator, runpod_inference, CLI
  • audit/ — falsifiers.yaml, source_manifests/, license_grants/, runtime/ (gitignored except runpod state surface)
  • validation/hmo-seed-evidence/ — 2'-FL / 3'-SL / DSLNT triple with acceptance.yaml + dossier.json + envelope_chain.json + RESULT.md per seed
  • kg/ — schema.cypher + nodes.csv + edges.csv (Neo4j-shaped + GraphML/Cypher/RDF/Turtle export)
  • tests/ — 256 passing tests across contract / integration / falsification waves / cutover invariance
  • docs/ — Synbio Audit-Trail Spec v0.1 (CC BY 4.0)
  • scripts/runpod/ — autonomous H100 SXM chain (bootstrap, orchestrator, heartbeat, watchdog, 10 phase scripts) + Mac-side wake-up watcher + corpus stager
  • configs/ — wave4 real-corpus CEKM training + runpod orchestrator phase config
  • fixtures/ — LIRC slice + CEKM mini-fixtures + per-source manifests

Boundary

Research infrastructure for in silico synthetic biology / metabolic pathway engineering. Outputs are research artifacts — predicted pathways, predicted KPIs, candidate genetic modification specifications. No regulatory certification claims. No clinical or human-subject use. No environmental release of GMOs. No biocontainment-level claims (the pipeline does not commission BSL-2/3 work). No human gene drive or eugenic application. Defence / weapons / dual-use bio applications excluded under operator policy.

Read Order (for next agents)

  1. BOUNDARY.md — the binding boundary block.
  2. PRD.md — the controlling spec (orchestrator's locked v1.0 decisions).
  3. RESISTANCE.md — anti-corruption discipline; binding meta-protocol.
  4. HANDOFF-CPU-CONTINUATION.md — what the CPU-continuation phase did (items A-H).
  5. FINAL-REPORT-RUNPOD-AUTONOMOUS.md — what the autonomous H100 chain produced.
  6. RUNPOD-AUTONOMOUS-RUNBOOK.md — operator runbook for the autonomous chain.
  7. NEXT-WAVE-PLAN.md — open work, ordered by priority.
  8. docs/synbio-audit-trail-v0.1-spec.md — the published Zer0pa standard.
  9. MODUS-OPERANDI.md — the multi-agent role chain.

Cross-workstream principle

This workstream runs in parallel with Zer0pa/Health, Zer0pa/Materials, and Zer0pa/Energy. Each workstream is built end-to-end as an independent pipeline. No substrate is shared at runtime. Fork-and-own is required: copy the pattern, reimplement inside Synthetic Biology. The research agent's three cross-workstream substrate-sharing recommendations (Shared Infrastructure Layer, Cross-Pipeline Gym Flywheel, single SE(3) MACE service) are captured-and-overridden per operator policy.

Provenance

  • Initial commit: 2026-05-01.
  • CPU continuation phase (items A-H): 2026-05-01 — see commits 52b8ad2 through 3d8317f.
  • Autonomous H100 SXM chain bootstrap + 10-phase orchestrator: 2026-05-01 — 29dc4f2.
  • Real MACE-OFF binding ΔG + RFdiffusion2 inference modules: 2026-05-02 — a5fc98e.
  • Pod 1hx4ctwg1mpmxr autonomous run: 2026-05-02.
  • Defensive _latest_checkpoint (skip zero-byte/truncated ckpts on resume): 2026-05-03 — a08ee50.
  • Atomic checkpoint save (tmp+rename, prevents 0-byte meta/truncated .pt at source): 2026-05-03 — 0aeafb3.
  • Pod 1hx4ctwg1mpmxr autonomous run COMPLETE — all 10 phases sentinel-marked: 2026-05-03 — 3b9744e.
