Live window into the Zer0pa lab. Synthetic Biology / Metabolic Pathway Engineering — Pipeline 4 of 6.
In silico metabolic-pathway-engineering pipeline (L1→L7 + L4.5 + L5_OED) producing predicted pathways, KPIs, and SBOL3-attested genetic-modification specs as research artifacts.
The pipeline trains a Zer0pa-owned Conditional Enzyme Kinetics Model (CEKM) on real BRENDA / EnzyExtract / GotEnzymes2 / ProteinGym corpora, runs the four-tool L4 kinetics ensemble (DLKcat / CatPred / TurNuP / CEKM) plus eQuilibrator MDF / COBRApy FBA / FluxGAT essentiality, scaffolds the L4.5 unknown-enzyme path with RFdiffusion2 + MACE-OFF + ESMFold + ProDy + Genie-CAT, and emits SBOL3-attested genetic-modification specifications via L6 host engineering. Every adapter emits a `UniversalLayerEnvelope` whose 23-falsifier registry, boundary-block sha256, and license-class enforcement are first-class audit invariants.
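As a rough illustration of what a boundary-block sha256 over canonical JSON can look like, the sketch below hashes a boundary block and checks it on an envelope-shaped dict. It is not the repository's `UniversalLayerEnvelope` (which is Pydantic-based); the field names `boundary`, `boundary_sha256`, `layer`, and `payload`, and the boundary keys themselves, are assumptions made for the example.

```python
import hashlib
import json


def canonical_json(obj) -> bytes:
    """Serialize with sorted keys and no insignificant whitespace."""
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def sha256_hex(obj) -> str:
    """Hex sha256 digest of the canonical JSON form of obj."""
    return hashlib.sha256(canonical_json(obj)).hexdigest()


# Hypothetical boundary block; the binding one lives in BOUNDARY.md.
boundary = {"clinical_use": False, "environmental_release": False}

# Hypothetical envelope shape, not the real UniversalLayerEnvelope schema.
envelope = {
    "layer": "L4",
    "payload": {"kcat_pred": 12.5},
    "boundary": boundary,
    "boundary_sha256": sha256_hex(boundary),
}


def boundary_intact(env: dict) -> bool:
    """True when the recorded digest matches the boundary block it carries."""
    return env["boundary_sha256"] == sha256_hex(env["boundary"])
```

Canonicalizing before hashing means two producers that order keys differently still agree on the digest, which is what makes the hash usable as an invariant across adapters.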
The Human Milk Oligosaccharide (HMO) seed triple — 2'-fucosyllactose,
3'-sialyllactose, and disialyllacto-N-tetraose in E. coli iML1515 —
is the validation triple. Pre-registered acceptance thresholds
(validation/hmo-seed-evidence/<seed>/acceptance.yaml) are the
binding numerical gates; structural envelope-chain conformance passes
3/3 today.
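To make the idea of pre-registered numerical gates concrete, here is a minimal sketch of evaluating observed metrics against min/max thresholds. The schema of `acceptance.yaml` is not shown in this README, so the threshold layout, the metric names, and the threshold values below are all hypothetical; only the observed 2'-FL MDF value (+6.78 kJ/mol) and the DSLNT envelope count (11) are taken from this document.

```python
def evaluate_gates(thresholds: dict, observed: dict) -> dict:
    """Return {metric: passed} for every pre-registered metric.

    A metric missing from `observed` fails closed.
    """
    results = {}
    for name, bound in thresholds.items():
        value = observed.get(name)
        if value is None:
            results[name] = False
            continue
        ok = True
        if "min" in bound:
            ok = ok and value >= bound["min"]
        if "max" in bound:
            ok = ok and value <= bound["max"]
        results[name] = ok
    return results


# Hypothetical thresholds, standing in for a parsed acceptance.yaml.
thresholds = {
    "mdf_kj_per_mol": {"min": 0.0},
    "envelope_count": {"min": 1},
    "titer_g_per_l": {"min": 1.0},  # hypothetical metric with no observation
}

observed = {"mdf_kj_per_mol": 6.78, "envelope_count": 11}

gate_results = evaluate_gates(thresholds, observed)
all_passed = all(gate_results.values())
```

Failing closed on a missing metric keeps a stubbed or skipped layer from silently passing a gate, which matches the distinction this README draws between structural conformance and scientific validity.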
| Field | Value |
|---|---|
| Architecture | METABOLIC_PATHWAY_PIPELINE (L1 ZPE → L2 LIRC → L3 retrosynthesis → L3.5 ranking → L4 deep-eval → L4.5 unknown-enzyme → L5 MFMO → L5_OED → L6 host engineering → L6_BUILD cell-free TX-TL → L7 dossier) |
| Substrate | UniversalLayerEnvelope (Pydantic v2 + canonical-JSON sha256), SBOL3-attested L6 spec, PROV-O JSON-LD chain, DuckDB audit + GraphML/Cypher/RDF KG export |
| Execution | Mac CPU + H100 SXM 80 GB (autonomous orchestrator on Runpod, 10-phase chain with resume sentinels) |
| Toolchain | torch 2.2/cu130 + transformers (ESM-2 650M / ESMFold) + RFdiffusion2 (BSD-3) + MACE-OFF (medium) + equilibrator-pathway (MDF LP) + COBRApy + ripser/persim (TDA) + BoTorch (Hamming kernel + qLogNEHVI) + selfies + RDKit |
| Discipline | 23 falsifiers across 3 tiers (Tier-A fast / Tier-B medium / Tier-C heavy) + cross-model disagreement first-class + GPL-subprocess isolation (Salis RBS Calculator v1.0) + RESISTANCE.md anti-corruption protocol |
| Compute Status | v0.1 H100 chain runs end-to-end at the structural-audit layer on Pod 1hx4ctwg1mpmxr 2026-05-03 (CEKM 20,000 fp32 steps; loss 6.93 → ~3.0, best 2.73 at step-19850; HMO triple + L4.5 inference + 19.2 GB CEKM push to HF emitted in same chain). Committed HMO evidence packets at validation/hmo-seed-evidence/{2pFL,3pSL,DSLNT}/RESULT.md report scientific_valid: False; scientific HMO validation is not claimed at v0.1 and is in claim-surface repair. |
| Metric | Value | Baseline |
|---|---|---|
| CEKM_REAL_CORPUS_LOSS | 6.93 → ~3.0 (steps 0 → 20000; best 2.73 at step-19850) | total: 33,851 in-corpus rows + 5,961 held-out + 101,553 adversarial Tier α/β/γ negatives |
| AUTONOMOUS_CHAIN_PHASES | 10 / 10 complete | preflight → install → stage → CEKM train → eval → HF push → L4.5 inference → HMO triple → audit verify → finalize (Pod 1hx4ctwg1mpmxr 2026-05-03 → 3b9744e) |
| HMO_TRIPLE_AUDIT_VERIFY (structural) | 3/3 PASS | structural conformance only (envelope-schema valid, boundary-sha256 canonical, SBOL3 attestation present, license-class enforcement, falsifier registry loaded) per docs/synbio-audit-trail-v0.1-spec.md §10; DSLNT round-0 dossier envelope_count=11 from 2026-05-03 chain. Scientific-validity gates in validation/hmo-seed-evidence/{seed}/RESULT.md report scientific_valid: False (stub mode); not equivalent to scientific HMO validation. |
| CPU_PIPELINE_TESTS | 256 passing, 59 GPU-skipped | 0 regressions across CPU continuation A-H |
Source: PRD.md, FINAL-REPORT.md, FINAL-REPORT-RUNPOD.md, FINAL-REPORT-RUNPOD-AUTONOMOUS.md, validation/hmo-seed-evidence/, audit/runtime/runpod/.
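The structural checks listed in the HMO_TRIPLE_AUDIT_VERIFY row rely on a sha256 hash chain over envelopes. The following is a minimal sketch of how such a chain can be built and verified, under the assumption that each record stores the digest of its predecessor; the field names `prev_sha256` and `sha256` and the all-zero genesis value are illustrative, not taken from the spec.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder for "no predecessor"


def digest(record: dict) -> str:
    """sha256 over canonical JSON (sorted keys, compact separators)."""
    blob = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def build_chain(payloads: list) -> list:
    """Link each payload to the digest of the previous record."""
    chain, prev = [], GENESIS
    for payload in payloads:
        body = {"payload": payload, "prev_sha256": prev}
        record = {**body, "sha256": digest(body)}
        chain.append(record)
        prev = record["sha256"]
    return chain


def verify_chain(chain: list) -> bool:
    """Recompute every digest and check every back-link."""
    prev = GENESIS
    for record in chain:
        body = {"payload": record["payload"], "prev_sha256": record["prev_sha256"]}
        if record["prev_sha256"] != prev or record["sha256"] != digest(body):
            return False
        prev = record["sha256"]
    return True


chain = build_chain([{"layer": "L1"}, {"layer": "L2"}, {"layer": "L3"}])
```

A check of this kind only shows that the chain is internally consistent; it says nothing about whether the payloads are scientifically valid, which is the same caveat the table row states.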
| Field | Value |
|---|---|
| Identifier | Synthetic-Biology |
| Repository | https://github.com/Zer0pa/Synthetic-Biology |
| Portfolio | Bio-Engineering |
| Visibility | PUBLIC |
| Default Branch | main |
| Authority Source | PRD.md (locked v1.0 decisions) |
| License | repository license file |
| Field | Value |
|---|---|
| Evidence posture | v0.1 first full-budget H100 chain runs end-to-end at the structural-audit layer; scientific HMO validation is not claimed (committed evidence packets report scientific_valid: False); not a productized service |
| Checks | 256 passing tests + 23 falsifiers + 3/3 HMO seed audit-verify PASS |
| Custody boundary | 3 CEKM ckpts (step 1500 / 18000 / 19000, 19.2 GB total + audit JSONL) on HF Architect-Prime/synbio-cekm-v0.1; envelope chains + dossiers + L4.5 ESMFold PDBs + MACE-OFF binding ΔG JSONs in git |
| Confidence | scoped by Tier-A/B/C falsifier hierarchy; PathGym DBTL-holdout calibration deferred; CEKM calibration gate non-blocking by design (no BRENDA holdout in v0.1 corpus) |
| Authority | PRD.md (locked decisions); FINAL-REPORT-RUNPOD-AUTONOMOUS.md (chain receipts at 3b9744e); HANDOFF-CPU-CONTINUATION.md (CPU phase A-H record) |
The HMO triple passes structural audit conformance only; the committed HMO evidence packets at validation/hmo-seed-evidence/{2pFL,3pSL,DSLNT}/RESULT.md report scientific_valid: False and stub mode for downstream scientific predictions. Scientific HMO validation is not claimed at v0.1 and remains in claim-surface repair; the HMO_TRIPLE_AUDIT_VERIFY 3/3 PASS claim attests structural conformance only, not scientific validity. CEKM v0.1 reached its 20,000-step target with checkpoints at step 1500 / 18000 / 19000 pushed to HF; this is a v0.1 research checkpoint, not a calibrated affinity predictor. Wet-lab Phase 2 dispatch is triple-gated and never on the cutover path. PathGym DBTL holdout calibration of TDA warning_score thresholds and L5 surrogate calibration scores is deferred to held-out post-experiment data. Real RFdiffusion2 motif-conditional designs require curated TS-mimetic geometry, which is downstream of v0.1; in addition, the v0.1 RFD2 wrapper failed with a `run_inference.py` not found error (upstream layout drift across the 3 candidate paths the wrapper probes). That failure is non-blocking, since ESMFold + MACE-OFF outputs landed for all 3 HMO seeds. BRENDA bulk download requires registration; v0.1 trains on EnzyExtract dark-matter + GotEnzymes2 + ProteinGym subsets, not full BRENDA core. CEKM Phase 40 calibration gate is non-blocking by design (sentinel-touched after eval ran cleanly against step-19000 ckpt; tier α/β/γ AUCs return None because no BRENDA holdout exists in this corpus).
- Real CEKM training on real corpus runs the full v0.1 budget end-to-end on H100 SXM (EnzyExtract 60K + GotEnzymes2 17K → 33K in-corpus + 6K held-out + 100K adversarial Tier α/β/γ negatives; loss curve 6.93 → ~3.0 over 20,000 fp32 steps, best 2.73 at step-19850; sustained 1.39 steps/s post-recovery; atomic-save + defensive `_latest_checkpoint` patches survived ~6 mfs-quota-induced partial-write events without losing checkpoint integrity).
- Autonomous H100 chain runs all 10 phases (preflight → install → stage → CEKM train → eval → HF push → L4.5 inference → HMO triple → audit verify → finalize) end-to-end on Pod 1hx4ctwg1mpmxr 2026-05-03; phases 50–90 took 6m 32s wallclock after Phase 30's 3h training; emits real ESMFold PDBs for 7 enzymes across 3 HMO seeds + MACE-OFF binding ΔG JSONs + DSLNT round-0 dossier (envelope_count=11) + 19.2 GB CEKM checkpoint push to Hugging Face in 48s.
- HMO scientific-validation triple emits structurally complete L1→L7 envelope chains for 2'-fucosyllactose / 3'-sialyllactose / disialyllacto-N-tetraose; `synbio audit verify` passes 3/3 under the conformance verifier (envelope-schema valid, boundary-sha256 canonical, SBOL3 attestation present on every L6 envelope, Class C/D/E license-grants enforced, cross-model disagreement records emitted, falsifier registry loaded).
- L4B real eQuilibrator MDF on HMO precursor pathway: 2'-FL MDF=+6.78, 3'-SL +11.84, DSLNT +11.41 kJ/mol via `equilibrator_pathway.ThermodynamicModel.mdf_analysis()` with per-compound optimal concentrations in the 1 μM – 10 mM physiological window.
- L5 real BoTorch surrogate: GP per objective with custom Hamming-distance kernel + `qLogNoisyExpectedHypervolumeImprovement` + ASR-thermostable warm-starts (split-venv subprocess pattern; weights stay float32, autocast handles per-op casting; plug-replaceability invariant preserved across real-vs-stub paths).
- TDA real fermentation simulator: 5-state Monod ODE via `scipy.integrate.solve_ivp(LSODA)` covering all five PRD §5.3 failure modes (oxygen-transfer collapse / byproduct buildup / growth stall / toxicity threshold / nutrient depletion) with multi-channel ripser bottleneck + late-vs-early rate-of-change hybrid early-warning.
- Synbio Audit-Trail Specification v0.1 (CC BY 4.0, Zer0pa-published): SBOL3 + PROV-O extension + canonical-JSON sha256 hash chain + Class A/B/C/D/E license-class enforcement + GPL-subprocess-isolation pattern (Salis RBS Calculator v1.0 binary wrapper, no Python `import` of GPL modules).
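The fermentation-simulator item above mentions a Monod ODE integrated with `scipy.integrate.solve_ivp` using LSODA. The sketch below shows the general shape of such a model in a deliberately reduced 3-state form (biomass, substrate, product) rather than the 5-state model the repository describes. All kinetic parameters, initial conditions, and the 1% depletion threshold are invented for illustration and are not taken from the PRD.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative parameters only (not from the repository).
MU_MAX = 0.5  # maximum specific growth rate, 1/h
K_S = 0.2     # half-saturation constant, g/L
Y_XS = 0.5    # g biomass formed per g substrate consumed
Y_PX = 0.3    # g product formed per g biomass formed

X0, S0, P0 = 0.1, 10.0, 0.0  # initial biomass, substrate, product (g/L)


def monod_rhs(t, y):
    """Right-hand side of a 3-state Monod growth model."""
    x, s, p = y
    s = max(s, 0.0)  # guard against tiny negative substrate from the solver
    mu = MU_MAX * s / (K_S + s)
    dx = mu * x
    ds = -dx / Y_XS
    dp = Y_PX * dx
    return [dx, ds, dp]


t_eval = np.linspace(0.0, 24.0, 49)  # every 0.5 h
sol = solve_ivp(
    monod_rhs, (0.0, 24.0), [X0, S0, P0],
    method="LSODA", t_eval=t_eval, rtol=1e-6, atol=1e-9,
)

x_final, s_final, p_final = sol.y[:, -1]

# Crude nutrient-depletion marker: first sampled time with substrate < 1% of S0.
depleted = sol.y[1] < 0.01 * S0
depletion_time = float(sol.t[np.argmax(depleted)]) if depleted.any() else None
```

A useful sanity check on any model of this form is the mass balance: biomass plus `Y_XS` times substrate stays constant, so final biomass should approach `X0 + Y_XS * S0` once the substrate is exhausted.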
- This is not a clinical or human-subject pipeline. No diagnostic, therapeutic, or device claims.
- This is not a deployed industrial production system. No commercial titer guarantees.
- The CEKM v0.1 checkpoint is not a calibrated affinity predictor; it is a v0.1 research checkpoint trained for the full 20,000-step budget with bounded loss-decline evidence on a held-out partition. Tier α/β/γ AUCs are None because v0.1 has no BRENDA holdout.
- HMO predictions are advisory research artifacts, not regulatory submissions or product specifications. Wet-lab validation is operator-gated and never on the cutover path.
- The L4.5 unknown-enzyme path emits Tier-1 / Tier-2 / Tier-3 advisories per PRD §6.6; these are research suggestions, not enzyme designs warranting downstream synthesis without independent verification.
- No environmental release of GMOs. No human gene drive or eugenic application. Defence / weapons / dual-use bio applications excluded under operator policy.
| Surface | Status | Evidence |
|---|---|---|
| Test suite | 256 passing, 59 GPU-skipped | pytest tests/ clean on Python 3.13 / macOS x86_64; CPU continuation A-H 0 regressions |
| Falsifier registry | 23 falsifiers across Tiers A/B/C, registry loads at module import | audit/falsifiers.yaml + src/zer0pa_synbio/falsifiers/checks.py (one CPU implementation per registry entry; deliberate-trigger test per falsifier) |
| HMO triple conformance | 3/3 PASS under `synbio audit verify` | validation/hmo-seed-evidence/{2pFL,3pSL,DSLNT}/RESULT.md + envelope chains 21/24/24 envelopes per seed |
| CEKM checkpoint custody | 3 ckpts on HF (step 1500 / 18000 / 19000, 19.2 GB total + audit JSONL + meta sha256-recorded) | https://huggingface.co/Architect-Prime/synbio-cekm-v0.1 (push 2026-05-03T03:46Z, 48s upload @ 3.43 GB/s) |
| Cutover invariance | 38 plug-replaceability / cutover-invariance tests | httpx.MockTransport golden-fixture suite forked from sibling-workstream Energy Wave 4 |
| Boundary discipline | Boundary block sha256-checked on every envelope; falsifier `f000_boundary_violation` enforces | src/zer0pa_synbio/boundary.py + BOUNDARY.md |
- PRD.md: locked v1.0 spec; controlling decisions, layer contracts, falsifier registry, license discipline.
- audit/falsifiers.yaml: 23-falsifier registry with `id`, `tier`, `severity`, `gate_action`.
- validation/hmo-seed-evidence/: pre-registered acceptance thresholds + envelope chains + dossiers + audit-verify reports for the 2'-FL / 3'-SL / DSLNT validation triple.
- docs/synbio-audit-trail-v0.1-spec.md: Zer0pa-published Synbio Audit-Trail Spec v0.1 (CC BY 4.0): SBOL3 + PROV-O + sha256 hash chain + license-class enforcement + GPL subprocess isolation.
- src/zer0pa_synbio/cekm/train.py: CEKM training entrypoint (real corpus path, adversarial-negatives sampler, atomic-save checkpoint, defensive resume that skips zero-byte/truncated meta).
- FINAL-REPORT-RUNPOD-AUTONOMOUS.md: chain receipts at commit `3b9744e`: per-phase START/RETRY/DONE events, all 10 sentinels, HF push verification, L4.5 inference outputs.
- `src/zer0pa_synbio/`: adapters L1-L7, envelope, falsifiers, CEKM model + train + loaders, KG writer, audit writer, TDA simulator, runpod_inference, CLI
- `audit/`: falsifiers.yaml, source_manifests/, license_grants/, runtime/ (gitignored except runpod state surface)
- `validation/hmo-seed-evidence/`: 2'-FL / 3'-SL / DSLNT triple with acceptance.yaml + dossier.json + envelope_chain.json + RESULT.md per seed
- `kg/`: schema.cypher + nodes.csv + edges.csv (Neo4j-shaped + GraphML/Cypher/RDF/Turtle export)
- `tests/`: 256 passing tests across contract / integration / falsification waves / cutover invariance
- `docs/`: Synbio Audit-Trail Spec v0.1 (CC BY 4.0)
- `scripts/runpod/`: autonomous H100 SXM chain (bootstrap, orchestrator, heartbeat, watchdog, 10 phase scripts) + Mac-side wake-up watcher + corpus stager
- `configs/`: wave4 real-corpus CEKM training + runpod orchestrator phase config
- `fixtures/`: LIRC slice + CEKM mini-fixtures + per-source manifests
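The falsifier registry referenced above carries `id`, `tier`, `severity`, and `gate_action` per entry, with a deliberate-trigger test per falsifier. A minimal sketch of that pattern follows. Only the four field names and the id `f000_boundary_violation` come from this README; the tier assignment, the severity and gate-action values, the second falsifier, and the check logic are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Falsifier:
    id: str
    tier: str          # "A", "B", or "C"
    severity: str      # hypothetical values: "critical", "minor"
    gate_action: str   # hypothetical values: "block", "warn"
    check: Callable[[dict], bool]  # returns True when the falsifier TRIGGERS


def boundary_missing(envelope: dict) -> bool:
    """Hypothetical check: trigger when no boundary digest is recorded."""
    return not envelope.get("boundary_sha256")


def payload_missing(envelope: dict) -> bool:
    """Hypothetical check: trigger when the payload is empty or absent."""
    return not envelope.get("payload")


REGISTRY = [
    Falsifier("f000_boundary_violation", "A", "critical", "block", boundary_missing),
    Falsifier("f999_example_empty_payload", "B", "minor", "warn", payload_missing),
]


def run_falsifiers(envelope: dict, tiers=("A", "B", "C")) -> list:
    """Return the ids of every falsifier in the selected tiers that triggers."""
    return [f.id for f in REGISTRY if f.tier in tiers and f.check(envelope)]


def is_blocked(envelope: dict) -> bool:
    """True when any triggered falsifier carries gate_action == 'block'."""
    triggered = set(run_falsifiers(envelope))
    return any(f.gate_action == "block" for f in REGISTRY if f.id in triggered)
```

A deliberate-trigger test then amounts to constructing an envelope that should trip exactly one falsifier and asserting that it does, for example an envelope with a payload but no `boundary_sha256`.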
Research infrastructure for in silico synthetic biology / metabolic pathway engineering. Outputs are research artifacts — predicted pathways, predicted KPIs, candidate genetic modification specifications. No regulatory certification claims. No clinical or human-subject use. No environmental release of GMOs. No biocontainment-level claims (the pipeline does not commission BSL-2/3 work). No human gene drive or eugenic application. Defence / weapons / dual-use bio applications excluded under operator policy.
- BOUNDARY.md — the binding boundary block.
- PRD.md — the controlling spec (orchestrator's locked v1.0 decisions).
- RESISTANCE.md — anti-corruption discipline; binding meta-protocol.
- HANDOFF-CPU-CONTINUATION.md — what the CPU-continuation phase did (items A-H).
- FINAL-REPORT-RUNPOD-AUTONOMOUS.md — what the autonomous H100 chain produced.
- RUNPOD-AUTONOMOUS-RUNBOOK.md — operator runbook for the autonomous chain.
- NEXT-WAVE-PLAN.md — open work, ordered by priority.
- docs/synbio-audit-trail-v0.1-spec.md — the published Zer0pa standard.
- MODUS-OPERANDI.md — the multi-agent role chain.
This workstream runs in parallel with Zer0pa/Health, Zer0pa/Materials, and Zer0pa/Energy. Each workstream is built end-to-end as an independent pipeline. No substrate is shared at runtime. Fork-and-own is required: copy the pattern, reimplement inside Synthetic Biology. The research agent's three cross-workstream substrate-sharing recommendations (Shared Infrastructure Layer, Cross-Pipeline Gym Flywheel, single SE(3) MACE service) are captured-and-overridden per operator policy.
- Initial commit: 2026-05-01.
- CPU continuation phase (items A-H): 2026-05-01; see commits `52b8ad2` through `3d8317f`.
- Autonomous H100 SXM chain bootstrap + 10-phase orchestrator: 2026-05-01 (`29dc4f2`).
- Real MACE-OFF binding ΔG + RFdiffusion2 inference modules: 2026-05-02 (`a5fc98e`).
- Pod 1hx4ctwg1mpmxr autonomous run: 2026-05-02.
- Defensive `_latest_checkpoint` (skip zero-byte/truncated ckpts on resume): 2026-05-03 (`a08ee50`).
- Atomic checkpoint save (tmp+rename, prevents 0-byte meta/truncated .pt at source): 2026-05-03 (`0aeafb3`).
- Pod 1hx4ctwg1mpmxr autonomous run COMPLETE, all 10 phases sentinel-marked: 2026-05-03 (`3b9744e`).
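The atomic checkpoint save (tmp+rename) and defensive resume entries above can be sketched roughly as follows. This is a stdlib-only illustration of the pattern, not the code in `src/zer0pa_synbio/cekm/train.py`: the `step-<N>.pt` naming scheme, the function names, and the size-only truncation check are assumptions.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def atomic_write_bytes(path: Path, data: bytes) -> None:
    """Write to a temp file in the same directory, then rename over the target.

    Readers see either the old file or the complete new one, never a partial write.
    """
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
            fh.flush()
            os.fsync(fh.fileno())  # push bytes to disk before the rename
        os.replace(tmp_name, path)  # atomic when source and target share a filesystem
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def latest_checkpoint(ckpt_dir: Path, min_bytes: int = 1) -> Optional[Path]:
    """Return the highest-step checkpoint, skipping files smaller than min_bytes."""
    candidates = []
    for p in ckpt_dir.glob("step-*.pt"):  # hypothetical naming scheme
        if p.stat().st_size < min_bytes:
            continue  # zero-byte or obviously truncated file
        try:
            step = int(p.stem.split("-")[1])
        except (IndexError, ValueError):
            continue
        candidates.append((step, p))
    if not candidates:
        return None
    return max(candidates, key=lambda c: c[0])[1]
```

The temp file is created in the target directory on purpose: `os.replace` is only atomic within one filesystem. A size check catches zero-byte files but not every truncation, so a real resume path would likely also need to validate that the checkpoint actually loads.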