As agents move from demos to unsupervised work, the bottleneck shifts from "can it do the task?" to "can you trust it to?" I build the reliability, verification, and provenance infrastructure that makes autonomous agents safe to ship.
No live demo here: these are recorded runs, replayed. Logs are claims; replays are proofs.
Frontier models now finish multi-day unattended runs, so capability is no longer the bottleneck; accountability is. As prompt engineering gave way to loop engineering, the scarce skill became designing what surrounds the model: the gates that define "done", the records that survive the night, the replays that catch drift. My three tools are that loop's trust layer, and they interlock.
Two contracts make them one system. First, Alfred (≥0.7) emits Claude Code-compatible hooks, so NightWatch records either harness with one command. Second, all three gates emit the same Agent Trust Report v0, so CI reads one verdict format. The flagship composition is the dual witness: one overnight run, two independent ledgers, namely the agent's own signed receipt and a black box it can't edit. To lie, it would have to forge both.
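The dual-witness idea can be illustrated with a small sketch. It is not how NightWatch or Alfred store their ledgers; the event fields, the `GENESIS` constant, and every function name below are hypothetical. It assumes the black box is a hash-chained list written by a separate recorder, and that the agent's receipt is simply the list of events it claims to have performed.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical stand-in for "no previous entry"


def entry_hash(prev_hash: str, event: dict) -> str:
    """Hash an event together with the hash of the entry before it."""
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append(ledger: list, event: dict) -> None:
    """Append an event to the recorder's hash-chained ledger."""
    prev = ledger[-1]["hash"] if ledger else GENESIS
    ledger.append({"event": event, "hash": entry_hash(prev, event)})


def chain_intact(ledger: list) -> bool:
    """Recompute every link. Editing, reordering, or removing an interior
    entry breaks a link; truncating the tail does not, which is one reason
    the second witness is still needed."""
    prev = GENESIS
    for entry in ledger:
        if entry["hash"] != entry_hash(prev, entry["event"]):
            return False
        prev = entry["hash"]
    return True


def witnesses_agree(receipt_events: list, ledger: list) -> bool:
    """The agent's claimed events must match the recorder's, in order."""
    return chain_intact(ledger) and receipt_events == [e["event"] for e in ledger]


# The recorder observes two steps.
black_box = []
for ev in [{"step": 1, "tool": "edit"}, {"step": 2, "tool": "test"}]:
    append(black_box, ev)

honest = [{"step": 1, "tool": "edit"}, {"step": 2, "tool": "test"}]
padded = honest + [{"step": 3, "tool": "deploy"}]  # a step the recorder never saw

print(witnesses_agree(honest, black_box))  # True
print(witnesses_agree(padded, black_box))  # False
```

A receipt that claims more (or less) than the recorder saw fails the comparison, and a recorder ledger edited after the fact fails `chain_intact`, so a false account needs both records to be forged consistently.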
↳ The same hooks record two harnesses: Claude Code and Alfred (--agent alfred). Dogfooding found six recorder bugs, all fixed with regression tests.

Ecosystem hub: agent-trust-layer, which holds the Trust Report v0 spec plus a real dual-witness run with the raw ledgers committed.

More: provenant (glass-box bill auditing, HMAC-signed proof receipts) · simp-skill ⭐240+ · RAG-learning
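For readers unfamiliar with HMAC-signed receipts, here is a minimal sketch using only Python's standard library. It does not reproduce provenant's actual receipt format: the `body` and `hmac` field names, the sample bill fields, and the demo key are all assumptions made for illustration.

```python
import hashlib
import hmac
import json


def _canonical(body: dict) -> bytes:
    """Serialize deterministically so signer and verifier hash the same bytes."""
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_receipt(body: dict, key: bytes) -> dict:
    """Return the body plus an HMAC-SHA256 tag over its canonical JSON."""
    tag = hmac.new(key, _canonical(body), hashlib.sha256).hexdigest()
    return {"body": body, "hmac": tag}


def verify_receipt(receipt: dict, key: bytes) -> bool:
    """Recompute the tag and compare in constant time."""
    expected = hmac.new(key, _canonical(receipt["body"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, receipt["hmac"])


key = b"demo-key-not-for-production"  # hypothetical; real keys belong in a secret store
receipt = sign_receipt({"line_items": 3, "total_cents": 4200}, key)

# Change one field after signing, but keep the old tag.
tampered = {"body": {**receipt["body"], "total_cents": 1}, "hmac": receipt["hmac"]}

print(verify_receipt(receipt, key))   # True
print(verify_receipt(tampered, key))  # False
```

HMAC proves the receipt was produced by someone holding the key and has not changed since. It is symmetric, so anyone who can verify can also sign; a setup where the verifier must not be able to forge receipts would need asymmetric signatures instead.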
I'm an agent engineer focused on what happens after the demo works: making AI agents reliable, verifiable, and trustworthy enough to run without a human watching. I like the unglamorous infrastructure (replay harnesses, proof receipts, eval gates) that turns a clever prototype into something you can actually depend on.
My bet for the loop-engineering era: when models write the code, the engineering that still matters is the loop around them, and every layer of that loop will need to prove what it did. I build with the same discipline I sell: my tools record their own dogfooding runs, and the bugs those runs catch ship in the changelogs.