Agent Engineer

I build the trust layer for AI agents.

As agents move from demos to unsupervised work, the bottleneck shifts from "can it do the task?" to "can you trust it to?". I build the reliability, verification, and provenance infrastructure that makes autonomous agents safe to ship.

No live demo here: these are recorded runs, replayed. Logs are claims; replays are proofs.

The through-line

Not three projects. One stack.

Frontier models now finish multi-day unattended runs, so capability is no longer the bottleneck; accountability is. As prompt engineering gave way to loop engineering, the scarce skill became designing what surrounds the model: the gates that define "done", the records that survive the night, the replays that catch drift. My three tools are that loop's trust layer, and they interlock.

Two contracts make them one system. First, Alfred (≥0.7) emits Claude Code-compatible hooks, so NightWatch records either harness with one command. Second, all three gates emit the same Agent Trust Report v0, so CI reads one verdict format. The flagship composition is the dual witness: one overnight run, two independent ledgers. One is the agent's own signed receipt; the other is a black box it can't edit. To lie, it would have to forge both.
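The dual-witness idea can be sketched in a few lines of TypeScript. This is an illustration under stated assumptions, not the Agent Trust Report v0 format or either tool's API: the `Receipt` shape, the HMAC-SHA256 signature, and every function name here are hypothetical. It shows why a valid signature alone is not enough: the receipt's claims are also checked against what an independent recorder saw.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical shape of the agent's own signed receipt.
type Receipt = { claims: string[]; signature: string };

// Assumption: the receipt is signed with HMAC-SHA256 over its JSON-encoded claims.
function signClaims(claims: string[], key: string): string {
  return createHmac("sha256", key).update(JSON.stringify(claims)).digest("hex");
}

// Witness 1: is the receipt's signature intact?
function signatureValid(receipt: Receipt, key: string): boolean {
  const expected = Buffer.from(signClaims(receipt.claims, key), "hex");
  const actual = Buffer.from(receipt.signature, "hex");
  // timingSafeEqual throws on a length mismatch, so compare lengths first.
  return expected.length === actual.length && timingSafeEqual(expected, actual);
}

// Witness 2: which claims does the independent recorder NOT back up?
function unsupportedClaims(receipt: Receipt, recorded: ReadonlySet<string>): string[] {
  return receipt.claims.filter((claim) => !recorded.has(claim));
}

// The agent signs a receipt that includes one event the recorder never saw.
const demoKey = "demo-key";
const demoClaims = ["tests:passed", "deploy:done"];
const demoReceipt: Receipt = { claims: demoClaims, signature: signClaims(demoClaims, demoKey) };
const blackBoxEvents = new Set(["tests:passed"]);

console.log(signatureValid(demoReceipt, demoKey));           // true: the receipt is authentic
console.log(unsupportedClaims(demoReceipt, blackBoxEvents)); // ["deploy:done"]: but not fully backed
```

In this sketch the receipt passes its own signature check and still fails the cross-check, which is the case a single ledger cannot catch.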

01  Reliability · 02  Verification · 03  Provenance · +  One report format
Selected work

Projects

trace-vault

LIVE
Snapshot testing & replay for AI agents: CI for non-deterministic systems.
Records real agent runs and replays them to catch silent regressions when a prompt or model changes. Built on one core insight: determinism ≠ faithfulness. A reproducible run isn't necessarily a correct one.
↳ Field-tested on mastra#17737: root-cause diagnosis of a red CI, with the fix adopted upstream and credited in the commit.
record / replay · regression · eval
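To make "determinism ≠ faithfulness" concrete, here is a minimal sketch. It is not trace-vault's actual API or record schema: `RecordedRun`, `diffRuns`, and `isFaithful` are hypothetical names chosen for illustration. The replay check compares a fresh run against a snapshot, while the faithfulness check judges the answer independently of any snapshot.

```typescript
// Hypothetical record format for one agent run; not trace-vault's real schema.
type ToolCall = { tool: string; input: string };
type RecordedRun = { calls: ToolCall[]; answer: string };

// Replay check: does the fresh run match the recorded snapshot?
function diffRuns(snapshot: RecordedRun, fresh: RecordedRun): string[] {
  const diffs: string[] = [];
  const steps = Math.max(snapshot.calls.length, fresh.calls.length);
  for (let i = 0; i < steps; i++) {
    const before = snapshot.calls[i];
    const after = fresh.calls[i];
    if (!before || !after) {
      diffs.push(`step ${i}: ${before ? "missing from" : "added in"} fresh run`);
    } else if (before.tool !== after.tool || before.input !== after.input) {
      diffs.push(`step ${i}: ${before.tool}(${before.input}) became ${after.tool}(${after.input})`);
    }
  }
  if (snapshot.answer !== fresh.answer) diffs.push("final answer changed");
  return diffs;
}

// Faithfulness check: is the answer right, judged independently of any snapshot?
function isFaithful(run: RecordedRun, oracle: (answer: string) => boolean): boolean {
  return oracle(run.answer);
}

// A snapshot that recorded a wrong answer replays perfectly and is still wrong.
const snapshotRun: RecordedRun = { calls: [{ tool: "calc", input: "2+2" }], answer: "5" };
const replayedRun: RecordedRun = { calls: [{ tool: "calc", input: "2+2" }], answer: "5" };

console.log(diffRuns(snapshotRun, replayedRun).length);           // 0: fully reproducible
console.log(isFaithful(replayedRun, (answer) => answer === "4")); // false: not correct
```

An empty diff only says the run did not change; whether it was right in the first place needs the second, separate check.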

Alfred

LIVE · npm
A verifiable autonomous coding agent in your terminal.
A TypeScript / Bun CLI agent where the harness itself is executable: "done" is a machine-enforced verify gate, memory is agent-curated but inspectable, and every hands-off run leaves a signed, replayable ledger. 900+ tests, clean under strict tsc. 30-second demo: bunx alfred-agent demo
↳ Emits Claude Code-compatible hooks (≥0.7), so NightWatch can record it with one command.
TypeScript · Bun · done-gates · signed ledger
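A machine-enforced "done" can be sketched as follows. This is an assumption-laden illustration, not Alfred's implementation: `GateCheck`, `GateVerdict`, and `verifyGate` are hypothetical names, and the checks are stand-in predicates where a real harness would presumably run the test suite and type checker.

```typescript
// Hypothetical gate: each check is a named predicate the harness runs itself.
type GateCheck = { label: string; passes: () => boolean };
type GateVerdict = { done: boolean; failed: string[] };

// "Done" is granted only when the agent claims it AND every check passes.
function verifyGate(agentClaimsDone: boolean, checks: GateCheck[]): GateVerdict {
  const failed = checks.filter((check) => !check.passes()).map((check) => check.label);
  return { done: agentClaimsDone && failed.length === 0, failed };
}

// Stand-in checks; a real harness would shell out to the test runner and tsc.
const gateVerdict = verifyGate(true, [
  { label: "tests", passes: () => true },
  { label: "typecheck", passes: () => false },
]);

console.log(gateVerdict.done);   // false: the agent's claim is overruled by the gate
console.log(gateVerdict.failed); // ["typecheck"]
```

The design point is that the agent's own claim is a necessary input but never a sufficient one.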

nightwatch

LIVE · npm
The black box recorder for overnight AI agents.
Built for the Fable-5 era of multi-day autonomous runs: every session event lands in a hash-chained, append-only ledger with worktree checkpoints, and the morning debrief independently verifies the agent's claims instead of trusting its summary. 30-second demo: npm i -g nightwatch-agent
↳ Records two harnesses with the same five hooks: Claude Code and Alfred (--agent alfred). Dogfooding found six recorder bugs, all fixed with regression tests.
TypeScript · hash-chain ledger · checkpoints · morning debrief
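The hash-chained, append-only ledger described above can be illustrated with a short sketch. It assumes SHA-256 and a made-up entry layout; `LedgerEntry`, `appendEntry`, and `firstBrokenEntry` are hypothetical names, not nightwatch's schema or API. Each entry's hash covers its position, its payload, and the previous entry's hash, so editing any past entry breaks verification from that point on.

```typescript
import { createHash } from "node:crypto";

// Hypothetical entry layout; the real ledger schema is not shown on this page.
type LedgerEntry = { index: number; payload: string; prevHash: string; hash: string };

const GENESIS_HASH = "0".repeat(64);

// Each hash covers the entry's position, the previous hash, and the payload.
function hashEntry(index: number, payload: string, prevHash: string): string {
  return createHash("sha256").update(`${index}\n${prevHash}\n${payload}`).digest("hex");
}

// Append-only: returns a new ledger with one more entry linked to the last.
function appendEntry(ledger: readonly LedgerEntry[], payload: string): LedgerEntry[] {
  const last = ledger[ledger.length - 1];
  const prevHash = last ? last.hash : GENESIS_HASH;
  const index = ledger.length;
  return [...ledger, { index, payload, prevHash, hash: hashEntry(index, payload, prevHash) }];
}

// Returns the index of the first entry that breaks the chain, or -1 if intact.
function firstBrokenEntry(ledger: readonly LedgerEntry[]): number {
  let prevHash = GENESIS_HASH;
  for (let i = 0; i < ledger.length; i++) {
    const entry = ledger[i];
    if (!entry || entry.index !== i || entry.prevHash !== prevHash ||
        entry.hash !== hashEntry(i, entry.payload, prevHash)) {
      return i;
    }
    prevHash = entry.hash;
  }
  return -1;
}

let sessionLedger: LedgerEntry[] = [];
for (const payload of ["session:start", "tool:edit src/a.ts", "session:end"]) {
  sessionLedger = appendEntry(sessionLedger, payload);
}
console.log(firstBrokenEntry(sessionLedger)); // -1: the chain verifies

// Rewriting history after the fact is detected at the edited entry.
const tamperedLedger = sessionLedger.map((entry, i) =>
  i === 1 ? { ...entry, payload: "tool:edit src/b.ts" } : entry);
console.log(firstBrokenEntry(tamperedLedger)); // 1
```

A chain like this only detects edits if the latest hash is also kept somewhere the agent cannot rewrite; otherwise the whole chain could be recomputed.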

Ecosystem hub: agent-trust-layer, which holds the Trust Report v0 spec and a real dual-witness run with the raw ledgers committed. More: provenant (glass-box bill auditing, HMAC-signed proof receipts) · simp-skill ⭐240+ · RAG-learning

About

Hi, I'm Beamus.

I'm an agent engineer focused on what happens after the demo works: making AI agents reliable, verifiable, and trustworthy enough to run without a human watching. I like the unglamorous infrastructure (replay harnesses, proof receipts, eval gates) that turns a clever prototype into something you can actually depend on.

My bet for the loop-engineering era: when models write the code, the engineering that still matters is the loop around them, and every layer of that loop will need to prove what it did. I build with the same discipline I sell: my tools record their own dogfooding runs, and the bugs those runs catch are documented in the changelogs.

Open to Agent Engineer roles