Declarative, local-first runtime for bounded AI agents. Define an agent in one manifest, run it headless, and audit every effect.
Jeju is to an AI agent what a Kubernetes manifest is to a deployment: you
describe the agent declaratively — model provider, instructions, runtime loop,
workspace, tools, skills, permissions, context budget, optional output schema,
and evaluators — and Jeju validates, compiles, and runs that manifest against a
local workspace, recording every meaningful effect in trajectory.jsonl. It can
then use evaluation evidence to improve the agent with jeju evolve.
Jeju is not an interactive coding assistant and not a Python agent framework. It is the layer below those: a way to capture a workflow as a versioned, permissioned, inspectable agent you can package and re-run headlessly.
| | Interactive assistants (Claude Code, Cursor) | Agent frameworks (LangGraph, CrewAI) | Jeju |
|---|---|---|---|
| Interface | interactive chat | Python code | declarative manifest |
| Runs | live session | embedded in your app | headless, recorded |
| Boundaries | session-level | hand-coded | workspace / tools / permissions / sandbox enforced |
| Run evidence | transcript | your own logging | canonical trajectory.jsonl |
| Improvement | manual | manual | jeju evolve with evaluation evidence |
| Distribution | — | pip package | content-addressed agent bundle |
Jeju is experimental. It is strongest when a local workflow can be packaged as a focused agent with clear tools, permissions, run evidence, and optional evaluation.
The showcase is a bug rescue workflow. A tiny Python ledger project has failing
rounding tests, and Jeju packages the repair as a bounded agent: DeepSeek V4
Flash reads the fixture, runs the test harness, edits the implementation, reruns
tests, writes REPAIR.md, and records the full trajectory. It exercises the
whole design end to end (a declarative manifest, write and shell access
fenced inside a sandboxed workspace, and a recorded trajectory you can audit
afterward) and needs no external search services.
Requirements: macOS or Linux, a DeepSeek API key, Python 3 for the fixture tests, and Go 1.25 or newer only for source installs.
# Install the latest released CLI on macOS or Linux.
curl -fsSL https://raw.githubusercontent.com/cosmtrek/jeju/master/scripts/install.sh | sh
jeju version
# Run the DeepSeek V4 Flash showcase.
export DEEPSEEK_API_KEY=sk-...
git clone https://github.com/cosmtrek/jeju.git
cd jeju
./scripts/run-bug-rescue-agent.sh
# Inspect the recorded run printed by the script.
jeju inspect --runs-dir .jeju-dev/runs/bug-rescue <run_id>
jeju view --runs-dir .jeju-dev/runs/bug-rescue <run_id>
The agent turns the failing ledger tests green by fixing the cent-rounding bug
in a copied workspace, writes REPAIR.md, and saves the run:
.jeju-dev/runs/bug-rescue/<run_id>/
├── trajectory.jsonl # canonical append-only run record
└── report.html # derived inspection view
trajectory.jsonl is the source of truth; jeju view opens the derived report
and refreshes it when the trajectory is newer. For a no-credential mock
lifecycle check, use jeju init <name> or make test-agent. Windows support is
not guaranteed yet. To install from source instead of using the script, run
go install github.com/cosmtrek/jeju/cmd/jeju@latest.
The fastest path is to install the jeju-agent-builder skill in Codex, Claude
Code, or another agent environment, then let that agent author, run, and inspect
Jeju agents for you:
npx skills add cosmtrek/jeju --skill jeju-agent-builder
Use jeju-agent-builder to create and smoke-test a minimal Jeju agent for <workflow>.
You can also build manually: jeju init <name> --dir ~/jeju-agents/<name>, edit
the manifest and prompt, run jeju validate, run a smoke task, and inspect the
trajectory with jeju inspect or jeju view. See
Manual For Agents for the authoring guide.
- Declarative behavior: the whole agent contract, including an optional final output schema, lives in one manifest, with prompts and runtime skills as adjacent files. No behavior is hidden in code.
- Enforced boundaries: every run is constrained by explicit workspace, tool, skill, permission, sandbox, timeout, and context-window limits, and every tool call passes through a policy gate before it executes.
- Audited by construction: one canonical append-only trajectory.jsonl records lifecycle, model, context, tool, permission, artifact, evaluation, and run-summary events, with a derived report.html view for review.
- Evidence-driven improvement: run task sets, score outcomes, and let jeju evolve search bounded config-space patches; this GEPA-style search lifted held-out HotpotQA answer F1 by +3.6pp over the unevolved baseline.
- Portable bundles: package a focused workflow as a content-addressed agent that developers or higher-level AI agents can add and run by a stable reference.
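The policy gate can be pictured as a pure function that runs before every tool call and returns allow or deny. This Go sketch is not Jeju's gate: the Policy struct, the Gate method, and the choice of which tools count as mutating are all assumptions, picked to mirror the tools and permissions.access fields from the example manifest further down.

```go
package main

import "fmt"

// Decision is the gate's verdict for one tool call.
type Decision struct {
	Allow  bool
	Reason string
}

// Policy mirrors two manifest fields (tools, permissions.access).
// Both the struct and its field names are hypothetical.
type Policy struct {
	Tools    map[string]bool // tools declared in the manifest
	ReadOnly bool            // permissions.access: readOnly
}

// mutatingTools is a hypothetical classification of state-changing tools.
var mutatingTools = map[string]bool{"write": true, "edit": true, "shell": true}

// Gate runs before a tool executes; a denied call never runs.
func (p Policy) Gate(tool string) Decision {
	if !p.Tools[tool] {
		return Decision{false, "tool not declared in manifest: " + tool}
	}
	if p.ReadOnly && mutatingTools[tool] {
		return Decision{false, "readOnly access forbids " + tool}
	}
	return Decision{true, "allowed"}
}

func main() {
	p := Policy{Tools: map[string]bool{"read": true, "search": true}, ReadOnly: true}
	for _, tool := range []string{"read", "write"} {
		d := p.Gate(tool)
		fmt.Printf("%s -> allow=%v (%s)\n", tool, d.Allow, d.Reason)
	}
}
```

Keeping the gate a pure function of the declared policy and the requested tool makes every decision easy to record and replay as a permission event.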
Reach for a plain script when deterministic automation is enough. Reach for Jeju when the workflow needs model reasoning plus explicit tools, permissions, run evidence, or evaluation — agent experiments, evaluation harnesses, reusable specialist agents (review, triage, docs, benchmarks), or capturing a high-frequency local task as a bounded agent another agent can invoke.
Jeju treats an agent as a small, explicit harness unit instead of an opaque
application. The runtime never reads YAML directly; configuration is loaded,
validated, and compiled into a CompiledAgent before execution:
Manifest -> Validate -> Compile -> Run -> Gate -> Trace -> Evaluate -> Inspect
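The key property of this pipeline is that the runtime only ever receives a compiled, validated value, never raw YAML. Here is a Go sketch of the Validate and Compile steps, assuming a hypothetical, already-parsed Manifest struct that covers a handful of fields (YAML parsing is omitted to stay within the standard library). The type names echo the text, but the fields and checks are illustrative, not Jeju's CompiledAgent.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
)

// Manifest is a tiny, hypothetical subset of a parsed agent manifest.
type Manifest struct {
	APIVersion string
	Kind       string
	Name       string
	System     string // instructions.system, relative to the manifest file
	Workspace  string // workspace.path, relative to the manifest file
	MaxSteps   int    // maxSteps limit from the manifest
}

// CompiledAgent is all the runtime sees: paths resolved, limits fixed.
type CompiledAgent struct {
	Name          string
	SystemPath    string
	WorkspacePath string
	MaxSteps      int
}

// Validate reports every problem at once so authors can fix them together.
func Validate(m Manifest) error {
	var errs []error
	if m.APIVersion != "jeju/v1alpha1" {
		errs = append(errs, fmt.Errorf("unsupported apiVersion %q", m.APIVersion))
	}
	if m.Kind != "Agent" {
		errs = append(errs, fmt.Errorf("unsupported kind %q", m.Kind))
	}
	if m.Name == "" {
		errs = append(errs, errors.New("metadata.name is required"))
	}
	if m.MaxSteps <= 0 {
		errs = append(errs, errors.New("maxSteps must be positive"))
	}
	return errors.Join(errs...) // nil when errs is empty
}

// Compile validates, then resolves relative paths against the manifest's directory.
func Compile(manifestPath string, m Manifest) (CompiledAgent, error) {
	if err := Validate(m); err != nil {
		return CompiledAgent{}, err
	}
	dir := filepath.Dir(manifestPath)
	return CompiledAgent{
		Name:          m.Name,
		SystemPath:    filepath.Join(dir, m.System),
		WorkspacePath: filepath.Join(dir, m.Workspace),
		MaxSteps:      m.MaxSteps,
	}, nil
}

func main() {
	m := Manifest{"jeju/v1alpha1", "Agent", "repo-inspector",
		"../prompts/repo-inspector.md", "../workspace/repo-inspector", 12}
	agent, err := Compile("agents/repo-inspector.agent.yaml", m)
	if err != nil {
		fmt.Println("invalid:", err)
		return
	}
	fmt.Println(agent.SystemPath, agent.WorkspacePath)
}
```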
An agent bundle keeps that contract and its adjacent files together. A minimal bundle is just enough structure to validate and run:
<name>/
├── agents/
│ └── <name>.agent.yaml # manifest — source of truth
├── prompts/
│ └── <name>.md # system instructions
├── workspace/
│ └── <name>/.gitkeep # local working directory
├── skills/ # optional runtime skills
│ └── <skill>/SKILL.md
├── eval/ # optional evaluators
│ └── <evaluator>.py
└── README.md
At a high level, a read-only specialist manifest looks like this:
apiVersion: jeju/v1alpha1
kind: Agent
metadata:
  name: repo-inspector
  description: "Inspect a local repository and produce a structured summary"
models:
  providers:
    primary:
      type: openaiCompatible
      preset: deepseek
      model: deepseek-v4-flash
      envKey: DEEPSEEK_API_KEY
instructions:
  system: ../prompts/repo-inspector.md
runtime:
  model: primary
  loop:
    type: react
  limits:
    maxSteps: 12
    maxDurationSec: 300
workspace:
  path: ../workspace/repo-inspector
tools:
  - read
  - search
permissions:
  access: readOnly
  approval: never
output:
  name: repo_summary
  schema:
    type: object
    required: [summary, findings]
    additionalProperties: false
    properties:
      summary: { type: string }
      findings:
        type: array
        items: { type: string }
evaluate:
  enabled: true
  evaluators:
    - name: basic
      uses: rules
      rules: [finalAnswerExists, runCompleted]
See Agent Manifest for the full field reference, defaults, supported values, and validation rules.
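The output block makes the final answer machine-checkable. As a rough illustration of what the repo_summary schema enforces, this Go sketch decodes a final answer into a struct that mirrors it, rejecting unknown keys (additionalProperties: false), wrong types, and missing required fields. It is a hand-written stand-in, not a JSON Schema validator and not how Jeju checks output; note that encoding/json matches keys case-insensitively, so it is slightly looser than the schema.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
)

// repoSummary mirrors the output.schema of the repo-inspector manifest.
// Pointer fields let us tell "missing" apart from "empty".
type repoSummary struct {
	Summary  *string   `json:"summary"`
	Findings *[]string `json:"findings"`
}

// checkRepoSummary enforces required fields, field types, and no extra keys.
func checkRepoSummary(raw []byte) (repoSummary, error) {
	var out repoSummary
	dec := json.NewDecoder(bytes.NewReader(raw))
	dec.DisallowUnknownFields() // additionalProperties: false
	if err := dec.Decode(&out); err != nil {
		return out, err // not an object, wrong field type, or unknown key
	}
	if out.Summary == nil || out.Findings == nil {
		return out, errors.New("summary and findings are required")
	}
	return out, nil
}

func main() {
	for _, raw := range []string{
		`{"summary":"example summary","findings":["finding one"]}`,
		`{"summary":"example summary","findings":[],"score":9}`,
	} {
		if _, err := checkRepoSummary([]byte(raw)); err != nil {
			fmt.Println("rejected:", err)
		} else {
			fmt.Println("accepted")
		}
	}
}
```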
Agent Package is the distribution layer for one reusable kind: Agent. A small
jeju.package.yaml manifest wraps an existing bundle so the agent can be packed,
added to a local content-addressed store, and run by a stable reference:
jeju package pack ./agents/code-review --out dist/
jeju package add dist/coding-code-review-0.1.0.jpkg
jeju run package://coding/code-review@0.1.0 "Review current diff."
jeju run p:coding/code-review "Review current diff."
The package layer only adds distribution metadata, source provenance,
digest-based storage, and stable refs; runs still take the normal
LoadFile -> Validate -> Compile -> Run path. Sources can be local artifacts,
GitHub or generic Git subdirectories, or jeju: registry refs. See
Agent Package for the manifest fields, source syntax,
and full command set.
Package-backed runs default to ~/.jeju/runs unless --runs-dir or
JEJU_RUNS_DIR is set, so invoking p:... from an arbitrary directory does not
create a local ./runs folder.
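The runs-directory rule reduces to a short precedence chain. In this Go sketch, the ordering between --runs-dir and JEJU_RUNS_DIR (flag first) is an assumption, since the text only says either one overrides the default; the ./runs fallback for manifest-based runs comes from the development notes below.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// resolveRunsDir picks where a run is recorded. Assumed precedence:
// explicit flag, then JEJU_RUNS_DIR, then a default that depends on
// whether the agent was invoked by package reference or by manifest.
func resolveRunsDir(flag, env, home string, packageRun bool) string {
	switch {
	case flag != "":
		return flag
	case env != "":
		return env
	case packageRun:
		return filepath.Join(home, ".jeju", "runs")
	default:
		return "runs" // manifest-based runs keep a local ./runs history
	}
}

func main() {
	home, err := os.UserHomeDir()
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(resolveRunsDir("", os.Getenv("JEJU_RUNS_DIR"), home, true))
}
```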
For work that needs several bounded perspectives but should stay inspectable,
kind: AgentTeam runs a lead-worker collaboration from a single goal. One lead
agent plans tasks across rounds; workers run as ordinary, isolated kind: Agent
runs; the controller records task state, child run references, and a final
synthesis. Workers never chat peer-to-peer — this is a bounded outer controller,
not a multi-agent platform.
jeju team run examples/code-review-team/teams/code-review.team.yaml "Review the current diff."
See Agent Team for the manifest shape and topology.
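The lead-worker shape can be sketched as a bounded loop in which only the lead sees worker results. In this Go sketch the Task, Lead, and Worker types and the runTeam function are hypothetical; in Jeju, each worker call would be an ordinary isolated kind: Agent run, and the lead's final synthesis over the recorded ledger is left out.

```go
package main

import "fmt"

// Task is one unit of work the lead assigns to a worker.
type Task struct {
	ID     string
	Goal   string
	State  string // "done" or "failed" once a worker has run it
	RunRef string // reference to the worker's own child run
	Result string
}

// Lead plans the next round from the goal and all finished tasks;
// returning no tasks ends the collaboration.
type Lead func(goal string, done []Task) []Task

// Worker runs one task in isolation: it sees only its own task.
type Worker func(t Task) (runRef, result string, err error)

// runTeam is a bounded outer controller: rounds are capped, workers never
// talk to each other, and every task's state and child run ref is recorded.
func runTeam(goal string, lead Lead, work Worker, maxRounds int) []Task {
	var ledger []Task
	for round := 0; round < maxRounds; round++ {
		planned := lead(goal, ledger)
		if len(planned) == 0 {
			break
		}
		for _, t := range planned {
			ref, res, err := work(t)
			t.RunRef, t.Result, t.State = ref, res, "done"
			if err != nil {
				t.State = "failed"
			}
			ledger = append(ledger, t)
		}
	}
	return ledger
}

func main() {
	lead := func(goal string, done []Task) []Task {
		if len(done) > 0 {
			return nil // one planning round is enough for this demo
		}
		return []Task{{ID: "t1", Goal: goal + ": correctness"}, {ID: "t2", Goal: goal + ": tests"}}
	}
	work := func(t Task) (string, string, error) {
		return "child-run-" + t.ID, "reviewed " + t.Goal, nil
	}
	for _, t := range runTeam("review diff", lead, work, 3) {
		fmt.Printf("%s %s %s\n", t.ID, t.State, t.RunRef)
	}
}
```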
jeju evolve improves a declarative agent without mutating the source in place.
An experiment points at a target agent, datasets, an objective metric, edit
boundaries, an evolver agent, and search limits; the loop is offline and
auditable, ending in a best-candidate bundle and report.
Two search strategies are built in: pareto (the default), a GEPA-style search over
an instance-wise Pareto frontier with a cheap mini-batch cascade gate, and
hillclimb, a greedy single-lineage search. Both were compared on a public
benchmark: on HotpotQA (distractor) with DeepSeek V4 Flash, evolving only the
solver prompt with pareto improved the held-out test split by +3.6pp answer F1
and +7pp exact match over the unevolved baseline, while hillclimb under the same
budget managed only +1.7pp F1 / +4pp EM before stalling in a local optimum.
jeju evolve --dry-run experiments/evolve.yaml # validate the experiment
jeju evolve --baseline-only experiments/evolve.yaml # score the baseline
jeju evolve experiments/evolve.yaml # run the search
jeju evolve --test experiments/evolve.yaml # evaluate on the test split
See Evolution Manifest, Self Evolution, and the HotpotQA evolve benchmark for the schema, design notes, and full study.
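Instance-wise Pareto selection keeps every candidate that is best on at least one training instance, instead of only the best on average. This Go sketch illustrates that idea from GEPA next to the single-best choice a hill-climber makes; it is not Jeju's implementation, it omits the mini-batch cascade gate, and the score matrix is made up.

```go
package main

import (
	"fmt"
	"math"
)

// paretoFront returns, for each candidate that is best (ties included) on at
// least one instance, how many instances it wins. scores[c][i] is candidate
// c's score on instance i; all rows must have the same length.
func paretoFront(scores [][]float64) map[int]int {
	wins := map[int]int{}
	if len(scores) == 0 {
		return wins
	}
	for i := range scores[0] {
		best := math.Inf(-1)
		for c := range scores {
			best = math.Max(best, scores[c][i])
		}
		for c := range scores {
			if scores[c][i] == best {
				wins[c]++
			}
		}
	}
	return wins
}

// bestMean is what a greedy hill-climber keeps: the single highest-average candidate.
func bestMean(scores [][]float64) int {
	best, bestAvg := 0, math.Inf(-1)
	for c, row := range scores {
		sum := 0.0
		for _, s := range row {
			sum += s
		}
		if avg := sum / float64(len(row)); avg > bestAvg {
			best, bestAvg = c, avg
		}
	}
	return best
}

func main() {
	scores := [][]float64{
		{0.9, 0.2, 0.5}, // candidate 0: strong on instance 0 only
		{0.6, 0.8, 0.5}, // candidate 1: best on average
		{0.5, 0.5, 0.4}, // candidate 2: dominated by candidate 1
	}
	fmt.Println("pareto front (candidate -> instances won):", paretoFront(scores))
	fmt.Println("hill-climb keeps candidate", bestMean(scores))
}
```

With these scores, candidate 0 stays on the frontier because it wins instance 0 outright, while a hill-climber would keep only candidate 1. Retaining such specialists preserves variation that greedy single-lineage search throws away.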
Runnable example agent bundles live under examples — not test fixtures, but recommended scenarios showing declarative behavior, explicit boundaries, run evidence, and evaluation or evolution where useful:
- Bug rescue agent
- Code review agent
- Code review team
- Commit plan agent
- HotpotQA evolve benchmark
- Privacy delegation agent
- SkillsBench Lite agent
Reference docs:
- Agent Manifest · Trajectory Format · Trajectory Visualization
- Agent Package · Agent Team
- Evolution Manifest · Self Evolution
- Manual For Agents · DeepSeek Setup
go test ./... # run the test suite
go vet ./... # static checks for runtime/compiler/tooling changes
make test-agent # mock fixture agent, end to end, no credentials
Provider-backed smoke runs are opt-in and may call real model APIs:
export DEEPSEEK_API_KEY=sk-...
make test-agent PROVIDER=deepseek
export MIMO_API_KEY=sk-...
make test-agent PROVIDER=mimo
For source-checkout development, keep generated runs out of the repo-root runs/
directory by pointing --runs-dir at .jeju-dev/, for example
jeju run --runs-dir .jeju-dev/runs/<scenario> <agent.yaml> "<task>". Inside a
generated user agent project, the default ./runs store remains the normal
local run history for manifest-based runs.
Jeju is released under the MIT License. See LICENSE for details.