DongXi Agent is a local coding-agent harness for learning how tools, context, permissions, file state, and turn lifecycle state fit together. It runs against Ollama, OpenAI, or Anthropic models, stores session state inside the target workspace, and grows one tested harness boundary at a time.
This is not a production agent. It is intentionally small enough to inspect, modify, and compare with larger systems such as Codex and Claude Code.
DongXi Agent now has a complete learning-sized runtime loop:
raw input
-> structured app message and attachments
-> submit hooks and shouldQuery decision
-> typed prompt payload with transcript, hidden context, file-state attachments, and loop metadata
-> configured model-provider call
-> parsed final/retry/tool output
-> validated tool execution through policy, sandbox, approval, and hooks
-> normalized ToolResult
-> Phase 5 turn commit and persisted session history
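The "parsed final/retry/tool output" step in the flow above can be illustrated with a small classifier. This is only a sketch: the `<final>` tag appears in the examples in this README, but the `<tool>` tag, the JSON call shape, and the `ParsedOutput` name are assumptions made for illustration and may not match what `dongxi_agent` actually parses.

```python
import json
import re
from dataclasses import dataclass, field

FINAL_RE = re.compile(r"<final>(.*?)</final>", re.DOTALL)
TOOL_RE = re.compile(r"<tool>(.*?)</tool>", re.DOTALL)  # hypothetical tag


@dataclass
class ParsedOutput:
    kind: str  # "final", "tool", or "retry"
    text: str = ""
    tool_call: dict = field(default_factory=dict)


def parse_model_output(raw: str) -> ParsedOutput:
    """Classify one raw model reply as final, tool, or retry."""
    tool = TOOL_RE.search(raw)
    if tool:
        try:
            call = json.loads(tool.group(1))
        except json.JSONDecodeError:
            return ParsedOutput(kind="retry", text="tool call was not valid JSON")
        if isinstance(call, dict) and "name" in call:
            return ParsedOutput(kind="tool", tool_call=call)
        return ParsedOutput(kind="retry", text="tool call needs a name")
    final = FINAL_RE.search(raw)
    if final:
        return ParsedOutput(kind="final", text=final.group(1).strip())
    # Neither block found: ask the model to try again.
    return ParsedOutput(kind="retry", text="no <final> or <tool> block found")
```

Returning a `retry` result instead of raising keeps malformed output inside the loop, where it can be counted against the step budget.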
The current implementation includes:
- Ollama, OpenAI, and Anthropic-backed interactive and one-shot CLI.
- Codex-style workspace selection with `-C/--cd`.
- Structured app input with text, file/image attachments, inferred file references, and `user_prompt_submit` hooks.
- Typed prompt payloads with visible transcript sections and hidden context sections.
- A documented `DongXiAgent` mixin chain and top-level session schema validation.
- Turn lifecycle metadata for Phase 2 prompt snapshots, Phase 4 model outputs, Phase 5 tool execution summaries, and final state commits.
- Structured tool registry metadata for source, risk, state mutation, file reads, approval, sandboxing, and parallel-safety hints.
- Workspace-safe read, search, file-info, symbol, diff, edit, and shell tools.
- File-state tracking for read baselines, stale reads, external disk changes, and changed-file attachment snippets.
- Human-in-the-loop permission requests, permission profiles, shell policy, and a small workspace shell sandbox guard.
- Freshness-checked file edits with bounded diff previews and optional post-edit validation.
- Local slash commands for audit, doctor, context, usage, traces, tasks, turns, artifacts, subagents, model selection, and remote-control status.
- Learning-sized skills, hooks, MCP metadata, plugin metadata, goals, notifications, task state, artifacts, and read-only delegation.
- Loopback or LAN remote-control mode with bearer-token auth and pairing-code share mode.
- Unit tests plus optional `dongxi-qwen3-4b-q4km` workflow integration tests.
The interactive welcome screen shows the active approval mode and shell sandbox mode so users see the safety posture before the first turn.
Install dependencies for development:
uv sync --group dev
Install the CLI globally from this checkout:
uv tool install .
Run DongXi in the current repository:
dongxi
Run a one-shot prompt in another workspace:
dongxi -C /path/to/repo "Inspect this repo and summarize the harness."
Run without global installation while developing this repo:
uv run dongxi-agent "Inspect this repo and summarize the harness."
DongXi uses `--provider ollama` by default and expects Ollama at http://127.0.0.1:11434. If `--model` is omitted for Ollama, it asks Ollama for local models and chooses a preferred available model. Resumed sessions reuse their saved provider and model unless `--provider` or `--model` is supplied.
OpenAI and Anthropic providers use the standard library HTTP client; no SDK dependency is required.
export OPENAI_API_KEY=...
uv run dongxi-agent --provider openai --model gpt-5.4-mini "Reply with <final>hello</final>."
export ANTHROPIC_API_KEY=...
uv run dongxi-agent --provider anthropic --model claude-sonnet-4-6 "Reply with <final>hello</final>."
Provider defaults:
- `--provider openai`: `OPENAI_MODEL` or `gpt-5.4-mini`, using `OPENAI_BASE_URL` or `https://api.openai.com/v1`.
- `--provider anthropic`: `ANTHROPIC_MODEL` or `claude-sonnet-4-6`, using `ANTHROPIC_BASE_URL` or `https://api.anthropic.com/v1`.
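The provider defaults can be read as a small lookup with environment overrides. The sketch below is illustrative only: the `resolve_provider` name and the precedence of an explicit flag over the environment variable are assumptions, while the variable names and fallback values come from the defaults listed above.

```python
import os
from typing import Mapping, Optional

# (model env var, model fallback, base-URL env var, base-URL fallback)
PROVIDER_DEFAULTS = {
    "openai": ("OPENAI_MODEL", "gpt-5.4-mini",
               "OPENAI_BASE_URL", "https://api.openai.com/v1"),
    "anthropic": ("ANTHROPIC_MODEL", "claude-sonnet-4-6",
                  "ANTHROPIC_BASE_URL", "https://api.anthropic.com/v1"),
}


def resolve_provider(
    provider: str,
    model: Optional[str] = None,
    base_url: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> dict:
    """Pick model and base URL: explicit value, then env var, then fallback."""
    env = os.environ if env is None else env
    model_var, model_default, url_var, url_default = PROVIDER_DEFAULTS[provider]
    return {
        "provider": provider,
        "model": model or env.get(model_var) or model_default,
        "base_url": base_url or env.get(url_var) or url_default,
    }
```

Passing `env` explicitly keeps the function testable without touching the real process environment.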
The /model <name> command can switch OpenAI and Anthropic sessions to any model name available to the configured API key. DongXi does not list remote provider catalogs locally.
The learning workflow uses an Ollama model named dongxi-qwen3-4b-q4km.
Create it from the official Qwen3 4B GGUF helper:
scripts/pull_qwen3_4b_gguf.sh
Then run:
uv run dongxi-agent --model dongxi-qwen3-4b-q4km "Reply with <final>hello</final>."
The helper downloads Qwen/Qwen3-4B-GGUF Q4_K_M into the ignored local `models/` directory and creates the Ollama model.
Useful model commands inside DongXi:
/model
/model dongxi-qwen3-4b-q4km
dongxi [prompt...]
  -C, --cd, --cwd <path>        workspace directory
  --attach <path>               attach a file or image; repeatable
  --provider <name>             ollama, openai, or anthropic
  --model <name>                model name
  --host <url>                  Ollama host
  --openai-base-url <url>       OpenAI API base URL
  --anthropic-base-url <url>    Anthropic API base URL
  --api-timeout <seconds>       OpenAI/Anthropic request timeout
  --resume <session|latest>     resume a saved session
  --approval ask|auto|never     risky-tool approval policy
  --sandbox workspace-write|off shell sandbox guard
  --max-steps <n>               max model/tool iterations per request
  --max-new-tokens <n>          max model output tokens per step
Example with attachments:
uv run dongxi-agent \
  --attach screenshot.png \
  --attach dongxi_agent/prompting.py \
  "The bug is shown in the image; update the prompt file."
Image attachments are sent as image payloads to the configured provider, so use a vision-capable model when visual content matters. OpenAI and Anthropic API image payloads currently assume PNG data URLs/base64 blocks because the model-client boundary receives only base64 image bytes. File attachments and fuzzy filename references are represented as explicit paths; DongXi prefers tool reads over flooding the prompt with raw file content.
DongXi stores sessions under the workspace root:
.dongxi-agent/sessions/
Resume the latest session:
uv run dongxi-agent --resume latest
Show the current session file:
/session
Reset the current session history and memory:
/reset
The top-level session shape is documented in dongxi_agent/session_state.py. /doctor checks the live session against that schema, while /audit reports missing or invalid sections as deterministic findings.
Slash commands are handled locally before any model call. Unknown slash commands are rejected locally with help text instead of being sent to the model.
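The local-first routing described above can be sketched as a small dispatcher. The `route_input` function, the handler signature, and the returned dictionary shape are hypothetical; the real router lives in `dongxi_agent/input_router.py` and may be structured differently.

```python
from typing import Callable, Dict

Handler = Callable[[str], str]


def route_input(raw: str, commands: Dict[str, Handler]) -> dict:
    """Route slash commands to local handlers; everything else goes to the model."""
    text = raw.strip()
    if not text.startswith("/"):
        return {"route": "model", "text": text}
    name, _, args = text.partition(" ")
    handler = commands.get(name)
    if handler is None:
        # Unknown slash commands never reach the model; reply with help text.
        known = ", ".join(sorted(commands))
        return {"route": "local",
                "output": f"Unknown command {name}. Known commands: {known}"}
    return {"route": "local", "output": handler(args.strip())}
```

Rejecting unknown commands locally avoids spending a model call on a typo such as `/staus`.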
High-use commands:
| Command | Purpose |
|---|---|
| `/help` | Show command help. |
| `/status` | Show workspace, branch, provider, model, session, approval, validation, and state counts. |
| `/tools` | Show tool registry metadata and approval risk. |
| `/files` | Fuzzy-find workspace files. |
| `/symbols` | Show top-level Python symbols for one file. |
| `/diff` | Show the current git diff. |
| `/review` | Run a deterministic local pre-review over changed files. |
| `/test` | Show, set, run, enable, disable, or clear post-edit validation. |
| `/context` | Show context projection and compaction state. |
| `/usage` | Show prompt/context usage estimates. |
| `/traces` | Show recent turn and tool timing traces. |
| `/turns` | Show active and recent turn lifecycle state. |
| `/permissions` | Show permission decisions and session approvals. |
| `/permission-profile` | Show or select `standard`, `read-only`, or `local-dev`. |
| `/audit` | Check session and harness invariants. |
| `/doctor` | Check runtime, config, sandbox, context pressure, and audit health. |
| `/goal` | Show, set, pause, resume, complete, or clear the session goal. |
| `/tasks` | Show or update plan/background task state. |
| `/subagents` | Show bounded child-agent sessions and result previews. |
| `/remote` | Show remote-control status. |
| `/notifications` | Show pending and recent in-app notifications. |
| `/artifacts` | Show stored large tool-result artifacts. |
| `/worktree` | Show git worktree status and isolated-root suggestions. |
| `/memory` | Show distilled working memory. |
| `/exit` | Exit the interactive session. |
Additional capability commands include /skills, /plugins, /mcp, /hooks, /watch, and /model.
Tools are declared as ToolDefinition objects in dongxi_agent/tool_registry.py. The registry separates model-facing fields from harness-facing policy metadata:
- `source`: builtin, skill, goal, task, workspace, git, plugin, MCP, or agent.
- `visibility`: whether the tool is always visible or deferred.
- `requires_approval`: whether a risky action needs permission.
- `reads_files`, `mutates_files`, `mutates_session`, `sandboxed`, `supports_parallel`.
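The split between model-facing fields and harness-facing policy metadata might look like the stand-in dataclass below. The policy field names come from the list above; the `name` and `description` fields, the default values, and the `model_view` method are assumptions, and the real `ToolDefinition` in `dongxi_agent/tool_registry.py` may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolDefinition:
    # Model-facing fields (names assumed for illustration).
    name: str
    description: str
    # Harness-facing policy metadata, never shown to the model.
    source: str = "builtin"
    visibility: str = "always"  # or "deferred"
    requires_approval: bool = False
    reads_files: bool = False
    mutates_files: bool = False
    mutates_session: bool = False
    sandboxed: bool = False
    supports_parallel: bool = False

    def model_view(self) -> dict:
        """Return only what the prompt needs to describe the tool."""
        return {"name": self.name, "description": self.description}


READ_FILE = ToolDefinition("read_file", "Read a workspace file.",
                           source="workspace", reads_files=True,
                           supports_parallel=True)
RUN_SHELL = ToolDefinition("run_shell", "Run a shell command.",
                           requires_approval=True, sandboxed=True)
```

Keeping policy flags out of `model_view()` means the model cannot see or argue with the approval rules that govern its own tool calls.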
Default tool groups:
- Workspace reads: `list_files`, `read_file`, `search`, `find_file`, `file_info`, `symbols`, `git_diff`, `worktree_status`.
- Execution and edits: `run_shell`, `write_file`, `patch_file`.
- App state: `get_goal`, `update_goal`, `task_list`, `task_update`, `watch_file`, `workspace_events`, `plugin_list`.
- Capability metadata: `load_skill`, `mcp_list_tools`, `mcp_call_tool`.
- Delegation: `delegate`, `subagent_status`.
Every model-proposed tool call gets a stable call_id. The transcript records a paired tool_call item and tool result item. The result is a structured ToolResult with:
- `ok`
- `kind`
- model-facing `content`
- optional failure `message`
- optional `metadata.error` with a stable failure `code` and `message`
- optional metadata such as shell policy, approval, edit, validation, artifact, or result size
Large tool outputs are stored under .dongxi-agent/artifacts/ with a prompt-sized preview and artifact reference.
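A minimal sketch of the result shape and the artifact spill-over follows. The field names mirror the list above; the `PREVIEW_CHARS` threshold, the `failure` and `store_if_large` helpers, the artifact file naming, and the `artifact` and `result_size` metadata keys are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional

PREVIEW_CHARS = 200  # hypothetical prompt-sized preview limit


@dataclass
class ToolResult:
    ok: bool
    kind: str
    content: str
    message: Optional[str] = None
    metadata: dict = field(default_factory=dict)


def failure(kind: str, code: str, message: str) -> ToolResult:
    """Build a failed result with a stable error code under metadata.error."""
    return ToolResult(ok=False, kind=kind, content="", message=message,
                      metadata={"error": {"code": code, "message": message}})


def store_if_large(result: ToolResult, call_id: str, artifacts_dir: Path) -> ToolResult:
    """Write oversized content to disk and return a preview plus a reference."""
    if len(result.content) <= PREVIEW_CHARS:
        return result
    artifacts_dir.mkdir(parents=True, exist_ok=True)
    path = artifacts_dir / f"{call_id}.txt"
    path.write_text(result.content, encoding="utf-8")
    meta = {**result.metadata, "artifact": str(path),
            "result_size": len(result.content)}
    return ToolResult(result.ok, result.kind, result.content[:PREVIEW_CHARS],
                      result.message, meta)
```

Keying the artifact on the `call_id` lets the transcript's paired tool-call and tool-result items point at the same stored output.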
DongXi has a layered, learning-sized safety model:
- Tool argument validation happens before approval or execution.
- Workspace paths are normalized and checked for parent-directory escapes, symlink escapes, and null bytes.
- Protected metadata directories such as `.git`, `.dongxi-agent`, `.dongxi`, `.agents`, and `.codex` are blocked for file edits.
- `run_shell` is classified by shell policy as `allow`, `prompt`, or `forbidden`.
- The shell sandbox guard rejects known network commands and obvious outside-workspace paths in `workspace-write` mode.
- Risky edits and risky shell commands go through permission decisions.
- File edits require a fresh full-file baseline for existing files.
- Optional post-edit validation runs after successful edits when policy allows it.
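The workspace path checks in the list above (parent-directory escapes, symlink escapes, null bytes, and protected directories) can be sketched with `pathlib`. The `resolve_workspace_path` function and `PathRejected` exception are hypothetical names; the real checks live in `dongxi_agent/workspace_paths.py` and may be stricter.

```python
from pathlib import Path

PROTECTED_DIRS = {".git", ".dongxi-agent", ".dongxi", ".agents", ".codex"}


class PathRejected(ValueError):
    """Raised when a requested path fails a workspace safety check."""


def resolve_workspace_path(workspace: Path, requested: str,
                           for_edit: bool = False) -> Path:
    if "\x00" in requested:
        raise PathRejected("null byte in path")
    root = workspace.resolve()
    # resolve() collapses ".." and follows symlinks, so both kinds of
    # escape show up as a path that is no longer under the root.
    candidate = (root / requested).resolve()
    if candidate != root and root not in candidate.parents:
        raise PathRejected(f"path escapes workspace: {requested}")
    if for_edit and PROTECTED_DIRS.intersection(candidate.relative_to(root).parts):
        raise PathRejected(f"protected directory: {requested}")
    return candidate
```

Resolving before comparing matters: a purely textual prefix check would miss a symlink inside the workspace that points outside it.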
Approval modes:
- `--approval ask`: prompt before risky actions.
- `--approval auto`: allow risky actions after policy checks; use only in trusted throwaway repos.
- `--approval never`: deny risky actions.
When --approval ask prompts, answer y to allow once, s to allow the same conservative cache key for the current session, or press enter to deny.
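The three approval modes and the `y`/`s`/enter answers can be sketched as one decision function. The `decide` name, the prompt text, and the shape of the cache key are assumptions; only the modes and answer meanings come from the description above.

```python
from typing import Callable, Set


def decide(mode: str, cache_key: str, session_approvals: Set[str],
           ask: Callable[[str], str]) -> bool:
    """Return True when a risky action may proceed."""
    if mode == "never":
        return False
    if mode == "auto":
        return True
    # mode == "ask": reuse an earlier session-wide approval if present.
    if cache_key in session_approvals:
        return True
    answer = ask(f"Allow {cache_key}? [y/s/N] ").strip().lower()
    if answer == "s":
        session_approvals.add(cache_key)  # remember for this session
        return True
    return answer == "y"  # plain enter (empty answer) denies
```

In an interactive session `ask` could simply be the built-in `input`; injecting it as a parameter keeps the decision testable.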
Permission profiles:
- `standard`: approval policy and session approvals decide risky tools.
- `read-only`: deny workspace writes and risky shell execution.
- `local-dev`: auto-allow workspace writes after freshness checks while keeping risky shell prompts.
Shell sandbox modes:
- `workspace-write`: default guard for shell commands.
- `off`: disables the sandbox guard; shell policy still runs.
This is not an OS sandbox. It is a harness boundary for learning tool validation, shell policy, and human-in-the-loop permissions.
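To make the "harness boundary" concrete, here is a rough sketch of a shell policy classifier and a sandbox guard working as two separate checks. The command sets are invented for illustration and are far smaller than any real policy; the actual rules live in `dongxi_agent/shell_policy.py` and `dongxi_agent/sandbox.py`.

```python
import shlex
from typing import List

# Hypothetical command sets; the real policy is likely more detailed.
FORBIDDEN_COMMANDS = {"sudo", "mkfs", "shutdown"}
READ_ONLY_COMMANDS = {"ls", "cat", "pwd", "grep"}
NETWORK_COMMANDS = {"curl", "wget", "ssh", "scp", "nc"}


def _argv(command: str) -> List[str]:
    try:
        return shlex.split(command)
    except ValueError:  # e.g. unbalanced quotes
        return []


def classify_shell(command: str) -> str:
    """Shell policy: return "allow", "prompt", or "forbidden"."""
    argv = _argv(command)
    if not argv or argv[0] in FORBIDDEN_COMMANDS:
        return "forbidden"
    if argv[0] in READ_ONLY_COMMANDS:
        return "allow"
    return "prompt"


def sandbox_allows(command: str, mode: str = "workspace-write") -> bool:
    """Sandbox guard: reject network commands and obvious outside paths."""
    if mode == "off":
        return True
    argv = _argv(command)
    if not argv or argv[0] in NETWORK_COMMANDS:
        return False
    for arg in argv[1:]:
        if arg.startswith(("/", "~")) or ".." in arg.split("/"):
            return False
    return True
```

A guard like this only inspects the command text, which is why it cannot stand in for an OS sandbox: a script inside the workspace can still do anything the user can.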
DongXi keeps durable session history separate from the model-visible prompt projection.
Prompt construction records:
- `messages_for_query`: transcript and current request sections visible as query context.
- `hidden_context`: prefix rules, memory, goal, turn state, tasks, skills, MCP metadata, file state, workspace events, validation, artifacts, plugin metadata, and workspace docs.
- `context_projection_steps`: ordered metadata for normalization, working-slice choice, reductions, and final transcript projection.
- `loop_state`: message counts, turn counts, active turn id, and tool-use context summary.
- `memory_mode`: transcript memory, stable hidden context, and live file-state memory channels.
- `file_state_attachments`: typed `edited_text_file` snippets for files changed on disk since DongXi last read them.
File state tracks:
- Recent reads with line range, total lines, mtime, size, hash, bounded content baseline, and full/partial status.
- Files changed by DongXi edits.
- Stale reads that should be refreshed before relying on old transcript content.
- External disk changes with bounded snippets until the file is read or edited again.
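A read baseline and staleness check along these lines could look like the sketch below. The `ReadBaseline` fields are a subset of what the list above describes, and the `record_read`, `is_stale`, and `can_edit` helpers are hypothetical; the real tracking lives in `dongxi_agent/file_state.py` and `dongxi_agent/edit_policy.py`.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ReadBaseline:
    path: str
    mtime_ns: int
    size: int
    sha256: str
    full_read: bool


def record_read(path: Path, full_read: bool = True) -> ReadBaseline:
    """Snapshot a file at read time so later edits can check freshness."""
    data = path.read_bytes()
    stat = path.stat()
    return ReadBaseline(str(path), stat.st_mtime_ns, stat.st_size,
                        hashlib.sha256(data).hexdigest(), full_read)


def is_stale(baseline: ReadBaseline) -> bool:
    path = Path(baseline.path)
    if not path.exists():
        return True
    stat = path.stat()
    if stat.st_mtime_ns == baseline.mtime_ns and stat.st_size == baseline.size:
        return False  # cheap check: nothing appears to have changed
    # Metadata moved; confirm with a content hash before calling it stale.
    return hashlib.sha256(path.read_bytes()).hexdigest() != baseline.sha256


def can_edit(baseline: ReadBaseline) -> bool:
    """Edits to existing files need a fresh, full-file baseline."""
    return baseline.full_read and not is_stale(baseline)
```

Falling back to the hash only when mtime or size differ keeps the common case cheap while avoiding false alarms from a bare `touch`.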
Small sessions use a linear transcript with old read deduplication. Long sessions auto-compact older history into a bounded summary while preserving recent events.
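The compaction idea can be shown with a toy function that folds older events into one bounded summary line and keeps the most recent events verbatim. Everything here is an assumption made for illustration, including the thresholds and the truncation-based "summary"; how DongXi actually builds its summary is not described in this README.

```python
from typing import List


def compact(events: List[str], max_events: int = 6, keep_recent: int = 3,
            summary_chars: int = 120) -> List[str]:
    """Replace older events with one bounded summary; keep recent ones."""
    if len(events) <= max_events:
        return list(events)  # small session: plain linear transcript
    older, recent = events[:-keep_recent], events[-keep_recent:]
    joined = " | ".join(older)
    summary = f"[summary of {len(older)} earlier events] {joined[:summary_chars]}"
    return [summary] + recent
```

The summary is capped at `summary_chars`, so the projected transcript stays bounded no matter how long the durable session history grows.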
Recent flowchart-review work made the runtime phases inspectable:
- Phase 1 input: `PreparedInput` carries app messages, attachments, submit-hook context, `should_query`, and stop reasons.
- Phase 2 context: `PromptPayload` exposes sections, hidden/query split, file-state attachments, loop state, projection steps, and memory mode.
- Phase 4 generation: each model attempt records parsed output metadata before branching into final, retry, or tool handling.
- Phase 5 validation: each `ToolResult` records a compact execution summary with failed stage, validation status, execution status, permission/shell decisions when present, and normalized result size.
- Phase 5 commit: each turn records committed history counts, assistant/tool counts, model-output counts, tool-execution counts, file-state attachment counts, context projection counts, and final status.
Use these commands to inspect that state:
/turns
/context
/usage
/traces
/audit
DongXi keeps lightweight observability outside the model transcript:
- `/usage` reports prompt and context-size estimates.
- `/traces` reports recent turn/tool timing events.
- `/hooks` reports lifecycle hook events such as `on_user_message`, `before_tool`, `after_tool`, `on_skill_loaded`, and `on_mcp_tool`.
`.dongxi/hooks.json` can configure safe command hooks. For example:
{
"hooks": {
"user_prompt_submit": [
{ "command": "printf 'extra context from hook'" }
]
}
}
Hook commands go through shell policy and sandbox checks. Commands that are forbidden or require approval are skipped instead of prompting during hook execution.
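Reading that configuration and applying the "skip instead of prompt" rule might look like the following sketch. The `load_hook_commands` and `runnable_hooks` names are hypothetical, and the `classify` callable stands in for whatever shell policy check the harness uses.

```python
import json
from typing import Callable, List


def load_hook_commands(config_text: str, event: str) -> List[str]:
    """Return the command strings configured for one hook event."""
    config = json.loads(config_text)
    entries = config.get("hooks", {}).get(event, [])
    return [entry["command"] for entry in entries
            if isinstance(entry, dict) and isinstance(entry.get("command"), str)]


def runnable_hooks(commands: List[str],
                   classify: Callable[[str], str]) -> List[str]:
    """Hooks never prompt: keep only commands the policy allows outright."""
    return [command for command in commands if classify(command) == "allow"]
```

Skipping rather than prompting keeps hook execution non-interactive, so a hook can never block a turn while waiting for a human answer.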
DongXiAgent still uses mixins, but the expected chain is now explicit in dongxi_agent/agent_architecture.py. Each entry documents the state surface it owns and any notable dependencies on other mixins. /doctor verifies that the live class MRO still matches that documented chain.
The current design remains intentionally pragmatic:
- Mixins are the learning boundary for now.
- Shared mutable state is constrained by a documented top-level session schema.
- Feature-specific `ensure_*_state()` methods normalize their own nested state.
- `/audit` and `/doctor` provide deterministic checks instead of relying on prompt text or manual inspection.
Configure a validation command:
/test set uv run pytest -q
Useful commands:
- `/test`: show configured validation state and latest result.
- `/test set <command>`: configure and enable post-edit validation.
- `/test run`: run the configured command immediately.
- `/test off` and `/test on`: disable or enable automatic post-edit validation.
- `/test clear`: remove the command.
Validation commands go through shell policy and sandbox preflight. Commands that execute project code, such as python, pytest, or uv run pytest, run automatically only when --approval auto is active; otherwise DongXi records a skipped validation result.
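The gating rule for project-code validation commands can be sketched as a single check. The `PROJECT_CODE_RUNNERS` set, the `validation_decision` name, and the returned dictionary are assumptions; a result of `eligible` here only means the command passed this one gate, not the shell policy or sandbox preflight.

```python
import shlex

# Hypothetical set of launchers that execute project code.
PROJECT_CODE_RUNNERS = {"python", "pytest", "uv"}


def validation_decision(command: str, approval_mode: str) -> dict:
    """Skip automatic validation that runs project code unless approval is auto."""
    argv = shlex.split(command)
    runs_project_code = bool(argv) and argv[0] in PROJECT_CODE_RUNNERS
    if runs_project_code and approval_mode != "auto":
        return {"status": "skipped",
                "reason": "runs project code and approval mode is not auto"}
    return {"status": "eligible", "argv": argv}
```

Recording a `skipped` result, rather than silently doing nothing, leaves a trace in session state that explains why no validation output appears after an edit.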
The latest validation result is stored in session state and shown in the prompt as bounded context.
DongXi includes a small local remote-control transport. It uses the same input router as the CLI, loopback HTTP by default, bearer-token auth, bounded JSON bodies, and a simple browser page.
Start local remote control:
uv run dongxi-agent --remote-control
Start LAN share mode:
uv run dongxi-agent --remote-control-share
Share mode binds to the LAN and prints a plain URL plus a short pairing code. The browser exchanges that code for an in-memory bearer token.
Use a fixed token during local development:
uv run dongxi-agent --remote-control --remote-control-token dev-token
Check status:
curl -sS http://127.0.0.1:8765/status \
  -H "Authorization: Bearer dev-token"
Submit a turn:
curl -sS http://127.0.0.1:8765/turn \
-H "Authorization: Bearer dev-token" \
-H "Content-Type: application/json" \
  -d '{"message":"/cwd"}'
Endpoints:
- `GET /health`: unauthenticated health check.
- `POST /pair`: unauthenticated pairing-code exchange in share mode.
- `GET /status`: authenticated status.
- `POST /turn`: authenticated message submission, including slash commands.
- `POST /rpc`: authenticated JSON-RPC-style wrapper with `remoteControl/status` and `turn/start`.
Remote-control attachment paths are workspace-scoped. Paths that escape the workspace are rejected before the model turn starts.
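The same `/turn` call can be made from Python with only the standard library. This is a sketch mirroring the curl example above: the `build_turn_request` and `send` helper names are hypothetical, and treating the response body as JSON is an assumption about the server.

```python
import json
import urllib.request


def build_turn_request(base_url: str, token: str,
                       message: str) -> urllib.request.Request:
    """Build an authenticated POST /turn request without sending it."""
    body = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/turn",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def send(request: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send the request; assumes the server replies with a JSON object."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With remote control running locally and the fixed development token, `send(build_turn_request("http://127.0.0.1:8765", "dev-token", "/cwd"))` would submit the same slash command as the curl example.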
This repo is the implementation target for the learning plan in:
learning_artifact/coding-agent-learning-plan.md
Important artifact directories:
- `learning_artifact/`: study notes, concept maps, questions, and flowchart-pass notes.
- `learning_artifact/daily-notes/`: the original 14-day learning notes.
- `code-architecture/`: architecture snapshots and the 2026-05-17 flowchart-review cycle.
- `ROADMAP.md`: phase-by-phase implementation plan.
The latest architecture review is:
code-architecture/flowchart-review-cycle-2026-05-17.md
It maps the current code against the flowcharts in ../flow/flow.md and records the gap closed in each pass.
Run the unit and workflow test suite:
uv run pytest -q
Run lint:
uv run ruff check .
Run only the Qwen/Ollama workflow integration tests:
uv run pytest tests/test_workflow_integration.py -q
Those tests skip if Ollama or `dongxi-qwen3-4b-q4km` is unavailable.
dongxi_agent/
  agent.py               main model/tool/final loop
  agent_architecture.py  documented mixin chain and dependency notes
  cli.py                 command-line interface
  models.py              Ollama, OpenAI, Anthropic, and fake model clients
  prompting.py           prompt payload construction
  context.py             transcript projection and compaction metadata
  loop_state.py          loop and memory-mode snapshots
  turns.py               turn lifecycle, model output, tool execution, and commit metadata
  tools.py               tool validation, policy, dispatch, and implementations
  tool_registry.py       ToolDefinition metadata and default registry
  tool_results.py        structured tool success/failure objects
  file_state.py          read baselines and external-change attachments
  edit_policy.py         freshness checks and diff previews
  shell_policy.py        shell allow/prompt/forbidden classifier
  sandbox.py             workspace shell sandbox guard
  session_state.py       documented top-level session schema
  permissions.py         permission requests, profiles, and session approvals
  validation.py          post-edit validation state
  hooks.py               lifecycle and submit hooks
  goals.py               session goal state
  tasks.py               tasks and background shell task state
  subagents.py           bounded read-only child sessions
  artifacts.py           large tool-result artifact storage
  observability.py       prompt usage and trace events
  remote_control.py      local HTTP remote-control transport
  workspace.py           live repository snapshot
  workspace_tools.py     file, symbol, diff, review, and worktree helpers
  workspace_paths.py     workspace path safety
  workspace_events.py    watched file events
  skills.py              SKILL.md discovery and loading
  plugins.py             local plugin marketplace metadata
  mcp.py                 MCP-style metadata and mock/safe command calls
  input_messages.py      app-level message and attachment shaping
  input_router.py        CLI slash/local/model routing
  audit.py               deterministic harness invariant checks
  doctor.py              runtime/config diagnostics
Tests live in:
tests/test_dongxi_agent.py
tests/test_workflow_integration.py
- No cloud enrollment, WebSocket reconnect, ack/replay, or multi-client stream tracking for remote control.
- No real OS sandbox beyond the workspace shell guard.
- No concurrent tool execution yet; tool execution metadata currently records `dispatch_mode: serial`.
- The main agent still uses a broad mixin chain rather than full composition.
- Nested session sections are normalized by their owning mixins rather than represented as one fully typed session object.
- No production MCP client; MCP support is metadata-first with mock results or safe command runtime.
- No worktree-isolated write workers or parallel subagent management.
- No full Codex or Claude Code parity. Those systems are architecture references, not implementation targets.
DongXi Agent is licensed under the Apache License 2.0. See LICENSE.
DongXi starts from the mini-coding-agent teaching implementation. Codex and Claude Code are architecture references only.