DongXi Agent

DongXi Agent is a local coding-agent harness for learning how tools, context, permissions, file state, and turn lifecycle state fit together. It runs against Ollama, OpenAI, or Anthropic models, stores session state inside the target workspace, and grows one tested harness boundary at a time.

This is not a production agent. It is intentionally small enough to inspect, modify, and compare with larger systems such as Codex and Claude Code.

Current Status

DongXi Agent now has a complete learning-sized runtime loop:

raw input
  -> structured app message and attachments
  -> submit hooks and shouldQuery decision
  -> typed prompt payload with transcript, hidden context, file-state attachments, and loop metadata
  -> configured model-provider call
  -> parsed final/retry/tool output
  -> validated tool execution through policy, sandbox, approval, and hooks
  -> normalized ToolResult
  -> Phase 5 turn commit and persisted session history

The current implementation includes:

  • Ollama, OpenAI, and Anthropic-backed interactive and one-shot CLI.
  • Codex-style workspace selection with -C/--cd.
  • Structured app input with text, file/image attachments, inferred file references, and user_prompt_submit hooks.
  • Typed prompt payloads with visible transcript sections and hidden context sections.
  • A documented DongXiAgent mixin chain and top-level session schema validation.
  • Turn lifecycle metadata for Phase 2 prompt snapshots, Phase 4 model outputs, Phase 5 tool execution summaries, and final state commits.
  • Structured tool registry metadata for source, risk, state mutation, file reads, approval, sandboxing, and parallel-safety hints.
  • Workspace-safe read, search, file-info, symbol, diff, edit, and shell tools.
  • File-state tracking for read baselines, stale reads, external disk changes, and changed-file attachment snippets.
  • Human-in-the-loop permission requests, permission profiles, shell policy, and a small workspace shell sandbox guard.
  • Freshness-checked file edits with bounded diff previews and optional post-edit validation.
  • Local slash commands for audit, doctor, context, usage, traces, tasks, turns, artifacts, subagents, model selection, and remote-control status.
  • Learning-sized skills, hooks, MCP metadata, plugin metadata, goals, notifications, task state, artifacts, and read-only delegation.
  • Loopback or LAN remote-control mode with bearer-token auth and pairing-code share mode.
  • Unit tests plus optional dongxi-qwen3-4b-q4km workflow integration tests.

The interactive welcome screen shows the active approval mode and shell sandbox mode so users see the safety posture before the first turn.

Quick Start

Install dependencies for development:

uv sync --group dev

Install the CLI globally from this checkout:

uv tool install .

Run DongXi in the current repository:

dongxi

Run a one-shot prompt in another workspace:

dongxi -C /path/to/repo "Inspect this repo and summarize the harness."

Run without global installation while developing this repo:

uv run dongxi-agent "Inspect this repo and summarize the harness."

DongXi uses --provider ollama by default and expects Ollama at http://127.0.0.1:11434. If --model is omitted for Ollama, it asks Ollama for local models and chooses a preferred available model. Resumed sessions reuse their saved provider and model unless --provider or --model is supplied.

API Providers

OpenAI and Anthropic providers use the standard library HTTP client; no SDK dependency is required.

export OPENAI_API_KEY=...
uv run dongxi-agent --provider openai --model gpt-5.4-mini "Reply with <final>hello</final>."

export ANTHROPIC_API_KEY=...
uv run dongxi-agent --provider anthropic --model claude-sonnet-4-6 "Reply with <final>hello</final>."

Provider defaults:

  • --provider openai: OPENAI_MODEL or gpt-5.4-mini, using OPENAI_BASE_URL or https://api.openai.com/v1.
  • --provider anthropic: ANTHROPIC_MODEL or claude-sonnet-4-6, using ANTHROPIC_BASE_URL or https://api.anthropic.com/v1.

The /model <name> command can switch OpenAI and Anthropic sessions to any model name available to the configured API key. DongXi does not list remote provider catalogs locally.

Recommended Local Model

The learning workflow uses an Ollama model named dongxi-qwen3-4b-q4km.

Create it from the official Qwen3 4B GGUF helper:

scripts/pull_qwen3_4b_gguf.sh

Then run:

uv run dongxi-agent --model dongxi-qwen3-4b-q4km "Reply with <final>hello</final>."

The helper downloads Qwen/Qwen3-4B-GGUF Q4_K_M into the ignored local models/ directory and creates the Ollama model.

Useful model commands inside DongXi:

/model
/model dongxi-qwen3-4b-q4km

Common CLI Options

dongxi [prompt...]
  -C, --cd, --cwd <path>        workspace directory
  --attach <path>               attach a file or image; repeatable
  --provider <name>             ollama, openai, or anthropic
  --model <name>                model name
  --host <url>                  Ollama host
  --openai-base-url <url>       OpenAI API base URL
  --anthropic-base-url <url>    Anthropic API base URL
  --api-timeout <seconds>       OpenAI/Anthropic request timeout
  --resume <session|latest>     resume a saved session
  --approval ask|auto|never     risky-tool approval policy
  --sandbox workspace-write|off shell sandbox guard
  --max-steps <n>               max model/tool iterations per request
  --max-new-tokens <n>          max model output tokens per step

Example with attachments:

uv run dongxi-agent \
  --attach screenshot.png \
  --attach dongxi_agent/prompting.py \
  "The bug is shown in the image; update the prompt file."

Image attachments are sent as image payloads to the configured provider, so use a vision-capable model when visual content matters. For OpenAI and Anthropic, image payloads are currently always sent as PNG (data URLs or base64 blocks), because the model-client boundary receives only base64 image bytes. File attachments and fuzzy filename references are represented as explicit paths; DongXi prefers tool reads over flooding the prompt with raw file content.

Sessions

DongXi stores sessions under the workspace root:

.dongxi-agent/sessions/

Resume the latest session:

uv run dongxi-agent --resume latest

Show the current session file:

/session

Reset the current session history and memory:

/reset

The top-level session shape is documented in dongxi_agent/session_state.py. /doctor checks the live session against that schema, while /audit reports missing or invalid sections as deterministic findings.

Slash Commands

Slash commands are handled locally before any model call. Unknown slash commands are rejected locally with help text instead of being sent to the model.
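
The routing rule above can be sketched in a few lines. This is an illustration only, not the code in dongxi_agent/input_router.py: the Route type, the route_input function, and the three-command table are hypothetical names chosen for the example.

```python
from dataclasses import dataclass

# Hypothetical subset of the command table, for illustration only.
KNOWN_COMMANDS = {"/help", "/status", "/tools"}


@dataclass
class Route:
    target: str  # "local", "rejected", or "model"
    detail: str


def route_input(raw: str) -> Route:
    """Decide locally whether input is a slash command or a model prompt."""
    text = raw.strip()
    if not text.startswith("/"):
        return Route("model", text)
    command = text.split(maxsplit=1)[0]
    if command in KNOWN_COMMANDS:
        return Route("local", command)
    # Unknown slash commands are answered locally and never reach the model.
    return Route("rejected", f"Unknown command {command}; try /help.")
```

The point of the design is that a typo such as /staus costs no model call and cannot be misread by the model as a request.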

High-use commands:

Command              Purpose
/help                Show command help.
/status              Show workspace, branch, provider, model, session, approval, validation, and state counts.
/tools               Show tool registry metadata and approval risk.
/files               Fuzzy-find workspace files.
/symbols             Show top-level Python symbols for one file.
/diff                Show the current git diff.
/review              Run a deterministic local pre-review over changed files.
/test                Show, set, run, enable, disable, or clear post-edit validation.
/context             Show context projection and compaction state.
/usage               Show prompt/context usage estimates.
/traces              Show recent turn and tool timing traces.
/turns               Show active and recent turn lifecycle state.
/permissions         Show permission decisions and session approvals.
/permission-profile  Show or select standard, read-only, or local-dev.
/audit               Check session and harness invariants.
/doctor              Check runtime, config, sandbox, context pressure, and audit health.
/goal                Show, set, pause, resume, complete, or clear the session goal.
/tasks               Show or update plan/background task state.
/subagents           Show bounded child-agent sessions and result previews.
/remote              Show remote-control status.
/notifications       Show pending and recent in-app notifications.
/artifacts           Show stored large tool-result artifacts.
/worktree            Show git worktree status and isolated-root suggestions.
/memory              Show distilled working memory.
/exit                Exit the interactive session.

Additional capability commands include /skills, /plugins, /mcp, /hooks, /watch, and /model.

Tool Surface

Tools are declared as ToolDefinition objects in dongxi_agent/tool_registry.py. The registry separates model-facing fields from harness-facing policy metadata:

  • source: builtin, skill, goal, task, workspace, git, plugin, MCP, or agent.
  • visibility: whether the tool is always visible or deferred.
  • requires_approval: whether a risky action needs permission.
  • reads_files, mutates_files, mutates_session, sandboxed, supports_parallel.
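
To make the split between model-facing fields and policy metadata concrete, here is a minimal sketch. It is not the real ToolDefinition: the class name ToolDefinitionSketch, the name and description fields, the boolean types, the default values, and the model_view method are all assumptions; only the metadata field names come from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolDefinitionSketch:
    # Model-facing fields (assumed names).
    name: str
    description: str
    # Harness-facing policy metadata, named after the bullets above.
    source: str = "builtin"
    visibility: str = "always"
    requires_approval: bool = False
    reads_files: bool = False
    mutates_files: bool = False
    mutates_session: bool = False
    sandboxed: bool = False
    supports_parallel: bool = False

    def model_view(self) -> dict:
        """Return only what the model sees; policy metadata stays in the harness."""
        return {"name": self.name, "description": self.description}


read_file = ToolDefinitionSketch(
    "read_file", "Read a workspace file.",
    source="workspace", reads_files=True, supports_parallel=True,
)
patch_file = ToolDefinitionSketch(
    "patch_file", "Apply an edit to a workspace file.",
    source="workspace", requires_approval=True, mutates_files=True,
)
```

Keeping the policy flags out of model_view means the model cannot talk its way around approval or sandbox hints, because it never sees them.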

Default tool groups:

  • Workspace reads: list_files, read_file, search, find_file, file_info, symbols, git_diff, worktree_status.
  • Execution and edits: run_shell, write_file, patch_file.
  • App state: get_goal, update_goal, task_list, task_update, watch_file, workspace_events, plugin_list.
  • Capability metadata: load_skill, mcp_list_tools, mcp_call_tool.
  • Delegation: delegate, subagent_status.

Every model-proposed tool call gets a stable call_id. The transcript records a paired tool_call item and tool result item. The result is a structured ToolResult with:

  • ok
  • kind
  • model-facing content
  • optional failure message
  • optional metadata.error with a stable failure code and message
  • optional metadata such as shell policy, approval, edit, validation, artifact, or result size
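
A result shape like the one listed above might look as follows. This is a sketch under assumptions, not the contents of dongxi_agent/tool_results.py: ToolResultSketch, the failure field name, the failed helper, and the shell_forbidden code are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class ToolResultSketch:
    ok: bool
    kind: str
    content: str                   # model-facing text
    failure: Optional[str] = None  # optional failure message
    metadata: Dict[str, Any] = field(default_factory=dict)


def failed(kind: str, code: str, message: str) -> ToolResultSketch:
    """Build a failure result with a stable error code under metadata.error."""
    return ToolResultSketch(
        ok=False,
        kind=kind,
        content=message,
        failure=message,
        metadata={"error": {"code": code, "message": message}},
    )


result = failed("run_shell", "shell_forbidden", "Command is forbidden by shell policy.")
```

A stable code lets tests and audits match on the failure type without parsing the human-readable message.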

Large tool outputs are stored under .dongxi-agent/artifacts/ with a prompt-sized preview and artifact reference.

Safety Model

DongXi has a layered, learning-sized safety model:

  1. Tool argument validation happens before approval or execution.
  2. Workspace paths are normalized and checked for parent-directory escapes, symlink escapes, and null bytes.
  3. Protected metadata directories such as .git, .dongxi-agent, .dongxi, .agents, and .codex are blocked for file edits.
  4. run_shell is classified by shell policy as allow, prompt, or forbidden.
  5. The shell sandbox guard rejects known network commands and obvious outside-workspace paths in workspace-write mode.
  6. Risky edits and risky shell commands go through permission decisions.
  7. File edits require a fresh full-file baseline for existing files.
  8. Optional post-edit validation runs after successful edits when policy allows it.
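
Steps 2 and 3 can be illustrated with pathlib. This is a sketch of the general technique, not the code in dongxi_agent/workspace_paths.py; the function name resolve_in_workspace and its for_edit parameter are assumptions.

```python
from pathlib import Path

PROTECTED_DIRS = {".git", ".dongxi-agent", ".dongxi", ".agents", ".codex"}


def resolve_in_workspace(workspace: Path, candidate: str, for_edit: bool = False) -> Path:
    """Return the resolved path, or raise ValueError if it is unsafe."""
    if "\x00" in candidate:
        raise ValueError("path contains a null byte")
    root = workspace.resolve()
    # resolve() collapses ".." and follows symlinks, so one containment
    # check covers both parent-directory and symlink escapes.
    resolved = (root / candidate).resolve()
    if not resolved.is_relative_to(root):  # Python 3.9+
        raise ValueError(f"path escapes the workspace: {candidate}")
    if for_edit and PROTECTED_DIRS.intersection(resolved.relative_to(root).parts):
        raise ValueError(f"protected metadata directory: {candidate}")
    return resolved
```

Checking the resolved path rather than the raw string is the important choice: a string test for ".." misses symlinks that point outside the workspace.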

Approval modes:

  • --approval ask: prompt before risky actions.
  • --approval auto: allow risky actions after policy checks; use only in trusted throwaway repos.
  • --approval never: deny risky actions.

When --approval ask prompts, answer y to allow the action once, s to allow actions with the same conservative cache key for the rest of the current session, or press Enter to deny.
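
The three approval modes and the y/s/Enter answers can be sketched as two small functions. The names Decision, decide, and apply_answer are hypothetical, and the real logic in dongxi_agent/permissions.py also involves permission profiles and shell policy, which this sketch leaves out.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    ASK = "ask"


def decide(approval_mode: str, risky: bool, cache_key: str, session_approvals: set) -> Decision:
    """Map an approval mode and a risk flag to a decision."""
    if not risky:
        return Decision.ALLOW
    if approval_mode == "never":
        return Decision.DENY
    if approval_mode == "auto":
        return Decision.ALLOW
    if cache_key in session_approvals:
        return Decision.ALLOW  # approved earlier with "s"
    return Decision.ASK


def apply_answer(answer: str, cache_key: str, session_approvals: set) -> bool:
    """Interpret the prompt answer; anything other than y or s denies."""
    answer = answer.strip().lower()
    if answer == "s":
        session_approvals.add(cache_key)
        return True
    return answer == "y"
```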

Permission profiles:

  • standard: approval policy and session approvals decide risky tools.
  • read-only: deny workspace writes and risky shell execution.
  • local-dev: auto-allow workspace writes after freshness checks while keeping risky shell prompts.

Shell sandbox modes:

  • workspace-write: default guard for shell commands.
  • off: disables the sandbox guard; shell policy still runs.

This is not an OS sandbox. It is a harness boundary for learning tool validation, shell policy, and human-in-the-loop permissions.

Context, Memory, And File State

DongXi keeps durable session history separate from the model-visible prompt projection.

Prompt construction records:

  • messages_for_query: transcript and current request sections visible as query context.
  • hidden_context: prefix rules, memory, goal, turn state, tasks, skills, MCP metadata, file state, workspace events, validation, artifacts, plugin metadata, and workspace docs.
  • context_projection_steps: ordered metadata for normalization, working-slice choice, reductions, and final transcript projection.
  • loop_state: message counts, turn counts, active turn id, and tool-use context summary.
  • memory_mode: transcript memory, stable hidden context, and live file-state memory channels.
  • file_state_attachments: typed edited_text_file snippets for files changed on disk since DongXi last read them.

File state tracks:

  • Recent reads with line range, total lines, mtime, size, hash, bounded content baseline, and full/partial status.
  • Files changed by DongXi edits.
  • Stale reads that should be refreshed before relying on old transcript content.
  • External disk changes with bounded snippets until the file is read or edited again.
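
Stale-read detection of this kind usually compares a stored baseline against the file on disk. The sketch below shows the idea with mtime, size, and a SHA-256 hash; ReadBaseline, snapshot, and is_stale are hypothetical names, and the real tracker in dongxi_agent/file_state.py records more (line ranges, bounded content, full/partial status).

```python
import hashlib
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class ReadBaseline:
    mtime_ns: int
    size: int
    sha256: str


def snapshot(path: Path) -> ReadBaseline:
    """Record what the file looked like when it was read."""
    data = path.read_bytes()
    stat = path.stat()
    return ReadBaseline(stat.st_mtime_ns, stat.st_size, hashlib.sha256(data).hexdigest())


def is_stale(path: Path, baseline: ReadBaseline) -> bool:
    """Cheap checks first; hash only when mtime changed but size did not."""
    stat = path.stat()
    if stat.st_size != baseline.size:
        return True
    if stat.st_mtime_ns == baseline.mtime_ns:
        return False
    return hashlib.sha256(path.read_bytes()).hexdigest() != baseline.sha256


# Usage: simulate an external change after a read.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "notes.txt"
    target.write_text("v1\n")
    baseline = snapshot(target)
    fresh_before = not is_stale(target, baseline)
    target.write_text("v2 changed externally\n")
    stale_after = is_stale(target, baseline)
```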

Small sessions use a linear transcript with old read deduplication. Long sessions auto-compact older history into a bounded summary while preserving recent events.
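
As a rough illustration of compaction, the sketch below folds older events into one bounded summary line and keeps the most recent events verbatim. The function compact_history and its parameters are hypothetical, and joining strings is a stand-in: the source does not say how DongXi builds its summary.

```python
from typing import List


def compact_history(events: List[str], keep_recent: int = 4, summary_chars: int = 200) -> List[str]:
    """Replace older events with one bounded summary; keep recent ones intact."""
    split = len(events) - keep_recent
    if split <= 0:
        return list(events)
    older, recent = events[:split], events[split:]
    summary = " | ".join(older)
    if len(summary) > summary_chars:
        summary = summary[: summary_chars - 3] + "..."
    return [f"[summary of {len(older)} earlier events] {summary}"] + recent


history = [f"e{i}" for i in range(10)]
compacted = compact_history(history, keep_recent=3)
```

The durable session history is untouched by a step like this; only the model-visible projection shrinks.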

Turn Lifecycle Metadata

Recent flowchart-review work made the runtime phases inspectable:

  • Phase 1 input: PreparedInput carries app messages, attachments, submit-hook context, should_query, and stop reasons.
  • Phase 2 context: PromptPayload exposes sections, hidden/query split, file-state attachments, loop state, projection steps, and memory mode.
  • Phase 4 generation: each model attempt records parsed output metadata before branching into final, retry, or tool handling.
  • Phase 5 validation: each ToolResult records a compact execution summary with failed stage, validation status, execution status, permission/shell decisions when present, and normalized result size.
  • Phase 5 commit: each turn records committed history counts, assistant/tool counts, model-output counts, tool-execution counts, file-state attachment counts, context projection counts, and final status.

Use these commands to inspect that state:

/turns
/context
/usage
/traces
/audit

Observability And Hooks

DongXi keeps lightweight observability outside the model transcript:

  • /usage reports prompt and context-size estimates.
  • /traces reports recent turn/tool timing events.
  • /hooks reports lifecycle hook events such as on_user_message, before_tool, after_tool, on_skill_loaded, and on_mcp_tool.
  • .dongxi/hooks.json can configure safe command hooks. For example:
{
  "hooks": {
    "user_prompt_submit": [
      { "command": "printf 'extra context from hook'" }
    ]
  }
}

Hook commands go through shell policy and sandbox checks. Commands that are forbidden or require approval are skipped instead of prompting during hook execution.
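
The skip-instead-of-prompt rule can be sketched as a filter over the configured commands. The JSON shape matches the example above; submit_hook_commands, runnable_hooks, and toy_classify are hypothetical, and toy_classify is a deliberately naive stand-in for the real shell policy.

```python
import json
from typing import Callable, List


def submit_hook_commands(config_text: str) -> List[str]:
    """Extract user_prompt_submit commands from a hooks.json document."""
    config = json.loads(config_text)
    entries = config.get("hooks", {}).get("user_prompt_submit", [])
    return [
        entry["command"]
        for entry in entries
        if isinstance(entry, dict) and isinstance(entry.get("command"), str)
    ]


def runnable_hooks(commands: List[str], classify: Callable[[str], str]) -> List[str]:
    """Keep commands the policy allows outright; skip prompt and forbidden ones."""
    return [command for command in commands if classify(command) == "allow"]


def toy_classify(command: str) -> str:
    # Toy policy for the example only: allow printf, forbid everything else.
    return "allow" if command.startswith("printf ") else "forbidden"


config_text = '{"hooks": {"user_prompt_submit": [{"command": "printf hi"}, {"command": "curl example.com"}]}}'
to_run = runnable_hooks(submit_hook_commands(config_text), toy_classify)
```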

Architecture Guardrails

DongXiAgent still uses mixins, but the expected chain is now explicit in dongxi_agent/agent_architecture.py. Each entry documents the state surface it owns and any notable dependencies on other mixins. /doctor verifies that the live class MRO still matches that documented chain.
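
A check of this kind compares the class's method resolution order with a documented list. The sketch below uses toy classes; mro_matches, ToolsMixin, ContextMixin, and AgentSketch are hypothetical names, not the mixins listed in dongxi_agent/agent_architecture.py.

```python
from typing import List


def mro_matches(cls: type, documented: List[str]) -> bool:
    """Compare the live MRO (excluding object) with a documented chain."""
    live = [base.__name__ for base in cls.__mro__ if base is not object]
    return live == documented


class ToolsMixin:
    pass


class ContextMixin:
    pass


class AgentSketch(ToolsMixin, ContextMixin):
    pass


DOCUMENTED_CHAIN = ["AgentSketch", "ToolsMixin", "ContextMixin"]
chain_ok = mro_matches(AgentSketch, DOCUMENTED_CHAIN)
```

Because mixin order decides which method wins, a reordered base list is a behavior change even when no method body was edited; a deterministic check catches that.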

The current design remains intentionally pragmatic:

  • Mixins are the learning boundary for now.
  • Shared mutable state is constrained by a documented top-level session schema.
  • Feature-specific ensure_*_state() methods normalize their own nested state.
  • /audit and /doctor provide deterministic checks instead of relying on prompt text or manual inspection.

Post-Edit Validation

Configure a validation command:

/test set uv run pytest -q

Useful commands:

  • /test: show configured validation state and latest result.
  • /test set <command>: configure and enable post-edit validation.
  • /test run: run the configured command immediately.
  • /test off and /test on: disable or enable automatic post-edit validation.
  • /test clear: remove the command.

Validation commands go through shell policy and sandbox preflight. Commands that execute project code, such as python, pytest, or uv run pytest, run automatically only when --approval auto is active; otherwise DongXi records a skipped validation result.

The latest validation result is stored in session state and shown in the prompt as bounded context.

Remote Control

DongXi includes a small local remote-control transport. It uses the same input router as the CLI, loopback HTTP by default, bearer-token auth, bounded JSON bodies, and a simple browser page.

Start local remote control:

uv run dongxi-agent --remote-control

Start LAN share mode:

uv run dongxi-agent --remote-control-share

Share mode binds to the LAN and prints a plain URL plus a short pairing code. The browser exchanges that code for an in-memory bearer token.

Use a fixed token during local development:

uv run dongxi-agent --remote-control --remote-control-token dev-token

Check status:

curl -sS http://127.0.0.1:8765/status \
  -H "Authorization: Bearer dev-token"

Submit a turn:

curl -sS http://127.0.0.1:8765/turn \
  -H "Authorization: Bearer dev-token" \
  -H "Content-Type: application/json" \
  -d '{"message":"/cwd"}'

Endpoints:

  • GET /health: unauthenticated health check.
  • POST /pair: unauthenticated pairing-code exchange in share mode.
  • GET /status: authenticated status.
  • POST /turn: authenticated message submission, including slash commands.
  • POST /rpc: authenticated JSON-RPC-style wrapper with remoteControl/status and turn/start.
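
The same /turn call can be made from Python with only the standard library. This sketch mirrors the curl example above; build_turn_request is a hypothetical helper, and the base URL and dev-token value are taken from that example.

```python
import json
import urllib.request


def build_turn_request(base_url: str, token: str, message: str) -> urllib.request.Request:
    """Build an authenticated POST /turn request without sending it."""
    body = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/turn",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


request = build_turn_request("http://127.0.0.1:8765", "dev-token", "/cwd")
```

With remote control running, urllib.request.urlopen(request) sends the turn; the response format is not documented here, so inspect it before relying on specific fields.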

Remote-control attachment paths are workspace-scoped. Paths that escape the workspace are rejected before the model turn starts.

Learning Artifacts

This repo is the implementation target for the learning plan in:

learning_artifact/coding-agent-learning-plan.md

Important artifact directories:

  • learning_artifact/: study notes, concept maps, questions, and flowchart-pass notes.
  • learning_artifact/daily-notes/: the original 14-day learning notes.
  • code-architecture/: architecture snapshots and the 2026-05-17 flowchart-review cycle.
  • ROADMAP.md: phase-by-phase implementation plan.

The latest architecture review is:

code-architecture/flowchart-review-cycle-2026-05-17.md

It maps the current code against the flowcharts in ../flow/flow.md and records the gap closed in each pass.

Development

Run the unit and workflow test suite:

uv run pytest -q

Run lint:

uv run ruff check .

Run only the Qwen/Ollama workflow integration tests:

uv run pytest tests/test_workflow_integration.py -q

Those tests skip if Ollama or dongxi-qwen3-4b-q4km is unavailable.

Package Layout

dongxi_agent/
  agent.py              main model/tool/final loop
  agent_architecture.py documented mixin chain and dependency notes
  cli.py                command-line interface
  models.py             Ollama, OpenAI, Anthropic, and fake model clients
  prompting.py          prompt payload construction
  context.py            transcript projection and compaction metadata
  loop_state.py         loop and memory-mode snapshots
  turns.py              turn lifecycle, model output, tool execution, and commit metadata
  tools.py              tool validation, policy, dispatch, and implementations
  tool_registry.py      ToolDefinition metadata and default registry
  tool_results.py       structured tool success/failure objects
  file_state.py         read baselines and external-change attachments
  edit_policy.py        freshness checks and diff previews
  shell_policy.py       shell allow/prompt/forbidden classifier
  sandbox.py            workspace shell sandbox guard
  session_state.py      documented top-level session schema
  permissions.py        permission requests, profiles, and session approvals
  validation.py         post-edit validation state
  hooks.py              lifecycle and submit hooks
  goals.py              session goal state
  tasks.py              tasks and background shell task state
  subagents.py          bounded read-only child sessions
  artifacts.py          large tool-result artifact storage
  observability.py      prompt usage and trace events
  remote_control.py     local HTTP remote-control transport
  workspace.py          live repository snapshot
  workspace_tools.py    file, symbol, diff, review, and worktree helpers
  workspace_paths.py    workspace path safety
  workspace_events.py   watched file events
  skills.py             SKILL.md discovery and loading
  plugins.py            local plugin marketplace metadata
  mcp.py                MCP-style metadata and mock/safe command calls
  input_messages.py     app-level message and attachment shaping
  input_router.py       CLI slash/local/model routing
  audit.py              deterministic harness invariant checks
  doctor.py             runtime/config diagnostics

Tests live in:

tests/test_dongxi_agent.py
tests/test_workflow_integration.py

Current Limitations

  • No cloud enrollment, WebSocket reconnect, ack/replay, or multi-client stream tracking for remote control.
  • No real OS sandbox beyond the workspace shell guard.
  • No concurrent tool execution yet; tool execution metadata currently records dispatch_mode: serial.
  • The main agent still uses a broad mixin chain rather than full composition.
  • Nested session sections are normalized by their owning mixins rather than represented as one fully typed session object.
  • No production MCP client; MCP support is metadata-first with mock results or safe command runtime.
  • No worktree-isolated write workers or parallel subagent management.
  • No full Codex or Claude Code parity. Those systems are architecture references, not implementation targets.

License

DongXi Agent is licensed under the Apache License 2.0. See LICENSE.

DongXi starts from the mini-coding-agent teaching implementation. Codex and Claude Code are architecture references only.
