Sisyphus is a stateful harness for AI-assisted software work. It externalizes agent memory into repository-local task state, artifact projections, lifecycle gates, verification claims, evidence graphs, episode traces, and promotion records.
Instead of asking an AI worker to infer progress from chat history, Sisyphus exposes a structured control plane through CLI and MCP tools:
state_t
-> observation_t
-> action_t
-> transition
-> state_t+1
That makes Sisyphus useful as both a task lifecycle tool and an agent evaluation environment. Current runtime support includes task observation rendering, explicit action risk levels, episode trace capture, curated evidence, reward-aligned eval output, benchmark fixtures, test-first loop checks, and SFT/RL dataset export. Online RL training is not part of the current runtime; the implemented boundary is the environment interface and offline data path.
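The state, observation, action, transition loop described above can be read as an ordinary environment-step interface. The sketch below is a toy model only: TaskState, observe, choose_action, step, and run_episode are hypothetical names chosen for illustration and are not taken from the Sisyphus codebase.

```python
from dataclasses import dataclass, replace

# Hypothetical toy model of state -> observation -> action -> transition.
# None of these names come from the Sisyphus codebase.


@dataclass(frozen=True)
class TaskState:
    phase: str
    plan_revisions: int = 0


def observe(state: TaskState) -> dict:
    """Render a compact observation from recorded state (not chat history)."""
    return {"phase": state.phase, "plan_revisions": state.plan_revisions}


def choose_action(observation: dict) -> str:
    """Stand-in policy: revise the plan once, then verify."""
    return "plan_revise" if observation["plan_revisions"] == 0 else "verify"


def step(state: TaskState, action: str) -> TaskState:
    """Apply one transition and return the next state."""
    if action == "plan_revise":
        return replace(state, plan_revisions=state.plan_revisions + 1)
    if action == "verify":
        return replace(state, phase="verified")
    raise ValueError(f"unknown action: {action}")


def run_episode(state: TaskState, max_steps: int = 5) -> list[tuple[dict, str]]:
    """Collect (observation, action) pairs until the task is verified."""
    trace = []
    for _ in range(max_steps):
        if state.phase == "verified":
            break
        obs = observe(state)
        action = choose_action(obs)
        trace.append((obs, action))
        state = step(state, action)
    return trace
```

Because every step is computed from the recorded state, the resulting trace can be replayed or exported without access to any conversation history.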
Sisyphus is designed to be installed once and then run inside any target project repository.
Preferred command:
sisyphus request "Add an agent dashboard"
That command creates and manages task state in the current repository:
- .planning/tasks/...
- .planning/inbox/...
- project worktrees and task branches
Sisyphus itself does not need to live inside the target repository. The important rule is simple: run sisyphus from the repo you want to manage, or pass --repo explicitly.
The agent-facing loop starts from recorded task state rather than chat transcript reconstruction:
- Sisyphus owns task state, lifecycle rules, conformance, gates, evidence, and closeout.
- sisyphus observe <task-id> --json renders compact state for a worker or policy.
- The worker selects an allowed action such as search, context build, plan revision, subtask generation, or verification.
- Sisyphus validates the action against lifecycle and risk boundaries.
- The transition result, state diff, reward facts, and evidence remain inspectable in repository-local artifacts.
Review and judgment remain conservative. Actions such as plan approval, spec freeze, close, promotion execution, and merged-PR recording are review-gated or human-only in the action registry.
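One way to picture the action registry is as a table that maps each action to a gate, which is checked before any transition runs. The sketch below is illustrative: the action identifiers are paraphrased from the text above, the Gate enum and validate_action are hypothetical, and the split between review-gated and human-only actions is an assumption, since the text does not say which action falls into which group.

```python
from enum import Enum


class Gate(Enum):
    WORKER = "worker"        # a worker or policy may select the action
    REVIEW_GATED = "review"  # requires a reviewer
    HUMAN_ONLY = "human"     # never selectable by an agent


# Illustrative registry. The REVIEW_GATED / HUMAN_ONLY split is an assumption.
ACTION_REGISTRY = {
    "search": Gate.WORKER,
    "context_build": Gate.WORKER,
    "plan_revise": Gate.WORKER,
    "subtasks_generate": Gate.WORKER,
    "verify": Gate.WORKER,
    "plan_approve": Gate.REVIEW_GATED,
    "spec_freeze": Gate.REVIEW_GATED,
    "close": Gate.HUMAN_ONLY,
    "promotion_execute": Gate.HUMAN_ONLY,
    "record_merged_pr": Gate.HUMAN_ONLY,
}


def validate_action(action: str, actor: str) -> tuple[bool, str]:
    """Return (allowed, reason) for an actor of 'worker', 'reviewer', or 'human'."""
    gate = ACTION_REGISTRY.get(action)
    if gate is None:
        return False, "unknown action"
    if gate is Gate.WORKER:
        return True, "allowed"
    if gate is Gate.REVIEW_GATED and actor in ("reviewer", "human"):
        return True, "allowed after review"
    if gate is Gate.HUMAN_ONLY and actor == "human":
        return True, "allowed for human operator"
    return False, f"{action} is {gate.value}-gated"
```

Rejecting unknown actions by default keeps the worker's action space closed: anything not listed in the registry cannot be executed.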
Requirements:
- Python 3.11+
- uv
- git
On Windows, use PowerShell:
git clone <repo-url>
cd Sisyphus
uv sync
uv run sisyphus status
To install it as a reusable command:
git clone <repo-url>
cd Sisyphus
uv tool install .
sisyphus status
On macOS, use Terminal:
git clone <repo-url>
cd Sisyphus
uv sync
uv run sisyphus status
To install it as a reusable command:
git clone <repo-url>
cd Sisyphus
uv tool install .
sisyphus status
Repo-local environment with the Discord extra:
uv sync --extra discord
Tool install with the Discord extra:
uv tool install ".[discord]"
If you installed with uv tool install ., update from the repo root with:
uv tool install . --force
The default target repository is the current working directory.
Example:
cd /path/to/my-product
sisyphus request "Build a voice meeting assistant"
That creates task state inside /path/to/my-product, not inside the Sisyphus source repository.
If you need to manage a different repository from the current shell location, use --repo:
sisyphus --repo /path/to/my-product request "Build a voice meeting assistant"
Create and run a task:
sisyphus request "Add an agent dashboard"
Create a task but stop before execution:
sisyphus request "Draft the plan only" --no-run
Queue a conversation event without immediately processing it:
sisyphus ingest conversation "Add an agent dashboard" --no-run
Process pending inbox events once:
sisyphus daemon --once
Run the long-lived service loop:
sisyphus serve
Show task and agent status:
sisyphus status
sisyphus status --agents
sisyphus agents --json
Manual lifecycle commands:
sisyphus observe <task-id> --json
sisyphus plan approve <task-id> --by reviewer
sisyphus plan request-changes <task-id> --by reviewer --notes "split the work more clearly"
sisyphus plan revise <task-id> --by worker --notes "updated the plan"
sisyphus spec freeze <task-id> --by reviewer
sisyphus subtasks generate <task-id>
sisyphus verify <task-id>
sisyphus close <task-id>
Harness and evaluation commands:
sisyphus episode check <task-id> --json
sisyphus eval loop <task-id> --json
sisyphus eval test-first <task-id> --json
sisyphus benchmark run --json
sisyphus dataset export --format sft --task-id <task-id>
sisyphus dataset export --format rl --output artifacts/rollouts.jsonl
The operator-facing workflow is:
- Intake a request.
- Create a repository-local task workspace.
- Draft and review the plan.
- Freeze the spec.
- Generate subtasks.
- Run worker execution.
- Verify results.
- Close the task.
The orchestration loop can pause in needs_user_input when review limits are hit or human guidance is required.
Internally, feature work is also projected into an artifact-governed path:
Feature task record
-> FeatureChangeArtifact projection snapshot
-> FeatureChange evaluation
-> ObligationIntent
-> ProtocolSpec + ObligationSpec + InputContract
-> CompiledObligation queue
-> ExecutionPolicy-backed daemon convergence
-> VerificationClaim / promotion decision
The DSL owns what must be read, produced, and verified. Execution policy owns who or what performs the work, such as a local Sisyphus verifier, tool runner, or future agent/provider overlay.
Key persisted artifact outputs currently include:
- .planning/tasks/<task-id>/artifacts/projection/feature-change.json
- .planning/tasks/<task-id>/artifacts/obligations/compiled.json
- .planning/tasks/<task-id>/artifacts/episodes/<episode-id>.jsonl
- .planning/tasks/<task-id>/artifacts/evidence/evidence-graph.json
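For scripts that inspect these outputs, the paths above can be assembled with pathlib. The helper names below are hypothetical, and the sketch assumes the default task directory .planning/tasks; a repository that overrides task_dir would need a different base path.

```python
from pathlib import Path


def artifact_paths(repo_root: Path, task_id: str, episode_id: str) -> dict[str, Path]:
    """Map each persisted artifact kind to its expected location."""
    # Assumes the default task_dir of ".planning/tasks".
    base = repo_root / ".planning" / "tasks" / task_id / "artifacts"
    return {
        "projection": base / "projection" / "feature-change.json",
        "obligations": base / "obligations" / "compiled.json",
        "episode": base / "episodes" / f"{episode_id}.jsonl",
        "evidence_graph": base / "evidence" / "evidence-graph.json",
    }


def existing_artifacts(repo_root: Path, task_id: str, episode_id: str) -> dict[str, Path]:
    """Keep only the artifacts that are actually present on disk."""
    paths = artifact_paths(repo_root, task_id, episode_id)
    return {name: path for name, path in paths.items() if path.is_file()}
```

A task that has not yet reached verification may have only some of these files, which is why the second helper filters by existence instead of assuming all four are present.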
Related documentation:
- docs/research/stateful-agent-harness.md
- docs/research/harness-1-comparison.md
- docs/rl-action-space.md
- docs/reward-model.md
- docs/episode-trace.md
- docs/curated-evidence.md
- docs/dataset-export.md
Set the token and start the bot:
Windows PowerShell:
$env:DISCORD_BOT_TOKEN="YOUR_TOKEN"
sisyphus discord-bot --channel-id 123456789012345678
macOS:
export DISCORD_BOT_TOKEN="YOUR_TOKEN"
sisyphus discord-bot --channel-id 123456789012345678
By default the bot manages the repository in the current directory. To target a different repository, add --repo.
Sisyphus can also be imported directly:
from pathlib import Path

import sisyphus

# Create a task in the target repository.
result = sisyphus.request_task(
    repo_root=Path("/path/to/my-product"),
    message="Build a voice meeting assistant",
    title="Voice Meeting Assistant",
    auto_run=True,
)

# Inspect the outcome.
print(result.ok)
print(result.task_id)
print(result.task["status"])
print(result.task["workflow_phase"])
Useful API entrypoints:
- sisyphus.queue_conversation(...)
- sisyphus.request_task(...)
Sisyphus can be exposed to coding agents over MCP through:
sisyphus-mcp
The MCP entrypoint is backed by the official MCP Python SDK over stdio.
The recommended launcher also sets PYTHONPATH=/absolute/path/to/Sisyphus/src so active MCP registrations prefer the current repo source over any stale installed package copy.
Quick start from the Sisyphus repo root:
./init-mcp.sh
./init-mcp.sh --repo /absolute/path/to/your/repository
That script registers Sisyphus in Codex and writes a Claude Code project .mcp.json for the managed repository.
Client setup examples for Codex and Claude are documented in docs/mcp-clients.md. Repo-level agent guidance for preferring Sisyphus MCP tools and resources lives in AGENTS.md.
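For orientation, a stdio registration that sets PYTHONPATH might look roughly like the output of the sketch below. This is an assumption-heavy illustration, not a description of what init-mcp.sh writes: the mcpServers layout follows the commonly used project .mcp.json shape, the server name "sisyphus" and both helper functions are hypothetical, and the command uses the direct launcher python -m sisyphus.mcp_server mentioned in this README. Treat docs/mcp-clients.md as the authoritative reference.

```python
import json
from pathlib import Path


def mcp_server_entry(sisyphus_src: Path) -> dict:
    """Describe a stdio launch of the direct MCP launcher with PYTHONPATH set."""
    return {
        "command": "python",
        "args": ["-m", "sisyphus.mcp_server"],
        # Prefer the current repo source over a stale installed copy.
        "env": {"PYTHONPATH": str(sisyphus_src)},
    }


def render_mcp_json(sisyphus_src: Path, name: str = "sisyphus") -> str:
    """Render a .mcp.json-style document; the top-level layout is an assumption."""
    document = {"mcpServers": {name: mcp_server_entry(sisyphus_src)}}
    return json.dumps(document, indent=2)
```

Calling render_mcp_json(Path("/absolute/path/to/Sisyphus/src")) returns a JSON string that can be compared against the file the script generates.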
Repository-level configuration prefers .sisyphus.toml.
Legacy repositories can continue using .taskflow.toml as a fallback compatibility filename.
Default values:
base_branch = "main"
worktree_root = "../_worktrees"
task_dir = ".planning/tasks"
branch_prefix_feature = "feat"
branch_prefix_issue = "fix"
Example:
base_branch = "dev"
worktree_root = "../_worktrees"
task_dir = ".planning/tasks"
branch_prefix_feature = "feat"
branch_prefix_issue = "fix"
[commands]
lint = "echo lint-ok"
test = "python -m unittest discover -s tests -v"
[verify]
default = ["lint"]
feature = ["lint", "test"]
issue = ["lint"]
Run the full suite:
uv run python -m unittest discover -s tests -v
Notes:
- sisyphus is the preferred command surface.
- The direct MCP launcher is python -m sisyphus.mcp_server.
- For durable local registration, include PYTHONPATH=/absolute/path/to/Sisyphus/src in the MCP server environment.
- The package name is sisyphus.
- Project philosophy: see docs/philosophy.md.
- LinkedIn weekly summary example: see docs/linkedin-weekly-main-summary-2026-04-17.md.
- Phone-first automation proposal: see docs/mobile-automation-spec.md.