An intelligent skills framework for AI agents.
Think clearly. Work thoroughly. Deliver excellence.
Sage is a skills framework that makes AI agents think before they act, stay focused under complexity, and deliver outcomes you can trust. Built for product and engineering teams, open to any domain.
- Think first, build second — a framing round challenges assumptions before solutioning begins, preventing the most expensive mistake: solving the wrong problem
- Focus over noise — loads only what the task needs, producing sharper reasoning
- Reliable by design — 5-layer enforcement, 3 independent sub-agent reviews, quality gates with deterministic scripts
- Gets smarter over time — self-learning, memory, and ontology compound into institutional knowledge of your codebase
- Grows with its ecosystem — 38 built-in skills, extensible with 90K+ community skills from skills.sh
Most AI frameworks skip from request to implementation. Sage's navigator thinks first — mapping every request to an intent spectrum (UNDERSTAND → ENVISION → DELIVER → REFLECT) and detecting what's missing before work begins.
It starts with a framing round: surface the pain, challenge the premises, and arrive at a chosen framing — before any solutioning happens. Building without research? It tells you what 15 minutes of discovery would prevent, then lets you decide. Gap detection, not gatekeeping.
Routing is deterministic first, intelligent second: keywords match workflows before any LLM judgment. When keywords don't match, a focused sub-agent classifier picks the right phase. Every routing decision is confirmed with the user before proceeding. Smart enough to route accurately. Humble enough to ask when unsure.
AI agents drift silently — skipping steps, hallucinating imports, building the wrong thing confidently. Sage catches this at every stage:
Before implementation:
- Auto-review (sub-agent) verifies spec quality after approval — framing alignment, testable criteria, boundary completeness, edge cases, internal consistency
- Auto-review (sub-agent) verifies plan quality after approval — spec-plan alignment, task decomposition, dependency ordering, coverage gaps
During implementation:
- 7 universal coding principles loaded into the build-loop — clarity, error handling, boundary guards, minimal scope, safe APIs, consistency, behavior testing
After implementation:
- 5 quality gates sequence automatically — spec compliance, constitution compliance, code quality (independent sub-agent), hallucination check, test verification
- 2 advisory gates activate when applicable — browser check (Lightpanda), design check (frontend files)
- Auto-QA (sub-agent) verifies code against spec — alignment, test coverage, error handling, boundary conditions, integration consistency, coding principles
Six independent sub-agent review points. The agent that writes the code — or diagnoses the bug — never reviews its own work alone.
Most frameworks dump all instructions into the context window and hope for the best. Sage loads in two layers: the eager layer (process rules, workflow gates, engineering principles — ~200 lines, always in context) enforces what must never be skipped. The lazy layer (capabilities like TDD discipline, coding principles, systematic debugging, build-loop orchestration — loaded when the workflow step needs them) adds depth without bloating context. A focused agent with the right 500 tokens outperforms a distracted agent with 50,000 tokens of everything.
Close your IDE, hit a context limit, come back tomorrow — Sage picks up
exactly where you left off. A cycle manifest captures state, context
summary, decisions, open questions, and handoff guidance at every
checkpoint. Type /continue and Sage reads the manifest, routes to the
correct workflow, and preserves the judgment context that would otherwise
be lost.
Most agent frameworks are stateless. The agent that made a mistake yesterday makes it again today. Sage has three skills that build institutional memory — all backed by sage-memory MCP:
- sage-self-learning captures mistakes as WHEN/CHECK/BECAUSE prevention rules. Every session starts by searching past mistakes before doing anything.
- sage-memory stores project knowledge as focused prose insights — how your auth works, why billing uses event sourcing, what conventions the team follows.
- sage-ontology maps entity relationships — not just "billing exists" but "billing depends on payments, which triggers webhooks, which notify users." Touch one module, know the blast radius.
Day 1, the agent knows nothing. Day 30, it knows your codebase's landmines, patterns, and conventions.
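To make the "blast radius" idea concrete, here is a minimal sketch in bash. It is not how sage-ontology stores or queries its graph. The edge list, its orientation ("A B" means a change to A can affect B), and the function name blast_radius are all assumptions made for illustration, using the billing example above.

```bash
#!/usr/bin/env bash
# Hypothetical edge list: "A B" means "a change to A can affect B".
# billing depends on payments, payments triggers webhooks, webhooks notify users.
EDGES="payments billing
payments webhooks
webhooks users"

# Print every entity reachable from the touched one (breadth-first).
blast_radius() {
  local queue="$1" seen=" $1 " current next
  while [ -n "$queue" ]; do
    current=${queue%% *}                 # take the first queued entity
    if [ "$queue" = "$current" ]; then queue=""; else queue=${queue#* }; fi
    # Look up the entities directly affected by the current one.
    for next in $(printf '%s\n' "$EDGES" | awk -v src="$current" '$1 == src { print $2 }'); do
      case "$seen" in
        *" $next "*) ;;                  # already visited
        *) seen="$seen$next "; queue="${queue:+$queue }$next"; echo "$next" ;;
      esac
    done
  done
}

blast_radius payments   # prints billing, webhooks, users (one per line)
```

Touching users, which nothing else depends on in this toy graph, prints nothing.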
curl -fsSL https://raw.githubusercontent.com/xoai/sage/main/install.sh | bash
Works on macOS and Linux. On Windows, use Git Bash or WSL:
# Windows — open Git Bash, then:
curl -fsSL https://raw.githubusercontent.com/xoai/sage/main/install.sh | bash
All sage commands run in bash. On Windows, use Git Bash or WSL
for both installation and daily use.
sage new my-app # scaffold a new project with Sage
cd my-app
Open the project in your IDE, then follow a natural progression:
/sage # 1. describe what you want to build
# Sage classifies intent, detects gaps,
# and recommends the right workflow
/research # 2. (optional) user interviews → JTBD →
# opportunity map — understand the problem
# before solutioning
/architect # 3. (optional) system design → ADRs →
# milestone plan — for non-trivial systems
/build # 4. spec → plan → build-loop → quality gates
# auto-review, TDD, coding principles, auto-QA
Not every project needs every step. A simple feature can go straight
to /build. A complex product benefits from /research → /design
→ /architect → /build. Sage tells you what you're skipping and
lets you decide.
cd your-project
sage init # interactive — detects stack, asks for preset
sage init --preset startup # or pick a preset directly
sage init --prefix # namespace commands as sage:build, sage:fix, etc.
Available presets: base (default), startup, enterprise, opensource.
Presets add engineering principles on top of the universal base (TDD, no
secrets, explicit deps). Configure later in .sage/config.yaml.
Then teach Sage your codebase:
# 1. Set up persistent memory (one-time)
sage setup memory # configures sage-memory MCP server
# 2. Learn your codebase (run inside your IDE)
sage learn # broad scan — architecture, patterns, conventions
sage learn src/billing # deep dive: learn a specific module
After install, sage upgrade will prompt to upgrade the sage-memory
package when a newer version is available on PyPI, and sage update
syncs the latest skill prose into your project automatically — no
manual sage-memory install-skills invocations required.
After learning, Sage knows your conventions, architecture, and landmines. Every future session starts by searching this memory — no more explaining context from scratch.
Then work naturally:
/sage # describe your task — Sage reads memory,
# checks for work in progress, and routes
# to the right workflow
/fix # diagnose → scope → fix → verify
# reads prior QA reports and design reviews
/build # spec → plan → build-loop → quality gates
# reads prior research, design specs, ADRs
/autoresearch # autonomous iteration toward a metric
# modify → commit → verify → keep/revert
/continue # resume where you left off — reads the
# cycle manifest for full context handoff
sage upgrade # pulls latest Sage framework from GitHub
sage update # regenerates platform files, preserves .sage/ state
sage update regenerates CLAUDE.md, commands, workflows, and gate
scripts while preserving your project state (decisions, work
artifacts, memory). You may need to restart your IDE to load the latest
configs.
Run in your terminal:
| Command | What It Does |
|---|---|
| sage new <n> | Create a new project with Sage |
| sage init | Add Sage to the current directory |
| sage update | Regenerate platform files after changes |
| sage upgrade | Update Sage to the latest version |
| sage learn [path] | Learn a codebase or module |
| sage setup memory | Configure persistent memory (sage-memory MCP) |
| sage find <query> | Search skills.sh catalog (90K+ skills) |
| sage add <source> | Install skills from owner/repo, URL, or local path |
| sage add <source> --skill <n> | Install a specific skill from a repo |
| sage remove <skill> | Remove a skill from project |
| sage skills | List installed skills |
| sage update [target] | Update community skills to latest |
Three layers, deterministic first:
- Keywords (instant): "build" → /build, "fix" → /fix, "audit" → /analyze. Handles 60-70% of requests with zero LLM judgment.
- Sub-agent classifier (focused): runs in an independent context with a single job, classifying the request into UNDERSTAND / ENVISION / DELIVER / REFLECT.
- Confirmation (human decides): 2-3 options with skill chains visible. The user confirms before anything runs.
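The first layer can be pictured as a plain lookup that runs before any model is involved. The sketch below is an illustration in bash, not Sage's actual router: the function name route_keyword is hypothetical, and only the three keyword mappings listed above come from this README. A non-zero exit stands for "no keyword matched, hand the request to the classifier".

```bash
#!/usr/bin/env bash
# Hypothetical sketch of keyword-first routing (not Sage's real implementation).
# Prints the matching slash command, or returns 1 when no keyword matches.
route_keyword() {
  local request
  # Lowercase the request so matching is case-insensitive.
  request=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$request" in
    *build*) echo "/build" ;;
    *fix*)   echo "/fix" ;;
    *audit*) echo "/analyze" ;;
    *)       return 1 ;;   # no match: fall through to the sub-agent classifier
  esac
}

route_keyword "Fix the login redirect"            # prints /fix
route_keyword "Why is checkout slow?" || echo "no keyword match; classify instead"
```

Whatever route comes out of either layer would still be shown to the user for confirmation before anything runs.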
Use inside your IDE (Claude Code, Antigravity):
| Command | What It Does |
|---|---|
| /sage | Start here. Routes via keywords → classify → confirm |
| /build | Spec → plan → build-loop → quality gates (with auto-review, coding principles, auto-QA). Accepts --quality-locked, --autonomous |
| /fix | Diagnose → scope → fix → verify (reads QA and design-review reports) |
| /architect | Elicit → design → milestone plan → phased build (with ADR auto-review). Accepts --quality-locked, --autonomous |
| /research | Interview → JTBD → opportunity map |
| /design | Brief → spec → copy (reads research context) |
| /analyze | UX audit → evaluation → findings |
| /qa | Browser-based functional testing (optional Lightpanda MCP) |
| /design-review | Design quality audit + AI slop detection + design system compliance |
| /review | Independent artifact evaluation via sub-agent delegation |
| /autoresearch | Autonomous iteration toward a measurable metric (reduce bundle, increase coverage) |
| /map | Build ontology knowledge graph: modules, services, dependencies |
| /learn | Codebase scan → memory storage |
| /reflect | Review cycle → extract learnings → seed next cycle |
| /continue | Resume any active cycle with full context |
| /status | Compute project state from artifacts |
Two optional flags change how the workflow operates without changing what it produces:
| Flag | Effect |
|---|---|
| --quality-locked | At each review checkpoint, loop review/revise until findings are clean (no Critical, no Major, only cosmetic Minor) or cap hit (10 iterations). Use when you want Sage to push for a clean output bar. |
| --autonomous | Skip user-facing elicitation. Agent makes brief/spec/plan decisions by reading memory, codebase patterns, constitution principles, and prior cycles. Every decision cites its source. Unconfident substantive decisions fall back to asking. Use when you want Sage to draft a recommended approach from your project's context. |
/build --quality-locked # interactive, quality-locked
/build --autonomous "ship dark mode" # autonomous decisions, normal review
/build --autonomous --quality-locked "..." # full autonomy, quality-locked
/architect --autonomous "design billing v2"
Flags are independent and combinable. Both have hard iteration caps
and explicit cap-reached prompts — no runaway behavior. Flag state
persists in the cycle manifest, so /continue restores them.
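The --quality-locked behaviour boils down to a bounded review/revise loop. Here is a minimal sketch, assuming a stand-in review function that reports how many Critical or Major findings remain. Both function names are hypothetical, and the real reviews are performed by sub-agents rather than a shell function. Only the 10-iteration cap comes from the flag description.

```bash
#!/usr/bin/env bash
MAX_ITERATIONS=10   # cap stated in the flag description

# Stand-in for a sub-agent review: prints the number of blocking findings.
# It pretends each revision resolves one finding, starting from three.
review() {
  local remaining=$((3 - $1))
  if [ "$remaining" -lt 0 ]; then remaining=0; fi
  echo "$remaining"
}

quality_locked_loop() {
  local iteration=0 blocking
  while [ "$iteration" -lt "$MAX_ITERATIONS" ]; do
    blocking=$(review "$iteration")
    if [ "$blocking" -eq 0 ]; then
      echo "clean after $iteration revisions"
      return 0
    fi
    iteration=$((iteration + 1))       # revise, then review again
  done
  echo "cap reached after $MAX_ITERATIONS iterations; asking the user"
  return 1
}

quality_locked_loop   # prints: clean after 3 revisions
```

The explicit cap is what keeps the loop from running away: a review that never comes back clean ends in a prompt to the user instead of an endless cycle.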
Set defaults in .sage/config.yaml so the flags apply automatically
to every /build, /architect, and /fix invocation:
quality_locked: true # always loop review until clean
autonomous: false # use interactive elicitation
The agent announces active modes and their source at workflow start:
Sage → build workflow.
Modes: --quality-locked (from .sage/config.yaml)
Goal: Ship dark mode
Per-run override: the --no-quality-locked and --no-autonomous
flags disable a config default for a single run:
/build --no-quality-locked "quick typo fix" # override config default
Precedence (highest wins): --no-X flag → --X flag → config
default → off. Passing both --X and --no-X is an error.
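The precedence rule is small enough to write out. The following is a sketch under assumptions, not Sage's own argument parser: resolve_mode is a hypothetical helper that takes the mode name, the config default, and then the flags passed on the command line.

```bash
#!/usr/bin/env bash
# Hypothetical resolver for one mode (e.g. quality-locked).
# Args: mode name, config default ("true", "false", or empty), then the flags.
resolve_mode() {
  local name="$1" config_default="$2" on=0 off=0 arg
  shift 2
  for arg in "$@"; do
    case "$arg" in
      "--$name")    on=1 ;;
      "--no-$name") off=1 ;;
    esac
  done
  # Passing both forms is an error, as stated above.
  if [ "$on" -eq 1 ] && [ "$off" -eq 1 ]; then
    echo "error: --$name and --no-$name cannot be combined" >&2
    return 2
  fi
  if   [ "$off" -eq 1 ]; then echo "off (from --no-$name)"
  elif [ "$on" -eq 1 ];  then echo "on (from --$name)"
  elif [ "$config_default" = "true" ]; then echo "on (from .sage/config.yaml)"
  else echo "off (default)"
  fi
}

resolve_mode quality-locked true --no-quality-locked   # off (from --no-quality-locked)
resolve_mode quality-locked true                       # on (from .sage/config.yaml)
```

Each result carries its source, mirroring the way the agent announces where an active mode came from.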
Sage communicates clearly at every step:
Decision points — numbered options when you need to choose a direction.
Checkpoints — [A] Approve / [R] Revise shortcuts on deliverables.
Continuations — [C] Continue with a recommended next step.
Free-form input always works. These patterns guide, they don't constrain.
Sage organizes work into four phases. Each phase has dedicated workflows that chain skills automatically:
UNDERSTAND ENVISION DELIVER REFLECT
/research /analyze /design /architect /build /fix /reflect
/learn /map /autoresearch
/review /qa
/design-review
/research chains user-interview → JTBD → opportunity-map.
/design chains ux-brief → ux-specify → ux-writing and reads
research findings automatically. /build chains spec → plan →
build-loop → quality-gates and reads design specs. /reflect
reviews the full cycle, extracts WHEN/CHECK/BECAUSE learnings,
and seeds the next cycle with concrete recommendations.
You can enter at any phase. But the further right you start, the more you're building on assumptions.
Agents rationalize. Tell them "MUST write spec" and they'll decide the conversation IS the spec. Every instruction that requires interpretation will be reinterpreted. Sage solves this with 5 independent enforcement layers and observable conditions that can't be argued away.
Layer 1 — Always-on rules in the system prompt. Even if nothing else loads, the gates prevent the worst violations. Eight rules covering memory-before-work, spec-first, artifact-only state, checkpoints (no unilateral deferral), self-check, decisions logging, learning from corrections, and skills-before-assumptions.
Layer 2 — Command preambles. Every slash command has enforcement rules the agent reads before its first token. Named rationalizations are blocked: "the design is clear" → NOT a spec file.
Layer 3 — Capabilities loaded at the right workflow step. build-loop
orchestrates task-by-task execution. coding-principles enforces 7
universal quality standards. tdd enforces test-first. systematic-debug
structures root cause investigation.
Layer 4 — Bash gate scripts. Deterministic. Run BEFORE the agent
reviews. sage-verify.sh runs your test suite, sage-hallucination-check.sh
verifies imports exist, sage-spec-check.sh confirms deliverables match
the plan. The script says tests fail → gate fails, regardless of what
the agent thinks.
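What "deterministic" means here can be shown with a tiny runner in which only exit codes count. This is a sketch, not the contents of Sage's gate scripts: run_gates is a hypothetical name, the true and false commands stand in for real scripts, and stopping at the first failure is a choice made for this example.

```bash
#!/usr/bin/env bash
# Hypothetical gate runner: each argument is a command whose exit status
# decides the gate. The agent's opinion is never consulted.
run_gates() {
  local gate
  for gate in "$@"; do
    if $gate >/dev/null 2>&1; then
      echo "PASS $gate"
    else
      echo "FAIL $gate"
      return 1          # stop at the first failing gate
    fi
  done
}

# Stand-ins: `true` plays a passing script, `false` a failing test suite.
run_gates true false true || echo "gate failed; implementation is not accepted"
```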
Layer 5 — Self-learning. Corrections from past sessions are stored as WHEN/CHECK/BECAUSE rules and searched before every Standard+ task. The agent reads its own past failures before repeating them.
Every rule is an observable condition, not an action instruction. "spec.md MUST EXIST on disk" is binary — the agent can't argue a file into existence. "MUST write spec" is rationalizable — the agent decides the conversation is the spec. File existence beats language.
The agent must bypass all five layers to skip the spec. Each layer is independently enforceable.
Sage delegates the review points below to sub-agents with independent context windows. The producing agent's conversation history, where self-bias lives, is not included.
| Review Point | When | What the Sub-Agent Checks |
|---|---|---|
| Auto-review: spec | After spec [A] | Framing alignment, testable criteria, boundary completeness, edge cases, consistency |
| Auto-review: plan | After plan [A] | Spec-plan alignment, task decomposition, dependencies, coverage gaps, risk |
| Auto-review: ADR | After design [A] in /architect | Trade-off analysis, migration path, risk assessment, blast radius, reversibility |
| Auto-review: root cause | After diagnosis [A] in /fix | Evidence quality, symptom vs cause, alternative causes, reproduction chain |
| Auto-review: fix plan | After fix plan [A] in /fix | Root cause coverage, file completeness, test strategy, regression risk |
| Gate 3: code quality | During quality gates | Readability, error handling, security, performance, conventions |
| Auto-QA | After gates pass | Spec-implementation alignment, test coverage, error handling, boundaries, integration, coding principles |
All are advisory — the user can always [P] Proceed. Findings are
logged to decisions.md for /reflect to learn from.
Requires Claude Code's Task tool. When the Task tool is not available (e.g., Antigravity), reviews are skipped silently.
Seven universal principles loaded during implementation — not a post-hoc checklist, but a mindset active AS code is written:
- Clarity over cleverness — descriptive names, obvious flow, no tricks
- Fail loudly, recover gracefully — every external call has error handling
- Guard the boundaries — validate at every entry point
- Smallest scope, shortest lifetime — local over global, pure over stateful
- Make the right thing easy — APIs that invite correct usage
- Consistency beats perfection — match the existing codebase
- Test what matters — test behavior and boundaries, not implementation
Language-agnostic. Apply to Python, TypeScript, Go, Rust, anything. Stack skills add language-specific idioms on top.
Sage uses a three-tier constitution model:
Base (5 principles, all projects) — TDD, no silent failures, no secrets in code, explicit dependencies, reversible changes.
Preset (chosen during init) — startup (ship small, monolith first), enterprise (auth everywhere, audit trails, postmortems), or opensource (docs mirror code, semver contract).
Project additions — your own principles in .sage/config.yaml.
The generator merges all three tiers into the always-on instructions. Lower tiers add constraints but cannot remove inherited ones.
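One way to picture the merge is as an ordered union of the three tiers. The sketch below is an assumption about the shape of the operation, not the generator's real code: merge_constitution is a hypothetical name, and the project-level principle "no ORM in hot paths" is invented for the example. The base and startup principles are taken from the lists above.

```bash
#!/usr/bin/env bash
# Hypothetical merge: concatenate tiers in order and drop duplicates.
# Because the operation is a union, a later tier can add lines but has no
# way to remove a principle inherited from an earlier one.
merge_constitution() {
  printf '%s\n' "$@" | awk 'NF && !seen[$0]++'
}

base="TDD
no secrets in code"
preset="ship small
monolith first"
project="TDD
no ORM in hot paths"   # made-up project addition

merge_constitution "$base" "$preset" "$project"
```

The repeated "TDD" entry appears once in the output, and every base principle survives regardless of what the later tiers contain.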
Skills are Sage's knowledge architecture — a principled way to put LLMs in the best position to do excellent work.
Every skill uses progressive disclosure: a short description triggers activation, SKILL.md provides the full process, and reference files offer depth when needed. This mirrors how experts work — you don't recite the entire textbook before solving a problem. You know what you know, and you reach for references when the situation demands it.
Skills are designed to maximize LLM capabilities. Clear structure (frontmatter, process steps, quality criteria) gives the agent unambiguous guidance. Domain vocabulary in the right places improves reasoning. Reference material separated from instructions keeps the agent focused on the task, not on parsing a wall of text.
Sage ships with skills across four domains:
- Product management — JTBD, opportunity mapping, user interviews, PRDs, problem-solving
- UX design — audit, evaluate, discovery, brief, specify, writing, heuristic review, research, plan-tasks
- Engineering — React, React Native, Next.js, Flutter, web, mobile, API, BaaS, plus full-stack presets (Next.js + Supabase, Flutter + Firebase, React Native + Expo, Next.js fullstack)
- Framework — memory, ontology, self-learning, autoresearch, skill-builder, and research packs (discover, draft, observe, source-process, validate)
Search and install from 90K+ community skills:
sage find react # search skills.sh
sage add vercel-labs/agent-skills # browse + pick from multi-skill repo
sage add vercel-labs/agent-skills --skill frontend-design # install specific skill
sage add ./my-local-skills # install from local path
sage remove frontend-design # uninstall
Skills install to sage/skills/ and auto-deploy to your platform
(.claude/skills/ loader stubs for Claude Code, full copies to
.agent/skills/ for Antigravity).
Contributing is deliberately simple. Drop a folder with a SKILL.md
into sage/skills/ and it works. Add Sage frontmatter (type, tags,
relationships) for smarter integration.
Sage configuration lives in .sage/config.yaml:
sage-version: "1.1.1"
project-name: "my-app"
detected-stack: [react, typescript]
auto_review: true # sub-agent review after spec/plan approval
auto_qa: true # sub-agent QA after quality gates
independent_gate3: true # sub-agent code quality review (Gate 3)
command_prefix: false # prefix commands as sage:build, sage:fix, etc.
All toggles default to true (except command_prefix). Set to false to disable:
| Setting | What It Controls |
|---|---|
| auto_review | Sub-agent review of spec, plan, and ADR after approval |
| auto_qa | Sub-agent code verification after quality gates pass |
| independent_gate3 | Sub-agent code quality review at Gate 3 (falls back to self-review) |
| command_prefix | Namespace all commands as sage:build, sage:fix, etc. (set via --prefix flag) |
For non-trivial work where independent review changes your mind, Sage offers an opt-in cross-model build cycle. The host (Claude Code, Opus) keeps the planner role and orchestrates; external CLIs handle adversarial review and implementation:
brief → spec → external spec review (loop) → plan → external plan review
→ external implement → external code review (loop) → reflect
Defaults: Codex CLI (gpt-5-codex) reviews specs/plans and code; Kimi
CLI implements. All bindings live in a single config file you can edit:
# .sage/agents.toml — swap any role's tool with a one-line change
[roles.code_reviewer]
agent = "codex"
model = "gpt-5-codex"
mode = "read-only"
Install per project (Python 3.11+, plus whatever CLIs you bind):
cd my-project
sage setup multi-agent # adds /build-x, /review-spec, /review-plan,
# /implement, /review-code — never shadows /build
sage setup multi-agent --remove # clean uninstall, user edits backed up
The augmented cycle reuses Sage's existing /architect, /research,
and /design workflows where they fit, then layers external review and
external implementation on top. The setup survives sage update: your
.sage/agents.toml and .sage/prompts/ are never touched, while
framework-owned scripts and command files refresh from the template with
drift detection ([K]eep | [R]eplace | [D]iff if you've edited locally).
Claude Code only in v1.
Learn more:
- docs/multi-agent.md: comprehensive user guide (install, configure, daily use, customize, troubleshoot)
- runtime/multi-agent/README.md: contributor-facing (template layout, ownership split, how to test)
- .sage/docs/multi-agent.md (post-install): protocol contract, schema, integration points
When Sage runs in your project, it manages state in .sage/:
.sage/
├── config.yaml # Project config — preset, stack, toggles
├── decisions.md # Append-only decision log (never edited, never summarized)
├── conventions.md # Project conventions (enriched by codebase-scan)
├── docs/ # Project knowledge (analyses, ADRs, research)
│ ├── decision-*.md # Architecture Decision Records
│ ├── ux-audit-*.md # UX audit findings
│ ├── jtbd-*.md # Jobs-to-be-Done analysis
│ └── reflect-*.md # Cycle reflections with learnings
├── work/ # Per-initiative deliverables
│ └── YYYYMMDD-slug/
│ ├── brief.md # Scope definition (medium+ tasks)
│ ├── spec.md # Feature specification
│ ├── plan.md # Implementation plan with tasks
│ ├── manifest.md # Cycle state + handoff context
│ ├── qa-report.md # QA test results (from /qa)
│ └── design-review.md # Design audit findings (from /design-review)
└── gates/
├── gate-modes.yaml # Which gates run per workflow mode
└── scripts/ # Deterministic verification scripts
Artifact-only state. There is no progress.md or state file that the agent summarizes. The artifacts ARE the state: spec.md exists = spec phase complete. plan.md exists = planning done. File existence is binary — the agent can't hallucinate a file into existence.
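Deriving state from file existence can be sketched in a few lines of bash. This is an illustration, not the code behind /status: cycle_state is a hypothetical helper, and the "no spec yet" label is made up for the example. The two file checks follow the rules stated above.

```bash
#!/usr/bin/env bash
# Hypothetical state check for one work directory (.sage/work/YYYYMMDD-slug/).
# State is derived only from which files exist, never from a summary.
cycle_state() {
  local dir="$1"
  if   [ -f "$dir/plan.md" ]; then echo "planning done"
  elif [ -f "$dir/spec.md" ]; then echo "spec phase complete"
  else echo "no spec yet"
  fi
}

work=$(mktemp -d)            # throwaway directory standing in for a work folder
cycle_state "$work"          # no spec yet
touch "$work/spec.md"
cycle_state "$work"          # spec phase complete
rm -rf "$work"
```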
decisions.md is newest-first. The agent prepends entries after the
header — recent context is always read first. When the file exceeds
~200 lines, old entries archive to decisions-{date}.md.
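Newest-first logging amounts to inserting each entry directly below the header. Here is a minimal sketch, assuming the header is exactly the first line of the file. The helper name prepend_decision is hypothetical, and the archiving step is left out because its exact rules are not described here.

```bash
#!/usr/bin/env bash
# Hypothetical helper: insert a new entry directly below a one-line header.
# Assumes the header is exactly the first line of the file.
prepend_decision() {
  local file="$1" entry="$2" tmp
  tmp=$(mktemp)
  { head -n 1 "$file"; printf '%s\n' "$entry"; tail -n +2 "$file"; } > "$tmp"
  mv "$tmp" "$file"
}

log=$(mktemp)
printf '%s\n' "# Decisions" "- older entry" > "$log"
prepend_decision "$log" "- newer entry"
cat "$log"        # header, then "- newer entry", then "- older entry"
rm -f "$log"
```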
Sage is platform-agnostic. It works wherever AI agents work.
| Platform | How Sage Integrates | Sub-agent reviews | Status |
|---|---|---|---|
| Claude Code | CLAUDE.md + .claude/commands/ (markdown) | Task tool | Full |
| Antigravity | GEMINI.md + .agent/ (markdown) | Native | Full |
| Codex (OpenAI) | AGENTS.md + .codex/agents/ (TOML sub-agents) | Native (TOML) | Full |
| Opencode | AGENTS.md + .opencode/{commands,agents}/ (markdown) | Native (markdown) | Full |
| Gemini CLI | GEMINI.md + .gemini/commands/ (TOML) | Single-pass fallback in v1 | Full |
| Claude Code Plugin | Plugin format; install with /plugin install sage@xoai | Task tool | Full |
Six distribution paths from one source:
Sage Framework (source of truth)
├── generate-claude-code.sh → CLAUDE.md + .claude/
├── generate-antigravity.sh → GEMINI.md + .agent/
├── generate-codex.sh → AGENTS.md + .codex/agents/
├── generate-opencode.sh → AGENTS.md + .opencode/{commands,agents}/
├── generate-gemini-cli.sh → GEMINI.md + .gemini/commands/
└── generate-plugin.sh → sage-plugin/ (Claude Code plugin)
All in-project paths share the same .sage/ project state. Multiple
platforms can be installed simultaneously — Sage detects them and
generates files for each. AGENTS.md is shared between Codex and
Opencode; GEMINI.md is shared between Antigravity and Gemini CLI.
sage init # detect existing platforms; ask if none
sage init --platform codex # explicit: just Codex
sage init --platform codex,opencode # multiple
sage init --platform all # all 5 platforms
sage update # regenerate using the persisted list
sage update --platform gemini-cli # override on update
The selected platforms persist in .sage/config.yaml under
platforms:. sage update reads this list and regenerates for each.
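Because some platforms share an instruction file, regenerating for several of them touches fewer top-level files than platforms. The sketch below illustrates that mapping and is not part of Sage. The function names are hypothetical, and the identifiers claude-code and antigravity are assumed, since only codex, opencode, and gemini-cli appear as --platform values above.

```bash
#!/usr/bin/env bash
# Instruction file per platform, following the platform table in this README.
# The identifiers claude-code and antigravity are assumptions.
instruction_file() {
  case "$1" in
    claude-code)            echo "CLAUDE.md" ;;
    antigravity|gemini-cli) echo "GEMINI.md" ;;
    codex|opencode)         echo "AGENTS.md" ;;
    *) echo "unknown platform: $1" >&2; return 1 ;;
  esac
}

# Print each instruction file once, even when platforms share it.
files_for() {
  local platform
  for platform in "$@"; do instruction_file "$platform"; done | sort -u
}

files_for codex opencode gemini-cli   # prints AGENTS.md, then GEMINI.md
```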
Sage copies its framework source into each project. This is intentional:
- Self-contained. No external dependencies. Works offline.
- Version-locked. Your project uses the exact version you installed. No surprise updates. Upgrade when you're ready.
- Inspectable. Read any skill, workflow, or capability. No magic. If something isn't working, you can see exactly what it's doing.
- Portable. Clone the repo and everything is there. No global installs, no PATH configuration, no package managers.
If you prefer managed installs, the Claude Code plugin offers the same functionality without in-project files.
MIT