─────────────────────────────────────────────
▚▚ A U T O N O M O U S Q A D E P T ▞▞
A drop-in bundle that turns any AI coding assistant into a combined QA department + bugfix dev team for any software project. Point your agent at it and it will: discover the product's features, surfaces, and business logic; build a measurable coverage matrix; test every route, endpoint, screen, CLI, job, and flow; log reproducible bugs with severity; fix them in safe batches; add regression tests; retest; and write a final report — all driven from resumable files so the run survives interruptions and can be handed between agents.
Works with Claude Code (as an installable plugin/skill), Cursor, GitHub Copilot, OpenAI Codex, Windsurf, Aider, Gemini CLI, or plain chat — across web, backend/API, serverless/edge, iOS, Android, CLI, desktop, data pipelines, and infra/CI.
- Why this exists
- How it works
- Quick start
- Installation (every tool)
- Using it
- Run modes
- What it produces: the qa/ workspace
- Supported surfaces
- Bundled scripts
- Safety boundaries
- Repository layout
- Core principle
- Publishing your own copy
- FAQ
- License
"The tests pass" is where most automated QA stops. This bundle treats that as the
starting line. It models your project as a product — every feature, screen,
endpoint, flow, and rule is an individual unit that must be exercised against an
explicit expected behavior and marked Pass / Fail / Blocked / Unknown
with evidence. Gaps become reproducible bugs; bugs get fixed in minimal,
safe batches with regression tests; everything is recorded in plain-text files so
the work is auditable, resumable, and portable across agents and tools.
The agent runs a single resumable loop:
Resume → Discover → Plan → Test → Triage → Fix → Retest → Report
All state lives in a qa/ directory at your project's root (not inside this
bundle). Because every decision, bug, and result is written to disk, any agent —
including a fresh session with no memory of the run — can read qa/run-ledger.md
and pick up exactly where the last one stopped. Files are the substrate; agents are
interchangeable.
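The resume behavior can be pictured with a small sketch. This is illustrative only: the dictionary below is a hypothetical in-memory stand-in for whatever qa/run-ledger.md records, and `remaining_phases` is an invented helper, not part of the bundle.

```python
from typing import Optional

# Phase order copied from the loop above.
PHASES = ["Resume", "Discover", "Plan", "Test", "Triage", "Fix", "Retest", "Report"]


def remaining_phases(ledger: Optional[dict]) -> list[str]:
    """Return the phases a fresh session still has to work through.

    `ledger` is a hypothetical parsed form of the run ledger with a
    "phase" key; None means no run has been recorded yet.
    """
    if ledger is None:
        # Nothing on disk: after the Resume check, start at Discover.
        return PHASES[1:]
    phase = ledger["phase"]
    if phase not in PHASES:
        raise ValueError(f"unknown phase in ledger: {phase!r}")
    # Resume itself is only the entry check, so never return it as work to do.
    return PHASES[max(PHASES.index(phase), 1):]


if __name__ == "__main__":
    print(remaining_phases({"phase": "Fix", "cursor": "BUG-0003"}))
```

A session that finds the ledger at the Fix phase would therefore skip discovery, planning, and testing and continue from the recorded cursor.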
The "brain" is a single skill file, SKILL.md, which holds the operating loop and
points to deep, load-on-demand playbooks. Every adapter (Claude plugin, Cursor
rules, AGENTS.md) funnels through that one file so instructions never drift
between copies.
Claude Code (installable plugin):
/plugin marketplace add <your-github-user>/<this-repo>
/plugin install autonomous-qa-department@autonomous-qa-marketplace
Then just say "test everything", "act as QA and audit this project for bugs",
or run /autonomous-qa.
Any other AI / IDE: copy this bundle into your project and tell the agent:
Read AGENTS.md and run the full autonomous QA workflow on this project.
It populates a qa/ folder at your project root and works the loop until the
exit criteria
are met. Resume any time — it reads qa/run-ledger.md and continues.
Full per-tool instructions, including the skill-only install and how to verify the install, live in INSTALL.md. Summary:
| Tool | How | What you get |
|---|---|---|
| Claude Code (plugin, recommended) | /plugin marketplace add <user>/<repo> then /plugin install autonomous-qa-department@autonomous-qa-marketplace. A local clone path also works as a marketplace. | Auto-triggering skill plus /autonomous-qa and /qa-agent-* slash commands |
| Claude Code (skill only) | Copy plugins/autonomous-qa-department/skills/autonomous-qa-department into <project>/.claude/skills/ | Auto-triggering skill (no slash commands) |
| Cursor | Copy the bundle to the project root including dotfiles (rsync -a --exclude .git ./ <project>/) | /autonomous-qa command + always-on rule via .cursor/ adapters |
| Copilot / Codex / Windsurf / Aider / Gemini CLI | Copy the bundle to the project root, then point the agent at AGENTS.md | Universal entrypoint; for Copilot, add a one-liner to .github/copilot-instructions.md |
| Plain chat / web assistant | Paste SKILL.md as the first message, provide the repo | Follow the loop manually; keep qa/ files updated to stay resumable |
In every case the agent writes run-state into a qa/ folder at the target
project root and seeds it from the templates on first run.
Once installed, start a run in whatever way your tool supports:
- Natural language: "test everything", "act as QA and audit this project for bugs", "do a full quality pass", "QA this API / app / CLI".
- Slash command (Claude Code / Cursor): /autonomous-qa for the full loop, or a surface-specific entrypoint such as /qa-agent-web, /qa-agent-backend, /qa-agent-ios, /qa-agent-android, /qa-agent-cli, /qa-agent-desktop, /qa-agent-data, /qa-agent-serverless, /qa-agent-infra.
- File pointer (other tools): "Read AGENTS.md and run the full autonomous QA workflow on this project."
The agent asks you which run mode to use, then discovers your stack,
builds the coverage matrix, and works it down by risk. It is resumable — stop
any time and re-issue the same command; it reads qa/run-ledger.md and continues.
A run is "done" when the
exit criteria
hold (every P0/P1 coverage row closed; lower-priority rows closed or explicitly
deferred).
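As a rough illustration of that rule, the check below treats each coverage row as a small record. The `CoverageRow` fields are assumptions made for this sketch; the real coverage-matrix columns are defined by the bundle's templates, and what counts as "closed" is decided there, not here.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class CoverageRow:
    """Hypothetical, simplified view of one coverage-matrix row."""
    unit: str
    priority: str          # "P0", "P1", "P2", ...
    closed: bool           # whether the row counts as closed
    deferred: bool = False  # explicitly deferred (only valid below P1)


def exit_criteria_met(rows: Iterable[CoverageRow]) -> bool:
    """True when every P0/P1 row is closed and every lower-priority
    row is either closed or explicitly deferred."""
    for row in rows:
        if row.closed:
            continue
        if row.priority in {"P0", "P1"}:
            return False  # high-priority rows cannot be deferred
        if not row.deferred:
            return False  # an open low-priority row needs an explicit deferral
    return True


if __name__ == "__main__":
    rows = [
        CoverageRow("POST /login", "P0", closed=True),
        CoverageRow("settings tooltip", "P3", closed=False, deferred=True),
    ]
    print(exit_criteria_met(rows))
```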
At the start of every test cycle the agent asks which mode to use. The mode changes only how blockers are handled — never the safety lines.
- non-stop: Never pause. Log every blocker/open question as a BLK-###, mark affected units Blocked, and keep going with all other work. At the end it presents every accumulated blocker as one batch of direct questions. Best for unattended / overnight / "just run it and tell me everything" runs.
- stop at blockers: Halt at each blocker that needs a human decision, ask a specific question, wait, and continue. Best for interactive or high-stakes, ambiguous-business-logic work.
For genuinely headless runs (a cron/scheduled agent with no human to answer), it
defaults to non-stop and notes that in the ledger.
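The difference between the two modes comes down to what happens when a blocker is recorded. The sketch below is a hypothetical model, not the bundle's implementation: `choose_mode` and `BlockerLog` are invented names, and the three-digit padding is an assumption read off the BLK-### pattern.

```python
from dataclasses import dataclass, field
from typing import Optional

MODES = ("non-stop", "stop at blockers")


def choose_mode(requested: Optional[str], headless: bool) -> str:
    """Headless runs have nobody to ask, so they fall back to non-stop."""
    if headless or requested is None:
        return "non-stop"
    if requested not in MODES:
        raise ValueError(f"unknown run mode: {requested!r}")
    return requested


@dataclass
class BlockerLog:
    mode: str
    blockers: list = field(default_factory=list)

    def record(self, question: str) -> tuple[str, bool]:
        """Log a blocker; return its ID and whether the run must pause now."""
        blocker_id = f"BLK-{len(self.blockers) + 1:03d}"
        self.blockers.append((blocker_id, question))
        return blocker_id, self.mode == "stop at blockers"

    def end_of_run_questions(self) -> list[str]:
        """In non-stop mode, everything accumulated is asked as one batch."""
        return [f"{bid}: {question}" for bid, question in self.blockers]


if __name__ == "__main__":
    log = BlockerLog(choose_mode(None, headless=True))
    print(log.record("Is the refund window 14 or 30 days?"))
    print(log.end_of_run_questions())
```

In either mode the safety boundaries are unchanged; only the timing of the question differs.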
The run's entire memory is a set of plain-text files at your project root. They are both the working state and the deliverable — readable, diffable, and resumable:
| File | What it holds |
|---|---|
| qa/run-ledger.md | Resumable state machine: current phase, cursor, decisions, blockers |
| qa/test-plan.md | Detected surfaces, stack, build/test/lint commands, strategy |
| qa/feature-inventory.md | Every feature + its expected behavior + source |
| qa/application-map.md | Apps, routes, screens, services, jobs |
| qa/element-inventory.md | Every interactive UI element + intended function |
| qa/business-flow-map.md | End-to-end user/business journeys |
| qa/coverage-matrix.md | The spine: every testable unit × test type × status |
| qa/risk-register.md | Risk-ranked areas → test/fix priority |
| qa/bug-registry.md | Every confirmed bug, stable IDs (BUG-0001…), full reproduction |
| qa/fix-plan.md | Bugs batched by shared root cause |
| qa/regression-log.md | Retest results and regression tests added |
| qa/final-report.md | Metrics, outcomes, residual risk, next steps |
The copies shipped inside this bundle are templates / schema definitions; the
live run is the copy created at your project root. Evidence captured during testing
goes under qa/evidence/ (kept git-ignored in consumer projects).
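To make the status vocabulary and the bug ID scheme concrete, here is a minimal sketch of a rollup over coverage statuses. It is an illustration under stated assumptions, not a description of what qa-rollup.sh does: the `pass_pct` field and both function names are hypothetical, and the four-digit padding is inferred from the BUG-0001 pattern.

```python
from collections import Counter
from typing import Iterable

# The four statuses a coverage row can carry.
STATUSES = ("Pass", "Fail", "Blocked", "Unknown")


def rollup(statuses: Iterable[str]) -> dict:
    """Count rows per status and report the share that passed."""
    counts = Counter(statuses)
    unexpected = set(counts) - set(STATUSES)
    if unexpected:
        raise ValueError(f"unexpected statuses: {sorted(unexpected)}")
    total = sum(counts.values())
    summary = {status: counts.get(status, 0) for status in STATUSES}
    # pass_pct is a hypothetical metric chosen for this sketch.
    summary["pass_pct"] = round(100 * summary["Pass"] / total, 1) if total else 0.0
    return summary


def bug_id(sequence_number: int) -> str:
    """Format a stable bug ID in the BUG-0001 style."""
    return f"BUG-{sequence_number:04d}"


if __name__ == "__main__":
    print(rollup(["Pass", "Pass", "Fail", "Unknown"]))
    print(bug_id(1))
```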
The agent detects which of these apply and reads the matching playbook for each:
web · backend/API · serverless / edge · iOS · Android · CLI · desktop · data pipelines · infra / CI
Surface playbooks live under references/surfaces/. A stack the detector doesn't recognize is fine — the agent inspects it manually and fills the test plan.
Optional, read-only conveniences (you can always do the work by hand). From a
root-drop install they run as scripts/...; in plugin/skill mode invoke them by
their install path (e.g. ${CLAUDE_PLUGIN_ROOT}/skills/.../scripts/...):
qa-detect.sh <project> # detect surfaces & stack (read-only)
qa-detect.sh <project> --json # machine-readable surface report
qa-all.sh <project> # run the project's native checks (see safety note)
qa-rollup.sh <project>/qa # compute the coverage / bug rollup

The department runs autonomously on safe, reversible, local/sandbox work, but it stops and records a blocker rather than guessing for anything that is:
- destructive or hard to reverse,
- production-touching or operating on real-user data,
- money / payment / billing related,
- credential-gated or needing paid external access,
- an app-store or payment-provider operation, or
- dependent on a business rule it cannot safely infer.
It also confirms a target is non-production before running native checks or migrations (native test commands can themselves be destructive). It never deletes, skips, or weakens failing tests to make CI green, never hides failures, and keeps fix diffs minimal. Full boundaries: references/10-safety.md.
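One way to picture these boundaries is as a gate that every planned action passes through before it runs. The sketch below is hypothetical: the flag names and `blocking_reasons` are invented for illustration, and how the agent actually classifies an action is governed by references/10-safety.md.

```python
# Hypothetical flags, one per boundary category listed above.
BLOCKING_FLAGS = {
    "destructive": "destructive or hard to reverse",
    "touches_production": "production-touching or operating on real-user data",
    "involves_money": "money / payment / billing related",
    "needs_credentials": "credential-gated or needing paid external access",
    "store_or_payment_provider": "app-store or payment-provider operation",
    "unclear_business_rule": "depends on a business rule that cannot be safely inferred",
}


def blocking_reasons(action_flags: set[str]) -> list[str]:
    """Return the reasons an action must become a blocker.

    An empty list means none of the boundaries apply and the action may
    proceed autonomously. Unrecognized flags are rejected rather than
    silently ignored, so a typo cannot wave an action through.
    """
    unrecognized = action_flags - set(BLOCKING_FLAGS)
    if unrecognized:
        raise ValueError(f"unrecognized flags: {sorted(unrecognized)}")
    return [reason for flag, reason in BLOCKING_FLAGS.items() if flag in action_flags]


if __name__ == "__main__":
    print(blocking_reasons(set()))
    print(blocking_reasons({"touches_production", "involves_money"}))
```

Any non-empty result would be logged as a blocker instead of being acted on, whichever run mode is active.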
.claude-plugin/
marketplace.json ← so people can /plugin install it (lists the plugin below)
.cursor/ ← Cursor rules + commands (thin adapters → the skill)
AGENTS.md ← universal entrypoint for any AI/IDE
README.md / INSTALL.md ← this, and how to install anywhere
LICENSE ← MIT
plugins/
autonomous-qa-department/ ← the Claude Code plugin
.claude-plugin/plugin.json ← plugin manifest
commands/ ← slash commands (/autonomous-qa, /qa-agent-*)
skills/
autonomous-qa-department/ ← the skill itself
SKILL.md ← the brain: operating loop + pointers (start here)
references/ ← deep playbooks, loaded on demand
00-orchestration.md 01-discovery.md 02-coverage-model.md
03-test-design.md 04-execution.md 05-triage.md 06-bugfix.md
07-regression.md 08-nonfunctional.md 09-reporting.md 10-safety.md
surfaces/ web · api-backend · serverless · iOS · Android ·
cli · desktop · data-pipeline · infra
qa/ ← state templates (the department's memory)
run-ledger.md coverage-matrix.md risk-register.md test-plan.md
feature-inventory.md application-map.md element-inventory.md
business-flow-map.md bug-registry.md fix-plan.md regression-log.md
final-report.md
scripts/ qa-detect.sh · qa-all.sh · qa-rollup.sh
A passing test suite is not proof of quality. Every feature, element, endpoint, and flow is individually verified against an expected behavior and marked Pass / Fail / Blocked / Unknown with evidence. "Tests are green" is a starting point, never a conclusion.
This repo is itself a Claude Code marketplace, so anyone can fork it and publish their own. Before you publish:
- Edit the owner in .claude-plugin/marketplace.json and the author in plugins/autonomous-qa-department/.claude-plugin/plugin.json to your own name/email.
- Optionally rename the marketplace name (autonomous-qa-marketplace) and update the install commands in this README and INSTALL.md to match.
- Update the Copyright line in LICENSE.
- Push to a git host, then install with /plugin marketplace add <your-github-user>/<your-repo>.
Does it modify my code? Only during the Fix phase, with minimal diffs, and only within the safety boundaries above. Discovery is read-only; testing actively exercises the app (drives the UI, calls APIs, runs the CLI, triggers jobs) against a non-prod/sandbox target — native test commands run only after the non-prod checklist clears — but it doesn't change your source outside the Fix phase.
Is the run safe to interrupt? Yes. State is flushed to qa/run-ledger.md after
every meaningful step; re-issue the same command to resume.
Can multiple agents work in parallel? If your harness supports subagents, it dispatches one QA agent per surface and one fix agent per batch (isolating file-writing fixes in separate worktrees/branches). Without subagents it runs the same steps sequentially for an identical result. See references/00-orchestration.md.
Where does the run state go? Into a qa/ folder at the root of the project
under test — never inside this bundle. Add qa/evidence/ to that project's
.gitignore.
MIT.
New here? Open SKILL.md — it explains the whole system in one page.