🧪 testa — Autonomous QA & Bugfix Department

 ████████ ███████ ███████ ████████  █████
    ██    ██      ██          ██    ██   ██
    ██    █████   ███████     ██    ███████
    ██    ██           ██     ██    ██   ██
    ██    ███████ ███████     ██    ██   ██
─────────────────────────────────────────────
 ▚▚  A U T O N O M O U S   Q A   D E P T  ▞▞

A drop-in bundle that turns any AI coding assistant into a combined QA department + bugfix dev team for any software project. Point your agent at it and it will: discover the product's features, surfaces, and business logic; build a measurable coverage matrix; test every route, endpoint, screen, CLI, job, and flow; log reproducible bugs with severity; fix them in safe batches; add regression tests; retest; and write a final report — all driven from resumable files so the run survives interruptions and can be handed between agents.

Works with Claude Code (as an installable plugin/skill), Cursor, GitHub Copilot, OpenAI Codex, Windsurf, Aider, Gemini CLI, or plain chat — across web, backend/API, serverless/edge, iOS, Android, CLI, desktop, data pipelines, and infra/CI.

Why this exists

"The tests pass" is where most automated QA stops. This bundle treats that as the starting line. It models your project as a product — every feature, screen, endpoint, flow, and rule is an individual unit that must be exercised against an explicit expected behavior and marked Pass / Fail / Blocked / Unknown with evidence. Gaps become reproducible bugs; bugs get fixed in minimal, safe batches with regression tests; everything is recorded in plain-text files so the work is auditable, resumable, and portable across agents and tools.

How it works

The agent runs a single resumable loop:

Resume → Discover → Plan → Test → Triage → Fix → Retest → Report

All state lives in a qa/ directory at your project's root (not inside this bundle). Because every decision, bug, and result is written to disk, any agent — including a fresh session with no memory of the run — can read qa/run-ledger.md and pick up exactly where the last one stopped. Files are the substrate; agents are interchangeable.
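As a rough illustration of "files are the substrate", the sketch below shows how a fresh session might work out where to resume. It assumes the ledger records its phase on lines of the form `phase: <name>`. That line format and the `resume_phase` helper are hypothetical; the real schema is whatever the bundled `run-ledger.md` template defines.

```shell
# Hypothetical helper: print the phase a fresh session should resume from.
# Assumes ledger lines like "phase: Test" (an illustrative format, not the
# bundle's actual schema).
resume_phase() {
  ledger="$1"
  if [ ! -f "$ledger" ]; then
    # No ledger yet: assume the run starts from discovery.
    echo "Discover"
    return 0
  fi
  # Take the last "phase:" line so that later updates win.
  phase=$(sed -n 's/^phase:[[:space:]]*//p' "$ledger" | tail -n 1)
  echo "${phase:-Discover}"
}
```

Calling `resume_phase qa/run-ledger.md` from the project root would print the recorded phase, or `Discover` when no ledger exists.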

The "brain" is a single skill file — SKILL.md — which holds the operating loop and points to deep, load-on-demand playbooks. Every adapter (Claude plugin, Cursor rules, AGENTS.md) funnels through that one file so instructions never drift between copies.

Quick start

Claude Code (installable plugin):

/plugin marketplace add <your-github-user>/<this-repo>
/plugin install autonomous-qa-department@autonomous-qa-marketplace

Then just say "test everything", "act as QA and audit this project for bugs", or run /autonomous-qa.

Any other AI / IDE: copy this bundle into your project and tell the agent:

Read AGENTS.md and run the full autonomous QA workflow on this project.

It populates a qa/ folder at your project root and works the loop until the exit criteria are met. Resume any time — it reads qa/run-ledger.md and continues.

Installation (every tool)

Full per-tool instructions, including the skill-only install and how to verify the install, live in INSTALL.md. Summary:

| Tool | How | What you get |
| --- | --- | --- |
| Claude Code, plugin (recommended) | `/plugin marketplace add <user>/<repo>` then `/plugin install autonomous-qa-department@autonomous-qa-marketplace`. A local clone path also works as a marketplace. | Auto-triggering skill plus `/autonomous-qa` and `/qa-agent-*` slash commands |
| Claude Code, skill only | Copy `plugins/autonomous-qa-department/skills/autonomous-qa-department` into `<project>/.claude/skills/` | Auto-triggering skill (no slash commands) |
| Cursor | Copy the bundle to the project root including dotfiles (`rsync -a --exclude .git ./ <project>/`) | `/autonomous-qa` command + always-on rule via `.cursor/` adapters |
| Copilot / Codex / Windsurf / Aider / Gemini CLI | Copy the bundle to the project root, then point the agent at AGENTS.md | Universal entrypoint; for Copilot, add a one-liner to `.github/copilot-instructions.md` |
| Plain chat / web assistant | Paste `SKILL.md` as the first message, provide the repo | Follow the loop manually; keep `qa/` files updated to stay resumable |

In every case the agent writes run-state into a qa/ folder at the target project root and seeds it from the templates on first run.
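The first-run seeding can be pictured as a copy that never overwrites live state. This is a minimal sketch, not the bundle's own logic: `seed_qa` is a hypothetical helper, and it assumes the templates are the `.md` files in a single directory.

```shell
# Hypothetical helper: seed <project>/qa from a templates directory.
# Existing files are left untouched so a resumed run keeps its state.
seed_qa() {
  templates="$1"
  project="$2"
  mkdir -p "$project/qa"
  for f in "$templates"/*.md; do
    [ -e "$f" ] || continue              # templates directory had no .md files
    dest="$project/qa/$(basename "$f")"
    [ -e "$dest" ] || cp "$f" "$dest"    # copy only when the file is missing
  done
}
```

With a skill-only install, the templates directory would be the `qa/` folder inside the copied skill; running the helper a second time changes nothing.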

Using it

Once installed, start a run in whatever way your tool supports:

Natural language: "test everything", "act as QA and audit this project for bugs", "do a full quality pass", "QA this API / app / CLI".
Slash command (Claude Code / Cursor): /autonomous-qa for the full loop, or a surface-specific entrypoint such as /qa-agent-web, /qa-agent-backend, /qa-agent-ios, /qa-agent-android, /qa-agent-cli, /qa-agent-desktop, /qa-agent-data, /qa-agent-serverless, /qa-agent-infra.
File pointer (other tools): "Read AGENTS.md and run the full autonomous QA workflow on this project."

The agent asks you which run mode to use, then discovers your stack, builds the coverage matrix, and works it down by risk. It is resumable — stop any time and re-issue the same command; it reads qa/run-ledger.md and continues. A run is "done" when the exit criteria hold (every P0/P1 coverage row closed; lower-priority rows closed or explicitly deferred).
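To make the exit criteria concrete, here is a small sketch that counts the high-priority rows still open. It assumes coverage rows are pipe-delimited as `| unit | priority | status |` and treats only `Pass` as closed. Both the column layout and that definition of "closed" are assumptions for illustration; the bundled `coverage-matrix.md` template is the authority.

```shell
# Hypothetical check: count P0/P1 rows whose status is not "Pass".
# Assumes rows like "| login form | P0 | Pass |" (illustrative layout).
open_high_priority() {
  awk -F'|' '
    {
      gsub(/[ \t]/, "", $3)               # priority column
      gsub(/^[ \t]+|[ \t]+$/, "", $4)     # status column
    }
    ($3 == "P0" || $3 == "P1") && $4 != "Pass" { n++ }
    END { print n + 0 }
  ' "$1"
}
```

Under these assumptions, a result of `0` from `open_high_priority qa/coverage-matrix.md` would mean the P0/P1 half of the exit criteria holds; lower-priority rows still need to be closed or explicitly deferred.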

Run modes

At the start of every test cycle the agent asks which mode to use. The mode changes only how blockers are handled — never the safety lines.

non-stop: Never pause. Log every blocker or open question as a BLK-###, mark the affected units Blocked, and keep going with all other work. At the end of the run, present every accumulated blocker as one batch of direct questions. Best for unattended, overnight, or "just run it and tell me everything" runs.
stop at blockers: Halt at each blocker that needs a human decision, ask a specific question, wait for the answer, and continue. Best for interactive sessions or high-stakes work where the business logic is ambiguous.

For genuinely headless runs (a cron/scheduled agent with no human to answer), it defaults to non-stop and notes that in the ledger.
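The difference between the two modes comes down to what happens at the moment a blocker is found. The sketch below is illustrative only: `handle_blocker`, the mode strings, and the `BLOCKERS_FILE` location are hypothetical names, and in the real bundle blockers are recorded in the run ledger.

```shell
# Hypothetical sketch of the two run modes. BLOCKERS_FILE and the line format
# are illustrative and not part of the bundle.
BLOCKERS_FILE="${BLOCKERS_FILE:-qa/blockers.txt}"

handle_blocker() {
  mode="$1"; id="$2"; question="$3"
  case "$mode" in
    non-stop)
      # Log the question and keep going; questions are batched at the end.
      printf '%s\t%s\n' "$id" "$question" >> "$BLOCKERS_FILE"
      return 0 ;;
    stop-at-blockers)
      # Surface the question now and signal the caller to pause.
      printf '%s: %s\n' "$id" "$question" >&2
      return 1 ;;
    *)
      printf 'unknown mode: %s\n' "$mode" >&2
      return 2 ;;
  esac
}
```

A caller would treat a zero return as "continue with other work" and a return of 1 as "wait for a human answer". In both modes the affected units are still marked Blocked.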

What it produces — the `qa/` workspace

The run's entire memory is a set of plain-text files at your project root. They are both the working state and the deliverable — readable, diffable, and resumable:

| File | What it holds |
| --- | --- |
| `qa/run-ledger.md` | Resumable state machine: current phase, cursor, decisions, blockers |
| `qa/test-plan.md` | Detected surfaces, stack, build/test/lint commands, strategy |
| `qa/feature-inventory.md` | Every feature + its expected behavior + source |
| `qa/application-map.md` | Apps, routes, screens, services, jobs |
| `qa/element-inventory.md` | Every interactive UI element + intended function |
| `qa/business-flow-map.md` | End-to-end user/business journeys |
| `qa/coverage-matrix.md` | The spine: every testable unit × test type × status |
| `qa/risk-register.md` | Risk-ranked areas → test/fix priority |
| `qa/bug-registry.md` | Every confirmed bug, stable IDs (`BUG-0001…`), full reproduction |
| `qa/fix-plan.md` | Bugs batched by shared root cause |
| `qa/regression-log.md` | Retest results and regression tests added |
| `qa/final-report.md` | Metrics, outcomes, residual risk, next steps |

The copies shipped inside this bundle are templates / schema definitions; the live run is the copy created at your project root. Evidence captured during testing goes under qa/evidence/ (kept git-ignored in consumer projects).

Supported surfaces

The agent detects which of these apply and reads the matching playbook for each:

web · backend/API · serverless / edge · iOS · Android · CLI · desktop · data pipelines · infra / CI

Surface playbooks live under references/surfaces/. A stack the detector doesn't recognize is fine — the agent inspects it manually and fills the test plan.

Bundled scripts

Optional, read-only conveniences (you can always do the work by hand). From a root-drop install they run as scripts/...; in plugin/skill mode invoke them by their install path (e.g. ${CLAUDE_PLUGIN_ROOT}/skills/.../scripts/...):

qa-detect.sh <project>          # detect surfaces & stack (read-only)
qa-detect.sh <project> --json   # machine-readable surface report
qa-all.sh <project>             # run the project's native checks (see safety note)
qa-rollup.sh <project>/qa       # compute the coverage / bug rollup

Safety boundaries

The department runs autonomously on safe, reversible, local/sandbox work — but it stops and records a blocker rather than guessing for anything that is:

destructive or hard to reverse,
production-touching or operating on real-user data,
money / payment / billing related,
credential-gated or needing paid external access,
an app-store or payment-provider operation, or
dependent on a business rule it cannot safely infer.

It also confirms a target is non-production before running native checks or migrations (native test commands can themselves be destructive). It never deletes, skips, or weakens failing tests to make CI green, never hides failures, and keeps fix diffs minimal. Full boundaries: references/10-safety.md.
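A guard in the spirit of the non-production check might look like the following. It is a sketch under stated assumptions: the helper names and the list of safe environment labels are hypothetical, and the actual checklist lives in references/10-safety.md.

```shell
# Hypothetical guard: allow native checks only for environments explicitly
# known to be non-production. The labels below are illustrative assumptions.
is_non_prod() {
  case "$1" in
    local|dev|development|test|sandbox) return 0 ;;
    *) return 1 ;;   # empty or unrecognized values are treated as production
  esac
}

# Usage sketch: refuse to continue unless the target is confirmed safe.
run_native_checks() {
  if ! is_non_prod "$1"; then
    echo "BLOCKED: target environment '$1' is not confirmed non-production" >&2
    return 1
  fi
  echo "ok to run native checks against '$1'"
}
```

The design choice worth noting is the default: anything not on the allow-list, including an unset value, is refused rather than guessed at.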

Repository layout

.claude-plugin/
  marketplace.json        ← so people can /plugin install it (lists the plugin below)
.cursor/                  ← Cursor rules + commands (thin adapters → the skill)
AGENTS.md                 ← universal entrypoint for any AI/IDE
README.md / INSTALL.md    ← this, and how to install anywhere
LICENSE                   ← MIT
plugins/
  autonomous-qa-department/         ← the Claude Code plugin
    .claude-plugin/plugin.json      ← plugin manifest
    commands/                       ← slash commands (/autonomous-qa, /qa-agent-*)
    skills/
      autonomous-qa-department/     ← the skill itself
        SKILL.md          ← the brain: operating loop + pointers (start here)
        references/       ← deep playbooks, loaded on demand
          00-orchestration.md  01-discovery.md  02-coverage-model.md
          03-test-design.md  04-execution.md  05-triage.md  06-bugfix.md
          07-regression.md  08-nonfunctional.md  09-reporting.md  10-safety.md
          surfaces/        web · api-backend · serverless · iOS · Android ·
                           cli · desktop · data-pipeline · infra
        qa/               ← state templates (the department's memory)
          run-ledger.md  coverage-matrix.md  risk-register.md  test-plan.md
          feature-inventory.md  application-map.md  element-inventory.md
          business-flow-map.md  bug-registry.md  fix-plan.md  regression-log.md
          final-report.md
        scripts/          qa-detect.sh · qa-all.sh · qa-rollup.sh

Core principle

A passing test suite is not proof of quality. Every feature, element, endpoint, and flow is individually verified against an expected behavior and marked Pass / Fail / Blocked / Unknown with evidence. "Tests are green" is a starting point, never a conclusion.

Publishing your own copy

This repo is itself a Claude Code marketplace, so anyone can fork it and publish their own. Before you publish:

Edit the owner in .claude-plugin/marketplace.json and the author in plugins/autonomous-qa-department/.claude-plugin/plugin.json to your own name/email.
Optionally rename the marketplace name (autonomous-qa-marketplace) and update the install commands in this README and INSTALL.md to match.
Update the Copyright line in LICENSE.
Push to a git host, then install with /plugin marketplace add <your-github-user>/<your-repo>.

FAQ

Does it modify my code? Only during the Fix phase, with minimal diffs, and only within the safety boundaries above. Discovery is read-only. Testing actively exercises the app (drives the UI, calls APIs, runs the CLI, triggers jobs) against a non-prod/sandbox target, and native test commands run only after the non-prod checklist clears. Neither phase changes your source.

Is the run safe to interrupt? Yes. State is flushed to qa/run-ledger.md after every meaningful step; re-issue the same command to resume.

Can multiple agents work in parallel? If your harness supports subagents, the department dispatches one QA agent per surface and one fix agent per batch, isolating file-writing fixes in separate worktrees/branches. Without subagents it runs the same steps sequentially for an identical result. See references/00-orchestration.md.

Where does the run state go? Into a qa/ folder at the root of the project under test — never inside this bundle. Add qa/evidence/ to that project's .gitignore.
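Adding the ignore entry can be done by hand; as a convenience, a small idempotent helper is sketched here. `ignore_evidence` is a hypothetical name and is not one of the bundled scripts.

```shell
# Hypothetical helper: add "qa/evidence/" to a project's .gitignore exactly once.
ignore_evidence() {
  gitignore="$1/.gitignore"
  # Nothing to do if an identical line is already present.
  if grep -qxF 'qa/evidence/' "$gitignore" 2>/dev/null; then
    return 0
  fi
  # If the file exists and does not end with a newline, add one first so the
  # new entry does not get glued onto the previous line.
  if [ -s "$gitignore" ] && [ -n "$(tail -c 1 "$gitignore")" ]; then
    printf '\n' >> "$gitignore"
  fi
  printf '%s\n' 'qa/evidence/' >> "$gitignore"
}
```

Run it as `ignore_evidence <project>`; repeated calls leave the file unchanged.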

License

MIT.

New here? Open SKILL.md — it explains the whole system in one page.
