
bUxEE/testa


 ████████ ███████ ███████ ████████  █████
    ██    ██      ██          ██    ██   ██
    ██    █████   ███████     ██    ███████
    ██    ██           ██     ██    ██   ██
    ██    ███████ ███████     ██    ██   ██
─────────────────────────────────────────────
 ▚▚  A U T O N O M O U S   Q A   D E P T  ▞▞

🧪 testa — Autonomous QA & Bugfix Department

A drop-in bundle that turns any AI coding assistant into a combined QA department + bugfix dev team for any software project. Point your agent at it and it will: discover the product's features, surfaces, and business logic; build a measurable coverage matrix; test every route, endpoint, screen, CLI, job, and flow; log reproducible bugs with severity; fix them in safe batches; add regression tests; retest; and write a final report — all driven from resumable files so the run survives interruptions and can be handed between agents.

Works with Claude Code (as an installable plugin/skill), Cursor, GitHub Copilot, OpenAI Codex, Windsurf, Aider, Gemini CLI, or plain chat — across web, backend/API, serverless/edge, iOS, Android, CLI, desktop, data pipelines, and infra/CI.



Why this exists

"The tests pass" is where most automated QA stops. This bundle treats that as the starting line. It models your project as a product — every feature, screen, endpoint, flow, and rule is an individual unit that must be exercised against an explicit expected behavior and marked Pass / Fail / Blocked / Unknown with evidence. Gaps become reproducible bugs; bugs get fixed in minimal, safe batches with regression tests; everything is recorded in plain-text files so the work is auditable, resumable, and portable across agents and tools.

How it works

The agent runs a single resumable loop:

Resume → Discover → Plan → Test → Triage → Fix → Retest → Report

All state lives in a qa/ directory at your project's root (not inside this bundle). Because every decision, bug, and result is written to disk, any agent — including a fresh session with no memory of the run — can read qa/run-ledger.md and pick up exactly where the last one stopped. Files are the substrate; agents are interchangeable.
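As a rough illustration of the Resume step, here is a minimal POSIX sh sketch. It assumes, hypothetically, that the ledger records progress on lines such as "Phase: Test". The real schema is whatever the bundle's run-ledger.md template defines, and resume_phase / phase_after are illustrative names, not part of the bundle.

```sh
# Hypothetical sketch: decide where a resumed run picks up.
# ASSUMPTION: the ledger records progress on lines like "Phase: Test"; the real
# schema comes from the bundle's run-ledger.md template, not from this sketch.
PHASES="Discover Plan Test Triage Fix Retest Report"

# Print the phase to run next: the last recorded phase, or Discover on a fresh run.
resume_phase() {
  ledger=$1
  phase=""
  if [ -f "$ledger" ]; then
    # take the first word after "Phase:" on the last matching line
    phase=$(sed -n 's/^Phase:[[:space:]]*\([^[:space:]]*\).*/\1/p' "$ledger" | tail -n 1)
  fi
  for p in $PHASES; do
    if [ "$p" = "$phase" ]; then
      echo "$p"
      return 0
    fi
  done
  echo "Discover"   # no ledger, or an unrecognized phase: start from the top
}

# Print the phase that follows $1 in the loop (prints nothing after Report).
phase_after() {
  found=0
  for p in $PHASES; do
    if [ "$found" = 1 ]; then
      echo "$p"
      return 0
    fi
    [ "$p" = "$1" ] && found=1
  done
  return 0
}
```

Because the decision is derived only from the file, a fresh agent with no memory of the run reaches the same answer as the one that wrote the ledger.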

The "brain" is a single skill file — SKILL.md — which holds the operating loop and points to deep, load-on-demand playbooks. Every adapter (Claude plugin, Cursor rules, AGENTS.md) funnels through that one file so instructions never drift between copies.

Quick start

Claude Code (installable plugin):

/plugin marketplace add <your-github-user>/<this-repo>
/plugin install autonomous-qa-department@autonomous-qa-marketplace

Then just say "test everything", "act as QA and audit this project for bugs", or run /autonomous-qa.

Any other AI / IDE: copy this bundle into your project and tell the agent:

Read AGENTS.md and run the full autonomous QA workflow on this project.

It populates a qa/ folder at your project root and works the loop until the exit criteria are met. Resume any time — it reads qa/run-ledger.md and continues.

Installation (every tool)

Full per-tool instructions, including the skill-only install and how to verify the install, live in INSTALL.md. Summary:

| Tool | How | What you get |
|---|---|---|
| Claude Code: plugin (recommended) | /plugin marketplace add <user>/<repo>, then /plugin install autonomous-qa-department@autonomous-qa-marketplace. A local clone path also works as a marketplace. | Auto-triggering skill plus /autonomous-qa and /qa-agent-* slash commands |
| Claude Code: skill only | Copy plugins/autonomous-qa-department/skills/autonomous-qa-department into <project>/.claude/skills/ | Auto-triggering skill (no slash commands) |
| Cursor | Copy the bundle to the project root, including dotfiles (rsync -a --exclude .git ./ <project>/) | /autonomous-qa command plus an always-on rule via the .cursor/ adapters |
| Copilot / Codex / Windsurf / Aider / Gemini CLI | Copy the bundle to the project root, then point the agent at AGENTS.md | Universal entrypoint; for Copilot, add a one-liner to .github/copilot-instructions.md |
| Plain chat / web assistant | Paste SKILL.md as the first message and provide the repo | Follow the loop manually; keep the qa/ files updated so the run stays resumable |

In every case the agent writes run-state into a qa/ folder at the target project root and seeds it from the templates on first run.
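First-run seeding can be pictured as a copy that never overwrites. The sketch below is hypothetical: seed_qa_workspace is an illustrative name, and it assumes the templates are the *.md files in the bundle's qa/ folder. Skipping files that already exist is what keeps a re-run from clobbering live state.

```sh
# Hypothetical sketch: seed <project>/qa from the bundle's templates on first run.
# ASSUMPTION: template_dir is the skill's qa/ template folder containing *.md files.
seed_qa_workspace() {
  template_dir=$1
  project=$2
  mkdir -p "$project/qa/evidence"
  for f in "$template_dir"/*.md; do
    [ -e "$f" ] || continue            # glob matched nothing: no templates to copy
    dest="$project/qa/$(basename "$f")"
    if [ ! -e "$dest" ]; then          # never overwrite live run state
      cp "$f" "$dest"
    fi
  done
}
```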

Using it

Once installed, start a run in whatever way your tool supports:

  • Natural language: "test everything", "act as QA and audit this project for bugs", "do a full quality pass", "QA this API / app / CLI".
  • Slash command (Claude Code / Cursor): /autonomous-qa for the full loop, or a surface-specific entrypoint such as /qa-agent-web, /qa-agent-backend, /qa-agent-ios, /qa-agent-android, /qa-agent-cli, /qa-agent-desktop, /qa-agent-data, /qa-agent-serverless, /qa-agent-infra.
  • File pointer (other tools): "Read AGENTS.md and run the full autonomous QA workflow on this project."

The agent asks you which run mode to use, then discovers your stack, builds the coverage matrix, and works it down by risk. It is resumable — stop any time and re-issue the same command; it reads qa/run-ledger.md and continues. A run is "done" when the exit criteria hold (every P0/P1 coverage row closed; lower-priority rows closed or explicitly deferred).
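The exit criteria can be checked mechanically against the coverage matrix. This is a sketch under stated assumptions, not the bundle's logic: it assumes rows shaped like "| COV-001 | login form | P0 | Pass |" (priority in the third table column, status in the fourth), treats "closed" as Pass, and accepts Deferred only for rows below P1. exit_criteria_met is a hypothetical name.

```sh
# Hypothetical sketch of the exit-criteria check.
# ASSUMPTIONS: rows look like "| COV-001 | login form | P0 | Pass |";
# "closed" means Pass; only rows below P1 may be Deferred instead.
# Returns 0 when the run may finish, 1 otherwise.
exit_criteria_met() {
  awk -F'|' '
    NF >= 6 {
      prio = $4; status = $5
      gsub(/[ \t]/, "", prio); gsub(/[ \t]/, "", status)
      if (prio !~ /^P[0-9]$/) next            # header and separator rows
      if (prio == "P0" || prio == "P1") {
        if (status != "Pass") pending++       # P0/P1 rows must be closed
      } else if (status != "Pass" && status != "Deferred") {
        pending++                             # lower priority: closed or deferred
      }
    }
    END { exit (pending > 0) }
  ' "$1"
}
```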

Run modes

At the start of every test cycle the agent asks which mode to use. The mode changes only how blockers are handled — never the safety lines.

  • non-stop — Never pause. Log every blocker/open question as a BLK-###, mark affected units Blocked, and keep going with all other work. At the end it presents every accumulated blocker as one batch of direct questions. Best for unattended / overnight / "just run it and tell me everything" runs.
  • stop at blockers — Halt at each blocker that needs a human decision, ask a specific question, wait for the answer, and continue. Best for interactive or high-stakes runs, or when business logic is ambiguous.

For genuinely headless runs (a cron/scheduled agent with no human to answer), it defaults to non-stop and notes that in the ledger.
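The difference between the two modes can be sketched as follows. Everything here is an assumption for illustration: blockers are appended to the ledger as "BLK-###: question" lines, "no terminal on stdin" is used as the signal for a headless run, and choose_mode / record_blocker are hypothetical names.

```sh
# Hypothetical sketch of blocker handling in the two run modes.
# ASSUMPTIONS: blockers are appended to the ledger as "BLK-###: question" lines,
# and a run with no terminal on stdin (cron, CI) counts as headless.

# Headless runs fall back to non-stop because nobody can answer a question.
choose_mode() {
  if [ -t 0 ]; then
    echo "$1"
  else
    echo "non-stop"
  fi
}

# Append the next BLK-### entry and print its ID. Only "stop" mode waits.
record_blocker() {
  ledger=$1 mode=$2 question=$3
  count=$(grep -c '^BLK-[0-9][0-9][0-9]:' "$ledger" 2>/dev/null || true)
  id=$(printf 'BLK-%03d' $(( ${count:-0} + 1 )))
  echo "$id: $question" >> "$ledger"
  if [ "$mode" = "stop" ]; then
    printf '%s: %s ' "$id" "$question" >&2   # ask one specific question...
    read -r answer                            # ...and wait for the human
    echo "  answer: $answer" >> "$ledger"
  fi
  echo "$id"
}
```

In non-stop mode the agent would also mark the affected coverage units Blocked and move on; the accumulated BLK entries become the batch of questions presented at the end.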

What it produces — the qa/ workspace

The run's entire memory is a set of plain-text files at your project root. They are both the working state and the deliverable — readable, diffable, and resumable:

| File | What it holds |
|---|---|
| qa/run-ledger.md | Resumable state machine: current phase, cursor, decisions, blockers |
| qa/test-plan.md | Detected surfaces, stack, build/test/lint commands, strategy |
| qa/feature-inventory.md | Every feature, its expected behavior, and its source |
| qa/application-map.md | Apps, routes, screens, services, jobs |
| qa/element-inventory.md | Every interactive UI element and its intended function |
| qa/business-flow-map.md | End-to-end user/business journeys |
| qa/coverage-matrix.md | The spine: every testable unit × test type × status |
| qa/risk-register.md | Risk-ranked areas → test/fix priority |
| qa/bug-registry.md | Every confirmed bug, with stable IDs (BUG-0001…) and full reproduction steps |
| qa/fix-plan.md | Bugs batched by shared root cause |
| qa/regression-log.md | Retest results and regression tests added |
| qa/final-report.md | Metrics, outcomes, residual risk, next steps |

The copies shipped inside this bundle are templates / schema definitions; the live run is the copy created at your project root. Evidence captured during testing goes under qa/evidence/ (kept git-ignored in consumer projects).
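Stable bug IDs can be allocated by scanning the registry for the highest existing ID. A minimal sketch, assuming the BUG-0001 pattern above and that registry entries are never deleted (otherwise the top ID could be reused); next_bug_id is a hypothetical helper, not a bundled script.

```sh
# Hypothetical sketch: allocate the next stable bug ID from qa/bug-registry.md.
# ASSUMPTION: IDs follow the BUG-0001 pattern and entries are never deleted.
next_bug_id() {
  # strip leading zeros so "0009" is not read as an (invalid) octal number
  last=$(grep -o 'BUG-[0-9][0-9][0-9][0-9]' "$1" 2>/dev/null \
    | sed 's/^BUG-0*//' | sort -n | tail -n 1)
  printf 'BUG-%04d\n' $(( ${last:-0} + 1 ))
}
```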

Supported surfaces

The agent detects which of these apply and reads the matching playbook for each:

web · backend/API · serverless / edge · iOS · Android · CLI · desktop · data pipelines · infra / CI

Surface playbooks live under references/surfaces/. A stack the detector doesn't recognize is fine — the agent inspects it manually and fills the test plan.
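Surface detection is typically driven by marker files. The heuristics below are illustrative assumptions only, not the rules qa-detect.sh actually applies, and they cover just a few of the surfaces listed above; detect_surfaces is a hypothetical name.

```sh
# Hypothetical sketch of marker-file surface detection (not qa-detect.sh's rules).
# Prints one surface name per line; other surfaces are omitted for brevity.
detect_surfaces() {
  p=$1
  if [ -f "$p/package.json" ] && grep -qE '"(react|vue|svelte|next)"' "$p/package.json"; then
    echo web
  fi
  if [ -f "$p/openapi.yaml" ] || [ -f "$p/openapi.json" ]; then echo backend-api; fi
  if [ -f "$p/serverless.yml" ] || [ -f "$p/wrangler.toml" ]; then echo serverless; fi
  for f in "$p"/*.xcodeproj; do
    if [ -e "$f" ]; then echo ios; break; fi
  done
  if [ -f "$p/app/src/main/AndroidManifest.xml" ]; then echo android; fi
  if [ -d "$p/.github/workflows" ] || [ -f "$p/Dockerfile" ]; then echo infra; fi
}
```

Anything these checks miss falls through to the manual inspection described above.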

Bundled scripts

Optional, read-only conveniences (you can always do the work by hand). From a root-drop install they run as scripts/...; in plugin/skill mode invoke them by their install path (e.g. ${CLAUDE_PLUGIN_ROOT}/skills/.../scripts/...):

qa-detect.sh <project>          # detect surfaces & stack (read-only)
qa-detect.sh <project> --json   # machine-readable surface report
qa-all.sh <project>             # run the project's native checks (see safety note)
qa-rollup.sh <project>/qa       # compute the coverage / bug rollup
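To show what a coverage rollup computes, here is a hypothetical stand-in for qa-rollup.sh. It assumes matrix rows shaped like "| COV-001 | login form | P0 | Pass |" with the status in the fourth table column; the real script's input handling and output format may differ.

```sh
# Hypothetical sketch of a coverage rollup (not qa-rollup.sh's actual output).
# ASSUMPTION: rows look like "| COV-001 | login form | P0 | Pass |".
coverage_rollup() {
  awk -F'|' '
    NF >= 6 {
      status = $5; gsub(/[ \t]/, "", status)
      if (status ~ /^(Pass|Fail|Blocked|Unknown)$/) { n[status]++; total++ }
    }
    END {
      printf "total=%d pass=%d fail=%d blocked=%d unknown=%d\n",
        total, n["Pass"], n["Fail"], n["Blocked"], n["Unknown"]
    }' "$1"
}
```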

Safety boundaries

The department runs autonomously on safe, reversible, local or sandbox work. It stops and records a blocker, rather than guessing, whenever an action is:

  • destructive or hard to reverse,
  • production-touching or operating on real-user data,
  • money / payment / billing related,
  • credential-gated or needing paid external access,
  • an app-store or payment-provider operation, or
  • dependent on a business rule it cannot safely infer.

It also confirms a target is non-production before running native checks or migrations (native test commands can themselves be destructive). It never deletes, skips, or weakens failing tests to make CI green, never hides failures, and keeps fix diffs minimal. Full boundaries: references/10-safety.md.
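A non-production check can fail closed: refuse unless the target positively identifies itself as non-prod. The sketch assumes, hypothetically, that the target is described by APP_ENV and DATABASE_URL; real projects will need their own signals, and assert_non_prod is an illustrative name. The *prod* match is deliberately broad, so it may block some safe targets, which is the safer failure.

```sh
# Hypothetical pre-flight guard before native checks or migrations.
# ASSUMPTION: the target is described by APP_ENV and DATABASE_URL.
# Fails closed: an unset or production-looking target returns 1.
assert_non_prod() {
  case "${APP_ENV:-}" in
    prod|production|live)
      echo "BLOCKED: APP_ENV=$APP_ENV looks like production" >&2
      return 1 ;;
  esac
  case "${DATABASE_URL:-}" in
    *prod*)
      echo "BLOCKED: DATABASE_URL points at a production-looking host" >&2
      return 1 ;;
  esac
  if [ -z "${APP_ENV:-}" ]; then
    echo "BLOCKED: APP_ENV is unset; cannot confirm a non-prod target" >&2
    return 1
  fi
  return 0
}
```

A wrapper might then run: assert_non_prod && scripts/qa-all.sh .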

Repository layout

.claude-plugin/
  marketplace.json        ← so people can /plugin install it (lists the plugin below)
.cursor/                  ← Cursor rules + commands (thin adapters → the skill)
AGENTS.md                 ← universal entrypoint for any AI/IDE
README.md / INSTALL.md    ← this, and how to install anywhere
LICENSE                   ← MIT
plugins/
  autonomous-qa-department/         ← the Claude Code plugin
    .claude-plugin/plugin.json      ← plugin manifest
    commands/                       ← slash commands (/autonomous-qa, /qa-agent-*)
    skills/
      autonomous-qa-department/     ← the skill itself
        SKILL.md          ← the brain: operating loop + pointers (start here)
        references/       ← deep playbooks, loaded on demand
          00-orchestration.md  01-discovery.md  02-coverage-model.md
          03-test-design.md  04-execution.md  05-triage.md  06-bugfix.md
          07-regression.md  08-nonfunctional.md  09-reporting.md  10-safety.md
          surfaces/        web · api-backend · serverless · iOS · Android ·
                           cli · desktop · data-pipeline · infra
        qa/               ← state templates (the department's memory)
          run-ledger.md  coverage-matrix.md  risk-register.md  test-plan.md
          feature-inventory.md  application-map.md  element-inventory.md
          business-flow-map.md  bug-registry.md  fix-plan.md  regression-log.md
          final-report.md
        scripts/          qa-detect.sh · qa-all.sh · qa-rollup.sh

Core principle

A passing test suite is not proof of quality. Every feature, element, endpoint, and flow is individually verified against an expected behavior and marked Pass / Fail / Blocked / Unknown with evidence. "Tests are green" is a starting point, never a conclusion.

Publishing your own copy

This repo is itself a Claude Code marketplace, so anyone can fork it and publish their own. Before you publish:

  1. Edit the owner in .claude-plugin/marketplace.json and the author in plugins/autonomous-qa-department/.claude-plugin/plugin.json to your own name/email.
  2. Optionally rename the marketplace name (autonomous-qa-marketplace) and update the install commands in this README and INSTALL.md to match.
  3. Update the Copyright line in LICENSE.
  4. Push to a git host, then install with /plugin marketplace add <your-github-user>/<your-repo>.

FAQ

Does it modify my code? Only during the Fix phase, with minimal diffs, and only within the safety boundaries above. Discovery is read-only. Testing actively exercises the app (driving the UI, calling APIs, running the CLI, triggering jobs) against a non-prod/sandbox target, and native test commands run only after the non-prod checklist clears, but none of this changes your source code.

Is the run safe to interrupt? Yes. State is flushed to qa/run-ledger.md after every meaningful step; re-issue the same command to resume.
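One common way to make each flush crash-safe is to write the new ledger to a temporary file in the same directory and then rename it over the old one, since a rename within one filesystem is atomic. This is a sketch of that general technique, not necessarily how the bundle writes the ledger; flush_ledger is a hypothetical name and reads the new contents from stdin.

```sh
# Hypothetical sketch: atomically replace the ledger with the contents of stdin.
# The temp file lives next to the ledger so mv is a same-filesystem rename.
flush_ledger() {
  ledger=$1
  tmp_file="$ledger.tmp.$$"
  if cat > "$tmp_file"; then
    mv -f "$tmp_file" "$ledger"
  else
    rm -f "$tmp_file"            # never leave a half-written ledger behind
    return 1
  fi
}
```

Usage: printf 'Phase: Test\nCursor: COV-042\n' | flush_ledger qa/run-ledger.md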

Can multiple agents work in parallel? If your harness supports subagents, it dispatches one QA agent per surface and one fix agent per batch (isolating file-writing fixes in separate worktrees/branches). Without subagents it runs the same steps sequentially for an identical result. See references/00-orchestration.md.
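Isolating fix batches can be done with git worktrees, one branch and checkout per batch. The sketch uses real git commands (git worktree add -b), but the branch naming (qa/fix-<batch>) and the sibling-directory layout are assumptions for illustration, and make_fix_worktree is a hypothetical name.

```sh
# Hypothetical sketch: one branch + worktree per fix batch, so parallel fix
# agents never write into the same checkout. Prints the new worktree path.
make_fix_worktree() {
  repo=$1 batch=$2
  wt="$repo/../wt-$batch"
  git -C "$repo" worktree add -b "qa/fix-$batch" "$wt" >/dev/null 2>&1 || return 1
  echo "$wt"
}
```

Each finished batch can then be reviewed and merged like any other branch, and the worktree removed with git worktree remove.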

Where does the run state go? Into a qa/ folder at the root of the project under test — never inside this bundle. Add qa/evidence/ to that project's .gitignore.
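Adding the ignore rule is worth doing idempotently so repeated runs don't pile up duplicate lines. A small sketch; ignore_evidence is a hypothetical helper that takes the project root.

```sh
# Hypothetical helper: add qa/evidence/ to <project>/.gitignore exactly once.
ignore_evidence() {
  gitignore="$1/.gitignore"
  touch "$gitignore"
  if grep -qxF 'qa/evidence/' "$gitignore"; then
    return 0                                 # already ignored: nothing to do
  fi
  # start on a fresh line if the file lacks a trailing newline
  if [ -s "$gitignore" ] && [ -n "$(tail -c 1 "$gitignore")" ]; then
    echo >> "$gitignore"
  fi
  echo 'qa/evidence/' >> "$gitignore"
}
```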

License

MIT.


New here? Open SKILL.md — it explains the whole system in one page.

About

AI Q&A for your projects
