AIWG Framework

AUDIENCE for the platform lead

You're being asked to roll AI tooling out across the team.

Cursor for the engineers, Copilot for the contractors, Claude Code for the senior team that wants more agency. Maybe Factory AI is starting to come up in conversations. Procurement wants one license to manage. Security wants to know where the data goes. Compliance wants to know what gets logged.

Meanwhile, the engineers want the assistance to actually be assistive on the work that matters — not the toy refactors, the real architecture decisions, the parts that touch production.

76% of professional developers said they wouldn't trust AI for production deployment in the 2025 Stack Overflow Developer Survey. 46% actively distrusted accuracy. Those numbers exist because most AI tooling is built to optimize for "sounds plausible" rather than "actually correct," and most rollouts don't have a place to put the human checkpoints. AIWG is what changes when you build governance in from the start.

GATES phase gates and multi-agent review

Every artifact AIWG produces — a Software Architecture Document, a test plan, a deployment runbook — goes through a multi-agent review panel before being baselined.

Architecture Document Creation:
  1. Architecture Designer drafts SAD
  2. Review Panel (parallel):
     - Security Auditor    → threat perspective
     - Performance Engineer → scalability perspective
     - Test Architect       → testability perspective
     - Technical Writer     → clarity and consistency
  3. Documentation Synthesizer merges all feedback
  4. Human approval gate → accept, iterate, or escalate

Phase transitions enforce gate criteria. Inception cannot close until a security screening exists. Elaboration cannot close until an architecture baseline is signed off. Construction cannot close until tests pass and IOC criteria are met. Each gate produces evidence in .aiwg/gates/ — a real document a human approver actually reads, not a checkbox.

HITL operator-supervised agent runs

aiwg serve launches a browser-based operator console. Long-running agents (Ralph, daemon missions, parallel orchestrations) report up to a HITL drawer where humans approve, course-correct, or abort.

+----------------------------------------------------------+
|  AIWG SERVE · Mission: Sprint 4 · 4 active runs           |
+----------------------------------------------------------+
|  [·] sprint-4/refactor-auth      iteration 12/25  [APPROVE] [ABORT]
|  [·] sprint-4/payment-webhook   iteration  3/25  [APPROVE] [ABORT]
|  [!] sprint-4/migration-rollout WAITING ON HITL  [REVIEW →]
|  [·] sprint-4/observability     iteration  8/25  [APPROVE] [ABORT]
+----------------------------------------------------------+

The PTY orchestrator under the drawer drives the actual agent processes. Operators see the live terminal, the proposed action, and the rationale before the action runs. Approval is granular per-action, not blanket per-mission.

Recommendation is not authorization to act — the human-authorization rule is enforced at every high-stakes decision point.

TRACEABILITY requirements → code → tests → deployment

Bidirectional traceability is built into the artifact graph. A use case in requirements/UC-007.md declares which acceptance tests cover it, which architecture components implement it, which deployment runbook procedures it depends on.

@.aiwg/requirements/UC-007-payment-flow.md
  ⤷ covered_by: @.aiwg/testing/acceptance/AT-007.md
  ⤷ implemented_by: @.aiwg/architecture/components/payment-service.md
  ⤷ deployed_via: @.aiwg/deployment/runbooks/payment-deploy.md
  ⤷ risk_register: @.aiwg/risks/R-014-pci-compliance.md

The traceability matrix is queryable: aiwg check-traceability walks the graph and reports orphans, dead-ends, and broken refs. The activity-log rule appends every artifact change to .aiwg/activity.log — a chronological audit trail of who did what when, generated automatically by every agent that writes an artifact.

COMPLIANCE the runtime your security team will sign off on

AIWG is MIT licensed. The CLI runs on your infrastructure. The artifacts live in your repository. The audit log is plain text in .aiwg/activity.log.

No telemetry phones home. No prompts get shipped to a hosted service for analytics. No license server gates feature access. Your model provider (Claude, Cursor, Copilot, whichever) sees what the agents send to it — AIWG sees nothing.

For regulated workloads, the optional isolation runtime adds VM-grade per-agent containment with virtiofs shared storage, audit telemetry, and reconciliation after restarts.

Standards reference: artifact provenance follows W3C PROV; quality assessments use GRADE; FAIR principles inform the data architecture. The framework cites Miller (1956) on cognitive load, MetaGPT and AutoGen on multi-agent ensembles, R-LAM (2026) on reproducibility — the choices were not invented for marketing copy, they're load-bearing in the design.

NOT-A-SAAS what AIWG is not

If your evaluation matrix has rows for "hosted control plane" or "managed agent runtime," AIWG is the wrong answer to several of them.

Not a SaaS. AIWG does not run agents for you. It deploys agents to your existing AI tools.

Not a hosted runtime. The orchestration runtime (Agent Loop, daemon, Mission Control, aiwg serve) runs on your hardware. So does the optional isolation runtime.

No vendor lock-in. The same agents run in Claude Code, Cursor, Copilot, Factory AI, Codex, Hermes, OpenCode, OpenClaw, Warp, Windsurf. Migration between providers is a config flag, not a re-architecture.

No telemetry-back-to-vendor. AIWG is MIT-licensed CLI tooling. There is no "us" collecting data.

If you want a hosted, paid, SaaS platform: LangSmith, CrewAI AMP, and Foundry are real options and they have legitimate tradeoffs. AIWG is the answer when those tradeoffs aren't the right ones for your team.

ROLLOUT what a measured rollout looks like

A typical engineering-team rollout in three phases:

Phase 1 — Pilot (2 weeks)

One small team, one project. aiwg use sdlc in the project directory. The pilot validates the artifact workflow, the gate process, and the integration with your existing CI. No production deployment.

Phase 2 — Adoption (4–8 weeks)

Expand to a handful of teams. Configure team-level customizations as project-local extensions (per-project, no fork). Roll out the HITL drawer for production-touching missions. Train senior engineers on the multi-agent review pattern.

Phase 3 — Operate

Steady-state. Cross-team customizations promoted to corpus repos for sharing without going public. Daemon mode runs scheduled missions overnight. Mission Control dispatches parallel work during sprint cycles.

cd ~/work/our-product && aiwg use sdlc
NEXT where to go from here

> /loop — for the individual contributor evaluating AIWG before bringing it up internally.

> /sandbox — isolation requirements? On-prem? Air-gapped? VM-grade per-agent runtime.

> /extend — building team-internal customizations? The 3-path graduation ladder.

> /h/team — the alternate hero for this audience.