AI development management system using Doc-Driven Story-based workflow: automatic specification generation, task decomposition, verification, and metrics.
Designed for Claude Code with support for Codex as an additional external execution agent.
What is it? System for managing AI code development with mandatory verification and metrics.
Quick start: ./install.sh → Create REQ-001.md → /makedesign REQ-001.md → /code T-001-001 → /codereview T-001-001 → /done T-001-001
Key features:
- Story-based task tracking (one file = all tasks)
- Mandatory code verification (no "wishful thinking")
- Automatic time tracking and efficiency metrics
- Support for Claude (built-in) and Codex (external CLI)
- Problem
- Solution: Story-based Architecture with Mandatory Verification
- Installation
- Quick Start
- Model Selection for All Commands
- Workflow Diagrams
- Metrics and Analytics
- Git Workflow Rules
- When to Create Story Manually
- Troubleshooting
- TODO
- License
- Created By
AI agents for code generation suffer from chronic issues:
- Ignore requirements - do what wasn't asked
- Don't follow system prompts - forget project rules
- "Wishful thinking" - report completion without actual code verification
Typical scenario:
You: "Implement task T-001-001 from Story"
AI: ✅ Done! Here's what I did: [list of claims]
You: "Show me the code"
AI: [half not implemented or implemented incorrectly]
```
backlog/requirements/REQ-001.md (Requirement - user request)
    ↓ /makedesign
backlog/design/DSGN-001.md (Design - architecture design from AI)
    ↓
backlog/stories/STORY-001.md (Story - file with all tasks)
    ├── T-001-001: Task 1 → /code T-001-001
    ├── T-001-002: Task 2 → /code T-001-002
    └── T-001-003: Task 3 → /code T-001-003
```
Key idea:
- REQ-XXX.md = Requirement (user request, written manually)
- DSGN-XXX.md = Design (architectural design, generated by /makedesign)
- STORY-XXX.md = Story (ONE file with ALL tasks, generated by /makedesign)
- T-XXX-YYY = Task ID format (T-001-001, T-001-002, ...)
- Each task: status, acceptance criteria, dependencies, estimate
| Command | Role | What It Does |
|---|---|---|
| `/makedesign [claude\|codex] REQ-001.md` | Architect | Analyzes requirement → creates DSGN-001.md (design) + STORY-001.md (tasks) via Claude (default) or Codex CLI (alternative architecture) |
| `/code [claude\|codex] TASK-ID` | Executor | Implements code via Claude (default, built-in) or Codex CLI (external utility) + tracks time ⏱️ |
| `/codereview [claude\|codex] TASK-ID` | Critic | Checks ALL criteria via Claude (default, built-in) or Codex CLI (external utility) → shows code → bug report on failure + tracks time ⏱️ |
| `/fix [claude\|codex] TASK-ID` | Fixer | Reads bug report → fixes via Claude (default, surgical edits) or Codex CLI (regeneration) + tracks time ⏱️ |
| `/test TASK-ID` | Tester 🚧 | In development: testing → marks acceptance criteria [x] on successful check + tracks time ⏱️ |
| `/done TASK-ID` | Finalizer | Git workflow (commit → merge to master) → updates Story → calculates metrics 📊. Note: code checks are done by git hooks, not /done |
| `/bug [claude\|codex] "description"` | Bug Parser 🪲 | Parses user complaint via Claude (default) or Codex CLI → creates HOTFIX-XXX.md with criteria → suggests /code HOTFIX-XXX |
| `/report [STORY-ID]` | Analyst | Generates reports with Mermaid diagrams 📊 (Gantt, Pie, Bar, Line charts) from time tracking |
```bash
git clone <repo-url> smart-agent-claude
cd smart-agent-claude
./install.sh
```

What happens:

Installation creates a file hierarchy in ~/.claude/ - commands work globally in any project.
```
~/.claude/
├── commands/                  # 8 slash commands (work everywhere)
│   ├── makedesign.md
│   ├── code.md
│   ├── codereview.md
│   ├── fix.md
│   ├── done.md
│   ├── bug.md
│   ├── report.md
│   └── test.md
└── templates/                 # Document templates
    ├── story-template.md
    ├── design-template.md
    └── requirements-template.md
```
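If you want to sanity-check the result, listing the installed directories should show the eight command files and three templates described above (a quick manual check, not part of install.sh):

```bash
ls ~/.claude/commands ~/.claude/templates
```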
```bash
cd /path/to/your-project

# Optional: create coding rules for the agent
touch CLAUDE.md
```

Note: The backlog/ structure is created automatically by commands on first use. No need to create manually!
Project structure:
```
your-project/
├── backlog/
│   ├── requirements/        # Freeform requirements (written manually)
│   │   └── REQ-001.md
│   ├── design/              # Design documents (generated by /makedesign)
│   │   └── DSGN-001.md
│   ├── stories/             # Stories with tasks (generated by /makedesign)
│   │   └── STORY-001.md     # T-001-001, T-001-002, T-001-003...
│   ├── hotfix/              # Hotfix tasks (created by /bug)
│   │   └── HOTFIX-001.md    # Bugfixes from user complaints
│   └── issues/              # Bug reports (created by /codereview)
│       └── T-001-001-issues.md
├── CLAUDE[/AGENTS].md       # Project rules (optional)
└── src/                     # Your code
```
1. Create requirement (freeform text in REQ-001.md):
Email validation for registration using regex
2. Generate design and tasks:
```
/makedesign REQ-001.md
```

Creates:

- DSGN-001.md - architecture design
- STORY-001.md - task list (T-001-001, T-001-002, etc.)

Numbers are automatic: REQ-001 → DSGN-001 → STORY-001 → T-001-XXX
3. Implement task:
```
/code T-001-001
```

Creates feature branch → writes code → provides evidence of changes
4. Verify implementation:
```
/codereview T-001-001
```

Checks acceptance criteria → shows real code proof → verdict: PASSED/FAILED
5. Fix if needed:
```
/fix T-001-001          # If /codereview found issues
/codereview T-001-001   # Re-check
```

6. Finalize:

```
/done T-001-001
```

Commits → merges to master → calculates metrics (time, efficiency, overshoot)
User complaint → Structured fix:
/bug "Buttons don't click!" # Parses complaint β creates HOTFIX-001.md
/code HOTFIX-001 # Implements fix
/codereview HOTFIX-001 # Verifies fix
/fix HOTFIX-001 # If needed
/done HOTFIX-001 # FinalizesAll commands support model selection (Claude vs Codex):
| Command | Default | With Codex | With Claude |
|---|---|---|---|
| /makedesign | `/makedesign REQ-001.md` | `/makedesign codex REQ-001.md` | `/makedesign claude REQ-001.md` |
| /code | `/code TASK-ID` | `/code codex TASK-ID` | `/code claude TASK-ID` |
| /codereview | `/codereview TASK-ID` | `/codereview codex TASK-ID` | `/codereview claude TASK-ID` |
| /fix | `/fix TASK-ID` | `/fix codex TASK-ID` | `/fix claude TASK-ID` |
| /bug | `/bug "description"` | `/bug codex "description"` | `/bug claude "description"` |
| /done | `/done TASK-ID` (no model selection) | - | - |
| /report | `/report [STORY-ID\|trends]` (no model selection) | - | - |
| Aspect | Claude (default) | Codex (optional) |
|---|---|---|
| Speed | ⚡ Fast (built-in) | Slower (external CLI) |
| Edits | 🎯 Surgical, precise changes | Can regenerate completely |
| Best for | Quick tasks, fixes, reviews | Alternative architecture, complex refactoring |
| Autonomy | Uses Read/Write/Edit tools | Full workspace access via CLI |
| Logs | No separate logs | Activity logs in .claude/codex/ |
| When to use | Default for all commands | When Claude fails after 1-2 iterations |
```
# Fast and efficient for most tasks
> /code TASK-001          # Claude writes code
> /codereview TASK-001    # Claude checks
# ✅ PASSED or ❌ FAILED

# If FAILED:
> /fix TASK-001           # Claude surgically fixes
> /codereview TASK-001
# ✅ PASSED

> /done TASK-001
```

```
# 1. First attempt: Claude writes code
> /code TASK-001
> /codereview TASK-001
# ❌ FAILED: Found 3 Issues

# 2. First iteration: Claude (surgical edits)
> /fix TASK-001
> /codereview TASK-001
# ❌ FAILED: 1 Issue remains

# 3. Second iteration: Codex (regeneration)
> /fix codex TASK-001
> /codereview TASK-001
# ✅ PASSED

> /done TASK-001
```

Model is recorded in bug report and Story:
```
## Fix Iterations (backlog/issues/TASK-001-issues.md)

### Iteration #1 (Model: claude)
- Started: 2025-10-07 10:00
- Finished: 2025-10-07 10:15
- Duration: 15m
- Issues fixed: #1, #2

### Iteration #2 (Model: codex)
- Started: 2025-10-07 10:30
- Finished: 2025-10-07 10:45
- Duration: 15m
- Issues fixed: #3
```

Story file (Time Tracking):
```
- **Time Tracking:**
  - /code (codex): 10:00 → 10:45 (45m)
  - /codereview (claude): 10:45 → 10:50 (5m) ❌ FAILED
  - /fix (iter 1, claude): 11:00 → 11:15 (15m)
  - /codereview (claude): 11:15 → 11:18 (3m) ❌ FAILED
  - /fix (iter 2, codex): 11:30 → 11:45 (15m)
  - /codereview (claude): 11:45 → 11:48 (3m) ✅ PASSED
  - /done: 11:48 → 11:50 (2m)
```

/done breakdown:
```
- **Breakdown:**
  - /code (codex): 45m (52%)
  - /codereview (claude): 11m (13%)
  - /fix (claude + codex): 30m (35%)
  - /done: 2m (2%)
- **Models used:** codex (2x), claude (4x)
- **Iterations:** 2 (2 fix cycles)
```

Main workflow:

```
REQ → /makedesign → STORY → /code → /codereview → /done ✅
                                        ↓ (if FAILED)
                                 /fix → /codereview
```

Hotfix workflow:

```
User complaint → /bug → HOTFIX-XXX.md → /code → /codereview → /fix (if needed) → /done ✅
```
All commands automatically track work time via bash!
- Each command (`/code`, `/codereview`, `/fix`, `/done`) records:
  - Start time (automatic)
  - End time (automatic)
  - Elapsed time (calculated via bash)
- Recorded in Story:

```
- **Time Tracking:**
  - /code: 2025-10-06 10:00 → 10:45 (45m)
  - /codereview: 2025-10-06 10:45 → 10:52 (7m) ✅ PASSED
  - /fix: 2025-10-06 11:00 → 11:30 (30m)
  - /codereview: 2025-10-06 11:30 → 11:35 (5m) ✅ PASSED
  - /done: 2025-10-06 11:35 → 11:40 (5m)
- **Actual:** 1h 32m ⚠️ (+54% overshoot)
- **Efficiency:** 65%
- **Iterations:** 2 (1 fix cycle)
```
- /done calculates final metrics:
  - Total Actual: sum of all stages (automatically via bash)
  - Overshoot: `((actual - estimate) / estimate) * 100`
  - Efficiency: `(estimate / actual) * 100`
  - Iterations: number of review passes (counts FAILED → /fix cycles)
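The commands handle this bookkeeping themselves; conceptually the bash side is just timestamps and integer math. A minimal sketch of the idea (illustrative only — the variable names, example timestamps, and the 1h estimate are assumptions, not taken from the actual command prompts):

```bash
#!/usr/bin/env bash
# Sketch of the time-tracking and metrics math; the real commands record
# timestamps (e.g. via `date +%s`) at the start and end of each stage.

start=1759744800                       # example start timestamp (epoch seconds)
end=$(( start + 92 * 60 ))             # example end timestamp, 92 minutes later

actual_min=$(( (end - start) / 60 ))   # elapsed time in minutes
estimate_min=60                        # hypothetical Story estimate (1h)

overshoot=$(( (actual_min - estimate_min) * 100 / estimate_min ))  # ((actual - estimate) / estimate) * 100
efficiency=$(( estimate_min * 100 / actual_min ))                  # (estimate / actual) * 100

echo "Actual: ${actual_min}m | Overshoot: +${overshoot}% | Efficiency: ${efficiency}%"
```

With a 1h estimate and 92 minutes actually spent, this prints roughly +53% overshoot and 65% efficiency, in line with the Story excerpt above.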
Comparison:
| Aspect | /report | /report STORY-ID | /report trends |
|---|---|---|---|
| Purpose | Project overview | Single Story details | Improvement dynamics |
| Scope | All Stories (aggregated) | One Story (detailed) | All tasks (over time) |
| Grouping | By Story | By tasks within Story | By weeks/months + types |
| Question | "How are Stories going?" | "How is this Story?" | "Am I improving?" |
| Focus | Which Stories problematic | Which tasks problematic | Which task types difficult |
| Time | Current state snapshot | One Story snapshot | Change trend |
| Diagrams | Pie (time by Story), Bar (efficiency by Story) | Gantt (task timeline), Pie (stages), Burndown | Line (learning curve), Bar (type patterns) |
| Output file | ./backlog/report-common.md | ./backlog/report-STORY-ID.md | ./backlog/report-trends.md |
In simple terms:
- /report → "Where are we now" (all Stories overview)
- /report STORY-ID → "What about this Story" (task details)
- /report trends → "Where are we heading" (improvement over time)
What the metrics are used for:

- **Baseline for AI performance:**
  - Estimates = human time baseline
  - Actual = real AI agent time
  - Compare AI vs human speed
- **Finding bottlenecks:**
  - Which task types always overshoot
  - Which stage takes most time (code/review/fix)
  - Where better estimates are needed
- **Learning curve:**
  - Is efficiency improving over time
  - Are estimates becoming more accurate
  - Fewer fix cycles after learning
- **Planning:**
  - Time forecast for new Stories
  - Buffers for risky tasks (refactoring +60%)
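For the planning point, a buffer is plain arithmetic on top of the baseline estimate. A tiny worked example (the 2-hour estimate is hypothetical; +60% is the refactoring buffer mentioned above):

```bash
estimate_min=120                          # hypothetical baseline estimate: 2h
buffered=$(( estimate_min * 160 / 100 ))  # +60% buffer for a risky refactoring task
echo "Plan ${buffered}m (~3h 12m) instead of ${estimate_min}m"
```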
Git workflow rules — avoid these commands (notes on alternatives in the comments):

```bash
git checkout        # use git switch
git restore         # manage changes explicitly
git stash           # commit, don't hide
git reset --hard    # irreversible!
```

Branch workflow:

```bash
# /code creates feature branch: feature/T-001-001-validate-email
# Work in branch → commit changes
# /done merges to master with --no-ff (preserves history)
```

Use /makedesign when:

- Large features (5+ tasks)
- Complex dependencies
- Need architectural design
Create a Story manually when:

- Small features (2-3 tasks)
- Simple bugfixes
- Quick prototypes
```
# Copy template
cp ~/.claude/templates/story-template.md backlog/stories/STORY-099.md

# Edit
vim backlog/stories/STORY-099.md

# Use
> /code T-099-001
> /codereview T-099-001
> /done T-099-001
```

TODO: /test command

Goal: Automatically mark acceptance criteria checkboxes based on test results.
Planned:
- Auto-detect test framework (pytest, jest, cargo test, etc.)
- Match test results to acceptance criteria
- Mark `- [ ]` as `- [x]` for passing tests
- Generate test coverage report
Status: Stub ready, outputs "in development"
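One possible shape for the framework auto-detection, sketched as a guess at how the stub might evolve (the marker files and their priority are assumptions, not the actual /test implementation):

```bash
#!/usr/bin/env bash
# Hypothetical sketch: pick a test runner based on marker files in the project root.
detect_test_framework() {
  if [ -f pytest.ini ] || [ -f pyproject.toml ] || [ -f setup.py ]; then
    echo "pytest"
  elif [ -f package.json ]; then
    echo "jest"          # assumption: a JS project here uses jest
  elif [ -f Cargo.toml ]; then
    echo "cargo test"
  else
    echo "unknown"
  fi
}

framework=$(detect_test_framework)
echo "Detected test framework: ${framework}"
# Planned next steps: run the detected framework, match results to acceptance
# criteria, and flip "- [ ]" to "- [x]" in the Story for criteria whose tests pass.
```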
MIT
© Artel Team