Agile Smart Agent for Claude Code

An AI development management system using a doc-driven, Story-based workflow: automatic specification generation, task decomposition, verification, and metrics.

Designed for Claude Code with support for Codex as an additional external execution agent.

What is it? A system for managing AI code development with mandatory verification and metrics.

Quick start: ./install.sh → Create REQ-001.md → /makedesign REQ-001.md → /code T-001-001 → /codereview T-001-001 → /done T-001-001

Key features:

  • Story-based task tracking (one file = all tasks)
  • Mandatory code verification (no "wishful thinking")
  • Automatic time tracking and efficiency metrics
  • Support for Claude (built-in) and Codex (external CLI)


Problem

AI agents for code generation suffer from chronic issues:

  • Ignore requirements - do what wasn't asked
  • Don't follow system prompts - forget project rules
  • "Wishful thinking" - report completion without actual code verification

Typical scenario:

You: "Implement task T-001-001 from Story"
AI: ✓ Done! Here's what I did: [list of claims]
You: "Show me the code"
AI: [half not implemented or implemented incorrectly]

Solution: Story-based Architecture with Mandatory Verification

Document Hierarchy (markdown files)

backlog/requirements/REQ-001.md (Requirement - user request)
           ↓  /makedesign
backlog/design/DSGN-001.md (Design - architecture design from AI)
           ↓
backlog/stories/STORY-001.md (Story - file with all tasks)
    ├── T-001-001: Task 1  ← /code T-001-001
    ├── T-001-002: Task 2  ← /code T-001-002
    └── T-001-003: Task 3  ← /code T-001-003

Key idea:

  • REQ-XXX.md = Requirement (user request, written manually)
  • DSGN-XXX.md = Design (architectural design, generated by /makedesign)
  • STORY-XXX.md = Story (ONE file with ALL tasks, generated by /makedesign)
  • T-XXX-YYY = Task ID format (T-001-001, T-001-002, ...)
  • Each task: status, acceptance criteria, dependencies, estimate
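
The exact layout comes from ~/.claude/templates/story-template.md; as a rough illustration (field names are hypothetical), a single task entry inside a Story might look like this:

### T-001-001: Validate email format
- **Status:** pending
- **Estimate:** 1h
- **Dependencies:** none
- **Acceptance Criteria:**
  - [ ] Rejects addresses without an "@"
  - [ ] Accepts common valid addresses (user@example.com)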

8 Slash Commands with Mandatory Checks

| Command | Role | What It Does |
|---|---|---|
| /makedesign [claude\|codex] REQ-001.md | Architect | Analyzes requirement → creates DSGN-001.md (design) + STORY-001.md (tasks) via Claude (default) or Codex CLI (alternative architecture) |
| /code [claude\|codex] TASK-ID | Executor | Implements code via Claude (default, built-in) or Codex CLI (external utility) + tracks time ⏱️ |
| /codereview [claude\|codex] TASK-ID | Critic | Checks ALL criteria via Claude (default, built-in) or Codex CLI (external utility) → shows code → bug report on failure + tracks time ⏱️ |
| /fix [claude\|codex] TASK-ID | Fixer | Reads bug report → fixes via Claude (default, surgical edits) or Codex CLI (regeneration) + tracks time ⏱️ |
| /test TASK-ID | Tester | 🚧 In development: testing → marks acceptance criteria [x] on successful check + tracks time ⏱️ |
| /done TASK-ID | Finalizer | Git workflow (commit → merge to master) → updates Story → calculates metrics 📊. Note: code checks are done by git hooks, not /done |
| /bug [claude\|codex] "description" | Bug Parser | 🪲 Parses user complaint via Claude (default) or Codex CLI → creates HOTFIX-XXX.md with criteria → suggests /code HOTFIX-XXX |
| /report [STORY-ID] | Analyst | Generates reports with Mermaid diagrams 📊 (Gantt, Pie, Bar, Line charts) from time tracking |

Installation

Step 1: Install Commands Globally (once)

git clone <repo-url> smart-agent-claude
cd smart-agent-claude
./install.sh

What happens:

Installation creates the following file hierarchy in ~/.claude/, so the commands work globally in any project.

~/.claude/
├── commands/          # 8 slash commands (work everywhere)
│   ├── makedesign.md
│   ├── code.md
│   ├── codereview.md
│   ├── fix.md
│   ├── done.md
│   ├── bug.md
│   ├── report.md
│   └── test.md
└── templates/         # Document templates
    ├── story-template.md
    ├── design-template.md
    └── requirements-template.md

Step 2: Work in Your Project

cd /path/to/your-project

# Optional: create coding rules for the agent
touch CLAUDE.md

Note: The backlog/ structure is created automatically by the commands on first use. There is no need to create it manually!

Project structure:

your-project/
├── backlog/
│   ├── requirements/      # Freeform requirements (written manually)
│   │   └── REQ-001.md
│   ├── design/            # Design documents (generated by /makedesign)
│   │   └── DSGN-001.md
│   ├── stories/           # Stories with tasks (generated by /makedesign)
│   │   └── STORY-001.md   # T-001-001, T-001-002, T-001-003...
│   ├── hotfix/            # Hotfix tasks (created by /bug)
│   │   └── HOTFIX-001.md  # Bugfixes from user complaints
│   └── issues/            # Bug reports (created by /codereview)
│       └── T-001-001-issues.md
├── CLAUDE.md (or AGENTS.md)   # Project rules (optional)
└── src/                   # Your code
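
CLAUDE.md (or AGENTS.md) is free-form; as a minimal hypothetical example, project rules for the agent might look like:

# Project Rules (hypothetical example)
- Python 3.12, type hints required
- Run the test suite before reporting a task as complete
- Keep changes scoped to the task's acceptance criteria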

Quick Start

Complete Development Cycle

1. Create requirement (freeform text in REQ-001.md):

Email validation for registration using regex

2. Generate design and tasks:

/makedesign REQ-001.md

Creates:

  • DSGN-001.md - architecture design
  • STORY-001.md - task list (T-001-001, T-001-002, etc.)

Numbers are automatic: REQ-001 → DSGN-001 → STORY-001 → T-001-XXX

3. Implement task:

/code T-001-001

Creates feature branch → writes code → provides evidence of changes

4. Verify implementation:

/codereview T-001-001

Checks acceptance criteria → shows real code proof → verdict: PASSED/FAILED
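
The exact report format is defined by the codereview command; a hypothetical excerpt of backlog/issues/T-001-001-issues.md (entry structure illustrative) might look like:

### Issue #1
- **Criterion:** Rejects addresses without an "@"
- **Observed:** validate_email("user.example.com") returns True
- **Location:** src/validators/email.py (hypothetical path)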

5. Fix if needed:

/fix T-001-001           # If /codereview found issues
/codereview T-001-001    # Re-check

6. Finalize:

/done T-001-001

Commits → merges to master → calculates metrics (time, efficiency, overshoot)


Hotfix Flow

User complaint → Structured fix:

/bug "Buttons don't click!"              # Parses complaint β†’ creates HOTFIX-001.md
/code HOTFIX-001                         # Implements fix
/codereview HOTFIX-001                   # Verifies fix
/fix HOTFIX-001           # If needed
/done HOTFIX-001                         # Finalizes

Model Selection for All Commands

Most commands accept an optional model argument (Claude or Codex); /done and /report do not:

Syntax

| Command | Default | With Codex | With Claude |
|---|---|---|---|
| /makedesign | /makedesign REQ-001.md | /makedesign codex REQ-001.md | /makedesign claude REQ-001.md |
| /code | /code TASK-ID | /code codex TASK-ID | /code claude TASK-ID |
| /codereview | /codereview TASK-ID | /codereview codex TASK-ID | /codereview claude TASK-ID |
| /fix | /fix TASK-ID | /fix codex TASK-ID | /fix claude TASK-ID |
| /bug | /bug "description" | /bug codex "description" | /bug claude "description" |
| /done | /done TASK-ID (no model selection) | - | - |
| /report | /report [STORY-ID\|trends] (no model selection) | - | - |

Claude vs Codex

| Aspect | Claude (default) | Codex (optional) |
|---|---|---|
| Speed | ⚡ Fast (built-in) | Slower (external CLI) |
| Edits | 🎯 Surgical, precise changes | 🔄 Can regenerate completely |
| Best for | Quick tasks, fixes, reviews | Alternative architecture, complex refactoring |
| Autonomy | Uses Read/Write/Edit tools | Full workspace access via CLI |
| Logs | No separate logs | Activity logs in .claude/codex/ |
| When to use | Default for all commands | When Claude fails after 1-2 iterations |

Recommended Workflow

Option 1: All via Claude (default)

# Fast and efficient for most tasks
> /code TASK-001              # Claude writes code
> /codereview TASK-001        # Claude checks
# ✅ PASSED or ❌ FAILED

# If FAILED:
> /fix TASK-001               # Claude surgically fixes
> /codereview TASK-001
# ✅ PASSED

> /done TASK-001

Option 2: Claude β†’ Codex (if Claude failed)

# 1. First attempt: Claude writes code
> /code TASK-001
> /codereview TASK-001
# ❌ FAILED: Found 3 Issues

# 2. First iteration: Claude (surgical edits)
> /fix TASK-001
> /codereview TASK-001
# ❌ FAILED: 1 Issue remains

# 3. Second iteration: Codex (regeneration)
> /fix codex TASK-001
> /codereview TASK-001
# ✅ PASSED

> /done TASK-001

Time Tracking with Models

The model used is recorded in the bug report and in the Story:

## Fix Iterations (backlog/issues/TASK-001-issues.md)

### Iteration #1 (Model: claude)
- Started: 2025-10-07 10:00
- Finished: 2025-10-07 10:15
- Duration: 15m
- Issues fixed: #1, #2

### Iteration #2 (Model: codex)
- Started: 2025-10-07 10:30
- Finished: 2025-10-07 10:45
- Duration: 15m
- Issues fixed: #3

Story file (Time Tracking):

- **Time Tracking:**
  - /code (codex): 10:00 → 10:45 (45m)
  - /codereview (claude): 10:45 → 10:50 (5m) → FAILED
  - /fix (iter 1, claude): 11:00 → 11:15 (15m)
  - /codereview (claude): 11:15 → 11:18 (3m) → FAILED
  - /fix (iter 2, codex): 11:30 → 11:45 (15m)
  - /codereview (claude): 11:45 → 11:48 (3m) → PASSED
  - /done: 11:48 → 11:50 (2m)

/done breakdown:

- **Breakdown:**
  - /code (codex): 45m (51%)
  - /codereview (claude): 11m (13%)
  - /fix (claude + codex): 30m (34%)
  - /done: 2m (2%)
- **Models used:** codex (2x), claude (4x)
- **Iterations:** 2 (2 fix cycles)

Workflow Diagrams

Standard Development Flow

REQ → /makedesign → STORY → /code → /codereview → /done ✅
                                         ↓ (if FAILED)
                                      /fix → /codereview

Hotfix Flow

User complaint → /bug → HOTFIX-XXX.md → /code → /codereview → /fix (if needed) → /done ✅

Metrics and Analytics

All commands automatically track work time via bash!

How Time Tracking Works

  1. Each command (/code, /codereview, /fix, /done) records:

    • Start time (automatic)
    • End time (automatic)
    • Elapsed time (calculated via bash)
  2. Recorded in Story:

    - **Time Tracking:**
      - /code: 2025-10-06 10:00 → 10:45 (45m)
      - /codereview: 2025-10-06 10:45 → 10:52 (7m) → FAILED
      - /fix: 2025-10-06 11:00 → 11:30 (30m)
      - /codereview: 2025-10-06 11:30 → 11:35 (5m) → PASSED
      - /done: 2025-10-06 11:35 → 11:40 (5m)
    - **Actual:** 1h 32m ⚠️ (+54% overshoot)
    - **Efficiency:** 65%
    - **Iterations:** 2 (1 fix cycle)
  3. /done calculates the final metrics (see the sketch after this list):

    • Total Actual: sum of all stages (computed automatically via bash)
    • Overshoot: ((actual - estimate) / estimate) * 100
    • Efficiency: (estimate / actual) * 100
    • Iterations: number of review passes (counts FAILED → /fix cycles)
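
For illustration only, a minimal bash sketch of these calculations using the sample numbers above (a 1h estimate and the 92m actual); the real logic lives inside /done and may differ:

# Elapsed time per stage can be captured with plain date arithmetic
stage_start=$(date +%s)
# ... the command does its work ...
stage_end=$(date +%s)
echo "Elapsed: $(( (stage_end - stage_start) / 60 ))m"

# Final metrics (integer division rounds down)
estimate=60   # minutes, human baseline from the Story (assumed)
actual=92     # minutes, sum of all tracked stages
overshoot=$(( (actual - estimate) * 100 / estimate ))   # ((actual - estimate) / estimate) * 100 → ~53%
efficiency=$(( estimate * 100 / actual ))               # (estimate / actual) * 100 → 65%
echo "Overshoot: +${overshoot}%  Efficiency: ${efficiency}%"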

/report Command - Three Modes

Comparison:

| Aspect | /report | /report STORY-ID | /report trends |
|---|---|---|---|
| Purpose | Project overview | Single Story details | Improvement dynamics |
| Scope | All Stories (aggregated) | One Story (detailed) | All tasks (over time) |
| Grouping | By Story | By tasks within Story | By weeks/months + types |
| Question | "How are Stories going?" | "How is this Story?" | "Am I improving?" |
| Focus | Which Stories are problematic | Which tasks are problematic | Which task types are difficult |
| Time | Current state snapshot | One Story snapshot | Change trend |
| Diagrams | Pie (time by Story), Bar (efficiency by Story) | Gantt (task timeline), Pie (stages), Burndown | Line (learning curve), Bar (type patterns) |
| Output file | ./backlog/report-common.md | ./backlog/report-STORY-ID.md | ./backlog/report-trends.md |

In simple terms:

  • /report → "Where are we now" (all Stories overview)
  • /report STORY-ID → "What about this Story" (task details)
  • /report trends → "Where are we heading" (improvement over time)

Why Track Metrics?

  1. Baseline for AI performance:

    • Estimates = human time baseline
    • Actual = real AI agent time
    • Compare AI vs human speed
  2. Finding bottlenecks:

    • Which task types always overshoot
    • Which stage takes most time (code/review/fix)
    • Where better estimates are needed
  3. Learning curve:

    • Is efficiency improving over time
    • Are estimates becoming more accurate
    • Fewer fix cycles after learning
  4. Planning:

    • Time forecast for new Stories
    • Buffers for risky tasks (refactoring +60%)

Git Workflow Rules

Forbidden Commands

git checkout     # use git switch
git restore      # manage changes explicitly
git stash        # commit, don't hide
git reset --hard # irreversible!

Correct Workflow

# /code creates feature branch: feature/T-001-001-validate-email
# Work in branch → commit changes
# /done merges to master with --no-ff (preserves history)
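
A rough manual equivalent of what /code and /done automate (branch name and file path are hypothetical):

git switch -c feature/T-001-001-validate-email        # /code: create the feature branch
# ...implement the task, then commit explicitly (no stash)
git add src/validators/email.py                       # hypothetical file
git commit -m "T-001-001: add email validation"

git switch master                                     # /done: back to master
git merge --no-ff feature/T-001-001-validate-email    # merge preserving branch history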

When to Create Story Manually

Via /makedesign (recommended)

  • Large features (5+ tasks)
  • Complex dependencies
  • Need architectural design

Manually

  • Small features (2-3 tasks)
  • Simple bugfixes
  • Quick prototypes
# Copy template
cp ~/.claude/templates/story-template.md backlog/stories/STORY-099.md

# Edit
vim backlog/stories/STORY-099.md

# Use
> /code T-099-001
> /codereview T-099-001
> /done T-099-001

TODO

/test Command (in development)

Goal: Automatically mark acceptance criteria checkboxes based on test results.

Planned:

  • Auto-detect test framework (pytest, jest, cargo test, etc.)
  • Match test results to acceptance criteria
  • Mark - [ ] as - [x] for passing tests
  • Generate test coverage report

Status: Stub ready, outputs "in development"


License

MIT


Created By

© Artel Team
