A framework that enables skills to evolve and improve themselves through performance monitoring, variant generation, and data-driven optimization—leveraging Claude Code's native hot-reload capability.
This project implements a meta-cognitive system where skills can:
- 📊 Monitor their own performance metrics
- 🧬 Generate improved variants of themselves
- 🧪 Safely A/B test changes
- 🚀 Automatically deploy improvements
- 🔄 Rollback if performance degrades
```
┌─────────────────────────────────────────────────┐
│ Evolution Engine (Meta Skill)                   │
│ ┌───────────────────────────────────────────┐   │
│ │ Monitor → Analyze → Generate → Test       │   │
│ │            ↓                              │   │
│ │ Evaluate → Select → Deploy → Monitor      │   │
│ └───────────────────────────────────────────┘   │
└─────────────────────────────────────────────────┘
                      ↓ manages
┌─────────────────────────────────────────────────┐
│ Self-Evolving Skills                            │
│ • performance metrics                           │
│ • version history                               │
│ • improvement suggestions                       │
└─────────────────────────────────────────────────┘
```
```
self-evolution/
├── lib/
│   └── evolution-engine.js    # Core evolution engine
├── skills/
│   ├── evolution-engine.md    # Meta-skill documentation
│   └── smart-commit/          # Example self-evolving skill
│       ├── skill.md           # Current version
│       ├── meta.json          # Performance metadata
│       └── variants/          # Version history
│           └── v1.0/
│               └── skill.md
├── package.json
└── README.md
```
| Level | Description | Safety |
|---|---|---|
| Prompt Evolution | Modify skill prompts/instructions | Safest |
| Parameter Tuning | Adjust config values (timeout, temperature) | Safe |
| Code Mutation | Generate equivalent but more optimal code | Medium |
| Architecture Evolution | Restructure skill fundamentally | Requires approval |
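As a rough illustration of the second level, parameter tuning can be as simple as perturbing each numeric config value while leaving everything else untouched. This sketch is illustrative only; the engine's actual variant format is not specified here.

```javascript
// Hypothetical sketch of level-2 (parameter tuning) variant generation.
// The real EvolutionEngine's variant representation may differ.
function tuneParameters(config, scale = 0.2) {
  const variant = { ...config };
  for (const [key, value] of Object.entries(config)) {
    if (typeof value === 'number') {
      // Nudge each numeric parameter up or down by at most `scale` (20%).
      const factor = 1 + (Math.random() * 2 - 1) * scale;
      variant[key] = value * factor;
    }
  }
  return variant;
}

const base = { timeout: 2000, temperature: 0.7, style: 'conventional' };
const candidate = tuneParameters(base);
// Non-numeric values (like `style`) are carried over unchanged.
```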
Each skill tracks:
- Success Rate: Percentage of accepted outputs
- User Feedback: Satisfaction scores
- Execution Time: Average processing duration
- Error Rate: Failure frequency
- Patterns Learned: Success and failure patterns
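A minimal sketch of how the success rate could be maintained from the counters stored in `meta.json` (field names follow the metadata example in this README; the engine's internal bookkeeping may differ):

```javascript
// Update the metrics counters after one execution and recompute
// success_rate as accepted / total. Sketch only, not the engine's API.
function recordResult(metrics, accepted) {
  metrics.total_executions += 1;
  if (accepted) metrics.acceptance_count += 1;
  else metrics.rejection_count += 1;
  metrics.success_rate = metrics.acceptance_count / metrics.total_executions;
  return metrics;
}

const m = { success_rate: 0, total_executions: 0, acceptance_count: 0, rejection_count: 0 };
recordResult(m, true);
recordResult(m, true);
recordResult(m, false);
// m.success_rate is now 2/3 ≈ 0.67
```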
- Create a skill directory with `skill.md` and `meta.json`:
```json
{
  "name": "your-skill",
  "version": "1.0.0",
  "created_at": "2025-01-11T10:00:00Z",
  "metrics": {
    "success_rate": 0.0,
    "total_executions": 0,
    "acceptance_count": 0,
    "rejection_count": 0
  },
  "evolution_history": [],
  "learning_data": {
    "successful_examples": [],
    "failed_examples": []
  }
}
```

- Use the `EvolutionEngine` to initialize and track:
```javascript
const EvolutionEngine = require('./lib/evolution-engine');

const engine = new EvolutionEngine({
  minSamplesBeforeEvolution: 20,
  minImprovementThreshold: 0.1
});

// Initialize skill
await engine.initSkill('your-skill');

// Record execution results
await engine.recordExecution('your-skill', {
  success: true,
  execution_time: 1500,
  input: {...},
  output: {...},
  feedback: 'good'
});

// Analyze and generate variants
const analysis = await engine.analyze('your-skill');
const variants = await engine.generateVariants('your-skill');

// Deploy improvement
await engine.deploy('your-skill', variants[0], {
  improvement: 0.15,
  metrics: {...}
});
```

The included `smart-commit` skill demonstrates self-evolution:
- Generates conventional commit messages
- Learns from user feedback
- Adapts to project-specific patterns
- Improves message quality over time
```
Collect performance data over N executions
        ↓
Track success rate, timing, feedback

Identify patterns in successful outputs
        ↓
Discover common failure modes
        ↓
Generate improvement hypotheses

Create M candidate variants using:
  - Pattern reinforcement
  - Failure mitigation
  - Style optimization

Run shadow tests on all variants
        ↓
Collect performance metrics

Select best-performing variant
        ↓
Deploy with safety checks
        ↓
Monitor for degradation
```
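The select-and-deploy step above can be sketched as follows, assuming each shadow test yields a scalar score. Names here are illustrative, not the engine's actual API:

```javascript
// Pick the best-scoring variant; deploy it only if it beats the current
// version by at least the relative improvement threshold (default 10%).
function selectBest(current, variants, minImprovement = 0.1) {
  const best = variants.reduce((a, b) => (b.score > a.score ? b : a));
  const improvement = (best.score - current.score) / current.score;
  return improvement >= minImprovement ? best : current;
}

const current = { version: '1.0.0', score: 0.6 };
const variants = [
  { version: '1.1.0-a', score: 0.58 },
  { version: '1.1.0-b', score: 0.72 },
];
const deployed = selectBest(current, variants);
// 0.72 is a 20% improvement over 0.6, so the 1.1.0-b variant is deployed.
```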
- ✅ All changes tested before deployment
- ✅ Automatic rollback on performance drop
- ✅ Complete audit trail of evolution
- ✅ Human approval for high-risk changes
- ✅ Version history with instant rollback
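Automatic rollback can be reduced to a simple comparison of post-deployment metrics against the pre-deployment baseline. This is a sketch under assumed names (`shouldRollback`, a 5% noise tolerance), not the engine's actual check:

```javascript
// Roll back if the observed success rate falls noticeably below the
// baseline recorded before deployment. The tolerance keeps normal
// run-to-run noise from triggering a revert.
function shouldRollback(baseline, observed, tolerance = 0.05) {
  return observed.success_rate < baseline.success_rate - tolerance;
}

shouldRollback({ success_rate: 0.8 }, { success_rate: 0.7 });  // true: degraded
shouldRollback({ success_rate: 0.8 }, { success_rate: 0.78 }); // false: within tolerance
```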
```javascript
{
  minSamplesBeforeEvolution: 20,  // Min executions before evolving
  minImprovementThreshold: 0.1,   // 10% improvement required
  maxVariants: 3,                 // Max variants to generate
  autoRollback: true,             // Auto-rollback on degradation
  requireApproval: true,          // Require human approval
  skillsDir: './skills'           // Skills directory
}
```

```
┌─────────────────────────────────────────────┐
│ 1. Use skill → Collect metrics              │
│ 2. Analyze patterns → Identify improvements │
│ 3. Generate variants → Test in shadow       │
│ 4. Deploy best → Monitor performance        │
│ 5. Rollback if needed → Continue learning   │
└─────────────────────────────────────────────┘
```
```json
{
  "evolution_history": [
    {
      "version": "1.1.0",
      "from_version": "1.0.0",
      "change_type": "prompt_enhancement",
      "changes": "Add learned patterns: conventional commits, monorepo",
      "improvement": 0.15,
      "timestamp": "2025-01-12T10:00:00Z"
    },
    {
      "version": "1.2.0",
      "from_version": "1.1.0",
      "change_type": "failure_fix",
      "changes": "Address failures: missing scope, too long",
      "improvement": 0.08,
      "timestamp": "2025-01-14T15:30:00Z"
    }
  ]
}
```

- Start Small: Begin with prompt-level evolution
- Collect Data: Gather sufficient baseline metrics
- Review Changes: Inspect auto-generated variants
- Monitor Closely: Watch post-deployment performance
- Keep History: Maintain detailed evolution logs
- Evolution speed depends on usage frequency
- Requires sufficient execution data for analysis
- Cannot fundamentally change skill purpose
- Human oversight needed for major changes
Balance competing metrics:
- Maximize success rate
- Minimize execution time
- Maximize user satisfaction
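One way to balance these objectives is to collapse them into a single scalar via a weighted sum. The weights and the time normalization below are illustrative assumptions, not values used by the engine:

```javascript
// Combine success rate, speed, and satisfaction into one score.
// Execution time is normalized against an assumed 10-second budget,
// so faster runs score closer to 1 and anything slower scores 0.
function objectiveScore(m, weights = { success: 0.5, speed: 0.2, satisfaction: 0.3 }) {
  const speed = Math.max(0, 1 - m.execution_time / 10000);
  return (
    weights.success * m.success_rate +
    weights.speed * speed +
    weights.satisfaction * m.satisfaction
  );
}

objectiveScore({ success_rate: 0.9, execution_time: 1500, satisfaction: 0.8 });
// 0.5*0.9 + 0.2*0.85 + 0.3*0.8 = 0.86
```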
Apply learnings between skills:
```javascript
await engine.transferLearning('source-skill', 'target-skill');
```

- Crossover: Combine features from successful versions
- Mutation: Introduce small controlled changes
- Selection: Keep top-performing variants
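The three operators above can be sketched over plain config objects. How the engine actually represents variants is not specified here, so treat this as a shape illustration only:

```javascript
// Crossover: take each parameter from either parent at random.
function crossover(a, b) {
  const child = {};
  for (const key of Object.keys(a)) {
    child[key] = Math.random() < 0.5 ? a[key] : b[key];
  }
  return child;
}

// Mutation: a small controlled change to one random numeric parameter.
function mutate(config, scale = 0.1) {
  const keys = Object.keys(config);
  const key = keys[Math.floor(Math.random() * keys.length)];
  return { ...config, [key]: config[key] * (1 + (Math.random() * 2 - 1) * scale) };
}

// Selection: keep only the top-scoring variants.
function select(population, keep = 2) {
  return [...population].sort((x, y) => y.score - x.score).slice(0, keep);
}
```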
To add new self-evolving skills:
- Create a skill directory in `skills/`
- Add `skill.md` with current version
- Create `meta.json` with tracking config
- Implement performance recording
- Test evolution cycle
MIT
Built with Claude Code's native hot-reload capability 🔥