Self-improving AI agents that evolve with every interaction
Evolve is an advanced optimization framework that enables AI agents to continuously improve their performance through iterative learning, pattern recognition, and intelligent adaptation. Built to be framework-agnostic, Evolve can enhance AI agents built with Mastra, LangChain, or any TypeScript-based agent framework.
Traditional AI agents remain static after deployment, requiring manual intervention to improve their performance. Evolve changes this paradigm by creating agents that:
- Learn from every interaction - Continuously analyze performance and identify improvement opportunities
- Adapt autonomously - Implement optimizations without human intervention
- Evolve strategically - Use multi-agent collaboration and research-driven approaches
- Work with any framework - Integrate seamlessly with Mastra, LangChain, or custom implementations
- Iterative Optimization: Agents improve through multiple refinement cycles
- Pattern Recognition: Identify and learn from failure patterns
- Research-Driven: Integrate external knowledge for informed improvements
- Convergence Detection: Know when optimal performance is reached
- Dual Optimization Modes: AI-driven iterative improvement or systematic grid search
- Framework Agnostic: Works with Mastra, LangChain, or any TypeScript agent system
- Specialized AI Agents: Prompt research and engineering agents work in tandem
- Budget Controls: Built-in cost management with configurable limits
- Performance Tracking: Monitor improvement across iterations
- Pattern Analysis: Understand systematic issues and their solutions
- History Management: Track evolution journey with full audit trail
- Predictive Insights: Anticipate optimization opportunities
┌─────────────────────────────────────────┐
│           Improvement Service           │
│    (Orchestrates optimization modes)    │
└────────────┬───────────────┬────────────┘
             │               │
     ┌───────▼───────┐   ┌───▼───────────┐
     │   Iterative   │   │  Grid Search  │
     │ Optimization  │   │    Service    │
     └───────┬───────┘   └───────────────┘
             │
     ┌───────▼───────────────────┐
     │  AI Agent Collaboration   │
     │ • Prompt Research Agent   │
     │ • Prompt Engineer Agent   │
     │ (Implements improvements) │
     └────────────┬──────────────┘
                  │
            ┌─────▼───────────┐
            │  Your AI Agent  │
            │  (Mastra/Lang-  │
            │   Chain/Custom) │
            └─────────────────┘
┌──────────────┐
│     CLI      │  Commands & User Interface
├──────────────┤
│   Services   │  Business Logic & Orchestration
├──────────────┤
│    Agents    │  Specialized AI Agents
├──────────────┤
│ Repositories │  Data Access Layer (Drizzle ORM)
└──────────────┘
# Install dependencies
pnpm install
# Set up environment
cp .env.example .env
# Add your OpenAI or Anthropic API keys
# Initialize database (creates system agents)
pnpm db:migrate
# Create a custom agent with your prompt and configuration
pnpm cli agent create myagent \
--name "My Custom Agent" \
--type scorer \
--model gpt-4o \
--temperature 0.7 \
--prompt "Your custom prompt here"
# Create agent with optimized evaluator configuration
pnpm cli agent create scorer \
--name "Scoring Agent" \
--prompt "Rate the following: {{input}}" \
--evaluator-target "score" \
--evaluator-strategy "numeric"
# Run with text input (agent required)
pnpm cli run "Your content to process" --agent myagent
# Or with JSON input
pnpm cli run --input-file input.json --agent myagent
# Save output for assessment
pnpm cli run "Your content" --agent myagent --output-file results.json
# List runs pending assessment
pnpm cli assess pending
# Add assessments
pnpm cli assess add <runId> correct
pnpm cli assess add <runId> incorrect --score 0.8
# Build evaluation dataset from assessments
pnpm cli dataset build
# Run iterative improvement (automated flow)
pnpm cli improve myagent --iterations 10 --target 0.9
# Run parameter exploration
pnpm cli improve myagent --explore --iterations 5
# Evaluate agent performance
pnpm cli eval myagent
# View optimization history
pnpm cli improve stats
import { ImprovementService } from './services/improvement.service.js';
const service = new ImprovementService(db);
await service.initialize();
// Run iterative optimization
const result = await service.runIterativeOptimization({
  baseConfigKey: 'default',
  targetScore: 0.9,
  maxIterations: 10,
  evaluationStrategy: 'hybrid',
  enableResearch: true,
  verbose: true
});
console.log(`Evolution complete! Performance improved by ${result.result.totalImprovement * 100}%`);
Evolve currently uses Mastra for agent orchestration, but the architecture is designed for framework independence:
// Current Mastra integration
import { Agent } from '@mastra/core';
// Future: Framework adapters
import { MastraAdapter } from '@evolve/mastra';
import { LangChainAdapter } from '@evolve/langchain';
import { CustomAdapter } from '@evolve/custom';
- LangChain - Complete LangChain.js support
- Vercel AI SDK - Integration with Vercel's AI toolkit
- AutoGPT - Enhance autonomous agents
- CrewAI - Multi-agent crew optimization
- Custom - Build your own adapter
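For the adapters planned above, the contract an adapter would need to satisfy can be sketched roughly as follows. Everything in this snippet, including the `AgentAdapter` and `AgentConfig` names, is an illustrative assumption rather than the published API:
// Hypothetical adapter contract: every name below is illustrative, not the published API.
interface AgentRunResult {
  output: string;                                        // raw agent output
  usage?: { inputTokens: number; outputTokens: number }; // optional cost accounting
}

interface AgentConfig {
  model: string;
  temperature: number;
  prompt: string;
}

interface AgentAdapter {
  run(input: unknown): Promise<AgentRunResult>; // execute the wrapped agent
  applyConfig(config: AgentConfig): void;       // apply an optimized configuration
}

// A custom adapter only has to translate these two calls to its framework.
class EchoAdapter implements AgentAdapter {
  private config: AgentConfig = { model: 'gpt-4o', temperature: 0.7, prompt: '' };

  async run(input: unknown): Promise<AgentRunResult> {
    // A real adapter would invoke its framework here; this stub just echoes the input.
    return { output: `${this.config.prompt}\n${JSON.stringify(input)}` };
  }

  applyConfig(config: AgentConfig): void {
    this.config = config; // forward tuned parameters to the underlying agent
  }
}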
- Prompt Evolution - Iteratively refine prompts for better outputs
- Parameter Tuning - Optimize temperature, tokens, and other parameters
- Model Selection - Automatically choose the best model for the task
- Knowledge Integration - Incorporate external research and best practices
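The parameter-tuning and model-selection capabilities above are what the grid-search mode sweeps over. As a hedged illustration, a sweep could enumerate candidate configurations like this; the `searchSpace` shape and `cartesianProduct` helper are hypothetical, not the framework's exploration API:
// Hypothetical search space for the exploration mode; the shape is illustrative only.
const searchSpace = {
  model: ['gpt-4o', 'gpt-4o-mini'],
  temperature: [0.3, 0.5, 0.7],
  maxTokens: [512, 1024],
};

// Enumerate every combination so each candidate config can be evaluated and ranked.
function cartesianProduct(space: Record<string, readonly unknown[]>) {
  return Object.entries(space).reduce<Record<string, unknown>[]>(
    (combos, [key, values]) =>
      combos.flatMap((combo) => values.map((value) => ({ ...combo, [key]: value }))),
    [{}]
  );
}

const candidates = cartesianProduct(searchSpace); // 2 x 3 x 2 = 12 configurations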
Choose how to measure agent performance:
- Numeric Scoring - Traditional accuracy metrics (RMSE, MAE, correlation)
- Fact-Based - Validate factual correctness and completeness
- Hybrid - Combine multiple evaluation approaches
- Custom - Define your own evaluation criteria
Every agent must specify which field to evaluate and how:
# Numeric field comparison (efficient)
--evaluator-target "score" --evaluator-strategy "numeric"
# Nested field comparison
--evaluator-target "result.confidence" --evaluator-strategy "numeric"
# Exact match comparison
--evaluator-target "status" --evaluator-strategy "exact"
# LLM-based comparison (for text fields - use sparingly due to cost)
--evaluator-target "summary" --evaluator-strategy "llm"
# Auto strategy (detects type automatically)
--evaluator-target "value" --evaluator-strategy "auto"
Benefits:
- No surprise costs: Prevents accidental expensive LLM Judge usage
- 50-80% reduction in LLM calls for numeric evaluations
- 10x faster evaluation for numeric/exact comparisons
- Explicit intent: Forces conscious decision about evaluation strategy
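The cost and speed gains listed above follow from the fact that a numeric strategy never calls a model: it compares the target field against ground truth directly. A minimal illustration (the function and scoring formula here are ours, not the framework's exact implementation, and assume values between 0 and 1):
// Illustrative numeric comparison: no LLM call is needed to score these predictions.
function numericScore(predictions: number[], groundTruth: number[]): number {
  const absErrors = predictions.map((p, i) => Math.abs(p - groundTruth[i]));
  const mae = absErrors.reduce((sum, e) => sum + e, 0) / absErrors.length;
  // Map mean absolute error onto a 0..1 score (1 = perfect match).
  return Math.max(0, 1 - mae);
}

console.log(numericScore([0.8, 0.6, 0.9], [0.9, 0.5, 0.9])); // ~0.933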
Migration for existing agents:
# Update an existing agent
pnpm cli agent update <agent-key> \
--evaluator-target "<field-name>" \
--evaluator-strategy "numeric"
Specialized agents working together:
- Evaluation Agent - Assesses current performance with pluggable strategies
- Research Agent - Finds improvement strategies from knowledge sources
- Optimization Agent - Implements enhancements based on research
- Flow Orchestrator - Coordinates the evolution process
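Conceptually, each iteration evaluates, researches, and optimizes, stopping when the target score is hit or improvement stalls. The sketch below uses placeholder callbacks to stand in for these agents rather than the orchestrator's real API:
// Conceptual evolution loop; the callbacks stand in for the specialized agents.
async function evolveLoop(
  evaluate: () => Promise<number>,              // Evaluation Agent: returns current score
  research: () => Promise<string[]>,            // Research Agent: returns improvement ideas
  optimize: (ideas: string[]) => Promise<void>, // Optimization Agent: applies them
  targetScore = 0.9,
  maxIterations = 10
): Promise<number> {
  let previous = -Infinity;
  for (let i = 0; i < maxIterations; i++) {
    const score = await evaluate();
    if (score >= targetScore) return score;     // target reached
    if (score - previous < 0.005) return score; // converged: negligible improvement
    previous = score;
    await optimize(await research());           // research-driven improvement step
  }
  return previous;
}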
# Run agent
pnpm cli run "Your content"
# With structured JSON input
pnpm cli run --input-file examples/input.json
# Save output to file
pnpm cli run "Your content" --output-file results.json
# Run with ground truth collection
pnpm cli run "Your content" --collect
# Evaluate agent performance
pnpm cli eval myagent
# List pending assessments
pnpm assess pending
# Add assessment
pnpm assess add <runId> correct
pnpm assess add <runId> incorrect --score 0.7
# Build dataset from assessments
pnpm dataset build
# Export dataset
pnpm dataset export -o dataset.json
# List all agents (system and user-defined)
pnpm cli agent list
# Create new agent with inline prompt
pnpm cli agent create myagent \
--name "My Agent" \
--type scorer \
--model gpt-4o \
--temperature 0.3 \
--prompt "Your evaluation prompt here"
# Create agent with prompt from file
pnpm cli agent create myagent \
--name "My Agent" \
--type scorer \
--model gpt-4o \
--temperature 0.3 \
--prompt-file prompts/my-prompt.txt
# View agent details
pnpm cli agent show myagent
# Update existing agent
pnpm cli agent update myagent --temperature 0.5
# Delete agent
pnpm cli agent delete myagent
# Run iterative improvement (automated flow)
pnpm cli improve myagent --iterations 10 --target 0.9
# Run parameter exploration
pnpm cli improve myagent --explore --iterations 5
# Evaluate agent performance
pnpm cli eval myagent
# Analyze prompt performance
pnpm cli improve analyze v1
// Types provided by Evolve's evaluation module (adjust the import path to your project layout)
import type {
  EvaluationStrategy,
  EvaluationResult,
  EvaluationContext,
  DetailedFeedback,
} from './services/evaluation/types.js';

class DomainSpecificEvaluator implements EvaluationStrategy {
  name = 'domain-specific';
  type = 'custom' as const;

  async evaluate(predictions: any[], groundTruth: any[]): Promise<EvaluationResult> {
    // Your custom evaluation logic (calculateScore is your own domain scoring function)
    return {
      score: calculateScore(predictions, groundTruth),
      metrics: { /* custom metrics */ },
      details: [ /* evaluation details */ ]
    };
  }

  generateFeedback(result: EvaluationResult): DetailedFeedback {
    // Generate domain-specific feedback
    return {
      summary: 'Custom evaluation results',
      strengths: [],
      weaknesses: [],
      actionItems: []
    };
  }

  isApplicable(context: EvaluationContext): boolean {
    return context.dataType === 'domain-specific';
  }
}

// Register the strategy with the evaluation registry so it can be selected
registry.register(new DomainSpecificEvaluator());
- runs: Agent execution records with scores and reasoning
- assessments: Human/LLM assessments of scoring accuracy
- eval_datasets: Training/evaluation datasets built from assessments
- configs: Agent configurations (model, temperature, prompts)
- prompts: Prompt templates and variations
- optimization_history: Evolution tracking and checkpoints
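For orientation, a Drizzle table for runs might look roughly like the sketch below; the column names and the SQLite backend are assumptions for illustration, not the project's actual schema:
import { sqliteTable, text, real, integer } from 'drizzle-orm/sqlite-core';

// Illustrative shape for the runs table; column names are assumptions, not the real schema.
export const runs = sqliteTable('runs', {
  id: integer('id').primaryKey({ autoIncrement: true }),
  agentKey: text('agent_key').notNull(),   // which agent produced the run
  input: text('input').notNull(),          // raw or JSON-serialized input
  output: text('output').notNull(),        // agent output to be assessed later
  score: real('score'),                    // optional numeric score
  reasoning: text('reasoning'),            // model reasoning captured with the run
  createdAt: integer('created_at', { mode: 'timestamp' }).notNull(),
});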
// Analyze patterns across iterations
const patterns = patternAnalyzer.getPersistentPatterns({ minIterations: 3 });
// Get improvement suggestions from patterns
const improvements = patternAnalyzer.suggestImprovements(patterns);
// Save evolution state
const checkpoint = await evolve.saveCheckpoint();
// Resume from checkpoint
const result = await orchestrator.resumeFromState(checkpoint);
// Add custom knowledge sources
researcher.addKnowledgeSource({
  name: 'Internal Knowledge Base',
  query: async (topic) => await searchKnowledgeBase(topic)
});
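Here `searchKnowledgeBase` stands in for whatever lookup you provide; a hypothetical implementation (its name, signature, and return shape are assumptions) could be as simple as:
// Hypothetical lookup backing the knowledge source above; replace with your own store.
const notes: Record<string, string[]> = {
  'prompt engineering': ['Prefer explicit output formats', 'Show one worked example'],
};

async function searchKnowledgeBase(topic: string): Promise<string[]> {
  // Return any notes whose key appears in the requested topic
  return Object.entries(notes)
    .filter(([key]) => topic.toLowerCase().includes(key))
    .flatMap(([, entries]) => entries);
}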
# Build TypeScript
pnpm build
# Run tests
pnpm test
# Open database studio
pnpm db:studio
# Generate migrations
pnpm db:generate
# Run in development
pnpm dev <command>
- ✅ Iterative optimization flow
- ✅ Multi-agent collaboration
- ✅ Pattern recognition
- ✅ Pluggable evaluation
- 🔜 Abstract agent interface
- 🔜 LangChain adapter
- 🔜 Vercel AI SDK support
- 🔜 Framework detection
- 🔜 AutoML for configurations
- 🔜 Cross-agent learning
- 🔜 Real-time adaptation
- 🔜 Production monitoring
- 🔜 Evaluator marketplace
- 🔜 Community patterns
- 🔜 Cloud platform
- 🔜 Enterprise features
We welcome contributions! Whether you're:
- Adding framework support
- Creating custom evaluators
- Improving optimization strategies
- Enhancing documentation
See CONTRIBUTING.md for guidelines.
MIT
Built with inspiration from:
- AutoGPT for autonomous agent concepts
- LangChain for agent framework patterns
- Mastra for current implementation
- The broader AI/ML community
Evolve - Because static agents are yesterday's technology. Let your AI grow smarter with every interaction.
Currently implemented with Mastra, evolving to support all TypeScript agent frameworks!