
Evolve - Self-Improving AI Agents

Self-improving AI agents that evolve with every interaction

Evolve is an advanced optimization framework that enables AI agents to continuously improve their performance through iterative learning, pattern recognition, and intelligent adaptation. Built to be framework-agnostic, Evolve can enhance AI agents built with Mastra, LangChain, or any TypeScript-based agent framework.

πŸš€ Vision

Traditional AI agents remain static after deployment, requiring manual intervention to improve their performance. Evolve changes this paradigm by creating agents that:

  • Learn from every interaction - Continuously analyze performance and identify improvement opportunities
  • Adapt autonomously - Implement optimizations without human intervention
  • Evolve strategically - Use multi-agent collaboration and research-driven approaches
  • Work with any framework - Integrate seamlessly with Mastra, LangChain, or custom implementations

✨ Key Features

🧠 Intelligent Evolution

  • Iterative Optimization: Agents improve through multiple refinement cycles
  • Pattern Recognition: Identify and learn from failure patterns
  • Research-Driven: Integrate external knowledge for informed improvements
  • Convergence Detection: Know when optimal performance is reached

πŸ”Œ Clean Architecture

  • Dual Optimization Modes: AI-driven iterative improvement or systematic grid search
  • Framework Agnostic: Works with Mastra, LangChain, or any TypeScript agent system
  • Specialized AI Agents: Prompt research and engineering agents work in tandem
  • Budget Controls: Built-in cost management with configurable limits

πŸ“Š Advanced Analytics

  • Performance Tracking: Monitor improvement across iterations
  • Pattern Analysis: Understand systematic issues and their solutions
  • History Management: Track evolution journey with full audit trail
  • Predictive Insights: Anticipate optimization opportunities

πŸ—οΈ Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚         Improvement Service             β”‚
β”‚   (Orchestrates optimization modes)     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
             β”‚               β”‚
    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”
    β”‚   Iterative   β”‚ β”‚  Grid Search   β”‚
    β”‚ Optimization  β”‚ β”‚    Service     β”‚
    β””β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
            β”‚
    β”Œβ”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
    β”‚  AI Agent Collaboration    β”‚
    β”‚  β€’ Prompt Research Agent   β”‚
    β”‚  β€’ Prompt Engineer Agent   β”‚
    β”‚  (Implements improvements) β”‚
    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                β”‚
       β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”
       β”‚  Your AI Agent  β”‚
       β”‚ (Mastra/Lang-   β”‚
       β”‚  Chain/Custom)  β”‚
       β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Layer Structure

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚     CLI     β”‚  Commands & User Interface
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  Services   β”‚  Business Logic & Orchestration
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚   Agents    β”‚  Specialized AI Agents
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Repositoriesβ”‚  Data Access Layer (Drizzle ORM)
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
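
To make the layering concrete, here is a minimal sketch of how a CLI command flows down through the layers. The class names (AgentRepository, AgentService) are illustrative, not the project's actual classes:

// Hypothetical sketch of the layer flow: CLI -> Services -> Repositories.
// In the real project the repository layer is backed by Drizzle ORM.

interface AgentRecord {
  key: string;
  model: string;
  temperature: number;
  prompt: string;
}

// Repository: data access only
class AgentRepository {
  private store = new Map<string, AgentRecord>();

  async findByKey(key: string): Promise<AgentRecord | undefined> {
    return this.store.get(key);
  }

  async save(agent: AgentRecord): Promise<void> {
    this.store.set(agent.key, agent);
  }
}

// Service: business logic, delegates persistence to the repository
class AgentService {
  constructor(private repo: AgentRepository) {}

  async updateTemperature(key: string, temperature: number): Promise<void> {
    const agent = await this.repo.findByKey(key);
    if (!agent) throw new Error(`Unknown agent: ${key}`);
    await this.repo.save({ ...agent, temperature });
  }
}

// CLI: parses flags and calls the service,
// e.g. `pnpm cli agent update myagent --temperature 0.5`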

πŸš€ Quick Start

Installation

# Install dependencies
pnpm install

# Set up environment
cp .env.example .env
# Add your OpenAI or Anthropic API keys

# Initialize database (creates system agents)
pnpm db:migrate

Workflow

1. Define Your Agent

# Create a custom agent with your prompt and configuration
pnpm cli agent create myagent \
  --name "My Custom Agent" \
  --type scorer \
  --model gpt-4o \
  --temperature 0.7 \
  --prompt "Your custom prompt here"

# Create agent with optimized evaluator configuration
pnpm cli agent create scorer \
  --name "Scoring Agent" \
  --prompt "Rate the following: {{input}}" \
  --evaluator-target "score" \
  --evaluator-strategy "numeric"

2. Run Your Agent

# Run with text input (agent required)
pnpm cli run "Your content to process" --agent myagent

# Or with JSON input
pnpm cli run --input-file input.json --agent myagent

# Save output for assessment
pnpm cli run "Your content" --agent myagent --output-file results.json

3. Assess & Build Dataset

# List runs pending assessment
pnpm cli assess pending

# Add assessments
pnpm cli assess add <runId> correct
pnpm cli assess add <runId> incorrect --score 0.8

# Build evaluation dataset from assessments
pnpm cli dataset build

4. Optimize Your Agent

# Run iterative improvement (automated flow)
pnpm cli improve myagent --iterations 10 --target 0.9

# Run parameter exploration
pnpm cli improve myagent --explore --iterations 5

# Evaluate agent performance
pnpm cli eval myagent

# View optimization history
pnpm cli improve stats

Programmatic Usage

import { ImprovementService } from './services/improvement.service.js';

// `db` is your initialized database connection (Drizzle ORM)
const service = new ImprovementService(db);
await service.initialize();

// Run iterative optimization
const result = await service.runIterativeOptimization({
  baseConfigKey: 'default',
  targetScore: 0.9,
  maxIterations: 10,
  evaluationStrategy: 'hybrid',
  enableResearch: true,
  verbose: true
});

console.log(`Evolution complete! Performance improved by ${result.result.totalImprovement * 100}%`);

πŸ”§ Framework Integration

Current Implementation (Mastra)

Evolve currently uses Mastra for agent orchestration, but the architecture is designed for framework independence:

// Current Mastra integration
import { Agent } from '@mastra/core';

// Future: Framework adapters
import { MastraAdapter } from '@evolve/mastra';
import { LangChainAdapter } from '@evolve/langchain';
import { CustomAdapter } from '@evolve/custom';

Planned Adapters

  • LangChain - Complete LangChain.js support
  • Vercel AI SDK - Integration with Vercel's AI toolkit
  • AutoGPT - Enhance autonomous agents
  • CrewAI - Multi-agent crew optimization
  • Custom - Build your own adapter

πŸ“š Core Concepts

Evolution Strategies

  1. Prompt Evolution - Iteratively refine prompts for better outputs
  2. Parameter Tuning - Optimize temperature, tokens, and other parameters (see the grid-search sketch after this list)
  3. Model Selection - Automatically choose the best model for the task
  4. Knowledge Integration - Incorporate external research and best practices
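
As a concrete illustration of parameter tuning, the grid-search mode amounts to enumerating parameter combinations and keeping the best-scoring one. A minimal sketch, assuming an evaluate callback that scores one configuration against your dataset (the real Grid Search Service is more elaborate):

// Minimal grid-search sketch: try every combination, keep the best score.
// `evaluate` stands in for a real evaluation run against the dataset.
type Params = { model: string; temperature: number };

async function gridSearch(
  evaluate: (p: Params) => Promise<number>
): Promise<{ best: Params; score: number }> {
  const models = ['gpt-4o', 'gpt-4o-mini'];
  const temperatures = [0.0, 0.3, 0.7, 1.0];

  let best: Params = { model: models[0], temperature: temperatures[0] };
  let bestScore = -Infinity;

  for (const model of models) {
    for (const temperature of temperatures) {
      const score = await evaluate({ model, temperature });
      if (score > bestScore) {
        bestScore = score;
        best = { model, temperature };
      }
    }
  }
  return { best, score: bestScore };
}

Each combination costs a full evaluation pass, which is why exploration runs are capped (for example, --iterations 5 in the commands above).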

Evaluation Methods

Choose how to measure agent performance:

  • Numeric Scoring - Traditional accuracy metrics (RMSE, MAE, correlation)
  • Fact-Based - Validate factual correctness and completeness
  • Hybrid - Combine multiple evaluation approaches
  • Custom - Define your own evaluation criteria

Evaluator Configuration (Required)

⚠️ Important: Evaluator configuration is now required for all agents to prevent unexpected LLM costs during evaluation.

Every agent must specify which field to evaluate and how:

# Numeric field comparison (efficient)
--evaluator-target "score" --evaluator-strategy "numeric"

# Nested field comparison
--evaluator-target "result.confidence" --evaluator-strategy "numeric"

# Exact match comparison
--evaluator-target "status" --evaluator-strategy "exact"

# LLM-based comparison (for text fields - use sparingly due to cost)
--evaluator-target "summary" --evaluator-strategy "llm"

# Auto strategy (detects type automatically)
--evaluator-target "value" --evaluator-strategy "auto"

Benefits:

  • No surprise costs: Prevents accidental expensive LLM Judge usage
  • 50-80% reduction in LLM calls for numeric evaluations
  • 10x faster evaluation for numeric/exact comparisons
  • Explicit intent: Forces conscious decision about evaluation strategy

Migration for existing agents:

# Update an existing agent
pnpm cli agent update <agent-key> \
  --evaluator-target "<field-name>" \
  --evaluator-strategy "numeric"

Multi-Agent Collaboration

Specialized agents working together:

  • Evaluation Agent - Assesses current performance with pluggable strategies
  • Research Agent - Finds improvement strategies from knowledge sources
  • Optimization Agent - Implements enhancements based on research
  • Flow Orchestrator - Coordinates the evolution process

πŸ“‹ Commands

Scoring & Evaluation

# Run agent
pnpm cli run "Your content" --agent myagent

# With structured JSON input
pnpm cli run --input-file examples/input.json --agent myagent

# Save output to file
pnpm cli run "Your content" --agent myagent --output-file results.json

# Run with ground truth collection
pnpm cli run "Your content" --agent myagent --collect

# Evaluate agent performance
pnpm cli eval myagent

Assessment & Dataset Management

# List pending assessments
pnpm cli assess pending

# Add assessment
pnpm cli assess add <runId> correct
pnpm cli assess add <runId> incorrect --score 0.7

# Build dataset from assessments
pnpm cli dataset build

# Export dataset
pnpm cli dataset export -o dataset.json

Agent Management

# List all agents (system and user-defined)
pnpm cli agent list

# Create new agent with inline prompt
pnpm cli agent create myagent \
  --name "My Agent" \
  --type scorer \
  --model gpt-4o \
  --temperature 0.3 \
  --prompt "Your evaluation prompt here"

# Create agent with prompt from file
pnpm cli agent create myagent \
  --name "My Agent" \
  --type scorer \
  --model gpt-4o \
  --temperature 0.3 \
  --prompt-file prompts/my-prompt.txt


# View agent details
pnpm cli agent show myagent

# Update existing agent
pnpm cli agent update myagent --temperature 0.5

# Delete agent
pnpm cli agent delete myagent

Optimization

# Run iterative improvement (automated flow)
pnpm cli improve myagent --iterations 10 --target 0.9

# Run parameter exploration
pnpm cli improve myagent --explore --iterations 5

# Evaluate agent performance
pnpm cli eval myagent

# Analyze prompt performance
pnpm cli improve analyze v1

πŸ§ͺ Evaluation Strategies

Registering Custom Evaluators

class DomainSpecificEvaluator implements EvaluationStrategy {
  name = 'domain-specific';
  type = 'custom' as const;
  
  async evaluate(predictions: any[], groundTruth: any[]): Promise<EvaluationResult> {
    // Your custom evaluation logic
    return {
      score: calculateScore(predictions, groundTruth), // calculateScore: your own metric
      metrics: { /* custom metrics */ },
      details: [ /* evaluation details */ ]
    };
  }
  
  generateFeedback(result: EvaluationResult): DetailedFeedback {
    // Generate domain-specific feedback
    return {
      summary: 'Custom evaluation results',
      strengths: [],
      weaknesses: [],
      actionItems: []
    };
  }
  
  isApplicable(context: EvaluationContext): boolean {
    return context.dataType === 'domain-specific';
  }
}

// Register and use
registry.register(new DomainSpecificEvaluator());
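
The isApplicable hook is what keeps the registry pluggable: given an evaluation context, the registry can pick out a strategy that matches the dataset at hand, so a custom evaluator only needs to be registered once rather than wired into each command.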

πŸ“Š Database Schema

  • runs: Agent execution records with scores and reasoning
  • assessments: Human/LLM assessments of scoring accuracy
  • eval_datasets: Training/evaluation datasets built from assessments
  • configs: Agent configurations (model, temperature, prompts)
  • prompts: Prompt templates and variations
  • optimization_history: Evolution tracking and checkpoints

πŸ”¬ Advanced Features

Pattern Recognition

// Analyze patterns that persist across iterations
const patterns = patternAnalyzer.getPersistentPatterns(/* minIterations */ 3);

// Get improvement suggestions from patterns
const improvements = patternAnalyzer.suggestImprovements(patterns);

Checkpoint & Recovery

// Save evolution state
const checkpoint = await evolve.saveCheckpoint();

// Resume from checkpoint
const result = await orchestrator.resumeFromState(checkpoint);

Research Integration

// Add custom knowledge sources
researcher.addKnowledgeSource({
  name: 'Internal Knowledge Base',
  query: async (topic) => await searchKnowledgeBase(topic)
});

πŸ› οΈ Development

# Build TypeScript
pnpm build

# Run tests
pnpm test

# Open database studio
pnpm db:studio

# Generate migrations
pnpm db:generate

# Run in development
pnpm dev <command>

πŸ“ˆ Roadmap

Phase 1: Core Evolution Engine βœ…

  • βœ… Iterative optimization flow
  • βœ… Multi-agent collaboration
  • βœ… Pattern recognition
  • βœ… Pluggable evaluation

Phase 2: Framework Independence (Q2 2024)

  • πŸ”„ Abstract agent interface
  • πŸ”„ LangChain adapter
  • πŸ”„ Vercel AI SDK support
  • πŸ”„ Framework detection

Phase 3: Advanced Intelligence (Q3 2024)

  • πŸ“‹ AutoML for configurations
  • πŸ“‹ Cross-agent learning
  • πŸ“‹ Real-time adaptation
  • πŸ“‹ Production monitoring

Phase 4: Ecosystem (Q4 2024)

  • πŸ“‹ Evaluator marketplace
  • πŸ“‹ Community patterns
  • πŸ“‹ Cloud platform
  • πŸ“‹ Enterprise features


🀝 Contributing

We welcome contributions! Whether you're:

  • Adding framework support
  • Creating custom evaluators
  • Improving optimization strategies
  • Enhancing documentation

See CONTRIBUTING.md for guidelines.

πŸ“„ License

MIT

πŸ™ Acknowledgments

Built with inspiration from:

  • AutoGPT for autonomous agent concepts
  • LangChain for agent framework patterns
  • Mastra for current implementation
  • The broader AI/ML community

Evolve - Because static agents are yesterday's technology. Let your AI grow smarter with every interaction.

Currently implemented with Mastra, evolving to support all TypeScript agent frameworks!
