An intelligent self-learning AI agent system that addresses the fixed knowledge cutoffs of AI models in rapidly evolving domains by creating autonomous agents that can:
- Generate learning curricula using OpenAI's Deep Research API
- Create training data through intelligent research and question generation
- Self-improve via Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO)
- Evaluate performance and adapt learning strategies
- Revise curricula based on performance gaps and mastery
- Track learning history across multiple iterations
I came across this problem when trying to use models like Claude Sonnet 4 and GPT-4.1 to code AI agents. Agent engineering is a rapidly evolving field, so the models didn't even know about newer models like o3, let alone the current best practices for building AI agents. Beyond overcoming the fixed knowledge cutoffs of models like GPT-4.1, this approach also yields plug-and-play APIs with highly specialized knowledge of a particular domain.
The system is particularly valuable for rapidly evolving domains where traditional models quickly become outdated.
```mermaid
graph TD
    A[Domain Input] --> B[Curriculum Generation]
    B --> C[Training Data Generation]
    C --> D[Supervised Fine-Tuning]
    D --> E[SFT Evaluation]
    E --> F[DPO Training]
    F --> G[DPO Evaluation]
    G --> H[Curriculum Revision]
    H --> I{Continue Learning?}
    I -->|Yes| C
    I -->|No| J[Finalize Session]
    K[Deep Research API] --> B
    K --> C
    K --> H
    L[OpenAI Fine-Tuning] --> D
    L --> F
    M[Historical Learning] --> H
```
- **Deep Research Client**: Leverages OpenAI's o3-deep-research for intelligent curriculum generation
- **Training Data Generator**: Creates diverse Q&A pairs with parallel processing (50 concurrent requests)
- **Fine-Tuner**: Manages OpenAI fine-tuning jobs with SFT and DPO methods
- **Evaluator**: Comprehensive model evaluation with category-based analysis
- **DPO Engine**: Improves models using Direct Preference Optimization on incorrect answers
- **Curriculum Reviser**: Adapts learning paths based on performance with historical context
- **LangGraph Orchestrator**: Manages the complete workflow with state persistence
- Python 3.11+
- OpenAI API key with fine-tuning access
- OpenAI Deep Research API access
- LangSmith account (optional, for monitoring)
- Clone the repository

  ```bash
  git clone <repository-url>
  cd ALAS
  ```

- Install dependencies

  ```bash
  pip install -r requirements.txt
  ```

- Set up environment variables

  ```bash
  # Create .env file
  OPENAI_API_KEY=your_openai_api_key_here
  LANGSMITH_API_KEY=your_langsmith_api_key_here
  LANGSMITH_PROJECT=autonomous-learning-agent
  LANGSMITH_TRACING_V2=true
  ```
```bash
python demo_autonomous_learning.py
```

This will run a complete 3-iteration learning cycle with cost estimates and progress tracking.
```python
import asyncio
from src.workflows.autonomous_learning_agent import create_autonomous_learning_agent

async def main():
    # Create agent with 5 iterations max
    agent = create_autonomous_learning_agent(max_iterations=5)

    # Run autonomous learning
    result = await agent.run(
        domain="Machine Learning Research",
        session_id="my_session_001"
    )

    print(f"Completed {len(result['iterations'])} iterations")
    print(f"Final model: {result['current_dpo_model_id']}")

asyncio.run(main())
```

```python
# Generate curriculum
from src.core.deep_research_client import create_deep_research_client

client = create_deep_research_client()
curriculum = await client.generate_curriculum(
    domain="Python Programming",
    learning_goals=["Master fundamentals", "Build projects"]
)

# Generate training data
from src.core.training_data_generator import create_training_data_generator

generator = create_training_data_generator()
training_data = await generator.generate_curriculum_training_data(curriculum)

# Fine-tune model
from src.core.fine_tuner import create_fine_tuner

fine_tuner = create_fine_tuner()
result = fine_tuner.fine_tune_from_file(
    training_file_path="training_data.jsonl",
    model="gpt-4.1-2025-04-14"
)
```

ALAS includes full LangGraph Studio support for visual workflow management.
- Install LangGraph CLI

  ```bash
  pip install -U "langgraph-cli[inmem]"
  ```

- Start LangGraph Studio

  ```bash
  langgraph dev
  ```

- Open in browser

  ```
  http://localhost:2024
  ```
The workflow will be available as the "agent" graph with full visual debugging and state inspection.
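For reference, a `langgraph.json` along these lines is what registers the workflow with the CLI. The module path and the exported variable name (`graph`) shown here are assumptions based on the repository layout, not verified contents of the file:

```json
{
  "dependencies": ["."],
  "graphs": {
    "agent": "./src/workflows/autonomous_learning_agent.py:graph"
  },
  "env": ".env"
}
```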
Each learning iteration follows this pattern:
- **Curriculum Generation**: Create or revise learning topics based on domain and performance
- **Training Data Creation**: Generate diverse Q&A pairs using Deep Research API
- **Supervised Fine-Tuning**: Train model on correct examples
- **SFT Evaluation**: Test model performance and identify weak areas
- **DPO Training**: Improve model using preference optimization on incorrect answers
- **DPO Evaluation**: Re-evaluate improved model
- **Curriculum Revision**: Update learning plan based on results
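As a runnable sketch, one iteration of this loop might look like the following. Every helper here is a hypothetical stub standing in for the real components (Deep Research client, fine-tuner, evaluator), so the example runs end to end without any API calls:

```python
import asyncio
import random

# Hypothetical stubs for the real components; they return dummy values
# so this sketch runs with no API access.
async def generate_training_data(topics):
    return [{"question": f"About {t}?", "topic": t} for t in topics]

async def fine_tune(data, method):
    return f"ft:{method}:demo"

async def evaluate(model_id, topics):
    return {t: random.random() for t in topics}

MASTERY_THRESHOLD = 0.9  # mirrors LearningSettings.mastery_threshold

async def run_iteration(topics):
    # Step 1 (curriculum) is the `topics` input here
    data = await generate_training_data(topics)        # 2. Q&A pairs
    sft_model = await fine_tune(data, method="sft")    # 3. SFT
    sft_scores = await evaluate(sft_model, topics)     # 4. SFT evaluation
    weak = [t for t, acc in sft_scores.items() if acc < MASTERY_THRESHOLD]
    dpo_data = [d for d in data if d["topic"] in weak]
    dpo_model = await fine_tune(dpo_data, method="dpo")  # 5. DPO on weak topics
    dpo_scores = await evaluate(dpo_model, topics)       # 6. Re-evaluation
    # 7. Revision: carry unmastered topics into the next iteration
    return [t for t, acc in dpo_scores.items() if acc < MASTERY_THRESHOLD]

print(asyncio.run(run_iteration(["agents", "rag", "dpo"])))
```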
ALAS maintains a `learned_topics.json` file that tracks:
- All topics mastered across iterations
- Accuracy scores and improvement over time
- Learning progression and iteration metadata
This enables the system to:
- Avoid repeating mastered topics
- Build advanced curricula on solid foundations
- Provide comprehensive learning context
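A minimal sketch of how the tracker might be consumed when planning the next curriculum; the schema and field names (`topics`, `topic_name`, `mastered`) are assumptions for illustration, not the repo's documented format:

```python
import json

# Load the historical tracker (schema assumed, see note above)
with open("learned_topics.json") as f:
    history = json.load(f)

mastered = {t["topic_name"] for t in history.get("topics", []) if t.get("mastered")}

# Drop anything already mastered so the next curriculum builds on it
# rather than repeating it
candidates = ["LangGraph state machines", "DPO data curation", "Python Basics"]
next_topics = [t for t in candidates if t not in mastered]
print(next_topics)
```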
The evaluation system provides detailed analysis:
```json
{
  "overall_accuracy": 0.85,
  "category_performance": {
    "Factual Recall": {"accuracy": 0.9},
    "Conceptual Understanding": {"accuracy": 0.8},
    "Application": {"accuracy": 0.85}
  },
  "topic_results": [
    {
      "topic_name": "Python Basics",
      "accuracy": 0.95,
      "mastered": true
    }
  ]
}
```

Approximate cost per component, per iteration:

- Curriculum Generation: ~$0.10
- Training Data Generation: ~$0.50
- Supervised Fine-Tuning: ~$20-50
- Model Evaluation: ~$0.20
- DPO Fine-Tuning: ~$15-30
- Curriculum Revision: ~$0.10
Total per iteration: ~$35-80
3-iteration cycle: ~$110-240
- Parallel processing for faster execution (see the sketch after this list)
- Configurable iteration limits
- Smart curriculum revision to avoid redundant learning
- Efficient evaluation with targeted question generation
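The parallel processing called out above can be implemented as a semaphore-bounded `asyncio.gather`. The sketch below illustrates the pattern using the official `openai` package's `AsyncOpenAI` client; it is an illustration of the technique, not the repo's actual generator code:

```python
import asyncio

from openai import AsyncOpenAI

client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment
semaphore = asyncio.Semaphore(50)  # mirrors max_concurrent_requests

async def answer(question: str) -> str:
    # The semaphore caps in-flight requests at 50 while gather()
    # keeps the pipeline full up to that cap
    async with semaphore:
        response = await client.chat.completions.create(
            model="gpt-4.1-mini-2025-04-14",
            messages=[{"role": "user", "content": question}],
        )
        return response.choices[0].message.content

async def answer_all(questions: list[str]) -> list[str]:
    return await asyncio.gather(*(answer(q) for q in questions))

# Example: asyncio.run(answer_all(["What is DPO?"] * 200))
```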
Key configuration options in `src/config/settings.py`:

```python
class LearningSettings:
    max_iterations: int = 5
    topics_per_iteration: int = 10
    questions_per_topic: int = 10
    evaluation_threshold: float = 0.7
    mastery_threshold: float = 0.9
    max_concurrent_requests: int = 50
```

Supported models for fine-tuning:

- `gpt-4.1-2025-04-14` (recommended)
- `gpt-4.1-mini-2025-04-14` (cost-effective)
- `gpt-4.1-nano-2025-04-14` (fast iterations)
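To make the two thresholds concrete, here is a minimal sketch of how they could partition evaluated topics after each iteration; the triage policy itself is an assumption, not code from the repo:

```python
EVALUATION_THRESHOLD = 0.7  # below this, a topic feeds DPO training
MASTERY_THRESHOLD = 0.9     # at or above this, a topic is retired

def triage(topic_accuracy: dict[str, float]) -> dict[str, list[str]]:
    # Assumed policy: mastered topics are retired, weak ones feed DPO,
    # and the middle band stays in the curriculum for another pass
    return {
        "mastered": [t for t, a in topic_accuracy.items() if a >= MASTERY_THRESHOLD],
        "needs_dpo": [t for t, a in topic_accuracy.items() if a < EVALUATION_THRESHOLD],
        "keep_training": [t for t, a in topic_accuracy.items()
                          if EVALUATION_THRESHOLD <= a < MASTERY_THRESHOLD],
    }

print(triage({"Python Basics": 0.95, "Decorators": 0.65, "Asyncio": 0.8}))
```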
```
ALAS/
├── src/
│   ├── core/                            # Core learning components
│   │   ├── deep_research_client.py      # OpenAI Deep Research integration
│   │   ├── training_data_generator.py   # Question & answer generation
│   │   ├── fine_tuner.py                # OpenAI fine-tuning wrapper
│   │   ├── evaluator.py                 # Model evaluation system
│   │   ├── dpo_improvement.py           # DPO training implementation
│   │   └── curriculum_revision.py       # Performance-based curriculum updates
│   ├── workflows/                       # LangGraph workflows
│   │   └── autonomous_learning_agent.py # Main orchestration workflow
│   └── config/                          # Configuration
│       └── settings.py
├── data/                                # Generated data storage
│   ├── curricula/                       # Learning curricula
│   ├── training_data/                   # Generated Q&A pairs
│   ├── evaluations/                     # Model evaluation results
│   └── sessions/                        # Session summaries
├── demo_autonomous_learning.py          # Demo script
├── langgraph.json                       # LangGraph Studio configuration
└── learned_topics.json                  # Historical learning tracker
```
Run individual component tests:
```bash
# Test Deep Research API
python test_deep_research.py

# Test curriculum revision
python test_curriculum_revision.py

# Test DPO improvement
python test_dpo_improvement.py

# Test evaluation system
python test_evaluation.py
```

More things to add:
- Add fine-tuning support for DeepSeek R1 and let the agent experiment with hyperparameters
- Set up periodic jobs to stay up to date with the latest information
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
- Inspired by the SEAL paper on self-adapting language models
- Built with OpenAI's APIs and LangGraph
- Uses LangSmith for monitoring and observability
Built with ❤️ for autonomous AI learning