Diversity-Driven ML-Agent with Reflective Genetic Programming

🚀 Overview

This project implements a Reflective Genetic Programming (GP) agent for autonomous machine learning within the ML-Master framework (https://github.com/sjtu-sai-agents/ML-Master).

Inspired by ReEvo (https://github.com/ai4co/reevo), we define the genetic operators as follows:

Crossover: Short-term memory consolidation from current population
Mutation: Long-term memory recall via global best solutions

On three MLE-bench tasks, the GP agent outperforms the MCTS baseline, showing stronger exploration and greater resistance to premature convergence.

📊 Key Features

🧬 Reflective Genetic Programming

Population-based evolution with intelligent LLM-driven operators
Crossover operator combines strengths of two high-performing parents (short-term memory)
Mutation operator injects insights from global best solution (long-term memory)
Elitism strategy preserves best individuals across generations
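As a rough illustration of one generational step with elitism, the sketch below copies the top `n_elite` individuals unchanged and fills the rest of the population with offspring. It is a minimal sketch, not the code in agent/gp_agent.py: `Individual`, `next_generation`, and the `make_offspring` callback (a stand-in for the LLM-driven crossover and mutation) are hypothetical names, and fitness is assumed to be a validation score where higher is better.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Individual:
    code: str       # candidate solution (Python source)
    fitness: float  # validation score; higher is better (assumption)


def next_generation(
    population: List[Individual],
    make_offspring: Callable[[List[Individual], random.Random], Individual],
    n_elite: int,
    rng: random.Random,
) -> List[Individual]:
    """Return a new population of the same size with the n_elite best kept unchanged."""
    ranked = sorted(population, key=lambda ind: ind.fitness, reverse=True)
    elites = ranked[:n_elite]  # elitism: best individuals survive as-is
    # Hypothetical hook: the real agent would call the LLM crossover/mutation here.
    children = [
        make_offspring(population, rng) for _ in range(len(population) - n_elite)
    ]
    return elites + children
```

Because the elites are copied verbatim, the best fitness in the population can never decrease from one generation to the next.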

📈 Comprehensive Diversity Metrics

Inspired by HSEvo, we track population diversity throughout evolution:

SWDI (Shannon-Wiener Diversity Index): Measures instantaneous population diversity using hierarchical clustering
CDI (Cumulative Diversity Index): Evaluates overall exploration via Minimum Spanning Tree analysis
Semantic embeddings from a fine-tuned CodeT5+ model (codet5p-110m-embedding) for assessing code similarity
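The stdlib-only sketch below shows one way to compute both metrics from a list of code embeddings. It is written in the spirit of HSEvo and is not taken from utils/diversity_utils.py: the function names, the cosine-distance threshold of 0.1, the single-linkage clustering, and the normalization of SWDI by log(N) are all assumptions. CDI is taken here as the Shannon entropy of the normalized edge lengths of a Euclidean minimum spanning tree built with Prim's algorithm; the repository's exact formula may differ.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def single_linkage_labels(embs: List[Vector], threshold: float) -> List[int]:
    """Cluster labels from a single-linkage dendrogram cut at `threshold`.

    Implemented as connected components (union-find) of the graph that links
    every pair whose cosine distance is <= threshold.
    """
    parent = list(range(len(embs)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(embs)):
        for j in range(i + 1, len(embs)):
            if cosine_distance(embs[i], embs[j]) <= threshold:
                parent[find(i)] = find(j)
    return [find(i) for i in range(len(embs))]


def swdi(embs: List[Vector], threshold: float = 0.1) -> float:
    """Shannon-Wiener index of cluster sizes, normalized by log(N) (assumption)."""
    n = len(embs)
    if n < 2:
        return 0.0
    counts: Dict[int, int] = {}
    for label in single_linkage_labels(embs, threshold):
        counts[label] = counts.get(label, 0) + 1
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return entropy / math.log(n)


def cdi(embs: List[Vector]) -> float:
    """Entropy of normalized Euclidean MST edge lengths (assumed definition)."""
    n = len(embs)
    if n < 2:
        return 0.0
    in_tree = [False] * n
    dist = [math.inf] * n  # cheapest known edge from the tree to each vertex
    dist[0] = 0.0
    edge_lengths: List[float] = []
    for step in range(n):
        # Prim's algorithm: attach the closest vertex not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: dist[i])
        in_tree[u] = True
        if step > 0:
            edge_lengths.append(dist[u])
        for v in range(n):
            if not in_tree[v]:
                dist[v] = min(dist[v], math.dist(embs[u], embs[v]))
    total = sum(edge_lengths)
    if total == 0.0:
        return 0.0
    return -sum((d / total) * math.log(d / total) for d in edge_lengths if d > 0)
```

Cutting a single-linkage dendrogram at a distance threshold yields the same clusters as taking the connected components of the threshold graph, which is why a single union-find pass is enough. A population of identical solutions scores 0 on SWDI, while one in which every solution sits in its own cluster scores 1.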

🎯 Methodology

Evolutionary Operators

Our GP agent uses LLMs to perform semantic evolution on Python code, inspired by the ReEvo framework for Automatic Heuristic Design:

Crossover: Short-Term Memory

Selects two parents from the current population via tournament selection
The LLM analyzes why Parent A (the fitter parent) outperforms Parent B
Generates offspring that combine the strengths of both parents
Exploits the immediate, local context of the search frontier
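A minimal sketch of the selection half of this operator, assuming fitness is a scalar validation score (higher is better). `tournament_select`, `select_parents`, and `crossover_prompt` are hypothetical names, and the prompt text is illustrative only; the agent's real prompts and LLM call are not reproduced here.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Individual:
    code: str
    fitness: float  # validation score; higher is better (assumption)


def tournament_select(pop: List[Individual], k: int, rng: random.Random) -> Individual:
    """Draw k distinct individuals uniformly and return the fittest."""
    contestants = rng.sample(pop, k)
    return max(contestants, key=lambda ind: ind.fitness)


def select_parents(
    pop: List[Individual], k: int, rng: random.Random
) -> Tuple[Individual, Individual]:
    """Run two tournaments for distinct parents; return them as (A = better, B = worse)."""
    first = tournament_select(pop, k, rng)
    rest = [ind for ind in pop if ind is not first]
    second = tournament_select(rest, min(k, len(rest)), rng)
    return (first, second) if first.fitness >= second.fitness else (second, first)


def crossover_prompt(parent_a: Individual, parent_b: Individual) -> str:
    """Hypothetical reflective-crossover prompt; wording is illustrative only."""
    return (
        f"Solution A (score {parent_a.fitness:.4f}) outperforms "
        f"Solution B (score {parent_b.fitness:.4f}).\n"
        "1. Explain briefly why A performs better than B.\n"
        "2. Write a new solution that combines the strengths of both.\n\n"
        f"### Solution A\n{parent_a.code}\n\n### Solution B\n{parent_b.code}\n"
    )
```

Larger tournament sizes `k` increase selection pressure toward the current best; `k = 1` reduces to uniform random selection.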

Mutation: Long-Term Memory

Pairs the current individual with the global best solution
The LLM incorporates insights from this historical breakthrough
Prevents the population from forgetting globally successful patterns
Acts as a form of elitism, preserving elite knowledge
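Long-term memory can be modeled as a record of the best individual ever evaluated, which then seeds the mutation prompt. The sketch below assumes that design; `LongTermMemory` and `mutation_prompt` are hypothetical names, and the prompt wording is illustrative.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Individual:
    code: str
    fitness: float  # validation score; higher is better (assumption)


class LongTermMemory:
    """Remembers the best individual ever evaluated, across all generations."""

    def __init__(self) -> None:
        self.best: Optional[Individual] = None

    def update(self, evaluated: Iterable[Individual]) -> None:
        for ind in evaluated:
            if self.best is None or ind.fitness > self.best.fitness:
                self.best = ind


def mutation_prompt(current: Individual, memory: LongTermMemory) -> str:
    """Hypothetical prompt pairing `current` with the global best; wording is illustrative."""
    if memory.best is None:
        raise ValueError("long-term memory is empty; evaluate some individuals first")
    best = memory.best
    return (
        f"The best solution found so far scores {best.fitness:.4f}; "
        f"the current solution scores {current.fitness:.4f}.\n"
        "Identify the ideas that make the best solution succeed and apply them "
        "to improve the current solution.\n\n"
        f"### Best solution so far\n{best.code}\n\n### Current solution\n{current.code}\n"
    )
```

Because the memory persists across generations, a strong solution found early keeps informing mutations even after it has been displaced from the current population.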

🚀 Quick Start

🛠️ Environment Setup

Prerequisites

First, install the MLE-Bench environment following the official instructions.

git clone https://github.com/yourusername/ML-Master-GP.git
cd ML-Master-GP
conda create -n ml-master-gp python=3.12
conda activate ml-master-gp

# Install MLE-Bench (follow their README)
# Then install additional requirements
pip install -r requirements.txt

Install CodeT5 Embedding Model

For diversity metrics, download the CodeT5 embedding model:

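# Place the model in ./Salesforce/codet5p-110m-embedding/, e.g. by cloning it from
# https://huggingface.co/Salesforce/codet5p-110m-embedding (requires git-lfs):
git clone https://huggingface.co/Salesforce/codet5p-110m-embedding ./Salesforce/codet5p-110m-embedding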

📦 Download MLE-Bench Data

Download and prepare the MLE-Bench dataset following their instructions. The dataset is over 2TB.

Expected structure:

/path/to/mle-bench/<competition-name>/
└── prepared
    ├── private/
    │   └── test.csv
    └── public/
        ├── description.md
        ├── sample_submission.csv
        └── train.csv
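Before launching a run, it can help to check that each competition folder matches this layout. The helper below is a hypothetical convenience, not part of the repository; it only checks the four files listed in the tree, and real competitions may ship additional files.

```python
from pathlib import Path
from typing import List

# Files expected under <dataset_dir>/<competition-name>/prepared/, per the tree above.
EXPECTED_FILES = [
    "private/test.csv",
    "public/description.md",
    "public/sample_submission.csv",
    "public/train.csv",
]


def missing_files(dataset_dir: str, competition: str) -> List[str]:
    """Return the expected files that are absent for one competition (empty list = OK)."""
    prepared = Path(dataset_dir) / competition / "prepared"
    return [rel for rel in EXPECTED_FILES if not (prepared / rel).is_file()]
```

For example, `missing_files("/path/to/mle-bench", "nomad2018-predict-transparent-conductors")` should return `[]` once the competition has been prepared.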

🧠 Configure LLM APIs

Set your API credentials in run.sh:

# DeepSeek config (for code generation)
code_model=deepseek-v3
code_temp=0.5
code_base_url="your_base_url"
code_api_key="your_api_key"

# GPT config (for evaluation feedback)
feedback_model=gpt-4o-2024-08-06
feedback_temp=0.5
feedback_base_url="your_base_url"
feedback_api_key="your_api_key"

# Dataset and experiment config
EXP_ID=nomad2018-predict-transparent-conductors
dataset_dir=/path/to/mle-bench

▶️ Run the GP Agent

Start the grading server (validates submissions):

bash launch_server.sh

Run the GP agent:

bash run.sh

For MCTS baseline comparison:

python main_mcts.py --exp_id nomad2018-predict-transparent-conductors \
    --dataset_dir /path/to/mle-bench

Results will be saved in:

./logs/ - Execution logs and diversity metrics
./working/ - Generated code solutions

🔧 Implementation Details

Critical Bug Fixes

During development, we resolved two critical stability issues:

Issue 1: Multi-threaded I/O Error

Problem: print() calls in multi-threaded code raised OSError: [Errno 5] Input/output error.

Solution: Replaced all print() calls with the thread-safe logging module.
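A minimal sketch of the pattern, assuming a shared logger used by several worker threads. `logging` handlers serialize each record under a per-handler lock, so lines from different threads do not interleave. The logger name, format string, and the `worker`/`run_workers` helpers are illustrative, not the repository's code.

```python
import io
import logging
import threading


def make_logger(stream: io.TextIOBase) -> logging.Logger:
    """Logger with one stream handler; the name and format are illustrative."""
    logger = logging.getLogger("gp_agent_demo")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep records out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(threadName)s %(levelname)s %(message)s"))
    logger.handlers[:] = [handler]  # replace handlers so repeated calls don't duplicate
    return logger


def worker(logger: logging.Logger, worker_id: int, n_steps: int) -> None:
    for step in range(n_steps):
        # Instead of print(...): each record is written under the handler's lock.
        logger.info("worker %d finished step %d", worker_id, step)


def run_workers(logger: logging.Logger, n_threads: int = 4, n_steps: int = 50) -> None:
    threads = [
        threading.Thread(target=worker, args=(logger, i, n_steps), name=f"worker-{i}")
        for i in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

In practice the handler would typically point at a log file (`logging.FileHandler`) rather than an in-memory buffer.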

Issue 2: Interpreter Resource Leak

Problem: An exception during execution left interpreter slots permanently occupied, eventually leading to deadlock.

Solution: Wrapped execution in try/finally blocks so that slots are always released:

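try:
    # Execution logic (run the generated code in this interpreter slot)
    ...
finally:
    # Always runs, even if execution raised: force release of the slot
    with self.lock:
        # 1 = slot busy, 0 = slot free; the check prevents a double release
        if self.status_map[process_id] == 1:
            self.status_map[process_id] = 0
            self.current_parallel_run -= 1
    # Tear down the interpreter session outside the lock
    self.cleanup_session(process_id=process_id)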
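The fragment above depends on the interpreter class. The self-contained sketch below reproduces the same release-in-`finally` pattern with a toy slot table that reuses the fragment's `lock`, `status_map`, and `current_parallel_run` names. `SlotPool`, `acquire`, `release`, and `run` are hypothetical, the 0/1 status encoding is inferred from the fragment, and session cleanup (`cleanup_session`) is omitted.

```python
import threading
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


class SlotPool:
    """Toy interpreter slot table: 0 = free, 1 = busy (assumed encoding)."""

    def __init__(self, n_slots: int) -> None:
        self.lock = threading.Lock()
        self.status_map: Dict[int, int] = {i: 0 for i in range(n_slots)}
        self.current_parallel_run = 0

    def acquire(self) -> int:
        with self.lock:
            for pid, status in self.status_map.items():
                if status == 0:
                    self.status_map[pid] = 1
                    self.current_parallel_run += 1
                    return pid
        raise RuntimeError("no free interpreter slot")

    def release(self, process_id: int) -> None:
        with self.lock:
            if self.status_map[process_id] == 1:  # guard against double release
                self.status_map[process_id] = 0
                self.current_parallel_run -= 1

    def run(self, fn: Callable[[], T]) -> T:
        process_id = self.acquire()
        try:
            return fn()
        finally:
            # Runs on success and on exception, so the slot cannot leak.
            self.release(process_id)
```

With a two-slot pool, repeatedly running code that raises still leaves both slots free afterwards; without the `finally`, two failures would exhaust the pool.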

📊 Diversity Analysis Tools

Extract and visualize diversity metrics:

# Extract diversity metrics from logs
python extract_diversity.py --log_dir ./logs/rungpnomad1

# Plot diversity evolution
python extract_and_plot.py --gp_log ./logs/rungpnomad1 \
    --mcts_log ./logs/runnomad

# Compare code similarity between runs
python compare_similarity.py --log1 ./logs/run1 --log2 ./logs/run2

📝 Project Structure

ML-Master-GP/
├── agent/
│   ├── gp_agent.py           # Genetic Programming agent
│   └── mcts_agent.py         # MCTS baseline agent
├── backend/
│   ├── backend_openai.py     # OpenAI API backend
│   └── backend_qwen.py       # Qwen API backend
├── search/
│   ├── node.py               # Solution node representation
│   └── mcts_node.py          # MCTS-specific node
├── utils/
│   ├── diversity_utils.py    # SWDI/CDI computation
│   ├── llm_caller.py         # LLM interaction utilities
│   └── config_mcts.yaml      # Configuration file
├── interpreter/
│   └── interpreter_parallel.py  # Multi-threaded code execution
├── Salesforce/
│   └── codet5p-110m-embedding/  # CodeT5 model for embeddings
├── main_mcts.py              # Entry point for GP agent
├── extract_diversity.py      # Diversity metrics extraction
├── extract_and_plot.py       # Visualization tools
├── grading_server.py         # Submission validation server
└── report.tex                # Technical report (LaTeX)

🎓 Key Insights

Why GP Outperforms MCTS

Exploration vs Exploitation: GP's population-based approach explores diverse solutions simultaneously, while MCTS tends toward depth-first local refinement
Memory Mechanisms: Crossover (short-term memory) exploits the current population, while mutation (long-term memory) retains globally successful patterns, balancing local refinement against forgetting
Diversity Maintenance: Explicit diversity metrics and injection strategies prevent premature convergence
Creative Problem-Solving: GP excels at tasks requiring innovative solutions rather than incremental improvements

Overfitting Observations

On the Nomad task, both GP and MCTS overfit the validation metric. We attribute this to the task's simplicity rather than to flaws in either algorithm; early stopping may improve final test performance.
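One common form of early stopping for a search loop is patience-based: stop once the best validation score has not improved for a fixed number of generations. The sketch below assumes that rule, a higher-is-better score, and hypothetical names (`early_stop_generation`, `patience`, `min_delta`); the repository may use a different criterion.

```python
from typing import List, Optional


def early_stop_generation(
    best_val_per_gen: List[float], patience: int, min_delta: float = 0.0
) -> Optional[int]:
    """Return the generation index at which to stop, or None to keep searching.

    Stops once the best validation score (higher is better, an assumption) has
    not improved by more than `min_delta` for `patience` consecutive generations.
    """
    best = float("-inf")
    since_improvement = 0
    for gen, score in enumerate(best_val_per_gen):
        if score > best + min_delta:
            best = score
            since_improvement = 0
        else:
            since_improvement += 1
            if since_improvement >= patience:
                return gen
    return None
```

For the score trace `[0.50, 0.60, 0.62, 0.62, 0.61, 0.62]` with `patience=3`, the search stops at generation 5, since generations 3 to 5 bring no improvement over 0.62.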

🙏 Acknowledgements

This work builds upon and is inspired by several excellent research projects:

🌲 ML-Master - Base framework for AI-for-AI agents with exploration and reasoning
💡 MLE-Bench - Benchmark of Kaggle competitions for evaluating ML engineering agents
🧬 ReEvo - LLM-driven code evolution for heuristic design
📊 HSEvo - Diversity metrics and semantic similarity analysis for evolutionary algorithms
🤖 CodeT5+ - Pre-trained code models (codet5p-110m-embedding provides the code embeddings)

📚 References

Evolutionary Computation: Eiben, A.E. and Smith, J.E., 2015. "Introduction to Evolutionary Computing" (2nd ed.). Springer.
ReEvo: Ye et al., 2024. "ReEvo: Large Language Models as Hyper-Heuristics with Reflective Evolution." NeurIPS 2024.
HSEvo: Dat et al., 2025. "HSEvo: Elevating Automatic Heuristic Design with Diversity-Driven Harmony Search and Genetic Algorithm Using LLMs." AAAI 2025.
ML-Master: Liu et al., 2025. "ML-Master: Towards AI-for-AI via Integration of Exploration and Reasoning." arXiv preprint.

📄 License

This project is released for academic research purposes. Please cite our work if you use this code:

@misc{xiang2025diversity,
  title={Diversity-Driven ML-Agent with Reflective Genetic Programming},
  author={Xiang, Chuyang},
  year={2025}
}

📧 Contact

For questions or issues, please open an issue on GitHub or contact the author.

Author: Chuyang Xiang (524031910627)
