KROX777/ML-Master-GP

Diversity-Driven ML-Agent with Reflective Genetic Programming

🚀 Overview

This project implements a Reflective Genetic Programming (GP) agent for autonomous machine learning within the ML-Master framework (https://github.com/sjtu-sai-agents/ML-Master).

Inspired by ReEvo (https://github.com/ai4co/reevo), we define the genetic operators as follows:

  • Crossover: Short-term memory consolidation from current population
  • Mutation: Long-term memory recall via global best solutions

The GP agent outperforms the MCTS baseline on three MLE-Bench tasks, showing broader exploration and greater resistance to premature convergence.

📊 Key Features

🧬 Reflective Genetic Programming

  • Population-based evolution with intelligent LLM-driven operators
  • Crossover operator combines strengths of two high-performing parents (short-term memory)
  • Mutation operator injects insights from global best solution (long-term memory)
  • Elitism strategy preserves best individuals across generations

📈 Comprehensive Diversity Metrics

Inspired by HSEvo, we track population diversity throughout evolution:

  • SWDI (Shannon-Wiener Diversity Index): Measures instantaneous population diversity using hierarchical clustering
  • CDI (Cumulative Diversity Index): Evaluates overall exploration via Minimum Spanning Tree analysis
  • Semantic embeddings from fine-tuned CodeT5 for meaningful code similarity assessment
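
The two indices can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the code in utils/diversity_utils.py: SWDI is taken as the Shannon entropy of cluster proportions after cutting a single-linkage hierarchy at a distance threshold, and CDI is taken as the entropy of normalized Minimum Spanning Tree edge lengths. The function names, the Euclidean metric, and the `threshold` parameter are hypothetical choices for the example.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def single_linkage_clusters(points, threshold):
    """Cut a single-linkage hierarchy at `threshold` (union-find over close pairs)."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if euclidean(points[i], points[j]) <= threshold:
            parent[find(i)] = find(j)
    return [find(i) for i in range(len(points))]


def swdi(points, threshold):
    """Shannon entropy of the cluster size proportions (assumed SWDI form)."""
    labels = single_linkage_clusters(points, threshold)
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def mst_edge_lengths(points):
    """Edge lengths of a Minimum Spanning Tree, built with Prim's algorithm."""
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known connection of each point to the tree
    best[0] = 0.0
    edges = []
    for step in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if step > 0:  # the root has no incoming edge
            edges.append(best[u])
        for v in range(n):
            if not in_tree[v]:
                best[v] = min(best[v], euclidean(points[u], points[v]))
    return edges


def cdi(points):
    """Entropy of normalized MST edge lengths (assumed CDI form)."""
    edges = [e for e in mst_edge_lengths(points) if e > 0]
    total = sum(edges)
    if total == 0:
        return 0.0
    return -sum((e / total) * math.log(e / total) for e in edges)
```

In practice `points` would hold one embedding vector per candidate solution. A population that collapses onto a single cluster gives an SWDI of zero, which is the premature-convergence signal these metrics are meant to expose.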

🎯 Methodology

Evolutionary Operators

Our GP agent uses LLMs to perform semantic evolution on Python code, inspired by the ReEvo framework for Automatic Heuristic Design:

Crossover: Short-Term Memory

  • Selects two parents from the current population via tournament selection
  • The LLM analyzes why Parent A outperforms Parent B
  • Generates offspring that combine the strengths of both parents
  • Exploits the immediate, local context of the search frontier

Mutation: Long-Term Memory

  • Pairs the current individual with the global best solution
  • The LLM incorporates insights from the historical breakthrough
  • Prevents the population from forgetting globally successful patterns
  • Acts as an elitism strategy that preserves elite knowledge
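
The operators above can be put together into one generation step roughly as follows. This is a hedged sketch rather than the logic of agent/gp_agent.py: the LLM calls are abstracted as the hypothetical callables `crossover`, `mutate`, and `evaluate`, and the names `Individual`, `next_generation`, `n_elite`, `mutation_rate`, and `tournament_k` are assumptions made for the example. Scores are assumed to be higher-is-better.

```python
import random
from dataclasses import dataclass


@dataclass
class Individual:
    code: str
    score: float  # assumption: higher is better


def tournament_select(population, k, rng):
    """Pick the best of k randomly sampled individuals."""
    contestants = rng.sample(population, min(k, len(population)))
    return max(contestants, key=lambda ind: ind.score)


def next_generation(population, global_best, crossover, mutate, evaluate,
                    n_elite=1, mutation_rate=0.3, tournament_k=3, rng=None):
    """Produce one new generation and the updated global best."""
    rng = rng or random.Random()
    ranked = sorted(population, key=lambda ind: ind.score, reverse=True)
    offspring = ranked[:n_elite]  # elitism: carry the best individuals over unchanged

    while len(offspring) < len(population):
        a = tournament_select(population, tournament_k, rng)
        b = tournament_select(population, tournament_k, rng)
        better, worse = (a, b) if a.score >= b.score else (b, a)
        # Crossover (short-term memory): reflect on why `better` beats `worse`.
        child_code = crossover(better, worse)
        # Mutation (long-term memory): inject insights from the global best.
        if rng.random() < mutation_rate:
            child_code = mutate(child_code, global_best)
        offspring.append(Individual(child_code, evaluate(child_code)))

    new_best = max(offspring + [global_best], key=lambda ind: ind.score)
    return offspring, new_best
```

In a real run, `crossover` and `mutate` would wrap prompts to the code model and `evaluate` would execute the candidate and return its validation metric; toy string functions are enough to exercise the control flow.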

🚀 Quick Start

🛠️ Environment Setup

Prerequisites

First, install the MLE-Bench environment following the official instructions.

git clone https://github.com/KROX777/ML-Master-GP.git
cd ML-Master-GP
conda create -n ml-master-gp python=3.12
conda activate ml-master-gp

# Install MLE-Bench (follow their README)
# Then install additional requirements
pip install -r requirements.txt

Install CodeT5 Embedding Model

For diversity metrics, download the CodeT5 embedding model:

# The model should be placed in ./Salesforce/codet5p-110m-embedding/
# Or download from: https://huggingface.co/Salesforce/codet5p-110m-embedding

📦 Download MLE-Bench Data

Download and prepare the MLE-Bench dataset following their instructions. The dataset is over 2TB.

Expected structure:

/path/to/mle-bench/<competition-name>/
└── prepared
    ├── private/
    │   └── test.csv
    └── public/
        ├── description.md
        ├── sample_submission.csv
        └── train.csv
        └── train.csv
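
Before launching a run it can help to confirm that a competition directory matches this layout. The snippet below is a small, hypothetical helper (it is not part of the repository) that reports which of the files listed above are missing. Other competitions may ship different files, so treat the list as an assumption based on the tree shown here.

```python
from pathlib import Path

# Files expected under <dataset_dir>/<competition-name>/, per the tree above.
REQUIRED_FILES = (
    "prepared/private/test.csv",
    "prepared/public/description.md",
    "prepared/public/sample_submission.csv",
    "prepared/public/train.csv",
)


def missing_files(dataset_dir, competition):
    """Return the expected files that are absent for one competition."""
    root = Path(dataset_dir) / competition
    return [rel for rel in REQUIRED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    # Placeholder path and competition name taken from the configuration example.
    missing = missing_files("/path/to/mle-bench", "nomad2018-predict-transparent-conductors")
    print("missing:", missing or "none")
```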

🧠 Configure LLM APIs

Set your API credentials in run.sh:

# DeepSeek config (for code generation)
code_model=deepseek-v3
code_temp=0.5
code_base_url="your_base_url"
code_api_key="your_api_key"

# GPT config (for evaluation feedback)
feedback_model=gpt-4o-2024-08-06
feedback_temp=0.5
feedback_base_url="your_base_url"
feedback_api_key="your_api_key"

# Dataset and experiment config
EXP_ID=nomad2018-predict-transparent-conductors
dataset_dir=/path/to/mle-bench

▢️ Run the GP Agent

Start the grading server (validates submissions):

bash launch_server.sh

Run the GP agent:

bash run.sh

For MCTS baseline comparison:

python main_mcts.py --exp_id nomad2018-predict-transparent-conductors \
    --dataset_dir /path/to/mle-bench

Results will be saved in:

  • ./logs/ - Execution logs and diversity metrics
  • ./working/ - Generated code solutions

🔧 Implementation Details

Critical Bug Fixes

During development, we resolved two critical stability issues:

Issue 1: Multi-threaded I/O Error

Problem: print() statements in multi-threaded code raised OSError: [Errno 5] Input/output error.

Solution: Replaced all print() calls with the thread-safe logging module.
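
A minimal sketch of that pattern is shown below. It is an illustration, not the repository's logging setup: the helper names `make_logger`, `worker`, and `run_workers` are hypothetical, and the handler is passed in so that output can go to a file instead of a possibly detached terminal.

```python
import logging
import threading


def make_logger(name, handler):
    """Create a logger that writes through a single shared handler."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep records out of the root logger
    handler.setFormatter(
        logging.Formatter("%(asctime)s [%(threadName)s] %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    return logger


def worker(logger, worker_id):
    # logger.info replaces what used to be a bare print() call.
    logger.info("worker %d finished", worker_id)


def run_workers(logger, n):
    """Start n threads that all log through the same logger, then wait for them."""
    threads = [
        threading.Thread(target=worker, args=(logger, i), name=f"worker-{i}")
        for i in range(n)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


if __name__ == "__main__":
    log = make_logger("gp_agent_demo", logging.FileHandler("agent_demo.log"))
    run_workers(log, 4)
```

Handlers in the logging module serialize access with an internal lock, so concurrent threads do not interleave partial lines.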

Issue 2: Interpreter Resource Leak

Problem: An exception during execution left interpreter slots permanently occupied, eventually causing a deadlock.

Solution: Implemented try-finally blocks to guarantee resource release:

try:
    # Execution logic
    ...
finally:
    # Force release of the slot
    with self.lock:
        if self.status_map[process_id] == 1:
            self.status_map[process_id] = 0
            self.current_parallel_run -= 1
    self.cleanup_session(process_id=process_id)
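
The effect of the finally block can be reproduced in a self-contained toy. The class below is a hypothetical stand-in for the parallel interpreter (the name `SlotPool` and its methods are assumptions; only `lock`, `status_map`, and `current_parallel_run` mirror the snippet above). It shows that a slot is released even when the task raises.

```python
import threading


class SlotPool:
    """Toy model of a fixed number of interpreter slots."""

    def __init__(self, max_parallel):
        self.lock = threading.Lock()
        self.status_map = {i: 0 for i in range(max_parallel)}  # 0 = free, 1 = busy
        self.current_parallel_run = 0

    def acquire(self):
        """Mark the first free slot as busy and return its id, or None if all are busy."""
        with self.lock:
            for process_id, status in self.status_map.items():
                if status == 0:
                    self.status_map[process_id] = 1
                    self.current_parallel_run += 1
                    return process_id
        return None

    def release(self, process_id):
        """Free a slot; safe to call even if the slot is already free."""
        with self.lock:
            if self.status_map[process_id] == 1:
                self.status_map[process_id] = 0
                self.current_parallel_run -= 1

    def run(self, task):
        process_id = self.acquire()
        if process_id is None:
            raise RuntimeError("no free interpreter slot")
        try:
            return task(process_id)
        finally:
            # Runs on success and on exception, so the slot never leaks.
            self.release(process_id)
```

Without the finally clause, a single failing task in a pool of size one would leave `current_parallel_run` at 1 forever and every later call would find no free slot.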

📊 Diversity Analysis Tools

Extract and visualize diversity metrics:

# Extract diversity metrics from logs
python extract_diversity.py --log_dir ./logs/rungpnomad1

# Plot diversity evolution
python extract_and_plot.py --gp_log ./logs/rungpnomad1 \
    --mcts_log ./logs/runnomad

# Compare code similarity between runs
python compare_similarity.py --log1 ./logs/run1 --log2 ./logs/run2

📁 Project Structure

ML-Master-GP/
├── agent/
│   ├── gp_agent.py          # Genetic Programming agent
│   └── mcts_agent.py         # MCTS baseline agent
├── backend/
│   ├── backend_openai.py     # OpenAI API backend
│   └── backend_qwen.py       # Qwen API backend
├── search/
│   ├── node.py               # Solution node representation
│   └── mcts_node.py          # MCTS-specific node
├── utils/
│   ├── diversity_utils.py    # SWDI/CDI computation
│   ├── llm_caller.py         # LLM interaction utilities
│   └── config_mcts.yaml      # Configuration file
├── interpreter/
│   └── interpreter_parallel.py  # Multi-threaded code execution
├── Salesforce/
│   └── codet5p-110m-embedding/  # CodeT5 model for embeddings
├── main_mcts.py              # Entry point for GP agent
├── extract_diversity.py      # Diversity metrics extraction
├── extract_and_plot.py       # Visualization tools
├── grading_server.py         # Submission validation server
└── report.tex                # Technical report (LaTeX)

🎓 Key Insights

Why GP Outperforms MCTS

  1. Exploration vs Exploitation: GP's population-based approach explores diverse solutions simultaneously, while MCTS tends toward depth-first local refinement
  2. Memory Mechanisms: Crossover (short-term) and mutation (long-term) create a balanced cognitive architecture
  3. Diversity Maintenance: Explicit diversity metrics and injection strategies prevent premature convergence
  4. Creative Problem-Solving: GP excels at tasks requiring innovative solutions rather than incremental improvements

Overfitting Observations

On the Nomad task, both GP and MCTS showed overfitting to validation metrics. This reflects the task's simplicity rather than algorithmic flaws. Early stopping can improve final test performance.
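
One simple way to apply early stopping to a search run is a patience rule on the best validation score per generation. The function below is a generic sketch, not something taken from this repository: the name `early_stop_index` and the `patience` and `min_delta` parameters are hypothetical, and scores are assumed to be higher-is-better (negate an error metric before passing it in).

```python
def early_stop_index(scores, patience=3, min_delta=0.0):
    """Return the index of the generation at which the search would stop.

    The search stops once the best score has not improved by more than
    `min_delta` for `patience` consecutive generations. If that never
    happens, the index of the last generation is returned (-1 for an
    empty sequence).
    """
    best = float("-inf")
    best_idx = -1
    for i, score in enumerate(scores):
        if score > best + min_delta:
            best, best_idx = score, i
        elif i - best_idx >= patience:
            return i
    return len(scores) - 1


if __name__ == "__main__":
    validation_best = [0.50, 0.60, 0.61, 0.61, 0.60, 0.61, 0.70]
    print(early_stop_index(validation_best, patience=3))
```

A larger `min_delta` makes the rule ignore small validation gains, which are the ones most likely to come from fitting the validation split.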


🙏 Acknowledgements

This work builds upon and is inspired by several excellent research projects:

  • 🌲 ML-Master - Base framework for AI-for-AI agents with exploration and reasoning
  • 💡 MLE-Bench - Comprehensive AutoML benchmarking platform
  • 🧬 ReEvo - LLM-driven code evolution for heuristic design
  • 📊 HSEvo - Diversity metrics and semantic similarity analysis for evolutionary algorithms
  • 🤖 CodeT5 - Pre-trained code embedding model

📚 References

  1. Evolutionary Computation: Eiben, A.E. and Smith, J.E., 2015. Introduction to evolutionary computing. Springer.
  2. ReEvo: Ye et al., 2024. "ReEvo: Large Language Models as Hyper-Heuristics with Reflective Evolution"
  3. HSEvo: Dat et al., 2024. "HSEvo: Elevating Automatic Heuristic Design with Diversity-Driven Harmony Search and Genetic Algorithm Using LLMs"
  4. ML-Master: Liu et al., 2025. "ML-Master: Towards AI-for-AI via Integration of Exploration and Reasoning"

📄 License

This project is released for academic research purposes. Please cite our work if you use this code:

@article{xiang2025diversity,
  title={Diversity-Driven ML-Agent with Reflective Genetic Programming},
  author={Xiang, Chuyang},
  year={2025}
}

📧 Contact

For questions or issues, please open an issue on GitHub or contact the author.

Author: Chuyang Xiang (524031910627)
