This repository contains the official implementation of AlphaSAGE (Structure-Aware Alpha Mining via Generative Flow Networks for Robust Exploration), a novel framework for automated mining of predictive signals (alphas) in quantitative finance.
The automated mining of predictive signals, or alphas, is a central challenge in quantitative finance. While Reinforcement Learning (RL) has emerged as a promising paradigm for generating formulaic alphas, existing frameworks are fundamentally hampered by three interconnected issues:
- Reward Sparsity: Meaningful feedback is only available upon completion of a full formula, leading to inefficient and unstable exploration
- Inadequate Representation: Sequential representations fail to capture the mathematical structure that determines an alpha's behavior
- Mode Collapse: Standard RL objectives drive policies towards single optimal modes, contradicting the need for diverse, non-correlated alpha portfolios
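To make the problem concrete, a formulaic alpha is a symbolic expression over raw market fields (price, volume, etc.) whose cross-sectional values serve as a trading signal. A hypothetical example in pandas, purely for illustration (the names `close` and `volume` and the operator choice are assumptions, not this repository's API):

```python
import pandas as pd

# Hypothetical formulaic alpha: 10-day price/volume correlation,
# ranked cross-sectionally. `close` and `volume` are DataFrames
# indexed by date with one column per instrument.
def example_alpha(close: pd.DataFrame, volume: pd.DataFrame) -> pd.DataFrame:
    corr = close.rolling(10).corr(volume)  # time-series operator
    return corr.rank(axis=1, pct=True)     # cross-sectional operator
```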
AlphaSAGE addresses these challenges through three cornerstone innovations:
- Structure-Aware Encoding: a Relational Graph Convolutional Network (RGCN) based encoder captures the inherent mathematical structure of alpha expressions (a minimal sketch follows this list)
  - Preserves semantic relationships between operators and operands
  - Enables a better understanding of formula behavior and properties
- GFlowNets-Based Exploration: replaces traditional RL with GFlowNets for diverse exploration
  - Naturally supports multi-modal sampling for generating diverse alpha portfolios
  - Avoids the mode collapse inherent in standard policy gradient methods
- Dense Reward Signals: provides rich, intermediate feedback throughout the generation process
  - Combines multiple evaluation criteria for comprehensive alpha assessment
  - Enables more stable and efficient learning than sparse terminal rewards
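As a rough illustration of the first innovation, here is a minimal structure-aware encoder sketch built on PyTorch Geometric's `RGCNConv`. The node features, relation types, and dimensions are illustrative assumptions, not the repository's actual interface:

```python
import torch
import torch.nn as nn
from torch_geometric.nn import RGCNConv, global_mean_pool

class ExpressionEncoder(nn.Module):
    """Encodes an alpha expression tree as a relational graph.

    Assumed (illustrative) setup: nodes are operator/operand tokens and
    edge types distinguish argument positions (parent -> 1st child vs.
    parent -> 2nd child), so non-commutative operators such as Sub(a, b)
    keep their semantics.
    """

    def __init__(self, num_tokens: int, num_relations: int, dim: int = 64):
        super().__init__()
        self.embed = nn.Embedding(num_tokens, dim)
        self.conv1 = RGCNConv(dim, dim, num_relations)
        self.conv2 = RGCNConv(dim, dim, num_relations)

    def forward(self, token_ids, edge_index, edge_type, batch):
        x = self.embed(token_ids)
        x = torch.relu(self.conv1(x, edge_index, edge_type))
        x = torch.relu(self.conv2(x, edge_index, edge_type))
        return global_mean_pool(x, batch)  # one embedding per expression
```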
An overview of AlphaSAGE is shown in the following figure:
Empirical results demonstrate that AlphaSAGE significantly outperforms existing baselines in:
- Diversity: Mining more diverse alpha portfolios
- Novelty: Generating novel alpha expressions
- Predictive Power: Achieving higher predictive performance
- Robustness: Maintaining performance across different market conditions
The backtest results of AlphaSAGE are shown in the following figure:
Detailed results and analysis can be found in our paper.
We use PDM to manage the dependencies. To install PDM, please refer to the official documentation.
```bash
git clone https://github.com/BerkinChen/AlphaSAGE.git
cd AlphaSAGE
pdm install
```

We use data from Qlib to train the model. Please refer to the official documentation to download the data.
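After downloading the Qlib data bundle, the scripts need to be able to locate it. As a hedged example, Qlib is conventionally initialized in Python like this (the provider path is the usual default location, not necessarily what this repository's scripts expect):

```python
import qlib
from qlib.constant import REG_CN

# Point provider_uri at wherever you downloaded the Qlib data bundle;
# "~/.qlib/qlib_data/cn_data" is the conventional default for CN data.
qlib.init(provider_uri="~/.qlib/qlib_data/cn_data", region=REG_CN)
```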
AlphaSAGE operates in two main stages:
Generate a diverse pool of alpha expressions using our structure-aware GFlowNets framework:
```bash
python train_gfn.py \
--seed 0 \
--instrument csi300 \
--pool_capacity 50 \
--log_freq 500 \
--update_freq 64 \
--n_episodes 10000 \
--encoder_type gnn \
--entropy_coef 0.01 \
--entropy_temperature 1.0 \
--mask_dropout_prob 1.0 \
--ssl_weight 1.0 \
--nov_weight 0.3 \
--weight_decay_type linear \
--final_weight_ratio 0.0
```

Key Parameters:
- `--encoder_type gnn`: Uses our structure-aware RGCN encoder
- `--pool_capacity 50`: Maximum number of alphas to maintain in the pool
- `--entropy_coef 0.01`: Controls the exploration vs. exploitation balance
- `--ssl_weight 1.0`: Self-supervised learning weight for structure awareness
- `--nov_weight 0.3`: Novelty reward weight for diversity
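For intuition on why GFlowNets avoid mode collapse: the generator is trained so that the probability of sampling a complete expression is proportional to its reward, rather than concentrated on the single best one. Below is a minimal sketch of the trajectory balance objective commonly used to train GFlowNets; the tensor names are assumptions for illustration, not this repository's training loop:

```python
import torch

def trajectory_balance_loss(log_Z: torch.Tensor,
                            log_pf: torch.Tensor,
                            log_pb: torch.Tensor,
                            log_reward: torch.Tensor) -> torch.Tensor:
    """Trajectory balance: (log Z + sum log P_F - log R(x) - sum log P_B)^2.

    log_Z:      learned scalar, log-partition function
    log_pf:     (T,) forward log-probs of actions along one trajectory
    log_pb:     (T,) backward log-probs along the same trajectory
    log_reward: scalar log R(x) of the completed expression x

    Minimizing this drives P(x) toward being proportional to R(x), so
    distinct high-reward expressions all keep sampling mass instead of
    a single mode dominating.
    """
    return (log_Z + log_pf.sum() - log_reward - log_pb.sum()) ** 2
```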
Following AlphaForge, we use adaptive combination to create the final alpha portfolio:
```bash
python run_adaptive_combination.py \
--expressions_file results_dir \
--instruments csi300 \
--threshold_ric 0.015 \
--threshold_ricir 0.15 \
--chunk_size 400 \
--window inf \
--n_factors 20 \
--cuda 2 \
--train_end_year 2020 \
--seed 0
```
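Here, `--threshold_ric` and `--threshold_ricir` screen candidate alphas by Rank IC and its information ratio before combination. A hedged sketch of how these statistics are conventionally computed (the data layout and names are assumptions, not the script's internals):

```python
import pandas as pd

def rank_ic_stats(factor: pd.DataFrame, fwd_returns: pd.DataFrame):
    """Rank IC statistics for one alpha.

    factor/fwd_returns: rows are dates, columns are instruments; the
    daily Rank IC is the cross-sectional Spearman correlation between
    factor values and next-period returns.
    """
    daily_ic = factor.corrwith(fwd_returns, axis=1, method="spearman")
    ric = daily_ic.mean()         # Rank IC
    ricir = ric / daily_ic.std()  # Rank ICIR (information ratio)
    return ric, ricir

# With the flags above, an alpha is kept if ric > 0.015 and ricir > 0.15.
```

The surviving alphas are then combined into a composite signal; following AlphaForge, the combination weights are refit adaptively over time rather than fixed once.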
For comparison with AlphaGen and AlphaQCM, run the following commands:

```bash
# Train AlphaQCM
python train_qcm.py \
--instruments csi300 \
--pool 20 \
--seed 0

# Evaluate AlphaQCM results
python run_adaptive_combination.py \
--expressions_file results_dir \
--instruments csi300 \
--cuda 2 \
--train_end_year 2020 \
--seed 0 \
--use_weights True
```

```bash
# Train AlphaGen with PPO
python train_ppo.py \
--instruments csi300 \
--pool 20 \
--seed 0

# Evaluate AlphaGen results
python run_adaptive_combination.py \
--expressions_file results_dir \
--instruments csi300 \
--cuda 2 \
--train_end_year 2020 \
--seed 0 \
--use_weights True
```

For AlphaForge and other ML baselines, please refer to the AlphaForge documentation.
This project is licensed under the MIT License - see the LICENSE file for details.
We welcome contributions! Please feel free to submit a Pull Request.