
AlphaSAGE: Structure-Aware Alpha Mining via GFlowNets for Robust Exploration

This repository contains the official implementation of AlphaSAGE (Structure-Aware Alpha Mining via Generative Flow Networks for Robust Exploration), a novel framework for automated mining of predictive signals (alphas) in quantitative finance.

🎯 Overview

The automated mining of predictive signals, or alphas, is a central challenge in quantitative finance. While Reinforcement Learning (RL) has emerged as a promising paradigm for generating formulaic alphas, existing frameworks are fundamentally hampered by three interconnected issues:

  1. Reward Sparsity: Meaningful feedback is only available upon completion of a full formula, leading to inefficient and unstable exploration
  2. Inadequate Representation: Sequential representations fail to capture the mathematical structure that determines an alpha's behavior
  3. Mode Collapse: Standard RL objectives drive policies towards single optimal modes, contradicting the need for diverse, non-correlated alpha portfolios

πŸš€ Key Innovations

AlphaSAGE addresses these challenges through three cornerstone innovations:

1. Structure-Aware Encoder

  • Relational Graph Convolutional Network (RGCN) based encoder that captures the inherent mathematical structure of alpha expressions
  • Preserves semantic relationships between operators and operands
  • Enables better understanding of formula behavior and properties
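
As a rough sketch of this idea (not the repository's exact model; the node features, relation types, and dimensions here are assumptions), an RGCN-based encoder over an expression graph could be built with PyTorch Geometric:

import torch
from torch_geometric.data import Data
from torch_geometric.nn import RGCNConv

class ExpressionEncoder(torch.nn.Module):
    # Encodes an alpha expression graph; edge_type distinguishes, e.g.,
    # a "first operand" edge from a "second operand" edge of an operator.
    def __init__(self, num_tokens: int, num_relations: int, dim: int = 64):
        super().__init__()
        self.embed = torch.nn.Embedding(num_tokens, dim)
        self.conv1 = RGCNConv(dim, dim, num_relations)
        self.conv2 = RGCNConv(dim, dim, num_relations)
    def forward(self, g: Data) -> torch.Tensor:
        h = self.embed(g.x)  # g.x: LongTensor of node-token indices
        h = self.conv1(h, g.edge_index, g.edge_type).relu()
        h = self.conv2(h, g.edge_index, g.edge_type)
        return h.mean(dim=0)  # pooled embedding of the whole formula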

2. Generative Flow Networks (GFlowNets) Framework

  • Replaces traditional RL with GFlowNets for diverse exploration
  • Naturally supports multi-modal sampling for generating diverse alpha portfolios
  • Avoids mode collapse inherent in standard policy gradient methods
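
The exact GFlowNet objective used by AlphaSAGE is specified in the paper; purely as an illustration, the widely used trajectory-balance loss trains the sampler so that terminal expressions are reached with probability proportional to their reward:

import torch

def trajectory_balance_loss(log_z: torch.Tensor,
                            log_pf_sum: torch.Tensor,
                            log_pb_sum: torch.Tensor,
                            log_reward: torch.Tensor) -> torch.Tensor:
    # log_z: learned log-partition estimate; log_pf_sum / log_pb_sum:
    # summed log forward/backward policy probabilities along one sampled
    # trajectory; log_reward: log R(x) of the terminal expression.
    return (log_z + log_pf_sum - log_reward - log_pb_sum) ** 2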

3. Dense Multi-Faceted Reward Structure

  • Provides rich, intermediate feedback throughout the generation process
  • Combines multiple evaluation criteria for comprehensive alpha assessment
  • Enables more stable and efficient learning compared to sparse reward signals
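
As a hypothetical illustration of how such terms might be combined (the names below mirror the --ssl_weight and --nov_weight flags used in the Quick Start, but the repository's actual reward may differ):

def dense_reward(ic: float, ssl_score: float, novelty: float,
                 ssl_weight: float = 1.0, nov_weight: float = 0.3) -> float:
    # ic: predictive quality of the (partial) formula
    # ssl_score: structure-aware self-supervision signal
    # novelty: dissimilarity to alphas already in the pool
    return ic + ssl_weight * ssl_score + nov_weight * novelty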

The overall architecture of AlphaSAGE is illustrated in the overview figure included in the repository.

πŸ“Š Results

Empirical results demonstrate that AlphaSAGE significantly outperforms existing baselines in:

  • Diversity: Mining more diverse alpha portfolios
  • Novelty: Generating novel alpha expressions
  • Predictive Power: Achieving higher predictive performance
  • Robustness: Maintaining performance across different market conditions

The backtest results of AlphaSAGE are shown in the backtest figure included in the repository.

Detailed results and analysis can be found in our paper (https://arxiv.org/abs/2509.25055).

πŸ›  Installation

We use PDM to manage the dependencies. To install PDM, please refer to the official documentation.

git clone https://github.com/BerkinChen/AlphaSAGE.git
cd AlphaSAGE
pdm install

πŸ“ Data Preparation

We use the data from Qlib to train the model. Please refer to the official documentation to download the data.
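
For example, Qlib's documented downloader can fetch the China-market daily data (check the Qlib documentation for the current command and target directory):

python -m qlib.run.get_data qlib_data --target_dir ~/.qlib/qlib_data/cn_data --region cn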

πŸš€ Quick Start

AlphaSAGE operates in two main stages:

Stage 1: Generate Alpha Pool with GFlowNets

Generate a diverse pool of alpha expressions using our structure-aware GFlowNets framework:

python train_gfn.py \
        --seed 0 \
        --instrument csi300 \
        --pool_capacity 50 \
        --log_freq 500 \
        --update_freq 64 \
        --n_episodes 10000 \
        --encoder_type gnn \
        --entropy_coef 0.01 \
        --entropy_temperature 1.0 \
        --mask_dropout_prob 1.0 \
        --ssl_weight 1.0 \
        --nov_weight 0.3 \
        --weight_decay_type linear \
        --final_weight_ratio 0.0

Key Parameters:

  • --encoder_type gnn: Uses our structure-aware RGCN encoder
  • --pool_capacity 50: Maximum number of alphas to maintain in the pool
  • --entropy_coef 0.01: Controls exploration vs exploitation balance
  • --ssl_weight 1.0: Self-supervised learning weight for structure awareness
  • --nov_weight 0.3: Novelty reward weight for diversity
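
The --weight_decay_type linear and --final_weight_ratio 0.0 flags suggest that auxiliary reward weights are annealed over training; a minimal sketch of such a linear schedule (an assumption, not the repository's exact code):

def decayed_weight(initial: float, episode: int, n_episodes: int,
                   final_ratio: float = 0.0) -> float:
    # Linearly interpolate the weight from `initial` down to
    # `initial * final_ratio` over the course of training.
    frac = min(episode / n_episodes, 1.0)
    return initial * ((1.0 - frac) + frac * final_ratio)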

Stage 2: Evaluate and Combine Alpha Pool

Following AlphaForge, we use adaptive combination to build the final alpha portfolio; point --expressions_file at the results directory produced in Stage 1:

python run_adaptive_combination.py \
    --expressions_file results_dir \
    --instruments csi300 \
    --threshold_ric 0.015 \
    --threshold_ricir 0.15 \
    --chunk_size 400 \
    --window inf \
    --n_factors 20 \
    --cuda 2 \
    --train_end_year 2020 \
    --seed 0

πŸ”¬ Baselines and Comparisons

AlphaQCM and AlphaGen Baselines

For comparison with AlphaGen and AlphaQCM, run the following commands:

AlphaQCM:

# Train AlphaQCM
python train_qcm.py \
    --instruments csi300 \
    --pool 20 \
    --seed 0

# Evaluate AlphaQCM results
python run_adaptive_combination.py \
    --expressions_file results_dir \
    --instruments csi300 \
    --cuda 2 \
    --train_end_year 2020 \
    --seed 0 \
    --use_weights True

AlphaGen (PPO):

# Train AlphaGen with PPO
python train_ppo.py \
    --instruments csi300 \
    --pool 20 \
    --seed 0

# Evaluate AlphaGen results  
python run_adaptive_combination.py \
    --expressions_file results_dir \
    --instruments csi300 \
    --cuda 2 \
    --train_end_year 2020 \
    --seed 0 \
    --use_weights True

Other Baselines

For AlphaForge and other ML baselines, please refer to the AlphaForge documentation.

πŸ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.

🀝 Contributing

We welcome contributions! Please feel free to submit a Pull Request.
