A reinforcement learning framework for trading strategies using PyTorch and TorchRL, with integrated experiment tracking via MLflow.
- Deep RL Trading Agents: PPO (Discrete & Continuous), DDPG, TD3 pipelines with custom trading environments
- Experiment Tracking: MLflow integration with config/data artifacts, combined evaluation plots, and metrics
- CLI Interface: command-line tools using Typer and Rich
- Comprehensive Analytics: Reward/action comparisons plus PPO action-probability visualization
- Modular Architecture: Reusable components for research
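As a purely illustrative sketch of the kind of custom trading environment the pipelines above plug into, here is a minimal gymnasium-style skeleton. The observation layout, action encoding, and reward are assumptions for illustration only; the project's real environments come from the backend-aware builders in src/trading_rl/envs/.

```python
# Illustrative only: a minimal gymnasium-style trading environment skeleton.
# The real environments are built by src/trading_rl/envs/; everything below is assumed.
import numpy as np
import gymnasium as gym
from gymnasium import spaces


class ToyTradingEnv(gym.Env):
    """Discrete actions: 0 = hold, 1 = long, 2 = short (hypothetical encoding)."""

    def __init__(self, prices: np.ndarray, window: int = 10):
        super().__init__()
        self.prices = prices.astype(np.float32)
        self.window = window
        self.action_space = spaces.Discrete(3)
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(window,), dtype=np.float32)

    def _obs(self):
        # Observation: the last `window` prices (a deliberately simple choice)
        return self.prices[self.t - self.window : self.t]

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.t = self.window
        self.position = 0
        return self._obs(), {}

    def step(self, action):
        # Hold keeps the current position; long/short switch it
        self.position = {0: self.position, 1: 1, 2: -1}[int(action)]
        price_change = self.prices[self.t] - self.prices[self.t - 1]
        reward = float(self.position * price_change)  # PnL of the held position
        self.t += 1
        terminated = self.t >= len(self.prices)
        return self._obs(), reward, terminated, False, {}
```

An environment like this can then be wrapped for TorchRL (for example via `torchrl.envs.GymWrapper`) before being handed to an agent.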
- Python 3.12 or higher
- Poetry (preferred) or pip
Using Poetry (preferred):

```bash
poetry install   # install dependencies into the Poetry-managed venv
poetry shell     # spawn a shell inside that environment
```

Using pip:

```bash
python -m venv .venv
source .venv/bin/activate         # macOS/Linux
pip install -r requirements.txt
```

All commands support the `--help` flag, which displays detailed information about their usage and options.
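For example, to see the options accepted by the training command:

```bash
python src/cli.py train --help
```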
| Command | Purpose |
|---|---|
| `python src/cli.py train [...]` | Configure and launch a single agent training run |
| `python src/cli.py experiment [...]` | Batch experiments with shared MLflow tracking |
| `python src/cli.py dashboard [...]` | Manage the MLflow UI and helper scripts |
| `python src/cli.py generate-data [...]` | Create or inspect synthetic datasets used for training |
| `python src/cli.py list-experiments` | Enumerate tracked MLflow experiments |
```bash
python src/cli.py train       # launch single-agent training with default config
python src/cli.py dashboard   # start MLflow UI backed by sqlite:///mlflow.db
```

```
masters_thesis/
├── src/
│ ├── trading_rl/ # Main RL package
│ │ ├── __init__.py # Package exports
│ │ ├── config.py # Configuration classes
│ │ ├── data_utils.py # Data processing utilities
│ │ ├── envs/ # Environment builders/factories (backend-aware)
│ │ ├── models.py # Neural network model factories
│ │ ├── trainers/ # PPO, DDPG, TD3 trainer implementations (build_models + train)
│ │ ├── train_trading_agent.py # MLflow-enabled training orchestration
│ │ └── utils.py # Helper utilities
│ ├── cli.py # Command-line interface
│ ├── data_generator.py # Synthetic data generation
│ └── logger.py # Logging utilities
├── pyproject.toml # Project configuration
├── README.md # This file
└── .gitignore # Git ignore rules
```
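The trainers combine standard TorchRL building blocks: policy and value modules, a data collector, an advantage estimator, and a loss module such as `ClipPPOLoss`. The sketch below is a rough orientation only, wired against a stand-in Gym task rather than the project's own environments and models, and assumes `torchrl`, `tensordict`, and `gymnasium` are installed.

```python
# Illustrative PPO wiring with TorchRL -- not the project's actual trainer code.
import torch
from tensordict.nn import TensorDictModule
from tensordict.nn.distributions import NormalParamExtractor
from torchrl.collectors import SyncDataCollector
from torchrl.envs import GymEnv
from torchrl.modules import MLP, ProbabilisticActor, TanhNormal, ValueOperator
from torchrl.objectives import ClipPPOLoss
from torchrl.objectives.value import GAE

env = GymEnv("Pendulum-v1")            # stand-in task; the project builds its own envs
n_act = env.action_spec.shape[-1]

# Policy: observation -> (loc, scale) -> TanhNormal action distribution
actor_net = torch.nn.Sequential(
    MLP(out_features=2 * n_act, num_cells=[64, 64]),
    NormalParamExtractor(),
)
policy = ProbabilisticActor(
    TensorDictModule(actor_net, in_keys=["observation"], out_keys=["loc", "scale"]),
    in_keys=["loc", "scale"],
    distribution_class=TanhNormal,
    spec=env.action_spec,
    return_log_prob=True,
)

# Critic: observation -> state value
value = ValueOperator(MLP(out_features=1, num_cells=[64, 64]), in_keys=["observation"])

# Run one reset through both modules so the lazy layers get their input shapes
init_td = env.reset()
policy(init_td)
value(init_td)

advantage = GAE(gamma=0.99, lmbda=0.95, value_network=value)
loss_fn = ClipPPOLoss(policy, value, clip_epsilon=0.2)
optim = torch.optim.Adam(loss_fn.parameters(), lr=3e-4)

collector = SyncDataCollector(env, policy, frames_per_batch=256, total_frames=1024)
for batch in collector:
    advantage(batch)                   # writes "advantage" / "value_target" in-place
    losses = loss_fn(batch)
    loss = losses["loss_objective"] + losses["loss_critic"] + losses["loss_entropy"]
    optim.zero_grad()
    loss.backward()
    optim.step()
collector.shutdown()
```

DDPG and TD3 wirings are analogous in TorchRL, swapping in `DDPGLoss`/`TD3Loss` and an exploration module in place of the stochastic PPO policy.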
The project ships with a reusable logging package (src/logger) that standardizes log formatting, destinations, and utilities across the CLI, data tools, and TorchRL workflows.
- Centralized setup via `configure_logging` or `setup_component_logger`, so every module inherits the same log level, handlers, and rotation strategy.
- Rich console/file output with optional levels and structured (JSON) records for downstream processing.
- Productivity helpers such as `LogContext` for scoped timing, `log_dataframe_info` for quick pandas diagnostics, and `log_processing_step`/`log_error_with_context` for consistent telemetry.
- Environment overrides allow tuning per component (e.g., `export TRADING_RL_LOG_LEVEL=DEBUG`) without code changes.
See src/logger/README.md for advanced usage (structured logging, decorators, env variables, etc.).
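For orientation, a hypothetical usage sketch of these helpers is shown below. The function names come from the bullets above, but the argument lists are assumptions; the authoritative reference is src/logger/README.md.

```python
# Hypothetical usage only -- argument names are assumed; see src/logger for the real signatures.
import pandas as pd
from logger import LogContext, log_dataframe_info, setup_component_logger  # assumes src/ on PYTHONPATH

logger = setup_component_logger("data_generator")      # assumed: returns a pre-configured logger

with LogContext(logger, "generate synthetic prices"):  # assumed: scoped timing for the block
    df = pd.DataFrame({"close": [100.0, 100.5, 99.8]})
    log_dataframe_info(logger, df)                     # assumed: logs shape, dtypes, memory usage
```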
For detailed information about how experiments work, including system architecture and component interactions, see docs/experiment_workflow.md. Visual walk-throughs of the algorithms are available in docs/ppo_overview.md and docs/ddpg_overview.md. These documents include:
- Complete experiment workflow with diagrams
- Component details and data flow
- MLflow integration architecture
- Configuration options and usage examples
- Performance optimization and debugging guides
All training runs are tracked in MLflow. Each run records:
- Performance: Final reward, total training steps, evaluation horizon
- Training Dynamics: Actor/value losses, exploration ratios, checkpoint metadata
- Position Activity: Per-episode position changes, portfolio trajectories, action distributions
- Configuration: Every hyperparameter, network architecture, dataset stats, environment settings
The MLflow UI provides:

- Interactive loss and reward curves per trial
- Drill-down view of position changes and trading behaviour
- Artifact bundles (plots, CSV summaries, configs)
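The tracking described above maps onto the standard MLflow Python API. Below is a minimal, self-contained sketch; the experiment name, parameter/metric names, values, and artifact path are placeholders, not the project's actual logging schema.

```python
import mlflow

mlflow.set_tracking_uri("sqlite:///mlflow.db")           # same backing store the dashboard command serves
mlflow.set_experiment("trading-rl-demo")                 # placeholder experiment name

with mlflow.start_run(run_name="ppo_discrete_example"):
    mlflow.log_params({"algorithm": "ppo", "gamma": 0.99, "learning_rate": 3e-4})
    for step in range(1, 4):                             # placeholder values, not real results
        mlflow.log_metric("actor_loss", 1.0 / step, step=step)
        mlflow.log_metric("episode_reward", 5.0 * step, step=step)
    with open("run_config.yaml", "w") as fh:             # placeholder config artifact
        fh.write("algorithm: ppo\ngamma: 0.99\n")
    mlflow.log_artifact("run_config.yaml")
```

Runs logged this way appear in the MLflow UI started by the dashboard command (roughly equivalent to running `mlflow ui --backend-store-uri sqlite:///mlflow.db`).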