A professional, modular reinforcement learning implementation that solves the classic MountainCar-v0 environment from Gymnasium (the maintained fork of OpenAI Gym) using the Q-Learning algorithm.
This project demonstrates how a car stuck in a valley can learn to reach the goal at the top of the mountain using Q-Learning, a model-free reinforcement learning algorithm. The agent learns an optimal policy through trial and error, discovering that building momentum by moving back and forth is key to reaching the goal.
- Objective: Drive an underpowered car up a steep mountain
- Challenge: The car's engine is not strong enough to climb the mountain in a single pass
- Solution: The agent must learn to build momentum by driving back and forth
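For context, here is a minimal random-policy rollout of the environment (a sketch assuming a standard `gymnasium` install); it shows the (position, velocity) observation and the -1-per-step reward that make the task hard:

```python
import gymnasium as gym

# Create the environment; each observation is a (position, velocity) pair
env = gym.make("MountainCar-v0")
state, _ = env.reset(seed=42)

total_reward = 0.0
done = False
while not done:
    action = env.action_space.sample()  # random action: 0=left, 1=no push, 2=right
    state, reward, terminated, truncated, _ = env.step(action)
    total_reward += reward  # -1 per step until the goal is reached
    done = terminated or truncated

# A random policy almost always hits the 200-step limit, returning -200
print(f"Episode return: {total_reward}")
env.close()
```

Because every step costs -1, the agent is rewarded for finding the fastest route to the flag.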
- ✅ Modular Architecture: Clean separation of concerns with dedicated modules
- ✅ Q-Learning Implementation: Tabular Q-learning with discretized state space
- ✅ CLI Interface: Easy-to-use command-line scripts for training and evaluation
- ✅ Configuration Management: YAML-based configuration for easy parameter tuning
- ✅ Training Mode: Train the agent from scratch and save the Q-table
- ✅ Evaluation Mode: Load pre-trained Q-table and evaluate performance
- ✅ Visualization: Automatic plotting of training progress with rolling mean
- ✅ Performance Metrics: Comprehensive evaluation statistics
- Python 3.8 or higher
- pip package manager
1. Clone the repository:

```bash
git clone https://github.com/3bsalam-1/MountainCar.git
cd MountainCar
```

2. Install dependencies:

```bash
pip install -r requirements.txt
```

3. Optional: install as an editable package:

```bash
pip install -e .
```
Dependencies:

- `gymnasium>=0.29.0` – Gymnasium RL environments
- `numpy>=1.24.0` – Numerical computing
- `matplotlib>=3.7.0` – Plotting and visualization
Train the agent using the CLI script:
```bash
# Basic training (5000 episodes, no rendering)
python scripts/train.py

# Training with custom parameters
python scripts/train.py --episodes 10000 --learning-rate 0.8 --render

# Quick test (100 episodes)
python scripts/train.py --episodes 100 --no-render
```

Training Options:
- `--episodes`: Number of training episodes (default: 5000)
- `--bins`: Number of bins for state discretization (default: 20)
- `--learning-rate`: Learning rate α (default: 0.9)
- `--discount-factor`: Discount factor γ (default: 0.9)
- `--epsilon`: Initial exploration rate (default: 1.0)
- `--render` / `--no-render`: Enable/disable environment rendering
- `--save-path`: Path to save the Q-table (default: `models/mountain_car.pkl`)
- `--plot-path`: Path to save the training plot (default: `outputs/plots/training_progress.png`)
Evaluate a trained agent:
```bash
# Evaluate with rendering (default)
python scripts/evaluate.py

# Evaluate with a custom model
python scripts/evaluate.py --model-path models/my_model.pkl --episodes 20

# Evaluate without rendering
python scripts/evaluate.py --no-render
```

Evaluation Options:
- `--model-path`: Path to the trained Q-table (default: `models/mountain_car.pkl`)
- `--episodes`: Number of evaluation episodes (default: 10)
- `--bins`: Number of bins for state discretization (default: 20)
- `--render` / `--no-render`: Enable/disable environment rendering
You can also use the package programmatically:
```python
from mountaincar import Trainer, Evaluator

# Train an agent
trainer = Trainer(num_bins=20, learning_rate=0.9, render=False)
rewards = trainer.train(episodes=5000)

# Evaluate the trained agent
evaluator = Evaluator(model_path='models/mountain_car.pkl', render=True)
metrics = evaluator.evaluate(episodes=10)
print(f"Success Rate: {metrics['success_rate']:.1f}%")
```

```
MountainCar/
├── src/
│ └── mountaincar/ # Main package
│ ├── __init__.py # Package initialization
│ ├── agent.py # Q-Learning agent implementation
│ ├── trainer.py # Training logic and loop
│ ├── evaluator.py # Evaluation and metrics
│ └── utils.py # Utilities (plotting, discretization)
├── scripts/
│ ├── train.py # Training CLI entry point
│ └── evaluate.py # Evaluation CLI entry point
├── config/
│ └── config.yaml # Configuration parameters
├── models/
│ └── mountain_car.pkl # Trained Q-tables
├── outputs/
│ └── plots/ # Training visualizations
│       └── training_progress.png
├── tests/ # Unit tests (future)
├── .github/
│ └── workflows/ # CI/CD workflows
├── requirements.txt # Python dependencies
├── setup.py # Package setup configuration
├── .gitignore # Git ignore rules
├── README.md # This file
├── AUTHOR # Project author
└── LAST_UPDATED          # Last update timestamp
```
| Parameter | Default | Description |
|---|---|---|
| Learning Rate (α) | 0.9 | How much new information overrides old information |
| Discount Factor (γ) | 0.9 | Importance of future rewards |
| Initial Epsilon (ε) | 1.0 | Exploration rate (100% random at start) |
| Epsilon Decay | 2/episodes | Linear decay to 0 by end of training |
| State Bins | 20×20 | Discretization grid size |
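To make the decay rule concrete, here is a small sketch of the linear schedule the table implies (the variable names and clamping are illustrative, not copied from the source):

```python
# Illustrative linear epsilon schedule: a decay rate of 2/episodes means
# epsilon reaches 0 halfway through training and stays there
episodes = 5000
epsilon = 1.0
decay = 2 / episodes

for episode in range(episodes):
    # ... run one episode with epsilon-greedy action selection ...
    epsilon = max(epsilon - decay, 0.0)  # clamp so epsilon never goes negative
```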
The continuous state space is discretized into a 20×20 grid:
- Position: 20 bins between -1.2 and 0.6
- Velocity: 20 bins between -0.07 and 0.07
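One minimal way to implement this discretization uses `numpy.digitize` over evenly spaced bin edges (a sketch; the helper name `discretize` and the exact binning in `utils.py` may differ):

```python
import numpy as np

# Bin edges spanning the MountainCar observation bounds
position_bins = np.linspace(-1.2, 0.6, 20)
velocity_bins = np.linspace(-0.07, 0.07, 20)

def discretize(state):
    """Map a continuous (position, velocity) pair to grid indices."""
    position, velocity = state
    pos_idx = np.digitize(position, position_bins) - 1  # index in 0..19
    vel_idx = np.digitize(velocity, velocity_bins) - 1
    return pos_idx, vel_idx

print(discretize((-0.5, 0.0)))  # (7, 9): near the valley floor, at rest
```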
The agent can choose from 3 discrete actions:
- 0: Push left
- 1: No push (neutral)
- 2: Push right
Q(s,a) ← Q(s,a) + α[r + γ·max_a′ Q(s′,a′) − Q(s,a)]
Where:
- `s` = current state
- `a` = action taken
- `r` = reward received
- `s'` = next state
- `α` = learning rate
- `γ` = discount factor
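As a concrete sketch, a tabular version of this update (assuming a NumPy Q-table indexed by the discretized state and the action; names are illustrative) looks like:

```python
import numpy as np

q_table = np.zeros((20, 20, 3))  # 20x20 discretized states, 3 actions
alpha, gamma = 0.9, 0.9

def q_update(state, action, reward, next_state):
    """One Q-learning update: Q(s,a) += α[r + γ·max_a' Q(s',a') − Q(s,a)].

    state and next_state are (pos_idx, vel_idx) tuples from the discretizer.
    """
    best_next = np.max(q_table[next_state])            # max over a' of Q(s',a')
    td_target = reward + gamma * best_next             # bootstrapped target
    td_error = td_target - q_table[state + (action,)]  # temporal-difference error
    q_table[state + (action,)] += alpha * td_error
```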
The training progress is visualized in outputs/plots/training_progress.png, showing the mean reward over a 100-episode rolling window. As training progresses:
- Early episodes: Agent explores randomly, often failing to reach the goal
- Mid training: Agent discovers the momentum strategy
- Late training: Agent consistently reaches the goal with optimal policy
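For reference, a rolling mean like the one in the plot can be computed with a uniform convolution; this is a sketch with placeholder data, not the project's actual plotting code in `utils.py`:

```python
import numpy as np
import matplotlib.pyplot as plt

rewards = np.random.uniform(-200, -90, 5000)  # placeholder per-episode returns
window = 100

# Mean reward over a 100-episode rolling window
rolling = np.convolve(rewards, np.ones(window) / window, mode="valid")

plt.plot(rolling)
plt.xlabel("Episode")
plt.ylabel(f"Mean reward ({window}-episode window)")
plt.savefig("outputs/plots/training_progress.png")
```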
- ✅ Modular imports: Load only what's needed
- ✅ Separation of concerns: Easy to optimize individual components
- ✅ Reusable components: Agent, trainer, and evaluator can be used independently
- ✅ Clear organization: Each module has a single responsibility
- ✅ Easy navigation: Find and modify specific functionality quickly
- ✅ Better version control: Changes are isolated to relevant modules
- ✅ Extensible design: Easy to add new algorithms or environments
- ✅ Test-friendly: Modular structure facilitates unit testing
- ✅ Configuration-driven: Change behavior without modifying code
Edit config/config.yaml to customize default parameters:
```yaml
training:
  episodes: 5000
  num_bins: 20
  render: false

hyperparameters:
  learning_rate: 0.9
  discount_factor: 0.9
  epsilon: 1.0

paths:
  model_save: models/mountain_car.pkl
  plot_save: outputs/plots/training_progress.png
```
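A quick sketch of reading this file with PyYAML (assuming `pyyaml` is installed; it is not in the listed requirements, and the project's own loading code may differ):

```python
import yaml

# Load the YAML config into nested dictionaries
with open("config/config.yaml") as f:
    config = yaml.safe_load(f)

episodes = config["training"]["episodes"]           # 5000
alpha = config["hyperparameters"]["learning_rate"]  # 0.9
model_path = config["paths"]["model_save"]          # models/mountain_car.pkl
```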
Run tests (when implemented):

```bash
pytest tests/
```

This project demonstrates:
- Reinforcement Learning Basics: Agent-environment interaction, rewards, and policies
- Q-Learning Algorithm: Value-based method for learning optimal policies
- Exploration vs Exploitation: Balancing discovery with using known strategies
- State Discretization: Converting continuous spaces to discrete representations
- Software Engineering: Modular design, separation of concerns, and clean architecture
Contributions are welcome! Feel free to:
- Report bugs
- Suggest new features
- Submit pull requests
- Improve documentation
This project is open source and available for educational purposes.
3bsalam-1
December 10, 2025
Happy Learning! 🚗💨