State: In Progress
Over time, I intend to implement well-known RL algorithms and compare how they work. I'm focusing on the model-free domain first, but eventually I would like to cover the whole map below.
NOTE: Where possible, I intend to write only small implementations of the algorithms, based on other sources on the Internet. I don't intend to write heavily detailed implementations. This source is only for understanding.
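| Algorithm | Type | Policy Formula | Algorithm Key Points | Other Formulas |
|---|---|---|---|---|
| Q-Learning | Value-based | $\pi(s) = \arg\max_{a} Q(s, a)$ | Learns a Q-function for action-value estimation using the Bellman equation. Updates Q-values via: $Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$ | TD Error: $\delta = r + \gamma \max_{a'} Q(s', a') - Q(s, a)$ |
| Deep Q-Learning (DQN) | Value-based | $\pi(s) = \arg\max_{a} Q(s, a; \theta)$ | Extension of Q-learning using deep neural networks for Q-function approximation. Implements experience replay and fixed Q-targets. | Loss Function: $L(\theta) = \mathbb{E}\left[ \left( r + \gamma \max_{a'} Q(s', a'; \theta^{-}) - Q(s, a; \theta) \right)^2 \right]$ |
| REINFORCE | Policy-based | $\pi_\theta(a \mid s)$ | Directly learns a stochastic policy using full episode returns. The policy update is proportional to the gradient of the log probability of the policy, weighted by the return. Policy Gradient: $\nabla_\theta J(\theta) = \mathbb{E}_{\pi_\theta}\left[ \sum_t G_t \nabla_\theta \log \pi_\theta(a_t \mid s_t) \right]$ | Weight update: $\theta \leftarrow \theta + \alpha \, G_t \, \nabla_\theta \log \pi_\theta(a_t \mid s_t)$ |
| Advantage Actor-Critic (A2C) | Actor-Critic | Actor: $\pi_\theta(a \mid s)$ Critic: $V_w(s)$ | Combines value and policy methods, with an actor that selects actions and a critic that evaluates them. The actor updates the policy based on the critic's advice. The critic evaluates actions under the current policy. Uses the advantage function to reduce update variance. | $\delta = r + \gamma V_w(s') - V_w(s)$ can be seen as an estimate of the Advantage Function: $A(s, a) = Q(s, a) - V(s)$ |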
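
The quantities named in the table (the Q-learning update and its TD error, the episode return that weights the REINFORCE gradient, and the one-step advantage estimate used by A2C) can be sketched in a few lines of plain Python. This is only a minimal illustration using the standard textbook forms, not the code of this repository. The function names, the list-of-lists Q-table, and the default values of `alpha` (learning rate) and `gamma` (discount factor) are assumptions made for the example.

```python
def q_learning_update(Q, s, a, r, s_next, done, alpha=0.1, gamma=0.99):
    """Apply one tabular Q-learning step in place and return the TD error.

    Q is assumed to be a list of lists indexed as Q[state][action].
    """
    # Bootstrap from the greedy value of the next state (zero if terminal).
    bootstrap = 0.0 if done else gamma * max(Q[s_next])
    td_error = r + bootstrap - Q[s][a]
    Q[s][a] += alpha * td_error
    return td_error


def discounted_returns(rewards, gamma=0.99):
    """Return G_t for every step of one finished episode.

    In REINFORCE, G_t is the weight applied to grad log pi(a_t | s_t).
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each G_t reuses G_{t+1}.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def one_step_advantage(r, v_s, v_next, done, gamma=0.99):
    """TD error of a state-value critic, used as an advantage estimate (A2C)."""
    bootstrap = 0.0 if done else gamma * v_next
    return r + bootstrap - v_s
```

For example, calling `q_learning_update` on an all-zero table with reward `1.0` yields a TD error of `1.0` and moves the selected entry by `alpha` times that error. The deep variants (DQN, A2C) replace the table and the scalar value estimates with neural networks, which this sketch leaves out.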
NOTE:
- Complete note 001_DQN
- Write a short summary of 003 REINFORCE
- Code A2C
- Code REINFORCE
- Review TRPO
- Q-Learning:
- DQN: Code | Summary | Playing Atari with Deep Reinforcement Learning | There is another version in Nature.
- REINFORCE: Intro to Policy Optimization | Benchmarking Deep Reinforcement Learning for Continuous Control
- A2C / A3C: Summary | Asynchronous Methods for Deep Reinforcement Learning
- https://github.com/eemlcommunity/PracticalSessions2022/tree/main/rl
- https://www.youtube.com/live/-PxTOolYPzQ?si=X-8lBKmBUr3Kxma8&t=26323
- https://deeplearning.neuromatch.io/_images/lunar_lander.svg
- https://www.youtube.com/live/KUaoh2I5H88?si=-S-LcZvglrrPT6SC&t=25049
- https://www.youtube.com/watch?v=TjHH_--7l8g&ab_channel=Serrano.Academy