Deep Reinforcement Learning: Zero to Hero!

Welcome to drlzh.ai: a hands-on deep reinforcement learning course where you build the algorithms, not just read about them.

Start from MDPs and tabular RL, then work your way to the algorithms behind Atari agents, continuous-control robots, AlphaZero-style planning, RLHF for language models, Decision Transformers, VLA-style policies, world models, Dreamer, and meta-learning.

The root notebooks are the exercise track: code is intentionally replaced with guided TODO sections. The solution/ notebooks contain the complete, runnable versions, so you can unblock yourself without leaving the course.

Curriculum

| Notebooks | Track | You build |
| --- | --- | --- |
| `00`-`07` | Foundations | MDPs, tabular RL, DQN, REINFORCE, actor-critic methods, DDPG, TD3, SAC, PPO |
| `08`-`10` | Breaking assumptions | RND curiosity, multi-agent RL, offline RL with BC and IQL |
| `11` | Planning | Monte Carlo Tree Search, self-play, AlphaZero-style policy/value learning |
| `12`-`13` | Modern AI stack | RLHF with PPO, DPO, GRPO, Decision Transformers, and NanoVLA (`DTVLA`) |
| `14` | Production | TensorBoard, checkpointing, debugging, multiple seeds, Ray, Optuna |
| `15`-`16` | World models | MBPO with SAC, then `DR3AM`/Dreamer with RSSM latent imagination |
| `17`-`18` | Meta + wrap-up | MAML, FOMAML, fast adaptation, and course conclusion |

The foundations are meant to be done in order. The advanced notebooks are self-contained, but the numbering gives you a good default path from exploration to the course capstone.

AI Companion

The Docker workspace includes the DRL-ZH AI Companion, a VS Code extension built for this course. It knows which notebook and TODO you are working on, offers Socratic hints instead of spoilers, and supports text or voice mode. Bring your own LLM key: Gemini is the default, with OpenAI, Anthropic, and Groq supported too.

Quick Start

The recommended setup is Docker: it gives you code-server, the notebooks, Python >=3.13,<3.14, the Jupyter kernel, dependencies, and the AI Companion in one reproducible workspace.

1. Install Docker and Git, then clone this repository and `cd` into it.
2. On Linux/macOS, run `printf "UID=$(id -u)\nGID=$(id -g)\n" > .env` so files are owned by you.
3. Start the default environment:
   ```
   docker compose up --build -d
   ```
4. Open http://localhost:8080 in a Chromium-based browser and select the `Python (drl-zh)` kernel.
5. Open `00_Intro.ipynb` and start filling in TODOs.
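
Because `docker compose up --build -d` returns while the workspace may still be starting, it can help to wait until http://localhost:8080 responds before opening the browser. The sketch below is one way to do that with only the Python standard library. The helper names `is_up` and `wait_until_up` are hypothetical and not part of this repository, and the attempt count and delay are arbitrary defaults.

```python
import http.client
import time
import urllib.error
import urllib.request


def is_up(url: str, timeout: float = 2.0) -> bool:
    """Return True if an HTTP request to `url` gets any response at all."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a 2xx status; it is still up.
        return True
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, or a dropped connection: not ready yet.
        return False


def wait_until_up(url: str, attempts: int = 30, delay: float = 2.0, probe=is_up) -> bool:
    """Poll `url` up to `attempts` times, sleeping `delay` seconds between tries."""
    for _ in range(attempts):
        if probe(url):
            return True
        time.sleep(delay)
    return False
```

Calling `wait_until_up("http://localhost:8080")` from a Python shell returns `True` once the server answers, or `False` if it never responds within the attempts allowed. The `probe` parameter exists only so the polling loop can be exercised without a running container.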

For NVIDIA GPU access, use:

```
docker compose -f docker-compose.yml -f docker-compose.gpu.yml up --build -d
```

For a smaller CPU-only image, use:

```
docker compose -f docker-compose.yml -f docker-compose.cpu.yml up --build -d
```

Prefer a native setup? See MANUAL.md for Python, Poetry, VS Code, and Companion instructions.
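
For a native setup, a quick check that your interpreter falls in the same range the Docker workspace uses (Python >=3.13,<3.14) can save a failed install. This is a minimal sketch, assuming the native setup expects that same range; MANUAL.md remains the authority. The function name `python_version_ok` is hypothetical.

```python
import sys

# Version range quoted in Quick Start: Python >=3.13,<3.14.
MIN_VERSION = (3, 13)
MAX_VERSION_EXCLUSIVE = (3, 14)


def python_version_ok(version=None) -> bool:
    """Return True if the (major, minor) pair lies in [3.13, 3.14)."""
    if version is None:
        version = sys.version_info
    major_minor = tuple(version[:2])
    return MIN_VERSION <= major_minor < MAX_VERSION_EXCLUSIVE


# Report on the interpreter running this script.
current = ".".join(str(part) for part in sys.version_info[:3])
if python_version_ok():
    print(f"Python {current} is within >=3.13,<3.14")
else:
    print(f"Python {current} is outside >=3.13,<3.14")
```

Run it with the interpreter you plan to use for the course, since a system `python3` and a Poetry-managed environment may differ.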

Prerequisites

You should be comfortable with Python, PyTorch basics, and the usual math behind ML: probability, statistics, linear algebra, and derivatives. The notebooks teach the RL, but they assume you can read and modify real training code.

License

MIT. See LICENSE.

