An AI agent learning to play Contra (NES, 1988) using Rainbow DQN — combining 5 extensions of Deep Q-Network. The agent sees the game screen + game state features extracted from NES RAM, decides which buttons to press, and improves through thousands of attempts.
Note
Built from scratch with Claude Code, just for fun. Contra is one of the hardest NES games — after 5,000 episodes the agent learned to progress through Level 1, but beating it remains rare and inconsistent. Human-level play is an open challenge for reinforcement learning.
- Sees: 128x128 grayscale game frame with sprite overlays + 28 features extracted from RAM (4 stacked frames for motion detection)
- Decides: Which of 16 button combinations to press (right, jump, shoot, combinations)
- Learns from: Scroll progress, enemy kills, turret/boss damage, weapon upgrades, death penalties
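The hybrid observation can be sketched as follows. This minimal NumPy version assumes that frames arrive already converted to 128x128 grayscale and that the 28 RAM features are taken from the current step only. The real wrappers in `wrappers.py` may stack or normalise them differently. `HybridObservation` is a hypothetical name.

```python
from collections import deque

import numpy as np

N_STACK = 4        # frames stacked for motion detection
N_FEATURES = 28    # RAM-derived game-state features


class HybridObservation:
    """Hypothetical helper: last N_STACK frames plus the current RAM feature vector."""

    def __init__(self, n_stack: int = N_STACK) -> None:
        self.frames: deque = deque(maxlen=n_stack)

    def reset(self, frame: np.ndarray, features) -> tuple:
        # Fill the stack with copies of the first frame so the shape is fixed from step 0.
        self.frames.clear()
        for _ in range(self.frames.maxlen):
            self.frames.append(frame)
        return self._observe(features)

    def step(self, frame: np.ndarray, features) -> tuple:
        # Appending to a full deque drops the oldest frame automatically.
        self.frames.append(frame)
        return self._observe(features)

    def _observe(self, features) -> tuple:
        pixels = np.stack(self.frames, axis=0)           # shape (4, 128, 128)
        feats = np.asarray(features, dtype=np.float32)   # shape (28,)
        if feats.shape != (N_FEATURES,):
            raise ValueError(f"expected {N_FEATURES} features, got {feats.shape}")
        return pixels, feats
```

Keeping the pixel stack and the feature vector separate lets a network route them through different branches (a CNN for pixels, a small MLP for features) before merging.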
| Extension | Description | Flag |
|---|---|---|
| Double DQN | Online network selects action, target evaluates — reduces overestimation | always on |
| Prioritised Replay (PER) | Sample surprising transitions more often | PRIORITISED_REPLAY |
| Dueling DQN | Separate V(state) + A(action) streams — learn state value independently | DUELING_DQN |
| Noisy Nets | Learnable noise in weights — exploration without epsilon-greedy | NOISY_NETS |
| N-step Returns | Multi-step reward bootstrapping — faster credit assignment | N_STEP_RETURNS |
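The prioritised replay row can be made concrete with a minimal NumPy sketch of proportional PER. The function names and the `alpha`/`beta` defaults (0.6 and 0.4, common choices) are assumptions, not this project's settings. The sketch samples in O(N) for readability, whereas practical buffers, possibly including the one in `dqn.py`, usually use a sum-tree.

```python
import numpy as np


def per_sample(priorities, batch_size, alpha=0.6, beta=0.4, rng=None):
    """Proportional prioritised sampling with importance-sampling weights."""
    rng = np.random.default_rng() if rng is None else rng
    scaled = np.asarray(priorities, dtype=np.float64) ** alpha
    probs = scaled / scaled.sum()                      # P(i) = p_i^alpha / sum_k p_k^alpha
    n = len(probs)
    idx = rng.choice(n, size=batch_size, p=probs)
    weights = (n * probs[idx]) ** (-beta)              # w_i = (N * P(i))^-beta
    weights /= (n * probs.min()) ** (-beta)            # divide by the largest possible weight
    return idx, weights


def updated_priorities(td_errors, eps=1e-6):
    # New priority = |TD error| + eps, so no transition ends up with zero probability.
    return np.abs(np.asarray(td_errors, dtype=np.float64)) + eps
```

The weights scale each sample's loss, correcting for the bias introduced by sampling surprising transitions more often than uniform replay would.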
Plus: Huber loss (robust to outliers), gradient clipping, hybrid observation (pixels + game state features).
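Double DQN, N-step returns and the Huber loss combine into a single training target and loss. The sketch below works on one transition with NumPy arrays of Q-values. The function names are hypothetical, and it assumes that when the episode ends inside the n-step window the reward list is simply shorter and no bootstrap term is added. Gradient clipping would be applied afterwards by the framework (in PyTorch, `torch.nn.utils.clip_grad_norm_`).

```python
import numpy as np


def n_step_return(rewards, gamma):
    """Discounted sum r_0 + gamma*r_1 + ... + gamma^(n-1)*r_(n-1)."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def double_dqn_target(rewards, q_online_next, q_target_next, done, gamma=0.99):
    """N-step Double DQN target for one transition.

    rewards: the (up to) n rewards collected after taking the action.
    q_online_next / q_target_next: Q-values for the state n steps later.
    """
    g = n_step_return(rewards, gamma)
    if done:
        return g                                         # no bootstrap past a terminal state
    best = int(np.argmax(q_online_next))                 # online network selects the action
    return g + gamma ** len(rewards) * float(q_target_next[best])  # target network evaluates it


def huber(x, delta=1.0):
    """Quadratic for |x| <= delta, linear beyond, so large TD errors do not dominate."""
    ax = np.abs(x)
    return np.where(ax <= delta, 0.5 * ax ** 2, delta * (ax - 0.5 * delta))
```

With rewards `[1, 1, 1]`, `gamma=0.5`, online Q-values `[0, 2, 1]` and target Q-values `[10, 3, 20]`, the target is 1.75 + 0.125 * 3 = 2.125. A plain DQN target would instead bootstrap from the target network's own maximum (20), which is the overestimation Double DQN avoids. The TD error passed to `huber` is the target minus the online network's Q(s, a).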
Beyond The Rainbow (BTR) upgrades (Clark et al. 2024), all behind feature flags:
| Upgrade | Description | Flag |
|---|---|---|
| IMPALA CNN | 2× width ResNet-style CNN — more expressive backbone | IMPALA_CNN |
| Spectral Norm | Constrains Lipschitz constant of conv layers — stabilizes training | SPECTRAL_NORM |
| Munchausen RL | Soft DQN with log-policy bonus — encourages exploration | MUNCHAUSEN_RL |
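Munchausen RL is the least familiar of these upgrades. The following is a minimal sketch of a Munchausen-DQN target for one transition, following the original M-DQN formulation: the reward gets a bonus of alpha * tau * log pi(a|s), clipped to [l0, 0], and the bootstrap uses a soft (entropy-regularised) value of the next state, with pi = softmax(q / tau) taken from the target network. The defaults alpha = 0.9, tau = 0.03 and l0 = -1 are values commonly quoted for M-DQN. They are assumptions here, not settings read from this project's config, and how the project combines this with its other targets is not shown.

```python
import numpy as np


def log_softmax(x: np.ndarray) -> np.ndarray:
    # Numerically stable log-softmax: subtract the max before exponentiating.
    shifted = x - np.max(x)
    return shifted - np.log(np.sum(np.exp(shifted)))


def munchausen_target(q_target_s, action, reward, q_target_next, done,
                      gamma=0.99, alpha=0.9, tau=0.03, l0=-1.0):
    q_s = np.asarray(q_target_s, dtype=np.float64)
    # Scaled log-policy tau * log pi(a|s), clipped to [l0, 0], is the Munchausen bonus.
    tau_log_pi_s = tau * log_softmax(q_s / tau)
    bonus = alpha * np.clip(tau_log_pi_s[action], l0, 0.0)
    if done:
        return float(reward + bonus)
    q_next = np.asarray(q_target_next, dtype=np.float64)
    tau_log_pi_next = tau * log_softmax(q_next / tau)
    pi_next = np.exp(tau_log_pi_next / tau)
    # Soft value: expected Q under pi minus the entropy term tau * log pi.
    soft_value = np.sum(pi_next * (q_next - tau_log_pi_next))
    return float(reward + bonus + gamma * soft_value)
```

As tau shrinks, the softmax policy approaches greedy and the soft value approaches the plain maximum over next-state Q-values. The log-policy bonus is always non-positive, so it penalises actions the current policy already considers unlikely.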
Tip
Every extension can be toggled independently via .env flags. Try enabling/disabling them to see how each one affects training performance.
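Below is a minimal sketch of how such flags can be read. It assumes that the `.env` file has already been loaded into the process environment (for example with python-dotenv's `load_dotenv()`) and that flags are written as `true`/`false` or `1`/`0`. The flag names come from the tables above. The parsing rules and the default of `False` are assumptions, and `config/settings.py` may handle this differently.

```python
import os

# Flag names from the tables above; the parsing rules below are assumptions.
FLAG_NAMES = (
    "PRIORITISED_REPLAY", "DUELING_DQN", "NOISY_NETS", "N_STEP_RETURNS",
    "IMPALA_CNN", "SPECTRAL_NORM", "MUNCHAUSEN_RL",
)
TRUTHY = {"1", "true", "yes", "on"}


def env_flag(name, default=False, environ=None):
    """Read one boolean flag; unset flags fall back to `default`."""
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in TRUTHY


def load_flags(environ=None):
    return {name: env_flag(name, environ=environ) for name in FLAG_NAMES}
```

Passing an explicit mapping instead of `os.environ` makes it easy to build ablation configurations in a script, for example turning off one extension at a time.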
Enemy positions and bullets are read from NES RAM and drawn onto the frame as shape markers (14 enemy types, identified from the ROM disassembly):
- Rectangles — soldiers (running man, sniper, scuba diver, turret man)
- Triangles — turrets (rotating gun, red turret, wall cannon, boss turret)
- Circles — projectiles (enemy bullets, mortar shots)
- Diamonds — pickups (weapon box, flying capsule) and boss door
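The four marker shapes can be drawn onto a grayscale frame with a minimal NumPy sketch like the one below. The category names, marker radius and fill value are hypothetical. The real mapping of the 14 ROM enemy types to shapes lives in `contra_env.py` and is not reproduced here.

```python
import numpy as np

# Hypothetical category -> shape table mirroring the list above.
MARKER_SHAPES = {
    "soldier": "rect",
    "turret": "triangle",
    "projectile": "circle",
    "pickup": "diamond",
}


def draw_marker(frame, x, y, shape, radius=3, value=255):
    """Fill a small shape centred on pixel (x, y) in a 2-D grayscale frame, in place."""
    h, w = frame.shape
    ys, xs = np.ogrid[:h, :w]            # broadcastable row/column index grids
    dx, dy = xs - x, ys - y
    if shape == "rect":
        mask = (np.abs(dx) <= radius) & (np.abs(dy) <= radius)
    elif shape == "circle":
        mask = dx * dx + dy * dy <= radius * radius
    elif shape == "diamond":
        mask = np.abs(dx) + np.abs(dy) <= radius
    elif shape == "triangle":
        # Apex at the top, base at the bottom; half-width grows linearly downwards.
        mask = (np.abs(dy) <= radius) & (np.abs(dx) <= (dy + radius) / 2)
    else:
        raise ValueError(f"unknown shape: {shape}")
    frame[mask] = value


def draw_overlay(frame, markers):
    # markers: iterable of (category, x, y) tuples; returns a copy of the frame.
    out = frame.copy()
    for category, x, y in markers:
        draw_marker(out, x, y, MARKER_SHAPES[category])
    return out
```

Distinct shapes give the CNN a category cue that survives grayscale conversion and downscaling, where colour differences between sprites would be lost.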
| Signal | Value | Purpose |
|---|---|---|
| Map progress | `scroll_delta * PROGRESS_SCALE * speed_bonus` | Moving forward through the level |
| Enemy kill | `score_delta * 15` | Incentivize shooting |
| Turret/boss hit | +50 per hit | Reward damaging multi-HP enemies |
| Weapon upgrade | +100 per strength level | Pick up better weapons (Spread = +300) |
| Stagnation | -1 per step after 5 s idle | Prevent getting stuck |
| Death | -500 | Avoid enemies and bullets |
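The reward table can be read as a per-step reward function. This minimal sketch assumes that each signal is already available as a per-step delta. `RewardConfig`, `step_reward`, the `PROGRESS_SCALE` default of 1.0 and the 300-step idle limit (5 s at 60 steps per second, ignoring any frame skip) are all hypothetical, and `contra_env.py` may compute or clip these terms differently.

```python
from dataclasses import dataclass


@dataclass
class RewardConfig:
    progress_scale: float = 1.0     # hypothetical PROGRESS_SCALE value
    kill_scale: float = 15.0        # score_delta * 15
    hit_bonus: float = 50.0         # per turret/boss hit
    weapon_bonus: float = 100.0     # per weapon strength level gained
    idle_penalty: float = 1.0       # per step once idle too long
    idle_limit_steps: int = 300     # hypothetical: 5 s at 60 steps/s
    death_penalty: float = 500.0


def step_reward(cfg, scroll_delta, speed_bonus, score_delta, boss_hits,
                weapon_levels_gained, idle_steps, died):
    reward = scroll_delta * cfg.progress_scale * speed_bonus
    reward += score_delta * cfg.kill_scale
    reward += boss_hits * cfg.hit_bonus
    reward += max(weapon_levels_gained, 0) * cfg.weapon_bonus   # no penalty for downgrades here
    if idle_steps > cfg.idle_limit_steps:
        reward -= cfg.idle_penalty
    if died:
        reward -= cfg.death_penalty
    return reward
```

With these defaults, a Spread pickup counted as 3 strength levels gives +300, matching the table. A single death (-500) outweighs it, which pushes the agent to value survival over collecting upgrades.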
Each level has its own model, replay buffer, and statistics. Switch levels with Cmd/Ctrl+1-8. Level names from ROM:
| Key | Level |
|---|---|
| 1 | Jungle |
| 2 | Base 1 |
| 3 | Waterfall |
| 4 | Base 2 |
| 5 | Snow Field |
| 6 | Energy Zone |
| 7 | Hangar |
| 8 | Alien's Lair |
- Auto-rollback: If average reward drops 50% from peak, loads the best checkpoint (per level)
- Auto-save: Best model saved on new peak (per level)
- Practice mode: Save/load game state for targeted training on specific sections
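The auto-save and auto-rollback rule can be sketched in a few lines. This version assumes that "average reward" means a moving average over the last `window` episodes and that one monitor is kept per level. The class name, the window size and the choice to clear the window after a rollback are assumptions. A 50% drop is only meaningful when the peak is positive, so the sketch skips the rollback check otherwise.

```python
from collections import deque


class RollbackMonitor:
    """Hypothetical per-level tracker: signals 'save' on a new peak, 'rollback' on a 50% drop."""

    def __init__(self, window: int = 100, drop: float = 0.5) -> None:
        self.rewards: deque = deque(maxlen=window)
        self.drop = drop
        self.peak = None

    def update(self, episode_reward: float):
        self.rewards.append(episode_reward)
        if len(self.rewards) < self.rewards.maxlen:
            return None                       # not enough episodes for a stable average yet
        avg = sum(self.rewards) / len(self.rewards)
        if self.peak is None or avg > self.peak:
            self.peak = avg
            return "save"                     # caller writes the best checkpoint for this level
        if self.peak > 0 and avg < self.peak * (1 - self.drop):
            self.rewards.clear()              # start a fresh window after reloading
            return "rollback"                 # caller reloads the best checkpoint for this level
        return None


monitors = {level: RollbackMonitor() for level in range(1, 9)}   # one per level
```

Clearing the window after a rollback prevents the same stale average from triggering repeated reloads before the restored model has played any new episodes.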
| Component | Technology |
|---|---|
| NES Emulator | cynes (Rust, ARM64 + x86_64) |
| RL Algorithm | Rainbow DQN (PyTorch) |
| GPU | Apple Silicon MPS / NVIDIA CUDA / CPU |
| Dashboard | FastAPI + WebSocket + Chart.js + Tippy.js |
| ROM Analysis | Contra NES Disassembly |
Note
cynes is a headless emulator — no audio output. Use Watch Mode (FCEUX) to hear the game.
Real-time web dashboard at http://localhost:41918:
- Live game preview — click to swap main/agent view
- Overview — episodes, timesteps, FPS, buffer, RAM usage
- Live tab — rewards breakdown, events feed, agent view (enemies/bullets/weapon), features, actions, Q-values
- Leaderboard — top runs per level
- Levels — switch levels, per-level stats
- Config — all hyperparameters with tooltips
- Reward History — chart with toggleable datasets (reward, avg survival, boss reach %), crosshair on hover
- Level Progress — death heatmap, practice marker, PB line
- Keyboard: Space (pause), Arrow Right (step), Cmd/Ctrl+1-8 (switch level)
Watch the agent play with real NES audio:
```bash
brew install fceux
./scripts/watch.sh
```
FCEUX sends the screen pixels and RAM to Python, where the agent applies the sprite overlay and runs inference. The NES 2C02 palette is matched between the two emulators.
Warning
The agent plays worse in FCEUX than in training:
- Input latency: file-based communication adds a delay of about 2 frames, and in Contra 2 frames can decide between dodging a bullet and dying.
- Decision instability: early models assign nearly identical scores to different actions, so small pixel differences between emulators can flip the chosen action entirely.
The web dashboard shows the true agent performance.
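The decision-instability point comes down to argmax margins. If the top two Q-values differ by less than the change that a slightly different frame induces, the greedy action flips. The following toy illustration uses made-up Q-values, not values from a trained model.

```python
import numpy as np


def action_margin(q: np.ndarray) -> float:
    # Gap between the best and second-best Q-value; a small gap means a fragile choice.
    top2 = np.sort(q)[-2:]
    return float(top2[1] - top2[0])


q = np.array([12.031, 12.029, 11.5])           # made-up values: top two nearly tied
nudge = np.array([-0.002, 0.001, 0.0])         # tiny shift, e.g. from slightly different pixels
print(action_margin(q))                         # about 0.002
print(np.argmax(q), np.argmax(q + nudge))       # greedy action changes from 0 to 1
```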
```bash
python3.11 -m venv .venv
source .venv/bin/activate
pip install -e .
cp /path/to/contra.nes roms/contra.nes
cp .env.example .env   # edit DEVICE=mps for Apple Silicon
./start-fresh.sh       # fresh training
./start.sh             # resume from checkpoint
```

```
contra-rl-dqn/
├── contra/
│   ├── env/
│   │   ├── contra_env.py     # NES environment, reward, sprite overlay, level switching
│   │   └── wrappers.py       # Grayscale, resize, frame stack, stream capture
│   ├── training/
│   │   ├── dqn.py            # Rainbow DQN: Dueling, Noisy, N-step, PER, Double
│   │   └── callbacks.py      # Frame buffer, per-level best run recording
│   ├── web/
│   │   ├── server.py         # FastAPI dashboard, WebSocket, level/model APIs
│   │   └── static/           # HTML, CSS, JS (Tippy.js tooltips, Chart.js)
│   └── stats/
│       └── tracker.py        # Per-level stats, persistence, death heatmap
├── config/settings.py        # All settings with feature flags
├── scripts/
│   ├── run.py                # Main entry point
│   ├── watch.py              # FCEUX watch mode
│   ├── watch.sh              # One-command launcher
│   ├── fceux_agent.lua       # FCEUX Lua bridge
│   └── ram_monitor.py        # RAM debugging (play in FCEUX, monitor changes)
├── docs/                     # Plans, narration script
├── roms/                     # ROM + NES palette (gitignored)
└── checkpoints/              # Model checkpoints per level (gitignored)
```
Educational and research purposes. Contra is a trademark of Konami. You must provide your own legally obtained ROM.