stmn/contra-rl-dqn

Contra RL — Rainbow DQN

An AI agent learning to play Contra (NES, 1988) using Rainbow DQN, here combining five of the six Rainbow extensions to Deep Q-Network. The agent sees the game screen plus game-state features extracted from NES RAM, decides which buttons to press, and improves through thousands of attempts.

Note

Built from scratch with Claude Code, just for fun. Contra is one of the hardest NES games — after 5,000 episodes the agent learned to progress through Level 1, but beating it remains rare and inconsistent. Human-level play is an open challenge for reinforcement learning.

How It Works

Architecture

The Agent

  • Sees: 4 stacked 128x128 grayscale game frames with sprite overlays (stacking lets the network detect motion) + 28 features extracted from RAM
  • Decides: Which of 16 button combinations to press (right, jump, shoot, combinations)
  • Learns from: Scroll progress, enemy kills, turret/boss damage, weapon upgrades, death penalties
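
As a rough illustration of the 4-frame stacking mentioned above, here is a minimal sketch using only the standard library. The `FrameStack` class and its method names are hypothetical and are not taken from `wrappers.py`; placeholder strings stand in for the 128x128 frames.

```python
from collections import deque

STACK_SIZE = 4  # the README states 4 stacked frames


class FrameStack:
    """Keeps the most recent frames, oldest first (hypothetical helper)."""

    def __init__(self, size=STACK_SIZE):
        self.frames = deque(maxlen=size)

    def reset(self, first_frame):
        # Fill the stack with the first frame so the observation
        # has a fixed length from the very first step.
        self.frames.clear()
        for _ in range(self.frames.maxlen):
            self.frames.append(first_frame)
        return list(self.frames)

    def push(self, frame):
        # Appending to a full deque drops the oldest frame automatically.
        self.frames.append(frame)
        return list(self.frames)


stack = FrameStack()
stack.reset("f0")        # placeholder strings stand in for image arrays
obs = stack.push("f1")
print(obs)
```

Because consecutive frames sit side by side in one observation, the network can infer the direction and speed of bullets and enemies from the differences between them.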

Algorithm — Rainbow DQN (5/6) + BTR

| Extension | Description | Flag |
|---|---|---|
| Double DQN | Online network selects action, target evaluates; reduces overestimation | always on |
| Prioritised Replay (PER) | Sample surprising transitions more often | PRIORITISED_REPLAY |
| Dueling DQN | Separate V(state) + A(action) streams; learn state value independently | DUELING_DQN |
| Noisy Nets | Learnable noise in weights; exploration without epsilon-greedy | NOISY_NETS |
| N-step Returns | Multi-step reward bootstrapping; faster credit assignment | N_STEP_RETURNS |
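
To make the Double DQN and N-step rows concrete, the sketch below computes a learning target for a single transition in plain Python. It follows the textbook formulation rather than the code in `dqn.py`; the function names and the example numbers are illustrative assumptions.

```python
def n_step_return(rewards, gamma):
    """Discounted sum of the collected rewards: r_0 + gamma*r_1 + ..."""
    total = 0.0
    for k, r in enumerate(rewards):
        total += (gamma ** k) * r
    return total


def double_dqn_target(rewards, gamma, q_online_next, q_target_next, done):
    """n-step Double DQN target for one transition (illustrative sketch)."""
    g = n_step_return(rewards, gamma)
    if done:
        # No bootstrapping past the end of an episode.
        return g
    # The online network picks the action; the target network evaluates it.
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return g + (gamma ** len(rewards)) * q_target_next[best_action]


target = double_dqn_target(
    rewards=[1.0, 0.0, 2.0],      # three steps of reward (n = 3)
    gamma=0.5,                    # chosen so the arithmetic is easy to follow
    q_online_next=[0.1, 0.9],     # online net prefers action 1
    q_target_next=[4.0, 8.0],     # target net values action 1 at 8.0
    done=False,
)
print(target)  # 1.0 + 0.25*2.0 + 0.125*8.0 = 2.5
```

Splitting action selection from action evaluation is what reduces overestimation: a single network that both picks and scores the maximum tends to favour actions whose values are inflated by noise.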

Plus: Huber loss (robust to outliers), gradient clipping, hybrid observation (pixels + game state features).
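
For readers unfamiliar with the two stabilisers named above, here is a small standard-library sketch of Huber loss and gradient clipping by global norm. In the project these would normally come from PyTorch; the threshold values below are illustrative assumptions, not the repository's settings.

```python
import math


def huber_loss(td_error, delta=1.0):
    """Quadratic near zero, linear for large errors (robust to outliers)."""
    a = abs(td_error)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)


def clip_by_global_norm(grads, max_norm):
    """Scale the whole gradient vector down if its L2 norm exceeds max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm or norm == 0.0:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]


print(huber_loss(0.5))   # small error: 0.5 * 0.5**2 = 0.125
print(huber_loss(3.0))   # large error: 1.0 * (3.0 - 0.5) = 2.5
clipped = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
```

A squared loss would score the error of 3.0 as 4.5 instead of 2.5, so a single large reward such as the -500 death penalty pulls less hard on the weights under Huber loss.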

Beyond The Rainbow (BTR) upgrades (Clark et al. 2024), all behind feature flags:

| Upgrade | Description | Flag |
|---|---|---|
| IMPALA CNN | 2× width ResNet-style CNN; more expressive backbone | IMPALA_CNN |
| Spectral Norm | Constrains Lipschitz constant of conv layers; stabilizes training | SPECTRAL_NORM |
| Munchausen RL | Soft DQN with log-policy bonus; encourages exploration | MUNCHAUSEN_RL |
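
The "log-policy bonus" in the Munchausen RL row can be sketched as follows. This is the general form of the Munchausen reward augmentation, written in plain Python; the parameter values (`alpha`, `tau`, `clip_min`) are illustrative assumptions and may differ from what this repository uses.

```python
import math


def log_softmax(q_values, tau):
    """Log of the softmax policy pi = softmax(q / tau), computed stably."""
    scaled = [q / tau for q in q_values]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_z for s in scaled]


def munchausen_reward(reward, q_values, action,
                      alpha=0.9, tau=0.03, clip_min=-1.0):
    """Reward plus a scaled, clipped log-probability of the action taken."""
    log_pi = log_softmax(q_values, tau)[action]
    # The bonus is never positive: confident actions get about 0,
    # unlikely actions get a penalty bounded below by clip_min.
    bonus = max(clip_min, min(0.0, tau * log_pi))
    return reward + alpha * bonus


likely = munchausen_reward(1.0, q_values=[1.0, 0.0], action=0)
unlikely = munchausen_reward(1.0, q_values=[1.0, 0.0], action=1)
print(likely, unlikely)
```

The clipping matters because the log-probability of a very unlikely action is unbounded below; without it a single rare action could dominate the target.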

Tip

Every extension can be toggled independently via .env flags. Try enabling/disabling them to see how each one affects training performance.
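
As a sketch of how such flags might be read, the snippet below parses boolean flags from an environment mapping. The flag names come from the tables above; the helper names and the accepted spellings for "true" are assumptions, and `config/settings.py` may parse them differently.

```python
import os

FLAG_NAMES = [
    "PRIORITISED_REPLAY", "DUELING_DQN", "NOISY_NETS", "N_STEP_RETURNS",
    "IMPALA_CNN", "SPECTRAL_NORM", "MUNCHAUSEN_RL",
]

TRUTHY = {"1", "true", "yes", "on"}  # assumed spellings, not from the repo


def read_flag(name, environ=os.environ, default=False):
    """Return True/False for one flag, falling back to default if unset."""
    raw = environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in TRUTHY


def load_flags(environ=os.environ):
    """Read every known feature flag into a dict."""
    return {name: read_flag(name, environ) for name in FLAG_NAMES}


# Example with an explicit mapping instead of the real process environment.
flags = load_flags({"DUELING_DQN": "true", "NOISY_NETS": "0"})
print(flags["DUELING_DQN"], flags["NOISY_NETS"], flags["IMPALA_CNN"])
```

Passing the mapping as a parameter keeps the parser easy to test, and it makes ablation runs (one flag flipped at a time) simple to script.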

Sprite Overlay

Enemy positions and bullets are read from NES RAM and drawn as shape markers (14 enemy types from ROM disassembly):

  • Rectangles — soldiers (running man, sniper, scuba diver, turret man)
  • Triangles — turrets (rotating gun, red turret, wall cannon, boss turret)
  • Circles — projectiles (enemy bullets, mortar shots)
  • Diamonds — pickups (weapon box, flying capsule) and boss door
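
The grouping above amounts to a lookup from object type to marker shape. A minimal sketch, using only the names listed in this README; the dictionary layout and the `marker_for` helper are hypothetical, and the real code keys on RAM type IDs rather than strings.

```python
MARKER_BY_GROUP = {
    "soldier": "rectangle",
    "turret": "triangle",
    "projectile": "circle",
    "pickup_or_door": "diamond",
}

GROUP_BY_OBJECT = {
    "running man": "soldier", "sniper": "soldier",
    "scuba diver": "soldier", "turret man": "soldier",
    "rotating gun": "turret", "red turret": "turret",
    "wall cannon": "turret", "boss turret": "turret",
    "enemy bullet": "projectile", "mortar shot": "projectile",
    "weapon box": "pickup_or_door", "flying capsule": "pickup_or_door",
    "boss door": "pickup_or_door",
}


def marker_for(object_name):
    """Return the overlay shape for a named object from the list above."""
    return MARKER_BY_GROUP[GROUP_BY_OBJECT[object_name]]


print(marker_for("sniper"), marker_for("mortar shot"), marker_for("boss door"))
```

Using a few distinct shapes rather than one marker per object type keeps the overlay legible at 128x128 while still telling the network which objects shoot, which can be destroyed, and which should be collected.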

Reward System

| Signal | Value | Purpose |
|---|---|---|
| Map progress | scroll_delta * PROGRESS_SCALE * speed_bonus | Moving forward through the level |
| Enemy kill | score_delta * 15 | Incentivize shooting |
| Turret/boss hit | +50 per hit | Reward damaging multi-HP enemies |
| Weapon upgrade | +100 per strength level | Pick up better weapons (Spread = +300) |
| Stagnation | -1 per step after 5s idle | Prevent getting stuck |
| Death | -500 | Avoid enemies and bullets |
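
Put together, the signals in the table can be summed per step roughly as follows. This is a sketch under stated assumptions: the value of `PROGRESS_SCALE`, the way `speed_bonus` is computed, and the function signature are all hypothetical, and `contra_env.py` may combine the terms differently.

```python
PROGRESS_SCALE = 1.0  # hypothetical value; the real one lives in the settings


def step_reward(scroll_delta, speed_bonus, score_delta, turret_hits,
                weapon_levels_gained, idle_seconds, died):
    """Sum the reward signals from the table for a single step."""
    reward = scroll_delta * PROGRESS_SCALE * speed_bonus  # map progress
    reward += score_delta * 15                            # enemy kills
    reward += 50 * turret_hits                            # turret/boss damage
    reward += 100 * weapon_levels_gained                  # weapon upgrade
    if idle_seconds > 5:
        reward -= 1                                       # stagnation
    if died:
        reward -= 500                                     # death penalty
    return reward


# Moving 2 px, one kill point, one turret hit, picking up Spread (3 levels).
good_step = step_reward(2, 1.0, 1, 1, 3, idle_seconds=0, died=False)
# Standing still for 6 s and then dying.
bad_step = step_reward(0, 1.0, 0, 0, 0, idle_seconds=6, died=True)
print(good_step, bad_step)
```

The death penalty is much larger than any single positive signal, which is one reason a loss that is robust to outliers helps here.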

Per-Level Models

Each level has its own model, replay buffer, and statistics. Switch levels with Cmd/Ctrl+1-8. Level names from ROM:

| Key | Level |
|---|---|
| 1 | Jungle |
| 2 | Base 1 |
| 3 | Waterfall |
| 4 | Base 2 |
| 5 | Snow Field |
| 6 | Energy Zone |
| 7 | Hangar |
| 8 | Alien's Lair |

Stability

  • Auto-rollback: If average reward drops 50% from peak, loads the best checkpoint (per level)
  • Auto-save: Best model saved on new peak (per level)
  • Practice mode: Save/load game state for targeted training on specific sections
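
The auto-save and auto-rollback rules can be sketched as a small state machine. The class below is a hypothetical illustration, not the project's implementation; in particular it assumes the peak average reward is positive, since "drops 50% from peak" is ambiguous otherwise.

```python
class RollbackMonitor:
    """Tracks the peak average reward and decides what to do next."""

    def __init__(self, drop_fraction=0.5):
        self.drop_fraction = drop_fraction  # 0.5 = the 50% drop in the README
        self.peak = None

    def update(self, avg_reward):
        """Return 'save', 'rollback', or 'continue' for the new average."""
        if self.peak is None or avg_reward > self.peak:
            # New best: remember it and save the model.
            self.peak = avg_reward
            return "save"
        if self.peak > 0 and avg_reward <= self.peak * (1 - self.drop_fraction):
            # Fell to half of the peak or lower: reload the best checkpoint.
            return "rollback"
        return "continue"


monitor = RollbackMonitor()
decisions = [monitor.update(r) for r in (100.0, 80.0, 50.0, 120.0)]
print(decisions)
```

With per-level models, one such monitor per level would keep a collapse on one level from triggering a rollback on another.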

Tech Stack

| Component | Technology |
|---|---|
| NES Emulator | cynes (Rust, ARM64 + x86_64) |
| RL Algorithm | Rainbow DQN (PyTorch) |
| GPU | Apple Silicon MPS / NVIDIA CUDA / CPU |
| Dashboard | FastAPI + WebSocket + Chart.js + Tippy.js |
| ROM Analysis | Contra NES Disassembly |

Note

cynes is a headless emulator — no audio output. Use Watch Mode (FCEUX) to hear the game.

Dashboard

Dashboard Tour (YouTube)

Real-time web dashboard at http://localhost:41918:

  • Live game preview — click to swap main/agent view
  • Overview — episodes, timesteps, FPS, buffer, RAM usage
  • Live tab — rewards breakdown, events feed, agent view (enemies/bullets/weapon), features, actions, Q-values
  • Leaderboard — top runs per level
  • Levels — switch levels, per-level stats
  • Config — all hyperparameters with tooltips
  • Reward History — chart with toggleable datasets (reward, avg survival, boss reach %), crosshair on hover
  • Level Progress — death heatmap, practice marker, PB line
  • Keyboard: Space (pause), Arrow Right (step), Cmd/Ctrl+1-8 (switch level)

Watch Mode (FCEUX)

Watch the agent play with real NES audio:

brew install fceux
./scripts/watch.sh

FCEUX sends screen pixels and RAM to Python, where the agent applies the sprite overlay and runs inference. The NES 2C02 palette is matched between the two emulators.

Warning

The agent plays worse in FCEUX than in training:

  • Input latency: file-based communication adds a delay of about 2 frames. In Contra, 2 frames can decide between dodging a bullet and dying.
  • Decision instability: early models assign nearly identical scores to different actions, so small pixel differences between emulators can flip the chosen action entirely.

The web dashboard shows the true agent performance.
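
The decision-instability point can be illustrated with a toy example: when Q-values are nearly tied, a tiny perturbation changes the greedy action. The numbers below are made up for illustration and are not measurements from the agent.

```python
def greedy_action(q_values):
    """Index of the highest Q-value (ties go to the lowest index)."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


def action_gap(q_values):
    """Difference between the best and second-best Q-value."""
    ordered = sorted(q_values, reverse=True)
    return ordered[0] - ordered[1]


q_training = [1.000, 1.002, 0.998]       # nearly identical scores
pixel_noise = [0.004, -0.001, 0.000]     # small shift from emulator differences
q_watch = [q + n for q, n in zip(q_training, pixel_noise)]

print(greedy_action(q_training))  # 1
print(greedy_action(q_watch))     # 0
print(action_gap(q_training))
```

As training widens the gap between the best and second-best action, the same perturbation stops being enough to change the choice.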

Quick Start

python3.11 -m venv .venv
source .venv/bin/activate
pip install -e .

cp /path/to/contra.nes roms/contra.nes
cp .env.example .env  # edit DEVICE=mps for Apple Silicon

./start-fresh.sh      # fresh training
./start.sh            # resume from checkpoint

Project Structure

contra-rl-dqn/
├── contra/
│   ├── env/
│   │   ├── contra_env.py     # NES environment, reward, sprite overlay, level switching
│   │   └── wrappers.py       # Grayscale, resize, frame stack, stream capture
│   ├── training/
│   │   ├── dqn.py            # Rainbow DQN: Dueling, Noisy, N-step, PER, Double
│   │   └── callbacks.py      # Frame buffer, per-level best run recording
│   ├── web/
│   │   ├── server.py         # FastAPI dashboard, WebSocket, level/model APIs
│   │   └── static/           # HTML, CSS, JS (Tippy.js tooltips, Chart.js)
│   └── stats/
│       └── tracker.py        # Per-level stats, persistence, death heatmap
├── config/settings.py        # All settings with feature flags
├── scripts/
│   ├── run.py                # Main entry point
│   ├── watch.py              # FCEUX watch mode
│   ├── watch.sh              # One-command launcher
│   ├── fceux_agent.lua       # FCEUX Lua bridge
│   └── ram_monitor.py        # RAM debugging (play in FCEUX, monitor changes)
├── docs/                     # Plans, narration script
├── roms/                     # ROM + NES palette (gitignored)
└── checkpoints/              # Model checkpoints per level (gitignored)

License

Intended for educational and research purposes. Contra is a trademark of Konami. You must provide your own legally obtained ROM.

About

RL agent learning to play Contra (NES) using DQN with experience replay, sprite overlay from RAM, and a real-time web dashboard. Built with PyTorch + cynes
