xaskasdf/ps2-llm

PS2 LLM Demo

Running a large language model on a PlayStation 2.

This project started as an experiment born from two passions: retrogaming and LLMs. After building a complete PS2 SDK from scratch (including tools that had to be rewritten because of incompatibilities with modern software and hardware) and working extensively with language models, I saw a team run an LLM on a Windows 98 PC and asked a simple question: "Can I run this on a 26-year-old game console?"

The answer is yes. The PS2's Emotion Engine (MIPS-III @ 294 MHz, 32 MB RAM) can run transformer inference by streaming model weights from CD-ROM one matrix at a time, keeping only activations and KV cache in memory. The current default model is brandon-tiny-10m-instruct, a custom 10M-parameter architecture running at Q8 precision.

Website: naranjositos.tech

[Image: PS2 LLM Demo running on PlayStation 2]

How It Works

The PS2 has 32 MB of RAM total. Model weights don't need to fit in memory -- the inference engine streams them from CD-ROM one matrix at a time during the forward pass. Only activations, KV cache, token embeddings, and RMS norms stay in RAM.
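To make the resident-memory idea concrete, here is a rough budget estimator for the state that stays in RAM. It is only a sketch: the `ModelDims` fields, the example dimensions, and the byte widths (float32 KV cache, 1-byte embeddings) are assumptions for illustration, not values taken from the engine or from any model in this repo.

```python
from dataclasses import dataclass

PS2_RAM_BYTES = 32 * 1024 * 1024  # 32 MB of main RAM, as stated above


@dataclass
class ModelDims:
    # Hypothetical dimensions, used only for illustration.
    dim: int          # hidden size
    hidden_dim: int   # feed-forward inner size
    n_layers: int
    n_kv_heads: int
    head_dim: int
    vocab_size: int
    seq_len: int


def resident_bytes(d, kv_bytes=4, emb_bytes=1):
    """Estimate the bytes held in RAM for the whole run (never streamed)."""
    kv_dim = d.n_kv_heads * d.head_dim
    return {
        # One K and one V vector per layer per position.
        "kv_cache": 2 * d.n_layers * d.seq_len * kv_dim * kv_bytes,
        # Token embedding table (assumed emb_bytes per weight).
        "embeddings": d.vocab_size * d.dim * emb_bytes,
        # Two RMS norm gain vectors per layer plus a final one, float32.
        "rms_norms": (2 * d.n_layers + 1) * d.dim * 4,
        # A few float32 scratch buffers for the current token (assumed count).
        "activations": (4 * d.dim + 2 * d.hidden_dim + d.vocab_size) * 4,
    }


def fits_in_ram(d, reserved_bytes=0):
    """True if the resident state plus a reserved amount fits in 32 MB."""
    return sum(resident_bytes(d).values()) + reserved_bytes <= PS2_RAM_BYTES


example = ModelDims(dim=256, hidden_dim=768, n_layers=8, n_kv_heads=4,
                    head_dim=64, vocab_size=8192, seq_len=512)
```

With these made-up dimensions the KV cache dominates at 8 MB, which shows why context length matters more for RAM than total parameter count once weights are streamed.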

This means models much larger than 32 MB can run on the console. A 77 MB model works -- it just reads more from CD for every token, which makes generation slower. See MODELS.md for details on all models tested.
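The streaming idea can be sketched in a few lines. The example below simulates the disc with an in-memory byte stream and uses a made-up layout (a `rows, cols` header followed by row-major float32 values); the real engine is written in C and reads quantized PSNT data, so treat every name and the file layout here as hypothetical.

```python
import io
import struct


def write_matrix(stream, rows, cols, values):
    """Append one row-major float32 matrix with a small (rows, cols) header."""
    stream.write(struct.pack("<II", rows, cols))
    stream.write(struct.pack(f"<{rows * cols}f", *values))


def stream_matvec(stream, x):
    """Read the next matrix from the stream and return W @ x.

    The weights exist only inside this call, so at most one matrix is
    resident at any time.
    """
    rows, cols = struct.unpack("<II", stream.read(8))
    if cols != len(x):
        raise ValueError("matrix width must match activation length")
    weights = struct.unpack(f"<{rows * cols}f", stream.read(4 * rows * cols))
    return [
        sum(weights[r * cols + c] * x[c] for c in range(cols))
        for r in range(rows)
    ]


def forward(stream, x, n_matrices):
    """Chain matrix-vector products, reading each matrix just before use."""
    for _ in range(n_matrices):
        x = stream_matvec(stream, x)
    return x


def make_demo_disc():
    """Build a tiny two-matrix 'disc image' in memory for the demo."""
    disc = io.BytesIO()
    write_matrix(disc, 3, 2, [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
    write_matrix(disc, 1, 3, [1.0, 0.0, -1.0])
    disc.seek(0)
    return disc
```

Calling `forward(make_demo_disc(), [1.0, 1.0], 2)` consumes both matrices in order. Peak memory depends on the largest single matrix, not on the total model size.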

Models

Several models were tested during development. The current default is brandon-tiny-10m-instruct (Q8, ~10.4 MB), chosen for its balance of speed and coherence on PS2 hardware.

| Model | HuggingFace | Quant | PSNT Size | Status |
| --- | --- | --- | --- | --- |
| brandon-tiny-10m-instruct | xaskasdf/brandon-tiny-10m-instruct | Q8 | ~10.4 MB | Current default |
| SmolLM2-135M-Instruct | HuggingFaceTB/SmolLM2-135M-Instruct | Q4 | ~77 MB | Tested (too slow, 30 layers) |
| TinyLlama 110M | karpathy/tinyllamas | Q4 | ~30 MB | Tested |
| TinyLlama 110M | karpathy/tinyllamas | Ternary | ~27 MB | Tested (poor quality) |
| stories260K | karpathy/tinyllamas | float32 | ~1 MB | Early testing only |

See MODELS.md for detailed specs, conversion pipelines, and guidance on adding new models.

Build

Requires the ps2_biw_engine SDK at ../ps2_biw_engine.

./build.sh

Output: build/ps2_llm_demo.elf (ELF) and build/ps2_llm_demo.iso (bootable CD image).

Model Conversion

Requires Python with numpy and torch:

# Brandon model (custom architecture): safetensors -> PSNT v3 Q8
python3 tools/brandon_to_psnt.py --quant q8 \
    third_party/brandon-tiny/model.safetensors \
    cd_rom/DATA/LLM/brandon-q8.psnt

# Standard HuggingFace models: HF -> llama2.c float32 -> PSNT
python3 tools/hf_to_llama2c.py third_party/model-dir/ model.bin
python3 tools/q4_quantize.py model.bin model.psnt          # Q4
python3 tools/psnt_quantize.py model.bin model.psnt         # Ternary

# SentencePiece tokenizer -> binary
python3 tools/sp_tokenizer_to_llama2c.py \
    third_party/brandon-tiny/tokenizer.model \
    cd_rom/DATA/LLM/tok8k.bin
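When debugging the intermediate `model.bin`, it can help to inspect its header. The sketch below assumes the file follows the legacy llama2.c checkpoint layout (seven int32 config fields, with a negative vocabulary size signalling an unshared output classifier). Whether `tools/hf_to_llama2c.py` writes exactly this layout is an assumption, and `Llama2cConfig` and `parse_header` are illustrative names, not part of the repo.

```python
import struct
from typing import NamedTuple

# Seven little-endian int32 fields, 28 bytes in total.
HEADER = struct.Struct("<7i")


class Llama2cConfig(NamedTuple):
    dim: int
    hidden_dim: int
    n_layers: int
    n_heads: int
    n_kv_heads: int
    vocab_size: int
    seq_len: int
    shared_classifier: bool


def parse_header(data):
    """Decode the first 28 bytes of a llama2.c-style float32 checkpoint."""
    if len(data) < HEADER.size:
        raise ValueError("need at least 28 bytes of header")
    dim, hidden, layers, heads, kv_heads, vocab, seq_len = HEADER.unpack(
        data[:HEADER.size]
    )
    # A negative vocab size marks an unshared classifier in llama2.c.
    return Llama2cConfig(dim, hidden, layers, heads, kv_heads,
                         abs(vocab), seq_len, vocab > 0)


def read_header(path):
    """Convenience wrapper: parse the header of a checkpoint on disk."""
    with open(path, "rb") as f:
        return parse_header(f.read(HEADER.size))
```

For example, `read_header("model.bin").n_layers` gives a quick check that the conversion picked up the expected layer count before quantizing.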

Model Format

Weights use the PSNT (PS Net) binary format, a compact quantized format designed for the PS2's constraints. It supports ternary (2-bit), Q4 (4-bit), and Q8 (8-bit) quantization. See PSNT.md for the full specification.
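For readers new to these quantization levels, the following sketch shows generic symmetric per-group Q8 quantization and a simple way to pack ternary values at 2 bits each. It is not the PSNT layout: the group size, the scale handling, and the bit encoding are assumptions chosen for illustration, and PSNT.md remains the authoritative description.

```python
def quantize_q8(weights, group_size=32):
    """Symmetric Q8: one float scale per group, values stored as int8."""
    q, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        scales.append(scale)
        # Round to the nearest step and clamp to the int8 range used here.
        q.extend(max(-127, min(127, round(w / scale))) for w in group)
    return q, scales


def dequantize_q8(q, scales, group_size=32):
    """Recover approximate floats from int8 values and group scales."""
    return [v * scales[i // group_size] for i, v in enumerate(q)]


def pack_ternary(values):
    """Pack values from {-1, 0, 1} four to a byte (hypothetical encoding)."""
    packed = bytearray((len(values) + 3) // 4)
    for i, t in enumerate(values):
        if t not in (-1, 0, 1):
            raise ValueError("ternary values must be -1, 0, or 1")
        # Store t + 1 (0, 1, or 2) in a 2-bit slot, lowest slot first.
        packed[i // 4] |= (t + 1) << (2 * (i % 4))
    return bytes(packed)


def unpack_ternary(packed, count):
    """Inverse of pack_ternary for the first `count` values."""
    return [((packed[i // 4] >> (2 * (i % 4))) & 0b11) - 1
            for i in range(count)]
```

In this scheme the Q8 round-trip error per weight is at most half a quantization step (`scale / 2`), while ternary packing stores four weights per byte but keeps only the sign of each one.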

Project Structure

  • game_main.c -- PS2 entry point, UI, controller input, CD-ROM streaming
  • llama_ps2.c -- Self-contained LLM inference engine (included inline by game_main.c)
  • game_scene.c -- Engine scene callback stubs
  • tools/ -- Python conversion and verification scripts
  • cd_rom/ -- Runtime data (models, tokenizers, IOP modules) burned to CD image
  • PSNT.md -- Model format specification
  • MODELS.md -- Model support details and history

License

See individual model licenses on their HuggingFace pages.
