A from-scratch implementation of a Tiny Recursive Model (TRM) in Zig.
This project is a clean-room implementation inspired by nano-trm (and the paper Tiny Recursive Models), trading the easy prototyping of PyTorch for an explicit, typed, and performant approach.
While standard Large Language Models (LLMs) rely on the Transformer architecture, carpathian implements a Recursive Model.
- State Management:
  - Transformers: Maintain a KV-cache that grows with sequence length. Attention is computed over the entire history (or a sliding window). Memory is O(N).
  - TRM: Maintains a fixed-size compressed state vector. The model updates this state recursively with every new token. Memory is O(1) during inference.
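To make the O(N) versus O(1) contrast concrete, here is a minimal sketch of the arithmetic in Zig. The function names, parameters, and the cache formula (one key and one value vector per layer per token) are illustrative assumptions and are not taken from carpathian's source.

```zig
const std = @import("std");

/// Bytes held by a Transformer KV-cache after `seq_len` tokens.
/// Assumes one key and one value vector of width `d_model` per layer per token.
/// All parameters are hypothetical and only serve the illustration.
fn kvCacheBytes(seq_len: usize, n_layers: usize, d_model: usize, bytes_per_elem: usize) usize {
    return 2 * n_layers * seq_len * d_model * bytes_per_elem;
}

/// Bytes held by a fixed-size recurrent state.
/// The sequence length does not appear in the formula at all.
fn recurrentStateBytes(d_state: usize, bytes_per_elem: usize) usize {
    return d_state * bytes_per_elem;
}

test "cache grows linearly with sequence length" {
    const short = kvCacheBytes(128, 4, 256, 4);
    const long = kvCacheBytes(1024, 4, 256, 4);
    // 8x the tokens means 8x the cache.
    try std.testing.expectEqual(short * 8, long);
}
```

Saved to a file, the sketch can be checked with `zig test`. The point is only the shape of the two formulas: the cache size scales with the number of tokens seen, while the recurrent state costs the same after one token or one million.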
- Processing:
  - Transformers: Highly parallelizable during training, but inference is bound by memory bandwidth due to the cache.
  - TRM: Sequential by nature (recurrent). Every step depends on the immediately preceding state.
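The sequential dependency can be sketched as a simple left-to-right fold. The `step` function below is a hypothetical stand-in (an exponential moving average), not the project's actual update rule; it only shows why step t cannot start before step t-1 has finished.

```zig
const std = @import("std");

/// Hypothetical per-token update. A simple moving average stands in
/// for whatever the real recurrence step computes.
fn step(state: f32, token: f32) f32 {
    return 0.5 * state + 0.5 * token;
}

/// Folds a token sequence into a single state value, strictly in order.
fn run(tokens: []const f32, initial: f32) f32 {
    var state = initial;
    for (tokens) |tok| {
        // Each iteration reads the state written by the previous one,
        // so the iterations cannot be executed in parallel.
        state = step(state, tok);
    }
    return state;
}

test "token order changes the final state" {
    const a = run(&[_]f32{ 1.0, 0.0 }, 0.0);
    const b = run(&[_]f32{ 0.0, 1.0 }, 0.0);
    try std.testing.expect(a != b);
}
```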
- Core Mechanism:
  - Instead of Attention(Q, K, V), TRM uses a gating mechanism to interpolate between the old state and a new candidate state, effectively "erasing" and "writing" to its memory.
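A gated interpolation of this kind might look like the following sketch. It assumes a GRU-style convex gate, `state = (1 - z) * state + z * candidate` with `z = sigmoid(gate_logit)`; the names `sigmoid` and `gatedUpdate` and the choice of sigmoid are assumptions for illustration, not carpathian's actual code.

```zig
const std = @import("std");

/// Logistic function: maps any real number into (0, 1).
fn sigmoid(x: f32) f32 {
    return 1.0 / (1.0 + @exp(-x));
}

/// In-place gated update (hypothetical helper).
/// For each element: state = (1 - z) * state + z * candidate,
/// where z = sigmoid(gate_logit). All three slices must have equal length.
fn gatedUpdate(state: []f32, candidate: []const f32, gate_logits: []const f32) void {
    for (state, candidate, gate_logits) |*s, c, g| {
        const z = sigmoid(g);
        // z near 0 keeps the old value; z near 1 overwrites it with the candidate.
        s.* = (1.0 - z) * s.* + z * c;
    }
}

test "a neutral gate averages old state and candidate" {
    var state = [_]f32{1.0};
    gatedUpdate(&state, &[_]f32{5.0}, &[_]f32{0.0});
    try std.testing.expectApproxEqAbs(@as(f32, 3.0), state[0], 1e-4);
}
```

With a strongly negative gate logit the element is left almost untouched, and with a strongly positive one it is replaced by the candidate, which is the "erase and write" behaviour described above.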
- src/model/: TRM-specific logic (recurrence loop, state updates).
- src/utils/: Custom tensor operations and math primitives (no heavy external ML libs).
- src/data/: Data loaders for sequential tasks (Sudoku, Maze).
Requires Zig 0.16.0 (nightly).
# Build the executable
zig build
# Run unit tests
zig build test
# Run inference/training (args TBD)
zig build run -- [args]