knuth-mmix is a Lean 4 package for assembling and executing a useful, typed slice of Donald
Knuth's MMIX architecture. It combines:
BitVec-backed MMIX word, register, and address types- a label-aware assembler that emits real 32-bit MMIX instruction words
- a small executable machine semantics for running assembled programs inside Lean
- a
[mmix| ... ]quasiquoter for writing assembly fragments directly in Lean code
The package is aimed at lightweight ISA experiments, executable examples, and formal work where it is useful to stay close to actual MMIX encodings without dropping into ad hoc integers.
- typed MMIX sizes:
Byte,Wyde,Tetra,Octa,Register, andAddress - sparse big-endian memory with MMIX-style alignment rounding
- instruction encoding for a broad MMIX integer, memory, control-flow, address, and special-register subset
- local/global register windows plus executable register-stack subroutine linkage
- executable semantics for assembled programs via
Machine.stepandMachine.run - helper routines such as
Program.materializefor constructing 64-bit literals from wyde lanes - build-time examples that check encoding and execution behavior
Add the dependency to your lakefile.toml:
[[require]]
name = "knuth-mmix"
git = "https://github.com/alok/knuth-mmix"
rev = "main"Then import the umbrella module:
import KnuthMmiximport KnuthMmix
open KnuthMmix
def copyAndCompare : Program := [mmix|
SETL $0, 0x1234;
INCML $0, 0x5678;
SETL $1, 0x0010;
STOU $0, $1, 0;
LDOU $2, $1, 0;
CMPU $3, $0, $2;
BNZ $3, fail;
JMP done;
fail:;
SETL $4, 1;
done:;
SWYM
]
def runCopyAndCompare : Except String (Octa × Octa) := do
let assembled ← Program.assemble copyAndCompare
let machine ← Machine.run 16 assembled {}
pure (machine.readReg (regLit 2), machine.readReg (regLit 4))For a pure encoding pass, use Program.encode. For full execution, assemble first and then run the
resulting AssembledProgram.
SET $r, n behaves like an assembler pseudo-instruction and expands to the wyde-lane sequence
needed to materialize the full 64-bit literal. If you want the same behavior directly from Lean,
use Program.materialize.
The current executable model covers:
- local, marginal, and global general-purpose register behavior with
rLandrG - a hidden register stack with executable
PUSHJ,PUSHGO, andPOP - all 32 MMIX special registers as named values for
GETandPUT - sparse byte-addressed memory with big-endian load/store behavior
- non-floating arithmetic including multiply/divide and the
rH/rR/rDside channels - floating-point register semantics, conversions, arithmetic, comparisons, and short-float loads/stores
rA-driven arithmetic status tracking with event bits, enable bits, and current rounding mode- the MMIX byte and bit-matrix operations (
BDIF/WDIF/TDIF/ODIF,MUX,SADD,MOR,MXOR) - the uncached/high-tetra/store-constant memory forms and basic
TRIP/TRAPstate shuffling - label resolution and relative branch/jump encoding
- a fuel-bounded small-step evaluator
The model still omits the rest of the privileged operating-system and virtual-memory behavior
beyond the current SAVE/UNSAVE/RESUME approximation, plus richer exception bookkeeping in
rA.
The [mmix| ... ] quasiquoter currently accepts:
- data movement and literal construction:
SET,SETL,SETML,SETMH,SETH,INCL,INCML,INCMH,INCH,ORL,ORML,ORMH,ORH,ANDNL,ANDNML,ANDNMH,ANDNH - floating-point operations:
FLOT,FLOTU,SFLOT,SFLOTU,FIX,FIXU,FADD,FSUB,FMUL,FDIV,FREM,FSQRT,FINT,FCMP,FEQL,FUN,FCMPE,FEQLE,FUNE - integer and bitwise operations:
ADD,ADDU,MUL,MULU,DIV,DIVU,ADDU2,ADDU4,ADDU8,ADDU16,LDA,SUB,SUBU,NEG,NEGU,OR,ORN,NOR,AND,ANDN,NAND,XOR,NXOR,SL,SLU,SR,SRU,CMP,CMPU - conditional register updates:
CSN,CSZ,CSP,CSOD,CSNN,CSNZ,CSNP,CSEV,ZSN,ZSZ,ZSP,ZSOD,ZSNN,ZSNZ,ZSNP,ZSEV - byte and matrix operations:
BDIF,WDIF,TDIF,ODIF,MUX,SADD,MOR,MXOR - loads:
LDB,LDBU,LDSF,LDHT,LDW,LDWU,LDT,LDTU,LDO,LDOU,LDUNC,LDVTS - stores:
STB,STBU,STSF,STHT,STCO,STUNC,STW,STWU,STT,STTU,STO,STOU - cache and prefetch hints:
PRELD,PREST,PREGO,SYNCD,SYNCID,SYNC - special-register and control flow instructions:
GET,PUT,GETA,JMP,GO,BN,BZ,PUSHJ,PUSHGO,POP,SAVE,UNSAVE,RESUME,TRIP,TRAP,BP,BOD,BNN,BNZ,BNP,BEV, plus probable variantsPBN,PBZ,PBP,PBOD,PBNN,PBNZ,PBNP,PBEV, andSWYM - labels written as
label:
The scaled-add instructions are spelled ADDU2, ADDU4, ADDU8, and ADDU16 in the
quasiquoter. This keeps the syntax robust inside Lean even though the MMIX manuals write the same
operations as 2ADDU, 4ADDU, 8ADDU, and 16ADDU.
- Unwritten memory reads back as zero.
- Memory accesses round the address down to the natural width alignment, following MMIX behavior for these operations.
Machine.run fuel program machinestops when either fuel is exhausted or execution falls off the end of the assembled code, at which point the machine is marked halted.- Branch and jump targets are resolved against assembled labels, and encoded using MMIX relative offsets.
KnuthMmix/Word.lean: MMIX word sizes, lane helpers, byte packing, comparisons, alignmentKnuthMmix/Register.lean: special-register enumeration and encodingsKnuthMmix/Memory.lean: sparse memory model plus signed and unsigned loads/storesKnuthMmix/Instruction.lean: instruction AST, label assembly, and opcode encodingKnuthMmix/Machine.lean: executable machine state and operational semanticsKnuthMmix/Syntax.lean:[mmix| ... ]quasiquoterKnuthMmix/Examples.lean: executable examples checked at build time
Build the package with:
uv run lake buildGitHub Actions runs the same Lean build on pushes and pull requests.
Major remaining gaps are:
- the rest of the operating-system-facing control layer beyond the current
SAVE/UNSAVE/RESUMEapproximation - richer privileged machine-state behavior around interrupts, emulation resumes, and protection faults
- full MMIX rounding-mode control and the remaining privileged machine details