Bytecode VM for the Lox language in Rust. Follows Crafting Interpreters Part III (clox) by Robert Nystrom, with some extensions.
Source text
└─ Scanner (src/scanner.rs)
│ tokenises lazily — no token Vec allocated upfront
└─ Parser (src/compiler/parser.rs)
│ one token of lookahead (current + previous)
│ panic-mode error recovery: suppress cascading errors until
│ next statement boundary (semicolon or statement keyword)
└─ Compiler (src/compiler/mod.rs)
│ Pratt parser — no AST, emits bytecode directly
│ stack of FunctionCompiler frames (one per fun { })
└─ Chunk (src/chunk.rs)
│ bytecode + constant pool + line table
└─ VM (src/vm.rs)
stack-based interpreter
No intermediate representation. The compiler is a recursive descent / Pratt parser that emits opcodes as it parses. The entire pipeline runs in a single pass over the source.
Pratt parser (src/compiler/rules.rs)
Each TokenType maps to a ParseRule { prefix, infix, precedence }. parse_precedence(min) drives the loop:
- Consume the next token; call its
prefixhandler (literals, unary, grouping, variable,switch, prefix++/--). - While the next token's precedence ≥
min, consume and call itsinfixhandler (binary ops,and,or, call, postfix++/--).
This correctly handles precedence and associativity without explicit grammar rules.
FunctionCompiler frame stack (src/compiler/frame.rs)
Each fun pushes a new FunctionCompiler onto Compiler::frames. It owns:
locals: Vec<Local>— slot 0 is a synthetic empty-name local reserving the function's own stack slot; user locals start at 1.scope_depth— incremented on{, decremented on}.jumps: Vec<(loop_depth, continue_target, Vec<break_patch>)>— tracks pendingbreak/continuepatches per loop nesting level.function: *mut Object— the heap-allocatedFunctionObjectbeing filled.
end_compiler() pops the frame and returns the completed FunctionObject pointer. The enclosing chunk stores it as a constant and emits OpClosure to wrap it at runtime. Functions are always closures at runtime.
Local variable resolution
resolve_local scans locals in reverse to find the innermost binding. Uninitialised locals are skipped with an error, preventing var x = x;. If not found locally, resolve_upvalue walks the enclosing compiler frames recursively — capturing the variable as an upvalue (emits OpGetUpvalue/OpSetUpvalue). If not found anywhere, falls back to a global by name (emits OpGetGlobal/OpSetGlobal).
Scope exit
end_scope / discard_locals counts how many locals are deeper than the target depth and emits either:
- One
OpPoporOpCloseUpvalueper local (normal scope exit) —OpCloseUpvaluefor captured locals,OpPopfor plain ones OpYield n— preserve the top value, pop the n locals beneath it (used byswitchcase blocks to return a value out of a scope)
Jump patching
Jumps emit 0xff 0xff as placeholder offsets, returning the patch site. patch_jump backfills the real 16-bit relative offset once the target address is known. emit_loop emits OpLoop with a backwards offset back to loop start. break and continue patch their respective targets at loop end / loop top.
Chunk {
code: Vec<u8> // bytecode
constants: Vec<Value> // constant pool (max 256 entries per chunk)
lines: Vec<usize> // parallel to code, one line per byte
constant_index: HashMap<(u8,u64), u8> // dedup: (tag, bits) → pool index
}
Constant deduplication keyed on (type_tag, f64::to_bits | ptr as u64) — identical number literals or already-interned string pointers reuse the same constant slot.
The Debug impl is a full disassembler: walks the bytecode and pretty-prints every instruction with offset, line, opcode, and operand.
Value
├─ Nil
├─ Bool(bool)
├─ Number(f64)
└─ Object(*mut Object) // raw pointer into the Heap
Object
├─ obj_type: ObjectType
│ ├─ String(String)
│ ├─ Function(FunctionObject)
│ ├─ Native(NativeFunction)
│ ├─ Array(Vec<Value>)
│ ├─ Closure(ClosureObject) // runtime wrapper around Function; holds upvalue slots
│ └─ UpValue(UpValueObject) // open: location → stack slot; closed: location → self.closed
├─ is_marked: bool // tri-color mark-and-sweep GC
└─ next: *mut Object // intrusive linked list through the Heap
Value is Clone — numbers/bools copy by value, Object copies the pointer. Equality on objects compares pointers (reference equality), which works correctly for strings because of interning.
Heap {
objects: *mut Object // head of the intrusive linked list
gray_stack: Vec<*mut Object> // worklist for tri-color marking
interned_strings: HashMap<String, *mut Object> // string intern table
}
allocate boxes the object, calls Box::into_raw, and prepends it to the list. Drop walks the list and reconstructs Box::from_raw to free each node. All heap allocation goes through Vm::allocate_object, which checks BYTES_ALLOCATED against gc_threshold before every allocation.
String interning: allocate_string checks heap.interned_strings before allocating. Duplicate strings get the same pointer, so == on string objects is a cheap pointer compare. remove_strings prunes the intern table during sweep — dead strings are removed before their memory is freed.
Mark-and-sweep GC (Vm::collect_garbage):
- Mark roots — stack values, call-frame closures, open upvalues, globals, and any in-flight compiler-allocated objects passed as
compiler_roots. - Trace —
heap.trace_references()drainsgray_stack, marking all objects reachable from each gray object (closure upvalues, function constants, array elements, upvalue closed-over values). - Prune intern table —
remove_strings()before sweep so dead string pointers are not left dangling. - Sweep — walks the object list; unmarked objects are freed, marked objects have
is_markedreset to false.
Byte-based threshold (src/allocator.rs): a custom #[global_allocator] wraps the system allocator and tracks every byte allocated/freed process-wide in a BYTES_ALLOCATED: AtomicUsize. gc_threshold starts at 1 MB and grows ×1.5 after each collection. Because all Rust allocations are counted — including Vec internals, String data, etc. — the threshold reflects real memory pressure rather than object headcount.
Vm {
stack: Vec<Value> // value stack (pre-allocated 256 cap — upvalue ptrs into it must not invalidate)
heap: Heap // owns all objects + intern table + GC state
globals: HashMap<String, Value>
call_stack: Vec<CallFrame>
open_upvalues: Vec<*mut Object> // all live UpValueObjects still pointing into the stack
gc_threshold: usize // collect when BYTES_ALLOCATED reaches this; starts 1 MB, grows ×1.5
}
CallFrame {
closure: *mut ClosureObject
ip: usize // index into closure.function.chunk.code
stack_base: usize // start of this frame's locals in Vm::stack
}
run() is the main dispatch loop. It reads opcodes via read_byte() which advances ip on the current frame. Locals are accessed as stack[stack_base + slot] — no separate locals array.
Call protocol
- Caller pushes the closure value, then all arguments.
OpCall arg_countresolves the value atstack[len - arg_count - 1].- For
ObjectType::Closure: push a newCallFramewithstack_base = stack.len() - arg_count - 1. The function's locals live in-place on the stack starting atstack_base. - For
ObjectType::Native: slice args directly off the stack, call the function pointer, truncate stack, push result — noCallFrameneeded. OpReturn: close all upvalues pointing into this frame (close_upvalues(stack[stack_base])), popCallFrame, truncate stack toframe.stack_base, push return value. If call stack is empty, program ends.
Closures and upvalues — every function is wrapped in a ClosureObject at runtime. Captured variables become UpValueObjects. Open upvalues hold a raw pointer into the stack; on scope exit (OpCloseUpvalue) or frame return, they are closed — the value is copied into UpValueObject::closed and location is redirected to point there. Deduplication in capture_upvalue ensures two closures capturing the same local share one upvalue object. Supports arbitrarily deep capture chains (inner → middle → outer). Vm::open_upvalues tracks all live open upvalues for GC root marking.
Error handling: runtime_error prints the message then walks call_stack in reverse to print a stack trace with file/function name and line number, then clears both stack and call stack so the REPL can continue.
| Opcode | Operands | Effect |
|---|---|---|
OpConstant |
u8 index |
push constants[index] |
OpNil / OpTrue / OpFalse |
— | push literal |
OpPop |
— | pop 1 |
OpPopN |
u8 n |
pop n |
OpDup |
— | duplicate top |
OpYield |
u8 n |
save top, pop n beneath it, push saved |
OpNegate / OpNot |
— | unary on top |
OpAdd / OpSubtract / OpMultiply / OpDivide |
— | binary on top 2 |
OpEqual / OpGreater / OpLess |
— | comparison → bool |
OpPrint |
— | pop + println |
OpDefineGlobal |
u8 name-constant |
pop → globals |
OpGetGlobal / OpSetGlobal |
u8 name-constant |
globals r/w |
OpGetLocal / OpSetLocal |
u8 slot |
stack[base+slot] r/w |
OpJump |
u16 offset |
ip += offset |
OpJumpIfFalse |
u16 offset |
ip += offset if top is falsey |
OpLoop |
u16 offset |
ip -= offset |
OpCall |
u8 arg_count |
call top-of-stack closure or native |
OpReturn |
— | close upvalues in frame, return top value to caller |
OpClosure |
u8 idx, then (u8 is_local, u8 slot)* |
wrap function constant in a closure, capture upvalues |
OpGetUpvalue / OpSetUpvalue |
u8 slot |
read/write upvalue slot in current closure |
OpCloseUpvalue |
— | copy top-of-stack into its upvalue's closed field, pop |
OpArray |
u8 n |
pop n values, heap-allocate array, push |
OpMakeArray |
— | pop length n, heap-allocate nil array of size n, push |
OpGetIndex |
— | pop index + array, push element |
OpSetIndex |
— | pop value + index + array, mutate, push value |
OpLen |
— | pop array, push length as number |
switch expression — expression-oriented: each case is a scoped block that yields a value. The yield keyword inside a case saves the value across end_scope via OpYield. Compiles to a chain of OpDup / OpEqual / OpJumpIfFalse comparisons, O(n) cases.
Prefix and postfix ++/-- — resolved to local/global get+set pairs with OpDup to preserve the pre-increment value for postfix semantics.
break and continue — tracked per loop in FunctionCompiler::jumps. break emits a forward OpJump patch site collected at loop end. continue emits a backward OpLoop directly to the continue target (before the increment clause in for). Both call discard_locals to clean up any locals declared inside the loop body.
Arrays — [1, "hello", true] literal syntax and array(n) for nil-filled pre-allocation. Index get arr[i] and set arr[i] = x are proper l-values. len(arr) returns element count. Heterogeneous — elements are Value, any type mix is valid. Bounds and type checked at runtime. Arrays are heap objects, freed at VM teardown with everything else.
Native function extension point — NativeFunction { arity, name, is_variadic, fun: fn(&[Value]) -> Result<Value, String> }. New natives register via get_native_functions(). Runtime type errors return Err(msg) which the VM surfaces as a runtime error. Currently: clock() (Unix time as f64), floor(n), mod(a, b), random(min, max).
- Classes and instances —
ObjClass,ObjInstance,OpGetProperty/OpSetPropertywith field hash map (ch. 27) - Methods and
this—ObjBoundMethod,OpInvokefast path, implicitthisas slot 0 (ch. 28) - Inheritance —
<syntax,ObjClass::superclass,OpGetSuper/OpInvokeSuper,superkeyword (ch. 29)
Fibonacci (recursion + timing)
fun fib(n) {
if (n < 2) return n;
return fib(n - 1) + fib(n - 2);
}
var start = clock();
print fib(30);
print clock() - start;
Closures / higher-order functions
fun makeCounter(start) {
var count = start;
fun increment() {
count = count + 1;
print count;
}
return increment;
}
var c = makeCounter(0);
c(); // 1
c(); // 2
c(); // 3
Switch expression
fun grade(score) {
return switch (floor(score / 10)) {
case 10 => { yield "A+"; }
case 9 => { yield "A"; }
case 8 => { yield "B"; }
default => { yield "F"; }
};
}
print grade(95); // A
print grade(72); // F
Loops with break / continue
var i = 0;
while (true) {
i++;
if (mod(i, 2) == 0) continue;
if (i > 9) break;
print i; // 1 3 5 7 9
}
for (var j = 0; j < 5; j++) {
print j;
}
Scoping and shadowing
var x = "global";
{
var x = "inner";
print x; // inner
}
print x; // global
Arrays
fun reverse(arr) {
var lo = 0;
var hi = len(arr) - 1;
while (lo < hi) {
var tmp = arr[lo];
arr[lo] = arr[hi];
arr[hi] = tmp;
lo++; hi--;
}
}
fun indexOf(arr, val) {
for (var i = 0; i < len(arr); i++) {
if (arr[i] == val) return i;
}
return -1;
}
var a = [5, 3, 1, 4, 2];
reverse(a);
print a; // [2, 4, 1, 3, 5]
print indexOf(a, 3); // 2
Native functions
print clock(); // Unix epoch seconds as f64
print mod(17, 5); // 2
print floor(9.7); // 9
cargo build --release
./target/release/rclox # REPL (vi keybindings, history via rustyline)
./target/release/rclox script.lox # run file
# exit 0 → ok
# exit 65 → compile error
# exit 70 → runtime error