JIT Coverage Expansion: Indirect Calls, Closures, Global Refs, Heap Object Ops #60

@siyul-park

The ARM64 JIT (interp/jit.go driver + interp/jit_arm64.go lowerer) currently compiles ~90 scalar opcodes but rejects: indirect CALL (only fused CONST_GET;CALL direct calls compile), functions with captures (closures, UPVAL_GET/SET), ref-typed globals/locals/params/returns (guard-fallback because retain/release is unavailable natively), and all heap object ops (ARRAY/STRUCT/MAP/REF_GET/SET — hard rejects that drop whole components to segment mode). This plan nativizes those four areas in dependency order.

Concurrency constraint: another agent is editing interp/pool_test.go, prof/prof.go, prof/prof_test.go. This plan touches NONE of those. Total foreign-file diff: interp/interp.go ~5 lines (scratch() +3, entry() 2-line reorder); interp/threaded.go zero. Everything else lands in interp/jit.go, interp/jit_arm64.go, and three NEW files. Tests are append-only subtests in interp/interp_test.go (not in the other agent's file set).

Core design decisions

D1 — No native→Go calls. Inline fast paths + guard fallback; allocation always falls back.

Native code keeps the existing contract: no FFI, exit via journal only. Rationale:

  • retain = i.rc[addr]++ (interp.go:1120) — 3 instructions inline.
  • release is complex only at rc 1→0 (free list, recursive child release). Guard rc >= 2 BEFORE decrement; otherwise exitFallback(ip) and the interpreter re-runs the opcode (same shape as existing guardI64).
  • Allocation (REF_NEW, *_NEW, CLOSURE_NEW, i64 heap promotion) can trigger gc()/slice growth, invalidating base pointers → always exitFallback; re-entry through scratch() (interp.go:935) refreshes all base pointers.
  • BLR-to-Go rejected (Go ABI/stack/GC unsafe below the NOSPLIT trampoline); ABI0 thunks rejected (leaf-only = inlineable anyway).
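The retain/release contract above can be modelled in plain Go. This is a minimal sketch, not the emitted ARM64 sequence: the `heap` type and its method names are hypothetical stand-ins for `i.rc`, and a `false` result stands for `exitFallback(ip)`.

```go
package main

import "fmt"

// heap is a hypothetical stand-in for the interpreter's refcount table (i.rc).
type heap struct {
	rc []int32
}

// retain mirrors the inline fast path: a plain increment with no guard.
func (h *heap) retain(addr int) {
	h.rc[addr]++
}

// release decrements only when the count stays positive afterwards.
// It reports false, without touching rc, when the caller has to fall back
// to the interpreter (the 1 -> 0 transition needs the free list).
func (h *heap) release(addr int) bool {
	if h.rc[addr] < 2 {
		return false // guard failed: nothing committed
	}
	h.rc[addr]--
	return true
}

func main() {
	h := &heap{rc: []int32{1, 2}}
	h.retain(0)
	fmt.Println(h.release(0), h.rc[0]) // true 1
	fmt.Println(h.release(0), h.rc[0]) // false 1
}
```

Because the guard runs before the decrement, a failed release leaves the table exactly as the interpreter expects when it re-runs the opcode.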

D2 — Base pointers via new journal header cells (trampoline ABI unchanged)

Scratch registers X10–X14 are full (maxScratch = 5, asm/arm64/abi.go); X15 pinned. Add three journal header cells (native already addresses journal via X14):

journalRC      // &i.rc[0]            (Phase 1)
journalHeap    // &i.heap[0]          (Phase 4)
journalUpvals  // &i.fr.upvals[0]|0   (Phase 3)
journalHead    // shifts 6 → 9

Constants in interp/jit.go; New() already sizes journal from journalHead. scratch() fills the cells (+3 lines in interp.go — only edit there besides the Phase 3 reorder).
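As a rough illustration of the header extension, the sketch below places the three new cells at indices 6 to 8 so that `journalHead` lands on 9. The cell indices, the `base` and `fill` helpers, and the use of `[]uint64` for every backing array are assumptions made for the example; the real `scratch()` works on the interpreter's own types.

```go
package main

import (
	"fmt"
	"unsafe"
)

// Hypothetical header layout: cells 0..5 stand for the existing header
// (not shown in this plan), followed by the three new cells.
const (
	journalRC     = 6 + iota // &rc[0]
	journalHeap              // &heap[0]
	journalUpvals            // &upvals[0], or 0 when there are no upvalues
	journalHead              // 9: the record area starts here
)

// base returns the address of the first element, or 0 for an empty slice.
func base(s []uint64) uint64 {
	if len(s) == 0 {
		return 0
	}
	return uint64(uintptr(unsafe.Pointer(&s[0])))
}

// fill models what scratch() would do on every entry into native code.
func fill(journal, rc, heap, upvals []uint64) {
	journal[journalRC] = base(rc)
	journal[journalHeap] = base(heap)
	journal[journalUpvals] = base(upvals)
}

func main() {
	journal := make([]uint64, journalHead+4)
	rc, heap := make([]uint64, 8), make([]uint64, 8)
	fill(journal, rc, heap, nil)
	fmt.Println(journalHead, journal[journalRC] != 0, journal[journalUpvals])
}
```

Refilling the cells on every entry is what makes the "allocation always falls back" rule sufficient: a moved backing array is picked up on the next re-entry.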

D3 — RC correctness invariants (review checklist for every lowering)

  • I1 mirrored ownership: each shadow-stack VReg holds exactly the rc its materialized VM slot would. RC effects emitted eagerly at the same opcode position as the threaded handler. Any exit kind then needs no rc fixup.
  • I2 guards before side effects: all guards (rc>=2, tag, bounds) fire before any store/rc mutation; restore pre-op shadow stack on the guard path (existing pre := append(...) pattern). Interpreter re-runs the opcode from scratch.
  • I3 native never frees: rc never goes 1→0 natively (also avoids a permanent leak vs mark-sweep backup, which only frees rc<0 entries).
  • I4 no native allocation: within one native dwell, heap/rc/stack/globals/upvals backing arrays cannot move — so per-entry base pointers are safe.
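Invariant I2 is easiest to review against a small model. The sketch below assumes a hypothetical `lowerer` with an integer shadow stack and a SET-like opcode; only the `pre := append(...)` snapshot idiom comes from the plan itself.

```go
package main

import "fmt"

// lowerer is a hypothetical model: stack is the shadow stack, rc the refcount
// table, and cells the storage that a SET-like opcode writes to.
type lowerer struct {
	stack []int
	rc    []int
	cells []int
}

// set pops a value and a ref, stores the value, and drops the operand ref.
// It returns false when a guard fails; in that case no state has changed and
// the shadow stack is back to its pre-op shape.
func (l *lowerer) set() bool {
	pre := append([]int(nil), l.stack...) // snapshot before popping operands

	n := len(l.stack)
	val, ref := l.stack[n-1], l.stack[n-2]
	l.stack = l.stack[:n-2]

	// Every guard runs before the first store or rc mutation.
	if ref < 0 || ref >= len(l.cells) || l.rc[ref] < 2 {
		l.stack = pre
		return false
	}

	l.cells[ref] = val
	l.rc[ref]--
	return true
}

func main() {
	l := &lowerer{stack: []int{0, 42}, rc: []int{1}, cells: []int{7}}
	fmt.Println(l.set(), l.stack, l.cells) // false [0 42] [7]
	l.rc[0] = 2
	fmt.Println(l.set(), l.stack, l.cells) // true [] [42]
}
```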

D4 — Indirect-call IC: compile-time monomorphic/polymorphic compare chain (no mutable cache)

asm.Buffer is sealed W^X after Link → patchable IC impossible. Instead:

  • Boxed callee payload = heap index; *types.Function constants are rooted in the constant pool (via keep in New(), never released) → heap addr is a stable identity. One CMP callee, #BoxRef(addr) validates tag+identity.
  • Candidates: extend jitCompiler.calls() (jit.go:632) with values(): a CONST_GET of a function constant NOT fused with CALL (a function flowing through locals/params). Cap the chain at 4; all candidates must share (params, returns) arity. Miss → exitFallback(ip); threaded CALL (threaded.go:144) handles Closure/HostFunction/unknown.
  • No invalidation needed: constants immutable, code immutable, pool members build identical constant heaps. Record invariant in docs + pool subtest.
  • Callee-ref ownership: release callee ref BEFORE the BL, guarded rc >= 2 (fallback otherwise — nothing committed). Safe because on IC hit the callee is the pool-rooted constant function. deopt() rebuilds frames with release = false (interp.go:890-899) so mid-call yield/trap never double-releases.
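The compare chain can be sketched as a linear search over a fixed candidate list. The `candidate` type, `chain`, and `dispatch` below are hypothetical names; treating an over-cap or mixed-arity set as "no candidates" is an assumption of this sketch, since the plan only states the cap and the arity rule.

```go
package main

import "fmt"

// candidate is a hypothetical record for one function constant seen at compile time.
type candidate struct {
	boxed   uint64 // boxed ref of the pool-rooted function constant
	params  int
	returns int
	target  int // index of the compiled entry
}

const maxChain = 4

// chain accepts the candidates only when the cap holds and all share one
// arity; otherwise it returns nil and the call site stays interpreted.
func chain(cands []candidate) []candidate {
	if len(cands) == 0 || len(cands) > maxChain {
		return nil
	}
	for _, c := range cands[1:] {
		if c.params != cands[0].params || c.returns != cands[0].returns {
			return nil
		}
	}
	return cands
}

// dispatch models the emitted code: one equality test per candidate.
// ok == false stands for exitFallback(ip).
func dispatch(ch []candidate, callee uint64) (target int, ok bool) {
	for _, c := range ch {
		if c.boxed == callee {
			return c.target, true
		}
	}
	return 0, false
}

func main() {
	ch := chain([]candidate{
		{boxed: 0x10, params: 1, returns: 1, target: 0},
		{boxed: 0x20, params: 1, returns: 1, target: 1},
	})
	fmt.Println(dispatch(ch, 0x20)) // 1 true
	fmt.Println(dispatch(ch, 0x30)) // 0 false
}
```

Since the list is fixed at compile time, nothing in the emitted code ever needs patching, which is what the sealed W^X buffer requires.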

Phase 1 — Native refcounting + ref globals/locals/signatures

NEW interp/jit_rc_arm64.go (methods on the arm64 lowerer type, per coding-patterns §1.5):

  • rcBase(ctx): LDR base, [vCtrl, journalRC*8]
  • retainRef(ctx, v) — payload-extract addr, LDR rc, [base, addr, LSL 3], ADDI, STR (no guard)
  • releaseRef(ctx, v, pre) — load rc, CMPI rc,1; B.GT ok; guard path restores pre + exitFallback(ctx.ip); ok: SUBI+STR
  • retainBoxed/releaseBoxed — runtime ref-tag test wrapping the above, for compile-time-unknown kinds

Shadow-stack kind tracking (interp/jit.go): kinds map[int32]types.Kind on jitContext (VReg ID → static kind; absent = unknown). Populated at push sites (localGet from ctx.locals, globalGet from compile-time global kind, constGet, imm, refNull). Rule: statically Ref → retainRef/releaseRef; unknown → boxed variants; statically scalar → nothing (zero regression).
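The selection rule reads naturally as a three-way switch over the kind map. In this sketch `kind`, `action`, and `rcAction` are hypothetical names; only the map shape (VReg ID to static kind, absent meaning unknown) is taken from the plan.

```go
package main

import "fmt"

// kind is a hypothetical, trimmed stand-in for types.Kind.
type kind int

const (
	kindI32 kind = iota
	kindF64
	kindRef
)

// action names which refcount sequence a lowering has to emit.
type action int

const (
	none   action = iota // statically scalar: emit nothing
	direct               // statically Ref: retainRef / releaseRef
	boxed                // unknown: retainBoxed / releaseBoxed with a tag test
)

// rcAction applies the rule to one virtual register.
func rcAction(kinds map[int32]kind, vreg int32) action {
	k, known := kinds[vreg]
	switch {
	case !known:
		return boxed
	case k == kindRef:
		return direct
	default:
		return none
	}
}

func main() {
	kinds := map[int32]kind{1: kindRef, 2: kindI32}
	fmt.Println(rcAction(kinds, 1) == direct) // true
	fmt.Println(rcAction(kinds, 2) == none)   // true
	fmt.Println(rcAction(kinds, 3) == boxed)  // true
}
```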

interp/jit_arm64.go changes (exact functions):

  • globalGet (:704): drop value guardRef; emit retain for ref/unknown. Drop KindRef reject in global() (:1787), keep bounds + offset checks.
  • globalPut (:725): replace guardRefs with: load old; CMP old,src; B.EQ skip; guarded releaseBoxed(old, pre) BEFORE the STR (mirrors threaded.go:309-312 old != val test). Threaded GLOBAL_SET does not retain val (ownership transfer) — neither does native.
  • localGet/localPut (:756/:777): remove KindRef rejects; ref get → load+retain; ref set → guarded release of old (with old!=val skip), store.
  • drop (:443): release ref/unknown before discard (threaded.go:49). dup (:458): retain (threaded.go:64).
  • refNull (:1334): add retain(0) — fixes pre-existing asymmetry vs threaded.go:622 (i.retain(0)).
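The globalPut ordering (compare, guarded release of the old value, then store) can be checked against the following model. The boxing scheme here (top bit as ref tag, low 32 bits as heap address) is purely an assumption for the example and is not the interpreter's real encoding.

```go
package main

import "fmt"

// Hypothetical boxing: the top bit tags a ref, the low 32 bits hold the address.
const refTag uint64 = 1 << 63

func boxRef(addr uint32) uint64 { return refTag | uint64(addr) }
func isRef(v uint64) bool       { return v&refTag != 0 }
func addrOf(v uint64) uint32    { return uint32(v) }

type vm struct {
	globals []uint64
	rc      []int32
}

// globalSet releases the old value (guarded) before storing the new one.
// The new value is not retained: ownership moves from the stack to the global.
// A false result stands for exitFallback with nothing committed.
func (m *vm) globalSet(idx int, val uint64) bool {
	old := m.globals[idx]
	if old != val && isRef(old) {
		a := addrOf(old)
		if m.rc[a] < 2 {
			return false
		}
		m.rc[a]--
	}
	m.globals[idx] = val
	return true
}

func main() {
	m := &vm{globals: []uint64{boxRef(1)}, rc: []int32{1, 2, 1}}
	fmt.Println(m.globalSet(0, boxRef(2)), m.rc) // true [1 1 1]
	fmt.Println(m.globalSet(0, boxRef(1)), m.rc) // false [1 1 1]
	fmt.Println(m.globalSet(0, boxRef(2)), m.rc) // true [1 1 1]
}
```

The second call falls back because the old value holds the last reference; the third succeeds without touching rc because old and new are identical.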

interp/jit.go: eligible() (:289) — delete the three KindRef loops (Params/Returns/Locals). Keep Captures reject until Phase 3. Ref returns need nothing extra (ret :1297 transfers ownership like threaded RETURN).

interp/interp.go (+3 lines): scratch() fills journalRC (zero-fill the other two cells until their phases). rc never empty (alloc(types.Null) at New).

Phase 2 — Indirect calls

NEW interp/jit_call_arm64.go: move existing call (jit_arm64.go:534-654) here; refactor:

  • descend(ctx, target, consumed) — shared framed-call sequence (budget check vs X15 → trapOverflow, journalActive bump, materializeSP, save caller bp/sp, compute callee bp/sp, BLLabel, trap-unwind with record, restore + return-value reload). consumed distinguishes direct (params; resume ip+4) from indirect (params+1 incl. callee slot; bp = sp−params−1; returns overwrite callee slot like threaded RETURN via i.stack[f.bp]; resume ip+1).
  • call(ctx) — existing fused form → descend(ctx, target, params).
  • callIndirect(ctx) bool — requires ctx.framed + candidates:
    1. per candidate: LDI tmp, BoxRef(addr); CMP callee,tmp; B.EQ hit_k
    2. fall-through = miss: restore pre-op stack, exitFallback(ctx.ip)
    3. each hit: guarded pre-release of callee ref (rc>=2 else fallback), pop callee from shadow, descend(ctx, target_k, params+1)
    4. identical post-call shadow shape across candidates (same arity) → block merge unaffected.
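The difference between the two call forms comes down to frame arithmetic. The sketch below uses a hypothetical `frame` type and `descend` signature; the indirect `bp` and both resume offsets follow the text above, while `bp = sp - params` for the direct form is an assumption.

```go
package main

import "fmt"

// frame records where the callee frame starts and where the caller resumes.
type frame struct {
	bp     int
	resume int
}

// descend computes the callee base pointer and the caller's resume address.
func descend(sp, ip, params int, indirect bool) frame {
	if indirect {
		// The callee value sits below the arguments and is consumed too;
		// the return value later overwrites that slot.
		return frame{bp: sp - params - 1, resume: ip + 1}
	}
	// Fused CONST_GET;CALL: no callee slot on the stack.
	return frame{bp: sp - params, resume: ip + 4}
}

func main() {
	fmt.Println(descend(10, 100, 2, false)) // {8 104}
	fmt.Println(descend(10, 100, 2, true))  // {7 101}
}
```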

interp/jit.go: new values(i, fn) next to calls() (:632); component() (:613) unions calls+values; bound component size (~8 fns) — over budget → no candidates → CALL rejects as today. Lower dispatch: case instr.CALL: return ctx.framed && l.callIndirect(ctx) (replaces unconditional false at jit_arm64.go:277). Also lower constGet for non-String ref constants (load imm + retainRef) so function values stored to locals compile.

Note: main code (addr 0) never compiles framed → indirect calls nativize only inside functions (document).

Phase 3 — Closures

Nativize closure BODIES, not closure calls. Compiled entry installed at i.code[fn.Fn][0]; threaded CALL Closure arm (threaded.go:181-210) builds the frame and the Run loop dispatches into native. Closure callsites in native code stay IC-miss → fallback. All dispatch complexity remains in threaded.go.

  • Upvals base: scratch() fills journalUpvals = &i.fr.upvals[0] (0 if nil). In entry() (interp.go:820) move i.fr.code = nil; i.fr.upvals = nil clearing to AFTER i.scratch() (2-line reorder — only interp.go edit this phase). segment() already calls scratch with fr intact.
  • Lowerings (in jit_rc_arm64.go, rc-coupled): upvalGet — load base from journalUpvals, compile-time bounds check vs len(fn.Captures), LDR [base, idx*8], retain per static capture kind, i64 boxability guard (same heap-promotion rule as locals). upvalSet — guarded release of old (old!=val skip), STR. Wire UPVAL_GET/SET into lower.
  • Eligibility: remove Captures reject (jit.go:308). Constraint: capture-bearing functions must never be BL targets (stale upvals base — journalUpvals set once per dwell). Enforce in call/callIndirect: skip candidates with len(Captures)>0. Re-entry segments at loop headers work unchanged (entered from Go wrapper with correct frame).
  • CLOSURE_NEW allocates → bail (Phase 4 helper) in framed mode, reject in segment mode.
  • Deopt: restore() (interp.go:911) already re-derives upvals from frame.ref → no change.
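Two of the checks above happen entirely at compile time and can be sketched without any machine code. The `function` type and both helper names are hypothetical; the rules they encode (no capture-bearing BL targets, index checked against the capture count) come from this phase.

```go
package main

import "fmt"

// function is a hypothetical, trimmed stand-in for a function record.
type function struct {
	name     string
	captures int // number of captured upvalues
}

// blTargets drops capture-bearing functions, since the upvals base is set
// once per dwell and would be stale inside a natively called closure body.
func blTargets(cands []function) []function {
	var out []function
	for _, fn := range cands {
		if fn.captures == 0 {
			out = append(out, fn)
		}
	}
	return out
}

// upvalOK is the compile-time bounds check for UPVAL_GET/SET.
func upvalOK(fn function, idx int) bool {
	return idx >= 0 && idx < fn.captures
}

func main() {
	fns := []function{{name: "plain"}, {name: "counter", captures: 1}}
	fmt.Println(len(blTargets(fns)), blTargets(fns)[0].name) // 1 plain
	fmt.Println(upvalOK(fns[1], 0), upvalOK(fns[1], 1))      // true false
}
```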

Phase 4 — Heap object ops

NEW interp/jit_heap_arm64.go:

  • scratch() fills journalHeap (+1 line interp.go).
  • Layout constants derived, not hard-coded: unsafe.Offsetof/Sizeof at JIT-compile time — interface elem = heapBase + addr*16 (itab +0, data +8; compile-guard test asserts Sizeof(types.Value)==16); itab identity immediates per concrete type via (*[2]uintptr)(unsafe.Pointer(&v))[0]; slice header {data+0, len+8}; Struct.Data/Typ/StructField.Kind offsets (all exported — reachable).
  • object(ctx, ref, wantItab, pre) — shared deref: addr extract, load itab, compare immediate, mismatch → restore+exitFallback; returns data word.

Lowerings:

  • ARRAY_LEN/GET/SET (FILL optional): itab chain over five TypedArray[T] + *types.Array; bounds check from header len (out-of-range → guard fallback, interpreter raises ErrIndexOutOfRange); scaled LDR/STR; per-kind box/unbox via existing helpers; i64 elements get boxability guard. *types.Array set releases old elem (guarded), get retains. Each op ends with guarded release of the array ref (mirrors threaded.go:2298/2353) — release guard hoisted before first store per I2.
  • STRUCT_GET/SET: confirm *types.Struct itab (HostObject → fallback); runtime kind dispatch via Fields[idx].Kind 5-way branch (I32/F32/F64 raw box; I64 boxability guard; Ref retain/guarded-release). Ends with guarded release of struct ref. Optional: fold preceding I32_CONST idx to skip runtime kind load.
  • REF_GET/SET: itab chain over types.I32/I64/F32/F64. Caveat: the ref operand often holds the last reference (temporaries), so the rc>=2 guard frequently fails and falls back; the win is limited to shared cells. A deferred-release queue is a future follow-up, out of scope.
  • bail(ctx) lowering: unconditional exitFallback(ip) + stop/closed (shape of unreachable(), jit_arm64.go:451) for REF_NEW, ARRAY_NEW*, STRUCT_NEW*, MAP_NEW*, CLOSURE_NEW, all MAP_*. Converts today's hard rejects into kept guard-fallbacks → a function with one cold MAP_GET still compiles whole. Maps stay interpreted by design.
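For the array lowerings, the order of checks matters more than the individual instructions. The model below is a sketch of a ref-element ARRAY_SET under assumptions: heap entries are `any`, elements are heap addresses, and a type assertion stands in for the itab compare. All guards, including the hoisted release guard for the array ref, run before the first mutation.

```go
package main

import "fmt"

// refArray is a hypothetical array whose elements are heap addresses.
type refArray []int

type vm struct {
	heap []any
	rc   []int32
}

// arraySet stores val at a[idx], releases the old element and the array ref.
// A false result stands for exitFallback with nothing committed.
func (m *vm) arraySet(arr, idx, val int) bool {
	a, ok := m.heap[arr].(refArray) // stands for the itab compare
	if !ok || idx < 0 || idx >= len(a) {
		return false // wrong type or out of range: the interpreter decides
	}
	old := a[idx]
	// Both release guards are checked before any store. If the array holds
	// itself, it is released twice and needs one extra count.
	if m.rc[old] < 2 || m.rc[arr] < 2 || (old == arr && m.rc[arr] < 3) {
		return false
	}
	m.rc[old]--
	a[idx] = val // ownership of val moves into the array
	m.rc[arr]--
	return true
}

func main() {
	m := &vm{heap: []any{nil, refArray{2}, "x", "y"}, rc: []int32{1, 2, 2, 1}}
	fmt.Println(m.arraySet(1, 0, 3), m.rc) // true [1 1 1 1]
	fmt.Println(m.arraySet(1, 5, 3))       // false: out of range
	fmt.Println(m.arraySet(2, 0, 3))       // false: not an array
}
```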

Eligibility/lower matrix

| Phase | eligible() (jit.go:289) | lower() rejects removed |
|---|---|---|
| 1 | drop ref Params/Returns/Locals loops | global/local ref guards |
| 2 | (unchanged) | CALL (framed+candidates); ref constGet |
| 3 | drop Captures reject | UPVAL_GET/SET |
| 4 | (unchanged) | ARRAY_GET/SET/LEN, STRUCT_GET/SET, REF_GET/SET; allocs+MAP → bail |

Testing

Append-only subtests in interp/interp_test.go under existing TestInterpreter_JIT (~:3909) — honors §6.3 one-test-per-symbol, zero conflict with the other agent. Pattern: threaded/JIT parity via WithTick(1),WithCutoff(1),WithThreshold(1) vs WithThreshold(-1), runtime skip on non-arm64.

  • P1: ref global get/set/tee hot loop native (JIT emit counters via prof read-only API); release-to-zero overwrite falls back + frees (slot reused); ref local round-trip; ref param/return through direct native call; DROP/DUP rc exactness; promoted-i64 still guards.
  • P2: monomorphic indirect call native + parity; 2-candidate polymorphic; HostFunction/Closure miss falls back per-call; deopt inside IC-called callee; ErrFrameOverflow through indirect call; pool: two cache-attached members run IC code (subtest in interp_test.go, NOT pool_test.go).
  • P3: closure body upval loop native via threaded dispatch; mutation visible after return; ref upval rc balance; yield+re-entry inside closure loop; deopt restores upvals.
  • P4: typed array sum per kind; ErrIndexOutOfRange parity; Array ref-element rc balance; struct fields per kind incl. ref; REF_GET shared vs temporary; map-using fn compiles whole with bail; layout compile-guard test (Sizeof(types.Value)==16, offsets in range).

Benchmarks (benchmarks/): indirect-call fib-via-local, closure counter loop, typed-array sum — before/after.

Risks

  1. Go internal layout coupling (P4) — itabs/interface/slice headers. Mitigated: unsafe-derived offsets + compile-guard tests; phase is last and severable.
  2. IC megamorphism — each miss = full exit/re-enter. Bounded (cap 4, arity filter). Telemetry belongs to other agent's prof work — leave TODO, do NOT add prof fields now.
  3. Code-size growth — itab chains + per-candidate sequences; check asm.Buffer growth handling (starts 4096) or raise initial size.
  4. Pool constant-addr invariant — IC bakes heap indices; safe because pool members build identical constant heaps. Document + pool subtest.
  5. Journal shift — journalHead 6→9 shifts record area; recordAt STP offsets are constant-driven; re-verify int16 STP range.
  6. Concurrent agent — foreign diff: interp.go ~5 lines, threaded.go 0, pool_test/prof 0.
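Risk 5 can be turned into a quick arithmetic check. The 64-bit STP signed-offset form encodes a 7-bit immediate scaled by 8, so byte offsets must be multiples of 8 in the range -512 to 504. Whether `recordAt` uses that form, and the record layout assumed here (slot k at journal cell `journalHead + k`), are assumptions of this sketch.

```go
package main

import "fmt"

const journalHead = 9 // header size in cells after the 6 -> 9 shift

// stpOffsetOK reports whether a byte offset fits the 64-bit STP
// signed-offset encoding (imm7 scaled by 8).
func stpOffsetOK(off int) bool {
	return off%8 == 0 && off >= -512 && off <= 504
}

// recordOffset is the byte offset of a hypothetical record slot.
func recordOffset(slot int) int {
	return (journalHead + slot) * 8
}

func main() {
	fmt.Println(recordOffset(0), stpOffsetOK(recordOffset(0)))   // 72 true
	fmt.Println(recordOffset(54), stpOffsetOK(recordOffset(54))) // 504 true
	fmt.Println(recordOffset(55), stpOffsetOK(recordOffset(55))) // 512 false
}
```

Under these assumptions the shift costs three directly addressable slots, so the highest slot reachable with a single STP drops from 57 to 54.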

Docs (lockstep per §8)

docs/jit-internals.md: journal table (+3 cells), Globals section (ref support), CALL Boundaries (indirect IC protocol + ownership), coverage lists (Phase C/D), pool constant-addr invariant.

Verification

Per phase: go build ./... && go vet ./...; go test ./interp/ -run 'TestInterpreter_JIT' -count=1 on arm64 mac (this machine); full go test ./...; benchmark before/after via go test -bench in benchmarks/. RC exactness verified by the release-to-zero/reuse subtests; deopt by yield/promotion subtests.
