refactor: Remove ARM64 and Compiler Implementation#53

Merged: siyul-park merged 19 commits into main from feature/redesign_jit on Jun 8, 2026
Conversation

@siyul-park siyul-park commented Jun 7, 2026

Owner
  • Reworked the JIT architecture by moving the compiler/link/install flow into interp, removing the old standalone jit/ package and slot indirection.
  • Simplified the native callable ABI to a scratch-register argv contract, matching how the interpreter invokes JIT segments and whole-function entries.
  • Rebuilt ARM64 lowering around interpreter-owned frame, stack, fallback, branch, local/global, ref, numeric, and conversion handling.
  • Updated ASM frame/register rewriting support and architecture docs to match the new scratch-only JIT boundary.
  • Tightened boxed integer range checks and added boundary coverage for types.IsBoxable.

Changes Made

  • Removed the old jit package (Compiler, Lowerer, Module, Slots, segment call/result wrappers) and inlined the remaining JIT orchestration in interp/jit.go.
  • Added ARM64 lowering in interp/jit_arm64.go plus a non-ARM64 stub that disables native JIT cleanly.
  • Changed asm.Callable to accept scratch argv values directly and updated the ARM64 trampoline to load/store X10-X14 from that buffer.
  • Added frame layout support in asm.Frame / asm/arm64.Frame and expanded rewriter tests for spill/frame behavior.
  • Expanded interpreter JIT tests for refs, direct calls, fused-call release handling, globals, branches, i32/i64 arithmetic, float conversions, and fallback paths.
  • Updated docs for architecture, JIT internals, compatibility, value representation, and architecture-extension guidance.

Rationale

The old design split opcode lowering, segment linking, callable indirection, and interpreter installation across jit/, asm/, and interp/. That made frame ownership and fallback semantics harder to audit. This refactor keeps interpreter-specific behavior inside interp, while asm now owns only architecture-neutral assembly/linking and the scratch-register callable boundary.

Verification

  • go test ./...
  • go test -race ./...
  • go vet ./...
  • GOARCH=amd64 go test ./...
  • git diff --check origin/main...HEAD

CI Note

codecov/patch is expected to under-report this PR because the remote CI job runs on ubuntu-24.04 amd64, while the new native JIT and ARM64 trampoline paths are guarded or skipped off ARM64. The main Go test job passes and project coverage improves.

Related Issues

No linked issue.

Additional Information

This PR intentionally preserves interpreter fallback for unsupported or unsafe native paths. Non-ARM64 platforms keep running the threaded interpreter with JIT unavailable rather than failing startup.

- Deleted the ARM64 specific register initialization code.
- Removed the entire compiler implementation, including the Compiler struct and its methods.
- Eliminated the Lowerer interface and its Context struct, which were responsible for opcode lowering.
- Removed the Module struct and its associated methods for managing compiled functions.
- Deleted the Slots management code that handled indirection for callable entries.
- Cleaned up the segment handling and invocation logic, including the Call and Outcome types.
- Updated the boxed types to improve the range checks for boxable integers.
- Added new test cases for the IsBoxable function to cover edge cases for integer bounds.

@siyul-park siyul-park left a comment

Owner Author

PR Review — Refactor JIT Compiler: Remove ARM64 and Compiler Implementation

Decision

Request Changes

This PR should not merge in its current state due to insufficient test coverage of new JIT code and missing context about the refactoring rationale.

Merge Readiness Summary

The PR successfully removes the old jit/ package architecture and moves JIT compilation inline into interp/, with all local tests passing and the build succeeding. However, the new JIT implementation lacks adequate test coverage (5.67% vs 51.42% target), and the PR description provides insufficient context about why this major refactor was undertaken and what acceptance criteria it meets.

Blocking Findings

1. Insufficient Test Coverage for New JIT Code

  • Severity: Blocking
  • Evidence: Codecov reports only 5.67% of the diff is covered by tests (target: 51.42%). The new interp/jit.go (771 lines) and interp/jit_arm64.go (1535 lines) contain critical VM runtime code that generates native machine code.
  • Why it matters:
    • JIT is a high-risk component; bugs here cause VM crashes, data corruption, or silent incorrect results
    • This is a major refactor moving the architecture from a separate package to inline
    • 2,306 lines of new/heavily modified JIT code with low test coverage increases regression risk significantly
    • The project's stated target is 51.42% coverage for diffs
  • Minimal fix:
    • Add test cases that exercise the new JIT code paths in interp/interp_test.go
    • Focus on: opcode lowering logic, frame metadata handling, scratch register management, fallback paths
    • Run go test ./interp/... to verify coverage improves

2. Missing PR Context and Linked Issue

  • Severity: Blocking
  • Evidence: PR description lists what was deleted but provides no rationale, no linked issue, and no explanation of the new architecture's benefits
  • Why it matters:
    • This is a 5,494-line deletion of core functionality; reviewers need to understand why
    • Without context, it's unclear if this is an incremental step in a larger plan or a standalone refactor
    • No acceptance criteria are stated
  • Minimal fix:
    • Update PR description to explain: why this refactor improves the codebase, what architectural benefits it provides, what issue(s) it addresses
    • Link any related issue(s) (e.g., if this is part of a larger effort like #50)
    • State the acceptance criteria clearly

Important Findings

1. Overall Project Coverage Actually Improved

  • Evidence: codecov/project shows 55.87% coverage, a +4.44% improvement over the base
  • Implication: The refactor itself is not harmful to overall coverage; the issue is specific to untested new code paths

2. Mergeable State is "Unstable"

  • Evidence: GitHub reports mergeable_state: "unstable"
  • Status: This appears to be due to pending codecov checks, not a merge conflict; should resolve once coverage is addressed

Questions

  1. Is this refactor part of the larger JIT redesign effort from PR #50? If so, what does this PR add beyond that work?
  2. Are there specific JIT code paths that are intentionally not covered by tests (e.g., rare fallback scenarios)? If so, please document this in the PR description.

Readiness Checklist

  • Scope control: Pass — PR is focused on removing old architecture and moving to new inline approach
  • Correctness: Pass — Local tests pass, build succeeds, no broken functionality detected
  • Test coverage: Needs work — New JIT code (2,306 lines) has only 5.67% coverage; target is 51.42%
  • Architecture consistency: Pass — Documentation is updated; new architecture aligns with stated design
  • Risk / compatibility: Pass — No public API breakage detected; internal refactor
  • Maintainability: Pass — Code structure is reasonable; architecture is documented
  • CI / validation: Needs work — codecov/patch check failing due to insufficient coverage

Minimal Path to Merge

  1. Add test coverage for new JIT code paths:

    • Expand interp/interp_test.go with cases exercising opcode lowering, frame metadata, scratch registers
    • Target: increase diff coverage from 5.67% to ≥51.42%
    • Run go test ./interp/... to verify
  2. Update PR description to include:

    • Rationale: why this refactor improves the codebase
    • Context: link to related issues (if any) or explain as standalone effort
    • Acceptance criteria: what success looks like
  3. Re-run CI to verify codecov/patch passes

Final Recommendation

Fix blocking findings and resubmit. The PR is architecturally sound and local tests pass, but the new JIT code must have adequate test coverage before merge. Adding context to the PR description will also help reviewers understand the scope and rationale.


Generated by Claude Code

@codecov

codecov Bot commented Jun 7, 2026

Codecov Report

❌ Patch coverage is 7.36994% with 641 lines in your changes missing coverage. Please review.
✅ Project coverage is 56.03%. Comparing base (6104a9a) to head (3a75dbf).
⚠️ Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| interp/jit.go | 0.00% | 413 Missing ⚠️ |
| asm/rewriter.go | 0.00% | 132 Missing ⚠️ |
| interp/interp.go | 43.00% | 49 Missing and 8 partials ⚠️ |
| asm/arm64/frame.go | 0.00% | 20 Missing ⚠️ |
| asm/arm64/abi.go | 0.00% | 12 Missing ⚠️ |
| asm/arm64/arch.go | 0.00% | 3 Missing ⚠️ |
| asm/assembler.go | 0.00% | 3 Missing ⚠️ |
| interp/fuse.go | 50.00% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #53      +/-   ##
==========================================
+ Coverage   51.42%   56.03%   +4.60%     
==========================================
  Files          68       62       -6     
  Lines       11700    10726     -974     
==========================================
- Hits         6017     6010       -7     
+ Misses       5086     4111     -975     
- Partials      597      605       +8     

@siyul-park

Owner Author

Design note: keep native execution across partial JIT call entries

The current direct-call JIT work proves that complete JIT entries can call each other with native BL, including recursive and mutually recursive calls. However, this still leaves an important performance gap: a function whose entry point is JIT-compiled as a partial segment cannot safely be reached by native BL today unless the whole callee is part of the same complete native component.

The target behavior should be broader:

  • If a callee has a JIT-compiled entry point, a native caller should be able to stay native and jump/call into that entry whenever possible.
  • If that callee later hits an unsupported opcode or guard fallback, the VM must restore enough interpreter frame state to continue from the fallback IP.
  • Recursive and SCC direct calls should remain eligible. The VM should not force a round trip through the threaded interpreter merely because the direct-call graph is recursive.

Current limitation

The existing native direct-call path runs nested calls on the host stack and does not push a real VM frame for each native callee. That works only while the whole call subtree stays native. It is unsafe for partial entries because fallback needs interpreter-visible frame metadata:

  • frame.addr
  • frame.ref
  • frame.bp
  • frame.ip
  • frame.returns
  • frame.code
  • frame.upvals
  • i.fp / i.fr
  • i.sp

The Go-side segment/entry wrappers can restore this metadata only at wrapper boundaries. A nested native callee entered by BL does not currently have a corresponding VM frame to restore into.

Proposed implementation

Make native direct CALL frame-aware.

  1. On native direct CALL, emit the same logical frame setup as threaded CALL, but in native code:

    • Check frame capacity before entering the callee.
    • Allocate/use i.frames[i.fp] for the callee frame.
    • Populate addr, ref, bp, ip = 0, returns, release, and enough metadata to recover code/upvals.
    • Advance i.fp / effective native frame depth.
  2. Enter the best available native callee entry:

    • Prefer a complete native entry when present.
    • Otherwise allow a JIT segment at ip=0.
    • If no native entry exists, fall back to threaded CALL.
  3. On native callee return:

    • Mirror threaded RETURN teardown.
    • Move return values to the caller stack slots.
    • Release frame.ref if needed.
    • Decrement i.fp / restore caller frame state.
    • Continue in native caller if possible.
  4. On native fallback/deopt inside a nested callee:

    • Materialize the native shadow stack into i.stack.
    • Store fallback target (addr, ip) plus trap kind in scratch/output state.
    • Return through the native call chain to the outer Go wrapper.
    • The wrapper restores i.fr from the real VM frame, calls restore(frame, addr), sets frame.ip, and resumes threaded execution at the fallback IP.
  5. Preserve existing guard semantics:

    • i64 overflow / heap-promoted i64 fallback must continue to run the threaded opcode that owns allocation/refcounting.
    • Ref/heap operations should stay interpreter-owned unless their retain/release behavior is explicitly lowered.
    • Frame overflow should still report ErrFrameOverflow, not host stack overflow.
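
The frame setup and teardown in steps 1 and 3 can be pictured with a small Go model. This is only a sketch under stated assumptions: the `vm` and `frame` types, `pushFrame`, `popFrame`, and `errFrameOverflow` are hypothetical stand-ins for the interpreter's real structures, and the real design would emit the equivalent logic as native code.

```go
package main

import (
	"errors"
	"fmt"
)

// errFrameOverflow is a stand-in for the interpreter's ErrFrameOverflow.
var errFrameOverflow = errors.New("frame overflow")

// frame holds the subset of per-call metadata named in the note (hypothetical layout).
type frame struct {
	addr, bp, ip, returns int
	release               bool
}

// vm is a toy model of the frame stack: frames is preallocated, fp is the depth.
type vm struct {
	frames []frame
	fp     int
}

// pushFrame mirrors the threaded CALL setup: capacity check first, then populate
// the next slot with ip = 0 and advance fp.
func (m *vm) pushFrame(addr, bp, returns int) error {
	if m.fp == len(m.frames) {
		return errFrameOverflow
	}
	m.frames[m.fp] = frame{addr: addr, bp: bp, ip: 0, returns: returns, release: true}
	m.fp++
	return nil
}

// popFrame mirrors RETURN teardown: decrement fp and hand back the callee frame.
func (m *vm) popFrame() frame {
	m.fp--
	return m.frames[m.fp]
}

func main() {
	m := &vm{frames: make([]frame, 2)}
	fmt.Println(m.pushFrame(10, 0, 1)) // fits
	fmt.Println(m.pushFrame(20, 4, 1)) // fits
	fmt.Println(m.pushFrame(30, 8, 1)) // exceeds the frame budget
}
```

The point of the model is that overflow is detected against the VM frame budget before the callee is entered, so it can be reported as a VM error rather than surfacing as host stack exhaustion.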

Acceptance criteria

  • Direct caller -> callee remains native when the callee has a JIT entry, even if the callee is only partially compiled.
  • Recursive and mutually recursive direct calls remain eligible for native execution.
  • If a nested native callee falls back, interpreter frame fields are correct and execution resumes at the fallback IP.
  • i64 factorial-like workloads compile the function entry and stay native as long as possible, rather than dropping back to threaded dispatch at every call boundary.
  • Benchmarks should show the JIT path faster than threaded for recursive numeric functions once the callee entry is compiled:
    • recursive i32 fibonacci
    • recursive i64 factorial
  • Existing tests for frame overflow, i64 heap promotion, closure calls, host calls, and refcounting continue to pass.

Why this is the right direction

This keeps the policy simple: native execution should continue whenever the VM has a compiled entry and can prove/recover the interpreter state needed for fallback. The important invariant is not “complete JIT only”; it is “every native call that may fall back must have a recoverable VM frame”. Once that invariant holds, the JIT can keep execution native for the longest valid prefix and safely return to the interpreter only at real unsupported boundaries.

@siyul-park

Owner Author

Follow-up design: i64 guard-free complete JIT vs native deopt/frame reconstruction

There are two related ways to close the remaining recursive numeric gap, especially for the i64 factorial-style benchmark where the function entry can be JIT-compiled but complete JIT is rejected because i64 operations emit guard fallback paths.

These are not mutually exclusive. The range/fact approach is a narrower Phase 1 optimization. Native deopt + VM frame reconstruction is the broader mechanism that also enables partial-entry native calls and maximal native execution across fallback boundaries.


Option A: i64 range/fact analysis for guard-free complete JIT

Goal

Allow complete JIT for i64 code when the compiler can prove that selected i64 operations stay inside the inline 49-bit boxed i64 range and therefore do not need a native fallback path.

Today, i64 lowering emits fallback for cases such as:

  • arithmetic result outside the 49-bit inline range
  • heap-promoted i64 operands loaded from locals/globals
  • division by zero guards

Because complete JIT rejects reachable native fallback, these fallback-capable i64 ops can prevent whole-function entry compilation even when the concrete benchmark values are safe.
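
To make the 49-bit inline range concrete, here is a minimal sketch of a range check. It assumes the inline payload is a signed two's-complement 49-bit value; the note does not state the encoding, and `fitsInline`, `minInline`, and `maxInline` are hypothetical names, not the project's `types.IsBoxable`.

```go
package main

import "fmt"

// inlineBits is the payload width quoted in the note; signedness is an assumption.
const inlineBits = 49

const (
	minInline = -(int64(1) << (inlineBits - 1)) // smallest value that stays inline
	maxInline = int64(1)<<(inlineBits-1) - 1    // largest value that stays inline
)

// fitsInline reports whether v fits the assumed signed 49-bit inline payload.
// A value outside this range would need heap promotion, which is the case the
// guarded lowering falls back on.
func fitsInline(v int64) bool {
	return v >= minInline && v <= maxInline
}

func main() {
	for _, v := range []int64{0, maxInline, maxInline + 1, minInline, minInline - 1} {
		fmt.Println(v, fitsInline(v))
	}
}
```

The boundary values on either side of `minInline` and `maxInline` are the interesting test inputs, which is the kind of edge coverage the PR description mentions for the boxable check.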

Proposed model

Add a lightweight fact/range analysis for JIT compilation:

  • Track integer intervals for i64 values where possible.
  • Track whether an i64 value is definitely inline boxed (KindI64) rather than heap-promoted (KindRef).
  • Propagate facts through simple ops:
    • I64_CONST
    • LOCAL_GET / LOCAL_SET / LOCAL_TEE
    • I64_ADD, I64_SUB, I64_MUL
    • comparisons and simple branch predicates
  • Use branch predicates to narrow ranges across basic blocks, especially patterns like:
    • n <= 1 ? return 1 : n * f(n - 1)
    • n < 2 ? return n : f(n-1) + f(n-2)

Lowering change

When facts prove safety:

  • Emit a guard-free i64 op variant.
  • Do not call finishI64 with fallback emission.
  • Keep complete JIT eligible because ctx.fallback remains false.

When facts are absent or too weak:

  • Keep the existing guarded lowering.
  • Segment mode remains safe because guard fallback is already supported.
  • Complete mode still rejects the guarded path.
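
The guard-free decision can be sketched as interval arithmetic. This is a hedged illustration, not the proposed analysis itself: the `interval` type and the `add`, `sub`, and `guardFree` helpers are hypothetical, the inline range is assumed to be signed 49-bit, and multiplication is omitted because it needs overflow-aware arithmetic.

```go
package main

import "fmt"

// Assumed signed 49-bit inline range.
const (
	minInline = -(int64(1) << 48)
	maxInline = int64(1)<<48 - 1
)

// interval is a closed range [lo, hi] of values an i64 slot may hold.
type interval struct{ lo, hi int64 }

// inline reports whether every value in the range stays inline boxed.
func (r interval) inline() bool {
	return r.lo <= r.hi && r.lo >= minInline && r.hi <= maxInline
}

// add returns the range of x+y. It requires inline inputs so the int64
// additions below cannot wrap; otherwise no fact is produced.
func add(x, y interval) (interval, bool) {
	if !x.inline() || !y.inline() {
		return interval{}, false
	}
	return interval{x.lo + y.lo, x.hi + y.hi}, true
}

// sub returns the range of x-y under the same precondition as add.
func sub(x, y interval) (interval, bool) {
	if !x.inline() || !y.inline() {
		return interval{}, false
	}
	return interval{x.lo - y.hi, x.hi - y.lo}, true
}

// guardFree reports whether the result is proven inline, in which case the
// lowering could pick the guard-free variant. Otherwise keep the guarded one.
func guardFree(r interval, ok bool) bool {
	return ok && r.inline()
}

func main() {
	n := interval{2, 20} // e.g. narrowed by a branch predicate such as n > 1
	one := interval{1, 1}
	fmt.Println(guardFree(sub(n, one)))                             // n - 1 stays inline
	fmt.Println(guardFree(add(interval{maxInline, maxInline}, one))) // may leave the inline range
}
```

When `guardFree` returns false the compiler simply keeps the existing guarded lowering, so a weak fact costs performance but never correctness.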

Scope limits

Keep this deliberately small at first:

  • Only handle i64 locals/params/constants, not globals/ref slots.
  • Only prove common recursive numeric cases.
  • Do not attempt full abstract interpretation of all bytecode.
  • Bail out on joins where intervals cannot be merged cleanly.

Acceptance criteria

  • Recursive i64 factorial complete-JITs without guard fallback for safe input ranges.
  • Recursive i32 fibonacci remains complete-JIT and unchanged.
  • Heap-promoted i64 cases still fall back correctly in segment mode.
  • Unsafe i64 programs do not silently wrap or skip heap promotion.
  • Benchmarks show i64 factorial JIT faster than threaded when the proven range applies.

Tradeoff

This is the smaller implementation and likely improves the exact i64 benchmark quickly. It does not solve native-to-partial-entry calls, nested fallback, or general “stay native as long as possible” behavior.


Option B: native deopt + VM frame reconstruction

Goal

Make native direct calls safe even when the callee is only partially JIT-compiled or may fall back later. This lets the JIT keep execution native for the longest possible prefix and return to the interpreter only at real unsupported boundaries.

This generalizes the earlier direct-call design: the key invariant should be “every native call that can fall back has a recoverable VM frame,” not “only complete JIT calls can use native BL.”

Proposed model

Native direct CALL should become VM-frame-aware:

  1. On native direct CALL, create a real VM frame record, mirroring threaded CALL:

    • check frame capacity
    • populate frame.addr, frame.ref, frame.bp, frame.ip, frame.returns, frame.release
    • update i.fp / active frame state
    • keep enough state to restore frame.code and frame.upvals
  2. Enter the best native target available:

    • complete entry if available
    • otherwise entry segment at ip=0
    • otherwise threaded fallback
  3. On native RETURN, mirror threaded RETURN:

    • move return values to caller slots
    • release callee ref when needed
    • pop VM frame
    • restore caller frame state
    • continue native caller when possible
  4. On native fallback/deopt:

    • materialize shadow stack into i.stack
    • write fallback (addr, ip, trap kind) to scratch/output state
    • unwind native call frames back to the Go wrapper
    • restore i.fr from the VM frame stack
    • call restore(frame, addr)
    • set frame.ip to fallback IP
    • resume threaded execution at that opcode

Trap kinds

Use explicit trap kinds rather than overloading only one bit:

  • normal native exit / next IP
  • guard fallback at (addr, ip)
  • frame overflow
  • divide by zero / runtime panic path if needed

This will make nested deopt easier to reason about than relying on one scratchFallback bit.
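
A minimal sketch of what explicit trap kinds might look like in Go follows. The names `trapNone`, `trapFallback`, and `trapOverflow` appear later in this thread; `trapDivZero`, the `exit` struct, and `describe` are hypothetical additions for illustration.

```go
package main

import "fmt"

// trapKind enumerates why native code returned to the Go wrapper.
type trapKind uint8

const (
	trapNone     trapKind = iota // normal native exit, continue at next IP
	trapFallback                 // guard fallback at (addr, ip)
	trapOverflow                 // VM frame budget exhausted
	trapDivZero                  // hypothetical: runtime panic path
)

// exit is an assumed shape for the state native code hands back.
type exit struct {
	kind     trapKind
	addr, ip int
}

// describe shows how a wrapper could branch on the kind instead of testing flag bits.
func describe(e exit) string {
	switch e.kind {
	case trapNone:
		return fmt.Sprintf("continue at ip=%d", e.ip)
	case trapFallback:
		return fmt.Sprintf("resume threaded at addr=%d ip=%d", e.addr, e.ip)
	case trapOverflow:
		return "report ErrFrameOverflow"
	case trapDivZero:
		return "raise runtime panic"
	default:
		return "unknown trap"
	}
}

func main() {
	fmt.Println(describe(exit{kind: trapFallback, addr: 7, ip: 12}))
	fmt.Println(describe(exit{kind: trapOverflow}))
}
```

A switch over an enum makes each exit path a separate, testable case, which is harder to guarantee when several conditions share flag bits in one word.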

Acceptance criteria

  • A native caller can call a callee’s JIT entry even if that callee is not complete-JIT eligible.
  • If the callee falls back, interpreter-visible frame fields are correct.
  • Recursive/SCC direct calls remain eligible for native execution.
  • i64 heap promotion fallback resumes correctly from inside a nested native callee.
  • Frame overflow still reports ErrFrameOverflow.
  • HostFunction, Closure, and ref-heavy calls remain interpreter-owned unless separately lowered.

Tradeoff

This is the more complete and more invasive design. It solves the general “maximal native prefix” problem and subsumes the partial-entry call design. It also reduces the need to prove every i64 path statically, because unsafe paths can deopt safely.


Recommended path

Implement both in phases:

  1. Phase 1: i64 facts/range analysis

    • Small, targeted, and likely enough to make recursive i64 factorial complete-JIT and faster than threaded.
    • Keeps existing fallback model unchanged.
  2. Phase 2: native deopt + VM frame reconstruction

    • General solution for native calls into partial entries.
    • Enables “stay native as long as possible” across recursive/SCC call graphs.
    • Makes future JIT coverage less brittle because unsupported paths can safely return to the interpreter.

If implementation effort must be spent only once, prefer Option B. It is the standard long-term mechanism. Option A is best as a quick, low-risk performance win for proven numeric kernels.

@siyul-park

Owner Author

Clarification: integrate Option B into the partial-entry design

Option B from the follow-up note should not be treated as a separate competing design. It is the concrete implementation strategy for the partial-entry native-call design above.

The integrated design is:

  1. Native direct CALL becomes VM-frame-aware

    • Native CALL must create/update a real VM frame, not only use the host stack.
    • This makes fallback from a nested native callee recoverable.
  2. Native callers may enter any compiled callee entry

    • Complete entry if available.
    • ip=0 JIT segment if only the function entry is partially compiled.
    • Threaded CALL only when there is no safe native entry.
  3. Fallback/deopt unwinds to the Go wrapper with enough state

    • trap kind
    • target function/template addr
    • fallback IP
    • materialized stack/SP/BP
    • VM frame stack already populated enough for restore(frame, addr)
  4. Interpreter resumes from the fallback IP

    • The wrapper restores i.fr, frame.code, frame.upvals, and frame.ip.
    • The original threaded handler owns heap promotion, refcounting, host/closure behavior, and other unsupported work.

With this invariant, “complete JIT only” is no longer the boundary for native calls. The boundary becomes: native may continue as long as every possible fallback has a recoverable VM frame.
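
The entry preference in step 2 reduces to a three-way choice. The sketch below is illustrative only; `callee`, `entryKind`, and `pickEntry` are hypothetical names and the real decision would be made at JIT compile or link time.

```go
package main

import "fmt"

// entryKind names the three ways a direct call can reach its callee.
type entryKind int

const (
	entryThreaded entryKind = iota // no safe native entry: use threaded CALL
	entrySegment                   // JIT segment starting at ip=0
	entryComplete                  // whole-function native entry
)

// callee records which compiled entries exist for a function (assumed shape).
type callee struct {
	hasComplete      bool
	hasSegmentAtZero bool
}

// pickEntry applies the preference order from the integrated design.
func pickEntry(c callee) entryKind {
	switch {
	case c.hasComplete:
		return entryComplete
	case c.hasSegmentAtZero:
		return entrySegment
	default:
		return entryThreaded
	}
}

func main() {
	fmt.Println(pickEntry(callee{hasComplete: true, hasSegmentAtZero: true}) == entryComplete)
	fmt.Println(pickEntry(callee{hasSegmentAtZero: true}) == entrySegment)
	fmt.Println(pickEntry(callee{}) == entryThreaded)
}
```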

Where i64 range/fact analysis fits

The i64 range/fact design remains useful, but as an optional fast path rather than the core mechanism.

  • It can make some i64 complete-JIT paths guard-free.
  • It can reduce deopt frequency in hot numeric kernels.
  • It should not be required for correctness.

The general path should be native deopt + VM frame reconstruction. Then i64 overflow / heap-promoted operands can safely fall back when facts are not strong enough, while proven-safe i64 paths can still skip guards for speed.

Practical phase split

  1. Implement native VM-frame-aware CALL + deopt state restoration.
  2. Allow native callers to enter compiled ip=0 callee entries.
  3. Keep current i64 guards and verify nested fallback correctness.
  4. Add i64 range/fact analysis only after the deopt path is correct, as a performance optimization.

This keeps the main design unified with the earlier partial-entry proposal and avoids making i64 facts a prerequisite for preserving native execution across calls.

@siyul-park

Owner Author

Frame-aware native direct calls + nested deopt (PR #53 Option B)

Context

PR #53 design notes (comment 4642665810 + follow-ups 4642692961) ask for one
invariant: every native call that may fall back must have a recoverable VM
frame. Once that holds, the JIT can keep execution native across recursive /
SCC direct calls and into partially compiled callee entries, deopting to the
threaded interpreter only at real unsupported boundaries. That replaces today's
"complete-JIT-only" rule, which forces a threaded round trip whenever any
reachable opcode cannot lower.

Today (verified)

  • Native direct CALL is arm64JIT.call (interp/jit_arm64.go:473),
    reached only in complete mode where ctx.framed == true. It fuses
    CONST_GET+CALL into BL target.label, manages BP/SP in scratch regs + the host
    stack, and pushes no real VM frame.
  • scratchNext is dual-use: input = frame budget len(i.frames)-i.fp
    (interp/interp.go:718); output = nextIP | flag bits
    (scratchFallback 1<<63, scratchFrameOverflow 1<<62).
  • complete/walkFull (interp/jit.go:187,
    interp/jit.go:420) reject the whole component if any
    function sets ctx.fallback → "complete only."
  • Wrappers entry / segment (interp/interp.go:629,
    interp/interp.go:654) restore one frame on fallback.
    Nested native callees have no frame to restore into.
  • Hard constraint: ABI = 5 scratch regs X10–X14
    (asm/arm64/abi.go:23) and scratchCount == 5
    (interp/jit.go:101) — all slots used, none free.
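
For reference, the current output encoding of scratchNext (next IP combined with two high flag bits) could be decoded as below. The flag positions come from the bullet above; `ipMask` and `decodeNext` are hypothetical names, and masking the IP to the low 62 bits is an assumption.

```go
package main

import "fmt"

const (
	scratchFallback      = uint64(1) << 63 // native code hit a guard fallback
	scratchFrameOverflow = uint64(1) << 62 // native CALL ran out of frame budget
	ipMask               = ^(scratchFallback | scratchFrameOverflow)
)

// decodeNext splits the packed output word into the IP and the two flags.
func decodeNext(out uint64) (ip uint64, fallback, overflow bool) {
	return out & ipMask, out&scratchFallback != 0, out&scratchFrameOverflow != 0
}

func main() {
	ip, fb, ov := decodeNext(42 | scratchFallback)
	fmt.Println(ip, fb, ov)
}
```

Packing everything into one word is what leaves no room for nested-frame state, which motivates moving this information into memory.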

Decisions (from user)

  1. Implement the full integrated design (phases 1–3).
  2. Get frames/fp into native by repurposing the scratchNext slot into a
    pointer to an Interpreter-owned []uint64 "frame journal", with no asm
    package change. amd64 unaffected (arm64 is the only backend;
    interp/jit_stub.go).
  3. Unify all native direct calls on the frame-aware (journal) path; no
    register-only fast path.

The frame journal

Add journal []uint64 to Interpreter, allocated once in New sized from
opt.frame (header + jStride * (frame+1)). scratch() writes &i.journal[0]
into the scratchCtrl slot (renamed from scratchNext) and initializes the
header.

Flat layout (indices, all uint64):

  • jDepth (0): native frame depth; native r/w.
  • jCap (1): budget = len(i.frames) - i.fp; read-only. CALL overflow check
    mirrors threaded i.fp == len(i.frames) exactly.
  • jTrap (2): out: trap kind enum trapNone | trapFallback | trapOverflow.
  • jNextIP (3): out: resume / fallback IP for the single-frame (depth 0) path.
  • records from jHead (4), stride jStride = 4: {addr, bp, ip, returns}.

Invariant

record[0] always maps to the outermost frame, which is the real frame at
i.frames[i.fp-1] that the Go wrapper entered native through — on deopt only its
ip is reconciled. record[k>=1] each become new frames pushed at
i.frames[i.fp-1+k].
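
The flat layout can be written down as index constants plus two accessors. This is a sketch: the constant names and values follow the list above, while `record`, `newJournal`, `writeRecord`, and `readRecord` are hypothetical helpers.

```go
package main

import "fmt"

// Header indices and record stride, as listed in the layout above.
const (
	jDepth  = 0
	jCap    = 1
	jTrap   = 2
	jNextIP = 3
	jHead   = 4
	jStride = 4
)

// record is one journal entry: {addr, bp, ip, returns}.
type record struct{ addr, bp, ip, returns uint64 }

// newJournal sizes the slice as header + jStride * (frames+1), per the note.
func newJournal(frames int) []uint64 {
	return make([]uint64, jHead+jStride*(frames+1))
}

// writeRecord stores r at record index k.
func writeRecord(j []uint64, k int, r record) {
	base := jHead + jStride*k
	j[base], j[base+1], j[base+2], j[base+3] = r.addr, r.bp, r.ip, r.returns
}

// readRecord loads record index k.
func readRecord(j []uint64, k int) record {
	base := jHead + jStride*k
	return record{j[base], j[base+1], j[base+2], j[base+3]}
}

func main() {
	j := newJournal(2)
	writeRecord(j, 1, record{addr: 200, bp: 4, ip: 20, returns: 1})
	fmt.Println(len(j), readRecord(j, 1))
}
```

Keeping every field a plain uint64 at a fixed index means native code only needs base-plus-offset loads and stores, with no knowledge of Go struct layout.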

Native protocol

  • Direct CALL by caller at active depth d:
    1. if jDepth >= jCap → write jTrap=trapOverflow, jNextIP=callIP, unwind.
    2. write record[jDepth] = {ownAddr (const), scratchBP, callIP+4, ownReturns (const)}.
    3. jDepth++.
    4. set callee BP/SP (as today), BL.
    5. post-BL: if jTrap != trapNone unwind (RET up the host chain, untouched
      records); else jDepth-- and continue.
  • Deopt (exitFallback / overflow) by active innermost frame: write a
    self-record record[jDepth] = {ownAddr, scratchBP, fallbackIP, ownReturns},
    jDepth++, set jTrap, materialize shadow stack to i.stack, RET. The host
    trap-check chain unwinds to the Go wrapper leaving all records intact.
  • Normal exit / RETURN: unchanged in spirit; results land via i.stack +
    scratchSP; depth stays balanced.
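
The protocol above can be simulated in Go against a plain []uint64 to check the bookkeeping. This is a model, not the ARM64 lowering: `pushRecord`, `directCall`, `afterCall`, and `deopt` are hypothetical names, and the numeric values of the trap constants are assumed.

```go
package main

import "fmt"

// Header indices and record stride from the journal layout.
const (
	jDepth  = 0
	jCap    = 1
	jTrap   = 2
	jNextIP = 3
	jHead   = 4
	jStride = 4
)

// Trap kinds; names from the note, numeric values assumed.
const (
	trapNone uint64 = iota
	trapFallback
	trapOverflow
)

// pushRecord writes record[jDepth] and increments the depth.
func pushRecord(j []uint64, addr, bp, ip, returns uint64) {
	base := jHead + jStride*int(j[jDepth])
	j[base], j[base+1], j[base+2], j[base+3] = addr, bp, ip, returns
	j[jDepth]++
}

// directCall models steps 1-3 of a direct CALL. It returns false when the
// frame budget is exhausted, after recording the overflow trap.
func directCall(j []uint64, ownAddr, bp, callIP, ownReturns uint64) bool {
	if j[jDepth] >= j[jCap] {
		j[jTrap], j[jNextIP] = trapOverflow, callIP
		return false
	}
	pushRecord(j, ownAddr, bp, callIP+4, ownReturns)
	return true
}

// afterCall models step 5: unwind with records untouched if a trap is
// pending, otherwise pop the record and continue in the caller.
func afterCall(j []uint64) bool {
	if j[jTrap] != trapNone {
		return false
	}
	j[jDepth]--
	return true
}

// deopt models a fallback in the innermost frame: self-record, then trap.
func deopt(j []uint64, ownAddr, bp, fallbackIP, ownReturns uint64) {
	pushRecord(j, ownAddr, bp, fallbackIP, ownReturns)
	j[jTrap] = trapFallback
}

func main() {
	j := make([]uint64, jHead+jStride*3)
	j[jCap] = 2
	directCall(j, 100, 0, 8, 1) // caller records itself, then enters the callee
	deopt(j, 200, 4, 16, 1)     // callee hits a guard and records its fallback IP
	fmt.Println(j[jDepth], j[jTrap] == trapFallback, afterCall(j))
}
```

After a deopt the depth equals the number of records the wrapper must reconcile, and `afterCall` refuses to pop, which is what leaves the records intact for reconstruction.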

Go-side deopt reconstruction (in the entry/segment wrapper)

After Call returns, if jTrap != trapNone:

  1. i.sp = scratchSP.
  2. frames[i.fp-1].ip = record[0].ip; restore(&frames[i.fp-1], record[0].addr).
  3. for k = 1..jDepth-1: fill frames[i.fp-1+k] from record[k]
    (addr,bp,ip,returns; i.retain(addr), ref=addr, release=true to match
    threaded CONST_GET+CALL refcount semantics), restore(...).
  4. i.fp += jDepth-1; i.fr = &frames[i.fp-1].
  5. trapOverflow → panic(ErrFrameOverflow); trapFallback → run the threaded
    handler i.fr.code[i.fr.ip] (segment) / fallback[...] (entry) at the
    innermost IP and resume.

The retain in step 3 reproduces the retain that the fused-away CONST_GET would
have done, so release on the eventual threaded RETURN balances.
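
Steps 1 to 4 of the reconstruction can be sketched as follows. This is a simplified model under stated assumptions: `interp`, `frame`, and `reconstruct` are hypothetical, retains are counted in a map instead of calling i.retain, and the restore(...) call that recovers code and upvals is omitted.

```go
package main

import "fmt"

// Journal indices from the layout described earlier.
const (
	jDepth  = 0
	jHead   = 4
	jStride = 4
)

// frame is a reduced VM frame (hypothetical layout).
type frame struct {
	addr, ref, bp, ip, returns uint64
	release                    bool
}

// interp is a toy interpreter state; retains stands in for refcounting.
type interp struct {
	frames  []frame
	fp      int
	sp      uint64
	retains map[uint64]int
}

// reconstruct rebuilds VM frames from journal records after a trap.
func (i *interp) reconstruct(j []uint64, scratchSP uint64) {
	depth := int(j[jDepth])
	if depth == 0 {
		return // nothing recorded; the single-frame path applies
	}
	i.sp = scratchSP
	// record[0] is the frame the wrapper entered through: only ip is reconciled.
	i.frames[i.fp-1].ip = j[jHead+2]
	// record[k>=1] become new frames above it.
	for k := 1; k < depth; k++ {
		base := jHead + jStride*k
		addr := j[base]
		i.retains[addr]++ // reproduces the retain of the fused-away CONST_GET
		i.frames[i.fp-1+k] = frame{
			addr: addr, ref: addr, bp: j[base+1], ip: j[base+2],
			returns: j[base+3], release: true,
		}
	}
	i.fp += depth - 1
}

func main() {
	j := make([]uint64, jHead+jStride*4)
	j[jDepth] = 3
	copy(j[jHead:], []uint64{100, 0, 12, 1, 200, 4, 20, 1, 300, 8, 7, 1})
	vm := &interp{frames: make([]frame, 4), fp: 1, retains: map[uint64]int{}}
	vm.frames[0] = frame{addr: 100}
	vm.reconstruct(j, 9)
	fmt.Println(vm.fp, vm.frames[vm.fp-1].ip)
}
```

Each reconstructed frame is marked release with one matching retain, so the eventual threaded RETURN leaves the count balanced.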

Phased work

Phase 1 — ABI reshape, no behavior change

  • interp/jit.go: rename scratchNext → scratchCtrl; drop the
    scratchFallback/scratchFrameOverflow bit consts (replace with journal trap
    enum). scratchCount stays 5.
  • interp/interp.go: add journal field + New allocation;
    rewrite scratch() to publish &journal[0] and seed jDepth/jCap; rewrite
    entry + segment wrappers to read jTrap/jNextIP/jDepth + records from the
    journal instead of the scratchNext register.
  • interp/jit_arm64.go: exit/exitIP/exitFallback/
    exitOverflow write trap+IP into journal memory (via the scratchCtrl
    pointer) instead of pinning rNext. Keep call's existing single-frame
    semantics for now but source budget from jCap/jDepth in memory.
  • Gate: full go test -race ./interp/... green — pure plumbing swap.

Phase 2 — frame-aware CALL + nested deopt

  • Rewrite arm64JIT.call (interp/jit_arm64.go:473)
    to the native protocol above (push/pop journal records around BL; overflow
    and trap via journal). Extract intent-named helpers
    (pushRecord, popRecord, selfRecord) per docs/coding-patterns.md §1.1 so
    call reads as a short narrative.
  • exitFallback writes a self-record before trapping.
  • Wrappers gain the multi-frame reconstruction loop (above).
  • Targets still limited to fully-lowerable component funcs (so only depth bumps,
    no partial bodies yet) — lets nested deopt be validated in isolation, e.g. an
    i64 heap-promotion fallback inside a nested callee.

Phase 3 — enter partial ip=0 entries

  • Generalize complete/walkFull (interp/jit.go:187,
    interp/jit.go:420): instead of rejecting on ctx.fallback
    or an unlowerable block, emit a frame-aware deopt exit at the first unsupported
    op / non-block-start branch and keep the rest of the component native. Every
    eligible component function still gets a framed, labeled ip=0 entry — partial
    or whole — so call's BL target.label reaches partial callees and they deopt
    safely.
  • call returns false (→ threaded CALL at that site, itself a safe deopt) when
    the target is ineligible or has no compiled entry. Mixed native/threaded call
    graphs become legal because the call site can now deopt.
  • Eligibility (interp/jit.go:278) unchanged: ref
    params/locals/captures still bar a function from native; "partial" means an
    eligible function whose body has a guard/unsupported op (e.g. i64 overflow,
    heap-promoted i64 load).

Phase 4 — tests & benchmarks

  • Extend TestInterpreter_JIT / the JIT subtests in
    interp/interp_test.go (one t.Run per public symbol,
    per docs/coding-patterns.md §6.3 — no new top-level tests for covered
    symbols):
    • nested native callee deopt (i64 heap promotion) resumes at the correct IP
      with correct frame fields;
    • recursive i32 fib + recursive i64 factorial stay native across calls and
      return correct values (3628800 / 6765 anchors already at
      interp/interp_test.go:2363);
    • frame overflow still reports ErrFrameOverflow not host-stack overflow
      (existing WithFrame(2) test interp/interp_test.go:2996);
    • closure / host-function / refcount tests still green (interpreter-owned).
  • Benchmarks: JIT faster than threaded for recursive fib (i32) and factorial
    (i64) once the entry is compiled.

Risks

  • Native↔Go struct/offset coupling is confined to the flat journal []uint64
    (no Go slice headers written from native) — the chosen low-risk shape.
  • journal backing array must not move; consistent with the existing
    &i.stack[0] raw-pointer assumption (Go GC is non-moving here).
  • No GC can fire mid-native (eligible funcs never alloc); frames become GC roots
    only after Go-side reconstruction — safe.
  • Refcount balance hinges on the step-3 retain; covered by the refcount tests.

Verification

  • go test -race ./interp/... ./asm/... green after each phase.
  • go test -run TestInterpreter_JIT -race ./interp/ for the new deopt cases.
  • go test -bench 'Fib|Fact' -benchmem ./interp/ shows JIT < threaded for the
    two recursive kernels.
  • Manual: i.jit(addr) then i.Run on the factorial/fib programs above and
    assert returned values + that prof JIT counters are non-zero.

@siyul-park siyul-park merged commit fd1437f into main Jun 8, 2026
4 of 5 checks passed
@siyul-park siyul-park deleted the feature/redesign_jit branch June 8, 2026 10:42