10 releases (5 breaking)

Uses new Rust 2024

new 0.9.0 May 15, 2026
0.8.0 Apr 21, 2026
0.7.1 Apr 5, 2026
0.7.0 Mar 23, 2026
0.4.0 Mar 6, 2026

WASM-PVM: WebAssembly to PolkaVM Recompiler

WARNING: This project is largely vibe-coded. It was built iteratively with heavy AI assistance (Claude). While it has 670 passing integration tests and produces working PVM bytecode, the internals may contain unconventional patterns, over-engineering in some places, and under-engineering in others. Use at your own risk. Contributions and proper engineering reviews are very welcome!

A Rust compiler that translates WebAssembly (WASM) bytecode into PolkaVM (PVM) bytecode for execution on the JAM (Join-Accumulate Machine) protocol. Write your JAM programs in AssemblyScript (TypeScript-like), hand-written WAT, or any language that compiles to WASM — and run them on PVM.

WASM  ──►  LLVM IR  ──►  PVM bytecode  ──►  JAM program (.jam)
      inkwell    mem2reg       Rust backend

Getting Started

Prerequisites

  • Rust (stable, edition 2024)
  • LLVM 18 — the compiler uses inkwell (LLVM 18 bindings)
    • macOS: brew install llvm@18 then export LLVM_SYS_181_PREFIX=/opt/homebrew/opt/llvm@18
    • Ubuntu: apt install llvm-18-dev
  • Bun (for running integration tests and the JAM runner) — bun.sh

Build

git clone https://github.com/tomusdrw/wasm-pvm.git
cd wasm-pvm
cargo build --release

Hello World: Compile & Run

Create a simple WAT program that adds two numbers:

;; add.wat
(module
  (memory 1)
  (func (export "main") (param $args_ptr i32) (param $args_len i32) (result i64)
    ;; Read two i32 args, add them, write result to memory
    (i32.store (i32.const 0)
      (i32.add
        (i32.load (local.get $args_ptr))
        (i32.load (i32.add (local.get $args_ptr) (i32.const 4)))))
    (i64.const 17179869184)))  ;; packed ptr=0, len=4

Compile it to a JAM blob and run it:

# Compile WAT → JAM
cargo run -p wasm-pvm-cli -- compile add.wat -o add.jam

# Run with two u32 arguments: 5 and 7 (little-endian hex)
npx @fluffylabs/anan-as run add.jam 0500000007000000
# Output: 0c000000  (12 in little-endian)
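
The argument string passed to the runner is the two u32 values written as little-endian bytes and hex-encoded, and the output uses the same encoding. As a rough sketch, the conversion could look like this in Rust; both helper names are hypothetical and are not part of wasm-pvm or anan-as:

```rust
/// Hex-encode a slice of u32 values as concatenated little-endian bytes.
/// (Hypothetical helper for illustration; not part of wasm-pvm.)
fn encode_args_le(args: &[u32]) -> String {
    args.iter()
        .flat_map(|v| v.to_le_bytes())
        .map(|b| format!("{b:02x}"))
        .collect()
}

/// Decode an 8-digit little-endian hex string back into a u32.
/// Returns None if the input is not exactly 8 hex digits.
fn decode_u32_le(hex: &str) -> Option<u32> {
    if hex.len() != 8 || !hex.is_ascii() {
        return None;
    }
    let mut bytes = [0u8; 4];
    for (i, byte) in bytes.iter_mut().enumerate() {
        *byte = u8::from_str_radix(&hex[2 * i..2 * i + 2], 16).ok()?;
    }
    Some(u32::from_le_bytes(bytes))
}
```

With these helpers, encode_args_le(&[5, 7]) yields the 0500000007000000 string used above, and decode_u32_le("0c000000") yields 12.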

Inspect the Output

Upload the resulting .jam file to the PVM Debugger for step-by-step execution, disassembly, register inspection, and gas metering visualization.

AssemblyScript Example

You can also write programs in AssemblyScript:

// fibonacci.ts
export function main(args_ptr: i32, args_len: i32): i64 {
  const buf = heap.alloc(256);
  let n = load<i32>(args_ptr);
  let a: i32 = 0;
  let b: i32 = 1;

  while (n > 0) {
    b = a + b;
    a = b - a;
    n = n - 1;
  }

  store<i32>(buf, a);
  return (buf as i64) | ((4 as i64) << 32);  // packed ptr + len
}

Compile via the AssemblyScript compiler to WASM, then use wasm-pvm-cli to produce a JAM blob. See the tests/fixtures/assembly/ directory for more examples.

How It Works

The compiler pipeline:

  1. Adapter merge (optional) — merges a WAT adapter module into the WASM binary, replacing matching imports with adapter function bodies
  2. WASM → LLVM IR — translates WASM opcodes to LLVM IR using inkwell (LLVM 18 bindings), with PVM-specific intrinsics for memory operations
  3. LLVM optimization passes — mem2reg (SSA promotion), instcombine, simplifycfg, gvn, dce, and optional function inlining
  4. LLVM IR → PVM bytecode — a custom Rust backend reads LLVM IR and emits PVM instructions with per-block register caching (store-load forwarding)
  5. SPI assembly — packages the bytecode into a JAM/SPI program blob with entry headers, jump tables, and data sections

Entry functions use a unified ABI: main(args_ptr: i32, args_len: i32) -> i64, where the return value packs the result pointer in the lower 32 bits and the result length in the upper 32 bits. The compiler unpacks this into PVM's SPI convention (r7 = start address, r8 = end address).
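
The packed return value of the entry ABI (result pointer in the low 32 bits, result length in the high 32 bits) can be sketched in Rust as follows. The function names are illustrative only. The start/end derivation assumes end = start + len and ignores any translation from WASM addresses to PVM addresses that the compiler may apply:

```rust
/// Pack a result pointer and length into the i64 returned by `main`:
/// pointer in the lower 32 bits, length in the upper 32 bits.
fn pack_result(ptr: u32, len: u32) -> i64 {
    ((u64::from(len) << 32) | u64::from(ptr)) as i64
}

/// Split the packed value back into (ptr, len).
fn unpack_result(packed: i64) -> (u32, u32) {
    let bits = packed as u64;
    (bits as u32, (bits >> 32) as u32)
}

/// Derive a (start, end) pair in the spirit of the r7/r8 convention.
/// Assumption: end = start + len, with no address translation applied.
fn to_start_end(packed: i64) -> (u64, u64) {
    let (ptr, len) = unpack_result(packed);
    (u64::from(ptr), u64::from(ptr) + u64::from(len))
}
```

For example, pack_result(0, 4) gives 17179869184, the constant returned by the add.wat example.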

Key Design Decisions

  • Stack-slot approach with register allocation: every SSA value gets a dedicated 8-byte slot at a fixed offset from SP. A linear-scan register allocator then assigns high-use values to the callee-saved registers r9-r12 that are not holding this function's incoming parameters (in non-leaf functions it also reserves the registers from r9 upward that are needed for outgoing call arguments). This eliminates redundant memory traffic across block boundaries and loops
  • Per-block register cache: eliminates redundant loads when a value is reused shortly after being computed (~50% gas reduction)
  • No unsafe code: deny(unsafe_code) enforced at workspace level
  • No floating point: PVM lacks FP support; WASM floats are rejected at compile time (or replaced with runtime traps when --trap-floats is passed)
  • All optimizations are toggleable: --no-llvm-passes, --no-peephole, --no-register-cache, --no-icmp-fusion, --no-shrink-wrap, --no-dead-store-elim, --no-const-prop, --no-inline, --inline-threshold N, --no-cross-block-cache, --no-register-alloc, --no-aggressive-regalloc, --no-scratch-reg-alloc, --no-caller-saved-alloc, --no-lazy-spill, --no-dead-function-elim, --no-fallthrough-jumps, --no-libcall-recognition
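
To make the per-block register cache idea more concrete, here is a toy model in Rust. It is not the crate's implementation: the type, its methods, and the register numbers are all hypothetical. It only tracks which register currently holds the value of each stack slot so that a repeated load can be skipped:

```rust
use std::collections::HashMap;

/// Toy per-block cache: remembers which register currently holds the
/// value of each stack slot. All names here are hypothetical.
#[derive(Default)]
struct RegCache {
    slot_to_reg: HashMap<u32, u8>,
    loads_emitted: usize,
    loads_skipped: usize,
}

impl RegCache {
    /// A value in `reg` was just stored to `slot`: later loads of the
    /// slot can reuse `reg` (store-load forwarding).
    fn on_store(&mut self, slot: u32, reg: u8) {
        self.slot_to_reg.insert(slot, reg);
    }

    /// Request the value of `slot`. Returns the register holding it,
    /// counting an emitted load into `scratch` only on a cache miss.
    fn load(&mut self, slot: u32, scratch: u8) -> u8 {
        if let Some(&reg) = self.slot_to_reg.get(&slot) {
            self.loads_skipped += 1;
            return reg;
        }
        self.clobber(scratch);
        self.loads_emitted += 1;
        self.slot_to_reg.insert(slot, scratch);
        scratch
    }

    /// `reg` was overwritten: forget every slot that was cached in it.
    fn clobber(&mut self, reg: u8) {
        self.slot_to_reg.retain(|_, r| *r != reg);
    }

    /// Block boundary: the cache is per-block, so drop everything.
    fn end_block(&mut self) {
        self.slot_to_reg.clear();
    }
}
```

In this model a store followed by a load of the same slot costs no memory access, and the cache is discarded at every block boundary, which keeps it trivially correct across control-flow joins.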

Benchmark: Optimizations Impact

All PVM-level optimizations enabled (default):

Benchmark                WASM size  JAM size  Code size  Gas Used
add(5,7)                 68 B       160 B     96 B       27
fib(20)                  110 B      221 B     144 B      429
factorial(10)            102 B      200 B     126 B      185
is_prime(25)             162 B      271 B     189 B      61
AS fib(10)               235 B      622 B     496 B      258
AS factorial(7)          234 B      619 B     493 B      225
AS gcd(2017,200)         229 B      627 B     505 B      168
AS decoder               1.5 KB     6.4 KB    4,734 B    913
AS array                 1.4 KB     5.8 KB    4,207 B    782
regalloc two loops(500)  252 B      579 B     454 B      37,574
host-call-log            171 B      458 B     104 B      40
aslan-fib accumulate     -          19.8 KB   12,556 B   10,706
blake2b("abc", 32)       1.1 KB     3.8 KB    2,572 B    16,675
sha512("abc")            1.7 KB     3.5 KB    2,396 B    16,787
u128 mul x1000           296 B      457 B     342 B      71,031
u128 div(fast) x1000     273 B      767 B     608 B      68,031
u128 div(slow) x1000     273 B      774 B     609 B      130,031
anan-as PVM interpreter  53.4 KB    109.8 KB  79,031 B   -

The three u128 rows are microbenchmarks for the libcall_recognition optimization (replaces __multi3 and __udivti3 bodies with hand-crafted PVM-friendly versions; --no-libcall-recognition to disable). Compared against the same workloads with recognition off: u128 mul −37% gas, u128 div fast path (callers with a_hi = b_hi = 0) −41% gas, u128 div slow path (b_hi non-zero) +11% gas — the slow-path regression is the cost of the dispatch check and is dwarfed by the fast-path savings in real workloads (substrate runtimes hit the fast path on the dominant Perbill/Balance: u128 patterns). See docs/src/optimizations.md for details.
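
The fast-path/slow-path split for u128 division can be pictured with the following Rust sketch. It only illustrates the dispatch check. The actual replacement is hand-written PVM code, and how it handles division by zero, or the case where only a_hi is non-zero, is not stated here; both choices below are assumptions:

```rust
/// Unsigned 128-bit division with a fast path for operands that fit
/// in 64 bits (a_hi = b_hi = 0). Returns None on division by zero
/// (an assumption made for this sketch).
fn udiv128_dispatch(a: u128, b: u128) -> Option<u128> {
    if b == 0 {
        return None;
    }
    let (a_hi, b_hi) = ((a >> 64) as u64, (b >> 64) as u64);
    if a_hi == 0 && b_hi == 0 {
        // Fast path: a single 64-bit division is enough.
        Some(u128::from((a as u64) / (b as u64)))
    } else {
        // Slow path: fall back to full-width division.
        Some(a / b)
    }
}
```

The extra comparison at the top is the dispatch cost that shows up as the slow-path regression, while callers whose operands fit in 64 bits skip the full-width routine entirely.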

PVM-in-PVM: programs executed inside the anan-as PVM interpreter (outer gas cost):

Benchmark                    JAM Size  Outer Gas   Direct Gas  Overhead
TRAP (interpreter overhead)  21 B      80,451      -           -
add(5,7)                     160 B     1,164,147   27          43,116x
host-call-log                458 B     1,208,919   40          30,223x
AS fib(10)                   622 B     1,536,038   258         5,954x
JAM-SDK fib(10)*             25.4 KB   8,717,551   -           -
Jambrains fib(10)*           61.1 KB   7,505,155   -           -
JADE fib(10)*                67.3 KB   18,659,363  -           -
aslan-fib accumulate*        19.8 KB   14,033,405  10,706      1,311x
blake2b("abc", 32)           3.8 KB    14,402,928  16,675      863x
sha512("abc")                3.5 KB    14,390,234  16,787      857x

*JAM-SDK fib(10), Jambrains fib(10), JADE fib(10), and aslan-fib accumulate exit on unhandled host calls (ecalli). The gas cost reflects program parsing/loading plus partial execution up to the first unhandled ecalli.
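
The Overhead column appears to be the outer gas divided by the direct gas. That reading is inferred from the numbers rather than stated anywhere, and some published figures may differ by one in the last digit depending on rounding. A minimal Rust sketch of the calculation, with a hypothetical function name:

```rust
/// Overhead factor: outer gas divided by direct gas, rounded to the
/// nearest integer. Returns None when no direct measurement exists.
fn overhead(outer_gas: u64, direct_gas: Option<u64>) -> Option<u64> {
    let direct = direct_gas.filter(|&d| d > 0)?;
    Some((outer_gas as f64 / direct as f64).round() as u64)
}
```

For the AS fib(10) row, overhead(1_536_038, Some(258)) gives 5954, matching the table.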

Memory layout summary

The JAM blob reserves separate ranges for RO data, a guard gap, globals/overflow metadata, and the WASM heap; see the Architecture docs for the full breakdown, including GLOBAL_MEMORY_BASE, PARAM_OVERFLOW_BASE, SPILLED_LOCALS_BASE, and how wasm_memory_base is computed.

The SPI rw_data section is simply a contiguous copy of every byte from GLOBAL_MEMORY_BASE up to the highest initialized heap address, which is why stub AssemblyScript fixtures such as decoder-test/array-test emit ~13 KB of RW data even though only a handful of bytes are non-zero: the encoder must preserve the absolute addresses of the data segments, so the zero stretch between globals and the first heap byte is encoded verbatim. Keeping globals/data near the heap base or introducing sparse RW descriptors (future work) are the only ways to shrink those blobs without redesigning SPI.
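
The size effect described above can be reproduced with a small Rust model. This is only a sketch of the "contiguous copy" rule: the function name is hypothetical, and the addresses in the usage note are made up rather than the real GLOBAL_MEMORY_BASE or heap base:

```rust
/// Build a contiguous RW image covering everything from `base` up to
/// the highest initialized byte. `segments` are (absolute address,
/// bytes) pairs, all assumed to lie at or above `base`.
fn rw_data_image(base: u32, segments: &[(u32, &[u8])]) -> Vec<u8> {
    // One past the highest initialized address (or `base` if empty).
    let end = segments
        .iter()
        .map(|(addr, bytes)| *addr as usize + bytes.len())
        .max()
        .unwrap_or(base as usize);
    // Zero-filled image; gaps between segments stay as literal zeros.
    let mut image = vec![0u8; end.saturating_sub(base as usize)];
    for (addr, bytes) in segments {
        let offset = (*addr - base) as usize;
        image[offset..offset + bytes.len()].copy_from_slice(bytes);
    }
    image
}
```

With 4 bytes of globals at a made-up base of 0x30000 and 3 bytes of heap data at 0x33000, the image is 0x3003 bytes long even though only 7 bytes are non-zero, which is the same shape of blow-up seen in the decoder-test and array-test fixtures.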

Supported WASM Features

Category                Operations
Arithmetic (i32 & i64)  add, sub, mul, div_u/s, rem_u/s, all comparisons, clz, ctz, popcnt, rotl, rotr, bitwise ops
Control flow            block, loop, if/else, br, br_if, br_table, return, unreachable, block results
Memory                  load/store (all widths), memory.size, memory.grow, memory.fill, memory.copy, globals, data sections
Functions               call, call_indirect (with signature validation), recursion, stack overflow detection
Type conversions        wrap, extend_s/u, sign extensions (i32/i64 extend8/16/32_s)
Imports                 Text-based import maps (--imports) and WAT adapter files (--adapter)

Not supported: floating point (by design — PVM has no FP instructions).

CLI Usage

# Compile WAT or WASM to JAM
wasm-pvm compile input.wat -o output.jam
wasm-pvm compile input.wasm -o output.jam

# With import resolution
wasm-pvm compile input.wasm -o output.jam \
  --imports imports.txt \
  --adapter adapter.wat

# Disable specific optimizations
wasm-pvm compile input.wasm -o output.jam --no-inline --no-peephole

# Disable all optimizations
wasm-pvm compile input.wasm -o output.jam \
  --no-llvm-passes --no-peephole --no-register-cache \
  --no-icmp-fusion --no-shrink-wrap --no-dead-store-elim \
  --no-const-prop --no-inline --no-cross-block-cache \
  --no-register-alloc --no-aggressive-regalloc \
  --no-scratch-reg-alloc --no-caller-saved-alloc \
  --no-lazy-spill --no-dead-function-elim \
  --no-fallthrough-jumps --no-libcall-recognition

# Compile past the "float wall" by replacing every f32/f64 op
# with a runtime trap (useful for discovering other unsupported
# features in a module before adding real FP support)
wasm-pvm compile input.wasm -o output.jam --trap-floats

See the Import Handling section for details on resolving WASM imports.

Using as a Library

The wasm-pvm crate can be used as a Rust dependency. It supports two modes:

# Full compiler (default) — requires LLVM 18
wasm-pvm = "0.9.0"

# PVM types only — no LLVM dependency, compiles to wasm32-unknown-unknown
wasm-pvm = { version = "0.9.0", default-features = false }

With default-features = false, only the PVM type definitions are available: Instruction, Opcode, ProgramBlob, SpiProgram, abi::*, memory_layout::*, and Error. This is useful for downstream tools that need to work with PVM bytecode (interpreters, debuggers, analyzers) without requiring the full LLVM compiler toolchain.

Feature       Default  Description
compiler      Yes      Full WASM-to-PVM compiler (inkwell, wasmparser, wasm-encoder)
test-harness  Yes      Test utilities for unit testing (implies compiler)

Project Structure

crates/
  wasm-pvm/              # Core library
    src/
      pvm/               # PVM instruction definitions (always available)
      memory_layout.rs   # PVM memory address constants (always available)
      spi.rs             # JAM/SPI format encoder (always available)
      abi.rs             # Register & frame layout constants (always available)
      llvm_frontend/     # WASM → LLVM IR translation (feature = "compiler")
      llvm_backend/      # LLVM IR → PVM bytecode lowering (feature = "compiler")
      translate/         # Compilation orchestration & SPI assembly (feature = "compiler")
  wasm-pvm-cli/          # Command-line interface
tests/                   # 670 integration tests (TypeScript/Bun)
  fixtures/
    wat/                 # WAT test programs
    assembly/            # AssemblyScript examples
    imports/             # Import maps & adapter files
vendor/
  anan-as/               # PVM interpreter (submodule)

Testing

# Rust unit tests
cargo test

# Lint
cargo clippy -- -D warnings

# Integration tests (builds artifacts, then runs all layers)
cd tests && bun run test

# Quick validation (Layer 1 smoke tests only)
cd tests && bun test layer1/

The test suite is organized into layers:

  • Layer 1: Core/smoke tests (~56 tests) — fast, run during development
  • Layer 2: Feature tests (~169 tests)
  • Layer 3: Regression/edge cases (~445 tests)
  • Layer 4-5: PVM-in-PVM tests — the PVM interpreter itself compiled to PVM, running the test suite inside PVM
  • Differential (~142 tests): cross-checks PVM output against Bun's native WebAssembly engine; run with bun run test:differential

Import Handling

WASM modules that import external functions need those imports resolved before compilation. Two mechanisms are available:

Import Map (--imports)

A text file mapping import names to simple actions:

# my-imports.txt
abort = trap        # emit unreachable (panic)
console.log = nop   # do nothing, return zero
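
As an illustration of the file format shown above, a parser for it might look like the following Rust sketch. It assumes that only the trap and nop actions exist and that # starts a comment anywhere on a line. The real parser in wasm-pvm may accept more actions or different syntax, and the type and function names here are hypothetical:

```rust
use std::collections::HashMap;

/// Actions seen in the example import map (assumed to be the full set).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ImportAction {
    Trap,
    Nop,
}

/// Parse `name = action` lines, ignoring blank lines and `#` comments.
fn parse_import_map(text: &str) -> Result<HashMap<String, ImportAction>, String> {
    let mut map = HashMap::new();
    for (idx, raw) in text.lines().enumerate() {
        // Drop any trailing comment, then surrounding whitespace.
        let line = raw.split('#').next().unwrap_or("").trim();
        if line.is_empty() {
            continue;
        }
        let (name, action) = line
            .split_once('=')
            .ok_or_else(|| format!("line {}: expected `name = action`", idx + 1))?;
        let action = match action.trim() {
            "trap" => ImportAction::Trap,
            "nop" => ImportAction::Nop,
            other => return Err(format!("line {}: unknown action `{other}`", idx + 1)),
        };
        map.insert(name.trim().to_string(), action);
    }
    Ok(map)
}
```

Feeding it the my-imports.txt content above produces two entries: abort mapped to Trap and console.log mapped to Nop.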

Adapter WAT (--adapter)

A WAT module whose exports replace matching imports, enabling arbitrary logic for import resolution (pointer conversion, memory reads, host calls):

(module
  (import "env" "host_call_5" (func $host_call_5 (param i64 i64 i64 i64 i64 i64) (result i64)))
  (import "env" "pvm_ptr" (func $pvm_ptr (param i64) (result i64)))

  (func (export "console.log") (param i32)
    (drop (call $host_call_5
      (i64.const 100)                                    ;; ecalli index
      (i64.const 3)                                      ;; log level
      (i64.const 0) (i64.const 0)                        ;; target ptr/len
      (call $pvm_ptr (i64.extend_i32_u (local.get 0)))   ;; message ptr
      (i64.extend_i32_u (i32.load offset=0
        (i32.sub (local.get 0) (i32.const 4)))))))       ;; message len
)

When both --imports and --adapter are provided, the adapter runs first, then the import map handles remaining unresolved imports. All imports must be resolved or compilation fails.
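
The resolution order can be summarized with a short Rust sketch: adapter exports are consulted first, the import map covers whatever is left, and anything still unresolved is an error. The types and the function name are hypothetical and do not mirror the crate's internals:

```rust
use std::collections::{HashMap, HashSet};

/// How an import ended up being satisfied.
#[derive(Debug, PartialEq, Eq)]
enum Resolution {
    Adapter,
    ImportMap(String),
}

/// Resolve imports: adapter exports take precedence, the import map
/// covers what is left, and any remaining import is an error.
fn resolve_imports(
    imports: &[&str],
    adapter_exports: &HashSet<&str>,
    import_map: &HashMap<&str, &str>,
) -> Result<Vec<(String, Resolution)>, String> {
    imports
        .iter()
        .map(|&name| {
            let resolution = if adapter_exports.contains(name) {
                Resolution::Adapter
            } else if let Some(action) = import_map.get(name) {
                Resolution::ImportMap((*action).to_string())
            } else {
                return Err(format!("unresolved import: {name}"));
            };
            Ok((name.to_string(), resolution))
        })
        .collect()
}
```

If console.log appears both as an adapter export and in the import map, this model picks the adapter, and a single unresolved name makes the whole call fail.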

Resources

  • PVM Debugger — upload .jam files for disassembly, step-by-step execution, and register/gas inspection
  • PVM Decompiler — decompile PVM bytecode back to human-readable form
  • ananas (anan-as) — PVM interpreter written in AssemblyScript, compiled to PVM itself for PVM-in-PVM execution
  • as-lan — example AssemblyScript project compiled from WASM to PVM using this tool
  • JAM Gray Paper — the JAM protocol specification (PVM is defined in Appendix A)
  • AssemblyScript — TypeScript-like language that compiles to WASM
  • Documentation Book — full compiler docs (run mdbook serve docs to browse locally)

License

MIT

Contributing

Contributions are welcome! See AGENTS.md for coding guidelines, project conventions, and a map of the codebase.
