Note: This project is still experimental, and the API is subject to significant changes.
ASTro is an optimization framework based on Abstract Syntax Trees (ASTs). It provides a reusable infrastructure for generating optimized code fragments through partial evaluation of AST interpreters, emitting C source as the output of specialization and delegating native code generation to a mature C compiler (gcc/clang).
The companion tool ASTroGen (lib/astrogen.rb) automatically generates an interpreter (evaluator), a dispatcher, allocators, Merkle-tree hash functions, a partial evaluator (specializer), a dumper, and node-replacement helpers from a node.def file that defines node types and their behaviors in C code.
- Dispatcher / Evaluator separation. Evaluators (
EVAL_xxx) hold the user-written semantics; dispatchers (DISPATCH_xxx) are thin wrappers that unpack node fields and call the evaluator. Partial evaluation specializes only the dispatchers, so evaluator code is reused unchanged and the C compiler is free to inline the resulting call chain. - Merkle-tree hashing of AST nodes. Each node carries a hash derived from its kind and children, enabling content-addressable sharing of compiled code across processes and machines (functions are named
SD_<hash>). - C source as the IR. Specialized code is emitted as ordinary C, compiled with the host toolchain, and loaded via shared objects (
dlopen+dlsym). No custom backend. - Code Store (
runtime/astro_code_store.{h,c}). A small runtime library that manages<store>/SD_<hash>.{c,o}files plus an aggregatedall.so, and swaps a node's dispatcher to the specialized one onastro_cs_load.
The same node.def and the same generated infrastructure support:
- Plain interpreter — pure tree-walking via
DISPATCH→EVAL. - AOT compilation — specialize the whole AST offline, link, and run.
- Profile-guided compilation — collect profile on the first run, specialize before the second.
- JIT compilation — specialize and load
.sofiles at run time. A tiered design (in-process L0 thread, local L1 daemon, remote L2 compile farm) shares compiled code by hash.
See docs/idea.md for the design rationale and docs/usage.md for an ASTroGen tutorial.
lib/astrogen.rb ASTroGen core (Ruby) — generates C from node.def
runtime/ Reusable C runtime (Code Store)
sample/ Sample languages — see § Samples below
docs/ Design notes and papers
ASTro samples span a wide range of language families to exercise the framework against very different value representations, control-flow shapes, and runtime services. All share a uniform layout (node.def, Makefile, optional ASTroGen extension, per-sample docs/). Each sample's own README has the full language scope, build / run, benchmarks, and design notes; docs/samples.md is the cross-sample analysis. The entries below are one-liners with the most distinctive flagship result.
At a glance — 27 samples by paradigm:
| Group | Count | Samples |
|---|---|---|
| Tutorial / calculators | 2 | calc, abc |
| Ruby family | 5 | naruby, baruby, baruby_precise, abruby, koruby |
| Other dynamic scripting | 4 | luastro, jstro, pystro, anlox |
| Functional / OO academic | 5 | ascheme, astocaml, asml, ancaml, asom |
| Statically-typed imperative | 3 | pascalast, castro, anpy |
| Stack-based | 1 | aforth |
| Data processing | 2 | astr, arawk |
| Non-source / DSL | 5 | wastro, astrogre, nuq, arjsv, arcel |
Tutorial / calculators.
calc— toy 6-node calculator REPL (num++/-/*///%), the smallest end-to-end ASTroGen example.abc— POSIX/GNUbc-compatible arbitrary-precision calculator (40 nodes): GMP-mantissa +scalefixed-point, variables / arrays / user functions (dynamic scope) /if-while-for/ibase-obasebases, with LSB-tagged fixnum immediates so scale-0 small integers skip GMP/GC; Boehm GC throughout. Verified by 5,500+ differential tests vs systembc(curated + fixtures + seeded fuzzing); faster thanbcon every benchmark (geomean ~6×) — factorial 42× / sqrt 25× via GMP, integer loops 2–5×.
Ruby family.
naruby— not a Ruby: tiny integer-only Ruby subset (32 nodes). The original ASTro paper's vehicle and the only sample exercising all four execution modes (plain / AOT / PG / JIT) from a single binary.baruby— barely a Ruby: naruby fork extended with Array + String + libgc (52 nodes); LSB-tagged VALUE, parse-time method desugar (no OO machinery). Built as the first testbed for the unified GC framework (docs/gc_design.md) — minimal value representation, ships withbinary_trees/list_alloc/string_concatbenches at ~1 s scale that report wall time + libgc collection counts.baruby_precise— precise GC testbed forked frombaruby: same language surface, but 14 hand-written GC backends (none/mark/mark_gen/mark_gen_inc/copy/copy_gen/mark_compact/mark_compact_gen/bump/mark_bump_gen/immix/immix_gen/mark_bitmap_gen/mark_card_gen) selectable bymake GC=<name>. Precise rooting via sharedsp[]spill (Lua/Rust-style; seedocs/gc_design.md). Ships a 14-bench × 14-backend perf matrix and per-collector wall time / pause / collection-count breakdown.abruby— a bit Ruby: larger Ruby subset as a CRuby C extension (107 nodes), reusing CRuby'sVALUE/ Prism / GC. PGC-baked optcarrot 86.5 fps vs CRuby (no-JIT) 45.6 fps; integer-loop microbenches 4–8×.koruby— kind of Ruby: standalone (non-CRuby) Ruby with Boehm GC + GMP + Prism (119 nodes). Runs optcarrot end-to-end at ~100 fps (gcc-15 -O3, vs CRuby 42 fps ≈ 2.4×; YJIT 175 fps).
Other dynamic scripting.
luastro— Lua 5.4 (74 nodes) with tagged 8-byteLuaValue. Full pattern matcher, weak tables,__gcfinalizers, ucontext coroutines, self-written mark-sweep GC.jstro— JavaScript (broad ES2023 subset, 101 nodes) with V8-style hidden-class objects + shape-transition ICs. Profile-driven kind swap, longjmpthrow, mark-sweep GC; beats node v18 by 2–50× on several benches.pystro— Python 3 subset (95 nodes): GMP bignum + classes + dunder +try/except+ generators + comprehensions + f-strings + ~37 builtins. Beats CPython 3.12 on 13 / 14 benches (4 macro + 9/10 micro) at ~1 s scale (while_loop17×); also beats CPython 3.14+JIT on 12/14.
Functional / academic.
ascheme— R5RS Scheme (54 nodes) with the full numeric tower (fixnum / bignum / rational / flonum / complex via GMP),call/cc, multiple values, ports. Passes 179/179 of chibi'sr5rs-tests.scm; vs chibi 18/18 wins, vs guile 17/18 (up to 27×).astocaml— OCaml subset (91 nodes) with HM-lite type inference, ADTs, exceptions, single-inheritance classes, real functor instantiation, TCO. 35/35 tests; with-c, fib / ack / tak beatocamlcbytecode andocamltoplevel.asml— Standard ML subset (85 nodes) with full Hindley–Milner type inference (Algorithm W + level-based generalisation + value restriction), ADTs with type parameters, records ({x=1, y=2}/#field/ record patterns), pattern match, exceptions, refs. Type errors halt at compile time; the type-checker drives lower-time dispatcher specialisation, and generic dynamic-typecheck nodes are removed fromnode.defentirely — only_int/_real/_string/_poly/_bool/_uncheckedvariants remain. AOT pipeline includesis_leaf-driven C-stack alloca,mark_tail_callspost-pass, and astocaml-style aggressive cflags (-fno-stack-clash-protection -flto). vs SML/NJ v110.79: AOT 1.8× of SML/NJ onfiband 2.7× faster onstrcat; 30× gap on multi-arg curried recursion (next target:app2/app3lower-time folding).ancaml— MinCaml (the monomorphic ML subset from Tohoku University's compiler course, 38 nodes): a from-scratch Hindley–Milner inferer (destructive unification, defaults free vars toint), de Bruijn frame variables resolved at parse time, boxed-float / closure / tuple / array values, and tail-call elimination (trampoline) + leaf-framealloca. Differential-tested vs the systemocaml(a two-line prelude reconcilesprint_char's arg type and the removedArray.create); AOTfibbeats OCaml bytecode (only nativeocamloptis ahead). Ships a self-contained reimplementation reference (docs/mincaml_impl_spec.md) and an implementation-agnostic harness (ANCAML=/path ruby test/run_tests.rb). Part of the An* series (ASTro Nutshell — bite-sized pedagogical "subject" languages; cf.anpy).asom— SOM Smalltalk dialect (80 nodes) with type-specialized sends, control-flow inlining, Boehm GC + GMP. Passes the full SOM TestSuite (221/221); PG mode beats SOM++ on all 12 AreWeFastYet benches.
Statically-typed imperative.
pascalast— Pascal subset (159 nodes), ISO 7185 + Free Pascal–style OO: variant records, sets,with,array of T,virtual/override,try/except, properties. Wins outright on tight constant-folding loops vsfpc -O3(collatz 0.4× / leibniz 0.6× / mandelbrot 0.8×).castro— C subset (101 nodes) with tree-sitter-c front-end, 8-byte slot VALUE, structs / function pointers /printf,gcc -Epreprocessing. AOT beatsgcc -O0on tight loops;crc32tiesgcc -O1.anpy— ChocoPy (statically-typed Python 3.6 subset, 41 nodes): indentation lexer, a static type checker (conformance / assignment-compat / join + §5.2 rules), single-inheritance classes with dynamic dispatch, nested functions withglobal/nonlocalclosures, lists/strings; tagged-immediate ints + Boehm GC. Verified by differential testing vspython3(valid ChocoPy == valid Python 3.6) plus runtime-error and type-error-rejection cases. Beats CPython 3.12 on loop-heavy code (loop/collatz/list 0.4–0.5×), trails on call-heavy code (per-call env-frame cost); geomean 1.26×. Ships a self-contained implementation reference (docs/chocopy_impl_spec.md) and an implementation-agnostic harness (ANPY=/path/to/impl ruby test/run_tests.rb) so a from-scratch ChocoPy in any language can be checked against the same suite. First of the An* series (ASTro Nutshell — bite-sized pedagogical "subject" languages; cf.ancaml,anlox).anlox— Lox (Robert Nystrom's Crafting Interpreters teaching language, 38 nodes): a dynamic language with closures and single-inheritance classes (this/super/dynamic dispatch). Demonstrates a parse-time resolver assigning(depth, slot)to locals (globals stay by-name), bound methods via athis-frame, and the canonical self-contained// expect:test format (no system Lox exists; setANLOX_REFfor a reference diff).--buildis unsupported (side-table indirection); interpreter +--aot-compilework. Part of the An* series.
Stack-based.
aforth— Forth subset (68 nodes) where every word is an AST NODE (no traditional threaded code); calls indirect through aword_id→NODE *table. Wins 8/9 against gforth 0.7.3 (gcd 13.9×, factorial 8.1×, collatz 7.2×; ack 0.95× the only loss).
Data processing.
astr— R subset (46 nodes) with tagged 64-bitVALUE, libgc, vector/scalar broadcast,c()/paste/substr/1:nranges. AOT 4.1× on fib(36), 3.8× on ack(3,9).arawk— POSIX awk subset (115 nodes, regex excluded) — BEGIN/END + pattern-action rules,$N/NR/NF, associative arrays,for-in, user-defined functions, fullprintf/sprintf, sixgetlineforms, pipe + redirect I/O, UTF-8 codepoint mode (gawk-styleLC_CTYPEdetect;--byte/--posixfor mawk-compatible byte mode). AOT geomean 0.93× of gawk on the bench set (beats goawk 1.07×, behind mawk 1.93×). Phase 2 plans an AST-interpreter merge withastrogreso awk + regex run under a single ASTroGen instance.
Non-source / DSL.
wastro— WebAssembly 1.0+ (MVP) interpreter (212 nodes). Reads both.watand.wasm, runs the wasm spec-test.wastharness; AOT-cached within 3–6× ofgcc -O2on tight loops.astrogre— Ruby/Onigmo-compatible regex engine (53 nodes — the matcher itself is an AST) with agrep-style CLI (are); switchable backend (astrogre / Onigmo). AOT fuses the start-position scan loop and the regex chain into one SD (8/8 wins vs Onigmo, 3–15×).nuq—jq1.7-compatible clone (209 nodes) with full pipeline /try-catch/reduce/foreach/ module / call-by-name / 70+ builtins. Passes 524/526 of jq 1.7's official test suite; real-world 11/11 wins 2.6–5.0× vs jq (microupto50×,reverse16×).arjsv— JSON Schema validator (drafts 04 / 06 / 07 / 2020-12, 47 nodes), CRuby C extension,json_schemer-compatible API, per-schema SD with alloc-freevalid?. Test Suite (excl. IDNA / HTTP$ref/ dynamic$dynamicRef): 04 98.8% / 06 98.9% / 07 95.5% / 2020-12 94.6%. 4–11× vsrj_schema(Rust+RapidJSON FFI), 25–180× vsjson_schemer(pure Ruby).arcel— CEL (Common Expression Language) evaluator (57 nodes) — drop-in replacement forcel-go/cel-cppaimed at K8s admission / Envoy authz / IAM Conditions hot paths. Passes 808/808 of Google'scel-spec/tests/simple/testdata(vs cel-go 89.7% on the same harness). AOT vs cel-cpp on 11 bench cases: geomean 14×, realistic K8s ValidatingAdmissionPolicy 52 ns/op (22.4×),bool_ladderconstant-folds to 5 ns/op (42.6×).