Skip to content

snobol4ever/SCRIP

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1,062 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

SCRIP

License: AGPL v3

A multi-language compiler collection — SNOBOL4, Icon, Prolog, Snocone, Rebus — targeting x86-64 native ASM, JVM bytecode, .NET MSIL, WebAssembly, and portable C — all from a single IR. Part of the snobol4ever organization.


What This Is

SCRIP (the scrip-cc compiler) is a from-scratch SNOBOL4 compiler: one frontend pipeline (scrip-cc) feeding five independent backend emitters. Write SNOBOL4 once. Run it anywhere.

Flag Output Status
(default) Portable C with labeled gotos ✅ 106/106 corpus
-asm x86-64 NASM assembly ✅ 97/106 corpus · 9 known failures
-jvm JVM Jasmin bytecode (.j) ✅ 106/106 corpus · beauty.sno
-net .NET CIL assembly (.il) ✅ 110/110 corpus · roman + wordcount ✅
-wasm WebAssembly text format (.wat) 🚧 active — SW-2, hello/literals/arith
-js Node.js module (.js) 🚧 SJ-6 — 14/17 feat · 1286/0 emit

The 9 ASM failures (tests 022, 055, 064, cross, word1–4, wordcount) are under active investigation via the five-way differential monitor.

Sister repos: snobol4jvm (full Clojure→JVM pipeline, 2,033 tests) and snobol4dotnet (full C#→MSIL pipeline, 1,874 tests).


The Architecture — Byrd Boxes All the Way Down

Every SNOBOL4 statement has the same shape:

label:   subject   pattern   = replacement   :S(goto)  :F(goto)

Each pattern node compiles to a Byrd box — four labeled entry points wired at compile time, zero runtime dispatch:

Port Greek Meaning
proceed α Enter fresh — cursor at current position
recede β Resume after backtrack from a child
succeed γ Match succeeded — advance cursor, pass forward
concede ω Match failed — restore cursor, propagate back

Sequential composition wires γ of one node to α of the next. Alternation saves the cursor on the left-ω path and restores it before trying right. ARBNO wires child-γ back into its own α until child-ω exits. The wiring is the execution — no interpreter table, no virtual dispatch on the hot path.

This model, first described by Lawrence Byrd in 1980 for Prolog debugging and generalized by Todd Proebsting in 1996 as a syntax-directed code generation strategy for goal-directed languages, turns out to describe SNOBOL4 pattern matching exactly. All five backends implement the same four-port wiring. The semantics are identical whether the output is C labeled gotos, x86-64 JMP instructions, JVM goto bytecodes, CIL br instructions, or WebAssembly return_call tail calls.

Hot path: pure labeled gotos. Zero overhead. No setjmp on the hot path. Cold path: longjmp for ABORT, bare FENCE, and genuine runtime errors only.


Five Frontends

Five frontends share the same IR (AST_t / STMT_t):

Frontend Source language Status
SNOBOL4 / SPITBOL Full SNOBOL4 + SPITBOL extensions ✅ active — all backends
Snocone Andrew Koenig's structured C-like frontend (Bell Labs TR 124, 1986) ✅ active — ASM backend (-sc -asm)
Rebus Structured transpiler — Rebus source → SNOBOL4 ✅ complete — M-REBUS ✅
Icon Icon — generators, suspend/resume, string scanning ✅ active — ASM + JVM backends (-icn)
Prolog Prolog — unification, backtrack, Byrd Box wiring ✅ active — ASM + JVM backends (-pl)

The Byrd Box IR is the bridge between languages. Icon generators map to the same four ports. Prolog unification is goal-directed evaluation — the same model. SNOBOL4, Icon, and Prolog are three syntaxes for one execution machine.


Build

# Dependencies
apt-get install -y libgc-dev nasm default-jdk

# Build scrip-cc
make -C src

# C backend (default)
./scrip-cc program.sno > prog.c && gcc prog.c -lgc -o prog && ./prog

# ASM backend
./scrip-cc -asm program.sno > prog.s
nasm -f elf64 prog.s -o prog.o && gcc prog.o -lgc -o prog && ./prog

# JVM backend
./scrip-cc -jvm program.sno > prog.j
java -jar src/backend/jasmin.jar prog.j -d .
java -cp . Prog

# NET backend
./scrip-cc -net program.sno > prog.il
ilasm prog.il && mono prog.exe

# WASM backend
./scrip-cc -wasm -o prog.wat program.sno
wat2wasm --enable-tail-call prog.wat -o prog.wasm
node test/wasm/run_wasm.js prog.wasm

Corpus Ladder

All backends climb the same 12-rung ladder against corpus/crosscheck/:

Rung  1: hello / output    Rung  5: control       Rung  9: keywords
Rung  2: assign            Rung  6: patterns       Rung 10: functions
Rung  3: concat            Rung  7: capture        Rung 11: data
Rung  4: arith             Rung  8: strings        Rung 12: beauty.sno
Backend Corpus Rung 12 Notes
C (portable) ✅ 106/106 Full corpus
x86-64 ASM ⚠ 97/106 9 known failures; monitor investigation active
JVM bytecode ✅ 106/106 beauty.sno self-beautifies — M-JVM-BEAUTY ✅
.NET MSIL ✅ 110/110 roman + wordcount pass — M-NET-SAMPLES ✅

Oracle: CSNOBOL4 2.3.3 — snobol4 -f -P256k -I$INC file.sno


Validate

# C backend
bash test/crosscheck/run_crosscheck.sh

# ASM backend (STOP_ON_FAIL=0 shows all results)
STOP_ON_FAIL=0 bash test/crosscheck/run_crosscheck_asm_corpus.sh

# JVM backend — full corpus
JASMIN=src/backend/jasmin.jar
bash test/crosscheck/run_crosscheck_jvm.sh

# JVM backend — manual per-rung (e.g. patterns rung)
JASMIN=src/backend/jasmin.jar
PDIR=../corpus/crosscheck/patterns
for sno in $PDIR/*.sno; do
  base=$(basename $sno .sno); TMPD=$(mktemp -d)
  ./scrip-cc -jvm "$sno" > $TMPD/p.j 2>/dev/null
  java -jar $JASMIN $TMPD/p.j -d $TMPD/ 2>/dev/null
  cls=$(ls $TMPD/*.class 2>/dev/null | head -1 | xargs basename 2>/dev/null | sed 's/.class//')
  got=$(java -cp $TMPD $cls 2>/dev/null); exp=$(cat "${sno%.sno}.ref" 2>/dev/null)
  rm -rf $TMPD
  [ "$got" = "$exp" ] && echo "PASS $base" || echo "FAIL $base"
done

# NET backend
bash test/crosscheck/run_crosscheck_net.sh

Correctness — Chomsky Hierarchy Oracles

SNOBOL4 patterns are not a regex engine. They are a universal grammar machine. The corpus includes mathematical oracles at every tier of the Chomsky hierarchy:

Tier Oracle language All backends
Type 3 — Regular (a|b)*abb, a*b*, {x^2n}
Type 2 — Context-free {a^n b^n}, palindromes, Dyck language
Type 1 — Context-sensitive {a^n b^n c^n}
Type 0 — Turing {w#w} copy language

These are proven results, not empirical approximations. A backend either computes the correct answer or it does not.


Repository Layout

src/
  frontend/
    snobol4/          SNOBOL4/SPITBOL lexer + parser → AST + IR
    snocone/          Snocone frontend (SC language, ~10 source files)
    rebus/            Rebus transpiler
    icon/             Icon frontend — ASM + JVM
    prolog/           Prolog frontend — ASM + JVM
  backend/
    c/                Portable C emitter (emit_byrd.c 2,709 lines · emit.c 2,220 lines)
    x64/              x86-64 NASM emitter (emit_byrd_asm.c 4,159 lines)
    jvm/              JVM Jasmin emitter (emit_byrd_jvm.c 4,051 lines · jasmin.jar)
    net/              .NET CIL emitter (emit_byrd_net.c 1,934 lines)
  driver/
    main.c            scrip-cc entry point — flag dispatch
  runtime/
    asm/              NASM macro library + runtime helpers
test/
  crosscheck/         106-program corpus + .ref oracle outputs
  sprint_asm/         ASM regression suite
  jvm_j3/             JVM sprint J3 smoke tests
  rebus/              Rebus round-trip tests
  smoke/              Quick sanity tests
artifacts/
  asm/
    beauty_prog.s     beauty.sno → x86-64 ASM (tracked canonical output)
    samples/
      roman.s         roman.sno → x86-64 ASM
      wordcount.s     wordcount.sno → x86-64 ASM
  jvm/                hello_prog.j · roman.j · wordcount.j
  net/                hello_prog.il
  c/                  Canonical C outputs

The Five-Way Monitor

Active on the asm-backend branch: a parallel differential monitor that runs the same SNOBOL4 program through all five participants simultaneously and compares trace streams event-by-event via named FIFOs.

# Participant Role
1 CSNOBOL4 2.3.3 Primary oracle
2 SPITBOL x64 4.0f Secondary oracle
3 SCRIP ASM backend Compiled target
4 SCRIP JVM backend Compiled target
5 SCRIP NET backend Compiled target

monitor_ipc.so — a LOAD'd C shared library — writes trace events to a per-participant named FIFO, bypassing stdio entirely. The collector reads all five FIFOs in parallel. The first line where any participant diverges from the oracle is the exact statement, variable, and value where the bug fires. No bisecting. No guessing.

Status (2026-03-21): CSNOBOL4 ✅ · SPITBOL ✅ · ASM ✅ working in isolation. JVM OUTPUT fast-path hook and NET emitter hook in progress — M-MONITOR-IPC-5WAY next.



JavaScript Backend (In Progress — SJ-6)

-js produces a Node.js module runnable with node prog.js.

# JS backend
./scrip-cc -js program.sno -o prog.js
SNO_RUNTIME=src/runtime/js/sno_runtime.js node prog.js

Status: SJ-6 · feat suite 14/17 PASS · emit-diff 1286/0

Feature Status
Arithmetic, strings, control flow
Pattern matching (LIT/ANY/SPAN/BREAK/ARB/ARBNO/BAL/…)
Immediate capture ($) / conditional capture (.)
Hello suite (hello, literals, INTEGER, UCASE, REMDR) ✅ 4/4
User-defined functions / DEFINE 🔧 SJ-7
INPUT line buffering 🔧 SJ-7
run_invariants.sh wiring 🔧 SJ-7

Pattern Engine — sno_engine.js

The JS pattern runtime (src/runtime/js/sno_engine.js, 532 lines) is an iterative frame-based engine modelled after the Clojure implementation in snobol4jvm. Frame state uses Greek variable names matching the Clojure source:

Frame ζ = [Σ, Δ, σ, δ, Π, φ, Ψ]
  Σ/Δ — subject string + cursor on entry
  σ/δ — subject string + current cursor
  Π   — current pattern node
  φ   — child index (ALT/SEQ) or retry state
  Ψ   — parent frame stack
Ω     — backtrack stack
α     — current action signal (:proceed/:succeed/:fail/:recede)
λ     — current node type tag

Frames are immutable plain JS arrays — transitions create new arrays, old ones are GC'd. No memcpy, no snapshot/restore, no arena. The GC is the stack allocator.

Benchmark: SCRIP vs spipatjs

Head-to-head against Phil Budne's spipatjs (3,090 lines, GNAT PE node-graph model) — same Node.js v22 process, same JIT warmup, 20,000 iterations each. SCRIP wins all 8 benchmarks.

ID Pattern SCRIP spipatjs ratio
B01 Literal match 207,510 6,354 32.7×
B02 BREAK+SPAN word scan 23,578 6,072 3.9×
B03 ARB backtrack depth 12 28,602 6,418 4.5×
B04 ARBNO multi-rep 232,160 6,875 33.8×
B05 BAL balanced parens 179,353 6,457 27.8×
B06 Wide ALT (4 alternatives) 9,196 6,379 1.4×
B07 Deep SEQ (10 literals) 163,845 6,268 26.1×
B08 CAPT_IMM capture 415,434 6,406 64.9×

ops/sec · Node.js v22.22.0 · see test/js/bench_engine.js

spipatjs's throughput is nearly flat (~6,000–6,900 ops/sec) regardless of pattern complexity — Object.freeze() on every match result dominates. SCRIP's immutable-frame design avoids all post-match allocation.

The Bootstrap Goal

The correctness target is self-hosting. Two gates:

M-BEAUTIFY-BOOTSTRAPbeauty.sno (the SNOBOL4 beautifier written in SNOBOL4) reads itself and produces output identical to its input on all backends. A fixed point.

M-COMPILER-BOOTSTRAPcompiler.sno (the full compiler written in SNOBOL4) compiles itself.

The JVM backend has already passed Rung 12: beauty.sno via the JVM backend produces output byte-for-byte identical to the CSNOBOL4 oracle (M-JVM-BEAUTY ✅, commit b67d0b1 J-212). The other backends follow.


The Development Story

SCRIP is co-authored by Lon Jones Cherryholmes and Claude Sonnet 4.6.

The sessions run like a buddy comedy: Lon arrives with an architectural insight or an inconvenient bug, Claude writes the code, they argue about the right abstraction, one of them is wrong, they figure out which one, the milestone fires, and Claude writes the commit. Then they do it again, starting fresh with no memory of the previous session except whatever made it into the docs.

The architecture has a name for that: the session log. Every session's mental state at handoff is recorded in SESSIONS_ARCHIVE.md so the next Claude can pick up exactly where the last one left off. It is, in a way, the compiler writing itself — one session at a time.


Active Development

Sprint state lives in snobol4ever/.github:

  • PLAN.md — milestone dashboard, sprint state, session handoffs
  • ARCH-monitor.md — five-way monitor design and sprint detail
  • SESSIONS_ARCHIVE.md — full session history, append-only

Current sprint: G-10 · SJ-6 (SNOBOL4×JS) — engine complete, bench done, DEFINE/RETURN next.


Collaborators

  • Lon Jones Cherryholmes — compiler architecture, all backends, SCRIP lead
  • Jeffrey Cooper, M.D. — snobol4dotnet, .NET MSIL target
  • Claude Sonnet 4.6 — scrip-cc co-author; every sprint, every Byrd box, every labeled goto — written in session, committed, pushed

Source Volume (G-VOLUME · M-VOL-X ✅ · 2026-03-22)

wc -l scan of src/. Generated artifacts (.s files, 36,890 lines across 28 files) excluded. Categories are logical function — comparable across SCRIP, snobol4jvm, snobol4dotnet. % of total = % of src/ lines only.

Category Files Lines Blank-stripped % total
Parser / lexer 20 6,368 5,728 20.5%
Code emitter 11 17,291 15,936 55.6%
Pattern engine 10 1,588 1,421 5.1%
Runtime / builtins 7 4,614 4,120 14.8%
Driver / CLI 1 140 128 0.5%
Extensions / plugins 3 1,085 969 3.5%
Tests 47 6,265 5,495
Benchmarks 12 1,603 1,541
Docs / Markdown 2 1,080 814
Total (src) 54 31,090 28,306 100%

IR EKind — SNOBOL4 Operator Name Reference

Four-column reference: SIL/CSNOBOL4 proc name · MINIMAL/SPITBOL o$ entry · functional name · current IR node. Source authority: snobol4-2.3.3/v311.sil (CSNOBOL4) and spitbol-docs/v37.min (SPITBOL v3.7).

Unary operators

Syntax SIL / CSNOBOL4 MINIMAL / SPITBOL Functional name IR node
+X PLS o$aff — affirmation numeric coerce / affirmation AST_PLSAST_PLS (unary plus; see note)
-X MNS o$com — complementation arithmetic negation AST_MNS
\X NEG o$nta/b/c — negation logical negation (not) AST_NOT
?X QUES o$int — interrogation interrogation AST_INTERROGATE
@X ATOP o$cas — cursor assignment cursor position capture AST_CAPT_CURSOR
$X (c$ind, inline) o$inv — indirection indirection AST_INDIRECT
&X (c$key, inline) o$kwv — keyword reference keyword reference AST_KEYWORD
*X (c$def, inline) (c$def, no o$ entry) deferred expression AST_DEFER
.X (unary, via NAM) o$nam — name reference name reference (unary) AST_NAME

Note on AST_PLS vs AST_PLS: SIL PLS and MINIMAL o$aff are the same operation. The IR currently has both AST_PLS and AST_PLS with identical semantics — one must be removed. Decision: AST_PLS is the canonical name (matches SIL); AST_PLS is deprecated.

Binary operators

Syntax SIL / CSNOBOL4 MINIMAL / SPITBOL Functional name IR node
X + Y ADD o$add — addition addition AST_ADD
X - Y SUB o$sub — subtraction subtraction AST_SUB
X * Y MPY o$mlt — multiplication multiplication AST_MUL
X / Y DIV o$dvd — division division AST_DIV
X ! Y EXPOP o$exp — exponentiation exponentiation AST_POW
X Y (blank, value ctx) CONCAT o$cnc — concatenation string concatenation AST_CAT
X Y (blank, pattern ctx) (BINCON path, CONCL) (c$cnc type, Byrd wiring) goal-directed pattern sequence AST_SEQ
X | Y OR / ORPP o$alt — alternation pattern alternation AST_ALT
X ? Y SCAN o$pmv/pmn/pms — pattern match pattern match / scan AST_SCAN
X = Y ASGN o$ass — assignment assignment AST_ASSIGN
X . Y NAM o$pas — pattern assignment conditional capture (on match) AST_CAPT_COND_ASGN
X $ Y DOL o$ima — immediate assignment immediate capture AST_CAPT_IMMED_ASGN

About

SCRIP — fresh start from one4all working tree (refs/ excluded). Single initial commit, zero inherited history.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors