badc is a rather small cross-platform optimizing compiler (also a compiler-as-library)
for the C language.
It appeared out of the need to quickly tweak how and what a C compiler emits. Then it became captivating to turn it into a nimble, practical tool for everyday use rather than a niche hack. Modern approaches to coding would make building a compiler easier than it had been before, I thought :)
Now badc implements a very large portion of the C99 and C11 standards, some
popular idioms from later standards, and a few extensions. All of that is
enough to build and test Python 3.14 on all of the five supported targets (and
there are more demos included, read on!).
badc's small footprint and embedded headers (which you can override or --install
to some path for tweaking or inspecting) give a one-executable experience of the
portable tools. The compiler's codebase of moderate size can be used as a small
self-sufficient toolchain or can be used as a library giving your project the
ability to build C code or just run it (the default when using as a library).
A fun extension is that badc can automatically add the header(s)
for the standard library so the bare hello.c with

    int main() {
        puts("Hello");
        return 0;
    }

works:

    info: auto-including <stdio.h> for undeclared `puts`
    info: wrote file hello for target `macos-aarch64`

badc is able to produce the debug information so that the binaries it generates
can be debugged and/or their performance can be profiled (use -g).
badc optimizes when you specify -O and can produce code that's faster
than clang -O0, especially on ARM64. To get an idea of the codegen
quality, take a look at ./tests/snapshots with assembly and
SSA snapshots of the test fixtures. The optimized binaries will run on any modern
ARM64 processor, and on x86_64 processors not older than Intel Haswell and AMD Zen
(circa 2013, the optimizer uses FMA3 instructions).
badc emits position-independent code and the real native binaries (macOS Mach-O,
Linux ELF, or Windows PE32+), on any of five targets, from any host:
- macOS (ARM64),
- Linux (ARM64, x86_64),
- Windows ({ARM64, x86_64} x {console, GUI, NT, driver}).
It also supports separate translation units (always translated to ELF) and has a small
linker (so no relaxations or LTO). badc tries hard not to get in the way with assumptions
about the runtime library, and --freestanding is available should you need that. EFI
is supported as well.
badc can also JIT-compile into the machine code in-process so no binary is written
to the disk. Finally, it recognizes being used as #! so that C source code becomes
a (fast) script.
There are various demos under demos:
- A few small-ish ones (threads.c, coro_pool.c, hello_server.c),
- maze.c - maze builder and solver,
- gui_hello - GUI demos for macOS, Linux and Windows,
- wdm_driver, nt_hello, nt_loader - examples of the Windows native (NT) executable, Windows driver,
- efi_hello - a UEFI binary,
- sqlite3 - the most famous embedded database,
- miniz - compression, CRC32, integers, bit twiddling,
- kissfft - floating points, Fast Fourier Transform,
- bzip2 - compression, integers, bit twiddling,
- stb - header-only C library with lots of incredible features (math noise generation, sound, JPEG, PNG, BMP, PSD support to name a few). It really stresses all of the compiler.
- chibicc - a small C compiler
- tinycc - a cool and small C toolchain
- TweetNaCl, Monocypher, BearSSL - cryptography
- Lua - the embeddable scripting language
- quickjs - JavaScript interpreter
- TCL - Tool command language
- Python - Python 3.14
Besides these, there are some fun test fixtures implementing Horner scheme, RK4, 8-Queens and more.
Finally, there's an option to run the IR (intermediate representation) with tracking pointer access and bounds to catch memory issues.
badc used to be bad when the project just started out, and the name stuck.

There is some compiler-building jargon in this document here and there. You can safely skip it and jump to the usage section right away.
For the true compiler heads there is the
--dump-ssa option, which prints each function's SSA IR plus the register allocator's per-value placement to stderr before lowering.
It started out as a Rust port of c4, Robert Swierczek's teeny-tiny C compiler in 4 functions,
and grew from there. By now it has diverged enough
from the original to call the dialect c5. Thanks to that facetious naming, the source tree
spells it out as the c5 module and the C5Error type.
The venerable 4-function c4.c compiler ships as a test fixture and self-hosts:

    badc -O -o c4 tests/fixtures/c/c4.c   # compile c4 to a native binary
    ./c4 hello.c                          # which then runs hello.c

And you can really crank the fun up with something like

    badc -O --jit tests/fixtures/c/c4.c tests/fixtures/c/c4.c tests/fixtures/c/c4.c tests/fixtures/c/c4.c

to run it quadro-nested :)
During the development, the badc compiler was "spiraling" out from the stack
IR execution and evolving frontend to the 3-operand IR and SSA IR and the optimizing
backend.
It lowers through an SSA intermediate representation and a graph-coloring register allocator, but doesn't go for the exquisite optimization passes a titan toolchain like clang, gcc or msvc run. All told, to stay slim, it's unlikely to surpass the ability of multi-gigabyte compiler suites to squeeze the last drop of perf from the machine, and that's fine.
You can download one of the binary release packages matching your
hardware and the OS. There is one small binary inside, and that's
all you should need to start using badc.
If you have Rust installed, clone the repo, and install it with

    cargo install --path . --features full

or just

    cargo install badc --features full

if you're not interested in building from the source code.
The --features full is required for the command-line compiler: the
crate's default feature set is the host-architecture JIT library alone
(so cargo add badc pulls in a slim dependency), and the badc binary
additionally needs the native object writers and the cross-translation-unit
linker, which the full feature enables.
Now badc is available on the PATH.
A first run:

    badc --jit hello.c   # runs native code in-process
    Hello 123

or

    badc -O hello.c      # Produces native optimized binary
    ./hello              # produced by the previous line
    Hello 123

Here's a quick debugging session:

    badc -g hello.c      # Build with the debug information
    info: wrote file hello for target macos-aarch64

Now run under the debugger (lldb, gdb, rr), set breakpoints, check out the local variables:
lldb ./hello
(lldb) target create "./hello"
Current executable set to '/Users/krom/src/compilers/badc/hello' (arm64).
(lldb) b main
Breakpoint 1: where = hello`main + 16 at hello.c:5, address = 0x00000001000006fc
(lldb) l
note: No source available
(lldb) run
Process 19800 launched: '/Users/krom/src/compilers/badc/hello' (arm64)
Process 19800 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
frame #0: 0x00000001000006fc hello`main at hello.c:5
2 #include <stdlib.h>
3
4 int main() {
-> 5 int a = 123;
6 printf("Hello %d\n", a);
7 return 0;
8 }
Target 0: (hello) stopped.
(lldb) n
Process 19800 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = step over
frame #0: 0x0000000100000704 hello`main at hello.c:6
3
4 int main() {
5 int a = 123;
-> 6 printf("Hello %d\n", a);
7 return 0;
8 }
Target 0: (hello) stopped.
(lldb) v
(int) a = 123

The first non-flag argument is the source file. By default badc
lowers it to a native binary at the obvious path next to the
source (hello.c -> hello on POSIX targets, hello.exe on
Windows targets); pass -o <path> to choose a different one.
The three execution modes:
| flag | what it does |
|---|---|
| (default) | Lower to a native Mach-O / ELF / PE32+ at -o <path> and exit. |
| --jit | Lower in-process, mmap the result, call main directly. |
| --interp | Run the SSA IR under a watchful VM (pointer tracking, traces). |
Flags (--target=<spec>, --optimize / -O, --dump-ssa,
--list-symbols, -H / --show-includes, plus the VM-only
--track-pointers / --trace) can appear anywhere before the
source. -D NAME[=VALUE], -U NAME, -I path, and -include FILE work the same way they do on gcc / clang. Source-driven
build flags ride on #pragmas -- see "Headers and bindings"
below.
A .c file may start with a shebang. With badc on PATH,
chmod +x script.c makes the file directly executable -- in
which case the shebang line picks the mode (#!/usr/bin/env badc --interp for the VM, the bare form for native compilation).
Five targets are supported, and you cross-compile from any host to any of them:
| --target= | format |
|---|---|
| macos-aarch64 | Mach-O |
| linux-aarch64 | ELF |
| linux-x64 | ELF |
| windows-x64 | PE32+ |
| windows-arm64 | PE32+ |
A single badc invocation can mix .c source files, .o
object files, and .a archives:

    badc -c foo.c bar.c                 # emits foo.o + bar.o (native ELF64 ET_REL, target pinned)
    badc -o app foo.o bar.o             # links them into a final binary
    badc --ar -o libfoo.a foo.c bar.c   # bundles into a SysV ar(5) archive
    badc -o app main.c -L. -l foo       # link against libfoo.a, gcc-style

badc ships its own linker -- there's no ld / lld /
link.exe dependency. Object files are standard ELF64 ET_REL
relocatables: a .text section of native machine code,
.data / .bss for static storage, .symtab / .strtab
for the name table, and .rela.text carrying the relocations
the linker applies once each unit's final position is known.
The target is pinned at -c time, and the objects are also
linkable by ld / lld. Archives are ar(5) with a SysV-style
symbol index. The full cargo feature gates the entire
pipeline; library consumers that don't need
multi-TU artifacts can opt out via
default-features = false, features = ["std"] to keep the
footprint slim.
Storage-class linkage follows C99 6.2.2: static at file
scope is internal, bare or extern declarations are external,
and extern T x; with no defining declaration becomes an
unresolved external that the linker tries to satisfy from the
remaining objects or archive members.
A summary of what the dialect parses + lowers, and where it
diverges from C99, lives in std-conformance.md. Short
version: c5 covers most of the language and a few features of the later standards.
The doc enumerates rejected idioms, divergent behavior, and the c5-only extensions
(#pragma dylib / binding / export / entrypoint / subsystem).
The preprocessor pre-defines a small standard set, double-underscore wrapped in the gcc / clang / msvc convention so they don't collide with user identifiers:

    __BADC_VERSION__ <crate version>   // string literal from Cargo.toml, e.g. "0.0.9"
    __BADC_TARGET__ "macos-aarch64"    // canonical target id (string literal)
    __aarch64__ / __arm64__            // AArch64 targets
    __x86_64__ / __amd64__             // x86_64 targets
    _WIN32 / _WIN64                    // Windows targets only
    __BADC_WINDOWS__                   // Windows targets only
    __APPLE__                          // macOS target only
    __linux__                          // Linux targets only

The MSVC/MinGW mimicry surface (_MSC_VER / __MINGW32__ / __int64
/ __declspec / etc.) lives in headers/include/msvc_compat.h
and is opted into per translation unit with -include msvc_compat.h.
The header tells the compiler which dylibs / shared objects / DLLs the target offers and which local names resolve to which exported symbols. A snippet:

    #pragma dylib(libsystem, "/usr/lib/libSystem.B.dylib")
    #pragma binding(libsystem::printf, "_printf")
    int printf(char *fmt, ...);

The codegen drives its IAT / .got / DT_NEEDED records from
these declarations. When the source calls printf, the parser
type-checks the call against the prototype; the codegen looks up
the binding to learn that the loader should resolve _printf from
libSystem.B.dylib. Switching target swaps the header and the
bindings change with it -- printf lands on bare printf from
libc.so.6 on Linux, printf from msvcrt.dll on Windows.
Validation runs at codegen entry: every intrinsic the program references must have a matching binding for the chosen target. Unused bindings cost nothing -- they describe the surface without forcing you to pull in everything they name.
badc uses #pragmas to lighten the command line. One can specify
dylib bindings, exports, alignment, the entry-point name, and the Windows
subsystem -- every knob lives next to the code it configures
so the source carries enough context to build with a bare
badc <file>.

    #pragma once                        // single-inclusion guard for headers.
    #pragma dylib(libc, "libc.so.6")    // declare a dylib c5 can bind into.
    #pragma binding(libc::sin, "sin")   // map a portable name to its dylib symbol.
    #pragma export(my_api)              // promote a function to a shared-object export.
    #pragma pack(N) / pop / push        // override the default 8-byte struct alignment.
    #pragma entrypoint(WinMain)         // override the default `main` entry point.
    #pragma subsystem(windows)          // pick the PE subsystem (console | windows | native | efi_*).

#pragma entrypoint(<name>) lets the source declare a
non-main entry without a build-driver flag; the compiler
resolves the name through the same symbol-table lookup it uses
for main. #pragma subsystem(<kind>) drives the
PE optional-header Subsystem byte. The accepted kinds are
console (default, IMAGE_SUBSYSTEM_WINDOWS_CUI = 3),
windows (IMAGE_SUBSYSTEM_WINDOWS_GUI = 2), native
(IMAGE_SUBSYSTEM_NATIVE = 1, with nt / driver as
aliases), and the EFI variants efi_application,
efi_boot_service_driver, efi_runtime_driver, and
efi_rom. With console / windows, entrypoint(WinMain)
plus subsystem(windows) is what a Win32 GUI app needs to
skip the loader's auto-attach to a console window. Non-PE
targets keep the default and ignore the directive, so the
same source builds for every OS.
Unknown directives (and #includes that don't resolve through
the search-path / embedded-header chain) emit a warning rather
than failing the build; pass -H / --show-includes to see
the gcc -H style resolution trace on stderr.
If something is not available, define it yourself for a
quick fix, open an issue or use runtime linking with dlopen / dlsym
or LoadLibrary/GetProcAddress:

    int main() {
        int *h, *fn;
        h = dlopen(0, 2);           // RTLD_NOW
        fn = dlsym(h, "strlen");
        return fn("hello, world!"); // exits 13
    }

dlopen(NULL, RTLD_NOW) returns the calling process's symbol
scope -- libc on POSIX, the loaded set on Windows.
For a flavour of what's reachable from each system:
- macOS -- dlsym(h, "objc_msgSend") gives you the Objective-C runtime entry point. The CoreFoundation / AppKit / Foundation surfaces are one dlopen("/System/Library/.../X.framework/X") away.
- Linux -- clock_gettime, nanosleep, pipe2, the entire pthread_* family. Anything in /usr/lib's sonames if you spell the path.
- Windows -- dlopen resolves to LoadLibraryA, so dlopen("user32.dll", 0) plus dlsym(h, "MessageBoxA") gives you a callable Win32 API entry point.
Same encoder + relocations as the AOT path. badc mmaps the result
executable, resolves libc through a runtime-built fake GOT, and
calls main directly via a transmuted function pointer. No
subprocess, no on-disk binary -- parse, lower, exec all happen
inside the badc process:

    badc --jit tests/fixtures/c/c4.c hello.c   # JIT'd c4 self-hosts hello.c

Five hosts are supported:
| host | mapping |
|---|---|
| Linux/aarch64 | mmap RW -> mprotect RX, manual dc cvau / ic ivau |
| Linux/x86_64 | mmap RW -> mprotect RX, hardware-coherent I-cache (no-op) |
| macOS/aarch64 | mmap RWX + MAP_JIT, pthread_jit_write_protect_np toggle |
| Windows/x86_64 | VirtualAlloc RW -> VirtualProtect RX, FlushInstructionCache (no-op) |
| Windows/aarch64 | VirtualAlloc RW -> VirtualProtect RX, FlushInstructionCache |
libc is bound at JIT time: a writable "fake GOT" gets one entry
per resolved import, and the codegen's existing GOT relocations
are patched against this region. POSIX uses dlopen(NULL, RTLD_NOW) + dlsym
to find each symbol in the loaded process;
Windows uses LoadLibraryA per declared dylib (kernel32, msvcrt,
ws2_32, ...) + GetProcAddress. macOS uses Apple's MAP_JIT +
per-thread W^X toggle for the hardware-enforced W^X on Apple
Silicon.
For more, one can use objdump, readelf, etc.
The codegen always lowers through an SSA intermediate
representation and a graph-coloring register allocator. A
handful of cheap rewrites run unconditionally; --optimize
adds a set of SSA passes on top.
Always on: drop self-movs and fuse compare + branch into
cmp / b.cond (or cmp / jcc) without materializing a 0/1
boolean in between. The register allocator builds an
interference graph over phi-congruence classes and colors it
greedily, spilling to frame slots only under pressure.
examples/bench.rs runs a few pure-computation workloads
(fib32, quicksort-50k, matmul-50) through the VM and the
in-process JIT and reports per-iteration timings:

    cargo run --release --example bench -- --iter 10

--interp runs the program through the SSA interpreter
instead of compiling to native:

    $ cargo run --quiet --features full -- --interp hello.c
    Hello 123
    exit(0)

The VM keeps code, stack, and data in three distinct address ranges
and refuses to mix them. Function pointers carry a CODE_BASE
bias; loading or storing through one is rejected, and so is
calling through a fabricated integer (fp = 42; fp();) -- the
call site refuses an address it didn't originate.
--track-pointers opts in to allocation tracking. With it on,
free on an unknown or already-freed pointer errors, and any
access into a freed allocation (or past the end of a live one) is
reported with the offending allocation's id. --trace opts in to
a per-instruction trace on stdout (off by default -- it's noisy).
Native and JIT modes skip this safety net by design. Use
--interp if you want the watchful version, especially while
debugging memory-shape issues.
The library compiles under --no-default-features:

    cargo build --no-default-features --lib

In that mode the StdHost adapter (file IO, env vars, real
stdin/stdout) is gone. Consumers supply their own Host impl and
construct the VM with Vm::with_host(program, my_host). Everything
else -- lexer, parser, preprocessor, VM dispatch, pointer tracking,
native backends -- runs on extern crate alloc.
The CLI binary requires the std and full features (see the
install section above).

    cargo test --features full

--features full runs the full suite. A bare cargo test exercises
only the host-only JIT library (the default feature set), gating out
the native*, linker, and dwarf modules that emit on-disk images.
Tests are split by what they exercise. lexer, parser, and
codegen drive each phase directly. programs and intrinsics
load real C sources from tests/fixtures/c/ and check the exit
code under the SSA interpreter. types checks the
warning-not-error behaviour. pointer_tracking exercises the
opt-in safety net. native, native_elf, native_elf_x64,
native_pe_x64, and native_pe_arm64 compile each fixture
through the matching backend and exec it under the host kernel,
including an -O rerun that asserts the exit code is unchanged.
jit covers the in-process path the same way. linker exercises
the multi-TU object / archive path, dwarf the debug-info emit,
and deferred the lazy-symbol resolution.
A few fixtures under tests/fixtures/c/ are worth reading on their
own, each pinning a distinct hard feature:
- c4.c -- the original c4 compiler; self-hosts (see above).
- fma_numeric_kernels.c -- Horner polynomial evaluation, a dense matrix-product inner loop, and a fourth-order Runge-Kutta step, all multiply-add heavy; checks that the -O fused multiply-add contraction keeps single-rounding parity with the VM.
- fma_contraction.c -- the a*b+c / a*b-c / c-a*b contraction shapes plus explicit C99 fma / fmaf.
- aapcs64_variadic_host_abi.c, sysv_variadic_host_abi.c -- the per-target variadic calling conventions on the host ABI.
- setjmp_longjmp_roundtrip.c -- non-local control flow, including the CRT-free AArch64 setjmp / longjmp intrinsic on Windows.
- struct_by_value_param.c, struct_by_value_return.c -- aggregate pass / return through the hidden out-pointer ABI.
- bitfield_storage_unit.c -- C99 6.7.2.1 bitfield packing across storage units.
Release builds add the JIT and native fixture-parity paths that debug builds skip:

    cargo test --release --lib

CI runs the matrix on ubuntu-latest, ubuntu-24.04-arm,
macos-latest, windows-latest, and windows-11-arm. Every
runner additionally runs the demo smokes -- sqlite3, miniz,
kissfft, bzip2, tweetnacl, monocypher, bearssl, lua, stb,
chibicc, tinycc, gui_hello, nt_loader -- end-to-end (or
build-only for the GUI demos, which need a display). See
demos/ for what each exercises. The PE-via-
WINE lane is gated on BADC_RUN_WINE=1; a bare cargo test
on a developer machine skips it, and CI doesn't currently
set it (the native Windows runners cover the same surface
directly).
tools/core-walker.py walks the saved-rbp chain in a Linux ELF
core dump and reports each frame's saved return address as a
file offset into the original non-PIE x64 binary (load base
fixed at 0x400000). Useful for naming the crashing function
when a higher-level debugger path is blocked. Modes:
- default: walk the rbp chain, resolve each frame's saved return address.
- --dump-around-rbp: print the 16 8-byte slots around rbp.
- --scan-stack: ignore the rbp chain, scan upward from rsp for any 8-byte slot that looks like a code address, and resolve each. Useful when stack corruption broke the rbp chain -- the actual return addresses are usually still on the stack, just no longer reachable through the saved-rbp links.
- --list-segments: list every PT_LOAD in the core file with its vaddr range. Useful for understanding where the stack and the emulator's mappings ended up after a corruption.
This is a personal educational/research project. It has not been sponsored or suggested by anyone, i.e. it is a product of my own volition. That said, in no event will I be responsible for how you use this project or what happens due to that. See LICENSE for the exact terms.