uc386

C23 compiler targeting the Intel 386 (i386 / x86-32) processor under a DOS extender — specifically the flat 32-bit Watcom / DOS/4GW-era C that early-to-mid-1990s PC games were written in.

Status: working, in testing ahead of a general release. Measured against two reference suites under our DOS emulator (compile → assemble → run → diff): 215 / 220 c-testsuite tests pass and, with the --kr pre-pass (see below), 1397 / 1514 gcc-c-torture executable tests pass. The frontend defaults to strict C23; the gcc-c-torture corpus is pre-ANSI and GNU-heavy, so it is run with --kr enabled. The remaining 117 torture failures fall into three groups: GCC extensions that need a static-chain ABI / closure conversion (nested functions, __label__); a fragmented backlog of unimplemented codegen features (C99 VLA / variably-modified types, vectors, designated initializers, _Complex, offsetof designators, …); and a few large-frame / file-I/O edges. These are tracked, not claimed as passing. The standard-C codegen-corner miscompiles have been driven out (see STANDARD_C_BACKLOG.md).

K&R / implicit-int compatibility (--kr). Pre-ANSI sources, meaning implicit-int returns (main() { … }) and K&R old-style parameter lists (f(a, b) int a; char *b; { … }), are not valid C23, and the strict grammar rejects them. It likewise rejects the GNU computed-goto / labels-as-values extension (&&label, goto *expr). Passing --kr enables a source-level pre-pass (in uc_core) that rewrites these shapes into equivalent standard C before parsing (computed goto lowers to a switch dispatch). The option is off by default; when enabled, it only engages on files that fail the strict parse, so modern code is parsed exactly once and pays zero cost. Use it for legacy/pre-ANSI or GNU-C codebases; the conformance runners enable it for the K&R-heavy torture corpus.
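
The "only engages on files that fail the strict parse" behaviour can be sketched as below. This is a toy illustration in Python, not uc_core's implementation: strict_parse, rewrite_implicit_int, and parse_with_kr_fallback are hypothetical names, the "parser" is a single regular expression, and only the implicit-int shape at column 0 is handled (K&R parameter lists and computed goto are left out).

```python
import re


class StrictParseError(Exception):
    """Raised by the toy strict parser when it sees a pre-ANSI shape."""


# Toy detector: a definition starting at column 0 whose name has no type in
# front of it, e.g. `main() {`. A real grammar is far more careful than this.
_IMPLICIT_INT = re.compile(r"^([A-Za-z_]\w*)\s*\(([^)]*)\)\s*\{", re.MULTILINE)


def strict_parse(source):
    """Hypothetical stand-in for a strict C23 parse."""
    if _IMPLICIT_INT.search(source):
        raise StrictParseError("implicit-int definition is not valid C23")
    return ("translation-unit", source)


def rewrite_implicit_int(source):
    """Give implicit-int definitions an explicit `int` return type."""
    def fix(match):
        name = match.group(1)
        params = match.group(2).strip() or "void"
        return f"int {name}({params}) {{"
    return _IMPLICIT_INT.sub(fix, source)


def parse_with_kr_fallback(source, kr=False):
    """Return (tree, number_of_parse_attempts).

    The strict parse always runs first; the rewrite only runs when that
    parse fails and the caller opted in with kr=True.
    """
    try:
        return strict_parse(source), 1
    except StrictParseError:
        if not kr:
            raise
    return strict_parse(rewrite_implicit_int(source)), 2


modern = "int main(void) {\n    return 0;\n}\n"
legacy = "main() {\n    return 0;\n}\n"

print(parse_with_kr_fallback(modern)[1])           # attempts for modern code
print(parse_with_kr_fallback(legacy, kr=True)[1])  # attempts for legacy code
```

Under these assumptions, a file that already parses strictly is parsed once and never rewritten; only a rejected file with kr=True pays for the rewrite and a second parse.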

The frontend (parsing, preprocessing, AST-level optimization) lives in uc_core; this repo owns the driver, the x86-32 NASM emitter, and the DOS runtime bindings.

Highlights — beyond the reference suites, uc386 compiles real third-party C programs into runnable DOS executables:

  • Real .exe output. Produces self-contained, PMODE/W-bound DOS .exe files (not just flat binaries), boot-tested under DOSBox in CI: correct errorlevels, command-line argument parsing, and printf/file I/O through genuine DOS handles.
  • DOOM (id Software's 1993 shooter) compiles and boots end-to-end, running through engine startup until it exits cleanly on the expected "WAD file not found".
  • MicroPython (a small Python interpreter) compiles into a working DOS Python REPL — expressions, functions, classes, list comprehensions, exceptions, and the common builtins. Packaged separately as freedos_micro_python. It is our toughest end-to-end test of the compiler.
  • awk — Kernighan's "one true awk" runs arithmetic, regexes, aggregation, and string functions.
  • GNU utilities — 16 in-tree coreutils-style programs (cat, wc, true, …) build and pass parametrized regression tests.

See addons/STATUS.md for the full per-addon report and docs/path-a-mz-le.md for the .exe build path.

Size — measured, not asserted

The "tiny output" claim is checked here against the period reference compiler rather than asserted. Every column below was reproduced on one macOS/arm64 host by python -m addons.harness.compare. Open Watcom V2 has no native macOS build, so its DOS-hosted wcc386/wlink run under DOSBox-X via addons/harness/watcom_dosbox.py; DJGPP is the gcc-12.2 osx cross under Rosetta. Sizes are bytes of the on-disk executable; the full table is in addons/results.md:

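| program | uc386 .bin | uc386 .exe | Watcom | DJGPP |
| ------- | ---------: | ---------: | -----: | ------: |
| true    | 18         | 16,907     | 5,420  | 147,914 |
| echo    | 148        | 16,915     | 11,286 | 150,212 |
| factor  | 1,858      | 16,989     | 20,538 | 179,614 |
| wc      | 1,529      | 16,928     | 20,158 | 179,092 |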

Reading this honestly:

  • .bin is not a DOS program. It has no MZ header and runs only under uc386.dos_emu or a custom loader. It is the right metric for codegen+DCE tightness (and there uc386 is in a class of its own: 18 bytes for true, under 2 KB for factor), but it is not what you ship.
  • .exe is what you ship, and it carries a ~17 KB PMODE/W extender floor. Against that real-DOS artifact, in the rows shown above Open Watcom is ~1.5–3× smaller on the tiniest programs, true and echo (its DOS/4GW clib + mature linker beat our extender floor); the two converge as real code grows, and for factor and wc the uc386 .exe is already slightly smaller. uc386 beats DJGPP ~9–11× and host gcc ~2×.
  • So: uc386's code generation is extremely compact; its current DOS packaging (PMODE/W) is not yet competitive with Watcom's on small binaries. Both statements are true and the table shows which is which — no single "390× smaller" headline.
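
The ratios quoted in these bullets can be rechecked from the four rows shown. The snippet below is a small sketch that copies the byte counts from the table above; the SIZES dictionary and the ratio helper are illustrative names, not part of addons.harness.compare.

```python
# Executable sizes in bytes, copied from the four rows of the table above.
SIZES = {
    "true":   {"bin": 18,   "exe": 16907, "watcom": 5420,  "djgpp": 147914},
    "echo":   {"bin": 148,  "exe": 16915, "watcom": 11286, "djgpp": 150212},
    "factor": {"bin": 1858, "exe": 16989, "watcom": 20538, "djgpp": 179614},
    "wc":     {"bin": 1529, "exe": 16928, "watcom": 20158, "djgpp": 179092},
}


def ratio(program, numerator, denominator):
    """Size of one column divided by another for a single program."""
    row = SIZES[program]
    return row[numerator] / row[denominator]


for name in SIZES:
    vs_watcom = ratio(name, "exe", "watcom")  # > 1 means Watcom is smaller
    vs_djgpp = ratio(name, "djgpp", "exe")    # > 1 means uc386 is smaller
    print(f"{name:7s} uc386.exe/Watcom = {vs_watcom:.2f}x   "
          f"DJGPP/uc386.exe = {vs_djgpp:.2f}x")
```

For these rows the uc386 .exe to Watcom ratio falls from about 3.1 for true to below 1 for factor and wc, which is the convergence the second bullet describes.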

Goal

Compile representative public-source DOS games unmodified:

  • Descent (Parallax, 1995 — Watcom)
  • Duke Nukem 3D / Build engine (3D Realms, 1996 — Watcom)
  • Rise of the Triad (Apogee, 1994 — Watcom)
  • Heretic / Hexen (Raven, 1994–95 — Watcom)

These all share one compiler (Watcom C/C++) and one memory model (flat 32-bit under DOS/4GW). That's the target.

Non-goals: 16-bit real-mode with near/far/huge memory models (Wolf3D-era code). uc386 will parse the 16-bit keywords so that shared period headers don't choke, but won't honor their semantics — all pointers are 32-bit flat.

Design

The uc80/uc386 family shares a single C23 frontend (uc_core, itself uplox-driven). This project contributes only:

  • main.py — driver (CLI, I/O, embedding, post-processing)
  • codegen.py — x86-32 NASM code generator
  • peephole.py — NASM-text peephole optimizer
  • asm_dce.py — assembly-level dead-code elimination from _start / _main
  • libc_split.py — selective inclusion of lib/i386_dos_libc.asm symbols
  • runtime.py — MS-DOS runtime library bindings (INT 21h wrappers, stubs)
  • dos_emu.py — i386 emulator harness for testing flat-binary output
  • dos_emu_netsim.py — simulated network for the INT 0x83 packet-driver shim

Every front-end improvement (new C23 feature, AST optimization, DOS-era syntax tolerance) lands in uc_core and benefits both targets automatically.

Install

Note: uc386 is not yet ready on PyPI; install from the GitHub repository for now.

Once the PyPI release is ready:

```
pip install uc386
```

That gets you the uc386 driver, the bundled i386_dos_libc.asm, and the lib/include/ headers. To assemble + run the output you also need nasm (system package) and, for the dos_emu test harness, pip install unicorn.

Source checkout for development:

```
sudo apt-get install -y python3 python3-venv nasm    # Debian/Ubuntu
python3 -m venv .venv && . .venv/bin/activate
pip install pytest unicorn "uc_core @ git+https://github.com/avwohl/uc_core@main" -e .
pytest tests/
```

macOS (Homebrew) and Fedora/RHEL (dnf) instructions, plus the optional toolchains for addon builds (bison/flex) and the DJGPP / OpenWatcom comparison columns, are documented in docs/INSTALL.md.

Related Projects

  • cpmdroid - Z80/CP/M emulator for Android with RomWBW HBIOS compatibility and VT100 terminal
  • cpmemu - CP/M 2.2 emulator with Z80/8080 CPU emulation and BDOS/BIOS translation to Unix filesystem
  • dosemu - MS-DOS emulator for Linux: dosbox-staging CPU + cpmemu-style syscall translation (intended test host for uc386)
  • qxDOS - DOS emulator for iPad and Mac — DOSBox-based with SwiftUI interface
  • uc80 - C23 compiler targeting Z80 processor and CP/M; sibling backend sharing the uc_core frontend
  • uc_core - Shared C23 frontend and AST optimizer used by uc80 and uc386
  • um80_and_friends - Microsoft MACRO-80 compatible toolchain for Linux: assembler, linker, librarian, disassembler (the Z80 analogue of what uc386 needs for i386)
  • upeepz80 - Z80 peephole optimizer (template for an eventual upeep386)
  • uplox - Parser/lexer-table generator that produces uc_core's C23 frontend (from examples/c23.uplox)

License

GPL-3.0-or-later.
