Puny C Compiler

Limited expressiveness, unlimited portability

Very small cross compiler for a subset of C.

Features

Supported target and host architectures: OpenRISC, WebAssembly, RISC-V RV32IM, ARMv6-M (Thumb-2) and x86-32.
Valid source code for Puny C is also valid C99 and can be written so that gcc and clang compile it without warnings.
Code generation is designed to be easily portable to other target architectures.
Fast compilation, small code size.

Compiler Size

PunyCC can compile itself. There is a separate compiler executable for every host and target combination. Host is the architecture where the compiler runs and target is the ISA of the compiled binary. Each compiler is smaller than 10 KByte (sizes in bytes):

target \ host	wasm	x86	armv6m	rv32	or1k
wasm	6049	7259	7214	7476	9244
x86	6145	7442	7518	7624	9560
armv6m	6219	7454	7586	7908	9736
rv32	6448	8041	8094	8152	9912
or1k	6379	7791	7994	8028	9784

Language Restrictions

No linker.
No preprocessor.
No standard library.
No typedef.
No type checking. Variable types are always unsigned int, except when a variable is indexed with []; then its type is char *.
Any combination of unsigned, long int, char, void and * is accepted as valid type.
Type casts are allowed, but ignored.
Constants: only decimal, character and string constants, without backslash escapes.
Statements: if, while, return.
Variable declaration: C99-style statements.
Operators: no unary, ternary or compound assignment operators.
Operator precedence: simplified, use parentheses instead.

level	operator	description
1	[] ()	array and function call
2	+ - << >> & ^ |	binary operation
3	< <= > >= == !=	comparison
4	=	assignment
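As an illustration of the restrictions above, the following function is written to stay inside the subset as described (it has not been checked against punycc itself): it uses only while, if and return, character constants, [] on a char pointer, and fully parenthesized binary operators. Because the operator table lists no multiplication, n * 10 is built from shifts. The name parse_uint is hypothetical.

```c
/* Hypothetical helper: parse an unsigned decimal number.
   Stops at the first character that is not a digit.
   n * 10 is written as (n << 3) + (n << 1) because the subset's
   operator table has no '*'; every binary expression is parenthesized. */
unsigned int parse_uint(char *s)
{
    unsigned int n = 0;
    unsigned int i = 0;
    while (s[i] >= '0') {
        if (s[i] > '9') {
            return n;
        }
        n = ((n << 3) + (n << 1)) + (s[i] - '0');
        i = i + 1;
    }
    return n;
}
```

For example, parse_uint("42abc") stops at 'a' and returns 42.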

Inspired by

cc500 - a tiny self-hosting C compiler by Edmund Grimley Evans
Obfuscated Tiny C Compiler - very small self compiling C compiler by Fabrice Bellard
Tiny C Compiler - a small but hyper fast C compiler.
Compiler Construction - brief but comprehensive book by Niklaus Wirth.

Usage

To build punycc for all target architectures, use:

./make.sh compile_native

The executables are named build/punycc_ARCH.native. They read C source code from stdin and write an executable to stdout:

./punycc_x86.native < foo.c > foo.x86

To run foo.x86, first make it executable:

chmod +x foo.x86
./foo.x86

A cross-compiled executable can be run under emulation with qemu:

./punycc_rv32.clang < foo.c > foo.rv32
chmod +x foo.rv32
qemu-riscv32 foo.rv32

There are no standard library or standard include files. Everything must be in a single source file. The host_ARCH.c files contain rudimentary implementations of the standard functions that the compiler needs. Use them by concatenating files:

cat host_rv32.c hello.c | ./punycc_rv32.x86 > hello.rv32

Build the compiler for every host and target combination and check that the same output is produced on the different host architectures with:

./make.sh test_full

Show the compiler sizes of all combinations:

./make.sh stats

Low-Level Functions

There is no inline assembler for functions that access the operating system directly (e.g. file I/O), but a function body can be given as raw machine code:

void exit(int) _Pragma("PunyC emit \x58\x5b\x31\xc0\x40\xcd\x80");
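/*  58      pop eax        ; discard the return address
    5b      pop ebx        ; ebx = first argument (exit status)
    31 c0   xor eax, eax
    40      inc eax        ; eax = 1 (Linux i386 sys_exit)
    cd 80   int 128        ; system call, does not return */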

Other compilers ignore the _Pragma, which turns the line into an ordinary forward declaration that can be resolved by linking against libc.
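The \x escapes in the _Pragma string are simply the instruction bytes from the comment, in order. A small host-side helper in ordinary C can generate such a declaration from a byte array instead of typing the escapes by hand. This is a hypothetical sketch: format_emit_decl is not part of the repository, and it is plain C with stdio, not Puny C.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: write "<prototype> _Pragma(\"PunyC emit \xHH...\");"
   into out, one \xHH escape per machine-code byte.
   Returns the number of characters written, or -1 if out is too small. */
int format_emit_decl(char *out, size_t size, const char *prototype,
                     const unsigned char *code, size_t len)
{
    size_t pos;
    int n = snprintf(out, size, "%s _Pragma(\"PunyC emit ", prototype);
    if (n < 0 || (size_t)n >= size) return -1;
    pos = (size_t)n;
    for (size_t i = 0; i < len; i++) {
        /* each byte becomes a 4-character escape such as \x58 */
        n = snprintf(out + pos, size - pos, "\\x%02x", code[i]);
        if (n < 0 || (size_t)n >= size - pos) return -1;
        pos += (size_t)n;
    }
    n = snprintf(out + pos, size - pos, "\");");
    if (n < 0 || (size_t)n >= size - pos) return -1;
    return (int)(pos + (size_t)n);
}
```

Called with the prototype "void exit(int)" and the bytes 58 5b 31 c0 40 cd 80, it produces exactly the exit declaration shown above.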

Implementation Details

Each compiler consists of three parts:

Host-specific standard functions for i/o in host_ARCH.c
Target-specific code generation in emit_ARCH.c
Architecture independent compiler parts (scanner, parser and symbol table)

Concatenate the three files and compile the result, for example:

cat host_x86.c emit_x86.c punycc.c | ./punycc_x86.clang > punycc_x86.x86

Cross compilers can be built by using a different ARCH for the host_ and emit_ files:

cat host_x86.c emit_armv6m.c punycc.c | ./punycc_x86.clang > punycc_armv6m.x86

Memory Management

There is only one buffer, buf. The code grows from 0 upwards; the symbol table grows from the top downwards. The token buffer for strings and identifiers is dynamically allocated in the space between them:

0   code_pos     code_pos+256         sym_head-256      sym_head   buf_size
                   token_buf      token_buf+token_size
+------+---------------+-------------------+---------------+--------------+
| code |   256 bytes   | identifier/string |   256 bytes   | symbol table |
+------+---------------+-------------------+---------------+--------------+
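The collision check implied by this layout can be sketched as follows. This assumes the two 256-byte gaps are fixed safety margins and that all positions are plain indices into buf; the function names are hypothetical and not taken from punycc.c.

```c
/* Hypothetical sketch of the layout in the diagram above.
   Positions are indices into the single buffer buf. */

/* token_buf starts 256 bytes above the end of the generated code. */
unsigned int token_buf_start(unsigned int code_pos)
{
    return code_pos + 256;
}

/* Bytes available for the current identifier or string between
   token_buf and sym_head - 256. A result of 0 means the code and
   the symbol table are so close that no token fits any more. */
unsigned int token_buf_capacity(unsigned int code_pos, unsigned int sym_head)
{
    unsigned int start = token_buf_start(code_pos);
    if (sym_head < 256) {
        return 0;
    }
    if (sym_head - 256 <= start) {
        return 0;
    }
    return (sym_head - 256) - start;
}
```

For example, with code_pos = 1000 and sym_head = 4000 the token buffer starts at 1256 and may grow up to 3744, leaving 2488 bytes.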

Symbol Table

The symbol table starts at sym_head and ends at the end of the buffer. It is a concatenation of symbol entries in the following format:

offset	size	description
0	4 bytes	address (little endian)
4	1 byte	symbol type
5	1 byte	n: length of name
6	n bytes	name
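A minimal sketch of how entries in this format could be added and looked up. It assumes that a new entry is stored directly below the current sym_head (the table grows downwards) and that lookup walks from sym_head to the end of the buffer, so the most recently added symbol with a given name is found first. sym_add, sym_find and sym_addr are hypothetical names, not punycc's own functions.

```c
#include <string.h>

/* Hypothetical helpers for entries of the form:
   4-byte little-endian address, 1-byte type, 1-byte name length n,
   then n bytes of name (no terminating NUL). */

/* Store a new entry directly below sym_head; returns the new sym_head.
   The caller must make sure there is room (no overflow check here). */
unsigned int sym_add(unsigned char *buf, unsigned int sym_head,
                     const char *name, unsigned int len,
                     unsigned int type, unsigned int addr)
{
    unsigned int p = sym_head - (6 + len);
    buf[p]     = addr & 255;
    buf[p + 1] = (addr >> 8) & 255;
    buf[p + 2] = (addr >> 16) & 255;
    buf[p + 3] = (addr >> 24) & 255;
    buf[p + 4] = type & 255;
    buf[p + 5] = len & 255;
    memcpy(buf + p + 6, name, len);
    return p;
}

/* Walk the entries from sym_head up to buf_size and return the offset
   of the first match (the most recently added one), or -1 if absent. */
int sym_find(const unsigned char *buf, unsigned int sym_head,
             unsigned int buf_size, const char *name, unsigned int len)
{
    unsigned int p = sym_head;
    while (p < buf_size) {
        unsigned int n = buf[p + 5];
        if (n == len && memcmp(buf + p + 6, name, len) == 0) {
            return (int)p;
        }
        p = p + 6 + n;  /* skip address, type, length byte and name */
    }
    return -1;
}

/* Reassemble the little-endian address of the entry at offset p. */
unsigned int sym_addr(const unsigned char *buf, unsigned int p)
{
    return (unsigned int)buf[p]
         | ((unsigned int)buf[p + 1] << 8)
         | ((unsigned int)buf[p + 2] << 16)
         | ((unsigned int)buf[p + 3] << 24);
}
```

Each entry occupies 6 + n bytes, so adding "main" to an empty 64-byte table moves sym_head from 64 to 54.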

Code Generation

The functions prefixed with emit_ generate the machine code. The template in emit_template.c documents all of them and can be used as a starting point for a new architecture backend. The steps taken to create the OpenRISC backend are documented in codegen/or1k/steps.md and may be helpful, too.
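To give a feel for what a backend function does, here is a hypothetical fragment in the spirit of the emit_ functions. The names put_byte, put_le32 and emit_load_const are invented for illustration and do not come from emit_template.c, and the code is plain C99 rather than checked Puny C. It appends the x86-32 instruction mov eax, imm32 (opcode B8 followed by a little-endian 32-bit immediate) at code_pos, which grows upwards as in the memory layout described above.

```c
/* Hypothetical backend fragment; names are illustrative only. */
unsigned char code[256];
unsigned int code_pos = 0;

/* Append one byte at code_pos (no overflow check in this sketch). */
void put_byte(unsigned int b)
{
    code[code_pos] = b & 255;
    code_pos = code_pos + 1;
}

/* Append a 32-bit word, least significant byte first (little endian). */
void put_le32(unsigned int w)
{
    put_byte(w);
    put_byte(w >> 8);
    put_byte(w >> 16);
    put_byte(w >> 24);
}

/* x86-32: B8 id = mov eax, imm32 (load a constant into eax). */
void emit_load_const(unsigned int value)
{
    put_byte(0xb8);
    put_le32(value);
}
```

A backend for another ISA would keep the same byte-appending helpers and change only the encodings, which is one way code generation can stay easy to port.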

bobbl/punycc

Folders and files

Latest commit

History

Repository files navigation

Puny C Compiler

Compiler Size

Language Restrictions

Usage

Low-Level Functions

Implementation Details

Memory Management

Symbol Table

Code Generation

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages