Overview

cgen is a compiler backend, inspired by the QBE project.

The overall goal of the project is:

Provide language front-ends with a target language for compilation to native code.
Implement the necessary ABI rules to ensure seamless interop with C code.
Provide a library that gives language developers control over how this process happens.
Provide a conservative set of sound optimizations for input programs.

Setup

To install the opam dependencies, run make deps
To build and install the project, run make (these can be done individually with make build and make install, respectively).
To uninstall the project, run make uninstall.
To clean the build artifacts, run make clean.
To run the unit tests, run make test.
For generating documentation, run make doc.

The generated documentation (in HTML format) should be located in ./src/_build/default/_doc/_html/index.html, which can be opened in your browser of choice.

Usage

The basic usage is cgen file.vir, where file.vir is a file containing the syntax for cgen's user-facing intermediate language: "Virtual". cgen will then transform the contents of this file, and the final output (via stdout) is an assembly program, which is intended to be fed into a common system assembler (such as GNU as).

See cgen --help for more information.

Library

cgen is also intended to be used directly as a library for language front-ends. The library API that is provided can be seen in the generated documentation.

Pipeline

There are three IRs used by the compiler:

Virtual: CFG-based, platform-agnostic intermetiate language. Similar to LLVM IR, but smaller in scale.
Virtual ABI: almost the same as Virtual, but with explicit mentioning of registers, stack locations, and with platform-specific instructions (such as va_arg and va_start) desugared, and with the target ABI implemented for function calling.
Pseudo: "pseudo"-assembly, CFG-based representation with concrete, target-specific instructions. This is generated at instruction selection time. Maintains the stack slot abstraction up until register allocation.

The compilation pipeline roughly follows this plan:

+---------------------------+
|    Virtual IR (input)     | Input can be parsed from textual representation, or constructed in-memory.
+-------------+-------------+
              |
              v
+---------------------------+
| Virtual IR (SSA, type     | Transform to semi-pruned SSA form, type check the module.
| checking)                 |
+-------------+-------------+
              |
              v
+---------------------------+
| Virtual IR (mid-end       | Control-flow simplifications, algebraic rewrites, constant propagation,
| optimizations)            | constant folding, GVN, GCM, LICM, RLE, store-to-load forwarding, etc.
+-------------+-------------+
              |
              v
+---------------------------+
| Virtual ABI IR (ABI       | Remove aggregate types, implement the platform-specific rules for parameter
| lowering)                 | passing and returning values from functions via registers/stack, etc.
+-------------+-------------+
              |
              v
+---------------------------+
| Virtual ABI IR (more      | CSE, RLE, store-to-load forwarding, etc. Intended for cleaning up
| optimizations)            | the code generated by the previous pass.
+-------------+-------------+
              |
              v
+---------------------------+
| Pseudo IR (instr.         | Implements a greedy, maximal-munch algorithm to match tree patterns over a
| selection)                | DAG of the function, producing sequences of native instructions.
+-------------+-------------+
              |
              v
+---------------------------+
| Pseudo IR (dead code      | Eliminates dead code generated by the previous pass.
| elimination)              |
+-------------+-------------+
              |
              v
+---------------------------+
| Pseudo IR (register       | Implements the Iterated Register Coalescing (IRC) algorithm, and
| allocation, stack layout) | assigns concrete stack locations to slots and spilled variables.
+-------------+-------------+
              |
              v
+---------------------------+
| Pseudo IR (peephole opts) | Performs target-specific peephole optimizations.
+-------------+-------------+
              |
              v
+---------------------------+
| Assembly (final output)   | Platform-specific assembly code.
+---------------------------+

A quick demo: print the nth Fibonacci number

We start with our IR file fibdemo.vir:

module fibdemo

const data $fmt = {
  "fib(%d) = %d\n",        ; our format string
  0_b                      ; zero-terminator
}

function w $fib(w %n) {
@start:
  %c = eq.w %n, 0_w        ; fib(0) = 0
  br %c, @retn, @chk1
@retn:
  ret %n
@chk1:
  %c = eq.w %n, 1_w        ; fib(1) = 1
  br %c, @retn, @body
@body:
  %m1 = sub.w %n, 1_w      ; fib(n) = fib(n-1) + fib(n-2)
  %fm1 = call.w $fib(%m1)
  %m2 = sub.w %n, 2_w
  %fm2 = call.w $fib(%m2)
  %r = add.w %fm1, %fm2
  ret %r
}

export function w $main(w %argc, l %argv) {
@start:
  %p = add.l %argv, 8_l    ; n = atoi(argv[1])
  %arg = ld.l %p
  %n = call.w $atoi(%arg)
  %f = call.w $fib(%n)
  %r = call.w $printf($fmt, ..., %n, %f)
  ret 0_w
}

We then run:

$ cgen fibdemo.vir -o fibdemo.S

This will be our completed assembly program. Next, we produce the executable using the system's C compiler (assumed to be GCC or Clang):

$ cc fibdemo.S -o fibdemo

Now, we can run the program. Let's print out the first 20 Fibonnaci numbers:

$ for i in `seq 1 20`; do ./fibdemo $i; done
fib(1) = 1
fib(2) = 1
fib(3) = 2
fib(4) = 3
fib(5) = 5
fib(6) = 8
fib(7) = 13
fib(8) = 21
fib(9) = 34
fib(10) = 55
fib(11) = 89
fib(12) = 144
fib(13) = 233
fib(14) = 377
fib(15) = 610
fib(16) = 987
fib(17) = 1597
fib(18) = 2584
fib(19) = 4181
fib(20) = 6765

Second example: print the first `n` prime numbers

Our file prime.vir:

module prime

const data $fmt = {
  "prime(%d) = %d\n",
  0x0_b
}

function w $is_prime(w %n) {
@start:
  %c = le.w %n, 1_w
  br %c, @no, @twothree
@no:
  ret 0_w
@twothree:
  %c = le.w %n, 3_w
  br %c, @yes, @divtwo
@yes:
  ret 1_w
@divtwo:
  %r = urem.w %n, 2_w
  %c = eq.w %r, 0_w
  br %c, @no, @divthree
@divthree:
  %r = urem.w %n, 3_w
  %c = eq.w %r, 0_w
  br %c, @no, @main
@main:
  %i = copy.w 5_w
  jmp @loop
@loop:
  %s = mul.w %i, %i
  %c = le.w %s, %n
  br %c, @divi, @yes
@divi:
  %r = urem.w %n, %i
  %c = eq.w %r, 0_w
  br %c, @no, @divi2
@divi2:
  %i2 = add.w %i, 2_w
  %r = urem.w %n, %i2
  %c = eq.w %r, 0_w
  br %c, @no, @inc
@inc:
  %i = add.w %i, 6_w
  jmp @loop
}

export function w $main(w %argc, l %argv) {
@start:
  %p = add.l %argv, 0x8_l
  %p = ld.l %p
  %n = call.w $atoi(%p)
  %m = copy.w 1_w
  %i = copy.w 1_w
  jmp @loop1
@loop1:
  %c = sle.w %n, 0_w
  br %c, @done, @loop2
@loop2:
  %b = call.w $is_prime(%i)
  %c = eq.w %b, 0_w
  br %c, @no, @yes
@yes:
  %k = call.w $printf($fmt, ..., %m, %i)
  %n = sub.w %n, 1_w
  %i = add.w %i, 1_w
  %m = add.w %m, 1_w
  jmp @loop1
@no:
  %i = add.w %i, 1_w
  jmp @loop2
@done:
  ret 0_w
}

Compiling:

$ cgen prime.vir -o prime.S
$ cc prime.S -o prime

Let's print the first 50 prime numbers, and we'll get a wall clock time:

$ time ./prime 50
prime(1) = 2
prime(2) = 3
prime(3) = 5
prime(4) = 7
prime(5) = 11
prime(6) = 13
prime(7) = 17
prime(8) = 19
prime(9) = 23
prime(10) = 29
prime(11) = 31
prime(12) = 37
prime(13) = 41
prime(14) = 43
prime(15) = 47
prime(16) = 53
prime(17) = 59
prime(18) = 61
prime(19) = 67
prime(20) = 71
prime(21) = 73
prime(22) = 79
prime(23) = 83
prime(24) = 89
prime(25) = 97
prime(26) = 101
prime(27) = 103
prime(28) = 107
prime(29) = 109
prime(30) = 113
prime(31) = 127
prime(32) = 131
prime(33) = 137
prime(34) = 139
prime(35) = 149
prime(36) = 151
prime(37) = 157
prime(38) = 163
prime(39) = 167
prime(40) = 173
prime(41) = 179
prime(42) = 181
prime(43) = 191
prime(44) = 193
prime(45) = 197
prime(46) = 199
prime(47) = 211
prime(48) = 223
prime(49) = 227
prime(50) = 229
./prime 50  0.00s user 0.00s system 77% cpu 0.002 total

Name		Name	Last commit message	Last commit date
Latest commit History 1,359 Commits
.github/workflows		.github/workflows
src		src
tools		tools
.clang-format		.clang-format
.gitignore		.gitignore
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Overview

Setup

Usage

Library

Pipeline

A quick demo: print the nth Fibonacci number

Second example: print the first `n` prime numbers

About

Uh oh!

Releases

Packages

Languages

License

bmourad01/cgen

Folders and files

Latest commit

History

Repository files navigation

Overview

Setup

Usage

Library

Pipeline

A quick demo: print the nth Fibonacci number

Second example: print the first n prime numbers

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Second example: print the first `n` prime numbers

Packages