A complete compiler pipeline for Kira, a minimal imperative programming language, built entirely in Go with zero external dependencies.
Kira compiles to C, which is then compiled to a native binary using gcc. The compiler implements all five phases of classical compilation: lexical analysis, parsing, semantic analysis, intermediate code generation, and target code generation.
flowchart TB
src["source.kira"] --> lexer
subgraph compiler ["Kira Compiler"]
lexer["Phase 1: Lexer\n(Scanner)"]
parser["Phase 2: Parser\n(Syntax Analysis)"]
semantic["Phase 3: Semantic\nAnalyzer"]
irgen["Phase 4: IR Generator\n(Three-Address Code)"]
codegen["Phase 5: Code\nGenerator"]
lexer -- "Token Stream" --> parser
parser -- "AST" --> semantic
semantic -- "Validated AST" --> irgen
irgen -- "TAC Instructions" --> codegen
end
codegen --> cfile["output.c"]
cfile --> gcc["gcc"]
gcc --> bin["Native Binary"]
errs["errors.lst\n(Listing File)"]
lexer -. "errors" .-> errs
parser -. "errors" .-> errs
semantic -. "errors" .-> errs
style compiler fill:#1a1a2e,stroke:#16213e,color:#e0e0e0
style lexer fill:#0f3460,stroke:#533483,color:#fff
style parser fill:#0f3460,stroke:#533483,color:#fff
style semantic fill:#0f3460,stroke:#533483,color:#fff
style irgen fill:#0f3460,stroke:#533483,color:#fff
style codegen fill:#0f3460,stroke:#533483,color:#fff
style src fill:#533483,stroke:#533483,color:#fff
style cfile fill:#1a8a42,stroke:#1a8a42,color:#fff
style gcc fill:#1a8a42,stroke:#1a8a42,color:#fff
style bin fill:#e94560,stroke:#e94560,color:#fff
style errs fill:#c4a000,stroke:#c4a000,color:#000
- Quick Start
- The Kira Language
- Compiler Phases
- Error Handling & Listing File
- AST Visualization (Novelty)
- CLI Usage
- Project Structure
- Examples
# Build the compiler
go build -o kira .
# Compile a Kira program
./kira examples/hello.kira
# Build and run the generated C code
gcc -o hello examples/hello.c && ./hello
# Output: Hello, World!That's it — three commands from source to running binary.
Kira is a statically typed, imperative language designed to be simple enough to compile in a weekend, yet expressive enough to write real algorithms.
| Type | Description | Example |
|---|---|---|
int |
64-bit signed integer | let x : int = 42 |
int[N] |
Fixed-size integer array | let arr : int[10] |
string |
String literals (print only) | "hello" |
| Category | Operators | Precedence (high to low) |
|---|---|---|
| Unary | - (negate), not |
Highest |
| Multiply | *, /, % |
|
| Add | +, - |
|
| Compare | <, >, <=, >= |
|
| Equality | ==, != |
|
| Logical | and, or |
Lowest |
// Functions
fn add(a : int, b : int) : int {
return a + b
}
// Entry point (required)
fn main() {
// Variables
let x : int = 10
let name : int = 0
// I/O
print "Enter a number: "
read name
print "You entered: ", name
// Control flow
if x > 5 {
print "big"
} else {
print "small"
}
// Loops
let i : int = 0
while i < 10 {
print i
i = i + 1
}
// Arrays
let arr : int[5]
arr[0] = 42
print arr[0]
// Function calls
let sum : int = add(3, 4)
print "Sum: ", sum
}
let if else while fn return print read
int and or not true false
// This is a comment (everything after // until end of line)
Strings support escape sequences: \", \\, \n, \t.
print "Hello,\tWorld!\n"
File: lexer/lexer.go, lexer/token.go
The lexer (scanner) reads the raw source code character by character and breaks it into a stream of tokens — the smallest meaningful units of the language.
flowchart LR
subgraph Source
src["let x : int = 42"]
end
subgraph Tokens
direction LR
t1["LET\n'let'"]
t2["IDENT\n'x'"]
t3["COLON\n':'"]
t4["INT\n'int'"]
t5["ASSIGN\n'='"]
t6["INT_LIT\n'42'"]
end
src --> t1 --> t2 --> t3 --> t4 --> t5 --> t6
style t1 fill:#0f3460,color:#fff
style t2 fill:#533483,color:#fff
style t3 fill:#16213e,color:#fff
style t4 fill:#0f3460,color:#fff
style t5 fill:#16213e,color:#fff
style t6 fill:#1a8a42,color:#fff
What it handles:
- Identifiers & Keywords — scans letter sequences, checks against the keyword table
- Integer literals — sequences of digits
- String literals —
"..."with escape sequence support (\n,\t,\\,\") - Operators — single-char (
+,-,*) and two-char (==,!=,<=,>=) - Comments —
//to end of line, stripped from token stream - Whitespace — consumed and discarded, but line/column tracked for error reporting
- Error recovery — invalid characters are flagged but scanning continues
Lexer state machine:
stateDiagram-v2
[*] --> Start
Start --> Identifier : letter/underscore
Start --> Number : digit
Start --> String : double quote
Start --> Operator : + - * / = < > !
Start --> Comment : //
Start --> Start : whitespace (skip)
Identifier --> Identifier : letter/digit
Identifier --> KeywordCheck : non-alphanumeric
KeywordCheck --> [*] : emit KEYWORD or IDENT
Number --> Number : digit
Number --> [*] : non-digit → emit INT_LIT
String --> String : any char
String --> Escape : backslash
Escape --> String : n, t, \\, \"
String --> [*] : closing quote → emit STRING_LIT
Operator --> [*] : single-char op → emit token
Operator --> [*] : two-char op (==, !=, <=, >=)
Comment --> Comment : any char
Comment --> Start : newline
Use --tokens to see the token stream:
$ ./kira --tokens examples/hello.kira
Token Stream:
FN "fn" (line 1, col 1)
IDENT "main" (line 1, col 4)
LPAREN "(" (line 1, col 8)
RPAREN ")" (line 1, col 9)
LBRACE "{" (line 1, col 11)
PRINT "print" (line 2, col 5)
STRING_LIT "Hello, World!" (line 2, col 11)
RBRACE "}" (line 3, col 1)
EOF "" (line 4, col 1)
File: parser/parser.go, parser/ast.go
The parser takes the flat token stream and builds a hierarchical Abstract Syntax Tree (AST) that represents the program's structure. It uses recursive descent parsing — each grammar rule becomes a function.
Grammar (simplified):
program → func_decl*
func_decl → "fn" IDENT "(" params? ")" (":" type)? block
block → "{" statement* "}"
statement → let_stmt | assign_stmt | if_stmt | while_stmt
| print_stmt | read_stmt | return_stmt | expr_stmt
expression → logic_or
logic_or → logic_and ("or" logic_and)*
logic_and → equality ("and" equality)*
equality → comparison (("==" | "!=") comparison)*
comparison → addition (("<" | ">" | "<=" | ">=") addition)*
addition → multiplication (("+" | "-") multiplication)*
multiplication → unary (("*" | "/" | "%") unary)*
unary → ("-" | "not") unary | primary
primary → INT_LIT | STRING_LIT | IDENT | call | index | "(" expr ")"
Expression precedence is handled by the structure of the recursive calls — deeper calls bind tighter:
flowchart TB
or["or (lowest)"] --> and["and"]
and --> eq["== !="]
eq --> cmp["< > <= >="]
cmp --> add["+ -"]
add --> mul["* / %"]
mul --> unary["- not (highest)"]
unary --> primary["INT_LIT | IDENT | CALL\nINDEX | ( expr )"]
style or fill:#e94560,color:#fff
style and fill:#c23666,color:#fff
style eq fill:#a0306a,color:#fff
style cmp fill:#7e2a6e,color:#fff
style add fill:#5c2472,color:#fff
style mul fill:#3a1e76,color:#fff
style unary fill:#18187a,color:#fff
style primary fill:#0f3460,color:#fff
How a + b * c is parsed (the tree structure enforces * binds tighter than +):
flowchart TB
plus["+"] --> a["a"]
plus --> mul["*"]
mul --> b["b"]
mul --> c["c"]
style plus fill:#5c2472,color:#fff
style mul fill:#3a1e76,color:#fff
style a fill:#533483,color:#fff
style b fill:#533483,color:#fff
style c fill:#533483,color:#fff
Error recovery: On a parse error, the parser records the error and skips tokens until it finds a safe synchronization point (next statement keyword or }), then continues parsing. This lets it report multiple errors in a single pass.
Use --ast to see the tree (with colored visualization — see AST Visualization):
Program
├── fn main() : void [line 1]
│ └── Block
│ └── print [line 2]
│ └── "Hello, World!"
File: semantic/analyzer.go, semantic/symtable.go
The semantic analyzer walks the AST and checks that the program makes sense beyond just syntax. It builds and uses a symbol table to track declarations.
What it checks:
| Check | Example Error |
|---|---|
| Undeclared variables | x = 5 without let x : int |
| Duplicate declarations | Two let x : int in the same scope |
| Undeclared functions | Calling foo() when no fn foo exists |
| Argument count mismatch | fn add(a:int, b:int) called as add(1) |
| Array misuse | Indexing a non-array, or assigning to whole array |
| Return type violations | Returning a value from a void function |
Missing main() |
Every program must have a fn main() |
Symbol Table — scoped chain with lookup walking outward:
flowchart TB
subgraph global ["Global Scope"]
gcd["gcd\n(func, 2 params)"]
main_fn["main\n(func, 0 params)"]
end
subgraph gcd_scope ["gcd's Scope"]
a["a : int\n(param)"]
b["b : int\n(param)"]
temp["temp : int\n(var)"]
end
subgraph main_scope ["main's Scope"]
x["x : int\n(var)"]
y["y : int\n(var)"]
result["result : int\n(var)"]
end
gcd_scope -- "parent" --> global
main_scope -- "parent" --> global
style global fill:#1a1a2e,stroke:#533483,color:#e0e0e0
style gcd_scope fill:#0f3460,stroke:#533483,color:#fff
style main_scope fill:#0f3460,stroke:#533483,color:#fff
style gcd fill:#533483,color:#fff
style main_fn fill:#533483,color:#fff
style a fill:#1a8a42,color:#fff
style b fill:#1a8a42,color:#fff
style temp fill:#1a8a42,color:#fff
style x fill:#1a8a42,color:#fff
style y fill:#1a8a42,color:#fff
style result fill:#1a8a42,color:#fff
Variable resolution: When the analyzer encounters a name, it checks the current scope first. If not found, it walks up through parent scopes until it reaches the global scope. If still not found, it reports an "undeclared variable" error.
File: irgen/irgen.go, irgen/tac.go
The IR generator walks the validated AST and produces Three-Address Code (TAC) — a linear, platform-independent intermediate representation where each instruction has at most three operands.
// Source
let result : int = a + b * c
// Three-Address Code
_t0 = b MUL c
_t1 = a ADD _t0
result = _t1
TAC instruction categories:
| Category | Instructions |
|---|---|
| Arithmetic | ADD, SUB, MUL, DIV, MOD, NEG |
| Comparison | EQ, NE, LT, GT, LE, GE |
| Logical | AND, OR, NOT |
| Control flow | LABEL, GOTO, IF_FALSE_GOTO |
| Functions | FUNC_BEGIN, FUNC_END, PARAM, CALL, RETURN |
| I/O | PRINT_INT, PRINT_STR, PRINT_NEWLINE, READ_INT |
| Arrays | ARRAY_DECL, ARRAY_STORE, ARRAY_LOAD |
| Assignment | ASSIGN |
Control flow — how while is translated:
flowchart TB
subgraph Source ["Kira Source"]
src["while x > 0 {\n x = x - 1\n}"]
end
subgraph TAC ["Three-Address Code"]
l0["L0: (loop start)"]
cmp["_t0 = x GT 0"]
branch["IF_FALSE _t0 GOTO L1"]
sub["_t1 = x SUB 1"]
assign["x = _t1"]
jmp["GOTO L0"]
l1["L1: (loop end)"]
l0 --> cmp --> branch --> sub --> assign --> jmp
jmp -.-> l0
branch -. "false" .-> l1
end
Source --> TAC
style l0 fill:#c4a000,color:#000
style l1 fill:#c4a000,color:#000
style branch fill:#e94560,color:#fff
style jmp fill:#e94560,color:#fff
style cmp fill:#0f3460,color:#fff
style sub fill:#0f3460,color:#fff
style assign fill:#0f3460,color:#fff
How if/else is translated:
flowchart TB
subgraph TAC ["Three-Address Code"]
cond["_t0 = (condition)"]
branch["IF_FALSE _t0 GOTO L_else"]
then_body["... then block ..."]
jmp["GOTO L_end"]
else_label["L_else:"]
else_body["... else block ..."]
end_label["L_end:"]
cond --> branch
branch -- "true" --> then_body --> jmp --> end_label
branch -. "false" .-> else_label --> else_body --> end_label
end
style branch fill:#e94560,color:#fff
style jmp fill:#e94560,color:#fff
style else_label fill:#c4a000,color:#000
style end_label fill:#c4a000,color:#000
style then_body fill:#1a8a42,color:#fff
style else_body fill:#0f3460,color:#fff
style cond fill:#533483,color:#fff
Use --ir to see the intermediate code:
$ ./kira --ir examples/hello.kira
Three-Address Code:
FUNC_BEGIN main
PRINT_STR "Hello, World!"
PRINT_NEWLINE
RETURN
FUNC_END main
File: codegen/codegen.go
The code generator translates TAC instructions into equivalent C source code. Each TAC instruction maps to one or more C statements.
flowchart LR
subgraph TAC ["Three-Address Code"]
direction TB
t1["FUNC_BEGIN main"]
t2["PRINT_STR 'Hello'"]
t3["_t0 = a ADD b"]
t4["IF_FALSE _t0 GOTO L1"]
t5["RETURN"]
t6["FUNC_END main"]
end
subgraph C ["Generated C"]
direction TB
c1["int main(void) {"]
c2["printf(\"%s\", \"Hello\");"]
c3["_t0 = a + b;"]
c4["if (!_t0) goto L1;"]
c5["return 0;"]
c6["}"]
end
t1 --> c1
t2 --> c2
t3 --> c3
t4 --> c4
t5 --> c5
t6 --> c6
style TAC fill:#1a1a2e,stroke:#533483,color:#e0e0e0
style C fill:#1a1a2e,stroke:#1a8a42,color:#e0e0e0
| TAC | Generated C |
|---|---|
FUNC_BEGIN main |
int main(void) { |
result = a ADD b |
result = a + b; |
IF_FALSE _t0 GOTO L1 |
if (!_t0) goto L1; |
LABEL L1 |
L1:; |
PRINT_INT x |
printf("%lld", x); |
READ_INT x |
scanf("%lld", &x); |
ARRAY_DECL arr 10 |
long long arr[10]; |
_t0 = arr ARRAY_LOAD 3 |
_t0 = arr[3]; |
CALL result add 2 |
result = add(x, y); |
Generated C for Hello World:
#include <stdio.h>
#include <stdlib.h>
int main(void) {
printf("%s", "Hello, World!");
printf("\n");
return 0;
}The generated C is then compiled with gcc to produce a native binary:
gcc -o hello examples/hello.c && ./helloThe compiler continues analyzing even when errors are found, collecting all errors from every phase. If errors exist, no target code is generated, and a listing file (.lst) is produced.
flowchart LR
subgraph Phases
L["Lexer"] --> P["Parser"] --> S["Semantic"]
end
L -. "error" .-> EC["Error\nCollector"]
P -. "error" .-> EC
S -. "error" .-> EC
EC --> check{"Any\nerrors?"}
check -- "yes" --> lst["Listing File (.lst)\nSource + interleaved errors\n❌ No code generated"]
check -- "no" --> gen["IR → CodeGen\n✅ output.c generated"]
style EC fill:#c4a000,color:#000
style lst fill:#e94560,color:#fff
style gen fill:#1a8a42,color:#fff
style check fill:#533483,color:#fff
The listing file shows every source line numbered, with error messages interleaved directly below the offending line:
=== LISTING FILE ===
1 | fn main() {
2 | let x : int = 10
3 | y = 20
*** [Semantic] Error at line 3, col 0: undeclared variable 'y'
4 | print "Value: ", z
*** [Semantic] Error at line 4, col 0: undeclared variable 'z'
5 | let x : int = 30
*** [Semantic] Error at line 5, col 0: 'x' is already declared in this scope
6 | foo(1, 2, 3)
*** [Semantic] Error at line 6, col 0: undeclared function 'foo'
7 | }
=== END LISTING ===
4 error(s) found. No target code generated.
Each error includes:
- The phase that caught it (Lexer, Parser, or Semantic)
- The line and column number
- A descriptive message
File: ast_viz/visualizer.go
The compiler includes a colored, tree-structured AST visualizer using Unicode box-drawing characters. This makes it easy to understand program structure at a glance.
$ ./kira --ast examples/factorial.kira
Program
├── fn factorial(n:int) : int [line 1]
│ └── Block
│ ├── if [line 2]
│ │ ├── cond: <=
│ │ │ ├── n
│ │ │ └── 1
│ │ └── Block
│ │ └── return [line 3]
│ │ └── 1
│ └── return [line 5]
│ └── *
│ ├── n
│ └── call factorial()
│ └── arg0: -
│ ├── n
│ └── 1
└── fn main() : void [line 8]
└── Block
├── let num : int [line 9]
│ └── = 0
├── print [line 10]
│ └── "Enter a number:"
├── read num [line 11]
├── let result : int [line 12]
│ └── = call factorial()
│ └── arg0: num
└── print [line 13]
├── "Factorial: "
└── result
Color scheme in terminal:
flowchart LR
subgraph Legend
cyan["Cyan\nProgram, Operators\n+ * == != < >"]
blue["Blue\nKeywords\nfn let if while print return"]
yellow["Yellow\nIdentifiers\nvariable & function names"]
green["Green\nLiterals & Types\n42 int"]
red["Red\nStrings\n'Hello, World!'"]
purple["Purple\nStructure\nBlock, true, false"]
gray["Gray\nMetadata\n[line 5]"]
end
style cyan fill:#00bcd4,color:#000
style blue fill:#2196f3,color:#fff
style yellow fill:#ffc107,color:#000
style green fill:#4caf50,color:#fff
style red fill:#f44336,color:#fff
style purple fill:#9c27b0,color:#fff
style gray fill:#9e9e9e,color:#000
# Build the compiler
go build -o kira .# Compile a .kira file → generates .c file
./kira program.kira
# Compile with custom output path
./kira -o output.c program.kira
# Build the generated C into a binary
gcc -o program program.c && ./program# Show token stream from lexer
./kira --tokens program.kira
# Show AST with tree visualization
./kira --ast program.kira
# Show three-address code IR
./kira --ir program.kira
# Show everything
./kira --tokens --ast --ir program.kiraflowchart LR
write["Write\nhello.kira"] --> compile["./kira\nhello.kira"]
compile --> gcc["gcc -o hello\nhello.c"]
gcc --> run["./hello"]
run --> output["Hello from Kira!"]
style write fill:#533483,color:#fff
style compile fill:#0f3460,color:#fff
style gcc fill:#1a8a42,color:#fff
style run fill:#e94560,color:#fff
style output fill:#c4a000,color:#000
# Write a program
cat > hello.kira << 'EOF'
fn main() {
print "Hello from Kira!"
}
EOF
# Compile → build → run
./kira hello.kira && gcc -o hello hello.c && ./helloflowchart TB
main["main.go\nCLI Entry Point"] --> lexer_pkg
main --> parser_pkg
main --> semantic_pkg
main --> irgen_pkg
main --> codegen_pkg
main --> listing_pkg
main --> ast_viz_pkg
subgraph lexer_pkg ["lexer/"]
token["token.go\nToken types & keywords"]
lex["lexer.go\nScanner"]
end
subgraph parser_pkg ["parser/"]
ast["ast.go\nAST node definitions"]
parse["parser.go\nRecursive descent"]
end
subgraph semantic_pkg ["semantic/"]
sym["symtable.go\nScoped symbol table"]
analyze["analyzer.go\nType & scope checking"]
end
subgraph irgen_pkg ["irgen/"]
tac["tac.go\nTAC instruction defs"]
ir["irgen.go\nAST → TAC"]
end
subgraph codegen_pkg ["codegen/"]
cg["codegen.go\nTAC → C translation"]
end
subgraph listing_pkg ["listing/"]
lst["listing.go\nListing file generator"]
end
subgraph ast_viz_pkg ["ast_viz/"]
viz["visualizer.go\nColored tree printer"]
end
subgraph errors_pkg ["errors/"]
errs["errors.go\nError collector"]
end
lexer_pkg --> errors_pkg
parser_pkg --> errors_pkg
semantic_pkg --> errors_pkg
main --> errors_pkg
style main fill:#e94560,color:#fff
style lexer_pkg fill:#0f3460,stroke:#533483,color:#fff
style parser_pkg fill:#0f3460,stroke:#533483,color:#fff
style semantic_pkg fill:#0f3460,stroke:#533483,color:#fff
style irgen_pkg fill:#0f3460,stroke:#533483,color:#fff
style codegen_pkg fill:#0f3460,stroke:#533483,color:#fff
style listing_pkg fill:#0f3460,stroke:#533483,color:#fff
style ast_viz_pkg fill:#0f3460,stroke:#533483,color:#fff
style errors_pkg fill:#c4a000,stroke:#c4a000,color:#000
fn main() {
print "Hello, World!"
}
fn factorial(n : int) : int {
if n <= 1 {
return 1
}
return n * factorial(n - 1)
}
fn main() {
let num : int = 0
print "Enter a number:"
read num
let result : int = factorial(num)
print "Factorial: ", result
}
fn gcd(a : int, b : int) : int {
while b != 0 {
let temp : int = b
b = a % b
a = temp
}
return a
}
fn main() {
let x : int = 0
let y : int = 0
print "Enter two numbers:"
read x
read y
let result : int = gcd(x, y)
print "GCD is: ", result
}
fn main() {
let n : int = 0
print "Enter number of Fibonacci terms:"
read n
let fib : int[50]
fib[0] = 0
fib[1] = 1
let i : int = 2
while i < n {
fib[i] = fib[i - 1] + fib[i - 2]
i = i + 1
}
let j : int = 0
while j < n {
print fib[j]
j = j + 1
}
}
- Language: Go 1.22 (zero external dependencies)
- Target: C (compiled via
gcc) - Parser: Hand-written recursive descent
- IR: Three-address code (TAC)
Built as a Compiler Design course project.