Skip to content

pixperk/cd_assgn

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Kira Language Compiler

A complete compiler pipeline for Kira, a minimal imperative programming language, built entirely in Go with zero external dependencies.

Kira compiles to C, which is then compiled to a native binary using gcc. The compiler implements all five phases of classical compilation: lexical analysis, parsing, semantic analysis, intermediate code generation, and target code generation.

flowchart TB
    src["source.kira"] --> lexer

    subgraph compiler ["Kira Compiler"]
        lexer["Phase 1: Lexer\n(Scanner)"]
        parser["Phase 2: Parser\n(Syntax Analysis)"]
        semantic["Phase 3: Semantic\nAnalyzer"]
        irgen["Phase 4: IR Generator\n(Three-Address Code)"]
        codegen["Phase 5: Code\nGenerator"]

        lexer -- "Token Stream" --> parser
        parser -- "AST" --> semantic
        semantic -- "Validated AST" --> irgen
        irgen -- "TAC Instructions" --> codegen
    end

    codegen --> cfile["output.c"]
    cfile --> gcc["gcc"]
    gcc --> bin["Native Binary"]

    errs["errors.lst\n(Listing File)"]
    lexer -. "errors" .-> errs
    parser -. "errors" .-> errs
    semantic -. "errors" .-> errs

    style compiler fill:#1a1a2e,stroke:#16213e,color:#e0e0e0
    style lexer fill:#0f3460,stroke:#533483,color:#fff
    style parser fill:#0f3460,stroke:#533483,color:#fff
    style semantic fill:#0f3460,stroke:#533483,color:#fff
    style irgen fill:#0f3460,stroke:#533483,color:#fff
    style codegen fill:#0f3460,stroke:#533483,color:#fff
    style src fill:#533483,stroke:#533483,color:#fff
    style cfile fill:#1a8a42,stroke:#1a8a42,color:#fff
    style gcc fill:#1a8a42,stroke:#1a8a42,color:#fff
    style bin fill:#e94560,stroke:#e94560,color:#fff
    style errs fill:#c4a000,stroke:#c4a000,color:#000
Loading

Table of Contents


Quick Start

# Build the compiler
go build -o kira .

# Compile a Kira program
./kira examples/hello.kira

# Build and run the generated C code
gcc -o hello examples/hello.c && ./hello
# Output: Hello, World!

That's it — three commands from source to running binary.


The Kira Language

Kira is a statically typed, imperative language designed to be simple enough to compile in a weekend, yet expressive enough to write real algorithms.

Types

Type Description Example
int 64-bit signed integer let x : int = 42
int[N] Fixed-size integer array let arr : int[10]
string String literals (print only) "hello"

Operators

Category Operators Precedence (high to low)
Unary - (negate), not Highest
Multiply *, /, %
Add +, -
Compare <, >, <=, >=
Equality ==, !=
Logical and, or Lowest

Syntax at a Glance

// Functions
fn add(a : int, b : int) : int {
    return a + b
}

// Entry point (required)
fn main() {
    // Variables
    let x : int = 10
    let name : int = 0

    // I/O
    print "Enter a number: "
    read name
    print "You entered: ", name

    // Control flow
    if x > 5 {
        print "big"
    } else {
        print "small"
    }

    // Loops
    let i : int = 0
    while i < 10 {
        print i
        i = i + 1
    }

    // Arrays
    let arr : int[5]
    arr[0] = 42
    print arr[0]

    // Function calls
    let sum : int = add(3, 4)
    print "Sum: ", sum
}

Keywords

let   if   else   while   fn   return   print   read
int   and  or     not     true false

Comments

// This is a comment (everything after // until end of line)

Strings

Strings support escape sequences: \", \\, \n, \t.

print "Hello,\tWorld!\n"

Compiler Phases

Phase 1: Lexical Analysis

File: lexer/lexer.go, lexer/token.go

The lexer (scanner) reads the raw source code character by character and breaks it into a stream of tokens — the smallest meaningful units of the language.

flowchart LR
    subgraph Source
        src["let x : int = 42"]
    end

    subgraph Tokens
        direction LR
        t1["LET\n'let'"]
        t2["IDENT\n'x'"]
        t3["COLON\n':'"]
        t4["INT\n'int'"]
        t5["ASSIGN\n'='"]
        t6["INT_LIT\n'42'"]
    end

    src --> t1 --> t2 --> t3 --> t4 --> t5 --> t6

    style t1 fill:#0f3460,color:#fff
    style t2 fill:#533483,color:#fff
    style t3 fill:#16213e,color:#fff
    style t4 fill:#0f3460,color:#fff
    style t5 fill:#16213e,color:#fff
    style t6 fill:#1a8a42,color:#fff
Loading

What it handles:

  • Identifiers & Keywords — scans letter sequences, checks against the keyword table
  • Integer literals — sequences of digits
  • String literals"..." with escape sequence support (\n, \t, \\, \")
  • Operators — single-char (+, -, *) and two-char (==, !=, <=, >=)
  • Comments// to end of line, stripped from token stream
  • Whitespace — consumed and discarded, but line/column tracked for error reporting
  • Error recovery — invalid characters are flagged but scanning continues

Lexer state machine:

stateDiagram-v2
    [*] --> Start
    Start --> Identifier : letter/underscore
    Start --> Number : digit
    Start --> String : double quote
    Start --> Operator : + - * / = < > !
    Start --> Comment : //
    Start --> Start : whitespace (skip)

    Identifier --> Identifier : letter/digit
    Identifier --> KeywordCheck : non-alphanumeric
    KeywordCheck --> [*] : emit KEYWORD or IDENT

    Number --> Number : digit
    Number --> [*] : non-digit → emit INT_LIT

    String --> String : any char
    String --> Escape : backslash
    Escape --> String : n, t, \\, \"
    String --> [*] : closing quote → emit STRING_LIT

    Operator --> [*] : single-char op → emit token
    Operator --> [*] : two-char op (==, !=, <=, >=)

    Comment --> Comment : any char
    Comment --> Start : newline
Loading

Use --tokens to see the token stream:

$ ./kira --tokens examples/hello.kira

Token Stream:
   FN           "fn"                  (line 1, col 1)
   IDENT        "main"                (line 1, col 4)
   LPAREN       "("                   (line 1, col 8)
   RPAREN       ")"                   (line 1, col 9)
   LBRACE       "{"                   (line 1, col 11)
   PRINT        "print"               (line 2, col 5)
   STRING_LIT   "Hello, World!"       (line 2, col 11)
   RBRACE       "}"                   (line 3, col 1)
   EOF          ""                    (line 4, col 1)

Phase 2: Syntax Analysis (Parsing)

File: parser/parser.go, parser/ast.go

The parser takes the flat token stream and builds a hierarchical Abstract Syntax Tree (AST) that represents the program's structure. It uses recursive descent parsing — each grammar rule becomes a function.

Grammar (simplified):

program        → func_decl*
func_decl      → "fn" IDENT "(" params? ")" (":" type)? block
block          → "{" statement* "}"
statement      → let_stmt | assign_stmt | if_stmt | while_stmt
               | print_stmt | read_stmt | return_stmt | expr_stmt
expression     → logic_or
logic_or       → logic_and ("or" logic_and)*
logic_and      → equality ("and" equality)*
equality       → comparison (("==" | "!=") comparison)*
comparison     → addition (("<" | ">" | "<=" | ">=") addition)*
addition       → multiplication (("+" | "-") multiplication)*
multiplication → unary (("*" | "/" | "%") unary)*
unary          → ("-" | "not") unary | primary
primary        → INT_LIT | STRING_LIT | IDENT | call | index | "(" expr ")"

Expression precedence is handled by the structure of the recursive calls — deeper calls bind tighter:

flowchart TB
    or["or (lowest)"] --> and["and"]
    and --> eq["== !="]
    eq --> cmp["< > <= >="]
    cmp --> add["+ -"]
    add --> mul["* / %"]
    mul --> unary["- not (highest)"]
    unary --> primary["INT_LIT | IDENT | CALL\nINDEX | ( expr )"]

    style or fill:#e94560,color:#fff
    style and fill:#c23666,color:#fff
    style eq fill:#a0306a,color:#fff
    style cmp fill:#7e2a6e,color:#fff
    style add fill:#5c2472,color:#fff
    style mul fill:#3a1e76,color:#fff
    style unary fill:#18187a,color:#fff
    style primary fill:#0f3460,color:#fff
Loading

How a + b * c is parsed (the tree structure enforces * binds tighter than +):

flowchart TB
    plus["+"] --> a["a"]
    plus --> mul["*"]
    mul --> b["b"]
    mul --> c["c"]

    style plus fill:#5c2472,color:#fff
    style mul fill:#3a1e76,color:#fff
    style a fill:#533483,color:#fff
    style b fill:#533483,color:#fff
    style c fill:#533483,color:#fff
Loading

Error recovery: On a parse error, the parser records the error and skips tokens until it finds a safe synchronization point (next statement keyword or }), then continues parsing. This lets it report multiple errors in a single pass.

Use --ast to see the tree (with colored visualization — see AST Visualization):

Program
├── fn main() : void [line 1]
│   └── Block
│       └── print [line 2]
│           └── "Hello, World!"

Phase 3: Semantic Analysis

File: semantic/analyzer.go, semantic/symtable.go

The semantic analyzer walks the AST and checks that the program makes sense beyond just syntax. It builds and uses a symbol table to track declarations.

What it checks:

Check Example Error
Undeclared variables x = 5 without let x : int
Duplicate declarations Two let x : int in the same scope
Undeclared functions Calling foo() when no fn foo exists
Argument count mismatch fn add(a:int, b:int) called as add(1)
Array misuse Indexing a non-array, or assigning to whole array
Return type violations Returning a value from a void function
Missing main() Every program must have a fn main()

Symbol Table — scoped chain with lookup walking outward:

flowchart TB
    subgraph global ["Global Scope"]
        gcd["gcd\n(func, 2 params)"]
        main_fn["main\n(func, 0 params)"]
    end

    subgraph gcd_scope ["gcd's Scope"]
        a["a : int\n(param)"]
        b["b : int\n(param)"]
        temp["temp : int\n(var)"]
    end

    subgraph main_scope ["main's Scope"]
        x["x : int\n(var)"]
        y["y : int\n(var)"]
        result["result : int\n(var)"]
    end

    gcd_scope -- "parent" --> global
    main_scope -- "parent" --> global

    style global fill:#1a1a2e,stroke:#533483,color:#e0e0e0
    style gcd_scope fill:#0f3460,stroke:#533483,color:#fff
    style main_scope fill:#0f3460,stroke:#533483,color:#fff
    style gcd fill:#533483,color:#fff
    style main_fn fill:#533483,color:#fff
    style a fill:#1a8a42,color:#fff
    style b fill:#1a8a42,color:#fff
    style temp fill:#1a8a42,color:#fff
    style x fill:#1a8a42,color:#fff
    style y fill:#1a8a42,color:#fff
    style result fill:#1a8a42,color:#fff
Loading

Variable resolution: When the analyzer encounters a name, it checks the current scope first. If not found, it walks up through parent scopes until it reaches the global scope. If still not found, it reports an "undeclared variable" error.


Phase 4: Intermediate Code Generation

File: irgen/irgen.go, irgen/tac.go

The IR generator walks the validated AST and produces Three-Address Code (TAC) — a linear, platform-independent intermediate representation where each instruction has at most three operands.

// Source
let result : int = a + b * c
// Three-Address Code
_t0 = b MUL c
_t1 = a ADD _t0
result = _t1

TAC instruction categories:

Category Instructions
Arithmetic ADD, SUB, MUL, DIV, MOD, NEG
Comparison EQ, NE, LT, GT, LE, GE
Logical AND, OR, NOT
Control flow LABEL, GOTO, IF_FALSE_GOTO
Functions FUNC_BEGIN, FUNC_END, PARAM, CALL, RETURN
I/O PRINT_INT, PRINT_STR, PRINT_NEWLINE, READ_INT
Arrays ARRAY_DECL, ARRAY_STORE, ARRAY_LOAD
Assignment ASSIGN

Control flow — how while is translated:

flowchart TB
    subgraph Source ["Kira Source"]
        src["while x > 0 {\n    x = x - 1\n}"]
    end

    subgraph TAC ["Three-Address Code"]
        l0["L0: (loop start)"]
        cmp["_t0 = x GT 0"]
        branch["IF_FALSE _t0 GOTO L1"]
        sub["_t1 = x SUB 1"]
        assign["x = _t1"]
        jmp["GOTO L0"]
        l1["L1: (loop end)"]

        l0 --> cmp --> branch --> sub --> assign --> jmp
        jmp -.-> l0
        branch -. "false" .-> l1
    end

    Source --> TAC

    style l0 fill:#c4a000,color:#000
    style l1 fill:#c4a000,color:#000
    style branch fill:#e94560,color:#fff
    style jmp fill:#e94560,color:#fff
    style cmp fill:#0f3460,color:#fff
    style sub fill:#0f3460,color:#fff
    style assign fill:#0f3460,color:#fff
Loading

How if/else is translated:

flowchart TB
    subgraph TAC ["Three-Address Code"]
        cond["_t0 = (condition)"]
        branch["IF_FALSE _t0 GOTO L_else"]
        then_body["... then block ..."]
        jmp["GOTO L_end"]
        else_label["L_else:"]
        else_body["... else block ..."]
        end_label["L_end:"]

        cond --> branch
        branch -- "true" --> then_body --> jmp --> end_label
        branch -. "false" .-> else_label --> else_body --> end_label
    end

    style branch fill:#e94560,color:#fff
    style jmp fill:#e94560,color:#fff
    style else_label fill:#c4a000,color:#000
    style end_label fill:#c4a000,color:#000
    style then_body fill:#1a8a42,color:#fff
    style else_body fill:#0f3460,color:#fff
    style cond fill:#533483,color:#fff
Loading

Use --ir to see the intermediate code:

$ ./kira --ir examples/hello.kira

Three-Address Code:
   FUNC_BEGIN main
     PRINT_STR "Hello, World!"
     PRINT_NEWLINE
     RETURN
   FUNC_END main

Phase 5: Code Generation

File: codegen/codegen.go

The code generator translates TAC instructions into equivalent C source code. Each TAC instruction maps to one or more C statements.

flowchart LR
    subgraph TAC ["Three-Address Code"]
        direction TB
        t1["FUNC_BEGIN main"]
        t2["PRINT_STR 'Hello'"]
        t3["_t0 = a ADD b"]
        t4["IF_FALSE _t0 GOTO L1"]
        t5["RETURN"]
        t6["FUNC_END main"]
    end

    subgraph C ["Generated C"]
        direction TB
        c1["int main(void) {"]
        c2["printf(\"%s\", \"Hello\");"]
        c3["_t0 = a + b;"]
        c4["if (!_t0) goto L1;"]
        c5["return 0;"]
        c6["}"]
    end

    t1 --> c1
    t2 --> c2
    t3 --> c3
    t4 --> c4
    t5 --> c5
    t6 --> c6

    style TAC fill:#1a1a2e,stroke:#533483,color:#e0e0e0
    style C fill:#1a1a2e,stroke:#1a8a42,color:#e0e0e0
Loading
TAC Generated C
FUNC_BEGIN main int main(void) {
result = a ADD b result = a + b;
IF_FALSE _t0 GOTO L1 if (!_t0) goto L1;
LABEL L1 L1:;
PRINT_INT x printf("%lld", x);
READ_INT x scanf("%lld", &x);
ARRAY_DECL arr 10 long long arr[10];
_t0 = arr ARRAY_LOAD 3 _t0 = arr[3];
CALL result add 2 result = add(x, y);

Generated C for Hello World:

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    printf("%s", "Hello, World!");
    printf("\n");
    return 0;
}

The generated C is then compiled with gcc to produce a native binary:

gcc -o hello examples/hello.c && ./hello

Error Handling & Listing File

The compiler continues analyzing even when errors are found, collecting all errors from every phase. If errors exist, no target code is generated, and a listing file (.lst) is produced.

flowchart LR
    subgraph Phases
        L["Lexer"] --> P["Parser"] --> S["Semantic"]
    end

    L -. "error" .-> EC["Error\nCollector"]
    P -. "error" .-> EC
    S -. "error" .-> EC

    EC --> check{"Any\nerrors?"}
    check -- "yes" --> lst["Listing File (.lst)\nSource + interleaved errors\n❌ No code generated"]
    check -- "no" --> gen["IR → CodeGen\n✅ output.c generated"]

    style EC fill:#c4a000,color:#000
    style lst fill:#e94560,color:#fff
    style gen fill:#1a8a42,color:#fff
    style check fill:#533483,color:#fff
Loading

The listing file shows every source line numbered, with error messages interleaved directly below the offending line:

=== LISTING FILE ===

   1 | fn main() {
   2 |     let x : int = 10
   3 |     y = 20
       *** [Semantic] Error at line 3, col 0: undeclared variable 'y'
   4 |     print "Value: ", z
       *** [Semantic] Error at line 4, col 0: undeclared variable 'z'
   5 |     let x : int = 30
       *** [Semantic] Error at line 5, col 0: 'x' is already declared in this scope
   6 |     foo(1, 2, 3)
       *** [Semantic] Error at line 6, col 0: undeclared function 'foo'
   7 | }

=== END LISTING ===

4 error(s) found. No target code generated.

Each error includes:

  • The phase that caught it (Lexer, Parser, or Semantic)
  • The line and column number
  • A descriptive message

AST Visualization (Novelty)

File: ast_viz/visualizer.go

The compiler includes a colored, tree-structured AST visualizer using Unicode box-drawing characters. This makes it easy to understand program structure at a glance.

$ ./kira --ast examples/factorial.kira
Program
├── fn factorial(n:int) : int [line 1]
│   └── Block
│       ├── if [line 2]
│       │   ├── cond: <=
│       │   │   ├── n
│       │   │   └── 1
│       │   └── Block
│       │       └── return [line 3]
│       │           └── 1
│       └── return [line 5]
│           └── *
│               ├── n
│               └── call factorial()
│                   └── arg0: -
│                       ├── n
│                       └── 1
└── fn main() : void [line 8]
    └── Block
        ├── let num : int [line 9]
        │   └── = 0
        ├── print [line 10]
        │   └── "Enter a number:"
        ├── read num [line 11]
        ├── let result : int [line 12]
        │   └── = call factorial()
        │       └── arg0: num
        └── print [line 13]
            ├── "Factorial: "
            └── result

Color scheme in terminal:

flowchart LR
    subgraph Legend
        cyan["Cyan\nProgram, Operators\n+ * == != < >"]
        blue["Blue\nKeywords\nfn let if while print return"]
        yellow["Yellow\nIdentifiers\nvariable & function names"]
        green["Green\nLiterals & Types\n42 int"]
        red["Red\nStrings\n'Hello, World!'"]
        purple["Purple\nStructure\nBlock, true, false"]
        gray["Gray\nMetadata\n[line 5]"]
    end

    style cyan fill:#00bcd4,color:#000
    style blue fill:#2196f3,color:#fff
    style yellow fill:#ffc107,color:#000
    style green fill:#4caf50,color:#fff
    style red fill:#f44336,color:#fff
    style purple fill:#9c27b0,color:#fff
    style gray fill:#9e9e9e,color:#000
Loading

CLI Usage

# Build the compiler
go build -o kira .

Basic compilation

# Compile a .kira file → generates .c file
./kira program.kira

# Compile with custom output path
./kira -o output.c program.kira

# Build the generated C into a binary
gcc -o program program.c && ./program

Debug flags

# Show token stream from lexer
./kira --tokens program.kira

# Show AST with tree visualization
./kira --ast program.kira

# Show three-address code IR
./kira --ir program.kira

# Show everything
./kira --tokens --ast --ir program.kira

Full workflow

flowchart LR
    write["Write\nhello.kira"] --> compile["./kira\nhello.kira"]
    compile --> gcc["gcc -o hello\nhello.c"]
    gcc --> run["./hello"]
    run --> output["Hello from Kira!"]

    style write fill:#533483,color:#fff
    style compile fill:#0f3460,color:#fff
    style gcc fill:#1a8a42,color:#fff
    style run fill:#e94560,color:#fff
    style output fill:#c4a000,color:#000
Loading
# Write a program
cat > hello.kira << 'EOF'
fn main() {
    print "Hello from Kira!"
}
EOF

# Compile → build → run
./kira hello.kira && gcc -o hello hello.c && ./hello

Project Structure

flowchart TB
    main["main.go\nCLI Entry Point"] --> lexer_pkg
    main --> parser_pkg
    main --> semantic_pkg
    main --> irgen_pkg
    main --> codegen_pkg
    main --> listing_pkg
    main --> ast_viz_pkg

    subgraph lexer_pkg ["lexer/"]
        token["token.go\nToken types & keywords"]
        lex["lexer.go\nScanner"]
    end

    subgraph parser_pkg ["parser/"]
        ast["ast.go\nAST node definitions"]
        parse["parser.go\nRecursive descent"]
    end

    subgraph semantic_pkg ["semantic/"]
        sym["symtable.go\nScoped symbol table"]
        analyze["analyzer.go\nType & scope checking"]
    end

    subgraph irgen_pkg ["irgen/"]
        tac["tac.go\nTAC instruction defs"]
        ir["irgen.go\nAST → TAC"]
    end

    subgraph codegen_pkg ["codegen/"]
        cg["codegen.go\nTAC → C translation"]
    end

    subgraph listing_pkg ["listing/"]
        lst["listing.go\nListing file generator"]
    end

    subgraph ast_viz_pkg ["ast_viz/"]
        viz["visualizer.go\nColored tree printer"]
    end

    subgraph errors_pkg ["errors/"]
        errs["errors.go\nError collector"]
    end

    lexer_pkg --> errors_pkg
    parser_pkg --> errors_pkg
    semantic_pkg --> errors_pkg
    main --> errors_pkg

    style main fill:#e94560,color:#fff
    style lexer_pkg fill:#0f3460,stroke:#533483,color:#fff
    style parser_pkg fill:#0f3460,stroke:#533483,color:#fff
    style semantic_pkg fill:#0f3460,stroke:#533483,color:#fff
    style irgen_pkg fill:#0f3460,stroke:#533483,color:#fff
    style codegen_pkg fill:#0f3460,stroke:#533483,color:#fff
    style listing_pkg fill:#0f3460,stroke:#533483,color:#fff
    style ast_viz_pkg fill:#0f3460,stroke:#533483,color:#fff
    style errors_pkg fill:#c4a000,stroke:#c4a000,color:#000
Loading

Examples

Hello World

fn main() {
    print "Hello, World!"
}

Recursive Factorial

fn factorial(n : int) : int {
    if n <= 1 {
        return 1
    }
    return n * factorial(n - 1)
}

fn main() {
    let num : int = 0
    print "Enter a number:"
    read num
    let result : int = factorial(num)
    print "Factorial: ", result
}

GCD (Euclidean Algorithm)

fn gcd(a : int, b : int) : int {
    while b != 0 {
        let temp : int = b
        b = a % b
        a = temp
    }
    return a
}

fn main() {
    let x : int = 0
    let y : int = 0
    print "Enter two numbers:"
    read x
    read y
    let result : int = gcd(x, y)
    print "GCD is: ", result
}

Fibonacci with Arrays

fn main() {
    let n : int = 0
    print "Enter number of Fibonacci terms:"
    read n
    let fib : int[50]
    fib[0] = 0
    fib[1] = 1

    let i : int = 2
    while i < n {
        fib[i] = fib[i - 1] + fib[i - 2]
        i = i + 1
    }

    let j : int = 0
    while j < n {
        print fib[j]
        j = j + 1
    }
}

Tech Stack

  • Language: Go 1.22 (zero external dependencies)
  • Target: C (compiled via gcc)
  • Parser: Hand-written recursive descent
  • IR: Three-address code (TAC)

Built as a Compiler Design course project.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages