bencz/anvil

ANVIL

A C library for compiler code generation with support for multiple architectures through a portable intermediate representation (IR) system.

Features

  • Portable IR: Architecture-independent intermediate representation
  • Multiple Backends: Support for x86, x86-64, S/370, S/370-XA, S/390, z/Architecture, PowerPC 32/64-bit, ARM64
  • Assembly Output: Generates assembly text (HLASM for mainframes, GAS for x86/PPC)
  • IR Optimization: Configurable optimization passes (constant folding, DCE, strength reduction)
  • CPU Model System: Target-specific code generation with CPU model selection and feature flags
  • Extensible: Plugin architecture for adding new backends
  • Opcode Ready: Design prepared for future binary code generation

Supported Architectures

| Architecture | Bits | Endianness | Stack Growth | FP Format | ABI | Syntax |
|---|---|---|---|---|---|---|
| x86 | 32 | Little | Down | IEEE 754 | System V | GAS/NASM |
| x86-64 | 64 | Little | Down | IEEE 754 | System V | GAS/NASM |
| S/370 | 24 | Big | Up | HFP | MVS | HLASM |
| S/370-XA | 31 | Big | Up | HFP | MVS | HLASM |
| S/390 | 31 | Big | Up | HFP | MVS | HLASM |
| z/Architecture | 64 | Big | Up | HFP+IEEE | MVS | HLASM |
| PowerPC 32 | 32 | Big | Down | IEEE 754 | System V | GAS |
| PowerPC 64 | 64 | Big | Down | IEEE 754 | System V | GAS |
| PowerPC 64 LE | 64 | Little | Down | IEEE 754 | System V | GAS |
| ARM64 (Linux) | 64 | Little | Down | IEEE 754 | System V | GAS |
| ARM64 (macOS) | 64 | Little | Down | IEEE 754 | Darwin | GAS |

Floating-Point Formats:

  • IEEE 754: Standard IEEE floating-point (binary)
  • HFP: IBM Hexadecimal Floating Point (base-16 exponent, used in S/370, S/390)
  • HFP+IEEE: Both formats supported (z/Architecture)
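
To make the HFP description concrete, here is a minimal sketch that decodes a 32-bit (short) HFP value into a host double. It assumes the standard short HFP layout (1 sign bit, 7-bit excess-64 exponent that is a power of 16, 24-bit fraction). The function name is hypothetical and is not part of ANVIL's API.

```c
#include <stdint.h>

/* Decode a 32-bit IBM HFP (short) value into a host double.
 * Layout assumed: 1 sign bit, 7-bit excess-64 exponent (power of 16),
 * 24-bit fraction with the radix point to its left.
 * Hypothetical helper for illustration; not an ANVIL function. */
double sketch_hfp32_to_double(uint32_t bits)
{
    int sign = (int)((bits >> 31) & 1u);
    int exponent = (int)((bits >> 24) & 0x7Fu) - 64;
    uint32_t fraction = bits & 0x00FFFFFFu;

    /* fraction / 2^24 lies in [0, 1) */
    double value = (double)fraction / 16777216.0;

    /* Scale by 16^exponent; powers of 16 are exact in binary floating point */
    for (int i = 0; i < exponent; i++)
        value *= 16.0;
    for (int i = 0; i > exponent; i--)
        value /= 16.0;

    return sign ? -value : value;
}
```

For example, the bit pattern 0x41100000 has exponent 1 and fraction 1/16, so it decodes to 1.0.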

OS ABI Variants:

  • System V: Standard Unix/Linux ABI
  • Darwin: macOS/Apple ABI (underscore prefix, Mach-O format)
  • MVS: IBM z/OS ABI
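
The symbol-naming side of these ABI variants can be sketched as follows. This is an illustration only: the enum and function names are hypothetical, and the uppercase rule for MVS is taken from the GCCMVS naming convention this README describes for mainframe targets.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical ABI tags for this sketch; not ANVIL's own enum. */
typedef enum {
    SKETCH_ABI_SYSV,
    SKETCH_ABI_DARWIN,
    SKETCH_ABI_MVS
} sketch_abi_t;

/* Write the assembler-level symbol for `name` into `out` (capacity `cap`).
 * Darwin gets a leading underscore, MVS is uppercased, System V is unchanged.
 * Returns 0 on success, -1 if the result does not fit. */
int sketch_mangle_symbol(sketch_abi_t abi, const char *name,
                         char *out, size_t cap)
{
    size_t len = strlen(name);
    size_t prefix = (abi == SKETCH_ABI_DARWIN) ? 1 : 0;

    if (len + prefix + 1 > cap)
        return -1;
    if (prefix)
        *out++ = '_';
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)name[i];
        out[i] = (abi == SKETCH_ABI_MVS) ? (char)toupper(c) : (char)c;
    }
    out[len] = '\0';
    return 0;
}
```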

Building

# Build library and examples
make

# Library only
make lib

# Examples only
make examples

# Advanced examples (fp_math_lib, dynamic_array)
make examples-advanced

# Test advanced examples
make test-examples-advanced

# Clean
make clean

# Clean advanced examples
make clean-examples-advanced

# Install (requires root)
sudo make install

Basic Usage

#include <stdio.h>   // printf
#include <stdlib.h>  // free
#include <anvil/anvil.h>

int main(void)
{
    // Create context
    anvil_ctx_t *ctx = anvil_ctx_create();
    
    // Set target architecture
    anvil_ctx_set_target(ctx, ANVIL_ARCH_ZARCH);
    
    // Create module
    anvil_module_t *mod = anvil_module_create(ctx, "my_module");
    
    // Create function type: int add(int a, int b)
    anvil_type_t *i32 = anvil_type_i32(ctx);
    anvil_type_t *params[] = { i32, i32 };
    anvil_type_t *func_type = anvil_type_func(ctx, i32, params, 2, false);
    
    // Create function
    anvil_func_t *func = anvil_func_create(mod, "add", func_type, ANVIL_LINK_EXTERNAL);
    
    // Set insertion point
    anvil_block_t *entry = anvil_func_get_entry(func);
    anvil_set_insert_point(ctx, entry);
    
    // Get parameters
    anvil_value_t *a = anvil_func_get_param(func, 0);
    anvil_value_t *b = anvil_func_get_param(func, 1);
    
    // Build IR: result = a + b
    anvil_value_t *result = anvil_build_add(ctx, a, b, "result");
    
    // Build IR: return result
    anvil_build_ret(ctx, result);
    
    // Generate code
    char *output = NULL;
    size_t len = 0;
    anvil_module_codegen(mod, &output, &len);
    
    printf("%s", output);
    
    // Cleanup
    free(output);
    anvil_module_destroy(mod);
    anvil_ctx_destroy(ctx);
    
    return 0;
}

Supported IR Operations

Arithmetic

  • anvil_build_add : Addition
  • anvil_build_sub : Subtraction
  • anvil_build_mul : Multiplication
  • anvil_build_sdiv / anvil_build_udiv : Division (signed/unsigned)
  • anvil_build_smod / anvil_build_umod : Modulo (signed/unsigned)
  • anvil_build_neg : Negation

Bitwise

  • anvil_build_and : AND
  • anvil_build_or : OR
  • anvil_build_xor : XOR
  • anvil_build_not : NOT
  • anvil_build_shl : Shift left
  • anvil_build_shr : Shift right (logical)
  • anvil_build_sar : Shift right (arithmetic)

Comparison

  • anvil_build_cmp_eq / anvil_build_cmp_ne : Equal / Not equal
  • anvil_build_cmp_lt / anvil_build_cmp_le : Less than / Less or equal
  • anvil_build_cmp_gt / anvil_build_cmp_ge : Greater than / Greater or equal
  • Unsigned versions: _ult, _ule, _ugt, _uge

Memory

  • anvil_build_alloca : Stack allocation
  • anvil_build_load : Load from memory
  • anvil_build_store : Store to memory
  • anvil_build_gep : Get Element Pointer (array indexing)
  • anvil_build_struct_gep : Get Struct Field Pointer
  • anvil_module_add_global : Add global variable

Control Flow

  • anvil_build_br : Unconditional branch
  • anvil_build_br_cond : Conditional branch
  • anvil_build_call : Function call
  • anvil_build_ret / anvil_build_ret_void : Return

Type Conversion

  • anvil_build_trunc : Truncate
  • anvil_build_zext : Zero extend
  • anvil_build_sext : Sign extend
  • anvil_build_bitcast : Bitcast
  • anvil_build_ptrtoint / anvil_build_inttoptr : Pointer/integer conversion

Floating-Point

  • anvil_build_fadd : FP Addition
  • anvil_build_fsub : FP Subtraction
  • anvil_build_fmul : FP Multiplication
  • anvil_build_fdiv : FP Division
  • anvil_build_fneg : FP Negation
  • anvil_build_fabs : FP Absolute value
  • anvil_build_fcmp : FP Comparison

FP Conversions

  • anvil_build_fptrunc : Truncate (f64 → f32)
  • anvil_build_fpext : Extend (f32 → f64)
  • anvil_build_fptosi : FP to signed integer
  • anvil_build_fptoui : FP to unsigned integer
  • anvil_build_sitofp : Signed integer to FP
  • anvil_build_uitofp : Unsigned integer to FP

Miscellaneous

  • anvil_build_phi : PHI node
  • anvil_build_select : Select (ternary)

Supported Types

  • Integers: i8, i16, i32, i64 (signed)
  • Integers: u8, u16, u32, u64 (unsigned)
  • Floating point: f32, f64
  • Pointers: anvil_type_ptr(ctx, pointee_type)
  • Arrays: anvil_type_array(ctx, elem_type, count)
  • Structs: anvil_type_struct(ctx, name, fields, num_fields)
  • Functions: anvil_type_func(ctx, ret_type, params, num_params, variadic)

Calling Conventions

| Architecture | Convention | Description |
|---|---|---|
| x86 | CDECL | Parameters on stack, caller cleanup |
| x86-64 | System V | RDI, RSI, RDX, RCX, R8, R9, then stack |
| S/370 | MVS | R1 points to parameter list |
| S/390 | MVS | R1 points to parameter list |
| z/Arch | z/OS 64-bit | R1 points to parameter list (64-bit) |
| PPC32 | System V | r3-r10 for args, r3 for return |
| PPC64 BE | ELFv1 | r3-r10 for args, function descriptors |
| PPC64 LE | ELFv2 | r3-r10 for args, local entry points |
| ARM64 (Linux) | AAPCS64 | x0-x7 for args, x0 for return |
| ARM64 (macOS) | Apple ARM64 | x0-x7 for args, underscore prefix on symbols |

Mainframe Notes

GCCMVS Compatibility

ANVIL generates code compatible with GCCMVS conventions:

  • CSECT: Blank (no module name prefix)
  • AMODE/RMODE: AMODE ANY, RMODE ANY for maximum flexibility
  • Function Names: UPPERCASE (e.g., FACTORIAL, SUM_ARRAY)
  • Stack Allocation: Direct stack offset from R13 (no GETMAIN/FREEMAIN)
  • VL Bit: NOT cleared, allowing full 31/64-bit addressing

Stack Direction

Unlike x86 where the stack grows downward (toward lower addresses), IBM mainframes grow the stack upward (toward higher addresses). ANVIL handles this automatically.

Save Areas

Mainframes use chained save areas instead of push/pop on the stack:

  • S/370/S/390: 72 bytes (18 fullwords of 4 bytes)
  • z/Architecture: 144 bytes (18 doublewords of 8 bytes)
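
The sizes above can be checked with a small C model of the save areas. This is a sketch: the 32-bit field order follows the conventional 18-fullword save area layout, the field names are illustrative, and the 64-bit variant is only sized (18 doublewords) because the README does not spell out its slot assignment. The helper mirroring `LA R2,72(,R13)` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Conventional 72-byte save area: 18 fullwords. Names are illustrative. */
typedef struct {
    uint32_t reserved;       /* word 0 */
    uint32_t back_chain;     /* word 1: caller's save area */
    uint32_t forward_chain;  /* word 2: callee's save area */
    uint32_t r14;            /* word 3: return address */
    uint32_t r15;            /* word 4: entry point address */
    uint32_t r0_to_r12[13];  /* words 5-17 */
} sketch_save_area32_t;

/* 64-bit variant, sized only: 18 doublewords. */
typedef struct {
    uint64_t slot[18];
} sketch_save_area64_t;

static_assert(sizeof(sketch_save_area32_t) == 72, "18 fullwords");
static_assert(sizeof(sketch_save_area64_t) == 144, "18 doublewords");
static_assert(offsetof(sketch_save_area32_t, r14) == 12, "R14 at offset 12");

/* Address just past the current save area, where a frame that is
 * allocated by offset from R13 (rather than GETMAIN) would begin. */
uint32_t sketch_next_frame(uint32_t r13)
{
    return r13 + (uint32_t)sizeof(sketch_save_area32_t);
}
```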

Stack-Based Code Generation

The mainframe backends generate efficient stack-based code:

  • Stack frame allocation via LA R2,72(,R13) (no GETMAIN overhead)
  • Proper save area chaining
  • Thread-safe execution
  • Simplified epilogue (no FREEMAIN cleanup)

HLASM Output

Generated mainframe code is in HLASM (High Level Assembler) format:

  • Labels in columns 1-8
  • Opcodes starting at column 10
  • Operands starting at column 16
  • Comments with asterisk in column 1
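
The column rules above map directly onto a padded format string. The following sketch (the function name is hypothetical, not ANVIL's emitter) pads the label to 8 characters and the opcode to 5, so the opcode starts in column 10 and the operands in column 16. It does not handle labels longer than 8 or opcodes longer than 5 characters, which would push the later fields to the right.

```c
#include <stdio.h>

/* Format one HLASM statement: label in columns 1-8, opcode from
 * column 10, operands from column 16.
 * Returns 0 on success, -1 if the line does not fit in `out`. */
int sketch_hlasm_line(char *out, size_t cap, const char *label,
                      const char *opcode, const char *operands)
{
    int n = snprintf(out, cap, "%-8s %-5s %s", label, opcode, operands);
    return (n >= 0 && (size_t)n < cap) ? 0 : -1;
}
```

Calling it with an empty label, opcode `LA`, and operands `R15,8` yields a line shaped like the assembly listings elsewhere in this README.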

Adding New Backends

To add support for a new architecture:

  1. Create a new file at src/backend/<arch>/<arch>.c

  2. Implement the anvil_backend_ops_t structure:

const anvil_backend_ops_t anvil_backend_myarch = {
    .name = "MyArch",
    .arch = ANVIL_ARCH_MYARCH,
    .init = myarch_init,
    .cleanup = myarch_cleanup,
    .reset = myarch_reset,      // Clear cached IR pointers (optional but recommended)
    .prepare_ir = myarch_prepare_ir,  // Prepare/lower IR before codegen (optional)
    .codegen_module = myarch_codegen_module,
    .codegen_func = myarch_codegen_func,
    .get_arch_info = myarch_get_arch_info
};

  3. Add the architecture to anvil.h:

typedef enum {
    // ...
    ANVIL_ARCH_MYARCH,
    ANVIL_ARCH_COUNT
} anvil_arch_t;

  4. Register the backend in backend.c:

anvil_register_backend(&anvil_backend_myarch);

Recent Updates

Backend IR Preparation Phase

A new optional prepare_ir callback in the backend interface allows architecture-specific IR preparation before code generation:

  • IR Lowering: Convert unsupported operations to sequences of supported ones
  • Peephole Optimizations: Target-specific optimizations on IR level
  • Type Legalization: Split 64-bit ops on 32-bit targets, etc.
  • Function Analysis: Detect leaf functions, calculate stack frame layout

The ARM64 backend now uses prepare_ir to analyze all functions before code generation.

Struct Support

  • Struct field access via anvil_build_struct_gep() for all mainframe backends
  • Automatic field offset calculation at compile time
  • Efficient LA (Load Address) instruction for small offsets
  • Example: struct Point { int x; int y; } in examples/struct_test.c
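
Field offset calculation of the kind described above can be sketched as follows. The sketch assumes natural alignment (each field is aligned to its own size, and sizes are powers of two); ANVIL's actual layout rules may differ per target. All names are hypothetical.

```c
#include <stddef.h>

/* Round `offset` up to the next multiple of `align` (a power of two). */
static size_t sketch_align_up(size_t offset, size_t align)
{
    return (offset + align - 1) & ~(align - 1);
}

/* Fill offsets[i] for each field, assuming natural alignment.
 * Returns the total struct size, padded to the largest alignment. */
size_t sketch_layout_struct(const size_t *field_sizes, size_t num_fields,
                            size_t *offsets)
{
    size_t offset = 0;
    size_t max_align = 1;

    for (size_t i = 0; i < num_fields; i++) {
        size_t align = field_sizes[i];
        offset = sketch_align_up(offset, align);
        offsets[i] = offset;
        offset += field_sizes[i];
        if (align > max_align)
            max_align = align;
    }
    return sketch_align_up(offset, max_align);
}
```

For `struct Point { int x; int y; }` with 4-byte ints, this gives offsets 0 and 4 and a total size of 8.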

Array Support (GEP)

  • Full array indexing via anvil_build_gep() for all mainframe backends
  • Automatic element size calculation (1, 2, 4, 8 bytes)
  • Efficient index multiplication using shifts (SLL/SLLG)
  • Example: sum_array(int *arr, int n) in examples/array_test.c
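
The index scaling described above amounts to choosing a shift amount when the element size is 1, 2, 4, or 8 bytes. A minimal sketch, with hypothetical names and a plain multiply as the assumed fallback for other sizes:

```c
#include <stdint.h>

/* Return log2(elem_size) for sizes 1, 2, 4, 8; otherwise -1. */
int sketch_elem_shift(uint32_t elem_size)
{
    switch (elem_size) {
    case 1: return 0;
    case 2: return 1;
    case 4: return 2;
    case 8: return 3;
    default: return -1;
    }
}

/* Byte offset of element `index`. For power-of-two sizes the scale
 * factor is written as (1 << shift), which corresponds to the shift a
 * backend would emit; other sizes fall back to a multiply. */
int64_t sketch_gep_offset(int64_t index, uint32_t elem_size)
{
    int shift = sketch_elem_shift(elem_size);
    if (shift >= 0)
        return index * ((int64_t)1 << shift);
    return index * (int64_t)elem_size;
}
```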

Floating-Point Support (Mainframes)

  • Full floating-point arithmetic for all mainframe backends
  • HFP (Hexadecimal FP): S/370, S/370-XA, S/390 (ADR, MDR, DDR instructions)
  • IEEE 754 (Binary FP): z/Architecture, S/390 optional (ADBR, MDBR, DDBR instructions)
  • FP format selection via anvil_ctx_set_fp_format()
  • Float↔Int conversion using Magic Number technique (HFP) or native CFDBR (IEEE)
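
This README does not show the instruction sequence ANVIL emits for the magic number conversion. As an analogue of the general idea, the sketch below converts a signed 32-bit integer to a double on an IEEE 754 host by planting the integer in the fraction bits of a known constant and subtracting that constant. The HFP version uses a different constant and HFP arithmetic; the function name is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Convert int32 to double without an integer-to-FP instruction.
 * Assumes the host `double` is IEEE 754 binary64. */
double sketch_int32_to_double_magic(int32_t x)
{
    /* Bit pattern of the double 2^52 + 2^31 */
    const uint64_t magic_bits = 0x4330000080000000ull;

    /* Flipping the sign bit maps x to x + 2^31, in [0, 2^32) */
    uint64_t biased = (uint32_t)x ^ 0x80000000u;

    /* Exponent field 0x433 means the low fraction bits count in units
     * of 1, so this pattern is the double 2^52 + (x + 2^31) */
    uint64_t bits = 0x4330000000000000ull | biased;

    double with_bias;
    double magic;
    memcpy(&with_bias, &bits, sizeof with_bias);
    memcpy(&magic, &magic_bits, sizeof magic);

    /* (2^52 + x + 2^31) - (2^52 + 2^31) = x, computed exactly */
    return with_bias - magic;
}
```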

Control Flow Support

  • Full support for loops (while, for) and conditionals (if/else)
  • Proper branch label generation with function-prefixed names (func$block)
  • Correct conditional branch code generation

Local Variable Allocation

  • Stack slot allocation for local variables via anvil_build_alloca
  • Direct stack offset addressing for efficient memory access
  • Automatic dynamic area sizing including local variables and FP temps

Instruction Optimizations

  • AHI/AGHI: Add Halfword Immediate for small constants (S/390, z/Architecture)
  • Direct stack access: Load/Store directly from stack slots without intermediate registers
  • Relative branches: J/JNZ instead of B/BNZ for better performance (S/390+)

Global Variables Support

  • Full support for global variables on all backends
  • Direct load/store to globals without intermediate address calculation
  • Type-aware storage allocation (C, H, F, FD, E, D for mainframes)
  • Support for initialized globals with DC (Define Constant)
  • Array constant initializers: anvil_const_array() and anvil_global_set_initializer()
  • UPPERCASE naming convention (GCCMVS compatible for mainframes)
  • Example: examples/global_test.c

PowerPC Backend Support

  • PPC32: 32-bit big-endian, System V ABI, GAS output
  • PPC64 BE: 64-bit big-endian, ELFv1 ABI with function descriptors (.opd section)
  • PPC64 LE: 64-bit little-endian, ELFv2 ABI with .localentry directives
  • Full IR operation support: arithmetic, bitwise, memory, control flow, comparisons
  • Type conversions: truncation, zero/sign extension, bitcast, pointer-int
  • Floating-point operations (IEEE 754): fadd, fsub, fmul, fdiv, fneg, fabs, fcmp
  • FP conversions: sitofp, uitofp, fptosi, fptoui, fpext, fptrunc
  • Stack slot allocation for local variables (alloca)
  • String table management for string literals
  • Global variable emission with proper alignment
  • GEP and STRUCT_GEP for array and struct access
  • CPU Model System: Target-specific code generation based on CPU model (POWER5-POWER10)

CPU Model System

ANVIL supports CPU model-specific code generation, allowing optimized code for specific processor generations.

Supported CPU Models:

  • PowerPC: G3, G4, 970 (G5), POWER4-POWER10
  • z/Architecture: z900, z9, z10, z196, zEC12, z13-z16
  • ARM64: Generic, Cortex-A53/A72/A76, Neoverse N1/V1, Apple M1/M2/M3
  • x86-64: Generic, Core2, Nehalem, Sandy Bridge, Haswell, Skylake, Ice Lake, Zen/Zen3/Zen4

Usage:

// Set target architecture and CPU model
anvil_ctx_set_target(ctx, ANVIL_ARCH_PPC64);
anvil_ctx_set_cpu(ctx, ANVIL_CPU_PPC64_POWER9);

// Check available features
if (anvil_ctx_has_feature(ctx, ANVIL_FEATURE_PPC_VSX)) {
    // VSX vector instructions available
}

// Enable/disable specific features
anvil_ctx_enable_feature(ctx, ANVIL_FEATURE_PPC_HTM);
anvil_ctx_disable_feature(ctx, ANVIL_FEATURE_PPC_VSX);

CPU-Specific Optimizations (PPC64):

  • popcntd: Native on POWER5+, emulated on older CPUs
  • isel: Conditional select on POWER7+, branch-based fallback
  • ldbrx/stdbrx: Byte reversal on POWER7+
  • cmpb: Byte comparison on POWER6+
  • fcpsgn: FP copy sign on POWER7+
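
One common way to model a CPU model plus feature overrides is a per-model default bitmask that enable/disable calls then modify. The sketch below illustrates that pattern only; it is not ANVIL's implementation, every name in it is hypothetical, and the default feature sets are assumptions that follow the thresholds listed above (cmpb from POWER6, isel from POWER7) plus VSX from POWER7.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    SKETCH_CPU_POWER6,
    SKETCH_CPU_POWER7,
    SKETCH_CPU_POWER9
} sketch_cpu_t;

enum {
    SKETCH_FEAT_CMPB = 1u << 0,
    SKETCH_FEAT_ISEL = 1u << 1,
    SKETCH_FEAT_VSX  = 1u << 2,
    SKETCH_FEAT_HTM  = 1u << 3   /* opt-in only in this sketch */
};

typedef struct {
    uint32_t features;
} sketch_target_t;

/* Selecting a CPU model resets the feature set to that model's defaults. */
void sketch_set_cpu(sketch_target_t *t, sketch_cpu_t cpu)
{
    switch (cpu) {
    case SKETCH_CPU_POWER6:
        t->features = SKETCH_FEAT_CMPB;
        break;
    case SKETCH_CPU_POWER7:
    case SKETCH_CPU_POWER9:
        t->features = SKETCH_FEAT_CMPB | SKETCH_FEAT_ISEL | SKETCH_FEAT_VSX;
        break;
    }
}

bool sketch_has_feature(const sketch_target_t *t, uint32_t feat)
{
    return (t->features & feat) == feat;
}

void sketch_enable_feature(sketch_target_t *t, uint32_t feat)
{
    t->features |= feat;
}

void sketch_disable_feature(sketch_target_t *t, uint32_t feat)
{
    t->features &= ~feat;
}
```

A code generator built this way would test `sketch_has_feature` at each instruction-selection point and fall back to a branch-based or emulated sequence when the bit is clear.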

ARM64 Backend Improvements

Recent fixes and refactoring of the ARM64 backend for robust code generation:

Modular Architecture:

  • arm64_internal.h: Definitions, structures, and declarations
  • arm64_helpers.c: Helper functions (type size, stack slots, code emission)
  • arm64_emit.c: Instruction emission (arithmetic, memory, control flow, FP)
  • arm64.c: Main backend (lifecycle, codegen entry points)
  • opt/: Architecture-specific optimization passes

ARM64-Specific Optimizations (src/backend/arm64/opt/):

  • Peephole optimizations: Redundant store elimination, load-store same address removal
  • Dead store elimination: Remove stores that are immediately overwritten
  • Redundant load elimination: Reuse values already loaded from same address
  • Branch optimization: Combine cmp+cset+cbnz into cmp+b.cond, use cbz/cbnz/tbz/tbnz
  • Immediate optimization: Use immediate forms of instructions when possible
  • Conditional branch fusion: arm64_emit_br_cond() detects comparison results and emits cmp + b.cond directly
  • 32-bit register usage: Arithmetic/bitwise ops use W registers for 32-bit types (reduces code size)
  • Immediate operands: ADD/SUB/CMP use immediate form for small constants (add w0, w9, #1)
  • CBZ/CBNZ optimization: x == 0 uses cbz, x != 0 uses cbnz (saves 1 instruction)
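
The immediate-operand optimization depends on knowing which constants an A64 ADD/SUB can encode: a 12-bit unsigned immediate, optionally shifted left by 12. A minimal sketch of that check (the function name is hypothetical; whether ANVIL also uses the shifted form is not stated in this README):

```c
#include <stdbool.h>
#include <stdint.h>

/* True if `value` fits the A64 ADD/SUB immediate field:
 * imm12, or imm12 shifted left by 12 bits. */
bool sketch_arm64_is_addsub_imm(uint64_t value)
{
    if ((value & ~0xFFFull) == 0)
        return true;                      /* plain imm12 */
    if ((value & ~(0xFFFull << 12)) == 0)
        return true;                      /* imm12, LSL #12 */
    return false;
}
```

Constants that fail this check need to be materialized in a register first, as in the `mov x16, #offset` sequence used for very large stack frames.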

Code Generation Improvements:

  • PHI node handling: Correct SSA resolution with copies before branches
  • External function calls: Proper handling of malloc, free, memcpy and other C library functions
  • SSA value preservation: All instruction results saved to stack slots to prevent register clobbering
  • Large stack frames: Support for stack offsets >255 bytes using x16 as scratch register
  • Very large stack frames (>4095 bytes): Support for stack allocation/deallocation using mov x16, #offset + sub/add sp, sp, x16 sequence
  • Type-aware load/store: Correct instruction selection based on type size (ldr w0 for 32-bit, ldrb w0 for 8-bit)
  • Sign-extending loads: Proper ldrsb, ldrsh, ldrsw for signed types to preserve sign in 64-bit registers
  • Parameter spilling: Function parameters saved to stack at entry for safe access in loops
  • macOS global variable syntax: Proper @PAGE/@PAGEOFF relocations for Darwin ABI (instead of :lo12:)
  • Array stack allocation: Correct stack frame sizing for arrays based on element type and count
  • Type size calculation: arm64_type_size() function for accurate allocation of arrays, structs, and primitives
  • String pointer arrays: Proper emission of string constant pointers in global array initializers (.quad .LCn directives)
  • Variadic function calls (Darwin): Arguments to variadic functions (e.g., printf) passed on the stack, as required by Apple's ARM64 calling convention (which deviates from standard AAPCS64 on this point)
  • Array initializers in globals: Full support for emitting initialized arrays with correct element values
  • Float/double global initializers: Floating-point constants emitted using bit representation (.long/.quad with hex values)
  • Correct store sizes for array elements: Store instructions use source value type size to avoid corrupting adjacent elements in multi-dimensional arrays
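
Emitting floating-point initializers by bit representation, as mentioned above, can be sketched like this. The function names and exact directive spacing are assumptions for illustration; the sketch also assumes the host float and double are IEEE 754 binary32 and binary64.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format a float as a `.long` directive holding its raw bits.
 * Returns 0 on success, -1 if the text does not fit in `out`. */
int sketch_emit_f32(char *out, size_t cap, float value)
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);
    int n = snprintf(out, cap, "    .long 0x%08x", (unsigned)bits);
    return (n >= 0 && (size_t)n < cap) ? 0 : -1;
}

/* Format a double as a `.quad` directive holding its raw bits. */
int sketch_emit_f64(char *out, size_t cap, double value)
{
    uint64_t bits;
    memcpy(&bits, &value, sizeof bits);
    int n = snprintf(out, cap, "    .quad 0x%016llx",
                     (unsigned long long)bits);
    return (n >= 0 && (size_t)n < cap) ? 0 : -1;
}
```

Writing the bit pattern avoids depending on how the assembler parses and rounds decimal floating-point literals.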

IR Debug/Dump API

New debugging functionality for inspecting IR structures:

#include <anvil/anvil.h>  // anvil_debug.h is now included automatically

// Print module IR to stdout
anvil_print_module(mod);

// Print function IR to stdout
anvil_print_func(func);

// Dump to FILE*
anvil_dump_module(stderr, mod);
anvil_dump_func(stderr, func);
anvil_dump_block(stderr, block);
anvil_dump_instr(stderr, instr);

// Convert to string (caller must free)
char *ir_str = anvil_module_to_string(mod);
printf("%s", ir_str);
free(ir_str);

// Check if block has terminator (ret, br, br_cond)
if (!anvil_block_has_terminator(block)) {
    anvil_build_ret_void(ctx);  // Add implicit return
}

// Check if value is boolean (comparison result)
if (anvil_value_is_bool(cond)) {
    // Already boolean, use directly in br_cond
} else {
    // Need to compare with zero first
    cond = anvil_build_cmp_ne(ctx, cond, zero, "tobool");
}

// Get type of a value
anvil_type_t *type = anvil_value_get_type(val);

String escaping: String constants in IR dumps are properly escaped (\n, \t, \0, \xHH for non-printable characters).
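
The escaping rule can be sketched as below. This is an illustration with a hypothetical function name, not the code behind anvil_dump_module(); it takes an explicit length so that embedded NUL bytes can be escaped, and it passes printable characters through unchanged (how ANVIL treats backslashes and quotes is not stated here).

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Escape `len` bytes of `src` into the NUL-terminated string `out`.
 * Returns 0 on success, -1 if `out` (capacity `cap`) is too small. */
int sketch_escape_bytes(const unsigned char *src, size_t len,
                        char *out, size_t cap)
{
    size_t pos = 0;

    if (cap == 0)
        return -1;
    for (size_t i = 0; i < len; i++) {
        char piece[5];  /* longest piece is "\xHH" plus terminator */
        unsigned char c = src[i];
        int n;

        if (c == '\n')
            n = snprintf(piece, sizeof piece, "\\n");
        else if (c == '\t')
            n = snprintf(piece, sizeof piece, "\\t");
        else if (c == '\0')
            n = snprintf(piece, sizeof piece, "\\0");
        else if (isprint(c))
            n = snprintf(piece, sizeof piece, "%c", c);
        else
            n = snprintf(piece, sizeof piece, "\\x%02X", (unsigned)c);

        if (pos + (size_t)n + 1 > cap)
            return -1;
        memcpy(out + pos, piece, (size_t)n);
        pos += (size_t)n;
    }
    out[pos] = '\0';
    return 0;
}
```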

Output format:

; ModuleID = 'my_module'
; Functions: 2, Globals: 1

@counter = external global i32 42

define external i32 @factorial(i32 %arg0) {
entry:
    %cmp = cmp_le i8 %arg0, 1
    br_cond %cmp, label %base_case, label %recurse
...
}

Memory Management Improvements

Improved cleanup flow to prevent dangling pointers and use-after-free issues:

  • Backend reset function: New reset callback in anvil_backend_ops_t to clear cached IR pointers
  • Safe cleanup order: anvil_ctx_destroy() now resets backend state before destroying modules
  • All backends updated: x86, x86-64, ARM64, S/370, S/370-XA, S/390, z/Architecture, PPC32, PPC64, PPC64LE

Advanced Examples

Three advanced examples demonstrate ANVIL's capabilities for generating linkable libraries:

  • examples/fp_math_lib/: Floating-point math library

    • Generates exportable FP functions: fp_add, fp_sub, fp_mul, fp_div, fp_neg, fp_abs
    • Demonstrates ANVIL IR for floating-point operations
    • Includes C test program that links with generated assembly (24 tests)
  • examples/dynamic_array/: Dynamic array library with C library calls

    • Demonstrates calling external C functions: malloc, free, memcpy
    • Functions: array_create, array_destroy, array_copy, array_sum, array_max, array_min, array_count_if, array_scale
    • Shows pointer arithmetic, loops, conditionals, and memory management
    • Includes comprehensive test suite (41 tests)
  • examples/base64_lib/: Base64 encoding library

    • Demonstrates complex bitwise operations, byte manipulation, and lookup table logic
    • Functions: base64_encode, base64_encoded_len
    • Shows select operations for conditional value computation
    • Includes test suite with RFC 4648 test vectors (28 tests)

IR Optimization

ANVIL includes a configurable optimization pass infrastructure; passes can be enabled or disabled by level or individually.

Optimization Levels

| Level | Name | Description |
|---|---|---|
| O0 | ANVIL_OPT_NONE | No optimization (default) |
| Og | ANVIL_OPT_DEBUG | Debug-friendly: copy propagation, store-load propagation |
| O1 | ANVIL_OPT_BASIC | Og + constant folding, DCE |
| O2 | ANVIL_OPT_STANDARD | O1 + CFG simplification, strength reduction, memory opts, CSE |
| O3 | ANVIL_OPT_AGGRESSIVE | O2 + loop unrolling |

Available Passes

| Pass | Level | Description |
|---|---|---|
| Constant Folding | O1+ | Evaluates constant expressions at compile time (3 + 5 → 8) |
| Dead Code Elimination (DCE) | O1+ | Removes unused instructions |
| Copy Propagation | Og+ | Replaces uses of copied values with originals |
| Store-Load Propagation | Og+ | Replaces load after store with stored value |
| Strength Reduction | O2+ | Replaces expensive ops with cheaper ones (x * 8 → x << 3) |
| CFG Simplification | O2+ | Merges blocks, removes unreachable code |
| Dead Store Elimination | O2+ | Removes stores overwritten before read |
| Redundant Load Elimination | O2+ | Reuses loaded values from same address |
| Common Subexpression Elimination (CSE) | O2+ | Reuses computed values |
| Loop Unrolling | O3+ | Unrolls small loops with known trip counts (experimental) |

Usage

#include <anvil/anvil_opt.h>

// Set optimization level
anvil_ctx_set_opt_level(ctx, ANVIL_OPT_STANDARD);

// Optimize module before codegen
anvil_module_optimize(mod);

// Or fine-grained control
anvil_pass_manager_t *pm = anvil_ctx_get_pass_manager(ctx);
anvil_pass_manager_enable(pm, ANVIL_PASS_CONST_FOLD);
anvil_pass_manager_disable(pm, ANVIL_PASS_DCE);

Example: Constant Folding (S/390)

Before optimization:

         LA    R2,3            Load constant 3
         AHI   R2,5            Add 5
         LR    R15,R2          Result in R15

After optimization:

         LA    R15,8           Load constant 8 directly
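
The folding step behind this example can be sketched on a toy operand type. This is not ANVIL's pass; the types and function are hypothetical and only show the core rule: when both operands are constants, compute the result at compile time.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { SKETCH_OP_ADD, SKETCH_OP_SUB, SKETCH_OP_MUL } sketch_op_t;

/* A toy operand: either a known 32-bit constant or an unknown value. */
typedef struct {
    bool is_const;
    int32_t value;   /* meaningful only when is_const is true */
} sketch_operand_t;

/* If both operands are constants, store the folded value in *result
 * and return true; otherwise leave *result untouched and return false.
 * Arithmetic is done on unsigned values so overflow wraps; converting
 * back to int32_t assumes the usual two's-complement behavior. */
bool sketch_fold_binop(sketch_op_t op, sketch_operand_t lhs,
                       sketch_operand_t rhs, int32_t *result)
{
    if (!lhs.is_const || !rhs.is_const)
        return false;

    uint32_t a = (uint32_t)lhs.value;
    uint32_t b = (uint32_t)rhs.value;
    uint32_t r = 0;

    switch (op) {
    case SKETCH_OP_ADD: r = a + b; break;
    case SKETCH_OP_SUB: r = a - b; break;
    case SKETCH_OP_MUL: r = a * b; break;
    }
    *result = (int32_t)r;
    return true;
}
```

Folding `3 + 5` this way yields the single constant 8 that the optimized listing loads directly.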

Example: Strength Reduction (S/390)

Before optimization:

         LA    R3,8            Load constant 8
         MSR   R2,R3           Multiply (expensive)

After optimization:

         LA    R3,3            Load shift amount
         SLL   R2,0(R3)        Shift left by 3 (x * 8 = x << 3)
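
The decision behind this rewrite is a power-of-two test on the constant multiplier. A minimal sketch with a hypothetical function name:

```c
#include <stdint.h>

/* If `multiplier` is a power of two, return the equivalent left-shift
 * amount; otherwise return -1 (the multiply must stay). */
int sketch_mul_shift_amount(uint32_t multiplier)
{
    /* A power of two has exactly one bit set */
    if (multiplier == 0 || (multiplier & (multiplier - 1)) != 0)
        return -1;

    int shift = 0;
    while ((multiplier >>= 1) != 0)
        shift++;
    return shift;
}
```

For the multiplier 8 this returns 3, matching the `x * 8 = x << 3` rewrite in the listing.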

Roadmap

  • Binary opcode generation
  • ASI/AGSI optimization (Add to Storage Immediate)
  • Register allocation improvements
  • RISC-V support
  • Debug info (DWARF)
  • Extend CPU model system to more backends (ARM64, z/Architecture, x86-64)

Documentation

See DOCUMENTATION.md for complete API reference and detailed usage examples.

License

Unlicense
