A C library for compiler code generation with support for multiple architectures through a portable intermediate representation (IR) system.
- Portable IR: Architecture-independent intermediate representation
- Multiple Backends: Support for x86, x86-64, S/370, S/370-XA, S/390, z/Architecture, PowerPC 32/64-bit, ARM64
- Assembly Output: Generates assembly text (HLASM for mainframes, GAS for x86/PPC)
- IR Optimization: Configurable optimization passes (constant folding, DCE, strength reduction)
- CPU Model System: Target-specific code generation with CPU model selection and feature flags
- Extensible: Plugin architecture for adding new backends
- Opcode Ready: Design prepared for future binary code generation
| Architecture | Bits | Endianness | Stack | FP Format | ABI | Syntax |
|---|---|---|---|---|---|---|
| x86 | 32 | Little | Down | IEEE 754 | System V | GAS/NASM |
| x86-64 | 64 | Little | Down | IEEE 754 | System V | GAS/NASM |
| S/370 | 24 | Big | Up | HFP | MVS | HLASM |
| S/370-XA | 31 | Big | Up | HFP | MVS | HLASM |
| S/390 | 31 | Big | Up | HFP | MVS | HLASM |
| z/Architecture | 64 | Big | Up | HFP+IEEE | MVS | HLASM |
| PowerPC 32 | 32 | Big | Down | IEEE 754 | System V | GAS |
| PowerPC 64 | 64 | Big | Down | IEEE 754 | System V | GAS |
| PowerPC 64 LE | 64 | Little | Down | IEEE 754 | System V | GAS |
| ARM64 (Linux) | 64 | Little | Down | IEEE 754 | System V | GAS |
| ARM64 (macOS) | 64 | Little | Down | IEEE 754 | Darwin | GAS |
Floating-Point Formats:
- IEEE 754: Standard IEEE floating-point (binary)
- HFP: IBM Hexadecimal Floating Point (base-16 exponent, used in S/370, S/390)
- HFP+IEEE: Both formats supported (z/Architecture)
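To make the difference between HFP and IEEE 754 concrete, here is a minimal sketch that decodes a 32-bit (short) HFP word into a host double. It is not part of ANVIL; `hfp32_to_double` is a hypothetical helper name, and the code assumes the standard short HFP layout (1 sign bit, 7-bit excess-64 exponent as a power of 16, 24-bit fraction).

```c
#include <stdint.h>

/* Decode a 32-bit (short) IBM HFP value into a host double.
 * Hypothetical helper for illustration only. */
double hfp32_to_double(uint32_t bits)
{
    int negative = (int)(bits >> 31);
    int exponent = (int)((bits >> 24) & 0x7F) - 64;           /* excess-64, base 16 */
    double value = (double)(bits & 0x00FFFFFFu) / 16777216.0; /* fraction / 2^24 */

    /* Scale by 16^exponent without needing libm. */
    for (; exponent > 0; exponent--) value *= 16.0;
    for (; exponent < 0; exponent++) value /= 16.0;
    return negative ? -value : value;
}
```

For example, the word `0x41100000` has exponent 65 - 64 = 1 and fraction 1/16, so it decodes to 1.0.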
OS ABI Variants:
- System V: Standard Unix/Linux ABI
- Darwin: macOS/Apple ABI (underscore prefix, Mach-O format)
- MVS: IBM z/OS ABI
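The underscore prefix mentioned for Darwin can be illustrated with a small sketch. `asm_symbol_name` is a hypothetical helper, not an ANVIL function; it only shows how the same C name maps to different assembly-level symbols.

```c
#include <stdbool.h>
#include <stdio.h>

/* Write the assembly-level symbol for a C name into out.
 * Darwin prepends an underscore; System V uses the name unchanged.
 * Returns snprintf's result (characters needed, excluding the NUL). */
int asm_symbol_name(char *out, size_t cap, const char *c_name, bool darwin)
{
    return snprintf(out, cap, "%s%s", darwin ? "_" : "", c_name);
}
```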
# Build library and examples
make
# Library only
make lib
# Examples only
make examples
# Advanced examples (fp_math_lib, dynamic_array)
make examples-advanced
# Test advanced examples
make test-examples-advanced
# Clean
make clean
# Clean advanced examples
make clean-examples-advanced
# Install (requires root)
sudo make install
#include <stdio.h>   // printf
#include <stdlib.h>  // free
#include <anvil/anvil.h>
int main(void)
{
// Create context
anvil_ctx_t *ctx = anvil_ctx_create();
// Set target architecture
anvil_ctx_set_target(ctx, ANVIL_ARCH_ZARCH);
// Create module
anvil_module_t *mod = anvil_module_create(ctx, "my_module");
// Create function type: int add(int a, int b)
anvil_type_t *i32 = anvil_type_i32(ctx);
anvil_type_t *params[] = { i32, i32 };
anvil_type_t *func_type = anvil_type_func(ctx, i32, params, 2, false);
// Create function
anvil_func_t *func = anvil_func_create(mod, "add", func_type, ANVIL_LINK_EXTERNAL);
// Set insertion point
anvil_block_t *entry = anvil_func_get_entry(func);
anvil_set_insert_point(ctx, entry);
// Get parameters
anvil_value_t *a = anvil_func_get_param(func, 0);
anvil_value_t *b = anvil_func_get_param(func, 1);
// Build IR: result = a + b
anvil_value_t *result = anvil_build_add(ctx, a, b, "result");
// Build IR: return result
anvil_build_ret(ctx, result);
// Generate code
char *output = NULL;
size_t len = 0;
anvil_module_codegen(mod, &output, &len);
printf("%s", output);
// Cleanup
free(output);
anvil_module_destroy(mod);
anvil_ctx_destroy(ctx);
return 0;
}

- anvil_build_add: Addition
- anvil_build_sub: Subtraction
- anvil_build_mul: Multiplication
- anvil_build_sdiv / anvil_build_udiv: Division (signed/unsigned)
- anvil_build_smod / anvil_build_umod: Modulo (signed/unsigned)
- anvil_build_neg: Negation

- anvil_build_and: AND
- anvil_build_or: OR
- anvil_build_xor: XOR
- anvil_build_not: NOT
- anvil_build_shl: Shift left
- anvil_build_shr: Shift right (logical)
- anvil_build_sar: Shift right (arithmetic)

- anvil_build_cmp_eq / anvil_build_cmp_ne: Equal / Not equal
- anvil_build_cmp_lt / anvil_build_cmp_le: Less than / Less or equal
- anvil_build_cmp_gt / anvil_build_cmp_ge: Greater than / Greater or equal
- Unsigned versions: _ult, _ule, _ugt, _uge

- anvil_build_alloca: Stack allocation
- anvil_build_load: Load from memory
- anvil_build_store: Store to memory
- anvil_build_gep: Get Element Pointer (array indexing)
- anvil_build_struct_gep: Get Struct Field Pointer
- anvil_module_add_global: Add global variable

- anvil_build_br: Unconditional branch
- anvil_build_br_cond: Conditional branch
- anvil_build_call: Function call
- anvil_build_ret / anvil_build_ret_void: Return

- anvil_build_trunc: Truncate
- anvil_build_zext: Zero extend
- anvil_build_sext: Sign extend
- anvil_build_bitcast: Bitcast
- anvil_build_ptrtoint / anvil_build_inttoptr: Pointer/integer conversion

- anvil_build_fadd: FP Addition
- anvil_build_fsub: FP Subtraction
- anvil_build_fmul: FP Multiplication
- anvil_build_fdiv: FP Division
- anvil_build_fneg: FP Negation
- anvil_build_fabs: FP Absolute value
- anvil_build_fcmp: FP Comparison

- anvil_build_fptrunc: Truncate (f64 → f32)
- anvil_build_fpext: Extend (f32 → f64)
- anvil_build_fptosi: FP to signed integer
- anvil_build_fptoui: FP to unsigned integer
- anvil_build_sitofp: Signed integer to FP
- anvil_build_uitofp: Unsigned integer to FP

- anvil_build_phi: PHI node
- anvil_build_select: Select (ternary)

- Integers: i8, i16, i32, i64 (signed)
- Integers: u8, u16, u32, u64 (unsigned)
- Floating point: f32, f64
- Pointers: anvil_type_ptr(ctx, pointee_type)
- Arrays: anvil_type_array(ctx, elem_type, count)
- Structs: anvil_type_struct(ctx, name, fields, num_fields)
- Functions: anvil_type_func(ctx, ret_type, params, num_params, variadic)
| Architecture | Convention | Description |
|---|---|---|
| x86 | CDECL | Parameters on stack, caller cleanup |
| x86-64 | System V | RDI, RSI, RDX, RCX, R8, R9, then stack |
| S/370 | MVS | R1 points to parameter list |
| S/390 | MVS | R1 points to parameter list |
| z/Arch | z/OS 64-bit | R1 points to parameter list (64-bit) |
| PPC32 | System V | r3-r10 for args, r3 for return |
| PPC64 BE | ELFv1 | r3-r10 for args, function descriptors |
| PPC64 LE | ELFv2 | r3-r10 for args, local entry points |
| ARM64 (Linux) | AAPCS64 | x0-x7 for args, x0 for return |
| ARM64 (macOS) | Apple ARM64 | x0-x7 for args, underscore prefix on symbols |
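As a small illustration of how a backend might consult such a convention, the sketch below maps an integer argument index to its x86-64 System V register, using the register order from the table. `sysv_int_arg_reg` is a hypothetical name and not part of the ANVIL API.

```c
#include <stddef.h>

/* Integer/pointer argument registers for x86-64 System V, in order. */
static const char *const sysv_int_regs[] = {
    "rdi", "rsi", "rdx", "rcx", "r8", "r9"
};

/* Returns the register for the index-th integer argument, or NULL
 * when that argument is passed on the stack instead. */
const char *sysv_int_arg_reg(size_t index)
{
    size_t count = sizeof sysv_int_regs / sizeof sysv_int_regs[0];
    return index < count ? sysv_int_regs[index] : NULL;
}
```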
ANVIL generates code compatible with GCCMVS conventions:
- CSECT: Blank (no module name prefix)
- AMODE/RMODE: AMODE ANY,RMODE ANY for maximum flexibility
- Function Names: UPPERCASE (e.g., FACTORIAL, SUM_ARRAY)
- Stack Allocation: Direct stack offset from R13 (no GETMAIN/FREEMAIN)
- VL Bit: NOT cleared, allowing full 31/64-bit addressing
Unlike x86, where the stack grows downward (toward lower addresses), the stack on the mainframe targets grows upward (toward higher addresses). ANVIL handles this automatically.
Mainframes use chained save areas instead of push/pop on the stack:
- S/370/S/390: 72 bytes (18 fullwords of 4 bytes)
- z/Architecture: 144 bytes (18 doublewords of 8 bytes)
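The sizes above follow directly from the slot count. The sketch below models the conventional 72-byte save area as a C struct and computes both sizes. The field names are illustrative assumptions, and the source does not say how ANVIL itself uses each word.

```c
#include <stddef.h>
#include <stdint.h>

/* Conventional 72-byte save area for 24/31-bit linkage: 18 fullwords,
 * chained through the back and forward pointers. Field names are
 * illustrative, not taken from ANVIL. */
typedef struct save_area32 {
    uint32_t reserved;    /* word 0: reserved / language use */
    uint32_t back_chain;  /* word 1: address of the caller's save area */
    uint32_t fwd_chain;   /* word 2: address of the callee's save area */
    uint32_t r14;         /* word 3: return address */
    uint32_t r15;         /* word 4: entry point address */
    uint32_t r0_r12[13];  /* words 5-17: R0 through R12 */
} save_area32;

/* Save area size in bytes: 18 slots of 4 bytes (31-bit) or 8 bytes (64-bit). */
size_t save_area_size(int is_64bit)
{
    return 18u * (is_64bit ? 8u : 4u);
}
```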
The mainframe backends generate efficient stack-based code:
- Stack frame allocation via LA R2,72(,R13) (no GETMAIN overhead)
- Proper save area chaining
- Thread-safe execution
- Simplified epilogue (no FREEMAIN cleanup)
Generated mainframe code is in HLASM (High Level Assembler) format:
- Labels in columns 1-8
- Opcodes starting at column 10
- Operands starting at column 16
- Comments with asterisk in column 1
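The column rules can be expressed with fixed-width formatting. This is a minimal sketch, assuming opcodes of at most five characters; `hlasm_format_line` is a hypothetical helper and not ANVIL's emitter.

```c
#include <stdio.h>

/* Format one HLASM statement: label in columns 1-8, opcode starting at
 * column 10, operands starting at column 16.
 * Returns snprintf's result (characters needed, excluding the NUL). */
int hlasm_format_line(char *out, size_t cap, const char *label,
                      const char *opcode, const char *operands)
{
    /* %-8s pads the label to 8 columns; the next space is column 9.
     * %-5s pads the opcode through column 14; the next space is column 15. */
    return snprintf(out, cap, "%-8s %-5s %s", label, opcode, operands);
}
```

An opcode longer than five characters would push the operands past column 16, so a real emitter would need to handle that case separately.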
To add support for a new architecture:
- Create a new file at src/backend/<arch>/<arch>.c
- Implement the anvil_backend_ops_t structure:
const anvil_backend_ops_t anvil_backend_myarch = {
.name = "MyArch",
.arch = ANVIL_ARCH_MYARCH,
.init = myarch_init,
.cleanup = myarch_cleanup,
.reset = myarch_reset, // Clear cached IR pointers (optional but recommended)
.prepare_ir = myarch_prepare_ir, // Prepare/lower IR before codegen (optional)
.codegen_module = myarch_codegen_module,
.codegen_func = myarch_codegen_func,
.get_arch_info = myarch_get_arch_info
};
- Add the architecture to anvil.h:
typedef enum {
// ...
ANVIL_ARCH_MYARCH,
ANVIL_ARCH_COUNT
} anvil_arch_t;
- Register the backend in backend.c:
anvil_register_backend(&anvil_backend_myarch);
A new optional prepare_ir callback in the backend interface allows architecture-specific IR preparation before code generation:
- IR Lowering: Convert unsupported operations to sequences of supported ones
- Peephole Optimizations: Target-specific optimizations on IR level
- Type Legalization: Split 64-bit ops on 32-bit targets, etc.
- Function Analysis: Detect leaf functions, calculate stack frame layout
The ARM64 backend now uses prepare_ir to analyze all functions before code generation.
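To illustrate the type legalization case, the sketch below shows what splitting a 64-bit add for a 32-bit target amounts to: two 32-bit adds joined by a carry. It works on plain integers rather than ANVIL IR, and all names are hypothetical.

```c
#include <stdint.h>

/* A 64-bit value as two 32-bit halves, the way a 32-bit target sees it. */
typedef struct { uint32_t lo, hi; } u64_pair;

u64_pair u64_to_pair(uint64_t v)
{
    u64_pair p = { (uint32_t)v, (uint32_t)(v >> 32) };
    return p;
}

uint64_t pair_to_u64(u64_pair p)
{
    return ((uint64_t)p.hi << 32) | p.lo;
}

/* 64-bit add lowered to two 32-bit adds with carry propagation. */
u64_pair add64_lowered(u64_pair a, u64_pair b)
{
    u64_pair r;
    r.lo = a.lo + b.lo;
    uint32_t carry = r.lo < a.lo;  /* unsigned wraparound signals a carry out */
    r.hi = a.hi + b.hi + carry;
    return r;
}
```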
- Struct field access via anvil_build_struct_gep() for all mainframe backends
- Automatic field offset calculation at compile time
- Efficient LA (Load Address) instruction for small offsets
- Example: struct Point { int x; int y; } in examples/struct_test.c
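Field offset calculation can be sketched as follows, assuming natural alignment (each field aligned to its own power-of-two alignment, with the struct padded to its largest field alignment). The source does not state ANVIL's exact layout rule, so treat this as one plausible scheme; `layout_struct` is a hypothetical name.

```c
#include <stddef.h>

/* Round offset up to the next multiple of align (a power of two). */
static size_t align_up(size_t offset, size_t align)
{
    return (offset + align - 1) & ~(align - 1);
}

/* Compute each field's byte offset from its size and alignment.
 * Returns the total struct size, padded to the largest field alignment. */
size_t layout_struct(const size_t *sizes, const size_t *aligns,
                     size_t count, size_t *offsets)
{
    size_t offset = 0, max_align = 1;
    for (size_t i = 0; i < count; i++) {
        offset = align_up(offset, aligns[i]);
        offsets[i] = offset;
        offset += sizes[i];
        if (aligns[i] > max_align) max_align = aligns[i];
    }
    return align_up(offset, max_align);
}
```

For struct Point with two 4-byte integers, this yields offsets 0 and 4 and a total size of 8 bytes.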
- Full array indexing via anvil_build_gep() for all mainframe backends
- Automatic element size calculation (1, 2, 4, 8 bytes)
- Efficient index multiplication using shifts (SLL/SLLG)
- Example: sum_array(int *arr, int n) in examples/array_test.c
- Full floating-point arithmetic for all mainframe backends
- HFP (Hexadecimal FP): S/370, S/370-XA, S/390 (ADR, MDR, DDR instructions)
- IEEE 754 (Binary FP): z/Architecture, S/390 optional (ADBR, MDBR, DDBR instructions)
- FP format selection via anvil_ctx_set_fp_format()
- Float↔Int conversion using Magic Number technique (HFP) or native CFDBR (IEEE)
- Full support for loops (while, for) and conditionals (if/else)
- Proper branch label generation with function-prefixed names (func$block)
- Correct conditional branch code generation
- Stack slot allocation for local variables via anvil_build_alloca
- Direct stack offset addressing for efficient memory access
- Automatic dynamic area sizing including local variables and FP temps
- AHI/AGHI: Add Halfword Immediate for small constants (S/390, z/Architecture)
- Direct stack access: Load/Store directly from stack slots without intermediate registers
- Relative branches: J/JNZ instead of B/BNZ for better performance (S/390+)
- Full support for global variables on all backends
- Direct load/store to globals without intermediate address calculation
- Type-aware storage allocation (C, H, F, FD, E, D for mainframes)
- Support for initialized globals with DC (Define Constant)
- Array constant initializers: anvil_const_array() and anvil_global_set_initializer()
- UPPERCASE naming convention (GCCMVS compatible for mainframes)
- Example: examples/global_test.c
- PPC32: 32-bit big-endian, System V ABI, GAS output
- PPC64 BE: 64-bit big-endian, ELFv1 ABI with function descriptors (.opd section)
- PPC64 LE: 64-bit little-endian, ELFv2 ABI with .localentry directives
- Full IR operation support: arithmetic, bitwise, memory, control flow, comparisons
- Type conversions: truncation, zero/sign extension, bitcast, pointer-int
- Floating-point operations (IEEE 754): fadd, fsub, fmul, fdiv, fneg, fabs, fcmp
- FP conversions: sitofp, uitofp, fptosi, fptoui, fpext, fptrunc
- Stack slot allocation for local variables (alloca)
- String table management for string literals
- Global variable emission with proper alignment
- GEP and STRUCT_GEP for array and struct access
- CPU Model System: Target-specific code generation based on CPU model (POWER5-POWER10)
ANVIL supports CPU model-specific code generation, allowing optimized code for specific processor generations.
Supported CPU Models:
- PowerPC: G3, G4, 970 (G5), POWER4-POWER10
- z/Architecture: z900, z9, z10, z196, zEC12, z13-z16
- ARM64: Generic, Cortex-A53/A72/A76, Neoverse N1/V1, Apple M1/M2/M3
- x86-64: Generic, Core2, Nehalem, Sandy Bridge, Haswell, Skylake, Ice Lake, Zen/Zen3/Zen4
Usage:
// Set target architecture and CPU model
anvil_ctx_set_target(ctx, ANVIL_ARCH_PPC64);
anvil_ctx_set_cpu(ctx, ANVIL_CPU_PPC64_POWER9);
// Check available features
if (anvil_ctx_has_feature(ctx, ANVIL_FEATURE_PPC_VSX)) {
// VSX vector instructions available
}
// Enable/disable specific features
anvil_ctx_enable_feature(ctx, ANVIL_FEATURE_PPC_HTM);
anvil_ctx_disable_feature(ctx, ANVIL_FEATURE_PPC_VSX);
CPU-Specific Optimizations (PPC64):
- popcntd: Native on POWER5+, emulated on older CPUs
- isel: Conditional select on POWER7+, branch-based fallback
- ldbrx/stdbrx: Byte reversal on POWER7+
- cmpb: Byte comparison on POWER6+
- fcpsgn: FP copy sign on POWER7+
Recent fixes and refactoring of the ARM64 backend for robust code generation:
Modular Architecture:
- arm64_internal.h: Definitions, structures, and declarations
- arm64_helpers.c: Helper functions (type size, stack slots, code emission)
- arm64_emit.c: Instruction emission (arithmetic, memory, control flow, FP)
- arm64.c: Main backend (lifecycle, codegen entry points)
- opt/: Architecture-specific optimization passes
ARM64-Specific Optimizations (src/backend/arm64/opt/):
- Peephole optimizations: Redundant store elimination, load-store same address removal
- Dead store elimination: Remove stores that are immediately overwritten
- Redundant load elimination: Reuse values already loaded from same address
- Branch optimization: Combine cmp+cset+cbnz into cmp+b.cond, use cbz/cbnz/tbz/tbnz
- Immediate optimization: Use immediate forms of instructions when possible
- Conditional branch fusion: arm64_emit_br_cond() detects comparison results and emits cmp+b.cond directly
- 32-bit register usage: Arithmetic/bitwise ops use W registers for 32-bit types (reduces code size)
- Immediate operands: ADD/SUB/CMP use immediate form for small constants (add w0, w9, #1)
- CBZ/CBNZ optimization: x == 0 uses cbz, x != 0 uses cbnz (saves 1 instruction)
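Whether a constant qualifies for the immediate form depends on the A64 encoding: ADD/SUB (and CMP, an alias of SUBS) take a 12-bit unsigned immediate, optionally shifted left by 12. The sketch below checks that rule. `a64_fits_addsub_imm` is a hypothetical helper, and the source does not say whether ANVIL's backend also uses the shifted form.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if value can be encoded directly in an A64 ADD/SUB immediate:
 * a 12-bit unsigned value, optionally shifted left by 12 bits. */
bool a64_fits_addsub_imm(uint64_t value)
{
    if (value <= 0xFFFu)
        return true;                                         /* plain imm12 */
    return (value & 0xFFFu) == 0 && (value >> 12) <= 0xFFFu; /* imm12, LSL #12 */
}
```

Constants that fail this check have to be materialized in a scratch register first.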
Code Generation Improvements:
- PHI node handling: Correct SSA resolution with copies before branches
- External function calls: Proper handling of malloc, free, memcpy and other C library functions
- SSA value preservation: All instruction results saved to stack slots to prevent register clobbering
- Large stack frames: Support for stack offsets >255 bytes using x16 as scratch register
- Very large stack frames (>4095 bytes): Support for stack allocation/deallocation using a mov x16, #offset + sub/add sp, sp, x16 sequence
- Type-aware load/store: Correct instruction selection based on type size (ldr w0 for 32-bit, ldrb w0 for 8-bit)
- Sign-extending loads: Proper ldrsb, ldrsh, ldrsw for signed types to preserve sign in 64-bit registers
- Parameter spilling: Function parameters saved to stack at entry for safe access in loops
- macOS global variable syntax: Proper @PAGE/@PAGEOFF relocations for Darwin ABI (instead of :lo12:)
- Array stack allocation: Correct stack frame sizing for arrays based on element type and count
- Type size calculation: arm64_type_size() function for accurate allocation of arrays, structs, and primitives
- String pointer arrays: Proper emission of string constant pointers in global array initializers (.quad .LCn directives)
- Variadic function calls (Darwin): Arguments to variadic functions (e.g., printf) passed on the stack, as required by Apple's ARM64 ABI on macOS (a divergence from standard AAPCS64)
- Array initializers in globals: Full support for emitting initialized arrays with correct element values
- Float/double global initializers: Floating-point constants emitted using bit representation (.long/.quad with hex values)
- Correct store sizes for array elements: Store instructions use source value type size to avoid corrupting adjacent elements in multi-dimensional arrays
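The bit-representation approach for float and double initializers can be sketched as below. It assumes the host uses IEEE 754 binary32/binary64; the helper names are hypothetical and the exact directive text ANVIL prints may differ.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Reinterpret a float/double as its raw bit pattern (memcpy avoids
 * strict-aliasing problems). */
uint32_t f32_bits(float v)  { uint32_t b; memcpy(&b, &v, sizeof b); return b; }
uint64_t f64_bits(double v) { uint64_t b; memcpy(&b, &v, sizeof b); return b; }

/* Format a GAS directive that stores a float by its bit pattern. */
int emit_f32_global(char *out, size_t cap, float v)
{
    return snprintf(out, cap, "\t.long 0x%08" PRIx32, f32_bits(v));
}
```

Emitting the bit pattern avoids relying on the assembler's own decimal-to-binary conversion, so the stored value matches what the compiler computed.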
New debugging functionality for inspecting IR structures:
#include <anvil/anvil.h> // anvil_debug.h is now included automatically
// Print module IR to stdout
anvil_print_module(mod);
// Print function IR to stdout
anvil_print_func(func);
// Dump to FILE*
anvil_dump_module(stderr, mod);
anvil_dump_func(stderr, func);
anvil_dump_block(stderr, block);
anvil_dump_instr(stderr, instr);
// Convert to string (caller must free)
char *ir_str = anvil_module_to_string(mod);
printf("%s", ir_str);
free(ir_str);
// Check if block has terminator (ret, br, br_cond)
if (!anvil_block_has_terminator(block)) {
anvil_build_ret_void(ctx); // Add implicit return
}
// Check if value is boolean (comparison result)
if (anvil_value_is_bool(cond)) {
// Already boolean, use directly in br_cond
} else {
// Need to compare with zero first
cond = anvil_build_cmp_ne(ctx, cond, zero, "tobool");
}
// Get type of a value
anvil_type_t *type = anvil_value_get_type(val);
String escaping: String constants in IR dumps are properly escaped (\n, \t, \0, \xHH for non-printable characters).
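The escaping rule can be sketched as follows. This is an illustration of the listed escapes only (\n, \t, \0, \xHH) and not ANVIL's implementation; `escape_ir_string` is a hypothetical name, and the treatment of quotes or backslashes is left out because the source does not describe it.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Escape len bytes of src into out (NUL-terminated).
 * Returns the escaped length, or -1 if out is too small. */
int escape_ir_string(const char *src, size_t len, char *out, size_t cap)
{
    size_t used = 0;
    if (cap == 0) return -1;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)src[i];
        char tmp[5];  /* longest escape is "\xHH" plus NUL */
        int n;
        if (c == '\n')                 n = snprintf(tmp, sizeof tmp, "\\n");
        else if (c == '\t')            n = snprintf(tmp, sizeof tmp, "\\t");
        else if (c == '\0')            n = snprintf(tmp, sizeof tmp, "\\0");
        else if (c < 0x20 || c > 0x7E) n = snprintf(tmp, sizeof tmp, "\\x%02X", (unsigned)c);
        else                           n = snprintf(tmp, sizeof tmp, "%c", c);
        if (used + (size_t)n + 1 > cap) return -1;
        memcpy(out + used, tmp, (size_t)n);
        used += (size_t)n;
    }
    out[used] = '\0';
    return (int)used;
}
```

The length is passed explicitly so that embedded NUL bytes in a string constant are escaped rather than treated as the end of the input.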
Output format:
; ModuleID = 'my_module'
; Functions: 2, Globals: 1
@counter = external global i32 42
define external i32 @factorial(i32 %arg0) {
entry:
%cmp = cmp_le i8 %arg0, 1
br_cond %cmp, label %base_case, label %recurse
...
}
Improved cleanup flow to prevent dangling pointers and use-after-free issues:
- Backend reset function: New reset callback in anvil_backend_ops_t to clear cached IR pointers
- Safe cleanup order: anvil_ctx_destroy() now resets backend state before destroying modules
- All backends updated: x86, x86-64, ARM64, S/370, S/370-XA, S/390, z/Architecture, PPC32, PPC64, PPC64LE
Three advanced examples demonstrate ANVIL's capabilities for generating linkable libraries:
- examples/fp_math_lib/: Floating-point math library
  - Generates exportable FP functions: fp_add, fp_sub, fp_mul, fp_div, fp_neg, fp_abs
  - Demonstrates ANVIL IR for floating-point operations
  - Includes C test program that links with generated assembly (24 tests)
- examples/dynamic_array/: Dynamic array library with C library calls
  - Demonstrates calling external C functions: malloc, free, memcpy
  - Functions: array_create, array_destroy, array_copy, array_sum, array_max, array_min, array_count_if, array_scale
  - Shows pointer arithmetic, loops, conditionals, and memory management
  - Includes comprehensive test suite (41 tests)
- examples/base64_lib/: Base64 encoding library
  - Demonstrates complex bitwise operations, byte manipulation, and lookup table logic
  - Functions: base64_encode, base64_encoded_len
  - Shows select operations for conditional value computation
  - Includes test suite with RFC 4648 test vectors (28 tests)
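For reference, the logic that the Base64 example expresses in ANVIL IR looks roughly like this in plain C. This is a sketch, not the code in examples/base64_lib/: the `b64_` names are hypothetical, and the ternaries stand in for the select operations mentioned above.

```c
#include <stddef.h>

static const char b64_table[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Encoded length without the NUL: 4 output chars per 3 input bytes, rounded up. */
size_t b64_encoded_len(size_t n)
{
    return 4 * ((n + 2) / 3);
}

/* Encode n bytes of src into dst, which must hold b64_encoded_len(n) + 1 bytes. */
void b64_encode(const char *src, size_t n, char *dst)
{
    size_t i = 0, o = 0;
    while (i < n) {
        size_t left = n - i;
        unsigned b0 = (unsigned char)src[i];
        unsigned b1 = left > 1 ? (unsigned char)src[i + 1] : 0;
        unsigned b2 = left > 2 ? (unsigned char)src[i + 2] : 0;
        unsigned triple = (b0 << 16) | (b1 << 8) | b2;

        dst[o++] = b64_table[(triple >> 18) & 0x3F];
        dst[o++] = b64_table[(triple >> 12) & 0x3F];
        dst[o++] = left > 1 ? b64_table[(triple >> 6) & 0x3F] : '=';  /* pad */
        dst[o++] = left > 2 ? b64_table[triple & 0x3F] : '=';         /* pad */
        i += left < 3 ? left : 3;
    }
    dst[o] = '\0';
}
```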
ANVIL includes a configurable optimization pass infrastructure that can be enabled or disabled.
| Level | Name | Description |
|---|---|---|
| O0 | ANVIL_OPT_NONE | No optimization (default) |
| Og | ANVIL_OPT_DEBUG | Debug-friendly: copy propagation, store-load propagation |
| O1 | ANVIL_OPT_BASIC | Og + constant folding, DCE |
| O2 | ANVIL_OPT_STANDARD | O1 + CFG simplification, strength reduction, memory opts, CSE |
| O3 | ANVIL_OPT_AGGRESSIVE | O2 + loop unrolling |
| Pass | Level | Description |
|---|---|---|
| Constant Folding | O1+ | Evaluates constant expressions at compile time (3 + 5 → 8) |
| Dead Code Elimination (DCE) | O1+ | Removes unused instructions |
| Copy Propagation | Og+ | Replaces uses of copied values with originals |
| Store-Load Propagation | Og+ | Replaces load after store with stored value |
| Strength Reduction | O2+ | Replaces expensive ops with cheaper ones (x * 8 → x << 3) |
| CFG Simplification | O2+ | Merges blocks, removes unreachable code |
| Dead Store Elimination | O2+ | Removes stores overwritten before read |
| Redundant Load Elimination | O2+ | Reuses loaded values from same address |
| Common Subexpression Elimination (CSE) | O2+ | Reuses computed values |
| Loop Unrolling | O3+ | Unrolls small loops with known trip counts (experimental) |
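To show what the constant folding pass does conceptually, here is a minimal sketch over a toy expression tree. ANVIL works on its own IR instructions rather than this structure, so the types and the `fold_constants` name are assumptions made purely for illustration.

```c
#include <stddef.h>

typedef enum { EX_CONST, EX_PARAM, EX_ADD, EX_MUL } ex_kind;

typedef struct ex {
    ex_kind kind;
    long value;            /* valid when kind == EX_CONST */
    struct ex *lhs, *rhs;  /* valid for EX_ADD and EX_MUL */
} ex;

/* Fold constant subtrees in place. Returns 1 if node is now a constant. */
int fold_constants(ex *node)
{
    if (node->kind == EX_CONST) return 1;
    if (node->kind == EX_PARAM) return 0;

    /* Visit both operands (no short-circuit) so each side gets folded. */
    int l = fold_constants(node->lhs);
    int r = fold_constants(node->rhs);
    if (!l || !r) return 0;

    node->value = node->kind == EX_ADD
                      ? node->lhs->value + node->rhs->value
                      : node->lhs->value * node->rhs->value;
    node->kind = EX_CONST;
    node->lhs = node->rhs = NULL;
    return 1;
}
```

Folding 3 + 5 replaces the add node with the constant 8, while an expression that depends on a parameter is left in place with only its constant operands folded.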
#include <anvil/anvil_opt.h>
// Set optimization level
anvil_ctx_set_opt_level(ctx, ANVIL_OPT_STANDARD);
// Optimize module before codegen
anvil_module_optimize(mod);
// Or fine-grained control
anvil_pass_manager_t *pm = anvil_ctx_get_pass_manager(ctx);
anvil_pass_manager_enable(pm, ANVIL_PASS_CONST_FOLD);
anvil_pass_manager_disable(pm, ANVIL_PASS_DCE);
Before optimization (constant folding):
LA R2,3 Load constant 3
AHI R2,5 Add 5
LR R15,R2 Result in R15
After optimization:
LA R15,8 Load constant 8 directly
Before optimization (strength reduction):
LA R3,8 Load constant 8
MSR R2,R3 Multiply (expensive)
After optimization:
LA R3,3 Load shift amount
SLL R2,0(R3) Shift left by 3 (x * 8 = x << 3)
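The multiply-to-shift rewrite above relies on recognizing power-of-two multipliers. A minimal sketch of that check, with hypothetical function names and plain integers in place of IR values:

```c
#include <stdint.h>

/* Returns k if multiplier == 2^k, otherwise -1. */
int mul_shift_amount(uint32_t multiplier)
{
    if (multiplier == 0 || (multiplier & (multiplier - 1)) != 0)
        return -1;  /* zero or more than one bit set: not a power of two */
    int k = 0;
    while ((multiplier >>= 1) != 0)
        k++;
    return k;
}

/* Multiply using a shift when possible, else fall back to a real multiply. */
uint32_t mul_reduced(uint32_t x, uint32_t multiplier)
{
    int k = mul_shift_amount(multiplier);
    return k >= 0 ? x << k : x * multiplier;
}
```

With a multiplier of 8 the helper returns a shift amount of 3, matching the SLL sequence shown above.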
- Binary opcode generation
- ASI/AGSI optimization (Add to Storage Immediate)
- Register allocation improvements
- RISC-V support
- Debug info (DWARF)
- Extend CPU model system to more backends (ARM64, z/Architecture, x86-64)
See DOCUMENTATION.md for complete API reference and detailed usage examples.
Unlicense