A bare-metal C unikernel for serving large language models -- no OS, no overhead.
CLLM is a Multiboot-compliant unikernel written in C that boots directly on bare metal (or in QEMU) and serves LLM inference over HTTP. It eliminates the operating system layer entirely -- the kernel is the application.
The kernel includes a custom libc subset, PCI bus enumeration, an Intel e1000 NIC driver, an HTTP server with REST API endpoints, and a model loading interface compatible with llama.cpp.
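To make "Multiboot-compliant" concrete: a Multiboot 1 loader scans the first 8192 bytes of the image for a 4-byte-aligned header of three 32-bit words (magic, flags, checksum) that must sum to zero modulo 2^32. The sketch below models that rule in host-testable C. The struct and function names are illustrative assumptions, not taken from this repository, where the header lives in `boot.S`.

```c
#include <stdint.h>

/* Constants defined by the Multiboot 1 specification. */
#define MULTIBOOT_HEADER_MAGIC 0x1BADB002u /* marks the header in the image */
#define MULTIBOOT_PAGE_ALIGN   (1u << 0)   /* align loaded modules on 4 KiB */
#define MULTIBOOT_MEMORY_INFO  (1u << 1)   /* ask the loader for a memory map */

/* Hypothetical C view of the three mandatory header words. */
struct multiboot_header {
    uint32_t magic;
    uint32_t flags;
    uint32_t checksum;
};

/* The checksum is whatever makes magic + flags + checksum wrap to 0. */
uint32_t multiboot_checksum(uint32_t flags) {
    return (uint32_t)0 - (MULTIBOOT_HEADER_MAGIC + flags);
}

/* Build a header for the given flags (illustrative helper). */
struct multiboot_header multiboot_make_header(uint32_t flags) {
    struct multiboot_header h;
    h.magic = MULTIBOOT_HEADER_MAGIC;
    h.flags = flags;
    h.checksum = multiboot_checksum(flags);
    return h;
}

/* Returns 1 if a loader would accept these three words, 0 otherwise. */
int multiboot_header_valid(const struct multiboot_header *h) {
    return h->magic == MULTIBOOT_HEADER_MAGIC
        && (uint32_t)(h->magic + h->flags + h->checksum) == 0u;
}
```

With `MULTIBOOT_PAGE_ALIGN | MULTIBOOT_MEMORY_INFO` as flags (a common choice, assumed here rather than confirmed for CLLM), the checksum word works out to `0xE4524FFB`.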
# Prerequisites: gcc (with -m32 support), make, qemu-system-i386
sudo apt-get install gcc gcc-multilib make qemu-system-x86
# Build and run
git clone git@github.com:cognisoc/cllm.git
cd cllm
make run
Serial output appears on your terminal. Press Ctrl-A X to exit QEMU.
| Target | Description |
|---|---|
| `make` | Build release kernel (`build/kernel.bin`) |
| `make debug` | Build with debug symbols |
| `make run` | Build and boot in QEMU (serial on stdio) |
| `make run-vga` | Build and boot in QEMU (VGA window) |
| `make run-debug` | Build and boot paused for GDB on :1234 |
| `make clean` | Remove build artifacts |
+-----------------------------------------------------------+
| QEMU / Bare Metal (x86, Multiboot)                        |
+-----------------------------------------------------------+
| boot.S           Multiboot entry, stack, serial init      |
| kernel.c         Kernel main, VGA terminal, serial I/O    |
| memory.c         Heap allocator (malloc/free)             |
| string.c         libc subset (snprintf, memcpy, ...)      |
| network.c        PCI enumeration + e1000 NIC driver       |
| http.c / api.c   HTTP server, request routing             |
| api_v1.c         llama.cpp-compatible REST API            |
| llm.c            Model loading and inference interface    |
+-----------------------------------------------------------+
The kernel boots via Multiboot, initializes serial and VGA output, brings up an e1000 network interface via PCI, and enters a packet-processing loop that serves HTTP requests for LLM inference.
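The PCI step in that boot sequence can be illustrated with the legacy x86 configuration mechanism: the kernel writes a 32-bit address to I/O port `0xCF8` and reads the selected register from port `0xCFC`. The sketch below covers only the pure parts (building the address word and recognising the NIC from register 0) so it compiles and runs on a host. The function names are assumptions for illustration, and the device ID `0x100E` is the 82540EM that QEMU emulates as its default `e1000`; real hardware may report a different e1000-family ID.

```c
#include <stdint.h>

#define PCI_CONFIG_ADDRESS 0xCF8u  /* I/O port: select bus/device/function/register */
#define PCI_CONFIG_DATA    0xCFCu  /* I/O port: read or write the selected register */
#define PCI_VENDOR_INTEL   0x8086u
#define PCI_DEVICE_82540EM 0x100Eu /* QEMU's default e1000 model */

/* Build the value written to PCI_CONFIG_ADDRESS.
 * Bit 31 enables the access; the register offset is dword-aligned. */
uint32_t pci_config_address(uint8_t bus, uint8_t device,
                            uint8_t function, uint8_t offset) {
    return 0x80000000u
         | ((uint32_t)bus << 16)
         | ((uint32_t)(device & 0x1Fu) << 11)
         | ((uint32_t)(function & 0x07u) << 8)
         | (uint32_t)(offset & 0xFCu);
}

/* Register 0 holds the vendor ID in bits 0-15 and the device ID in bits 16-31.
 * An empty slot reads back as all ones, which fails the vendor comparison. */
int pci_is_e1000(uint32_t id_register) {
    uint16_t vendor = (uint16_t)(id_register & 0xFFFFu);
    uint16_t device = (uint16_t)(id_register >> 16);
    return vendor == PCI_VENDOR_INTEL && device == PCI_DEVICE_82540EM;
}
```

In a kernel, an enumeration loop would call `pci_config_address` for each bus, device, and function, issue the port write and read with `outl`/`inl`-style helpers, and pass the result to a check like `pci_is_e1000`. Those port helpers are omitted here because they need privileged I/O instructions.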
src/            C source files (kernel, drivers, HTTP, LLM)
include/        Header files
build/          Build scripts, linker script, artifacts
documentation/  MkDocs documentation site
llama.cpp/      llama.cpp headers for model integration
- Multiboot kernel with VGA + serial output
- Custom libc (malloc, snprintf, string ops)
- PCI enumeration and e1000 NIC driver
- HTTP server with REST API endpoints
- llama.cpp-compatible API (v1 endpoints)
- Integrate llama.cpp inference engine
- GPU passthrough (CUDA backend)
- Streaming token generation
- vLLM optimizations for transformer serving
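As a rough picture of the "HTTP server with REST API endpoints" item above, request routing can be as small as splitting the request line into method and path and looking the pair up in a table. This is a minimal sketch, not the code in `http.c` or `api.c`: the route table, the enum, and the two paths (`/health`, `/v1/completions`) are hypothetical examples. It uses only `strchr`, `strlen`, and `strncmp`, which a freestanding kernel would have to supply from its own libc subset.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical route identifiers for illustration. */
enum route {
    ROUTE_HEALTH,
    ROUTE_COMPLETIONS,
    ROUTE_NOT_FOUND,
    ROUTE_BAD_REQUEST
};

struct route_entry {
    const char *method;
    const char *path;
    enum route  id;
};

/* Assumed endpoints; the real API surface may differ. */
static const struct route_entry ROUTES[] = {
    { "GET",  "/health",         ROUTE_HEALTH },
    { "POST", "/v1/completions", ROUTE_COMPLETIONS },
};

/* Match "<METHOD> <PATH> <VERSION>" against the table.
 * Query strings and header parsing are deliberately left out. */
enum route route_request(const char *request_line) {
    const char *sp1 = strchr(request_line, ' ');
    if (sp1 == NULL) return ROUTE_BAD_REQUEST;

    const char *path = sp1 + 1;
    const char *sp2 = strchr(path, ' ');
    if (sp2 == NULL) return ROUTE_BAD_REQUEST;

    size_t method_len = (size_t)(sp1 - request_line);
    size_t path_len   = (size_t)(sp2 - path);

    for (size_t i = 0; i < sizeof ROUTES / sizeof ROUTES[0]; i++) {
        const struct route_entry *r = &ROUTES[i];
        if (strlen(r->method) == method_len &&
            strncmp(r->method, request_line, method_len) == 0 &&
            strlen(r->path) == path_len &&
            strncmp(r->path, path, path_len) == 0) {
            return r->id;
        }
    }
    return ROUTE_NOT_FOUND;
}
```

A table keeps adding an endpoint to a one-line change, and a linear scan is adequate for the handful of routes a single-purpose kernel exposes.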
- Architecture Overview
- Getting Started
- Project Specification
- GPU Backend Analysis
- llama.cpp Integration
- HTTP Server Design
See LICENSE for details.