Skip to content

cognisoc/cllm

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

CLLM

A bare-metal C unikernel for serving large language models -- no OS, no overhead.

Build License Platform Language

What is CLLM?

CLLM is a Multiboot-compliant unikernel written in C that boots directly on bare metal (or in QEMU) and serves LLM inference over HTTP. It eliminates the operating system layer entirely -- the kernel is the application.

The kernel includes a custom libc subset, PCI bus enumeration, an Intel e1000 NIC driver, an HTTP server with REST API endpoints, and a model loading interface compatible with llama.cpp.

Quick Start

# Prerequisites: gcc (with -m32 support), make, qemu-system-i386
sudo apt-get install gcc gcc-multilib make qemu-system-x86

# Build and run
git clone git@github.com:cognisoc/cllm.git
cd cllm
make run

Serial output appears on your terminal. Press Ctrl-A X to exit QEMU.

Make Targets

Target Description
make Build release kernel (build/kernel.bin)
make debug Build with debug symbols
make run Build and boot in QEMU (serial on stdio)
make run-vga Build and boot in QEMU (VGA window)
make run-debug Build and boot paused for GDB on :1234
make clean Remove build artifacts

Architecture

+-----------------------------------------------------------+
|  QEMU / Bare Metal  (x86, Multiboot)                     |
+-----------------------------------------------------------+
|  boot.S             Multiboot entry, stack, serial init   |
|  kernel.c           Kernel main, VGA terminal, serial I/O |
|  memory.c           Heap allocator (malloc/free)          |
|  string.c           libc subset (snprintf, memcpy, ...)   |
|  network.c          PCI enumeration + e1000 NIC driver    |
|  http.c / api.c     HTTP server, request routing          |
|  api_v1.c           llama.cpp-compatible REST API         |
|  llm.c              Model loading and inference interface |
+-----------------------------------------------------------+

The kernel boots via Multiboot, initializes serial and VGA output, brings up an e1000 network interface via PCI, and enters a packet-processing loop that serves HTTP requests for LLM inference.

Project Structure

src/            C source files (kernel, drivers, HTTP, LLM)
include/        Header files
build/          Build scripts, linker script, artifacts
documentation/  MkDocs documentation site
llama.cpp/      llama.cpp headers for model integration

Roadmap

  • Multiboot kernel with VGA + serial output
  • Custom libc (malloc, snprintf, string ops)
  • PCI enumeration and e1000 NIC driver
  • HTTP server with REST API endpoints
  • llama.cpp-compatible API (v1 endpoints)
  • Integrate llama.cpp inference engine
  • GPU passthrough (CUDA backend)
  • Streaming token generation
  • vLLM optimizations for transformer serving

Documentation

License

See LICENSE for details.