A research prototype that aggregates multiple x86_64 machines into one large NUMA system presented to a guest OS via UEFI/ACPI.
Mission Critical: Build for maximum accessibility
- TCP first, RDMA optional - Works on ANY standard Ethernet hardware (consumer-grade, no special NICs required)
- Zero-config networking - Plug-and-play deployment, automatic transport detection
- Graceful performance scaling - 200-500µs on 10G Ethernet (good), <100µs on RDMA (excellent)
- Cost barrier: $0 - No specialized hardware required to start, optional upgrade path to RDMA
See QUICKSTART.md for a 3-step setup on consumer hardware
Milestones Complete: M0 (VMM Skeleton) ✅ | M1 (Userfaultfd Pager) ✅
Current Phase: M2 (RDMA Transport) 🚧
Test Coverage: 63 tests, 100% pass rate, ~85% coverage
| Component | Status | Tests | Coverage |
|---|---|---|---|
| VMM | ✅ Complete | 4 | ~70% |
| Pager | ✅ Complete | 17 | ~90% |
| RDMA Transport | 🚧 Framework | 13 | ~85% |
| ACPI Generator | 🚧 Framework | 7 | ~80% |
| Coordinator | ✅ Complete | 22 | ~95% |
Hardware: ANY 2+ Linux machines with standard Ethernet (1G, 10G, or better)
- No special RDMA NICs required (optional upgrade for <100µs latency)
- Consumer-grade hardware fully supported (desktops, laptops, cloud VMs)
Software:
- Rust stable toolchain (2021 edition)
- Python 3.10+
- Linux kernel 6.2+ (for KVM and userfaultfd)
- KVM support (check with `lsmod | grep kvm`)
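The software prerequisites above can be checked from a short script before building. This is a minimal sketch and not part of the repository: the function names `kernel_at_least` and `check_prerequisites` are hypothetical, and the sketch assumes that a loaded KVM module exposes the `/dev/kvm` device node.

```python
import os
import platform
import re


def kernel_at_least(release: str, minimum: tuple[int, int]) -> bool:
    """Return True if a release string such as '6.5.0-21-generic'
    is at least the given (major, minor) version."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        return False
    return (int(match.group(1)), int(match.group(2))) >= minimum


def check_prerequisites() -> dict[str, bool]:
    """Collect simple yes/no checks for the current host."""
    return {
        "kernel >= 6.2": kernel_at_least(platform.release(), (6, 2)),
        # Assumption: /dev/kvm exists whenever KVM is usable on this host.
        "kvm device present": os.path.exists("/dev/kvm"),
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```

The script only reports what it finds. It does not check the Rust or Python toolchain versions.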
# Build all components
cargo build --workspace --release
# Run all tests
./dev.sh test
# Or use individual commands
cargo test --workspace # Rust tests
cd coordinator && pytest -v # Python tests
# Start the coordinator (control plane)
./dev.sh start
# API docs: http://localhost:8000/docs
# Run VMM (single node)
cargo run --bin vmm -- --node-id 0 --total-nodes 1
# Generate ACPI tables
cargo run --bin acpi-gen -- cluster-config.yaml
- ARCHITECTURE.md - Core design philosophy: TCP-first accessibility
- QUICKSTART.md - 3-step setup for consumer hardware
- DEVELOPMENT.md - Development guide and architecture
- STATUS.md - Detailed milestone tracking
- TEST_COVERAGE.md - Test inventory and coverage
- TDD_SUMMARY.md - TDD methodology and implementation
- TESTING.md - Quick testing reference
- docs/01_problem_statement.md - Problem statement & scope
- docs/02_system_requirements.md - Requirements & milestones
VMM (Virtual Machine Monitor)
- KVM integration for VM lifecycle management
- Guest physical memory allocation and mapping
- vCPU creation and management
- Located in `vmm/`
Pager (Distributed Memory Manager)
- Userfaultfd-based fault handling
- First-touch page allocation policy
- Page directory for ownership tracking
- Statistics collection (latency, fault rate)
- Located in `pager/`
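The first-touch policy and the page directory can be illustrated with a small model. This is a sketch for intuition only: the actual pager is written in Rust and driven by userfaultfd, and the names `PageDirectory`, `handle_fault`, and the 4 KiB `PAGE_SIZE` are assumptions made for this example.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4096  # assumed 4 KiB base pages


@dataclass
class PageDirectory:
    """Maps guest page numbers to the node that owns them."""

    owners: dict[int, int] = field(default_factory=dict)
    local_faults: int = 0
    remote_faults: int = 0

    def handle_fault(self, guest_addr: int, faulting_node: int) -> int:
        """Resolve a fault and return the owning node id.

        First touch: a page with no owner is assigned to the node
        that faulted on it; later faults see the recorded owner.
        """
        page = guest_addr // PAGE_SIZE
        owner = self.owners.setdefault(page, faulting_node)
        if owner == faulting_node:
            self.local_faults += 1
        else:
            self.remote_faults += 1
        return owner


directory = PageDirectory()
directory.handle_fault(0x1000, faulting_node=0)  # node 0 touches first and becomes owner
directory.handle_fault(0x1800, faulting_node=1)  # same page, so node 1 sees a remote fault
```

The two counters stand in for the fault-rate statistics. A real pager would also record the latency of each resolution.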
Transport Layer (rdma-transport/)
- TCP transport (default) - Works on ANY network hardware
- RDMA transport (optional) - High-performance upgrade path
- Auto-detection and graceful fallback
- Page fetch/send API with <500µs latency on consumer hardware
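Auto-detection with fallback to TCP might look like the sketch below. It is an assumption about one possible approach, not the project's implementation: it treats any entry under `/sys/class/infiniband` (where Linux lists RDMA-capable devices) as a hint that RDMA is available, and the names `Transport` and `detect_transport` are hypothetical.

```python
import os
from enum import Enum


class Transport(Enum):
    TCP = "tcp"
    RDMA = "rdma"


def detect_transport(sysfs_root: str = "/sys/class/infiniband") -> Transport:
    """Pick RDMA when at least one RDMA device is listed, else TCP."""
    try:
        devices = os.listdir(sysfs_root)
    except OSError:
        # Directory missing or unreadable: no RDMA stack, so use TCP.
        devices = []
    return Transport.RDMA if devices else Transport.TCP
```

A listed device does not guarantee a working link, so a real implementation would likely still fall back to TCP if the RDMA connection setup fails.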
ACPI Generator
- NUMA topology table generation (SRAT, SLIT, HMAT)
- Cluster configuration support
- Located in `acpi-gen/`
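To make the SLIT part of the topology concrete, the sketch below builds a distance matrix for a cluster in which every remote node is equally far away. The function names and the remote distance of 20 are assumptions for illustration. The ACPI specification fixes the local distance at 10, reserves 0-9, and uses 255 for "unreachable". Only the entry payload is produced here, not the table header or checksum.

```python
LOCAL_DISTANCE = 10   # defined by ACPI: distance from a node to itself
REMOTE_DISTANCE = 20  # assumed placeholder; real values should reflect measured latency


def build_slit(num_nodes: int, remote: int = REMOTE_DISTANCE) -> list[list[int]]:
    """Return a num_nodes x num_nodes matrix of relative distances."""
    if num_nodes < 1:
        raise ValueError("need at least one node")
    if not LOCAL_DISTANCE < remote < 255:
        raise ValueError("remote distance must be in 11..254")
    return [
        [LOCAL_DISTANCE if i == j else remote for j in range(num_nodes)]
        for i in range(num_nodes)
    ]


def flatten_slit(matrix: list[list[int]]) -> bytes:
    """Serialize entries row by row, one byte per entry."""
    return bytes(distance for row in matrix for distance in row)
```

For two nodes, `build_slit(2)` yields `[[10, 20], [20, 10]]`, which a guest would read as one local and one remote locality per node.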
Coordinator (Control Plane)
- FastAPI REST API for cluster management
- Node join/leave orchestration
- Metrics exposition
- Located in `coordinator/`
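The state behind node join/leave orchestration can be modelled without any web framework. The sketch below is not the coordinator's FastAPI code. `ClusterState` and its methods are hypothetical names, and the `total_nodes` field simply mirrors the `--total-nodes` flag of the VMM command line.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterState:
    """In-memory membership table keyed by node id."""

    total_nodes: int
    members: dict[int, str] = field(default_factory=dict)

    def join(self, node_id: int, address: str) -> None:
        """Register a node, rejecting out-of-range or duplicate ids."""
        if not 0 <= node_id < self.total_nodes:
            raise ValueError(f"node id {node_id} outside 0..{self.total_nodes - 1}")
        if node_id in self.members:
            raise ValueError(f"node {node_id} already joined")
        self.members[node_id] = address

    def leave(self, node_id: int) -> None:
        """Remove a node; removing an unknown node is a no-op."""
        self.members.pop(node_id, None)

    @property
    def ready(self) -> bool:
        """True once every expected node has joined."""
        return len(self.members) == self.total_nodes
```

A REST layer would wrap `join` and `leave` in endpoints and map `ValueError` to a client error response.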
All components have comprehensive test coverage following TDD principles:
# Run all tests
./dev.sh test
# Run specific component tests
cargo test -p pager
cargo test -p rdma-transport
pytest coordinator/test_coordinator.py
# Generate coverage report
./dev.sh coverage
# Watch mode for TDD
./dev.sh watch
Test Summary:
- ✅ 41 Rust unit tests across 4 components
- ✅ 22 Python tests for REST API
- ✅ 100% pass rate
- ✅ ~85% code coverage
Use the included dev.sh script for common tasks:
./dev.sh help # Show all commands
./dev.sh build # Build all components
./dev.sh test # Run all tests
./dev.sh lint # Run linters
./dev.sh fix # Auto-fix issues
./dev.sh watch # Watch mode for TDD
./dev.sh coverage # Generate coverage report
./dev.sh stats # Show project statistics
The project includes .vscode/tasks.json for common development tasks:
- Ctrl+Shift+B - Build All
- Ctrl+Shift+T - Test All
- Access via Terminal > Run Task...
M2: RDMA Transport (Current)
- Implement actual RDMA operations using ibverbs
- Target: <100µs median latency, <500µs p99
M3: Two-Node Bring-Up
- Integrate coordinator with VMMs
- Cross-node page fault resolution
- Boot Linux guest spanning 2 nodes
M4: ACPI NUMA
- Binary ACPI table encoding
- OVMF firmware integration
- Guest NUMA topology recognition
M5: Windows Boot
- Windows-compatible ACPI tables
- VirtIO device support
- Windows guest testing
M6: Telemetry & Placement
- Page heat tracking
- Migration policies (LRU, affinity-based)
- Prometheus metrics
M7: Hardening
- Huge page support
- Failure recovery
- Performance optimization
| Metric | Target | Status |
|---|---|---|
| Remote fault latency (median) | <100µs | 🚧 M2 |
| Remote fault latency (p99) | <500µs | 🚧 M2 |
| Remote miss ratio | <5% | 🚧 M6 |
| RDMA bandwidth | >10 GB/s | 🚧 M2 |
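The two latency targets in the table can be checked against a list of measured fault latencies. The sketch below assumes samples are recorded in microseconds and uses the nearest-rank definition of a percentile. The helper names are hypothetical and the project may compute its statistics differently.

```python
import statistics

TARGET_MEDIAN_US = 100.0  # from the table: median remote fault latency
TARGET_P99_US = 500.0     # from the table: p99 remote fault latency


def percentile(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile for an integer pct in 1..100."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100 avoids floating-point rounding.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def meets_latency_targets(samples_us: list[float]) -> bool:
    """True if both the median and the p99 are under their targets."""
    return (
        statistics.median(samples_us) < TARGET_MEDIAN_US
        and percentile(samples_us, 99) < TARGET_P99_US
    )
```

With 100 samples the p99 is the 99th smallest value, so a single slow outlier does not fail the check but two samples at or above 500µs do.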
This is a research prototype. Contributions welcome!
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing`)
- Follow TDD principles (write tests first)
- Ensure all tests pass (`./dev.sh test`)
- Run linters (`./dev.sh lint`)
- Submit a pull request
Apache-2.0 (see LICENSE)