
Donovoi/ssi-hv-starter


SSI-HV (Single‑System‑Image Hypervisor)


A research prototype that aggregates multiple x86_64 machines into one large NUMA system presented to a guest OS via UEFI/ACPI.

🎯 Core Design Principles

Mission Critical: Build for maximum accessibility

  • TCP first, RDMA optional - Works on ANY standard Ethernet hardware (consumer-grade, no special NICs required)
  • Zero-config networking - Plug-and-play deployment, automatic transport detection
  • Graceful performance scaling - 200-500µs on 10G Ethernet (good), <100µs on RDMA (excellent)
  • Cost barrier: $0 - No specialized hardware required to start, optional upgrade path to RDMA

👉 See QUICKSTART.md for 3-step setup on consumer hardware

🎯 Project Status

Milestones Complete: M0 (VMM Skeleton) ✅ | M1 (Userfaultfd Pager) ✅
Current Phase: M2 (RDMA Transport) 🚧
Test Coverage: 63 tests, 100% pass rate, ~85% coverage

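| Component      | Status       | Tests | Coverage |
|----------------|--------------|-------|----------|
| VMM            | ✅ Complete  | 4     | ~70%     |
| Pager          | ✅ Complete  | 17    | ~90%     |
| RDMA Transport | 🚧 Framework | 13    | ~85%     |
| ACPI Generator | 🚧 Framework | 7     | ~80%     |
| Coordinator    | ✅ Complete  | 22    | ~95%     |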

🚀 Quick Start

Prerequisites

Hardware: ANY 2+ Linux machines with standard Ethernet (1G, 10G, or better)

  • No special RDMA NICs required (optional upgrade for <100µs latency)
  • Consumer-grade hardware fully supported (desktops, laptops, cloud VMs)

Software:

  • Rust stable toolchain (2021 edition)
  • Python 3.10+
  • Linux kernel 6.2+ (for KVM and userfaultfd)
  • KVM support (check with lsmod | grep kvm)
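
The software prerequisites above can be checked programmatically before building. The following is a minimal sketch, not part of the repository: the function names `parse_kernel_version` and `check_prerequisites` are hypothetical, and the check for `/dev/kvm` assumes the usual Linux behaviour where that device node is present when KVM is available.

```python
from __future__ import annotations

import os
import platform
import re
import sys


def parse_kernel_version(release: str) -> tuple[int, int]:
    """Extract (major, minor) from a release string such as '6.5.0-41-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def check_prerequisites(release: str | None = None) -> dict[str, bool]:
    """Return a pass/fail flag for each software prerequisite (hypothetical helper)."""
    if release is None:
        release = platform.release()
    return {
        "python>=3.10": sys.version_info >= (3, 10),
        "kernel>=6.2": parse_kernel_version(release) >= (6, 2),
        # Assumption: /dev/kvm is present when the kvm module is loaded.
        "kvm": os.path.exists("/dev/kvm"),
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{'ok  ' if ok else 'FAIL'} {name}")
```

The Rust toolchain is not covered here; `cargo --version` remains the simplest check for it.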

Build & Test

# Build all components
cargo build --workspace --release

# Run all tests
./dev.sh test

# Or use individual commands
cargo test --workspace              # Rust tests
cd coordinator && pytest -v         # Python tests

Run Components

# Start the coordinator (control plane)
./dev.sh start
# API docs: http://localhost:8000/docs

# Run VMM (single node)
cargo run --bin vmm -- --node-id 0 --total-nodes 1

# Generate ACPI tables
cargo run --bin acpi-gen -- cluster-config.yaml

📚 Documentation

🏗️ Architecture

Components

VMM (Virtual Machine Monitor)

  • KVM integration for VM lifecycle management
  • Guest physical memory allocation and mapping
  • vCPU creation and management
  • Located in vmm/

Pager (Distributed Memory Manager)

  • Userfaultfd-based fault handling
  • First-touch page allocation policy
  • Page directory for ownership tracking
  • Statistics collection (latency, fault rate)
  • Located in pager/
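
To illustrate how a first-touch policy and an ownership directory fit together, here is a small Python model. It is a sketch only: the real pager is written in Rust and driven by userfaultfd, and the names `PageDirectory` and `handle_fault`, as well as the 4 KiB page size, are assumptions made for this example.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4096  # assumed 4 KiB base pages


@dataclass
class PageDirectory:
    """Maps guest page numbers to the node that owns them (illustrative only)."""

    owners: dict = field(default_factory=dict)  # page number -> owning node id
    local_faults: int = 0
    remote_faults: int = 0

    def handle_fault(self, guest_addr: int, faulting_node: int) -> int:
        """Resolve a fault on guest_addr and return the owning node id."""
        page = guest_addr // PAGE_SIZE
        owner = self.owners.get(page)
        if owner is None:
            # First touch: the node that faults first becomes the owner.
            self.owners[page] = faulting_node
            owner = faulting_node
        if owner == faulting_node:
            self.local_faults += 1
        else:
            # In a real system this is where a page fetch over the transport would happen.
            self.remote_faults += 1
        return owner


directory = PageDirectory()
directory.handle_fault(0x1000, faulting_node=0)  # first touch: node 0 owns page 1
directory.handle_fault(0x1008, faulting_node=1)  # same page, so remote for node 1
```

The two counters stand in for the fault-rate statistics mentioned above; latency tracking is omitted.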

Transport Layer (rdma-transport/)

  • TCP transport (default) - Works on ANY network hardware
  • RDMA transport (optional) - High-performance upgrade path
  • Auto-detection and graceful fallback
  • Page fetch/send API with <500µs latency on consumer hardware
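
The auto-detection and fallback idea can be sketched as follows. This is not the crate's actual logic: it assumes that the presence of entries under `/sys/class/infiniband` (where the Linux kernel lists RDMA devices) is a good enough signal, and `Transport`, `rdma_devices` and `select_transport` are hypothetical names.

```python
import os
from enum import Enum


class Transport(Enum):
    TCP = "tcp"
    RDMA = "rdma"


def rdma_devices(sysfs_root: str = "/sys/class/infiniband") -> list:
    """List RDMA devices exposed through sysfs, or [] if the directory is missing."""
    try:
        return sorted(os.listdir(sysfs_root))
    except OSError:
        return []


def select_transport(prefer_rdma: bool = True,
                     sysfs_root: str = "/sys/class/infiniband") -> Transport:
    """Pick RDMA only when a device is visible; otherwise fall back to TCP."""
    if prefer_rdma and rdma_devices(sysfs_root):
        return Transport.RDMA
    return Transport.TCP


if __name__ == "__main__":
    print(f"selected transport: {select_transport().value}")
```

Making TCP the unconditional fallback keeps the zero-config property: a machine without RDMA hardware needs no flags to start.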

ACPI Generator

  • NUMA topology table generation (SRAT, SLIT, HMAT)
  • Cluster configuration support
  • Located in acpi-gen/
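
As an illustration of what the SLIT carries, the sketch below builds the node-distance matrix for a cluster. ACPI fixes the distance from a node to itself at 10; the remote value of 20 is an assumption for this example, and `build_slit` and `slit_bytes` are hypothetical names rather than functions from `acpi-gen/`. Only the matrix is modelled, not the ACPI table header.

```python
LOCAL_DISTANCE = 10  # ACPI defines 10 as the distance from a node to itself


def build_slit(num_nodes: int, remote_distance: int = 20) -> list:
    """Return an N x N distance matrix with a uniform remote distance."""
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    if not LOCAL_DISTANCE < remote_distance <= 254:
        # 255 means "unreachable" in the SLIT, so it is excluded here.
        raise ValueError("remote_distance must be in the range 11..254")
    return [
        [LOCAL_DISTANCE if row == col else remote_distance for col in range(num_nodes)]
        for row in range(num_nodes)
    ]


def slit_bytes(matrix: list) -> bytes:
    """Flatten the matrix row by row, one byte per entry."""
    return bytes(distance for row in matrix for distance in row)


if __name__ == "__main__":
    for row in build_slit(2):
        print(row)
```

A uniform remote distance is the simplest choice; a cluster with mixed TCP and RDMA links would presumably want per-pair values.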

Coordinator (Control Plane)

  • FastAPI REST API for cluster management
  • Node join/leave orchestration
  • Metrics exposition
  • Located in coordinator/
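
The join/leave bookkeeping behind such an API might look like the model below. It deliberately avoids FastAPI and uses only the standard library, because the coordinator's actual endpoints and data model are not shown here; `ClusterState` and its methods are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterState:
    """In-memory cluster membership (illustrative; not the coordinator's real model)."""

    total_nodes: int
    members: dict = field(default_factory=dict)  # node id -> network address

    def join(self, node_id: int, address: str) -> None:
        if not 0 <= node_id < self.total_nodes:
            raise ValueError(f"node id {node_id} is outside 0..{self.total_nodes - 1}")
        if node_id in self.members:
            raise ValueError(f"node {node_id} has already joined")
        self.members[node_id] = address

    def leave(self, node_id: int) -> None:
        if node_id not in self.members:
            raise KeyError(f"node {node_id} is not a member")
        del self.members[node_id]

    def is_complete(self) -> bool:
        """True once every expected node has joined."""
        return len(self.members) == self.total_nodes

    def metrics(self) -> dict:
        return {"joined": len(self.members), "expected": self.total_nodes}


cluster = ClusterState(total_nodes=2)
cluster.join(0, "10.0.0.1:7000")  # example addresses, not project defaults
cluster.join(1, "10.0.0.2:7000")
```

In a REST front end, each method would map naturally onto one handler, with the exceptions translated into 4xx responses.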

🧪 Testing

Every component has unit tests, and the project follows TDD principles:

# Run all tests
./dev.sh test

# Run specific component tests
cargo test -p pager
cargo test -p rdma-transport
pytest coordinator/test_coordinator.py

# Generate coverage report
./dev.sh coverage

# Watch mode for TDD
./dev.sh watch

Test Summary:

  • ✅ 41 Rust unit tests across 4 components
  • ✅ 22 Python tests for REST API
  • ✅ 100% pass rate
  • ✅ ~85% code coverage

🔧 Development

Helper Script

Use the included dev.sh script for common tasks:

./dev.sh help          # Show all commands
./dev.sh build         # Build all components
./dev.sh test          # Run all tests
./dev.sh lint          # Run linters
./dev.sh fix           # Auto-fix issues
./dev.sh watch         # Watch mode for TDD
./dev.sh coverage      # Generate coverage report
./dev.sh stats         # Show project statistics

VS Code Integration

The project includes .vscode/tasks.json for common development tasks:

  • Ctrl+Shift+B - Build All
  • Ctrl+Shift+T - Test All
  • Access via Terminal > Run Task...

Next Steps (M2-M7)

M2: RDMA Transport (Current)

  • Implement actual RDMA operations using ibverbs
  • Target: <100µs median latency, <500µs p99

M3: Two-Node Bring-Up

  • Integrate coordinator with VMMs
  • Cross-node page fault resolution
  • Boot Linux guest spanning 2 nodes

M4: ACPI NUMA

  • Binary ACPI table encoding
  • OVMF firmware integration
  • Guest NUMA topology recognition

M5: Windows Boot

  • Windows-compatible ACPI tables
  • VirtIO device support
  • Windows guest testing

M6: Telemetry & Placement

  • Page heat tracking
  • Migration policies (LRU, affinity-based)
  • Prometheus metrics

M7: Hardening

  • Huge page support
  • Failure recovery
  • Performance optimization

📊 Performance Targets

| Metric                        | Target   | Status |
|-------------------------------|----------|--------|
| Remote fault latency (median) | <100µs   | 🚧 M2  |
| Remote fault latency (p99)    | <500µs   | 🚧 M2  |
| Remote miss ratio             | <5%      | 🚧 M6  |
| RDMA bandwidth                | >10 GB/s | 🚧 M2  |
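
The two latency targets can be checked against a set of measured fault latencies as sketched below. The helper names are hypothetical, the p99 uses the nearest-rank method (one of several common percentile definitions), and the sample values are made up for illustration rather than measured.

```python
import math
import statistics


def percentile(samples: list, pct: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def meets_latency_targets(latencies_us: list,
                          median_target_us: float = 100.0,
                          p99_target_us: float = 500.0) -> dict:
    """Compare median and p99 latency (in microseconds) with the targets above."""
    median = statistics.median(latencies_us)
    p99 = percentile(latencies_us, 99)
    return {
        "median_us": median,
        "p99_us": p99,
        "ok": median < median_target_us and p99 < p99_target_us,
    }


if __name__ == "__main__":
    samples = [80.0] * 98 + [90.0, 450.0]  # illustrative values only
    print(meets_latency_targets(samples))
```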

🀝 Contributing

This is a research prototype. Contributions welcome!

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing)
  3. Follow TDD principles (write tests first)
  4. Ensure all tests pass (./dev.sh test)
  5. Run linters (./dev.sh lint)
  6. Submit a pull request

📝 License

Apache-2.0 (see LICENSE)

🔗 References
