Pure Rust Inference Engine
Updated Jun 13, 2026 · Rust
One-command vLLM installation for NVIDIA DGX Spark with Blackwell GB10 GPUs (sm_121 architecture)
Serve the home! An inference stack for your NVIDIA DGX Spark, aka the Grace Blackwell AI supercomputer on your desk. Mostly vLLM-based for now, and single-Spark only. For the not-so-rich buddies. For the latest in-testing work, look at the branches.
DGX Spark / GB10 vLLM Docker stack for large-model serving, presets, patches, and validation notes.
Local diagnostic CLI for NVIDIA DGX Spark (GB10). Detects power caps, unified memory pressure, thermal risk, Docker/runtime issues, and validates vLLM/Ollama/llama.cpp/SGLang recipes.
Headless 4K remote desktop for the NVIDIA DGX Spark (GB10): one-command installer for Sunshine + Moonlight low-latency game streaming with NVENC hardware encoding, a software virtual display (no HDMI dummy plug), GDM autologin, and optional Tailscale.
llama.cpp fork optimized for NVIDIA DGX Spark / GB10 (Blackwell, SM 12.1) — TurboQuant weights + KV, NVFP4, DFlash MTP
Operator-grade GPU monitor for NVIDIA GPUs with native GB10 / DGX Spark coherent UMA support — PSI pressure, clock detection, ConnectX-7 network layer
Single-file web UI for NVIDIA DGX Spark — pull Ollama models, browse and download from HuggingFace, manage LiteLLM routing, and control SGLang, vLLM, llama.cpp, LocalAI, and ComfyUI. All from one browser tab.
Turn any NVIDIA GPU into a local AI platform. Inference + fine-tuning in your browser. One command to start, automatic clustering.
SGLang optimizations for NVIDIA Spark (GB10) — SM121 Grace Blackwell
7.67× LoRA / 8.35× Full FT speedup for Qwen3.5 (0.8B–27B) on NVIDIA DGX Spark — wall-clock parity with rented H100. Lossless within BF16. Three-command interactive wizard handles model picker, data validator, training, and merge.
NVIDIA DGX Spark workstation — self-hosted LLM stack (vLLM, llama.cpp, Ollama + Open WebUI) behind Traefik, with Cloudflare Tunnel + Tailscale ingress and Netdata observability.
Enhanced GPU throttle diagnostic for DGX Spark (GB10): NVML direct telemetry, throttle cause decoder, PCIe link monitoring, baseline drift detection, timeline capture.
GPU/CUDA-accelerated voice control stack for Home Assistant. Runs on x86/x64 and ARM64 (including the NVIDIA DGX Spark). 100% Local - No Cloud, No Subscriptions.