Unified GPU + model platform for Kubernetes.
KPilot is a control plane for running GPU workloads on Kubernetes. Cluster operations, Volcano-based batch scheduling, vGPU governance, hardware telemetry, plugin lifecycle, and model serving live behind one console with a consistent permission and audit surface.
Multi-cluster is the default — a single KPilot Server manages many clusters, with the in-cluster agent dialing back over gRPC. No inbound ports on the cluster side, no shared kubeconfigs, no per-cloud divergence.
Server owns the UI, API, and durable state (cluster registry, plugin metadata, accounts) but holds no kubeconfigs. Worker runs inside each managed cluster, dials the Server over a single long-lived gRPC stream, and brokers every Kubernetes operation on its behalf — no inbound ports, no shared credentials, no cross-cloud divergence. Plugins ship as Helm charts and reconcile via an in-cluster CRD, executing in the cluster's own RBAC context.
Install the Server (control-plane cluster):
helm install kpilot oci://ghcr.io/togettoyou/charts/kpilot \
--version 0.0.0-dev \
--namespace kpilot-system --create-namespace \
--set server.admin.password='<change-me>'Port-forward the UI and log in with kpilot / <your password>:
kubectl -n kpilot-system port-forward svc/kpilot-server 8080:80
open http://localhost:8080Install the Worker (each managed cluster). Create a cluster row in the UI, copy the one-time ClusterToken, then:
helm install kpilot-worker oci://ghcr.io/togettoyou/charts/kpilot \
--version 0.0.0-dev \
--namespace kpilot-system --create-namespace \
--set server.enabled=false,worker.enabled=true,postgresql.enabled=false \
--set worker.serverAddr='kpilot-server-grpc.kpilot-system.svc:9090' \
--set worker.clusterToken='<paste-token>'The cluster row in the Server UI transitions to Online within a few seconds. Production exposure (Ingress, external Postgres, image registry mirrors) is covered in deploy/README.md.
- Multi-cluster GPU operations — run a single platform team across clusters in different VPCs, regions, or clouds without touching network policies.
- Shared GPU tenancy — partition each card into vGPU slices and govern allocation through Volcano queues with explicit capability / guarantee / deserved policies.
- GPU usage metering — produce GPU-Hour reports per node and per card straight from DCGM, then drill into hotspots from the same UI.
- Self-service AI platform (roadmap) — let teams deploy inference endpoints from a model catalog and run distributed fine-tuning without writing YAML.
Cluster Management
|
Compute Scheduling
|
GPU Observability
|
Plugin Management
|
Cluster Management — docs/clusters.md
Compute Scheduling — docs/compute.md
Plugin Management — docs/plugins.md
Coming in upcoming releases:
- Model repository with curated vLLM templates for Qwen, DeepSeek, Llama, and other open-weights families
- One-click inference deployment with a built-in chat playground
- OpenAI-compatible routing with canary and A/B controls
- Distributed fine-tuning on Volcano gang scheduling