Single-binary Go pod orchestrator for GPU hosts. Accepts Kubernetes manifests and runs workloads via Podman on a single node. Connects to NATS for messaging.
- Kubernetes manifest support -- Pod, Job, CronJob, Deployment, StatefulSet
- NATS messaging -- apply, delete, get, list pods over request/reply subjects
- HTTP REST API -- health checks, resource queries, and pod CRUD (v1.2.0)
- Resource-aware scheduling -- CPU, memory, and GPU tracking with allocatable limits
- Priority-based preemption -- lower-priority pods evicted to make room for higher-priority work
- SQLite state persistence -- WAL-mode database for crash recovery (v1.1.0)
- Pod re-discovery -- recovers running pods from Podman after restart (v1.1.0)
- Retention pruning -- automatic cleanup of completed/failed pods (v1.1.0)
- Resource reconciliation -- periodic sync of actual vs requested CPU/memory/GPU usage (v1.2.0)
- Graceful shutdown -- configurable drain timeout with ordered teardown (v1.2.0)
- GPU detection -- NVIDIA GPU support with unified memory fallback (GB10)
- CronJob scheduling -- cron expression parsing with Allow/Forbid/Replace concurrency policies
- Directory watcher -- manifest hot-loading with SHA-256 change detection
- Heartbeat publishing -- periodic node status over NATS
- Event and log streaming -- lifecycle events and container logs published over NATS
- Prometheus metrics -- /metrics endpoint with node, pod, and scheduler metrics (v1.3.0)
- HTTP authentication -- bearer token auth with configurable token file (v1.3.0)
- Pod logs via HTTP -- tail and SSE streaming of pod logs (v1.3.0)
- Pod events via HTTP -- lifecycle event history with time filtering (v1.3.0)
- Structured JSON logging -- configurable log format for aggregation (v1.3.0)
- EmptyDir volumes -- tmpfs-backed scratch volumes for pods (v1.3.0)
- Pod exec -- execute commands inside running containers via HTTP (v1.4.0)
- Container port mapping -- expose container ports to the host via podman --publish (v1.4.0)
- Init containers -- sequential initialization containers before main containers (v1.4.0)
- GPU device assignment -- per-pod GPU device isolation via NVIDIA_VISIBLE_DEVICES (v1.4.0)
- Image management -- list and pull container images via HTTP API (v1.4.0)
- Security context -- runAsUser, privileged, capabilities add/drop forwarded to podman (v1.5.0)
- Manifest removal -- file deletion stops pods, releases resources, unregisters cron jobs (v1.5.0)
- CronJob registration on all paths -- NATS, HTTP, and filesystem all register cron jobs (v1.5.0)
- Stuck pod recovery -- Scheduled and Preempted pods recovered after timeout (v1.5.0)
- GPU count-based scheduling -- `nvidia.com/gpu: N` allocates N device slots, separate from GPU memory (v1.6.0)
- Liveness probes -- exec and HTTP probes with configurable thresholds; reconciler restarts on failure (v1.6.0)
- CronJob HTTP management -- list, inspect, and unregister cron jobs via REST API (v1.6.0)
- Node info endpoint -- GPU model, device count, device IDs, CPU, memory, OS via HTTP (v1.6.0)
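
The priority-based preemption item in the list above can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not Spark's scheduler: the `Pod` type, the `selectVictims` function, the CPU-only accounting, and the lowest-priority-first eviction order are all hypothetical. ADR 005 is the authoritative description of the real algorithm.

```go
package main

import (
	"fmt"
	"sort"
)

// Pod is a hypothetical, simplified view of a running pod.
type Pod struct {
	Name      string
	Priority  int
	CPUMillis int
}

// selectVictims returns the running pods to evict so that `need` millicores
// become available for an incoming pod of the given priority.
// Only strictly lower-priority pods are candidates, lowest priority first
// (an assumption). It returns (nil, false) when eviction cannot free enough.
func selectVictims(running []Pod, free, need, priority int) ([]Pod, bool) {
	if free >= need {
		return nil, true // fits without evicting anything
	}
	candidates := make([]Pod, 0, len(running))
	for _, p := range running {
		if p.Priority < priority {
			candidates = append(candidates, p)
		}
	}
	sort.SliceStable(candidates, func(i, j int) bool {
		return candidates[i].Priority < candidates[j].Priority
	})
	var victims []Pod
	for _, p := range candidates {
		victims = append(victims, p)
		free += p.CPUMillis
		if free >= need {
			return victims, true
		}
	}
	return nil, false
}

func main() {
	running := []Pod{
		{Name: "batch", Priority: 10, CPUMillis: 4000},
		{Name: "web", Priority: 50, CPUMillis: 2000},
		{Name: "train", Priority: 100, CPUMillis: 8000},
	}
	// An incoming pod with priority 80 needs 6000m while 1000m is free.
	victims, ok := selectVictims(running, 1000, 6000, 80)
	fmt.Println("schedulable:", ok)
	for _, v := range victims {
		fmt.Println("evict:", v.Name)
	}
}
```

A real scheduler would also have to account for memory, GPU memory, and GPU device slots before declaring that the incoming pod fits.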
```bash
go build ./cmd/spark
./spark --nats nats://localhost:4222
```

Spark will detect system resources (CPU, memory, GPU), connect to NATS, and begin watching `/etc/spark/manifests` for Kubernetes manifests.

| Flag | Default | Description |
|---|---|---|
| `--nats` | `nats://localhost:4222` | NATS server URL |
| `--node-id` | hostname | Node identifier |
| `--manifest-dir` | `/etc/spark/manifests` | Directory to watch for manifests |
| `--gpu-max` | `1` | Max concurrent GPU pods |
| `--heartbeat-interval` | `10s` | Heartbeat publish interval |
| `--reconcile-interval` | `5s` | Reconciliation loop interval |
| `--system-reserve-cpu` | `2000` | CPU millicores reserved for system |
| `--system-reserve-memory` | `4096` | MB of RAM reserved for system |
| `--state-db` | `/var/lib/spark/state.db` | Path to SQLite database file |
| `--pod-retention` | `168h` | Retention period for completed/failed pods |
| `--http-addr` | `:8080` | HTTP listen address |
| `--shutdown-timeout` | `30s` | Max time to drain pods on shutdown |
| `--reconcile-resources-interval` | `60s` | Resource reconciliation interval |
| `--log-format` | `text` | Log output format (`text` or `json`) |
| `--api-token-file` | (empty) | Path to file containing API bearer token |
| `--housekeeping-interval` | `1m` | Housekeeping loop interval |
| `--completed-pod-ttl` | `1h` | TTL after which Completed pods are reaped (0 disables) |
| `--failed-pod-ttl` | `24h` | TTL after which Failed pods are reaped (0 disables) |
| `--orphan-reap-ttl` | `1h` | TTL after which terminal-state orphan podman pods are reaped (0 disables) |
| `--image-prune-interval` | `24h` | Interval between `podman image prune -f` runs (0 disables) |

Per-pod TTL override is available via the spark.feza.ai/ttl-after-finished annotation (any value parseable by time.ParseDuration; 0s disables cleanup for that pod).
All endpoints are served on the address specified by --http-addr (default :8080).
```bash
curl http://localhost:8080/healthz
```

```json
{"status": "ok"}
```

```bash
curl http://localhost:8080/api/v1/resources
```

```json
{
  "total": {"cpu_millis": 20000, "memory_mb": 131072, "gpu_memory_mb": 131072},
  "reserved": {"cpu_millis": 2000, "memory_mb": 4096, "gpu_memory_mb": 0},
  "allocated": {"cpu_millis": 4000, "memory_mb": 8192, "gpu_memory_mb": 65536},
  "available": {"cpu_millis": 14000, "memory_mb": 118784, "gpu_memory_mb": 65536}
}
```

```bash
curl http://localhost:8080/api/v1/pods
```

```json
[
  {"name": "myapp", "status": "Running", "created_at": "2026-03-19T10:00:00Z"}
]
```

```bash
curl http://localhost:8080/api/v1/pods/myapp
```

Returns the full pod record including spec, status, events, and timestamps.

```bash
# --data-binary preserves newlines; plain -d @file strips them, which breaks YAML.
curl -X POST http://localhost:8080/api/v1/pods \
  -H "Content-Type: application/yaml" \
  --data-binary @pod.yaml
```

Accepts a Kubernetes manifest (YAML) in the request body. Supports Pod, Job, CronJob, Deployment, and StatefulSet kinds.

```bash
curl -X DELETE http://localhost:8080/api/v1/pods/myapp
```

```json
{"deleted": "myapp"}
```

```bash
curl http://localhost:8080/metrics
```

Returns node and pod metrics in Prometheus text exposition format (v0.0.4).

```bash
# The URL is quoted so the shell does not interpret the "?" as a glob.
curl "http://localhost:8080/api/v1/pods/myapp/logs?tail=50"
```

Returns the last 50 lines of pod logs as `text/plain`. Use `?follow=true` for SSE streaming.

```bash
curl http://localhost:8080/api/v1/pods/myapp/events
```

Returns pod lifecycle events as JSON. Use `?since=2026-03-19T00:00:00Z` to filter by time.

```bash
curl -X POST http://localhost:8080/api/v1/pods/myapp/exec \
  -H "Content-Type: application/json" \
  -d '{"command":["ls","-la"]}'
```

```json
{"stdout":"total 0\ndrwxr-xr-x ...","stderr":"","exit_code":0}
```

```bash
curl http://localhost:8080/api/v1/images
```

```bash
curl -X POST http://localhost:8080/api/v1/images/pull \
  -H "Content-Type: application/json" \
  -d '{"image":"localhost:5000/mymodel:latest"}'
```

```bash
curl http://localhost:8080/api/v1/node
```

```json
{
  "hostname": "dgx-spark",
  "os": "linux",
  "arch": "arm64",
  "cpu_cores": 72,
  "memory_total_mb": 131072,
  "gpu_model": "NVIDIA GH200",
  "gpu_count": 1,
  "gpu_device_ids": [0],
  "gpu_memory_mb": 131072
}
```

```bash
curl http://localhost:8080/api/v1/cronjobs
```

```json
[
  {"name": "train-nightly", "schedule": "0 2 * * *", "next_run": "2026-03-21T02:00:00Z", "run_count": 14}
]
```

```bash
curl http://localhost:8080/api/v1/cronjobs/train-nightly
```

```bash
curl -X DELETE http://localhost:8080/api/v1/cronjobs/train-nightly
```

When `--api-token-file` is set, all HTTP endpoints except `/healthz` and `/metrics` require a bearer token:

```bash
curl -H "Authorization: Bearer <token>" http://localhost:8080/api/v1/pods
```

Without the header, requests return `401 Unauthorized`. If `--api-token-file` is not set, authentication is disabled.

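
The authentication rules above can be sketched as a small `net/http` middleware. This is an illustrative sketch, not Spark's handler: `requireBearer`, `okHandler`, and `statusFor` are hypothetical names, and the constant-time comparison is a common practice assumed here rather than something the source states.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// requireBearer wraps next with bearer-token checking. An empty token
// disables authentication; /healthz and /metrics are always exempt.
func requireBearer(token string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if token == "" || r.URL.Path == "/healthz" || r.URL.Path == "/metrics" {
			next.ServeHTTP(w, r)
			return
		}
		got, ok := strings.CutPrefix(r.Header.Get("Authorization"), "Bearer ")
		if !ok || subtle.ConstantTimeCompare([]byte(got), []byte(token)) != 1 {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// okHandler stands in for the real API handlers.
var okHandler = http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
	w.WriteHeader(http.StatusOK)
})

// statusFor sends an in-memory GET request through h and returns the status code.
func statusFor(h http.Handler, path, authHeader string) int {
	req := httptest.NewRequest(http.MethodGet, path, nil)
	if authHeader != "" {
		req.Header.Set("Authorization", authHeader)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	h := requireBearer("s3cret", okHandler)
	fmt.Println("no header:   ", statusFor(h, "/api/v1/pods", ""))
	fmt.Println("valid token: ", statusFor(h, "/api/v1/pods", "Bearer s3cret"))
	fmt.Println("healthz:     ", statusFor(h, "/healthz", ""))
}
```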
| Subject | Purpose |
|---|---|
| `req.spark.apply` | Apply a pod manifest (request/reply) |
| `req.spark.delete` | Delete a pod (request/reply) |
| `req.spark.get` | Get pod status (request/reply) |
| `req.spark.list` | List all pods (request/reply) |
| `evt.spark.event.{pod}` | Pod lifecycle events |
| `log.spark.{pod}` | Container log streaming |
| `heartbeat.spark.{node}` | Node heartbeat with resource usage |

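
The templated subjects in the table expand to one concrete subject per pod or node. The Go sketch below builds them with the standard library only. The function names are hypothetical. Rejecting names that contain `.`, `*`, `>`, or whitespace is an assumption based on NATS subject syntax, where `.` separates tokens and `*` and `>` are wildcards. How Spark itself treats such names (for example, a fully qualified hostname as node ID) is not stated in the source.

```go
package main

import (
	"fmt"
	"strings"
)

// subject appends name as a single token to prefix.
// Names that would split into several tokens or act as wildcards are
// rejected; this validation is an assumption, not documented behaviour.
func subject(prefix, name string) (string, error) {
	if name == "" || strings.ContainsAny(name, ". *>\t\r\n") {
		return "", fmt.Errorf("invalid subject token %q", name)
	}
	return prefix + "." + name, nil
}

// EventSubject returns the lifecycle event subject for a pod.
func EventSubject(pod string) (string, error) { return subject("evt.spark.event", pod) }

// LogSubject returns the log streaming subject for a pod.
func LogSubject(pod string) (string, error) { return subject("log.spark", pod) }

// HeartbeatSubject returns the heartbeat subject for a node.
func HeartbeatSubject(node string) (string, error) { return subject("heartbeat.spark", node) }

func main() {
	for _, build := range []func() (string, error){
		func() (string, error) { return EventSubject("myapp") },
		func() (string, error) { return LogSubject("myapp") },
		func() (string, error) { return HeartbeatSubject("dgx-spark") },
		func() (string, error) { return LogSubject("my.app") },
	} {
		s, err := build()
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(s)
	}
}
```

A subscriber that wants events for every pod could subscribe to `evt.spark.event.*`, since `*` matches exactly one token in NATS.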
The deploy/ directory contains everything needed to run Spark on a DGX or similar GPU host:
| File | Purpose |
|---|---|
| `setup-dgx.sh` | Full DGX setup: installs NATS, Spark, and systemd services |
| `setup-registry.sh` | Sets up a local OCI registry on port 5000 |
| `spark.service` | Systemd unit for Spark |
| `nats-server.service` | Systemd unit for NATS |
| `registry.service` | Systemd unit for local OCI registry |
| `spark.env` | Environment variables for the Spark service |
| `install.sh` | Binary installation script |
| `nfpm/` | Deb packaging configuration |

```bash
ssh ndungu@192.168.86.250
sudo bash deploy/setup-dgx.sh
```

This installs Spark and NATS as systemd services, configures the manifest directory, and starts both services.

Spark uses a local OCI registry at localhost:5000 to store and serve container images, avoiding remote pulls during workload execution.
```bash
# Set up the registry
sudo bash deploy/setup-registry.sh

# Push an image
podman build -t myapp:latest .
podman tag myapp:latest localhost:5000/myapp:latest
podman push localhost:5000/myapp:latest
```

Reference images in pod manifests with the `localhost:5000` prefix:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: myapp
spec:
  containers:
    - name: myapp
      image: localhost:5000/myapp:latest
```

```
cmd/spark/       Entry point: flags, startup, signal handling
internal/
  api/           HTTP REST API handlers (health, resources, node, pods, exec, logs, events, images, cronjobs, metrics, auth)
  bus/           NATS bus abstraction, protocol handlers, event/log publishers
  cron/          Cron expression parser and scheduled job trigger
  executor/      Podman interface: pod create, stop, exec, logs, image pull, stats, liveness probes
  gpu/           GPU detection (nvidia-smi), device enumeration, and system resource detection
  lifecycle/     Graceful shutdown coordinator with pod draining
  manifest/      K8s YAML parser (Pod, Job, CronJob, Deployment, StatefulSet) with ports, init containers, securityContext, livenessProbe
  metrics/       Prometheus metrics collector and text renderer
  reconciler/    Desired-state reconciliation loop, pod recovery, resource sync, liveness probe polling
  scheduler/     Resource-aware scheduling with priority preemption and GPU count-based device slot tracking
  state/         Pod state store (in-memory + SQLite WAL persistence) with source path tracking
  watcher/       Manifest directory poller (SHA-256 change detection)
```
```bash
# Build
go build ./cmd/spark

# Test
go test ./... -race -timeout 120s

# Lint
go vet ./...
staticcheck ./...
```

- Go standard library only, except `github.com/nats-io/nats.go` and `modernc.org/sqlite`.
- Podman, not Docker.
- Standard `flag` package for CLI flags.
- HTTP routing via `net/http.ServeMux` with Go 1.22+ method-aware patterns.

| ADR | Title |
|---|---|
| 001 | Go standard library only |
| 002 | NATS protocol design |
| 003 | Local OCI registry |
| 004 | K8s manifest compatibility |
| 005 | Priority preemption algorithm |
| 006 | Resource-aware scheduling |
| 007 | Ubuntu deb packaging |
| 008 | SQLite state persistence |
| 009 | HTTP API design |
| 010 | Prometheus metrics via stdlib |
| 011 | HTTP bearer token authentication |
MIT License. See LICENSE for details.