Kubernetes Observability Platform for self-hosted, air-gapped, multi-cluster environments.
Built on OpenTelemetry, Beyla eBPF, and ClickHouse.
K-O11y is a self-hosted Kubernetes observability platform that unifies metrics, logs, and traces across multiple clusters. It is built on OpenTelemetry and Beyla eBPF, with a 2-tier Host-Agent architecture and ClickHouse-backed storage with automatic Hot → Warm (S3) → Cold (Glacier IR) tiering.
Capture stack traces with spanID and traceID attached — jump straight from an exception into its distributed trace.
Per-signal retention with native Hot → Warm (S3) → Cold (Glacier IR) tiering configured from the UI.
- 📊 Unified Observability — Metrics, logs, and traces in a single platform
- 🗺️ ServiceMap — Microservice dependency topology visualization
- 🔍 Distributed Tracing — ClickHouse-based trace storage and query
- ⚡ Zero-Code Instrumentation — Auto-instrument apps with Beyla eBPF
- 🏷️ CRD Label Enrichment — Automatically add Kubernetes CRD labels (e.g. k8s.rollout.name for Argo Rollouts) to all telemetry
- 🏢 Multi-Cluster Native — 2-tier Host-Agent architecture for fleet observability
- 💾 S3 3-Tier Storage — Hot (EBS) → Warm (S3 Standard) → Cold (S3 Glacier IR) automatic tiering
- 🔐 SSO Tenant Auto-Lock — JWT-based multi-tenant SSO with automatic workspace binding
- 🔒 Air-Gapped Ready — Works in fully offline environments (regulated industries)
- 📦 Self-Hosted — Your data never leaves your infrastructure
- 🎫 License Guard — RS256 JWT-based license validation with configurable grace period (enterprise distributions)
K-O11y exists to serve a specific gap: teams who need production-grade observability but cannot use SaaS — regulated industries, air-gapped environments, multi-cluster fleets, or teams wary of vendor lock-in.
| Need | SaaS (Datadog, etc.) | DIY (Prom + Grafana + Jaeger + Loki) | K-O11y |
|---|---|---|---|
| Self-hosted | ❌ | ✅ | ✅ |
| Air-gapped | ❌ | | ✅ |
| Multi-cluster (Host-Agent) | ✅ (if you pay) | | ✅ built-in |
| Metrics + Logs + Traces unified | ✅ | ❌ 4 tools | ✅ |
| eBPF auto-instrumentation | partial | | ✅ Beyla integrated |
| Cost predictability | ❌ usage-based | ✅ | ✅ |
| Operational complexity | ✅ low | ❌ high | |
Best fit for: on-premise K8s fleets, government / defense / finance / healthcare, edge deployments, cost-sensitive teams moving off Datadog.
K-O11y uses a 2-tier Host-Agent model: lightweight Agent collectors in each workload cluster ship telemetry over OTLP to a central Host cluster that stores, queries, and visualizes everything. ClickHouse runs on a dedicated VM (not in-cluster) to keep storage tiering under direct control.
%%{init: {'theme':'base', 'themeVariables': {
'background':'#0B1220',
'fontFamily':'Inter, -apple-system, system-ui, sans-serif',
'fontSize':'14px',
'primaryColor':'#1E1B4B',
'primaryTextColor':'#E8EBFF',
'primaryBorderColor':'#6D5EF9',
'lineColor':'#6D5EF9',
'clusterBkg':'#0F1530',
'clusterBorder':'#374151'
}}}%%
flowchart TB
subgraph AgentClusters["<b>AGENT CLUSTERS</b><br/><font color='#9CA3AF'>workload clusters · N × ...</font>"]
subgraph AgentCluster1["<b>Agent Cluster #1</b>"]
App1["<b>Application Pods</b>"]
Beyla1["<b>Beyla eBPF APM</b><br/><font color='#9CA3AF'>DaemonSet</font>"]
OtelAgent1["<b>OTel Collector</b><br/><font color='#9CA3AF'>DaemonSet</font>"]
OtelDeploy1["<b>OTel Collector</b><br/><font color='#9CA3AF'>Deployment</font>"]
KSM1["<b>Kube State Metrics</b>"]
OtelOp1["<b>OTel Operator</b>"]
App1 -.auto-instrument.-> Beyla1
Beyla1 --> OtelAgent1
App1 --> OtelAgent1
KSM1 --> OtelDeploy1
OtelOp1 -.manages CRDs.-> OtelAgent1
end
subgraph AgentClusterN["<b>Agent Cluster #N</b>"]
AppN["<b>Application Pods</b>"]
BeylaN["<b>Beyla eBPF</b>"]
OtelAgentN["<b>OTel Collector</b>"]
end
end
subgraph HostCluster["<b>HOST CLUSTER</b><br/><font color='#9CA3AF'>centralized observability</font>"]
Gateway["<b>K-O11y OTel Gateway</b><br/><font color='#9CA3AF'>License Guard · License Gate</font>"]
subgraph Server["<b>K-O11y Server</b>"]
Frontend["<b>Frontend UI</b><br/><font color='#9CA3AF'>React</font>"]
QueryService["<b>Query Service</b><br/><font color='#9CA3AF'>S3 Tiering · SSO</font>"]
AlertManager["<b>Alert Manager</b>"]
Core["<b>Core API</b><br/><font color='#9CA3AF'>ko11y-core · ServiceMap</font>"]
end
Gateway --> QueryService
Gateway --> Core
AlertManager --> QueryService
Frontend --> QueryService
end
subgraph ClickHouseVM["<b>CLICKHOUSE VM</b><br/><font color='#9CA3AF'>dedicated storage</font>"]
subgraph Storage["<b>Storage Tiers</b>"]
Hot["<b>Hot</b><br/><font color='#9CA3AF'>EBS / SSD · 0-7d</font>"]
Warm["<b>Warm</b><br/><font color='#9CA3AF'>S3 Standard · 7-30d</font>"]
Cold["<b>Cold</b><br/><font color='#9CA3AF'>S3 Glacier IR · 30d+</font>"]
Hot -.TTL MOVE.-> Warm
Warm -.Lifecycle.-> Cold
end
DBAgent["<b>DB Agent</b><br/><font color='#9CA3AF'>systemd</font>"]
DBAgent -.manages.-> Storage
end
QueryService --> Storage
Core --> Storage
OtelAgent1 -->|OTLP :4317| Gateway
OtelDeploy1 -->|OTLP :4317| Gateway
OtelAgentN -->|OTLP :4317| Gateway
User["👤 <b>User / Operator</b>"] --> Frontend
classDef agent fill:#2A1B5B,stroke:#6D5EF9,color:#E8EBFF,stroke-width:2px,rx:8,ry:8
classDef host fill:#1A3A4A,stroke:#4DC4E5,color:#E8F7FF,stroke-width:2px,rx:8,ry:8
classDef storage fill:#1F1A38,stroke:#9D7AE8,color:#EEEAFF,stroke-width:2px,rx:8,ry:8
classDef user fill:#0F1530,stroke:#6D5EF9,color:#E8EBFF,stroke-width:2px,rx:8,ry:8
class App1,AppN,Beyla1,BeylaN,OtelAgent1,OtelAgentN,OtelDeploy1,KSM1,OtelOp1 agent
class Gateway,Frontend,QueryService,AlertManager,Core host
class Hot,Warm,Cold,DBAgent storage
class User user
Data flow:
- Apps emit telemetry (or Beyla eBPF auto-instruments — no code changes)
- OTel Collectors in each Agent cluster enrich with K8s + CRD labels, batch, and forward via OTLP gRPC
- The Host's K-O11y Gateway validates the license (RS256 JWT), gates data through the License Gate Processor, and persists it to ClickHouse
- ClickHouse on a dedicated VM tiers data Hot → Warm → Cold automatically; a systemd DB Agent manages S3 lifecycle and Glacier backups
- Users explore via the web UI
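The tiering step above can be pictured as two separate mechanisms, matching the TTL MOVE and Lifecycle arrows in the diagram: a ClickHouse TTL rule that moves parts to an S3-backed volume, and an S3 lifecycle rule that transitions objects to Glacier Instant Retrieval. The sketch below only prints the statements such a setup might use. It is not the DB Agent's implementation, and the table name, column name, volume name, and function names are all hypothetical.

```shell
#!/bin/sh
# Sketch of the two tiering rules (hypothetical names; prints text only).

# ttl_move_sql TABLE TS_COLUMN HOT_DAYS VOLUME
# Prints a ClickHouse statement that moves parts older than HOT_DAYS to VOLUME.
# Assumes TS_COLUMN is a DateTime or DateTime64 column and that VOLUME exists
# in the table's storage policy.
ttl_move_sql() {
  printf "ALTER TABLE %s MODIFY TTL toDateTime(%s) + INTERVAL %s DAY TO VOLUME '%s';\n" \
    "$1" "$2" "$3" "$4"
}

# lifecycle_json DAYS
# Prints an S3 lifecycle configuration that transitions every object in the
# bucket to the GLACIER_IR storage class once it is DAYS old.
lifecycle_json() {
  cat <<EOF
{
  "Rules": [
    {
      "ID": "warm-to-cold",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "Transitions": [
        { "Days": $1, "StorageClass": "GLACIER_IR" }
      ]
    }
  ]
}
EOF
}

# Example: 7 days hot, then cold 23 days after the object lands in S3.
ttl_move_sql example_db.example_logs event_time 7 warm
lifecycle_json 23
```

The JSON could be applied with `aws s3api put-bucket-lifecycle-configuration --bucket <BUCKET> --lifecycle-configuration file://lifecycle.json`. S3 counts object age from when the object was written to the bucket, so the 23 used here is only an assumed way of approximating the 30-day boundary shown in the diagram; verify the numbers against your own retention settings.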
K-O11y is composed of four repositories, included here as git submodules.
| Component | Repository | Description |
|---|---|---|
| 🧠 Server | k-o11y-server | Self-hosted observability backend. Monorepo with packages/core (Go API for ServiceMap and S3 Tiering, Go 1.24 + Gin + ClickHouse) and packages/signoz (React frontend and Query Service). |
| 📦 Install | k-o11y-install | 6 Helm charts (k-o11y-host, k-o11y-agent, and 4 sub-charts: k-o11y-otel-agent, k-o11y-apm-agent, k-o11y-ksm, k-o11y-otel-operator) + 2 Go CLI tools: k-o11y-db (ClickHouse VM installer, DDL apply, S3 tiering) and k-o11y-tls (cert-manager setup: existing / self-signed / private-CA / Let's Encrypt). |
| 📡 OTel Collector | k-o11y-otel-collector | Custom OTel Collector v0.109.0 distribution with CRD Processor — automatically adds Kubernetes CRD labels (e.g. k8s.rollout.name for Argo Rollouts) to traces, metrics, and logs via a K8s Informer. Extensible to Knative, KEDA, etc. |
| 🛂 OTel Gateway | k-o11y-otel-gateway | OTel Collector distribution with two custom components: License Guard Extension (RS256 JWT license validation with 7-day grace period) and License Gate Processor (drops telemetry when license is invalid and grace period has expired). |
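The License Gate behaviour described in the table can be read as a three-way decision. The sketch below only illustrates that rule and is not the gateway's code: it assumes the grace period starts at the license expiry time, and the function name and the `pass` / `grace` / `drop` labels are hypothetical.

```shell
#!/bin/sh
# Illustration of the license gate rule (hypothetical names and labels).

# 7-day grace period, in seconds.
GRACE_SECONDS=$((7 * 24 * 60 * 60))

# gate_decision EXPIRES_AT NOW   (both in epoch seconds)
# Prints:
#   pass  - license still valid, telemetry flows
#   grace - license expired but still inside the grace period, telemetry flows
#   drop  - grace period over, telemetry is dropped
gate_decision() {
  if [ "$2" -le "$1" ]; then
    echo pass
  elif [ "$2" -le $(( $1 + GRACE_SECONDS )) ]; then
    echo grace
  else
    echo drop
  fi
}

# Example: a license that expired 3 days ago is still inside the grace period.
now=$(date +%s)
gate_decision $(( now - 3 * 86400 )) "$now"
```

Keeping the decision a pure function of two timestamps makes the boundary cases (exactly at expiry, exactly at the end of the grace period) easy to test in isolation.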
Clone with all submodules:
git clone --recurse-submodules https://github.com/Wondermove-Inc/k-o11y.git
Full Host + Agent installation is a 6-step process using Go CLI tools and Helm. Pre-built Docker images and OCI-registry Helm charts are not yet published (see Roadmap), so you'll need to push the built images to your own OCI registry (GHCR, Harbor, etc.) for now.
- ClickHouse VM: Ubuntu 22.04 LTS, SSH access with sudo, 8+ vCPU, 32GB+ RAM
- Host K8s cluster: Kubernetes 1.28+, Helm 3.12+, kubectl
- Agent K8s cluster(s): Kubernetes 1.28+, Linux kernel 5.8+ (for Beyla eBPF)
- OCI registry: Accessible by both clusters
- Encryption key: openssl rand -hex 32 (stored as K_O11Y_ENCRYPTION_KEY)
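The prerequisites above can be checked before starting. The following is a minimal sketch, not part of the K-O11y tooling: the function names `version_ge`, `valid_key`, and `preflight` are hypothetical, and only the kernel floor (5.8) and the `openssl rand -hex 32` key format come from the list above.

```shell
#!/bin/sh
# Preflight sketch for the prerequisites above (illustrative helper names).

# version_ge A B: succeeds when version A >= version B.
# Only the major.minor parts are compared, which is enough for "5.8+".
version_ge() {
  a_major=${1%%.*}; a_rest=${1#*.}; a_minor=${a_rest%%[!0-9]*}
  b_major=${2%%.*}; b_rest=${2#*.}; b_minor=${b_rest%%[!0-9]*}
  [ "$a_major" -gt "$b_major" ] && return 0
  [ "$a_major" -eq "$b_major" ] && [ "$a_minor" -ge "$b_minor" ]
}

# valid_key KEY: succeeds when KEY is 64 lowercase hex characters,
# the shape produced by `openssl rand -hex 32`.
valid_key() {
  [ "${#1}" -eq 64 ] || return 1
  case "$1" in
    *[!0-9a-f]*) return 1 ;;
  esac
  return 0
}

# preflight: run on an Agent node; reports problems without changing anything.
preflight() {
  kernel=$(uname -r)
  if version_ge "$kernel" 5.8; then
    echo "kernel $kernel: ok for Beyla eBPF"
  else
    echo "kernel $kernel: older than 5.8" >&2
    return 1
  fi
  if valid_key "${K_O11Y_ENCRYPTION_KEY:-}"; then
    echo "K_O11Y_ENCRYPTION_KEY: ok"
  else
    echo "K_O11Y_ENCRYPTION_KEY: expected 64 hex characters" >&2
    return 1
  fi
}
```

A typical use would be `export K_O11Y_ENCRYPTION_KEY="$(openssl rand -hex 32)"` followed by `preflight`. The version comparison expects at least a major.minor string, such as the output of `uname -r`.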
# ── 1. Install ClickHouse + Keeper on the VM (Go CLI)
./k-o11y-db install --mode ssh \
--ssh-user <SSH_USER> --ssh-key <SSH_KEY_PATH> \
--keeper-host <KEEPER_IP> --clickhouse-host <CLICKHOUSE_IP> \
--clickhouse-password '<CLICKHOUSE_PASSWORD>' \
--encryption-key <K_O11Y_ENCRYPTION_KEY> --yes
# ── 2. (Optional) Set up TLS for OTel Gateway
./k-o11y-tls setup --mode selfsigned \
--domain <DOMAIN> --secret-name k-o11y-otel-collector-tls \
--kube-context <HOST_CONTEXT>
# ── 3. Install Host cluster (Helm)
helm upgrade --install k-o11y-host \
--kube-context <HOST_CONTEXT> \
oci://<YOUR_REGISTRY>/charts/k-o11y-host \
--namespace k-o11y --create-namespace \
--set externalClickhouse.host=<NLB_DNS_OR_IP> \
--set externalClickhouse.password='<CLICKHOUSE_PASSWORD>' \
--set o11yHub.additionalEnvs.K_O11Y_ENCRYPTION_KEY=<ENCRYPTION_KEY>
# ── 4. Apply DDL + install OTel Agent on ClickHouse VM (Go CLI)
./k-o11y-db post-install --mode ssh \
--clickhouse-host <CLICKHOUSE_IP> \
--clickhouse-password '<CLICKHOUSE_PASSWORD>' \
--otel-endpoint <HOST_GATEWAY_IP>:4317 --environment prod
# ── 5. Install cert-manager on Agent cluster (Helm)
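# Add the Jetstack chart repository first (skip if it is already configured)
helm repo add jetstack https://charts.jetstack.io
helm repo update
helm install cert-manager jetstack/cert-manager \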
--namespace cert-manager --create-namespace \
--version v1.17.1 --set crds.enabled=true \
--kube-context <AGENT_CONTEXT> --wait
# ── 6. Install Agent cluster (Helm)
helm upgrade --install k-o11y-agent \
--kube-context <AGENT_CONTEXT> \
oci://<YOUR_REGISTRY>/charts/k-o11y-agent \
--namespace k-o11y --create-namespace \
--set global.clusterName=<CLUSTER_NAME> \
--set global.deploymentEnvironment=prod \
--set k-o11y-otel-agent.otelCollectorEndpoint=<HOST_GATEWAY_IP>:4317 \
--wait --timeout 25m
Full reference (including all flags, TLS variants, and bastion SSH mode): k-o11y-install/README.md
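After the six steps, a quick sanity check is to list the pods in the k-o11y namespace and confirm that the gateway's OTLP port accepts TCP connections from the machine running the check. This is a rough sketch, not part of the documented procedure: the function names are hypothetical, and it assumes `kubectl` is configured for the given context and that the local `nc` supports `-z` and `-w`.

```shell
#!/bin/sh
# Post-install sanity check sketch (hypothetical helper names).

# endpoint_host / endpoint_port: split a HOST:PORT endpoint such as the
# value passed to --otel-endpoint.
endpoint_host() { printf '%s\n' "${1%:*}"; }
endpoint_port() { printf '%s\n' "${1##*:}"; }

# check_install KUBE_CONTEXT GATEWAY_ENDPOINT
check_install() {
  # Show pod status in the namespace used by the Helm commands above.
  kubectl --context "$1" --namespace k-o11y get pods || return 1

  # Probe the OTLP gRPC port (TCP connect only, no telemetry is sent).
  host=$(endpoint_host "$2")
  port=$(endpoint_port "$2")
  if nc -z -w 5 "$host" "$port"; then
    echo "OTLP endpoint $host:$port is reachable"
  else
    echo "cannot reach $host:$port" >&2
    return 1
  fi
}
```

For example, `check_install <AGENT_CONTEXT> <HOST_GATEWAY_IP>:4317`. A successful TCP connect only shows that the port is open; it does not prove that telemetry is accepted or that the license is valid.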
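There are three installation scenarios depending on your setup: building from source, installing from published images and charts (not yet available), and running the components locally for development.
Building from source is what the documented 6-step process above builds on. You clone each sub-repo, build and push Docker images to your own OCI registry, then install via Helm charts that reference your registry.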
Sub-repo READMEs contain the full build instructions:
- Server: packages/core and packages/signoz, built with go build / make go-build-community / docker build
- OTel Collector: make docker → pushes ghcr.io/wondermove-inc/k-o11y-otel-collector-contrib:0.109.0.1
- OTel Gateway: go build -o signoz-otel-collector ./cmd/signozotelcollector + Docker
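Once the images are built, pushing them and the charts to your own registry might look like the sketch below. It is an outline under stated assumptions rather than the project's release process: the function names are hypothetical, the chart directory is whatever path holds the chart in your checkout of k-o11y-install, and it relies on `docker tag`, `docker push`, `helm package`, and `helm push` (OCI support in Helm 3.8+).

```shell
#!/bin/sh
# Sketch: publish locally built images and charts to a private OCI registry.

# image_ref REGISTRY NAME TAG: build the target image reference.
image_ref() { printf '%s/%s:%s\n' "${1%/}" "$2" "$3"; }

# push_image LOCAL_IMAGE TARGET_IMAGE: retag a local image and push it.
push_image() {
  docker tag "$1" "$2" && docker push "$2"
}

# push_chart CHART_DIR REGISTRY: package one chart into a fresh temp dir and
# push it under oci://REGISTRY/charts, the location the Helm commands read from.
push_chart() {
  out=$(mktemp -d) || return 1
  helm package "$1" --destination "$out" || return 1
  helm push "$out"/*.tgz "oci://${2%/}/charts"
}
```

For instance, `push_image ghcr.io/wondermove-inc/k-o11y-otel-collector-contrib:0.109.0.1 "$(image_ref <YOUR_REGISTRY> k-o11y-otel-collector-contrib 0.109.0.1)"` retags the collector image produced by make docker. How the charts are pointed at the retagged images depends on their values files, which this sketch does not cover.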
Once automated GHCR publishing lands (see Roadmap), installation will be:
helm install k-o11y oci://ghcr.io/wondermove-inc/charts/k-o11y-host \
--namespace k-o11y --create-namespace
Not yet available.
- Server (core API): cd packages/core && go run cmd/main.go (requires CLICKHOUSE_HOST, CLICKHOUSE_PORT, CLICKHOUSE_DATABASE)
- Server (backend): cd packages/signoz && make go-run-community
- Frontend: cd packages/signoz/frontend && yarn install && yarn dev
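The core API needs the three ClickHouse environment variables listed above, so it can help to fail fast when one is missing. A small sketch, with hypothetical helper names:

```shell
#!/bin/sh
# Sketch: check required environment variables before starting the core API.

# missing_vars NAME...: print the names that are unset or empty,
# separated by spaces (prints an empty line when all are set).
missing_vars() {
  missing=""
  for name in "$@"; do
    eval "value=\${$name:-}"
    [ -n "$value" ] || missing="$missing $name"
  done
  printf '%s\n' "${missing# }"
}

# run_core: start the core API only when the variables are present.
# Run from the k-o11y-server checkout.
run_core() {
  missing=$(missing_vars CLICKHOUSE_HOST CLICKHOUSE_PORT CLICKHOUSE_DATABASE)
  if [ -n "$missing" ]; then
    echo "missing environment variables: $missing" >&2
    return 1
  fi
  (cd packages/core && go run cmd/main.go)
}
```

The variables must be exported for the Go process to see them, for example `export CLICKHOUSE_HOST=<CLICKHOUSE_IP> CLICKHOUSE_PORT=<PORT> CLICKHOUSE_DATABASE=<DATABASE>` before calling `run_core`. The values are placeholders; use whatever your ClickHouse instance exposes.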
Not a commitment — a direction. Contributions welcome on any of these.
- 🐳 Publish GHCR Docker images for all 4 components (unblocks one-line install)
- 📦 Publish Helm charts to OCI registry (currently <YOUR_REGISTRY> placeholder)
- 🏗️ MkDocs / GitHub Pages documentation site
- 🌏 Translate Korean comments in Go code to English (k-o11y-server#1, k-o11y-install#1)
- 🧪 docker-compose.yml for local development
- 📚 Grafana dashboard JSON presets
- 🔔 Prometheus AlertManager rule presets
Contributions are welcome — especially on good first issues.
- Find an issue labeled good first issue or help wanted
- Comment on the issue to claim it (avoid duplicate work)
- Fork, branch, and send a PR — scope narrowly, describe clearly
- Address review feedback (maintainers aim to reply within 7 days)
See CONTRIBUTING.md in any of the sub-repos for more.
This project follows passive maintenance — PRs and issues are reviewed as time allows. We aim to respond within 7 days but cannot guarantee faster turnaround.
Thanks to everyone who's made K-O11y better.
Contributors shown above are for this umbrella repository. For the full list across all sub-repositories: server · install · otel-collector · otel-gateway
If K-O11y is useful to you, please consider giving it a star — it helps others find the project.
- k-o11y-server and k-o11y-install: MIT License (inherited from SigNoz)
- k-o11y-otel-collector and k-o11y-otel-gateway: Apache License 2.0 (inherited from OpenTelemetry)
Forked from SigNoz (MIT) and the OpenTelemetry Collector (Apache 2.0). See NOTICE for attribution details.
- 🐛 Bug reports & feature requests: GitHub Issues
- 💭 Questions & discussions: Open an issue (GitHub Discussions coming soon)
- 🌐 Website: www.skuberplus.com
Built and maintained by Wondermove