Kubernetes Observability Platform for self-hosted, air-gapped, multi-cluster environments.
English | 한국어 | 日本語 | 中文
Built on OpenTelemetry, Beyla eBPF, and ClickHouse.
K-O11y is a self-hosted Kubernetes observability platform that unifies metrics, logs, and traces across multiple clusters. It is built on OpenTelemetry and Beyla eBPF, with a 2-tier Host-Agent architecture and ClickHouse-backed storage that tiers data automatically from Hot → Warm (S3) → Cold (Glacier IR).
Capture stack traces with spanID and traceID attached, then jump straight from an exception into its distributed trace.
Per-signal retention with native Hot → Warm (S3) → Cold (Glacier IR) tiering, configured from the UI.
- Unified Observability: Metrics, logs, and traces in a single platform
- ServiceMap: Microservice dependency topology visualization
- Distributed Tracing: ClickHouse-based trace storage and query
- Zero-Code Instrumentation: Auto-instrument apps with Beyla eBPF
- CRD Label Enrichment: Automatically add Kubernetes CRD labels (e.g. k8s.rollout.name for Argo Rollouts) to all telemetry
- Multi-Cluster Native: 2-tier Host-Agent architecture for fleet observability
- S3 3-Tier Storage: Hot (EBS) → Warm (S3 Standard) → Cold (S3 Glacier IR) automatic tiering
- SSO Tenant Auto-Lock: JWT-based multi-tenant SSO with automatic workspace binding
- Air-Gapped Ready: Works in fully offline environments (regulated industries)
- Self-Hosted: Your data never leaves your infrastructure
- License Guard: RS256 JWT-based license validation with configurable grace period (enterprise distributions)
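To make the CRD Label Enrichment item more concrete, here is a minimal Python sketch of the idea only, not the collector's actual Go implementation. It assumes telemetry attributes are a flat string map that already carries the standard `k8s.namespace.name` and `k8s.pod.name` keys. The `RolloutIndex` type and `enrich_with_rollout` function are hypothetical names, and a plain dictionary stands in for whatever cache the real processor builds from its K8s Informer.

```python
from typing import Dict, Mapping, Tuple

# Hypothetical stand-in for the informer cache: (namespace, pod) -> rollout name.
RolloutIndex = Mapping[Tuple[str, str], str]


def enrich_with_rollout(attributes: Mapping[str, str],
                        index: RolloutIndex) -> Dict[str, str]:
    """Return a copy of attributes with k8s.rollout.name added when known."""
    enriched = dict(attributes)
    key = (attributes.get("k8s.namespace.name", ""),
           attributes.get("k8s.pod.name", ""))
    rollout = index.get(key)
    # Only add the label when the pod maps to a rollout and the label is not
    # already set, so existing values are never overwritten.
    if rollout is not None and "k8s.rollout.name" not in enriched:
        enriched["k8s.rollout.name"] = rollout
    return enriched


if __name__ == "__main__":
    index = {("shop", "cart-abc"): "cart"}
    span_attrs = {"k8s.namespace.name": "shop", "k8s.pod.name": "cart-abc"}
    print(enrich_with_rollout(span_attrs, index))
```

Because the same lookup applies to any attribute map, one function can serve traces, metrics, and logs alike, which is the property the feature description emphasizes.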
K-O11y exists to serve a specific gap: teams who need production-grade observability but cannot use SaaS, such as regulated industries, air-gapped environments, multi-cluster fleets, or teams wary of vendor lock-in.
| Need | SaaS (Datadog, etc.) | DIY (Prom + Grafana + Jaeger + Loki) | K-O11y |
|---|---|---|---|
| Self-hosted | ❌ | ✅ | ✅ |
| Air-gapped | ❌ | | ✅ |
| Multi-cluster (Host-Agent) | ✅ (if you pay) | | ✅ built-in |
| Metrics + Logs + Traces unified | ✅ | 4 tools | ✅ |
| eBPF auto-instrumentation | partial | | ✅ Beyla integrated |
| Cost predictability | usage-based | ✅ | ✅ |
| Operational complexity | low | high | |
Best fit for: on-premise K8s fleets, government / defense / finance / healthcare, edge deployments, cost-sensitive teams moving off Datadog.
K-O11y uses a 2-tier Host-Agent model: lightweight Agent collectors in each workload cluster ship telemetry over OTLP to a central Host cluster that stores, queries, and visualizes everything. ClickHouse runs on a dedicated VM (not in-cluster) for storage tiering control.
%%{init: {'theme':'base', 'themeVariables': {
'background':'#0B1220',
'fontFamily':'Inter, -apple-system, system-ui, sans-serif',
'fontSize':'14px',
'primaryColor':'#1E1B4B',
'primaryTextColor':'#E8EBFF',
'primaryBorderColor':'#6D5EF9',
'lineColor':'#6D5EF9',
'clusterBkg':'#0F1530',
'clusterBorder':'#374151'
}}}%%
flowchart TB
subgraph AgentClusters["<b>AGENT CLUSTERS</b><br/><font color='#9CA3AF'>workload clusters · N × ...</font>"]
subgraph AgentCluster1["<b>Agent Cluster #1</b>"]
App1["<b>Application Pods</b>"]
Beyla1["<b>Beyla eBPF APM</b><br/><font color='#9CA3AF'>DaemonSet</font>"]
OtelAgent1["<b>OTel Collector</b><br/><font color='#9CA3AF'>DaemonSet</font>"]
OtelDeploy1["<b>OTel Collector</b><br/><font color='#9CA3AF'>Deployment</font>"]
KSM1["<b>Kube State Metrics</b>"]
OtelOp1["<b>OTel Operator</b>"]
App1 -.auto-instrument.-> Beyla1
Beyla1 --> OtelAgent1
App1 --> OtelAgent1
KSM1 --> OtelDeploy1
OtelOp1 -.manages CRDs.-> OtelAgent1
end
subgraph AgentClusterN["<b>Agent Cluster #N</b>"]
AppN["<b>Application Pods</b>"]
BeylaN["<b>Beyla eBPF</b>"]
OtelAgentN["<b>OTel Collector</b>"]
end
end
subgraph HostCluster["<b>HOST CLUSTER</b><br/><font color='#9CA3AF'>centralized observability</font>"]
Gateway["<b>K-O11y OTel Gateway</b><br/><font color='#9CA3AF'>License Guard · License Gate</font>"]
subgraph Server["<b>K-O11y Server</b>"]
Frontend["<b>Frontend UI</b><br/><font color='#9CA3AF'>React</font>"]
QueryService["<b>Query Service</b><br/><font color='#9CA3AF'>S3 Tiering · SSO</font>"]
AlertManager["<b>Alert Manager</b>"]
Core["<b>Core API</b><br/><font color='#9CA3AF'>ko11y-core · ServiceMap</font>"]
end
Gateway --> QueryService
Gateway --> Core
AlertManager --> QueryService
Frontend --> QueryService
end
subgraph ClickHouseVM["<b>CLICKHOUSE VM</b><br/><font color='#9CA3AF'>dedicated storage</font>"]
subgraph Storage["<b>Storage Tiers</b>"]
Hot["<b>Hot</b><br/><font color='#9CA3AF'>EBS / SSD · 0-7d</font>"]
Warm["<b>Warm</b><br/><font color='#9CA3AF'>S3 Standard · 7-30d</font>"]
Cold["<b>Cold</b><br/><font color='#9CA3AF'>S3 Glacier IR · 30d+</font>"]
Hot -.TTL MOVE.-> Warm
Warm -.Lifecycle.-> Cold
end
DBAgent["<b>DB Agent</b><br/><font color='#9CA3AF'>systemd</font>"]
DBAgent -.manages.-> Storage
end
QueryService --> Storage
Core --> Storage
OtelAgent1 -->|OTLP :4317| Gateway
OtelDeploy1 -->|OTLP :4317| Gateway
OtelAgentN -->|OTLP :4317| Gateway
User["👤 <b>User / Operator</b>"] --> Frontend
classDef agent fill:#2A1B5B,stroke:#6D5EF9,color:#E8EBFF,stroke-width:2px,rx:8,ry:8
classDef host fill:#1A3A4A,stroke:#4DC4E5,color:#E8F7FF,stroke-width:2px,rx:8,ry:8
classDef storage fill:#1F1A38,stroke:#9D7AE8,color:#EEEAFF,stroke-width:2px,rx:8,ry:8
classDef user fill:#0F1530,stroke:#6D5EF9,color:#E8EBFF,stroke-width:2px,rx:8,ry:8
class App1,AppN,Beyla1,BeylaN,OtelAgent1,OtelAgentN,OtelDeploy1,KSM1,OtelOp1 agent
class Gateway,Frontend,QueryService,AlertManager,Core host
class Hot,Warm,Cold,DBAgent storage
class User user
Data flow:
- Apps emit telemetry (or Beyla eBPF auto-instruments them, with no code changes)
- OTel Collectors in each Agent cluster enrich with K8s + CRD labels, batch, and forward via OTLP gRPC
- The Host's K-O11y Gateway validates the license (RS256 JWT), gates data through the License Gate Processor, and persists it to ClickHouse
- ClickHouse on a dedicated VM tiers data Hot → Warm → Cold automatically; a systemd DB Agent manages S3 lifecycle and Glacier backups
- Users explore via the web UI
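The tiering step in the data flow can be pictured with a small Python sketch. It uses the age ranges shown in the architecture diagram (0-7d, 7-30d, 30d+) as example boundaries; actual retention is configured per signal, so these constants are illustrative assumptions. The `storage_tier` function is a hypothetical helper, not part of K-O11y, and treating a boundary age as belonging to the next tier is also an assumption.

```python
from datetime import timedelta

# Example boundaries taken from the diagram labels; real thresholds are
# configured per signal, so treat these values as placeholders.
HOT_MAX_AGE = timedelta(days=7)
WARM_MAX_AGE = timedelta(days=30)


def storage_tier(age: timedelta) -> str:
    """Return the tier that data of the given age would sit in."""
    if age < timedelta(0):
        raise ValueError("age must not be negative")
    if age < HOT_MAX_AGE:
        return "hot"    # local EBS / SSD
    if age < WARM_MAX_AGE:
        return "warm"   # S3 Standard
    return "cold"       # S3 Glacier IR


if __name__ == "__main__":
    for days in (1, 7, 29, 30, 90):
        print(days, storage_tier(timedelta(days=days)))
```

In the real system the Hot to Warm move is done by ClickHouse itself (the diagram labels it "TTL MOVE") and the Warm to Cold move by an S3 lifecycle rule, so no application code makes this decision at query time. The sketch only shows where a row of a given age would be expected to live.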
K-O11y is composed of four repositories, included here as git submodules.
| Component | Repository | Description |
|---|---|---|
| Server | k-o11y-server | Self-hosted observability backend. Monorepo with packages/core (Go API for ServiceMap and S3 Tiering, Go 1.24 + Gin + ClickHouse) and packages/signoz (React frontend and Query Service). |
| Install | k-o11y-install | 6 Helm charts (k-o11y-host, k-o11y-agent, and 4 sub-charts: k-o11y-otel-agent, k-o11y-apm-agent, k-o11y-ksm, k-o11y-otel-operator) + 2 Go CLI tools: k-o11y-db (ClickHouse VM installer, DDL apply, S3 tiering) and k-o11y-tls (cert-manager setup: existing / self-signed / private-CA / Let's Encrypt). |
| OTel Collector | k-o11y-otel-collector | Custom OTel Collector v0.109.0 distribution with a CRD Processor that automatically adds Kubernetes CRD labels (e.g. k8s.rollout.name for Argo Rollouts) to traces, metrics, and logs via a K8s Informer. Extensible to Knative, KEDA, etc. |
| OTel Gateway | k-o11y-otel-gateway | OTel Collector distribution with two custom components: License Guard Extension (RS256 JWT license validation with 7-day grace period) and License Gate Processor (drops telemetry when license is invalid and grace period has expired). |
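The License Gate rule in the last row (drop telemetry only when the license is invalid and the grace period has expired) can be sketched in Python as follows. This is an illustration under assumptions, not the gateway's Go code: `should_forward` and its `invalid_since` parameter are hypothetical, and the sketch assumes the 7-day grace period is counted from the moment the license was first seen as invalid.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

GRACE_PERIOD = timedelta(days=7)  # the documented 7-day grace period


def should_forward(license_valid: bool,
                   invalid_since: Optional[datetime],
                   now: datetime) -> bool:
    """Return True if telemetry should pass, False if it should be dropped.

    invalid_since is a hypothetical field: the moment the license was first
    seen as invalid. It is ignored while the license is valid.
    """
    if license_valid:
        return True
    if invalid_since is None:
        # No record of when the license went invalid: fail closed (assumption).
        return False
    return now - invalid_since <= GRACE_PERIOD


if __name__ == "__main__":
    went_invalid = datetime(2025, 1, 1, tzinfo=timezone.utc)
    for offset_days in (0, 3, 7, 8):
        now = went_invalid + timedelta(days=offset_days)
        print(offset_days, should_forward(False, went_invalid, now))
```

Splitting validation (the extension) from enforcement (the processor) means the pipeline keeps flowing during the grace window even though validation has already failed, which is the behavior the table describes.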
Clone with all submodules:
git clone --recurse-submodules https://github.com/Wondermove-Inc/k-o11y.git
Full Host + Agent installation is a 6-step process using Go CLI tools and Helm. Pre-built Docker images and OCI-registry Helm charts are not yet published (see Roadmap), so for now you need to build the images and push them to your own OCI registry (GHCR, Harbor, etc.).
- ClickHouse VM: Ubuntu 22.04 LTS, SSH access with sudo, 8+ vCPU, 32 GB+ RAM
- Host K8s cluster: Kubernetes 1.28+, Helm 3.12+, kubectl
- Agent K8s cluster(s): Kubernetes 1.28+, Linux kernel 5.8+ (for Beyla eBPF)
- OCI registry: Accessible by both clusters
- Encryption key: openssl rand -hex 32 (stored as K_O11Y_ENCRYPTION_KEY)
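The `openssl rand -hex 32` command emits 32 random bytes as 64 hexadecimal characters. On a machine without OpenSSL, the Python sketch below produces a key of the same shape using the standard library. `generate_encryption_key` is a hypothetical helper name, and the README does not say whether K-O11y checks the key format beyond this, so treat the equivalence as an assumption.

```python
import secrets


def generate_encryption_key(num_bytes: int = 32) -> str:
    """Return num_bytes of cryptographically secure randomness as lowercase hex."""
    return secrets.token_hex(num_bytes)


if __name__ == "__main__":
    key = generate_encryption_key()
    # Print in a form that can be pasted into a shell or an env file.
    print(f"K_O11Y_ENCRYPTION_KEY={key}")
```

Whichever way the key is generated, the same value has to be passed to both the `k-o11y-db install` step and the Host chart, since both commands below take it as input.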
# ── 1. Install ClickHouse + Keeper on the VM (Go CLI)
./k-o11y-db install --mode ssh \
--ssh-user <SSH_USER> --ssh-key <SSH_KEY_PATH> \
--keeper-host <KEEPER_IP> --clickhouse-host <CLICKHOUSE_IP> \
--clickhouse-password '<CLICKHOUSE_PASSWORD>' \
--encryption-key <K_O11Y_ENCRYPTION_KEY> --yes
# ── 2. (Optional) Set up TLS for OTel Gateway
./k-o11y-tls setup --mode selfsigned \
--domain <DOMAIN> --secret-name k-o11y-otel-collector-tls \
--kube-context <HOST_CONTEXT>
# ── 3. Install Host cluster (Helm)
helm upgrade --install k-o11y-host \
--kube-context <HOST_CONTEXT> \
oci://<YOUR_REGISTRY>/charts/k-o11y-host \
--namespace k-o11y --create-namespace \
--set externalClickhouse.host=<NLB_DNS_OR_IP> \
--set externalClickhouse.password='<CLICKHOUSE_PASSWORD>' \
--set o11yHub.additionalEnvs.K_O11Y_ENCRYPTION_KEY=<ENCRYPTION_KEY>
# ── 4. Apply DDL + install OTel Agent on ClickHouse VM (Go CLI)
./k-o11y-db post-install --mode ssh \
--clickhouse-host <CLICKHOUSE_IP> \
--clickhouse-password '<CLICKHOUSE_PASSWORD>' \
--otel-endpoint <HOST_GATEWAY_IP>:4317 --environment prod
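# ── 5. Install cert-manager on Agent cluster (Helm)
# Register the Jetstack chart repo first so jetstack/cert-manager resolves
helm repo add jetstack https://charts.jetstack.io
helm repo update
helm install cert-manager jetstack/cert-manager \
--namespace cert-manager --create-namespace \
--version v1.17.1 --set crds.enabled=true \
--kube-context <AGENT_CONTEXT> --wait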
# ── 6. Install Agent cluster (Helm)
helm upgrade --install k-o11y-agent \
--kube-context <AGENT_CONTEXT> \
oci://<YOUR_REGISTRY>/charts/k-o11y-agent \
--namespace k-o11y --create-namespace \
--set global.clusterName=<CLUSTER_NAME> \
--set global.deploymentEnvironment=prod \
--set k-o11y-otel-agent.otelCollectorEndpoint=<HOST_GATEWAY_IP>:4317 \
--wait --timeout 25m
Full reference (including all flags, TLS variants, and bastion SSH mode): k-o11y-install/README.md
There are three installation scenarios depending on your setup.
The 6-step process above assumes you build from source: you clone each sub-repo, build and push Docker images to your own OCI registry, then install via Helm charts that reference your registry.
Sub-repo READMEs contain the full build instructions:
- Server: packages/core and packages/signoz, built with go build / make go-build-community / docker build
- OTel Collector: make docker (pushes ghcr.io/wondermove-inc/k-o11y-otel-collector-contrib:0.109.0.1)
- OTel Gateway: go build -o signoz-otel-collector ./cmd/signozotelcollector + Docker
Once automated GHCR publishing lands (see Roadmap), installation will be:
helm install k-o11y oci://ghcr.io/wondermove-inc/charts/k-o11y-host \
--namespace k-o11y --create-namespace
Not yet available.
- Server (core API): cd packages/core && go run cmd/main.go (requires CLICKHOUSE_HOST, CLICKHOUSE_PORT, CLICKHOUSE_DATABASE)
- Server (backend): cd packages/signoz && make go-run-community
- Frontend: cd packages/signoz/frontend && yarn install && yarn dev
A direction, not a commitment. Contributions welcome on any of these.
- Publish GHCR Docker images for all 4 components (unblocks one-line install)
- Publish Helm charts to OCI registry (currently <YOUR_REGISTRY> placeholder)
- MkDocs / GitHub Pages documentation site
- Translate Korean comments in Go code to English (k-o11y-server#1, k-o11y-install#1)
- docker-compose.yml for local development
- Grafana dashboard JSON presets
- Prometheus AlertManager rule presets
Contributions are welcome, especially on good first issues.
- Find an issue labeled "good first issue" or "help wanted"
- Comment on the issue to claim it (avoid duplicate work)
- Fork, branch, and send a PR: scope narrowly, describe clearly
- Address review feedback; maintainers will reply within a few days
See CONTRIBUTING.md in any of the sub-repos for more.
This project follows passive maintenance: PRs and issues are reviewed as time allows. We aim to respond within 7 days but cannot guarantee faster turnaround.
Thanks to everyone who's made K-O11y better.
Contributors shown above are for this umbrella repository. For the full list across all sub-repositories: server · install · otel-collector · otel-gateway
If K-O11y is useful to you, please consider giving it a star. It helps others find the project.
- k-o11y-server and k-o11y-install: MIT License (inherited from SigNoz)
- k-o11y-otel-collector and k-o11y-otel-gateway: Apache License 2.0 (inherited from OpenTelemetry)
Forked from SigNoz (MIT) and the OpenTelemetry Collector (Apache 2.0). See NOTICE for attribution details.
- Bug reports & feature requests: GitHub Issues
- Questions & discussions: Open an issue (GitHub Discussions coming soon)
- Website: www.skuberplus.com
Built and maintained by Wondermove