K-O11y

Kubernetes Observability Platform for self-hosted, air-gapped, multi-cluster environments.

English | 한국어 | 日本語 | 中文

Project Status: WIP | License: MIT | License: Apache 2.0

Built on OpenTelemetry, Beyla eBPF, and ClickHouse.


K-O11y ServiceMap — live dependency topology powered by Beyla eBPF

K-O11y is a self-hosted Kubernetes observability platform that unifies metrics, logs, and traces across multiple clusters. It is built on OpenTelemetry and Beyla eBPF, with a 2-tier Host-Agent architecture and ClickHouse-backed storage that tiers data automatically: Hot → Warm (S3) → Cold (Glacier IR).


📸 Screenshots

K-O11y Insight Dashboard

Unified cluster insight — CPU, memory, pods, nodes, and trend graphs in a single view.

🔭 Observability in One Place

Logs Explorer

📝 Logs
Frequency chart + severity filters + full-text search

Traces Explorer

🔍 Traces
Distributed tracing across services with rich filters

Services APM

📈 APM
p50/p90/p99 latency, Apdex, and key operations

Infrastructure Drill-down

🏗️ Infrastructure
Pod-level metrics, logs, traces, and events

🐞 Exceptions

Exception stacktrace with spanID and traceID

Capture stack traces with spanID and traceID attached — jump straight from an exception into its distributed trace.

💾 Data Lifecycle

Data Lifecycle — Hot / Warm (S3) / Cold (Glacier IR)

Per-signal retention with native Hot → Warm (S3) → Cold (Glacier IR) tiering configured from the UI.

🔔 Alerts

Alertmanager + notification channels

Alertmanager settings, SMTP, and pluggable notification channels — all configurable from the UI.


✨ Features

  • 📊 Unified Observability — Metrics, logs, and traces in a single platform
  • 🗺️ ServiceMap — Microservice dependency topology visualization
  • 🔍 Distributed Tracing — ClickHouse-based trace storage and query
  • Zero-Code Instrumentation — Auto-instrument apps with Beyla eBPF
  • 🏷️ CRD Label Enrichment — Automatically add Kubernetes CRD labels (e.g. k8s.rollout.name for Argo Rollouts) to all telemetry
  • 🏢 Multi-Cluster Native — 2-tier Host-Agent architecture for fleet observability
  • 💾 S3 3-Tier Storage — Hot (EBS) → Warm (S3 Standard) → Cold (S3 Glacier IR) automatic tiering
  • 🔐 SSO Tenant Auto-Lock — JWT-based multi-tenant SSO with automatic workspace binding
  • 🔒 Air-Gapped Ready — Works in fully offline environments (regulated industries)
  • 📦 Self-Hosted — Your data never leaves your infrastructure
  • 🎫 License Guard — RS256 JWT-based license validation with configurable grace period (enterprise distributions)

🎯 Why K-O11y?

K-O11y exists to fill a specific gap: teams that need production-grade observability but cannot use SaaS, such as regulated industries, air-gapped environments, multi-cluster fleets, or teams wary of vendor lock-in.

| Need | SaaS (Datadog, etc.) | DIY (Prom + Grafana + Jaeger + Loki) | K-O11y |
|---|---|---|---|
| Self-hosted | | | |
| Air-gapped | | ⚠️ painful | |
| Multi-cluster (Host-Agent) | ✅ (if you pay) | ⚠️ DIY federation | ✅ built-in |
| Metrics + Logs + Traces unified | | ❌ 4 tools | |
| eBPF auto-instrumentation | partial | ⚠️ DIY | ✅ Beyla integrated |
| Cost predictability | ❌ usage-based | | |
| Operational complexity | ✅ low | ❌ high | ⚠️ medium |

Best fit for: on-premise K8s fleets, government / defense / finance / healthcare, edge deployments, cost-sensitive teams moving off Datadog.


🎬 Demo

demo-final.mp4

🏗️ Architecture

K-O11y uses a 2-tier Host-Agent model: lightweight Agent collectors in each workload cluster ship telemetry over OTLP to a central Host cluster that stores, queries, and visualizes everything. ClickHouse runs on a dedicated VM rather than in-cluster, which keeps storage tiering under direct control.

%%{init: {'theme':'base', 'themeVariables': {
  'background':'#0B1220',
  'fontFamily':'Inter, -apple-system, system-ui, sans-serif',
  'fontSize':'14px',
  'primaryColor':'#1E1B4B',
  'primaryTextColor':'#E8EBFF',
  'primaryBorderColor':'#6D5EF9',
  'lineColor':'#6D5EF9',
  'clusterBkg':'#0F1530',
  'clusterBorder':'#374151'
}}}%%
flowchart TB
    subgraph AgentClusters["<b>AGENT CLUSTERS</b><br/><font color='#9CA3AF'>workload clusters · N × ...</font>"]
        subgraph AgentCluster1["<b>Agent Cluster #1</b>"]
            App1["<b>Application Pods</b>"]
            Beyla1["<b>Beyla eBPF APM</b><br/><font color='#9CA3AF'>DaemonSet</font>"]
            OtelAgent1["<b>OTel Collector</b><br/><font color='#9CA3AF'>DaemonSet</font>"]
            OtelDeploy1["<b>OTel Collector</b><br/><font color='#9CA3AF'>Deployment</font>"]
            KSM1["<b>Kube State Metrics</b>"]
            OtelOp1["<b>OTel Operator</b>"]

            App1 -.auto-instrument.-> Beyla1
            Beyla1 --> OtelAgent1
            App1 --> OtelAgent1
            KSM1 --> OtelDeploy1
            OtelOp1 -.manages CRDs.-> OtelAgent1
        end

        subgraph AgentClusterN["<b>Agent Cluster #N</b>"]
            AppN["<b>Application Pods</b>"]
            BeylaN["<b>Beyla eBPF</b>"]
            OtelAgentN["<b>OTel Collector</b>"]
        end
    end

    subgraph HostCluster["<b>HOST CLUSTER</b><br/><font color='#9CA3AF'>centralized observability</font>"]
        Gateway["<b>K-O11y OTel Gateway</b><br/><font color='#9CA3AF'>License Guard · License Gate</font>"]

        subgraph Server["<b>K-O11y Server</b>"]
            Frontend["<b>Frontend UI</b><br/><font color='#9CA3AF'>React</font>"]
            QueryService["<b>Query Service</b><br/><font color='#9CA3AF'>S3 Tiering · SSO</font>"]
            AlertManager["<b>Alert Manager</b>"]
            Core["<b>Core API</b><br/><font color='#9CA3AF'>ko11y-core · ServiceMap</font>"]
        end

        Gateway --> QueryService
        Gateway --> Core
        AlertManager --> QueryService
        Frontend --> QueryService
    end

    subgraph ClickHouseVM["<b>CLICKHOUSE VM</b><br/><font color='#9CA3AF'>dedicated storage</font>"]
        subgraph Storage["<b>Storage Tiers</b>"]
            Hot["<b>Hot</b><br/><font color='#9CA3AF'>EBS / SSD · 0-7d</font>"]
            Warm["<b>Warm</b><br/><font color='#9CA3AF'>S3 Standard · 7-30d</font>"]
            Cold["<b>Cold</b><br/><font color='#9CA3AF'>S3 Glacier IR · 30d+</font>"]

            Hot -.TTL MOVE.-> Warm
            Warm -.Lifecycle.-> Cold
        end
        DBAgent["<b>DB Agent</b><br/><font color='#9CA3AF'>systemd</font>"]
        DBAgent -.manages.-> Storage
    end

    QueryService --> Storage
    Core --> Storage

    OtelAgent1 -->|OTLP :4317| Gateway
    OtelDeploy1 -->|OTLP :4317| Gateway
    OtelAgentN -->|OTLP :4317| Gateway

    User["👤 <b>User / Operator</b>"] --> Frontend

    classDef agent fill:#2A1B5B,stroke:#6D5EF9,color:#E8EBFF,stroke-width:2px,rx:8,ry:8
    classDef host fill:#1A3A4A,stroke:#4DC4E5,color:#E8F7FF,stroke-width:2px,rx:8,ry:8
    classDef storage fill:#1F1A38,stroke:#9D7AE8,color:#EEEAFF,stroke-width:2px,rx:8,ry:8
    classDef user fill:#0F1530,stroke:#6D5EF9,color:#E8EBFF,stroke-width:2px,rx:8,ry:8

    class App1,AppN,Beyla1,BeylaN,OtelAgent1,OtelAgentN,OtelDeploy1,KSM1,OtelOp1 agent
    class Gateway,Frontend,QueryService,AlertManager,Core host
    class Hot,Warm,Cold,DBAgent storage
    class User user
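To make the tier boundaries in the diagram concrete, here is a minimal shell sketch that maps the age of a piece of data to its tier. The function name `tier_for_age_days` is hypothetical and not part of K-O11y. The 7- and 30-day defaults come from the diagram labels; real retention is configured per signal, and treating day 7 as already Warm (and day 30 as already Cold) is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: map data age (in days) to a storage tier.
# Defaults follow the diagram: Hot 0-7d, Warm 7-30d, Cold 30d+.
# Assumption: boundaries are half-open, so day 7 is Warm and day 30 is Cold.
tier_for_age_days() {
  age_days=$1
  warm_after=${2:-7}    # days after which data leaves Hot
  cold_after=${3:-30}   # days after which data leaves Warm
  if [ "$age_days" -lt "$warm_after" ]; then
    echo hot
  elif [ "$age_days" -lt "$cold_after" ]; then
    echo warm
  else
    echo cold
  fi
}

tier_for_age_days 3    # prints: hot
tier_for_age_days 12   # prints: warm
tier_for_age_days 45   # prints: cold
```

The optional second and third arguments stand in for per-signal retention settings, for example `tier_for_age_days 10 14 90` for a signal that stays Hot for 14 days.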

Data flow:

  1. Apps emit telemetry, or Beyla eBPF auto-instruments them with no code changes
  2. OTel Collectors in each Agent cluster enrich telemetry with K8s and CRD labels, batch it, and forward it via OTLP gRPC
  3. The Host's K-O11y Gateway validates the license (RS256 JWT), gates data through the License Gate Processor, and persists it to ClickHouse
  4. ClickHouse on a dedicated VM tiers data Hot → Warm → Cold automatically; a systemd DB Agent manages S3 lifecycle and Glacier backups
  5. Users explore via the web UI
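The CRD label enrichment in step 2 can be pictured as walking a pod's owner chain until an Argo Rollout is found. The sketch below only illustrates that lookup; the real processor is a Go component that uses a K8s Informer. The function name and the workload names are hypothetical, and `k8s.rollout.name` is the only label key taken from this README.

```shell
#!/bin/sh
# Hypothetical helper: given a pod's owner chain as KIND/NAME arguments
# (nearest owner first), print the rollout label if a Rollout is present.
# Returns non-zero when no Rollout owns the pod, so no label is added.
rollout_label_from_owners() {
  for owner in "$@"; do
    kind=${owner%%/*}   # text before the first slash
    name=${owner#*/}    # text after the first slash
    if [ "$kind" = "Rollout" ]; then
      printf 'k8s.rollout.name=%s\n' "$name"
      return 0
    fi
  done
  return 1
}

# Example: a pod owned by a ReplicaSet, which is owned by an Argo Rollout.
rollout_label_from_owners ReplicaSet/checkout-6f7d9 Rollout/checkout
# prints: k8s.rollout.name=checkout
```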

📦 Components

K-O11y is composed of four repositories, included here as git submodules.

| Component | Repository | Description |
|---|---|---|
| 🧠 Server | k-o11y-server | Self-hosted observability backend. Monorepo with packages/core (Go API for ServiceMap and S3 Tiering, Go 1.24 + Gin + ClickHouse) and packages/signoz (React frontend and Query Service). |
| 📦 Install | k-o11y-install | 6 Helm charts (k-o11y-host, k-o11y-agent, and 4 sub-charts: k-o11y-otel-agent, k-o11y-apm-agent, k-o11y-ksm, k-o11y-otel-operator) + 2 Go CLI tools: k-o11y-db (ClickHouse VM installer, DDL apply, S3 tiering) and k-o11y-tls (cert-manager setup: existing / self-signed / private-CA / Let's Encrypt). |
| 📡 OTel Collector | k-o11y-otel-collector | Custom OTel Collector v0.109.0 distribution with a CRD Processor that automatically adds Kubernetes CRD labels (e.g. k8s.rollout.name for Argo Rollouts) to traces, metrics, and logs via a K8s Informer. Extensible to Knative, KEDA, etc. |
| 🛂 OTel Gateway | k-o11y-otel-gateway | OTel Collector distribution with two custom components: License Guard Extension (RS256 JWT license validation with 7-day grace period) and License Gate Processor (drops telemetry when license is invalid and grace period has expired). |
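The License Gate behavior described above (drop telemetry only when the license is invalid and the grace period has expired) reduces to a small decision. The sketch below is an illustration rather than the gateway's actual code: the function name and arguments are hypothetical, and it assumes the grace period starts when validation first fails, which this README does not specify.

```shell
#!/bin/sh
# license_gate_decision VALID INVALID_SINCE NOW [GRACE_DAYS]
#   VALID          1 if the license currently validates, 0 otherwise
#   INVALID_SINCE  epoch seconds when validation first failed
#                  (assumed start of the grace period)
#   NOW            current epoch seconds
#   GRACE_DAYS     grace period in days, default 7
# Prints "pass" or "drop".
license_gate_decision() {
  valid=$1
  invalid_since=$2
  now=$3
  grace_days=${4:-7}
  if [ "$valid" -eq 1 ]; then
    echo pass
    return 0
  fi
  grace_end=$((invalid_since + grace_days * 86400))
  if [ "$now" -lt "$grace_end" ]; then
    echo pass   # invalid license, but still inside the grace period
  else
    echo drop   # invalid license and grace period expired
  fi
}

# Example: license invalid for 3 days, default 7-day grace period.
license_gate_decision 0 0 $((3 * 86400))   # prints: pass
```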

Clone with all submodules:

git clone --recurse-submodules https://github.com/Wondermove-Inc/k-o11y.git

🚀 Quick Start

Full Host + Agent installation is a 6-step process using Go CLI tools and Helm. Pre-built Docker images and OCI-registry Helm charts are not yet published (see Roadmap), so you'll need to push the built images to your own OCI registry (GHCR, Harbor, etc.) for now.

Prerequisites

  • ClickHouse VM: Ubuntu 22.04 LTS, SSH access with sudo, 8+ vCPU, 32GB+ RAM
  • Host K8s cluster: Kubernetes 1.28+, Helm 3.12+, kubectl
  • Agent K8s cluster(s): Kubernetes 1.28+, Linux kernel 5.8+ (for Beyla eBPF)
  • OCI registry: Accessible by both clusters
  • Encryption key: generate with openssl rand -hex 32 and store it as K_O11Y_ENCRYPTION_KEY
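Two of these prerequisites are easy to check from a shell before starting. The sketch below is a hypothetical pre-flight helper, not something shipped with K-O11y: `kernel_at_least` compares a kernel release string against the 5.8 minimum for Beyla eBPF, and `is_hex_key` checks that a key looks like the 64 hex characters that openssl rand -hex 32 produces.

```shell
#!/bin/sh
# kernel_at_least RELEASE MAJOR MINOR
# Succeeds if RELEASE (e.g. "5.15.0-91-generic") is at least MAJOR.MINOR.
kernel_at_least() {
  release=$1
  major=${release%%.*}       # digits before the first dot
  rest=${release#*.}         # everything after the first dot
  minor=${rest%%[!0-9]*}     # leading digits of the remainder
  [ "$major" -gt "$2" ] || { [ "$major" -eq "$2" ] && [ "$minor" -ge "$3" ]; }
}

# is_hex_key KEY
# Succeeds if KEY is exactly 64 hexadecimal characters (32 bytes).
is_hex_key() {
  case $1 in
    ""|*[!0-9a-fA-F]*) return 1 ;;
  esac
  [ "${#1}" -eq 64 ]
}

# Run on an Agent cluster node to check the local kernel.
if kernel_at_least "$(uname -r)" 5 8; then
  echo "kernel OK for Beyla eBPF"
else
  echo "kernel older than 5.8" >&2
fi
```

For example, `is_hex_key "$K_O11Y_ENCRYPTION_KEY" || echo "key is not 64 hex characters"` catches a truncated or mistyped key before it reaches the installer.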

Minimal Flow (6 Steps)

# ── 1. Install ClickHouse + Keeper on the VM (Go CLI)
./k-o11y-db install --mode ssh \
  --ssh-user <SSH_USER> --ssh-key <SSH_KEY_PATH> \
  --keeper-host <KEEPER_IP> --clickhouse-host <CLICKHOUSE_IP> \
  --clickhouse-password '<CLICKHOUSE_PASSWORD>' \
  --encryption-key <K_O11Y_ENCRYPTION_KEY> --yes

# ── 2. (Optional) Set up TLS for OTel Gateway
./k-o11y-tls setup --mode selfsigned \
  --domain <DOMAIN> --secret-name k-o11y-otel-collector-tls \
  --kube-context <HOST_CONTEXT>

# ── 3. Install Host cluster (Helm)
helm upgrade --install k-o11y-host \
  --kube-context <HOST_CONTEXT> \
  oci://<YOUR_REGISTRY>/charts/k-o11y-host \
  --namespace k-o11y --create-namespace \
  --set externalClickhouse.host=<NLB_DNS_OR_IP> \
  --set externalClickhouse.password='<CLICKHOUSE_PASSWORD>' \
  --set o11yHub.additionalEnvs.K_O11Y_ENCRYPTION_KEY=<K_O11Y_ENCRYPTION_KEY>

# ── 4. Apply DDL + install OTel Agent on ClickHouse VM (Go CLI)
./k-o11y-db post-install --mode ssh \
  --clickhouse-host <CLICKHOUSE_IP> \
  --clickhouse-password '<CLICKHOUSE_PASSWORD>' \
  --otel-endpoint <HOST_GATEWAY_IP>:4317 --environment prod

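# ── 5. Install cert-manager on Agent cluster (Helm)
# jetstack/cert-manager resolves only after the jetstack repo has been added
helm repo add jetstack https://charts.jetstack.io
helm repo update
helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager --create-namespace \
  --version v1.17.1 --set crds.enabled=true \
  --kube-context <AGENT_CONTEXT> --wait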

# ── 6. Install Agent cluster (Helm)
helm upgrade --install k-o11y-agent \
  --kube-context <AGENT_CONTEXT> \
  oci://<YOUR_REGISTRY>/charts/k-o11y-agent \
  --namespace k-o11y --create-namespace \
  --set global.clusterName=<CLUSTER_NAME> \
  --set global.deploymentEnvironment=prod \
  --set k-o11y-otel-agent.otelCollectorEndpoint=<HOST_GATEWAY_IP>:4317 \
  --wait --timeout 25m

Full reference (including all flags, TLS variants, and bastion SSH mode): k-o11y-install/README.md


🛠️ Installation

There are three installation scenarios depending on your setup.

1. Full Stack — self-built images (works today) ⚙️

The documented 6-step process above builds on this. You clone each sub-repo, build and push Docker images to your own OCI registry, then install via Helm charts that reference your registry.

Sub-repo READMEs contain the full build instructions:

  • Server: packages/core and packages/signoz — build with go build / make go-build-community / docker build
  • OTel Collector: make docker → pushes ghcr.io/wondermove-inc/k-o11y-otel-collector-contrib:0.109.0.1
  • OTel Gateway: go build -o signoz-otel-collector ./cmd/signozotelcollector + Docker

2. GHCR pre-built images (roadmap) 🚧

Once automated GHCR publishing lands (see Roadmap), installation will be:

helm install k-o11y oci://ghcr.io/wondermove-inc/charts/k-o11y-host \
  --namespace k-o11y --create-namespace

Not yet available.

3. Local development

  • Server (core API): cd packages/core && go run cmd/main.go (requires CLICKHOUSE_HOST, CLICKHOUSE_PORT, CLICKHOUSE_DATABASE)
  • Server (backend): cd packages/signoz && make go-run-community
  • Frontend: cd packages/signoz/frontend && yarn install && yarn dev
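Because the core API needs its ClickHouse settings from the environment, a small guard can report missing variables before go run is attempted. This is a hypothetical convenience sketch; `require_env` is not part of the repository, and only the three variable names come from this README.

```shell
#!/bin/sh
# require_env NAME [NAME ...]
# Prints each unset or empty variable to stderr.
# Returns 0 if all are set, 1 otherwise.
require_env() {
  missing=0
  for name in "$@"; do
    eval "value=\${$name:-}"   # indirect lookup of the variable called $name
    if [ -z "$value" ]; then
      echo "missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Check the variables the core API requires before starting it.
require_env CLICKHOUSE_HOST CLICKHOUSE_PORT CLICKHOUSE_DATABASE \
  || echo "set the variables listed above, then run: go run cmd/main.go"
```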

🗺️ Roadmap

Not a commitment — a direction. Contributions welcome on any of these.

  • 🐳 Publish GHCR Docker images for all 4 components (unblocks one-line install)
  • 📦 Publish Helm charts to OCI registry (currently <YOUR_REGISTRY> placeholder)
  • 🏗️ MkDocs / GitHub Pages documentation site
  • 🌏 Translate Korean comments in Go code to English (k-o11y-server#1, k-o11y-install#1)
  • 🧪 docker-compose.yml for local development
  • 📚 Grafana dashboard JSON presets
  • 🔔 Prometheus AlertManager rule presets

🤝 Contributing

Contributions are welcome — especially on good first issues.

  1. Find an issue labeled good first issue or help wanted
  2. Comment on the issue to claim it (avoid duplicate work)
  3. Fork, branch, and send a PR — scope narrowly, describe clearly
  4. Address review feedback; maintainers aim to reply within 7 days

See CONTRIBUTING.md in any of the sub-repos for more.

This project is passively maintained: PRs and issues are reviewed as time allows. We aim to respond within 7 days but cannot guarantee faster turnaround.


👥 Contributors

Thanks to everyone who's made K-O11y better.

Contributors shown above are for this umbrella repository. For the full list across all sub-repositories: server · install · otel-collector · otel-gateway


⭐ Star History

If K-O11y is useful to you, please consider giving it a star — it helps others find the project.

Star History Chart


📄 License

  • k-o11y-server and k-o11y-install: MIT License (inherited from SigNoz)
  • k-o11y-otel-collector and k-o11y-otel-gateway: Apache License 2.0 (inherited from OpenTelemetry)

Forked from SigNoz (MIT) and the OpenTelemetry Collector (Apache 2.0). See NOTICE for attribution details.


💬 Contact

  • 🐛 Bug reports & feature requests: GitHub Issues
  • 💭 Questions & discussions: Open an issue (GitHub Discussions coming soon)
  • 🌐 Website: www.skuberplus.com

Built and maintained by Wondermove
