Complete observability platform demonstrating traces, metrics and logs across C#, Go, Python, Rust and C++ using OpenTelemetry.
export RUNTIME=compose
# Option 1: Quick start with pre-built images (recommended for first run)
COMPOSE_FILE="./infra/compose/docker-compose.ci.yml" make start
# Option 2: Build all services locally and start everything
make start
# Option 3: Start infrastructure first, then selective services
make start-infra
make start SERVICES="python-otel-service go-otel-service csharp-otel-service"
make start SERVICES="rust-otel-service cpp-otel-service" # slow on first build
# Generate traffic + open Grafana
make traffic
make open-grafana # http://localhost:3000 (admin/admin)
# Tear down (compose: down + prune)
make stop
For Kubernetes, open the matching dev container in any IDE that supports dev containers, then run:
export RUNTIME=k8s
# Terminal A — deploy
make start
# Terminal B — port-forward everything (observability + services)
make forward
# Terminal A — generate traffic + open Grafana
make traffic
make open-grafana # http://localhost:3000 (admin/admin)
# Tear down
make stop
Observability Stack:
- OpenTelemetry Collector (port 4317, OTLP gRPC) - Central telemetry hub
- Jaeger (port 16686) - Distributed tracing
- Prometheus (port 9090) - Metrics storage
- Loki (port 3100) - Log aggregation
- Grafana (port 3000) - Unified visualization
Example Services:
| Language | Port | Endpoint |
|---|---|---|
| C# (ASP.NET) | 5001 | http://localhost:5001/api/hello |
| Go (Gin) | 5002 | http://localhost:5002/api/hello |
| Python (FastAPI) | 5003 | http://localhost:5003/api/hello |
| Rust (Actix) | 5004 | http://localhost:5004/api/hello |
| C++ (httplib) | 5005 | http://localhost:5005/api/hello |
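Before generating traffic, it can help to confirm that all five services answer on the ports listed above. This is a minimal stdlib-only Python sketch, assuming the services are reachable on localhost (Compose, or Kubernetes after make forward) and reply to a plain GET with HTTP 200. It is not what make traffic runs; SERVICE_PORTS, hello_url and smoke_test are hypothetical names.

```python
"""Minimal smoke test for the example services' /api/hello endpoints."""
import urllib.error
import urllib.request
from typing import Callable, Dict

# Ports from the services table; "localhost" assumes Compose or `make forward`.
SERVICE_PORTS: Dict[str, int] = {
    "csharp": 5001,
    "go": 5002,
    "python": 5003,
    "rust": 5004,
    "cpp": 5005,
}


def hello_url(port: int, host: str = "localhost") -> str:
    """Build the /api/hello URL for one service."""
    return f"http://{host}:{port}/api/hello"


def http_status(url: str, timeout: float = 2.0) -> int:
    """Return the HTTP status code, or 0 if the service is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # 4xx/5xx still carry a status code
        return err.code
    except OSError:  # connection refused, DNS failure, timeout
        return 0


def smoke_test(requests_per_service: int = 5,
               fetch: Callable[[str], int] = http_status) -> Dict[str, int]:
    """Send a few GETs to every service; return how many got HTTP 200."""
    ok_counts: Dict[str, int] = {}
    for name, port in SERVICE_PORTS.items():
        url = hello_url(port)
        ok_counts[name] = sum(
            1 for _ in range(requests_per_service) if fetch(url) == 200
        )
    return ok_counts


if __name__ == "__main__":
    n = 5
    for service, ok in smoke_test(n).items():
        print(f"{service:8s} {ok}/{n} OK")
```

The fetch parameter is injectable so the loop can be exercised without running services; each successful request also produces a trace, which makes this a quick way to get data into Jaeger.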
Architecture:
Services → OTLP (gRPC) → Collector → Jaeger/Prometheus/Loki → Grafana
Why OpenTelemetry?
- Vendor-neutral standard (CNCF)
- One API specification, implemented consistently across languages
- Auto-instrumentation for common frameworks
- Backend-agnostic and future-proof: switch backends (e.g. Jaeger/Tempo, Prometheus/Graphite, Loki/Elasticsearch) by reconfiguring the Collector instead of re-instrumenting services
make open-grafana
# Navigate to: Explore → Jaeger → Select service
# In Grafana Explore → Prometheus, try:
# Request rate (Go, C# services)
otel_http_server_request_duration_seconds_count
rate(otel_http_server_request_duration_seconds_count[5m]) # per-second request rate, averaged over the last 5 minutes
# Response size (Go service): bytes served per second, then average bytes per response
rate(otel_http_server_response_body_size_bytes_sum[5m])
rate(otel_http_server_response_body_size_bytes_sum[5m]) / rate(otel_http_server_response_body_size_bytes_count[5m]) # _count counts responses, not bytes
# 95th percentile latency (Go service)
histogram_quantile(0.95, sum by (le) (rate(otel_http_server_request_duration_seconds_bucket[5m]))) # p95 request duration over the last 5 minutes, aggregated across all series
# Browse all HTTP metrics
{__name__=~"otel_http.*"}
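The two PromQL functions used above can be reproduced by hand to sanity-check dashboard numbers. The sketch below is a simplified Python model, not Prometheus's code: simple_rate applies rate()'s counter-reset handling but skips Prometheus's extrapolation to the window edges, and histogram_quantile follows the linear interpolation PromQL uses for classic (le-bucketed) histograms. Function names and sample data are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def simple_rate(samples: Sequence[Tuple[float, float]]) -> float:
    """Per-second increase of a counter from (timestamp_s, value) samples.

    Like rate(), a drop in value is treated as a counter reset (e.g. a
    restart), but there is no extrapolation to the edges of the window.
    """
    if len(samples) < 2:
        return math.nan
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # After a reset the counter starts again from 0, so all of `cur` is new.
        increase += cur - prev if cur >= prev else cur
    return increase / (samples[-1][0] - samples[0][0])


def histogram_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate quantile q in (0, 1] from cumulative (le, count) buckets.

    The bucket list must end with +Inf, as Prometheus classic histograms do.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    buckets = sorted(buckets)
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        return math.nan
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # First bucket whose cumulative count reaches the target rank.
    b = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    if b == len(buckets) - 1:
        # Rank lands in the +Inf bucket: report the highest finite bound.
        return buckets[-2][0]
    if b == 0 and buckets[0][0] <= 0:
        return buckets[0][0]
    start, end, count = 0.0, buckets[b][0], buckets[b][1]
    if b > 0:
        start = buckets[b - 1][0]
        count -= buckets[b - 1][1]
        rank -= buckets[b - 1][1]
    # Assume observations are spread evenly within the bucket.
    return start + (end - start) * (rank / count)
```

For example, with buckets [(0.1, 50), (0.5, 90), (1.0, 100), (inf, 100)] the 95th-percentile rank is 95, which sits 5 observations into the 10 that fall in (0.5, 1.0], so the estimate is 0.75 s. The result is only as precise as the bucket boundaries allow.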
# Note: Not all services export the same metrics due to varying auto-instrumentation support:
# - Go: ✅ Full HTTP metrics (Gin auto-instrumentation)
# - C#: Partial (client metrics only, server metrics missing)
# - Python: Different metric names (use response_size instead of duration)
# - Rust: ❌ No HTTP metrics (Actix has no auto-instrumentation)
# - C++: Manual counter only
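For Rust and C++, closing the gap means recording at least a request counter and a request-duration histogram per route and status code. The Python sketch below only models that pattern in memory: HttpServerMetrics, BOUNDS and instrument are hypothetical names, the bucket bounds are illustrative, and a real service would create these instruments from an OpenTelemetry Meter and export them over OTLP instead of keeping dictionaries. The key fields follow the OpenTelemetry HTTP semantic-convention attributes http.request.method, http.route and http.response.status_code.

```python
import time
from bisect import bisect_left
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Illustrative bucket upper bounds in seconds (an assumption, not taken from the services).
BOUNDS: List[float] = [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0]

# (http.request.method, http.route, http.response.status_code)
Key = Tuple[str, str, int]
Handler = Callable[[], Tuple[int, str]]  # returns (status, body)


class HttpServerMetrics:
    """In-memory stand-in for a request counter plus a duration histogram."""

    def __init__(self) -> None:
        self.request_count: Dict[Key, int] = defaultdict(int)
        # One slot per bound plus a final overflow (+Inf) slot; not cumulative.
        self.duration_buckets: Dict[Key, List[int]] = defaultdict(
            lambda: [0] * (len(BOUNDS) + 1)
        )
        self.duration_sum: Dict[Key, float] = defaultdict(float)

    def record(self, method: str, route: str, status: int, seconds: float) -> None:
        key = (method, route, status)
        self.request_count[key] += 1
        # bisect_left puts a value equal to a bound into that bound's bucket (le semantics).
        self.duration_buckets[key][bisect_left(BOUNDS, seconds)] += 1
        self.duration_sum[key] += seconds

    def instrument(self, method: str, route: str, handler: Handler) -> Handler:
        """Wrap a handler so every call is counted and timed."""
        def wrapped() -> Tuple[int, str]:
            start = time.perf_counter()
            status = 500  # recorded if the handler raises
            try:
                status, body = handler()
                return status, body
            finally:
                self.record(method, route, status, time.perf_counter() - start)
        return wrapped
```

Recording in a finally block means failed requests are still counted (as 500), which keeps error rates honest; with the real SDK the same shape maps onto a Counter and a Histogram created from one Meter.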
#
# TODO: Add manual HTTP metric instrumentation for Rust/C++ services.
# Refer to OpenTelemetry examples for your language:
# - Rust: https://github.com/open-telemetry/opentelemetry-rust/tree/main/examples
# - Python: https://opentelemetry.io/docs/languages/python/instrumentation/
# - C++: https://github.com/open-telemetry/opentelemetry-cpp/tree/main/examples
# In Grafana Explore → Loki, try:
{service_name="rust-service"}
{service_name=~".*-service"} |= "error"
Dev Containers:
Each service has a pre-configured dev container with debugging support. To debug a service:
- Open its dev container in a supported IDE
- Run make start-infra inside the container to start the observability stack
- Set breakpoints in the service's source code and start debugging
Available Commands:
OpenTelemetry Observability Stack PoC
RUNTIME = compose
Usage:
make <target> [RUNTIME=compose|k8s] [SERVICES="svc1 svc2"]
help Show available targets
open-grafana Open Grafana in browser
open-jaeger Open Jaeger in browser
open-prometheus Open Prometheus in browser
lint Run pre-commit hooks
start Start the platform (compose: SERVICES="svc1 svc2" optional)
start-infra Start only infrastructure (observability stack)
stop Stop the platform
restart Restart the platform
logs Follow platform logs
build Rebuild service images (compose only; k8s rebuilds via deploy-to-kind.sh)
status Show platform status
traffic Generate test traffic
test Run service + telemetry tests
forward Port-forward everything (k8s only)
forward-obs Port-forward observability only (k8s only)
forward-svc Port-forward services only (k8s only)
forward-bg Background port-forward (k8s only; writes PID to /tmp/otel-pf.pid)
forward-stop Stop background port-forwards
Resources:
- Jaeger Docs
- Grafana Docs
- OpenTelemetry Rust Examples
- OpenTelemetry C++ Examples
- OpenTelemetry Go Examples
- OpenTelemetry .NET Examples
- OpenTelemetry Python Getting Started
- OpenTelemetry Docs
- OpenTelemetry Collector Documentation
- Prometheus Docs
- Loki Docs
Debugging Workflow:
- Find the trace by trace ID or time range in Grafana or the Jaeger web UI
- Identify the slow span/operation
- Check that service's metrics in Grafana
- View logs from the same time window in Grafana
- Fix, then verify with new traces
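For the "identify the slow span" step, the root span is always the longest, so ranking spans by self time (duration minus time spent in direct children) is often more telling. This sketch assumes the trace JSON has the shape the Jaeger UI fetches from http://localhost:16686/api/traces/<traceID> (an internal, unversioned API): spans with spanID, operationName, duration in microseconds and CHILD_OF references. slowest_by_self_time is a hypothetical helper.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple


def slowest_by_self_time(trace: Dict[str, Any], top: int = 3) -> List[Tuple[str, int]]:
    """Rank spans by exclusive ("self") time in microseconds.

    `trace` is assumed to be one element of the "data" array Jaeger returns
    for a single trace. Self time = span duration minus the durations of its
    direct children, clamped at 0 because concurrent children can overlap.
    """
    spans = trace["spans"]
    child_total: Dict[str, int] = defaultdict(int)
    for span in spans:
        for ref in span.get("references", []):
            if ref.get("refType") == "CHILD_OF":
                # Charge this span's duration to its parent.
                child_total[ref["spanID"]] += span["duration"]
    ranked = [
        (span["operationName"], max(0, span["duration"] - child_total[span["spanID"]]))
        for span in spans
    ]
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked[:top]
```

A span at the top of this list with little time in children is usually where the work (or the waiting) actually happens; its start and end times then narrow the window for the metrics and log checks.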
Monitoring Workflow:
- Set up a Grafana dashboard with key metrics
- Track request rates, error rates and latencies
- Set alerts on SLO violations
- Correlate metrics with traces when investigating
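One common way to turn "alerts on SLO violations" into a rule is an error-budget burn rate: the observed error ratio divided by the ratio the SLO allows. The sketch below assumes a 99.9% availability target and a 14.4x threshold checked over a long window (e.g. 1 h) and confirmed by a short one (e.g. 5 min), as suggested in Google's SRE Workbook. The error and total counts would come from request counters split by status code, whose label names depend on each exporter; the function names are hypothetical.

```python
from typing import Tuple

Window = Tuple[float, float]  # (error_count, total_count) observed in one window


def error_budget_burn_rate(errors: float, total: float, slo_target: float) -> float:
    """Observed error ratio divided by the error ratio the SLO allows.

    1.0 spends the budget exactly over the SLO period; 10.0 spends it
    ten times faster.
    """
    if total <= 0:
        return 0.0  # no traffic, nothing burned
    allowed_error_ratio = 1.0 - slo_target
    return (errors / total) / allowed_error_ratio


def should_alert(long_window: Window, short_window: Window,
                 slo_target: float = 0.999, threshold: float = 14.4) -> bool:
    """Alert only when both windows burn faster than `threshold`.

    The long window shows the problem is significant; the short window
    shows it is still happening, so the alert clears soon after a fix.
    """
    return all(
        error_budget_burn_rate(errors, total, slo_target) > threshold
        for errors, total in (long_window, short_window)
    )
```

For instance, 1 error in 100 requests against a 99.9% target is a burn rate of about 10: too slow to page at 14.4, but fast enough to be worth a ticket.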
Beyond the local Compose and kind setups, this PoC can be deployed to production Kubernetes in the cloud or on-prem:
- Cloud/On-prem: Deploy on managed/self-hosted Kubernetes; configure persistent storage, load balancers and ingress controllers or Gateway API.
- Best practices: Enable TLS, authentication/authorization, resource limits, sampling and backups for reliability.
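For the sampling item, a ratio-based decision keyed on the trace ID keeps sampled traces complete, because every service applying the same ratio reaches the same decision for the same ID. This Python sketch is similar in spirit to the TraceIdRatioBased sampler in the OpenTelemetry SDKs (compare the low 64 bits of the trace ID with ratio x 2^64); it is not the SDK's code, and sampling_bound and should_sample are hypothetical names.

```python
import random

TRACE_ID_SPACE = 1 << 64  # the decision uses only the low 64 bits of the 128-bit ID


def sampling_bound(ratio: float) -> int:
    """Map a ratio in [0, 1] to a threshold in the low-64-bit ID space."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be in [0, 1]")
    return round(ratio * TRACE_ID_SPACE)


def should_sample(trace_id: int, ratio: float) -> bool:
    """Keep the trace iff its low 64 bits fall below the ratio's bound.

    The result depends only on the trace ID, so services configured with the
    same ratio keep or drop the same traces.
    """
    return (trace_id & (TRACE_ID_SPACE - 1)) < sampling_bound(ratio)


if __name__ == "__main__":
    ids = [random.getrandbits(128) for _ in range(100_000)]
    kept = sum(should_sample(t, 0.1) for t in ids)
    print(f"kept {kept} of {len(ids)} traces (roughly 10% expected)")
```

Because trace IDs are random, the kept fraction converges on the configured ratio; head sampling like this cuts storage cost but cannot prefer slow or failing traces, which would need tail-based sampling in the Collector.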