- Introduction
- What is Observability and Why Do We Need It?
- Understanding OpenTelemetry
- Problem Statement: Monitoring Stuck Pods
- Prerequisites
- Step-by-Step Implementation
- Creating the Dashboard
- Setting Up Alerts
- Testing the Solution
- Troubleshooting
- Conclusion
In today's cloud-native world, Kubernetes has become the de facto standard for container orchestration. That power comes with operational complexity: monitoring your Kubernetes clusters is crucial for maintaining application reliability and performance.
This guide will walk you through setting up comprehensive pod monitoring using SigNoz, an open-source observability platform. We'll focus on a common but critical issue: detecting pods that get stuck in a "Pending" state for extended periods.
What you'll learn:
- How to set up OpenTelemetry Collector in Kubernetes
- How to create monitoring dashboards in SigNoz
- How to configure alerts for stuck pods
- How to integrate Slack notifications
Observability is the ability to understand what's happening inside your system by looking at its external outputs. Think of it like a car's dashboard - you can't see the engine directly, but you can monitor speed, fuel level, temperature, and other indicators to understand how the car is performing.
In software systems, observability is achieved through three main pillars:
- Metrics - Numerical data points over time (CPU usage, memory consumption, request rates)
- Logs - Text records of events that happened in your system
- Traces - Records of requests as they flow through your system
Kubernetes clusters are complex systems with many moving parts:
- Pods (your applications)
- Nodes (servers running your pods)
- Services (networking between pods)
- Controllers (managing pod lifecycles)
Without proper monitoring, you might not know when:
- A pod fails to start
- A pod gets stuck in an unhealthy state
- Resource constraints prevent pod scheduling
- Network issues affect your applications
OpenTelemetry is an open-source project that provides a unified way to collect observability data from your applications and infrastructure. It's like having a universal translator for monitoring data.
- OpenTelemetry Collector - A service that receives, processes, and exports telemetry data
- Receivers - Components that collect data from various sources
- Exporters - Components that send data to different destinations (like SigNoz)
- Processors - Components that transform or filter data
Kubernetes API  →  OpenTelemetry Collector  →  SigNoz Cloud
      ↓                     ↓                       ↓
 Pod Metrics          Data Processing          Visualization
 Node Metrics         & Enrichment             & Alerting
 Event Data           & Export
Here's a detailed workflow diagram (in Mermaid) showing how the entire monitoring system works:
graph TB
subgraph "Kubernetes Cluster"
K8S_API[Kubernetes API Server]
POD1[Pod: Running]
POD2[Pod: Pending]
POD3[Pod: Running]
NODE1[Node 1]
NODE2[Node 2]
POD1 --> K8S_API
POD2 --> K8S_API
POD3 --> K8S_API
NODE1 --> K8S_API
NODE2 --> K8S_API
end
subgraph "OpenTelemetry Collector"
K8S_RECEIVER[k8s_cluster Receiver]
PROCESSOR[Data Processor]
DEBUG_EXPORTER[Debug Exporter]
OTLP_EXPORTER[OTLP Exporter]
K8S_RECEIVER --> PROCESSOR
PROCESSOR --> DEBUG_EXPORTER
PROCESSOR --> OTLP_EXPORTER
end
subgraph "SigNoz Cloud"
INGESTION[Data Ingestion]
STORAGE[Time Series DB]
DASHBOARD[Dashboard Engine]
ALERT_ENGINE[Alert Engine]
NOTIFICATION[Notification Service]
INGESTION --> STORAGE
STORAGE --> DASHBOARD
STORAGE --> ALERT_ENGINE
ALERT_ENGINE --> NOTIFICATION
end
subgraph "Monitoring Outputs"
DASHBOARD_UI[Dashboard UI]
SLACK[Slack Channel]
EMAIL[Email Alerts]
DASHBOARD --> DASHBOARD_UI
NOTIFICATION --> SLACK
NOTIFICATION --> EMAIL
end
subgraph "Data Flow Details"
METRICS[k8s.pod.phase Metrics]
FILTERS[Namespace Filters]
AGGREGATION[Data Aggregation]
THRESHOLD[Alert Thresholds]
METRICS --> FILTERS
FILTERS --> AGGREGATION
AGGREGATION --> THRESHOLD
end
%% Main Data Flow
K8S_API -->|"Every 60s<br/>Pod Status"| K8S_RECEIVER
OTLP_EXPORTER -->|"OTLP Protocol<br/>Metrics + Logs"| INGESTION
%% Alert Flow
ALERT_ENGINE -->|"Pod Pending > 5min"| NOTIFICATION
%% Styling
classDef k8s fill:#0066cc,stroke:#000,stroke-width:2px,color:#fff
classDef otel fill:#ff6600,stroke:#000,stroke-width:2px,color:#fff
classDef signoz fill:#9900cc,stroke:#000,stroke-width:2px,color:#fff
classDef output fill:#00cc66,stroke:#000,stroke-width:2px,color:#fff
classDef data fill:#ffcc00,stroke:#000,stroke-width:2px,color:#000
class K8S_API,POD1,POD2,POD3,NODE1,NODE2 k8s
class K8S_RECEIVER,PROCESSOR,DEBUG_EXPORTER,OTLP_EXPORTER otel
class INGESTION,STORAGE,DASHBOARD,ALERT_ENGINE,NOTIFICATION signoz
class DASHBOARD_UI,SLACK,EMAIL output
class METRICS,FILTERS,AGGREGATION,THRESHOLD data
1. Data Collection Phase:
- Kubernetes API Server exposes pod status information
- OpenTelemetry Collector's `k8s_cluster` receiver polls the API every 60 seconds
- Collects metrics like `k8s.pod.phase`, `k8s.pod.name`, and `k8s.namespace.name`
2. Data Processing Phase:
- Raw metrics are processed and enriched with metadata
- Debug exporter logs metrics to collector console for troubleshooting
- OTLP exporter formats data for SigNoz Cloud ingestion
3. Data Storage Phase:
- SigNoz Cloud receives data via OTLP protocol
- Data is stored in time-series database
- Indexed by metrics, tags, and timestamps
4. Visualization Phase:
- Dashboard engine queries stored data
- Real-time visualization of pod states
- Updates every 60 seconds with new data
5. Alerting Phase:
- Alert engine continuously evaluates stored metrics
- Checks for conditions (pod phase = 1 for > 5 minutes)
- Triggers notifications when thresholds are exceeded
6. Notification Phase:
- Notification service sends alerts to configured channels
- Slack webhooks, email, or other integrations
- Includes relevant context and pod information
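The pod phase values in this pipeline come straight from the Kubernetes API. If you want to see the raw data the collector works from, here is a quick kubectl sketch (purely illustrative, not part of the setup):

```bash
# List every pod with the raw status.phase field that the k8s_cluster
# receiver converts into the k8s.pod.phase metric.
kubectl get pods --all-namespaces \
  -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,PHASE:.status.phase
```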
In Kubernetes, pods can get stuck in a "Pending" state for various reasons:
- Resource constraints - Not enough CPU/memory available
- Node affinity issues - Pod requirements don't match available nodes
- Storage problems - Persistent volume claims can't be satisfied
- Network issues - Pod can't be scheduled due to network policies
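In practice, `kubectl describe` usually reveals which of these causes applies, because the scheduler records its reasoning in the pod's events. A quick diagnostic sketch (replace the placeholders with your own pod and namespace):

```bash
# The Events section near the end of the output typically contains a
# FailedScheduling message explaining why the pod cannot be placed.
kubectl describe pod <pod-name> -n <namespace> | grep -A 10 'Events:'
```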
A pod stuck in Pending state means:
- Your application isn't running
- Users can't access your service
- Your system is partially down
- You might not notice until users complain
We'll create a monitoring system that:
- Collects pod phase metrics from Kubernetes
- Visualizes pod states in a dashboard
- Alerts when pods are stuck in Pending for more than 5 minutes
- Notifies your team via Slack
Before we start, you'll need:
- A Kubernetes cluster (local or cloud)
- `kubectl` configured to access your cluster
- Access to SigNoz Cloud (you'll need an account and its ingestion key)
- Basic understanding of Kubernetes concepts (pods, namespaces, deployments)
- Familiarity with YAML files
- Basic command-line usage
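Before continuing, a quick sanity check of your tooling and cluster access doesn't hurt (optional):

```bash
# Confirm kubectl can reach the cluster and that your user can create
# cluster-scoped RBAC objects (needed for the ClusterRole later on).
kubectl cluster-info
kubectl auth can-i create clusterrolebindings
```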
First, let's create a dedicated namespace for our monitoring setup:
kubectl create namespace signoz-demo

Why a separate namespace? It keeps our monitoring components organized and separate from your application workloads.
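You can confirm the namespace exists before moving on:

```bash
kubectl get namespace signoz-demo
```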
The OpenTelemetry Collector is the heart of our monitoring system. It will collect metrics from your Kubernetes cluster and send them to SigNoz.
Create a file called configmap.yml:
apiVersion: v1
kind: ConfigMap
metadata:
name: otelcontribcol
labels:
app: otelcontribcol
data:
config.yaml: |
receivers:
k8s_cluster:
collection_interval: 60s
exporters:
debug:
otlp:
        endpoint: "ingest.in.signoz.cloud:443" # adjust the region ("in" here) to match your SigNoz Cloud account
tls:
insecure: false
timeout: 20s
headers:
"signoz-ingestion-key": "YOUR_INGESTION_KEY_HERE"
service:
pipelines:
metrics:
receivers: [k8s_cluster]
exporters: [debug, otlp]
logs/entity_events:
receivers: [k8s_cluster]
          exporters: [debug, otlp]

What this does:

- The `k8s_cluster` receiver collects Kubernetes metrics every 60 seconds
- The `debug` exporter logs metrics to the collector's console (for troubleshooting)
- The `otlp` exporter sends metrics to SigNoz Cloud
- Replace `YOUR_INGESTION_KEY_HERE` with your actual SigNoz ingestion key
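A note on the ingestion key: embedding it in the ConfigMap is fine for a demo, but for anything longer-lived you may prefer to keep it in a Secret. One possible approach (a sketch, not required by this guide; the variable name is just an example):

```bash
# Store the key in a Secret instead of the ConfigMap.
kubectl create secret generic signoz-ingestion-key \
  --from-literal=SIGNOZ_INGESTION_KEY='YOUR_INGESTION_KEY_HERE' \
  -n signoz-demo

# The collector config can then use environment-variable substitution:
#   "signoz-ingestion-key": "${env:SIGNOZ_INGESTION_KEY}"
# provided the Deployment injects the Secret into the container,
# for example via spec.containers[].envFrom[].secretRef.
```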
Create a file called serviceaccount.yml:
apiVersion: v1
kind: ServiceAccount
metadata:
name: otelcontribcol
namespace: signoz-demo
labels:
    app: otelcontribcol

What this does: Creates a service account that the OpenTelemetry Collector will use to authenticate with the Kubernetes API.
Create a file called clusterrole.yml:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: otelcontribcol
labels:
app: otelcontribcol
rules:
- apiGroups: [""]
resources: [events, namespaces, namespaces/status, nodes, nodes/spec,
pods, pods/status, replicationcontrollers, replicationcontrollers/status,
resourcequotas, services]
verbs: [get, list, watch]
- apiGroups: ["apps"]
resources: [daemonsets, deployments, replicasets, statefulsets]
verbs: [get, list, watch]
- apiGroups: ["extensions"]
resources: [daemonsets, deployments, replicasets]
verbs: [get, list, watch]
- apiGroups: ["batch"]
resources: [jobs, cronjobs]
verbs: [get, list, watch]
- apiGroups: ["autoscaling"]
resources: [horizontalpodautoscalers]
  verbs: [get, list, watch]

What this does: Defines the permissions the collector needs to read Kubernetes resources (pods, nodes, services, and so on).
Create a file called clusterrolebinding.yml:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: otelcontribcol
labels:
app: otelcontribcol
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: otelcontribcol
subjects:
- kind: ServiceAccount
name: otelcontribcol
  namespace: signoz-demo

What this does: Binds the permissions (ClusterRole) to our service account.
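Once the ClusterRole and ClusterRoleBinding have been applied (we'll do that in a later step), you can verify the RBAC wiring by impersonating the service account:

```bash
# Should print "yes" if the collector's service account can read pods.
kubectl auth can-i list pods \
  --as=system:serviceaccount:signoz-demo:otelcontribcol
```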
Create a file called deployment.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
name: otelcontribcol
labels:
app: otelcontribcol
spec:
replicas: 1
selector:
matchLabels:
app: otelcontribcol
template:
metadata:
labels:
app: otelcontribcol
spec:
serviceAccountName: otelcontribcol
containers:
- name: otelcontribcol
image: otel/opentelemetry-collector-contrib
args: ["--config", "/etc/config/config.yaml"]
imagePullPolicy: IfNotPresent
volumeMounts:
- name: config
mountPath: /etc/config
volumes:
- name: config
configMap:
          name: otelcontribcol

What this does: Deploys the OpenTelemetry Collector as a pod in your cluster.
Apply all the YAML files to your cluster:
kubectl apply -f configmap.yml -n signoz-demo
kubectl apply -f serviceaccount.yml -n signoz-demo
kubectl apply -f clusterrole.yml -n signoz-demo
kubectl apply -f clusterrolebinding.yml -n signoz-demo
kubectl apply -f deployment.yaml -n signoz-demo

Check that the collector is running:

kubectl get pods -n signoz-demo

You should see something like:
NAME READY STATUS RESTARTS AGE
otelcontribcol-xxxxxxxxx-xxxxx 1/1 Running 0 2m
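It's also worth glancing at the collector's own logs at this point to confirm the pipelines started and the exporter isn't rejecting data (the exact wording of the log lines varies by collector version):

```bash
kubectl logs -n signoz-demo deployment/otelcontribcol --tail=50
```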
Let's create a pod that will intentionally get stuck in Pending state for testing:
Create a file called test-pending-pod.yaml:
apiVersion: v1
kind: Pod
metadata:
name: test-pending-pod
namespace: signoz-demo
labels:
app: test-pending-pod
spec:
containers:
- name: test-container
image: nginx:latest
resources:
requests:
cpu: "6000" # Requesting more CPU than available
memory: "2Gi"
limits:
cpu: "6000"
memory: "2Gi"
nodeSelector:
    non-existent-node: "true" # This will keep the pod Pending

Apply it:

kubectl apply -f test-pending-pod.yaml -n signoz-demo

Verify it's stuck in Pending:

kubectl get pods -n signoz-demo

You should see:
NAME READY STATUS RESTARTS AGE
otelcontribcol-xxxxxxxxx-xxxxx 1/1 Running 0 5m
test-pending-pod 0/1 Pending 0 1m
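To double-check what the collector will see for this pod, both the phase field and the scheduler's events are available from kubectl:

```bash
# Prints "Pending" - the same state the collector reports as k8s.pod.phase = 1.
kubectl get pod test-pending-pod -n signoz-demo -o jsonpath='{.status.phase}{"\n"}'

# The Events section explains why scheduling failed (the unsatisfiable
# nodeSelector and oversized CPU request in this test case).
kubectl describe pod test-pending-pod -n signoz-demo | grep -A 10 'Events:'
```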
Now let's create a dashboard in SigNoz to visualize our pod metrics.
- Go to your SigNoz Cloud dashboard
- Navigate to Dashboards in the left sidebar
- Click "+ New Dashboard"
We've prepared a minimal dashboard configuration that focuses on pod phase monitoring. Here's the configuration:
{
"collapsableRowsMigrated": true,
"description": "Kubernetes Pod Metrics Dashboard for signoz-demo namespace - Monitoring pods and triggering alerts when stuck in pending state for over 5 minutes",
"dotMigrated": true,
"layout": [
{
"h": 12,
"i": "pod-phase-panel",
"moved": false,
"static": false,
"w": 12,
"x": 0,
"y": 0
}
],
"name": "",
"panelMap": {},
"tags": [
"pod",
"k8s",
"signoz-demo",
"pending-alert"
],
"title": "Kubernetes Pod Metrics - signoz-demo Namespace",
"uploadedGrafana": false,
"variables": {},
"version": "v4",
"widgets": [
{
"description": "Pod Phase Status - Critical for monitoring pending pods that need alerts. Shows pod lifecycle states: 1=Pending, 2=Running, 3=Succeeded, 4=Failed, 5=Unknown",
"id": "pod-phase-panel",
"isStacked": false,
"nullZeroValues": "zero",
"opacity": "1",
"panelTypes": "graph",
"query": {
"builder": {
"queryData": [
{
"aggregateAttribute": {
"dataType": "float64",
"id": "k8s.pod.phase--float64----true",
"isColumn": true,
"key": "k8s.pod.phase",
"type": ""
},
"aggregateOperator": "avg",
"dataSource": "metrics",
"disabled": false,
"expression": "A",
"filters": {
"items": [
{
"id": "namespace-filter",
"key": {
"dataType": "string",
"id": "k8s.namespace.name--string--tag--false",
"isColumn": false,
"key": "k8s.namespace.name",
"type": "tag"
},
"op": "=",
"value": "signoz-demo"
}
],
"op": "AND"
},
"groupBy": [
{
"dataType": "string",
"id": "k8s.pod.name--string--tag--false",
"isColumn": false,
"key": "k8s.pod.name",
"type": "tag"
}
],
"having": [],
"legend": "{{k8s.pod.name}}",
"limit": null,
"orderBy": [],
"queryName": "A",
"reduceTo": "sum",
"spaceAggregation": "sum",
"stepInterval": 60,
"timeAggregation": "avg"
}
],
"queryFormulas": []
},
"queryType": "builder"
},
"timePreferance": "GLOBAL_TIME",
"title": "Pod Phase Status (1=Pending, 2=Running, 3=Succeeded, 4=Failed, 5=Unknown)",
"yAxisUnit": "none"
}
]
}

1. Import the Configuration:
   - Copy the JSON configuration above
   - In SigNoz, click "Import Dashboard"
   - Paste the JSON configuration
   - Click "Import"
2. Verify the Dashboard:
   - You should see a single panel showing pod phase status
   - The `test-pending-pod` should appear as a line at value `1` (Pending)
   - Other pods should appear at value `2` (Running)
What you're seeing:
- Y-axis: Pod phase values (1=Pending, 2=Running, 3=Succeeded, 4=Failed, 5=Unknown)
- X-axis: Time
- Lines: Each line represents a pod, showing its phase over time
- Pending pods: Appear as horizontal lines at value `1`
- Running pods: Appear as horizontal lines at value `2`
Now let's create an alert that will notify you when a pod is stuck in Pending state for more than 5 minutes.
Here's a focused workflow diagram showing how the alert system works:
flowchart TD
START([Pod Created]) --> CHECK{Check Pod Status}
CHECK -->|Running| RUNNING[Pod Running<br/>Value: 2]
CHECK -->|Pending| PENDING[Pod Pending<br/>Value: 1]
CHECK -->|Failed| FAILED[Pod Failed<br/>Value: 4]
PENDING --> TIMER[Start 5-minute Timer]
TIMER --> EVAL{Evaluate After 5min}
EVAL -->|Still Pending| TRIGGER[Alert Triggered]
EVAL -->|Now Running| CLEAR[Alert Cleared]
TRIGGER --> NOTIFY[Send Slack Notification]
NOTIFY --> TEAM[Team Gets Alerted]
TEAM --> INVESTIGATE[Investigate Issue]
INVESTIGATE --> FIX[Fix Pod Issue]
FIX --> RESOLVED[Pod Resolved]
RESOLVED --> CLEAR
RUNNING --> MONITOR[Continue Monitoring]
FAILED --> MONITOR
CLEAR --> MONITOR
MONITOR --> CHECK
%% Styling
classDef startEnd fill:#0066cc,stroke:#000,stroke-width:2px,color:#fff
classDef process fill:#ff6600,stroke:#000,stroke-width:2px,color:#fff
classDef decision fill:#ffcc00,stroke:#000,stroke-width:2px,color:#000
classDef alert fill:#cc0000,stroke:#000,stroke-width:2px,color:#fff
classDef success fill:#00cc66,stroke:#000,stroke-width:2px,color:#fff
class START,RESOLVED startEnd
class PENDING,RUNNING,FAILED,TIMER,NOTIFY,INVESTIGATE,FIX,MONITOR process
class CHECK,EVAL decision
class TRIGGER,TEAM alert
class CLEAR success
1. Pod Lifecycle Monitoring:
- Every pod goes through different states (Pending, Running, Failed, etc.)
- OpenTelemetry Collector continuously monitors these states
- Each state has a numeric value (1=Pending, 2=Running, 4=Failed)
2. Pending State Detection:
- When a pod enters Pending state (value = 1), monitoring begins
- A 5-minute timer starts counting down
- System continues to check pod status every minute
3. Alert Evaluation:
- After 5 minutes, system evaluates if pod is still pending
- If still pending → alert triggers
- If now running → alert is cleared (no notification needed)
4. Notification Process:
- Alert triggers Slack notification with pod details
- Team receives immediate notification
- Includes pod name, namespace, and duration information
5. Resolution Process:
- Team investigates the pending pod issue
- Fixes the underlying problem (resource constraints, node issues, etc.)
- Pod transitions to Running state
- Alert automatically clears
6. Continuous Monitoring:
- System continues monitoring all pods
- Process repeats for any new pending pods
- Provides ongoing protection against stuck pods
- Go to Alerts in the left sidebar
- Click "+ New Alert"
- Alert Type: Select "Threshold Alert"
- Metric: Choose `k8s.pod.phase`
- Filter: Add `k8s.namespace.name = 'signoz-demo'`
- Aggregation:
  - Within time series: `Avg` every `60 Seconds`
  - Across time series: `Sum` by `k8s.pod.name`
- Condition: "A is equal to the threshold"
- Threshold: `1` (since Pending = 1)
- Duration: "at least once during the last 5 mins"
- Frequency: Run alert every `1 min`
- Alert Name: "Pod Stuck in Pending - signoz-demo Namespace"
- Description: "Alert when a pod in signoz-demo namespace is stuck in pending state for over 5 minutes"
- Severity: "Critical"
- Labels: Add `k8s`, `pod`, `pending`, `signoz-demo`
1. Create Slack Channel:
   - Click "select one or more channels"
   - Click "+ New Channel"
   - Type: Slack
   - Webhook URL: `https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK`
   - Channel Name: `kubernetes-alerts`
2. Test Notification:
   - Click "Test Notification" to verify the Slack integration
3. Review Configuration:
   - Ensure the threshold is `1` and the condition is `equal to`
   - Verify all settings are correct
4. Create Rule:
   - Click "Create Rule"
Here's the complete alert configuration for reference:
{
"name": "Pod Stuck in Pending - signoz-demo Namespace",
"description": "Alert when a pod in signoz-demo namespace is stuck in pending state for over 5 minutes",
"condition": {
"query": {
"builder": {
"queryData": [
{
"aggregateAttribute": {
"dataType": "float64",
"id": "k8s.pod.phase--float64----true",
"isColumn": true,
"key": "k8s.pod.phase",
"type": ""
},
"aggregateOperator": "avg",
"dataSource": "metrics",
"disabled": false,
"expression": "A",
"filters": {
"items": [
{
"id": "namespace-filter",
"key": {
"dataType": "string",
"id": "k8s.namespace.name--string--tag--false",
"isColumn": false,
"key": "k8s.namespace.name",
"type": "tag"
},
"op": "=",
"value": "signoz-demo"
},
{
"id": "phase-filter",
"key": {
"dataType": "string",
"id": "k8s.pod.phase--string--tag--false",
"isColumn": false,
"key": "k8s.pod.phase",
"type": "tag"
},
"op": "=",
"value": "1"
}
],
"op": "AND"
},
"groupBy": [
{
"dataType": "string",
"id": "k8s.pod.name--string--tag--false",
"isColumn": false,
"key": "k8s.pod.name",
"type": "tag"
}
],
"having": [],
"legend": "{{k8s.pod.name}}",
"limit": null,
"orderBy": [],
"queryName": "A",
"reduceTo": "sum",
"spaceAggregation": "sum",
"stepInterval": 60,
"timeAggregation": "avg"
}
],
"queryFormulas": []
},
"queryType": "builder"
},
"threshold": {
"value": 1,
"operator": "="
},
"evaluationWindow": "5m",
"frequency": "1m"
},
"notification": {
"channels": [
{
"type": "slack",
"webhook_url": "YOUR_SLACK_WEBHOOK_URL_HERE",
"message": "π¨ **Pod Stuck in Pending Alert**\n\n**Namespace:** signoz-demo\n**Pod:** {{.k8s.pod.name}}\n**Duration:** Over 5 minutes\n**Status:** Pending\n**Time:** {{.timestamp}}\n\nPlease check the pod status and resolve the scheduling issue.",
"title": "Pod Stuck in Pending - signoz-demo"
}
]
},
"severity": "critical",
"tags": [
"k8s",
"pod",
"pending",
"signoz-demo",
"alert"
]
}

- Go to Alerts in SigNoz
- Check that your alert shows as "Active"
- Verify the configuration is correct
Since your test-pending-pod has been in Pending state, the alert should trigger within 5 minutes of creation.
You should receive a Slack message like:
🚨 **Pod Stuck in Pending Alert**
**Namespace:** signoz-demo
**Pod:** test-pending-pod
**Duration:** Over 5 minutes
**Status:** Pending
**Time:** 2024-09-06 23:15:00
Please check the pod status and resolve the scheduling issue.
- Go back to your dashboard
- You should see the `test-pending-pod` line consistently at value `1`
- The dashboard should update every 60 seconds
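When you're done testing, you can resolve the alert by deleting the test pod; the alert should clear on a subsequent evaluation:

```bash
kubectl delete pod test-pending-pod -n signoz-demo
```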
Symptoms: Dashboard shows "No Data"
Solutions:
1. Check if the OpenTelemetry Collector is running:
   kubectl get pods -n signoz-demo
2. Check collector logs:
   kubectl logs -n signoz-demo deployment/otelcontribcol
3. Verify the ingestion key is correct in the ConfigMap
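If the collector is running but nothing reaches SigNoz, filtering its logs for export failures is a quick way to spot authentication or connectivity problems (the exact messages depend on the collector version):

```bash
kubectl logs -n signoz-demo deployment/otelcontribcol | grep -iE 'error|failed|denied'
```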
Symptoms: Alert is active but not firing
Solutions:
- Verify the threshold is set to `1` (not `0`)
- Check that the condition is `equal to` (not `above`)
- Ensure the test pod is actually in Pending state:
kubectl get pods -n signoz-demo
Symptoms: Alert fires but no Slack message
Solutions:
1. Verify the Slack webhook URL is correct
2. Test the webhook manually:
   curl -X POST -H 'Content-type: application/json' \
     --data '{"text":"Test message"}' \
     YOUR_SLACK_WEBHOOK_URL
3. Check that the notification channel is properly configured
Symptoms: Collector pod shows CrashLoopBackOff
Solutions:
- Check the ConfigMap for syntax errors
- Verify the ingestion key format
- Check collector logs for specific error messages
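Two commands that usually surface the root cause of a CrashLoopBackOff are the previous container's logs and the pod's events:

```bash
# Logs from the crashed (previous) container - config parse errors show up here.
kubectl logs -n signoz-demo deployment/otelcontribcol --previous

# Recent events and restart reasons for the collector pod.
kubectl describe pod -n signoz-demo -l app=otelcontribcol
```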
Congratulations! You've successfully set up a comprehensive Kubernetes pod monitoring solution using SigNoz. Here's what you've accomplished:
- OpenTelemetry Collector - Collecting metrics from your Kubernetes cluster
- Monitoring Dashboard - Visualizing pod states in real-time
- Alert System - Detecting pods stuck in Pending state
- Slack Integration - Getting notified when issues occur
- Proactive Monitoring - Detect issues before they affect users
- Real-time Visibility - See pod states as they change
- Automated Alerting - Get notified immediately when problems occur
- Easy Troubleshooting - Clear visualization of what's happening
Now that you have the basics working, you can extend this solution:
- Add More Metrics - Monitor CPU, memory, and network usage
- Create More Alerts - Set up alerts for failed pods, high resource usage, etc.
- Monitor Multiple Namespaces - Extend monitoring to other parts of your cluster
- Add More Notification Channels - Email, PagerDuty, etc.
- Regular Testing - Periodically test your alerts to ensure they're working
- Monitor the Monitor - Set up alerts for the OpenTelemetry Collector itself
- Documentation - Keep your team informed about the monitoring setup
- Review and Tune - Regularly review alert thresholds and adjust as needed
Remember: Monitoring is not a one-time setup. It's an ongoing process that requires attention and maintenance. But with the foundation you've built today, you're well-equipped to keep your Kubernetes clusters healthy and your applications running smoothly.
Happy monitoring!
Here are all the files you'll need for this setup:
- `configmap.yml` - OpenTelemetry Collector configuration
- `serviceaccount.yml` - Service account for the collector
- `clusterrole.yml` - Permissions for the collector
- `clusterrolebinding.yml` - Binding permissions to the service account
- `deployment.yaml` - OpenTelemetry Collector deployment
- `test-pending-pod.yaml` - Test pod for validation
- `kubernetes-pod-metrics-signoz-demo-minimal.json` - Dashboard configuration
- `signoz-demo-pending-pod-alert-simple.json` - Alert configuration
# Create namespace
kubectl create namespace signoz-demo
# Apply all manifests
kubectl apply -f configmap.yml -n signoz-demo
kubectl apply -f serviceaccount.yml -n signoz-demo
kubectl apply -f clusterrole.yml -n signoz-demo
kubectl apply -f clusterrolebinding.yml -n signoz-demo
kubectl apply -f deployment.yaml -n signoz-demo
kubectl apply -f test-pending-pod.yaml -n signoz-demo
# Verify setup
kubectl get pods -n signoz-demo
kubectl logs -n signoz-demo deployment/otelcontribcol

This completes your Kubernetes pod monitoring setup with SigNoz!