Goldpinger makes calls between its instances to monitor your networking.
It runs as a DaemonSet on Kubernetes and produces Prometheus metrics that can be scraped, visualised and alerted on.
Oh, and it gives you the graph below for your cluster. Check out the video explainer.
We built Goldpinger to troubleshoot, visualise and alert on our networking layer while adopting Kubernetes at Bloomberg. It has since become the go-to tool to see connectivity and slowness issues.
It's small (~16MB), simple, and you'll wonder why you didn't have it before.
If you'd like to know more, you can watch our presentation at KubeCon 2018 Seattle.
- Simplified logging configuration: Use the LOG_LEVEL environment variable (debug/info/warn/error) instead of complex JSON config files
- Structured logging: All logs use structured zap logging with proper fields for easier parsing
- Enhanced probe debugging: Debug level shows detailed timing, connection info, error categorization for DNS/TCP/HTTP probes
- Fixed external probe blocking: External probes (DNS/TCP/HTTP) now run in parallel and use background caching
  - Dead TCP targets no longer cause pod-to-pod health checks to time out
  - Eliminated "context deadline exceeded" cascade failures
  - Pods remain responsive even with many slow/dead external targets
- Fixed cluster health calculation: External probe failures no longer mark the entire cluster as unhealthy
  - Cluster health now only reflects pod-to-pod connectivity
  - External probe results are displayed independently in the UI
- Parallel probe execution: All external probes run concurrently (10 targets = ~500ms instead of 5 seconds)
- Background probe updates: Results cached and refreshed every refresh-interval (default 30s)
- Non-blocking API calls: Health check endpoints return immediately without waiting for probe execution
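The cluster health rule from the list above can be illustrated in Go. This is a minimal sketch under stated assumptions, not Goldpinger's actual implementation: the `CheckResult` type and the `clusterHealthy` function are hypothetical names introduced only for this example.

```go
package main

import "fmt"

// CheckResult is a hypothetical record of one probe outcome.
type CheckResult struct {
	Target string
	OK     bool
}

// clusterHealthy reports whether every pod-to-pod check succeeded.
// External probe results are accepted but deliberately ignored,
// mirroring the rule that they do not affect cluster health.
func clusterHealthy(podChecks, externalChecks []CheckResult) bool {
	_ = externalChecks // reported separately, not part of cluster health
	for _, c := range podChecks {
		if !c.OK {
			return false
		}
	}
	return true
}

func main() {
	pods := []CheckResult{{"10.0.0.5", true}, {"10.0.0.6", true}}
	external := []CheckResult{{"10.0.0.1:443", false}}
	// The dead external target does not change the verdict.
	fmt.Println(clusterHealthy(pods, external)) // prints "true"
}
```

Keeping the external results as a separate input makes it explicit that they are collected but never folded into the health verdict.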
Getting from sources:
go get github.com/bloomberg/goldpinger/cmd/goldpinger
goldpinger --help
Getting from docker hub:
# get from docker hub
docker pull bloomberg/goldpinger:v3.0.0
The repo comes with two ways of building a docker image: compiling locally, and compiling using a multi-stage Dockerfile image. Depending on your docker setup, you might need to prepend the commands below with sudo.
You will need docker version 17.05+ installed to support multi-stage builds.
# Build a local container without publishing
make build
# Build & push the image somewhere
namespace="docker.io/myhandle/" make build-release
This was contributed by @michiel. Kudos!
In order to build Goldpinger, you are going to need go version 1.15+ and docker.
Building from source code consists of compiling the binary and building a Docker image:
# step 0: check out the code
git clone https://github.com/bloomberg/goldpinger.git
cd goldpinger
# step 1: compile the binary for the desired architecture
make bin/goldpinger
# at this stage you should be able to run the binary
./bin/goldpinger --help
# step 2: build the docker image containing the binary
namespace="docker.io/myhandle/" make build
# step 3: push the image somewhere
docker push $(namespace="docker.io/myhandle/" make version)
Goldpinger works by asking Kubernetes for pods with particular labels (app=goldpinger). While you can deploy Goldpinger in a variety of ways, it works very nicely as a DaemonSet out of the box.
Goldpinger can be installed via Helm using the following:
helm repo add goldpinger https://bloomberg.github.io/goldpinger
helm repo update
helm install goldpinger goldpinger/goldpinger
The Helm chart supports several configuration options. Here are some common examples:
Set log level:
helm install goldpinger goldpinger/goldpinger --set goldpinger.logLevel=debug
Configure external probes:
helm install goldpinger goldpinger/goldpinger \
  --set 'extraEnv[0].name=TCP_TARGETS' \
  --set 'extraEnv[0].value=10.0.0.1:443 10.0.0.2:8080' \
  --set 'extraEnv[1].name=HTTP_TARGETS' \
  --set 'extraEnv[1].value=https://www.example.com' \
  --set 'extraEnv[2].name=HOSTS_TO_RESOLVE' \
  --set 'extraEnv[2].value=www.example.com api.example.com'
Using a values file:
# values.yaml
goldpinger:
  logLevel: debug
extraEnv:
  - name: TCP_TARGETS
    value: "10.0.0.1:443 10.0.0.2:8080"
  - name: HTTP_TARGETS
    value: "https://www.example.com http://api.example.com"
  - name: HOSTS_TO_RESOLVE
    value: "www.example.com api.example.com"
Then install:
helm install goldpinger goldpinger/goldpinger -f values.yaml
Goldpinger can be installed manually via configuration similar to the following:
Goldpinger supports using a kubeconfig (specify with --kubeconfig-path) or service accounts.
Here's an example of what you can do (using the in-cluster authentication to Kubernetes apiserver).
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: goldpinger-serviceaccount
  namespace: default
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: goldpinger
  namespace: default
  labels:
    app: goldpinger
spec:
  updateStrategy:
    type: RollingUpdate
  selector:
    matchLabels:
      app: goldpinger
  template:
    metadata:
      annotations:
        prometheus.io/scrape: 'true'
        prometheus.io/port: '8080'
      labels:
        app: goldpinger
    spec:
      serviceAccount: goldpinger-serviceaccount
      tolerations:
        - key: node-role.kubernetes.io/master
          effect: NoSchedule
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        fsGroup: 2000
      containers:
        - name: goldpinger
          env:
            - name: HOST
              value: "0.0.0.0"
            - name: PORT
              value: "8080"
            # Log level: debug, info, warn, error (default: info)
            - name: LOG_LEVEL
              value: "info"
            # injecting real hostname will make for easier to understand graphs/metrics
            - name: HOSTNAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            # podIP is used to select a randomized subset of nodes to ping.
            - name: POD_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
          image: "docker.io/bloomberg/goldpinger:v3.0.0"
          imagePullPolicy: Always
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
          resources:
            limits:
              memory: 80Mi
            requests:
              cpu: 1m
              memory: 40Mi
          ports:
            - containerPort: 8080
              name: http
          readinessProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 20
            periodSeconds: 5
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 20
            periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: goldpinger
  namespace: default
  labels:
    app: goldpinger
spec:
  type: NodePort
  ports:
    - port: 8080
      nodePort: 30080
      name: http
  selector:
    app: goldpinger
Note that you will also need to add an RBAC rule to allow Goldpinger to list other pods. If you're just playing around, you can consider a view-all default rule:
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: view
subjects:
  - kind: ServiceAccount
    name: goldpinger-serviceaccount
    namespace: default
You can also see an example of using kubeconfig in the ./extras.
If your cluster is IPv4/IPv6 dual-stack and you want to force IPv6, you can set the IP_VERSIONS environment variable to "6" (default is "4"), which will use the IPv6 address on the pod and host.
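To make the effect of IP_VERSIONS more concrete, here is a small Go sketch of picking an address of the requested family from a dual-stack pod's IPs. It is an illustration only: `pickIP` and its inputs are hypothetical and not taken from Goldpinger's source, and the sample addresses are made up.

```go
package main

import (
	"fmt"
	"net"
)

// pickIP returns the first address in ips that matches the requested
// IP version ("4" or "6"). Hypothetical helper for illustration.
func pickIP(ips []string, version string) (string, bool) {
	for _, s := range ips {
		ip := net.ParseIP(s)
		if ip == nil {
			continue // skip anything that is not a valid IP
		}
		isV4 := ip.To4() != nil
		if (version == "4" && isV4) || (version == "6" && !isV4) {
			return s, true
		}
	}
	return "", false
}

func main() {
	// A dual-stack pod would typically report one address per family.
	podIPs := []string{"10.0.0.5", "fd00::5"}
	ip, ok := pickIP(podIPs, "6")
	fmt.Println(ip, ok) // prints "fd00::5 true"
}
```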
Goldpinger uses structured logging via zap. The log level can be easily configured using an environment variable:
- name: LOG_LEVEL
  value: "info" # Options: debug, info, warn, error
Available log levels:
- debug - Detailed diagnostic information (includes probe execution details, timing, connection info)
- info - General informational messages (default)
- warn - Warning messages for non-critical issues
- error - Error messages for serious problems
Enhanced probe logging: When using debug level, you'll see detailed information about:
- DNS resolution with resolved IPs and duration
- TCP connection details including local/remote addresses and error categorization
- HTTP request/response details including headers, status codes, and timing
- Error categorization (timeout, temporary, network, etc.)
From --help:
--log-level= Log level (debug, info, warn, error) (default: info) [$LOG_LEVEL]
Example with debug logging:
- name: LOG_LEVEL
  value: "debug"
Note that on top of resolving the other pods, all instances can also try to resolve arbitrary DNS names. This allows you to test your DNS setup.
From --help:
--host-to-resolve= A host to attempt dns resolve on (space delimited) [$HOSTS_TO_RESOLVE]
So in order to test two domains, we could add an extra env var to the example above:
- name: HOSTS_TO_RESOLVE
  value: "www.bloomberg.com one.two.three"
and goldpinger should show something like this:
Instances can also be configured to do simple TCP or HTTP checks on external targets. This is useful for visualizing more nuanced connectivity flows.
--tcp-targets= A list of external targets(<host>:<port> or <ip>:<port>) to attempt a TCP check on (space delimited) [$TCP_TARGETS]
--http-targets= A list of external targets(<http or https>://<url>) to attempt an HTTP{S} check on. A 200 HTTP code is considered successful. (space delimited) [$HTTP_TARGETS]
--tcp-targets-timeout= The timeout for a tcp check on the provided tcp-targets (default: 500ms) [$TCP_TARGETS_TIMEOUT]
--dns-targets-timeout= The timeout for a dns check on the provided dns-targets (default: 500ms) [$DNS_TARGETS_TIMEOUT]
--http-targets-timeout= The timeout for a http check on the provided http-targets (default: 500ms) [$HTTP_TARGETS_TIMEOUT]
- name: HTTP_TARGETS
  value: http://bloomberg.com
- name: TCP_TARGETS
  value: 10.34.5.141:5000 10.34.195.193:6442
The timeouts for the TCP, DNS and HTTP checks can be configured via TCP_TARGETS_TIMEOUT, DNS_TARGETS_TIMEOUT and HTTP_TARGETS_TIMEOUT respectively.
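Since the target lists are space delimited, a typo in one entry is easy to miss. The Go sketch below shows one way to validate a TCP_TARGETS value before deploying it. The `parseTCPTargets` function is a hypothetical helper written for this example and is not part of Goldpinger. It relies only on the standard library's `strings.Fields` and `net.SplitHostPort`.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// parseTCPTargets splits a space-delimited list such as the value of
// TCP_TARGETS and checks that each entry has the host:port shape.
func parseTCPTargets(raw string) ([]string, error) {
	var targets []string
	for _, field := range strings.Fields(raw) {
		host, port, err := net.SplitHostPort(field)
		if err != nil {
			return nil, fmt.Errorf("invalid target %q: %w", field, err)
		}
		if host == "" || port == "" {
			return nil, fmt.Errorf("invalid target %q: empty host or port", field)
		}
		targets = append(targets, field)
	}
	return targets, nil
}

func main() {
	targets, err := parseTCPTargets("10.34.5.141:5000 10.34.195.193:6442")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, t := range targets {
		fmt.Println(t)
	}
}
```

The sketch only checks the syntax of each entry. Whether a target is reachable is still for the probe itself to decide.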
Important improvements:
- Parallel execution: All external probes run concurrently to minimize total check time
- Background caching: Probe results are cached and updated in the background every refresh-interval
- Non-blocking: External probe checks no longer block pod-to-pod health checks
- Independent health: External probe failures don't affect cluster health status - only pod-to-pod connectivity matters
This means that even if you have many dead TCP targets, your goldpinger pods will remain responsive and correctly report the health of your cluster. External probe results are displayed independently in the UI.
Once you have it running, you can hit any of the nodes (port 30080 in the example above) and see the UI.
You can click on various nodes to gray out the clutter and see more information.
The API is exposed via a well-defined Swagger spec.
The spec is used to generate both the server and the client of Goldpinger. If you make changes, you can re-generate them using go-swagger via make swagger.
Once running, Goldpinger exposes Prometheus metrics at /metrics. All the metrics are prefixed with goldpinger_ for easy identification.
You can see the metrics by doing a curl http://$POD_IP:8080/metrics (port 8080 in the example above).
These are probably the droids you are looking for:
goldpinger_peers_response_time_s_*
goldpinger_nodes_health_total
goldpinger_stats_total
goldpinger_errors_total
You can find an example of a Grafana dashboard that shows what's going on in your cluster in extras. This should get you started, and once you're on a roll, why not ❤️ contribute some kickass dashboards for others to use?
Once you've gotten your metrics into Prometheus, you have all you need to set useful alerts.
To get you started, here's a rule that will trigger an alert if there are any nodes reported as unhealthy by any instance of Goldpinger.
alert: goldpinger_nodes_unhealthy
expr: sum(goldpinger_nodes_health_total{status="unhealthy"})
  BY (instance, goldpinger_instance) > 0
for: 5m
annotations:
  description: |
    Goldpinger instance {{ $labels.goldpinger_instance }} has been reporting unhealthy nodes for at least 5 minutes.
  summary: Instance {{ $labels.instance }} down
Similarly, why not ❤️ contribute some amazing alerts for others to use?
Goldpinger also makes for a pretty good monitoring tool when practicing Chaos Engineering. Check out PowerfulSeal if you'd like to do some Chaos Engineering for Kubernetes.
Goldpinger was created by Mikolaj Pawlikowski and ported to Go by Chris Green.
We ❤️ contributions.
Have you had a good experience with Goldpinger ? Why not share some love and contribute code, dashboards and alerts ?
If you're thinking of making some code changes, please be aware that most of the code is auto-generated from the Swagger spec. The spec is used to generate both the server and the client of Goldpinger. If you make changes, you can re-generate them using go-swagger via make swagger.
Before you create that PR, please make sure you read CONTRIBUTING and DCO.
Please read the LICENSE file here.
For each version built by travis, there is also an additional version, appended with -vendor, which contains all source code of the dependencies used in goldpinger.