DEVOPS BOOTCAMP
Monitoring with Prometheus
Introduction to Prometheus
Prometheus is an open-source monitoring system and alerting toolkit
Prometheus is widely used and has an active community
It gathers, organizes, and stores metrics as time series data by "scraping" the HTTP metrics endpoints of its targets
It can trigger alerts when specified conditions are observed
Why we need a monitoring tool - 1
Visibility in different environments
You need visibility in all kinds of environments, but especially in highly dynamic container environments, which are more challenging to monitor
Container environments
Bare servers
For that you need tools like Prometheus, which are designed for monitoring these types of environments
Visibility on different levels
When you have 100s or 1000s of containers, plus components on multiple levels (infrastructure, platform, application), you need a way to have visibility and consistent monitoring across all these components
Without visibility, it's a black box for you. When things break inside your complex environment, you have no idea what is happening. You don't know what has caused the issue or what is not working.
Why we need a monitoring tool - 2
Example Use Case
Problem:
Many application errors appear in the frontend to the end user
They only see the error message, but the cause can be any of the many components in the backend
Backend running?
Any exceptions?
Auth-Service running?
Why did Auth-Service crash?
Solution:
Monitoring can help identify the problem quickly with little effort
Instead of manually troubleshooting across multiple components, it helps you pinpoint the root cause directly
Saves you a lot of time and effort
Company saves a lot of money
Why we need a monitoring tool - 3
With Prometheus everything is automated. It constantly monitors and looks out for any
issues in real time and may even identify a potential issue before it happens, so you can prevent it.
Constantly monitors all the services
Triggers alerts when a service crashes
Helps to identify problems before they happen
Prometheus Architecture - 1
How it all works
Prometheus Server
Is the main component
Does the actual monitoring work
Scrapes and stores time series data
Prometheus Architecture - 2
How it all works: Targets & Metrics
Prometheus pulls metrics from targets
Targets: What does Prometheus monitor?
Metrics: Which units of those targets are monitored?
Prometheus Architecture - 3
How it all works: Metrics
Metrics play an important role in understanding why your application is working in a certain way
Metric Entries
Format: Human-readable, text-based
Metric entries consist of: TYPE and HELP attributes
Metric types:
Counter: how many times x happened
Gauge: what is the current value of x now?
Histogram: how long or how big?
How Prometheus collects Metrics Data from Targets
Prometheus pulls from HTTP endpoints
Targets must expose: [hostaddress]/metrics
Must be in the correct format that Prometheus understands
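As a rough illustration of the text-based format, the sketch below renders a single metric entry with its HELP and TYPE lines. The helper `render_metric`, the metric name `http_requests_total`, and the sample value are made up for this example and are not taken from the slides.

```python
def render_metric(name: str, metric_type: str, help_text: str, value: float) -> str:
    """Render one metric entry in the Prometheus text exposition format."""
    lines = [
        f"# HELP {name} {help_text}",   # human-readable description
        f"# TYPE {name} {metric_type}",  # counter, gauge, histogram or summary
        f"{name} {value}",               # the sample itself
    ]
    return "\n".join(lines) + "\n"


# Hypothetical metric, used only for illustration.
sample = render_metric(
    "http_requests_total", "counter", "Total HTTP requests handled.", 1027
)
print(sample)
```

A page made of entries like this is what a target would serve at [hostaddress]/metrics.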
Prometheus Architecture - 4
How it all works: Exporters
Some services expose /metrics endpoints by default
Others need another component for that: Exporters
Exporters
Exporters help in exporting existing metrics from third-party systems as Prometheus metrics
An exporter is a service that fetches metrics from the target, converts the data, and exposes it as Prometheus metrics
Prometheus can then scrape this endpoint as usual
Official vs Third-Party
Some are maintained as part of the official Prometheus organization, others are externally contributed and maintained.
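The conversion step an exporter performs can be sketched as follows. This is a simplified illustration, not the code of any real exporter: the function `to_prometheus_text`, the `myservice` prefix, and the native stats are all hypothetical, and every value is exposed as a gauge for brevity.

```python
def to_prometheus_text(stats: dict[str, float], prefix: str = "myservice") -> str:
    """Convert a flat dict of native stats into Prometheus gauge entries."""
    lines = []
    for key, value in sorted(stats.items()):
        name = f"{prefix}_{key}"
        lines.append(f"# HELP {name} Value of '{key}' reported by the target.")
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical native stats, as a third-party system might report them.
native_stats = {"connected_clients": 12, "used_memory_bytes": 1048576}
metrics_page = to_prometheus_text(native_stats)
print(metrics_page)
```

A real exporter would additionally fetch the stats from the target and serve the resulting page over HTTP at /metrics.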
Prometheus Architecture - 5
How it all works: Exporters
Example: Monitor a Linux Server
1. Download a node exporter
2. Untar and execute
3. Converts metrics of the server
4. Exposes /metrics endpoint
5. Configure Prometheus to scrape this endpoint
Exporters are available as Docker Images
Example: Monitor own applications
Client libraries let you define and expose internal metrics via an HTTP endpoint on your application's instance
Metrics like: How many requests? How many exceptions? Server resources used?
Choose a Prometheus client library that matches the language in which your application is written
Prometheus Architecture - 6
How it all works: Push vs Pull
Important difference between Prometheus and other monitoring systems like Amazon CloudWatch or New Relic
Others - Push Model
Services push to a centralized collection platform
High load of network traffic
Monitoring can become your bottleneck
Installation of additional software to push metrics
Prometheus - Pull Model
Prometheus pulls metrics from endpoints
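The pull model can be sketched as a small scrape function: the monitoring side asks the target for its metrics page and parses the samples. This is a simplified illustration under stated assumptions: the parser ignores timestamps, the URL and port are placeholders, and `fake_target` stands in for a real endpoint so the example runs without a network.

```python
from typing import Callable
from urllib.request import urlopen


def fetch_over_http(url: str) -> str:
    """Fetch a metrics page from a target; this is the 'pull' step."""
    with urlopen(url, timeout=5) as response:
        return response.read().decode("utf-8")


def parse_samples(text: str) -> dict[str, float]:
    """Parse samples, skipping blank lines and # HELP / # TYPE comments."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, value = line.rsplit(" ", 1)
        samples[name] = float(value)
    return samples


def scrape(url: str, fetch: Callable[[str], str] = fetch_over_http) -> dict[str, float]:
    """Pull one target's metrics page and return its samples."""
    return parse_samples(fetch(url))


def fake_target(url: str) -> str:
    """Stand-in for a real /metrics endpoint (hypothetical content)."""
    return "# TYPE up gauge\nup 1\nhttp_requests_total 1027\n"


print(scrape("http://localhost:9100/metrics", fetch=fake_target))
```

Because the collector initiates every request, the targets need no extra software for pushing; they only have to expose the endpoint.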
Prometheus Architecture - 7
How it all works: Pushgateway
Pushgateway
An intermediary service that allows you to push metrics from jobs that cannot be scraped
Prometheus recommends using the Pushgateway only in certain limited cases: usually the only valid use case is capturing the outcome of a service-level batch job
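A batch job could push its result roughly like this. The Pushgateway accepts metrics in the text format via HTTP PUT or POST on the path /metrics/job/<job name>; the gateway host, the job name `nightly_backup`, and the metric name are assumptions made up for this sketch. The request is only built and printed, not sent.

```python
from urllib.request import Request


def build_push_request(gateway: str, job: str, metrics_text: str) -> Request:
    """Build a PUT request that replaces the metrics stored for `job`."""
    url = f"{gateway.rstrip('/')}/metrics/job/{job}"
    return Request(
        url,
        data=metrics_text.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/plain"},
    )


# Hypothetical batch-job result: timestamp of the last successful run.
body = (
    "# TYPE backup_last_success_timestamp_seconds gauge\n"
    "backup_last_success_timestamp_seconds 1700000000\n"
)
request = build_push_request("http://pushgateway.example:9091", "nightly_backup", body)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

Prometheus then scrapes the Pushgateway like any other target.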
Prometheus Architecture - 8
How it all works: Alertmanager
Alertmanager
The Alertmanager handles alerts sent by the Prometheus server
Takes care of deduplicating, grouping, and routing them to the correct receiver integrations
Receivers of these alerts can be email, PagerDuty, Slack, etc.
Prometheus Architecture - 9
How it all works: Data Storage
Prometheus Data Storage
Prometheus includes a local on-disk time series database
But optionally integrates with remote storage systems
Data in local storage is stored in a
custom, highly efficient format
Prometheus Architecture - 10
How it all works: PromQL
Querying Prometheus
Prometheus provides a functional query language called PromQL
Lets the user select and aggregate time series data in real time
Options to view result:
1) Query target directly
2) Prometheus Web UI
3) Or use a more powerful visualization tool, e.g. Grafana
Example Queries:
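A few PromQL queries and one way to send them are sketched below. The queries are illustrative and assume a metric named `http_requests_total` exists; the server address is a placeholder. The sketch only builds URLs for the instant-query endpoint of the Prometheus HTTP API (/api/v1/query) and does not contact a server.

```python
from urllib.parse import urlencode


def instant_query_url(base_url: str, promql: str) -> str:
    """Build the URL for an instant query against the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


example_queries = [
    # 1 if the last scrape of a target succeeded, 0 otherwise
    "up",
    # per-second request rate, averaged over the last 5 minutes
    "rate(http_requests_total[5m])",
    # the same rate, aggregated per job
    "sum by (job) (rate(http_requests_total[5m]))",
]

for query in example_queries:
    print(instant_query_url("http://localhost:9090", query))
```

The same expressions can be typed into the Prometheus Web UI or used in a Grafana panel.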
Configuring Prometheus - 1
YAML Config
You write your configuration in a prometheus.yml file
Let Prometheus know what to scrape and when: Which targets? At what interval?
Example Config File (prometheus.yml), main parts:
How often Prometheus will scrape its targets
Rules for aggregating metric values or creating alerts when condition met
What resources Prometheus monitors
Targets are discovered via a service discovery mechanism
Prometheus has its own /metrics endpoint
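A minimal prometheus.yml along these lines might look like the following. It is kept in a Python string here so it can be written to disk with the small helper; the rule file name `alert.rules.yml`, the `node_exporter` job, and the target addresses are assumptions for illustration, and static targets are used instead of service discovery to keep the sketch short.

```python
from pathlib import Path

PROMETHEUS_YML = """\
global:
  scrape_interval: 15s       # how often Prometheus scrapes its targets
  evaluation_interval: 15s   # how often rules are evaluated

rule_files:                  # rules for aggregating metrics or creating alerts
  - "alert.rules.yml"

scrape_configs:              # what resources Prometheus monitors
  - job_name: "prometheus"   # Prometheus scraping its own /metrics endpoint
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: "node_exporter"
    scrape_interval: 1m      # overrides the global value for this job
    static_configs:
      - targets: ["localhost:9100"]
"""


def write_config(path: str = "prometheus.yml") -> Path:
    """Write the example configuration to disk and return its path."""
    target = Path(path)
    target.write_text(PROMETHEUS_YML, encoding="utf-8")
    return target
```

Calling `write_config()` would create the file in the current directory, ready to be passed to Prometheus.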
Configuring Prometheus - 2
Define your own jobs
Default values for each job: metrics_path: /metrics, scheme: http
Prometheus Characteristics
Reliable
Standalone and self-contained
Works even if other parts of the infrastructure are broken
No extensive set-up needed
Less complex
Difficult to scale
Limits Monitoring
Workaround:
Increase Prometheus server capacity
Limit number of metrics
Other Prometheus Features
Prometheus Federation
Allows Prometheus to scale to environments
with tens of data centers and millions of
nodes
Allows a Prometheus server to scrape data
from other Prometheus servers
Prometheus with Docker and Kubernetes
Fully compatible
Prometheus components available as Docker images
Can easily be deployed in container environments like K8s
Monitoring of K8s cluster node resources out-of-the-box!
Deploy Monitoring Stack - 1
3 different ways to deploy the Prometheus monitoring stack
1) Do it yourself
Create all configuration YAML files yourself
Execute them in the right order
2) Using an Operator
Manager of all Prometheus components
1. Find Prometheus operator
2. Deploy in K8s cluster
3) Using Helm
Using Helm chart to deploy operator
Helm: Manage initial setup
Operator: Manage setup
Deploy Monitoring Stack - 2
Overview of K8s resources deployed:
3 Deployments
Prometheus Operator
=> created Prometheus and Alertmanager StatefulSet
Grafana
Kube State Metrics
=> own Helm chart
=> dependency of this Helm chart
=> scrapes K8s components
1 DaemonSet
Node Exporter DaemonSet
=> connects to server
=> translates Worker Nodes metrics to Prometheus metrics
CPU usage, load on server
Data Visualization - 1
1st step: Decide what to monitor?
Notice when something unexpected happens
Observe any anomalies
CPU spikes, insufficient storage, high load,
unauthorized requests
Analyze and react accordingly
2nd step: How to get this information?
How to get visibility into this monitoring data
What data do we have available?
Data Visualization - 2
3rd step: Use a proper data visualization tool
Grafana = a powerful open source visualization and
analytics software
Already deployed with the Prometheus Operator
Helm Chart
Data Visualization Tool
Grafana
With Grafana you can create dynamic and reusable dashboards that allow
you to visualize your data in any way you want
Dashboard
Dashboard is a set of one or more panels
You can create your own Dashboards
Organized into one or more rows
Row is a logical divider within a dashboard
Rows are used to group panels together
Panel
The basic visualization building block in Grafana
Composed of a query and a visualization
Each panel has a query editor specific to the data source selected in the panel
Can be moved and resized within a dashboard
Alerting in Prometheus - 1
Instead of constantly checking, you
want to get notified when something
happens
Then you will check your dashboards
For that we need to configure our
monitoring stack to notify us
whenever something unexpected
happens
Alerting in Prometheus - 2
Configure Alerting
Alerting with Prometheus is separated into 2 parts:
1) Alerting rules in Prometheus server send alerts to an Alertmanager
2) Alertmanager then manages (deduplicating, grouping, routing) those alerts, including sending out notifications
Main steps to set up alerting and notifications:
1. Set up and configure the Alertmanager
2. Configure Prometheus to talk to the Alertmanager
3. Create alerting rules in Prometheus
Prometheus server and Alertmanager each have their own configuration file
Example Alert rules to configure
1st Alert: when CPU usage > 50%
2nd Alert: when Pod cannot start
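An alerting rule for the first example (CPU usage > 50%) could look roughly like the following Prometheus rule file, kept in a Python string for consistency with the other examples. It assumes the node exporter metric `node_cpu_seconds_total` is being scraped; the alert name, the 2-minute window, and the `severity` label are choices made for this sketch.

```python
ALERT_RULES_YML = """\
groups:
  - name: example.rules
    rules:
      - alert: HostHighCpuLoad
        # Fires when the average non-idle CPU share stays above 50%.
        expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[2m])) * 100) > 50
        for: 2m              # condition must hold this long before firing
        labels:
          severity: warning  # can be used by the Alertmanager for routing
        annotations:
          summary: "CPU load on {{ $labels.instance }} is above 50%"
"""

print(ALERT_RULES_YML)
```

When the stack is deployed through the Prometheus Operator, the same rule body would typically be wrapped in a PrometheusRule custom resource instead of a plain rule file.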
Alerting in Prometheus - 3
Alertmanager example configuration
Receiver:
These are the notification integrations
For each alert you can define its own receiver. For example:
send all K8s cluster related issues to the admin email
send all application related issues to the developer team's Slack channel
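The idea of deduplicating alerts and routing them to different receivers can be sketched in a few lines. This is a toy model of the concept, not how the Alertmanager is implemented or configured: the `team` label, the receiver names, and the alerts are all hypothetical.

```python
from collections import defaultdict

# Hypothetical routing table: label matchers -> receiver name.
ROUTES = [
    ({"team": "platform"}, "admin-email"),
    ({"team": "app"}, "developers-slack"),
]
DEFAULT_RECEIVER = "admin-email"


def pick_receiver(labels: dict[str, str]) -> str:
    """Return the first receiver whose matchers all match the alert labels."""
    for matchers, receiver in ROUTES:
        if all(labels.get(key) == value for key, value in matchers.items()):
            return receiver
    return DEFAULT_RECEIVER


def route(alerts: list[dict[str, str]]) -> dict[str, list[str]]:
    """Deduplicate identical alerts, then group alert names by receiver."""
    unique = {tuple(sorted(alert.items())) for alert in alerts}
    grouped = defaultdict(list)
    for labels in sorted(unique):
        alert = dict(labels)
        grouped[pick_receiver(alert)].append(alert["alertname"])
    return dict(grouped)


alerts = [
    {"alertname": "HostHighCpuLoad", "team": "platform"},
    {"alertname": "HostHighCpuLoad", "team": "platform"},  # duplicate
    {"alertname": "CheckoutErrors", "team": "app"},
]
print(route(alerts))
```

In a real setup the same decisions are expressed declaratively in the Alertmanager configuration file, with routes matching on alert labels and receivers defining the email or Slack integration.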
Monitor third party and own applications - 1
Configure Third-Party and own application monitoring
Monitor Kubernetes components
Monitor Resource Consumption on the Nodes
Monitor Prometheus Stack itself
Still missing:
Monitor third-party applications like Redis
Monitor own applications, like your online shop microservices
Monitor third party and own applications - 2
3rd-party example: Redis
Monitor Redis on application level, not on Kubernetes level
As we learnt, we can do that via an Exporter!
How to:
1. Deploy redis-exporter
2. Deploy ServiceMonitor (custom K8s resource) to tell Prometheus about this new exporter
Monitor third party and own applications - 3
Own application
No exporter available for your own application
So we have to define the metrics ourselves
As we learnt, we can do that via Client Libraries!
How to (Node.js application):
1. Expose metrics using the Node.js client library
2. Deploy the Node.js application in the cluster
3. Configure Prometheus to scrape the new target (ServiceMonitor)
4. Visualize scraped metrics in a Grafana Dashboard
Client Libraries:
Give you an abstract interface to expose your metrics
Libraries implement the Prometheus metric types: Counter, Gauge, Histogram, Summary
Choose a client library that matches the application's language
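To give a feel for what the metric types mean, here is a deliberately tiny sketch of three of them. It is not the API of any official client library: the class and method names are simplified stand-ins, Summary is left out, and a real library would also handle labels, registration, and the /metrics endpoint.

```python
import bisect


class Counter:
    """Only ever goes up, e.g. number of requests served."""

    def __init__(self) -> None:
        self.value = 0.0

    def inc(self, amount: float = 1.0) -> None:
        if amount < 0:
            raise ValueError("counters cannot decrease")
        self.value += amount


class Gauge:
    """Can go up and down, e.g. current memory usage."""

    def __init__(self) -> None:
        self.value = 0.0

    def set(self, value: float) -> None:
        self.value = value


class Histogram:
    """Counts observations into buckets, e.g. request durations."""

    def __init__(self, bounds: list[float]) -> None:
        self.bounds = sorted(bounds)
        self.counts = [0] * (len(self.bounds) + 1)  # last slot is +Inf
        self.total = 0.0

    def observe(self, value: float) -> None:
        # Index of the smallest bucket whose upper bound is >= value.
        self.counts[bisect.bisect_left(self.bounds, value)] += 1
        self.total += value

    def cumulative(self) -> list[int]:
        """Bucket counts as exposed to Prometheus (each includes smaller buckets)."""
        running, result = 0, []
        for count in self.counts:
            running += count
            result.append(running)
        return result


requests_served = Counter()
requests_served.inc()
requests_served.inc()

latency = Histogram([0.1, 0.5, 1.0])
for seconds in (0.05, 0.3, 0.5, 2.0):
    latency.observe(seconds)

print(requests_served.value, latency.cumulative())
```

In an actual application you would use the client library for your language rather than writing these classes yourself.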
Best Practices
Official Best Practices:
Metric and Label Naming: https://prometheus.io/docs/practices/naming/
Set of guidelines for instrumenting your code: https://prometheus.io/docs/practices/instrumentation/
Consoles and Dashboards: https://prometheus.io/docs/practices/consoles/
Alerting: https://prometheus.io/docs/practices/alerting/
On when to use the Pushgateway: https://prometheus.io/docs/practices/pushing/