16 - Prometheus Handout


DEVOPS BOOTCAMP

Monitoring with Prometheus

Introduction to Prometheus

Prometheus is an open-source monitoring system and alerting toolkit

Prometheus is used widely and has an active community

It gathers, organizes, and stores metrics as time series data from targets by "scraping" their HTTP metrics endpoints

Can trigger alerts when specified conditions are observed
Why we need a monitoring tool - 1
Visibility in different environments

You need visibility in all kinds of environments, but especially in highly dynamic container environments, which are more challenging to monitor

Container environments
Bare servers

For that you need tools like Prometheus, which are designed for monitoring these types of environments

Visibility on different levels

When you have 100s or 1000s of containers, plus components on multiple levels (infrastructure, platform, application), you need visibility and consistent monitoring across all these components

Without visibility, it's a black box for you. When things break inside your complex environment, you have no idea what is happening. You don't know what has caused the issue or what is not working.
Why we need a monitoring tool - 2
Example Use Case

Problem:
Many application errors appear in the frontend to the end user
They only see the error message, but the cause can be any of the many components in the backend
Questions to answer: Backend running? Any exceptions? Auth-Service running? Why did Auth-Service crash?

Solution:
Monitoring can help identify the problem quickly with little effort
Instead of manually troubleshooting across multiple components, it helps pinpoint the root cause directly
Saves you a lot of time and effort
Company saves a lot of money


Why we need a monitoring tool - 3

With Prometheus, monitoring is automated. It constantly looks out for issues in real time and may even identify a potential issue before it happens, so you can prevent it.

Constantly monitors all the services

Triggers alerts when a service crashes

Helps to identify problems before they happen


Prometheus Architecture - 1
How it all works

Prometheus Server

Is the main component


Does the actual monitoring work
Scrapes and stores time series data
Prometheus Architecture - 2
How it all works: Targets & Metrics

Prometheus pulls metrics from targets

Targets: What does Prometheus monitor?
Metrics: Which units of those targets are monitored?
Prometheus Architecture - 3
How it all works: Metrics

Metrics play an important role in understanding why your application is behaving in a certain way

Metric Entries

Format: Human-readable, text-based
Metric entries consist of: TYPE and HELP attributes

Metric types:
Counter: ...how many times x happened
Gauge: ...what is the current value of x now?
Histogram: ...how long or how big?

How Prometheus collects Metrics Data from Targets

Prometheus pulls from HTTP endpoints
Targets must expose: [hostaddress]/metrics
Must be in the correct format that Prometheus understands
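
To make the text format concrete, here is a minimal Python sketch that reads a scrape response and picks out the TYPE attributes and the sample values. The metric names and values in SAMPLE are made up for illustration, and the parser is deliberately simplified (it does not handle every detail of the real format, such as timestamps or escaped label values).

```python
import re

# Hypothetical /metrics response in the Prometheus text format.
SAMPLE = """\
# HELP http_requests_total Total number of HTTP requests.
# TYPE http_requests_total counter
http_requests_total{method="get",code="200"} 1027
# HELP memory_usage_bytes Current memory usage in bytes.
# TYPE memory_usage_bytes gauge
memory_usage_bytes 5.24288e+06
"""

# metric name, optional {labels} block, then the value
LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{.*\})?\s+(\S+)')


def parse_metrics(text):
    """Return (types, samples) from exposition text; a simplified parser."""
    types, samples = {}, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# TYPE "):
            # "# TYPE <name> <kind>"
            _, _, name, kind = line.split(None, 3)
            types[name] = kind
        elif line and not line.startswith("#"):
            match = LINE.match(line)
            if match:
                name, labels, value = match.groups()
                samples.append((name, labels or "", float(value)))
    return types, samples


types, samples = parse_metrics(SAMPLE)
for name, labels, value in samples:
    print(types.get(name, "untyped"), name, labels, value)
```

The HELP lines are skipped here because they only carry a description; the TYPE lines tell the reader whether a sample is a counter or a gauge.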
Prometheus Architecture - 4
How it all works: Exporters
Some services expose /metrics endpoints by default
Others need another component for that: Exporters

Exporters

Exporters help in exporting existing metrics from third-party systems as Prometheus metrics
An exporter is a service that fetches metrics from a target, converts the data, and exposes it as Prometheus metrics
Prometheus can then scrape this endpoint as usual

Official vs Third-Party

Some are maintained as part of the official Prometheus organization, others are externally contributed and maintained.
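
The conversion step an exporter performs can be sketched in a few lines of Python. This is only an illustration: the stats dictionary, the prefix, and the choice to treat every value as a gauge are assumptions, and a real exporter would fetch the raw values from the target system first.

```python
def to_prometheus_text(stats, prefix="myapp"):
    """Convert a dict of raw numeric stats into Prometheus text-format lines."""
    lines = []
    for key in sorted(stats):
        name = f"{prefix}_{key}"
        lines.append(f"# HELP {name} Value of {key} reported by the target.")
        # Assumption: every value is exposed as a gauge for simplicity.
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {stats[key]}")
    return "\n".join(lines) + "\n"


# Hypothetical raw stats fetched from a third-party system.
raw_stats = {"connected_clients": 3, "used_memory_bytes": 1048576}
print(to_prometheus_text(raw_stats, prefix="demo"))
```

The returned text is what the exporter would serve on its own /metrics endpoint for Prometheus to scrape.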
Prometheus Architecture - 5
How it all works: Exporters

Example: Monitor a Linux Server

1. Download a node exporter
2. Untar and execute
3. Converts metrics of the server
4. Exposes /metrics endpoint
5. Configure Prometheus to scrape this endpoint

Exporters are available as Docker Images

Example: Monitor own applications

Client libraries let you define and expose internal metrics via an HTTP endpoint on your application's instance
Metrics like: How many requests? How many exceptions? Server resources used?
Choose a Prometheus client library that matches the language in which your application is written


Prometheus Architecture - 6
How it all works: Push vs Pull

Important difference of Prometheus compared to other monitoring systems like Amazon CloudWatch or New Relic

Prometheus - Pull Model

Prometheus pulls metrics from endpoints

Others - Push Model

Services push to a centralized collection platform
High load of network traffic
Monitoring can become your bottleneck
Installation of additional software to push metrics

Prometheus Architecture - 7
How it all works: Pushgateway

Pushgateway

An intermediary service that allows you to push metrics from jobs that cannot be scraped

Prometheus recommends using the Pushgateway only in certain limited cases. Usually the only valid use case is capturing the outcome of a service-level batch job
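
As a rough illustration of what a batch job does at the end of its run, the Python sketch below builds the HTTP request that pushes one metric to a Pushgateway. The gateway address, the job name, and the metric are assumptions for the example; the request is only constructed here, not sent.

```python
import urllib.parse
import urllib.request


def build_push_request(gateway_url, job, metrics_text):
    """Build (but do not send) an HTTP PUT that replaces the metrics of a job."""
    job_path = urllib.parse.quote(job, safe="")
    url = f"{gateway_url.rstrip('/')}/metrics/job/{job_path}"
    return urllib.request.Request(
        url,
        data=metrics_text.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )


request = build_push_request(
    "http://localhost:9091",   # assumed Pushgateway address
    "nightly_backup",          # hypothetical batch job name
    "backup_last_success_timestamp_seconds 1700000000\n",
)
print(request.get_method(), request.full_url)
# urllib.request.urlopen(request) would send it to a running Pushgateway.
```

Prometheus then scrapes the Pushgateway like any other target, so the short-lived job does not need to be reachable at scrape time.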
Prometheus Architecture - 8
How it all works: Alertmanager

Alertmanager

The Alertmanager handles alerts sent by the Prometheus server

Takes care of deduplicating, grouping and routing them to the correct receiver integrations

Receivers of these alerts can be email, PagerDuty, Slack etc.
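
The ideas of deduplicating and grouping can be illustrated with a small Python sketch. This is not how the Alertmanager is implemented; it only shows the concept, with alerts modelled as plain label dictionaries and a hypothetical group_by setting.

```python
from collections import defaultdict


def group_alerts(alerts, group_by=("alertname",)):
    """Drop alerts with identical label sets, then group by the chosen labels."""
    seen = set()
    groups = defaultdict(list)
    for alert in alerts:
        identity = frozenset(alert.items())
        if identity in seen:
            continue  # duplicate of an alert we already hold
        seen.add(identity)
        key = tuple(alert.get(label, "") for label in group_by)
        groups[key].append(alert)
    return dict(groups)


# Hypothetical incoming alerts; the second one is an exact duplicate.
incoming = [
    {"alertname": "HighCPU", "instance": "node-1"},
    {"alertname": "HighCPU", "instance": "node-1"},
    {"alertname": "HighCPU", "instance": "node-2"},
    {"alertname": "PodDown", "instance": "node-1"},
]
for key, members in group_alerts(incoming).items():
    print(key, len(members))
```

Grouping means one notification can cover all alerts of the same kind instead of one message per instance.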
Prometheus Architecture - 9
How it all works: Data Storage

Prometheus Data Storage

Prometheus includes a local on-disk time series database

But optionally integrates with remote storage systems

Data in local storage is stored in a custom, highly efficient format
Prometheus Architecture - 10
How it all works: PromQL
Querying Prometheus
Prometheus provides a functional query language called PromQL
Lets the user select and aggregate time series data in real time

Options to view result

1) Query target directly
2) Prometheus Web UI
3) Or use a more powerful visualization tool, e.g. Grafana

[Figure: example queries]
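
Besides the Web UI and Grafana, a PromQL expression can also be sent to the Prometheus HTTP API. The Python sketch below only builds the URL for an instant query; the server address and the example expression (a per-second request rate over 5 minutes for a hypothetical job "api") are assumptions.

```python
import urllib.parse


def instant_query_url(base_url, promql):
    """Build the URL for an instant query against the /api/v1/query endpoint."""
    query = urllib.parse.urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{query}"


# Hypothetical PromQL expression and server address.
url = instant_query_url(
    "http://localhost:9090",
    'rate(http_requests_total{job="api"}[5m])',
)
print(url)
```

Fetching this URL from a running server returns the result as JSON, which is the same data the Web UI and Grafana display.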
Configuring Prometheus - 1
YAML Config

You write your configuration in a prometheus.yml file

Let Prometheus know what to scrape and when: Which targets? At what interval?

Example Config File (prometheus.yml)

How often Prometheus will scrape its targets
Rules for aggregating metric values or creating alerts when a condition is met
What resources Prometheus monitors
Targets are discovered via a service discovery mechanism
Prometheus has its own /metrics endpoint
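
Since the example config file itself is not reproduced here, the Python sketch below prints a minimal prometheus.yml with the three parts described above: global settings (scrape interval), rule files, and scrape configs. The job names, targets, and rule file name are assumptions, and the sketch uses static targets instead of service discovery to stay short.

```python
def render_config(jobs, scrape_interval="15s", rule_files=()):
    """Render a minimal prometheus.yml from a {job_name: [targets]} mapping."""
    lines = ["global:", f"  scrape_interval: {scrape_interval}"]
    if rule_files:
        lines.append("rule_files:")
        lines += [f"  - {path}" for path in rule_files]
    lines.append("scrape_configs:")
    for job_name, targets in jobs.items():
        lines.append(f"  - job_name: {job_name}")
        lines.append("    static_configs:")
        quoted = ", ".join(f"'{target}'" for target in targets)
        lines.append(f"      - targets: [{quoted}]")
    return "\n".join(lines) + "\n"


# Hypothetical jobs: Prometheus scraping itself and a node exporter.
print(render_config(
    {"prometheus": ["localhost:9090"], "node": ["localhost:9100"]},
    rule_files=["alert.rules.yml"],
))
```

The first job reflects that Prometheus exposes its own /metrics endpoint and can therefore monitor itself.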
Configuring Prometheus - 2

Define your own jobs

Default values for each job:


Prometheus Characteristics

Strengths:

Reliable
Standalone and self-containing
Works, even if other parts of the infrastructure are broken
No extensive set-up needed
Less complex

Limitations:

Difficult to scale
Limits monitoring

Workaround:

Increase Prometheus server capacity
Limit number of metrics


Other Prometheus Features
Prometheus Federation

Allows Prometheus to scale to environments with tens of data centers and millions of nodes

Allows a Prometheus server to scrape data from other Prometheus servers
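
Federation works over a dedicated /federate endpoint that one Prometheus server exposes and another one scrapes, with match[] parameters selecting which series to hand over. The Python sketch below only builds such a URL; the server address and the selectors are assumptions.

```python
import urllib.parse


def federate_url(base_url, selectors):
    """Build a /federate URL with one match[] parameter per series selector."""
    query = urllib.parse.urlencode([("match[]", selector) for selector in selectors])
    return f"{base_url.rstrip('/')}/federate?{query}"


# Hypothetical selectors: everything from one job plus one aggregated series.
url = federate_url(
    "http://prometheus-dc1:9090",   # assumed address of a lower-level server
    ['{job="node"}', "job:http_requests:rate5m"],
)
print(url)
```

In practice the higher-level server would list this endpoint as a scrape target in its own configuration.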

Prometheus with Docker and Kubernetes

Fully compatible
Prometheus components available as Docker images
Can easily be deployed in container environments like K8s
Monitoring of K8s cluster node resources out-of-the-box!
Deploy Monitoring Stack - 1
3 different ways to deploy the Prometheus monitoring stack

1) Do it yourself
Create all configuration YAML files yourself
Execute them in the right order

2) Using an Operator
Manager of all Prometheus components
1. Find Prometheus operator
2. Deploy in K8s cluster

3) Using Helm
Using a Helm chart to deploy the operator
Helm: Manage initial setup
Operator: Manage setup
Deploy Monitoring Stack - 2

Overview of K8s resources deployed:

3 Deployments
Prometheus Operator
=> created Prometheus and Alertmanager StatefulSet
Grafana
Kube State Metrics
=> own Helm chart
=> dependency of this Helm chart
=> scrapes K8s components

1 DaemonSet
Node Exporter DaemonSet
=> connects to server
=> translates Worker Nodes metrics to Prometheus metrics (CPU usage, load on server)


Data Visualization - 1
1st step: Decide what to monitor?

Notice when something unexpected happens
Observe any anomalies: CPU spikes, insufficient storage, high load, unauthorized requests
Analyze and react accordingly

2nd step: How to get this information?

How to get visibility into this monitoring data
What data do we have available?
Data Visualization - 2

3rd step: Use a proper data visualization tool

Grafana = a powerful open source visualization and analytics software
Already deployed with the Prometheus Operator Helm Chart

Data Visualization Tool: Grafana

With Grafana you can create dynamic and reusable dashboards that allow you to visualize your data in any way you want

Dashboard

Dashboard is a set of one or more panels
You can create your own Dashboards
Organized into one or more rows
Row is a logical divider within a dashboard
Rows are used to group panels together

Panel

The basic visualization building block in Grafana
Composed of a query and a visualization
Each panel has a query editor specific to the data source selected in the panel
Can be moved and resized within a dashboard
Alerting in Prometheus - 1

Instead of constantly checking, you want to get notified when something happens

Then you will check your dashboards

For that we need to configure our monitoring stack to notify us whenever something unexpected happens
Alerting in Prometheus - 2

Example Alert rules to configure

1st Alert: when CPU usage > 50%
2nd Alert: when Pod cannot start

Configure Alerting

Alerting with Prometheus is separated into 2 parts:
1) Alerting rules in Prometheus server send alerts to an Alertmanager
2) Alertmanager then manages (deduplicating, grouping, routing) those alerts, including sending out notifications

Main steps to set up alerting and notifications:

1. Set up and configure the Alertmanager
2. Configure Prometheus to talk to the Alertmanager
3. Create alerting rules in Prometheus

Prometheus server and Alertmanager each have their own configuration file
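
To illustrate what an alerting rule such as "CPU usage > 50%" does over time, here is a simplified Python sketch of the rule states. It assumes the rule has a waiting period (called for_seconds here, a name chosen for this example) during which the alert is "pending" before it becomes "firing"; the sample values are made up, and the real evaluation inside Prometheus is more involved.

```python
def alert_state(samples, threshold, for_seconds):
    """Return 'inactive', 'pending' or 'firing' after replaying the samples.

    samples: list of (timestamp_seconds, value), oldest first.
    """
    active_since = None
    state = "inactive"
    for timestamp, value in samples:
        if value > threshold:
            if active_since is None:
                active_since = timestamp  # condition just became true
            held_for = timestamp - active_since
            state = "firing" if held_for >= for_seconds else "pending"
        else:
            active_since = None  # condition cleared, start over
            state = "inactive"
    return state


# Hypothetical CPU usage samples in percent, one per minute.
cpu_samples = [(0, 55), (60, 58), (120, 60)]
print(alert_state(cpu_samples, threshold=50, for_seconds=120))
```

Only alerts that reach the firing state are sent on to the Alertmanager, which avoids notifications for short spikes.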
Alerting in Prometheus - 3
Alertmanager example configuration

Receiver:
These are the notification integrations
For each alert you can define its own receiver. For example:
send all K8s cluster related issues to the admin email
send all application related issues to the developer team's Slack channel
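
The routing idea can be sketched in Python as a list of label matchers that map alerts to receivers. The label names, receiver names, and the first-match rule are assumptions for illustration; the real Alertmanager configuration is a YAML routing tree with more options.

```python
# Hypothetical routes: (labels that must match, receiver name).
ROUTES = [
    ({"category": "cluster"}, "admin-email"),
    ({"category": "application"}, "dev-slack"),
]
DEFAULT_RECEIVER = "fallback-email"


def pick_receiver(labels, routes=ROUTES, default=DEFAULT_RECEIVER):
    """Return the receiver of the first route whose matchers all match."""
    for matchers, receiver in routes:
        if all(labels.get(key) == value for key, value in matchers.items()):
            return receiver
    return default


print(pick_receiver({"alertname": "NodeDown", "category": "cluster"}))
print(pick_receiver({"alertname": "CheckoutErrors", "category": "application"}))
```

An alert that matches no route falls back to the default receiver, so no alert is silently dropped.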
Monitor third party and own applications - 1

Still missing: Configure third-party and own application monitoring

Monitor Kubernetes components
Monitor Resource Consumption on the Nodes
Monitor Prometheus Stack itself
Monitor third-party applications like Redis
Monitor own applications, like your online shop microservices
Monitor third party and own applications - 2
3rd-party example: Redis

Monitor Redis on application level, not on Kubernetes level

As we learnt, we can do that via an Exporter!

How to:
1. Deploy redis-exporter
2. Deploy ServiceMonitor (custom K8s resource) to tell Prometheus about this new exporter
Monitor third party and own applications - 3
Own application

No exporter available for your own application
So we have to define the metrics ourselves
As we learnt, we can do that via Client Libraries!

How to (Node.js application):

1. Expose metrics using the Node.js client library
2. Deploy Node.js application in the cluster
3. Configure Prometheus to scrape new target (ServiceMonitor)
4. Visualize scraped metrics in Grafana Dashboard

Client Libraries:

Gives you an abstract interface to expose your metrics
Libraries implement the Prometheus metric types: Counter, Gauge, Histogram, Summary
Choose a client library that matches the application's language
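
What a client library does for you can be sketched with the Python standard library alone. Note that this is a Python stand-in, not the Node.js client library named above, and in a real project you would use the official client library for your language. The metric name, the port, and the serve() helper are assumptions for this example.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class Counter:
    """A value that only goes up, in the spirit of the Counter metric type."""

    def __init__(self, name, help_text):
        self.name = name
        self.help_text = help_text
        self._value = 0
        self._lock = threading.Lock()

    def inc(self, amount=1):
        if amount < 0:
            raise ValueError("counters can only increase")
        with self._lock:
            self._value += amount

    def render(self):
        """Return the counter in the Prometheus text format."""
        return (
            f"# HELP {self.name} {self.help_text}\n"
            f"# TYPE {self.name} counter\n"
            f"{self.name} {self._value}\n"
        )


# Hypothetical application metric.
REQUESTS = Counter("shop_http_requests_total", "Total HTTP requests handled.")


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = REQUESTS.render().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Serve /metrics until interrupted; call this from your application."""
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

The application would call REQUESTS.inc() wherever a request is handled and start serve() in a background thread, after which Prometheus can scrape the endpoint like any other target.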
Best Practices

Official Best Practices:


Metric and Label Naming: https://prometheus.io/docs/practices/naming/
Set of guidelines for instrumenting your code: https://prometheus.io/docs/practices/instrumentation/
Consoles and Dashboards: https://prometheus.io/docs/practices/consoles/
Alerting: https://prometheus.io/docs/practices/alerting/
On when to use the Pushgateway: https://prometheus.io/docs/practices/pushing/
