Send notifications when something goes wrong in Rancher.
- Will kick your ass when a service goes down and send a message when it recovers
- Various notification mechanisms
  - slack
  - email
  - please create an issue if you need more
- Configure notification mechanisms globally or on a per-service level (supported in .json config setup for now)
- Customize your notification messages
rancher-alarms:
  image: ndelitski/rancher-alarms
  environment:
    ALARM_SLACK_WEBHOOK_URL: https://hooks.slack.com/services/:UUID
  labels:
    io.rancher.container.create_agent: 'true'
    io.rancher.container.agent.role: environment

How to create Slack Webhook URL
NOTE: Including the Rancher agent labels is crucial; otherwise you need to provide Rancher credentials manually with the RANCHER_* variables.
docker run \
-d \
-e RANCHER_ADDRESS=rancher.yourdomain.com \
-e RANCHER_ACCESS_KEY=ACCESS-KEY \
-e RANCHER_SECRET_KEY=SECRET-KEY \
-e RANCHER_PROJECT_ID=1a8 \
-e ALARM_SLACK_WEBHOOK_URL=https://hooks.slack.com/services/YOUR_SLACK_UUID \
--name rancher-alarms \
ndelitski/rancher-alarms
On startup, rancher-alarms gets the list of services and instantiates a healthcheck monitor for each service that is in a running state. Removed, purged, and similar services are ignored.
The list of healthcheck monitors is refreshed every pollServicesInterval. When a service is removed, it is no longer monitored.
When a service transitions to a degraded state, all targets are invoked to send the notification(s).
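The threshold-based monitoring described above can be sketched as a small state machine. This is a minimal illustration in Python, not the project's actual implementation: the class name `HealthMonitor`, the `observe` method, and the assumption that thresholds count consecutive polls are all hypothetical. Only the threshold names and their default values (2 and 3) come from the configuration documented in this README.

```python
from dataclasses import dataclass


@dataclass
class HealthMonitor:
    """Hypothetical per-service monitor; all names are illustrative only."""
    healthy_threshold: int = 2     # consecutive healthy polls needed to recover
    unhealthy_threshold: int = 3   # consecutive unhealthy polls needed to degrade
    state: str = "healthy"
    _streak: int = 0               # consecutive polls that contradict `state`

    def observe(self, is_healthy: bool):
        """Record one poll result; return the new state on a transition, else None."""
        contradicts = is_healthy != (self.state == "healthy")
        if not contradicts:
            self._streak = 0
            return None
        self._streak += 1
        needed = self.healthy_threshold if is_healthy else self.unhealthy_threshold
        if self._streak < needed:
            return None
        # Threshold reached: flip the state and start counting afresh.
        self.state = "healthy" if is_healthy else "degraded"
        self._streak = 0
        return self.state


monitor = HealthMonitor()
polls = [True, False, False, False, True, True]
transitions = [monitor.observe(ok) for ok in polls]
print(transitions)
```

In this sketch a notification would be sent only when `observe` returns a state name, so a single failed poll does not trigger an alarm.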
rancher-alarms:
  image: ndelitski/rancher-alarms
  environment:
    RANCHER_ADDRESS: your-rancher.com
    ALARM_SLACK_WEBHOOK_URL: https://hooks.slack.com/services/...

For more docker-compose examples, see the examples folder.
The RANCHER_* variables below can be omitted if you are running inside a Rancher environment (the service should be started as a Rancher agent, though):
- RANCHER_ADDRESS
- RANCHER_PROJECT_ID
- RANCHER_ACCESS_KEY
- RANCHER_SECRET_KEY
Polling, monitoring and filtering:
- ALARM_POLL_INTERVAL
- ALARM_MONITOR_INTERVAL
- ALARM_MONITOR_HEALTHY_THRESHOLD
- ALARM_MONITOR_UNHEALTHY_THRESHOLD
- ALARM_FILTER

Email target:
- ALARM_EMAIL_ADDRESSES
- ALARM_EMAIL_USER
- ALARM_EMAIL_PASS
- ALARM_EMAIL_SSL
- ALARM_EMAIL_SMTP_HOST
- ALARM_EMAIL_SMTP_PORT
- ALARM_EMAIL_FROM
- ALARM_EMAIL_SUBJECT
- ALARM_EMAIL_TEMPLATE
- ALARM_EMAIL_TEMPLATE_FILE

Slack target:
- ALARM_SLACK_WEBHOOK_URL
- ALARM_SLACK_CHANNEL
- ALARM_SLACK_BOTNAME
- ALARM_SLACK_TEMPLATE
- ALARM_SLACK_TEMPLATE_FILE
For examples that use environment-based configuration, see the docker-compose files in the examples folder.
{
  "rancher": {
    "address": "rancher-host:port",
    "auth": {
      "accessKey": "<ACCESS_KEY>",
      "secretKey": "<KEEP_YOUR_SECRETS_SAFE>"
    },
    "projectId": "1a5"
  },
  "pollServicesInterval": 10000,
  "filter": [
    "app/*"
  ],
  "notifications": {
    "*": {
      "targets": {
        "email": {
          "recipients": [
            "join@snow.com"
          ]
        }
      },
      "healthcheck": {
        "pollInterval": 5000,
        "healthyThreshold": 2,
        "unhealthyThreshold": 3
      }
    },
    "frontend": {
      "targets": {
        "email": {
          "recipients": [
            "arya@stark.com"
          ]
        }
      }
    }
  },
  "targets": {
    "email": {
      "smtp": {
        "from": "Alarm Service <alarm@domain.com>",
        "auth": {
          "user": "john@doe.com",
          "password": "Str0ngPa$$"
        },
        "host": "smtp.gmail.com",
        "secureConnection": true,
        "port": 465
      }
    },
    "slack": {
      "webhookUrl": "https://hooks.slack.com/services/YOUR_SLACK_UUID",
      "botName": "rancher-alarm",
      "channel": "#devops"
    }
  }
}

- rancher: Rancher API settings. required
- pollServicesInterval: interval in ms for fetching the list of services. required
- filter: whitelist filter for stack/service names in the environment. A list of string values. Every string is a RegExp expression, so you can use something like frontend/* to match all services of a stack. optional
- notifications: per-service notification settings. A wildcard (*) means any service. required
  - healthcheck: monitoring state options. optional. Defaults are:
    { pollInterval: 5000, healthyThreshold: 2, unhealthyThreshold: 3 }
  - targets: which notification targets to use. Overrides the base target settings in the root targets section. Currently each target must be an Object value. If you have nothing to override from the base settings, just place {} as a value. optional
- targets: base settings for each notification target. required
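The relationship between the per-service targets and the root targets section can be illustrated with a short sketch. This is not the project's code: the helper name `resolve_targets` is hypothetical, and two behaviors are assumptions made for illustration, namely that a service-specific entry takes precedence over the wildcard entry and that overrides are shallow-merged on top of the base settings. The sample `#frontend` channel is also invented.

```python
def resolve_targets(config: dict, service_name: str) -> dict:
    """Return the effective target settings for one service (hypothetical helper)."""
    base = config.get("targets", {})
    notifications = config.get("notifications", {})
    # Assumption: a service-specific entry wins over the "*" wildcard entry.
    entry = notifications.get(service_name, notifications.get("*", {}))
    resolved = {}
    for name, overrides in entry.get("targets", {}).items():
        # Assumption: overrides are shallow-merged on top of the base settings.
        resolved[name] = {**base.get(name, {}), **overrides}
    return resolved


config = {
    "notifications": {
        "*": {"targets": {"email": {"recipients": ["join@snow.com"]}}},
        "frontend": {"targets": {"slack": {"channel": "#frontend"}}},
    },
    "targets": {
        "email": {"smtp": {"host": "smtp.gmail.com", "port": 465}},
        "slack": {"botName": "rancher-alarm", "channel": "#devops"},
    },
}

frontend_targets = resolve_targets(config, "frontend")
backend_targets = resolve_targets(config, "backend")
print(frontend_targets)
print(backend_targets)
```

Under these assumptions, `frontend` notifies Slack in its own channel while keeping the base bot name, and any other service falls back to the wildcard entry and notifies by email.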
Variables available in notification templates:
- healthyState: HEALTHY or UNHEALTHY
- state: service state as it is named in the Rancher API
- prevMonitorState: previous rancher-alarms service state name
- monitorState: rancher-alarms service state name, e.g. always degraded for unhealthy
- serviceName: name of the service in Rancher
- serviceUrl: URL of the running service in the Rancher UI
- stackUrl: URL of the stack in the Rancher UI
- stackName: name of the stack in Rancher
- environmentName: name of the environment in Rancher
- environmentUrl: URL of the environment in the Rancher UI
Hey buddy! Your service #{serviceName} become #{healthyState}, direct link to the service #{serviceUrl}
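To show how a template with `#{name}` placeholders might be expanded, here is a minimal sketch in Python. It is not the project's renderer: the function name `render`, the choice to leave unknown placeholders untouched, and the sample URL are assumptions made for illustration.

```python
import re

# Matches placeholders of the form #{name}.
PLACEHOLDER = re.compile(r"#\{(\w+)\}")


def render(template: str, variables: dict) -> str:
    """Replace each #{name} with its value; unknown names are left as-is."""
    return PLACEHOLDER.sub(
        lambda m: str(variables.get(m.group(1), m.group(0))), template
    )


message = render(
    "Your service #{serviceName} became #{healthyState}: #{serviceUrl}",
    {
        "serviceName": "frontend",
        "healthyState": "UNHEALTHY",
        # Hypothetical URL, for illustration only.
        "serviceUrl": "https://rancher.example.com/hypothetical/service/url",
    },
)
print(message)
```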
You can find more detailed examples in the examples folder.
- [] Simplify configuration.
- [] More use of rancher labels and metadata. Alternate configuration through rancher labels/metadata (can be used in conjunction with the initial config).
- [] Run in a rancher environment as an agent with a new label agent: true. No need to specify keys anymore!
- [] More notification mechanisms: AWS SNS, http, sms
- Support templating
- [] Test coverage. Setup drone.io
- Notify when all services operate normally after some of them were in a degraded state
- [] Refactor code
- Shrinking image size with alpine linux