Skip to content

jarlah/sentinel

Repository files navigation

Sentinel

CI

Programmable infrastructure health monitoring as a single Haskell binary. Probes your services and databases on a schedule, tracks status, and exposes results as a JSON API.

Built on http-tower-hs and tower-hs — every probe (HTTP or database) flows through a composable middleware stack with circuit breakers, retries, and timeouts.

Quick start

# config.yaml
port: 8080

probes:
  - name: my-app
    url: "https://myapp.example.com/health"

  - name: main-db
    type: postgres
    connection_string: "host=localhost port=5432 dbname=mydb user=postgres password=secret"

  - name: cache
    type: redis
    connection_string: "redis://localhost:6379"
$ sentinel config.yaml
Sentinel starting on port 8080
Monitoring 3 probes
Tracing: disabled
[probe:my-app] "GET" "myapp.example.com" "/health" -> 200 (89ms)
$ curl localhost:8080/status
[
  {
    "name": "my-app",
    "status": "up",
    "latency_ms": 89.4,
    "error": null,
    "checked_at": "2026-04-04T14:58:07Z"
  },
  {
    "name": "main-db",
    "status": "up",
    "latency_ms": 3.2,
    "error": null,
    "checked_at": "2026-04-04T14:58:07Z"
  },
  {
    "name": "cache",
    "status": "up",
    "latency_ms": 1.1,
    "error": null,
    "checked_at": "2026-04-04T14:58:07Z"
  }
]

For HTTP probes, only name and url are required. For database probes, specify the type and connection details. Everything else is optional.

Configuration

Minimal

probes:
  - name: my-app
    url: "https://myapp.example.com/health"

This gives each probe: User-Agent (sentinel/<version>), a unique request ID, and logging. No retry, no timeout, no validation — just a raw health check.

Full

port: 8080
tracing: true

alerting:
  slack:
    webhook_url: "https://hooks.slack.com/services/T.../B.../xxx"
  resend:
    api_key: "re_xxx"
    from: "sentinel@example.com"
    to: ["oncall@example.com"]
    status_report: true
  prometheus:
    pushgateway_url: "http://localhost:9091"

probes:
  # HTTP probes (type defaults to "http")
  - name: my-app
    url: "https://myapp.example.com/health"
    interval_seconds: 15
    timeout_ms: 3000
    retries: 3
    follow_redirects: 5
    expected_status: [200, 299]
    alert_after: 3
    alert_reminder: 3600
    alerts: [slack, resend]
    circuit_breaker:
      failure_threshold: 5
      cooldown_seconds: 60
    headers:
      - ["Authorization", "Bearer my-secret-token"]
      - ["Accept", "application/json"]

  - name: external-api
    url: "https://api.partner.com/v1/status"
    interval_seconds: 60
    timeout_ms: 10000
    retries: 1
    expected_status: [200, 200]

  - name: redirect-check
    url: "https://old.example.com"
    follow_redirects: 3

  - name: internal-service
    url: "https://internal.example.com/health"
    tls_ca_path: "/etc/sentinel/ca.pem"
    tls_client_cert: "/etc/sentinel/client.pem"
    tls_client_key: "/etc/sentinel/client-key.pem"

  # Database probes
  - name: main-db
    type: postgres
    connection_string: "host=localhost port=5432 dbname=mydb user=postgres password=secret"
    interval_seconds: 30
    timeout_ms: 5000
    retries: 2
    circuit_breaker:
      failure_threshold: 5
      cooldown_seconds: 60

  - name: cache
    type: redis
    connection_string: "redis://localhost:6379"
    interval_seconds: 15
    timeout_ms: 3000

  - name: app-mysql
    type: mysql
    host: "localhost"
    port: 3306
    user: "monitor"
    password: "secret"
    database: "mydb"
    interval_seconds: 30

Probe types

Sentinel supports HTTP and database probes. Set the type field to choose:

Type Description Required fields
http (default) HTTP GET health check url
postgres PostgreSQL connection ping (SELECT 1) connection_string
mysql MySQL/MariaDB connection ping (COM_PING) host, user, password
redis Redis connection ping (PING) connection_string (default: redis://localhost:6379)

Database probes create a fresh connection, execute the health check, and close. This tests that the database accepts new connections — not just that the port is open.

Shared config reference

These fields apply to all probe types:

Field Type Default Description
name string required Probe identifier (used in API responses and logs)
type string http Probe type: http, postgres, mysql, redis
interval_seconds int 30 Seconds between probes
timeout_ms int none Request timeout in milliseconds
retries int none Retry count with 1s constant backoff
circuit_breaker.failure_threshold int 5 Consecutive failures before tripping
circuit_breaker.cooldown_seconds int 30 Seconds before probing recovery
alert_after int 1 Consecutive failures before alerting
alert_reminder int 0 Seconds between reminder alerts while still down (0 = no reminders)
alerts [string] all Which channels to use: slack, resend, prometheus

HTTP-specific config

Field Type Default Description
url string required URL to probe
follow_redirects int none Max redirect hops (301/302/303/307/308)
expected_status [int, int] none Accepted status code range [min, max] inclusive
headers [[name, value]] none Custom headers added to every request
tls_ca_path string none Path to a custom CA certificate (PEM) for TLS verification
tls_client_cert string none Path to client certificate (PEM) for mTLS
tls_client_key string none Path to client private key (PEM) for mTLS

MySQL-specific config

Field Type Default Description
host string localhost MySQL server hostname
port int 3306 MySQL server port
user string root MySQL username
password string "" MySQL password
database string "" MySQL database name

Global config

Field Type Default Description
tracing bool false Enable OpenTelemetry tracing for HTTP probes

Alerting channels

alerting:
  slack:
    webhook_url: "https://hooks.slack.com/services/T.../B.../xxx"
  resend:
    api_key: "re_xxx"                     # Resend API key
    from: "sentinel@example.com"
    to: ["oncall@example.com"]
  prometheus:
    pushgateway_url: "http://localhost:9091"
    job: "sentinel"
Field Description
alerting.slack.webhook_url Slack incoming webhook URL
alerting.resend.api_key Resend API key
alerting.resend.from Sender email address
alerting.resend.to List of recipient email addresses
alerting.resend.status_report Send status report emails on Mondays and Fridays (default: true)
alerting.prometheus.pushgateway_url Prometheus Pushgateway URL
alerting.prometheus.job Job label for pushed metrics (default: sentinel)

All alerting config is optional. If alerting is absent, no alerts are sent.

Alerting

Sentinel alerts on state transitions — not every probe result:

Transition Alert Example
Up → Down :red_circle: **my-app** is DOWN — connection refused After alert_after consecutive failures
Down → Down :warning: **my-app** is still DOWN Every alert_reminder seconds
Down → Up :large_green_circle: **my-app** recovered (89ms) Immediately
Up → Up no alert

Alerts fire asynchronously — a Slack outage won't block health monitoring. All alert HTTP calls go through http-tower-hs with retry and timeout.

Prometheus metrics

When configured, Sentinel pushes gauges to a Pushgateway:

sentinel_probe_up{probe="my-app"} 1
sentinel_probe_latency_ms{probe="my-app"} 89.4

Use Alertmanager rules on these metrics for more advanced alerting workflows.

Status reports

When Resend email is configured, Sentinel sends a status report email every Monday and Friday at 8 AM (local server time). This provides assurance that the service is running at the start and end of each work week.

  • No downtime: Subject line "[Sentinel] Status Report — No downtime" with a confirmation that all services have been operational since the last report.
  • Downtime detected: Subject line "[Sentinel] Status Report — Downtime detected" with a list of all incidents (down, still down, recovered) since the last report.

Both reports include a table of current probe statuses with name, status, latency, and last check time.

Status reports are enabled by default. To disable:

alerting:
  resend:
    api_key: "re_xxx"
    from: "sentinel@example.com"
    to: ["oncall@example.com"]
    status_report: false

Middleware stack

Sentinel uses composable middleware from tower-hs for all probe types.

HTTP probes

HTTP probes build an http-tower-hs middleware stack. Only configured middleware is applied:

User-Agent ─> Request ID ─> Headers ─> Redirects ─> Retry ─> Timeout ─> Validate ─> Circuit Breaker ─> Tracing ─> Logging
  (always)     (always)    (optional)  (optional)  (optional) (optional) (optional)    (optional)      (optional)  (always)
-- What sentinel builds under the hood for HTTP probes:
client <- newClientWithTLS maybeCaPath maybeClientCert
let configured = client
      |> withUserAgent "sentinel/<version>"
      |> withRequestId
      |> withHeader "Authorization" "Bearer my-token"
      |> withFollowRedirects 5
      |> withRetry (constantBackoff 3 1.0)
      |> withTimeout 3000
      |> withValidateStatus (\c -> c >= 200 && c < 300)
      |> withCircuitBreaker cbConfig breaker
      |> withTracing
      |> withLogging logger

Database probes

Database probes use tower-hs's protocol-agnostic Service type directly. A Service () () wrapping the database ping is composed with the same middleware primitives:

Retry ─> Timeout ─> Circuit Breaker ─> DB Ping

This means a downed database gets the same circuit breaker protection as HTTP services — after the failure threshold, sentinel stops attempting connections until the cooldown period elapses.

Circuit breaker

When configured, each probe gets its own circuit breaker. After failure_threshold consecutive failures, the breaker trips open and immediately rejects probe requests (no wasted HTTP calls or database connections to a known-dead service). After cooldown_seconds, it allows one probe through to test recovery.

API

Endpoint Method Description
/status GET JSON array of all probe results

Response format

[
  {
    "name": "my-app",
    "status": "up",
    "latency_ms": 89.4,
    "error": null,
    "checked_at": "2026-04-04T14:58:07Z"
  },
  {
    "name": "external-api",
    "status": "down",
    "latency_ms": 5012.3,
    "error": "Request timed out",
    "checked_at": "2026-04-04T14:58:12Z"
  }
]

Building and running

stack build
stack run -- config.yaml

# Or directly:
stack exec sentinel -- config.yaml

License

MIT

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages