SRE as Agent

Turn Datadog alerts in Slack into automatic kagent investigations.

When Datadog posts an alert, this bridge picks it up from an allowlisted Slack channel, asks a kagent Datadog agent to investigate, and posts the findings back in the same Slack thread.

Datadog monitor -> Slack alert channel -> sre-slack-bridge
-> kagent datadog-agent -> Slack thread reply

Why this exists

On-call engineers often lose time doing the same first checks for every alert: open the monitor, inspect logs and metrics, correlate symptoms, and summarize what changed. This project automates that first investigation pass while keeping humans in control.

The design keeps Datadog, Slack, and kagent loosely coupled:

Datadog only posts to Slack.
The bridge owns Slack credentials and thread routing.
kagent owns the investigation.
The agent should stay read-only for automatically triggered alerts.

What you get

Slack-native workflow: investigations happen in the alert thread where the team is already looking.
No public Datadog webhook endpoint: Datadog does not need direct access to kagent or your cluster.
Structured alert contract: Datadog messages can include a small JSON marker so the bridge can identify the alert reliably.
Channel and sender controls: process only configured Slack channels and, when enabled, trusted Datadog sender IDs.
Local-cluster friendly: works with Kind, OrbStack, or any Kubernetes cluster where the bridge and kagent run together.

How it works

Datadog sends an alert to Slack.
The bridge receives the Slack message through Socket Mode.
The bridge checks the channel, sender, and Datadog alert marker.
The bridge calls the kagent A2A endpoint.
kagent investigates with Datadog MCP tools.
The bridge posts the final summary back to the Slack thread.

If the structured marker is missing, the bridge can still derive monitor_id and alert_id from Datadog Slack attachment links when they include link_monitor_id and link_event_id.
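
The fallback above can be sketched roughly as follows. This is a minimal illustration, not the bridge's actual code: the function name `ids_from_links` is hypothetical, and it assumes that `link_monitor_id` and `link_event_id` appear as URL query parameters and map to `monitor_id` and `alert_id` respectively.

```python
from urllib.parse import parse_qs, urlparse


def ids_from_links(urls):
    """Return (monitor_id, alert_id) found in link query strings, or (None, None)."""
    monitor_id = alert_id = None
    for url in urls:
        query = parse_qs(urlparse(url).query)
        # Assumption: link_monitor_id -> monitor_id, link_event_id -> alert_id.
        monitor_id = monitor_id or next(iter(query.get("link_monitor_id", [])), None)
        alert_id = alert_id or next(iter(query.get("link_event_id", [])), None)
    return monitor_id, alert_id
```

The first value found for each parameter wins, so a message with several attachment links still yields a single pair of IDs.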

Local kagent cluster

For local development, run kagent and the Slack bridge in the same Kubernetes cluster and namespace. The bridge should call kagent through Kubernetes DNS:

http://kagent-controller.kagent:8083

Install kagent locally with the kagent CLI:

export KAGENT_DEFAULT_MODEL_PROVIDER=openAI
export OPENAI_API_KEY="<your-openai-or-compatible-api-key>"
kagent install --profile demo

Verify the local cluster:

kubectl get pods -n kagent
kubectl get svc -n kagent kagent-controller

For this same-cluster setup, configure the bridge with:

KAGENT_BASE_URL: "http://kagent-controller.kagent:8083"
KAGENT_NAMESPACE: "kagent"
KAGENT_AGENT_NAME: "datadog-agent"
KAGENT_API_TOKEN: "unused-in-local-unsecure-mode"

Local kagent installs usually use controller.auth.mode=unsecure, so KAGENT_API_TOKEN can be any non-empty placeholder. In an authenticated kagent deployment, replace it with the token accepted by your auth proxy.

Create the Slack app

Create a Slack app from slack-app-manifest.yaml, then install it to your workspace. The manifest enables Socket Mode, creates the kagent bot user, grants the required bot scopes, and subscribes to public/private channel message events.

After creating the app:

Create an app-level token with connections:write.
Copy the bot token after installing the app.
Copy the app-level token from Basic Information > App-Level Tokens.
Set KAGENT_BOT_USER_ID to the Slack bot user ID, for example <slack-bot-user-id>. Do not use the display name kagent; Slack mentions use the Slack user ID, as in <@...>.
Install the app and invite the bot to the Datadog alert channel.

Configure the Datadog alert message

Add a structured marker to the Datadog Slack monitor message:

```json
{"source":"datadog","alert_id":"<MONITOR_ID>:{{host.name}}:{{last_triggered_at_epoch}}","monitor_id":"<MONITOR_ID>","dedupe_key":"datadog:<MONITOR_ID>:{{host.name}}:{{last_triggered_at_epoch}}"}
```

Keep the rest of the monitor message human-readable. The bridge treats the JSON marker as the machine contract and uses the alert ID to ask kagent to fetch Datadog details.
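
As an illustration of how a consumer might read this contract, the sketch below scans a message for the first embedded JSON object that looks like the marker. The name `extract_marker` and the choice to require all four keys are assumptions made for the example; the bridge's real parsing may differ.

```python
import json

# Assumption: a valid marker carries all four keys shown in the example above.
REQUIRED_KEYS = {"source", "alert_id", "monitor_id", "dedupe_key"}


def extract_marker(text):
    """Return the first JSON object in text that looks like a Datadog marker, else None."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at index `start`.
            candidate, _ = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            candidate = None
        if (
            isinstance(candidate, dict)
            and candidate.get("source") == "datadog"
            and REQUIRED_KEYS <= candidate.keys()
        ):
            return candidate
        start = text.find("{", start + 1)
    return None
```

Scanning from each opening brace lets the human-readable text surround the marker without confusing the parser.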

Do not use @kagent as the automation trigger in Datadog. Datadog interprets @... as a Datadog notification handle, not as a Slack user mention.

Datadog test notifications are useful for validating the Slack-to-kagent pipeline, but they may contain placeholder values such as host.name. Use a real triggered monitor when validating investigation quality for a specific host, service, or Kubernetes workload.

Runtime config

Copy .env.example to .env for local runs, or set these values in Kubernetes:

| Name | Description |
| --- | --- |
| `SLACK_BOT_TOKEN` | Slack bot token. |
| `SLACK_APP_TOKEN` | Slack app-level Socket Mode token. |
| `KAGENT_BOT_USER_ID` | Slack user ID for the kagent bot. The display name `kagent` will not work. |
| `ALLOWED_CHANNEL_IDS` | Comma- or space-separated Slack channel IDs to watch. |
| `TRUSTED_DATADOG_SENDER_IDS` | Optional comma- or space-separated Slack `bot_id`/`user` IDs allowed to trigger marker-based Datadog investigations. Leave empty for local testing. |
| `KAGENT_BASE_URL` | kagent controller base URL. Use `http://kagent-controller.kagent:8083` for same-cluster local setup. |
| `KAGENT_API_TOKEN` | Token sent as `Authorization: Bearer ...`. Use a dummy non-empty value for local unsecure kagent. |
| `KAGENT_NAMESPACE` | Defaults to `kagent`. |
| `KAGENT_AGENT_NAME` | Defaults to `datadog-agent`. |
| `KAGENT_USER_ID` | Defaults to `admin@kagent.dev`. Used when polling kagent session events for the final answer. |
| `KAGENT_SESSION_POLL_TIMEOUT_SECONDS` | Defaults to `90`. Maximum time to wait for kagent to write a final session event. |
| `KAGENT_SESSION_POLL_INTERVAL_SECONDS` | Defaults to `2`. Delay between session event polls. |
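
The comma- or space-separated ID settings above could be parsed and applied along these lines. The helper names `parse_ids` and `is_allowed` are hypothetical, and treating an empty trusted-sender list as "sender check disabled" is an assumption drawn from the "leave empty for local testing" note, not a confirmed description of the bridge's behavior.

```python
import re


def parse_ids(raw):
    """Split a comma- or space-separated string into a set of non-empty IDs."""
    return {part for part in re.split(r"[,\s]+", raw or "") if part}


def is_allowed(channel, sender, allowed_channels, trusted_senders):
    """Channel must be allowlisted; sender is checked only when a trusted list is set."""
    if channel not in allowed_channels:
        return False
    # Assumption: an empty trusted list disables the sender check.
    return not trusted_senders or sender in trusted_senders
```

For example, `parse_ids(os.environ.get("ALLOWED_CHANNEL_IDS", ""))` would produce the channel set to pass as `allowed_channels`.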

The bridge first calls the kagent A2A endpoint. If the immediate response contains only task metadata, the bridge polls the matching kagent session, for at most KAGENT_SESSION_POLL_TIMEOUT_SECONDS, until it finds a model text response or an ask_user question to relay back to Slack.
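
A polling loop of this kind might look like the sketch below. It is a generic illustration under stated assumptions: `fetch_events` is a caller-supplied function standing in for the kagent session lookup, and the event shape (a dict with `kind` and `text` keys) is hypothetical rather than kagent's real schema. The defaults mirror the documented 90-second timeout and 2-second interval.

```python
import time


def poll_for_answer(fetch_events, timeout=90.0, interval=2.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Call fetch_events() until an event carries an answer or the timeout elapses."""
    deadline = clock() + timeout
    while True:
        for event in fetch_events():
            # Hypothetical event shape: {"kind": ..., "text": ...}.
            if event.get("kind") in ("model_text", "ask_user") and event.get("text"):
                return event["text"]
        # Stop when another full interval would run past the deadline.
        if clock() + interval > deadline:
            return None
        sleep(interval)
```

Injecting `clock` and `sleep` keeps the loop testable without real waiting; a `None` result means no answer arrived in time.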

Run locally

cp .env.example .env
# Fill in local values in .env; never commit real tokens.
python -m venv .venv
. .venv/bin/activate
pip install -e .
sre-slack-bridge

Run tests:

just test
# or
PYTHONPATH=src python3 -m unittest discover -s tests -v

Deploy to Kubernetes

Replace placeholders in k8s/datadog-agent.yaml and k8s/slack-bridge.yaml locally.
Build and publish the image from Dockerfile.
Update the deployment image.
Apply the manifests:

kubectl apply -f k8s/datadog-agent.yaml
kubectl apply -f k8s/slack-bridge.yaml

The Makefile wraps common local commands:

make kind-platform
make build-kind-push
make restart-bridge
make local-status
make local-url
make apply-sre-agent
make apply-bridge
make local-port-forward-ui

Use make build-kind-push for local Kind/OrbStack clusters. It detects the Kind node architecture and builds the image with the matching Docker platform. If the pod shows ImagePullBackOff with the error "no match for platform in manifest", rebuild with make build-kind-push, then run make restart-bridge.

The justfile uses set dotenv-load, so recipes automatically load values from a local .env file:

REGISTRY=localhost:5001
IMAGE_NAME=sre-slack-bridge
TAG=latest
KUBE_CONTEXT=kind-kagent
KAGENT_NAMESPACE=kagent

Common just recipes:

just
just local-url
just kind-platform
just build-kind-push
just restart-bridge
just apply-all
just local-port-forward-ui
just test

Safety notes

Keep real Slack, Datadog, kagent, and LLM tokens out of git.
Keep autonomous alert investigations read-only.
Enable TRUSTED_DATADOG_SENDER_IDS outside local testing.
Review SECURITY.md before using this outside a local/dev environment.

Project status

This is a small, pragmatic bridge for experimenting with Slack-driven SRE agent workflows. If it helps you reduce alert triage toil, a star is appreciated.
