An AI teammate for SRE and DevOps, currently focused on investigations
Investigate alerts quickly from Slack with Datadog telemetry and GitHub context.
Reili works as an AI teammate on your team, handling SRE and DevOps responsibilities.
When you assign a task to Reili, it gathers information from sources such as Datadog, GitHub, Slack, and configured knowledge bases to carry out the work. As a general rule, Reili does not make changes to the production environment or perform recovery operations; instead, it uses the gathered information to investigate issues and generate reports.
SRE, DevOps, and on-call engineers spend much of their time on alert response — checking dashboards, reading diffs, and deciding whether action is needed. Reili takes that work off your plate.
- Let you decide exactly what permissions each connector can request and what authority you delegate to Reili, instead of accepting a fixed permission set chosen by a hosted app provider
- Investigate in Slack public channels like a teammate, working from the ongoing conversation where your team is already collaborating
- Connect Datadog telemetry, GitHub repositories and changes, optional knowledge base docs, and relevant Slack public-channel history to build investigation context
- Expand over time to cover additional external services beyond Datadog, GitHub, and Slack
Core:
- Read-focused: Reili reads and reports — it never changes your infrastructure
- Cross-service: works across Datadog, GitHub, Slack, and optional knowledge base search today, with additional services planned over time
- Chat-based: currently works in Slack
- Slack App
  - Create and install it from slack-app-manifest.yml or the Create App from manifest link
  - In Slack App settings, open Agents & AI Apps and turn on Agent or Assistant so Bot Token based Slack search is available
  - Configure the required scopes, events, and Interactivity using Slack Permissions and API Usage.
- Datadog API Key + APP Key for the Datadog MCP server
- OpenAI API Key, Anthropic API Key, AWS credentials with permission to use Amazon Bedrock, or Google Cloud ADC with permission to call Vertex AI Gemini models
- GitHub App credentials for the repositories Reili investigates
  - Create and install it from Create GitHub App
  - Configure the required permissions and scope in GitHub Permissions and Scope.
- Optional esa access token with read scope when you want Reili to search esa posts.
Use config/default.toml as the checked-in template for runtime settings:
cp config/default.toml reili.toml
For a stripped-down example that relies on defaults, see config/minimum.toml.
Then edit reili.toml for your environment.
Non-secret settings live in reili.toml, including:
- server port
- conversation language
- Slack connection mode
- Slack channel table ([[channel.slack.channels]]) opting channels in to mention responses and LLM-judged auto-responses, plus optional Slack actor authorization
- selected AI backend and backend-specific non-secret settings, including optional separate backends for the lead agent and sub-agents
- Datadog site
- GitHub MCP URL, GitHub App ID, installation ID, and search scope org
- optional esa team name and access-token env var
Runtime config resolution is:
1. --config /path/to/reili.toml
2. ./reili.toml
If neither path exists, startup fails.
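The lookup order above can be sketched as a small shell function. This is only an illustration of the documented behavior, not Reili's startup code: the name resolve_config is hypothetical, and falling through to ./reili.toml when the --config path is missing is an assumption.

```shell
# Hypothetical helper mirroring the documented lookup order:
#   1. the path passed with --config
#   2. ./reili.toml in the current directory
# Prints the chosen path, or returns non-zero when neither file exists.
resolve_config() {
  explicit_path="$1"   # value given to --config; may be empty
  if [ -n "$explicit_path" ] && [ -f "$explicit_path" ]; then
    printf '%s\n' "$explicit_path"
  elif [ -f ./reili.toml ]; then
    printf '%s\n' ./reili.toml
  else
    echo "no config found: pass --config or create ./reili.toml" >&2
    return 1
  fi
}
```

Calling resolve_config "" in a directory that contains reili.toml prints ./reili.toml; in an empty directory it fails, matching the "startup fails" rule.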
Use .env.example as a starting point and copy it to .env:
cp .env.example .env
Then fill in the secret values referenced by reili.toml.
Required secrets depend on your selected Slack mode and backend:
- SLACK_BOT_TOKEN
- SLACK_APP_TOKEN when channel.slack.socket_mode = true
- SLACK_SIGNING_SECRET when channel.slack.socket_mode = false
- DATADOG_API_KEY
- DATADOG_APP_KEY
- GITHUB_APP_PRIVATE_KEY
- ESA_ACCESS_TOKEN when [connector.esa] is configured
- LLM_OPENAI_API_KEY when the selected backend uses provider = "openai"
- LLM_ANTHROPIC_API_KEY when the selected backend uses provider = "anthropic"
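As a pre-flight check, the conditions in that list can be written as a short script. This is a sketch, not something Reili ships: required_secrets and missing_secrets are hypothetical helpers, and their three arguments (socket mode, provider, whether [connector.esa] is configured) have to be supplied by hand from your reili.toml.

```shell
# Hypothetical helper: print the secret names required for a given setup.
# Usage: required_secrets <socket_mode: true|false> <provider> <esa: true|false>
required_secrets() {
  socket_mode="$1"; provider="$2"; esa="$3"
  echo SLACK_BOT_TOKEN
  if [ "$socket_mode" = true ]; then
    echo SLACK_APP_TOKEN
  else
    echo SLACK_SIGNING_SECRET
  fi
  echo DATADOG_API_KEY
  echo DATADOG_APP_KEY
  echo GITHUB_APP_PRIVATE_KEY
  if [ "$esa" = true ]; then
    echo ESA_ACCESS_TOKEN
  fi
  # Bedrock and Vertex AI use ambient cloud credentials, so they add no key here.
  case "$provider" in
    openai)    echo LLM_OPENAI_API_KEY ;;
    anthropic) echo LLM_ANTHROPIC_API_KEY ;;
  esac
  return 0
}

# Hypothetical helper: print the required names that are unset or empty.
missing_secrets() {
  for name in $(required_secrets "$@"); do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "$name"
    fi
  done
}
```

For example, after loading .env into the shell, missing_secrets true anthropic false lists any variables still to be filled in for Socket Mode with the Anthropic provider.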
The GitHub integration talks to a streamable HTTP MCP server and exposes a small allowlisted set
of raw GitHub MCP read tools to the GitHub agent. File reads are exposed through a
read_file wrapper that returns a bounded, line-numbered window (offset/limit) over the
server-side get_file_contents tool, so large files are read incrementally instead of loading the
whole file into context.
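To make the bounded window concrete, here is a rough local equivalent written with awk. It is not the read_file implementation: the function name read_window, the 1-based offset, and the tab-separated output format are all assumptions made for illustration.

```shell
# Hypothetical stand-in for the read_file window: print at most `limit`
# lines of a file, starting at 1-based line `offset`, each prefixed with
# its line number. The real wrapper's indexing and format may differ.
read_window() {
  file="$1"; offset="$2"; limit="$3"
  awk -v start="$offset" -v count="$limit" '
    NR >= start + count { exit }                 # stop once the window is full
    NR >= start { printf "%d\t%s\n", NR, $0 }    # emit "lineno<TAB>text"
  ' "$file"
}
```

Reading a large file then becomes a series of calls such as read_window main.rs 1 200, read_window main.rs 201 200, and so on, each returning a bounded slice.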
GitHub App ID, installation ID, scope org, and MCP URL are configured in reili.toml.
The Datadog integration talks to the Datadog-hosted MCP server and internally requests the
core,security,dashboards,synthetics toolsets. Reili still exposes only an allowlisted read-only
subset of those tools, including dashboard detail retrieval and Synthetic test reads when Datadog
returns them for your plan and application key permissions.
The optional esa integration is enabled only when [connector.esa] is present in reili.toml.
When configured, Reili registers the investigate_esa sub-agent. That sub-agent uses
search_posts, which calls GET /v1/teams/:team_name/posts with the esa search query syntax in
q. Omit [connector.esa] to avoid reading the esa token env var and to keep the sub-agent and
tool unregistered.
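For reference, the request behind search_posts looks roughly like the curl call below. This sketch assumes the public esa API host (api.esa.io) and bearer-token authentication; the function names, team name, and query are placeholders, and Reili's own client may set additional parameters.

```shell
# Build the posts endpoint for a team (host is assumed to be api.esa.io).
esa_posts_url() {
  printf 'https://api.esa.io/v1/teams/%s/posts\n' "$1"
}

# Hypothetical wrapper: search posts using the esa query syntax in `q`.
esa_search() {
  team="$1"; query="$2"
  # -G turns the --data-urlencode pair into a URL query string on a GET.
  curl -sS -G "$(esa_posts_url "$team")" \
    -H "Authorization: Bearer ${ESA_ACCESS_TOKEN}" \
    --data-urlencode "q=${query}"
}

# Example (placeholders):
#   esa_search my-team 'tag:runbook database'
```

Running the example by hand is a quick way to confirm that the token has read scope before enabling [connector.esa].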
SLACK_APP_TOKEN must be a Slack App-Level Token that starts with xapp-.
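A common mistake is pasting the bot token (which starts with xoxb-) into SLACK_APP_TOKEN. A quick prefix check, shown here as a hypothetical helper that is not part of Reili, can catch that before startup:

```shell
# Hypothetical check: succeed only when the value looks like a Slack
# App-Level Token, i.e. it starts with "xapp-" and has something after it.
is_app_level_token() {
  case "$1" in
    xapp-?*) return 0 ;;
    *)       return 1 ;;
  esac
}

# Example:
#   is_app_level_token "$SLACK_APP_TOKEN" || echo "SLACK_APP_TOKEN is not an xapp- token" >&2
```

The check only looks at the prefix; it does not verify that the token is valid or has the scopes Socket Mode needs.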
Reili runs a lead agent (the task runner) that delegates to per-connector sub-agents. By default both
roles use default_backend. To run sub-agents on a different model than the lead, point
ai.lead_backend and ai.sub_agent_backend at named backends in [ai.backends]. The two backends
must use the same provider; only the model differs between the roles. A common setup is a stronger
model for the lead and a cheaper, faster model for sub-agents. Either key falls back to
default_backend when omitted.
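The fallback rule can be illustrated with a few lines of shell. This is a sketch of the described behavior only: backend_for_role and its argument order are hypothetical, the backend names are placeholders, and the same-provider requirement is not modeled.

```shell
# Hypothetical resolution of the backend name used by each role.
# Usage: backend_for_role <lead|sub_agent> <default_backend> <lead_backend> <sub_agent_backend>
# Pass an empty string for a key that is omitted from reili.toml.
backend_for_role() {
  role="$1"; default_backend="$2"; lead_backend="$3"; sub_agent_backend="$4"
  case "$role" in
    lead)      printf '%s\n' "${lead_backend:-$default_backend}" ;;
    sub_agent) printf '%s\n' "${sub_agent_backend:-$default_backend}" ;;
    *)         echo "unknown role: $role" >&2; return 1 ;;
  esac
}
```

With default_backend set to fast and only ai.lead_backend set to strong, backend_for_role lead fast strong "" prints strong, while backend_for_role sub_agent fast strong "" falls back to fast.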
When the selected backend uses provider = "anthropic", Claude is called through the Anthropic
API.
- Set api_key_env = "LLM_ANTHROPIC_API_KEY" on that backend in reili.toml.
- Supported Anthropic model values are claude-opus-4-6, claude-sonnet-4-6, and claude-haiku-4-5.
- search_web uses Anthropic's web search server tool. Your Anthropic organization administrator must enable web search in Claude Console, or the tool will return a soft error payload instead of live search results.
When the selected backend uses provider = "bedrock", AWS credentials are loaded from the standard
AWS SDK chain. Set aws_profile and aws_region in reili.toml when you want to force a named
profile or region for that backend. The underlying AWS credentials still come from the normal AWS
environment or profile chain.
- Web search is currently unavailable with the Bedrock provider. If Reili issues a web search while
the selected backend uses provider = "bedrock", it returns a capability_unavailable result instead of live search results.
When the selected backend uses provider = "vertexai", Google credentials are loaded from
Application Default Credentials.
- Set project_id, location, and model_id in reili.toml.
- For Gemini on Vertex AI, location = "global" is usually the best default.
- Web search uses Vertex AI Gemini Grounding with Google Search.
- If Vertex AI returns RESOURCE_EXHAUSTED, verify your project quotas in Google Cloud Quotas (https://console.cloud.google.com/iam-admin/quotas) and adjust them if needed.
To run Reili locally with Docker, provide both .env and reili.toml:
docker run --rm \
--env-file .env \
-v "$(pwd)/reili.toml:/home/reili/reili.toml:ro" \
ghcr.io/reilidev/reili:latest
If you are using HTTP mode, publish the application port as well:
docker run --rm \
--env-file .env \
-v "$(pwd)/reili.toml:/home/reili/reili.toml:ro" \
-p 3000:3000 \
ghcr.io/reilidev/reili:latest
If you need to override discovery order explicitly, pass --config to the runtime:
docker run --rm \
--env-file .env \
-v "$(pwd)/reili.toml:/work/reili.toml:ro" \
ghcr.io/reilidev/reili:latest \
--config /work/reili.toml
For HTTP mode, Slack must be able to reach both /slack/events and /slack/interactions. In
local development, use a public tunnel such as ngrok or Cloudflare Tunnel and set both the
Slack Event Subscriptions Request URL and the Interactivity Request URL to that public HTTPS URL.
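As a sketch of the local HTTP-mode setup: start a tunnel to the published port, then derive the two Request URLs from the public base URL it gives you. The helper name slack_request_urls and the example hostname are hypothetical, and appending the endpoint paths to the base URL is an assumption based on the two routes named above.

```shell
# In a separate terminal, expose the local port (3000 matches the Docker
# example; adjust it if your reili.toml uses another server port):
#   ngrok http 3000

# Hypothetical helper: print the Event Subscriptions and Interactivity
# Request URLs for a given public HTTPS base URL.
slack_request_urls() {
  base="${1%/}"   # drop one trailing slash, if present
  printf '%s/slack/events\n' "$base"
  printf '%s/slack/interactions\n' "$base"
}

# Example:
#   slack_request_urls https://example.ngrok.app
```

The first line of output goes into Event Subscriptions and the second into Interactivity in the Slack App settings.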
Mention the bot in Slack with a task request:
@Reili Please investigate this alert.
What happens:
- It posts a task control message with a Cancel button in the thread
- It posts task progress in the thread
- It loads current thread context and any recent reusable Reili notes visible through Slack search
- It investigates across Datadog, GitHub, and configured knowledge sources
- It replies with an evidence-backed summary
If you need to stop a queued or running investigation, click Cancel on that task's control
message in the same Slack thread.
Reili is intentionally scoped around task execution and decision support. The current runtime is investigation-focused. It can post progress and final replies in Slack, but it does not get shell access, cluster access, or deployment credentials in production.
At a high level, the current runtime:
- reads from Datadog, GitHub, optional esa posts, Slack thread history, Slack public-channel search, and web lookup integrations, and writes only Slack progress and result messages
- exposes only read-only Datadog MCP tools, including dashboard detail retrieval and Synthetic test reads when Datadog returns them
- does not register tools for Datadog mutations, GitHub writes, esa writes, remediation, or deployments
- is designed to investigate and report, not to change infrastructure, Datadog state, or repository state
For the full tool inventory, required Slack scopes, Datadog RBAC permissions, GitHub backend permissions, and LLM data boundary, see docs/permissions-and-boundaries.md.
For local development setup, architecture rules, and contributor workflows, see DEVELOPERS.md.
Releases are managed with tagpr using Git tags and changelog updates; Cargo manifest versions are
not part of the release flow.
Non-goals:
- Executing operational actions like auto-remediation or auto-deploy
- Heavy stateful workflow orchestration
This project is intentionally focused on investigation-oriented task execution and decision support.