Build your own AI SRE agents. The open source toolkit for the AI era.
Updated Jun 12, 2026 - Python
AI-powered SRE platform for automated incident investigation
AURA is an agentic harness that turns an LLM into a reliable, autonomous service capable of executing real SRE work. AURA provides the guardrails, API servers, state management, authentication, streaming, error handling, and tool integrations necessary to run AI SRE agents safely in production.
Open-source AI SRE agent that investigates production incidents using episodic memory and a Neo4j knowledge graph. 46 production skills. Self-hosted.
AI SRE tools for RCA, Incident Response, Cost-Saving, Infra management, DevOps and more
Open-source on-call management: alerts, incidents, and AI post-mortems. Self-hosted alternative to PagerDuty & incident.io. BYO-AI. Works with Prometheus, Grafana, Datadog, Slack, and Teams.
Unpage is the open source framework for building SRE agents with infrastructure context and secure access to any dev tool.
Multi-strategy RAG system achieving 74% Recall@10 on MultiHop-RAG. Combines RAPTOR hierarchical retrieval, knowledge graphs, HyDE, BM25, and Cohere neural reranking.
A curated list of 100+ AI-powered tools, platforms, and resources for Site Reliability Engineering (SRE) — agents, incident management, observability, AIOps, chaos engineering, and more.
An open-source AI agent for infrastructure debugging.
Multi-agent system for end-to-end Kubernetes deployment automation (generation → deployment → monitoring).
Synthetic production incidents and RCA evaluation for AI SRE agents.
Open-source, self-hostable AI production-debugging agent for backend teams.
Homebrew formulae that allow installing Tracer tools through the Homebrew package manager.
Remote MCP server for mttrly — AI SRE agent for server health, diagnostics, playbooks, approvals, and audit history. OAuth 2.1 over Streamable HTTP.
Build a vector database from scratch in C++. Compare HNSW, KD-Tree, and Brute Force search algorithms with a local RAG pipeline and web visualization.
Curate and explore a comprehensive list of AI-driven tools and resources tailored for Site Reliability Engineering tasks and challenges.