SRE
Site reliability engineering (SRE) is a set of principles and practices that incorporates aspects of software engineering and applies them to infrastructure and operations problems. The main goal is to create scalable and highly reliable software systems. Site reliability engineering is closely related to DevOps, a set of practices that combines software development and IT operations, and SRE has also been described as a specific implementation of DevOps.
Here are 37 public repositories matching this topic...
Open source AI terminal and SSH Client for EC2, Database and Kubernetes.
Updated Nov 12, 2025 - TypeScript
🔥 GitHub Action to trigger alerts in incident.io.
Updated Nov 11, 2025 - TypeScript
Everything you need to build, deploy, and collaborate with agents. Ride the llama, avoid the drama.
Updated Nov 11, 2025 - TypeScript
GitHub Action for zero-maintenance CPython patch updates across your repo.
Updated Nov 10, 2025 - TypeScript
A command-line tool that automates the calculation of out-of-hours (OOH) on-call compensation for engineering teams by fetching schedule data from the PagerDuty API.
Updated Nov 9, 2025 - TypeScript
A Prometheus exporter exposing metrics for KafkaJS.
Updated Nov 12, 2025 - TypeScript
A Prometheus exporter for node-postgres.
Updated Nov 12, 2025 - TypeScript
A Prometheus exporter for pg-promise.
Updated Nov 12, 2025 - TypeScript
Easily generate Prometheus Alerts and SLO Rules
Updated Nov 7, 2025 - TypeScript
New Relic One quickstarts help accelerate your New Relic journey by providing immediate value for your specific use cases.
Updated Nov 6, 2025 - TypeScript
InfraGPT is an AI SRE Copilot for the Cloud that provides infrastructure management agents through Slack integration. The system consists of multiple services that work together to deliver intelligent DevOps workflows.
Updated Oct 28, 2025 - TypeScript
A Prometheus exporter exposing metrics for the official MongoDB Node.js driver.
Updated Nov 12, 2025 - TypeScript
Rubixkube AI - Site Reliability Intelligence platform with AI agents that detect, diagnose, and heal infrastructure issues automatically. Built with Next.js 15, featuring autonomous incident response, real-time monitoring, and human-in-the-loop guardrails for Kubernetes and cloud environments.
Updated Oct 23, 2025 - TypeScript
End-to-end predictive reliability platform with anomaly detection, auto-remediation, and comprehensive observability for microservices
Updated Oct 18, 2025 - TypeScript
Updated Aug 18, 2025 - TypeScript
142 followers
Website: github.com/topics/sre
Wikipedia