SRE
Site reliability engineering (SRE) is a set of principles and practices that incorporates aspects of software engineering and applies them to infrastructure and operations problems. The main goals are to create scalable and highly reliable software systems. Site reliability engineering is closely related to DevOps, a set of practices that combine software development and IT operations, and SRE has also been described as a specific implementation of DevOps.
Here are 1,624 public repositories matching this topic...
Site Reliability Engineering bootcamp materials
Updated Sep 2, 2025 - Java
Podger is a lightweight graphical debugger for Kubernetes, fully faithful to kubectl. It runs as a single binary and provides a modern web interface to visualize clusters, nodes, pods, and containers, seeking a fast, visual way to debug Kubernetes clusters.
Updated Jan 10, 2026 - TypeScript
OpenAI compatible server for vLLM
Updated Oct 6, 2025
Prepares a node to become a route reflector.
Updated Nov 7, 2019 - Shell
A better way to marshal and unmarshal YAML in Golang
Updated Sep 20, 2023 - Go
Practical Bash and Python scripts for day-to-day operations: AWS/cloud tasks, network diagnostics, system health checks, security audits, backups, and reliability workflows. Built from production experience and evolving toward Platform Engineering, FinOps, and AIOps.
Updated Feb 18, 2026 - Shell
Kubernetes GitOps source of truth for ShopStack. Argo CD App-of-Apps pattern with Kustomize overlays manages deployments to a local Talos Linux cluster. CI pipeline auto-bumps image tags via PR. Part of the ShopStack SRE portfolio project.
Updated Feb 19, 2026 - Makefile
A blazing-fast, AI-powered TUI dashboard for Kubernetes and SRE. Monitor alerts, debug RBAC, and troubleshoot incidents directly in your terminal.
Updated Feb 27, 2026 - Rust
OpenClaw Agentic AI Engineering Best Practices: governance, playbooks, CI gates, and release discipline.
Updated Mar 27, 2026 - Astro
🌳 A sustainable Terraform package to manage everything on Terraform Enterprise (Terraform Cloud)
Updated Mar 13, 2026 - HCL
Automate infrastructure management with observability
Updated Feb 19, 2024 - Go
Building a resilient backend project from scratch with Dockerization and CI/CD, incorporating Site Reliability Engineering (SRE) principles. Demonstrates best practices in modular architecture, logging, database migration, and CI/CD pipelines for automated testing and deployment.
Updated Mar 23, 2024 - Go
Kubectl addon for connecting Kubernetes clusters to ranching.farm - an AI-powered Kubernetes management platform. Simplify cluster operations and get intelligent assistance for common tasks.
Updated Jan 30, 2025 - Go
Updated Nov 24, 2025 - Python
Career-prep-2025: Hands-on projects and study material in Java, Data Structures & Algorithms, Site Reliability Engineering, Cloud (AWS/Azure/GCP/Oracle), and System Design. Documenting my journey toward technical mastery and product-based company readiness.
Updated Oct 15, 2025 - Java
- Followers: 148
- Website: github.com/topics/sre
- Wikipedia