This is a mono repository for my home infrastructure and Kubernetes cluster. I try to adhere to Infrastructure as Code (IaC) and GitOps practices using tools like Kubernetes, Flux, Renovate, and GitHub Actions.
My Kubernetes cluster is deployed with Talos. This is a semi-hyper-converged cluster: workloads and block storage share the same available resources on my nodes, while a separate server with ZFS handles NFS/SMB shares, bulk file storage, and backups.
If you want to try and follow along with how I built my cluster please check out the amazing template here:
- Networking: cilium provides eBPF-based networking replacing kube-proxy, Envoy Gateway implements Gateway API for routing, cloudflared secures ingress via Cloudflare tunnels, and external-dns + external-dns-unifi-webhook keeps DNS records in sync automatically.
- Security & Secrets: cert-manager automates SSL/TLS certificates. external-secrets with 1Password Connect injects secrets into Kubernetes, and sops manages encrypted secrets in Git.
- Storage & Data Protection: rook provides distributed block storage via Ceph, volsync handles PVC backups and recovery, and spegel runs a stateless cluster-local OCI image mirror.
- Automation: actions-runner-controller runs self-hosted GitHub Actions runners in the cluster.
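As an illustration of the sops piece of the stack above, a typical `.sops.yaml` creation rule encrypts only the `data`/`stringData` fields of Kubernetes Secrets with an age key. This is a hedged sketch, not this repo's actual config: the path regex and the age public key are placeholders.

```yaml
# Hypothetical .sops.yaml creation rule (age key is a placeholder)
creation_rules:
  - path_regex: kubernetes/.*\.sops\.ya?ml
    # Only encrypt the secret payload, leaving metadata readable in Git
    encrypted_regex: "^(data|stringData)$"
    age: age1examplepublickeyplaceholder
```

With a rule like this, `sops --encrypt` produces manifests that are safe to commit, and Flux's kustomize-controller decrypts them at reconcile time.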
My cluster runs on 3x Minisforum MS-01 mini PCs (Intel i9-12900H) provisioned with Talos. The nodes utilize Thunderbolt ring networking for high-speed Ceph storage traffic.
Talos Extensions:
- `siderolabs/i915` - Intel GPU microcode binaries and drivers
- `siderolabs/intel-ucode` - Intel microcode binaries
- `siderolabs/mei` - Intel Management Engine driver kernel modules
- `siderolabs/thunderbolt` - Thunderbolt/USB4 driver kernel modules
- `siderolabs/util-linux-tools` - Linux utilities
Talos extraKernelArgs - Less security (it's a home lab, that's fine) = greater performance gains
- `intel_iommu=on` - Enables Intel VT-d hardware virtualization support
- `iommu=pt` - Passthrough mode for better device performance
- `mitigations=off` - Disables CPU vulnerability patches
- `selinux=0` - Disables the SELinux mandatory access control system
- `apparmor=0` - Disables the AppArmor mandatory access control system
- `init_on_alloc=0` - Skips zeroing memory when allocating it
- `init_on_free=0` - Skips zeroing memory when freeing it
- `security=none` - Disables all Linux Security Modules entirely
- `talos.auditd.disabled=1` - Disables the Talos audit logging daemon service
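For context, here is a minimal sketch of how these arguments land in a Talos machine config patch. This is illustrative, not my actual config: the extensions themselves are baked into the installer image (e.g. via Talos Image Factory), while the kernel arguments go under `machine.install.extraKernelArgs`.

```yaml
# Illustrative Talos machine config patch
machine:
  install:
    extraKernelArgs:
      - intel_iommu=on
      - iommu=pt
      - mitigations=off
      - selinux=0
      - apparmor=0
      - init_on_alloc=0
      - init_on_free=0
      - security=none
      - talos.auditd.disabled=1
```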
Flux watches the cluster in my kubernetes folder and makes changes based on the state of this Git repository.
The way Flux works for me here is that it recursively searches the kubernetes/apps folder until it finds the top-most kustomization.yaml in each directory, then applies all the resources listed in it. That kustomization.yaml will generally only have a namespace resource and one or more Flux Kustomizations (ks.yaml). Those Flux Kustomizations in turn apply a HelmRelease or other resources related to the application.
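A hedged sketch of one of those Flux Kustomizations (ks.yaml): the app name `echo-server` and the path are hypothetical, not copied from this repo.

```yaml
# Hypothetical ks.yaml for an app called "echo-server"
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: echo-server
  namespace: flux-system
spec:
  interval: 30m
  # Directory whose resources (HelmRelease, etc.) Flux will apply
  path: ./kubernetes/apps/default/echo-server/app
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
```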
Renovate watches my entire repository looking for dependency updates; when they are found, a PR is automatically created. When PRs are merged, Flux applies the changes to my cluster.
This is a high-level look at how Flux deploys applications with dependencies. A HelmRelease can depend on other HelmReleases, a Kustomization can depend on other Kustomizations, or an app can depend on both. Below shows that atuin won't deploy until rook-ceph-cluster is healthy.
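The dependency wiring described above can be sketched as a Flux Kustomization with `spec.dependsOn` (fields trimmed for brevity; only the dependency itself is the point here):

```yaml
# Sketch: atuin waits for rook-ceph-cluster to be ready
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: atuin
  namespace: flux-system
spec:
  dependsOn:
    - name: rook-ceph-cluster
  # ...interval, path, prune, sourceRef as in other Flux Kustomizations
```

Flux will not reconcile `atuin` until the `rook-ceph-cluster` Kustomization reports Ready, which is how storage-dependent apps avoid starting before Ceph is healthy.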
This Git repository contains the following directories:
```
📁 kubernetes
├── 📁 apps         # applications
├── 📁 bootstrap    # bootstrap procedures
├── 📁 flux         # core flux configuration
└── 📁 components   # re-useable kustomize components
📁 bootstrap
└── 📁 templates    # Makejinja templates (source files)
```
- Split-Horizon DNS: LAN clients resolve to the internal gateway (10.90.3.202), external clients go through Cloudflare
- Gateway API: Envoy Gateway provides `external` and `internal` gateways for traffic routing
- Thunderbolt Ring: Full-mesh ~26 Gbps connectivity between nodes for Ceph replication traffic
For details on setting up Thunderbolt networking, see my guide.
| Service | Use | Cost (NZD) |
|---|---|---|
| Cloudflare | DNS, Tunnel, CDN, R2 (Volsync), Domain renewal | ~$25/yr |
| Backblaze | B2 (Volsync) | ~$30/yr |
| 1Password | Secrets via Connect (Family plan with 5 seats) | ~$135/yr |
| UptimeRobot | Status monitoring | ~$80/yr |
| GitHub | Code hosting, Actions | Free |
| Pushover | Push notifications from Alertmanager and UptimeRobot | $4.99 USD one-time |
| Migadu | SMTP for services | ~$40/yr (Micro plan) |
| Node | CPU | RAM | OS Disk | Ceph Disk | OS | Purpose |
|---|---|---|---|---|---|---|
| stanton-01 | i9-12900H (14c/20t) | 96GB | 1TB Samsung 990 Pro | 1.92TB Samsung PM9A3 U.2 | Talos | Control + Worker |
| stanton-02 | i9-12900H (14c/20t) | 96GB | 1TB Samsung 990 Pro | 1.92TB Samsung PM9A3 U.2 | Talos | Control + Worker |
| stanton-03 | i9-12900H (14c/20t) | 96GB | 1TB Samsung 990 Pro | 1.92TB Samsung PM9A3 U.2 | Talos | Control + Worker |
Totals: 42 cores / 60 threads | 288GB RAM | ~5.76TB Ceph
Nodes named after the Stanton system in Star Citizen. See you in the 'verse, citizen! o7
– NZVengeance
| Device | Count | Storage | RAM | OS | Purpose |
|---|---|---|---|---|---|
| Dell PowerEdge R730 | 1 | 4x Mirror vdevs (~18TB) + NVMe cache | 128GB | Proxmox/TrueNAS | NAS/Backup |
| Unifi Dream Machine Pro | 1 | - | - | - | Router/Firewall |
| Unifi US-24-250W | 1 | - | - | - | PoE Switch |
| Unifi US-48 | 1 | - | - | - | Primary Switch |
| Unifi U6 Lite | 3 | - | - | - | WiFi APs |
| JetKVM + DC Power Module | 3 | - | - | - | Remote KVM |
| Eaton 5S 850 | 2 | - | - | - | UPS |
Thanks to all the people who donate their time in the Home Operations Discord community for all of their support. Special shout out to my friend and colleague Kevin Durbin.
Check out my blog for more homelab content.
Also check out my Guides
See my awful commit history
See LICENSE