
ziwon/homelab


🏠 Homelab

K3s ArgoCD NVIDIA License: MIT

Kubernetes platform for AI/ML workloads. K3s cluster with GitOps automation.

Hardware

| Component | Spec |
|-----------|------|
| CPU | Intel Core Ultra 9 285K (8 P-cores + 16 E-cores) |
| GPU | NVIDIA RTX 5080 16GB |
| RAM | 128GB DDR5-5600 |
| Storage | XFS (/cache, /data) |

Roughly comparable to an AWS g6.8xlarge instance (~$1,780/month).

Directory Structure

homelab/
├── bootstrap/          # Phase 1: Helmfile cluster init
│   ├── helmfile.yaml
│   ├── Justfile
│   └── releases/       # argocd, cilium, gpu-operator, infisical
├── platform/           # Phase 2: Argo CD GitOps
│   ├── gitops/         # ApplicationSet templates
│   └── stacks/         # Kustomize overlays (00-core to 06-labs)
└── docs/               # Architecture docs

Architecture

Two-phase deployment:

Phase 1: Bootstrap (Helmfile)
├── K3s cluster
├── Cilium eBPF CNI
├── GPU Operator
└── Infisical secrets

Phase 2: Platform (Argo CD)
├── 00-core      # cert-manager, cloudflared, tailscale
├── 01-platforms # argo-workflows, harbor, buildkit
├── 02-o11y      # grafana, tempo, quickwit
├── 03-data      # postgres, redis, clickhouse, redpanda
├── 04-ml        # feast, mlflow, ray, qdrant
├── 05-workloads # deepfx, mt5-trader
└── 06-labs      # jupyterhub, n8n, superset

Stack

| Layer | Tech |
|-------|------|
| Cluster | K3s |
| Network | Cilium eBPF + Gateway API |
| GitOps | Argo CD (App-of-Apps) |
| Secrets | Infisical Operator |
| GPU | NVIDIA MPS + Ray |
| Observability | VictoriaMetrics, Tempo, Pyroscope, Quickwit, Grafana |
| Data | PostgreSQL, Redis, ClickHouse, Redpanda |
| ML | Feast, MLflow, Ray, Qdrant |

GPU Partitioning

NVIDIA MPS (Multi-Process Service) splits the RTX 5080 into 16 logical GPU replicas with about 1GB of memory each. Ray schedules workloads across the CPU's hybrid cores:

  • P-cores (0-7): GPU tasks, training, inference
  • E-cores (8-23): Control plane, scheduling

See GPU Partitioning.
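The arithmetic behind the split above, as a minimal sketch: an even MPS split gives each replica 16 GB / 16 = 1 GB and 100 / 16 = 6.25% of the GPU's compute (nominal, ignoring driver overhead). The sharing config mirrors the MPS format documented for NVIDIA's k8s-device-plugin; the role names and the `pin_current_process` helper are hypothetical, not this repo's code.

```python
import os
from dataclasses import dataclass

# Figures from the README: 16 GB GPU, 16 MPS replicas, P-cores 0-7, E-cores 8-23.
GPU_MEMORY_GB = 16
MPS_REPLICAS = 16
P_CORES = frozenset(range(0, 8))
E_CORES = frozenset(range(8, 24))


@dataclass(frozen=True)
class ReplicaShare:
    memory_gb: float
    thread_percentage: float  # nominal share of GPU compute per replica


def mps_share(total_memory_gb: float, replicas: int) -> ReplicaShare:
    """Even split of memory and compute across MPS replicas."""
    if replicas < 1:
        raise ValueError("replicas must be >= 1")
    return ReplicaShare(total_memory_gb / replicas, 100.0 / replicas)


def device_plugin_sharing(replicas: int) -> dict:
    """MPS sharing config, shaped like the k8s-device-plugin config file."""
    return {
        "version": "v1",
        "sharing": {"mps": {"resources": [{"name": "nvidia.com/gpu", "replicas": replicas}]}},
    }


# Hypothetical role names; the README only says which work goes to which cores.
ROLE_CPUS = {
    "gpu": P_CORES,
    "training": P_CORES,
    "inference": P_CORES,
    "control-plane": E_CORES,
    "scheduling": E_CORES,
}


def cpus_for(role: str) -> frozenset:
    return ROLE_CPUS[role]


def pin_current_process(role: str) -> None:
    """Linux-only: restrict the calling process to the CPUs for `role`."""
    os.sched_setaffinity(0, cpus_for(role))
```

With this sharing config, a pod requesting `nvidia.com/gpu: 1` receives one of the 16 replicas rather than the whole card.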

Observability

Logs:    Vector → Redpanda → Quickwit → Grafana
Metrics: Prometheus → VictoriaMetrics → Grafana
Traces:  OTel SDK → Alloy → Tempo → Grafana
APM:     Pyroscope (service map, trace-log correlation, error analysis)
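Trace-log correlation only works if log lines carry the same trace ID as the spans. A minimal stdlib sketch, assuming services write one JSON object per line to stdout for Vector to collect; the field names (`ts`, `level`, `trace_id`, …) are hypothetical, not a schema this repo defines.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON line so a log shipper can parse it."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname.lower(),
            "logger": record.name,
            "msg": record.getMessage(),
            # Hypothetical field: lets Grafana link a log line to its Tempo trace.
            "trace_id": getattr(record, "trace_id", None),
        }
        return json.dumps(payload, ensure_ascii=False)


def make_logger(name: str, stream=sys.stdout) -> logging.Logger:
    """Logger that writes JSON lines to `stream` and nowhere else."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.handlers[:] = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

Usage: `make_logger("trader").info("order filled", extra={"trace_id": span_trace_id})`, where `span_trace_id` would come from the active OTel span.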

Quick Start

Prerequisites

Bootstrap

cd bootstrap
just up
just nvidia-smi
just argocd-password

export KUBECONFIG=$HOME/.kube/config.home
kubectl get pods -A
open https://argocd.home.lab
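`kubectl get pods -A` leaves it to the reader to spot problems. A small sketch that runs the same query with `-o json` against the kubeconfig above and lists pods not in the `Running` or `Succeeded` phase; it is illustrative, not part of the repo, and note that `Running` does not guarantee the containers are ready.

```python
import json
import os
import subprocess

HEALTHY_PHASES = {"Running", "Succeeded"}


def unhealthy_pods(pod_list: dict) -> list:
    """Return 'namespace/name (phase)' for pods outside HEALTHY_PHASES."""
    bad = []
    for pod in pod_list.get("items", []):
        phase = pod.get("status", {}).get("phase", "Unknown")
        if phase not in HEALTHY_PHASES:
            meta = pod["metadata"]
            bad.append(f"{meta['namespace']}/{meta['name']} ({phase})")
    return bad


def fetch_pods(kubeconfig: str) -> dict:
    """Same as `kubectl get pods -A -o json`, using an explicit kubeconfig."""
    out = subprocess.run(
        ["kubectl", "--kubeconfig", kubeconfig, "get", "pods", "-A", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return json.loads(out)


if __name__ == "__main__":
    config = os.path.expanduser("~/.kube/config.home")
    problems = unhealthy_pods(fetch_pods(config))
    print("\n".join(problems) or "all pods Running or Succeeded")
```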

Networking

| Domain | Method |
|--------|--------|
| *.home.lab | Tailscale + CoreDNS |
| *.restack.tech | Cloudflare Tunnel |
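The table amounts to a suffix rule: `*.home.lab` names resolve through Tailscale and CoreDNS, while `*.restack.tech` names go through the Cloudflare Tunnel. A minimal sketch of that rule; the example hostnames are hypothetical and the function only mirrors the table, it does not query DNS.

```python
from typing import Optional

# Domain suffix -> access path, taken from the Networking table.
ROUTES = {
    ".home.lab": "Tailscale + CoreDNS",
    ".restack.tech": "Cloudflare Tunnel",
}


def access_method(hostname: str) -> Optional[str]:
    """Return the access path for a hostname, or None if no wildcard matches."""
    host = hostname.lower().rstrip(".")  # tolerate case and a trailing root dot
    for suffix, method in ROUTES.items():
        # A wildcard needs at least one label before the suffix.
        if host.endswith(suffix) and len(host) > len(suffix):
            return method
    return None
```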

AI Operations

Self-hosted GitHub Actions runners (Actions Runner Controller, ARC) with Claude integration handle:

  • Issue analysis and PR generation
  • Code review for security and performance
  • Manifest validation

Dynamic Skill Loading

Skills are loaded on-demand based on issue content keywords:

| Skill | Trigger keywords |
|-------|------------------|
| argocd-generator | deploy, helm, chart, 배포 (deploy) |
| troubleshoot | error, crash, pending, 에러 (error) |
| kubernetes-review | review, validate, yaml, 리뷰 (review) |
| infisical-manager | secret, env, 시크릿 (secret) |

At most two matching skills are loaded per request, which keeps prompts short and reduces token usage.
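The README gives the keywords and the two-skill cap but not the matching rule. This sketch assumes case-insensitive substring matching, ranks skills by keyword hits, and breaks ties by table order; `select_skills` is a hypothetical name, not the workflow's actual code.

```python
# Skill -> trigger keywords from the table above (Korean keywords kept verbatim).
SKILLS = {
    "argocd-generator": ["deploy", "helm", "chart", "배포"],
    "troubleshoot": ["error", "crash", "pending", "에러"],
    "kubernetes-review": ["review", "validate", "yaml", "리뷰"],
    "infisical-manager": ["secret", "env", "시크릿"],
}
MAX_SKILLS = 2


def select_skills(issue_text: str, skills=SKILLS, limit: int = MAX_SKILLS) -> list:
    """Pick up to `limit` skills whose keywords appear in the issue text."""
    text = issue_text.lower()
    scored = []
    for order, (name, keywords) in enumerate(skills.items()):
        hits = sum(1 for keyword in keywords if keyword in text)
        if hits:
            # Most hits first; equal hits keep the table's order.
            scored.append((-hits, order, name))
    return [name for _, _, name in sorted(scored)[:limit]]
```

Substring matching is deliberately loose (`deploy` also matches `deployment`); a stricter version could tokenize the text first.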

Multi-LLM Routing

  • Local (Qwen): Privacy-sensitive tasks
  • Cloud (Claude, Gemini): Complex reasoning
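A minimal routing sketch under stated assumptions: each task carries two hypothetical flags, privacy always wins (a sensitive prompt stays local even if it is complex), and anything neither sensitive nor complex defaults to the local model. The route labels are illustrative; the README does not specify how tasks are classified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    prompt: str
    privacy_sensitive: bool  # hypothetical flag, e.g. prompt includes internal data
    needs_deep_reasoning: bool  # hypothetical flag for complex multi-step work


def route(task: Task, cloud_model: str = "claude") -> str:
    """Return a model label: local Qwen or a cloud model (Claude or Gemini)."""
    if task.privacy_sensitive:
        return "local:qwen"  # sensitive prompts never leave the cluster
    if task.needs_deep_reasoning:
        return f"cloud:{cloud_model}"
    return "local:qwen"  # assumed default for simple, non-sensitive tasks
```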

Docs

License

MIT

About

A modular Kubernetes homelab that evolves from GitOps automation to AI-driven orchestration
