# Cloud OS v5: Autonomous CQRS + Event-Driven Cloud Operating System

## Architecture

Event-driven CQRS system with Kafka as the single source of truth, AI-powered SRE automation, multi-region active-active deployment, and continuous chaos validation.

## Quick Start

### Local Development

```bash
npm install
docker compose up
```

### Production Deployment

```bash
cd infra/terraform
terraform init
terraform apply
aws eks update-kubeconfig --name cloud-os-v5
helm upgrade --install cloud-os ./helm
```

## Core Modules

| Service | Description | Port |
|---------|-------------|------|
| ingestion-service | Event validation + Kafka producer | 4000 |
| cqrs-query-service | Read API (<10ms kernel) | 4001 |
| replay-workers | Domain-specific state projection | n/a |
| snapshot-engine | State compaction + checksums | n/a |
| ai-sre-controller | Autonomous SRE decision engine | 4002 |
| chaos-engine | Failure injection system | n/a |

## Architecture Flow

```
Client → Ingestion → Kafka → Replay Workers → Snapshot → CQRS Query → Response
                                   ↑
                           AI SRE Controller
                                   ↓
                        GitOps / K8s / Terraform
```

## SLO Targets

- 99.9% ingestion success rate
- <200ms CQRS query latency (p95)
- <1s snapshot lag
- <0.1% data loss tolerance
- Zero silent routing mismatch
- <5s auto rollback on SLO breach

## SRE Capabilities

- Automatic rollback on error rate spike (>2%)
- Auto-scaling based on Kafka lag
- Snapshot integrity validation (SHA-256)
- Worker crash recovery via checkpoints
- Scheduled chaos testing (every 6 hours)
- Full system rebuild from Kafka

## Topics

See infra/kafka/topic-config.ts for full topic topology.

## License

Proprietary - Kairon Polymarket Intelligence Dashboard
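
## Appendix: SRE Rule Sketch

Two of the rules listed under SRE Capabilities (automatic rollback when the error rate exceeds 2%, and auto-scaling based on Kafka lag) can be illustrated with a minimal TypeScript sketch. This is not the ai-sre-controller implementation. The names `WindowStats`, `shouldRollback`, and `desiredReplicas` are hypothetical. The scaling policy (one replica per fixed amount of consumer lag, clamped to a range) is an assumption made for illustration. Only the 2% threshold comes from this README.

```typescript
// Hypothetical types and function names; not taken from the ai-sre-controller source.
interface WindowStats {
  totalRequests: number;  // requests observed in the evaluation window
  failedRequests: number; // requests that ended in an error
}

// From the SRE Capabilities list: roll back when the error rate exceeds 2%.
const ERROR_RATE_THRESHOLD = 0.02;

// Fraction of failed requests; an empty window counts as a 0% error rate.
function errorRate(stats: WindowStats): number {
  if (stats.totalRequests <= 0) return 0;
  return stats.failedRequests / stats.totalRequests;
}

// True only when the error rate is strictly above the threshold.
function shouldRollback(
  stats: WindowStats,
  threshold: number = ERROR_RATE_THRESHOLD
): boolean {
  return errorRate(stats) > threshold;
}

// Assumed policy: one replica per `lagPerReplica` messages of total consumer
// lag, clamped to [minReplicas, maxReplicas].
function desiredReplicas(
  totalLag: number,
  lagPerReplica: number,
  minReplicas: number,
  maxReplicas: number
): number {
  if (lagPerReplica <= 0) {
    throw new Error("lagPerReplica must be positive");
  }
  const needed = Math.ceil(Math.max(totalLag, 0) / lagPerReplica);
  return Math.min(maxReplicas, Math.max(minReplicas, needed));
}

// Example usage with made-up numbers.
console.log(shouldRollback({ totalRequests: 1000, failedRequests: 21 })); // true
console.log(desiredReplicas(2500, 1000, 1, 10)); // 3
```

Keeping both rules as pure functions means they can be unit-tested without a Kafka cluster or a Kubernetes API. A real controller would additionally need to collect the metrics and apply the resulting decision.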