A comprehensive, production-ready architecture for transforming MobilityCorp's multi-modal transportation platform with AI-enabled intelligence
- Executive Summary
- Meet the Personas
- Problem Statement
- Solution Overview
- Architecture Highlights
- Repository Structure
- Key Architectural Decisions
- Technology Stack
- Documentation
- Transparency & Best Practices
🎭 Meet the Personas
Understanding the humans behind the architecture
Our solution addresses real problems for real people. Meet the stakeholders who guide our decisions:
- Sarah Chen (CPO): Needs to increase customer retention from 20% to 55% within 18 months
- Marcus Weber (VP Fleet Operations): Must reduce battery swap costs by 50-60% and improve fleet utilization
- David Park (CTO/CISO): Requires 99.95% uptime with cost-effective scaling and zero-trust security compliance
- Emma Thompson (Commuter): Wants reliable scooter availability for her daily 8:15 AM commute
- Alex Kumar (Tourist): Needs conversational AI to discover vehicles and explore Barcelona
- Lisa Müller (Family User): Requires quality assurance—clean vehicles, reliable battery range
- Nina Petersen (Support Agent): Wants automated dispute resolution with photo evidence
These personas inform every architectural decision, ensuring we solve business problems, not just technical ones.
See personas in action:
- Emma's predictive rebalancing journey
- Alex's conversational AI experience
- Marcus's AI-optimized morning routine
- Lisa's quality assurance workflow
- David's security incident response
MobilityCorp operates a multi-modal last-mile transportation platform across EU cities, providing electric scooters, eBikes, cars, and vans. This architecture addresses three critical business challenges:
- Vehicle Availability Crisis - 15-25% of potential bookings lost due to demand-supply mismatch
- Battery Management Inefficiency - Reactive maintenance and manual relocation operations create significant operational overhead
- Low Customer Retention - Most users rely on ad-hoc trips rather than regular commutes
An AI-enabled microservices architecture featuring:
- 🤖 Predictive Demand Forecasting - ML-powered zone-level demand prediction
- ⚡ Dynamic Pricing & Relocation Incentives - AI-driven fleet rebalancing
- 🔋 Predictive Maintenance - Proactive battery and component failure detection
- 🎨 Computer Vision - Automated damage detection and return verification
- 💬 Conversational AI Assistant - Multi-modal user support and guidance
- 📊 Real-time Telemetry - Edge-cloud hybrid processing for 50K+ vehicles
Expected business impact:
- Revenue: +20-30% from improved vehicle availability
- Costs: 40-50% reduction in manual relocation expenses
- Retention: 35% increase in daily active users
- Uptime: 99.95% system availability with multi-region deployment
MobilityCorp operates across multiple EU cities with a fixed fleet size per country. The company faces critical operational and customer experience challenges that threaten growth and competitive positioning.
Vehicle Availability Crisis:
- Impact: 15-25% of potential bookings unfulfilled
- Root Cause: Vehicles in the wrong locations at the wrong times
- Cost: Lost revenue and poor customer experience, leading to churn
Battery Management Inefficiency:
- Impact: Manual battery swaps cost approximately €10-15 each
- Root Cause: Reactive rather than predictive maintenance
- Cost: High operational overhead and vehicle downtime
Low Customer Retention:
- Impact: More than 80% of users are ad-hoc riders, not daily commuters
- Root Cause: Unreliable service reduces trust in the platform
- Cost: Low Customer Lifetime Value (CLV)
Detailed problem analysis: PROBLEM_STATEMENTS/PROBLEM_STATEMENT.md
┌─────────────────────────────────────────────────────────────────┐
│ AI-ENABLED PLATFORM │
├─────────────────────────────────────────────────────────────────┤
│ │
│ 🔮 Demand Forecasting 🎯 Dynamic Pricing │
│ Zone-level predictions Real-time optimization │
│ Multi-factor analysis Relocation incentives │
│ │
│ 🔧 Predictive Maintenance 🤖 Conversational AI │
│ Component failure Multi-modal assistant │
│ Battery health Voice/text interface │
│ │
│ 📸 Computer Vision 📊 Real-time Analytics │
│ Damage detection Fleet monitoring │
│ Return verification Performance metrics │
│ │
└─────────────────────────────────────────────────────────────────┘
- Microservices Architecture - Independently scalable, fault-isolated services
- Event-Driven Communication - Kafka-based asynchronous messaging
- Edge-Cloud Hybrid - Low-latency processing where needed
- MLOps Pipeline - Automated model lifecycle management
- Multi-Region Deployment - Geographic redundancy and data residency
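As a rough illustration of the event-driven bullet above, the sketch below shows publish/subscribe with a typed event envelope. It is an in-process stand-in and uses no Kafka client: the topic name `vehicle.telemetry`, the `DomainEvent` fields, the `InMemoryBus` class, and the 20% battery threshold are all hypothetical, chosen only to show how producers and consumers stay decoupled. A real broker would also deliver asynchronously, which this sketch does not model.

```python
import json
from collections import defaultdict
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Callable, DefaultDict, List


@dataclass(frozen=True)
class DomainEvent:
    """Hypothetical event envelope; field names are illustrative only."""
    event_type: str
    vehicle_id: str
    payload: dict
    occurred_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_json(self) -> str:
        # Serialized form that would be written to a topic.
        return json.dumps(asdict(self), sort_keys=True)


Handler = Callable[[DomainEvent], None]


class InMemoryBus:
    """Stand-in for a broker: producers and consumers share only a topic name."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._handlers[topic].append(handler)

    def publish(self, topic: str, event: DomainEvent) -> int:
        # Deliver to every subscriber; the producer never names a consumer.
        handlers = self._handlers[topic]
        for handler in handlers:
            handler(event)
        return len(handlers)


bus = InMemoryBus()
low_battery_vehicles: List[str] = []


def flag_low_battery(event: DomainEvent) -> None:
    # Assumed threshold of 20% battery, for illustration only.
    if event.payload.get("battery_pct", 100) < 20:
        low_battery_vehicles.append(event.vehicle_id)


bus.subscribe("vehicle.telemetry", flag_low_battery)
event = DomainEvent("TelemetryReceived", "scooter-001", {"battery_pct": 12})
delivered = bus.publish("vehicle.telemetry", event)
```

Adding a second consumer (for example, a maintenance service) would only need another `subscribe` call; the publishing code stays unchanged, which is the loose coupling the architecture aims for.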
System Context (C1): High-level view of system interactions with users, vehicles, and external services.
Container (C2): Microservices, databases, message queues, and AI/ML components.
Component, AI/ML (C3): Deep dive into machine learning pipelines and model serving.
- ✅ Domain-Driven Design with bounded contexts
- ✅ CQRS for read/write optimization
- ✅ Event Sourcing for audit trails
- ✅ Circuit Breakers for fault tolerance
- ✅ API Gateway for unified access
- ✅ Service Mesh for observability
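Of the patterns listed above, the circuit breaker is the easiest to show in a few lines. The sketch below is a minimal, single-threaded version under stated assumptions: the class name, the thresholds, and the `flaky` downstream call are hypothetical, and a production service would more likely rely on a service mesh or an established resilience library than on hand-written code.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 3,
        reset_timeout_s: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self._clock = clock  # injectable so the demo needs no real waiting
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout_s:
            return "half-open"  # timeout elapsed: allow one trial call
        return "open"

    def call(self, func: Callable[[], T]) -> T:
        if self.state == "open":
            raise CircuitOpenError("downstream call skipped")
        try:
            result = func()
        except Exception:
            self._failures += 1
            # Trip at the threshold; a failed half-open trial re-opens at once
            # because the failure count was never reset.
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Success closes the breaker and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result


# Demo with a fake clock so the timeout can be simulated.
now = [0.0]
breaker = CircuitBreaker(failure_threshold=2, reset_timeout_s=10.0,
                         clock=lambda: now[0])


def flaky() -> str:
    raise ConnectionError("pricing service unavailable")  # hypothetical


for _ in range(2):
    try:
        breaker.call(flaky)
    except ConnectionError:
        pass

state_after_failures = breaker.state    # breaker has tripped
now[0] = 11.0                           # simulate the reset timeout passing
state_after_timeout = breaker.state     # one trial call is now allowed
recovered = breaker.call(lambda: "ok")  # trial succeeds, breaker closes
```

Failing fast while the breaker is open keeps one unhealthy dependency from tying up callers, which is the fault-isolation goal stated for the microservices design.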
kata-na/
├── README.md # This file - Start here
├── PERSONAS.md # 🆕 Stakeholder personas (Sarah, Marcus, Emma...)
├── EVENT_STORMING.md # 🆕 Domain events & bounded contexts
├── PHASED_IMPLEMENTATION.md # 🆕 Migration plan with ROI & timelines
├── GLOSSARY.md # Domain terminology
├── COST_ANALYSIS.md # Infrastructure cost estimates (percentage-based)
│
├── ADR/ # Architecture Decision Records (16)
│ ├── ADR_01_microservices_architecture.md
│ ├── ADR_02_AI_DRIVEN_RELOCATION_INCENTIVES.md
│ ├── ADR_03_Vehicle_Telemetry.md
│ ├── ADR_04_EXTERNAL_APIS.md
│ ├── ADR_05_Orchestrator.md
│ ├── ADR_06_EVENT_DRIVEN_ARCHITECTURE.md
│ ├── ADR_07_Tracing_and_Logging.md
│ ├── ADR_08_SCHEDULER_FRAMEWORK.md
│ ├── ADR_09_MULTI_REGION.md
│ ├── ADR_10_Monitoring_and_Metrics.md
│ ├── ADR_11_IoT_Enabled_Vehicles.md
│ ├── ADR_12_CONVERSATIONAL_UX_AND_AI_ASSISTANT.md
│ ├── ADR_13_DATA_COMPLIANT.md
│ ├── ADR_14_MLOps_Pipeline.md
│ ├── ADR_15_Data_Lakehouse_Strategy.md
│ └── ADR_16_NOTIFICATION_SERVICE.md
│
├── ARCHITECTURAL_DIAGRAMS/ # Visual architecture (C4 model)
│ ├── C1_System_Context.md
│ ├── C2_Container.md
│ ├── C3_Component_AIML.md
│ ├── Data_Flow_AIPipeline.md
│ ├── Deployment_Multi_Region.md
│ └── Hybrid_Telemetry_Architecture.md
│
├── HLD/ # High-Level Design
│ ├── README.md
│ ├── scenarios/
│ │ ├── booking_workflow.md
│ │ ├── demand_forecasting.md
│ │ ├── dynamic_pricing.md
│ │ ├── predictive_maintenance.md
│ │ └── telemetry_processing.md
│ └── data_architecture/
│     ├── medallion_layers.md
│     └── feature_store.md
│
├── FUNCTIONAL_REQUIREMENTS/
│ └── FUNCTIONAL_REQUIREMENTS.md
│
├── NON_FUNCTIONAL_REQUIREMENTS/
│ └── NON_FUNCTIONAL_REQUIREMENTS.md
│
├── PROBLEM_STATEMENTS/
│ └── PROBLEM_STATEMENT.md
│
├── WORKFLOWS/
│ ├── CUSTOMER_WORKFLOWS.md
│ └── STAFF_WORKFLOWS.md
│
├── FITNESS_FUNCTIONS/
│ └── FITNESS_FUNCTIONS.md # Architectural metrics
│
├── THREAT_MODEL/
│ └── THREAT_MODEL.md # Security analysis
│
└── TESTING_APPROACHES/
    └── TESTING_APPROACHES.md
- Understand the Problem: Start with the Problem Statement to understand the business context.
- Meet the People: Read PERSONAS.md to understand who we're building for (Sarah, Marcus, Emma, etc.).
- Review Business Value
  - PHASED_IMPLEMENTATION.md - €30.5M NPV, 8.7-month payback
  - Focus on Financial Summary and Persona Success Stories
- Review Architecture Decisions: Read the key ADRs in order.
- Explore Architecture
  - High-Level: System Context Diagram
  - Detailed: Container Architecture
  - AI/ML: Component Details
- Deep Dive into Scenarios
- Understand AI Usage
  - ADR-02: Dynamic Pricing - ML models and demand forecasting
  - ADR-14: MLOps Pipeline - Training and deployment
- Review Domain Modeling
  - EVENT_STORMING.md - Domain events and bounded contexts
- Review Non-Functional Aspects
  - Security Threats - Comprehensive threat analysis
  - Fitness Functions - Success metrics
  - Testing Strategy
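- AWS as cloud provider. Why: Best ML services (SageMaker), global infrastructure, cost-effective for EU deployment
- Microservices architecture (ADR-01). Why: Independent scaling, fault isolation, technology flexibility
- AI-driven relocation incentives (ADR-02). Why: Reduce manual relocation costs by 40-50%, improve fleet utilization
- Event-driven architecture (ADR-06). Why: Loose coupling, scalability, real-time processing
- Edge-cloud hybrid processing. Why: Low latency for critical operations, cost optimization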
All ADRs: ADR Directory
- Provider: AWS (Amazon Web Services)
- Regions: EU-Central-1 (Frankfurt), EU-West-1 (Ireland)
- Justification: Best ML services, GDPR compliance, cost-effective
- Training: AWS SageMaker
- Serving: SageMaker Endpoints
- Feature Store: SageMaker Feature Store
- MLOps: SageMaker Pipelines
- Models: LightGBM, XGBoost (custom training)
- Agentic AI: LangChain on Amazon Bedrock (Claude 3.5)
- Streaming: Apache Kafka on Amazon MSK (Managed Streaming for Apache Kafka)
- Data Lake: Amazon S3 with Medallion Architecture
- Bronze: Raw data (Parquet)
- Silver: Cleaned/validated (Delta Lake)
- Gold: Business aggregations (Delta Lake)
- Data Warehouse: S3 + ETL
- Real-time DB: DynamoDB
- Relational DB: Amazon RDS for PostgreSQL
- Cache: Amazon ElastiCache (Redis)
- Time-Series: TimescaleDB
- Containers: Amazon EKS
- API Gateway: Amazon API Gateway
- Workflow: Apache Airflow + Apache Beam + Temporal
- IoT Core: AWS IoT Core (MQTT)
- Edge Runtime: AWS IoT runtime
- Edge ML: AWS IoT ML Inference
- Device Management: AWS IoT Device Management
- Metrics: VictoriaMetrics + Prometheus + CloudWatch
- Tracing: OpenTelemetry
- Logging: CloudWatch Logs + OpenSearch
- Monitoring: Grafana
- Alerting: PagerDuty
- Identity: AWS IAM (zero-trust) + AWS SSO (MFA)
- Secrets: AWS Secrets Manager (auto-rotation)
- Encryption: AWS KMS (AES-256 at rest), TLS 1.3 (in transit)
- Network: VPC, Security Groups, AWS WAF, Shield
- Threat Detection: AWS GuardDuty, Security Hub, Macie
- IoT Security: X.509 certificates, mTLS, IoT Device Defender
- Compliance: AWS Config, CloudTrail (GDPR, PCI-DSS, ISO 27001)
- Mobile: React Native (iOS/Android)
- Web: Next.js + React
- Maps: Mapbox GL JS
- Source Control: GitHub
- CI/CD: GitHub Actions
- Infrastructure: Terraform
- Configuration: AWS Systems Manager Parameter Store
- ADRs (16): ADR Directory - All architectural decisions with rationale
- Diagrams: ARCHITECTURAL_DIAGRAMS - C4 model diagrams
- HLD: HLD Directory - Detailed scenario walkthroughs
- Glossary: GLOSSARY.md - Domain terminology
- Functional: FUNCTIONAL_REQUIREMENTS.md
- Non-Functional: NON_FUNCTIONAL_REQUIREMENTS.md
- Problem: PROBLEM_STATEMENT.md
- Security: THREAT_MODEL.md - Comprehensive threat analysis
- Testing: TESTING_APPROACHES.md
- Metrics: FITNESS_FUNCTIONS.md
- Deployment: PHASED_IMPLEMENTATION.md - Migration & feature rollout strategy
- Cost: COST_ANALYSIS.md - TCO estimates
- Customer: CUSTOMER_WORKFLOWS.md
- Staff: STAFF_WORKFLOWS.md
- Read Problem Statement
- Review System Context
- Study key ADRs (01, 14, 15, 16)
- Explore HLD Scenarios
- Understand Container Architecture
- Review Technology Stack
- Read service-specific ADRs
- Check Workflows
- Review Business Impact
- Study Functional Requirements
- Understand Customer Workflows
- Review Cost Analysis
- Study Threat Model
- Review security ADRs
- Check compliance requirements
- Audit Security Stack
- Review Deployment Architecture
- Study Phased Implementation
- Check Observability stack
- Review Fitness Functions
- Only solution with complete threat model
- STRIDE analysis across all components
- Asset-based threat identification
- Mitigation strategies documented
- Clear fitness functions with targets
- Business and technical KPIs
- Architectural drift prevention
- Measurable outcomes
- Named technologies, not generic references
- Version numbers and deployment patterns
- Complete MLOps pipeline
- Cost estimates and TCO
- Low-latency critical operations on edge
- Cost-optimized cloud processing
- Seamless edge-cloud orchestration
- OTA model updates
- Clear problem-to-solution traceability
- ROI calculations
- Risk mitigation strategies
- Phased implementation plan
- API Uptime: > 99.95%
- P95 API Latency: < 200ms
- Vehicle Unlock Time: < 3 seconds
- Concurrent Users: 100,000+
- Vehicles Supported: 50,000+
- Predictive Maintenance Lead Time: > 7 days
- Rebalancing Completion Rate: > 90%
- MTTR: < 24 hours
- Vehicle Utilization: +15% YoY
- Demand Forecast MAPE: < 15%
- Damage Detection Accuracy: > 95%
- Price Optimization Uplift: +10-15%
Full metrics: FITNESS_FUNCTIONS.md
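Two of the targets above, the demand forecast MAPE and the P95 API latency, can be checked with very little code. The sketch below is one possible shape for such a fitness check, not the project's actual implementation: the function names and the sample data are made up, and the percentile uses the nearest-rank method, which is only one of several common definitions.

```python
from typing import Sequence

FORECAST_MAPE_TARGET_PCT = 15.0  # from "Demand Forecast MAPE: < 15%"
P95_LATENCY_TARGET_MS = 200.0    # from "P95 API Latency: < 200ms"


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent. Zero actuals are skipped."""
    pairs = [(a, f) for a, f in zip(actual, forecast) if a != 0]
    if not pairs:
        raise ValueError("need at least one non-zero actual value")
    return 100.0 * sum(abs(a - f) / abs(a) for a, f in pairs) / len(pairs)


def p95(samples: Sequence[float]) -> float:
    """95th percentile using the nearest-rank method."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


# Made-up sample data for illustration.
actual_bookings = [100, 80, 120, 50]
forecast_bookings = [90, 88, 114, 55]
latencies_ms = [120.0] * 19 + [450.0]  # one slow outlier among 20 requests

forecast_error = mape(actual_bookings, forecast_bookings)
latency_p95 = p95(latencies_ms)

forecast_ok = forecast_error < FORECAST_MAPE_TARGET_PCT
latency_ok = latency_p95 < P95_LATENCY_TARGET_MS
```

Run in a CI job or a scheduled pipeline, checks like these turn the targets into pass/fail signals, which is how fitness functions guard against architectural drift.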
- ✅ Zero-Trust: All access authenticated + authorized (no implicit trust)
- ✅ Encryption Everywhere: AES-256 at rest, TLS 1.3 in transit, mTLS for IoT
- ✅ GDPR Compliant: Data residency (EU-only), right-to-erasure, consent management
- ✅ PCI-DSS Level 1: Payment tokenization via Stripe, dedicated VPC isolation
- ✅ ISO 27001: Security controls, risk assessments, incident response playbooks
- ✅ IoT Security: X.509 certificates for 50K vehicles, Device Defender monitoring
- ✅ Automated Compliance: AWS Config, Macie, CloudTrail enforce policies
- ✅ Threat Detection: GuardDuty (<24hr MTTD), Security Hub (unified dashboard)
Security Cost: $24,500/month (6% of total spend, industry standard 5-8%)
Security ROI: €295K/year positive ROI (breach prevention - security cost)
Details: THREAT_MODEL.md
Phased approach with continuous value delivery:
| Phase | Duration | Investment | Annual Savings | Key Outcome |
|---|---|---|---|---|
| Phase 1: Foundation | 4 months | €206K | €460K | Observability & data foundation |
| Phase 2: AI/ML | 5 months | €420K | €10.3M | Demand forecasting, dynamic pricing |
| Phase 3: Microservices | 5 months | €725K | €1.875M | Independent scaling, zero downtime |
| Phase 4: Conversational AI | 2 months | €126K | €776K | NPS +20 points, support automation |
| Phase 5: Multi-Region | 2 months | €216K | Expansion enabler | 99.95% uptime, rapid city launch |
Payback Period: 8.7 months | 3-Year NPV: €12.4M | IRR: 187%
Detailed migration & rollout plan: PHASED_IMPLEMENTATION.md
Where AI/ML Was Used:
- ✅ Traditional ML: Demand forecasting, dynamic pricing, predictive maintenance (deterministic, explainable)
- ✅ Gen AI (LLMs): Conversational UI only (Claude 3.5 Sonnet for natural dialogue)
- ❌ Gen AI NOT Used: Pricing, payments, vehicle control (safety/financial critical)
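The split above can be enforced in code rather than left as a convention. The following sketch assumes a simple intent-based router; the intent names, the `Handler` enum, and the `route` function are hypothetical and only illustrate the idea that Gen AI is an explicit allow-list while everything else, including unknown intents, takes the deterministic path.

```python
from enum import Enum


class Handler(Enum):
    LLM_ASSISTANT = "llm_assistant"
    DETERMINISTIC_SERVICE = "deterministic_service"


# Areas ruled out for Gen AI: pricing, payments, vehicle control.
# The intent names themselves are hypothetical.
GEN_AI_BLOCKED_INTENTS = frozenset(
    {"set_price", "charge_payment", "unlock_vehicle"}
)
# Conversational intents that may be served by the LLM assistant.
CONVERSATIONAL_INTENTS = frozenset({"find_vehicle", "trip_suggestion", "faq"})


def route(intent: str) -> Handler:
    """Pick a handler; anything not explicitly allowed stays deterministic."""
    allowed = intent in CONVERSATIONAL_INTENTS
    blocked = intent in GEN_AI_BLOCKED_INTENTS
    if allowed and not blocked:
        return Handler.LLM_ASSISTANT
    return Handler.DETERMINISTIC_SERVICE


faq_handler = route("faq")
pricing_handler = route("set_price")
unknown_handler = route("something_new")
```

Defaulting unknown intents to the deterministic service means a newly added feature cannot reach the LLM by accident; it has to be added to the allow-list deliberately.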
Why Not Gen AI Everywhere?
"Considering the cost & non-deterministic nature, we would NOT use Gen AI for financially or safety-critical operations. Gen AI's hallucination risk makes it unsuitable for pricing decisions that directly impact revenue and customer trust."
Cost Comparison:
- Gen AI for pricing: €80K/month
- Traditional ML: €3K/month
- Savings: €924K/year by choosing the right tool
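The savings figure follows directly from the two monthly costs. As a quick check of the arithmetic, using only the numbers quoted above:

```python
GEN_AI_PRICING_EUR_PER_MONTH = 80_000
TRADITIONAL_ML_EUR_PER_MONTH = 3_000

# Monthly difference between the two approaches, then annualized.
monthly_savings = GEN_AI_PRICING_EUR_PER_MONTH - TRADITIONAL_ML_EUR_PER_MONTH
annual_savings = monthly_savings * 12

print(f"Annual savings: €{annual_savings:,}")  # prints "Annual savings: €924,000"
```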
Event Storming workshops guided our architecture:
- 3 sessions with Sarah (CPO), Marcus (VP Ops), David (CTO), and product/engineering teams
- Identified 15+ domain events, 8 bounded contexts, and key business policies
- Informed microservices boundaries and event-driven patterns
Full event storming results: EVENT_STORMING.md
Strategic "Buy" Decisions (Opportunity Costs):
| Function | Build Cost | Buy Cost | Decision | Rationale |
|---|---|---|---|---|
| Payment Processing | €200K + PCI-DSS compliance | €1K/month (Stripe 2.9%) | ✅ Buy | Risk transfer worth the fee |
| Kafka Management | 2 FTE (€16K/month) | €6K/month (MSK) | ✅ Buy | Focus eng team on differentiation |
| LLM Hosting | €50K setup + 1 FTE | €1K/month (Claude API) | ✅ Buy | Reliability > cost savings |
Full analysis: PHASED_IMPLEMENTATION.md - Opportunity Costs
Every ADR includes:
- ✅ Alternatives evaluated (e.g., monolith vs microservices vs service-based)
- ✅ Trade-offs documented (pros/cons with business context)
- ✅ Rejection rationale (why alternatives don't meet requirements)
Example: ADR-01 (Microservices) compares against monolith and service-based architecture with specific cost/scaling implications.
- Compute: ~$45,000 (EKS + SageMaker)
- Data Storage: ~$12,000 (S3 + RDS + DynamoDB)
- AI/ML: ~$28,000 (SageMaker training + inference)
- Networking: ~$8,000 (Data transfer + API Gateway)
- Other: ~$7,000 (Monitoring, security, etc.)
Total: ~$100,000/month for 50K vehicles across EU
Detailed breakdown: COST_ANALYSIS.md
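For readers who want to sanity-check the breakdown, the short sketch below sums the categories and derives each share plus a per-vehicle figure. It uses only the approximate numbers listed above; the per-vehicle cost is a derived illustration that assumes exactly 50,000 vehicles and is not a figure taken from COST_ANALYSIS.md.

```python
# Approximate monthly costs in USD, as listed above.
monthly_cost_usd = {
    "Compute": 45_000,
    "Data Storage": 12_000,
    "AI/ML": 28_000,
    "Networking": 8_000,
    "Other": 7_000,
}
VEHICLES = 50_000  # assumed fleet size, from "50K vehicles"

total_usd = sum(monthly_cost_usd.values())
# Share of the total per category, in percent.
share_pct = {
    name: round(100 * cost / total_usd, 1)
    for name, cost in monthly_cost_usd.items()
}
cost_per_vehicle_usd = total_usd / VEHICLES

for name, pct in share_pct.items():
    print(f"{name}: {pct}%")
print(f"Total: ${total_usd:,}/month, ${cost_per_vehicle_usd:.2f} per vehicle")
```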
This architecture was designed for the O'Reilly Architectural Katas, Q4 2025. For questions or discussions:
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: kata-na@googlegroups.com
This architectural documentation is provided under the MIT License for educational purposes.
Start Here:
- PERSONAS.md - Meet Sarah, Marcus, David, Emma, and the people driving our decisions
- EVENT_STORMING.md - Domain events, bounded contexts, and business rules
- PHASED_IMPLEMENTATION.md - Migration plan with timelines, costs, and ROI
- ADR Directory - All architectural decisions with alternatives and trade-offs
Architecture is: "One foot in business, one foot in technology" - O'Reilly Architectural Katas Judges
- O'Reilly Media - For hosting the Architectural Katas
- Competing Teams - For excellent solutions that raised the bar
- Reviewers - For feedback and insights