KIRAA AI
Apple Silicon Analytics TestBench
10 million ERP transactions. Five aggregation tasks. One engine takes more than 7 minutes. The other finishes in seconds.
This project benchmarks real-world financial data processing — the kind of row-level aggregation and scoring that happens in production ERP systems — across two fundamentally different execution models.
| | Python (CPU) | Swift + Metal (GPU) |
|---|---|---|
| Execution | Row-by-row for loop through CPython interpreter | Compiled Swift + Metal GPU compute shaders |
| Total time | ~457,000 ms (~7.6 min) | ~5,000 ms (~5 s) |
| Throughput | ~22,000 rec/s | ~1,800,000 rec/s |
| Peak memory | ~2,000 MB | ~230 MB |
| Anomalies found | 20,000 | 20,000 (identical) |
Measured on Apple M4 Max with 10M rows. Python results from benchmark_python.py using df.iterrows(). Swift results from on-device Metal GPU benchmark.
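The headline figures can be cross-checked with a few lines of arithmetic. The sketch below assumes throughput means rows divided by total wall time and that the speedup is the ratio of the two totals; the function names are illustrative and do not come from the project.

```python
# Figures copied from the summary table (approximate, as published).
ROWS = 10_000_000
PYTHON_TOTAL_MS = 457_000
SWIFT_TOTAL_MS = 5_000


def throughput(rows: int, total_ms: float) -> float:
    """Records per second, assuming throughput = rows / total wall time."""
    return rows / (total_ms / 1000.0)


def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


python_rps = throughput(ROWS, PYTHON_TOTAL_MS)
swift_rps = throughput(ROWS, SWIFT_TOTAL_MS)
ratio = speedup(PYTHON_TOTAL_MS, SWIFT_TOTAL_MS)

print(f"Python throughput: {python_rps:,.0f} rec/s")
print(f"Swift + Metal throughput: {swift_rps:,.0f} rec/s")
print(f"Speedup: {ratio:.1f}x")
```

With the rounded ~5,000 ms total this gives about 2,000,000 rec/s for Swift + Metal. The table's ~1,800,000 rec/s would correspond to an unrounded total closer to 5.5 s.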
Five common ERP aggregation tasks run against 10 million synthetic transactions:
| # | Task | What It Does | Python Approach | Swift + Metal Approach |
|---|---|---|---|---|
| 1 | Total by cost centre | Group-by sum across 12 departments | `for _, row in df.iterrows()` with dict accumulator | Direct array indexing, compiled loop |
| 2 | Top 10 suppliers | Group-by sum, sort, take top 10 | `for _, row in df.iterrows()` with dict accumulator | Hash map accumulation, partial sort |
| 3 | Z-score anomaly detection | Per-row statistical scoring against department baselines | `for _, row in df.iterrows()` with dict lookup per row | Metal GPU: 1,000+ parallel threads, each scoring one transaction |
| 4 | Plant x cost centre pivot | 4 plants x 12 centres = 48-cell cross-tab | `for _, row in df.iterrows()` with tuple-key dict | 2D array indexing, single compiled pass |
| 5 | Running total | Cumulative sum over all amounts | `for _, row in df.iterrows()` | Single-pass compiled accumulation |
Each task processes all 10 million rows independently and is timed separately.
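The four loop-based aggregations in the table (tasks 1, 2, 4 and 5) can be sketched on a tiny sample. This is a minimal illustration, not the code in benchmark_python.py: plain dicts stand in for the rows that `df.iterrows()` would yield, and the sample values are invented.

```python
from collections import defaultdict

# Tiny hand-written sample; the real benchmark uses 10 million rows.
rows = [
    {"amount": 100.0, "cost_centre_id": 0, "supplier_id": 1001, "plant_code_id": 0},
    {"amount": 250.0, "cost_centre_id": 1, "supplier_id": 1002, "plant_code_id": 1},
    {"amount": 50.0, "cost_centre_id": 0, "supplier_id": 1001, "plant_code_id": 1},
    {"amount": 400.0, "cost_centre_id": 1, "supplier_id": 1003, "plant_code_id": 0},
]

# Task 1: total by cost centre (dict accumulator).
by_centre = defaultdict(float)
for row in rows:
    by_centre[row["cost_centre_id"]] += row["amount"]

# Task 2: top suppliers (accumulate, sort descending, keep at most 10).
by_supplier = defaultdict(float)
for row in rows:
    by_supplier[row["supplier_id"]] += row["amount"]
top_suppliers = sorted(by_supplier.items(), key=lambda kv: kv[1], reverse=True)[:10]

# Task 4: plant x cost centre pivot (tuple-key dict).
pivot = defaultdict(float)
for row in rows:
    pivot[(row["plant_code_id"], row["cost_centre_id"])] += row["amount"]

# Task 5: running total (cumulative sum in a single pass).
running_total = 0.0
for row in rows:
    running_total += row["amount"]

print(dict(by_centre))
print(top_suppliers)
print(dict(pivot))
print(running_total)
```

Each task walks the full row list on its own, mirroring the note that every task processes all rows independently.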
Swift compiles to native ARM64 machine code. Python interprets bytecode through an eval loop with dynamic type checks on every operation.
Swift iterates contiguous 24-byte structs in memory. Python's iterrows() creates a new pandas Series object per row — 10 million temporary objects for 10 million rows.
Metal dispatches 1,000+ GPU threads simultaneously for z-score scoring. The Python loop runs on a single thread under the GIL and scores one transaction at a time.
Swift uses packed structs with shared-mode Metal buffers (zero-copy on Apple Silicon). pandas stores each column as a separate NumPy array, but `iterrows()` boxes every value it reads into a Python object, which adds per-element object overhead.
Swift resolves most method calls at compile time through static dispatch. Python looks up attributes, methods, and operators at runtime through `__getattribute__` and the descriptor protocol.
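The memory-layout point can be illustrated without pandas or Metal. In this sketch the standard-library `array` type stands in for a packed, contiguous buffer, and ordinary Python floats stand in for boxed per-row values. It is an analogy under those assumptions, not a measurement of the benchmark itself.

```python
import sys
from array import array

amounts = [100.0, 250.5, 50.25, 400.0]

# Packed representation: one contiguous buffer of 8-byte doubles.
packed = array("d", amounts)
packed_bytes = packed.itemsize * len(packed)

# Boxed representation: every value is its own Python float object,
# each carrying a type pointer and reference count besides the payload.
boxed_bytes = sum(sys.getsizeof(value) for value in amounts)

print(f"packed payload: {packed_bytes} bytes")
print(f"boxed floats:   {boxed_bytes} bytes")
```

The boxed total excludes the list that holds the references, so the real gap is wider than the two numbers suggest.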
Both engines compute identical z-scores:
```
z = (amount - cost_centre_mean) / cost_centre_std
is_anomaly = |z| > 3.5
```
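A few worked values make the threshold behaviour concrete. The baselines and amounts below are invented for illustration, and the zero-standard-deviation guard is an assumption about one reasonable convention, not a statement of how either engine handles that case.

```python
Z_THRESHOLD = 3.5


def z_score(amount: float, mean: float, std: float) -> float:
    """Distance of an amount from its cost-centre mean, in standard deviations."""
    # Assumption: treat a zero standard deviation as "no deviation" to avoid
    # dividing by zero. The project's own convention is not documented here.
    if std == 0:
        return 0.0
    return (amount - mean) / std


def is_anomaly(amount: float, mean: float, std: float,
               threshold: float = Z_THRESHOLD) -> bool:
    """Flag amounts whose absolute z-score strictly exceeds the threshold."""
    return abs(z_score(amount, mean, std)) > threshold


# Hypothetical baseline: mean 100, standard deviation 10.
print(is_anomaly(136.0, 100.0, 10.0))  # z = 3.6, above the threshold
print(is_anomaly(135.0, 100.0, 10.0))  # z = 3.5, exactly on the boundary
print(is_anomaly(320.0, 500.0, 50.0))  # z = -3.6, low-side outlier
```

Because the comparison is strict, a transaction sitting exactly on 3.5 standard deviations is not flagged, and the absolute value means unusually low amounts count as well as unusually high ones.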
Python scores each row in a for loop:

```python
for _, row in df.iterrows():
    bl = baselines[row["cost_centre"]]
    z = (row["amount"] - bl["mean"]) / bl["std"]
    if abs(z) > 3.5:
        anomaly_count += 1
```

Metal dispatches one GPU thread per transaction:
```metal
kernel void scoreTransactions(..., uint gid [[ thread_position_in_grid ]]) {
    Transaction txn = transactions[gid];
    Baseline baseline = baselines[txn.cost_centre_id];
    float z = (txn.amount - baseline.mean) / baseline.std_dev;
    results[gid].is_anomaly = (fabs(z) > z_threshold) ? 1u : 0u;
}
```

To run the Python benchmark:

```bash
pip install pandas numpy tqdm
python3 benchmark_python.py
```

Results are saved to `benchmark_results_python.json`.
```bash
./run_benchmark.sh   # Generate CSV, run Python benchmark, copy to app bundle
```

- Open `TestBench.xcodeproj` in Xcode
- Select an iPhone or iPad simulator (Apple Silicon required)
- Build and run
- Python results load from bundled JSON
- Tap Run Benchmark to run Swift + Metal on-device
- Speedup multiplier appears with side-by-side comparison
```bash
./generate_data.sh                          # Just generate CSV
./run_benchmark.sh --skip-generate          # Run Python on existing CSV
./run_benchmark.sh --verbose --rows 100000  # Quick verbose run
```

The benchmark ships as a native iOS app with five tabs:
| Tab | Purpose |
|---|---|
| Dashboard | Controls, speedup banner, engine result cards, throughput chart, local validation instructions |
| Analysis | Per-task timing comparison charts, speedup-per-task bars, time distribution |
| Explorer | Data exploration and detailed benchmark data views |
| Deep Dive | Detailed explanation of each task, side-by-side Python vs Swift approach, why Metal wins |
| Pipeline | Architecture diagrams, GPU data flow animation, stream visualizer |
- Kiraa-branded app icon + logo: the Kiraa mark appears on every tab header and shares artwork with the Kiraa engine product family
- Responsive layout: iPhone (compact width) stacks multi-column content, scales hero typography, and resizes charts; iPad and macOS keep their wider layouts unchanged
- Auto-play: floating action button rotates through all tabs every 5 seconds (tap to start/stop)
- Ambient music: `kiraa-10m-music.mp3` loops quietly in the background
- Particle field: floating particles that react to benchmark state
- GPU wave: triple-layered sine waveform pulsing with processing intensity
- Data flow pipeline: animated dot stream showing transactions through GPU scoring
- Progress ring: circular gauge with gradient sweep during runs
- Stream visualizer: 500-cell dot grid simulating live transaction scanning
```bash
./run_benchmark.sh   # Generate CSV -> Python benchmark -> copy to app bundle
```

Generates `benchmark_data.csv` (10M rows, seed=42), runs the Python benchmark, and copies both the CSV and results JSON into the iOS app bundle. Swift loads the same CSV at runtime.
| Column | Type | Description |
|---|---|---|
| `amount` | float | Transaction amount (normally distributed per cost centre) |
| `cost_centre_id` | int (0-11) | Maps to: RAW_MATERIALS, PACKAGING, LOGISTICS, ... |
| `txn_type_id` | int (0-6) | Maps to: PURCHASE_ORDER, INVOICE_MATCH, ... |
| `supplier_id` | int (1000-9998) | 4-digit supplier identifier |
| `plant_code_id` | int (0-3) | Maps to: GOLD_COAST, SYDNEY, MELBOURNE, BRISBANE |
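A seeded generator for rows with this schema might look like the sketch below. It is not the project's generate_data.py: the per-centre means and standard deviations are invented, the function names are hypothetical, and it does not inject the anomalies the real dataset contains. It only shows how a fixed seed and the documented ID ranges fit together.

```python
import csv
import io
import random

FIELDS = ["amount", "cost_centre_id", "txn_type_id", "supplier_id", "plant_code_id"]


def make_baselines(rng: random.Random, n_centres: int = 12):
    """Hypothetical (mean, std) pair per cost centre; real values are not documented."""
    return [(rng.uniform(1_000, 50_000), rng.uniform(100, 5_000))
            for _ in range(n_centres)]


def generate_rows(n_rows: int, seed: int = 42):
    """Yield dict rows matching the schema table; the same seed gives the same rows."""
    rng = random.Random(seed)
    baselines = make_baselines(rng)
    for _ in range(n_rows):
        centre = rng.randrange(12)                   # 0-11
        mean, std = baselines[centre]
        yield {
            "amount": round(rng.gauss(mean, std), 2),  # normal per cost centre
            "cost_centre_id": centre,
            "txn_type_id": rng.randrange(7),         # 0-6
            "supplier_id": rng.randint(1000, 9998),  # inclusive bounds
            "plant_code_id": rng.randrange(4),       # 0-3
        }


def write_csv(rows, handle) -> None:
    """Write a header line followed by one CSV line per row."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


buffer = io.StringIO()
write_csv(generate_rows(1_000), buffer)
print(buffer.getvalue().splitlines()[0])
```

Using a private `random.Random(seed)` instance instead of the module-level functions keeps the output reproducible even if other code also draws random numbers.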
```bash
xcodebuild test -project TestBench.xcodeproj -scheme TestBench \
  -sdk iphonesimulator -destination 'platform=iOS Simulator,name=iPhone 17 Pro'
```

| Test Class | Tests | Coverage |
|---|---|---|
| MetalStructAlignmentTests | 6 | GPU struct layout (24/8/8 bytes, field offsets) |
| ConfigTests | 6 | All constants match expected values |
| TimingTests | 2 | Monotonicity and positivity |
| SeededRNGTests | 5 | Determinism, Gaussian distribution, edge cases |
| DataGeneratorTests | 13 | Row count, ID ranges, anomaly rate, baselines, CSV loading |
| ZScoreTests | 7 | Known z-scores, threshold boundary, zero stdDev |
| BenchmarkRunnerTests | 13 | All 5 tasks, empty/single input, progress, throughput |
| BenchmarkModelsTests | 4 | JSON decode/encode, snake_case keys, nullable fields |
| BenchmarkViewModelTests | 8 | State machine, speedup calc, guard conditions |
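The 24/8/8-byte sizes checked by MetalStructAlignmentTests can be reasoned about with Python's `struct` module. The field layouts below are assumptions chosen to reproduce those sizes: the kernel confirms that a baseline holds a mean and a standard deviation, but the field order, the types, and the padding of the transaction and result structs are hypothetical.

```python
import struct

# Baseline: two 32-bit floats (mean, std_dev), the fields the kernel reads.
BASELINE_FORMAT = "<ff"
# Result: hypothetical pairing of a float z-score with a 32-bit anomaly flag.
RESULT_FORMAT = "<fI"
# Transaction: hypothetical layout with a float amount, four 32-bit IDs,
# and 4 bytes of trailing padding to reach 24 bytes.
TRANSACTION_FORMAT = "<fIIII4x"

for name, fmt in [("Transaction", TRANSACTION_FORMAT),
                  ("Baseline", BASELINE_FORMAT),
                  ("Result", RESULT_FORMAT)]:
    print(f"{name}: {struct.calcsize(fmt)} bytes")

# Round-trip one record: amount, cost_centre_id, txn_type_id, supplier_id, plant_code_id.
record = struct.pack(TRANSACTION_FORMAT, 123.5, 3, 1, 1234, 2)
fields = struct.unpack(TRANSACTION_FORMAT, record)
print(len(record), fields)
```

A fixed-size record like this is what lets a GPU thread find its transaction by index alone, since element `gid` starts at byte offset `gid * 24`.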
```
TestBench/
├── benchmark_python.py              # Python benchmark (standalone CLI, df.iterrows())
├── generate_data.py                 # Generate benchmark_data.csv (10M rows)
├── generate_data.sh                 # Shell wrapper (generates + copies to app)
├── run_benchmark.sh                 # Full pipeline: generate -> benchmark -> bundle
├── benchmark_results_python.json
│
├── TestBench/                       # iOS app (SwiftUI + Metal)
│   ├── TestBenchApp.swift
│   ├── ContentView.swift
│   ├── Models/
│   │   ├── BenchmarkModels.swift    # Transaction, Baseline, Config, results
│   │   ├── BenchmarkData.swift      # Detailed chart data structures
│   │   ├── BenchmarkEngine.swift    # High-resolution timing
│   │   └── AudioManager.swift       # Ambient music playback
│   ├── Engines/
│   │   ├── MetalEngine.swift        # Metal GPU compute + SwiftBenchmarkRunner
│   │   └── DataGenerator.swift      # Data generation + CSV loading
│   ├── Shaders/
│   │   └── AnomalyScoring.metal     # GPU z-score kernel
│   ├── ViewModels/
│   │   └── BenchmarkViewModel.swift # Orchestrates runs, loads Python JSON
│   ├── Views/
│   │   ├── MainTabView.swift        # 5-tab container
│   │   ├── DashboardView.swift      # Dashboard with controls + validation info
│   │   ├── TaskAnalysisView.swift   # Per-task timing charts
│   │   ├── DataExplorerView.swift   # Data exploration
│   │   ├── DeepDiveView.swift       # Task explanations + why Metal wins
│   │   ├── PipelineView.swift       # Architecture + animations
│   │   ├── EngineCardView.swift     # Per-engine stats card
│   │   ├── SpeedupBannerView.swift  # Hero speedup multiplier
│   │   ├── ThroughputChartView.swift
│   │   ├── BenchmarkControlView.swift
│   │   ├── ChartHelpers.swift       # Shared chart styling
│   │   ├── ParticleFieldView.swift
│   │   ├── GPUWaveView.swift
│   │   ├── DataFlowView.swift
│   │   ├── ProgressRingView.swift
│   │   ├── ArchitectureGridView.swift
│   │   └── StreamVisualizerView.swift
│   ├── Theme/
│   │   └── AppTheme.swift           # Pink/purple neon palette + fonts
│   └── Resources/
│       ├── benchmark_results_python.json
│       └── kiraa-10m-music.mp3
│
├── TestBenchTests/                  # Unit tests
├── TestBenchUITests/                # UI tests
├── reference/                       # Original reference implementations
└── TestBench.xcodeproj/
```
| Component | Version |
|---|---|
| Python | 3.10+ |
| pandas | 2.0+ |
| numpy | 1.24+ |
| Xcode | 26+ |
| iOS | 26+ |
| Hardware | Apple Silicon (M1/M2/M3/M4 or A-series) |
Kiraa AI Pty Ltd · Gold Coast, QLD, Australia
Making financial intelligence fast.