Building AI infrastructure for the local LLM era.
I work at the intersection of multi-agent systems, local inference, and AI tooling. Most of my work involves running serious hardware close to home and figuring out how far I can push it.
Local inference cluster — DGX Spark (GB10 Grace Blackwell, 128GB), RTX 5090, and Mac Studio M1 Ultra running as a heterogeneous inference pool. I benchmark homogeneous vs heterogeneous GPU setups to understand the real performance cost of mixing architectures.
AI ATC system — Multi-agent system that controls aircraft in a flight simulator. Stack: local LLM + V-JEPA 2 visual embeddings + MCP + SvelteKit observability dashboard. It sees the radar, understands the situation, and issues instructions.
AgentSkills — Open-source skills for AI coding agents. The heatmap art skill lets agents paint pixel art on GitHub contribution graphs by managing backdated git commits.
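A minimal sketch of the backdating idea behind the heatmap art skill, not the skill's actual code. It assumes the contribution graph is a grid with 7 rows (Sunday at the top) and one column per week, and that a commit's date can be set through Git's `GIT_AUTHOR_DATE` and `GIT_COMMITTER_DATE` environment variables. The function names `pixel_dates` and `commit_env` are hypothetical.

```python
from datetime import date, datetime, time, timedelta, timezone


def pixel_dates(grid, start_sunday):
    """Map lit pixels to calendar dates.

    grid[row][col] is truthy for a lit pixel; row 0 is Sunday and
    each column is one week, starting at start_sunday.
    """
    if start_sunday.weekday() != 6:  # Monday is 0, Sunday is 6
        raise ValueError("start_sunday must fall on a Sunday")
    if len(grid) != 7:
        raise ValueError("grid must have exactly 7 rows")
    dates = []
    for col in range(len(grid[0])):
        for row in range(7):
            if grid[row][col]:
                dates.append(start_sunday + timedelta(days=7 * col + row))
    return dates


def commit_env(day):
    """Build the environment variables that backdate one commit to noon UTC."""
    noon = datetime.combine(day, time(12, 0), tzinfo=timezone.utc)
    # Git's internal date format: "<unix timestamp> <timezone offset>"
    stamp = f"{int(noon.timestamp())} +0000"
    return {"GIT_AUTHOR_DATE": stamp, "GIT_COMMITTER_DATE": stamp}
```

Each environment could then be merged into `os.environ` and passed to something like `subprocess.run(["git", "commit", "--allow-empty", "-m", "pixel"], env=...)`, with more commits per day producing a darker cell.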
Stack: Python · vLLM · llama.cpp · MLX · FastAPI · SvelteKit · PyTorch · MCP · OpenRouter · Playwright · Ray
Models I've run locally: Qwen3, Llama 3, V-JEPA 2, various fine-tunes
Hardware: DGX Spark GB10 × 2 (200Gbps RoCE), RTX 5090, Apple M1 Ultra 128GB
| Project | What it is |
|---|---|
| vjepa-server | Inference API for Meta's V-JEPA 2 video model — CUDA + MPS support |
| ai-atc-demo | Real-time observability dashboard for a multi-agent ATC system |
| github-heatmap-art | AgentSkill: paint pixel art on GitHub contribution graphs |
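Supporting both CUDA and MPS, as listed for vjepa-server, usually comes down to choosing a device at startup. The sketch below shows one common way to do that with PyTorch. It is an illustration under assumptions, not the server's actual code, and `pick_device` and `detect_device` are hypothetical names.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Prefer CUDA, then Apple's MPS backend, then fall back to CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Probe PyTorch for available backends; returns "cpu" if torch is missing."""
    try:
        import torch
    except ImportError:
        return "cpu"
    # Older PyTorch builds may not ship the MPS backend module at all.
    mps = getattr(torch.backends, "mps", None)
    mps_available = mps is not None and mps.is_available()
    return pick_device(torch.cuda.is_available(), mps_available)
```

The returned string can be passed to `model.to(...)`, so the same code path runs on an NVIDIA GPU, on Apple silicon, or on CPU.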
Previously: founded and sold a company (2025). Before that: Stanford GSB. I've been building software for 15+ years, but the last year has been the most technically interesting — local AI infrastructure is genuinely uncharted territory and I'm happy to be lost in it.
Open to conversations about AI infra, agent systems, and anything involving running big models on weird hardware.