feat: ARM64 multi-arch builds (DGX Spark, Snapdragon X Elite, Apple Silicon, Jetson, Pi 5) #115

@itigges22

Why

Every current Dockerfile targets x86_64 only. As ARM64 hardware proliferates, the install matrix needs to follow: NVIDIA DGX Spark (GB10 Grace + Blackwell, ARM CPU + Blackwell GPU, shipping Q1-Q2 2026), Snapdragon X Elite laptops (Adreno GPU + Hexagon NPU), Apple Silicon containers, Jetson Orin/Thor, Raspberry Pi 5, AWS Graviton. None of these can pull our images today (Apple Silicon only runs them under slow QEMU emulation).

Pairs with PC-114 (#114). Vulkan covers the GPU diversity axis and ARM64 covers the CPU arch axis; together they unlock basically every modern compute target outside the macOS-native install path (#32).

What changes

  1. .github/workflows/build-images.yml gets docker buildx + QEMU setup, builds for linux/amd64,linux/arm64, pushes multi-arch GHCR manifests. Users keep pulling the same tag and get the right arch automatically.

  2. Per-Dockerfile arm64 audit:

    • Dockerfile.v31 (CUDA): NVIDIA ships nvidia/cuda:*-ubuntu22.04 arm64 variants for CUDA 12.x (GB10 is on this line). Need to verify that the exact tag we pin has an arm64 variant and that the patched llama.cpp build works on aarch64 CUDA.
    • Dockerfile.rocm: AMD doesn't ship arm64 ROCm at all currently. Skip arm64 for this image until/unless that changes.
    • Dockerfile.vulkan (#114, Vulkan universal backend): ubuntu:22.04 + mesa-vulkan-drivers, both arm64-native. Should just work but needs validation.
    • proxy/Dockerfile: Go binary, just GOOS=linux GOARCH=arm64 in the buildx target.
    • geometric-lens/Dockerfile, v3-service/Dockerfile, sandbox/Dockerfile: Python + apt, all arm64-native on Ubuntu.
  3. tier.py: detect_gpu() is mostly arch-agnostic, but the nvidia-smi/rocm-smi output parsers might trip on the GB10's Grace+Blackwell topology. Verify on actual hardware when available.

  4. atlas doctor: docker-pull tests should pass --platform when a service has only an amd64 build, so users on arm64 hosts get a clear 'no arm64 build for X' message instead of pulling the wrong arch and crashing.

  5. Docs: SETUP.md gets an arm64 matrix table (which Dockerfile supports which arch per release).
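A rough sketch of how the per-image platform list could drive both the buildx invocation and the atlas doctor message. Everything here is illustrative: `PLATFORMS`, `buildx_argv`, `host_platform`, and `doctor_check` are hypothetical names, the image tag is a placeholder, and the matrix lists the "needs validation" images optimistically. Only the `docker buildx build` flags (`--platform`, `-f`, `-t`, `--push`) are real.

```python
import platform

# Hypothetical support matrix derived from the audit above. Images the issue
# still marks as "needs validation" are listed optimistically.
PLATFORMS = {
    "Dockerfile.v31": ["linux/amd64", "linux/arm64"],
    "Dockerfile.rocm": ["linux/amd64"],  # no arm64 ROCm today
    "Dockerfile.vulkan": ["linux/amd64", "linux/arm64"],
    "proxy/Dockerfile": ["linux/amd64", "linux/arm64"],
}

# Map platform.machine() values to Docker architecture names.
_DOCKER_ARCH = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "aarch64": "arm64",
    "arm64": "arm64",
}


def buildx_argv(dockerfile, tag, context="."):
    """Return the buildx command line for one image (nothing is executed)."""
    return [
        "docker", "buildx", "build",
        "--platform", ",".join(PLATFORMS[dockerfile]),
        "-f", dockerfile,
        "-t", tag,
        "--push",
        context,
    ]


def host_platform(machine=None):
    """Return the Docker platform string for this host, assuming Linux containers."""
    machine = (machine or platform.machine()).lower()
    arch = _DOCKER_ARCH.get(machine)
    if arch is None:
        raise ValueError(f"unrecognised machine type: {machine}")
    return f"linux/{arch}"


def doctor_check(dockerfile, machine=None):
    """Return (ok, message) saying whether the image has a build for the host arch."""
    host = host_platform(machine)
    available = PLATFORMS[dockerfile]
    if host in available:
        return True, f"{dockerfile}: {host} build available"
    arch = host.split("/")[1]
    return False, f"no {arch} build for {dockerfile} (available: {', '.join(available)})"
```

Keeping the matrix in one place would also let the SETUP.md table be generated from the same data, so the docs and the doctor check cannot drift apart.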

What this unlocks

| Hardware | Today | After this |
| --- | --- | --- |
| NVIDIA DGX Spark (GB10, arm64) | can't pull images | CUDA install |
| Snapdragon X Elite laptop | can't pull images | Vulkan install via Adreno |
| Apple Silicon (Docker route) | qemu CPU emulation (very slow) | native arm64 containers (still MoltenVK for GPU but no CPU emu) |
| Jetson Orin / Thor | can't pull images | CUDA install |
| Pi 5 (8GB) | can't pull images | Vulkan via V3DV (slow but boots) |
| Ampere/Graviton cloud | can't pull images | CPU-only Vulkan lavapipe |

Hardware testing matrix

Each combo needs a tester:

  • DGX Spark + CUDA: waiting on hardware (NVIDIA Q1-Q2 2026)
  • Apple Silicon + Vulkan-in-Docker: need a Mac dev
  • Snapdragon X Elite + Vulkan: need a Snapdragon laptop owner
  • Jetson Orin + CUDA: any Jetson user willing
  • Pi 5 + Vulkan: cheap, can probably pick one up
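Until GB10 hardware shows up, the tier.py parser concern can at least be exercised against synthetic output. The sketch below is not the real detect_gpu(); `parse_gpu_rows` is a hypothetical helper that parses the CSV produced by `nvidia-smi --query-gpu=name,memory.total --format=csv,noheader,nounits` and tolerates a non-numeric memory field. Whether GB10 actually reports `[N/A]` there is an assumption, so the sample row in the usage note is made up.

```python
import csv
import io


def parse_gpu_rows(csv_text):
    """Parse 'name, memory.total' rows; memory becomes None when it is not a number."""
    gpus = []
    for row in csv.reader(io.StringIO(csv_text)):
        fields = [field.strip() for field in row]
        if len(fields) < 2 or not fields[0]:
            continue  # skip blank or malformed lines
        name, mem = fields[0], fields[1]
        try:
            mem_mib = int(mem)
        except ValueError:
            # Unsupported fields come back as text such as "[N/A]" (assumed for GB10).
            mem_mib = None
        gpus.append({"name": name, "memory_mib": mem_mib})
    return gpus
```

For example, `parse_gpu_rows("NVIDIA GB10, [N/A]\n")` yields one entry with `memory_mib` set to `None` instead of raising, which leaves the tier decision to fall back on another memory source.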

Out of scope

  • Per-arch performance tuning (one ticket per arch if perf is bad)
  • Apple Silicon NATIVE install path (#32, Apple Silicon (Metal) deployment guide) is the no-Docker fast path, tracked separately
  • Windows ARM64 is a separate Docker-on-Windows story

Dependencies

Labels: enhancement (New feature or request)
