feat: copy-on-write sandbox forking (storage backends + memory fork)#7

Merged
erans merged 29 commits into main from feat/cow-sandbox-fork on Apr 25, 2026

Conversation

erans (Owner) commented Apr 25, 2026

Summary

  • Adds internal/storage package with pluggable CoW backends (CopyBackend, ReflinkBackend via FICLONE on btrfs/XFS/bcachefs; btrfs-subvol/zfs reserved as stubs), capability probe at startup, and --storage-mode=auto|copy|reflink daemon flag. All Firecracker rootfs clones now route through the registry.
  • Adds memory CoW fork for running Firecracker sandboxes via POST /v1/sandboxes/{id}/fork. Implementation: pause parent → write vmstate.bin + reflink rootfs into a fork-point dir → resume → spawn N children that restore via mmap(MAP_PRIVATE) of the shared memory file. Fork-point lifecycle is descendant-ref-counted; orphans GC'd at daemon start. Provider-side hard cap of 64 children.
  • Adds Incus pool-driver advisory at startup (warns on dir/lvm; --incus-strict-pool-cow upgrades to fatal). Adds OTel metrics for clone duration, fork pause, child spawn. Adds operator docs.

Spec & plan

  • docs/superpowers/specs/2026-04-24-cow-sandbox-fork-design.md
  • docs/superpowers/plans/2026-04-24-cow-sandbox-fork.md

What did NOT ship in this PR

  • UFFD-based no-pause fork (deferred — MAP_PRIVATE covers the current latency targets on CoW-capable hosts).
  • BtrfsSubvolBackend / ZfsBackend for Firecracker (file-based rootfs makes reflink the natural fit; stubs reserved for future non-Firecracker providers).
  • storage_clone_bytes_saved_total (requires per-FS extent introspection; tracked as future work in docs/storage-backends.md).

Test plan

  • CI green across all variants (go test ./..., -tags firecracker, -tags incus)
  • Smoke navarisd --storage-mode=auto on an XFS-reflink host and verify "image_dir_backend=reflink" log line at startup
  • Smoke navarisd --storage-mode=reflink on an ext4 host and verify startup fails fast
  • Run the gated end-to-end fork test on a btrfs/XFS-reflink Firecracker host: go test -tags integration ./test/integration/... -run TestFork_ChildrenSeeParentFiles -v
  • Confirm POST /v1/sandboxes/{id}/fork against an Incus sandbox returns HTTP 501 (Not Implemented) with domain.ErrNotSupported in the operation error
  • Confirm forking a non-running parent returns HTTP 422 (Unprocessable Entity, ErrInvalidState)
  • Verify storage_backend field appears in snapinfo.json and <image>.json after a clone on a CoW-capable host

🤖 Generated with Claude Code

erans and others added 29 commits April 24, 2026 16:59
Adds a design document for copy-on-write sandbox cloning:
stage 1 replaces full-file rootfs copies with a pluggable
storage.Backend (reflink via FICLONE on XFS/btrfs, copy fallback);
stage 2 adds a fork endpoint that spawns children from a Firecracker
live snapshot using MAP_PRIVATE on the shared memory file.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
20-task plan covering stage 1 (storage.Backend with reflink/copy +
Firecracker wiring + Incus pool check) and stage 2 (fork endpoint
with MAP_PRIVATE memory CoW, fork-point lifecycle, GC, metrics,
integration test, docs).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…apping

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…up branches

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…sertion

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ime fallback

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…x sort

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…with snapshot

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…estroy

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds Fork method to client SDK and an integration test that writes a
sentinel in the parent, forks x3, verifies children inherit the sentinel,
and confirms parent and siblings remain isolated after diverging writes.
Skips automatically on non-btrfs/XFS/bcachefs hosts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…for fork

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
erans merged commit cfb326d into main on Apr 25, 2026
5 checks passed
erans added a commit that referenced this pull request Apr 25, 2026
* ci: add Incus + btrfs (CoW) integration leg

The existing integration suite preseeds the Incus pool with driver: dir,
which is full-copy and exercises only the fallback half of the storage
advisory added in #7. This leg flips the entrypoint to a btrfs loop pool
and runs navarisd with --incus-strict-pool-cow=true so any regression in
the advisory or its wiring fails the run hard.

The base integration suite is unchanged: scripts/incus-entrypoint.sh
defaults INCUS_STORAGE_DRIVER to dir and only switches when explicitly
overridden. docker-compose.integration-incus-cow.yml is a parallel
standalone compose file matching the existing -firecracker / -mixed
pattern; .github/workflows/integration-incus-cow.yml mirrors
integration.yml across alpine/3.21 and debian/12.

Dockerfile.incus now installs btrfs-progs explicitly (the incus package
already pulls it transitively, but explicit avoids relying on a
transitive dep that could go away).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ci): incus btrfs source path must be the bare pool name

Incus rejects any source under /var/lib/incus that is not exactly
/var/lib/incus/storage-pools/<pool-name>. Our preseed had ".img" on the
end, which tripped the validator and aborted init. Drop the suffix —
Incus auto-creates the loop file at the bare path.

Reproduced from the failing alpine/3.21 leg of #9: "Failed to create
storage pool 'default': Only allowed source path under '/var/lib/incus'
is '/var/lib/incus/storage-pools/default'".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ci): provision incus btrfs pool ourselves before init

Letting Incus auto-create a loop file (source + size in the preseed)
fails in a Docker volume because the btrfs driver checks the *parent*
directory's filesystem first and rejects with "Provided path does not
reside on a btrfs filesystem (detected ext4)". The Docker volume
backing /var/lib/incus is on the runner's ext4 host, which we can't
change.

Workaround: do the loop+mkfs+mount ourselves in the entrypoint, then
hand Incus a ready-to-use btrfs mountpoint. Idempotent via stat -f
-c %T. The preseed becomes a plain "use this existing mount" with no
size hint, which Incus accepts cleanly.

Reproduced from #9 alpine/3.21 fail: "Failed to create storage pool
'default': Provided path does not reside on a btrfs filesystem
(detected ext4)".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ci): mount incus btrfs source outside /var/lib/incus

Two validators in Incus's btrfs driver fight over the same path:

  - "Only allowed source path under /var/lib/incus is
    /var/lib/incus/storage-pools/default" (validates source path)
  - "Storage pool directory ... already exists" (Incus wants to
    create that exact directory itself as the pool's internal layout)

We can satisfy either but not both at the same path. Move the mount
to /var/lib/navaris-incus-btrfs (sibling of /var/lib/incus, neither
validator applies) and point source there. Incus accepts the
existing btrfs path and lays out its pool internals inside it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>