Elide is a log-structured block storage system combining demand-fetch, content-addressed dedup, and delta compression, designed for running many VMs efficiently on shared infrastructure.

| Document | Contents |
|---|---|
| docs/quickstart.md | Import an OCI image, branch a writable replica, and serve it over ublk |
| docs/quickstart-data-volume.md | Create an empty data volume, mount from a Lima VM, write data, upload segments |
| docs/quickstart-tigris.md | Run against a real S3-compatible backend (Tigris); covers AWS S3, MinIO, R2, etc. |
| docs/overview.md | Problem statement, key concepts, operation modes, empirical findings |
| docs/findings.md | Empirical measurements: dedup rates, demand-fetch patterns, delta compression data, write amplification |
| docs/architecture.md | System architecture, directory layout, write/read paths, LBA map, extent index, dedup, snapshots |
| docs/formats.md | WAL format, segment file format (header + index + inline + body + delta), S3 retrieval strategies |
| docs/operations.md | GC, repacking, boot hints, filesystem metadata awareness |
| docs/testing.md | Property-based tests: ULID monotonicity and crash-recovery oracle |
| docs/reference.md | Implementation notes, open questions, and index of prior art docs |
| docs/reference-lsvd.md | lab47/lsvd design decisions and comparison to Elide |
| docs/reference-nydus.md | nydus-snapshotter: lazy loading, RAFS format, NRI optimizer, boot hints |
| docs/vm-boot.md | Booting a VM from an Elide volume with QEMU direct kernel boot |
| docs/design-gc-ulid-ordering.md | Open design: GC ULID ordering race, single-mint invariant, proptest findings |
| docs/design-gc-overlap-correctness.md | Design: GC skips partial-LBA-death entries to avoid shadow/loss on rebuild when multi-LBA entries have been partially overwritten |
| docs/design-gc-partial-death-compaction.md | Design: decouple composite body from surviving sub-runs of partial-LBA-death entries so normal GC can subsequently reclaim each piece independently |
| docs/design-gc-plan-handoff.md | Design: coordinator emits a plaintext plan; volume materialises bodies via BlockReader and signs the output — single source of truth for body resolution |
| docs/design-gc-bucket-unification.md | Proposed: collapse GC's smalls + one-filler tiers into a single bin-pack that emits N output buckets per tick; mirrors #297's pending unification. Selection is filtered to fully cache-resident segments so GC never issues S3 GETs to enable a rewrite — lazy volumes still GC their own writes. Per-tick cap (max_buckets_per_tick, default 4) is sized against retention-peak |
| docs/design-delta-compression.md | Design: delta compression via file-path matching, file-aware import, snapshot filemaps |
| docs/design-delta-materialisation.md | Proposed: local-only cache/<ULID>.dmat sidecar caches lz4-compressed materialised delta-entry bytes after first read; WAL-shaped append-only with hash-verified tail recovery |
| docs/design-replica-model.md | Proposed: replica-based model for forks and recovery; retires volume fork, adds volume materialize, frames snapshot cadence as a retention SLA |
| docs/design-portable-live-volume.md | Accepted: named volumes become portable across hosts; each volume start is a fresh fork inheriting from the previous tail, so each ownership episode has its own ULID and signing key. The only shared mutable thing is names/<name> |
| docs/design-volume-size-ownership.md | Implemented: size lives on the names/<name> claim record (single owner, CAS-protected, signed in event log), not on the unsigned manifest.toml. Ancestors carry no size; resize is a CAS + UBLK_U_CMD_UPDATE_SIZE (Linux 6.16+) |
| docs/design-manifest-toml-removal.md | Implemented: dropped manifest.toml entirely. name / readonly / origin were redundant with existing surfaces; OCI source migrated onto signed volume.provenance as oci_source. Fresh-bucket-only |
| docs/design-volume-event-log.md | Proposed: per-name append-only event log under events/<name>/<ulid> recording lifecycle transitions (claim, release, force-release, start/stop, fork, rename). Pointer stays canonical for "now"; log is canonical for "ever". Includes a rename design that ties two name logs together without copying history |
| docs/design-segment-index.md | Proposed: P3 (folding P4) of list elimination — replace the per-vol segments/+retention/ LISTs with a second manifest at by_id/<vol>/HEAD (single overwritten object; named like events/<name>/HEAD for the same "leading edge of activity" role) for the post-snapshot delta over P2's LATEST anchor. Reaper folded into the per-volume tick loop → sole sequential writer; HEAD PUT per drain tick on any state change, one GET reads any-size HEAD, no lock/chain. Cross-coordinator only (owner uses local index/); derived/unsigned over a per-segment-signed substrate |
| docs/design-force-release-fencing.md | Proposed: split-brain safety for volume claim --force. The claimant's basis is the owner's own published snapshot, already pinned by the owner's GC floor; the fence is the volume-rw liveness predicate (forced CAS kills the zombie's credential renewals); the tail re-own copy retries through HEAD re-resolution |
| docs/design-consistency-surface.md | Exploration: which Elide operations require strong consistency vs. which tolerate eventual; failure-mode walkthrough; sketch of a two-bucket split (small strongly-consistent coordination bucket, large eventually-consistent data bucket) |
| docs/portable-live-volume-plan.md | Plan: phased implementation of portable live volumes (foundations → schema → lifecycle verbs → claim --force recovery → CLI unification → tests/docs). Fresh-bucket-only; clean break for volume remote |
| docs/design-tigris-native.md | Exploration: what Elide looks like if designed Tigris-native (bucket snapshots, forks, versioning as first-class primitives) rather than as a portable S3 consumer |
| docs/design-iam-key-model.md | Superseded by design-mint.md (coordinator in-process path + [iam] section removed); retained for the key-inventory/policy-scoping rationale mint's roles inherit. Per-volume IAM key model for Tigris-style backends. Four key classes (admin, one writer per coordinator, one peer-fetch per coordinator, per-volume RO with ancestor-inclusive policies). IAM-layer invariants: events/ is append-only and coordinators/ is immutable (no key holds Delete). Identity via policy names (Tigris doesn't tag keys); host-local reconciliation; dead-host orphans need operator action |
| docs/design-auth-model.md | Auth model and principles: isolation guarantees on a shared-uid host (what macaroons enforce and don't); the settled principle that operator authorisation gates S3 writes not destructive verbs. Operator IPC verbs are ungated in the codebase today; the concrete central-auth-service design lives in design-auth-service.md. References architecture.md for the keyed-BLAKE3 macaroon construction shared with volume macaroons |
| docs/design-auth-service.md | Proposed: central auth service issues per-op discharges; mint is the sole primary-macaroon issuer. HMAC throughout (keyed-BLAKE3, same construction as volume macaroons) — the auth service is a third-party authority discharging a TPC embedded in mint-issued primaries. Multi-tenant by construction (mandatory OrgId; mint is per-org and the org-identity broker; coord enrolls to a mint, not direct to auth service). Operator login produces a session discharge; CLI trades it for per-op discharges at IPC time. Mint-as-auth packaging for dev/test/demo via demo-enabled = true. Concrete /v1/ API surface for login, discharge, mint enrollment, and coord enrollment. Clean-break migration from PoC |
| docs/design-remote-coord-ipc.md | Exploration: running operator IPC verbs (volume snapshot, etc.) against a coord on another host. Decision: route cli ↔ local coord ↔ remote coord rather than cli ↔ remote coord direct — keeps the CLI unauthenticated, places the operator session at coord (consistent with coord-as-clearer + per-forward attenuation in design-auth-service.md), and reuses the existing peer channel from design-peer-segment-fetch.md. Assumes a coord is always running on the caller's host |
| docs/design-coord-http-ipc.md | Proposed: pivot coord inbound socket (CLI ↔ coord) from NDJSON+UDS to HTTP+UDS, matching mint and the auth-service. RPC-style POST /v1/, JSON bodies, HTTP status codes replace IpcErrorKind, Authorization: Macaroon <bundle>, 401 + WWW-Authenticate: Macaroon for the discharge challenge, chunked NDJSON for streaming verbs. Volume control socket (coord ↔ volume) is out of scope and stays NDJSON+UDS |
| docs/design-mint.md | Proposed (initial draft, supersedes PR #354): mint, a standalone macaroon-authenticated STS-shaped service for Tigris. Admin credentials live off-host; mint vends short-lived scoped keypairs against role configs whose IAM policy templates are rendered from macaroon caveats at issuance. Three deployment shapes (self-hosted / central custodial / central proxy). Elide's four-key IAM model collapses to three roles (coord-rw, volume-ro, peer-fetch) — peer-fetch becomes per-request, eliminating the mid-path wildcard requirement |
| docs/design-mint-template-seal.md | Implemented: signed seal (_mint/templates/seal.json, MAC'd under the keyring) over role TOML blocks + BLAKE3 hashes of policy template files. mint seal stages a pending seal locally; mint serve publishes it on next startup with semantic-equality reconcile against the bucket. Startup verifies and refuses-closed on mismatch; in-memory cache for the process lifetime. Templates themselves stay operator-provisioned out-of-band; the seal carries only their hashes |
| docs/design-mint-volume-attestation.md | Exploration: make mint's per-volume req.volume attested rather than self-asserted, closing the gap where a coordinator can request RW credentials to any volume's prefix. A third-party caveat in the credential is discharged by an attestation coordinator co-located with mint; coord A proves possession of the live volume's volume.key against the public meta/<vol>.pub, and coord B derives the read set (self RW + per-ancestor RO) from the signed meta/<vol>.provenance lineage and names/<name> currency. coord B is a pure function over public signed state (holds no secret); mint stays volume-agnostic and binds the principal via cnf. Generalises to the ancestor chain with one possession proof anchored at the live volume |
| docs/integrations.md | Integration targets: Docker, Firecracker, Cloud Hypervisor, Kubernetes — architecture, sequencing, open work |
| docs/design-oci-export.md | Exploration: squashed OCI export, dual publish via referrers, elide-snapshotter for containerd |
| docs/actor-offload-plan.md | Plan: offload heavy maintenance work off the volume actor to isolate write tail latency |
| docs/promote-offload-plan.md | Plan: offload WAL promotion onto the worker thread (first step of actor-offload-plan) |
| docs/promote-segment-offload-plan.md | Plan: offload promote_segment IPC handler to the worker thread (step 6 of actor-offload-plan) |
| docs/design-ublk-transport.md | Design: ublk as the host-local transport — multi-queue async handler, USER_RECOVERY_REISSUE crash recovery |
| docs/design-ublk-shutdown-park.md | Design (proposed): shutdown leaves ublk device QUIESCED for recovery; deletion becomes an explicit verb. Makes stop → start reliable while a filesystem is still mounted |
| docs/design-peer-segment-fetch.md | Exploration: opportunistic LAN peer-fetch tier in front of S3 for index/body bytes. Targets cross-host handoff (release → claim) and large-fleet image pull. URL space mirrors S3 paths; auth mirrors per-volume IAM prefix scope. |
| docs/peer-segment-fetch-v1-plan.md | Plan: v1 implementation of peer-fetch — .idx-only, coordinator-driven, opt-in via coordinator config. New elide-peer-fetch crate. Decision criteria for whether to extend to body fetch. |
| docs/coordinator-mint-enrollment-plan-v2.md | Plan: coordinator-side mint enrollment — one blocking elide coord enroll (A → wait approval → exchange fan-out) writing credentials/<role>, plus a hard [mint] startup gate. Threads the three operator-discharge gates (enroll / approve / exchange); needs a logged-in operator session; ticket in-memory only; bootstrap operator-supplied not config. Supersedes the v1 plan. |
| docs/list-elimination-plan.md | Plan: remove all s3:ListBucket use from the coordinator runtime — replace each per-volume/event prefix LIST with a deterministic GET (latest-pointer or maintained index), then delete the grant from coord-rw. Resolves design-mint open #12. Phased P1–P5; no-LIST reconcile story. |
| docs/design-deployment-modes.md | Proposed: three deployment modes (coord run, coord start/stop, systemd) with a uniform operator surface. Internal VolumeLauncher trait abstracts direct-fork from systemd transient units (elide-vol-<ulid>.service); each volume in its own cgroup so default KillMode=control-group on the coord unit lets volumes outlive coordinator restart. Avoids the KillMode=process path explicitly discouraged in systemd.kill(5) |
| docs/design-domain-store.md | Proposed: replace Arc<dyn ObjectStore> (215 occurrences across 23 files in elide-coordinator/src/) with object-typed handles vended by role — NameClaims, EventJournal, OwnIdentity, VolumeData (segments/snapshots/head/metadata sub-accessors), ControlPlaneReader. S3 key layout collapses into one module so wrong-prefix keys become unconstructable; events/ append-only invariant becomes type-level (no delete method). Migrates per-object alongside ScopedStores, no atomic swap |

Dated waypoints: each file summarises major changes, bug fixes, and remaining work relative to the previous status. Latest first.

| Date | Document |
|---|---|
| 2026-06-10 | docs/status-2026-06-10.md |
| 2026-06-02 | docs/status-2026-06-02.md |
| 2026-05-26 | docs/status-2026-05-26.md |
| 2026-05-19 | docs/status-2026-05-19.md |
| 2026-05-10 | docs/status-2026-05-10.md |
| 2026-04-27 | docs/status-2026-04-27.md |
| 2026-04-20 | docs/status-2026-04-20.md |
| 2026-04-09 | docs/status-2026-04-09.md |
| 2026-03-30 | docs/status-2026-03-30.md |

Start with docs/overview.md.

Two lanes run unconditionally on every pull request and every push to main:

- ci: build, clippy, and userspace tests.
- ci-kernel: kernel-dependent features (ublk::) exercised inside a nested KVM VM on the GitHub runner. The host builds the test binary; the guest runs it via a 9p share. Blocking, not advisory.