Skip to content

fix(delegate): back-fill prior secret versions for newly-joined members#245

Merged
sanity merged 2 commits into
mainfrom
worktree-fix-241-private-room-backfill
May 14, 2026
Merged

fix(delegate): back-fill prior secret versions for newly-joined members#245
sanity merged 2 commits into
mainfrom
worktree-fix-241-private-room-backfill

Conversation

@sanity
Copy link
Copy Markdown
Contributor

@sanity sanity commented May 14, 2026

Bug report

Ivvor on production River post-#242 deploy (2026-05-14):

Private room not working right when I invite another node. I see stuff like [Encrypted message - secret v1 not available (have: [0])] on owner node and [Encrypted message - secret v0 not available (have: [1])] on invited user.

[later] now i have a point where the owner can see everything, and the invited user can see new messages, but the room name is encrypted, so is the nickname of the owner, and the initial bunch of messages.

Root cause

When a new member joins a private room, the owner's chat-delegate rotates the secret to current_version + 1 and emits encrypted_secrets only at the new version. Every prior version is missing for the newly-joined member, so they cannot decrypt the room name (sealed at v0 at room creation), the owner's nickname (sealed at v0 when set), or any messages sent before they joined.

The owner has v0 from room creation (UI emits owner's self-encrypted secret then) plus v_n from each rotation, so the owner sees everything. The bug is asymmetric: only the newly-joined member's view is broken.

Fix

Extracted the rotation per-member loop into a pure helper build_rotation_encrypted_secrets and taught it the distinction:

  • Continuing members (already in cached previous_members, plus the owner): only emit new_version. They already have older entries from the rotation that admitted them.
  • Newly-joined members (in current_members but NOT in previous_members): emit every version [0..=new_version]. The older secrets are re-derived deterministically from the owner's signing seed via derive_room_secret(seed, owner_vk, v), so the back-filled blobs are byte-identical to whatever the version was originally produced as.

The owner is always continuing — at v0 the owner emits their own encrypted_secret from the UI; on every subsequent rotation this helper emits one fresh entry at the new version. So the owner never needs back-fill.

Tests

Two new regression tests:

  • rotation_backfills_prior_versions_for_newly_joined_members — owner + alice (continuing) + bob (newly joining); rotation to v2 must emit exactly {owner@v2, alice@v2, bob@v0, bob@v1, bob@v2}. Verifies signatures and byte-identity on bob's back-filled v0 blob.
  • rotation_backfills_for_all_members_on_first_rotation — empty previous_members, every member newly-joined.

22/22 chat-delegate tests pass (was 20).

Deploy footprint

Recovery for users currently stuck

When this lands and Ivvor reloads, the owner's delegate runs the rotation pipeline again on the next state notification (any member-set change, or even just a reload causing the cached member_set to be re-evaluated), emits the back-filled secrets, and Ivvor's UI decrypts history correctly.

For private rooms where membership has stabilised, the rotation pipeline won't naturally re-fire (member set unchanged → no rotation). The owner can trigger a manual rotation by editing the room config or banning + unbanning a member. Both paths go through UI-side rotate_secret which today still does the "new version only" thing — fixing that path too is an obvious follow-up.

Test plan

  • CI green (wasm-sync guard, delegate-migration check, build, playwright)
  • Local Ivvor-scenario smoke test: fresh test-contract publish under separate keypair, owner creates private room + messages, second profile accepts invite, verify second profile sees room name + owner nickname + history
  • Merge + cargo make publish-river to production
  • Confirm Ivvor's existing room recovers (with manual ban+unban trigger if needed)

Closes the bug Ivvor reported on top of #242. Tracking issue #243 (Phase 2 DM UI) unchanged.

[AI-assisted - Claude]

Bug report (Ivvor, 2026-05-14, production River post-#242 deploy):

> Private room not working right when I invite another node. I see stuff
> like [Encrypted message - secret v1 not available (have: [0])] on
> owner node and [Encrypted message - secret v0 not available (have: [1])]
> on invited user
>
> [later] now i have a point where the owner can see everything, and the
> invited user can see new messages, but the room name is encrypted, so
> is the nickname of the owner, and the initial bunch of messages

## Root cause

When a new member joins a private room, the owner's chat-delegate
rotates the room secret to `current_version + 1` and emits
`encrypted_secrets` only at the new version. Every prior version
(v0, v1, ... new_version-1) is missing for the newly-joined member,
so they cannot decrypt:

* the room name (sealed at v0 when the owner created the room)
* the owner's nickname (sealed at v0 when set)
* any messages sent before they joined (sealed at older versions)

Owner-side appears unaffected because the owner has v0 from room
creation (UI emits it then) plus v_n from each rotation. Asymmetric
symptom on Ivvor's side: only the new version visible, history opaque.

## Fix

Extract the rotation per-member loop into a pure helper
`build_rotation_encrypted_secrets` and teach it the distinction:

* **Continuing members** (already in `previous_members`, plus the
  owner): only emit `new_version`. They already have older entries
  from the rotation that admitted them.

* **Newly-joined members** (in `current_members` but NOT in
  `previous_members`): emit every version `[0..=new_version]`. The
  older secrets are re-derived deterministically from the owner's
  signing seed via `derive_room_secret(seed, owner_vk, v)`, so the
  back-filled blobs are byte-identical to whatever the version was
  originally produced as.

The owner is always treated as continuing: at v0 the owner's own
`encrypted_secret` is emitted by the UI at room creation; on every
subsequent rotation the helper emits one fresh entry at the new
version. So the owner never needs back-fill.

## Tests

Two new regression tests in
`delegates/chat-delegate/src/subscription/tests.rs`:

* `rotation_backfills_prior_versions_for_newly_joined_members`: owner
  + alice (continuing, in `previous_members`) + bob (newly joining
  now); rotation to v2 must emit exactly `{owner@v2, alice@v2,
  bob@v0, bob@v1, bob@v2}`. Verifies owner-signature validity on each
  emitted blob, plus byte-identity on bob's back-filled v0 blob
  (re-running deterministic ECIES with the same `(secret_v0, bob_vk)`
  inputs must reproduce the exact wire bytes — pins the
  per-version secret derivation).

* `rotation_backfills_for_all_members_on_first_rotation`: empty
  `previous_members`, every current member is newly-joined and must
  receive every version up through `new_version`.

22/22 chat-delegate tests pass (was 20).

## Deploy

The delegate WASM byte-changes; V17 migration entry recorded in
`legacy_delegates.toml` so existing users transparently migrate from
the post-#242 delegate (code_hash `123f732d…` -> delegate_key
`540b2887…`) to the new one.

`room_contract.wasm` is BYTE-IDENTICAL to #242. So:

* No room contract migration ceremony needed
* No riverctl republish needed (riverctl only embeds room_contract.wasm)
* No gkapi/secrets/quickstart-URL updates needed
* Production deploy is just `cargo make publish-river`

Recovery for existing users on the buggy delegate: when this lands and
they reload the UI, the owner's delegate runs its rotation pipeline
again on the next member-set change OR existing notification, emits
the back-filled secrets, and the previously-stranded member is now
able to decrypt history.

For private rooms where membership has stabilised, the rotation
pipeline won't naturally re-fire (member set unchanged → no
rotation). In that case the owner can trigger a manual rotation by
editing the room config or banning + unbanning a member; both paths
go through UI-side `rotate_secret` which today still does the
"new version only" thing — fixing that path too is the obvious
follow-up.

[AI-assisted - Claude]

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses all five PR #245 review findings.

## What was wrong with the first iteration

1. **v0 mismatch (skeptical, big-picture, codex all flagged).** The room's
   v0 secret is generated RANDOMLY by the UI in
   `ui/src/room_data.rs::create_new_room_with_name` via
   `generate_room_secret()`. The first iteration of this fix back-filled
   by RE-DERIVING via `derive_room_secret(seed, owner_vk, 0)` —
   producing a different secret. The newly-joined member would receive
   bytes that decrypted to a key unrelated to what the room name was
   sealed under. **The bug was not actually fixed.**

2. **Cache-based dedup (codex flagged).** Continuing-vs-newly-joined was
   decided from the delegate-local member-set cache. On a fresh
   delegate / restart / webapp reinstall, the cache is missing and the
   helper treats every member as newly-joined — emitting duplicate
   `(member, version)` pairs that the room contract rejects via
   `RoomSecretsV1::apply_delta`'s dedup guard, wedging rotation
   permanently.

3. **Tautological test (testing, big-picture flagged).** The regression
   test asserted byte-identity against `derive_room_secret(..., 0)` —
   the same function the broken helper called — so it would pass for
   both the broken and the correct behaviour. It never tested that
   the back-fill matched the actual on-the-wire v0.

4. **Wrong PR number in V17 description (code-first flagged).**

5. **Missing edge-case tests (testing, code-first flagged):** multiple
   simultaneous new joiners; banned-then-readmitted member.

## What this iteration does

* **Recover prior-version secrets from state, not re-derive.** The helper
  scans `new_state.secrets.encrypted_secrets` for the owner's blob at
  each prior version `v < new_version`, ECIES-decrypts it with the
  owner's signing key (which the delegate already holds for rotation),
  and recovers the actual secret bytes the room is using. New
  `decrypt_secret_from_member_blob_raw` wrapper in `common/src/ecies.rs`
  takes raw `[u8; 32]` ephemeral-pubkey bytes so the delegate doesn't
  need a direct `x25519-dalek` dep.

* **Dedup by what's already in state.** Build a `(member, version)` set
  from `existing_encrypted_secrets` and skip any pair already present.
  Continuing-vs-newly-joined is now an emergent property: a continuing
  member already has v0..v_prev entries, so only v_new is emitted; a
  newly-joined member has nothing, so they get the full back-fill.
  Works correctly even when the delegate-local cache is missing.

* **Tests pin REAL behaviour, not self-consistency:**
  - `backfill_uses_real_v0_recovered_from_owners_encrypted_secret`:
    state contains a v0 blob with random bytes; helper produces a
    back-filled v0 for bob; bob's signing key MUST decrypt the
    back-fill to the same random bytes. Asserts what the user cares
    about: bob can read the room name.
  - `backfill_handles_multiple_simultaneous_new_members`: two new
    joiners in one rotation, both back-filled, both decrypt the same
    random v0.
  - `backfill_dedups_against_state_for_continuing_members`: alice has
    v0+v1 in state; helper only emits v2 for her, no duplicates.
  - `backfill_handles_readmitted_member`: alice was removed and her
    entries cleaned up; on readmission she gets a full back-fill.

24/24 chat-delegate tests pass (was 22).

## Deploy footprint unchanged from first iteration

* Delegate WASM changes: new b3sum
  `6229c2a136b30a7868c07099efdea86bee260734339d872aa1e68177733b5fdd`.
* `room_contract.wasm` byte-identical to #242 — no room migration
  ceremony, no riverctl republish.
* V17 description text corrected (was "#243", now "#245").

[AI-assisted - Claude]

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@sanity
Copy link
Copy Markdown
Contributor Author

sanity commented May 14, 2026

Rewrite addressing all 5 reviews

Force-push of de7c962a on top of the original commit. Net: helper rewritten, four reviews addressed, original tautological test replaced.

Critical issues fixed

1. v0 mismatch (skeptical, big-picture, codex) — the original fix back-filled by re-deriving derive_room_secret(seed, owner_vk, 0), but production River creates v0 randomly via generate_room_secret() in ui/src/room_data.rs::create_new_room_with_name. So the fix didn't actually let new members read the room name — same user-visible bug.

The rewrite recovers prior secrets by ECIES-decrypting the owner's existing encrypted_secret-at-v from the room state, using the owner's signing key (which the delegate already holds for rotation signing). The decrypted plaintext IS the actual secret the room is using, so re-encrypting it for the new member produces bytes they can decrypt to recover the real v0.

New decrypt_secret_from_member_blob_raw wrapper in common/src/ecies.rs takes raw 32-byte ephemeral-pubkey bytes so the delegate doesn't need a direct x25519-dalek dep.

2. Cache-based dedup (codex) — the original used the delegate-local member-set cache to decide continuing-vs-newly-joined. On a fresh delegate / restart / webapp reinstall, that cache is missing → every member treated as newly-joined → duplicate (member, version) pairs → contract rejects via RoomSecretsV1::apply_delta dedup → rotation wedged permanently.

The rewrite derives the decision directly from new_state.secrets.encrypted_secrets: skip any (member, version) pair already in state. Continuing-vs-newly-joined is now emergent — a continuing member already has v0..v_prev, so only v_new is emitted; a newly-joined member has nothing, so they get the full back-fill. Works correctly even with no cache.

3. Tautological test (testing, big-picture) — the original test asserted byte-identity against derive_room_secret(..., 0), the same function the broken helper called. It would pass for both behaviours. Replaced with backfill_uses_real_v0_recovered_from_owners_encrypted_secret: state contains a v0 blob with random bytes; helper produces a back-filled v0 for bob; bob's signing key MUST decrypt back to the same random bytes. That's the property the user cares about.

Edge-case tests added

  • backfill_handles_multiple_simultaneous_new_members (testing)
  • backfill_dedups_against_state_for_continuing_members (codex)
  • backfill_handles_readmitted_member (code-first)

24/24 chat-delegate tests pass (was 22). All four new tests do real end-to-end decryption with the recipient's signing key, not byte-identity self-checks.

Minor

Not in this PR

  • UI-side rotate_secret in ui/src/room_data.rs:465 (big-picture, code-first finding) — same bug class. Not pulled in here because the recovery path for Ivvor-style stranded users is: owner triggers a delegate rotation by changing the member set (invite a fresh member, or ban+reinvite). Delegate's now-correct logic back-fills. UI rotate_secret is only hit on direct-from-edit / ban-without-readmission paths and doesn't add new members, so it's not on the recovery path. Filing as follow-up.

  • State-size bloat caveat (skeptical M2) — back-fill scales O(rotations × new_members) per join. Documented in the helper's docstring; per the big-picture reviewer's calculation that's ~22.5 KB per join at 100 rotations, fine in practice. No cap added.

  • WASM end-to-end harness (testing) — still gated on a MockDelegateCtx upstream in freenet-stdlib. Tracked outside this PR.

[AI-assisted - Claude]

@sanity sanity merged commit 75d10bd into main May 14, 2026
6 checks passed
sanity added a commit that referenced this pull request May 15, 2026
…#248)

* fix(ui): leave room sticks across reloads via removed_rooms tombstone

Fixes #247 (Ivvor's report 2026-05-14 17:03 — "every time i leave the
private room on the second node it's coming back after a refresh a
little while later").

## Root cause

`Confirm Leave` in `ui/src/components/room_list/edit_room_modal.rs:283`
did:

    ROOMS.write().map.remove(&room_vk);
    // … save_rooms_to_delegate()

On the next page load the UI re-fetches `rooms_data` from the current
delegate (correctly without the room) AND `fire_legacy_migration_request()`
probes all 17 entries in `legacy_delegates.toml`. Some legacy delegate's
on-disk `rooms_data` still contained the room — that data was never
purged when the delegate was superseded. `Rooms::merge` is purely
additive (`ui/src/room_data.rs:881`), so the room came back.

## Fix

Add a `removed_rooms: HashSet<VerifyingKey>` tombstone set to the
serialised `Rooms` struct (with `#[serde(default)]` for backward
compatibility). `Rooms::merge` now:

1. Takes the union of `removed_rooms` from both sides first.
2. Drops any entry from `self.map` whose key is in the union
   (defensive — should not happen if leave is the only mutation
   path, but enforced so the invariant holds).
3. Filters incoming entries from `other.map` against the union.

New `Rooms::leave_room(vk)` helper does both `map.remove(vk)` AND
`removed_rooms.insert(vk)` atomically. The Confirm-Leave button calls
this. Invitation acceptance (`get_response.rs` after PUT) clears the
tombstone with `rooms.removed_rooms.remove(&owner_vk)` so explicit
rejoins work.

## Tests

`leave_room_tombstone_prevents_merge_re_adding` — full scenario from
the report: room in map → `leave_room` → legacy merge with same room
present → must not re-add. Asserts both `map` and `removed_rooms`
invariants.

`merge_unions_removed_rooms_tombstones` — if either side has the
tombstone, it's honoured; map entries on the receiver are dropped.

14/14 UI `room_data::tests` pass (was 12).

## Deploy footprint

UI-only change. `chat_delegate.wasm` and `room_contract.wasm` are
BYTE-IDENTICAL to #245. No migration entry needed. No riverctl
republish. Deploy is `cargo make publish-river`.

## Recovery for currently-stuck users

After this lands and users reload, the next time they leave a room
the tombstone is recorded. Their previous leave attempts (without
tombstone) won't survive — they'll need to leave once more after
the upgrade. The "rejoining via invitation clears the tombstone"
path means this isn't a one-way door — explicit rejoin still works.

[AI-assisted - Claude]

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fixup: filter subscription/CURRENT_ROOM by tombstone; close re-add paths

Review findings on PR #248:

* **Skeptical H1**: response_handler.rs collected room keys and owned-
  rooms-for-subscription from `loaded_rooms.map` BEFORE the merge
  filter ran, then subscribed/synced every key. That re-introduced
  tombstoned rooms via the subscribe + SYNC_INFO path. Now filtered
  against a union of incoming and current `removed_rooms`.

* **Skeptical H2**: `current_room_key` restoration didn't check the
  tombstone. If the user left the room that was their current room,
  reload restored a CURRENT_ROOM pointing at a nonexistent map entry.
  Now skipped when tombstoned.

* **Code-first**: members.rs:580 (room import via export blob)
  inserted into ROOMS without clearing the tombstone. If a user
  imported an export of a room they left, the room came back briefly
  but the next merge dropped it again. Now clears the tombstone on
  import, matching get_response.rs's invitation-accept path.

* **Code-first / serde round-trip**: added two tests —
  `rooms_deserialises_pre_247_blob_with_default_removed_rooms` pins
  the `#[serde(default)]` backward-compat behaviour for blobs serialised
  before this PR; `rooms_round_trip_preserves_removed_rooms_tombstone`
  pins the write-side wire format. Replaces the original ad-hoc test
  that depended on serde_json (which isn't a dep here).

* **Code-first / explicit rejoin test**: added
  `cleared_tombstone_allows_merge_to_re_add` to lock in the
  invitation-accept / import-identity rejoin semantics.

* **Doc comment**: corrected the rustdoc on `removed_rooms` — it was
  documented as "monotonically grows" but the code clears entries on
  explicit rejoin. Now accurate, with notes on the multi-device
  divergence behaviour.

17/17 UI `room_data::tests` pass (was 14).

Deferred follow-ups (not in this PR, captured in the issue tracker):

* Codex P2: stale tombstone re-applied after rejoin in a divergent
  multi-device scenario. Requires timestamped tombstones or a
  rejoin-counter; out of scope for tonight's fix.
* Skeptical M1: removed_rooms GC. Set is small in practice (one entry
  per leave); a future TTL/eviction pass can land separately.
* Skeptical M2: cross-version device divergence (old build drops
  removed_rooms on save). Mitigated by `#[serde(default)]` and a
  "leave once more after the upgrade" recovery note in the PR body.
* Skeptical M4: migrated_rooms not filtered. Minor.
* Skeptical L1: PartialEq ignores removed_rooms. Latent footgun.
* Skeptical L3: stale `.syncthing.*.tmp` files in the worktree.

[AI-assisted - Claude]

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fixup: address #249 quick-wins inline (migrated_rooms, PartialEq, gitignore)

Per the "don't accumulate follow-ups while fixing bugs" principle —
folding the small mechanical items from #249 into this PR rather than
deferring:

* **#2 / Skeptical M4**: `Rooms::leave_room` now also retains
  `migrated_rooms` to drop any pending upgrade-pointer entry for the
  left room. Previously, if a room had been recorded for upgrade and
  the user then left, the upgrade pointer would still be sent.

* **#3 / Skeptical L1**: `PartialEq for Rooms` now compares
  `removed_rooms` too. Two `Rooms` differing only in tombstones used
  to compare equal — a latent footgun if any code uses equality to
  short-circuit a save.

* **#6 / Skeptical L3**: `.gitignore` excludes `**/.syncthing.*.tmp`
  so syncthing's in-flight transfer markers don't show up as
  untracked files.

Three items remain in #249, all genuinely deferred:

* **#1 Timestamped tombstones (Codex P2)**: real but requires custom
  serde for backward compat with the HashSet we're shipping. Scenario
  is rare (River doesn't have automatic multi-device sync; single
  delegate writer) and user-recoverable (leave again). Not paid for.
* **#4 `removed_rooms` GC/TTL**: hypothetical for heavy users; defer
  until there's evidence.
* **#5 Cross-version divergence**: mostly moots itself as users
  upgrade. Schema-versioning is an architectural change for later.

17/17 `room_data::tests` still pass.

[AI-assisted - Claude]

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant