fix(delegate): back-fill prior secret versions for newly-joined members#245
Conversation
Bug report (Ivvor, 2026-05-14, production River post-#242 deploy): > Private room not working right when I invite another node. I see stuff > like [Encrypted message - secret v1 not available (have: [0])] on > owner node and [Encrypted message - secret v0 not available (have: [1])] > on invited user > > [later] now i have a point where the owner can see everything, and the > invited user can see new messages, but the room name is encrypted, so > is the nickname of the owner, and the initial bunch of messages ## Root cause When a new member joins a private room, the owner's chat-delegate rotates the room secret to `current_version + 1` and emits `encrypted_secrets` only at the new version. Every prior version (v0, v1, ... new_version-1) is missing for the newly-joined member, so they cannot decrypt: * the room name (sealed at v0 when the owner created the room) * the owner's nickname (sealed at v0 when set) * any messages sent before they joined (sealed at older versions) Owner-side appears unaffected because the owner has v0 from room creation (UI emits it then) plus v_n from each rotation. Asymmetric symptom on Ivvor's side: only the new version visible, history opaque. ## Fix Extract the rotation per-member loop into a pure helper `build_rotation_encrypted_secrets` and teach it the distinction: * **Continuing members** (already in `previous_members`, plus the owner): only emit `new_version`. They already have older entries from the rotation that admitted them. * **Newly-joined members** (in `current_members` but NOT in `previous_members`): emit every version `[0..=new_version]`. The older secrets are re-derived deterministically from the owner's signing seed via `derive_room_secret(seed, owner_vk, v)`, so the back-filled blobs are byte-identical to whatever the version was originally produced as. The owner is always treated as continuing: at v0 the owner's own `encrypted_secret` is emitted by the UI at room creation; on every subsequent rotation the helper emits one fresh entry at the new version. So the owner never needs back-fill. ## Tests Two new regression tests in `delegates/chat-delegate/src/subscription/tests.rs`: * `rotation_backfills_prior_versions_for_newly_joined_members`: owner + alice (continuing, in `previous_members`) + bob (newly joining now); rotation to v2 must emit exactly `{owner@v2, alice@v2, bob@v0, bob@v1, bob@v2}`. Verifies owner-signature validity on each emitted blob, plus byte-identity on bob's back-filled v0 blob (re-running deterministic ECIES with the same `(secret_v0, bob_vk)` inputs must reproduce the exact wire bytes — pins the per-version secret derivation). * `rotation_backfills_for_all_members_on_first_rotation`: empty `previous_members`, every current member is newly-joined and must receive every version up through `new_version`. 22/22 chat-delegate tests pass (was 20). ## Deploy The delegate WASM byte-changes; V17 migration entry recorded in `legacy_delegates.toml` so existing users transparently migrate from the post-#242 delegate (code_hash `123f732d…` -> delegate_key `540b2887…`) to the new one. `room_contract.wasm` is BYTE-IDENTICAL to #242. So: * No room contract migration ceremony needed * No riverctl republish needed (riverctl only embeds room_contract.wasm) * No gkapi/secrets/quickstart-URL updates needed * Production deploy is just `cargo make publish-river` Recovery for existing users on the buggy delegate: when this lands and they reload the UI, the owner's delegate runs its rotation pipeline again on the next member-set change OR existing notification, emits the back-filled secrets, and the previously-stranded member is now able to decrypt history. For private rooms where membership has stabilised, the rotation pipeline won't naturally re-fire (member set unchanged → no rotation). In that case the owner can trigger a manual rotation by editing the room config or banning + unbanning a member; both paths go through UI-side `rotate_secret` which today still does the "new version only" thing — fixing that path too is the obvious follow-up. [AI-assisted - Claude] Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses all five PR #245 review findings. ## What was wrong with the first iteration 1. **v0 mismatch (skeptical, big-picture, codex all flagged).** The room's v0 secret is generated RANDOMLY by the UI in `ui/src/room_data.rs::create_new_room_with_name` via `generate_room_secret()`. The first iteration of this fix back-filled by RE-DERIVING via `derive_room_secret(seed, owner_vk, 0)` — producing a different secret. The newly-joined member would receive bytes that decrypted to a key unrelated to what the room name was sealed under. **The bug was not actually fixed.** 2. **Cache-based dedup (codex flagged).** Continuing-vs-newly-joined was decided from the delegate-local member-set cache. On a fresh delegate / restart / webapp reinstall, the cache is missing and the helper treats every member as newly-joined — emitting duplicate `(member, version)` pairs that the room contract rejects via `RoomSecretsV1::apply_delta`'s dedup guard, wedging rotation permanently. 3. **Tautological test (testing, big-picture flagged).** The regression test asserted byte-identity against `derive_room_secret(..., 0)` — the same function the broken helper called — so it would pass for both the broken and the correct behaviour. It never tested that the back-fill matched the actual on-the-wire v0. 4. **Wrong PR number in V17 description (code-first flagged).** 5. **Missing edge-case tests (testing, code-first flagged):** multiple simultaneous new joiners; banned-then-readmitted member. ## What this iteration does * **Recover prior-version secrets from state, not re-derive.** The helper scans `new_state.secrets.encrypted_secrets` for the owner's blob at each prior version `v < new_version`, ECIES-decrypts it with the owner's signing key (which the delegate already holds for rotation), and recovers the actual secret bytes the room is using. New `decrypt_secret_from_member_blob_raw` wrapper in `common/src/ecies.rs` takes raw `[u8; 32]` ephemeral-pubkey bytes so the delegate doesn't need a direct `x25519-dalek` dep. * **Dedup by what's already in state.** Build a `(member, version)` set from `existing_encrypted_secrets` and skip any pair already present. Continuing-vs-newly-joined is now an emergent property: a continuing member already has v0..v_prev entries, so only v_new is emitted; a newly-joined member has nothing, so they get the full back-fill. Works correctly even when the delegate-local cache is missing. * **Tests pin REAL behaviour, not self-consistency:** - `backfill_uses_real_v0_recovered_from_owners_encrypted_secret`: state contains a v0 blob with random bytes; helper produces a back-filled v0 for bob; bob's signing key MUST decrypt the back-fill to the same random bytes. Asserts what the user cares about: bob can read the room name. - `backfill_handles_multiple_simultaneous_new_members`: two new joiners in one rotation, both back-filled, both decrypt the same random v0. - `backfill_dedups_against_state_for_continuing_members`: alice has v0+v1 in state; helper only emits v2 for her, no duplicates. - `backfill_handles_readmitted_member`: alice was removed and her entries cleaned up; on readmission she gets a full back-fill. 24/24 chat-delegate tests pass (was 22). ## Deploy footprint unchanged from first iteration * Delegate WASM changes: new b3sum `6229c2a136b30a7868c07099efdea86bee260734339d872aa1e68177733b5fdd`. * `room_contract.wasm` byte-identical to #242 — no room migration ceremony, no riverctl republish. * V17 description text corrected (was "#243", now "#245"). [AI-assisted - Claude] Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Rewrite addressing all 5 reviewsForce-push of Critical issues fixed1. v0 mismatch (skeptical, big-picture, codex) — the original fix back-filled by re-deriving The rewrite recovers prior secrets by ECIES-decrypting the owner's existing New 2. Cache-based dedup (codex) — the original used the delegate-local member-set cache to decide continuing-vs-newly-joined. On a fresh delegate / restart / webapp reinstall, that cache is missing → every member treated as newly-joined → duplicate The rewrite derives the decision directly from 3. Tautological test (testing, big-picture) — the original test asserted byte-identity against Edge-case tests added
24/24 chat-delegate tests pass (was 22). All four new tests do real end-to-end decryption with the recipient's signing key, not byte-identity self-checks. Minor
Not in this PR
[AI-assisted - Claude] |
…#248) * fix(ui): leave room sticks across reloads via removed_rooms tombstone Fixes #247 (Ivvor's report 2026-05-14 17:03 — "every time i leave the private room on the second node it's coming back after a refresh a little while later"). ## Root cause `Confirm Leave` in `ui/src/components/room_list/edit_room_modal.rs:283` did: ROOMS.write().map.remove(&room_vk); // … save_rooms_to_delegate() On the next page load the UI re-fetches `rooms_data` from the current delegate (correctly without the room) AND `fire_legacy_migration_request()` probes all 17 entries in `legacy_delegates.toml`. Some legacy delegate's on-disk `rooms_data` still contained the room — that data was never purged when the delegate was superseded. `Rooms::merge` is purely additive (`ui/src/room_data.rs:881`), so the room came back. ## Fix Add a `removed_rooms: HashSet<VerifyingKey>` tombstone set to the serialised `Rooms` struct (with `#[serde(default)]` for backward compatibility). `Rooms::merge` now: 1. Takes the union of `removed_rooms` from both sides first. 2. Drops any entry from `self.map` whose key is in the union (defensive — should not happen if leave is the only mutation path, but enforced so the invariant holds). 3. Filters incoming entries from `other.map` against the union. New `Rooms::leave_room(vk)` helper does both `map.remove(vk)` AND `removed_rooms.insert(vk)` atomically. The Confirm-Leave button calls this. Invitation acceptance (`get_response.rs` after PUT) clears the tombstone with `rooms.removed_rooms.remove(&owner_vk)` so explicit rejoins work. ## Tests `leave_room_tombstone_prevents_merge_re_adding` — full scenario from the report: room in map → `leave_room` → legacy merge with same room present → must not re-add. Asserts both `map` and `removed_rooms` invariants. `merge_unions_removed_rooms_tombstones` — if either side has the tombstone, it's honoured; map entries on the receiver are dropped. 14/14 UI `room_data::tests` pass (was 12). ## Deploy footprint UI-only change. `chat_delegate.wasm` and `room_contract.wasm` are BYTE-IDENTICAL to #245. No migration entry needed. No riverctl republish. Deploy is `cargo make publish-river`. ## Recovery for currently-stuck users After this lands and users reload, the next time they leave a room the tombstone is recorded. Their previous leave attempts (without tombstone) won't survive — they'll need to leave once more after the upgrade. The "rejoining via invitation clears the tombstone" path means this isn't a one-way door — explicit rejoin still works. [AI-assisted - Claude] Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fixup: filter subscription/CURRENT_ROOM by tombstone; close re-add paths Review findings on PR #248: * **Skeptical H1**: response_handler.rs collected room keys and owned- rooms-for-subscription from `loaded_rooms.map` BEFORE the merge filter ran, then subscribed/synced every key. That re-introduced tombstoned rooms via the subscribe + SYNC_INFO path. Now filtered against a union of incoming and current `removed_rooms`. * **Skeptical H2**: `current_room_key` restoration didn't check the tombstone. If the user left the room that was their current room, reload restored a CURRENT_ROOM pointing at a nonexistent map entry. Now skipped when tombstoned. * **Code-first**: members.rs:580 (room import via export blob) inserted into ROOMS without clearing the tombstone. If a user imported an export of a room they left, the room came back briefly but the next merge dropped it again. Now clears the tombstone on import, matching get_response.rs's invitation-accept path. * **Code-first / serde round-trip**: added two tests — `rooms_deserialises_pre_247_blob_with_default_removed_rooms` pins the `#[serde(default)]` backward-compat behaviour for blobs serialised before this PR; `rooms_round_trip_preserves_removed_rooms_tombstone` pins the write-side wire format. Replaces the original ad-hoc test that depended on serde_json (which isn't a dep here). * **Code-first / explicit rejoin test**: added `cleared_tombstone_allows_merge_to_re_add` to lock in the invitation-accept / import-identity rejoin semantics. * **Doc comment**: corrected the rustdoc on `removed_rooms` — it was documented as "monotonically grows" but the code clears entries on explicit rejoin. Now accurate, with notes on the multi-device divergence behaviour. 17/17 UI `room_data::tests` pass (was 14). Deferred follow-ups (not in this PR, captured in the issue tracker): * Codex P2: stale tombstone re-applied after rejoin in a divergent multi-device scenario. Requires timestamped tombstones or a rejoin-counter; out of scope for tonight's fix. * Skeptical M1: removed_rooms GC. Set is small in practice (one entry per leave); a future TTL/eviction pass can land separately. * Skeptical M2: cross-version device divergence (old build drops removed_rooms on save). Mitigated by `#[serde(default)]` and a "leave once more after the upgrade" recovery note in the PR body. * Skeptical M4: migrated_rooms not filtered. Minor. * Skeptical L1: PartialEq ignores removed_rooms. Latent footgun. * Skeptical L3: stale `.syncthing.*.tmp` files in the worktree. [AI-assisted - Claude] Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fixup: address #249 quick-wins inline (migrated_rooms, PartialEq, gitignore) Per the "don't accumulate follow-ups while fixing bugs" principle — folding the small mechanical items from #249 into this PR rather than deferring: * **#2 / Skeptical M4**: `Rooms::leave_room` now also retains `migrated_rooms` to drop any pending upgrade-pointer entry for the left room. Previously, if a room had been recorded for upgrade and the user then left, the upgrade pointer would still be sent. * **#3 / Skeptical L1**: `PartialEq for Rooms` now compares `removed_rooms` too. Two `Rooms` differing only in tombstones used to compare equal — a latent footgun if any code uses equality to short-circuit a save. * **#6 / Skeptical L3**: `.gitignore` excludes `**/.syncthing.*.tmp` so syncthing's in-flight transfer markers don't show up as untracked files. Three items remain in #249, all genuinely deferred: * **#1 Timestamped tombstones (Codex P2)**: real but requires custom serde for backward compat with the HashSet we're shipping. Scenario is rare (River doesn't have automatic multi-device sync; single delegate writer) and user-recoverable (leave again). Not paid for. * **#4 `removed_rooms` GC/TTL**: hypothetical for heavy users; defer until there's evidence. * **#5 Cross-version divergence**: mostly moots itself as users upgrade. Schema-versioning is an architectural change for later. 17/17 `room_data::tests` still pass. [AI-assisted - Claude] Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bug report
Ivvor on production River post-#242 deploy (2026-05-14):
Root cause
When a new member joins a private room, the owner's chat-delegate rotates the secret to
current_version + 1and emitsencrypted_secretsonly at the new version. Every prior version is missing for the newly-joined member, so they cannot decrypt the room name (sealed at v0 at room creation), the owner's nickname (sealed at v0 when set), or any messages sent before they joined.The owner has v0 from room creation (UI emits owner's self-encrypted secret then) plus v_n from each rotation, so the owner sees everything. The bug is asymmetric: only the newly-joined member's view is broken.
Fix
Extracted the rotation per-member loop into a pure helper
build_rotation_encrypted_secretsand taught it the distinction:previous_members, plus the owner): only emitnew_version. They already have older entries from the rotation that admitted them.current_membersbut NOT inprevious_members): emit every version[0..=new_version]. The older secrets are re-derived deterministically from the owner's signing seed viaderive_room_secret(seed, owner_vk, v), so the back-filled blobs are byte-identical to whatever the version was originally produced as.The owner is always continuing — at v0 the owner emits their own
encrypted_secretfrom the UI; on every subsequent rotation this helper emits one fresh entry at the new version. So the owner never needs back-fill.Tests
Two new regression tests:
rotation_backfills_prior_versions_for_newly_joined_members— owner + alice (continuing) + bob (newly joining); rotation to v2 must emit exactly{owner@v2, alice@v2, bob@v0, bob@v1, bob@v2}. Verifies signatures and byte-identity on bob's back-filled v0 blob.rotation_backfills_for_all_members_on_first_rotation— emptyprevious_members, every member newly-joined.22/22 chat-delegate tests pass (was 20).
Deploy footprint
legacy_delegates.tomlso existing users on the post-fix(delegate): remove wasm-bindgen contamination from committed WASM (#241) #242 delegate (code_hash123f732d…→ delegate_key540b2887…) transparently migrate.room_contract.wasmis byte-identical to fix(delegate): remove wasm-bindgen contamination from committed WASM (#241) #242 — no contract key change, no room migration ceremony, no riverctl republish, no gkapi/secrets/quickstart updates.cargo make publish-river.Recovery for users currently stuck
When this lands and Ivvor reloads, the owner's delegate runs the rotation pipeline again on the next state notification (any member-set change, or even just a reload causing the cached
member_setto be re-evaluated), emits the back-filled secrets, and Ivvor's UI decrypts history correctly.For private rooms where membership has stabilised, the rotation pipeline won't naturally re-fire (member set unchanged → no rotation). The owner can trigger a manual rotation by editing the room config or banning + unbanning a member. Both paths go through UI-side
rotate_secretwhich today still does the "new version only" thing — fixing that path too is an obvious follow-up.Test plan
cargo make publish-riverto productionCloses the bug Ivvor reported on top of #242. Tracking issue #243 (Phase 2 DM UI) unchanged.
[AI-assisted - Claude]