
Handle River subscribe retries when caches aren't ready #45

@sanity

Description


Summary

Large River harness runs occasionally fail during riverctl invite accept (and in the web UI) because the subscribe request exhausts its candidate peers before any peer has finished caching the freshly created room contract. Even with a 2 s pre-subscribe sleep (FREENET_TEST_PRE_SUBSCRIBE_DELAY_SECS), we still see errors such as "Ran out of, or haven't found any, caching peers for contract FTAY…/5eB6…". The failing transaction id varies between runs, but the pattern is the same: the operation records the early rejections in its skip list, caches become available a couple of seconds later, and a brand-new subscribe issued at that point would have succeeded.
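The skip-list behaviour described above can be illustrated with a deliberately simplified toy model. Everything here (Peer, SubscribeOp, try_at, the ready_at timestamps) is hypothetical and does not mirror Freenet's actual subscribe operation; it only shows why an op that remembers early rejections can run out of candidates, while a fresh op issued later succeeds.

```rust
use std::collections::HashSet;

/// Toy peer (hypothetical): `ready_at` is the simulated second at which it
/// finishes caching the new room contract.
struct Peer {
    id: u32,
    ready_at: u32,
}

/// Toy subscribe operation (hypothetical) that remembers peers which
/// rejected it and never asks them again for the lifetime of the op.
struct SubscribeOp {
    skip: HashSet<u32>,
}

impl SubscribeOp {
    fn new() -> Self {
        Self { skip: HashSet::new() }
    }

    /// Try every candidate not yet in the skip list at time `now`.
    /// Returns the id of a peer that has the contract cached, or `None`
    /// when all candidates are exhausted ("ran out of caching peers").
    fn try_at(&mut self, peers: &[Peer], now: u32) -> Option<u32> {
        for p in peers {
            if self.skip.contains(&p.id) {
                continue;
            }
            if p.ready_at <= now {
                return Some(p.id);
            }
            // Not cached yet: record the rejection for the rest of this op.
            self.skip.insert(p.id);
        }
        None
    }
}

fn main() {
    let peers = [Peer { id: 1, ready_at: 2 }, Peer { id: 2, ready_at: 2 }];

    let mut op = SubscribeOp::new();
    // t = 0: nobody has cached the contract, so both peers land in the skip list.
    println!("old op, t=0: {:?}", op.try_at(&peers, 0));
    // t = 3: caches are ready, but the same op has no candidates left.
    println!("old op, t=3: {:?}", op.try_at(&peers, 3));
    // t = 3: a fresh op with an empty skip list succeeds immediately.
    println!("new op, t=3: {:?}", SubscribeOp::new().try_at(&peers, 3));
}
```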

Proposal

Add client-side retry with backoff in both the River CLI (riverctl) and the web interface (ui/). If a subscribe fails with the "Ran out of, or haven't found any, caching peers" error, reissue the request on a bounded exponential backoff (e.g. wait 0 s, 2 s, 4 s, 8 s and 16 s before successive attempts, then give up). Each retry gets a fresh transaction id and an empty skip list, so once caches are available the next attempt can succeed. We can keep the pre-subscribe delay as a first line of defense, but shouldn't rely on it exclusively.
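A minimal sketch of the proposed retry loop, assuming the caller supplies a closure that builds and sends a brand-new subscribe request on every call (so each attempt gets a fresh transaction id and empty skip list). The names subscribe_with_retry, backoff_schedule and is_no_caching_peers are hypothetical, and detecting the failure by substring match on "caching peers" is an assumption; a typed error variant would be more robust if riverctl or ui/ exposes one.

```rust
use std::time::Duration;

/// Hypothetical classifier: treat the transient "Ran out of, or haven't found
/// any, caching peers" failure as retryable. Matching on message text is an
/// assumption, not an existing riverctl/ui API.
fn is_no_caching_peers(err: &str) -> bool {
    err.contains("caching peers")
}

/// Delay before each attempt: 0, base, 2*base, 4*base, ...
/// With attempts = 5 and base = 2 s this yields 0 s, 2 s, 4 s, 8 s, 16 s.
fn backoff_schedule(attempts: u32, base: Duration) -> Vec<Duration> {
    (0..attempts)
        .map(|i| if i == 0 { Duration::ZERO } else { base * 2u32.pow(i - 1) })
        .collect()
}

/// Run `attempt` once per schedule entry, sleeping first when the delay is
/// non-zero. Returns on the first success or on a non-retryable error;
/// otherwise returns the last "caching peers" error after the final attempt.
fn subscribe_with_retry<T, F, S>(schedule: &[Duration], mut sleep: S, mut attempt: F) -> Result<T, String>
where
    F: FnMut(u32) -> Result<T, String>,
    S: FnMut(Duration),
{
    let mut last_err = String::from("retry schedule was empty");
    for (i, &delay) in schedule.iter().enumerate() {
        if !delay.is_zero() {
            sleep(delay);
        }
        // The closure is expected to issue a brand-new subscribe request.
        match attempt(i as u32) {
            Ok(v) => return Ok(v),
            Err(e) if is_no_caching_peers(&e) => last_err = e,
            Err(e) => return Err(e),
        }
    }
    Err(last_err)
}

fn main() {
    let schedule = backoff_schedule(5, Duration::from_secs(2));
    let mut slept = Vec::new();
    // Simulated subscribe: the first two attempts hit the transient error,
    // the third succeeds (as if caches finished warming up in between).
    let result = subscribe_with_retry(&schedule, |d| slept.push(d), |n| {
        if n < 2 {
            Err("Ran out of, or haven't found any, caching peers for contract X".to_string())
        } else {
            Ok(n)
        }
    });
    println!("{:?} after simulated sleeps {:?}", result, slept);
}
```

The sleep function is injected so the CLI can simply pass std::thread::sleep, while the web UI would likely need a non-blocking async timer instead of a blocking sleep; the retry decision logic stays the same for both.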

References

  • /tmp/river_pr2067_large_delay.log: peer1 log around 18:31:02 shows tx 01K9JBRT4K5V5F98CVKXMDHH83 failing for contract 5eB64pw6F… ~2 s after the delay expired.
  • /home/ian/code/tmp/freenet-test-networks/20251108-174544/peer1/peer.log:4491 captures the earlier FTAY failure before the delay was in place.
