Description
Summary
Large River harness runs occasionally fail during riverctl invite accept (and in the web UI) because the subscribe request exhausts its candidate peers before any peer has finished caching the freshly created room contract. Even after inserting a 2 s FREENET_TEST_PRE_SUBSCRIBE_DELAY_SECS sleep, we still see errors such as "Ran out of, or haven't found any, caching peers for contract FTAY…/5eB6…". The failing transaction id changes, but the pattern is the same: the operation records early rejections in its skip list, caches become available a couple of seconds later, and a brand-new subscribe would have succeeded.
Proposal
Add client-side retry/backoff logic in both the River CLI (riverctl) and the web interface (ui/). If a subscribe fails with "Ran out of caching peers", reissue the request using a limited exponential backoff (e.g. wait 0 s, 2 s, 4 s, 8 s, 16 s, then give up). Each retry gets a fresh transaction id and empty skip list, so once caches are available the next attempt can succeed. We can keep the pre-subscribe delay as a first line of defense but shouldn’t rely on it exclusively.
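A minimal sketch of the retry loop described above, assuming the subscribe call can be wrapped in a closure that issues a brand-new request on every invocation (so each attempt gets its own transaction id and an empty skip list). The names `BACKOFF_SECS`, `is_retryable`, and `subscribe_with_retry` are hypothetical, and the error is modeled as a plain `String`; neither reflects the actual riverctl or Freenet client API. The retry check matches on the "caching peers" substring taken from the error text quoted in this issue.

```rust
use std::time::Duration;

/// Delays before each attempt, following the proposal: 0 s, 2 s, 4 s, 8 s, 16 s.
pub const BACKOFF_SECS: [u64; 5] = [0, 2, 4, 8, 16];

/// Hypothetical check: only the "caching peers" failure quoted in this issue
/// is treated as retryable. Any other error is returned immediately.
pub fn is_retryable(err: &str) -> bool {
    err.contains("caching peers")
}

/// Calls `attempt` once per entry in `delays`, invoking `pause` with the
/// corresponding delay before each call. `attempt` is assumed to issue a
/// fresh subscribe request every time it runs. `pause` is injected so that
/// callers can pass `std::thread::sleep` in production and a recorder in tests.
pub fn subscribe_with_retry<T, F, S>(
    delays: &[Duration],
    mut attempt: F,
    mut pause: S,
) -> Result<T, String>
where
    F: FnMut() -> Result<T, String>,
    S: FnMut(Duration),
{
    let mut last_err = String::from("no attempts were made");
    for delay in delays {
        pause(*delay);
        match attempt() {
            Ok(value) => return Ok(value),
            // Retryable failure: remember it and move on to the next delay.
            Err(e) if is_retryable(&e) => last_err = e,
            // Anything else is not expected to improve with time.
            Err(e) => return Err(e),
        }
    }
    // Every attempt failed with a retryable error; give up with the last one.
    Err(last_err)
}
```

A caller would build the schedule with `BACKOFF_SECS.iter().map(|s| Duration::from_secs(*s)).collect::<Vec<_>>()` and pass `std::thread::sleep` as `pause`. An async client such as the web UI would need the equivalent non-blocking timer instead, which this sketch does not cover.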
References
/tmp/river_pr2067_large_delay.log: peer1 log around 18:31:02 shows tx 01K9JBRT4K5V5F98CVKXMDHH83 failing for contract 5eB64pw6F… ~2 s after the delay expired.
/home/ian/code/tmp/freenet-test-networks/20251108-174544/peer1/peer.log:4491 captures the earlier FTAY failure before the delay was in place.