ames: drop %goad if duct is missing#7320

Draft
yosoyubik wants to merge 4 commits into develop from yu/check-cork-on-goad

@yosoyubik yosoyubik commented Mar 31, 2026

Collaborator

Possible fix for: https://pastebin.com/1uTHdPQq

  • fix goad/flub tests

Analysis of the different flows that could trigger this issue:

  1. %leave handling
  • subscriber sends a %leave

Scenario A)

  • receiver gets it; if the agent is running, we ack it, remove the subscription, and wait for the cork
  • sender gets the %leave ack and sends a %cork; we are done

------------- SUSPENSION
Scenario B)

  • receiver gets it; the agent is not running, so we send a %flub %boon on the /gf system flow
    • backward flow gets halted
    • agent^ship^duct added to halts.state
  • %boon %flub is received, %gall %halts the flow (%flub contains the specific agent)
  • agent and duct added to flubs.state

  • agent gets revived

    • %spur boon sent
    • subscriber %ames receives it on the /gf system flow, gives to %gall
      • agent removed from halts.state
    • %gall looks at outstanding.state
    • if some entries outstanding --
      • looking at the flow this is a subscription wire
      • by elimination this would need to be a %leave
      • if the %leave is outstanding, it means that we have not corked the flow (%cork pleas are always sent after a %leave gets acked; if a %leave gets nacked, we record that in the outstanding queue by adding a %missing flag, and a two-minute timer then handles every outstanding %leave that got nacked)
    • for each outstanding entry that matches the ship and the agent, we retrieve the duct, and use it to pass a %goad task to ames.
    • %ames receives the task on the duct used to send the poke
    • the duct should exist, otherwise we crash
    • the duct tells us which bone this flow belongs to, %goad then un-halts it, re-activating the flow to send and receive packets
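The revival path above can be modelled in a few lines. This is only a minimal sketch in Python, not the Hoon implementation: the `Ames` and `Gall` classes, the `bone_by_duct`, `halted` and `outstanding` fields, and the example duct and bone number are all hypothetical stand-ins chosen for illustration. A missing duct raises `KeyError`, which stands in for the crash.

```python
from dataclasses import dataclass, field


@dataclass
class Ames:
    # Hypothetical model of per-peer flow state, not the real %ames state.
    bone_by_duct: dict = field(default_factory=dict)  # duct -> bone
    halted: set = field(default_factory=set)          # bones currently halted

    def halt(self, duct):
        # Halt the flow that this duct belongs to.
        self.halted.add(self.bone_by_duct[duct])

    def goad(self, duct):
        # The duct tells us which bone to un-halt. If the duct is gone,
        # the lookup raises KeyError (standing in for the crash).
        bone = self.bone_by_duct[duct]
        self.halted.discard(bone)
        return bone


@dataclass
class Gall:
    # Hypothetical model: outstanding requests keyed by (ship, agent).
    ames: Ames
    outstanding: dict = field(default_factory=dict)  # (ship, agent) -> [duct]

    def on_spur(self, ship, agent):
        # For each outstanding entry matching ship and agent,
        # retrieve the duct and pass a %goad to ames on it.
        ducts = self.outstanding.get((ship, agent), [])
        return [self.ames.goad(duct) for duct in ducts]


duct = ("gall", "use", "chat", "sub-wire")    # made-up duct
ames = Ames(bone_by_duct={duct: 4})           # made-up bone number
gall = Gall(ames, outstanding={("~zod", "chat"): [duct]})

ames.halt(duct)                          # the %leave was flubbed: flow halted
unhalted = gall.on_spur("~zod", "chat")  # agent revived: %spur, then %goad
```

After the `%spur`, the bone is no longer in `ames.halted`, so the flow can send and receive packets again.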
  2. %kick handling
  • host sends %kick

  • subscription gets removed on host

  • the app gets suspended

  • kick arrives before %boon %flub

  • subscription gets removed, we send a %cork

  • %corks don't go into %gall, so all gets resolved silently, the flow is removed

  • before %kick arrives we enqueue a %leave
    A) - %leave plea gets nacked non-deterministically, we retry
    B) - %leave gets handled, and flubbed

  • %kick arrives while the %leave is still outstanding

    • if there was anything outstanding, it won't get cleared from the queue (see 83e8e24) since ames is actively trying to send it, so gall will eventually get the ack for it
  • the host revives the agent and sends a %boon %spur.

  • the subscriber handles the %boon on the /gf flow and gives it to %gall. %gall then retrieves the saved duct from state and gives it to %ames. %ames has already corked the flow, so the duct is gone: we crash, as seen in the pastebin.
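The race in the last step can be replayed with a small model. Again this is a hedged sketch in Python rather than the actual Hoon change: `Flows`, `goad_strict`, `goad_or_drop` and `replay` are hypothetical names, and `goad_or_drop` only illustrates the idea in the PR title (drop the %goad if the duct is missing), not the diff itself.

```python
class Flows:
    """Hypothetical stand-in for the subscriber's %ames flow state."""

    def __init__(self):
        self.bone_by_duct = {}  # duct -> bone
        self.halted = set()     # bones currently halted

    def cork(self, duct):
        # Corking removes the flow, so the duct no longer maps to a bone.
        bone = self.bone_by_duct.pop(duct)
        self.halted.discard(bone)

    def goad_strict(self, duct):
        # Behavior described above: a missing duct is a crash (KeyError here).
        bone = self.bone_by_duct[duct]
        self.halted.discard(bone)
        return bone

    def goad_or_drop(self, duct):
        # Idea from the PR title: drop the %goad if the duct is missing.
        bone = self.bone_by_duct.get(duct)
        if bone is None:
            return None
        self.halted.discard(bone)
        return bone


def replay(strict):
    flows = Flows()
    duct = ("gall", "use", "chat", "sub-wire")  # made-up duct
    flows.bone_by_duct[duct] = 4                # made-up bone number
    flows.halted.add(4)   # the %leave was flubbed, so the flow is halted
    saved_duct = duct     # %gall keeps the duct of the outstanding %leave
    flows.cork(duct)      # %kick arrives: subscription removed, flow corked
    # The %spur arrives and %gall passes a %goad on the saved duct.
    if strict:
        return flows.goad_strict(saved_duct)
    return flows.goad_or_drop(saved_duct)
```

With `strict=True` the replay raises, mirroring the crash; with `strict=False` the %goad is dropped silently, which hides the stale entry that %gall still holds.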

@yosoyubik

Collaborator Author

We need to understand the root cause of this issue a bit better before merging the fix in this PR. The most likely scenario seems to be %kicks that get automatically acked in %ames while %leave pleas are outstanding and flubbed, but that would prevent anything from being sent.

The behavior before this PR is the right one for now: we need to be able to identify the issue instead of silently no-op'ing.
