Skip to content

Conversation

@knelli2
Copy link
Contributor

@knelli2 knelli2 commented Mar 31, 2025

Proposed changes

Also adds ExecutePhaseChange to the Worldtube CSW singleton component.

Upgrade instructions

Code review checklist

  • The code is documented and the documentation renders correctly. Run
    make doc to generate the documentation locally into BUILD_DIR/docs/html.
    Then open index.html.
  • The code follows the stylistic and code quality guidelines listed in the
    code review guide.
  • The PR lists upgrade instructions and is labeled bugfix or
    new feature if appropriate.

Further comments

@knelli2 knelli2 requested a review from guilara March 31, 2025 18:33
@knelli2 knelli2 marked this pull request as ready for review April 3, 2025 18:29
@guilara
Copy link
Contributor

guilara commented Apr 4, 2025

Thank you @knelli2. Adding the ExecutePhaseChange action to the worldtube singleton has solved the problem.

Copy link
Contributor

@guilara guilara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tested this on a failing run and this fix has solved the problem.

@knelli2
Copy link
Contributor Author

knelli2 commented Apr 4, 2025

There's a subtlety you should be aware of. So the reason the error happened is likely because

  1. Elements check for phase change by halting the algorithm and saying they may want to go into a different phase. This > leaves the elements in a valid state
  2. The singleton, however, doesn't know this so it just requests the elements for data and waits, leaving it in an invalid >state for switching phases
  3. When the algorithm is restarted on the singleton, we get an error because we didn't exit in a valid state.

Putting the ExecutePhaseChange action into the list makes the singleton be in a valid state when switching phases. The subtlety comes in because the trigger for the phase change is checked on both the elements and the singleton which have different time steps/slabs and such. Now the singleton happens to have the same slab size as the elements and is also doing lock-step with the WT boundary, so triggers based on slab number or time value will work for both. But if the slab sizes become different (or you don't do lock-step anymore so the times are out of sync) then you may get a deadlock/error because the elements would try to phase change, but the singleton wouldn't just like currently

Putting this here as record from a conversation with @guilara.

@nilsdeppe nilsdeppe merged commit 4f5a4e5 into sxs-collaboration:develop Apr 11, 2025
24 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants