egraphs: don't let rematerialization override LICM. #7306
Merged
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This reworks the way that remat and LICM interact during aegraph elaboration. In principle, both happen during the same single-pass "code placement" algorithm: we decide where to place pure instructions (those that are eligible for movement), and remat pushes them one way while LICM pushes them the other.
The interaction is a little more subtle than simple heuristic priority, though -- it's really a decision ordering issue. A remat'd value wants to sink as deep into the loop nest as it can (to the use's block), but we don't know where the uses go until we process them (and make LICM-related choices), and we process uses after defs during elaboration. Or more precisely, we have some work at the use before recursively processing the def, and some work after the recursion returns; and the LICM decision happens after recursion returns, because LICM wants to know where the defs are to know how high we can hoist. (The recursion is itself unrolled into a state machine on an explicit stack so that's a little hard to see but that's what is happening in principle.)
The solution here is to make remat a separate just-in-time thing, once we have arg values. Just before we plug the final arg values into the elaborated instruction, we ask: is this a remat'd value, and if so, do we have a copy of the computation in this block yet. If not, we make one. This has to happen in two places (the main elab loop and the toplevel driver from the skeleton).
The one downside of this solution is that it doesn't handle recursive rematerialization by default. This means that if we, for example, decide to remat single-constant-arg adds (as we actually do in our current rules), we won't then also recursively remat the constant arg to those adds. This can be seen in the
licm.cliftest case. This doesn't seem to be a dealbreaker to me because most such cases will be able to fold the constants anyway (they happen mostly because of pointer pre-computations: a loop over structs in Wasm computes heap_base + p + offset, and naive LICM pulls aheap_base + offsetout of the loop for every struct field accessed in the loop, with horrible register pressure resulting; that's why we have that remat rule. Most such offsets are pretty small.).Fixes #7283.