
Conversation

@njriasan
Contributor

In an example kernel we saw a regression arising from a change in how the SWP schedule is generated in the presence of a jagged tensor bias. The exact cause of the change is unclear (it could have been schedule changes or op backtracking that enabled picking the op), but in either case the ideal solution is to disable SWP for the bias loads.

To do this we want to modify the kernel to set latency=0 on the bias loads, while still letting the compiler derive the latencies of all other loads. This PR accomplishes that with the following process:

  1. When loads are annotated by the user, we only "skip" the automatic latency assignment if any of the annotated latencies are non-zero.
  2. On the MMA side, we omit updating anything but the op's own latency if the op was annotated.
  3. On the load side, we remove an annotated load from the loadOpToIndLevel calculation. This ensures both that "distance" calculations omit the load and that the load's latency is not modified. (A sketch of these changes follows this list.)
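To make the mechanics concrete, here is a minimal sketch of what the pass-side checks could look like. It assumes Triton's `tt.latency` attribute and the `loadOpToIndLevel` map named above; the helper names `hasNonZeroUserLatency` and `computeLoadOpToIndLevel` are hypothetical, and this is a sketch of the idea, not the PR's actual diff.

```cpp
#include "mlir/Dialect/SCF/IR/SCF.h"
#include "mlir/IR/BuiltinAttributes.h"
#include "llvm/ADT/MapVector.h"

using namespace mlir;

// Hypothetical: the pass's existing "distance" computation, which maps
// each pipelineable load to its indirection level.
llvm::MapVector<Operation *, int> computeLoadOpToIndLevel(scf::ForOp forOp);

// Hypothetical helper: true if any op in the loop carries a non-zero
// user-provided "tt.latency" annotation.
static bool hasNonZeroUserLatency(scf::ForOp forOp) {
  bool found = false;
  forOp.walk([&](Operation *op) {
    if (auto latency = op->getAttrOfType<IntegerAttr>("tt.latency"))
      if (latency.getInt() != 0)
        found = true;
  });
  return found;
}

void assignLatenciesSketch(scf::ForOp forOp, int numStages) {
  // (1) Only skip automatic assignment when some annotation is non-zero;
  // latency=0 annotations alone still let the compiler derive latencies
  // for the remaining, unannotated loads.
  if (hasNonZeroUserLatency(forOp))
    return;

  llvm::MapVector<Operation *, int> loadOpToIndLevel =
      computeLoadOpToIndLevel(forOp);

  // (3) Drop annotated loads so they neither receive a derived latency
  // nor stretch the longest load path used by the distance calculation.
  loadOpToIndLevel.remove_if(
      [](const auto &kv) { return kv.first->hasAttr("tt.latency"); });

  // ... derive latencies for the remaining loads as before ...
}
```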

To explain the issue in slightly more detail: the latency that is assigned is based on the longest load path. The jagged tensor bias made the longest path 2, so every load was now pipelined with num_stages / 2. This led to a regression because the other loads needed more pipelining (and the bias did not need pipelining at all). By setting the latency of the loads on that path to 0, you retain the original schedule, where all loads are pipelined with the full num_stages.
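As a back-of-the-envelope illustration of that arithmetic (the formula here, num_stages divided by the longest load path, is an assumed model of the latency assignment, not a quote of the implementation):

```cpp
#include <cstdio>

int main() {
  int numStages = 4;

  // Original schedule: longest load path is 1, so each load gets the
  // full pipelining budget.
  int latencyBefore = numStages / 1; // 4

  // With the jagged bias in the distance calculation the longest path
  // becomes 2, halving the latency assigned to *every* load.
  int latencyWithBias = numStages / 2; // 2

  // Annotating the bias loads with latency=0 removes them from the path,
  // restoring the full budget for the loads that actually need it.
  std::printf("before=%d with bias=%d after annotation=%d\n",
              latencyBefore, latencyWithBias, numStages / 1);
  return 0;
}
```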

@meta-cla meta-cla bot added the CLA Signed label on Oct 30, 2025
@njriasan
Contributor Author

I've started a discussion with Thomas about allowing this upstream. I'll look into merging it once I can upstream the latency annotation information.

@njriasan njriasan changed the base branch from ws-main to main November 16, 2025 03:17
@njriasan njriasan changed the base branch from main to ws-main November 16, 2025 03:21