Skip to content

Conversation

@antonysigma
Copy link
Contributor

After committing gpu_tiles, reorder all axes such that the for-loops are inside all gpu_blocks.

Also limit the gpu_blocks count to no more than 3.

Refer to: #8640 for the list of pending actions to robustify the Mullapudi2016's experimental GPU schedules.

@antonysigma antonysigma force-pushed the mullapudi-gpu-reorder branch 2 times, most recently from 4a59239 to 893815c Compare June 12, 2025 15:33
After committing `gpu_tiles`, reorder all axes such that the for-loops are
inside all `gpu_blocks`.

Also limit the `gpu_blocks` count to no more than 3.
@antonysigma antonysigma force-pushed the mullapudi-gpu-reorder branch from 893815c to 0bdd227 Compare June 12, 2025 16:57
@mcourteaux
Copy link
Contributor

Avoid force pushing when it's not necessary. It's easier to follow up on your changes when you don't. GitHub know when the last time was somebody reviewed your code, and we get to see a diff since then until now. Force pushing rewrites history, and makes that feature unavailable.

Copy link
Member

@alexreinking alexreinking left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we figure out what's going on with this call?

VarOrRVar seq(seq_var, (rvars.find(seq_var) != rvars.end()));
if (arch_params.is_gpu_schedule) {
gpu_tiling.canReorder({seq, v});
gpu_tiling.can_reorder({seq, v});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This call is highly suspicious. The other calls set an order for every dimension. But this only reorders two vars within the existing order in the else branch below. When this is called here, it discards other dimensions that might be present.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize that I probably should have caught this in the last PR's review.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, even before the changes, the reorder calls inside the code block if (nested_parallelism) {...} looks suspicious to me. It feels like it is doing bubble sorting by abusing Halide's native API and IR. For bubble sorting to function, the Halide IR must provide the list container data structure and the std::swap function.

So, if we were to intercept the bubble sorting action by GPUTilingDedup, should I implement the class ExplicitBubbleSortingHelper?

Note that we have an issue only inside the code block if (nested_parallelism) {...}. The rest of the code calls can_reorder / require_ordering correctly as intended.

Copy link
Member

@alexreinking alexreinking Jun 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@antonysigma - I don't want to prescribe an implementation strategy. At a minimum, I want to be confident the behaviors match and don't regress, even if the actual behavior is less useful or efficient than it could be.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving the discussion to #8647 (comment) . Closing.

Copy link
Contributor Author

@antonysigma antonysigma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we figure out what's going on with this call?

@alexreinking Sure. The code looks suspicious even before my changes. It feels like the autoscheduler is abusing the Halide IR and reorder function to perform bubble sorting.

Please refer to the inline comments for details.

Changing the PR to draft.

Avoid force pushing when it's not necessary. It's easier to follow up on your changes when you don't. GitHub know when the last time was somebody reviewed your code, and we get to see a diff since then until now. Force pushing rewrites history, and makes that feature unavailable.

@mcourteaux Ah, sorry about that. I am so used to the Gerrit-style code review process: contributors must always squash-rebase-force-push changes locally for the server to accept new changes. I will avoid force-pushing from now on.

VarOrRVar seq(seq_var, (rvars.find(seq_var) != rvars.end()));
if (arch_params.is_gpu_schedule) {
gpu_tiling.canReorder({seq, v});
gpu_tiling.can_reorder({seq, v});
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, even before the changes, the reorder calls inside the code block if (nested_parallelism) {...} looks suspicious to me. It feels like it is doing bubble sorting by abusing Halide's native API and IR. For bubble sorting to function, the Halide IR must provide the list container data structure and the std::swap function.

So, if we were to intercept the bubble sorting action by GPUTilingDedup, should I implement the class ExplicitBubbleSortingHelper?

Note that we have an issue only inside the code block if (nested_parallelism) {...}. The rest of the code calls can_reorder / require_ordering correctly as intended.

@antonysigma antonysigma marked this pull request as draft June 12, 2025 20:01
1. Rename `GPUTilingDedup::can_reorder` -> `reorder`.

2. Write a new function `GPUTilingDedup::ensure_ordering` to implement
the bubble sorting helper function.
@antonysigma antonysigma force-pushed the mullapudi-gpu-reorder branch from 51843d7 to 6aad2c8 Compare June 13, 2025 02:22
Copy link
Contributor Author

@antonysigma antonysigma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the quick feedback regarding the nested_parallelism algorithm behavior. I implemented a new deferred reorder function. Please refer to the inline comments for details.

Copy link
Member

@alexreinking alexreinking left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks much, much better, thank you! Just a couple outstanding questions and we can get this in.

Comment on lines +1324 to +1327
// The nested parallelism implements a bubble sorting algorithm, which
// ensures the inner and outer variables are adjacent to each other.
// Assert the requirement here.
internal_assert(std::abs(std::distance(inner_iter, outer_iter)) == 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand fully why this should be true from the code. I think it would be sufficient simply to swap the two dims in place if they're out of order (equivalent to just dropping the assertion).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's to make sure that they are adjacent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me know if you all reached the consensus. I will revise the code accordingly. Again, I prefer small scale and frequent PRs to move us forward.

Copy link
Contributor Author

@antonysigma antonysigma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kindly asking for another round of code review. Please refer to the inline comments.

Comment on lines +1324 to +1327
// The nested parallelism implements a bubble sorting algorithm, which
// ensures the inner and outer variables are adjacent to each other.
// Assert the requirement here.
internal_assert(std::abs(std::distance(inner_iter, outer_iter)) == 1);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me know if you all reached the consensus. I will revise the code accordingly. Again, I prefer small scale and frequent PRs to move us forward.

Copy link
Member

@alexreinking alexreinking left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me pending green.

@alexreinking alexreinking merged commit 68d6fe0 into halide:main Jun 13, 2025
15 checks passed
@antonysigma antonysigma deleted the mullapudi-gpu-reorder branch June 13, 2025 22:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants