-
Notifications
You must be signed in to change notification settings - Fork 22.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Partitioner] Speed up the update of partition map #136616
base: main
Are you sure you want to change the base?
Conversation
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/136616
Note: Links to docs will display an error until the docs builds have been completed. ✅ No FailuresAs of commit c4e038b with merge base 72dde6e (): This comment was automatically generated by Dr. CI and updates every 15 minutes. |
@tarun292 can you take a look? Thanks! |
Adding some background info here. In one of our use cases, we have observed pass_propose_partitions took more than 1 minute. We have proposed totally 4 PRs(#136608, #136616, #136614, #136598) to optimize it and they achieved up to 11x speedup in our test. @tarun292 @SherlockNoMad appreciate if you can take a look if these changes make sense. This will be very useful to unblock our use case. |
@ezyang , @SherlockNoMad , can you please take a look at this change? |
@pytorchbot rebase |
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here |
Successfully rebased |
029b2bc
to
e9c3810
Compare
# Iterate through all the users of this node and update the partition map to indicate | ||
# that there is a path from the partition id of this node to the target partition id. | ||
for user_node in node.users: | ||
target_id = assignment.get(user_node, None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So in order for this to work we're assuming that the graph is being traversed in reverse. I think maybe we should add an assert that target_id is not None
here to ensure that the user node has already been traversed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
e9c3810
to
c4e038b
Compare
We can update partition map by iterating users of node but not all of the downstream users of node. The former is faster than the latter which has many duplicate insertion.