Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Partitioner] Speed up the update of partition map #136616

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

lingzhi98
Copy link
Contributor

We can update partition map by iterating users of node but not all of the downstream users of node. The former is faster than the latter which has many duplicate insertion.

Copy link

pytorch-bot bot commented Sep 25, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/136616

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit c4e038b with merge base 72dde6e (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the release notes: fx release notes category label Sep 25, 2024
@lingzhi98
Copy link
Contributor Author

@tarun292 Same as #136608, can you help to review this PR? Thanks!

Skylion007
Skylion007 previously approved these changes Sep 26, 2024
@Skylion007 Skylion007 dismissed their stale review September 26, 2024 15:08

Wait for tarun292

@cpuhrsch cpuhrsch added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Sep 27, 2024
@lingzhi98
Copy link
Contributor Author

@tarun292 can you take a look? Thanks!

@carsonwang
Copy link

Adding some background info here. In one of our use cases, we have observed pass_propose_partitions took more than 1 minute. We have proposed totally 4 PRs(#136608, #136616, #136614, #136598) to optimize it and they achieved up to 11x speedup in our test. @tarun292 @SherlockNoMad appreciate if you can take a look if these changes make sense. This will be very useful to unblock our use case.

@bsochack
Copy link

@ezyang , @SherlockNoMad , can you please take a look at this change?
As @carsonwang described, the PR helps to improve partitioner performance a lot.

@lingzhi98
Copy link
Contributor Author

@pytorchbot rebase

@pytorchmergebot
Copy link
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Copy link
Collaborator

Successfully rebased lingzhi/update_partition_map onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout lingzhi/update_partition_map && git pull --rebase)

# Iterate through all the users of this node and update the partition map to indicate
# that there is a path from the partition id of this node to the target partition id.
for user_node in node.users:
target_id = assignment.get(user_node, None)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So in order for this to work we're assuming that the graph is being traversed in reverse. I think maybe we should add an assert that target_id is not None here to ensure that the user node has already been traversed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
open source release notes: fx release notes category triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants