Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FSDP] Add support for FSDPCommContext on PrivateUse1 backend #138687

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

shink
Copy link
Contributor

@shink shink commented Oct 23, 2024

I believe NPU supports streams, but what about other privateuse1 backends?

cc: @FFFrog @noemotiovon

cc @H-Huang @awgu @kwen2501 @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @c-p-i-o

Copy link

pytorch-bot bot commented Oct 23, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/138687

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 1998fe1 with merge base 72dde6e (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added oncall: distributed Add this issue/PR to distributed oncall triage queue release notes: distributed (fsdp) release notes category labels Oct 23, 2024
@awgu
Copy link
Contributor

awgu commented Oct 23, 2024

I am ok with this change. Let me know if any of the PrivateUse1 do not support streams and have a comment about this.

@colesbury colesbury added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Oct 23, 2024
@@ -51,7 +51,7 @@ class FSDPCommContext:

def lazy_init(self, device: torch.device):
self.device_handle = _get_device_handle(device.type)
if device.type not in ["cuda", "hpu"]:
if device.type not in ["cuda", "hpu", torch._C._get_privateuse1_backend_name()]:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have an API to detect if the given device supports streams? @albanD @FFFrog

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@albanD @FFFrog thoughts?

@kwen2501 kwen2501 requested review from awgu and removed request for wanchaol October 29, 2024 07:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
oncall: distributed Add this issue/PR to distributed oncall triage queue open source release notes: distributed (fsdp) release notes category triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants