Improve support for queue in distributed training #1021
Conversation
Looks good! I will update the branch, commit some minor edits, and merge.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1021      +/-   ##
==========================================
- Coverage   86.48%   86.47%   -0.01%
==========================================
  Files          91       91
  Lines        5770     5774       +4
==========================================
+ Hits         4990     4993       +3
- Misses        780      781       +1

☔ View full report in Codecov by Sentry.
Thanks! Looking good. Just left a question for you.
Thanks for your contribution, @hsyang1222! @all-contributors please add @hsyang1222 for code
I couldn't determine any contributions to add. Did you specify any contributions? I've put up a pull request to add @hsyang1222! 🎉
Fixes #1013.
Description
Fixed an issue where the torch DistributedSampler and the torchio Queue interact inefficiently when using DistributedDataParallel.
The conventional approach creates patches for all subjects in the queue on every process, then uses only a fraction of them and discards the rest.
The proposed method first divides the subjects among the processes, then creates patches only for each process's own subjects, so that every extracted patch is used (see the sketch below).
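Below is a minimal sketch of the intended usage pattern, assuming the Queue can be given a per-process subject sampler. The `subject_sampler` argument, the helper function name, and the numeric defaults shown here are assumptions for illustration, not a confirmed API:

```python
# Hedged sketch: per-process subject splitting for patch-based training.
# Assumes torch.distributed is already initialized (e.g. via torchrun) and
# that tio.Queue accepts a `subject_sampler` argument (illustrative only).
import torch.distributed as dist
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler
import torchio as tio

def build_patch_loader(subjects, patch_size=64, samples_per_volume=4):
    dataset = tio.SubjectsDataset(subjects)
    patch_sampler = tio.UniformSampler(patch_size)

    # Each rank sees only its own shard of subjects, so patches are extracted
    # once per subject overall instead of once per subject on every process.
    subject_sampler = DistributedSampler(
        dataset,
        rank=dist.get_rank(),
        num_replicas=dist.get_world_size(),
        shuffle=True,
    )

    queue = tio.Queue(
        dataset,
        max_length=32,
        samples_per_volume=samples_per_volume,
        sampler=patch_sampler,
        subject_sampler=subject_sampler,  # assumed parameter name
        num_workers=2,
        shuffle_subjects=False,  # shuffling is delegated to the subject sampler
    )
    # num_workers=0 here because the Queue already loads subjects in the background.
    return DataLoader(queue, batch_size=4, num_workers=0)
```

The point of the sketch is that the subject split happens before patch extraction, so no rank spends I/O or preprocessing on subjects whose patches it would later throw away.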
Checklist

- I have read the CONTRIBUTING docs and have a developer setup (especially important are pre-commit and pytest)
- Tests pass locally with pytest
- Documentation builds with make html inside the docs/ folder