
Improve support for queue in distributed training #1021

Merged (6 commits, Mar 1, 2023)

Conversation

@hsyang1222 (Contributor) commented Feb 2, 2023

Fixes #1013.

Description
Fixed an issue where torch's DistributedSampler and torchio's Queue interacted inefficiently when using DistributedDataParallel.

The conventional approach creates patches for all subjects in the queue of every process, then uses only some of them and throws the rest away.
The proposed approach first partitions the subjects across processes, then creates patches only for each process's own subjects, so every patch that is generated gets used.
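The difference between the two strategies can be sketched in plain Python (hypothetical helper names, not the actual torchio implementation; real patch extraction would operate on image volumes rather than integers):

```python
def conventional(subjects, rank, world_size, patches_per_subject):
    """Every process extracts patches for ALL subjects, then discards most."""
    all_patches = [
        (subject, patch_idx)
        for subject in subjects
        for patch_idx in range(patches_per_subject)
    ]
    # Keep only this rank's share; the rest of the work is wasted.
    return all_patches[rank::world_size]


def proposed(subjects, rank, world_size, patches_per_subject):
    """Each process first takes its share of subjects, then extracts
    patches only for those, so nothing is thrown away."""
    my_subjects = subjects[rank::world_size]
    return [
        (subject, patch_idx)
        for subject in my_subjects
        for patch_idx in range(patches_per_subject)
    ]


subjects = list(range(8))  # stand-ins for torchio Subjects
kept = conventional(subjects, rank=0, world_size=4, patches_per_subject=2)
used = proposed(subjects, rank=0, world_size=4, patches_per_subject=2)
```

Both functions yield the same number of patches per rank, but `conventional` first materializes `len(subjects) * patches_per_subject` patches on every process, whereas `proposed` only ever creates the patches that rank will actually consume.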

Checklist

  • I have read the CONTRIBUTING docs and have a developer setup (especially important are pre-commit and pytest)
  • Non-breaking change (would not break existing functionality)
  • Breaking change (would cause existing functionality to change)
  • Tests added or modified to cover the changes
  • Integration tests passed locally by running pytest
  • In-line docstrings updated
  • Documentation updated, tested running make html inside the docs/ folder
  • This pull request is ready to be reviewed
  • If the PR is ready and there are multiple commits, I have squashed them and force-pushed

@fepegar (Owner) commented Feb 28, 2023

Looks good! I will update the branch, commit some minor edits and merge.

@fepegar fepegar changed the title 1013-Queue-support-DDPsampler Improve support for queue in distributed training Feb 28, 2023
@codecov bot commented Feb 28, 2023

Codecov Report

Attention: Patch coverage is 75.00000% with 1 line in your changes missing coverage. Please review.

Project coverage is 86.47%. Comparing base (772cf85) to head (ef0edcc).
Report is 105 commits behind head on main.

Files with missing lines Patch % Lines
src/torchio/data/queue.py 75.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1021      +/-   ##
==========================================
- Coverage   86.48%   86.47%   -0.01%     
==========================================
  Files          91       91              
  Lines        5770     5774       +4     
==========================================
+ Hits         4990     4993       +3     
- Misses        780      781       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@fepegar (Owner) left a comment

Thanks! Looking good. Just left a question for you.

src/torchio/data/queue.py (review comment, resolved)
@fepegar fepegar merged commit 39e190f into fepegar:main Mar 1, 2023
@fepegar (Owner) commented Mar 1, 2023

Thanks for your contribution, @hsyang1222!

@all-contributors please add @hsyang1222 for code

@allcontributors (Contributor) replied:

@fepegar

I couldn't determine any contributions to add, did you specify any contributions?
Please make sure to use valid contribution names.

I've put up a pull request to add @hsyang1222! 🎉

Successfully merging this pull request may close these issues.

Proposal of a new way for DDP, Distributed Sampler, and Queue to work