
Improve support for queue in distributed training #1021

Merged (6 commits, Mar 1, 2023)

Conversation

@hsyang1222 (Contributor) commented Feb 2, 2023

Fixes #1013.

Description
Fixed an issue where torch's DistributedSampler and torchio's Queue interacted inefficiently when using DistributedDataParallel.

The conventional approach creates patches for all subjects in the queue of every process, then uses only some of them and throws the rest away.
The proposed approach first partitions the subjects across processes, then creates patches only for each process's own subjects, so every patch that is generated gets used.
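The difference between the two strategies can be sketched in plain Python (hypothetical helper names, not the actual torchio implementation; real patch extraction would operate on image volumes rather than integers):

```python
def conventional(subjects, rank, world_size, patches_per_subject):
    """Every process extracts patches for ALL subjects, then discards most."""
    all_patches = [
        (subject, patch_idx)
        for subject in subjects
        for patch_idx in range(patches_per_subject)
    ]
    # Keep only this rank's share; the rest of the work is wasted.
    return all_patches[rank::world_size]


def proposed(subjects, rank, world_size, patches_per_subject):
    """Each process first takes its share of subjects, then extracts
    patches only for those, so nothing is thrown away."""
    my_subjects = subjects[rank::world_size]
    return [
        (subject, patch_idx)
        for subject in my_subjects
        for patch_idx in range(patches_per_subject)
    ]


subjects = list(range(8))  # stand-ins for torchio Subjects
kept = conventional(subjects, rank=0, world_size=4, patches_per_subject=2)
used = proposed(subjects, rank=0, world_size=4, patches_per_subject=2)
```

Both functions yield the same number of patches per rank, but `conventional` first materializes `len(subjects) * patches_per_subject` patches on every process, whereas `proposed` only ever creates the patches that rank will actually consume.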

Checklist

  • I have read the CONTRIBUTING docs and have a developer setup (especially important are pre-commit and pytest)
  • Non-breaking change (would not break existing functionality)
  • Breaking change (would cause existing functionality to change)
  • Tests added or modified to cover the changes
  • Integration tests passed locally by running pytest
  • In-line docstrings updated
  • Documentation updated, tested running make html inside the docs/ folder
  • This pull request is ready to be reviewed
  • If the PR is ready and there are multiple commits, I have squashed them and force-pushed

@fepegar (Owner) commented Feb 28, 2023

Looks good! I will update the branch, commit some minor edits and merge.

@fepegar fepegar changed the title 1013-Queue-support-DDPsampler Improve support for queue in distributed training Feb 28, 2023
@codecov bot commented Feb 28, 2023

Codecov Report

Attention: Patch coverage is 75.00000% with 1 line in your changes missing coverage. Please review.

Project coverage is 86.47%. Comparing base (772cf85) to head (ef0edcc).
Report is 105 commits behind head on main.

Files with missing lines Patch % Lines
src/torchio/data/queue.py 75.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1021      +/-   ##
==========================================
- Coverage   86.48%   86.47%   -0.01%     
==========================================
  Files          91       91              
  Lines        5770     5774       +4     
==========================================
+ Hits         4990     4993       +3     
- Misses        780      781       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@fepegar (Owner) left a comment

Thanks! Looking good. Just left a question for you.

src/torchio/data/queue.py (review comment, resolved)
@fepegar fepegar merged commit 39e190f into fepegar:main Mar 1, 2023
@fepegar (Owner) commented Mar 1, 2023

Thanks for your contribution, @hsyang1222!

@all-contributors please add @hsyang1222 for code

@allcontributors (Contributor) replied:

@fepegar

I couldn't determine any contributions to add, did you specify any contributions?
Please make sure to use valid contribution names.

I've put up a pull request to add @hsyang1222! 🎉

Successfully merging this pull request may close these issues.

Proposal of a new way for DDP, Distributed Sampler, and Queue to work