@scap3yvt
Collaborator

Fixes #829

Proposed Changes

  • Added a new module under utils, data_splitter, that lets a user perform either a stratified k-fold split or a normal (non-stratified) split.
  • Switched training_manager to use the new module.
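
For readers unfamiliar with the distinction, here is a minimal standard-library sketch of the two splitting strategies named above. It is not the data_splitter API from this PR: the function names kfold_indices and stratified_kfold_indices, their parameters, and the toy labels are all hypothetical and exist only to illustrate the idea.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Optional, Sequence, Tuple

# Hypothetical helper type: (train_indices, validation_indices) for one fold.
Fold = Tuple[List[int], List[int]]


def _to_train_val(folds: List[List[int]]) -> List[Fold]:
    """Turn k disjoint folds into k (train, validation) index pairs."""
    splits = []
    for i, val in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((sorted(train), sorted(val)))
    return splits


def kfold_indices(n_samples: int, k: int, seed: Optional[int] = 0) -> List[Fold]:
    """Normal k-fold: shuffle all row indices, then deal them into k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return _to_train_val([indices[i::k] for i in range(k)])


def stratified_kfold_indices(
    labels: Sequence[Hashable], k: int, seed: Optional[int] = 0
) -> List[Fold]:
    """Stratified k-fold: deal each class's rows across the folds separately,
    so every fold keeps roughly the overall class proportions."""
    rng = random.Random(seed)
    by_class: Dict[Hashable, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds: List[List[int]] = [[] for _ in range(k)]
    offset = 0  # rotate the starting fold so fold sizes stay balanced
    for class_indices in by_class.values():
        rng.shuffle(class_indices)
        for position, idx in enumerate(class_indices):
            folds[(offset + position) % k].append(idx)
        offset += len(class_indices)
    return _to_train_val(folds)


if __name__ == "__main__":
    # Toy labels (made up): 8 rows of class "a" and 4 rows of class "b".
    labels = ["a"] * 8 + ["b"] * 4
    for train, val in stratified_kfold_indices(labels, k=4):
        # Each validation fold holds 2 "a" rows and 1 "b" row.
        print(val, [labels[i] for i in val])
```

With a normal split the class mix of each fold is left to chance, which matters most for small or imbalanced CSVs; the stratified variant trades that randomness for folds that mirror the overall label distribution.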

Checklist

  • CONTRIBUTING guide has been followed.
  • PR is based on the current GaNDLF master.
  • Non-breaking change (does not break existing functionality): provide as many details as possible for any breaking change.
  • Function/class source code documentation added/updated (ensure typing is used to provide type hints, including and not limited to using Optional if a variable has a pre-defined value).
  • Code has been blacked for style consistency and linting.
  • If applicable, version information has been updated in GANDLF/version.py.
  • If adding a git submodule, add to list of exceptions for black styling in pyproject.toml file.
  • Usage documentation has been updated, if appropriate.
  • Tests added or modified to cover the changes; if coverage is reduced, please give explanation.
  • If customized dependency installation is required (i.e., a separate pip install step is needed for PR to be functional), please ensure it is reflected in all the files that control the CI, namely: python-test.yml, and all docker files [1,2,3].

@github-actions
Contributor

github-actions bot commented Mar 21, 2024

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

@scap3yvt scap3yvt marked this pull request as draft March 21, 2024 19:47
@scap3yvt scap3yvt requested a review from sarthakpati March 21, 2024 19:56
@codecov

codecov bot commented Mar 23, 2024

Codecov Report

Attention: Patch coverage is 96.47887%, with 5 lines in your changes missing coverage. Please review.

Project coverage is 95.09%. Comparing base (0be31c2) to head (b003c3c).
Report is 1 commit behind head on master.

❗ Current head b003c3c differs from pull request most recent head 1d2352b. Consider uploading reports for the commit 1d2352b to get more accurate results

Files                        Patch %   Lines
GANDLF/training_manager.py   84.84%    5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #831      +/-   ##
==========================================
+ Coverage   95.01%   95.09%   +0.07%     
==========================================
  Files         120      121       +1     
  Lines        8270     8312      +42     
==========================================
+ Hits         7858     7904      +46     
+ Misses        412      408       -4     
Flag        Coverage Δ
unittests   95.09% <96.47%> (+0.07%) ⬆️

@scap3yvt scap3yvt marked this pull request as ready for review March 23, 2024 02:10
Collaborator

@sarthakpati sarthakpati left a comment

Add another test, please.

…iontesting-csv-with-proportional-splits' into 828-feature-add-the-ability-to-split-csvs-for-trainingvalidationtesting-as-a-separate-script
@Geeks-Sid
Collaborator

@scap3yvt tag me when it is ready to review.

@sarthakpati
Collaborator

> @scap3yvt tag me when it is ready to review.

This should be ready for review, @Geeks-Sid

…for-trainingvalidationtesting-as-a-separate-script
…for-trainingvalidationtesting-as-a-separate-script
@sarthakpati sarthakpati enabled auto-merge March 26, 2024 15:05
@sarthakpati sarthakpati merged commit 32d70d4 into mlcommons:master Mar 26, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Mar 26, 2024
@scap3yvt scap3yvt deleted the 828-feature-add-the-ability-to-split-csvs-for-trainingvalidationtesting-as-a-separate-script branch August 14, 2025 14:27

Development

Successfully merging this pull request may close these issues.

[FEATURE] Add the ability to generate training/validation/testing CSV with proportional splits

3 participants