Add pydamage results to binqc table#835

Closed
jfy133 wants to merge 31 commits into dev from merge-binqc-pydamage

Conversation

@jfy133 jfy133 commented Jun 22, 2025

Member

See code TODOs

To close #833

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs?
  • If necessary, also make a PR on the nf-core/mag branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core pipelines lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

@jfy133 jfy133 linked an issue Jun 22, 2025 that may be closed by this pull request

github-actions Bot commented Jun 22, 2025

nf-core pipelines lint overall result: Passed ✅ ⚠️

Posted for pipeline commit db7f618

+| ✅ 384 tests passed       |+
#| ❔   1 tests were ignored |#
!| ❗   6 tests had warnings |!
Details

❗ Test warnings:

  • nextflow_config - Config manifest.version should end in dev: 5.3.0
  • pipeline_todos - TODO string in main.nf: Remove this line if you don't need a FASTA file [TODO: try and test using for --host_fasta and --host_genome]
  • pipeline_todos - TODO string in methods_description_template.yml: #Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline
  • pipeline_todos - TODO string in main.nf: Optionally add in-text citation tools to this list.
  • pipeline_todos - TODO string in main.nf: Optionally add bibliographic entries to this list.
  • pipeline_todos - TODO string in main.nf: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!

❔ Tests ignored:

  • files_unchanged - File ignored due to lint config: .github/PULL_REQUEST_TEMPLATE.md

✅ Tests passed:

Run details

  • nf-core/tools version 3.5.1
  • Run at 2025-12-03 13:11:21

nf-core-bot commented Sep 5, 2025

Member

Warning

Newer version of the nf-core template is available.

Your pipeline is using an old version of the nf-core template: 3.4.1.
Please update your pipeline to the latest version.

For more documentation on how to update your pipeline, please see the nf-core documentation and Synchronisation documentation.

jfy133 commented Sep 12, 2025

Member Author

@nf-core-bot fix linting

jfy133 commented Sep 12, 2025

Member Author

TODO:

  • Verify the reordering of contigs is correct
  • Update output.md
  • Update/fix snapshots

Comment thread modules/nf-core/gunc/run/main.nf Outdated

jfy133 commented Nov 18, 2025

Member Author

Update:

  • My original tests were too simplistic.
  • ✅ I realised the join wasn't working properly because we can have one contig in multiple bins, so in the last PR I switched to a combine instead
  • TODO: I will need to 'clean' the keys before the join to drop anything after the first as MetaBinner keeps the MEGAHIT headers but the other binners appear to drop them (MetaBAT2, MetaBinner), preventing joining
  • TODO: the current test seems to run through without executing the pydamage bin summaries, so I will need to continue investigating (I think due to a join failure in the creation of ch_pydamage_to_bins in pydamage_bins)
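
The key cleaning and the one-contig-to-many-bins pairing described above can be sketched roughly as follows. This is an illustration in plain Python, not the pipeline's Nextflow code. It assumes that "anything after the first" means everything after the first whitespace in the contig header (as in MEGAHIT headers such as `k141_1 flag=1 multi=3.0000 len=1200`). The function names, bin IDs, and the `qvalue` field are hypothetical.

```python
from collections import defaultdict


def clean_contig_id(header: str) -> str:
    """Keep only the first whitespace-delimited token of a contig header.

    Assumption: the part to drop is everything after the first whitespace.
    """
    parts = header.lstrip(">").split(maxsplit=1)
    return parts[0] if parts else ""


def map_damage_to_bins(bin_contigs, damage_by_contig):
    """Pair every (bin, contig) entry with the damage record of that contig.

    bin_contigs: iterable of (bin_id, raw_contig_header) pairs. The same
        contig may be listed under several bins, so one damage record can
        be emitted more than once (a one-to-one join would lose these).
    damage_by_contig: dict mapping raw contig header -> damage record.
    """
    # Clean the keys on both sides so headers with and without the
    # trailing MEGAHIT annotations resolve to the same contig ID.
    cleaned = {clean_contig_id(k): v for k, v in damage_by_contig.items()}
    per_bin = defaultdict(list)
    for bin_id, header in bin_contigs:
        contig = clean_contig_id(header)
        if contig in cleaned:
            per_bin[bin_id].append((contig, cleaned[contig]))
    return dict(per_bin)


# Hypothetical example data: k141_1 is present in two bins.
bin_contigs = [
    ("metabat2.001", "k141_1"),
    ("metabinner.001", "k141_1 flag=1 multi=3.0000 len=1200"),
    ("metabinner.001", "k141_2 flag=0 multi=2.0000 len=900"),
]
damage = {"k141_1": {"qvalue": 0.01}, "k141_2": {"qvalue": 0.4}}
result = map_damage_to_bins(bin_contigs, damage)
```

Here `k141_1` ends up under both bins even though its header differs between binners, which is the behaviour a strict one-to-one join on the raw headers would not give.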

jfy133 commented Nov 21, 2025

Member Author

TODO

  • Fixed the issues from before, and now can produce the reordered and a summary file, however I'm getting variable numbers of rows in the pydamage summary and number of reordered files per -resume run (it appears to be two different values that it flip flops between). To investigate:
    • Check contents of each reordered file before summarizing
    • Set up sorting of all relevant channels
    • I notice SUMMARISE_PYDAMAGE doesn't get cached again, implying it is not receiving the same number of rows each time
    • I should try an nf-console reprex
  • Triple check that the median summaries make sense by manually getting the rows from the original pydamage
    results and the contig IDs from the bins, then calculating the median pydamage results by hand
    • I am wondering if there is some consuming during combine (it's not doing a true all-by-all combine, but once one key is used it discards the rest?)
  • Test with CONCOCT
  • Add to a relevant test and update snapshot
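
One way to do the manual median check listed above is a small standalone script that recomputes per-bin medians from the contig-level rows. The following is a rough sketch, not the PR's summarise_pydamage.py: the row layout (a `contig` key plus numeric columns), the `damage` column name, and the bin IDs are hypothetical. The output is sorted by bin ID so that repeated runs on the same input always give identical rows, regardless of input order.

```python
from collections import defaultdict
from statistics import median


def median_per_bin(contig_to_bins, damage_rows, columns):
    """Compute the per-bin median of each column from contig-level rows.

    contig_to_bins: dict mapping contig ID -> list of bin IDs containing it.
    damage_rows: iterable of dicts with a "contig" key and numeric columns.
    columns: names of the columns to summarise.
    """
    values = defaultdict(lambda: defaultdict(list))
    for row in damage_rows:
        # A contig found in several bins contributes to each of them.
        for bin_id in contig_to_bins.get(row["contig"], []):
            for col in columns:
                values[bin_id][col].append(float(row[col]))

    summary = []
    for bin_id in sorted(values):  # sorted for deterministic row order
        entry = {"bin": bin_id, "n_contigs": len(values[bin_id][columns[0]])}
        for col in columns:
            entry[f"median_{col}"] = median(values[bin_id][col])
        summary.append(entry)
    return summary


# Hypothetical example data.
contig_to_bins = {
    "k141_1": ["bin.1", "bin.2"],
    "k141_2": ["bin.1"],
    "k141_3": ["bin.1"],
}
damage_rows = [
    {"contig": "k141_3", "damage": "0.30"},
    {"contig": "k141_1", "damage": "0.10"},
    {"contig": "k141_2", "damage": "0.20"},
]
summary = median_per_bin(contig_to_bins, damage_rows, ["damage"])
```

Comparing the `n_contigs` value per bin against the number of contigs in the bin's FASTA file would also show whether rows are being dropped before the summary step.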

jfy133 commented Nov 21, 2025

Member Author

@nf-core-bot fix linting

jfy133 commented Nov 26, 2025

Member Author

I realise now that I have made this overly complicated. I think I can basically have a single local module with a custom script that does the reordering (as we actually rename the pydamage output files to make them unique), so I will start again.

jfy133 commented Dec 3, 2025

Member Author

Latest status: nextflow code is mostly working, now trying to get the summarise_pydamage.py script working.

Last time, however, I found there was some discrepancy, where there were a lot of bins missing in one of the files, and I'm not sure why.
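
A quick way to narrow down this kind of discrepancy is to compare the sets of bin IDs in the two files directly. The sketch below is only an illustration: how the IDs are read from each file is left out, and the function name and example IDs are hypothetical.

```python
def compare_bin_ids(first_ids, second_ids):
    """Report which bin IDs occur in only one of two collections."""
    first, second = set(first_ids), set(second_ids)
    return {
        "only_in_first": sorted(first - second),
        "only_in_second": sorted(second - first),
    }


# Hypothetical example: IDs from a bin QC table vs. a pydamage summary.
binqc_ids = ["bin.1", "bin.2", "bin.3"]
pydamage_ids = ["bin.1", "bin.4"]
report = compare_bin_ids(binqc_ids, pydamage_ids)
```

If the missing IDs all come from one binner, that would point back at a mismatch in contig or bin naming rather than at the summary script itself.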

@jfy133 jfy133 closed this Jan 7, 2026
@jfy133 jfy133 deleted the merge-binqc-pydamage branch May 19, 2026 14:37

Development

Successfully merging this pull request may close these issues.

Ancient DNA mode: Add pyDamage results to bin_summary.tsv

3 participants