fix: sort BAM/BAI inputs to Manta for deterministic sample order #815
Warning: A newer version of the nf-core template is available. Your pipeline is using an old version of the nf-core template: 3.5.1. For more documentation on how to update your pipeline, please see the nf-core documentation and Synchronisation documentation.
ch_bam.map{ _meta, bam -> bam }
    .collect(sort: { a, b -> a.getName() <=> b.getName() })
    .toList()
Nice! Good that you found this issue. Previously I've sorted the inputs within the module for these kinds of issues (e.g. https://github.com/nf-core/modules/blob/5d4a8ab3aeed5b759ca0a38b57b6153e9b55deab/modules/nf-core/bcftools/merge/main.nf#L26). But maybe this works since it's converted to a list before it's combined into manta_input? @ramprasadn, what's the reason for having .toList() here in the first place?
Without toList(), the combine operation downstream doesn't work. Instead of [meta, [bam1, bam2], [bai1, bai2]] you get [meta, bam1, bam2, bai1, bai2]
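The nesting behaviour described above can be reproduced in a small standalone script. This is only a sketch: the channel names and the plain-string file names are hypothetical stand-ins for the real BAM/BAI paths, and it assumes `combine` spreads list items into the resulting tuple as it does by default.

```groovy
workflow {
    // Hypothetical stand-ins: plain strings instead of real BAM/BAI paths.
    ch_meta = Channel.of([id: 'case1'])
    ch_bam  = Channel.of('s2.bam', 's1.bam')
    ch_bai  = Channel.of('s2.bam.bai', 's1.bam.bai')

    // collect() emits one list; combine() spreads its elements into the tuple,
    // giving the flat shape [meta, bam1, bam2, bai1, bai2].
    ch_meta
        .combine(ch_bam.collect(sort: true))
        .combine(ch_bai.collect(sort: true))
        .view { v -> "without toList(): ${v}" }

    // toList() wraps the collected list in one more list, so combine() only
    // unwraps the outer layer and the shape stays [meta, [bams], [bais]].
    ch_meta
        .combine(ch_bam.collect(sort: true).toList())
        .combine(ch_bai.collect(sort: true).toList())
        .view { v -> "with toList(): ${v}" }
}
```

Saved as a standalone `main.nf` and run with `nextflow run main.nf`, it should print the two tuple shapes quoted in the comment above.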
ramprasadn left a comment
LGTM! Thanks for the fix @benstory
Closes #814
Description of changes
In `subworkflows/local/call_sv_manta/main.nf`, the BAM and BAI channels were being collected with `.collect{ _meta, bam -> bam }`, which gathers the files into a list but doesn't guarantee a stable ordering. Since Manta emits sample columns in the order BAMs are provided on the command line, this caused non-deterministic sample column ordering in the output VCF across otherwise identical runs.

This PR sorts the collected BAM/BAI files by filename before passing them to Manta, using `.collect(sort: { a, b -> a.getName() <=> b.getName() })`. This is consistent with other tools in the pipeline (see #814 for a breakdown).

Two files changed:
- `subworkflows/local/call_sv_manta/main.nf`: the actual fix
- `CHANGELOG.md`: entry under `### Fixed`

No functional change to variant calling itself; only the ordering of inputs.
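As a rough illustration of the sorting step, the sketch below applies the same comparator to two made-up inputs. The sample IDs and `/data/...` paths are hypothetical; `file()` is used here only to build path objects, so the files do not need to exist.

```groovy
workflow {
    // Hypothetical inputs, deliberately emitted out of filename order.
    Channel.of(
            [[id: 'sampleB'], file('/data/run2/sampleB.bam')],
            [[id: 'sampleA'], file('/data/run1/sampleA.bam')]
        )
        // Drop the meta map and keep only the BAM path.
        .map { _meta, bam -> bam }
        // getName() compares the file name only, not the parent directory.
        .collect(sort: { a, b -> a.getName() <=> b.getName() })
        // Print just the file names of the sorted list.
        .view { bams -> "sorted: ${bams.collect { f -> f.getName() }}" }
}
```

Because the comparator depends only on file names, the collected list comes out in the same order regardless of the order in which the upstream channel emits its items.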
Notes for reviewers
dev, let me know and I'll open a second PR.
PR checklist
- … (`nf-core pipelines lint`).
- … (`nextflow run . -profile test,docker --outdir <OUTDIR>`).
- … (`nextflow run . -profile test_singleton,docker --outdir <OUTDIR>`).
- … (`nextflow run . -profile debug,test,docker --outdir <OUTDIR>`).
- `docs/usage.md` is updated.
- `docs/output.md` is updated.
- `CHANGELOG.md` is updated.
- `README.md` is updated (including new tool citations and authors/contributors).