Refactor postvariantcalling. Split out varlociraptor vs other options #2043
Warning: A newer version of the nf-core template is available. Your pipeline is using an old version of the nf-core template: 3.3.2. For more documentation on how to update your pipeline, please see the nf-core documentation and Synchronisation documentation.
maxulysse
left a comment
Looking good
Waiting for a conclusion on the discussion here before updating all the checksums: https://nfcore.slack.com/archives/C05V9FRJYMV/p1761906884846409?thread_ts=1761564791.923049&cid=C05V9FRJYMV
maxulysse
left a comment
Minor comments
Merging with failing sentieon tests only
Based on #2043

This PR adds optional VCF filtering functionality to the post-variant calling workflow, enabling users to filter variant calls from all variant callers using `bcftools view`. Additionally, it refactors the post-variant calling logic to handle implicit index creation and improve handling of variant callers that produce multiple outputs.

**Why These Changes**

1. Streamlines the workflow by providing PASS-filtered variants directly
2. Maintains flexibility through customizable filtering criteria
3. Integrates seamlessly with existing normalization and concatenation steps
4. Improves handling of variant callers with complex output structures (e.g., Strelka, Manta)

**New Filtering Feature**

- Adds `--filter_vcfs` parameter to enable optional VCF filtering with `bcftools view`, with a default PASS filter
- Supports custom filtering criteria through the `--bcftools_filter_criteria` parameter
- Publishes filtered VCFs to `variant_calling/filtered/<sample>/<sample>.<variantcaller>.bcftools_filtered.vcf.gz`

**Workflow Refactoring**

- Makes index computation implicit for all bcftools operations, simplifying channel handling
- Updates emit channel structure to properly account for implicit index creation (.tbi/.csi)
- Uses basename consistently to handle variant callers that produce multiple outputs
- Fixes channel wiring between filtering, normalization, and concatenation steps
- Removes unnecessary explicit tabix indexing process
- Fixes output structure in docs

**Testing & CI**

- Adds filtering tests for multiple variant callers
- Enables filtering on full-size test profiles
- Fixes FreeBayes filtered output publishing
- Updates snapshots to reflect new output structure

**Documentation**

- Updates docs/output.md with new filtering section
- Updates subway map diagrams to reflect new workflow

<!--
# nf-core/sarek pull request

Many thanks for contributing to nf-core/sarek!

Please fill in the appropriate checklist below (delete whatever is not relevant).
These are the most common things requested on pull requests (PRs).

Remember that PRs should be made against the dev branch, unless you're preparing a pipeline release.

Learn more about contributing: [CONTRIBUTING.md](https://github.com/nf-core/sarek/tree/master/.github/CONTRIBUTING.md)
-->

## PR checklist

- [ ] This comment contains a description of changes (with reason).
- [ ] If you've fixed a bug or added code that should be tested, add tests!
- [ ] If you've added a new tool - have you followed the pipeline conventions in the [contribution docs](https://github.com/nf-core/sarek/tree/master/.github/CONTRIBUTING.md)
- [ ] If necessary, also make a PR on the nf-core/sarek _branch_ on the [nf-core/test-datasets](https://github.com/nf-core/test-datasets) repository.
- [ ] Make sure your code lints (`nf-core pipelines lint`).
- [ ] Ensure the test suite passes (`nextflow run . -profile test,docker --outdir <OUTDIR>`).
- [ ] Check for unexpected warnings in debug mode (`nextflow run . -profile debug,test,docker --outdir <OUTDIR>`).
- [ ] Usage Documentation in `docs/usage.md` is updated.
- [ ] Output Documentation in `docs/output.md` is updated.
- [ ] `CHANGELOG.md` is updated.
- [ ] `README.md` is updated (including new tool citations and authors/contributors).
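For readers unfamiliar with the default PASS filter described in this PR, here is a rough Python sketch of what keeping only PASS records means at the VCF-record level. It is an illustration only, not the pipeline's implementation (the pipeline calls `bcftools view`); the function name `filter_vcf_lines` and the `keep_filters` parameter are hypothetical and do not appear in the PR.

```python
from __future__ import annotations

from typing import Iterable, Iterator


def filter_vcf_lines(
    lines: Iterable[str],
    keep_filters: frozenset[str] = frozenset({"PASS"}),
) -> Iterator[str]:
    """Yield header lines plus records whose FILTER column matches.

    Hypothetical helper: a record is kept when at least one of its
    FILTER values (column 7, ';'-separated) is in `keep_filters`.
    """
    for line in lines:
        # Header and meta lines are always passed through unchanged.
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        record_filters = set(fields[6].split(";"))
        if record_filters & keep_filters:
            yield line


# Tiny in-memory example (made-up records, not real pipeline output).
example_vcf = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\t.\tA\tT\t50\tPASS\t.\n",
    "chr1\t200\t.\tG\tC\t10\tLowQual\t.\n",
]
kept = list(filter_vcf_lines(example_vcf))
```

In this example, `kept` holds the two header lines and the single PASS record; the `LowQual` record is dropped.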
Previously, the post-variant calling logic allowed mixing different post-processing strategies (varlociraptor, normalization, concatenation) in confusing ways. The main workflow had complex conditional logic to determine which VCFs were used based on what post-processing was requested, and a lot of computation was repeated on both raw and post-processed variants. This made the data flow hard to follow and error-prone.
Changes
Enforced either-or logic in post-variant calling:
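As a rough illustration of what an either-or split can look like, the Python sketch below routes a run down exactly one of two branches. Everything in it is an assumption made for illustration: the function name `select_postprocessing`, the step labels, and in particular the choice that requesting varlociraptor takes precedence over the other steps. The PR does not state how a conflicting request is handled, and the real logic lives in the Nextflow workflow.

```python
from __future__ import annotations


def select_postprocessing(
    use_varlociraptor: bool, requested_steps: list[str]
) -> list[str]:
    """Return the steps of exactly one post-processing branch.

    Hypothetical sketch: either the varlociraptor branch runs, or the
    standard branch (filter / normalize / concatenate), never a mix.
    """
    if use_varlociraptor:
        # Assumption: the other requested steps are ignored on this branch.
        return ["varlociraptor"]
    # Standard branch: keep only known steps, in a fixed order.
    standard_order = ("filter", "normalize", "concatenate")
    return [step for step in standard_order if step in requested_steps]


varlociraptor_branch = select_postprocessing(True, ["normalize"])
standard_branch = select_postprocessing(False, ["concatenate", "filter"])
```

The point of the sketch is that downstream code only ever sees the output of one branch, so it no longer needs conditionals to work out which VCFs it was handed.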
Simplified main workflow:
Additional improvements:
Resume limitations with varlociraptor
While the overall data flow is cleaner, varlociraptor currently does not resume reliably. I marked the likely location where this occurs, but it is not yet clear to me why.