@FriederikeHanssen
Contributor

@FriederikeHanssen FriederikeHanssen commented Oct 30, 2025

Previously, the post-variant calling logic allowed mixing different post-processing strategies (varlociraptor, normalization, concatenation) in confusing ways. The main workflow used complex conditional logic to determine which VCFs were used, depending on which post-processing was requested, and a lot of computation was repeated on both raw and post-processed variants. This made the data flow hard to follow and error-prone.

Changes

Enforced either-or logic in post-variant calling:

  • If varlociraptor is requested → process through varlociraptor workflows exclusively
  • Else if concatenate_vcfs or normalize_vcfs → perform standard post-processing
  • Else → pass through original VCFs unchanged
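
The either-or rule above can be sketched in a few lines of Python. This is only an illustration of the precedence described in the list, not the pipeline's Nextflow code; the function name and the returned labels are hypothetical, and the three arguments stand in for the corresponding pipeline options.

```python
def select_post_processing(varlociraptor: bool,
                           concatenate_vcfs: bool,
                           normalize_vcfs: bool) -> str:
    """Return the single post-processing strategy that applies.

    Hypothetical helper: the labels are illustrative and do not
    correspond to identifiers in the pipeline.
    """
    if varlociraptor:
        # varlociraptor takes precedence and excludes the other steps
        return "varlociraptor"
    if concatenate_vcfs or normalize_vcfs:
        # standard post-processing (concatenation and/or normalization)
        return "standard"
    # nothing requested: the original VCFs are passed through unchanged
    return "passthrough"
```

Because exactly one label is returned for every combination of inputs, downstream steps never have to reason about mixed strategies.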

Simplified main workflow:

  • Removed conditional VCF gathering logic from main SAREK workflow
  • POST_VARIANTCALLING now always outputs VCFs (either processed or pass-through)
  • Annotation always consumes POST_VARIANTCALLING.out.vcfs
  • VCF QC now runs exclusively on raw variant calls (before any post-processing)

Additional improvements:

  • Standardized varlociraptor scenario file handling in main.nf
  • Disabled redundant publishing of intermediate normalized files
  • Increased the MultiQC timeout to ensure all plots are published
  • Varlociraptor subworkflows now emit separate vcf and tbi outputs, matching the pattern used by other post-processing subworkflows
  • Added branching logic to handle single chunks without concatenation
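
As a rough illustration of the single-chunk branching mentioned in the last bullet, the Python sketch below splits grouped VCF chunks into those that can be forwarded as-is and those that need concatenation. It is an assumption about the general idea, not the actual channel logic; the function name and the dictionary layout are hypothetical.

```python
def branch_by_chunk_count(grouped_chunks):
    """Split grouped VCF chunks into single-chunk and multi-chunk groups.

    grouped_chunks maps a hypothetical sample key to its list of chunk
    files. Single chunks are forwarded directly; only groups with more
    than one chunk would be sent on to concatenation.
    """
    single, multiple = {}, {}
    for key, chunks in grouped_chunks.items():
        if len(chunks) == 1:
            single[key] = chunks[0]        # no concatenation needed
        else:
            multiple[key] = list(chunks)   # to be concatenated
    return single, multiple
```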

Resume limitations with varlociraptor

While the overall data flow is cleaner, varlociraptor currently does not resume reliably. I marked the likely location where this occurs, but it is not yet clear to me why.

@nf-core-bot
Member

Warning

Newer version of the nf-core template is available.

Your pipeline is using an old version of the nf-core template: 3.3.2.
Please update your pipeline to the latest version.

For more documentation on how to update your pipeline, please see the nf-core documentation and Synchronisation documentation.

Member

@maxulysse maxulysse left a comment

Looking good

@FriederikeHanssen
Contributor Author

Waiting for the conclusion of the discussion here before updating all the checksums: https://nfcore.slack.com/archives/C05V9FRJYMV/p1761906884846409?thread_ts=1761564791.923049&cid=C05V9FRJYMV

@FriederikeHanssen FriederikeHanssen mentioned this pull request Nov 2, 2025
11 tasks
@FriederikeHanssen FriederikeHanssen marked this pull request as ready for review November 3, 2025 18:34
Member

@maxulysse maxulysse left a comment

Minor comments

@FriederikeHanssen
Contributor Author

Merging with only the Sentieon tests failing

@FriederikeHanssen FriederikeHanssen merged commit f6abd79 into nf-core:dev Nov 4, 2025
29 of 41 checks passed
@FriederikeHanssen FriederikeHanssen deleted the refactor_postvc branch November 4, 2025 11:21
FriederikeHanssen added a commit that referenced this pull request Nov 4, 2025
Based on #2043 

This PR adds optional VCF filtering functionality to the post-variant
calling workflow, enabling users to filter variant calls
from all variant callers using `bcftools view`. Additionally, it
refactors the post-variant calling logic to handle implicit
index creation and improve handling of variant callers that produce
multiple outputs.

**Why These Changes**

1. Streamlines the workflow by providing PASS-filtered variants directly
2. Maintains flexibility through customizable filtering criteria
3. Integrates seamlessly with existing normalization and concatenation steps
4. Improves handling of variant callers with complex output structures (e.g., Strelka, Manta)

**New Filtering Feature**
- Adds `--filter_vcfs` parameter to enable optional VCF filtering with `bcftools view` (default: PASS filter)
- Supports custom filtering criteria through the `--bcftools_filter_criteria` parameter
- Publishes filtered VCFs to `variant_calling/filtered/<sample>/<sample>.<variantcaller>.bcftools_filtered.vcf.gz`
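
To make the publishing layout concrete, here is a small Python sketch that builds the filtered-VCF path from the pattern given above. The function name is hypothetical and the example values are made up; only the path pattern itself comes from this description.

```python
from pathlib import PurePosixPath


def filtered_vcf_path(sample: str, variantcaller: str) -> PurePosixPath:
    """Build the relative output path of a filtered VCF.

    Follows the pattern
    variant_calling/filtered/<sample>/<sample>.<variantcaller>.bcftools_filtered.vcf.gz
    (hypothetical helper, not part of the pipeline).
    """
    filename = f"{sample}.{variantcaller}.bcftools_filtered.vcf.gz"
    return PurePosixPath("variant_calling") / "filtered" / sample / filename
```

For example, `filtered_vcf_path("sampleA", "strelka")` gives `variant_calling/filtered/sampleA/sampleA.strelka.bcftools_filtered.vcf.gz`.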

**Workflow Refactoring**
- Makes index computation implicit for all bcftools operations, simplifying channel handling
- Updates emit channel structure to properly account for implicit index creation (.tbi/.csi)
- Uses basename consistently to handle variant callers that produce multiple outputs
- Fixes channel wiring between filtering, normalization, and concatenation steps
- Removes unnecessary explicit tabix indexing process
- Fixes output structure in docs
  
**Testing & CI**
- Adds filtering tests for multiple variant callers
- Enables filtering on full-size test profiles
- Fixes FreeBayes filtered output publishing
- Updates snapshots to reflect new output structure

**Documentation**
- Updates docs/output.md with new filtering section
- Updates subway map diagrams to reflect new workflow


## PR checklist

- [ ] This comment contains a description of changes (with reason).
- [ ] If you've fixed a bug or added code that should be tested, add
tests!
- [ ] If you've added a new tool - have you followed the pipeline
conventions in the [contribution
docs](https://github.com/nf-core/sarek/tree/master/.github/CONTRIBUTING.md)
- [ ] If necessary, also make a PR on the nf-core/sarek _branch_ on the
[nf-core/test-datasets](https://github.com/nf-core/test-datasets)
repository.
- [ ] Make sure your code lints (`nf-core pipelines lint`).
- [ ] Ensure the test suite passes (`nextflow run . -profile test,docker
--outdir <OUTDIR>`).
- [ ] Check for unexpected warnings in debug mode (`nextflow run .
-profile debug,test,docker --outdir <OUTDIR>`).
- [ ] Usage Documentation in `docs/usage.md` is updated.
- [ ] Output Documentation in `docs/output.md` is updated.
- [ ] `CHANGELOG.md` is updated.
- [ ] `README.md` is updated (including new tool citations and
authors/contributors).