Changes to UMI processing #1937

SPPearce · 2025-07-09T11:15:14Z

Big set of changes to UMI processing.
Currently the only UMI processing is consensus-read generation with fgbio, which is inefficient for whole-genome sequencing.
Often the UMI has already been extracted into the read header (for example by bclconvert) or is already present in the BAM file.

Introduces:

- fastp for UMI extraction from reads (R1, R2, both, or the index reads in the FASTQ header) [test added]
- Moving UMIs from the read name to the RX: tag in BAM/CRAM files, so that MarkDuplicates and Sentieon can use them. This is required if the UMIs were extracted with fastp, or if they were provided in the read name originally. [test added for MarkDuplicates]
  - fgbio consensus calling now also supports UMIs in read names.
- Sentieon consensus mode (technically it doesn't require UMIs, but while I was fiddling with the config...)
- Ensures that lanes are merged together before fgbio consensus generation (fixes "MarkDuplicates step fails with UMIs from read names and 4 lanes" #802). I also switched from samblaster to samtools for the merge and sort, as the fastquorum pipeline does; this removes samblaster entirely (I couldn't get samblaster to merge multiple files in one go).
- Uses the Fulcrum Genomics plugin to validate the read structures; see "feat: validate read structures with the nf-fgbio plugin" (fastquorum#123).
- When running Sentieon dedup on BAM/CRAM input without aligning first, UMIs are only used if one of the two parameters above is set, or if the umi_tag parameter is set.
- MarkDuplicates will automatically use the RX tag if it is present in the BAM files, but Sentieon does not.
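To illustrate what moving a UMI from the read name to the RX tag involves, here is a minimal Python sketch operating on plain SAM text lines. It assumes the UMI is the last colon-separated field of the read name (the layout Illumina/bclconvert uses for UMI reads), and that dual UMIs joined with `+` should be stored with `-`, an assumption based on fgbio's convention. `split_umi_from_read_name` and `sam_line_with_rx` are hypothetical helpers for illustration, not the pipeline's code.

```python
from __future__ import annotations

# Bases accepted in a UMI segment (assumption for this sketch).
VALID_UMI_BASES = set("ACGTN")


def split_umi_from_read_name(read_name: str, delimiter: str = ":") -> tuple[str, str]:
    """Split a read name whose last `delimiter`-separated field is the UMI.

    Returns (name_without_umi, umi). Dual UMIs joined with '+' are rewritten
    with '-' (assumed convention, as used by fgbio for the RX tag).
    """
    head, sep, umi = read_name.rpartition(delimiter)
    if not sep or not head or not umi:
        raise ValueError(f"no UMI field found in read name: {read_name!r}")
    umi = umi.replace("+", "-")
    # Each UMI segment must be non-empty and use only A/C/G/T/N.
    if any(not seg or not set(seg) <= VALID_UMI_BASES for seg in umi.split("-")):
        raise ValueError(f"unexpected characters in UMI {umi!r}")
    return head, umi


def sam_line_with_rx(sam_line: str) -> str:
    """Drop the UMI from the QNAME of one SAM record and append an RX:Z: tag."""
    fields = sam_line.rstrip("\n").split("\t")
    fields[0], umi = split_umi_from_read_name(fields[0])
    fields.append(f"RX:Z:{umi}")
    return "\t".join(fields)
```

Validating the UMI characters means a read name that carries no UMI (whose last field is then the numeric y-coordinate) raises an error instead of silently producing a bogus RX tag.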

Note that long-term I think we should just be sending people to fastquorum if they want consensus reads, rather than implementing this in sarek.
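The read-structure validation mentioned in the list above relies on fgbio's read-structure notation, in which a string such as `8M+T` describes a read as 8 bases of UMI (`M`) followed by template (`T`) for the rest of the read. The sketch below is a simplified, hypothetical parser, not the nf-fgbio plugin's implementation. It assumes the segment types T (template), B (sample barcode), M (molecular barcode), S (skip) and C (cellular barcode), and the rule that only the final segment may use the variable length `+`; rejecting zero-length segments is this sketch's own choice.

```python
from __future__ import annotations

import re

# One segment: a fixed length or '+', followed by an (assumed) segment type.
SEGMENT = re.compile(r"(\d+|\+)([TBMSC])")


def parse_read_structure(structure: str) -> list[tuple[int | None, str]]:
    """Parse e.g. '8M+T' into [(8, 'M'), (None, 'T')]; None stands for '+'.

    Raises ValueError if the string is not fully consumed by valid segments,
    if a fixed length is zero, or if '+' appears anywhere but the last segment.
    """
    segments: list[tuple[int | None, str]] = []
    pos = 0
    while pos < len(structure):
        match = SEGMENT.match(structure, pos)
        if match is None:
            raise ValueError(f"invalid segment at position {pos} in {structure!r}")
        length_text, kind = match.groups()
        length = None if length_text == "+" else int(length_text)
        if length == 0:
            raise ValueError(f"zero-length segment in {structure!r}")
        segments.append((length, kind))
        pos = match.end()
    if not segments:
        raise ValueError("empty read structure")
    # Only the last segment may be variable-length.
    if any(length is None for length, _ in segments[:-1]):
        raise ValueError(f"'+' is only allowed in the last segment: {structure!r}")
    return segments
```

For example, `5M2S+T` declares a 5 bp UMI and 2 skipped bases before the template, while `+T8M` is rejected because the variable-length segment is not last.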

nf-core-bot · 2025-07-09T11:15:48Z

Warning

Newer version of the nf-core template is available.

Your pipeline is using an old version of the nf-core template: 3.3.1.
Please update your pipeline to the latest version.

For more documentation on how to update your pipeline, please see the nf-core documentation and Synchronisation documentation.

github-actions · 2025-07-09T11:18:52Z

`nf-core pipelines lint` overall result: Passed ✅ ⚠️

Posted for pipeline commit ba0bcec

✅ 225 tests passed
❔ 12 tests were ignored
❗ 7 tests had warnings

Details

❗ Test warnings:

pipeline_todos - TODO string in base.config: Check the defaults for all processes
pipeline_todos - TODO string in main.nf: Optionally add in-text citation tools to this list.
pipeline_todos - TODO string in main.nf: Optionally add bibliographic entries to this list.
pipeline_todos - TODO string in main.nf: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!
schema_lint - Input mimetype is missing or empty
schema_description - No description provided in schema for parameter: markduplicates_pixel_distance
schema_description - No description provided in schema for parameter: gatk_pcr_indel_model

❔ Tests ignored:

files_exist - File is ignored: .github/workflows/awsfulltest.yml
files_exist - File is ignored: .github/workflows/awstest.yml
files_exist - File is ignored: .github/workflows/ci.yml
files_exist - File is ignored: conf/modules.config
nf_test_content - nf_test_content
files_unchanged - File ignored due to lint config: assets/nf-core-sarek_logo_light.png
files_unchanged - File ignored due to lint config: docs/images/nf-core-sarek_logo_light.png
files_unchanged - File ignored due to lint config: docs/images/nf-core-sarek_logo_dark.png
files_unchanged - File ignored due to lint config: .gitignore or .prettierignore
actions_awstest - 'awstest.yml' workflow not found: /home/runner/work/sarek/sarek/.github/workflows/awstest.yml
template_strings - template_strings
modules_config - modules_config

✅ Tests passed:

files_exist - File found: .gitattributes
files_exist - File found: .gitignore
files_exist - File found: .nf-core.yml
files_exist - File found: .prettierignore
files_exist - File found: .prettierrc.yml
files_exist - File found: CHANGELOG.md
files_exist - File found: CITATIONS.md
files_exist - File found: CODE_OF_CONDUCT.md
files_exist - File found: LICENSE or LICENSE.md or LICENCE or LICENCE.md
files_exist - File found: nextflow_schema.json
files_exist - File found: nextflow.config
files_exist - File found: README.md
files_exist - File found: .github/.dockstore.yml
files_exist - File found: .github/CONTRIBUTING.md
files_exist - File found: .github/ISSUE_TEMPLATE/bug_report.yml
files_exist - File found: .github/ISSUE_TEMPLATE/config.yml
files_exist - File found: .github/ISSUE_TEMPLATE/feature_request.yml
files_exist - File found: .github/PULL_REQUEST_TEMPLATE.md
files_exist - File found: .github/workflows/branch.yml
files_exist - File found: .github/workflows/nf-test.yml
files_exist - File found: .github/actions/get-shards/action.yml
files_exist - File found: .github/actions/nf-test/action.yml
files_exist - File found: .github/workflows/linting_comment.yml
files_exist - File found: .github/workflows/linting.yml
files_exist - File found: assets/email_template.html
files_exist - File found: assets/email_template.txt
files_exist - File found: assets/sendmail_template.txt
files_exist - File found: assets/nf-core-sarek_logo_light.png
files_exist - File found: conf/test.config
files_exist - File found: conf/test_full.config
files_exist - File found: docs/images/nf-core-sarek_logo_light.png
files_exist - File found: docs/images/nf-core-sarek_logo_dark.png
files_exist - File found: docs/output.md
files_exist - File found: docs/README.md
files_exist - File found: docs/README.md
files_exist - File found: docs/usage.md
files_exist - File found: nf-test.config
files_exist - File found: tests/default.nf.test
files_exist - File found: main.nf
files_exist - File found: assets/multiqc_config.yml
files_exist - File found: conf/base.config
files_exist - File found: conf/igenomes.config
files_exist - File found: conf/igenomes_ignored.config
files_exist - File found: modules.json
files_exist - File found: ro-crate-metadata.json
files_exist - File not found check: .github/ISSUE_TEMPLATE/bug_report.md
files_exist - File not found check: .github/ISSUE_TEMPLATE/feature_request.md
files_exist - File not found check: .github/workflows/push_dockerhub.yml
files_exist - File not found check: .markdownlint.yml
files_exist - File not found check: .nf-core.yaml
files_exist - File not found check: .yamllint.yml
files_exist - File not found check: bin/markdown_to_html.r
files_exist - File not found check: conf/aws.config
files_exist - File not found check: docs/images/nf-core-sarek_logo.png
files_exist - File not found check: lib/Checks.groovy
files_exist - File not found check: lib/Completion.groovy
files_exist - File not found check: lib/NfcoreTemplate.groovy
files_exist - File not found check: lib/Utils.groovy
files_exist - File not found check: lib/Workflow.groovy
files_exist - File not found check: lib/WorkflowMain.groovy
files_exist - File not found check: lib/WorkflowSarek.groovy
files_exist - File not found check: parameters.settings.json
files_exist - File not found check: pipeline_template.yml
files_exist - File not found check: Singularity
files_exist - File not found check: lib/nfcore_external_java_deps.jar
files_exist - File not found check: .travis.yml
nextflow_config - Found nf-schema plugin
nextflow_config - Config variable found: manifest.name
nextflow_config - Config variable found: manifest.nextflowVersion
nextflow_config - Config variable found: manifest.description
nextflow_config - Config variable found: manifest.version
nextflow_config - Config variable found: manifest.homePage
nextflow_config - Config variable found: timeline.enabled
nextflow_config - Config variable found: trace.enabled
nextflow_config - Config variable found: report.enabled
nextflow_config - Config variable found: dag.enabled
nextflow_config - Config variable found: process.cpus
nextflow_config - Config variable found: process.memory
nextflow_config - Config variable found: process.time
nextflow_config - Config variable found: params.outdir
nextflow_config - Config variable found: params.input
nextflow_config - Config variable found: validation.help.enabled
nextflow_config - Config variable found: manifest.mainScript
nextflow_config - Config variable found: timeline.file
nextflow_config - Config variable found: trace.file
nextflow_config - Config variable found: report.file
nextflow_config - Config variable found: dag.file
nextflow_config - Config variable found: validation.help.beforeText
nextflow_config - Config variable found: validation.help.afterText
nextflow_config - Config variable found: validation.help.command
nextflow_config - Config variable found: validation.summary.beforeText
nextflow_config - Config variable found: validation.summary.afterText
nextflow_config - Config variable (correctly) not found: params.nf_required_version
nextflow_config - Config variable (correctly) not found: params.container
nextflow_config - Config variable (correctly) not found: params.singleEnd
nextflow_config - Config variable (correctly) not found: params.igenomesIgnore
nextflow_config - Config variable (correctly) not found: params.name
nextflow_config - Config variable (correctly) not found: params.enable_conda
nextflow_config - Config variable (correctly) not found: params.max_cpus
nextflow_config - Config variable (correctly) not found: params.max_memory
nextflow_config - Config variable (correctly) not found: params.max_time
nextflow_config - Config variable (correctly) not found: params.validationFailUnrecognisedParams
nextflow_config - Config variable (correctly) not found: params.validationLenientMode
nextflow_config - Config variable (correctly) not found: params.validationSchemaIgnoreParams
nextflow_config - Config variable (correctly) not found: params.validationShowHiddenParams
nextflow_config - Config timeline.enabled had correct value: true
nextflow_config - Config report.enabled had correct value: true
nextflow_config - Config trace.enabled had correct value: true
nextflow_config - Config dag.enabled had correct value: true
nextflow_config - Config manifest.name began with nf-core/
nextflow_config - Config variable manifest.homePage began with https://github.com/nf-core/
nextflow_config - Config dag.file ended with .html
nextflow_config - Config variable manifest.nextflowVersion started with >= or !>=
nextflow_config - Config manifest.version ends in dev: 3.6.0dev
nextflow_config - Config params.custom_config_version is set to master
nextflow_config - Config params.custom_config_base is set to https://raw.githubusercontent.com/nf-core/configs/master
nextflow_config - Lines for loading custom profiles found
nextflow_config - nextflow.config contains configuration profile test
nextflow_config - Config default value correct: params.step= mapping
nextflow_config - Config default value correct: params.split_fastq= 50000000
nextflow_config - Config default value correct: params.nucleotides_per_second= 200000
nextflow_config - Config default value correct: params.clip_r1= 0
nextflow_config - Config default value correct: params.clip_r2= 0
nextflow_config - Config default value correct: params.three_prime_clip_r1= 0
nextflow_config - Config default value correct: params.three_prime_clip_r2= 0
nextflow_config - Config default value correct: params.length_required= 15
nextflow_config - Config default value correct: params.group_by_umi_strategy= Adjacency
nextflow_config - Config default value correct: params.aligner= bwa-mem
nextflow_config - Config default value correct: params.ascat_min_base_qual= 20
nextflow_config - Config default value correct: params.ascat_min_counts= 10
nextflow_config - Config default value correct: params.ascat_min_map_qual= 35
nextflow_config - Config default value correct: params.cf_coeff= 0.05
nextflow_config - Config default value correct: params.cf_contamination= 0
nextflow_config - Config default value correct: params.cf_minqual= 0
nextflow_config - Config default value correct: params.cf_mincov= 0
nextflow_config - Config default value correct: params.cf_ploidy= 2
nextflow_config - Config default value correct: params.freebayes_filter= 30
nextflow_config - Config default value correct: params.sentieon_haplotyper_emit_mode= variant
nextflow_config - Config default value correct: params.sentieon_dnascope_emit_mode= variant
nextflow_config - Config default value correct: params.sentieon_dnascope_pcr_indel_model= CONSERVATIVE
nextflow_config - Config default value correct: params.gatk_pcr_indel_model= CONSERVATIVE
nextflow_config - Config default value correct: params.dbnsfp_fields= rs_dbSNP,HGVSc_VEP,HGVSp_VEP,1000Gp3_EAS_AF,1000Gp3_AMR_AF,LRT_score,GERP++_RS,gnomAD_exomes_AF
nextflow_config - Config default value correct: params.vep_custom_args= --everything --filter_common --per_gene --total_length --offline --format vcf
nextflow_config - Config default value correct: params.vep_version= 111.0-0
nextflow_config - Config default value correct: params.vep_out_format= vcf
nextflow_config - Config default value correct: params.igenomes_base= s3://ngi-igenomes/igenomes/
nextflow_config - Config default value correct: params.genome= GATK.GRCh38
nextflow_config - Config default value correct: params.snpeff_cache= s3://annotation-cache/snpeff_cache/
nextflow_config - Config default value correct: params.vep_cache= s3://annotation-cache/vep_cache/
nextflow_config - Config default value correct: params.custom_config_version= master
nextflow_config - Config default value correct: params.custom_config_base= https://raw.githubusercontent.com/nf-core/configs/master
nextflow_config - Config default value correct: params.test_data_base= https://raw.githubusercontent.com/nf-core/test-datasets/sarek3
nextflow_config - Config default value correct: params.seq_platform= ILLUMINA
nextflow_config - Config default value correct: params.publish_dir_mode= copy
nextflow_config - Config default value correct: params.max_multiqc_email_size= 25.MB
nextflow_config - Config default value correct: params.validate_params= true
nextflow_config - Config default value correct: params.pipelines_testdata_base_path= https://raw.githubusercontent.com/nf-core/test-datasets/
files_unchanged - .gitattributes matches the template
files_unchanged - .prettierrc.yml matches the template
files_unchanged - CODE_OF_CONDUCT.md matches the template
files_unchanged - LICENSE matches the template
files_unchanged - .github/.dockstore.yml matches the template
files_unchanged - .github/CONTRIBUTING.md matches the template
files_unchanged - .github/ISSUE_TEMPLATE/bug_report.yml matches the template
files_unchanged - .github/ISSUE_TEMPLATE/config.yml matches the template
files_unchanged - .github/ISSUE_TEMPLATE/feature_request.yml matches the template
files_unchanged - .github/PULL_REQUEST_TEMPLATE.md matches the template
files_unchanged - .github/workflows/branch.yml matches the template
files_unchanged - .github/workflows/linting_comment.yml matches the template
files_unchanged - .github/workflows/linting.yml matches the template
files_unchanged - assets/email_template.html matches the template
files_unchanged - assets/email_template.txt matches the template
files_unchanged - assets/sendmail_template.txt matches the template
files_unchanged - docs/README.md matches the template
actions_nf_test - '.github/workflows/nf-test.yml' is triggered on expected events
actions_nf_test - '.github/workflows/nf-test.yml' checks minimum NF version
readme - README Nextflow minimum version badge matched config. Badge: 24.10.5, Config: 24.10.5
readme - README nf-core template version badge found.
readme - README Zenodo placeholder was replaced with DOI.
pipeline_if_empty_null - No ifEmpty(null) strings found
plugin_includes - No wrong validation plugin imports have been found
pipeline_name_conventions - Name adheres to nf-core convention
schema_lint - Schema lint passed
schema_lint - Schema title + description lint passed
schema_params - Schema matched params returned from nextflow config
system_exit - No System.exit calls found
actions_schema_validation - Workflow validation passed: linting_comment.yml
actions_schema_validation - Workflow validation passed: fix_linting.yml
actions_schema_validation - Workflow validation passed: nf-test-gpu.yml
actions_schema_validation - Workflow validation passed: download_pipeline.yml
actions_schema_validation - Workflow validation passed: ncbench.yml
actions_schema_validation - Workflow validation passed: cloudtest.yml
actions_schema_validation - Workflow validation passed: linting.yml
actions_schema_validation - Workflow validation passed: template-version-comment.yml
actions_schema_validation - Workflow validation passed: release-announcements.yml
actions_schema_validation - Workflow validation passed: branch.yml
actions_schema_validation - Workflow validation passed: clean-up.yml
actions_schema_validation - Workflow validation passed: nf-test.yml
merge_markers - No merge markers found in pipeline files
modules_json - Only installed modules found in modules.json
multiqc_config - assets/multiqc_config.yml found and not ignored.
multiqc_config - assets/multiqc_config.yml contains report_section_order
multiqc_config - assets/multiqc_config.yml contains export_plots
multiqc_config - assets/multiqc_config.yml contains report_comment
multiqc_config - assets/multiqc_config.yml follows the ordering scheme of the minimally required plugins.
multiqc_config - assets/multiqc_config.yml contains a matching 'report_comment'.
multiqc_config - assets/multiqc_config.yml contains 'export_plots: true'.
modules_structure - modules directory structure is correct 'modules/nf-core/TOOL/SUBTOOL'
local_component_structure - local subworkflows directory structure is correct 'subworkflows/local/TOOL/SUBTOOL'
base_config - conf/base.config found and not ignored.
base_config - UNZIP found in conf/base.config and Nextflow scripts.
base_config - FASTQC found in conf/base.config and Nextflow scripts.
base_config - FASTP found in conf/base.config and Nextflow scripts.
base_config - BWAMEM1_MEM found in conf/base.config and Nextflow scripts.
base_config - CNVKIT_BATCH found in conf/base.config and Nextflow scripts.
base_config - GATK4_MARKDUPLICATES found in conf/base.config and Nextflow scripts.
base_config - GATK4_APPLYBQSR found in conf/base.config and Nextflow scripts.
base_config - MOSDEPTH found in conf/base.config and Nextflow scripts.
base_config - STRELKA found in conf/base.config and Nextflow scripts.
base_config - SAMTOOLS_CONVERT found in conf/base.config and Nextflow scripts.
base_config - GATK4_MERGEVCFS found in conf/base.config and Nextflow scripts.
base_config - MULTIQC found in conf/base.config and Nextflow scripts.
nfcore_yml - Repository type in .nf-core.yml is valid: pipeline
nfcore_yml - nf-core version in .nf-core.yml is set to the latest version: 3.3.2
rocrate_readme_sync - RO-Crate descriptions are in sync with README.md.

Run details

nf-core/tools version 3.3.2
Run at 2025-09-08 12:49:36


SPPearce added 2 commits July 9, 2025 12:07

Add initial draft

80221c6

Fix schema

f8c40b0

SPPearce requested review from FriederikeHanssen and maxulysse as code owners July 9, 2025 11:15

SPPearce marked this pull request as draft July 9, 2025 11:15

SPPearce added 23 commits July 9, 2025 13:27

Put params back into FASTQTOBAM

ddba71a

Put versions into condition block

aba1516

Fix subworkflow

eac647a

Add umi_tag parameter and make MarkDuplicates work

ee3442b

Add -- to the MarkDuplicates BARCODE_TAG

f4b3c44

Remove =

4e6910c

Move copyumifromreadname

de35ed2

Move to just after initial alignment

def4d10

Update input

0909fd1

Change default name

a19b027

Fix bracket

8af7271

Fix back before merge

db6ad89

Pipe up index

afbf857

Add tests for umi extraction from read names and extracting with fastp

f32f469

Remove samblaster, update parameter checking

9cf5056

Add sentieon tests

579699e

Update sentieon dedup

005b392

Update CHANELOG, rename merge process

bebeb1d

Add updated sentieon snapshot

e510594

Merge remote-tracking branch 'origin/dev' into umi_processing

fb684d4

Add projectDir to sentieon_aligner test and prettify

ab6e264

Add projectDir and remove trailing whitespace

1caef51

Merge remote-tracking branch 'origin/dev' into umi_processing

911c2d1

maxulysse reviewed Aug 14, 2025

CHANGELOG.md (outdated, resolved)

Update CHANGELOG.md

eff78f4

Co-authored-by: Maxime U Garcia <max.u.garcia@gmail.com>

SPPearce commented Aug 15, 2025

CHANGELOG.md (outdated, resolved)

CHANGELOG.md (outdated, resolved)

SPPearce added 6 commits August 15, 2025 08:19

Apply suggestions from code review

fe26e3c

Fix as only have bai when using sentieon

110f05f

Prettier

3485487

Update sentieon_dedup snapshot

0652865

Fix read structure check, remove view, update snapshots

a3d3438

Update snapshot

0527af9

nh13 reviewed Aug 28, 2025

subworkflows/local/samplesheet_to_channel/main.nf (resolved)

SPPearce added 2 commits August 28, 2025 10:05

Fix fgbio plugin use

ec10efd

Merge branch 'dev' into umi_processing

248bfc9

SPPearce marked this pull request as ready for review August 28, 2025 14:14

maxulysse changed the title from "[DRAFT] Changes to UMI processing" to "Changes to UMI processing" Aug 28, 2025

Merge branch 'dev' into umi_processing

ba0bcec

maxulysse approved these changes Sep 8, 2025

FriederikeHanssen approved these changes Sep 8, 2025

SPPearce merged commit 78c6e3e into dev Sep 8, 2025
102 of 107 checks passed

maxulysse deleted the umi_processing branch September 8, 2025 14:47

6 participants
