METASPADES exit status 21: paired read files contain unequal number of reads #890

@maartenciers

Description of the bug

I'm using mag 5.0.0 for the first time after successfully running earlier versions. I mainly wanted to use the --fastp_trim_polyg option to see whether it could improve earlier co-assemblies of my Illumina NovaSeq paired-end data. Everything went well until the assembly step, where I kept getting:

[de/077bc1] NOTE: Process `NFCORE_MAG:MAG:ASSEMBLY:SHORTREAD_ASSEMBLY:METASPADES (group-2)` terminated with an error exit status (21) -- Execution is retried (1)

I looked in the .command.log and .spades.log and found:

13:41:50.984   129G / 172G  ERROR   General                 (hammer_tools.cpp          : 194)   Pair of read files "/.../nf-core_mag/mag_5.0.0_Run1_22_10_2025/_scratch_/de/077bc1c4526da920b3f45c29a9f3dd/group-2_1.merged.fastq.gz" and "/.../nf-core_mag/mag_5.0.0_Run1_22_10_2025/_scratch_/de/077bc1c4526da920b3f45c29a9f3dd/group-2_2.merged.fastq.gz" contain unequal amount of reads

When inspecting the POOL_SHORT_READS step, I noticed a large mismatch in the file sizes of the merged FASTQ files, so I inspected them further with seqkit stats, which confirmed that the two files contain different numbers of reads:

seqkit stats group-2_1.merged.fastq.gz group-2_2.merged.fastq.gz 
processed files:  2 / 2 [======================================] ETA: 0s. done
file                       format  type     num_seqs         sum_len  min_len  avg_len  max_len
group-2_1.merged.fastq.gz  FASTQ   DNA   119,559,197  17,878,497,150       15    149.5      151
group-2_2.merged.fastq.gz  FASTQ   DNA   101,589,939  15,161,387,675       15    149.2      151
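
To narrow down where the mates go out of sync, the same comparison can be run on any pair of files (for example the per-sample inputs to the pooling step) without seqkit. This is only a minimal sketch, assuming standard four-line FASTQ records compressed with gzip; the helper names count_reads and check_pair are hypothetical and not part of nf-core/mag:

```shell
# Count the reads in a gzipped FASTQ file.
# Assumption: every record spans exactly four lines.
count_reads() {
    gzip -cd "$1" | awk 'END { printf "%d\n", NR / 4 }'
}

# Compare the read counts of two mate files.
# Prints a one-line summary and returns non-zero on a mismatch.
check_pair() {
    local r1 r2
    r1=$(count_reads "$1")
    r2=$(count_reads "$2")
    if [ "$r1" -eq "$r2" ]; then
        echo "OK: $1 and $2 both contain $r1 reads"
    else
        echo "MISMATCH: $1 has $r1 reads, $2 has $r2 reads"
        return 1
    fi
}

# Example call (file names taken from the seqkit output above):
# check_pair group-2_1.merged.fastq.gz group-2_2.merged.fastq.gz
```

Equal counts do not prove that the reads are in the same order in both files, so this only catches the kind of mismatch reported by SPAdes here.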

This affects all of my merged FASTQ files, so I'm currently unable to run SPAdes. I don't think this is supposed to happen: I would expect that, for paired-end data, a read is dropped when its mate is missing or was removed in the trimming step. Do you know of a fix for this issue? Thank you for your help!

Command used and terminal output

# Nextflow command
nextflow run nf-core/mag \
    -resume \
    -r 5.0.0 \
    -work-dir "_scratch_" \
    -profile singularity \
    -process.executor "slurm" \
    -process.queueSize 10 \
    -process.maxForks 10 \
    -c "custom.config" \
    --input "samplesheet.csv" \
    --outdir "./results" \
    --spades_fix_cpus 32 \
    --fastp_trim_polyg \
    --host_fasta "./references/genomes/homo_sapiens/refdata-gex-GRCh38-2024-A/fasta/genome.fa" \
    --host_removal_verysensitive \
    --coassemble_group \
    --spades_downstreaminput "contigs" \
    --skip_megahit \
    --prokka_compliance_centre EBI \
    --prokka_with_compliance \
    --skip_concoct \
    --min_contig_size 1500 \
    --save_assembly_mapped_reads \
    --busco_db_lineage "auto_prok" \
    --save_busco_db \
    --busco_clean \
    --gtdb_db ./gtdbtk_r226_data.tar.gz

Relevant files

No response

System information

No response
