Description of the bug
I'm using nf-core/mag 5.0.0 for the first time, after running earlier versions successfully. I mainly wanted to use the --fastp_trim_polyg option to see whether it could improve my earlier co-assemblies of Illumina NovaSeq paired-end data. Everything went well until the assembly step, where I kept getting:
[de/077bc1] NOTE: Process `NFCORE_MAG:MAG:ASSEMBLY:SHORTREAD_ASSEMBLY:METASPADES (group-2)` terminated with an error exit status (21) -- Execution is retried (1)
I looked in the .command.log and .spades.log and found:
13:41:50.984 129G / 172G ERROR General (hammer_tools.cpp : 194) Pair of read files "/.../nf-core_mag/mag_5.0.0_Run1_22_10_2025/_scratch_/de/077bc1c4526da920b3f45c29a9f3dd/group-2_1.merged.fastq.gz" and "/.../nf-core_mag/mag_5.0.0_Run1_22_10_2025/_scratch_/de/077bc1c4526da920b3f45c29a9f3dd/group-2_2.merged.fastq.gz" contain unequal amount of reads
When inspecting the POOL_SHORT_READS step, I noticed a large difference in file size between the merged FASTQ files, so I checked them with seqkit stats to confirm the issue:
seqkit stats group-2_1.merged.fastq.gz group-2_2.merged.fastq.gz
processed files: 2 / 2 [======================================] ETA: 0s. done
file                       format  type  num_seqs     sum_len         min_len  avg_len  max_len
group-2_1.merged.fastq.gz  FASTQ   DNA   119,559,197  17,878,497,150  15       149.5    151
group-2_2.merged.fastq.gz  FASTQ   DNA   101,589,939  15,161,387,675  15       149.2    151
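seqkit stats only compares totals. A stricter check walks R1 and R2 in lockstep and reports the first position where the read names stop matching, which shows where the two files fall out of sync. The Python sketch below does this. It is not part of nf-core/mag, and check_pairing and read_names are hypothetical helpers. It assumes plain or gzip-compressed 4-line FASTQ records (no wrapped sequence lines) and mate names that are identical apart from an optional /1 or /2 suffix.

```python
import gzip
from itertools import zip_longest


def read_names(path):
    """Yield the read name of each 4-line FASTQ record.

    Takes the first whitespace-separated token of the header (so Illumina
    '@name 1:N:0:...' headers work) and strips a trailing /1 or /2 suffix.
    Assumes unwrapped 4-line records; gzip is detected from the .gz suffix.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 0:  # header line of each record
                name = line[1:].split()[0]
                if name.endswith(("/1", "/2")):
                    name = name[:-2]
                yield name


def check_pairing(r1_path, r2_path):
    """Walk R1 and R2 in lockstep.

    Returns (reads_in_r1, reads_in_r2, index_of_first_mismatch_or_None).
    """
    n1 = n2 = 0
    first_mismatch = None
    pairs = zip_longest(read_names(r1_path), read_names(r2_path))
    for idx, (a, b) in enumerate(pairs):
        if a is not None:
            n1 += 1
        if b is not None:
            n2 += 1
        if first_mismatch is None and a != b:
            first_mismatch = idx  # first record where the mates disagree
    return n1, n2, first_mismatch
```

For the merged files above, check_pairing("group-2_1.merged.fastq.gz", "group-2_2.merged.fastq.gz") should report the same counts as seqkit's num_seqs column, a difference of 17,969,258 reads. The mismatch index also shows how early the files diverge.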
The same mismatch occurs in all of my merged FASTQ files, so I'm currently unable to run SPAdes. I don't think this is supposed to happen. I would expect paired-end processing to drop a read if its mate is missing or was removed during trimming. Do you know of a fix for this issue? Thank you for your help!
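The behaviour I expected, discarding any read whose mate is missing, can be sketched in Python as follows. This is illustrative only. It is not how nf-core/mag or fastp handle pairing, and repair_pairs is a hypothetical helper. It assumes plain or gzip-compressed 4-line FASTQ records and mate names that are identical apart from an optional /1 or /2 suffix. It keeps every R2 record in memory, so it is only practical for small test files, not for around 100 million read pairs.

```python
import gzip


def _open(path, mode):
    """Open plain or gzip-compressed files, chosen by the .gz suffix."""
    return gzip.open(path, mode) if path.endswith(".gz") else open(path, mode)


def _records(path):
    """Yield (name, record_text) for each 4-line FASTQ record.

    The name is the first header token with any /1 or /2 suffix removed.
    """
    with _open(path, "rt") as handle:
        while True:
            record = [handle.readline() for _ in range(4)]
            if not record[0]:  # end of file
                return
            name = record[0][1:].split()[0]
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            yield name, "".join(record)


def repair_pairs(r1_in, r2_in, r1_out, r2_out):
    """Write only reads whose mate exists in the other file, in R1 order.

    Returns the number of pairs kept. Holds all R2 records in memory.
    """
    r2_by_name = dict(_records(r2_in))
    kept = 0
    with _open(r1_out, "wt") as out1, _open(r2_out, "wt") as out2:
        for name, rec1 in _records(r1_in):
            rec2 = r2_by_name.get(name)
            if rec2 is not None:  # mate present: keep both reads
                out1.write(rec1)
                out2.write(rec2)
                kept += 1
    return kept
```

Writing both outputs in R1 order keeps the files synchronised record by record, which is what SPAdes' "unequal amount of reads" check relies on.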
Command used and terminal output
# Nextflow command
nextflow run nf-core/mag \
-resume \
-r 5.0.0 \
-work-dir "_scratch_" \
-profile singularity \
-process.executor "slurm" \
-process.queueSize 10 \
-process.maxForks 10 \
-c "custom.config" \
--input "samplesheet.csv" \
--outdir "./results" \
--spades_fix_cpus 32 \
--fastp_trim_polyg \
--host_fasta "./references/genomes/homo_sapiens/refdata-gex-GRCh38-2024-A/fasta/genome.fa" \
--host_removal_verysensitive \
--coassemble_group \
--spades_downstreaminput "contigs" \
--skip_megahit \
--prokka_compliance_centre EBI \
--prokka_with_compliance \
--skip_concoct \
--min_contig_size 1500 \
--save_assembly_mapped_reads \
--busco_db_lineage "auto_prok" \
--save_busco_db \
--busco_clean \
--gtdb_db ./gtdbtk_r226_data.tar.gz
Relevant files
No response
System information
No response