Description
Using salmon alevin v1.9.0, I noticed that my total reads were less than the deduplicated UMI count when I combined all three libraries together (from a NovaSeq run):
{
"total_reads": 284216343,
"reads_with_N": 165542,
"noisy_cb_reads": 1240522569,
"noisy_umi_reads": 6297,
"used_reads": 3338489231,
"mapping_rate": 52.32744469106451,
"reads_in_eqclasses": 2396169786,
"total_cbs": 48399818,
"used_cbs": 867051,
"initial_whitelist": 49593,
"low_conf_cbs": 1000,
"num_features": 5,
"final_num_cbs": 40432,
"deduplicated_umis": 359865640,
"mean_umis_per_cell": 8900,
"mean_genes_per_cell": 2814
}
I suspect this has happened due to an integer overflow: 284216343 + 2^32 = 4579183639, which matches the total count that I get when I add the total reads from each barcoded sample together:
==> salmon_1.9_OG_2022-Oct-13_S1/aux_info/alevin_meta_info.json <==
{
"total_reads": 1550672340,
"reads_with_N": 56210,
"noisy_cb_reads": 465287865,
"noisy_umi_reads": 3313,
"used_reads": 1085324952,
"mapping_rate": 52.507052779441469,
"reads_in_eqclasses": 814212344,
"total_cbs": 32341973,
"used_cbs": 1550909,
"initial_whitelist": 28000,
"low_conf_cbs": 991,
"num_features": 5,
"final_num_cbs": 18888,
"deduplicated_umis": 113155025,
"mean_umis_per_cell": 5990,
"mean_genes_per_cell": 2035
}
==> salmon_1.9_OG_2022-Oct-13_S2/aux_info/alevin_meta_info.json <==
{
"total_reads": 1371374162,
"reads_with_N": 50003,
"noisy_cb_reads": 389036191,
"noisy_umi_reads": 3005,
"used_reads": 982284963,
"mapping_rate": 54.0580725189425,
"reads_in_eqclasses": 741338439,
"total_cbs": 30332499,
"used_cbs": 1470602,
"initial_whitelist": 28000,
"low_conf_cbs": 997,
"num_features": 5,
"final_num_cbs": 19134,
"deduplicated_umis": 127624221,
"mean_umis_per_cell": 6670,
"mean_genes_per_cell": 2229
}
==> salmon_1.9_OG_2022-Oct-13_S3/aux_info/alevin_meta_info.json <==
{
"total_reads": 1657137137,
"reads_with_N": 59329,
"noisy_cb_reads": 447471964,
"noisy_umi_reads": 3629,
"used_reads": 1209602215,
"mapping_rate": 55.061293216313938,
"reads_in_eqclasses": 912441138,
"total_cbs": 33411349,
"used_cbs": 1567701,
"initial_whitelist": 28000,
"low_conf_cbs": 997,
"num_features": 5,
"final_num_cbs": 18395,
"deduplicated_umis": 125889439,
"mean_umis_per_cell": 6843,
"mean_genes_per_cell": 2248
}
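The arithmetic can be checked with a minimal Python sketch. It assumes the counter behaves like an unsigned 32-bit integer that wraps modulo 2^32; that is my suspicion, not something I have confirmed in the salmon source. The variable names are only for illustration.

```python
# "total_reads" values copied from the three per-sample reports above.
per_sample_totals = [1550672340, 1371374162, 1657137137]

# "total_reads" as reported by the combined run.
reported_combined = 284216343

# True total across all three libraries (Python integers do not overflow).
true_total = sum(per_sample_totals)

# Value an unsigned 32-bit counter would hold after wrapping (assumed behaviour).
wrapped = true_total % 2**32

print(true_total)  # 4579183639
print(wrapped)     # 284216343
assert wrapped == reported_combined
```

The wrapped value matches the combined report exactly, which is consistent with a single wraparound at 2^32.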
To Reproduce
Steps and data to reproduce the behavior:
- Run `salmon alevin` on more than 2^32 sequenced reads
Specifically, please provide at least the following information:
- Which version of salmon was used? v1.9.0
- How was salmon installed (compiled, downloaded executable, through bioconda)? binary download from github
- Which reference (e.g. transcriptome) was used? Gencode Human v41 + CHM13 v2.0 assembly
- Which read files were used? BD Rhapsody + NovaSeq
- Which program options were used?
[cell barcodes were pre-corrected and merged using my own [custom script](https://gitlab.com/gringer/bioinfscripts/-/blob/master/synthSquish.pl)]
salmon alevin -l ISR \
-1 $(ls demultiplexed/squished_${machineID}*_R1_001.fastq.gz | sort) \
-2 $(ls demultiplexed/${machineID}*_R2_001.fastq.gz | sort) \
-i ${indexDir}/${indexName} --expectCells ${expectCellCount} \
-p 10 -o salmon_1.9_cbc_${projectID}_combined --tgMap ${indexDir}/txp2gene_${targetName}.txt \
--umi-geometry '1[28-35]' --bc-geometry '1[1-27]' --read-geometry '2[1-end]'
Expected behavior
{
"total_reads": 4579183639,
"reads_with_N": 165542,
"noisy_cb_reads": 1240522569,
"noisy_umi_reads": 6297,
"used_reads": 3338489231,
"mapping_rate": 52.32744469106451,
"reads_in_eqclasses": 2396169786,
"total_cbs": 48399818,
"used_cbs": 867051,
"initial_whitelist": 49593,
"low_conf_cbs": 1000,
"num_features": 5,
"final_num_cbs": 40432,
"deduplicated_umis": 359865640,
"mean_umis_per_cell": 8900,
"mean_genes_per_cell": 2814
}
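The expected `total_reads` can also be recovered from the other fields of the same report. In all four reports shown here, `reads_with_N + noisy_cb_reads + noisy_umi_reads + used_reads` adds up to the true total. Whether alevin guarantees that identity in general is an assumption on my part, and the helper name `component_sum` is hypothetical. A minimal sketch:

```python
def component_sum(meta):
    """Sum the read categories that add up to total_reads in the reports above.

    Assumption: this identity is observed in these reports, not documented behaviour.
    """
    return (
        meta["reads_with_N"]
        + meta["noisy_cb_reads"]
        + meta["noisy_umi_reads"]
        + meta["used_reads"]
    )

# Relevant fields copied from the combined alevin_meta_info.json.
combined = {
    "total_reads": 284216343,
    "reads_with_N": 165542,
    "noisy_cb_reads": 1240522569,
    "noisy_umi_reads": 6297,
    "used_reads": 3338489231,
}

expected_total = component_sum(combined)
shortfall = expected_total - combined["total_reads"]

print(expected_total)  # 4579183639
print(shortfall)       # 4294967296, i.e. exactly 2**32
```

The same check on the per-sample reports gives a shortfall of zero, so only the combined run is affected, and apparently only the `total_reads` field.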
Desktop (please complete the following information):
- OS/Version: Linux musculus 5.18.0-4-amd64 #1 SMP PREEMPT_DYNAMIC Debian 5.18.16-1 (2022-08-10) x86_64 GNU/Linux