Description
Hi,
I'm running a pipeline for unique mapping in order to map TEs to the reference genome, and I'm quantifying by transcript ID. When I run the pipeline, salmon seems to drop some of the transcripts that are present in the annotation file during quantification.
I read that this happens because salmon removes duplicates. However, I'm unable to find the duplicate_clusters.tsv file that is supposed to be generated when that happens. I have also tried supplying the --keepDuplicates option to both salmon index and salmon quant, but neither worked.
I also tried running the pipeline with star_rsem instead of star_salmon, and the same thing seems to happen there. Perhaps it isn't a quantification problem.
What are my options here? Should I just remove the transcripts that don't show up in salmon.merged.transcript_counts.tsv from the annotation file?
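To pin down exactly which transcripts are missing, rather than comparing line counts (a GTF usually has several lines per transcript, so raw line counts are not directly comparable), something like the following could be used. This is only a rough sketch: it assumes the annotation is a GTF with `transcript_id "..."` attributes in the ninth column and that the first column of salmon.merged.transcript_counts.tsv holds the transcript ID, with one header row. The function names are made up for illustration.

```python
import csv
import re

# Matches the transcript_id attribute in the 9th GTF column (assumed format).
TRANSCRIPT_ID_RE = re.compile(r'transcript_id "([^"]+)"')


def gtf_transcript_ids(gtf_path):
    """Collect every unique transcript_id found in a GTF file."""
    ids = set()
    with open(gtf_path) as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip header/comment lines
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 9:
                continue  # skip malformed lines
            match = TRANSCRIPT_ID_RE.search(fields[8])
            if match:
                ids.add(match.group(1))
    return ids


def quantified_transcript_ids(tsv_path):
    """Collect the first column of a tab-separated counts table.

    Assumes a single header row and the transcript ID in column 1.
    """
    with open(tsv_path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        next(reader, None)  # drop the header row
        return {row[0] for row in reader if row}


def missing_transcripts(gtf_path, tsv_path):
    """Return annotation transcript IDs absent from the counts table, sorted."""
    return sorted(gtf_transcript_ids(gtf_path) - quantified_transcript_ids(tsv_path))
```

Calling `missing_transcripts("annotation.gtf", "salmon.merged.transcript_counts.tsv")` (the GTF name is a placeholder) would return the IDs present in the annotation but absent from the counts table, which could then be checked against the transcript FASTA used to build the index.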
Thank you in advance for your answer! I'm not sure whether this issue has already been raised or resolved (sorry if it has); I haven't been able to find a solution online.
These are the line counts for the annotation and each of the quantification output files: