salmon seems to remove transcripts

Hi,

I'm running a pipeline with unique mapping in order to map TEs to the reference genome, and I'm quantifying by transcript ID. After the run, some of the transcripts that are present in the annotation file are missing from the salmon quantification output, as if salmon had removed them.

I read that this happens because salmon removes duplicates. However, I'm unable to find the duplicate_clusters.tsv file that is supposed to be generated when that happens. I have also tried supplying the --keepDuplicates option to salmon index and salmon quant, but neither made a difference.
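To check independently whether duplicate collapsing could even apply here, the transcript FASTA can be scanned for records that share a sequence. The following is only a minimal sketch: it assumes "duplicate" means an identical sequence after upper-casing, which may not match salmon's exact rule, and the helper names `read_fasta` and `duplicate_groups` are made up for illustration.

```python
from collections import defaultdict


def read_fasta(lines):
    """Yield (record_id, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            parts = line[1:].split()
            # Use the first whitespace-delimited token as the record ID.
            name = parts[0] if parts else ""
            chunks = []
        else:
            # Assumption: case is ignored when comparing sequences.
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def duplicate_groups(lines):
    """Return lists of record IDs whose sequences are identical."""
    by_sequence = defaultdict(list)
    for name, seq in read_fasta(lines):
        by_sequence[seq].append(name)
    return [names for names in by_sequence.values() if len(names) > 1]


# Example usage (the path is a placeholder):
# with open("transcripts.fa") as handle:
#     for group in duplicate_groups(handle):
#         print("\t".join(group))
```

If this prints nothing, no two transcripts share a sequence under that assumption, which would point away from duplicate removal as the cause.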

I also tried running the pipeline with star_rsem instead of star_salmon, and the same thing seems to happen, so perhaps it isn't a quantification problem.

What are my options here? Should I just remove the transcripts that don't show up in salmon.merged.transcript_counts.tsv from the annotation file? 
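To see exactly which transcripts are affected, the IDs in the annotation can be compared against the IDs in the merged counts table. This is a rough sketch under two assumptions: the annotation is a GTF with a `transcript_id "..."` attribute in the ninth column, and the first column of salmon.merged.transcript_counts.tsv holds the transcript ID below a single header row. The function names are hypothetical.

```python
import re

# Matches the transcript_id attribute in a GTF attribute column.
_TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def gtf_transcript_ids(lines):
    """Collect every transcript_id found in an iterable of GTF lines."""
    ids = set()
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        match = _TRANSCRIPT_ID.search(fields[8])
        if match:
            ids.add(match.group(1))
    return ids


def quantified_ids(lines):
    """Collect first-column IDs from a TSV, skipping the header row."""
    rows = iter(lines)
    next(rows, None)  # assumption: exactly one header line
    return {row.split("\t", 1)[0].strip() for row in rows if row.strip()}


def missing_ids(gtf_lines, counts_lines):
    """Return annotation transcript IDs absent from the counts table."""
    return sorted(gtf_transcript_ids(gtf_lines) - quantified_ids(counts_lines))


# Example usage (the annotation path is a placeholder):
# with open("annotation.gtf") as gtf, \
#         open("salmon.merged.transcript_counts.tsv") as counts:
#     for transcript_id in missing_ids(gtf, counts):
#         print(transcript_id)
```

Looking at the annotation records of the IDs this reports (feature type, strand, attributes) may show what the dropped transcripts have in common before deciding whether to remove them.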

Thank you in advance for your answer! I'm not sure whether this issue has already been raised or resolved (sorry if it has); I haven't been able to find a solution online.

These are the line counts for the annotation and each of the quantification output files:
<img width="341" alt="Image" src="https://github.com/user-attachments/assets/9c7f452a-dbb1-4433-a270-83976258b110" />

<img width="354" alt="Image" src="https://github.com/user-attachments/assets/04060783-2503-4426-9f08-818ed811d6c3" />

<img width="342" alt="Image" src="https://github.com/user-attachments/assets/def10d95-9655-4cd5-b0f7-ae71d2ab585f" />

salmon seems to remove transcripts #1500

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

salmon seems to remove transcripts #1500

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions