
salmon seems to remove transcripts #1500

@jedi-bogate


Hi,

I'm running a pipeline with unique mapping to map TEs (transposable elements) to the reference genome, and I'm quantifying at the transcript-ID level. When I run the pipeline, salmon seems to drop some of the transcripts that are present in the annotation file: they never appear in the quantification output.

I read that this can happen because salmon removes duplicate transcripts. However, I'm unable to find the duplicate_clusters.tsv file that is supposed to be generated when that happens. I have also tried passing the --keepDuplicates option to salmon index and salmon quant, but neither made a difference.
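One way to test the duplicate hypothesis independently of salmon is to look for sequence-identical transcripts in the transcript FASTA the index was built from, since salmon index collapses those by default unless --keepDuplicates is set. This is a minimal sketch, assuming a plain (uncompressed) FASTA whose record IDs match the transcript_id values in the annotation; the function names are hypothetical, and it only catches exact duplicates on the same strand.

```python
from collections import defaultdict


def read_fasta(lines):
    """Yield (transcript_id, sequence) pairs; the ID is the first word after '>'."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            # Uppercase so soft-masked (lowercase) bases compare equal to unmasked ones.
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def duplicate_groups(records):
    """Return lists of transcript IDs that share an identical sequence (groups of 2+)."""
    by_seq = defaultdict(list)
    for tx_id, seq in records:
        by_seq[seq].append(tx_id)
    return [ids for ids in by_seq.values() if len(ids) > 1]
```

For example, `with open("transcripts.fa") as fh: groups = duplicate_groups(read_fasta(fh))` (hypothetical file name) gives the groups that salmon would be expected to collapse into a single retained transcript.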

I also tried running the pipeline with star_rsem instead of star_salmon, and the same thing seems to happen as with salmon, so perhaps it isn't a quantification problem.

What are my options here? Should I just remove the transcripts that don't show up in salmon.merged.transcript_counts.tsv from the annotation file?
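Before removing anything from the annotation, it may help to list exactly which transcript IDs are missing rather than comparing line counts: a GTF can contain several lines per transcript (for example one per exon), so its line count is not directly comparable to the number of rows in a counts table. This sketch assumes the GTF stores IDs as transcript_id "..." in column 9 and that the first column of salmon.merged.transcript_counts.tsv holds the transcript ID below a single header row; both are assumptions to check against the actual files, and the function names are hypothetical.

```python
import csv
import re

# Matches the transcript_id attribute in GTF column 9, e.g. transcript_id "TE_0001";
TX_ID = re.compile(r'transcript_id "([^"]+)"')


def gtf_transcript_ids(lines):
    """Collect the distinct transcript IDs referenced anywhere in a GTF."""
    ids = set()
    for line in lines:
        if line.startswith("#"):
            continue  # header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a full GTF record
        match = TX_ID.search(fields[8])
        if match:
            ids.add(match.group(1))
    return ids


def counts_transcript_ids(lines):
    """Read transcript IDs from the first column of a tab-separated counts table."""
    reader = csv.reader(lines, delimiter="\t")
    next(reader, None)  # skip the assumed header row
    return {row[0] for row in reader if row}


def missing_from_counts(gtf_lines, counts_lines):
    """Sorted transcript IDs present in the GTF but absent from the counts table."""
    return sorted(gtf_transcript_ids(gtf_lines) - counts_transcript_ids(counts_lines))
```

For example, opening annotation.gtf and salmon.merged.transcript_counts.tsv (hypothetical paths) and passing both file handles to `missing_from_counts` returns the IDs to cross-check against any duplicate groups or the transcript FASTA, rather than deleting them blindly.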

Thank you in advance for your answer! I'm not sure whether this issue has already been raised or resolved (sorry if it has); I haven't been able to find a solution online.

These are the line counts for the annotation and each of the quantification output files:
[Screenshots: line counts of the annotation file and of each quantification output file]
