Summary
Extend the DTE pipeline from gene level to ORF level. The current tools (anota2seq, deltaTE) operate at gene level; this issue adds ORF-level DT/DTE and lays the groundwork for DOTSeq when it reaches Bioconductor stable.
Blocked by: #166 (ORF-level P-site counts), #167 (merged ORF catalogue).
Terminology
- Differential Translation (DT): change in absolute ribosome occupancy. DESeq2 on Ribo-seq counts directly. Does not require RNA-seq.
- Differential Translation Efficiency (DTE): change in translation normalised against mRNA abundance. Requires paired Ribo-seq + RNA-seq. Tools: anota2seq, deltaTE, DOTSeq DTE.
- Differential ORF Usage (DOU): change in the relative contribution of each ORF to a gene's total translation output, modelled with a beta-binomial GLM (DOTSeq). Does not require an absolute RNA-seq denominator. Detects uORF/mORF balance shifts that are invisible to gene-level DTE.
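To make the DT/DTE distinction concrete, here is a toy calculation in base R. The counts are invented for illustration, and normalisation and dispersion modelling are deliberately left out; it is a sketch of the arithmetic only, not of what DESeq2 or deltaTE compute internally.

```r
# Invented mean counts for one gene (illustrative values only).
ribo <- c(control = 100, treated = 400)  # Ribo-seq P-site counts
rna  <- c(control = 50,  treated = 200)  # RNA-seq counts

# DT: change in ribosome occupancy on its own.
log2fc_ribo <- log2(ribo[["treated"]] / ribo[["control"]])

# Change in mRNA abundance over the same contrast.
log2fc_rna <- log2(rna[["treated"]] / rna[["control"]])

# DTE: occupancy change after subtracting the mRNA change.
log2fc_te <- log2fc_ribo - log2fc_rna

print(c(DT = log2fc_ribo, RNA = log2fc_rna, DTE = log2fc_te))
```

In this example occupancy rises fourfold (DT log2FC of 2), but mRNA rises by the same factor, so the DTE log2FC is 0: the gene is differentially translated without any change in translation efficiency.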
Tier 1 — Gene-level DTE (backward compatible)
Keep existing anota2seq and deltaTE modules. Pre-aggregate ORF P-site counts to gene level using main CDS ORFs only before these tools:
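```r
library(dplyr)
library(tidyr)

# Sum P-site counts per gene and sample over main CDS ORFs only (NOT uORFs or dORFs)
gene_level_counts <- orf_psite_counts %>%
  left_join(orf_to_gene_mapping, by = "orf_id") %>%
  filter(orf_type == "CDS") %>%
  group_by(gene_id, sample) %>%
  summarise(counts = sum(counts), .groups = "drop") %>%
  # values_fill = 0 avoids NA for gene/sample pairs with no rows
  pivot_wider(names_from = sample, values_from = counts, values_fill = 0)
```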
Summing uORF counts into the gene total would contaminate the TE numerator. Use orf_to_gene.tsv from issue #167 as the ORF-to-gene mapping.
anota2seq distribution note: P-site counts are sparser and more zero-inflated than Salmon TPM-derived counts. Default anota2seq filtering thresholds (minimum counts per gene) were calibrated on RNA-seq-like distributions. Validate empirically and document adjusted thresholds for P-site input before releasing.
Tier 2 — ORF-level DTE via DESeq2 interaction model
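Apply the design ~ condition + seq_type + condition:seq_type to a count matrix with one row per ORF, where seq_type distinguishes Ribo-seq from RNA-seq libraries. The condition:seq_type interaction term gives the per-ORF TE change. This extends the published gene-level approach (Chothani et al. 2019) to ORFs, where P-site counts are often sparse.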
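The role of the interaction term can be checked without DESeq2 by expanding the formula with base R's model.matrix. The sample sheet below is hypothetical (two conditions, two library types, two replicates each), and the factor levels are an assumption: with "rna" and "control" as reference levels, the interaction column is the one that captures the TE change.

```r
# Hypothetical sample sheet: 2 conditions x 2 library types x 2 replicates.
coldata <- data.frame(
  condition = factor(rep(c("control", "treated"), each = 2, times = 2),
                     levels = c("control", "treated")),
  seq_type  = factor(rep(c("rna", "ribo"), each = 4),
                     levels = c("rna", "ribo"))
)

# Expand the Tier 2 design formula into its model matrix.
design <- model.matrix(~ condition + seq_type + condition:seq_type,
                       data = coldata)

# The last column is non-zero only for treated Ribo-seq samples.
print(colnames(design))
print(design[, "conditiontreated:seq_typeribo"])
```

The same formula would be passed as the design argument of DESeqDataSetFromMatrix. Setting the reference levels explicitly matters: if "ribo" were the reference for seq_type, the sign of the interaction coefficient would flip.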
Before releasing: inspect plotDispEsts on real data to confirm DESeq2 dispersion estimation is well-behaved. A minimum-count filter (≥N samples with ≥5 P-site counts) will likely be required; tune per ORF class.
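A minimal sketch of such a filter in base R follows. The count matrix is a toy example, and min_samples = 3 is a placeholder for N, not a recommended value.

```r
# Toy P-site count matrix: rows are ORFs, columns are Ribo-seq samples.
psite <- matrix(
  c(12, 8, 0, 7,
     0, 1, 0, 2,
     5, 5, 5, 4),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("orf_A", "orf_B", "orf_C"), paste0("s", 1:4))
)

min_count   <- 5  # per-sample P-site count threshold from the text
min_samples <- 3  # placeholder for N; to be tuned per ORF class

# Keep ORFs with at least min_count P-sites in at least min_samples samples.
keep <- rowSums(psite >= min_count) >= min_samples
filtered <- psite[keep, , drop = FALSE]

print(rownames(filtered))
```

Here orf_A and orf_C pass (three samples each at or above 5) and orf_B is dropped. Tuning per ORF class would amount to choosing min_samples separately for each class before combining the kept rows.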
RNA-seq denominator by ORF class:
| ORF class | Denominator |
| --- | --- |
| Canonical CDS ORF | Gene-level RNA-seq count |
| uORF / dORF on annotated transcript | Transcript-level Salmon count |
| ORF on StringTie novel transcript | Transcript-level count (Salmon against hybrid GTF from #164) |
| Novel intergenic ORF, no host transcript | None (counts only, excluded from DTE) |
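The mapping above could be encoded as a small lookup, sketched here in base R. The class labels and denominator names are hypothetical; the catalogue from #167 may use different identifiers.

```r
# Hypothetical ORF class labels mapped to the RNA-seq denominator source.
denominator_for <- function(orf_class) {
  lookup <- c(
    canonical_cds    = "gene",
    uorf             = "transcript",
    dorf             = "transcript",
    novel_transcript = "transcript_hybrid_gtf",
    novel_intergenic = NA_character_  # no denominator: excluded from DTE
  )
  unknown <- setdiff(orf_class, names(lookup))
  if (length(unknown) > 0) {
    stop("Unknown ORF class: ", paste(unknown, collapse = ", "))
  }
  unname(lookup[orf_class])
}

orfs <- data.frame(
  orf_id    = c("orf1", "orf2", "orf3"),
  orf_class = c("canonical_cds", "uorf", "novel_intergenic"),
  stringsAsFactors = FALSE
)
orfs$denominator  <- denominator_for(orfs$orf_class)
orfs$dte_eligible <- !is.na(orfs$denominator)

print(orfs)
```

Failing loudly on an unknown class is a deliberate choice in this sketch, so that a new ORF category cannot silently enter DTE with the wrong denominator.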
Row-independence caveat: ORFs from the same gene that share one RNA-seq denominator row are not statistically independent, because their denominators are perfectly correlated. Duplicating the denominator row once per ORF does not resolve this. The accepted practical approach is to run DESeq2 and acknowledge the limitation. The rigorous solution (Fishpond/Swish) is out of scope. Document this assumption clearly for users.
Tier 3 — DOTSeq (when Bioconductor stable)
DOTSeq (Lim & Chieng, bioRxiv 2025) provides DTE + DOU at ORF level. Currently Bioconductor dev branch only. Add as --run_dotseq (default false) once stable.
DOU is particularly valuable: it detects shifts in the relative contribution of each ORF to a gene's total translation output (uORF vs mORF balance) without requiring an absolute RNA-seq denominator — a capability unavailable in any currently stable tool.
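The quantity DOU targets can be illustrated with a toy calculation in base R. The counts are invented, and this shows only the descriptive usage proportions, not the beta-binomial GLM that DOTSeq fits.

```r
# Invented P-site counts for one gene with a uORF and a main ORF.
counts <- matrix(
  c(20,  60,    # uORF: control, treated
    180, 140),  # mORF: control, treated
  nrow = 2, byrow = TRUE,
  dimnames = list(c("uORF", "mORF"), c("control", "treated"))
)

# Share of the gene's total P-site signal contributed by each ORF.
usage <- prop.table(counts, margin = 2)

print(colSums(counts))  # gene totals per condition
print(usage)
```

The gene total is 200 in both conditions, so a gene-level analysis sees no change, yet the uORF share rises from 0.1 to 0.3. No RNA-seq count enters the calculation.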
DOTSeq requires a flattened ORF annotation (orf_to_gtf.py from the DOTSeq toolkit) as a pre-processing step.
Novel intergenic ORFs
Cannot be included in DTE without a paired RNA-seq denominator. Output:
Exception: if RNA-seq is available and Salmon was run against the hybrid GTF (issue #164 transcripts included), the containing transcript's RNA-seq abundance can serve as the denominator → eligible for Tier 2.
References
modules/local/deseq2/deltate/, modules/nf-core/anota2seq/anota2seqrun/, workflows/riboseq/main.nf ~line 589