feat: ORF-level differential translation analysis (DT, DTE, and DOU) #168

Description

@pinin4fjords

Summary

Extend the DTE pipeline from gene level to ORF level. The current tools (anota2seq, deltaTE) operate only at gene level; this issue adds ORF-level DT and DTE, and lays the groundwork for DOU via DOTSeq once DOTSeq reaches a stable Bioconductor release.

Blocked by: #166 (ORF-level P-site counts), #167 (merged ORF catalogue).

Terminology

  • Differential Translation (DT): change in absolute ribosome occupancy. DESeq2 on Ribo-seq counts directly. Does not require RNA-seq.
  • Differential Translation Efficiency (DTE): change in translation normalised against mRNA abundance. Requires paired Ribo-seq + RNA-seq. Tools: anota2seq, deltaTE, DOTSeq DTE.
  • Differential ORF Usage (DOU): change in the relative contribution of each ORF to the gene's total translation output, modelled with a beta-binomial GLM (DOTSeq). Does not require an absolute RNA-seq denominator. Detects uORF/mORF balance shifts that are invisible to gene-level DTE.
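A toy calculation can make the DT/DTE distinction concrete. The sketch below uses hypothetical counts for a single gene and assumes they are already normalised for library size: when Ribo-seq and RNA-seq both double, DT reports a 2-fold increase in occupancy while TE is unchanged.

```r
# Toy illustration of DT vs DTE for one gene.
# Counts are hypothetical and assumed already library-size normalised.
ribo <- c(control = 100, treated = 200)  # Ribo-seq (P-site) counts
rna  <- c(control = 50,  treated = 100)  # RNA-seq counts

# DT: log2 fold change in absolute ribosome occupancy
dt_log2fc <- log2(ribo[["treated"]] / ribo[["control"]])

# DTE: log2 fold change in translation efficiency (Ribo / RNA)
te <- ribo / rna
dte_log2fc <- log2(te[["treated"]] / te[["control"]])

cat("DT log2FC:", dt_log2fc, "\n")   # 1: occupancy doubled
cat("DTE log2FC:", dte_log2fc, "\n")  # 0: mRNA also doubled, so TE is flat
```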

Tier 1 — Gene-level DTE (backward compatible)

Keep the existing anota2seq and deltaTE modules. Before running them, pre-aggregate ORF P-site counts to gene level using main CDS ORFs only:

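```r
library(dplyr)
library(tidyr)
# orf_psite_counts: long table (orf_id, sample, counts) from #166
# orf_to_gene_mapping: orf_to_gene.tsv from #167 (assumed columns: orf_id, gene_id, orf_type)
gene_level_counts <- orf_psite_counts %>%
    left_join(orf_to_gene_mapping, by = "orf_id") %>%
    filter(orf_type == "CDS") %>%   # main CDS ORFs only, NOT uORFs or dORFs
    group_by(gene_id, sample) %>%
    summarise(counts = sum(counts), .groups = "drop") %>%
    pivot_wider(names_from = sample, values_from = counts,
                values_fill = 0)    # zero-fill samples missing from the sparse long table
```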

Summing uORF (or dORF) counts into the gene total contaminates the TE numerator with non-CDS translation. Take the ORF-to-gene mapping from orf_to_gene.tsv (issue #167).

anota2seq distribution note: P-site counts are sparser and more zero-inflated than Salmon TPM-derived counts. Default anota2seq filtering thresholds (minimum counts per gene) were calibrated on RNA-seq-like distributions. Validate empirically and document adjusted thresholds for P-site input before releasing.
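One way to validate thresholds empirically is to compare how many genes survive a candidate filter on P-site counts versus on the matched RNA-seq counts. The helper below is a hypothetical sketch: the name `filter_pass_rate` and the rule "at least `t` counts in at least `min_samples` samples" are assumptions for illustration, not anota2seq's own filtering logic.

```r
# Hypothetical helper (not part of anota2seq): fraction of rows (genes) with
# at least `t` counts in at least `min_samples` samples, for each candidate t.
filter_pass_rate <- function(counts, thresholds, min_samples) {
  vapply(thresholds, function(t) mean(rowSums(counts >= t) >= min_samples),
         numeric(1))
}

# Toy P-site count matrix (genes x samples); values are made up.
psite <- matrix(c( 0,  0,  3,
                   5,  8, 12,
                   1,  0,  0,
                  20, 25, 30), nrow = 4, byrow = TRUE)
rates <- filter_pass_rate(psite, thresholds = c(1, 5, 10), min_samples = 2)
rates
```

Running the same helper on the RNA-seq matrix for the same genes shows how much more aggressively a given threshold prunes the sparser P-site data.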

Tier 2 — ORF-level DTE via DESeq2 interaction model

Fit the DESeq2 design ~ condition + seq_type + condition:seq_type with one row per ORF. The condition:seq_type interaction term estimates the per-ORF change in TE. This extrapolates a gene-level approach with published precedent (Chothani et al. 2019) to ORF level, where P-site counts are often sparse.
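Building the design matrix in base R shows which coefficient carries the TE change: with treatment contrasts, the interaction column is non-zero only for treated Ribo-seq samples, so its coefficient is the difference in the Ribo-vs-RNA log ratio between conditions, i.e. the log2 change in TE. Sample names and factor levels here are hypothetical; the DESeq2 calls are left as comments and the exact coefficient name should be checked with `resultsNames()`.

```r
# Hypothetical sample sheet: 2 conditions x 2 replicates x 2 library types.
coldata <- data.frame(
  condition = factor(rep(c("control", "treated"), each = 2, times = 2),
                     levels = c("control", "treated")),
  seq_type  = factor(rep(c("rna", "ribo"), each = 4),
                     levels = c("rna", "ribo")),
  row.names = paste0("s", 1:8)
)

# Same formula as above; the last column is the per-ORF TE change.
design <- model.matrix(~ condition + seq_type + condition:seq_type, data = coldata)
colnames(design)

# With DESeq2 (not run here), orf_counts being a hypothetical ORF x sample matrix:
# dds <- DESeq2::DESeqDataSetFromMatrix(countData = orf_counts, colData = coldata,
#                                       design = ~ condition + seq_type + condition:seq_type)
# dds <- DESeq2::DESeq(dds)
# DESeq2::resultsNames(dds)  # interaction is typically "conditiontreated.seq_typeribo"
# res <- DESeq2::results(dds, name = "conditiontreated.seq_typeribo")
```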

Before releasing: inspect plotDispEsts on real data to confirm DESeq2 dispersion estimation is well-behaved. A minimum-count filter (≥N samples with ≥5 P-site counts) will likely be required; tune per ORF class.
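A per-class version of the minimum-count filter might look like the sketch below. The function name, the class labels ("CDS", "uORF"), and the per-class sample minimums are hypothetical placeholders to be tuned on real data, not recommended values.

```r
# Hypothetical per-class filter: keep an ORF if at least min_samples[class]
# samples have >= min_count P-site counts.
filter_orfs_by_class <- function(counts, orf_class, min_samples, min_count = 5) {
  stopifnot(nrow(counts) == length(orf_class),
            all(orf_class %in% names(min_samples)))
  n_passing <- rowSums(counts >= min_count)   # samples meeting the count floor
  n_passing >= min_samples[orf_class]         # per-ORF threshold by class
}

# Toy ORF x sample P-site counts (made up).
counts <- matrix(c(6, 7, 0, 0,
                   5, 0, 0, 1,
                   9, 9, 9, 9), nrow = 3, byrow = TRUE,
                 dimnames = list(c("orf_cds", "orf_u1", "orf_u2"), paste0("s", 1:4)))
keep <- filter_orfs_by_class(counts, c("CDS", "uORF", "uORF"),
                             min_samples = c(CDS = 2, uORF = 3))
keep
```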

RNA-seq denominator by ORF class:

| ORF class | RNA-seq denominator |
|---|---|
| Canonical CDS ORF | Gene-level RNA-seq count |
| uORF / dORF on annotated transcript | Transcript-level Salmon count |
| ORF on StringTie novel transcript | Transcript-level count (Salmon against hybrid GTF from #164) |
| Novel intergenic ORF, no host transcript | None; counts only, excluded from DTE |
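The denominator rules can be encoded as a simple lookup so that ineligible ORFs are flagged before model fitting. This is a sketch; the class labels and source names are hypothetical and should match whatever labels the merged ORF catalogue (#167) actually emits.

```r
# Hypothetical mapping from ORF class to RNA-seq denominator source.
denominator_source <- c(
  canonical_cds    = "gene_rna",        # gene-level RNA-seq count
  uorf             = "transcript_rna",  # transcript-level Salmon count
  dorf             = "transcript_rna",
  novel_transcript = "transcript_rna",  # Salmon against hybrid GTF (#164)
  intergenic       = NA_character_      # no host transcript: excluded from DTE
)

assign_denominator <- function(orf_class) {
  unknown <- setdiff(unique(orf_class), names(denominator_source))
  if (length(unknown) > 0) stop("Unknown ORF class: ", paste(unknown, collapse = ", "))
  unname(denominator_source[orf_class])
}

res <- assign_denominator(c("canonical_cds", "uorf", "intergenic"))
eligible_for_dte <- !is.na(res)
```

Failing loudly on unknown classes keeps a new ORF type from silently falling through to the wrong denominator.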

Row-independence caveat: when multiple ORFs from the same gene share the same RNA-seq denominator row, their tests are not statistically independent. Generating a synthetic copy of the RNA-seq row per ORF does not resolve this, because perfectly correlated rows remain dependent. The accepted practical approach is DESeq2 with this limitation acknowledged; the rigorous solution (Fishpond/Swish) is out of scope. Document this assumption clearly for users.
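To document the assumption, the pipeline could report which denominator rows are shared by more than one ORF. The helper below is a hypothetical sketch; `orf_ids` and `denominator_ids` are assumed to be parallel vectors (e.g. gene IDs for canonical CDS ORFs, transcript IDs otherwise).

```r
# Hypothetical diagnostic: group ORFs by denominator row and keep only rows
# used by more than one ORF (these ORFs' tests are not independent).
shared_denominators <- function(orf_ids, denominator_ids) {
  n_orfs <- table(denominator_ids)
  shared <- names(n_orfs)[n_orfs > 1]
  in_shared <- denominator_ids %in% shared
  split(orf_ids[in_shared], denominator_ids[in_shared])
}

res <- shared_denominators(orf_ids = c("o1", "o2", "o3", "o4"),
                           denominator_ids = c("geneA", "geneA", "geneB", "txC"))
res
```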

Tier 3 — DOTSeq (when Bioconductor stable)

DOTSeq (Lim & Chieng, bioRxiv 2025) provides DTE and DOU at ORF level. It is currently available only on the Bioconductor devel branch. Once it reaches a stable release, add it behind --run_dotseq (default false).

DOU is particularly valuable: it detects shifts in the relative contribution of each ORF to a gene's total translation output (uORF vs mORF balance) without requiring an absolute RNA-seq denominator, a capability unavailable in any currently stable tool.
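The quantity DOU is concerned with can be illustrated descriptively as each ORF's share of its gene's summed Ribo-seq counts per condition. This toy sketch uses hypothetical counts and is NOT the DOTSeq beta-binomial GLM; it only shows the proportion being modelled.

```r
# Toy illustration of ORF usage (not DOTSeq's model): per-condition share of
# each ORF in its gene's total Ribo-seq counts. Counts are hypothetical.
orf_counts <- data.frame(
  gene_id   = c("g1", "g1", "g1", "g1"),
  orf_type  = c("uORF", "mORF", "uORF", "mORF"),
  condition = c("control", "control", "treated", "treated"),
  count     = c(20, 80, 60, 60)
)
# Gene total within each gene x condition group
gene_total <- ave(orf_counts$count, orf_counts$gene_id, orf_counts$condition, FUN = sum)
orf_counts$share <- orf_counts$count / gene_total
orf_counts
```

Summed over all ORFs, the gene's output rises from 100 to 120, while the uORF share rises from 0.2 to 0.5; it is this relative balance, not the total, that DOU tests.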

DOTSeq requires a flattened ORF annotation (orf_to_gtf.py from the DOTSeq toolkit) as a pre-processing step.

Novel intergenic ORFs

Cannot be included in DTE without a paired RNA-seq denominator. Output:

Exception: if RNA-seq is available and Salmon was run against the hybrid GTF (with the issue #164 transcripts included), the RNA-seq abundance of the containing transcript can serve as the denominator, making the ORF eligible for Tier 2.

References

Labels: enhancement (New feature or request)