
feat: wire hybrid GTF (canonical + novel intergenic) into ORF callers #165

@pinin4fjords


Summary

PR #158 produces a merged StringTie GTF but does not feed it into the ORF callers, so novel transcript discovery currently has no effect on biological outputs. This issue wires the hybrid GTF (the canonical backbone from #161 plus the class 'u' novel transcripts from #164) into Ribo-TISH and Ribotricer.

Blocked by: #161 (canonical backbone), #164 (StringTie/gffcompare filter), nf-core/modules#11644 (ribotish/predict -a flag), #162 (ribotish/quality investigation). All changes are gated behind --extended_orf_analysis.

Architecture decision: transcriptome-BAM constraint

STAR --quantMode TranscriptomeSAM produces a BAM keyed to the reference transcriptome used at alignment time. RiboCode and riboWaltz consume this transcriptome BAM, so novel StringTie transcripts, which were not part of that transcriptome at alignment time, are invisible to them.

Resolution:

  • riboWaltz: canonical annotation only, permanently. It is a QC tool; hybrid GTF input has no scientific value and would degrade CDS-diagnostic plot readability.
  • RiboCode: canonical only in this phase. Giving RiboCode equal coverage of novel loci requires a second STAR alignment against a hybrid transcriptome, which is addressed separately in issue feat: ORF-level differential translation analysis (DT, DTE, and DOU) #168 (second STAR pass). That issue must be filed and addressed to restore three-caller parity for novel ORFs.
  • Ribo-TISH and Ribotricer: genome-BAM tools that can accept any GTF directly. Both are updated in this issue.
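The resolution above amounts to a per-tool routing rule keyed on which BAM each caller consumes. The sketch below encodes it as a lookup table; `Bam`, `ROUTING` and `annotation_for` are hypothetical names, not part of the pipeline, and the `"default"` return value assumes that behaviour outside the --extended_orf_analysis gate is unchanged.

```python
from enum import Enum


class Bam(Enum):
    GENOME = "genome"
    TRANSCRIPTOME = "transcriptome"  # STAR --quantMode TranscriptomeSAM output


# Hypothetical routing table: tool -> (BAM it consumes, may receive hybrid GTF?)
ROUTING = {
    "ribowaltz": (Bam.TRANSCRIPTOME, False),  # QC tool: canonical permanently
    "ribocode": (Bam.TRANSCRIPTOME, False),   # canonical until a second STAR pass
    "ribotish": (Bam.GENOME, True),
    "ribotricer": (Bam.GENOME, True),
}


def annotation_for(tool: str, extended_orf_analysis: bool) -> str:
    """Return which annotation a caller receives (hypothetical helper)."""
    if not extended_orf_analysis:
        return "default"  # assumption: behaviour outside the gate is unchanged
    bam, accepts_hybrid = ROUTING[tool]
    # Only genome-BAM tools can see novel loci, so only they get the hybrid GTF.
    return "hybrid" if accepts_hybrid and bam is Bam.GENOME else "canonical"


for name in ROUTING:
    print(name, annotation_for(name, extended_orf_analysis=True))
```

Keying the decision on the BAM type, rather than on tool names alone, keeps the transcriptome-BAM constraint explicit if RiboCode later gains a hybrid transcriptome BAM.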

Hybrid GTF construction

canonical_backbone.gtf   (MANE / Ensembl_canonical, issue #161)
  + novel_intergenic.gtf (StringTie class 'u', issue #164)
  = hybrid_reference.gtf

Emit hybrid_reference.gtf as a published output at <outdir>/stringtie/hybrid_reference.gtf.
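A minimal Python sketch of the construction step, assuming both inputs are already-filtered GTFs read as lists of lines. `build_hybrid`, `transcript_ids` and `write_gtf` are hypothetical helpers: the sketch only concatenates the two record sets and rejects `transcript_id` collisions (a safety check added here as an assumption, not something the issue specifies), leaving any coordinate sorting to a dedicated tool.

```python
import re

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def feature_lines(lines):
    """Drop comment and blank lines, keeping GTF feature records."""
    return [line for line in lines if line.strip() and not line.startswith("#")]


def transcript_ids(lines):
    """Collect the transcript_id values referenced by GTF feature records."""
    ids = set()
    for line in feature_lines(lines):
        match = TRANSCRIPT_ID.search(line)
        if match:
            ids.add(match.group(1))
    return ids


def build_hybrid(canonical, novel):
    """Concatenate canonical and novel GTF records into one hybrid list.

    Rejects transcript_id collisions, since duplicate IDs would make
    per-transcript ORF outputs ambiguous.
    """
    clash = transcript_ids(canonical) & transcript_ids(novel)
    if clash:
        raise ValueError(f"transcript_id collision: {sorted(clash)[:5]}")
    return feature_lines(canonical) + feature_lines(novel)


def write_gtf(records, path):
    """Write records (e.g. to hybrid_reference.gtf), one per line."""
    with open(path, "w") as handle:
        handle.writelines(record.rstrip("\n") + "\n" for record in records)
```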

Per-tool wiring

Ribo-TISH predict (requires nf-core/modules#11644):

-g novel_intergenic.gtf        # discovery target
-a canonical_backbone.gtf      # background model + ORF classification

Alternatively, pass hybrid_reference.gtf to -g alone: simpler, but with a less clean separation between the discovery target and the background model.
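The two layouts differ only in the annotation flags passed to `ribotish predict`. The sketch below shows them as argument lists; `ribotish_annotation_args` is a hypothetical helper that covers only the -g / -a flags named above, and the other predict inputs (BAMs, genome FASTA, output path) are deliberately left out rather than guessed.

```python
import shlex


def ribotish_annotation_args(mode: str) -> list:
    """Annotation flags for `ribotish predict` (hypothetical helper).

    "split"  -> novel GTF as discovery target, canonical GTF via -a
                (needs the -a flag from nf-core/modules#11644)
    "hybrid" -> hybrid_reference.gtf passed to -g alone
    """
    if mode == "split":
        return ["-g", "novel_intergenic.gtf", "-a", "canonical_backbone.gtf"]
    if mode == "hybrid":
        return ["-g", "hybrid_reference.gtf"]
    raise ValueError(f"unknown mode: {mode!r}")


# BAM, FASTA and output arguments are intentionally omitted here.
print(shlex.join(["ribotish", "predict", *ribotish_annotation_args("split")]))
```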

Ribotricer prepare-orfs: pass hybrid_reference.gtf directly. No module change is needed: Ribotricer automatically classifies ORFs on all CDS-absent transcripts as novel.
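To illustrate the classification rule as stated (this is not Ribotricer's actual implementation), the sketch below finds transcripts that carry exon records but no CDS record. `cds_absent_transcripts` is a hypothetical helper. Under this rule, canonical non-coding transcripts without CDS fall into the same bucket as StringTie class 'u' transcripts.

```python
import re
from collections import defaultdict

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def cds_absent_transcripts(gtf_lines):
    """Return transcript_ids that have exon records but no CDS record."""
    feature_types = defaultdict(set)
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a valid 9-column GTF record
        match = TRANSCRIPT_ID.search(cols[8])
        if match:
            feature_types[match.group(1)].add(cols[2])
    return {
        tid
        for tid, types in feature_types.items()
        if "exon" in types and "CDS" not in types
    }
```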

Plastid metagene_generate / psite: canonical backbone GTF only, because ROI generation requires annotated CDS. Plastid wiggle tracks are genome-wide, so they can quantify P-sites at any coordinate, including novel ORF loci, without a GTF change.
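As an illustration of coordinate-based quantification, the sketch below sums P-site counts over a novel ORF's exon blocks from a genome-wide track. It assumes the track has been loaded as bedGraph-style `(chrom, start, end, value)` records with 0-based, half-open coordinates and one value per nucleotide, and that the caller passes the track for the ORF's strand; `psites_in_interval` and `orf_psites` are hypothetical names.

```python
def psites_in_interval(track, chrom, start, end):
    """Sum P-site counts over [start, end) on chrom.

    Each record's value is treated as a per-nucleotide count, so partial
    overlaps contribute value * overlap length.
    """
    total = 0.0
    for rec_chrom, rec_start, rec_end, value in track:
        if rec_chrom != chrom:
            continue
        overlap = min(rec_end, end) - max(rec_start, start)
        if overlap > 0:
            total += value * overlap
    return total


def orf_psites(track, chrom, blocks):
    """Sum P-site counts over the (start, end) exon blocks of a spliced ORF."""
    return sum(psites_in_interval(track, chrom, s, e) for s, e in blocks)


track = [("chr2", 0, 10, 0.0), ("chr2", 10, 13, 2.0), ("chr2", 13, 20, 1.0)]
print(orf_psites(track, "chr2", [(10, 13), (15, 20)]))
```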

Do not pre-annotate novel transcripts with TransDecoder. Labelling sequence-predicted ORFs as annotated CDS confuses all downstream classifiers and conflates sequence-based prediction with Ribo-seq evidence.


Labels

enhancement (New feature or request)

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions