haibol2016/riboseq

 
 

nf-core/riboseq

Disclaimer

This pipeline is an enhanced version of the [nf-core riboseq pipeline](https://nf-co.re/riboseq/dev/). For details of the modifications, see "what I have changed".

Introduction

nf-core/riboseq is a bioinformatics pipeline for analysis of Ribo-seq data. It borrows heavily from nf-core/rnaseq in the preprocessing stages:

Note: the metro map below has not yet been updated to reflect the changes.

Figure: nf-core/riboseq metro map

  1. Merge re-sequenced FastQ files (cat)
  2. Sub-sample FastQ files and auto-infer strandedness (fq, Salmon)
  3. Read QC (FastQC)
  4. UMI extraction (UMI-tools)
  5. Adapter and quality trimming (Trim Galore!)
  6. Removal of genome contaminants (BBSplit)
  7. Removal of non-coding RNAs (rRNA, tRNA, snRNA) (SortMeRNA or Bowtie); a Bowtie index can be provided as a pre-built .tar.gz file
  8. Genome alignment of reads, outputting both genome and transcriptome alignments with STAR
  9. Sort and index alignments (SAMtools)
  10. UMI-based deduplication (UMI-tools)

Differences occur in the downstream analysis steps. Currently these specialist steps are:

  1. Check the distribution of reads around annotated protein-coding regions on user-provided transcripts, show frame bias, and estimate the P-site offset for different groups of reads (RiboCode)
  2. (default, optional) Detect actively translated open reading frames (ORFs) from alignment data (RiboCode)
  3. (default, optional) Predict translated open reading frames using RibORF 2.0
  4. (default, optional) Derive candidate ORFs from reference data and detect translated ORFs from that list (Ribotricer)
  5. (default, optional) Derive P-sites and QC from transcriptome alignments (riboWaltz)
  6. (optional, TI-seq data only) Quality control and P-site offset calculation for TI-seq samples using Ribo-TISH
  7. (optional, TI-seq data only) Predict translated ORFs and translation initiation sites (TIS) using Ribo-TISH
  8. (optional, TI-seq data only) Differential translation initiation site analysis between two TI-seq conditions using Ribo-TISH tisdiff (requires ≥3 replicates per condition)
  9. (optional) Use a translational efficiency approach to study the dynamics of transcription and translation with anota2seq or Riborex (both require ≥3 replicates per condition). Requires matched RNA-seq and Ribo-seq data.
  10. (optional) Perform gene set enrichment analysis using fgsea to identify biological pathways and functions associated with differentially translated genes. Supports both over-representation analysis (FORA) and gene set enrichment analysis (GSEA), and is applied to both translational efficiency results and differential TIS results (when TI-seq contrasts are provided). Requires a gene set database in GMT format.

Usage

Note

If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data.

First, prepare a samplesheet with your input data that looks as follows:

samplesheet.csv:

sample,fastq_1,fastq_2,strandedness,type
CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz,forward,riboseq

Each row represents one FASTQ file (single-end) or a pair of FASTQ files (paired-end). Each row must have a 'type' value of riboseq, tiseq or rnaseq.

The pipeline supports:

  • riboseq: Standard Ribo-seq data for detecting actively translated ORFs
  • tiseq: TI-seq (Translation Initiation sequencing) data for detecting translation initiation sites and differential TIS analysis
  • rnaseq: RNA-seq data for translational efficiency analysis (requires matched Ribo-seq samples)

The pipeline conducts paired analysis of matched riboseq and rnaseq samples for translational efficiency analysis. TI-seq samples are analyzed using Ribo-TISH for quality control, ORF/TIS prediction, and differential TIS analysis.
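As a rough illustration of the samplesheet rules above, the following Python sketch checks CSV text before it is handed to the pipeline. It is not part of the pipeline: the function name `validate_samplesheet` and the helper `_cell` are hypothetical, and it assumes the five columns shown in the example header, with fastq_2 allowed to be empty for single-end data. The pipeline's own validation may be stricter.

```python
import csv
import io

# Columns taken from the example samplesheet header above.
REQUIRED_COLUMNS = ["sample", "fastq_1", "fastq_2", "strandedness", "type"]
# Sample types the pipeline is described as supporting.
ALLOWED_TYPES = {"riboseq", "tiseq", "rnaseq"}


def _cell(row, column):
    """Return a stripped cell value, treating a missing cell as empty."""
    return (row.get(column) or "").strip()


def validate_samplesheet(text):
    """Return a list of human-readable problems found in samplesheet CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        return [f"missing column(s): {', '.join(missing)}"]

    problems = []
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        if not _cell(row, "sample"):
            problems.append(f"line {line_no}: empty sample name")
        if not _cell(row, "fastq_1"):
            problems.append(f"line {line_no}: fastq_1 is required")
        sample_type = _cell(row, "type")
        if sample_type not in ALLOWED_TYPES:
            problems.append(
                f"line {line_no}: type must be one of "
                f"{sorted(ALLOWED_TYPES)}, got {sample_type!r}"
            )
    return problems


EXAMPLE = """sample,fastq_1,fastq_2,strandedness,type
CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz,forward,riboseq
"""

if __name__ == "__main__":
    # Prints nothing when the samplesheet passes every check.
    for problem in validate_samplesheet(EXAMPLE):
        print(problem)
```

An empty list means no problems were found by these checks; it does not guarantee that the FASTQ files exist or that the pipeline will accept the sheet.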

Now, you can run the pipeline using:

nextflow run nf-core/riboseq \
   -profile <docker/singularity/.../institute> \
   --input samplesheet.csv \
   --outdir <OUTDIR>

Including a translational efficiency analysis

Figure: anota2seq fold change plot

In the translational efficiency analysis provided by anota2seq, matched pairs of Ribo-seq and RNA-seq samples are used to study how the relationship between transcription and translation differs between two treatment groups. For example, the test data for this workflow has a contrasts file like this:

id,variable,reference,target,batch,pair
treated_vs_control,treatment,control,treated,,pair

This describes how to compare groups of samples between treatment groups, and between RNA-seq and Ribo-seq. In order, the columns are:

  • id: a unique identifier to use for the contrast
  • variable: which variable (column) of the sample sheet should be used to separate the treatment groups?
  • reference: which value of the variable column should be used to select samples to be used as the reference/base group?
  • target: which value of the variable column should be used to select samples to be used as the target/treated group?
  • batch: (optional) specify a variable in the sample sheet that defines sample batches
  • pair: (optional) specify a variable in the sample sheet that defines sample pairing between RNA-seq and Ribo-seq samples. If not specified, it is assumed that the two types of sample are ordered the same.
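To make the relationship between the contrasts file and the sample sheet concrete, here is a minimal Python sketch that cross-checks the two. It is an illustration only, under several assumptions: the function names (`read_rows`, `check_contrasts`, `example_samplesheet`) are hypothetical, the sample sheet is assumed to carry extra `treatment` and `pair` columns matching the example contrast, and the minimum of three replicates per condition is counted separately for riboseq and rnaseq samples.

```python
import csv
import io
from collections import Counter


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def check_contrasts(samplesheet_text, contrasts_text, min_replicates=3):
    """Return a list of problems found when matching contrasts to samples."""
    samples = read_rows(samplesheet_text)
    columns = set(samples[0]) if samples else set()
    problems = []
    for contrast in read_rows(contrasts_text):
        cid = contrast["id"]
        variable = contrast["variable"]
        if variable not in columns:
            problems.append(f"{cid}: no samplesheet column named {variable!r}")
            continue
        # batch and pair are optional; only check them when they are filled in.
        for optional in ("batch", "pair"):
            name = (contrast.get(optional) or "").strip()
            if name and name not in columns:
                problems.append(f"{cid}: no samplesheet column named {name!r}")
        # Count samples per (type, level), e.g. ("riboseq", "control").
        counts = Counter((row["type"], row[variable]) for row in samples)
        for role in ("reference", "target"):
            level = contrast[role]
            for sample_type in ("riboseq", "rnaseq"):
                found = counts[(sample_type, level)]
                if found < min_replicates:
                    problems.append(
                        f"{cid}: {found} {sample_type} sample(s) for {role} "
                        f"{level!r}, need at least {min_replicates}"
                    )
    return problems


def example_samplesheet(replicates):
    """Build a hypothetical samplesheet with treatment and pair columns."""
    lines = ["sample,fastq_1,fastq_2,strandedness,type,treatment,pair"]
    for sample_type in ("riboseq", "rnaseq"):
        for treatment in ("control", "treated"):
            for rep in range(1, replicates + 1):
                name = f"{sample_type}_{treatment}_rep{rep}"
                pair_id = f"{treatment}_{rep}"  # same id for matched samples
                lines.append(
                    f"{name},{name}.fastq.gz,,forward,"
                    f"{sample_type},{treatment},{pair_id}"
                )
    return "\n".join(lines) + "\n"


CONTRASTS = """id,variable,reference,target,batch,pair
treated_vs_control,treatment,control,treated,,pair
"""

if __name__ == "__main__":
    # Two replicates per condition is below the minimum, so problems are listed.
    for problem in check_contrasts(example_samplesheet(2), CONTRASTS):
        print(problem)
```

The sample names and FASTQ paths generated by `example_samplesheet` are made up for the demonstration and do not correspond to the pipeline's test data.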

Create a parameter file

You can generate a parameter file template using one of the following methods:

  1. Using nf-core launch (recommended):

    nf-core launch nf-core/riboseq

    This will open an interactive web interface where you can configure all pipeline parameters and download a params.json file.

  2. Create a parameter file (YAML):

    nf-core pipelines create-params-file /path/to/riboseq/pipeline/directory

    This creates a params.yaml file containing all parameters set to the pipeline default values, with their descriptions in comments. Use the template by uncommenting and modifying the values of the parameters you want to pass to a pipeline run. Hidden options are not included by default, but can be included using the -x/--show-hidden flag.

Once created, use the parameter file when running the pipeline:

nextflow run nf-core/riboseq -profile docker -params-file params.yaml
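If you prefer to script this step, the sketch below writes a minimal parameter file and assembles the matching command line. It is an assumption-laden illustration rather than part of the pipeline: the helper names `write_params_file` and `build_command` are hypothetical, only the `input` and `outdir` parameters from the run command above are set, and the file is written as JSON, which Nextflow's -params-file option accepts alongside YAML.

```python
import json
import shlex
import tempfile
from pathlib import Path


def write_params_file(path, input_csv, outdir):
    """Write the two parameters used in the run command above as JSON."""
    params = {"input": str(input_csv), "outdir": str(outdir)}
    Path(path).write_text(json.dumps(params, indent=2) + "\n")
    return params


def build_command(params_file, profile="docker"):
    """Return the nextflow invocation as an argument list (not executed)."""
    return [
        "nextflow", "run", "nf-core/riboseq",
        "-profile", profile,
        "-params-file", str(params_file),
    ]


if __name__ == "__main__":
    # Write into a temporary directory so the demo leaves no files behind.
    with tempfile.TemporaryDirectory() as tmp:
        params_path = Path(tmp) / "params.json"
        write_params_file(params_path, "samplesheet.csv", "results")
        print(params_path.read_text())
        print(shlex.join(build_command(params_path)))
```

The command is only printed here; pass the list to `subprocess.run` yourself once the paths point at real files.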

Warning

Please provide pipeline parameters via the CLI or Nextflow -params-file option. Custom config files including those provided by the -c Nextflow option can be used to provide any configuration except for parameters; see docs.

For more details and further functionality, please refer to the usage documentation and the parameter documentation.

Best Practices

For comprehensive guidelines on experimental design, library preparation, quality control, parameter selection, and troubleshooting, see the Best Practices Guide.

Pipeline output (to be updated)

To see the results of an example test run with a full-size dataset, refer to the results tab on the nf-core website pipeline page. For more details about the output files and reports, please refer to the output documentation.

Credits

nf-core/riboseq was originally written by Jonathan Manning (Bioinformatics Engineer at Seqera) with support from Altos Labs and in discussion with Felix Krueger and Christel Krueger. We thank the following people for their input:

Contributions and Support

If you would like to contribute to this pipeline, please see the contributing guidelines.

For further information or help, don't hesitate to get in touch on the Slack #riboseq channel (you can join with this invite).

Citations

If you use nf-core/riboseq for your analysis, please cite it using the following doi: 10.5281/zenodo.10966364

An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.

You can cite the nf-core publication as follows:

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.
