GitHub - haibol2016/riboseq: Pipeline for the analysis of ribosome profiling, or Ribo-seq (also named ribosome footprinting) data.

Disclaimer

This pipeline is an enhanced version of the [nf-core riboseq pipeline](https://nf-co.re/riboseq/dev/). For details of the modifications, see "what I have changed".

Introduction

nf-core/riboseq is a bioinformatics pipeline for analysis of Ribo-seq data. It borrows heavily from nf-core/rnaseq in the preprocessing stages:

The map needs to be updated to reflect changes

Merge re-sequenced FastQ files (cat)
Sub-sample FastQ files and auto-infer strandedness (fq, Salmon)
Read QC (FastQC)
UMI extraction (UMI-tools)
Adapter and quality trimming (Trim Galore!)
Removal of genome contaminants (BBSplit)
Removal of non-coding RNAs (rRNA, tRNA, snRNA) (SortMeRNA or Bowtie) - Bowtie index can be provided as a pre-built .tar.gz file
Genome alignment of reads, outputting both genome and transcriptome alignments with STAR
Sort and index alignments (SAMtools)
UMI-based deduplication (UMI-tools)

Differences occur in the downstream analysis steps. Currently these specialist steps are:

Check the read distribution around annotated protein-coding regions on user-provided transcripts, show frame bias, and estimate the P-site offset for different groups of reads (RiboCode)
(default, optional) Detect actively translated open reading frames (ORFs) from alignment data (RiboCode)
(default, optional) Predict translated open reading frames using RibORF 2.0
(default, optional) Derive candidate ORFs from reference data and detect translated ORFs from that list (Ribotricer)
(default, optional) Derive P-sites and QC from transcriptome alignments (riboWaltz)
(optional, TI-seq data only) Quality control and P-site offset calculation for TI-seq samples using Ribo-TISH
(optional, TI-seq data only) Predict translated ORFs and translation initiation sites (TIS) using Ribo-TISH
(optional, TI-seq data only) Differential translation initiation site analysis between two TI-seq conditions using Ribo-TISH tisdiff (requires ≥3 replicates per condition)
(optional) Use a translational efficiency approach to study the dynamics of transcription and translation with anota2seq or Riborex (both require ≥3 replicates per condition). Requires matched RNA-seq and Ribo-seq data.
(optional) Perform gene set enrichment analysis using fgsea to identify biological pathways and functions associated with differentially translated genes. Supports both over-representation analysis (FORA) and gene set enrichment analysis (GSEA). Applied to both translational efficiency results and differential TIS results (when TI-seq contrasts are provided). Requires a gene set database in GMT format.
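
Several of the optional steps above require at least three replicates per condition. A minimal Python sketch of a pre-flight check for that requirement follows. It assumes the experimental design is available as (type, condition) pairs; the function name and data layout are hypothetical and are not part of the pipeline.

```python
from collections import Counter

MIN_REPLICATES = 3  # threshold stated in the step list above


def under_replicated(samples, min_replicates=MIN_REPLICATES):
    """Return {(sample_type, condition): count} for groups below the threshold.

    `samples` is an iterable of (sample_type, condition) tuples.
    This layout is an assumption made for illustration only.
    """
    counts = Counter(samples)
    return {group: n for group, n in counts.items() if n < min_replicates}


if __name__ == "__main__":
    design = (
        [("riboseq", "control")] * 3
        + [("riboseq", "treated")] * 3
        + [("rnaseq", "control")] * 3
        + [("rnaseq", "treated")] * 2  # one replicate short
    )
    # Only the under-replicated group is reported.
    print(under_replicated(design))
```

An empty result means every group meets the threshold; any reported group would need more replicates before enabling anota2seq, Riborex or Ribo-TISH tisdiff.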

Usage

Note

If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data.

First, prepare a samplesheet with your input data that looks as follows:

samplesheet.csv:

```
sample,fastq_1,fastq_2,strandedness,type
CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz,forward,riboseq
```

Each row represents a FastQ file (single-end) or a pair of FastQ files (paired-end). Each row must have a 'type' value of riboseq, tiseq or rnaseq.

The pipeline supports:

riboseq: Standard Ribo-seq data for detecting actively translated ORFs
tiseq: TI-seq (Translation Initiation sequencing) data for detecting translation initiation sites and differential TIS analysis
rnaseq: RNA-seq data for translational efficiency analysis (requires matched Ribo-seq samples)

The pipeline conducts paired analysis of matched riboseq and rnaseq samples for translational efficiency analysis. TI-seq samples are analyzed using Ribo-TISH for quality control, ORF/TIS prediction, and differential TIS analysis.
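
The samplesheet rules above can be checked before launching a run. The following is a minimal Python sketch, not the pipeline's own validation: the function name `check_samplesheet` and the specific checks are assumptions chosen for illustration, based only on the column names and the three allowed `type` values described above.

```python
import csv
import io

ALLOWED_TYPES = {"riboseq", "tiseq", "rnaseq"}  # values listed above


def check_samplesheet(text):
    """Return a list of problems found in samplesheet CSV text (hypothetical helper)."""
    rows = list(csv.DictReader(io.StringIO(text)))
    problems = []
    for line_no, row in enumerate(rows, start=2):  # line 1 is the header
        if row.get("type") not in ALLOWED_TYPES:
            problems.append(f"line {line_no}: unexpected type {row.get('type')!r}")
        if not row.get("fastq_1"):
            problems.append(f"line {line_no}: fastq_1 is empty")
    # RNA-seq rows are only useful alongside matched Ribo-seq rows.
    types = {row.get("type") for row in rows}
    if "rnaseq" in types and "riboseq" not in types:
        problems.append("rnaseq rows present without any riboseq rows")
    return problems


if __name__ == "__main__":
    sheet = (
        "sample,fastq_1,fastq_2,strandedness,type\n"
        "CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,"
        "AEG588A1_S1_L002_R2_001.fastq.gz,forward,riboseq\n"
    )
    print(check_samplesheet(sheet))  # an empty list means no problems were found
```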

Now, you can run the pipeline using:

```
nextflow run nf-core/riboseq \
   -profile <docker/singularity/.../institute> \
   --input samplesheet.csv \
   --outdir <OUTDIR>
```

Including a translational efficiency analysis

In the translational efficiency analysis provided by anota2seq, we use matched pairs of Ribo-seq and RNA-seq data to study how the relationship between transcription and translation differs between two treatment groups. For example, the test data for this workflow has a contrasts file like:

```
id,variable,reference,target,batch,pair
treated_vs_control,treatment,control,treated,,pair
```

This describes how to compare groups of samples between treatment groups, and between RNA-seq and Ribo-seq. In order, the columns are:

id: a unique identifier to use for the contrast
variable: which variable (column) of the sample sheet should be used to separate the treatment groups?
reference: which value of the variable column should be used to select samples to be used as the reference/ base group?
target: which value of the variable column should be used to select samples to be used as the target/treated group?
batch: (optional) specify a variable in the sample sheet that defines sample batches
pair: (optional) specify a variable in the sample sheet that defines sample pairing between RNA-seq and Ribo-seq samples. If not specified, it is assumed that the two types of sample are ordered the same.
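
The column rules above imply that every contrast must refer to columns and values that exist in the sample sheet. A minimal Python sketch of that cross-check follows. It is an illustration only: the helper `check_contrasts` is hypothetical, and the example sample sheet assumes extra `treatment` and `pair` columns that are not shown in the samplesheet example earlier in this README.

```python
import csv
import io


def check_contrasts(contrasts_text, samplesheet_text):
    """Return a list of contrasts that do not match the sample sheet (hypothetical helper)."""
    samples = list(csv.DictReader(io.StringIO(samplesheet_text)))
    columns = set(samples[0].keys()) if samples else set()
    problems = []
    for contrast in csv.DictReader(io.StringIO(contrasts_text)):
        cid = contrast["id"]
        variable = contrast["variable"]
        if variable not in columns:
            problems.append(f"{cid}: no samplesheet column named {variable!r}")
            continue
        # reference and target must both be values of the variable column
        levels = {row[variable] for row in samples}
        for role in ("reference", "target"):
            if contrast[role] not in levels:
                problems.append(f"{cid}: {role} {contrast[role]!r} not found in {variable!r}")
        # batch and pair are optional, but if given they must name a column
        for optional in ("batch", "pair"):
            name = contrast.get(optional) or ""
            if name and name not in columns:
                problems.append(f"{cid}: {optional} column {name!r} not in samplesheet")
    return problems


if __name__ == "__main__":
    contrasts = (
        "id,variable,reference,target,batch,pair\n"
        "treated_vs_control,treatment,control,treated,,pair\n"
    )
    # Assumed layout: the treatment and pair columns are illustrative additions.
    samplesheet = (
        "sample,fastq_1,fastq_2,strandedness,type,treatment,pair\n"
        "C1_RIBO,c1_ribo.fastq.gz,,forward,riboseq,control,1\n"
        "C1_RNA,c1_rna.fastq.gz,,forward,rnaseq,control,1\n"
        "T1_RIBO,t1_ribo.fastq.gz,,forward,riboseq,treated,2\n"
        "T1_RNA,t1_rna.fastq.gz,,forward,rnaseq,treated,2\n"
    )
    print(check_contrasts(contrasts, samplesheet))  # an empty list means the contrast is consistent
```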

Create a parameter file

You can generate a parameter file template using one of the following methods:

Using nf-core launch (recommended):
```
nf-core launch nf-core/riboseq
```
This will open an interactive web interface where you can configure all pipeline parameters and download a params.json file.
Create a parameter file (YAML):
```
nf-core pipeline create-params-file /path/to/riboseq/pipeline/directory
```
This creates a params.yaml file containing all parameters set to the pipeline default values, with their descriptions in comments. You can then use this template by uncommenting and modifying the values of the parameters you want to pass to a pipeline run. Hidden options are not included by default, but can be included using the -x/--show-hidden flag.

Once created, use the parameter file when running the pipeline:

```
nextflow run nf-core/riboseq -profile docker -params-file params.yaml
```
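
A parameter file can also be generated by a script, since Nextflow's -params-file option accepts JSON as well as YAML. The sketch below is a hedged illustration in Python: `render_params` is a hypothetical helper, and only the `input` and `outdir` parameters are taken from the run command shown earlier. Any other keys you pass are your own responsibility to match against the parameter documentation.

```python
import json


def render_params(input_csv, outdir, **extra):
    """Return JSON text for a Nextflow params file (hypothetical helper).

    `input` and `outdir` mirror the --input and --outdir options used above;
    `extra` is passed through unchanged and is not validated here.
    """
    params = {"input": input_csv, "outdir": outdir, **extra}
    return json.dumps(params, indent=2) + "\n"


if __name__ == "__main__":
    # Print the JSON; redirect it to a file such as params.json to use it.
    print(render_params("samplesheet.csv", "results"), end="")
```

Saving the output as params.json and passing it with -params-file params.json should be equivalent to supplying --input and --outdir on the command line.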

Warning

Please provide pipeline parameters via the CLI or Nextflow -params-file option. Custom config files including those provided by the -c Nextflow option can be used to provide any configuration except for parameters; see docs.

For more details and further functionality, please refer to the usage documentation and the parameter documentation.

Best Practices

For comprehensive guidelines on experimental design, library preparation, quality control, parameter selection, and troubleshooting, see the Best Practices Guide.

Pipeline output (to be updated)

To see the results of an example test run with a full size dataset refer to the results tab on the nf-core website pipeline page. For more details about the output files and reports, please refer to the output documentation.

Credits

nf-core/riboseq was originally written by Jonathan Manning (Bioinformatics Engineer at Seqera) with support from Altos Labs and in discussion with Felix Krueger and Christel Krueger. We thank the following people for their input:

Anne Bresciani (ZS)
Felipe Almeida (ZS)
Mikhail Osipovitch (ZS)
Edward Wallace (University of Edinburgh)
Jack Tierney (University College Cork)
Maxime U Garcia (Seqera)
Ira A Iosub (The Francis Crick Institute)

Contributions and Support

If you would like to contribute to this pipeline, please see the contributing guidelines.

For further information or help, don't hesitate to get in touch on the Slack #riboseq channel (you can join with this invite).

Citations

If you use nf-core/riboseq for your analysis, please cite it using the following doi: 10.5281/zenodo.10966364

An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.

You can cite the nf-core publication as follows:

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.
