Nextflow Pipeline for DeepTrio

This repository contains a Nextflow pipeline for Google’s DeepTrio, optimised for execution on NCI Gadi.

Quickstart Guide

Edit the pipeline_params.yml file to include:
- samples: a list of family trio entries. Each entry includes a unique family id, the child sample name and BAM file path, the parent1 sample name and BAM file path, the parent2 sample name and BAM file path, the path to an optional regions-of-interest BED file (set to '' if not required), and the model type. For each BAM file, ensure the corresponding .bai index is in the same directory.
- ref: path to the reference FASTA (ensure corresponding .fai is in the same directory).
- output_dir: directory path to save output files.
- nci_project, nci_storage: NCI project and storage.

Update nextflow.config to match the resource requirements for each stage of the pipeline. On NCI Gadi, you may only need to adjust the time and disk (i.e. jobfs) parameters to suit the size of your datasets. The default values were tested on a PacBio dataset in which each family member's BAM is ~45 GB.
Load the Nextflow module and run the pipeline using the following commands:
```
module load nextflow/24.04.1
nextflow run main.nf -params-file pipeline_params.yml
```
Note: additional Nextflow options can be appended to this command (e.g. -resume to continue a previously interrupted run).
For each family trio, output files will be stored in the directory output_dir/family_id.
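
Because the pipeline expects a .bai next to every BAM and a .fai next to the reference, it can help to check for these index files before submitting a run. The following is a minimal sketch and not part of the pipeline: the function name `missing_indexes` is hypothetical, the paths in the usage lines are placeholders, and it assumes the common naming conventions `sample.bam.bai` (or `sample.bai`) and `ref.fasta.fai`.

```python
from pathlib import Path


def missing_indexes(bam_paths, ref_path):
    """Return the index files that could not be found next to their inputs."""
    missing = []
    for bam in map(Path, bam_paths):
        # Assumption: the index is named sample.bam.bai or sample.bai.
        candidates = [bam.with_name(bam.name + ".bai"), bam.with_suffix(".bai")]
        if not any(candidate.is_file() for candidate in candidates):
            missing.append(str(candidates[0]))
    ref = Path(ref_path)
    # Assumption: the FASTA index is named ref.fasta.fai.
    fai = ref.with_name(ref.name + ".fai")
    if not fai.is_file():
        missing.append(str(fai))
    return missing


# Placeholder paths for illustration only; replace with your own files.
bams = ["/path/to/child.bam", "/path/to/parent1.bam", "/path/to/parent2.bam"]
for index_file in missing_indexes(bams, "/path/to/ref.fasta"):
    print(f"Missing index: {index_file}")
```

An empty result means every expected index was found under one of the assumed names.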

Notes

It is assumed that the user has access to NCI's if89 project (required for using DeepTrio via module load). If not, simply request access using this link.

Case Study

A case study was conducted using a PacBio dataset, with each family member's BAM being ~45 GB in size, to evaluate the runtime and service unit (SU) efficiency of deeptrio-nextflow compared to the original DeepTrio running on a single node. The benchmarking results are summarised in the table below.

| Version | Gadi Resources | Runtime (hh:mm:ss) | SUs |
|---|---|---|---|
| Original DeepTrio | `gpuvolta` (48 CPUs, 4 GPUs, 384 GB memory) | 09:52:07 | 1421.08 |
| deeptrio-nextflow | `normalsr` (104 CPUs, 500 GB memory) → `gpuvolta` (12 CPUs, 1 GPU, 96 GB memory) → `normal` (24 CPUs, 96 GB memory) | 04:01:28 | 598.79 |
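
The relative speedup and SU saving follow directly from the figures in the table. The short sketch below works them out; the helper `to_seconds` and the variable names are illustrative only, and the numbers are copied from the table rather than measured again.

```python
def to_seconds(hms):
    """Convert an hh:mm:ss string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Figures copied from the benchmarking table.
original = {"runtime": "09:52:07", "sus": 1421.08}
nextflow = {"runtime": "04:01:28", "sus": 598.79}

speedup = to_seconds(original["runtime"]) / to_seconds(nextflow["runtime"])
su_saving = 1 - nextflow["sus"] / original["sus"]

print(f"Speedup: {speedup:.2f}x")      # Speedup: 2.45x
print(f"SU saving: {su_saving:.1%}")   # SU saving: 57.9%
```

On this dataset, deeptrio-nextflow therefore ran about 2.45 times faster and used about 58% fewer SUs than the single-node run.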

Acknowledgments

The deeptrio-nextflow workflow was developed by Dr Kisaru Liyanage and Dr Matthew Downton (National Computational Infrastructure), with support from Australian BioCommons as part of the Workflow Commons project.

We thank Leah Kemp (Garvan Institute of Medical Research) for her collaboration in providing test datasets and assisting with pipeline testing.
