Current Version: 0.1.24
This repository contains the GATK-SV workflow, which is a wrapper around the GATK-SV Cromwell workflow. It has been migrated from Production-Pipelines. This has been generated as part of the migration from Production-Pipelines's CPG_workflows framework, to the separate CPG-Flow.
This repository contains the following workflows:
-
singlesample_workflow- previously calledgatk_sv_singlesample. This is the initial per-sample variant calling, generation of QC metrics following variant calling, through to batching of samples based on those metrics. The exact stages are:- GatherSampleEvidence
- EvidenceQC
CreateSampleBatches- a somewhat bespoke sample batching script
-
multisample_workflow- previously calledgatk_sv_multisample. This is the Cohort based batch processing of clustered samples, through to a final joint-called dataset.
Following a best-practices CPG-Flow structure, the repository is structured as follows:
├── Dockerfile
├── LICENSE
├── README.md
├── pull_request_template.md
├── pyproject.toml
└── src
├── cpg_flow_gatk_sv
│ ├── __init__.py
│ ├── config_template.toml
│ ├── singlesample_workflow.py
│ ├── multisample_workflow.py
│ ├── jobs
│ │ ├── CreateSampleBatches.py
│ │ ├── EvidenceQC.py
│ │ ├── GatherSampleEvidence.py
│ │ └── ...
│ ├── scripts
│ │ ├── __init__.py
│ │ └── sample_batching.py
│ │ └── ...
│ └── utils.py
This structure contains the following important files:
pyproject.toml- the build instructions and linter settings for the repositorysinglesample_workflow- the Stages and workflow trigger for the first workflowmultisample_workflow- the Stages and workflow trigger for the second workflowjobs- a directory containing the logic for each of the Stages in the workflow. Each file contains a single Stage, and is named after the Stage it contains. This implements all the logic for a stage, including assembling the input dictionary to be passed to Cromwell.scripts- a directory containing any scripts which are not part of the GATK-SV Cromwell workflow, but are used in the workflow. This includes any helper scripts, or CPG-custom logic.
This is designed to be run using analysis-runner, using a fully containerised installation of this codebase.
analysis-runner \
--skip-repo-checkout \
--image australia-southeast1-docker.pkg.dev/cpg-common/images/cpg-flow-gatk-sv:0.1.24 \
--dataset DATASET \
--description 'GATK-SV, CPG-flow' \
-o gatk-sv_cpg-flow \
--access-level full \
--config CONFIG \
singlesample_workflowSemantic Versioning should be implemented with bump-my-version
bump-my-version bump patch/minor/major
This is using the configuration block inside pyproject.toml