WGS-EPIPE: A Snakemake-driven WGS workflow using Shovill, database comparisons, and Snippy for isolate analysis.

Whole-Genome-Sequencing Epi Pipeline @ FLI (WGS-EPIPE)

This short guide explains how to install the pipeline, run the two main parts (assembly & database preparation, then snippy comparative analyses), and how to edit config/config.yaml for your own samples.

1) Install

Clone the repository and enter it:

git clone https://github.com/stalbrec/WGS-EPIPE && cd WGS-EPIPE

Create and activate the Snakemake environment (replace the name if you prefer):

conda env create -f workflow/envs/snakemake.yaml -n snakemake
conda activate snakemake

2) Prepare your config/config.yaml

  • Provide your reads by either:
    • pointing samples to a directory with paired-end FASTQ files (named like SAMPLE_R1*.fastq.gz and SAMPLE_R2*.fastq.gz), or
    • creating a tab-separated samples.tsv with columns sample, R1, R2 and setting samples to that file.
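A minimal samples.tsv might look like this (tab-separated; sample names and paths are placeholders):

```tsv
sample	R1	R2
sampleA	/path/to/sampleA_R1.fastq.gz	/path/to/sampleA_R2.fastq.gz
sampleB	/path/to/sampleB_R1.fastq.gz	/path/to/sampleB_R2.fastq.gz
```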

Examples (edit config/config.yaml):

# option A: directory with FASTQ files
samples: "/path/to/your/reads"

# option B: explicit sample table
# samples: "samples.tsv"

# change where results are written
output_dir: "results"

Save your changes to config/config.yaml before running Snakemake.

3) Run the first part (reads → QC → assembly + DB prep)

This run will copy/read the FASTQs, perform trimming/QC, run assemblies (shovill), and prepare the abricate databases.

Dry-run (recommended):

snakemake -n

Run (example using 8 cores):

snakemake --cores 8

Where to find assemblies (default):

  • Assemblies: results/<sample>/shovill/contigs.fa
  • You can change output_dir in config/config.yaml.
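As a quick sanity check of the layout above, here is a throwaway sketch (fabricated sample names; run it in a scratch directory, not in your real results) that recreates the default structure and shows the glob for listing all assemblies:

```shell
# recreate the default output layout with two fabricated samples
mkdir -p results/sampleA/shovill results/sampleB/shovill
touch results/sampleA/shovill/contigs.fa results/sampleB/shovill/contigs.fa

# list every assembly under the default output_dir
ls results/*/shovill/contigs.fa
```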

4) Run the snippy comparative part

  • Add one or more snippy_runs entries in config/config.yaml. Each run should list the isolates, the reference isolate, and data_paths where contigs and trimmed reads are located (for example the results/ from the first part).

Example snippy_runs entry:

snippy_runs:
  - name: myrun
    isolates: [sampleA, sampleB, sampleC]
    reference: sampleA
    data_paths: ["results/"]

Example with custom glob patterns (put these at the top level of config/config.yaml):

# override the default discovery patterns for contigs and trimmed reads
contigs_pattern: "**/*contigs.fa"   # default; change to match your filenames if needed
reads_pattern: "**/*trimmed.fq"     # default; change to match your trimmed read filenames

snippy_runs:
  - name: myrun
    isolates: [sampleA, sampleB, sampleC]
    reference: sampleA
    data_paths: ["results/"]

Notes:

  • reference must be one of the isolates.
  • File matching is substring-based: the pipeline looks for files containing the isolate name (for contigs and trimmed reads) under each path in data_paths.
  • If your contigs are produced elsewhere, set data_paths accordingly.
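The substring matching can be sketched in shell (this is not the pipeline's own code, just an illustration with fabricated files): a contig file is picked up when its path contains the isolate name and matches the contigs pattern.

```shell
# fabricate a small data_paths layout
mkdir -p demo/sampleA demo/sampleB
touch demo/sampleA/sampleA_contigs.fa demo/sampleB/sampleB_contigs.fa

# files whose path contains the isolate name and ends in contigs.fa
isolate="sampleA"
find demo -path "*${isolate}*contigs.fa"
```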

Extra note about how contigs/reads are discovered

  • The Snakefile builds a table of isolates, contigs and trimmed reads using the helper load_contigs_and_reads and the snippy_runs entries in config/config.yaml.
  • By default it searches for contig files using the glob pattern **/*contigs.fa and for trimmed reads using **/*trimmed.fq. You can override these defaults by adding contigs_pattern and/or reads_pattern at the top level of config/config.yaml.
  • The discovered table is written to the pipeline output directory as contigs.tsv (e.g. results/contigs.tsv by default). Inspect this file if a sample is not matched as you expect — it shows which contigs/reads were picked for each isolate and which isolate was marked as the reference for a snippy run.
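To check the matching, you can print a single column from the table. The exact column layout of contigs.tsv comes from the pipeline, so this sketch fabricates a two-column stand-in just to make the command runnable:

```shell
# stand-in for results/contigs.tsv (real columns come from the pipeline)
printf 'isolate\tcontigs\nsampleA\tresults/sampleA/shovill/contigs.fa\n' > contigs.tsv

# print the isolate column, skipping the header
awk -F'\t' 'NR > 1 { print $1 }' contigs.tsv
```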

Dry-run the snippy workflow to check configuration:

snakemake -n snippy

Run only the snippy target:

snakemake --use-conda snippy --cores 8

5) Quick tips

  • Use snakemake -n to inspect what would run without executing anything.
  • If something fails, check logs in the logs/ directory (per-rule and per-sample logs).
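After a failure, the most recently modified log is usually the interesting one. The exact per-rule/per-sample layout under logs/ is an assumption here, so this sketch fabricates one log file to keep the command runnable:

```shell
# fabricated log layout for illustration
mkdir -p logs/shovill
printf 'example error\n' > logs/shovill/sampleA.log

# show the most recently modified log file
find logs -name '*.log' -exec ls -t {} + | head -n 1
```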

That's the minimal workflow: create the snakemake environment, edit config/config.yaml for your samples and snippy runs, run the pipeline to produce the assemblies, and then run the snippy comparative analyses.

6) Troubleshooting

Missing input files inside the container

If Snakemake complains about missing input files, make sure any paths you added in your config are also bound into the Singularity runtime in workflow/profiles/default/config.yaml:

singularity_args: "--bind /absolute/path/to/some/data/path"
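If several host paths are needed, Singularity accepts a comma-separated list after --bind, so the profile entry might look like this (paths are placeholders):

```yaml
# workflow/profiles/default/config.yaml (sketch)
singularity_args: "--bind /absolute/path/to/reads,/absolute/path/to/references"
```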

The Docker image is not accessible anymore and I need to switch to conda environments installed on the system

Go through all rules (i.e., in workflow/Snakefile and in every file under workflow/rules) and replace:

    container:
        "docker://ghcr.io/stalbrec/wgs-epipe:latest"

with:

    conda:
        "../envs/wgs.yaml"

and uncomment the line that adds the abricate-db updater as a dependency in the abricate rule (workflow/rules/abricate.smk).
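The swap can also be scripted, for example with GNU sed. This sketch rewrites a fabricated file so it is runnable anywhere; on a real checkout, point it at workflow/Snakefile and workflow/rules/*.smk instead, and review the diff afterwards since the indentation must stay valid Snakemake syntax:

```shell
# fabricated rule file standing in for workflow/rules/*.smk
mkdir -p demo
printf '    container:\n        "docker://ghcr.io/stalbrec/wgs-epipe:latest"\n' > demo/example.smk

# swap the container directive for the conda directive (GNU sed -i)
sed -i \
  -e 's|container:|conda:|' \
  -e 's|"docker://ghcr.io/stalbrec/wgs-epipe:latest"|"../envs/wgs.yaml"|' \
  demo/example.smk

cat demo/example.smk
```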
