This short guide explains how to install the pipeline, run the two main parts (assembly & database preparation, then snippy comparative analyses), and how to edit config/config.yaml for your own samples.
Clone the repository and enter it:
```
git clone https://github.com/stalbrec/WGS-EPIPE && cd WGS-EPIPE
```

Create and activate the Snakemake environment (replace the name if you prefer):

```
conda env create -f workflow/envs/snakemake.yaml -n snakemake
conda activate snakemake
```

Provide your reads by either:
- pointing `samples` to a directory with paired-end FASTQ files (named like `SAMPLE_R1*.fastq.gz` and `SAMPLE_R2*.fastq.gz`), or
- creating a tab-separated `samples.tsv` with columns `sample`, `R1`, `R2` and setting `samples` to that file.
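For the sample-table option, a minimal `samples.tsv` might look like this (tab-separated; the sample names and paths below are placeholders):

```
sample	R1	R2
sampleA	/path/to/sampleA_R1.fastq.gz	/path/to/sampleA_R2.fastq.gz
sampleB	/path/to/sampleB_R1.fastq.gz	/path/to/sampleB_R2.fastq.gz
```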
Examples (edit config/config.yaml):

```yaml
# option A: directory with FASTQ files
samples: "/path/to/your/reads"

# option B: explicit sample table
# samples: "samples.tsv"

# change where results are written
output_dir: "results"
```

Save your changes to config/config.yaml before running Snakemake.
This run will: copy/read FASTQs, perform trimming/QC, run assemblies (shovill) and prepare abricate databases.
Dry-run (recommended):

```
snakemake -n
```

Run (example using 8 cores):

```
snakemake --cores 8
```

Where to find assemblies (default):
- Assemblies: `results/<sample>/shovill/contigs.fa`
- You can change `output_dir` in `config/config.yaml`.
- Add one or more `snippy_runs` entries in `config/config.yaml`. Each run should list the isolates, the reference isolate, and `data_paths` where contigs and trimmed reads are located (for example the `results/` directory from the first part).
Example snippy_runs entry:

```yaml
snippy_runs:
  - name: myrun
    isolates: [sampleA, sampleB, sampleC]
    reference: sampleA
    data_paths: ["results/"]
```

Example with custom glob patterns (put these at the top level of config/config.yaml):
```yaml
# override the default discovery patterns for contigs and trimmed reads
contigs_pattern: "**/*contigs.fa"  # default; change to match your filenames if needed
reads_pattern: "**/*trimmed.fq"    # default; change to match your trimmed read filenames

snippy_runs:
  - name: myrun
    isolates: [sampleA, sampleB, sampleC]
    reference: sampleA
    data_paths: ["results/"]
```

Notes:
- `reference` must be one of the `isolates`.
- File matching is substring-based: the pipeline looks for files containing the isolate name (for contigs and trimmed reads) under each path in `data_paths`.
- If your contigs are produced elsewhere, set `data_paths` accordingly.
Extra note about how contigs/reads are discovered
- The Snakefile builds a table of isolates, contigs and trimmed reads using the helper `load_contigs_and_reads` and the `snippy_runs` entries in `config/config.yaml`.
- By default it searches for contig files using the glob pattern `**/*contigs.fa` and for trimmed reads using `**/*trimmed.fq`. You can override these defaults by adding `contigs_pattern` and/or `reads_pattern` at the top level of `config/config.yaml`.
- The discovered table is written to the pipeline output directory as `contigs.tsv` (e.g. `results/contigs.tsv` by default). Inspect this file if a sample is not matched as you expect: it shows which contigs/reads were picked for each isolate and which isolate was marked as the reference for a snippy run.
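The glob-plus-substring discovery described above can be pictured with a short sketch. This is an illustration only, not the pipeline's actual `load_contigs_and_reads` implementation; the function name `find_isolate_files` and its exact structure are assumptions:

```python
from pathlib import Path

def find_isolate_files(data_paths, isolates,
                       contigs_pattern="**/*contigs.fa",
                       reads_pattern="**/*trimmed.fq"):
    """Illustrative sketch: glob each data path with the configured
    patterns, then assign files to isolates by substring match."""
    table = {iso: {"contigs": [], "reads": []} for iso in isolates}
    for root in data_paths:
        for pattern, key in ((contigs_pattern, "contigs"),
                             (reads_pattern, "reads")):
            for path in Path(root).glob(pattern):
                for iso in isolates:
                    if iso in path.name:  # substring-based matching
                        table[iso][key].append(str(path))
    return table
```

Because matching is by substring, an isolate named `sample1` would also match files for `sample10`; keep isolate names unambiguous.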
Dry-run the snippy workflow to check the configuration:

```
snakemake -n snippy
```

Run only the snippy target:

```
snakemake --use-conda snippy --cores 8
```

- Use `snakemake -n` to inspect what would run without executing anything.
- If something fails, check the logs in the `logs/` directory (per-rule and per-sample logs).
That's the minimal workflow: create the snakemake environment, edit config/config.yaml for your samples and snippy runs, run the pipeline to produce assemblies and then run the snippy comparative analyses.
If Snakemake complains about missing input files, make sure any paths you added in your config are also bound to the singularity runtime in workflow/profiles/default/config.yaml:
```yaml
singularity_args: "--bind /absolute/path/to/some/data/path"
```

If the Docker image is not accessible and you need to switch to installing the conda environment on the system:
Go through all rules (i.e. in workflow/Snakefile and in all files under workflow/rules) and replace:

```
container:
    "docker://ghcr.io/stalbrec/wgs-epipe:latest"
```

with:

```
conda:
    "../envs/wgs.yaml"
```

Then uncomment the line that adds the abricate-db updater as a dependency in the abricate rule in workflow/rules/abricate.smk.
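The directive swap can also be scripted instead of edited by hand. A minimal Python sketch; the helper name `containers_to_conda` and the commented apply-loop are illustrative, not part of the repository:

```python
from pathlib import Path

def containers_to_conda(text: str) -> str:
    """Rewrite a rule file: swap the docker image reference for the local
    conda env file, then rename the directive itself."""
    text = text.replace('"docker://ghcr.io/stalbrec/wgs-epipe:latest"',
                        '"../envs/wgs.yaml"')
    return text.replace("container:", "conda:")

# To apply across the repo (run from the repository root), something like:
# for smk in Path("workflow").rglob("*.smk"):
#     smk.write_text(containers_to_conda(smk.read_text()))
```

Review the diff afterwards (e.g. `git diff workflow/`) before committing, since a blanket string replacement touches every occurrence.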