A modular, flexible and reproducible Snakemake workflow to perform k-mers-based GWAS.
kGWASflow is a Snakemake pipeline for performing k-mers-based genome-wide association studies (GWAS), based on the method developed by Voichek et al. (2020). It performs several pre-GWAS analyses, including read trimming, quality control and k-mer counting, and implements the kmersGWAS workflow for the k-mers-based GWAS itself. The pipeline also includes post-GWAS analyses, such as mapping k-mers to a reference genome, finding and mapping the source reads of k-mers, and assembling those source reads into contigs and mapping the contigs to a reference genome. kGWASflow is highly customizable and offers users multiple options to choose from depending on their needs.
1. Clone this repository to your local machine using the command below:
git clone https://github.com/akcorut/kGWASflow.git
Alternatively, you can download and extract the source code of the latest release.
2. Change into the kGWASflow directory:
cd kGWASflow
To use this workflow, you need conda to be installed (to install conda, please follow the instructions here).
1. Install Snakemake via the mamba package manager:
Snakemake recommends using mamba to install Snakemake. More detailed information can be found in the Snakemake manual. To install mamba, use the command below:
conda install -c conda-forge mamba
After installing mamba, use the commands below to install Snakemake and the other dependencies and to activate the environment:
mamba env create -f environment.yaml
conda activate kGWASflow
1a. Alternative installation without mamba
You can also install Snakemake and the other dependencies without mamba, using only conda:
# This assumes conda is installed on your local machine or computing environment
conda env create -f environment.yaml
conda activate kGWASflow
Other options for deploying this workflow can be found in the Snakemake Workflow Catalog.
Configure the workflow according to your needs by modifying the files in the config/ folder.
- config/config.yaml is a YAML file containing the workflow configuration.
- config/samples.tsv is a TSV file containing the sample information.
- config/phenos.tsv is a TSV file containing the phenotype information.
For more information, please click here.
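Before running anything, it can help to confirm that the three configuration files listed above are actually in place. The following is a minimal sketch, not part of kGWASflow itself: the function name `check_kgwasflow_config` is hypothetical, and it only checks that the files exist, not that their contents are valid.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with kGWASflow): report which of the
# expected configuration files are missing from a config directory.
# The three file names are the ones listed in this README.
check_kgwasflow_config() {
    local config_dir="${1:-config}"   # default to ./config
    local missing=0
    local name
    for name in config.yaml samples.tsv phenos.tsv; do
        if [ ! -f "${config_dir}/${name}" ]; then
            echo "missing: ${config_dir}/${name}" >&2
            missing=1
        fi
    done
    # Return 0 when all files are present, 1 otherwise.
    return "${missing}"
}
```

From inside the kGWASflow directory, `check_kgwasflow_config config` returns a non-zero status and prints the missing paths if any of the three files is absent.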
After changing into the kGWASflow directory and activating the kGWASflow conda environment, you can start using the workflow as below:
1. Test your configuration by performing a dry run:
snakemake -n --use-conda
2. Run the workflow and install software dependencies:
snakemake --cores all --use-conda
If you want to run the workflow with a different config.yaml file, you can use the --configfile parameter to specify it:
snakemake --use-conda --configfile <path/to/config.yaml>
The usage of this workflow is also described in the Snakemake Workflow Catalog.
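The dry run and the real run can be chained so that the workflow only starts if the dry run succeeds. The sketch below is one possible wrapper and is not part of kGWASflow: the function name `run_kgwasflow` and the `SNAKEMAKE` override variable are assumptions made for illustration, while the `snakemake` flags are the ones shown above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not shipped with kGWASflow): perform a dry run and
# start the real run only if the dry run succeeds.
# SNAKEMAKE is an illustrative override for the executable name.
run_kgwasflow() {
    local snakemake_bin="${SNAKEMAKE:-snakemake}"
    # Step 1: dry run; extra arguments (e.g. --configfile) are passed through.
    if ! "${snakemake_bin}" -n --use-conda "$@"; then
        echo "Dry run failed; not starting the workflow." >&2
        return 1
    fi
    # Step 2: real run with all available cores.
    "${snakemake_bin}" --cores all --use-conda "$@"
}
```

For example, `run_kgwasflow --configfile <path/to/config.yaml>` would pass the same alternative configuration file to both steps.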
kGWASflow was developed by Adnan Kivanc Corut.
If you use kGWASflow in your research, please cite using the DOI: 10.5281/zenodo.7290926 and the original method paper by Voichek et al. (2020):
Kivanc Corut. akcorut/kGWASflow: v1.0.0. (2022). https://doi.org/10.5281/zenodo.7290926
Voichek, Y., Weigel, D. Identifying genetic variants underlying phenotypic variation in plants without complete genomes.
Nat Genet 52, 534–540 (2020). https://doi.org/10.1038/s41588-020-0612-7
kGWASflow is licensed under the MIT license.