
kGWASflow

A modular, flexible and reproducible Snakemake workflow to perform k-mer-based GWAS.

Summary

kGWASflow is a Snakemake pipeline for performing k-mer-based genome-wide association studies (GWAS) using the method developed by Voichek et al. (2020). It performs several pre-GWAS analyses, including read trimming, quality control and k-mer counting, and it implements the kmersGWAS method workflow to carry out the k-mer-based GWAS itself. The pipeline also includes post-GWAS analyses, such as mapping k-mers to a reference genome, finding and mapping the source reads of k-mers, and assembling those source reads into contigs and mapping the contigs to a reference genome. kGWASflow is highly customizable and offers multiple options that users can choose from depending on their needs.


Installation

Step 1: Obtain the latest release of this workflow

1. Clone this repository to your local machine using the command below:

git clone https://github.com/akcorut/kGWASflow.git

Alternatively, you can download and extract the source code of the latest release.

2. Change into the kGWASflow directory:

cd kGWASflow

Step 2: Install Snakemake and the other dependencies

To use this workflow, you need conda installed (see the conda installation instructions).

1. Install Snakemake via mamba package manager:

The Snakemake developers recommend installing Snakemake with mamba; more detailed information can be found in the Snakemake manual. To install mamba, you can use the command below:

conda install -c conda-forge mamba

After installing mamba, use the commands below to create and activate the kGWASflow environment, which contains Snakemake and the other dependencies:

mamba env create -f environment.yaml
conda activate kGWASflow

1a. Alternative installation without mamba

You can also install Snakemake and the other dependencies with conda alone, without mamba:

# This assumes conda is installed in your local machine or computing environment
conda env create -f environment.yaml
conda activate kGWASflow
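Steps 1 and 1a differ only in which tool creates the environment. The snippet below is a minimal sketch of a hypothetical helper (create_kgwasflow_env and pick_env_tool are illustrative names, not part of kGWASflow) that uses mamba when it is found on the PATH and otherwise falls back to conda. It assumes you run it from the kGWASflow directory, where environment.yaml lives.

```shell
# Hypothetical helpers, NOT part of kGWASflow.
# Print "mamba" if mamba is available, otherwise "conda".
pick_env_tool() {
    if command -v mamba >/dev/null 2>&1; then
        echo "mamba"
    else
        echo "conda"
    fi
}

# Create the kGWASflow environment from environment.yaml
# (assumes the current directory is the kGWASflow checkout).
create_kgwasflow_env() {
    local tool
    tool="$(pick_env_tool)"
    "$tool" env create -f environment.yaml
}
```

After it finishes, run conda activate kGWASflow in your interactive shell; conda activate generally relies on conda's shell integration, so it may not work from inside a non-interactive script.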

Other Options:

Other ways to deploy this workflow are described in the Snakemake Workflow Catalog.


Configuration

Configure the workflow according to your needs by modifying the files in the config/ folder.

  • config/config.yaml is a YAML file containing the workflow configuration.

  • config/samples.tsv is a TSV file containing the sample information.

  • config/phenos.tsv is a TSV file containing the phenotype information.

For more information, please see the configuration documentation.
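Before a dry-run, it can help to confirm that the three files listed above are present and that the TSV files are well formed. The snippet below is a rough sketch of a hypothetical check_kgwasflow_config function (not part of kGWASflow). It only checks that each file exists and that every non-empty row in samples.tsv and phenos.tsv has the same number of tab-separated fields as its header line; it does not validate column names or values against the workflow's own schema.

```shell
# Hypothetical pre-flight check, NOT part of kGWASflow.
# Usage: check_kgwasflow_config [config_dir]   (defaults to "config")
check_kgwasflow_config() {
    local dir f tsv
    dir="${1:-config}"
    # 1) The three configuration files must exist.
    for f in config.yaml samples.tsv phenos.tsv; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f" >&2
            return 1
        fi
    done
    # 2) Every non-empty row must have as many tab-separated fields as the header.
    for tsv in samples.tsv phenos.tsv; do
        if ! awk -F '\t' '
            BEGIN   { bad = 0 }
            NR == 1 { n = NF; next }
            NF == 0 { next }
            NF != n {
                printf "%s: line %d has %d fields, expected %d\n", FILENAME, NR, NF, n
                bad = 1
                exit
            }
            END { exit bad }
        ' "$dir/$tsv" >&2; then
            return 1
        fi
    done
    echo "config OK"
}
```

For example, check_kgwasflow_config config prints "config OK" when both checks pass and reports the first problem on standard error otherwise.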


Usage

After changing into the kGWASflow directory and activating the kGWASflow conda environment, you can start using the workflow as below:

1. Test your configuration by performing a dry-run

snakemake -n --use-conda 

2. Run the workflow and install software dependencies

snakemake --cores all --use-conda

If you want to run the workflow with a different config.yaml file, you can use the --configfile parameter to specify it:

snakemake --cores all --use-conda --configfile <path/to/config.yaml>

The usage of this workflow is also described in the Snakemake Workflow Catalog.
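The two usage steps can be chained so that the full run only starts when the dry-run succeeds. The snippet below is a minimal sketch of a hypothetical run_kgwasflow wrapper (not part of kGWASflow) that uses only the Snakemake flags shown above; its optional argument is passed on as --configfile.

```shell
# Hypothetical wrapper, NOT part of kGWASflow.
# Usage: run_kgwasflow [path/to/config.yaml]
run_kgwasflow() {
    # Turn the optional argument into "--configfile <path>", or into nothing.
    if [ -n "${1:-}" ]; then
        set -- --configfile "$1"
    else
        set --
    fi
    # Step 1: dry-run (-n) only lists the jobs; stop here if it fails.
    snakemake -n --use-conda "$@" || return 1
    # Step 2: the real run, using all available CPU cores.
    snakemake --cores all --use-conda "$@"
}
```

With no argument the workflow's default configuration is used; run_kgwasflow path/to/other_config.yaml runs both steps with the alternative file instead.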


Authors

kGWASflow was developed by Adnan Kivanc Corut.


Citation

If you use kGWASflow in your research, please cite it using the DOI 10.5281/zenodo.7290926, along with the original method paper by Voichek et al. (2020):

Kivanc Corut. akcorut/kGWASflow: v1.0.0. (2022). https://doi.org/10.5281/zenodo.7290926

Voichek, Y., Weigel, D. Identifying genetic variants underlying phenotypic variation in plants without complete genomes.
Nat Genet 52, 534–540 (2020). https://doi.org/10.1038/s41588-020-0612-7

License

kGWASflow is licensed under the MIT license.