Overview

Given a set of genome fasta files, generate primer candidates (for qPCR) that is specific to each genome.

Approach

Focus on coding sequences - first annotate each genome to extract protein coding sequences.
Concatenate every genome fasta file to make an “excluded” genome. For every genome G, primers specific to G should not bind to G’, the excluded genome.
Index each genome and its cognate excluded genome using bowtie2.
Propose primer candidates with parameters suitable for qPCR using primer3. Format the forward and reverse primers as “reads”
Align these primer “read” files to both the genome and the excluded genome.
Filter out primer pairs with non specific binding. I am using bowtie2 with a default alignment range of [0,5000]. This is meant to filter out short confordant alignments.

Installation and setup notes

Setup conda with python 3.11 and snakemake 8.11.3. My initial choice of using python3.11 is affecting a lot of downstream decisions, because most conda packages still require python<3.11.
Installed the snakemake executor plugin for slurm to work
I am using two custom conda environments.
- bakta has weird dependency issues with python 3.11
  - bakta also has weird issues currently with its reliance on amrfinderplus. I have had to comment out the parts of the bakta code which handle this.
- bowtie2 was easier to install in a separate environment for some reason.
I am using bakta because I have not been able to install prokka with my current setup.
Finally, this pipeline is setup to use slurm for parallelizing tasks.
Applied this patch to get a local env woring https://github.com/snakemake/snakemake/compare/main…ShogoAkiyama:snakemake:conda-url-env-bug
bakta performance is affected by a lot of network IO (oschwengers/bakta#282). Typical runs I’ve observed are ~30 minutes.

[2024-12-07 Sat]

For gene finding I am using (meta)prodigal for viruses, and genemark-et for eukaryotic (fungal genomes).
- The latter two don’t get filtered in any way, though I could possibly rely on VAPID for viral annotations.
- It appears that viral genomes typically have polyprotein encoding CDSs, so the number of total CDSes might end up being low.

Usage

Create a directory, say RUNNAME.
Modify config.yaml to point run_path to RUNNAME.
In RUNNAME/genome/ copy the individual genome FASTA files that are the targets for primer generation.
If using slurm modify the supplied sbatch file to reflect the partition parameters, and run using sbatch run_primergen.sbath

If running from the commandline, use something like

snakemake --snakefile primergen.snakefile -p\
 --rerun-incomplete\
 --cores 4\
 --use-conda\
 --configfile config.yaml\
 -j 100\
 --keep-incomplete --latency-wait 60

Name		Name	Last commit message	Last commit date
Latest commit History 17 Commits
data/templates		data/templates
img		img
src		src
.gitignore		.gitignore
config.yaml		config.yaml
primergen.snakefile		primergen.snakefile
readme.org		readme.org
run_primergen.sbatch		run_primergen.sbatch

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Overview

Approach

Installation and setup notes

Usage

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Overview

Approach

Installation and setup notes

Usage

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages