A thorough, easy-to-use resistome profiling bioinformatics pipeline for ESKAPE (Enterococcus faecium, Staphylococcus aureus, Klebsiella pneumoniae, Acinetobacter baumannii, Pseudomonas aeruginosa, and Enterobacter species) pathogens using Illumina whole-genome sequencing (WGS) paired-end reads.
The genomics era has led to the generation of sequencing data at an unprecedented rate. Many bioinformatics tools have been created to analyze this data; however, very few can be used by individuals without reasonable prior bioinformatics training.
rMAP (Rapid Microbial Analysis Pipeline) was built from pre-existing tools to automate the analysis of WGS Illumina paired-end data for the clinically significant ESKAPE group pathogens. It exhaustively decodes their resistomes while hiding the technical impediments faced by inexperienced users. Installation is fast and straightforward. A successful run generates a .html report that can be easily interpreted by non-bioinformatics personnel to guide decision making.
The rMAP pipeline toolbox is able to perform:
- Download raw sequences from NCBI-SRA archive
- Run quality control checks
- Adapter and poor quality read trimming
- De-novo assembly using shovill or megahit
- Contig and scaffold annotation using prokka
- Variant calling using freebayes and annotation using snpEff
- SNP-based phylogeny inference with Maximum-Likelihood methods using iqtree
- Antimicrobial resistance genes, plasmid, virulence factors and MLST profiling
- Insertion sequences detection
- Interactive visual .html report generation using R packages and Markdown
Install Miniconda by running the following commands:
For Linux users:
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
For macOS users:
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh
bash Miniconda3-latest-MacOSX-x86_64.sh
Then add Miniconda to your PATH and reload your shell configuration:
export PATH=~/miniconda3/bin:$PATH
source ~/.bashrc
git clone https://github.com/GunzIvan28/rMAP.git
cd rMAP
conda update -n base -y -c defaults conda
Select the appropriate installer for your computer (either rMAP-1.0-Linux-installer.yml or rMAP-1.0-macOs-installer.yml)
For Linux users:
conda env create -n rMAP-1.0 --file rMAP-1.0-Linux-installer.yml
For macOS users:
conda env create -n rMAP-1.0 --file rMAP-1.0-macOs-installer.yml
conda activate rMAP-1.0
bash setup.sh
cd && bash clean.sh
rm -rf clean.sh
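Once the installation steps have finished, a quick sanity check can confirm that the environment is active before running the pipeline. The function below is a minimal sketch and not part of rMAP: the name check_rmap_env is hypothetical, and it assumes the environment name rMAP-1.0 used above and that setup.sh places the rMAP executable on the PATH.

```shell
#!/usr/bin/env bash
# Post-install sanity check (illustrative sketch, not shipped with rMAP).
# Assumption: the conda environment is named rMAP-1.0, as in the steps above.

check_rmap_env() {
    local expected_env="${1:-rMAP-1.0}"

    # conda exports CONDA_DEFAULT_ENV with the name of the active environment.
    if [ "${CONDA_DEFAULT_ENV:-}" != "$expected_env" ]; then
        echo "environment '$expected_env' is not active; run: conda activate $expected_env" >&2
        return 1
    fi

    # Assumption: setup.sh makes the rMAP command available on PATH.
    if ! command -v rMAP >/dev/null 2>&1; then
        echo "rMAP not found on PATH; consider re-running: bash setup.sh" >&2
        return 2
    fi

    echo "rMAP environment looks ready"
}
```

Calling check_rmap_env with no arguments checks for rMAP-1.0; a different environment name can be passed as the first argument.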
rMAP -h
This is rMAP 1.0
Developed and maintained by Ivan Sserwadda & Gerald Mboowa
SYPNOSIS:
Bacterial analysis Toolbox for profiling the Resistome of ESKAPE pathogens using WGS paired-end reads
USAGE:
rMAP [options] --input <DIR> --output <OUTDIR> --reference <REF>
GENERAL:
-h/--help Show this help menu
-v/--version Print version and exit
-x/--citation Show citation and exit
OBLIGATORY OPTIONS:
-i/--input Location of the raw sequences to be analyzed by the pipeline [either .fastq or .fastq.gz]
-o/--output Path and name of the output directory
-r/--reference Path to reference genome(.gbk). Provide '.gbk' to get annotated vcf files and insertion
sequences [default="REF.gbk"]
-c/--config Install and configure full software dependencies
OTHER OPTIONS:
-d/--download Download sequences from NCBI-SRA. Requires 'list.txt' of sample ids saved at $HOME
directory
-f/--quality Generate .html reports with quality statistics for the samples
-q/--trim Trims adapters off raw reads to a phred quality score[default=27]
-a/--assembly Perform De novo assembly [default=megahit] Choose either 'shovill' or 'megahit'
-vc/--varcall Generates SNPs for each sample and a merged 'all-sample ID' VCF file to be used to infer
phylogeny in downstream analysis
-t/--threads Number of threads to use <integer> [default=4]
-m/--amr Profiles any existing antimicrobial resistance genes, virulence factors, mlsts and plasmids
present within each sample id.
-p/--phylogeny Infers phylogeny using merged all-sample ID VCF file to determine diversity and evolutionary
relationships using Maximum Likelihood(ML) in 1000 Bootstraps
-s/--pangenome Perform pangenome analysis. A minimum of 3 samples should be provided to run this option
-g/--gen-ele Interrogates and profiles for mobile genomic elements(MGE) and insertion sequeces(IS) that
may exist in the sequences
For further explanation please visit: https://github.com/GunzIvan28/rMAP
Before starting the pipeline, run the command below to install the full software dependencies and enable the full functionality of the software. This only needs to be done once.
rMAP -t 8 --config or rMAP -t 8 -c
Using a sample-ID 'list.txt' saved at $HOME, use rMAP to download sequences from NCBI-SRA
rMAP -t 8 --download
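The sample-ID file for the download step can be prepared with a small helper. This is a sketch under assumptions that the text above does not state: that list.txt holds one ID per line and that the IDs are SRA run accessions (SRR, ERR or DRR followed by digits). The function name write_sra_list and the accessions in the usage note are placeholders.

```shell
#!/usr/bin/env bash
# Write a sample-ID list for `rMAP --download` (illustrative sketch).
# Assumptions: one accession per line; IDs look like SRR/ERR/DRR + digits.

write_sra_list() {
    local out_file="$1"
    shift
    [ "$#" -ge 1 ] || { echo "no sample IDs given" >&2; return 1; }

    local id
    # Validate every ID first so a bad one never leaves a partial file behind.
    for id in "$@"; do
        if ! printf '%s\n' "$id" | grep -Eq '^[SED]RR[0-9]+$'; then
            echo "not a run accession: $id" >&2
            return 1
        fi
    done

    # One ID per line.
    printf '%s\n' "$@" > "$out_file"
}
```

For example, write_sra_list "$HOME/list.txt" SRR0000001 SRR0000002 (placeholder accessions) would create the file in the location that the --download option expects.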
Perform a full run of rMAP using
rMAP -t 8 --reference full_genome.gbk --input dir_name --output dir_name --quality --assembly shovill --amr --varcall --trim --phylogeny --pangenome --gen-ele
The short notation for the code above can be run as follows:
rMAP -t 8 -r full_genome.gbk -i dir_name -o dir_name -f -a shovill -m -vc -q -p -s -g
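Before launching a full run, it can help to confirm that the input directory holds complete read pairs and enough samples for the pangenome step, which needs at least 3. The helpers below are a sketch, not rMAP code: the function names are hypothetical, and the file naming pattern sample_1.fastq.gz / sample_2.fastq.gz is an assumption that may differ from what rMAP expects. Uncompressed .fastq files are not counted here.

```shell
#!/usr/bin/env bash
# Pre-flight checks for a full run (illustrative sketch).
# Assumption: paired reads are named <sample>_1.fastq.gz and <sample>_2.fastq.gz.

count_read_pairs() {
    local dir="$1" r1 r2 pairs=0
    for r1 in "$dir"/*_1.fastq.gz; do
        [ -e "$r1" ] || continue            # the glob matched nothing
        r2="${r1%_1.fastq.gz}_2.fastq.gz"   # expected mate file
        if [ ! -e "$r2" ]; then
            echo "missing mate for $r1" >&2
            return 1
        fi
        pairs=$((pairs + 1))
    done
    echo "$pairs"
}

# --pangenome requires a minimum of 3 samples.
can_run_pangenome() {
    [ "$1" -ge 3 ]
}
```

A possible use is pairs="$(count_read_pairs dir_name)" followed by can_run_pangenome "$pairs" to decide whether to include --pangenome in the command.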
-c | --config: Installs the R packages and other dependencies required for downstream analysis. This step is mandatory, is run only once, and must be the very first step performed before any analysis.
-i | --input: Location of the sequences to be analyzed, in either .fastq or .fastq.gz format. If the reads are not gzipped, rMAP compresses them for the user for optimization.
-o | --output: Name of the directory for the results. rMAP creates the specified folder if it does not exist.
-r | --reference: Provide the recommended reference genome in GenBank format, renamed with the extension .gbk (e.g. reference_name.gbk), which is required for variant calling. A reference in FASTA format (e.g. reference_name.fasta or reference_name.fa) can be used but will not produce annotated VCF files.
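Because the reference format determines whether annotated VCF files are produced, the file extension can be checked before a run. The function below is a sketch based only on the extensions named above; reference_kind is a hypothetical helper and rMAP may apply its own checks.

```shell
#!/usr/bin/env bash
# Classify a reference genome by file extension (illustrative sketch).
# .gbk            -> GenBank: annotated VCF files are produced
# .fasta or .fa   -> FASTA: usable, but VCF files will not be annotated

reference_kind() {
    case "$1" in
        *.gbk)          echo "genbank" ;;
        *.fasta | *.fa) echo "fasta" ;;
        *)              echo "unknown"; return 1 ;;
    esac
}
```

For instance, reference_kind full_genome.gbk prints genbank, while a .fasta or .fa file prints fasta as a reminder that the variant calls will lack annotation.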
-d | --download: Downloads sequences from the NCBI Sequence Read Archive. Create a text file 'list.txt' containing the IDs of the samples to be downloaded and save it in the $HOME directory. The downloaded samples are saved in $HOME/SRA_READS.
-f | --quality: Generates quality metrics for the input sequences, visualized as .html reports.
-q | --trim: Identifies and trims Illumina library adapters off the raw reads, and removes poor-quality reads below a Phred quality score of 27, with a minimum length of 80 bp set as the default for the software.
-a | --assembly: Performs de novo assembly of the trimmed reads. Two assemblers are available for this step: shovill or megahit. Selecting "shovill" performs genome mapping and several polishing rounds with removal of 'inter-contig' gaps to produce good quality contigs and scaffolds, but is SLOW. Selecting "megahit" produces contigs with relatively lower quality assembly metrics, but is much FASTER.
-vc | --varcall: Maps reads to the reference genome and calls SNPs, saved in VCF format. A merged 'all-sample ID' VCF file, used to infer phylogeny in downstream analysis, is also generated at this stage.
-t | --threads: Specifies the number of cores to use, as an integer. The default is 4.
-m | --amr: Provides a snapshot of the existing resistome (antimicrobial resistance genes, virulence factors, MLSTs and plasmids) present in each sample ID.
-p | --phylogeny: Uses the VCF file containing the SNPs for all of the samples combined as input, converts it into a multiple alignment FASTA file and infers phylogeny using the Maximum-Likelihood method. The trees are generated with 1000 bootstraps.
-s | --pangenome: Performs pangenome analysis for the samples using Roary. A minimum of 3 samples is required for this step.
-g | --gen-ele: Searches for any insertion sequences that may have been inserted anywhere within the genomes of the samples. These sequences are compared against a database of the commonly reported insertion sequences found in organisms from the ESKAPE group.
-h | --help: Shows the main menu.
-v | --version: Prints the software version and exits.
-x | --citation: Shows the citation and exits.
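To put the default trimming threshold in context, a Phred score can be converted to a per-base error probability with the standard Phred definition. The worked example below only illustrates what a score of 27 means; it does not describe how rMAP applies the threshold internally.

```latex
Q = -10 \log_{10} P
\quad\Longrightarrow\quad
P = 10^{-Q/10}

Q = 27:\qquad P = 10^{-2.7} \approx 2.0 \times 10^{-3}
```

A Phred score of 27 therefore corresponds to roughly one expected error in every 500 base calls, or about 99.8% base-call accuracy.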
A sample of the interactive report generated by the pipeline can be viewed at this link
Not yet published
This pipeline was written by Ivan Sserwadda (GunzIvan28) and Gerald Mboowa (gmboowa). If you want to contribute, please open an issue or a pull request and ask to be added to the project. Everyone is welcome to contribute.
This software is built on pre-existing tools. When using it, please don't forget to cite the following:
- ABRicate=1.0.1
- FastQC=0.11.9
- MultiQC=1.9
- Snippy=4.3.6
- SnpEff=4.5covid19
- AMRFinderPlus=3.8.4
- Prokka=1.14.6
- Prodigal=2.6.3
- Freebayes=1.3.2
- Unicycler=0.4.8
- Mlst=2.19.0
- Assembly-stats=1.0.1
- SRA-Tools=2.10.8
- BWA=0.7.17
- Kleborate=1.0.0
- Mafft=7.471
- Quast=5.0.2
- R-base=4.0.2
- Trimmomatic=0.39
- Megahit=1.2.9
- Parallel=20200722
- Shovill=1.0.9
- Vt=2015.11.10
- Fasttree=2.1.10
- Samclip=0.4.0
- Nextflow=20.07.1
- Any2fasta=0.4.2
- Biopython.convert=1.0.3
- Iqtree=2.0.3
- Bmge=1.12
- Tormes=1.1
- Samtools=1.9
- Roary=3.13.0
- ISmapper=2.0.1
- Cairosvg=2.4.2
The software development team works round the clock to ensure that bugs within the program are captured and fixed. For support or any inquiry, you can submit your query using the Issue Tracker.