Estimating Pangenomes with Roary

Farrah Sitto¹ and Fabia U. Battistuzzi*,¹,²

¹ Department of Biological Sciences, Oakland University, Rochester, MI
² Center for Data Science and Big Data Analytics, Oakland University, Rochester, MI
*Corresponding author: E-mail: battistu@oakland.edu.
Associate editor: Barry G. Hall

Abstract
A description of the genetic makeup of a species based on a single genome is often insufficient because it ignores the variability in gene repertoire among multiple strains. Estimating the pangenome of a species addresses this issue by providing an overview of the genes that are shared by all strains and of those present in only some of the genomes. These different sets of genes can then be analyzed functionally to explore correlations with unique phenotypes and adaptations. This protocol presents the usage of Roary, a Linux-native pangenome application. Roary is straightforward software that provides 1) an overview of core and accessory genes for those interested in general trends and 2) detailed information on gene presence/absence in each genome for in-depth analyses. Results are provided in both text and graphic formats.
Key words: Roary, pangenome, core genes, accessory genes.

© The Author(s) 2019. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com. Open Access
Mol. Biol. Evol. 37(3):933–939 doi:10.1093/molbev/msz284 Advance Access publication December 17, 2019

Protocol

The concept of a pangenome, the collection of all genes found across the strains of a species, was first introduced by Tettelin et al. (2005) and has been applied selectively to investigate genomic variability at the species level in a few tens of species, both prokaryotes and eukaryotes (Vernikos et al. 2015; McInerney et al. 2017). Since then, the applicability of the pangenome concept has grown alongside the exponential increase in sequenced genomes of subspecies lineages (e.g., strains, isolates, subspecies). Knowing the pangenome of a species is valuable for (i) guiding sequencing efforts to identify new, unexplored genetic diversity within a species (represented by an open pangenome), (ii) providing information on shared and unique traits of strains within a species (exemplified by core and accessory genes), and, more recently, (iii) identifying species boundaries (represented by a high frequency of core genes).

These large-scale applications of a pangenome require fast and accurate software that can analyze tens or hundreds of lineages in a reasonable amount of computational time. One such program is Roary (Page et al. 2015), Linux-native software that takes GFF3 (General Feature Format version 3) files (easily obtainable from NCBI) as input and outputs a series of files with statistics on genes shared by all or most lineages (core and soft-core genes) or by only some genomes (accessory genes, further subdivided into shell and cloud genes). Roary is complemented by Python scripts and other software that produce a graphical view of the results.

Although other software is available for pangenome reconstruction, such as PGAP, PanX, get_homologues, and PanTools (Zhao et al. 2012; Contreras-Moreira and Vinuesa 2013; Sheikhizadeh et al. 2016; Ding et al. 2018), we found Roary to be the simplest and most flexible to use and, therefore, a good starting point for newcomers to pangenome analyses. The potentially most challenging aspect of using Roary is its command-line interface, which doubles as a strength because it makes Roary easy to integrate into computational pipelines or large-scale analyses. Many online resources teach the basics of the Linux command line, such as https://ryanstutorials.net/linuxtutorial/commandline.php (last accessed December 9, 2019) and https://maker.pro/linux/tutorial/basic-linux-commands-for-beginners (last accessed December 9, 2019); these will help users follow the step-by-step process to install and use Roary described below.

Protocol

Step 1: Installation of Roary

Roary is Linux-native software that can be installed on Linux, MacOSX, and Windows machines in a variety of ways. In this section, we provide a series of commands to install Roary in a Linux environment (see Step 5 for installation on other operating systems). The easiest way to run Roary is to install it in a Linux environment using the package manager "conda," which is part of the Anaconda distribution. The same procedure also works on MacOSX and in the Windows Subsystem for Linux with very minor modifications (see Step 5).

The first step is to download Anaconda (https://www.anaconda.com/distribution/; last accessed December 9, 2019) for the appropriate operating system, selecting the most recent version of Python that is supported and updated regularly (currently Python 3.7; e.g., for Linux: Anaconda3-2019.03-Linux-x86_64.sh). Open a terminal window and type bash ~/Downloads/Anaconda3-2019.03-Linux-x86_64.sh (if the file was downloaded to a different directory, change ~/Downloads to the correct location). Press
Enter to start the installation and the space bar to scroll through the license agreement. You will be prompted to accept the default installation location by pressing Enter (or to change it), and the installation will start (it can take a minute or so before progress appears on the screen). Finally, answer "yes" to initialize Anaconda3 by running conda init; at the end, you will see "Thank you for installing Anaconda!" Enter the command source ~/.bashrc for the installation to take effect. These instructions can also be found at https://docs.anaconda.com/anaconda/install/linux/ (last accessed December 9, 2019). To test the installation, type conda -V in the Linux terminal; it will return the version of conda you just installed. Once conda has been installed correctly, the next step is to create an environment in which Roary will run. This can be achieved with the following command at the command prompt (shown in Linux as $): conda create --name Roaryenv (you can use any name for the environment instead of Roaryenv). To work within this environment, you will need to activate it (this step must be repeated every time you open a new terminal window): source activate Roaryenv (in recent versions of conda, conda activate Roaryenv is equivalent).

Next, install Roary in your newly created environment with the following five commands (four conda config commands that add the required channels, followed by conda install):

    conda config --add channels r
    conda config --add channels defaults
    conda config --add channels conda-forge
    conda config --add channels bioconda
    conda install roary

To check whether the installation was successful, type roary -h to display the list of parameters Roary uses (fig. 1). The location in which Roary is installed does not have to be the one that will contain your input and output files. We suggest creating a separate directory to hold the input files and the output files.

FIG. 1. Roary help file. The list of options available to complete an analysis with Roary is shown with the command roary -h.

Step 2: Input Files

The input files for Roary must be in GFF3 (General Feature Format version 3) format. This format stores information in a specific order that must be followed strictly for Roary to accept the input file (see https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md; last accessed December 9, 2019, for a description of the format). There are two primary ways to obtain GFF3 files: from the NCBI website or by using the software Prokka to convert .fna files into GFF3 (see Step 5). An easy way to obtain the input files without installing additional software is to download genome *.gbff files from NCBI and then run the bp_genbank2gff3.pl script. This Perl script is installed along with Roary and can be found in the "bin" directory of the Roary conda environment. It is also available through BioPerl (https://bioperl.org/INSTALL.html; last accessed December 9, 2019) and can be easily run in the terminal window. Note that for this script to work, Perl needs to be installed on the system you are using (https://www.activestate.com/products/activeperl/downloads/; last accessed December 9, 2019). For example, suppose you are interested in estimating the pangenome of three strains of Bifidobacterium animalis: A6, KLDS2.0603, and RH. From the Genome function in NCBI (https://www.ncbi.nlm.nih.gov/genome; last accessed December 9, 2019), you can browse by organism and search for B. animalis. The individual assemblies can
be visualized by selecting "Prokaryotes." After identifying the strains of interest, select the GenBank FTP site on the right-hand side and download the *.gbff.gz file for each of them (fig. 2).

FIG. 2. How to obtain input files in GFF3 format from NCBI. Links to proceed to download are circled in red.

Next, move all the downloaded gbff files into a single directory (if you used a Windows machine to download the files, upload them to the Linux machine) and, from the terminal, issue the command perl bp_genbank2gff3.pl *.gbff.gz.

If you are using the Perl script within the Roary environment, you will need to specify the path to the script (e.g., perl /home/Roaryenv/bin/bp_genbank2gff3.pl). To identify the path of this Perl script, use the command which bp_genbank2gff3.pl. If your current working directory is not the one containing the gbff files, either navigate into that directory and use the above command or add the path of the directory before the "*" (e.g., perl /home/Roaryenv/bin/bp_genbank2gff3.pl /home/Roary/Inputs/*.gbff.gz). This command creates one output file with the extension *.gff for each input file. These will be the input files for Roary.

Step 3: Parameters and Commands

Roary can be run with a single short command: roary *.gff (remember to activate the Roary environment [source activate Roaryenv] every time you open a new terminal window).

This command runs Roary with default parameters (see below) from within a directory that contains all the GFF3 files obtained in Step 2. All output files will be written to this same directory, which could make downstream analyses more difficult. To specify an output directory, add the option -f to the command: roary -f output_dir *.gff (where output_dir is user-defined).

Options in Roary fall broadly into three categories: file access, analysis settings, and visualization. The "file access" settings are the least likely to need modification. They include those that allow users to set the location of inputs/outputs and the location (path) of the software that Roary depends on. Roary requires mcl, blastp, mcxdeblast, and makeblastdb, which are installed along with Roary within the conda environment; however, users can point Roary to other installations of these programs if preferred. Additionally, users can provide directory names for outputs (option -f).

The "analysis settings" parameters allow users to refine the sensitivity of the analysis itself to identify core and accessory genes. These are most likely the parameters that users will want to modify to explore the robustness of the results. For computational speed, the -p option sets the number of threads used during the computation. Many modern computers have multiple cores, each able to run multiple threads, so setting this parameter to a value greater than 1 (e.g., roary -f output_dir -p 10 *.gff) is likely to speed up the analysis. For the pangenome calculation, the two most important parameters are the percentage of isolates in which a gene must be present to be defined as core (-cd; default 99%) and the minimum percentage identity for the sequence comparisons performed by BlastP (-i; default 95%). Decreasing the isolate threshold will increase the number of core genes identified, and increasing the minimum identity will partition the genes into more, smaller clusters.

Finally, Roary has a series of options to visualize results. The standard option, which requires no additions to the previous command, produces a series of text outputs (see Step 4). If the user desires additional graphical output, the option -r can be added to produce plots using R.
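
The effect of the -cd threshold on the core gene count can be illustrated with a short Python sketch (Python being the language of Roary's companion scripts, such as roary_plots.py). It classifies each gene cluster from the number of genomes that contain it. This is a simplified illustration, not Roary's implementation: the function names are hypothetical, the toy counts are invented, and the soft-core (95%), shell (15%), and cloud boundaries are assumed defaults.

```python
from collections import Counter
from typing import Dict, Iterable

# ASSUMED category boundaries (besides the core threshold), taken to match the
# ranges Roary reports: soft core >= 95%, shell >= 15%, cloud < 15%.
# Illustration only; this is not Roary's own code.
SOFT_CORE_PCT = 95.0
SHELL_PCT = 15.0


def classify(n_present: int, n_genomes: int, core_pct: float = 99.0) -> str:
    """Return the category of one gene cluster.

    n_present: number of genomes that contain the cluster.
    core_pct:  analogue of Roary's -cd option (default 99%).
    """
    if not 0 < n_present <= n_genomes:
        raise ValueError("n_present must be between 1 and n_genomes")
    pct = 100.0 * n_present / n_genomes
    if pct >= core_pct:
        return "core"
    if pct >= SOFT_CORE_PCT:
        return "soft core"
    if pct >= SHELL_PCT:
        return "shell"
    return "cloud"


def summarize(presence_counts: Iterable[int], n_genomes: int,
              core_pct: float = 99.0) -> Dict[str, int]:
    """Count how many clusters fall into each category."""
    return dict(Counter(classify(n, n_genomes, core_pct) for n in presence_counts))


if __name__ == "__main__":
    # Hypothetical toy data: 100 genomes; five clusters present in
    # 100, 98, 96, 40 and 3 of them.
    counts = [100, 98, 96, 40, 3]
    print(summarize(counts, 100))               # {'core': 1, 'soft core': 2, 'shell': 1, 'cloud': 1}
    print(summarize(counts, 100, core_pct=95))  # {'core': 3, 'shell': 1, 'cloud': 1}
```

Lowering the core threshold from 99% to 95% moves the two clusters found in 98 and 96 genomes from soft core to core, which is the trend described above. In this simplified scheme, a core threshold at or below 95% leaves the soft-core category empty; how Roary itself reports soft-core genes in that case is not covered by this sketch.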

Plotting with -r requires R and ggplot2 to be installed. Note that the graphs can also be obtained after the results have been produced, because Roary outputs R-formatted files in addition to text files. Finally, one of the most useful visualization-related features is the possibility of creating alignments of core genes (options -e and -n). Such files are potentially important for downstream analyses, including phylogenetic tree reconstruction and SNP identification. Additional visualization tools are provided as separate scripts and packages (e.g., roary_plots.py) that can be found on the main Roary website (https://sanger-pathogens.github.io/Roary/; last accessed December 9, 2019).

Step 4: Interpretation of Output Files

A simple run of Roary produces 17 output files, of which summary_statistics.txt and gene_presence_absence.csv are the most important. The summary_statistics text file reports the number of genes in each of four categories (core, soft core, shell, and cloud) as well as the total number of genes in the pangenome. These values effectively describe the nature of the pangenome of the species analyzed. The gene_presence_absence file provides additional information, including the individual IDs of the sequences that belong to each gene cluster. The category of each cluster is not stated explicitly in this file, but it can be easily inferred from the ratio between the number of genomes in which the cluster is present and the total number of genomes analyzed.

Other output files (starting with "number_of_") provide information specific to each category (i.e., core or accessory). Note that these outputs (e.g., number_of_conserved_genes.Rtab) report the results of ten random orderings of the input files. This is important because pangenome calculations vary depending on the order in which genomes are added, and results obtained from multiple orders make it possible to establish minimum and maximum boundaries around the core and accessory gene estimates. These files are provided in R format to facilitate downstream analyses. One example of such an analysis is to plot curves of the number of core and accessory genes to determine whether the pangenome is closed or open (Tettelin et al. 2005). This can be easily done using the Rtab outputs from Roary and the create_pan_genome_plots.R script (available in the Roary conda environment). Unfortunately, no statistical analysis is carried out automatically on the curves, but it can be done separately, for example, by fitting an exponential curve and calculating its distance to the empirical curve by least squares, or by using Heaps' law (Tettelin et al. 2008).

To view results graphically, two outputs (ending in _graph.dot) provide information on the relative position of genes that belong to the accessory or core categories. These files can be visualized using the open-source software Gephi (www.gephi.org; last accessed December 9, 2019) and can be useful, for example, to investigate patterns in gene clusters such as operons.

An interesting additional feature of Roary is the possibility of comparing different pangenomes to identify genes that are uniquely present in one set of strains and not in others. This kind of analysis can be done by calling query_pan_genome -a difference --input_set_one 1.gff,2.gff --input_set_two 3.gff,4.gff -g clustered_proteins (where the *.gff files are the names of the genomes of interest in the two subsets). Finally, the same query_pan_genome function can be used to output genes that are unique, shared by all, or shared by some of the strains (e.g., query_pan_genome -a union -g clustered_proteins *.gff).

A good description of all the output files created by Roary is available in the supplementary material of the Roary publication (Page et al. 2015) and, in less detail, on the GitHub page (https://sanger-pathogens.github.io/Roary/; last accessed December 9, 2019).

Step 5: Installation on MacOSX or Windows and Use of Prokka

Installation on MacOSX

Download Anaconda3 (https://www.anaconda.com/distribution/; last accessed December 9, 2019) for MacOSX and select the most recent version of Python that is supported and updated regularly (currently Python 3.7; e.g., Anaconda3-2019.03-MacOSX-x86_64.sh). Follow the instructions at https://docs.anaconda.com/anaconda/install/mac-os/ (last accessed December 9, 2019), which are very similar to those for the Linux operating system. Once conda is installed, follow the instructions given for Linux (see Step 1) to create a Roary environment and install Roary.

Installation on Windows

Because Roary is native Linux software, it cannot run directly in Windows. There are two ways of running Roary on a Windows machine: first, Windows 10 users (version 1709 and later) can install the Windows Subsystem for Linux; second, Roary can run within a virtual machine. For the first scenario, launch Control Panel > Programs and Features > Turn Windows Features on or off and check "Windows Subsystem for Linux." Then, open the Microsoft Store, search for "Linux," and select the desired Linux distribution (in this tutorial we use Ubuntu). Install and launch the new distribution and follow the prompts to complete the installation process in the command-line window (https://docs.microsoft.com/en-us/windows/wsl/install-win10 and https://docs.microsoft.com/en-us/windows/wsl/initialize-distro; both last accessed December 9, 2019). To use Roary within your new Linux distribution, follow the instructions described above for the Linux installation. For the second scenario, download the VirtualBox installer (https://www.virtualbox.org/wiki/Downloads; last accessed December 9, 2019). Double click the executable and proceed with the installation. Also download the virtual machine (VM) created by the authors of Roary from ftp://ftp.sanger.ac.uk/pub/pathogens/pathogens-vm/pathogens-vm.latest.ova. After starting VirtualBox, go to
File → "Import appliance" and select the VM (*.ova file) you downloaded. To start the VM, click on the green arrow icon; a new window will open showing the VM desktop. On the left-hand side, click on the terminal window icon (fig. 3) and type sudo apt-get install virtualbox-guest-utils (the password is manager).

FIG. 3. Virtual machine environment to run Roary on Windows. Circled in red are the icons to start the virtual machine and the terminal window.

To use Roary within the VM, follow the Linux installation instructions. However, this requires that files be shared between the host (Windows) and the VM. To achieve this, a shared directory has to be created and used to exchange files. On the Windows machine, go to the Anaconda website and download the Linux version as shown in Step 1. Save this file in a directory you will share with the VM. Then, switch to the VM, select Devices → "Shared folders" → "Shared folder settings" and click on the "Add folder" icon on the right-hand side. Provide the path of the location of the Anaconda installer, assign a name to the shared folder (e.g., RoaryVM) and a path where it will be mounted (e.g., /mnt/share/), and check "auto mount" and "make permanent" to ensure that the folder will be recognized upon restart of the VM. Then, in the VM terminal, type sudo mkdir /mnt/share/ (the password is again manager) and then sudo mount -t vboxsf RoaryVM /mnt/share/. If the shared folder is not visible, repeat the mounting command.

The contents of the shared directory are now visible from the VM (ls /mnt/share/) and can be used to proceed with a normal Roary installation for Linux. Input and output files for Roary can be exchanged through the shared folder if the path is provided at the command line (e.g., roary -f /mnt/share/RoaryVM/output /mnt/share/RoaryVM/input/*.gff).

Prokka to Create Input Files

An alternative to converting gbff files into input files for Roary is to use Prokka. This is particularly useful when gbff files are not yet available, as may be the case for sequencing projects that are in progress. First, using the terminal in Linux (or in MacOSX or Windows), type conda install -c conda-forge -c bioconda prokka. To check whether Prokka was installed correctly, type prokka -h and the menu options of Prokka will be listed.

Next, download the *_genomic.fna.gz files from NCBI for the strains of interest, extract them, and upload these uncompressed files to the Linux/MacOSX/Windows machine. In the terminal window, type the following command: prokka --kingdom Bacteria --outdir prokka_GCA_XXXXX --genus YYYYY --locustag GCA_XXXXX GCA_XXXXX_ASMZZZZZ_genomic.fna

Here, XXXXX is the genome accession number, ZZZZZ is the assembly number of the strain, and YYYYY is its genus (e.g., for one of the three B. animalis strains mentioned in Step 2: prokka --kingdom Bacteria --outdir prokka_GCA_000816205 --genus Bifidobacterium --locustag GCA_000816205 GCA_000816205.1_ASM81620v1_genomic.fna). Repeat for all the strains (each strain takes a few minutes to process). Each run produces multiple output files, one of which is the GFF3 file required by Roary.

Applications of a Pangenome

The concept of a pangenome has become useful in many different fields, from classification to genome evolution. The original and most typical application of a pangenome analysis is to identify the cumulative curve of genetic variability that can be attributed to a species as more and more individual genomes are sequenced. In a sense, this way of analyzing prokaryotes (or viruses) mirrors basic population genetic studies in eukaryotes, where the sequencing of multiple individuals is necessary to understand the range of polymorphisms within a species (Muzzi and Donati 2011; Nguyen et al. 2015). Indeed, pangenome approaches are starting to be used in read-mapping software to account for polymorphisms that would otherwise be lost or lead to errors in read alignments (Nguyen et al. 2015; Eggertsson et al. 2017). In the case of the pangenome, gene counts are used as a proxy for genetic variability, with genes unique and new to each strain adding to the overall genetic makeup of a species. The expectation is that, as the number of strains analyzed grows, the number of new genes will approach 0 and the total size of the pangenome will stabilize (reaching a plateau in an initially exponential curve), leading to the definition of a closed pangenome (Tettelin et al. 2005, 2008). If the plateau is not observed, the pangenome is defined as open, and additional genomes will need to be sequenced to estimate the total genetic complement of the species.

Tettelin et al. (2008) proposed comparing the new genes' accumulation curve with Heaps' law to determine statistically whether a pangenome is open or closed. However, even with this statistical framework, it is not possible to evaluate the functional weight, if any, of each new gene; therefore, their biological importance or evolutionary driving force remains unknown. In other words, new genes identified in a strain may not be maintained within that genome over long evolutionary time frames (because of selective or neutral forces; McInerney et al. [2017], but see also Rodriguez-Valera and Ussery [2012]) and, therefore, may not effectively contribute to the long-term genetic makeup of the species. Additionally, considering the very small number of sequenced genomes available compared with predicted species numbers (Locey and Lennon 2016), it is possible that a newly sequenced strain will reopen a currently closed pangenome.

A more recent application of pangenomes is to better define the concept of species in prokaryotes (Moldovan and Gelfand 2018). Defining prokaryotic species boundaries is a long-standing issue that, for now, has been approached using DNA similarity thresholds (e.g., average nucleotide identity measures; Jain et al. 2018). However, a pangenome approach has the advantage of adding an evolutionary perspective by considering not only identity (-i parameter in Roary) but also orthology/paralogy and gene flow (Bobay and Ochman 2017; Moldovan and Gelfand 2018).

Finally, pangenome results can be used to investigate the correlation between the spread of some genes and the traits they encode. A companion program, Scoary (Brynildsrud et al. 2016), works with Roary's outputs to identify those genes (core or accessory) that are associated with specific traits. Such analyses could explain current trait distributions and the evolutionary history of those traits (Abreo and Altier 2019).

Alternative Resources

The number of programs that can estimate a pangenome is growing. Originally, Roary was compared with a few other programs, such as PGAP, and was shown to be computationally more efficient (in speed and memory usage) while producing comparable results (Page et al. 2015). Other tools that have been developed since Roary was released include PGAP-X, PanTools, and panX (Sheikhizadeh et al. 2016; Ding et al. 2018; Zhao et al. 2018). PGAP-X is unique in its visualization features, which allow users to view the alignment of multiple genomes at once. PanTools, instead, fills a unique niche because it is built to analyze eukaryotic genomes (with genes that have introns and exons), which Roary cannot analyze. Finally, panX differs from Roary because it can analyze genomes with higher genomic diversity between them, whereas Roary is recommended for highly similar (within-species) genomes.

Acknowledgments

We thank Cody Clark and Victoria Hall for installation and testing of Roary. This work was supported by the National Institute of General Medical Sciences at the National Institutes of Health (R15GM121981 to F.U.B.) and the National Aeronautics and Space Administration (NNX16AJ30G to F.U.B.).

References

Abreo E, Altier N. 2019. Pangenome of Serratia marcescens strains from nosocomial and environmental origins reveals different populations and the links between them. Sci Rep. 9(1):46.
Bobay L-M, Ochman H. 2017. Biological species are universal across life's domains. Genome Biol Evol. 9(3):491–501.
Brynildsrud O, Bohlin J, Scheffer L, Eldholm V. 2016. Rapid scoring of genes in microbial pan-genome-wide association studies with Scoary. Genome Biol. 17(1):238.
Contreras-Moreira B, Vinuesa P. 2013. GET_HOMOLOGUES, a versatile software package for scalable and robust microbial pangenome analysis. Appl Environ Microbiol. 79(24):7696–7701.
Ding W, Baumdicker F, Neher RA. 2018. panX: pan-genome analysis and exploration. Nucleic Acids Res. 46(1):e5.
Eggertsson HP, Jonsson H, Kristmundsdottir S, Hjartarson E, Kehr B, Masson G, Zink F, Hjorleifsson KE, Aslaug J, Adalbjorg J, et al. 2017. Graphtyper enables population-scale genotyping using pangenome graphs. Nat Genet. 49(11):1654–1660.
Jain C, Rodriguez-R LM, Phillippy AM, Konstantinidis KT, Aluru S. 2018. High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries. Nat Commun. 9(1):5114.
Locey KJ, Lennon JT. 2016. Scaling laws predict global microbial diversity. Proc Natl Acad Sci U S A. 113(21):5970–5975.
McInerney JO, McNally A, O'Connell MJ. 2017. Why prokaryotes have pangenomes. Nat Microbiol. 2(4):17040.
Moldovan MA, Gelfand MS. 2018. Pangenomic definition of prokaryotic species and the phylogenetic structure of Prochlorococcus spp. Front Microbiol. 9:428.
Muzzi A, Donati C. 2011. Population genetics and evolution of the pan-genome of Streptococcus pneumoniae. Int J Med Microbiol. 301(8):619–622.
Nguyen N, Hickey G, Zerbino DR, Raney B, Earl D, Armstrong J, Kent WJ, Haussler D, Paten B. 2015. Building a pan-genome reference for a population. J Comput Biol. 22(5):387–401.
Page AJ, Cummins CA, Hunt M, Wong VK, Reuter S, Holden MTG, Fookes M, Falush D, Keane JA, Parkhill J. 2015. Roary: rapid large-scale prokaryote pan genome analysis. Bioinformatics 31(22):3691–3693.
Rodriguez-Valera F, Ussery DW. 2012. Is the pan-genome also a pan-selectome? F1000Res. 1:16.
Sheikhizadeh S, Schranz ME, Akdel M, de Ridder D, Smit S. 2016. PanTools: representation, storage and exploration of pan-genomic data. Bioinformatics 32(17):i487–i493.
Tettelin H, Masignani V, Cieslewicz MJ, Donati C, Medini D, Ward NL, Angiuoli SV, Crabtree J, Jones AL, Durkin AS, et al. 2005. Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: implications for the microbial "pan-genome." Proc Natl Acad Sci U S A. 102(39):13950–13955.
Tettelin H, Riley D, Cattuto C, Medini D. 2008. Comparative genomics: the bacterial pan-genome. Curr Opin Microbiol. 11(5):472–477.
Vernikos G, Medini D, Riley DR, Tettelin H. 2015. Ten years of pan-genome analyses. Curr Opin Microbiol. 23:148–154.
Zhao Y, Sun C, Zhao D, Zhang Y, You Y, Jia X, Yang J, Wang L, Wang J, Fu H, et al. 2018. PGAP-X: extension on pan-genome analysis pipeline. BMC Genomics 19(1 Suppl):36.
Zhao Y, Wu J, Yang J, Sun S, Xiao J, Yu J. 2012. PGAP: pan-genomes analysis pipeline. Bioinformatics 28(3):416–418.
