0% found this document useful (0 votes)
6 views13 pages

Cannabis sativa Genetic Insights

Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
6 views13 pages

Cannabis sativa Genetic Insights

Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 13

ORIGINAL RESEARCH

published: 21 December 2018


doi: 10.3389/fpls.2018.01876

Latitudinal Adaptation and Genetic


Insights Into the Origins of Cannabis
sativa L.
Qingying Zhang 1*† , Xuan Chen 1† , Hongyan Guo 1 , Luisa M. Trindade 2 ,
Elma M. J. Salentijn 2 , Rong Guo 1 , Mengbi Guo 1 , Yanping Xu 1 and Ming Yang 1*
1
Industrial Crops Research Institute, Yunnan Academy of Agricultural Sciences, Kunming, China, 2 Wageningen University
and Research Plant Breeding, Wageningen University and Research, Wageningen, Netherlands

Cannabis is one of the most important industrial crops distributed worldwide. However,
the phylogeographic structure and domestication knowledge of this crop remains poorly
understood. In this study, sequence variations of five chloroplast DNA (cpDNA) regions
were investigated to address these questions. For the 645 individuals from 52 Cannabis
accessions sampled (25 wild populations and 27 domesticated populations or cultivars),
Edited by: three haplogroups (Haplogroup H, M, L) were identified and these lineages exhibited
Donald Lawrence Smith,
McGill University, Canada distinct high-middle-low latitudinal gradients distribution pattern. This pattern can most
Reviewed by: likely be explained as a consequence of climatic heterogeneity and geographical
Yoshiko Shimono, isolation. Therefore, we examined the correlations between genetic distances and
Kyoto University, Japan
geographical distances, and tested whether the climatic factors are correlated with the
Umesh K. Reddy,
West Virginia State University, cpDNA haplogroup frequencies of populations. The “isolation-by-distance” models were
United States detected for the phylogeographic structure, and the day-length was found to be the most
*Correspondence: important factor (among 20 BioClim factors) that influenced the population structures.
Qingying Zhang
hempzhangqingying@126.com; Considering the distinctive phylogeographic structures and no reproductive isolation
Ming Yang among members of these lineages, we recommend that Cannabis be recognized as
ymhemp@163.com
a monotypic genus typified by Cannabis sativa L., containing three subspecies: subsp.
† These authors have contributed sativa, subsp. Indica, and subsp. ruderalis. Within each haplogroup which possesses a
equally to this work
relatively independent distribution region, the wild and domesticated populations shared
Specialty section:
the most common haplotypes, indicating that there are multiregional origins for the
This article was submitted to domesticated crop. Contrast to the prevalent Central-Asia-Origin hypothesis of C. saltiva,
Crop and Product Physiology,
molecular evidence reveals for the first time that the low latitude haplogroup (Haplogroup
a section of the journal
Frontiers in Plant Science L) is the earliest divergent lineage, implying that Cannabis is probably originated in low
Received: 19 August 2018 latitude region.
Accepted: 05 December 2018
Keywords: Cannabaceae, industrial hemp, genetic diversity, phylogeography, cpDNA
Published: 21 December 2018

Citation:
Zhang Q, Chen X, Guo H,
Trindade LM, Salentijn EMJ, Guo R,
INTRODUCTION
Guo M, Xu Y and Yang M (2018)
Latitudinal Adaptation and Genetic
Cannabis is one of the oldest crops and has been distributed worldwide by humans. This plant
Insights Into the Origins of Cannabis may have been utilized for at least 10,000 years (Schultes et al., 1974; Long et al., 2016), and its
sativa L.. Front. Plant Sci. 9:1876. cultivation in China can be traced back to around 6,000 years ago according to the archaeological
doi: 10.3389/fpls.2018.01876 findings and records of ancient literatures (Li, 1974; Yang, 1991). Cannabis has been developed

Frontiers in Plant Science | www.frontiersin.org 1 December 2018 | Volume 9 | Article 1876


Zhang et al. Genetic Structure and Origins of Cannabis sativa

as a multi-purpose crop, which is widely used for the production ranging from about 23 to 51◦ N, 80 to 125◦ E. China has been
of biomaterials such as textile, paper, construction, and insulation a major hemp growing country with the largest cultivation area
materials, but also as functional foods, namely the oil and seeds, and has developed many landraces and cultivars. China is part of
and for other applications including cosmetics and personal care the potential center of origin for cannabis, with abundant genetic
products, and in the pharmaceutical industry. The global market resources in wild populations but also developed cultivars, thus
for hemp has been estimated to consist of more than 25,000 provides a unique opportunity to investigate the domestication
products (Johnson, 2013; Salentijn et al., 2015). In recent years, origin of cannabis plants. However, the wild populations of
the hemp industry has increasingly received attention and the cannabis have been poorly studied, and the genetic diversity and
development of high value products has been the main focus structure of these populations, as well as the relationships among
of various studies (Amaducci et al., 2015). Especially, cannabis the wild populations and the domesticated cultivars remains
plants can produce more than 100 pharmacologically active largely unknown.
compounds (cannabinoids), with the most studied compounds Chloroplast DNA (cpDNA) markers and phylogeographic
being tetrahydrocannabinol (THC) and cannabidiol (CBD), and methods have been proven to be very useful tools in investigating
CBD has sparked an increasing interest for product development. genetic diversity, population structure, domestication origin,
Based on the content of cannabinoids in this herbaceous and historical context of species (Avise, 2000, 2004, 2009).
annual crop, cannabis plants have been often classified as hemp, The cpDNA is a haploid (and thus are homoplasmic), non-
mostly referring to a fiber crop with low tetrahydrocannabinol recombining genome that is maternally inherited in most
(THC) and marijuana, the drug type with often high THC angiosperms (Schaal et al., 1998; Avise, 2009). However, like
content. This plant comprises both wild and domesticated many other plant species, cannabis cpDNA displayed very low
populations which can be either dioecious or monoecious genetic diversity (Gilmore et al., 2007; Zhang et al., 2017;
cultivars. The flowering is very sensitive to photoperiod Mcpartland and Hegman, 2018). A key to successful utilization
and cultivars can be early-, intermediate-, and late-ripening. of cpDNA markers for estimating diversity and phylogenetic
Compared to the domesticated cannabis, the wild forms usually relationships among populations of Cannabis species requires
exhibit the following distinct morphological and physiological obtaining sufficient genetic variation in cpDNA and developing
features: remarkably smaller seeds (mature achene, thousand suitable cpDNA markers. In this study, based on scrutinizing
seed weight <10 g), easy seed shattering behavior (seeds readily differences in the whole chloroplast genomes DNA sequences
disarticulate from the pedicel), long-term seed dormancy and of four Cannabis accessions (Oh et al., 2016; Vergara et al.,
the need for cold-moist stratification treatment to facilitate 2016), we developed five DNA markers for the most variable
germination. For a long time, researchers have disputed the polymorphic regions and investigated the genetic diversity of
taxonomy of Cannabis regarding the definitions of species, an extensive set of Cannabis samples. These samples include
subspecies, and/or varieties (McPartland and Guy, 2004, 2014; wild populations, representative landraces and breeding cultivars
Hillig, 2005; Gilmore et al., 2007; Small, 2015). The issue of from China, as well as some accessions from other countries
Cannabis taxonomy continues to puzzle botanical taxonomists (The Netherlands, France, Hungary, Italy, Russia, Nigeria, Korea,
(Piomelli and Russo, 2016; Welling et al., 2016; Mcpartland and and USA). Our main objectives were: (1) to estimate the genetic
Guy, 2017; Mcpartland and Hegman, 2018). Linnaeus named diversity and elucidate the distribution patterns of the wild and
Cannabis sativa L. (hereafter as C. sativa) as a unique species. domesticated cannabis from China; (2) to determine the main
Later on, two species, C. indica Lam. (1785) and C. ruderalis factors that affected the spatial distribution of cannabis and
Jan. (1924), were split from C. sativa based on certain distinct provide information on historical processes of this plant; (3)
morphological Characteristics (Hillig and Mahlberg, 2004), while to infer the genetic relationships between the populations or
Small and coauthors recommended retaining only one species lineages, as well as domestication origins of cannabis cultivars in
(C. sativa) but including two subspecies, subsp. sativa and subsp. China.
indica, where each subspecies includes both domesticated and
wild varieties (Small and Cronquist, 1976; Small, 2015). Recently, MATERIALS AND METHODS
based on allozyme analysis results, Hillig (2005) suggested a
taxonomic concept of three species (C. sativa, C. indica, and C. Plant Material
ruderalis) including seven putative taxa in the genus Cannabis. The studied material comprised 645 Cannabis individuals
Germplasm collections of Cannabis are the most valuable (derived from 52 accessions: 25 wild populations and 27
fundamental materials for breeding as they are a potential source domesticated populations or cultivars), and four closely related
of novel genes controlling important traits such as increased seed out group species, Humulus scandens, Humulus yunnanensis,
productivity, improved qualitative characteristics for example Humulus lupulus, and Aphananthe aspera. Information relevant
fiber quality, or resistance to adverse environmental factors such to the samples is shown in Table 1.
as cold, drought, strong wind, and pest/disease pressure. The Twenty-five wild populations represented by 430 individuals
native distribution range of Cannabis is commonly believed to be were collected from 2011 to 2016. These populations covered
in Central Asia, Siberia, the Himalayas, and possibly extending the only distribution ranges of extant wild Cannabis throughout
into China (de Candolle, 1885; Vavilov, 1926; Li, 1974; Hillig, China: Inner Mongolia, Jilin, Liaoning, Shandong, Xinjiang,
2005; Small, 2015; Mcpartland and Hegman, 2018). Currently, Tibet, and Yunnan provinces or regions. The population
the distribution of cannabis covers most of the Chinese territory, size of wild Cannabis is generally ranging from hundreds to

Frontiers in Plant Science | www.frontiersin.org 2 December 2018 | Volume 9 | Article 1876


Zhang et al. Genetic Structure and Origins of Cannabis sativa

TABLE 1 | Sample information and summary of haplotype distribution, genetic diversity for each population based on the combined five cpDNA regions.

Code/Name Origin/location Type No. Latitude (◦ N) Haplotypes (Nh) Hd π (×10−2 )

EG Inner Mongolia, China W 20 50.21 H1(19), H2(1) 0.100 ± 0.088 0.025 ± 0.020
HE Inner Mongolia, China W 27 49.28 H3(17), H6(10) 0.484 ± 0.054 0.013 ± 0.013
YK Inner Mongolia, China W 20 49.25 H3(19), H4(1) 0.100 ± 0.088 0.003 ± 0.006
JL Jilin, China W 13 45.02 H3(9), H4(4) 0.462 ± 0.110 0.013 ± 0.014
AL Xinjiang, China W 20 48.20 H1(20) 0.000 0.000
HG Xinjiang, China W 20 44.21 H1(13), H9(7) 0.479 ± 0.072 0.357 ± 0.188
YN Xinjiang, China W 24 43.84 H1(10), H9(14) 0.507 ± 0.045 0.379 ± 0.196
KS Xinjiang, China W 10 43.68 H1(1), H2(9) 0.200 ± 0.154 0.149 ± 0.089
XH Inner Mongolia, China W 25 43.78 H1(18), H5(7) 0.420 ± 0.082 0.290 ± 0.153
TL Inner Mongolia, China W 10 43.58 H3(7), H4(3) 0.467 ± 0.132 0.013 ± 0.014
MN Xinjiang, China W 20 43.35 H9(20) 0.000 0.000
NL Xinjiang, China W 22 43.25 H1(22) 0.000 0.000
ZL Inner Mongolia, China W 12 42.96 H5(12) 0.000 0.000
ZW Liaoning, China W 16 42.66 H3(11), H4(3), H8(2)* 0.508 ± 0.126 0.016 ± 0.015
CH Inner Mongolia, China W 16 42.26 H3(2), H6(14)* 0.233 ± 0.126 0.007 ± 0.009
SD Shandong, China W 19 36.25 H4(2), H7(17)* 0.199 ± 0.112 0.110 ± 0.064
GJ Tibet, China W 8 29.88 H10(8) 0.000 0.000
BM Tibet, China W 8 29.87 H9(4), H10(4) 0.571 ± 0.095 0.126 ± 0.079
XZ Tibet, China W 25 29.68 H9(21), H10(4) 0.280 ± 0.101 0.062 ± 0.039
MK Tibet, China W 8 29.58 H5(8) 0.000 0.000
DQ Yunnan, China W 15 28.47 H10(1), H12(14) 0.133 ± 0.112 0.052 ± 0.035
DX Yunnan, China W 16 28.15 H10(16) 0.000 0.000
DM Yunnan, China W 16 27.90 H9(16) 0.000 0.000
XG Yunnan, China W 19 27.49 H5(19) 0.000 0.000
XL Yunnan, China W 21 27.15 H9(10), H10(11) 0.524 ± 0.036 0.116 ± 0.067
C445 Heilongjiang, China L 10 50.25 H3(5), H4(5) 0.556 ± 0.075 0.015 ± 0.016
C448 Heilongjiang, China L 11 48.01 H4(11) 0.000 0.000
C254 Inner Mongolia, China L 16 43.48 H3(12), H4(1), H9(2), H11(1)* 0.442 ± 0.145 0.136 ± 0.078
C564 Xinjiang, China L 10 43.37 H9(10) 0.000 0.000
C261 Inner Mongolia, China L 9 40.42 H5(1), H9(5), H21(1)*, H22(2)* 0.694 ± 0.147 0.095 ± 0.061
C187 Gansu, China L 11 39.71 H4(4), H9(4), H10(1), 13(1)*, H14(1) 0.782 ± 0.095 0.337 ± 0.186
JinMa1 Shanxi, China B 11 37.3 H4(2), H9(9) 0.327 ± 0.153 0.190 ± 0.109
C274 Xinjiang, China L 11 37.16 H9(11) 0.000 0.000
C467 Qinghai, China L 10 36.43 H9(7), H19(2)*, H20(1)* 0.511 ± 0.164 0.213 ± 0.122
C468 Shandong, China L 10 36.13 H1(9), H2(1) 0.200 ± 0.154 0.006 ± 0.008
C292 Gansu, China L 10 36.03 H9(8), H14(1), H17(1)* 0.378 ± 0.181 0.135 ± 0.081
C224 Anhui, China L 11 31.45 H3(11) 0.000 0.000
C666 Tibet, China L 10 29.72 H10(8), H18(2) 0.356 ± 0.159 0.069 ± 0.046
C269 Tibet, China L 8 29.71 H9(4), H10(4) 0.571 ± 0.095 0.126 ± 0.079
C290 Guizhou, China L 10 26.87 H10(10) 0.000 0.000
C001 Yunnan, China L 10 25.60 H10(10) 0.000 0.000
C218 Guangxi, China L 10 24.15 H5(10) 0.000 0.000
YunMa7 Yunnan, China B 10 23.36 H23(10)* 0.000 0.000
Kompolti Hungary B 8 H1(8) 0.000 0.000
Futura75 France B 10 H1(7), H15(2)*, 16(1)* 0.511 ± 0.164 0.093 ± 0.059
Afghanica The Netherlands (70% B 2 H9(2)
indica, 30% sativa)
Dame The Netherlands (80% B 2 H9(2)
Blanche indica, 20% sativa)
Purple Kush USA (http://genome.ccbr. B 1 H9(1)
utoronto.ca/cgi-
bin/hgGateway)

(Continued)

Frontiers in Plant Science | www.frontiersin.org 3 December 2018 | Volume 9 | Article 1876


Zhang et al. Genetic Structure and Origins of Cannabis sativa

TABLE 1 | Continued

Code/Name Origin/location Type No. Latitude (◦ N) Haplotypes (Nh) Hd π (×10−2 )

Carmagnola Italy (https://www.ncbi.nlm. B 1 H12(1)


nih.gov/)
Dagestani Russia (https://www.ncbi. B 1 H24(1)*
nlm.nih.gov/)
Yoruba Nigeria, Africa (https://www. B 1 H25(1)*
Nigeria ncbi.nlm.nih.gov/)
Cheungsam Korea (https://www.ncbi. B 1 H1(1)
nlm.nih.gov/)
Humulus Liaoning and Anhui, China O 2
scandens
Humulus Yunnan, China O 1
yunnanensis
Humulus Czech (https://www.ncbi. O 1
lupulus nlm.nih.gov/)
Aphananthe China (https://www.ncbi. O 1
aspera nlm.nih.gov/)

W, wild; L, Landrace (domesticated, locally adapted, traditional variety); B, Breeding (cultivar selected by humans for desirable traits); O, Out group; No., sample size; Hd, haplotype
diversity; π, nucleotide diversity; Nh, number of haplotype; * , private haplotypes.

FIGURE 1 | Geographic location of the 43 populations of Cannabis analyzed in the present study and haplogroup distribution patterns of Cannabis (see Table 1 for
population codes); population codes in black represent the wild samples and blue ones are the domesticated accessions. (B) The haplotype network generated from
the 25 haplotypes of Cannabis; pie chart size corresponds to the sample size of each population (A) or haplotype (B).

several thousand individuals. Healthy leaves were collected in within each population, individuals growing at least 10 m apart
the field and immediately dried with silica gel until DNA were randomly sampled and in addition, eight to thirty plants
extractions. To increase the possibility of detecting variation were sampled from the edges and the interior of populations,

Frontiers in Plant Science | www.frontiersin.org 4 December 2018 | Volume 9 | Article 1876


Zhang et al. Genetic Structure and Origins of Cannabis sativa

depending on the actual population size. For domesticated


Nh populations, 27 cultivars represented by 215 individuals were

25
7

6
included. Eighteen cultivars (188 individuals) from China were
obtained from the Industrial Crops Research Institute, Yunnan
0.521
Academy of Agricultural Sciences, and two European hemp

0.679

0.614

0.529

0.775

0.848
cultivars, Kompolti and Futura75, were obtained from Hungary
Hd

and France, respectively. About 200 seeds from each cultivar


were planted and during the flowering stage leaves were sampled
for DNA extraction. Additionally, two marijuana materials
Total informative

(named Afghanica and Dame Blanche) from The Netherlands


characteristics

were used, whereas sequence data for another five cultivars

23
(Purple Kush, Carmagnola, Dagestani, Yoruba Nigeria, and
6

4 Cheungsam) were downloaded from GenBank and The Cannabis

Cf, the forward primer for Cannabis; Cr, the reverse primer for Cannabis; Hf, the reverse primer for Humulus. The four Indels: AAATATT; GAATTGAAAAAAAAA; TATATTAAAA; AAAAAT.
Genome Browser website (http://genome.ccbr.utoronto.ca/cgi-
bin/hgGateway) (Table 1).
For the 43 hemp populations originating from China [25
wild populations (W) and 18 domesticated cultivars (L and
No. of Indels

B)] (Table 1, Figure 1), the sampled regions throughout China


spanned an area from 50.25◦ to 23.36◦ N and from 79.44◦ to
0

126.08◦ E, with an altitude span from about 50 m above sea level


in Anhui (C224) to 3,700 m in Tibet (MK).

DNA Extraction, Primer Development, PCR


substitutions

Amplification, and Sequencing


No. of

Total DNA of each sample was extracted from leaf material


19
6

according to the modified CTAB method (Doyle, 1991; Chen


TABLE 2 | Primer pairs of cpDNA regions used in this study and polymorphism on the 645 individuals of Cannabis.

et al., 2015).
To develop genetic markers for population genetic analyses,
we first tested 17 universal primer sets, developed for
length (bp)

1081–1084

3616–3645

amplification of highly variable chloroplast DNA regions of


sequence

813–822

389–420

604-620

angiosperms, on six individuals from different wild Cannabis


720

populations. However, Cannabis individual sequences generated


from these primers are too conserved to obtain variable sites
suitable for population-level studies despite repeated tests (Zhang
temperature (◦ C)

et al., 2017). Based on comparisons of the four available whole


Annealing

chloroplast genomes from cultivars of C. sativa (Oh et al.,


58

55

55

53

58
47

2016; Vergara et al., 2016), we developed five pairs of PCR


primers targeting several highly variable chloroplast regions
(rps16; psaI-accD; rps11-rps8; rpl32-trnL; ndhF-rpl32). These
new primers are suitable for the population genetic study of
Cannabis and its closest relative Humulus (Table 2). Due to
accD Cr : GCTCCATGCTTTCTCTCCTCCTTTG
rps16Hf : GCAGAGAAAAAAAAGATTCTAATCC
rps16Cf : TTAAAATAGCAGAGAAAAGATTAT

psaI Cf : GAACATGAAGAGATAAAGAAACC
rps16Cr : AAACGATGTGGTAGAAAGCAAC

unsuccessful PCR amplification of the rps16 region of Humulus


rps8 Cr : CGCTTCCCACATTAGTTAATCCC

trnL Cr :: AGAAAATGCCATGCCGCTACTC

rpL32-R: CCAATATCCCTTYYTTTTCCAA

rpL32 Cr :: CTGCCCAATATCCTTTCTTTT
ndhF: GAAAGGTATKATCCAYGMATATT
rps11 Cf :GGGCCTACAGCCATTATGTG

rpl32 Cf : CGACAAATTCTATTAGATAGA

species, a specific forward primer for the genus Humulus was


ndhF Cf : GGTATAATCCATGAATACTG

designed.
PCR amplification reactions were carried out in a total
Primers sequence (5’ - 3’)

volume of 25 µL, containing 2.0 µL DNA template (20–


30 ng/µL), 2.5 µL 10 × PCR reaction Buffer (with Mg2+ ),
1.5 µL dNTPs mix (2.5 mmol/L), 0.5 µL each forward and
reverse primers (10 µmol/L), 0.3 µL Taq DNA polymerase (5
U/µL, Beijing TransGen Biotech Co., Ltd., China), and 17.7 µL
double-distilled water. Amplifications were conducted on an ABI
Veriti Thermal Cycler (Applied Biosystems, Foster City, CA,
-

USA) using the following program setting: an initial 4 min pre-


denaturation at 94◦ C, followed by 30 cycles of 45 s at 94◦ C, 30 s
regions
cpDNA

rps11-

at 47–58◦ C (Table 2), 45–90 s at 72◦ C, and a final 10 min at


rpl32-

ndhF-
rps16

accD

rpl32
psaI-

Total
rps8

trnL

72◦ C.

Frontiers in Plant Science | www.frontiersin.org 5 December 2018 | Volume 9 | Article 1876


Zhang et al. Genetic Structure and Origins of Cannabis sativa

The obtained PCR products were purified with a Gel on geographical coordinates and haplotype distribution data
Extraction and PCR Purification Combo Kit (Beijing Tsingke of Cannabis populations from China. Different hierarchical
BioTech Co., Ltd., China) and then bidirectional sequencing levels of genetic variation including within populations, among
was performed on an ABI 3730xl DNA Analyzer (Applied populations within groups and among groups were assessed by
Biosystems, Foster City, CA, USA) employing the same primers the analysis of molecular variance (AMOVA) implemented in
used for PCR amplifications. All sequences of the rps16, psaI- Arlequin v 3.5.2.2 (Excoffier and Lischer, 2010), with significance
accD, rps11-rps8, rpl32-trnL, and ndhF-rpl32 cpDNA regions assessed by 1,000 permutations on the 43 populations from
have been deposited in GenBank under the accession numbers China. The 43 populations were grouped into three population
from MG731579 through MG731614. groups (Group H, Group M, and Group L) by SAMOVA based
on variation in cpDNA or into two morphology groups by
Observation of Main Phenological and morphological and physiological features (the wild Group and
Morphological Traits domesticated Group) where the population genetic structure and
To test whether there are obvious differences among the 43 the domestication pattern for Cannabis in China were assessed.
accessions (including both wild and domesticated germplasms) Indices of nucleotide diversity (π) and haplotype diversity (Hd)
on phenotypic characteristics, we also carried out a Varieties were calculated for each population, population groups, and for
Evaluation Field Trial in 2016 involving all 43 accessions. The all samples combined, using Arlequin v 3.5.2.2. Also, Tajima’s
trial site was located in Kunming, Yunnan province of China. D and Fu’s Fs neutrality tests were conducted. Levels of gene
This trial was set up as a randomized complete block design with flow (Fst and Nm) were measured using DNASP v5.10. Mantel
three replicates and each plot was 6 m2 , with a distance between tests were conducted to examine the correlation between two
rows of 40 cm (with a density about 50 plant individuals per matrixes (genetic distances and pairwise geographical distances
square meter). The plots were directly seeded at a depth of 3– or latitude differences) with 9,999 permutations using GenALEx
5 cm on May 28 in 2016 and all wild-type seeds were pretreated v 6.5 (Peakall and Smouse, 2012).
to facilitate germination before sowing about 10 days, and the To identify the main climatic factors affecting the distribution
whole trial was managed with normal management practices. of the Cannabis genetic lineages, we also tested correlations of
Main phenological and morphological traits for each accession 20 bioclimatic factors on a compilation of cpDNA haplogroup
were investigated, including initiation of flowering, full flowering, frequencies for 43 populations. The values of 19 BioClim
seed full maturity time, stem diameter, plant height, and number variables were extracted by using DIVA-GIS v7.5 (http://www.
of branches. These data were collected based on 20 individuals diva-gis.org/) based on the global climate layer data (at 2.5
randomly selected for each plot (10 individuals for female and arc-min resolution) downloaded from the WorldClim v2.0
male respectively). database (http://www.worldclim.org/), and the mean day length
of cannabis growth season (from the Spring Equinox to
Data Analysis Autumnal equinox) were calculated according to solar geometry
Raw sequence data of the five amplified DNA fragments (Spitters et al., 1986; Yuan et al., 2014) for 43 sampling
(amplicons) were assembled with SeqMan (DNAStar Inc., sites. The correlation between environmental variables and
Madison, WI, USA) and carefully checked for genetic variation haplogroup frequencies was analyzed by redundancy analysis
together with the chromatograms. Sequences were aligned (RDA). We first assessed the effects of all 20 climatic factors
using the CLUSTAL W (Thompson et al., 1994) followed by on haplogroup frequencies distribution. And then, to identify a
manual adjustment implemented in MEGA 6.0 (Tamura et al., minimum subset of climatic variables that significantly explain
2013). Small insertion/deletion events (indels), excluding long variation of genotype spatial distribution, we further tested the
mononucleotide repeats (poly A/T or poly G/C), were counted multicollinearity in the whole data set, and the redundant factors
as single mutations. The haplotypes for each gene marker, and in (variance inflation factors, VIF > 10) were excluded through
the combined five-fragment dataset matrix, were identified using stepwise regression. To explore the percent variance uniquely
DNASP v5.10 (Librado and Rozas, 2009). explained by each factor, the Analysis of Variance (ANOVA)
Based on the combined five-fragment dataset, the was calculated. RDA and ANOVA analyses were performed
relationships among haplotypes were reconstructed by median- using the vegan package in R version 3.3.1 (R Core Team,
joining (MJ) network method (Bandelt et al., 1999) implemented 2015).
in the software NETWORK v5.0.0.1 (available at http://www. Phylogenetic relationships based on the cpDNA haplotypes
fluxus-engineering.com) with the maximum parsimony (MP) were deduced using MrBayes ver. 3.1.2 (Ronquist and
post-processing option. Huelsenbeck, 2003). Using the sequences from Aphananthe
To detect genetic diversity and population structure, we and Humulus species as out groups, the divergence times for
carried out the following analyses. The distribution of three the major groups of these haplotypes were further estimated
haplogroups (identified by phylogenetic tree and network) with BEAST v1.8.1 (Drummond and Rambaut, 2007) with
was plotted on maps of China using ArcGIS v 10.2 (ESRI GTR + G selected by MrModeltest 2.3 as the best substitution
Inc., Redlands, CA, USA). To define the most differentiated model for the data set (Nylander, 2004). The data was analyzed
groups of populations we performed a spatial analysis of using a relaxed log-normal clock model and a Yule Process
molecular variance (SAMOVA) using the software SAMOVA v speciation model for the tree priors. As the earliest fossil species
2.0 (available at http://cmpg.unibe.ch/software/samova2/) based of Aphananthe was reported around 66–72.1 million years ago

Frontiers in Plant Science | www.frontiersin.org 6 December 2018 | Volume 9 | Article 1876


Zhang et al. Genetic Structure and Origins of Cannabis sativa

FIGURE 2 | (A) Bayesian phylogenetic tree based on cpDNA data. (B) Divergence time estimated for the major clades of Cannabis by the BAEST analysis (Blue bars
indicate the 95% highest posterior density credibility for node ages).

(Ma) from the Maastrichtian (66–72.1 Ma) in late Cretaceous Distribution of cpDNA Haplotypes,
(Ervín et al., 1986), the stem age of the Aphananthe was set to Phenotypic Characteristics and Genetic
66 Ma based on the low boundary age (node A in Figure 2B).
Diversity
Prior settings for calibrating node were: offset of 66 Ma, a log
In the haplotype network (Figure 1B), the 25 haplotypes were
mean of 1.0 (log stdev of 0.5). Two independent runs were
split into three distinct haplogroups: Haplogroup H (blue
conducted for 10 million generations. Log files resulted from the
colored), Haplogroup M (red colored), and Haplogroup L
two runs were combined using LogCombiner after the first 25%
(Green colored). Haplogroup H contained 13 haplotypes (H1,
were discarded as burn-ins, and the convergence of the chains
H2, H3, H4, H6, H8, H11, H14, H15, H16, H20, H24, H25),
was checked in Tracer v1.6 (Rambaut et al., 2014). Similarly,
Haplogroup M contained 5 haplotypes (H9, H12, H13, H19,
the resulted trees were combined in LogCombiner, and the
H21), and Haplogroup L contained 7 haplotypes (H5, H7, H10,
maximum clade credibility (MCC) tree was produced with Tree
H17, H18, H22, H23). The phylogenetic tree (Figure 2A) also
Annotator, and then viewed in FigTree v1.4.2.
exposed three well-supported lineages corresponding to the
above-mentioned three haplogroups illustrated by the haplotype
RESULTS network. The haplotypes are not evenly distributed for each
haplogroup (Figure 1B): In Haplogroup H, the two most
Sequence Characteristics and common haplotypes, H1 (40.1%) and H3 (34.7%), were observed
Identification of cpDNA Haplotypes in 15 out of the 20 sampled populations north of 40◦ N. For
We successfully obtained high quality sequences for all the five Haplogroup M, the most common haplotype H9 (89.7%) and
target cpDNA genes (rps16, psaI-accD, rps11-rps8, rpl32-trnL, other 4 rare haplotypes were found in the area ranging from
ndhF-rpl32) for each of the 640 Cannabis individual plants. 27◦ to 43◦ N. In Haplogroup L, seven haplotypes, including the
Five additional sequences of Cannabis lines were retrieved two major haplotypes H5 (34.3%) and H10 (46.4%), were mainly
from the published chloroplast genomes (Table 1). In total, the distributed throughout the area south of 30◦ N.
combined alignment of the five cpDNA fragments (five-gene As the lineages displayed distinct structure by network
matrix) covered 3,635 base pairs in length, and harbored 19 single analyses (Figure 1B) and structural phylogeographic distribution
nucleotide polymorphisms (SNPs) and four indels varying up to patterns (Figure 1A), SAMOVA analysis (based on a simulated
38 bp in length (Table 2), thus the proportion of variable sites was annealing method) was performed to define groups of
1.57%. The indel mutations introduced by long mononucleotide populations. The result showed that when k = 3 the differences
repeats (poly A/T or poly G/C) were excluded from the analysis. between groups (F CT = 0.64) was the highest, and the 43
The AT content was 70.5% and a total of 25 haplotypes (H1–H25) populations from China were divided into three groups: Group
were identified based on the genetic variation found among the H, Group M, and Group L (Figure 1A). Group H included 16
645 samples. For the rps16 intron, and four intergenic spacers populations mainly from the high latitude region: EG, HE, YK,
(psaI-accD, rps11-rps8, rpl32-trnL, and ndhF-rpl32), the sequence JL, AL, HG, NL, XH, TL, ZW, CH, C445, C448, C254, C468,
length and polymorphic informative characters are shown in and C224. This group largely corresponds to haplogroup H.
Table 2. Group M also included 16 populations but from the middle

Frontiers in Plant Science | www.frontiersin.org 7 December 2018 | Volume 9 | Article 1876


Zhang et al. Genetic Structure and Origins of Cannabis sativa

TABLE 3 | ANOVA analyses between BioClim variables and the three cpDNA Yoruba Nigeria). Haplotype diversity (Hd) and nucleotide
haplogroup frequencies for 43 Cannabis populations. diversity (Π) of each population are summarized in Table 1. The
Variables Full name Df Variance F Pr (>F)
domesticated population C187 possessed the highest haplotype
diversity (Hd = 0.782) and nucleotide diversity (Π = 0.00337),
MDL Mean day length (Spring 1 0.208027 29.0255 0.001*** while the lowest number of haplotypes (Nh = 1; Hd = 0; Π =
Equinox-Autumnal eq uinox) 0) were found in 18 other populations, including domesticated
bio2 Mean diurnal range [mean of 1 0.009598 1.3391 0.256 accessions and wild populations. Among the wild populations,
monthly (max temp–min temp)] the BM population had the highest haplotype diversity (Hd =
bio8 Mean temperature of wettest 1 0.051886 7.2395 0.002** 0.571), YN population had the highest nucleotide diversity (Π
quarter
= 0.00379), and ZW had the highest number of haplotypes (Nh
bio13 Precipitation of wettest month 1 0.043932 6.1297 0.004**
= 3, Hd = 0.508).
bio14 Precipitation of driest month 1 0.002271 0.3169 0.756
bio15 Precipitation seasonality 1 0.00323 0.4506 0.64
(coefficient of variation) ISOLATION BY DISTANCE AND CLIMATIC
Residual 36 0.258014 CORRELATES OF CPDNA LINEAGES
(* p < 0.05; ** p < 0.01; *** p < 0.001). FREQUENCY
To examine whether the observed genetic distributions are
latitude region: YN, KS, MN, BM, XZ, DQ, DM, XL, C564, C261, correlated to geographical localization, Mantel tests were
C187, JinMa 1, C274, C467, C269, and C292, corresponding to performed. Between Nei’s pairwise genetic distances (Nei, 1978)
above haplogroup M. Group L included 11 populations mainly and the two-dimensional geographical distances (based on
from low latitude region: GJ, MK, DX, XG, C290, C001, C218, longitudinal and latitudinal coordinates), and the results showed
C666, ZL, SD, and YunMa 7, corresponding to haplogroup L. that there is a significant positive correlation among the 43
Frequencies of the three lineages in each population and their sampled populations from China (r = 0.379, p = 0.000) and
geographical distribution are displayed in Figure 1. the “isolation-by-distance” pattern was detected. Furthermore,
Interestingly, we also noted that the main phenotypic traits the testing between Nei’s pairwise genetic distances and the
of 43 wild or domesticated accessions originating from different latitudinal differentiation also showed a significant positive
latitudes shifted along latitudinal gradients (23.36–50.21◦ N), correlation (r = 0.348, p = 0.000). Similarly, for the 25
which matched the regular distribution of three lineages. Our wild populations alone, significant positive correlations were
phenotype data (Table S1) indicated there were very high found between the genetic distances and pairwise geographical
variations among 43 accessions. The six measured traits involved distances (r = 0.368, p = 0.000), as well as between the genetic
three phenological Characteristics (initiation of flowering, full distances and latitude differences (r = 0.416, p = 0.000).
flowering, and seed full maturity time) and three morphological Haplogroup distribution frequencies shifted smoothly along
features (stem diameter, plant height, and number of branches). latitudinal gradients and the three lineages distinctively show a
The correlations between the phenotypes and latitude were high-middle-low latitude distribution pattern (Figure 1). Based
assessed, and all six traits had a negative, very strong, and on the RDA analysis and ANOVA partition (Table 3), 15 out
significant (p < 0.001) relationship with latitude of origin, of the 20 tested BioClim variables had a significant (p <
Pearson’ correlation coefficients (r) respectively were 0.858, 0.05) relationship with haplogroup distribution frequencies for
0.949, 0.906, 0.911, 0.914, 0.815 for initiation of flowering, full all the 43 populations (Table S2). This result indicated that
flowering, seed full maturity, stem diameter, plant height, and climate obviously affected the genetic distribution of Cannabis
number of branches, respectively. When the phenotype of three populations. When the redundancy factors were removed, only
genetic groups (above mentioned SAMOVA grouping) were MDL (Mean day length), Bio2 (Mean diurnal range), Bio8
compared, group H had the shortest growth time (mean seed- (Mean temperature of wettest quarter), Bio13 (Precipitation
maturity time, 77.2 ± 18.1 days), thinnest stem diameter (0.54 of wettest month), Bio14 (Precipitation of driest month),
± 0.22 cm), shortest plant height (99.2 ± 52.4 cm), and fewest Bio15 (Precipitation seasonality) formed a minimum subset of
branches (3.2 ± 1.7), while Group L had the longest growth climatic variables. Based on the ANOVA analysis, MDL was the
time (mean seed-maturity time,133.6 ± 36.8 days), widest stem most significant factor influencing the haplogroup distribution
diameter (1.14 ± 0.40 cm), tallest height (238.0 ± 86.5 cm), and frequencies (r2 = 0.6024, p < 0.001), and the subset of 6
most branches (11.0 ± 3.9), and the traits data of Group M were climatic variables totally explained 74.2% of variation, and MDL
in-between. accounted for the largest fraction of the total explained variation
For genetic diversity features, our studies showed that the (20.8%).
number of haplotypes is different among the 43 Chinese
populations, plus the cultivars Futura75 and Kompolti, ranging Genetic Structure and Gene Flow
from 1 to 5 haplotypes. We observed that out of the 25 Based on the groups defined by SAMOVA, the analysis of
haplotypes, 15 private haplotypes were exclusively found in three molecular variance (AMOVA) revealed that most variance
wild populations (ZW, CH, SD) and in nine cultivated accessions (69.48%) of the total observed genetic variations was due to
(C254, C261, C187, C292, C467, YunMa7, Futura75, Dagestani, variations between-groups, 14.43% was attributed to variance

Frontiers in Plant Science | www.frontiersin.org 8 December 2018 | Volume 9 | Article 1876


Zhang et al. Genetic Structure and Origins of Cannabis sativa

among populations within groups, and 16.10% to variance within Group H is characterized by short plant height, thin stem,
the same population (Table 4). F-statistics of all the three levels of fewer branches, and short life cycle. On the contrary, Group
hierarchy were highly significant (p < 0.001). Population genetic L demonstrates opposite characteristics compared with Group
differences (Fst) within the High-latitude lineage (Group H) was H. This is well-linked to the quantitative (facultative) short-day
higher than that of the lower-latitude lineages (Group M and plant trait of Cannabis. The flowering of Cannabis is normally
Group L), while gene flow (Nm) within Group M and Group induced by a required duration of days with a minimum
L was higher than in Group H (Table 5). For genetic diversity uninterrupted period of darkness (10–12 h for most cultivars)
within each group, Group H had the highest haplotype diversity, (Small, 2015). Due to the sensitivity to photoperiod, shortening
and Group M had the highest nucleotide diversity and number of day length can promote Cannabis plant pre-flowering. On the
haplotypes. contrary, prolonged day length would delay this crop from
When two morphological groups (wild and domesticated) shifting from a vegetative stage into a reproductive stage. Indeed,
were considered for the same 43 populations, AMOVA the northernmost distribution of group L is located at about 43◦
analysis indicated low and non-significant (2.34% of molecular N, which is consistent with previous observations that cultivars
variance, Fst =0.023, p = 0.19) genetic differentiation between from the southern (low latitude) areas have extended vegetative
the two groups. Most variance components were present cycles and failed to produce seeds when grown in the North
among populations within groups. The degree of population (High latitude areas) (Pahkala et al., 2008; Amaducci et al., 2012;
differentiation was slightly higher in the wild group compared Small, 2015). Our results suggest that photoperiod sensitivity is
to the domesticated group. Results of the neutrality tests for each a potential factor that prevents group L from extending further
group and total sample set are shown in Table 5. All values of Fu’s north. In contrast, the southernmost boundary of group H
Fs and Tajima’s D were statistically non-significance, suggesting is 31◦ N (landrace C224 in Figure 1A). It was surprising to
stable populations on a different level. observe that Cannabis lineages still present a distinctive high-
middle-low latitude distribution pattern after several thousand
Divergence Time Estimations years despite human activities. Nevertheless, each of the three
The phylogenetic tree (Figure 2) inferred from the five-gene haplogroups is not strictly limited to its main corresponding
matrix clustered the 25 haplotypes into a monophyletic clade, geographical locations: North of 40◦ N (Haplogroup H), 30 to
in which the haplotypes from the high, middle, and low latitude 40◦ N (Haplogroup H), and South of 30◦ N (Haplogroup L).
regions formed three monophyletic subclades, with strong Some haplotypes of the haplogroups were aberrantly growing
statistical support. The stem age of Cannabis (Figure 2B) was out of the main distribution latitude range (Figure 1A). For
estimated at 18.23 Ma with 95% highest posterior density (HPD) instance, haplotype H3 in cultivar C224, which belongs to
8.83–36.56 Ma, and the crown age of this species was 2.24 Ma, Haplogroup H, was found in lower latitude areas around 31◦
with 95% HPD 0.81–5.81 Ma. N; while the haplotype H5 in wild population XH and ZL,
which belongs to Haplogroup L, was found at a higher latitude
DISCUSSION area around 43◦ N. These exceptions may result from the
influences of human agricultural activities. Clarke and Merlin
Distinct Pattern of Lineage Distribution (2016) have stated, “Humans and the Cannabis plant share an
and Genetic Structure intimate history spanning millennia.” There might have been
One major finding of this study is that Cannabis can be much more stringent distribution limits between haplogroups
divided into three distinct genetic lineages (Figure 1), namely prior to human activities (see below).
the H, M, and L haplogroups. Interestingly the haplogroups The high genetic diversity of this crop has been reported
exhibited latitudinal gradients distribution and this distinctive based on nuclear genetic markers (Gao et al., 2014; Sawler et al.,
high-middle-low latitude pattern was supported by NETWORK, 2015; Soler et al., 2017), but this is the first report of genetic
AMOVA, SAMOVA, and Mantel Tests based on cpDNA data. diversity from cpDNA markers. The rather low mutation rate
High-latitude group members (group H) were mainly distributed among numerous organelle loci of Cannabis (Gilmore et al., 2007;
in regions north of about 40◦ N and Low-latitude group members Zhang et al., 2017), makes genetic analyses of populations based
(group L) were mainly distributed in areas south of about 30◦ on single organelle sequence extremely difficult. Our results
N, while the middle-latitude group members (group M) were revealed a high level of haplotype diversity (Hd = 0.848) at the
mainly distributed in the zone between about 30◦ N and 40◦ species level, a strong genetic differentiation among the three
N. This current distribution pattern implies an adaptation to groups (Fst = 0.695), and the molecular variations observed are
distinct latitudinal gradient climatic features. In the present mostly between-cultivars (76.85%) or among groups (69.48%).
study, the lineage distribution was significantly correlated with It is worth noting that genetic variation at different levels of
latitude and climatic factors. In particular, the day-length has hierarchy contrasts to previous studies based on nuclear markers
a strong and significant (r2 = 0.6024, p < 0.001) influence on (Gilmore et al., 2003; Chen et al., 2015; Soler et al., 2017), where
the haplogroup distribution frequencies in each population by the largest molecular variation observed was due to differences
RDA analysis and ANOVA partition (Table 3). Furthermore, within cultivars, instead of among cultivars. These contrasting
our field phenotype trial results showed that phenological and results are probably due to the fact that the cpDNA markers are
morphological traits had a negative, very strong, and significant maternally inherited, and detect therefore variations only from
correlation with latitude of accession origin. For instance, the maternal parent, instead of an unspecified mixture of both

Frontiers in Plant Science | www.frontiersin.org 9 December 2018 | Volume 9 | Article 1876


Zhang et al. Genetic Structure and Origins of Cannabis sativa

TABLE 4 | Analysis of molecular variance (AMOVA) for on the Cannabis populations from China based on the five cpDNA regions.

Source of variation d.f. Sum of squares Variance components Percentage of variation Fixation index (Fst)

AMONG 3 GROUPS DEFINED BY SAMOVA


Among groups 2 2497.697 6.18572 69.48 0.69478***
Among populations within groups 40 789.902 1.28440 14.43 0.47264***
Within populations 575 824.019 1.43308 16.10 0.83904***
Total 617 4111.618 8.90320
AMONG 2 GROUPS (WILD & DOMESTICATED)
Among groups 1 113.560 0.16155 2.34 0.02345n.s.
Among populations within groups 41 3174.039 5.29484 76.85 0.78700***
Within populations 575 3174.039 1.43308 20.80 0.79199***
Total 617 4111.618 6.88947
AMONG 43 POPULATIONS
Among populations 42 3287.599 5.36510Va 78.92
Within populations 575 824.019 1.43308Vb 21.08 0.78920***
Total 617 4111.618 6.79817

n.s., not significant; *** , p < 0.001.

TABLE 5 | Population genetic statistics among Cannabis population groups based on SAMOVA grouping, the two morphology groups (wild & domesticated) and all
samples from China.

Groups Np Ns Nh Hd π (×10−2 ) D Fs Nm Fst

Group H 16 267 9 0.716 ± 0.016 0.159 ± 0.084 −1.013(n.s.) 10.109(n.s.) 0.32 0.607
Group M 16 219 13 0.500 ± 0.039 0.180 ± 0.094 0.729(n.s.) 6.390(n.s.) 1.20 0.294
Group L 11 132 6 0.690 ± 0.022 0.059 ± 0.036 −0.007(n.s.) 2.767(n.s.) 1.52 0.247
Group W 25 430 11 0.838 ± 0.008 0.379 ± 0.189 1.392(n.s.) 32.064(n.s.) 0.11 0.820
Group D 18 188 15 0.810 ± 0.015 0.311 ± 0.157 1.813(n.s.) 11.904(n.s.) 0.15 0.768
Total 43 618 21 0.848 ± 0.006 0.367 ± 0.183 1.103(n.s.) 18.956(n.s.) 0.12 0.802

Np, number of populations; Ns, sample size; Nh, number of haplotype; Hd, haplotype diversity; π, nucleotide diversity; D, Tajima’s D; Fs, Fu’s Fs; Nm, number of effective migrants; Fst,
fixation index; n.s., not significant (p > 0.05).

parents, which occurs for nuclear markers. The haploid and non- from China and 9 accessions from the other countries or
recombining nature of the cpDNA makes it possible to better regions) were split into three gene pools without exception.
trace genealogical histories in plant populations (Avise, 2009). On the phylogenetic tree, all Cannabis haplotypes formed
a monophyletic clade (Figure 2B) containing three distinct
Three Subspecies Classification subclades, with each subclade significantly different from the
The genus Cannabis was previously placed in family Moraceae, others (Figure 2A). At first glance, the three distinct subclades
then in its own family Cannabaceae together with Humulus could be treated as three different species corresponding to
(Rendle, 1925). This family contains ten genera based on the three commonly recognized species C. sativa, C. indica,
molecular phylogenies (Sytsma et al., 2002; Mabberley, 2008; and C. ruderalis. However, there is no reproductive isolation
Yang et al., 2013). The cultivation and selection of hemp that exists between these lineages in nature based on our
has been performed for several thousand years, and this has observations as well as recognitions by most researchers (Beutler
resulted in difficulty when classifying Cannabis accessions based and Marderosian, 1978). Furthermore, few sequence variations
only on morphological traits. In recent studies, three lineages have been detected in Cannabis chloroplast DNA: <0.03% for
have been identified in Cannabis by enzyme variants analysis the whole chloroplast genomes based on four Cannabis cultivars
(Hillig, 2005), 7 polymorphic sites of organelle DNA sequences and <0.24% for the 16 cpDNA non-coding regions based on
(Gilmore et al., 2007), and EST-SSR markers (Gao et al., six individuals of wild Cannabis (Zhang et al., 2017); <0.1%
2014) based on worldwide sampling. However, whether these for the 7 cpDNA regions (Gilmore et al., 2007). In addition,
three lineages should be treated as three distinct species, three significantly lower divergence (0.41%) was observed between
varieties of a single species or other taxonomic treatments materials identified as C. sativa and C. indica based on DNA
have been debated (Hillig, 2005; Gilmore et al., 2007; Piluzza barcoding sequences (rbcL, matK, trnH-psbA, trnL-trnF, ITS),
et al., 2013; Small, 2015; Mcpartland and Hegman, 2018). In compared to the mean divergence of 3.0% that separated five
the present study, 645 Cannabis individuals (all 43 populations pairs of plants considered as different species such as Humulus

Frontiers in Plant Science | www.frontiersin.org 10 December 2018 | Volume 9 | Article 1876


Zhang et al. Genetic Structure and Origins of Cannabis sativa

lupulus and H. japonicus in Canabaceae (McPartland and Guy of C. saltiva was India Orientali (encompassing the Indian
(2014). These accumulating pieces of evidence also hint that subcontinent, southeastern Asia, and the Malay Archipelago),
a rank below that of species is more reasonable. Thus we Japonia (Japan), and Malabaria (the Malabar coast of southwest
suggest that Cannabis should be considered as a monotypic genus India). Indeed, the seeds from wild Cannabis populations in
with only one species, Cannabis sativa L. Considering that the India are remarkably small, unlike those collected from any other
three distinctive lineages revealed by cpDNA molecular markers area, also indicating that the wild Indian populations may be an
also clearly demonstrated obvious geographic regions as stated ancient wild form (Small, 2015).
above, this species can be further divided into three subspecies.
Meanwhile, based on nomenclature history of this species,
original geographic range, and basic difference in phenotype, Multiregional Domestication Origin of
we recommend the naming of the three subspecies as: Cannabis Cannabis Plant
sativa subsp. sativa, C. sativa subsp. indica, and C. sativa subsp. Each of the three haplogroups (M, L, and H) identified in
ruderalis, corresponding to the Haplogroup M, Haplogroup L, this study contains haplotypes from both wild populations and
and Haplogroup H, respectively. Small and Cronquist (1976) also cultivars. Within each haplogroup, the wild and domesticated
pointed out that C. sativa subsp. sativa is typically distributed at populations shared the most common haplotypes. For instance,
areas with latitudes north of 30◦ N. Our present results that the haplotype H1, H3, and H4 are the most common haplotypes
haplogroup M (i.e., subsp. sativa) is distributed in areas ranging shared by the wild and domesticated populations in Group
from 27 to 43◦ N, is largely consistent with the observations by H; similar trends are observed for haplotype H9 in Group
Small and Cronquist. M, and haplotypes H5 and H10 in the Group L. The fact
that the haplotype of the domesticated Cannabis cultivars are
not limited to one of the three haplogroups indicates that
Divergence Time Inference and there are probably multiregional domestication origins for this
Evolutionary History crop from the three subspecies of Cannabis. Otherwise, the
In the present study, we included Aphananthe aspera, the basal same genotype (haplogroup) should have been detected in
taxon of the family Cannabaceae, and all the three Humulus different cultivars from high-middle-low latitude regions if the
species (the sister group of Cannabis) as outgroups for the cultivars were domesticated from one single region. AMOVA
dating analysis based on cpDNA markers and large numbers analyses results also demonstrate that there is no significant
of Cannabis individuals. The reconstructed phylogenetic tree difference (Fst = 0.023) between the wild population group
(Figure 2) shows the stem age of C. sativa is at 18.23 Ma and domesticated cultivar group based on cpDNA data. This
(95% HPD: 8.84–36.6 Ma), which means Humulus and Cannabis molecular evidence is in accordance with the multiregional
diverged from a common ancestor before 18.23 Ma. This time origin of human use of the cannabis plant proposed based on
period is in agreement with the divergence time (about 14 Ma) archaeological investigation (Long et al., 2016) and Fossil pollen
inferred by Zhang et al. (2017). In fact, the history of Cannabis, studies (Mcpartland et al., 2018). Actually, contemporaneous
Humulus and their extinct sister genus can be dated back to the cannabis achenes (5,000–10,200 years ago) have been found
Oligocene and Miocene Epoch (33.9–5.33 Ma) according to the in more than ten different archaeological sites located in
fossil records (Tiffney, 1986; McPartland, 2018). The crown age of the two distal parts (both Europe and East Asia) of the
C. sativa is at 2.24 Ma (95% HPD: 0.81–5.81 Ma), which is also the continent (Long et al., 2016). Thus the domestication of
stem age of the three lineages. This diversification time coincides C. sativa could have occurred in more than three areas in
with the Quaternary glaciation, the last of five known glaciations Eurasia.
during Earth’s history which is thought to have started at 2.58 Ma,
indicating that the Quaternary glaciation could have played a
major role in the evolutionary history of the three subspecies of C. AUTHOR CONTRIBUTIONS
sativa. The current distribution of the three subspecies could be
explained as a consequence of secondary contact after historical QZ designed and performed the research, analyzed the data,
divergence events. and wrote the manuscript. QZ and XC contributed equally as
The Central-Asia-Origin has been the prevalent opinion for first author. LT and ES carried out pre-experiment research and
C. saltiva (de Candolle, 1885; McPartland, 2018), although some revised the manuscript in detail. HG, RG, MG, and YX conducted
botanists considered Europe as the center of origin (Thiébaut de field work. XC and MY provided the technical assistance. MY
Berneaud, 1835; Keppen, 1886), or a region spanning Asia and organized this work. All authors contributed to and approved the
Europe (Herder, 1892; Vavilov, 1926). However, our molecular final manuscript.
analyses revealed for the first time that the low latitude region
distributed subsp. indica (Haplogroup L) possesses the basal
group position within Cannabis, indicating that this species is FUNDING
possibly originated from low latitude areas in the evolutionary
history of this plant. This finding does not support the hypothesis This work was supported by the National Natural Science
of the Central-Asia-origin of Cannabis, but is partly in agreement Foundation of China (31360350) and the project of China
with the speculation of Linnaeus (1737) that the native range Agriculture Research System (CAR-16-E07).

Frontiers in Plant Science | www.frontiersin.org 11 December 2018 | Volume 9 | Article 1876


Zhang et al. Genetic Structure and Origins of Cannabis sativa

ACKNOWLEDGMENTS SUPPLEMENTARY MATERIAL


The authors are very grateful to Dr. Zai-Wei Ge and Bang Feng The Supplementary Material for this article can be found
from Kunming Institute of Botany, Chinese Academy of Sciences, online at: https://www.frontiersin.org/articles/10.3389/fpls.2018.
for helping in dating analyses. 01876/full#supplementary-material

REFERENCES Johnson, R. (2013). Hemp as an Agricultural Commodity. CRS Report for Congress.
Congressional Research Service. Available online at: https://www.fas.org/sgp/
Amaducci, S., Colauzzi, M., Bellocchi, G., Cosentino, S. L., Pahkala, K., Stomph, crs/misc/RL32725.pdf (Accessed August 15, 2018).
T.J., et al. (2012). Evaluation of a phenological model for strategic decisions Keppen, T. (1886). Догадка о происхожденiи большинства
for hemp (Cannabis sativa L.) biomass production across European sites. Ind. индоевропейскихъ названiй конопли. Журнал Министерства
Crops Prod. 37, 100–112. doi: 10.1016/j.indcrop.2011.11.012 народнаго просвěщенiя. 245, 73–86.
Amaducci, S., Scordia, D., Liu, F. H., Zhang, Q., Guo, H., Testa, G., et al. (2015). Li, H. L. (1974). An archaeological and historical account of Cannabis in China.
Key cultivation techniques for hemp in Europe and China. Ind. Crops Prod. 68, Econ. Bot. 28, 437–448. doi: 10.1007/BF02862859
2–16. doi: 10.1016/j.indcrop.2014.06.041 Librado, P., and Rozas, J. (2009). DnaSP v5: a software for comprehensive
Avise, J. C. (2000). Phylogeography: The History and Formation of Species. analysis of DNA polymorphism data. Bioinformatics 25, 1451–1452.
Cambridge: Harvard University Press, 248. doi: 10.1093/bioinformatics/btp187
Avise, J. C. (2004). Molecular Markers Natural History, and Evolution. Sunderland: Long, T., Wagner, M., Demske, D., Leipe, C., and Tarasov, P. E. (2016). Cannabis
Sinauer and Associates Press, 78–79. in eurasia: origin of human use and bronze age trans-continental connections.
Avise, J. C. (2009). Phylogeography: retrospect and prospect. J. Biogeogr. 36, 3–15. Veget. Hist. Archaeobot. 26, 245–258. doi: 10.1007/s00334-016-0579-6
doi: 10.1111/j.1365-2699.2008.02032.x Mabberley, D.J. (2008). Mabberley’s Plant-Book: A Portable Dictionary of Plants,
Bandelt, H. J., Forster, P., and Rohl, A. (1999). Median-joining networks their Classification and Uses. New York, NY: Cambridge University Press.
for inferring intraspecific phylogenies. Mol. Biol. Evol. 16, 37–48. McPartland, J. M. (2018). Cannabis systematics at the levels of family, genus,
doi: 10.1093/oxfordjournals.molbev.a026036 and species. Cannabis Cannabinoid Res. 3, 203–212. doi: 10.1089/can.2018.
Beutler, J. A., and Marderosian, A. H. (1978). Chemotaxonomy of Cannabis 0039
I. crossbreeding between Cannabis sativa and C. ruderalis, with analysis of McPartland, J. M., and Guy, G. W. (2004). “The evolution of Cannabis and
cannabinoid content. Econ. Bot. 32, 387–394. doi: 10.1007/BF02907934 coevolution with the cannabinoid receptor - a hypothesis,” in The Medicinal
Chen, X., Guo, R., Wan, R. X., Xu, Y. P., Zhang, Q. Y., Guo, M.B., et al. (2015). Uses of Cannabis and Cannabinoids, eds G. W. Guy, B. A. Whittle, and P. J.
Genetic structure of five dioecious industrial hemp varieties in Yunnan. Mol. Robson (London: Pharmaceutical Press), 71–101.
Plant Breed. 13, 2069–2075. doi: 10.13271/j.mpb.013.002069 (in Chinese) McPartland, J. M., and Guy, G. W. (2014). “A question of rank: Using DNA
Clarke, R. C., and Merlin, M. D. (2016). Cannabis domestication, breeding history, barcodes to classify Cannabis sativa and Cannabis indica,” in Proceedings of
present-day genetic diversity, and future prospects. Crit. Rev. Plant Sci. 35, the 24th annual symposium on the Cannabinoids. International Cannabinoid
293–327. doi: 10.1080/07352689.2016.1267498 Research Society (Research Triangle Park, NC).
de Candolle, A. (ed.). (1885). “Hemp – Cannabis sativa L.” in Origin of Cultivated Mcpartland, J. M., and Guy, G. W. (2017). Models of cannabis, taxonomy, cultural
Plants (New York, NY: D. Appleton), 148–149. doi: 10.5962/bhl.title.29067 bias, and conflicts between scientific and vernacular names. Bot. Rev. 83, 1–55.
Doyle, J. (1991). “DNA protocols for plants-CTAB total DNA isolation.” in doi: 10.1007/s12229-017-9187-0
Molecular Techniques in Taxonomy, eds G. M. Hewitt, A. W. B. Johnston, and J. Mcpartland, J. M., Guy, G. W., and Hegman, W. (2018). Cannabis, is indigenous to
P. W. Young (Berlin: Springer), 283–293. doi: 10.1007/978-3-642-83962-7_18 europe and cultivation began during the copper or bronze age: a probabilistic
Drummond, A. J., and Rambaut, A. (2007). BEAST: bayesian evolutionary analysis synthesis of fossil pollen studies. Veget. Hist. Archaeobot. 27, 635–638.
by sampling trees. BMC Evol. Biol. 7:214. doi: 10.1186/1471-2148-7-214 doi: 10.1007/s00334-018-0678-7
Ervín, K., Hans,D. M., and František, H. (1986). Monographie der Frúchte und Mcpartland, J. M., and Hegman, W. (2018). Cannabis, utilization, and
Samen in der Kreide von Mitteleuropa. Prague: Vydal ústrední ústav geologický diffusion patterns in prehistoric europe: a critical analysis of archaeological
v Academii, nakladatelství Ceskoslovenské akademie ved, 219. evidence. Veget. Hist. Archaeobot. 27, 627–634. doi: 10.1007/s00334-017-
Excoffier, L., and Lischer, H. E. (2010). Arlequin suite ver 3.5: a new series of 0646-7
programs to perform population genetics analyses under Linux and windows. Nei, M. (1978). Estimation of average heterozygosity and genetic distance from
Mol. Ecol. Resour. 10, 564–567. doi: 10.1111/j.1755-0998.2010.02847.x small number of individuals. Genetics 89, 583–590.
Gao, C., Xin, P., Cheng, C., Tang, Q., Chen, P., Wang, C., et al. (2014). Diversity Nylander, J. A. A. (2004). MrModeltest v2. Program Distributed by the Author.
analysis in Cannabis sativa based on large-scale development of expressed Evolutionary Biology Centre, Uppsala University.
sequence tag-derived simple sequence repeat markers. PLoS ONE 9:e110638. Oh, H., Seo, B., Lee, S., Ahn, D.H., Jo, E., Park, J.K., et al. (2016). Two complete
doi: 10.1371/journal.pone.0110638 chloroplast genome sequences of Cannabis sativa varieties. Mitochondrial DNA
Gilmore, S., Peakall, R., and Robertson, J. (2003). Short tandem repeat Part A 27, 2835–2837. doi: 10.3109/19401736.2015.1053117
(STR) DNA are hypervariable and informative in Cannabis sativa L.: Pahkala, K., Pahkala, E., and Syrjäläl, H. (2008). Northern limits to
implications for forensic investigations. Forensic Sci. Intl. 9, 65–74. fibre hemp production in Europe. J. Ind. Hemp. 13, 104–116.
doi: 10.1016/S0379-0738(02)00397-3 doi: 10.1080/15377880802391084
Gilmore, S., Peakall, R., and Robertson, J. (2007). Organelle DNA haplotypes reflect Peakall, R., and Smouse, P. E. (2012). GenAlEx 6.5: genetic analysis in
crop-use characteristics and geographic origins of Cannabis sativa. Forensic Sci. Excel. Population genetic software for teaching and research-an update.
Intl. 172, 179–190. doi: 10.1016/j.forsciint.2006.10.025 Bioinformatics 28, 2537–2539. doi: 10.1093/bioinformatics/bts460
Herder, F. G. (1892). Plantae Raddeanae apetalae V. Acta Horti. Petropol. 12, Piluzza, G., Delogu, G., Cabras, A., Marceddu, S., and Bullitta, S. (2013).
31–132. Differentiation between fiber and drug types of hemp (Cannabis sativa) from
Hillig, K. W. (2005). Genetic evidence for speciation in Cannabis (Cannabaceae). a collection of wild and domesticated accessions. Genet. Resour. Crop Evol. 60,
Genet. Resour. Crop Evol. 53, 161–180. doi: 10.1007/s10722-003-4452-y 2331–2342. doi: 10.1007/s10722-013-0001-5
Hillig, K. W., and Mahlberg, P. G. (2004). A chemotaxonomic analysis of Piomelli, D., and Russo, E. B. (2016). The cannabis sativa versus cannabis indica
cannabinoid variation in Cannabis (Cannabaceae). Am. J. Bot. 91, 966–975. debate: an interview with Ethan Russo, MD. Cannabis Cannabinoid Res. 1,
doi: 10.3732/ajb.91.6.966 44–46. doi: 10.1089/can.2015.29003.ebr

Frontiers in Plant Science | www.frontiersin.org 12 December 2018 | Volume 9 | Article 1876


Zhang et al. Genetic Structure and Origins of Cannabis sativa

R Core Team (2015). R: A Language and Environment for Statistical Computing. Thompson, J. D., Higgins, D. G., and Gibson, T. J. (1994). CLUSTAL
Vienna: R Foundation for Statistical Computing. Available online at: http:// W: improving the sensitivity of progressive multiple sequence alignment
www.r-project.org through sequence weighting, position-specific gap penalties and weight
Rambaut, A., Suchard, M., Xie, D., and Drummond, A. (2014). Tracer v1. 6. matrix choice. Nucleic Acids Res. 22, 4673–4680. doi: 10.1093/nar/22.22.
Computer Program and Documentation Distributed by the Author. Available 4673
online at: http://beast.bio.ed.ac.uk/Tracer Tiffney, B.H. (1986). Fruit and seed dispersal and the evolution of the
Rendle, A.B. (1925). The Classification of Flowering Plants, Vol. 2, London: Hamamelidae. Ann. Mo Bot. Gard. 73, 394–416. doi: 10.2307/239
Cambridge University Press. 9119
Ronquist, F., and Huelsenbeck, J. P. (2003). MrBayes 3: bayesian Vavilov, N. I. (1926). “The origin of the cultivation of ‘primary’ crops, in particular
phylogenetic inference under mixed models. Bioinformatics 19, 1572–1574. cultivated hemp,” in Studies on the Origin of Cultivated Plants, Leningrad, USSR:
doi: 10.1093/bioinformatics/btg180 Institute of Applied Botany and Plant Breeding, 221–233.
Salentijn, E. M. J., Zhang, Q. Y., Amaducci, S., Yang, M., and Trindade, L. M. Vergara, D., White, K. H., Keepers, K. G., and Kane, N. C. (2016). The
(2015). New developments in fiber hemp (Cannabis sativa L.) breeding. Ind. complete chloroplast genomes of Cannabis sativa and Humulus lupulus.
Crops Prod. 68, 32–41 doi: 10.1016/j.indcrop.2014.08.011 Mitochondrial DNA Part A 27, 3793–3794. doi: 10.3109/19401736.2015.10
Sawler, J., Stout, J. M., Gardner, K. M., Hudson, D., Vidmar, J., Butler, L., et al. 79905
(2015). The genetic structure of marijuana and hemp. PLoS ONE 10:e0133292. Welling, M. T., Shapter, T., Rose, T. J., Lei, L., Stanger, R., and King, G. J. (2016).
doi: 10.1371/journal.pone.0133292 A belated green revolution for cannabis: virtual genetic resources to fast-
Schaal, B. A., Hayworth, D. A., Olsen, K. M., Rauscher, J. T., and Smith, W. A. track cultivar development. Front. Plant Sci. 7:1113. doi: 10.3389/fpls.2016.
(1998). Phylogeographic studies in plants: problems and prospects. Mol. Ecol. 01113
7, 465–474. doi: 10.1046/j.1365-294x.1998.00318.x Yang, M. Q., Velzen, R. V., Bakker, F. T., Sattarian, A., Li, D. Z., and Yi, T. S. (2013).
Schultes, R., Klein, W., Plowman, T., and Lockwood, T. (1974). Cannabis: Molecular phylogenetics and character evolution of Cannabaceae. Taxon 62,
an example of taxonomic neglect. Bot. Museum Leaflets Harvard Univ. 23, 473–485. doi: 10.12705/623.9
337–364. Yang, X. Y. (1991). History of cultivation on hemp, sesame and flax. Agric.
Small, E. (2015). Evolution and classification of Cannabis sativa, (marijuana, Archaeol. 3, 267–274. (in Chinese)
hemp) in relation to human utilization. Bot. Rev. 81, 189–294. Yuan, Z. X., Xin, F., and Du, C. X. (2014). Analysis of the variation of
doi: 10.1007/s12229-015-9157-3 photoperiod with latitude and season. Sustainable Energy 4, 41–50. (in Chinese)
Small, E., and Cronquist, A. (1976). A practical and natural taxonomy for doi: 10.12677/SE.2014.44007
Cannabis. Taxon 25, 405–435. doi: 10.2307/1220524 Zhang, Q. Y., Chen, X., Guo, M. B., Guo, R., Xu, Y. P., Yang, M., et al.
Soler, S., Gramazio, P., Figàs, M. R., Vilanova, S., Rosa, E., Llosa, E. R., et al. (2017). (2017). Screening and development of chloroplast polymorphic molecular
Genetic structure of Cannabis sativa var. indica cultivars based on genomic ssr markers on wild hemp (Cannabis sativa L.), Mol. Plant Breed. 5, 979–985.
(gssr) markers: implications for breeding and germplasm management. Ind. doi: 10.13271/j.mpb.015.000827 (in Chinese).
Crops Prod. 104, 171–178. doi: 10.1016/j.indcrop.2017.04.043 Zhang, S. D., Jin, J. J., Chen, S. Y., Chase, M. W., Soltis, D. E., Li, H. T., et al.
Spitters, C. J. T., Toussaint, H. A. J. M., and Goudriaan, J. (1986). Separating (2017). Diversification of rosaceae since the late cretaceous based on plastid
the diffuse and direct component of global radiation and its implications for phylogenomics. New Phytol. 214, 1355–1367. doi: 10.1111/nph.14461
modeling canopy photosynthesis. Part I. Components of incoming radiation.
Agric. Forest Meteorol. 38, 217–229. doi: 10.1016/0168-1923(86)90060-2 Conflict of Interest Statement: The authors declare that the research was
Sytsma, K. J., Morawetz, J., Pires, J. C., Nepokroeff, M., Conti, E., Zjhra, M., et al. conducted in the absence of any commercial or financial relationships that could
(2002). Urticalean rosids: circumscription, rosid ancestry, and phylogenetics be construed as a potential conflict of interest.
based on rbcL, trnL-trnF, and ndhF sequences. Am. J. Bot. 89, 1531–1546.
doi: 10.3732/ajb.89.9.1531 Copyright © 2018 Zhang, Chen, Guo, Trindade, Salentijn, Guo, Guo, Xu and Yang.
Tamura, K., Stecher, G., Peterson, D., Filipski, A., and Kumar, S. (2013). MEGA6: This is an open-access article distributed under the terms of the Creative Commons
molecular evolutionary genetics analysis version 6.0. Mol. Biol. Evol. 30, Attribution License (CC BY). The use, distribution or reproduction in other forums
2725–2729. doi: 10.1093/molbev/mst197 is permitted, provided the original author(s) and the copyright owner(s) are credited
Thiébaut de Berneaud, A. (1835). “Chanvre,” in Dictionnaire Pittorosque d’histoire and that the original publication in this journal is cited, in accordance with accepted
Naturelle et des Phénomènes de la Nature, Tome 2, ed F. E. Guérin-Méneville academic practice. No use, distribution or reproduction is permitted which does not
(Paris: De Cosson), 87–89. comply with these terms.

Frontiers in Plant Science | www.frontiersin.org 13 December 2018 | Volume 9 | Article 1876

You might also like