Phylogeography and population genetic structure of ﬂowering cherry Cerasus serrulata (Lindley) Loudon in subtropical and temperate China

Cerasus serrulata (Rosaceae) is an important ﬂowering cherry resource. It is almost the most widely distributed species in the genus, mainly included in the subtropical and temperate China, which enables the geographic evolutionary pattern to be a representative. Besides, the morphological traits are greatly varied especially in ornamental characters. All of these makes Cerrasus serrulata a valuable research object. Thus, phylogeographic analysis was conducted to apprehend the spatial pattern and evolutionary history, which can also add insights into the phylogeography of the genus Cerasus and plants in subtropical and temperate China, as well as to deeper understand the genetic diversity and structure of the germplasm to make better and more eﬀective utilization. A total of 327 individuals of 18 populations were collected. Three cpDNA fragments (matK, trnD-E, trnS-G) and the nuclear internal transcribed spacer (ITS) were utilized. The result showed a high genetic diversity both in species level and population level of Cerrasus serrulata. The high genetic diﬀerentiation among populations and the existence of phylogeographic structure in whole were detected. In addition, no bottleneck was identiﬁed. The the distribution pattern and center were formed before the LGM. Two geographical lineages were inferred. One was conﬁned to Qingling Mountain and Taihang Mt. The other was from the Wuling Mt to Lu-Huang Mt, and then went northeast to the coast of Asia. Besides, taxonomic treatments of the Cerasus serrulata complex were reconsidered.


Introduction
Phylogeography focuses on the gene genealogy spatio-temporal pattern of related species or intraspecies as well as how the pattern was formed (Hickerson et al. , 2010;Bai & Zhang, 2014). Taking advantages of genetic diversity, genetic structure analysis and molecular time dating, temporal and spatial population divergence and expansion, evolution history can be inferred; the glacial refugia can also be identified combing the climate changes and glacial events in late Quaternary (Hickerson et al. , 2010). It is widely acknowledged that the glacial-interglacial oscillations in Quaternary glaciation, especially the LGM, had eminent effects on shaping the contemporary distribution of species and their genetic diversity (Chung et al., 2017).
The Sino-Japanese forest subkingdom, involved in the Sino-Japanese Floristic Region (SJFR) except the Himalayan forest region (Wu & Wu, 1996), covers the south and east of China, as well as the Korean Peninsula and the Japanese Archipelago. This region is embroiled in the subtropical and temperate climate zones in the northern hemisphere, mainly consisted of the subtropical and temperate China. There are remarkably rich flora in this region due to the high rainfall brought by the south-eastern monsoon, and they may have experienced dramatic distribution changes during the Quaternary climatic oscillations (Mitsui et al., 2007). Mountains are believed to play important roles as refugia and dispersal corridors for species during the Quaternary global climate changes. Nanling Mountain and Qinling Mountain acted dual roles as a dispersal corridor in east-west direction and as a glacial refugium in subtropical China during the late Quaternary (Tian et al., 2018;Guo et al., 2014;Wulufu, 1964;Zhou, 1997). The main Korean mountain range [the so-called "Baekdudaegan" (BDDG)] is also thought be a glacial refugium, mainly for the boreal and temperate flora of northeastern Asia (Chung et al. , 2017). In addition to large mountains, microrefugia such as smaller massifs or lowland sites are also of dominant importance for sustaining species. It was hypothesized that multiple microrefugia locate in the northern part of the southern refuge (24°N-33°N) (Wang & Ge, 2006).
Cerasus serrulata (Lindley) Loudon is one of the most widely spread species in genus Cerasus , distributing from the north in Heilongjiang Province to the south in Guangzhou Province (Li & Bartholomew, 2003), covering a widest latitude range (˜24°N-45°N) (Li et al., 2014); and from west in Guizhou Provincce to Japan (Li & Bartholomew, 2003), also across a broad longitude range. The distribution pattern is involved in the Sino-Japanese forest subkingdom, and is also within the subtropical and temperate monsoon climate zones. All the extensive coverage, various and special habitats as well as complex climate, indicating a strong adaptation, have bred high variants in C. serrulata , and highlighted a high research value of phylogeography to bring new clues to the history of geological events, as well as the development and utilization of cherries such as breeding new or improved ornamental cultivars since C. serrulataitself posses good ornamental characteristics and discrepancies of flower color, flower type, leaf color, tree shape, high resistance, etc. Nevertheless, the high variant level has as well caused some controversies on taxonomic relationships of related species, which needs further studies.
In this study, C. serrlata was selected as a typical model of cherry in subtropical and temperate China along with Korean peninsula. Three maternal-inherited chloroplast DNA and the bi-parentally inherited nuclear DNA ITS sequences were used together for phylogeographic analysis of C. serrulata for the first time to better understand the genetic diversity and genetic structure, as well as to clarify the temporal and spatial population divergence and expansion history ofC. serrulata , which can also provide new clues for cherry evolutionary history and identifying refugia of subtropical and temperate China in SJFR. In addition, controversial taxonomic groups within the species and of related species, seemingly caused by the different processes of speciation inner Cerasus serrulata , were also discussed to add more proofs for further clarifying their relationships.

Population sampling
We have conducted field investigations of Cerasus serrulataaccording to specimen and literature records since 2010, and a total of 18 populations of Cerasus serrulata with 327 accessions have been collected (Table 1, Figure 1). Among them, seventeen populations were from China and one was from Korea. Green, young and healthy leaves were gathered from individuals at least 30 m apart, and were quickly stored in silica gel. All accessions were used for cpDNA analysis, while 174 were used for ITS analysis, with a maximum accession number of 10 for each population (Table 1 ). The whole collections should have covered the most of the distributions of Cerasus serrulata .

Genetic diversity and phylogeographic analysis
The sequences were checked visually and edited manually by BioEditor ver. 7.09 (Hall, 1999), and then aligned by Clustal W in MEGA ver. 5.0 (Tamura et al., 2011). The three cpDNA sequences were concatenated for subsequent analysis. Identification of cpDNA haplotypes (H) and ITS ribotypes (R), and calculation of their diversities (H d /R d ) as well as nucleotide diversity (P i ) were performed by DnaSP v. 5.0 (Librado & Rozas, 2009). Analysis of molecular variance (AMOVA) was conducted to estimate the genetic variances within and between populations and groups by Arlequin ver. 3.1 (Laurent et al. , 2005). Network of cpDNA haplotypes and ITS ribotypes were drew by TCS1.2.1 (Clement et al. , 2000). Phylogenetic analysis, takingPrunus salicina , Cerasus tomentosa and C. yedoensisas outgroups, was constructed based on maximum parsimony (MP), maximum likelihood (ML) and Bayesian(BI) to make the best use of the CIPRES Science Gateway V. 3.3. (Miller & Gross, 2011). The robustness of each internal branch of the genealogical trees was evaluated with 1,000 bootstrap replicates. All the above analysis involved both cpDNA and ITS sequences. G ST , representing haplotype frequency, and N ST , representing haplotype difference, were calculated to examine the existence of population structure by PERMUT v. 2.0 with 10,000 permutations (Pons & Petit, 1996). N ST > G ST , P < 0.05 indicates the presence of a phylogeographic structure. Mismatch distribution analyses and neutrality tests were conducted to test population expansion using ARLEQUIN v. 3.5 with 10,000 permutations. These analysis were only conducted with cpDNA sequences.

Divergence time dating
Best-fit models of nucleotide substitution of chloroplast haplotypes were chosen by jModelTest2 v2.1.6. The divergence times were estimated by BEAST v1.8.4 (Drummond et al., 2016 ). Lacking fossil records, secondary calibration points according to the diversification of Rosaceae were applied to calibrate node ages. Divergence time dating was calculated in two steps with two trees respectively. Five secondary calibration points were applied to the first tree: Rosaceae Crown, 90.18 Mya (Zhang et al., 2017;Li et al., 2017); Amygdaleae, 84.84Mya (Zhang et al., 2017); Spiraeeae, 84.30 Mya (Zhang et al., 2017); Roseae+Rubeae, 59.51 Mya (Zhang et al., 2017); Maleae Crown, 50.06 Mya (Zhang et al., 2017;Li et al., 2017) to evaluate the divergence time of Prunus(node 1) and the time of Cerasus Crown (node 2). These two nodes were subsequently used to calibrate the node ages in the second tree of haplotypes of Cerasus serrulata . The GTR substitution model and the uncorrelated log-normal relaxed clock and Yule process as for tree prior were selected. MCMC was run for 1 ×10 7 generations with sampling every 5,000 generations. The parameters were checked in Tracer ver. 1.7.1 (Rambaut et al., 2018). Final phylogenetic trees were edited in FigTree v. 1.4.

Genetic diversity
The total length of aligned sequences for the three chloroplast fragments (matK , trnD-E , trnS-G ) was 2161 bp with 25 variable sites, including 21 single nucleotide variant sites, 2 6-bp insertions and 5 deletions in 323 individuals, and 661 bp for ITS with 10 variable sites in 174 individuals from 18 Cerasus serrulatapopulations. Results of diversity analyses were shown in Table 1. At species level, the haplotype diversity (H d ) was 0.828 and the ribotype diversity (R d ) was 0.673; the nucleotide diversity (P i ) of cpDNA and ITS were 1.490 and 2.521, respectively. All of the data suggested a high genetic diversity of Cerasus serrulata . At population level,H d varied from 0.000 to 0.700 whileR d varied from 0.000 to 0.733;P i ×10 -3 of cpDNA varied from 0.000 to 1.090 while that of ITS varied from 0.000 to 2.820. Population CPL showed the highest haplotype diversity both in cpDNA and ITS regions. Combing the geographic distribution ( Figure 1) along with haplotype distribution (Figure 1, 4), networks (Figure 2, 5) and cladograms (Figure 3, 6), four regions of groups were divided (Table 1). At region level, H d varied from 0.266 to 0.747 and R d varied from 0.246 to 0.752, whileP i varied from 0.280 to 1.940 for cpDNA, and from 0.370 to 1.670 for ITS. Both kinds of sequences revealed a lowest diversity level in Northwest China (NW). But differently, region in Southeast China (SE) showed a highest H d while region in Southwest China (SW) showed a highest Rd, indicating a highest haplotype/ribotype diversity level in South China.

Genetic structure
Analysis of molecular variance (AMOVA) revealed a high differentiation level of both cpDNA (F st =0.69325) and ITS (F st =0.78692) in C. serrulata (Table 2). It was estimated that 69.32% of the whole variations of cpDNA sequences and 78.69% of the whole variations of ITS sequences existed among populations, signifying a phylogeographic structure in C. serrulata (N ST =0.388 >G ST =0.328, P < 0.05) ( Table 3) Table 1), among which 8 haplotypes were unique only to one population, while haplotype H13 was the most common one, followed by H5. Number of haplotypes in each populations ranged from 1 to 4, whereas 5 populations owned only one kind of halotype.
The distribution showed a geographically relevant pattern with 3 geographic groups, also conferred by the haplotype network ( Figure 2) and phylogenetic tree (Figure 3). The first group distributed in the northwest (NW), clearly involving population MP, LiS and BTM, contained 3 haplotypes (H1-H3) with H3 being the most frequent one. The second group scattered in the southwest (SW), west from Dabie Mt. to the east of Hengduan Mt., comprising 6 haplotypes (H4-H9) in which H5 was the most dominant one. The third group dispersed from the northeast of Dabie Mt. to Korea, regarded as the east region (E), possessed the most abundant haplotypes (H10-H19), in which H13 took up the most common part.
In a higher resolution, Group E can be further separated into two groups: the northeast group (NE) involving haplotypes of H14-H19, and the southeast group (SE) comprising haplotypes of H10-H12, as four regional groups described above (Table 1). Nonetheless, group SE was not monophyletic in this partition. In a lower resolution, all the haplotypes could be allocated into 2 groups: group NW, and all the other groups, amalgamating group SW and E (NE+SE).

Distribution pattern and phylogenetic relationship of ITS ribotypes
A total of 9 ITS ribotypes were identified in the 174 individuals ( Figure 4, Table 1). Of all the ribotypes, 5 were unique only to one population and ribotype R4 was the most standard ribotype which covered 12 populations, and was the unique ribotype for 7 populations.
Taken together, the distribution pattern (Figure 4), ribotype network ( Figure 5) and phylogenetic tree ( Figure 6) exhibited a structure with 2 groups of these ribotypes which was consistent with the cpDNA haplotype relationship in a lower resolution. Group one, consisting of 5 ribotypes (R1-R5) with R4 being the most extraordinary ribotype, and involving 13 populations as the utmost group, located in Southeast China and the southwest region (SW+SE). Group two incorporated 4 ribotypes (R6-R9) , taking R7 as the main ribotype, distributed discontinuously in the northwest region and around Korea (including population FHS in China), covering the northwest and northeast groups (NW+NE).

Divergence time dating
The divergence time to the most recent common ancestor (TMRCA) ofC. serrulata was estimated to be around 10.6 Mya (PP=1; 95% HPD: 8.83-12.46 Mya), near the middle of Miocene (Figure 7, Table 4). During 10.6 Mya, the northwest group began its divergence from the other groups. When it came to 8.02 Mya, the southwest group and the east group involving the northeast and southest groups, started to separate away.

Population dynamics
In neutrality test, the value of Tajima's D and Fu'sF S of all populations and geographical groups was mainly positive, or unremarkable negative but P >0.05 (Table 5), denoting that the hypothesis of neutral evolution was not to be excluded. Mismatch distribution analyses implied that all geographical groups did not refuse to have experienced sudden expansion (P (SSD) > 0.05, P (Rag) > 0.05), except the northeast group whoseP (Rag)=0.04, slightly lower than 0.05, but also indicating a potential expansion (Table 6). Unimodal distribution curves detected in all and in each region further proved a recent demographic expansions (Figure 8). The expansion time of each groups were accessed according to the formula T =τ/2u (Rogers & Harpending, 1992;Rogers, 1995). It was estimated that the southwest group (SW) expanded first in 0.719 Mya (95% CI: 0.022-1.452), followed by the southeast group (SE) in 0.665 Mya (95% CI: 0.000-1.249), and the last northwest group (NW) in 0.586 Mya (95% CI: 0.000-0.684).

Genetic diversity
Cerasus serrulata is a widely distributed species in China of the genus Cerasus and is involved in the subtropical and temperate monsoon climate zones. Both cpDNA (H d =0.828,P i =1.490) and ITS (R d =0.673, P i = 2.521) analysis proved a high genetic diversity of C. serrulata , higher than wild C. pseudocerasus (cpDNA: h =0.557, π =2.09) (Chen et al., 2015). The diversity based on ITS sequences was lower thanC. dielsiana (ITS: H d=0.879, π=3.56) (Zhu et al. , 2019), which was ought to be a result of less sampling, might not really mean a lower diversity level. The broad spreading pattern, allowed by its high adaptation ability, must be a significant reason for the high diversity, since diverse topographies such as alps, mountains and hills, as well as plains, additionally along with different climate zones have provided variant habitats for C. serrulata during adaptation in diffusion which impelled variations and high diversity. Obviously, large sampling, covering the most of its natural distribution records, was also a fundamental factor. Besides, another reason may be the long evolutionary history of C. serrulata , which was dated back to 10.60 Mya, even earlier than C. dielsiana (6.28 Mya) (Zhu et al., 2019), because the long existence allowed a more multiple accumulation of variants, especially strengthened by a wide and strong adaptation.  (Lv, 2006;Su, 2007), C. jamasakura (SSR:F ST = 0.043) (Tsuda et al., 2009). This might because both C. serrulata and C. dielsiana are much more widely distributed than most of species in Cerasus . Additionally, adequate seed flow were also believed to weaken the interspecific divergence of C. pseudocerasus and other similar Cerasusspecies (Chen et al., 2015;Kato et al., 2011;Tsuda et al., 2009;Zhu et al., 2019).
Together of the cladograms and networks of both cpDNA and ITS sequences, finally two groups of C. serrulata within all the regions were defined: one mainly consisted of the northwest groups (NW), and the other was primarily composed of the left groups (SE+SW). Nevertheless, the northeast group (NE), involving population FHS in Liaoning Province and a foreign population K-NS in Korea, appeared a confusing phylogeographic pattern. It was assigned to group SE+SW in cpDNA analysis, but was related to group NW in ITS analysis. An earlier study based on nuclear SSRs, involving the main northeast and southeast populations of C. serrulata , indicated a consistent connection within the eastern regions involving population FHS and K-NS (Yi et al., 2018), which in accordance with the result of cpDNA analysis, making the distribution more puzzled. N ST /G ST analysis manifested no or weak phylogeographic structure within groups in northwest (NW) (N ST =0.305 <G ST =0.461, P > 0.05) and northeast (NE) (N ST =0.496 >G ST =0.473, P < 0.05). Actually,C. serrulata distributions were also recorded in Hebei Province and Heilongjiang Province in China, as well as in Japan, but we failed to obtain samples in these regions. Thus, the lack of the comprehensive sampling in Northeast China and Japan was mainly ought to be the reason of this confusion.

Demographic dynamic and potential glacial refugia
Collectively, two genealogies of C. serrulata were detected: the northweast (NW) genealogy and the other genealogy consisted of the left groups involving the northeast (NE), southeast (SE) and southwest (SW) groups in our sampling areas, located mainly in the south and east of China within the subtropical and temperate monsoon climate zones. But the phylogeographic pattern did not compare the geographic pattern or climate zones.
The most recent common ancestor of C. serrulata appeared around 10.6 Mya in the mid-late of Miocene, which was also the divergence time of the two genealogies, calculated in the molecular dating analysis. To be detailed, the northwest group (NW) which formed one lineage began to divided out firstly in 10.6 Mya; afterwards, the southwest (SW) and the east (E, containing NE+SE) groups in the other lineages started to differentiate away from each other during 8.02 Mya (Fig. 7, Table 4). In other species, adjacent time was also detected of C. dielsianaancestor (6.28 Mya) (Zhu et al., 2019), Polystichumglaciale ancestor (6.89 Mya) (Dong et al. , 2018) and the divergence of Ixiolirion tataricum and I. songaricum(˜7 Mya) . It seems that the divergence event of many species was concentrated during such a period of time, and this time overlapped with that of the Himalayan movement and that of the Arctic ice-sheet to begin developing while the global climate was cooling down (Liu et al., 1998) and drier (Zhao et al., 2018;Ma & Tian, 2015). The strong landform and climate changes might lead to such a concentration of species divergence. In this, the uplifting of Qinling Mountains might took place accompanied by the Himalayan movement (Xue et al., 2004), and gradually isolated the north groups from the other, finally resulting in the divergence of the two geographical lineages.
Demographic expansion was recognized both in neutrality test and mismatch distribution analysis, except the northeast group (NE). The expansion time of group southwest (SW), southeast (SE) and northwest (NW) were dated back to 0.719 Mya, 0.665 Mya and 0.586 Mya, respectively (Table 3). All the time was before the the Last Glaciation Maximum (LGM, from 0.0265 Mya to 0.019-0.02 Mya) during the Last Glaciation of Quaternary glacial period, indicating that the distribution center and pattern of C. serrulata was formed before the LGM, which was accordant to that of genus Cerasus (Cao, 2006;Li, 2009). Comparing these time with the global temperature change curve of the Earth since 5 Mya (Cao, 2006;Li, 2009;Wang, 2014), it was found that these time was all sited in the uprising temperature periods in the frequent coolingwarming climate oscillations (Lisiecki & Raymo, 2005). The warmed-up temperature which also brought humidity should have enabled the expansions. Besides, within all these populations, CPL possesses the highest diversity which was located in Guizhou Province in the northwest group (NW). Regarding the divergence time and genetic diversity level together, two lineage colonization events were inferred. One was confined to Qingling Mountain and Taihang Mt. The other comprised two separate steps: firstly from the Wuling Mt. to Jiangnan Hilly; and then northeast to the coast of Asia. Glacial refugia which protected species to survive are usually considered to be large mountains. Nevertheless, importance of microrefugia such as smaller massifs or lowland sites is rising more and more attentions (Tomasz et al., 2019;Zhu et al., 2019). The broad land of the southeast of Asia, linked together by Bohai Sea and the coast of Asia today as sea level declined resulted from the glaciation (Guo et al., 2014), as well as valleys in east-west mountains such as Qinling Mt, provided refuges and expansion accesses for many species (Wulufu, 1964;Zhou, 1997). In this, the large mountain of Qinling, Wuling and Taibai Mts in Korea, as well as some microrefugia in Jiangnan Hilly especially the Huang Mt., Lu Mt. and Tianmu Mt. were inferred to be glacial refugia for C. serrulata , also having regard to their higher genetic diversity. In addition, the consistent distribution pattern in cpDNA and SSR analysis (Chen, 2016;Yi et al., 2018) of the northeast (NE) and southeast (NW) groups which are isolated by the Bohai Sea is also elucidated by the conjunction of Korea and eastern China during glacial period. All the lineage colonizations and potential refugia are generally coincident with the result of phylogeographic analysis of Cerasus (Li, 2009), indicating that the wide spread C. serrulta can be a representative species of the genus Cerasus in the evolutionary process.

Taxonomic reconsideration
Cerasus serrulata , a complex with ample variations and transition phenotypes some of which are morphologically specialized in the process of diversification by specific environments, has brought about taxonomic controversies. Here, C. xueluoensis , published in 2013 (Nan et al., 2013), treated as an outgroup in this study, was totally mixed up with haplotypes or ribotypes of C. serrulata in cladograms built by both cpDNA and ITS sequences (Figure 3, Figure 6). To further discuss the relationship between C. xueluoensis and C. serrulata , more wild related species were added to rebuild the cladograms based on ITS sequences. This time, C. xueluoensis was separated from the clade formed by all ribotypes of C. serrulata (Figure 9). However, more adequate and accurate information is needed to define the interrelationship between C. xueluoensis and C. serrulata .
Interestingly, C. serrulata var. lannesiana was significantly detached from C. serrulata in this tree ( Figure  9), supporting its treatment as C. lannesiana . C. laoshanensispublished in 2017 was described to resemble C. serrulata (Zang, 2017). Nevertheless, population LaoS in this study representing C. laoshanensis , showed no specificity from C. serrulataindividuals in any relationship analysis. Thus, it was endorsed to mergeC. laoshanensis into C. serrulata . Besides, C. huangangensis , a record in 2007 (Yi, 2007), represented by population HGS which showed distinctiveness as exclusively possessed only one kind of haplotype or ribotype both in cpDNA and ITS (Figure 1, Figure 4), also within the big clade in both trees of C. serrulata , is ought to treated as C. serrulata var. huangangensis .

Conclusion
We found dominant genetic diversity and existence of phylogeographic structure in Cerasus serrulata . The estimated divergence time and expansion events further indicated that the distribution pattern and center of C. serrulata was formed before LGM. Two geographical lineage were inferred. One was confined to Qingling Mountain and Taihang Mt. The other was from the Wuling Mt to Lu-Huang Mt, and then went northeast to the coast of Asia, totally with two separate steps. Taxonomic statements of several controversial related species were reconsidered. C. xueluoensis was close to C. serrulata but also exhibited differentiation at species level, which needs further study. C. lannesiana is confirmed rather than C. serrulatavar. lannesiana . And finally a new variation, C. serrulatavar. huangangensis is identified.     Figure7 BEAST-derived chronograms of Cerasus serrulata based on cpDNA sequences. Numbers at each node indicate the node age (million years ago, Mya). Critical nodes are marked in red circles and their node ages are enlarged in red squares. The blue bars illustrate the extent of the 95% highest posterior density (HPD) credibility intervals for each divergence time.