Haplotype network reconstruction
We assigned cpDNA SNPs to different haplotypes using the pegas R package v.1.1 . We constructed the haplotype network using the maximum parsimonious tree method available in PopART v.1.7 , with a 95% statistical parsimony criterion. To test for phylogeographic signal, we calculated the N ST, and theG ST , using Permut v.1.2.1 applying 1,000 permutation tests .
Inference of population size change in M. fragrans
Past demographic events in M. fragrans populations from the Moluccas were inferred using Approximate Bayesian Computation (ABC) analyses. ABC approaches estimate posterior probability distributions substituting the calculation of likelihood by comparing observed data with simulated data . Using the nSSR dataset, we characterized the demographic evolution through time for each population using the program DIYABCskylineplot . Coalescent simulations and the computation of prior parameters and summary statistics were done using command line in DIYABC v.2.0 . We set the log-uniform priors for scaled effective population size (θ ) ranging from 1 to 1,000,000 for each population. Time parameters were randomly drawn at the interval [1;2000], and uniform prior in the interval [0;1] for the generalized stepwise mutation model. We performed 100,000 simulations for each population.
To estimate the robustness of past demographic event inferences, we further explored the likelihood of demographic changes through time using an importance sampling algorithm under a maximum likelihood framework as implemented in MIGRAINE . First, we set the parameter of multistep mutation proportion (p GSM) from the range [0;1] on each population. We used the OnePopVarSize model since this model consider a single past change in population size. We determined high values of parameter points and trees to ensure smooth demographic projection with 20,000 trees, 500 points, and a total of 16 iterations. All demographic settings inferred by MIGRAINE were using the GSM model with a finite number of allelic states (K = 50).
Reconstructing population history of M. fragrans
We inferred the ancestry and divergence time of the genetic clusters defined by STRUCTURE (see Results section), using DIYABC Random Forest v.1.1.1 . Populations assigned to a given genetic cluster according to STRUCTURE results were pooled as a single genetic group to reduce demographic model complexity. For these analyses, we only kept individuals with a cluster assignment probability higher than 0.85.
We first performed the analysis on each dataset (nSSRs and cpDNA) separately to understand the behavior of confidence interval on the posterior distribution of demographic parameters. We then combined the two datasets and analyzed them together to assess the robustness of the analyses.
Two scenarios of species’ population divergence were tested with historical demographic parameters drawn from the prior distribution (Supplementary Table S1), with no admixture event included in the analysis (Supplementary Figure S1). We also linked this demographic prior distribution according to historical data of M. fragransdistribution in the Moluccas as stated by Hanna and Lape to infer the ancestry and divergence time of the genetic groups.
For each scenario, 100,000 coalescent simulations were performed. We checked for the congruence of simulation results under the different scenario with the observed data using (i) Principal Component Analysis (PCA) on the summary statistics from simulated data sets, and (ii) projecting the observed data on the first two PC axes. We used ABC with random forest (RF) to perform scenario choice and parameter estimation, in DIYABCRF v.1.1.1 module. For scenario choice, we used the maximum number of simulated data sets (100,000 each scenario) and the best scenario were chosen according to a classification vote. For parameter estimation, RFs of 1000 trees were used to obtain median point estimates for each demographic parameter in the model, as well as a 95% posterior credible interval.