Modelling the pattern of lineage-splitting within A. marina
To infer the lineage-splitting pattern within A. marina , we compared our real sequences against simulated sequences under eight models assuming a variety of topologies (Simulation 1). Simulated sequences under these models were produced using the ms software (Hudson, 2002). The models are: (1) panmictic; (2) eucalyptifoliaby itself and the other two subspecies together; (3) australasicaby itself and the other two subspecies together; (4) marina by itself and the other two subspecies together; (5) three separate lineages with eucalyptifolia diverging first; (6) three lineages with marina diverging first; (7) three lineages withaustralasica diverging first. (8) three lineages diverging simultaneously. In simulation 1, groups were divided according to subspecies designation in the prior. As a control, we constructed artificial groups by pooling two populations each from one subspecies. Using these groupings, we repeated the simulations and model selection on the eight models described above (Simulation 2).
The effective population sizes of the lineages (N) and coalescent times (T) were common among all models. Notably, to reduce the complexity of parameter setting and to speed up computation, all population size parameters were derived from a single parameter N0randomly chosen from the prior distribution. In models with more than one lineage, N0 was assigned to any one of the lineages (using as baseline). N of other lineages were produced by multiplying N0 by θx0, where θx and θ0 are the observed θ of the current and baseline lineage respectively.
For each model, we performed 100,000 coalescent simulations using the ms program (Hudson, 2002). Each simulation contained 80 loci of 1000 base pairs. Mutation rate was set at 3.26x10-8/generation/bp, estimated from phylogenomic comparisons to closely related species with whole genomes (He et al., 2020). The sample size of each group was consistent with our real field sampling (Table 1). Demographic parameters were drawn randomly from a uniform prior distribution. Identical prior distributions of corresponding parameters were set for models within each set (Table S3 & S4).
Ten summary statistics were calculated for each simulated data set, including segregating site number (S), Watterson’s estimator (θ), nucleotide polymorphism (π) and Tajima’s D within each group, as well asDXY and FST for each pair of groups. Summary statistics were calculated for each simulation independently. Euclidean distances were calculated by comparing simulated statistics with corresponding observed summary statistics. The tolerance of retaining simulated data was set to 0.05. Bayesian posterior probabilities of each model were then estimated following the Approximate Bayesian Computation (ABC) schema (Beaumont, Zhang, & Balding, 2002) using the “abc” package in R (Csilléry, François, & Blum, 2012). The “postpr” function together with “neuralnet” option in the “abc” R package was used to perform model selection.
We also built four models (v1, v2, v3, and v4) to test whether the population from Bunbury, Australia (BB, Table1) genetically belongs tomarina or eucalyptifolia (Simulation 3, Table S5). In model v1 and v2, BB (constant effective population size of Nbb) and marina (Nma) coalesced at vT1 generations ago and then the common ancestor further coalesced with eucalyptifolia (effective population size Neu) at vT0 generations ago (vT0>vT1). Model v1 differed from v2 by presence or absence of gene flow (m1and m2) between BB and eucalyptifolia . Similarly, in models v3 and v4, BB (Nbb) coalesced witheucalyptifolia (Neu) at vT1generations ago. The common ancestor then coalesced with marina(effective population size Nma) at vT0generations ago (vT0>vT1). Nine summary statistics, Watterson’s estimator (θ) for each population and pairwise FST and DXY , were used in the model selection procedure similar to the one previously described.