Demographic history simulation
To test whether the three putative subspecies represent separately evolving lineages, we built models and simulated sequences under these models using the ms software (Hudson, 2002). Ten models were constructed to reflect demographic history of the three subspecies (Simulation 1): (1) The three subspecies are not separate; (2) A. m. australasicaand A. m. eucalyptifolia are not separate; (3) A. m. marina and A. m. eucalyptifolia are not separate; (4) A. m. marina and A. m. australasica are not separate; (5) All three subspecies are separate and A. m. marina diverged first; (6) All three subspecies are separate and A. m. eucalyptifolia diverged first; (7) All three subspecies are separate and A. m. australasica diverged first. (8-10) have the same divergence topology as models 5, 6 and 7 but with gene flow allowed between A. m. marina and A. m. eucalyptifolia (Figure 3b). Long-term effective population sizes of the populations (N) and coalescent times (T) were common among all models. Notably, to reduce the complexity of parameter setting and speed up computation, all population size parameters were derived from a single parameter N0 randomly picked from the prior distribution. In models with more than one lineage, N0 was assigned to one lineage and the N for other specific lineage was produced by multiplying N0 by θx0, where θx and θ0 are the observed θ of the specific and the assigned lineage respectively.
For each model, we performed 100,000 coalescent simulations using the ms program (Hudson, 2002). Each simulation contained 80 loci of 1000 base pairs. Mutation rate was set at 3.26e-8/generation/bp, which was estimated by phylogenomic comparisons with closely related species on whole genomes (He et al., 2020). The sample size of each population was consistent with our field sampling described previously (Table 1). Demographic parameters were drawn randomly from a uniform prior distribution. Identical prior distributions of corresponding parameters were set for models within each set (Supplementary Table 2).
Ten summary statistics were calculated for each simulated data set, including segregating site number (S), Watterson’s estimator (θ), nucleotide polymorphism (π) and Tajima’s D of each population, as well as DXY and FST . Summary statistics were calculated for each simulation independently. Euclidean distances were calculated by comparing simulated statistics with corresponding observed summary statistics. The tolerance of retaining simulated data was set to 0.05. Bayesian posterior probabilities of each model were then estimated following the Approximate Bayesian Computation (ABC) schema (Beaumont, Zhang, & Balding, 2002) using the “abc” package in R (Csilléry, François, & Blum, 2012). The “postpr” function together with “neuralnet” option in the “abc” R package was used to perform model selection.
We also built four models to test whether maBB genetically belongs toA. m. marina or A. m. eucalyptifolia(Simulation 2). In model v1 and v2, maBB (constant effective population size of Nbb) and A. m. marina(Nma) coalesced at vT1 generations ago, and then the common ancestor further coalesced with A. m. eucalyptifolia (effective population size Neu) at vT0 generations ago (vT0>vT1). Model v1 differed from v2 by presence or absence of gene flow (m1and m2) between maBB and A. m.eucalyptifolia . Similarly, in models v3 and v4, maBB (Nbb) coalesced with A. m. eucalyptifolia(Neu) at vT1 generations ago. The common ancestor then coalesced with A. m. marina (effective population size Nma) at vT0 generations ago (vT0>vT1). Nine summary statistics, namely Watterson’s estimator (θ) for each population and pairwise FST and DXY, were used in the model selection procedure similar to the one previously described.