Demographic history simulation
To test whether the three putative subspecies represent separately
evolving lineages, we built models and simulated sequences under these
models using the ms software (Hudson, 2002). Ten models were constructed
to reflect demographic history of the three subspecies (Simulation 1):
(1) The three subspecies are not separate; (2) A. m. australasicaand A. m. eucalyptifolia are not separate; (3) A. m.
marina and A. m. eucalyptifolia are not separate; (4) A.
m. marina and A. m. australasica are not separate; (5) All three
subspecies are separate and A. m. marina diverged first; (6) All
three subspecies are separate and A. m. eucalyptifolia diverged
first; (7) All three subspecies are separate and A. m.
australasica diverged first. (8-10) have the same divergence topology
as models 5, 6 and 7 but with gene flow allowed between A. m.
marina and A. m. eucalyptifolia (Figure 3b). Long-term effective
population sizes of the populations (N) and coalescent times (T) were
common among all models. Notably, to reduce the complexity of parameter
setting and speed up computation, all population size parameters were
derived from a single parameter N0 randomly picked from
the prior distribution. In models with more than one lineage,
N0 was assigned to one lineage and the N for other
specific lineage was produced by multiplying N0 by
θx/θ0, where θx and
θ0 are the observed θ of the specific and the assigned
lineage respectively.
For each model, we performed 100,000 coalescent simulations using the ms
program (Hudson, 2002). Each simulation contained 80 loci of 1000 base
pairs. Mutation rate was set at 3.26e-8/generation/bp,
which was estimated by phylogenomic comparisons with closely related
species on whole genomes (He et al., 2020). The sample size of each
population was consistent with our field sampling described previously
(Table 1). Demographic parameters were drawn randomly from a uniform
prior distribution. Identical prior distributions of corresponding
parameters were set for models within each set (Supplementary Table 2).
Ten summary statistics were calculated for each simulated data set,
including segregating site number (S), Watterson’s estimator (θ),
nucleotide polymorphism (π) and Tajima’s D of each population, as well
as DXY and FST . Summary
statistics were calculated for each simulation independently. Euclidean
distances were calculated by comparing simulated statistics with
corresponding observed summary statistics. The tolerance of retaining
simulated data was set to 0.05. Bayesian posterior probabilities of each
model were then estimated following the Approximate Bayesian Computation
(ABC) schema (Beaumont, Zhang, & Balding, 2002) using the “abc”
package in R (Csilléry, François, & Blum, 2012). The “postpr”
function together with “neuralnet” option in the “abc” R package was
used to perform model selection.
We also built four models to test whether maBB genetically belongs toA. m. marina or A. m. eucalyptifolia(Simulation 2). In model v1 and v2, maBB (constant effective population
size of Nbb) and A. m. marina(Nma) coalesced at vT1 generations ago,
and then the common ancestor further coalesced with A. m.
eucalyptifolia (effective population size Neu) at
vT0 generations ago
(vT0>vT1). Model v1
differed from v2 by presence or absence of gene flow (m1and m2) between maBB and A. m.eucalyptifolia . Similarly, in models v3 and v4, maBB
(Nbb) coalesced with A. m. eucalyptifolia(Neu) at vT1 generations ago. The common
ancestor then coalesced with A. m. marina (effective
population size Nma) at vT0 generations
ago (vT0>vT1). Nine summary
statistics, namely Watterson’s estimator (θ) for each population and
pairwise FST and DXY, were
used in the model selection procedure similar to the one previously
described.