2.6. Phylogeographic pattern
TreeMix v1.13 (Pickrell and Pritchard, 2012) was used to infer
population splits and mixtures. The method uses allele frequencies to
build a graph-based model of the population network (rather than a
strictly bifurcating tree): it first builds a Maximum Likelihood (ML)
tree and then searches for migration events that increase the composite
likelihood (Flesch et al., 2020; Pickrell and Pritchard, 2012). Genetic
drift along each population branch (the drift parameter) is modelled
with a Gaussian approximation (Flesch et al., 2020; Pickrell and
Pritchard, 2012). Using the LD-filtered combined data set, an ML tree
was built with a window size (k) of 500 SNPs, evaluating 0 to 16
migration edges (m) with 10 iterations for each value of m, and using
the “-noss” option to prevent overcorrection for sample size. The
optimal number of migration edges was then inferred from the
second-order rate of change in log-likelihood (Δm) across successive
values of m with the R package OptM (Fitak, 2021).
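The Δm criterion can be illustrated with a short sketch. Assuming OptM's Evanno-style definition, Δm = |L̄(m+1) − 2L̄(m) + L̄(m−1)| / sd(L(m)), where L̄(m) is the mean log-likelihood over the iterations run with m edges, the function below computes Δm for every interior value of m. The function name `delta_m` and the toy log-likelihoods are hypothetical; this is not OptM's own code, and it assumes consecutive integer values of m with at least two iterations each.

```python
from statistics import mean, stdev


def delta_m(loglik_by_m: dict[int, list[float]]) -> dict[int, float]:
    """Evanno-style second-order rate of change of the TreeMix log-likelihood.

    `loglik_by_m` maps each number of migration edges m (consecutive integers)
    to the log-likelihoods of its independent runs (at least two per m).
    """
    ms = sorted(loglik_by_m)
    avg = {m: mean(loglik_by_m[m]) for m in ms}
    out = {}
    # Δm needs a neighbour on each side, so the smallest and largest m are skipped.
    for m in ms[1:-1]:
        second_diff = abs(avg[m + 1] - 2 * avg[m] + avg[m - 1])
        sd = stdev(loglik_by_m[m])
        # Sketch's choice: identical runs give sd = 0, reported as infinite.
        out[m] = second_diff / sd if sd > 0 else float("inf")
    return out


# Hypothetical log-likelihoods for m = 0..3, two iterations each.
runs = {0: [-100.0, -101.0], 1: [-50.0, -51.0], 2: [-45.0, -46.0], 3: [-44.0, -45.0]}
scores = delta_m(runs)
best_m = max(scores, key=scores.get)
print(best_m)  # prints 1
```

With 0 to 16 edges and 10 iterations per value, Δm is defined for m = 1 to 15, and the value of m with the largest Δm is taken as the optimal number of edges.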
In addition to the network analyses, and as a complementary result, an
ML tree was built from the concatenated SNP dataset (2867 bp). Prior to
concatenation, the combined VCF file was filtered with BCFtools
(Li, 2011) to remove individuals with more than 10% missing data (N = 8)
and candidate outlier SNPs (N = 18; see section 2.7). The filtered VCF
file was then converted to PHYLIP format using the vcf2phylip.py script
(Ortiz, 2019). Consensus sequences for each population were obtained
with the function consensusString in the R package ‘Biostrings’. The two
Cabrera localities were treated as a single population (FST = 0.03) and
their data merged. An ML tree was then built on the consensus alignment
with IQ-TREE 2 v2.2.0.8 (Minh et al., 2020), using variable sites only
and applying an ascertainment bias correction for SNP data (model
GTR+ASC) (Lewis, 2001), with 10,000 bootstrap pseudo-replicates.
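The missing-data filter, per-population consensus and variable-site steps can be sketched in Python on aligned SNP strings. This is a simplified illustration under several assumptions: missing calls are encoded as `N`, `-` or `?`, heterozygous calls as IUPAC ambiguity codes, and each consensus column keeps every base whose share reaches a threshold (0.25 here, chosen to mirror the Biostrings default as we understand it) and merges those bases into one IUPAC code. It is not a reimplementation of BCFtools or `consensusString`, and all function names are hypothetical.

```python
from collections import defaultdict

# IUPAC nucleotide codes; 'N', '-' and '?' are treated as missing (no vote).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
}
CODE_FOR = {frozenset(bases): code for code, bases in IUPAC.items()}
CODE_FOR[frozenset("ACGT")] = "N"
MISSING = set("N-?")


def missing_fraction(seq: str) -> float:
    """Share of positions in one individual's SNP string that are missing."""
    return sum(ch in MISSING for ch in seq.upper()) / len(seq)


def drop_high_missing(seqs: dict[str, str], max_missing: float = 0.10) -> dict[str, str]:
    """Keep individuals whose missing-data fraction is at most `max_missing`."""
    return {ind: s for ind, s in seqs.items() if missing_fraction(s) <= max_missing}


def consensus(seqs: list[str], threshold: float = 0.25) -> str:
    """Column-wise IUPAC consensus of equal-length aligned sequences."""
    out = []
    for column in zip(*seqs):
        freq = defaultdict(float)
        for ch in column:
            bases = IUPAC.get(ch.upper())
            if bases is None:          # missing call or gap: no vote
                continue
            for b in bases:            # an ambiguity code splits its vote
                freq[b] += 1 / len(bases)
        total = sum(freq.values())
        kept = frozenset(b for b, f in freq.items() if f / total >= threshold)
        out.append(CODE_FOR.get(kept, "N"))   # all-missing column -> 'N'
    return "".join(out)


def variable_sites(seqs: dict[str, str]) -> dict[str, str]:
    """Keep only columns showing at least two distinct unambiguous bases."""
    names = list(seqs)
    columns = [
        col for col in zip(*(seqs[n] for n in names))
        if len({ch.upper() for ch in col} & set("ACGT")) >= 2
    ]
    return {n: "".join(col[i] for col in columns) for i, n in enumerate(names)}


def population_consensus(seqs: dict[str, str], pop_of: dict[str, str],
                         max_missing: float = 0.10,
                         threshold: float = 0.25) -> dict[str, str]:
    """Missing-data filter, then per-population consensus, then variable sites."""
    groups = defaultdict(list)
    for ind, s in drop_high_missing(seqs, max_missing).items():
        groups[pop_of[ind]].append(s)
    cons = {pop: consensus(members, threshold) for pop, members in groups.items()}
    return variable_sites(cons)
```

Mapping both Cabrera localities to the same label in `pop_of` reproduces the merge described above. The variable-site step keeps the sketch compatible with an ascertainment bias correction such as +ASC, which in IQ-TREE does not accept invariant sites.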
To identify the most ancestral population in our dataset, we used IQ-TREE
2 with non-reversible substitution models (model 12.12) (Naser-Khdour et
al., 2022) and 1,000 ultrafast bootstrap replicates, using both the
consensus alignment and one randomly selected specimen per population
(all sites or variable sites only). The program performs a bootstrap
analysis to obtain a set of rooted ML bootstrap trees; it then computes
rootstrap support for each branch of the tree as the proportion of
rooted bootstrap trees that place the root on that branch. A root test
was then performed with the option --root-test to compare the
log-likelihoods of the trees rooted on every branch of the ML tree. The
resulting trees were visualized and edited with FigTree v1.4.4
(http://tree.bio.ed.ac.uk/software/figtree/).
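The rootstrap definition used by IQ-TREE (the proportion of rooted bootstrap trees whose root lies on a given branch) can be illustrated with a small Python sketch. It assumes the rooted bootstrap trees are available as plain Newick strings with unquoted taxon names and a bifurcating root, and it identifies each root position by the bipartition of taxa that the root branch induces. IQ-TREE computes rootstrap support itself; the functions below are hypothetical and only show the counting.

```python
import re
from collections import Counter


def taxa(subtree: str) -> list[str]:
    """Leaf names in a Newick subtree (unquoted names; internal labels ignored)."""
    # A leaf name follows the start of the string, '(' or ','; labels after ')' are skipped.
    return re.findall(r"(?:^|[(,])\s*([^(),:;\s]+)", subtree)


def root_split(newick: str) -> frozenset:
    """Root position of a rooted, bifurcating Newick tree, as a taxon bipartition."""
    s = newick.strip().rstrip(";")
    body = s[1:s.rindex(")")]          # drop the outermost parentheses
    children, depth, start = [], 0, 0
    for i, ch in enumerate(body):      # split at top-level commas only
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            children.append(body[start:i])
            start = i + 1
    children.append(body[start:])
    if len(children) != 2:
        raise ValueError("tree root is not bifurcating")
    return frozenset(frozenset(taxa(c)) for c in children)


def rootstrap(trees: list[str]) -> dict[frozenset, float]:
    """Share of rooted bootstrap trees placing the root on each bipartition."""
    counts = Counter(root_split(t) for t in trees)
    return {split: n / len(trees) for split, n in counts.items()}


# Hypothetical rooted bootstrap trees for four taxa.
boot = ["((A:1,B:1):1,(C:1,D:1):1);",
        "(A:1,(B:1,(C:1,D:1)90:1)80:1);",
        "((B,A),(D,C));"]
support = rootstrap(boot)
```

In this toy example two of the three trees root between {A, B} and {C, D}, giving that branch a rootstrap support of 2/3; the remaining 1/3 goes to the branch leading to A.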