2.2 | Population structure and phylogenetic
relationship
Expected (H E) and observed
(H O) heterozygosity, inbreeding coefficient
(FIS ), and population differentiation values
(F ST) were calculated using Stacks. Population
structure among the 23 native and six introduced populations was
analyzed using three complementary approaches.
First, the most likely number of genetic clusters (i.e., K) in
the dataset was estimated, and individuals were assigned into each of
them using fastSTRUCTURE v1.040 (Raj et
al. 2014). The algorithm ran following an admixture model with allele
frequencies correlated and without any geographic a priori about
localities. The algorithm was parallelized and automated using
Structure_threader (Pina-Martins et al.
2017), and ran for K ranging from one to 29. The chooseK.pyfunction was used to select the most likely number of genetic clusters.
Plots were created by Distruct v2.3
(Chhatre 2019) (available at
http://distruct2.popgen.org).
Second, genetic clustering was described using a principal component
analysis (PCA) and a discriminant analysis of principal components
(DAPC), creating discriminant functions that maximize variance among
groups while minimizing variance within groups
(Jombart et al. 2010). The most likely
number of genetic groups was first inferred by the find.clustersalgorithm on the principal component analysis PCA data, with the
Bayesian information criterion utilized to select the number of genetic
groups. The optimal number of principal components to inform the DAPC
(i.e., maximizing discriminatory power between groups, while
preventing overfitting) was then defined using the functionoptim.a.score . Both the PCA and DAPC were performed in R
(R Core Team 2020) using theadegenet package (Jombart 2008).
Third, population structure was visualized using the relatedness matrix
produced by the RADpainter and fineRADstructure software
(Malinsky et al. 2018). This method
calculates co‐ancestry between samples as an independent assessment of
population structure. Analyses ran using default parameters of 100,000
burn‐in and 100,000 MCMC iterations, and results were visualized in R
through scripts provided with the program (available at
http://cichlid.gurdon.cam.ac.uk/fineRAD
structure.html).
Phylogenetic relationships among R. flavipes individuals were
inferred using maximum likelihood (ML) analysis implemented in RAxML
v8.2.12 (Stamatakis 2014). In addition,
16 individuals of the sister species R. virginicus were used as
an outgroup. An acquisition bias correction was applied to the
likelihood calculations, removing invariant sites from the alignment
through the Phrynomics R script (available at
https://github.com/bbanbury/phrynomics/). The rapid bootstrap
analysis and search for the best-scoring maximum likelihood tree was
performed using the extended majority rule (MRE)-based bootstopping
criterion (Pattengale et al. 2010).
Analysis was performed using the GTR+G nucleotide substitution model.