2.2 | Population structure and phylogenetic relationship
Expected (H E) and observed (H O) heterozygosity, inbreeding coefficient (FIS ), and population differentiation values (F ST) were calculated using Stacks. Population structure among the 23 native and six introduced populations was analyzed using three complementary approaches.
First, the most likely number of genetic clusters (i.e., K) in the dataset was estimated, and individuals were assigned into each of them using fastSTRUCTURE v1.040 (Raj et al. 2014). The algorithm ran following an admixture model with allele frequencies correlated and without any geographic a priori about localities. The algorithm was parallelized and automated using Structure_threader (Pina-Martins et al. 2017), and ran for K ranging from one to 29. The chooseK.pyfunction was used to select the most likely number of genetic clusters. Plots were created by Distruct v2.3 (Chhatre 2019) (available at http://distruct2.popgen.org).
Second, genetic clustering was described using a principal component analysis (PCA) and a discriminant analysis of principal components (DAPC), creating discriminant functions that maximize variance among groups while minimizing variance within groups (Jombart et al. 2010). The most likely number of genetic groups was first inferred by the find.clustersalgorithm on the principal component analysis PCA data, with the Bayesian information criterion utilized to select the number of genetic groups. The optimal number of principal components to inform the DAPC (i.e., maximizing discriminatory power between groups, while preventing overfitting) was then defined using the functionoptim.a.score . Both the PCA and DAPC were performed in R (R Core Team 2020) using theadegenet package (Jombart 2008). Third, population structure was visualized using the relatedness matrix produced by the RADpainter and fineRADstructure software (Malinsky et al. 2018). This method calculates co‐ancestry between samples as an independent assessment of population structure. Analyses ran using default parameters of 100,000 burn‐in and 100,000 MCMC iterations, and results were visualized in R through scripts provided with the program (available at http://cichlid.gurdon.cam.ac.uk/fineRAD structure.html).
Phylogenetic relationships among R. flavipes individuals were inferred using maximum likelihood (ML) analysis implemented in RAxML v8.2.12 (Stamatakis 2014). In addition, 16 individuals of the sister species R. virginicus were used as an outgroup. An acquisition bias correction was applied to the likelihood calculations, removing invariant sites from the alignment through the Phrynomics R script (available at https://github.com/bbanbury/phrynomics/). The rapid bootstrap analysis and search for the best-scoring maximum likelihood tree was performed using the extended majority rule (MRE)-based bootstopping criterion (Pattengale et al. 2010). Analysis was performed using the GTR+G nucleotide substitution model.