2.3 | Population genomics
To identify and characterize genetic clusters within the combined river
dataset, and within the river-only datasets, two methods were used for
comparison: a Bayesian assignment test and a discriminant analysis of
principal components (DAPC). The Bayesian assignment test was performed
using the software package fastStructure (Raj et al., 2014). DAPC
analyses were run using the adegenet package for R (Jombart,
2008; Jombart et al., 2010).
Large modern SNP datasets can impose challenging computational
requirements in time and processing power. The software package
fastStructure uses efficient algorithms within a Bayesian framework to
infer the total number of genetic clusters (K) in the data and to
assign fish to clusters based on individual genotypes, without a
priori population definitions (Falush et al., 2007; Hubisz et al.,
2009; Pritchard et al., 2000; Raj et al., 2014).
The range of potential K values chosen included a value one above the
total number of sampling localities within each dataset (Pritchard et
al., 2000). Values of K = 1-11 were analyzed for the combined river
dataset, and K = 1-6 for each river separately. Additional parameters
used for fastStructure included '--cv=500', which enabled
cross-validation over 500 test sets. The supplemental program
Structure_threader was used to reduce the overall processing time
required by fastStructure by automating runs and parallelizing them
across multiple CPU threads (Pina-Martins et al., 2017).
Structure_threader also automated
the identification of the most appropriate K value for each dataset
using the fastStructure chooseK.py script to pinpoint the
value of K that maximizes marginal likelihood (Raj et al., 2014).
Population memberships and admixture from fastStructure outputs were
visualized and plotted using Distruct v.2.3 (Chhatre, 2019).
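
The model-selection criterion described above (choosing the K that maximizes marginal likelihood) can be illustrated with a minimal Python sketch. This is not the chooseK.py script itself. The function name `choose_k`, the tie-breaking rule (prefer the smaller K), and the numeric values are assumptions made for illustration only and are not results from this study.

```python
def choose_k(marginal_likelihoods):
    """Return the K with the largest marginal likelihood.

    marginal_likelihoods: dict mapping K (int) -> marginal likelihood (float).
    Assumption: ties are broken in favour of the smaller K (simpler model).
    """
    if not marginal_likelihoods:
        raise ValueError("no K values supplied")
    # Sort key: highest likelihood first, then smallest K.
    return min(marginal_likelihoods,
               key=lambda k: (-marginal_likelihoods[k], k))


# Hypothetical marginal likelihoods for K = 1..6 (illustrative values only).
example = {1: -0.952, 2: -0.931, 3: -0.918, 4: -0.921, 5: -0.925, 6: -0.930}
best_k = choose_k(example)
print("K with maximum marginal likelihood:", best_k)
```

In practice the per-K marginal likelihoods would be read from the log files produced by each fastStructure run rather than typed in by hand.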
A DAPC identifies differences between groups through discriminant
functions (Jombart et al., 2010). The sampling localities within each
river system were used as the groups in this test. A DAPC analysis can
be substantially affected by the user-defined number of principal
components (PCs) to retain. The find.clusters and xvalDapc functions
within the R package adegenet provided a procedure for effective
cross-validation and optimization to identify the number of PCs to
keep for each dataset (Jombart & Collins, 2015). The number of PCs
retained for each analysis was therefore the number with the lowest
root mean squared error (RMSE) after 100 iterations for each candidate
number of PCs, from 1 to 100 for the combined dataset and from 1 to 50
for each independent river system.
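
The selection rule described above (retain the number of PCs with the lowest RMSE across cross-validation replicates) can be sketched in Python as follows. This is an illustrative sketch, not the adegenet implementation. It assumes that the error for each replicate is one minus the proportion of held-out individuals assigned to the correct group, and that ties go to the smaller number of PCs. The function names and the toy success rates are hypothetical.

```python
import math


def rmse_of_assignment(success_rates):
    """RMSE across replicates, taking error = 1 - proportion correctly assigned."""
    if not success_rates:
        raise ValueError("at least one replicate is required")
    squared = [(1.0 - s) ** 2 for s in success_rates]
    return math.sqrt(sum(squared) / len(squared))


def select_n_pcs(success_by_n_pcs):
    """Return (best number of PCs, RMSE per number of PCs).

    success_by_n_pcs: dict mapping a candidate number of PCs to a list of
    per-replicate assignment success proportions (each in [0, 1]).
    Assumption: ties are broken in favour of fewer PCs.
    """
    rmse = {n: rmse_of_assignment(r) for n, r in success_by_n_pcs.items()}
    best = min(rmse, key=lambda n: (rmse[n], n))
    return best, rmse


# Toy success rates for three candidate PC counts (illustrative values only).
toy = {5: [0.60, 0.70], 10: [0.90, 0.80], 20: [0.75, 0.85]}
best_n, rmse_table = select_n_pcs(toy)
print("PCs retained:", best_n)
```

With 100 replicates per candidate, each list in the input would hold 100 success proportions instead of the two shown here.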
Pairwise FST values between all localities within each
dataset were calculated as described in Weir & Cockerham (1984) using
the hierfstat package for the R platform (Goudet, 2005; R
Development Core Team, 2020). An analysis of isolation by distance was
conducted for the Volga-only and Meramec-only datasets using a Mantel
test. Pairwise FST values were linearized
(FST / (1 - FST)) following Rousset (1997), and river distances between
localities were used as the geographic distance measure. The Mantel
test was conducted
with 100,000 replicates in the R package ade4 (Dray & Dufour,
2007). Finally, a nested analysis of molecular variance (AMOVA) was
performed for each of the three datasets to further determine the
spatial structure of genetic diversity. The analyses were performed
using Arlequin v.3.5.2.2 (Excoffier & Lischer, 2010).
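
The isolation-by-distance analysis described above combines two steps: linearizing pairwise FST as FST / (1 - FST), and a permutation-based Mantel test against river distance. The following Python sketch illustrates both using only the standard library. It is not the ade4 implementation. The function names, the one-sided p-value formula (count of permuted correlations at least as large as the observed one, plus one, over the number of permutations plus one), and the toy distances are assumptions for illustration.

```python
import math
import random


def linearize_fst(fst):
    """Rousset (1997) linearization: FST / (1 - FST)."""
    if fst >= 1.0:
        raise ValueError("FST must be below 1")
    return fst / (1.0 - fst)


def _pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def _upper(matrix, order):
    """Upper-triangle values after reordering rows and columns by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]]
            for i in range(n) for j in range(i + 1, n)]


def mantel_test(genetic, geographic, n_perm=999, seed=0):
    """One-sided Mantel test; returns (observed r, permutation p-value)."""
    identity = list(range(len(genetic)))
    geo = _upper(geographic, identity)
    r_obs = _pearson(_upper(genetic, identity), geo)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        perm = identity[:]
        rng.shuffle(perm)  # permute rows and columns of one matrix together
        if _pearson(_upper(genetic, perm), geo) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)


# Toy example: four localities along a river (positions in km, hypothetical).
positions = [0.0, 10.0, 25.0, 45.0]
river_km = [[abs(a - b) for b in positions] for a in positions]
# Hypothetical pairwise FST that increases with distance (illustration only).
fst = [[0.002 * d for d in row] for row in river_km]
linear = [[linearize_fst(v) for v in row] for row in fst]
r, p = mantel_test(linear, river_km, n_perm=999, seed=1)
print("Mantel r = %.3f, p = %.3f" % (r, p))
```

The default of 999 permutations keeps the sketch fast. Matching the analysis described here would mean setting `n_perm=100000`.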