2.3 Molecular analyses
Putatively neutral markers were
assessed using a combination of multivariate methods to detect
underlying population structure, which we expected to coincide with
coastal and inland lineages described in previous studies. A principal
component analysis (PCA) was plotted for all populations based on allele
frequencies of putatively neutral markers determined to be without
linkage disequilibrium (LD). The
PCA of putatively neutral markers accompanies a discriminate analysis of
principal components (DAPC) with the R package adegenet 2.1.0 to assign
probability of individual membership to genetic groups, K ,
revealed with the putative neutral marker PCA (Jombart, 2008; Jombart &
Ahmed, 2011). The DAPC recovers maximum genetic variation between
groups, while minimizing genetic variation within groups (Jombart, 2008;
Jombart & Ahmed, 2011). The ‘find.clusters’ adegenet function ran for
25 instances for K =1 through K =10. The Bayesian
information criterion (BIC) was averaged and scaled by the standard
deviation for each K value. The most appropriate number of
genetic groups was determined with the greatest ΔK value as
described in Evanno et al. (2005).
The distribution of genetic variation underlying adult migration timing
in steelhead across the landscape was described by genotype frequencies.
We examined 13 markers occurring on chromosome 28 within thegreb1L, rock1 , and intergenic region between greb1L androck1 that were previously shown to be strongly associated with
adult migration timing (Hess et al., 2016; Micheletti et al., 2018a;
Table 1). Initially the two most significant SNPs were retained from a
previous RAD study (Hess et al. 2016), and the remaining 11 with the
strongest association with migration timing from the whole genome
resequencing conducted by Micheletti et al. (2018a).To reduce
ascertainment bias, we examined genetic variation in this candidate
region from several populations of O. mykiss in the region to
design primers (Table 1). Premature, mature, and heterozygote genotypes
for migration timing were established based on genotype association from
previous studies (Hess et al., 2016; Micheletti et al., 2018a), as well
as using reference Skamania broodstock genotyped,
which is a hatchery-strain
intensively selected for early migration and cultured since 1956 with
steelhead from the Washougal and Klickitat Rivers (Chilcote et al.,
1986). Premature, mature, and
heterozygote migration timing genotype proportions were assessed across
all collection locations. A PCA of allele frequencies of adaptive
markers was also conducted for all collection locations to assess
genetic groupings based on migration timing.
We assessed linkage disequilibrium (LD) within the 13 candidate markers
to identify haplotype blocks that would be informative for estimating
frequencies of migration types.
Haplotype
blocks within the 13 candidate markers were defined with solid spine LD
analysis in the Java Runtime Environment software, Haploview 4.2, across
all collection locations (Barrett et al., 2005). A solid spine of LD was
extended across a haploblock if D’, or a normalization of the
coefficient of LD, exceeded 0.74. The same markers were assessed for LD
in individuals from coastal and inland lineages (as delineated by DAPC)
separately. The effect of population structure on the LD of the markers
was corrected in the analysis with the LDcorSV 1.3.2 R package (Mangin
et al., 2012; Appendix S1 Table S2).
Variation of genotype proportions
were also evaluated with various groupings of the candidate markers.
Redundancy analyses (RDA) were conducted for all Columbia River basin
collections to model the degree to which the variation in environmental
variables explained the variation in allele frequencies of candidate
markers included in the haplotype blocks (Borcard et al., 1992; Kierepka
& Latch, 2015). Redundancy analysis was performed on two sets of
collections, all populations and each lineage (coastal vs. inland) using
the R package Vegan 2.5-6 (Oksanen et al., 2019). We used environmental
variables that were significantly associated with adaptive genetic
variation in a previous study (Micheletti et al., 2018b; Table 2;
Appendix S1 Table S3). When two highly correlated (>0.75
pairwise correlation; Asuero et al., 2006) environmental variables were
identified, one was removed from further analyses and the variable kept
was determined from biological relevance to salmonids according to
previous studies (Olsen et al., 2011; Hecht et al., 2015; Micheletti et
al., 2018b). One-way analysis of variance (ANOVA) with a Tukey’s range
test (Tukey, 1949) identified significant variability in salmonid
habitat. Environmental variables were analyzed with the “envfit” PCA
function of the vegan R package. The ANOVA test and PCA together
determined significant environmental variables within and among O.
mykiss habitats measured in this study. The final RDAs were run with
significant environmental variables retained from permutation tests with
1000 permutations (α=0.05). Frequency of alleles in the haplotype block
associated with migration timing were correlated to environmental
variables with RDA constraint scores. Constraint scores indicated the
degree of correlation and whether there was a positive or negative
relationship between environmental variables and allelic frequencies.