Genome-wide-association analyses
To estimate SNP-based heritability for each trait, we ran Bayesian
mixture models for each trait independently using the BAYESR software .
We selected this method as it has been found to be more accurate in
estimating additive genetic variance explained by SNPs than alternative
methods (e.g., BSLMM or LMM ). All models included TL and sex as
covariates (except when modelling TL explicitly, which included only sex
as a factor), and all traits were standardised to a mean of 0 and
standard deviation of 1. We also aimed to estimate pairwise genetic
correlations between traits using bivariate models in BAYESR. However,
none of these models converged, likely owing to the large sample sizes
usually required to estimate genetic covariances, and hence are not
reported further.
To identify SNPs underlying phenotypic variation, we performed a GWA
using a linear mixed effects model approach with the program GEMMA .
This program fits models that control for relatedness among samples
and/or population stratification, which reduced false positives even
with small sample sizes. GEMMA was selected because it implements
generalized linear mixed models for traits with non-normal
distributions, and therefore robustly handles data on different scales.
These models estimate the linear coefficient for the relationship
between each SNP in turn and a given trait. P -values for each SNP
were calculated using Wald tests. We used a false discovery rate of 5%
to correct for multiple testing and a P -value cut off of
10-5 when identifying whether a SNP was putatively
associated with trait variance. All models included sex as a fixed
factor and length as a covariate, except for the model with total length
as a response variable which just fix sex. The number of SNPs used in
each model was dependent on the number of individuals available for each
trait, because gemma only models SNPs with less than 5%
missingness. We then identified whether there was overlap between
regions that were associated with trait variation in our data, and
quantitative trait loci (QTLs) previously mapped on the stickleback
genome , using LIFTOVER and custom R scripts.
Genome-environment association
analyses
We investigated gene-environment associations using latent-factor mixed
models (LFMM) in the R package LEA , which models loci against
environmental variables while controlling for unobserved latent
variables (i.e. spatial structure or autocorrelation). Models were run
for 100 000 iterations, with 10 000 burn‐in cycles and 5 replicates.
z‐scores were combined from five runs, and we used adjustedP ‐values using the genomic control method (a recalibration
procedure which decreases the false discovery rate ). Following guidance
of , the resulting P ‐values for the outlier tests were adjusted
for multiple testing to Q‐values using the false‐discovery rate method
as implemented in the “qvalue” 2.6.0 package. A Q-value for SNPs of
< 0.05 was considered significant (i.e. a “candidate SNP”).
We compared DIC of models that included one ecological variable at the
time to try to identify which ecological variable best predicted the
genomic data. To identify whether gene-environment associations were
linked to phenotypic variation, we then investigated whether any of the
candidate SNPs identified in any of the LFMM analyses were linked
(within 5kb) to SNPs identified in GWA analyses.
To explore the molecular function of genomic regions that showed
signatures of selection, we first analysed candidate genes for
enrichment of molecular functions. To do this, we identified genes that
the candidate SNPs were within 5kb of and compared these candidate genes
with the reference set of 20 805 genes across the stickleback genome
(‘gene universe’). Gene ontology (GO) information was obtained from the
stickleback reference genome on ENSEMBL using the R package BIOMART ,
and functional enrichment was investigated using the package TOPGO 2.42
and the Fisher’s exact test (at P < 0.01). To reduce
false positives, we pruned the GO hierarchy by requiring that each GO
term had at least 10 annotated genes in our reference list (“nodeSize =
10”). Secondly, physical overlap on the genome between candidate SNPs
identified in environment-association analyses and GWA SNPs indicate
regions of the genome under selection. For regions of the genome
containing both environmentally associated candidate SNPs and GWAs SNPs,
we identified (1) the genes in this region (within 5kb), (2) whether
haplotypes on that gene in our dataset are predicted to cause variation
in protein translation, and (3) the function of the gene. We used the
program “snpEff” to detect whether SNPs fall on coding regions
changing amino acid sequence.