Genome-wide association study – exploring correlations between
allelic frequencies and colour morphs
Because population stratification is known to be a confounding factor in
genome-wide association studies, we tested for the existence of
population clusters in our dataset with a) PCA performed in SNPrelate
and a discriminant analyses of principal components (DAPC) in R’s
package adegenet (Jombart, 2008). We searched for the likely number of
clusters with the function find.clusters and chose the likely
number of K according to Bayesian Information Criteria (BIC). The two
main linear discriminants, i.e., explaining most the largest amount of
variation due to putative population structure, were used as covariates
in downstream genome-wide association analyses (GWAS). Three approaches
were employed to detect associations between colour (binary phenotype
either grey or brown ) and underlying genetic variation
among RADtags. The p-values to consider associations significant
or suggestive were defined by correcting for multiple comparisons
according to the number of utilized SNPs - at the end of filtering steps
- following Guo (2017) (Guo et al., 2017). Specifically, the thresholds
(log(p) ) defined for suggestive and significant associations
corrected for the number of input loci ranged between 3.65 and 4.95 for
the 4755 loci utilized in PLINK’s GLM after filtering and between 4.10
and 5.40 for the 12673 loci utilized in GEMMA after filtering. First, we
utilized Plink2’s (Chang et al., 2015) generalized linear model
logistic-Firth hybrid regression while after applying the following
filters to the SNP dataset: Hardy-Weinberg Equilibrium = p-value
< 1E-6; minor allelic frequency = 0.05; SNP presence = 80%;
and r2 = 0.1 for pairwise linkage-disequilibrium. We
then utilized GEMMA (Zhou & Stephens, 2012) with the following filters:
SNP presence = 5%, minor allelic frequency = 0.05, Hardy-Weinberg
Equilibrium = 0.001. Family structure and/or pedigree was considered by
calculating and incorporating a relationship matrix with centered
genotypic variance. In GEMMA, associations between SNPs and covariates
were inspected with a univariate linear mixed model with one intercept
and two covariates corresponding to linear discriminants obtained with
DAPC. Lastly, we also used the R package wtest to further explore
allele-allele interactions (Sun et al., 2019). Wtest is based on
its namesake, the W-test (Wang et al., 2016), developed to post-hoc
explore epistatic interactions.