Genome-wide association mapping for dewlap traits
The genome-wide association study (GWAS) was based on all LD-filtered
markers. Prior to GWAS, missing data was imputed using BEAGLE v5.0
(Browning & Browning, 2016) at default settings. To investigate
correlations among the 11 traits that capture dewlap size (i.e., area
and perimeter) and color (i.e., hue, total brightness, mean brightness,
UV chroma, red chroma, yellow chroma, and composition of red, orange, or
yellow), we first used the ggcorrplot R package (v. 0.1.3;
Kassambara, 2019). The GWAS was then performed using two complementary
approaches, which we refer to as the “standard GWAS” and the
“ancestry-specific GWAS”. The standard GWAS consisted of a linear
mixed model implemented in GEMMA (Zhou & Stephens, 2012). We first used
GEMMA to calculate 10 relatedness matrices, by excluding SNPs from one
chromosome at a time (i.e., each matrix was based on SNPs from 9
chromosomes). The GWAS for markers on a given chromosome was then
performed using the relatedness matrix that excluded markers on that
same chromosome. This approach should improve power to detect
associations, given that the relatedness matrix does not include markers
that are in linkage disequilibrium with the marker being tested (Cheng
et al., 2013). We then obtained P values from the Wald test as
outputted by GEMMA.
The ancestry-specific GWAS was conducted in asaMap (Skotte et al.,
2019). To control for the confounding effect of population structure, we
relied on the first 10 PCs from a PCA computed using the adegenetR package (v. 2.1.1; Jombart & Ahmed, 2011), which were included as
covariates in the asaMap analysis. The PCA was based on all LD-filtered
autosomal SNPs. The asaMap software performs eight association tests,
with and without different ancestry-specific effects considered for two
ancestral populations (Skotte et al., 2019). Following asaMap manual
recommendations, we estimated genome-wide ancestry proportions of each
sample, as well as SNP allele frequencies for each ancestral population
using ADMIXTURE v. 1.3.0 (Alexander et al., 2009). For each dewlap
trait, we obtained genome-wide P values for all tests and
selected the test with the strongest association (i.e., the lowest
observed P value among all tested SNPs), following Bock et al.,
(2021).
For both the standard and the ancestry-specific GWAS results, we used
the GenABEL R package (Aulchenko et al., 2007) to calculate the
lambda genomic inflation factor (λ), and account for any remainingP value inflation, which can be caused by unaccounted population
stratification. Genome-wide significance thresholds were then set using
the Bonferroni method (0.05/total SNPs used for the association test),
which is the most conservative option used in association studies (e.g.,
Duggal et al., 2008; Kaler & Purcell, 2019). We also considered a
suggestive genome-wide significance threshold (1/total SNPs used for the
association test; Duggal et al., 2008), for which one false positive per
GWAS is expected. To visualize GWAS results, we used Manhattan plots,
made in R v.4.1.0 using publicly available scripts
(https://danielroelfs.com/blog/how-i-create-manhattan-plots-using-ggplot/).
Finally, to estimate the effect size of associations, we build linear
models in R. These models had number of alternative alleles (i.e.,
alleles different from those incorporated in the reference genome) at
the lead GWA SNP as the predictor variable, and the corresponding dewlap
trait as the response variable. Effect sizes were then estimated as
R2 values, using the “summary” function in R.
To better understand the observed effect of ancestry on dewlap traits,
we next focused on results of the ancestry-specific GWAS. Specifically,
we investigated how dewlap traits vary among genotype classes for
ancestry-specific associations that passed the Bonferroni or suggestive
significance thresholds. For this, we stratified the invasive A.
sagrei samples based on ancestry, using a cutoff of 70% Western Cuba
ancestry, as inferred using the STRUCTURE analysis. Because of the
genetic makeup of invasive A. sagrei populations across the
southeastern United States, this approach partitions samples into two
groups, one with limited hybridization (i.e., Western Cuba ancestry
samples) and another with frequent hybrids for which Central-eastern
Cuba ancestry predominates (see also Bock et al., 2021).
Identifying the signature of local adaptation for