2.8. Outlier association with environmental variables
A potential association between outliers and geographic/environmental
variables (Major Archipelagos, Islet surface, Maximum altitude and SDI)
was assessed through a permutational multivariate analysis of variance
(PERMANOVA) with the function adonis2 in the ‘vegan’ R package
(Oksanen et al., 2022). A Euclidean distance matrix among individuals
was built on allele frequencies calculated with gl.percent.freqin the ‘dartR’ package (Gruber et al., 2018). Previously, missing
genotype values were imputed using the function ‘gl.impute’ in the dartR
package. Model selection was performed by first fitting a model with all
two-way interactions and all main terms, and then refitting the model
after excluding non-significant interactions. Significance of model
terms was assessed using marginal tests (1000 permutations). SNPs that
significantly discriminated between major archipelagos were identified
with logistic regression models using p<0.01 after adjusting
p-values to minimize false discovery rate (FDR). A genotype matrix coded
as 0/1/2 was obtained using function extract.gt in the ’vcfR’ R
package (Knaus and Grünwald, 2017, 2016) and used to visualize
geographic clustering of individual SNP outliers by a Principal
Components Analysis (PCA).
Results
We generated a total of 15 billion paired-end GBS raw reads for 100
specimens from eight distinct localities/islands (99% passing the
filters, Table S2). Newly generated data and previous RADSeq data from
Bassitta et al. (2021) were individually processed through the
same pipeline and combined for an integrative analysis of all 16
populations. After genome mapping, variant calling and filtering, we
obtained a total of 143272 biallelic SNPs for the GBS dataset (100
individuals, 8 localities), 166943 for the RADseq dataset (91
individuals, 10 localities) and 5523 for the COMBINED dataset (191
individuals, 16 localities, including only common loci between GBS and
RADSeq datasets) (Table 2). As expected, multiallelic variants
represented a minor fraction of the total polymorphic sites (2.1% for
GBS and 2.5% for RADSeq) and were excluded from the study.
Both raw and filtered SNPs were widely distributed across the 18
autosomal and two sexual chromosomes (Z and W), with SNP frequencies
being proportional to chromosome size (Figure 2; chromosome W
representation strictly depends on the number of males in the dataset,
see Table S1). Both sequencing methods covered the entire genome
diversity, with chromosome representation being highly congruent among
the three datasets (Figure 2).