2.2 SNP filtering regimes
The genetic datasets were imported into R as genlight objects and filtered using the dartR package v2.0.4 (Mijangos et al. 2022) and custom functions in R v4.2.1 (R Core Team 2022). The number of loci and individuals remaining after each filtering step is presented in Table 2.
The first step in both filtering regimes controlled for very close physical linkage by keeping only one randomly selected SNP per sequenced fragment (i.e., removing secondaries; method = ‘random’). The resultant datasets served as the starting point for both regimes:
a) ‘Standard’ regime. The next filtering step removed SNPs with exceptionally low (< 5) or exceptionally high (more than twice the average) read depth, followed by the removal of SNPs with large amounts of missing data (> 70th percentile). At this point, individuals with > 20% missing data were dropped from the datasets, as were loci that became monomorphic as a result.
b) ‘Removing sex-linked loci’ regime. We started by removing sex-linked loci with the function filter.sex.linked. For EYR, all but one individual in the input genlight object were of known sex (352 females and 429 males), while for YTH, 636 out of 641 individuals were of known sex (289 females and 347 males). The output was used to infer the genetic sex of all individuals with the function infer.sex, which led to a number of more accurate sex assignments (one de novo sex assignment and one re-assignment in EYR, and five de novo sex assignments in YTH). We continued by removing highly heterozygous SNPs with the function filter.excess.het. The remaining steps (i.e., filtering for read depth, missing data and monomorphic loci) were done using the same parameters as for the ‘Standard’ regime.
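For illustration only, the filtering steps shared by both regimes (one SNP per fragment, read depth, missing data per locus, missing data per individual, and monomorphic loci) can be sketched in base R on simulated data. This is a minimal sketch under stated assumptions, not the code used in this study: the function filter_standard, its argument names and the toy inputs are hypothetical. It further assumes that the upper depth limit is twice the mean depth of the SNPs retained after the first step, and that the missing-data cutoff is the 70th percentile of per-locus missingness.

```r
# Hypothetical helper: applies the shared filtering steps to a genotype matrix.
# geno:     individuals x SNPs matrix of alternate-allele counts (0/1/2, NA = missing)
# fragment: sequenced-fragment ID for each SNP (column of geno)
# depth:    mean read depth for each SNP
filter_standard <- function(geno, fragment, depth,
                            min_depth = 5, max_depth_factor = 2,
                            loc_miss_quantile = 0.70, ind_miss_max = 0.20) {
  # 1. Keep one randomly selected SNP per fragment.
  #    The length check avoids sample(x, 1) drawing from 1:x when x is a single index.
  keep_one <- function(idx) if (length(idx) == 1) idx else sample(idx, 1)
  keep <- sort(unname(unlist(lapply(split(seq_along(fragment), fragment), keep_one))))
  geno <- geno[, keep, drop = FALSE]
  fragment <- fragment[keep]
  depth <- depth[keep]

  # 2. Remove SNPs with very low or very high read depth
  #    (assumption: "high" means above max_depth_factor times the mean depth).
  ok <- depth >= min_depth & depth <= max_depth_factor * mean(depth)
  geno <- geno[, ok, drop = FALSE]
  fragment <- fragment[ok]
  depth <- depth[ok]

  # 3. Remove SNPs whose missingness exceeds the chosen percentile
  #    of the per-locus missingness distribution.
  loc_miss <- colMeans(is.na(geno))
  ok <- loc_miss <= quantile(loc_miss, loc_miss_quantile, names = FALSE)
  geno <- geno[, ok, drop = FALSE]
  fragment <- fragment[ok]
  depth <- depth[ok]

  # 4. Drop individuals with too much missing data.
  ind_ok <- rowMeans(is.na(geno)) <= ind_miss_max
  geno <- geno[ind_ok, , drop = FALSE]

  # 5. Drop loci left with a single allele (all 0 or all 2) or with no calls.
  is_mono <- apply(geno, 2, function(g) {
    g <- g[!is.na(g)]
    length(g) == 0 || all(g == 0) || all(g == 2)
  })
  keep_loc <- !is_mono
  list(geno = geno[, keep_loc, drop = FALSE],
       fragment = fragment[keep_loc],
       depth = depth[keep_loc])
}

# Simulated toy data (all values are arbitrary and for demonstration only).
set.seed(1)
n_ind <- 40
n_loc <- 200
geno <- matrix(sample(0:2, n_ind * n_loc, replace = TRUE), nrow = n_ind)
miss_rate <- runif(n_loc, 0, 0.3)                    # per-locus missing-call rate
for (j in seq_len(n_loc)) geno[runif(n_ind) < miss_rate[j], j] <- NA
fragment <- sample(seq_len(120), n_loc, replace = TRUE)
depth <- rgamma(n_loc, shape = 4, scale = 3)

out <- filter_standard(geno, fragment, depth)
dim(out$geno)   # individuals and loci remaining after filtering
```

In this sketch the individual-level filter is applied after the locus-level missing-data filter, following the order described above, so that poorly genotyped loci do not inflate the missingness of otherwise well-genotyped individuals.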