2.2 SNP filtering regimes
The genetic datasets were imported into R as genlight objects and
filtered using the dartR package v2.0.4 (Mijangos et al. 2022) and
custom functions written in R v4.2.1 (R Core Team 2022). The number
of loci and individuals remaining after each filtering step is presented in
Table 2.
The first step in both filtering regimes controlled for very close
physical linkage by keeping only one randomly selected SNP per sequenced
fragment (i.e., removing secondaries; method = ‘random’). The resultant
datasets served as the starting point for both regimes:
a) ‘Standard’ regime. The next filtering step removed SNPs with
exceptionally low (< 5) or exceptionally high (more than twice the
average) read depth, followed by the removal of SNPs with large amounts
of missing data (> 70th percentile). At this point,
individuals with > 20% missing data were dropped from the
datasets, as were loci that became monomorphic as a result.
b) ‘Removing sex-linked loci’ regime. We started by removing
sex-linked loci with the function filter.sex.linked. For EYR, all but
one individual in the input genlight object were of known sex (352 females
and 429 males), while for YTH, 636 out of 641 individuals had known sex (289
females and 347 males). The output was used to infer the genetic sex of
all individuals with the function infer.sex, which refined the sex
assignments (one de novo sex assignment and one re-assignment in EYR,
and five de novo sex assignments in YTH).
We continued by removing highly heterozygous SNPs with the function
filter.excess.het. The rest of the steps (i.e., filtering for
read depth, missing data and monomorphic loci) were done using the same
parameters as for the ‘Standard’ regime.
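
The order of the shared filtering steps can be illustrated with a minimal sketch. It is written in plain Python on toy data rather than with the dartR calls used in the study, and it rests on several assumptions: all locus names, genotypes and read depths are invented; the 70th-percentile cutoff is read as a nearest-rank percentile of per-locus missingness; the heterozygosity cutoff (max_het = 0.75) is a hypothetical stand-in for whatever criterion filter.excess.het applies; and the sex-linked filtering step itself is not sketched.

```python
import math
import random
from statistics import mean

# Toy data; all identifiers and values are hypothetical, not from the study.
# Calls are alternate-allele counts (0, 1, 2); None marks a missing genotype.
LOCI = [
    {"id": "f1_a", "fragment": "f1", "rdepth": 12.0, "calls": [0, 1, 2, 1, None]},
    {"id": "f1_b", "fragment": "f1", "rdepth": 11.0, "calls": [0, 0, 1, 1, None]},
    {"id": "f2_a", "fragment": "f2", "rdepth": 3.0, "calls": [0, 1, 1, 2, None]},
    {"id": "f3_a", "fragment": "f3", "rdepth": 60.0, "calls": [1, 1, 2, 0, None]},
    {"id": "f4_a", "fragment": "f4", "rdepth": 10.0, "calls": [0, 0, 0, 0, 1]},
    {"id": "f5_a", "fragment": "f5", "rdepth": 9.0, "calls": [None, None, 1, None, None]},
    {"id": "f6_a", "fragment": "f6", "rdepth": 14.0, "calls": [1, 1, 1, 1, None]},
]

def one_snp_per_fragment(loci, rng):
    """Keep one randomly chosen SNP per sequenced fragment."""
    by_fragment = {}
    for loc in loci:
        by_fragment.setdefault(loc["fragment"], []).append(loc)
    return [rng.choice(group) for group in by_fragment.values()]

def missing_rate(calls):
    """Fraction of genotype calls that are missing."""
    return sum(c is None for c in calls) / len(calls)

def observed_het(calls):
    """Fraction of called genotypes that are heterozygous (coded 1)."""
    called = [c for c in calls if c is not None]
    return sum(c == 1 for c in called) / len(called) if called else 0.0

def filter_read_depth(loci, lower=5.0, upper_factor=2.0):
    """Drop SNPs with read depth below `lower` or above upper_factor x the mean."""
    upper = upper_factor * mean(loc["rdepth"] for loc in loci)
    return [loc for loc in loci if lower <= loc["rdepth"] <= upper]

def filter_locus_missingness(loci, percentile=0.70):
    """Drop loci whose missing rate exceeds the nearest-rank percentile
    of per-locus missing rates (an assumed reading of the cutoff)."""
    rates = sorted(missing_rate(loc["calls"]) for loc in loci)
    cutoff = rates[math.ceil(percentile * len(rates)) - 1]
    return [loc for loc in loci if missing_rate(loc["calls"]) <= cutoff]

def drop_individuals(loci, max_missing=0.20):
    """Drop individuals with more than `max_missing` missing calls."""
    n_ind = len(loci[0]["calls"])
    keep = [i for i in range(n_ind)
            if missing_rate([loc["calls"][i] for loc in loci]) <= max_missing]
    trimmed = [dict(loc, calls=[loc["calls"][i] for i in keep]) for loc in loci]
    return trimmed, keep

def drop_monomorphic(loci):
    """Drop loci where every called genotype is the same homozygote."""
    def polymorphic(calls):
        called = {c for c in calls if c is not None}
        return bool(called) and called not in ({0}, {2})
    return [loc for loc in loci if polymorphic(loc["calls"])]

def run_regime(loci, max_het=None, seed=1):
    """max_het=None mimics the 'Standard' order of steps; a numeric value
    adds a heterozygosity filter before the read-depth step."""
    step = one_snp_per_fragment(loci, random.Random(seed))
    if max_het is not None:
        step = [loc for loc in step if observed_het(loc["calls"]) <= max_het]
    step = filter_read_depth(step)
    step = filter_locus_missingness(step)
    step, kept = drop_individuals(step)
    return drop_monomorphic(step), kept

standard_loci, standard_inds = run_regime(LOCI)
het_loci, het_inds = run_regime(LOCI, max_het=0.75)
print([loc["id"] for loc in standard_loci], standard_inds)
print([loc["id"] for loc in het_loci], het_inds)
```

In this toy example the fifth individual is dropped in both runs for excess missing data, which turns the locus on fragment f4 monomorphic so that it is removed in the final step. The all-heterozygous locus on fragment f6 survives the first run but not the run with the heterozygosity filter.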