RESULTS
Identification and removal of sex-linked loci
The function filter.sex.linked identified and removed 3,807 sex-linked loci in EYR (10.7% of the total 35,663 loci tested; Table 3). Of these, 69.3% were identified based on differential call rate between the sexes (i.e., W-linked and sex-biased; Figure 3a, b) and 30.7% based on differential heterozygosity between the sexes (i.e., Z-linked and gametologs; Figure 3c, d). For YTH, the function identified 3,414 sex-linked loci (4.6% of the total 74,470 loci tested; Table 3), of which 65% were identified by call rate and 35% by heterozygosity (Figure S1).
Comparison of ‘before’ and ‘after’ datasets revealed that, when the function filter.sex.linked was not used, 28.7% (n = 1,093) and 19.0% (n = 650) of the sex-linked loci remained in the final SNP datasets of EYR and YTH, respectively. Standard locus filters varied in their efficiency at removing different types of sex-linked loci (Figure 4): together, the read depth and locus missing-data filters removed all W-linked loci, and 90% and 99% of sex-biased loci from the EYR and YTH datasets, respectively. However, they failed to remove 75% and 57% of Z-linked loci (EYR: n = 620 were not removed; YTH: n = 652), and 71% and 37% of gametologs (EYR: n = 241; YTH: n = 21). Other filtering steps, such as removing individuals with missing data and applying a minor allele count (MAC) filter, had little effect on removing additional sex-linked loci (Figure 4). As a result, 7.8% and 5.7% of the SNPs in the final datasets were sex-linked in EYR and YTH, respectively.
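The proportions reported above follow directly from the locus counts given in the text. The short Python sketch below recomputes them as a consistency check; the dictionary keys and the helper name pct are hypothetical and used only for illustration, and the counts are copied from the text rather than derived from the data.

```python
# Locus counts as reported in the text (Table 3 and the 'before'/'after' comparison).
# Key names are illustrative only and do not correspond to any dataset field.
reported = {
    "EYR": {"tested": 35_663, "sex_linked": 3_807, "retained": 1_093},
    "YTH": {"tested": 74_470, "sex_linked": 3_414, "retained": 650},
}


def pct(part, whole):
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


summary = {
    species: {
        # Share of all tested loci that were flagged as sex-linked.
        "pct_sex_linked": pct(counts["sex_linked"], counts["tested"]),
        # Share of sex-linked loci that survived the standard filters.
        "pct_retained": pct(counts["retained"], counts["sex_linked"]),
    }
    for species, counts in reported.items()
}

for species, values in summary.items():
    print(species, values)
```

The recomputed values agree with the percentages stated in the text (10.7% and 28.7% for EYR; 4.6% and 19.0% for YTH).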
Impact of removing sex-linked loci on population genetic diversity, individual heterozygosity, genetic structure and parentage analyses
Population genetic diversity. In general, removal of sex-linked loci decreased estimates of population genetic diversity (Figures S2 and S3). However, the magnitude of this change differed among measures of genetic diversity and, importantly, both the magnitude and the direction of the change varied across populations (Figure 5): the largest impacts were on FIS, which ranged from a 9.3% decrease to a 2% increase, and on private alleles (PA), which ranged from an 8% decrease to a 0.5% increase. Expected heterozygosity (He) decreased by 0.7% to 2.4%. The direction and magnitude of the change did not correspond to the F:M ratios of samples (EYR: Crusoe = 0.87, Muckleford = 0.93, Timor = 0.79, Wombat = 0.39; YTH: Cassidix = 0.94, Gippslandicus = 0.55, Melanops = 1.0, Meltoni = 0.1).
Individual observed heterozygosity (Ho). The removal of sex-linked loci produced a statistically significant change in individual Ho whose magnitude and direction varied between sexes and species (Table 4). For EYR, the decrease in female and male Ho was significant but small (F: 0.2% decrease, Cohen’s D = 0.35; M: 0.3% decrease, Cohen’s D = 0.23). For YTH, the change was an order of magnitude larger and went in opposite directions between the sexes: female Ho increased 3.8% (p-value < 0.001, Cohen’s D = -8.7) and male Ho decreased 2.9% (p-value < 0.001, Cohen’s D = 1.9). This opposite effect in male and female Ho translated into the disappearance of the significant (but misleading) difference between male and female Ho (p-value < 0.001) after the removal of sex-linked loci from the YTH dataset (p-value = 0.1; Table 5). There were no significant differences in Ho between the sexes in EYR before or after removing sex-linked loci.
Genetic structure. Before the removal of sex-linked loci, PC1 explained 2.4% of the genetic variation in EYR, and divided the individuals into two groups (Crusoe-Timor and Muckleford-Wombat; Figure 6a). PC2, on the other hand, explained 1.6% of the variation and captured genetic structure due to the presence of sex-linked loci: it divided the individuals into males and females (Figure 6b). This division between males and females disappeared from PC2 after removing sex-linked loci (Figure 6c, d). For YTH, none of PC1, PC2, PC3 or PC4 showed sex-related genetic structure, before or after using the function filter.sex.linked (Figure S4).
Accuracy of parentage analyses. For EYR, before removing sex-linked loci, an average of 3.83 runs out of five identified the correct parent. After removing sex-linked loci, the average increased significantly to 4.26 (p-value = 0.003; Table 6). We also found a significant association between the removal of sex-linked loci and the number of correct final parentage assignments (χ2 = 4.8, df = 1, p-value = 0.03): before removing sex-linked loci, 91 out of 119 (76.5%) final assignments were correct, compared to 104 out of 119 (87.4%) after removing sex-linked loci. For YTH (cassidix), removing sex-linked loci did not significantly change the average number of runs that correctly identified parents, which was already high (4.9 runs) before removal (Table 6).
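The association test for EYR can be illustrated with the counts given above. The Python sketch below assumes a Pearson chi-square test on a 2 × 2 table (correct vs. incorrect final assignments, before vs. after removal) without a continuity correction; the text does not state which variant was used, so this is an assumption, and the function name pearson_chi2_2x2 is hypothetical.

```python
import math


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns the statistic and its p-value for one degree of freedom.
    """
    observed = ((a, b), (c, d))
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # With df = 1, the chi-square upper tail equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Rows: before and after removing sex-linked loci.
# Columns: correct and incorrect final assignments (119 assignments per row).
chi2, p_value = pearson_chi2_2x2(91, 119 - 91, 104, 119 - 104)
print(f"chi2 = {chi2:.1f}, df = 1, p = {p_value:.2f}")
```

Under these assumptions the statistic rounds to 4.8 with a p-value of about 0.03, which is consistent with the reported values.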
Minimum number of known-sex individuals for the filter.sex.linked function
For EYR, 24 known-sex individuals (12 females and 12 males) were the minimum with which it was still possible to identify sex-linked loci: filter.sex.linked identified 267 loci, which represented 7% of the total sex-linked loci (Figure 7a, Table S1). For YTH, 30 known-sex individuals (15 females and 15 males) were the minimum: filter.sex.linked identified 61 loci, which represented 1.8% of the total sex-linked loci in the full dataset (Figure 7b, Table S1). With fewer known-sex individuals, the function was unable to identify any sex-linked loci.
For EYR, the filter.sex.linked function identified, on average, only 7.2% (range = 6.6-7.9%) of all sex-linked loci for the five subsets of 24 known-sex individuals (91.5% of all W-linked loci, 0% of all sex-biased, 0.1% of all Z-linked and 32.6% of all gametologs). For YTH, the filter.sex.linked function identified only 1.9% (range = 1.8-2.0%) of all sex-linked loci for the five subsets of 30 known-sex individuals (99.3% of all W-linked loci, 0.1% of all sex-biased, 0% of all Z-linked and 8.6% of all gametologs). These retrieved sex-linked loci allowed infer.sex to correctly identify the sexes of all individuals which it assigned as ‘M’ or ‘F’ (cf. marked as ‘*M’ or ‘*F’; 587 EYR and 519 YTH; the same individuals for the five sets). Re-running filter.sex.linked with the new 587 EYR and 519 YTH assignments identified 100% of all sex-linked loci for both EYR and YTH (3,807 and 3,414 sex-linked loci, respectively). It is likely that the function filter.sex.linked was able to identify sex-linked loci with fewer known-sex EYR individuals than YTH individuals because EYR has larger sex chromosomes (i.e., it has neo-sex chromosomes in which one portion of chromosome 1A fused to the Z chromosome while the other portion fused to the W chromosome; Gan et al. 2019). We recommend the use of at least 15 males and 15 females to allow the identification of all sex-linked loci, although a larger number might be needed for species with shorter, less differentiated or less variable sex chromosomes.