Identifying sex-linked loci using whole genome resequencing data
for Macquarie perch
Whole genome resequencing data (WGS) was obtained for 100 known-sex
Macquarie perch from Dartmouth and Yarra (25 individuals of each sex per
population, all of which were also DArT-sequenced). DNA was extracted
and an Illumina Novaseq library was constructed for each individual, as
explained above. Libraries were sequenced at Deakin Genome Centre to at
least 10X depth (>7 Gb of data per library; list of samples
and individual read-depth coverage in Supplementary Material S1).
First, we searched the Macquarie perch genome for loci with sex-specific
alleles (ii-iv above) using four in-silico pools:
Dartmouth females, Dartmouth males, Yarra females, and Yarra males (25
individuals of each sex per population). Each pool combined 25 million
paired-end reads randomly subsampled from each individual’s WGS data.
Reads from the four pools were mapped onto the newly-assembled Macquarie
perch reference genome (Table 1) using bwa-mem v0.7.17 (Li & Durbin,
2009) followed by filtering for mapping quality of >20
using samtools (Li et al., 2009) (samtools view -q 20). A
Cochran-Mantel-Haenszel (CMH) test implemented in PoPoolation2(Kofler, Pandey, & Schlötterer, 2011) was used to detect significant
and consistent allele-frequency differences (based on read counts)
between sexes in two biological replicates. After examining the genomic
spread of loci with sex-specific alleles detected under p<1e-5
significance threshold, we tested whether loci highly significantly
(p<1e-20) differentiated between sexes clustered in the same
genomic region.
One Macquarie perch scaffold contained a region enriched with loci
highly differentiated between sexes, and using individual WGS genotypes
we tested it for presence of XY-gametologs that could act as
sex-determining loci. WGS genotypes were obtained by aligning the
trimmed paired-end reads for each individual to the reference genome
using bwa-mem v0.7.17-r1188, followed by variant-calling on the merged
BAM files from all individuals, using strelka v2.9.10 (Kim et al.,
2018). Individual alignments for the putative sexing region in males
were visualized in Tablet (Milne et al., 2012) to investigate whether
the mate-pair architecture of the sequenced libraries resolved female-
and male-specific haplotypes (similar approach in Kijas et al., 2018).
We then investigated read-depth coverage for each base of the sexing
region, to test for sex-difference in ploidy. Read depth was collected
for each individual using the CollectAllelicCounts function of
GATK v4.1.0 (DePristo et al., 2011) from individual alignment files,
normalized by that individual’s genome-wide average read depth (the
number of all mapped reads multiplied by 151 bp (read length) and
divided by genome size), and averaged across the four sex-by-population
samples. We also calculated read depth for reference and alternative SNP
alleles of the sexing region separately.