Introduction
Accurate sex determination is an integral aspect of conservation genomics research, particularly when studying parameters such as relatedness, dispersal, and philopatry. Sexing of individuals used in conservation genomics studies typically takes place in the field at the time of collection. However, sex assignments recorded in the field are not always reliable and there is a wide margin for human error, particularly for species that do not demonstrate sexual dimorphism or when researchers are working in difficult conditions. Further, field records can easily be lost or incorrectly transcribed during trapping and monitoring. Genetic sex determination is a favourable alternative or complement to field identification, as it is an objective, highly standardized, and accurate approach that eliminates the possibility of upstream sex misidentification confounding genomic studies (Hrovatin & Kunej, 2017).
While PCR-based sex identification methods have been used for several decades to identify and amplify sex chromosomes in individual samples (Akane et al., 1992; Clapcote & Roder, 2005; McFarlane et al., 2013), such processes can be time-consuming and expensive. In addition, they require taxon-specific primers that are not always available or applicable to the target species. With the advent of high-throughput sequencing (HTS) technology, it is now possible to produce high-resolution genomic data that may allow researchers to determine the sex of sequenced individuals bioinformatically. For example, single nucleotide polymorphisms (SNPs) in the genome can often be linked to the sex chromosomes in model organisms, allowing sex to be determined on a chromosomal presence-absence basis (Fowler & Buonaccorsi, 2016; Lambert et al., 2016). For non-model organisms where a well-assembled and well-annotated reference genome is unavailable, the overall “dosage” of sequencing reads mapping to the sex chromosomes can be assessed to determine whether the individual is heterogametic or homogametic (Bover et al., 2018; Gamble, 2016; Gower et al., 2019; Pečnerová et al., 2017).
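As a minimal illustration of the read-dosage idea described above, the following Python sketch compares the length-normalised density of reads mapping to the X chromosome with that of reads mapping to the autosomes. It assumes an XY system in which males are heterogametic, so that a ratio near 0.5 suggests a single X copy and a ratio near 1.0 suggests two. The function names and the cutoff values (0.6 and 0.8) are hypothetical and chosen only for illustration; they are not taken from the studies cited here.

```python
def x_dosage_ratio(x_reads, x_length, auto_reads, auto_length):
    """Return X read density divided by autosomal read density.

    Read counts are normalised by the length (in bp) of the sequence
    they map to, so the ratio reflects relative copy number.
    """
    if x_length <= 0 or auto_length <= 0:
        raise ValueError("sequence lengths must be positive")
    if auto_reads <= 0:
        raise ValueError("autosomal read count must be positive")
    x_density = x_reads / x_length
    auto_density = auto_reads / auto_length
    return x_density / auto_density


def call_sex(ratio, low=0.6, high=0.8):
    """Classify a dosage ratio under an assumed XY system.

    The cutoffs are illustrative assumptions, not published values.
    Ratios between the two cutoffs are left unassigned.
    """
    if ratio < low:
        return "heterogametic"  # one X copy expected (ratio near 0.5)
    if ratio > high:
        return "homogametic"    # two X copies expected (ratio near 1.0)
    return "ambiguous"


if __name__ == "__main__":
    # Toy counts, invented purely to exercise the functions.
    ratio = x_dosage_ratio(x_reads=500, x_length=1000,
                           auto_reads=10000, auto_length=10000)
    print(ratio, call_sex(ratio))
```

In practice, any cutoffs would need to be calibrated against individuals of known sex, and an explicit "ambiguous" class avoids forcing a call on samples with low read depth.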
Read-dosage-based approaches to sex determination have been applied only to shotgun sequencing data, where molecules are randomly sampled and sequenced (Flamingh et al., 2020; Motahari et al., 2013; Skoglund et al., 2013). However, many conservation programs employ reduced-representation sequencing approaches (e.g. RADseq), where sequenced molecules belong to a subset of genomic loci. One commercial provider of reduced-representation sequencing that is growing in popularity in the conservation genomics field is Diversity Arrays Technology (DArT) (Cummins et al., 2019; Ewart et al., 2019; Pazmiño et al., 2018; Sansaloni et al., 2011; Schultz et al., 2018; van Deventer et al., 2020). The DArT workflow uses restriction enzymes to reduce genomic complexity, allowing identification of informative markers that are subsequently sequenced for all submitted samples (Kilian et al., 2012). However, despite the growing popularity of DArT for conservation genomics projects, no simple and widely applicable sex-determination framework has emerged that can be applied to DArT data. In the present study we apply a read-dosage sex-determination approach to DArT data from an Australian rodent, the Greater Stick-Nest Rat (Leporillus conditor), and demonstrate that, despite being originally designed for application to shotgun data, this method remains robust when applied to FASTQ files generated as part of the DArT workflow.