For species of conservation concern, individuals selected for reference genomes may not be the most representative, in that some are chosen to showcase individuals known to the public (e.g., animal ambassadors) or are the result of opportunistic sampling. Regardless, in most cases it is unlikely that a single individual accurately represents the genomic variation within the species as a whole, even in highly inbred species (Gao et al., 2019; McHale et al., 2012). The primary advantage of a pangenomic approach is that multiple alternative alleles are assessed during read mapping, effectively removing reference bias (Paten, Novak, Eizenga, & Garrison, 2017). In other words, sequence reads are not precluded from mapping when the variation they carry (e.g., insertions) is absent from the primary reference (Figure 3). The promise of pangenomes is exemplified in crop species such as soy, wheat and Brassica spp., where many more SVs are resolved, leading to a paradigm shift in crop improvement (Della Coletta et al., 2021; Golicz et al., 2016; Liu et al., 2020; Montenegro et al., 2017; Song et al., 2020). However, the ability of conservation programs to establish pangenomes largely depends on the availability of existing genomic resources and the ability to secure substantive funding. In lieu of a conventional pangenome, whereby all genomic variation is captured, the targeted sequencing and assembly of multiple representative individuals to construct genome graphs may provide significant improvements over the use of a single reference genome of comparable assembly quality (Figure 2). For a related approach, see Tigano et al. (2021).
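As an illustration of the genome graph approach described above, the minimal sketch below wraps a command line call to the minigraph tool in Python, incrementally adding representative assemblies to a primary reference to produce a graph in GFA format. It assumes minigraph is installed and on the PATH; the file names and thread count are hypothetical placeholders, not a prescribed workflow.

```python
"""
Minimal sketch: build a graph representation of a species' genomic variation
from a primary reference plus additional representative assemblies using
minigraph. Assumes minigraph is installed and on the PATH; all file names
are hypothetical placeholders.
"""
import subprocess
from pathlib import Path

PRIMARY_REF = Path("primary_reference.fa")       # single high-quality reference assembly
OTHER_ASSEMBLIES = [Path("individual_2.fa"),     # additional representative individuals
                    Path("individual_3.fa")]
GRAPH_OUT = Path("species_graph.gfa")

def build_graph() -> None:
    """Incrementally add each assembly to the graph (minigraph -cxggs)."""
    cmd = ["minigraph", "-cxggs", "-t", "8", str(PRIMARY_REF)] + [str(a) for a in OTHER_ASSEMBLIES]
    with GRAPH_OUT.open("w") as gfa:
        subprocess.run(cmd, stdout=gfa, check=True)

if __name__ == "__main__":
    build_graph()
    # Reads can then be mapped against species_graph.gfa with a graph-aware
    # mapper, so reads carrying alleles absent from the primary reference
    # are no longer precluded from mapping.
```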

SV discovery and genotyping with short-read sequence data

Moderate coverage (e.g., ~10x) short-read sequencing is becoming the predominant approach in conservation genomics (Cam et al., 2020; Galla et al., 2020; Lado et al., 2020; Lew et al., 2015; Lujan, Weir, Noonan, Lovejoy, & Mandrak, 2020; Oyler-McCance, Cornman, Jones, & Fike, 2015; Robinson et al., 2016). Moderate coverage datasets are generally cost effective and appropriate for characterizing SNPs using a single reference genome, whereas a minimum of 30x coverage is often recommended for de novo SV discovery and genotyping (Ahn et al., 2009; Kosugi et al., 2019; Sims, Sudbery, Ilott, Heger, & Ponting, 2014; Wang et al., 2008). This is because at low coverage it is challenging to determine whether a detected variant is an artefact of sequencing/mapping error or a ‘true’ variant (Figure 3). For example, mapping errors may occur when an SV spans a large portion, or the entire length, of a read, or, in the case of a complex rearrangement, prevents mapping altogether (Sedlazeck et al., 2018; Yi & Ju, 2018). In addition, read insert size (400–600 bp) may bias the discovery of insertions and deletions in some pipelines (Kosugi et al., 2019). Increasing average read depth to 30x improves the likelihood of distinguishing among sequencing errors, read mapping errors and true genomic variation. However, due to the increased costs, such high coverage short-read datasets may be challenging to generate for many species of conservation concern. The question then becomes how best to fully utilize moderate coverage short-read datasets to investigate SVs. Recent studies demonstrate that an average read depth of 10x is sufficient for population-scale comparisons, provided sampling is representative and a high-quality reference genome is available (Collins et al., 2020; Du et al., 2021; Zhou et al., 2019). Regardless of coverage, it is important to note that genotypes called by short-read SV discovery programs are prone to high error rates (Chander, Gibbs, & Sedlazeck, 2019). One strategy to alleviate this is to conduct SV discovery across all individuals to establish an SV call set and then use dedicated programs for SV genotyping (e.g., BayesTyper, SVTyper, Paragraph) (Chander et al., 2019). In addition, SV genotyping programs have variable performance across a range of SV sizes and types, and many are therefore unable to genotype the whole range of SVs that may have been called during discovery. A final consideration is that genotyping programs remain an area of active development; although they may reduce the high false discovery rates prevalent in SV discovery programs, they do not provide a definitive solution (Chander et al., 2019). As a result, putative SVs identified using short-read data alone should be treated as preliminary. One method that may aid in removing false variants is a trio-based filtering approach, whereby SVs that fail to abide by Mendelian inheritance patterns (whether due to challenges with accurate genotyping or incorrect variant calls) are removed from the SV call set (e.g., Patel et al., 2014; Pilipenko et al., 2014). This is particularly relevant for intensively managed threatened species, as pedigree relationships are often known and suitable trios can be identified for sequencing.
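To illustrate the trio-based filtering step outlined above, the following minimal sketch checks whether a biallelic SV genotype in an offspring is consistent with Mendelian inheritance given its parents' genotypes. It assumes genotypes have already been extracted from a VCF (e.g., with a VCF-parsing library); the function and variant names are hypothetical.

```python
def mendelian_consistent(offspring, mother, father):
    """
    Return True if a biallelic genotype (a tuple of 0/1 allele codes, e.g. (0, 1))
    could have been inherited from the parental genotypes; missing genotypes
    (None) are treated as uninformative and retained.
    """
    if None in (offspring, mother, father):
        return True
    # The offspring must carry one allele present in the mother and one in the father.
    a, b = offspring
    return (a in mother and b in father) or (b in mother and a in father)


# Toy example: a heterozygous SV call in the offspring with two
# homozygous-reference parents fails the check and would be removed.
sv_calls = [
    {"id": "DEL_001", "offspring": (0, 1), "mother": (0, 0), "father": (0, 1)},
    {"id": "INS_002", "offspring": (0, 1), "mother": (0, 0), "father": (0, 0)},
]
filtered = [sv for sv in sv_calls
            if mendelian_consistent(sv["offspring"], sv["mother"], sv["father"])]
print([sv["id"] for sv in filtered])   # -> ['DEL_001']
```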
Alternatively, if sufficient resources are available, the sensitivity of long-read sequence data may be leveraged in a multiple reference genome approach for SV discovery, which may in turn facilitate the use of lower coverage short-read resequencing data for SV genotyping, although this largely remains untested. Target-capture methods for genes or regulatory regions of interest can also provide an affordable alternative to WGS for species with large genomes and projects with tight budgets (Andermann et al., 2020). For example, vonHoldt et al. (2017) used a targeted-sequencing approach combined with PCR validation to characterize a region on chromosome 6 under positive selection in domestic dog breeds, which significantly reduced sequence coverage requirements.

SV discovery and genotyping beyond short-read sequence data

When characterizing genomic features, especially SVs, there are many sequencing platforms and approaches to choose from, and although they may perform well when addressing specific challenges, each has its own caveats (Table 1). Two providers feature prominently in long-read sequencing: Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT). Since the launch of these technologies in 2011 and 2014, respectively, the long-read sequencing space has been characterized by fast-paced progress and innovation, as demonstrated by the first telomere-to-telomere assembly of the human X chromosome, achieved with ultra-long-read sequencing (Miga et al., 2020). The precise error rates of these two technologies remain somewhat contentious (Dohm, Peters, Stralis-Pavese, & Himmelbauer, 2020; Lang et al., 2020 preprint), but as a general rule, ONT currently provides longer average read lengths than PacBio (Logsdon, Vollger, & Eichler, 2020) at the cost of higher sequence error rates. Further, when PacBio sequencing providers are overseas, ONT offers the advantage that data can be generated in-house, which may, for example, be more responsive to the needs and aspirations of Indigenous Peoples (Collier-Robinson, Rayne, Rupene, Thoms, & Steeves, 2019; Galla et al., 2016) and/or alleviate the need for export permits. Despite the challenges, the ability of long-read sequencing technologies to span a significant portion, if not the entire length, of complex genomic regions in a single read makes them a powerful tool for SV discovery and population-level genotyping. When used in conjunction with a high-quality, well-annotated reference genome, this improves confidence in read mapping across the genome (see Amarasinghe et al., 2020 for review) and substantially increases precision (the proportion of variant calls that are ‘true’) and recall (the proportion of ‘true’ SVs detected) for both SNPs and SVs (Wenger et al., 2019). In addition, platforms that directly sequence native DNA remove the amplification bias common in many short-read sequencing approaches (Depledge et al., 2019). Furthermore, emerging ‘adaptive’ sequencing approaches have the potential to selectively sequence specific regions of the genome (Payne et al., 2020 preprint).
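For clarity, the precision and recall metrics referred to above reduce to simple ratios; the short sketch below computes both from counts of true positive, false positive and false negative SV calls against a benchmark call set (the counts shown are illustrative only).

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Precision: proportion of called SVs that are 'true'.
       Recall: proportion of 'true' SVs that were detected."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall

# Illustrative counts from a hypothetical benchmark against a curated call set.
p, r = precision_recall(true_positives=900, false_positives=100, false_negatives=300)
print(f"precision = {p:.2f}, recall = {r:.2f}")   # precision = 0.90, recall = 0.75
```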
Structural variants significantly alter genome topology and impact the gene regulatory landscape (Sadowski et al., 2019; Shanta et al., 2020). In light of these impacts, the hierarchical organization of DNA within the nucleus is of particular interest when investigating the mechanisms of transcriptional regulation. Chromatin conformation capture (3C) based sequencing approaches enable the investigation of the organization of chromatin across genomes (see Kong & Zhang, 2019 for review) and have identified chromatin signatures associated with gene expression (Lieberman-Aiden et al., 2009; Lupiáñez et al., 2015; Shanta et al., 2020). In addition, there are emerging advancements in Nanopore sequencing methods to integrate chromatin conformation capture with long-read sequencing (i.e., Pore-C; Ulahannan et al., 2019 preprint). In contrast to short-read library preparation, which introduces amplification bias, long-read sequencing provides data on chromatin contacts at a range of distances along the linear genome and enables these contacts to be sequenced without amplification. This information is particularly beneficial for ‘non-model’ species where gene annotations, regulatory element annotations and gene regulatory networks are un- or poorly characterized, leading to challenges in predicting the impacts of SVs on gene expression. For example, 3C-based approaches combined with SV detection and analysis have revealed the molecular basis of human developmental diseases previously diagnosed through karyotypes (Melo et al., 2020) and resolved the mechanisms underlying reproductive disorders in goats (Guang-Xin et al., 2021).
Optical mapping approaches are a useful complement to long-read sequencing and have enhanced genome assembly outcomes by providing insights into the ‘big picture’ of large-scale genomic variants (as per Weissensteiner et al., 2020). Optical mapping uses a light microscopy-based technique to identify specific sequence motifs (such as restriction enzyme cut sites), which are then used to generate images of fluorescently labeled DNA molecules (Schwartz et al., 1993), enabling the characterization of large, complex rearrangements missed by long reads alone (Yuan, Chung, & Chan, 2020). On average, optical maps span ~225 kb, providing information on the physical distance and relationships among genomic features. Besides being used to improve the scaffolding of genome assemblies (Howe & Wood, 2015; Zhang, 2015), including those of endangered species (Rhie et al., 2021), optical mapping methods directly enable the identification of both intraspecific and interspecific SVs (Levy-Sakin et al., 2019; Zhihai et al., 2016). The primary commercial provider of optical mapping technology is currently Bionano Genomics, whose Saphyr instrument uses a nano-channel microfluidic chip to linearise and capture images of fluorescently labeled ultra-long DNA fragments, generating optical maps at a resolution of 500 bp (Yuan et al., 2020). While optical maps provide information on the physical topology of chromosomes, they do not provide sequence information on an allele. Because long reads and optical maps complement each other, the ideal data set for SV discovery would include both data types (e.g., Soto et al., 2020; Weissensteiner et al., 2020).