Advances in sequencing technologies and declining costs are increasing the accessibility of large-scale biodiversity genomic datasets. To maximise the impact of these data, a careful, considered approach to data management is essential. However, challenges associated with the management of such datasets remain, exacerbated by uncertainty among the research community as to what constitutes best practice. As an interdisciplinary team with diverse data management experience, we recognise the growing need for guidance on comprehensive data management practices that minimise the risks of data loss, maximise efficiency for stand-alone projects, enhance opportunities for data reuse, facilitate Indigenous data sovereignty and uphold the FAIR and CARE Guiding Principles. Here, we describe four fictional personas reflecting user experiences with data management to identify data management challenges across the biodiversity genomics research ecosystem. We then use these personas to demonstrate realistic considerations, compromises, and actions for biodiversity genomic data management. We also launch the Biodiversity Genomics Data Management Hub (https://genomicsaotearoa.github.io/data-management-resources/), containing tips, tricks and resources to support biodiversity genomics researchers, especially those new to data management, in their journey towards best practice. The Hub also provides an opportunity for biodiversity researchers whose expertise lies beyond genomics and who are keen to advance their data management journey. We aim to support the biodiversity genomics community in embedding data management throughout the research lifecycle to maximise research impact and outcomes.
There is growing interest in the role of structural variants (SVs) as drivers of local adaptation and speciation. From a biodiversity genomics perspective, the characterisation of genome-wide SVs provides an exciting opportunity to complement single nucleotide polymorphisms (SNPs). However, little is known about the impacts of SV discovery and genotyping strategies on the characterisation of genome-wide SV diversity within and among populations. Here, we explore a near whole-species resequence dataset and long-read sequence data for a subset of highly represented individuals in the critically endangered kākāpō (Strigops habroptilus). We demonstrate that even when using a highly contiguous reference genome, different discovery and genotyping strategies can significantly impact the type, size and location of SVs characterised genome-wide. Further, we found that the mean number of SVs in each of two kākāpō lineages differed both within and across generations. These combined results suggest that genome-wide characterisation of SVs remains challenging at the population scale. We are optimistic that increased accessibility to long-read sequencing and advancements in bioinformatic approaches, including multi-reference approaches like genome graphs, will alleviate at least some of the challenges associated with resolving SV characteristics below the species level. In the meantime, we address caveats, highlight considerations, and provide recommendations for the characterisation of genome-wide SVs in biodiversity genomic research.
Over the past 50 years, conservation genetics has developed a substantive toolbox to inform species management. One of the longest-standing tools available to manage genetics, the pedigree, has been widely used to characterize diversity and maximize evolutionary potential in threatened populations. Now, with the ability to use high throughput sequencing (HTS) to estimate relatedness, inbreeding, and genome-wide functional diversity, some have asked whether it is warranted for conservation biologists to continue collecting and collating pedigrees for species management. In this perspective, we argue that pedigrees remain a relevant tool, and when combined with genomic data, create an invaluable resource for conservation genomic management. Genomic data can address pedigree pitfalls (e.g., founder relatedness, missing data, uncertainty), and in return robust pedigrees allow for more nuanced research design, including well-informed sampling strategies and quantitative analyses (e.g., heritability, linkage) to better inform genomic inquiry. We further contend that building and maintaining pedigrees provides an opportunity to strengthen trusted relationships among conservation researchers, practitioners, Indigenous Peoples, and local communities.
Keywords: conservation genomics, quantitative genetics, pedigree, kinship, ex situ, in situ
Structural variants (SVs) are large rearrangements (> 50 bp) within the genome that impact the form and structure of chromosomes. As a result, SVs are a significant source of functional genomic diversity, i.e. variation at genomic regions underpinning phenotype differences, that can have large effects on individual and population fitness. While there are increasing opportunities to investigate functional genomic diversity in threatened species via single nucleotide polymorphism (SNP) datasets, SVs remain understudied despite their potential influence on complex traits of conservation interest. In this future-focused Opinion, we contend that characterizing SVs offers the conservation genomics community an exciting opportunity to complement SNP-based approaches to enhance species recovery. We identify three critical resources to characterize SVs de novo: 1) High-quality, contiguous, annotated reference genome(s); 2) Whole genome resequence data from representative individuals of the target species/populations; and 3) Well-curated metadata including pedigrees. We also leverage the existing literature (predominantly in human health, agriculture and eco-evolutionary biology) to identify pangenomic approaches for readily characterizing SVs and consider how integrating these into the conservation genomics toolbox may transform the way we intensively manage some of the world’s most threatened species.