Natalie Forsdick and 8 more

Advances in sequencing technologies and declining costs are making large-scale biodiversity genomic datasets increasingly accessible. To maximise the impact of these data, a careful, considered approach to data management is essential. However, challenges in managing such datasets remain, exacerbated by uncertainty within the research community about what constitutes best practice. As an interdisciplinary team with diverse data management experience, we recognise the growing need for guidance on comprehensive data management practices that minimise the risk of data loss, maximise efficiency for stand-alone projects, enhance opportunities for data reuse, facilitate Indigenous data sovereignty and uphold the FAIR and CARE Guiding Principles. Here, we describe four fictional personas, each reflecting different user experiences with data management, to identify challenges across the biodiversity genomics research ecosystem. We then use these personas to demonstrate realistic considerations, compromises and actions for biodiversity genomic data management. We also launch the Biodiversity Genomics Data Management Hub, which contains tips, tricks and resources to support biodiversity genomics researchers, especially those new to data management, on their journey towards best practice. The Hub also serves biodiversity researchers whose expertise lies beyond genomics and who are keen to advance their data management practice. We aim to support the biodiversity genomics community in embedding data management throughout the research lifecycle to maximise research impact and outcomes.

Jana Wold and 4 more

There is growing interest in the role of structural variants (SVs) as drivers of local adaptation and speciation. From a biodiversity genomics perspective, characterising genome-wide SVs offers an exciting opportunity to complement analyses based on single nucleotide polymorphisms (SNPs). However, little is known about how SV discovery and genotyping strategies affect the characterisation of genome-wide SV diversity within and among populations. Here, we explore a near whole-species resequencing dataset, together with long-read sequence data for a subset of highly represented individuals, in the critically endangered kākāpō (Strigops habroptilus). We demonstrate that even when a highly contiguous reference genome is used, different discovery and genotyping strategies can significantly affect the type, size and location of the SVs characterised genome-wide. Further, we found that the mean number of SVs in each of two kākāpō lineages differed both within and across generations. Together, these results suggest that genome-wide characterisation of SVs remains challenging at the population scale. We are optimistic that increased access to long-read sequencing and advances in bioinformatic approaches will alleviate at least some of the challenges of resolving SV characteristics below the species level. These advances include multi-reference approaches such as genome graphs. In the meantime, we address caveats, highlight considerations and provide recommendations for characterising genome-wide SVs in biodiversity genomic research.