Ann McCartney

and 7 more

We used long-read sequencing data generated from Knightia excelsa R.Br., a nectar-producing Proteaceae tree endemic to Aotearoa New Zealand, to explore how sequencing data type, data volume and workflow choices affect final assembly accuracy and chromosome construction. Establishing a high-quality genome for this species has specific cultural importance to Māori, the indigenous people of Aotearoa New Zealand, as well as commercial importance to the country's honey producers. Assemblies were produced with five long-read assemblers from data subsampled by read length, then processed with two polishing strategies and two Hi-C mapping methods. Subsampling by read length showed that each assembler performed differently depending on the coverage and read length of the input data. Assemblies built from longer reads (>30 kb) at lower coverage were the most contiguous and the most complete in terms of k-mers and genes. The final genome assembly was built into pseudo-chromosomes from all available data: it was assembled with FLYE, polished with Racon, Medaka and Pilon combined, scaffolded with SALSA2 and AllHiC, curated in Juicebox, and validated by synteny with Macadamia. Our results highlight the importance of tailoring assembly workflows to the volume and type of sequencing data, and of establishing a robust set of quality metrics for generating high-quality assemblies. The scaffolding analyses showed that problems in the initial assemblies could not be resolved accurately with Hi-C data, and that scaffolded assemblies were more accurate when the underlying contig assembly was more accurate. These findings provide insight into what is required for future high-quality de novo assemblies of non-model organisms.
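The read-length subsampling and contiguity comparison described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names (`filter_by_length`, `estimated_coverage`, `n50`) are hypothetical, reads are assumed to be already loaded as (name, sequence) pairs, and N50 is used here only as one common contiguity metric, since the abstract does not name the metrics it used.

```python
from typing import List, Sequence, Tuple

Read = Tuple[str, str]  # (read name, nucleotide sequence); an assumed in-memory layout


def filter_by_length(reads: Sequence[Read], min_len: int) -> List[Read]:
    """Keep only reads whose sequence is at least min_len bases long."""
    return [(name, seq) for name, seq in reads if len(seq) >= min_len]


def estimated_coverage(reads: Sequence[Read], genome_size: int) -> float:
    """Total sequenced bases divided by an assumed genome size in bases."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return sum(len(seq) for _, seq in reads) / genome_size


def n50(lengths: Sequence[int]) -> int:
    """Length L such that sequences of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0  # empty input
```

Applying `filter_by_length` with a 30,000-base threshold and then calling `estimated_coverage` shows the trade-off the abstract describes: raising the length threshold lowers coverage, so each subsample has to be judged on both values together.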

Tom Oosting

and 3 more

1) The more demanding DNA preservation requirements of genomic research can be difficult to meet when field conditions limit the methods that can be used, or force samples to be stored in suboptimal conditions. Such limitations may increase rates of DNA degradation, potentially rendering samples unusable for applications such as genome-wide sequencing. Nonetheless, little is known about the impact of suboptimal sampling conditions. 2) To provide practical guidelines for DNA preservation, we evaluated the performance of two widely used preservation solutions (DESS, a NaCl-saturated solution of 20% DMSO and 0.25 M EDTA; and ethanol) under a range of storage conditions over a three-month period, sampling at 1 day, 1 week, 2 weeks, 1 month and 3 months. DNA degradation was quantified as the reduction in average DNA fragment size over time (DNA fragmentation), because the size distribution of DNA fragments plays a key role in generating genomic datasets. Tissues were collected from a marine teleost species, the Australasian snapper, Chrysophrys auratus. 3) We found that the storage solution has a dramatic effect on DNA preservation. In DESS, DNA was only moderately degraded after three months of storage, whereas DNA stored in ethanol was already highly degraded within 24 hours, making those samples unsuitable for next-generation sequencing. 4) We recommend DESS as the most promising solution to improve DNA preservation. These results provide practical and economical advice for improving DNA preservation when sampling for genome-wide applications.

Keywords: DMSO, DNA preservation, ethanol, fish, next-generation-sequencing, NGS, snapper
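The fragmentation measure described in point 2, the reduction in average DNA fragment size over time, can be sketched in Python as follows. The abstract does not say how the reduction was normalised, so this sketch assumes a fractional reduction relative to a baseline time point. The function names are hypothetical, and the inputs are assumed to be fragment sizes in base pairs.

```python
from typing import Sequence


def mean_fragment_size(sizes_bp: Sequence[float]) -> float:
    """Arithmetic mean of fragment sizes, in base pairs."""
    if not sizes_bp:
        raise ValueError("at least one fragment size is required")
    return sum(sizes_bp) / len(sizes_bp)


def fractional_reduction(baseline_bp: Sequence[float], later_bp: Sequence[float]) -> float:
    """Assumed definition: 1 - mean(later) / mean(baseline).

    0.0 means no change in average fragment size; values approaching 1.0
    mean the average fragment has become much shorter than at baseline.
    """
    return 1.0 - mean_fragment_size(later_bp) / mean_fragment_size(baseline_bp)
```

Computing this value for each storage solution at each sampling time would give one degradation curve per treatment, which is the kind of comparison the abstract reports.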