Genome assembly, polishing, and draft quality checks
Whole-genome DNA was obtained from two male Tawny owls. DNA extraction
(from nucleated red blood cells), library preparation and whole-genome
sequencing were outsourced to BGI. Sequencing consisted of
PacBio’s circular library construction with ends linked to SMRT
adapters. Read polishing at this stage included removal of SMRT adapters
and clustering of redundant subreads sequenced from the same circular
molecule into single reads of insert (ROI). Genome assembly was
performed with flye (Kolmogorov, Yuan, Lin, & Pevzner, 2019). Flye uses
a repeat graph as its core data structure, as opposed to the De Bruijn
graphs most commonly used in short-read and hybrid assemblies. Repeat
graphs do not require exact k-mer matches because they are built from
approximate sequence matches, which lets them tolerate the high noise
of single-molecule sequencing reads such as PacBio. Flye's major
parameters were set to the default minimum overlap of 5000 base pairs
(bp) between reads and a reduced coverage of 20x for the initial
disjointig assembly, that is, reads providing 20x coverage or more were
used to initiate the process. To explore how the enforced overlap
changes assembly quality, we performed one assembly with the minimum
overlap between reads forced to 1000 bp. Lastly, we replicated each
assembly to check the consistency of the algorithm and the variance of
assembly statistics.
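The assembly runs described above can be sketched as a small command
planner. This is a minimal illustration, not the exact invocation used
here: the input file name, the genome size value, the thread count, the
number of replicates and the choice of --pacbio-raw as read type are all
assumptions, and the output directory names are hypothetical. Only
--min-overlap (5000 and 1000) and --asm-coverage (20) come from the text;
flye requires --genome-size whenever --asm-coverage is set.

```python
import shlex
from itertools import product
from typing import List, Sequence


def flye_command(reads: str, out_dir: str, genome_size: str,
                 min_overlap: int, asm_coverage: int = 20,
                 threads: int = 8) -> List[str]:
    """Build one flye command line as an argument list (nothing is run)."""
    return [
        "flye",
        "--pacbio-raw", reads,           # assumed read type
        "--out-dir", out_dir,            # hypothetical directory name
        "--genome-size", genome_size,    # assumed value, needed by --asm-coverage
        "--asm-coverage", str(asm_coverage),
        "--min-overlap", str(min_overlap),
        "--threads", str(threads),       # assumed value
    ]


def plan_runs(reads: str, genome_size: str,
              overlaps: Sequence[int] = (5000, 1000),
              replicates: int = 2) -> List[List[str]]:
    """One command per (minimum overlap, replicate) combination."""
    runs = []
    for overlap, rep in product(overlaps, range(1, replicates + 1)):
        out_dir = f"flye_ovlp{overlap}_rep{rep}"
        runs.append(flye_command(reads, out_dir, genome_size, overlap))
    return runs


if __name__ == "__main__":
    # Placeholder inputs: replace with the real read file and size estimate.
    for cmd in plan_runs("tawny_owl_reads.fastq.gz", "1.2g"):
        print(shlex.join(cmd))
```

Printing the commands instead of executing them keeps the sketch free of
side effects; each list could be passed to subprocess.run once the
placeholder values are replaced.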
Although flye has a built-in polishing step, we further polished the
assemblies with PacBio's gcpp and pbmm2 tools
(https://github.com/PacificBiosciences/pbbioconda). All assemblies were
compared with quast (Gurevich, Saveliev, Vyahhi, & Tesler, 2013), and
we chose the most contiguous and complete assembly with the highest
coverage as the future reference genome. Taxon-specific completeness of
the chosen
draft assembly was verified with busco (Simão, Waterhouse, Ioannidis,
Kriventseva, & Zdobnov, 2015), using aves_odb as the database of coding
regions and the northern spotted owl (Strix occidentalis caurina),
burrowing owl (Athene cunicularia) and barn owl (Tyto alba) assemblies
as terms of comparison. Repetitive
elements were identified and masked with RepeatMasker version
4.1.2-p1, using the HMM-Dfam_3.3 database updated in November 2020
(Chen, 2004). Genome versions used in this analysis are listed in the
supplemental information document.
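Contiguity in the quast comparison above is usually summarised by N50,
the contig length at which the cumulative length of contigs, sorted from
longest to shortest, first reaches half of the total assembly length. The
sketch below shows that calculation and a simple ranking by it. The
contig lengths and assembly labels are made-up illustrative values, not
results from this study, and ranking by N50 alone is a simplification of
the selection described above, which also considered completeness and
coverage.

```python
from typing import Dict, Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig lengths."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contigs given")
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # First contig at which the running sum covers half the assembly.
        if running * 2 >= total:
            return length
    raise AssertionError("unreachable: running sum always reaches total")


def most_contiguous(assemblies: Dict[str, Iterable[int]]) -> str:
    """Name of the assembly with the highest N50."""
    return max(assemblies, key=lambda name: n50(list(assemblies[name])))


if __name__ == "__main__":
    # Hypothetical contig lengths (arbitrary units) for two toy assemblies.
    toy = {
        "overlap_5000": [80, 70, 50, 40, 30, 20],
        "overlap_1000": [50, 50, 50, 50, 50, 40],
    }
    for name, lengths in toy.items():
        print(name, n50(lengths))
    print("most contiguous:", most_contiguous(toy))
```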