2.3 | Genome assembly and quality control
To obtain a high-quality genome, the WGS data were first used for a genome survey
analysis to estimate basic genome characteristics, including genome
size, heterozygosity, and repeat content. We used the GCE
pipeline (Liu et al., 2013) to estimate the genome size of B.
schroederi before genome assembly. We generated a total of 29.32 Gb
(about 110×) of raw PacBio long reads. First, we used Blastall
(v2.2.26) (Camacho et al., 2009) to compare the raw data against the NCBI
database to confirm that the sequenced DNA was not
contaminated by other species. We then computed the read-length
distribution of the third-generation sequencing (TGS) reads to evaluate
sequencing quality and to guide the choice of parameters for the subsequent assembly.
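GCE estimates genome size from the k-mer frequency distribution of the WGS reads. As a minimal sketch of the basic relation behind such a survey, and assuming a clean histogram with a single main (homozygous) peak, genome size can be approximated as the total number of k-mers divided by the depth of that peak. The function names and the error cutoff `min_depth` are hypothetical, and heterozygosity and repeat modelling are ignored, so this is not GCE's own model. The same depth arithmetic links the PacBio yield to genome size: 29.32 Gb at about 110× corresponds to roughly 266.5 Mb.

```python
"""Simplified k-mer genome-size estimate (illustrative; not GCE's model)."""


def estimate_genome_size(histogram, min_depth=3):
    """Estimate genome size as total k-mers / depth of the main peak.

    histogram maps k-mer depth -> number of distinct k-mers at that depth.
    Depths below min_depth (a hypothetical cutoff) are treated as
    sequencing-error k-mers and ignored.
    """
    kept = {d: n for d, n in histogram.items() if d >= min_depth}
    if not kept:
        raise ValueError("no k-mers at or above min_depth")
    # Main peak: the depth carrying the most distinct k-mers.
    peak_depth = max(kept, key=kept.get)
    # Total k-mer occurrences = sum over depths of depth * count.
    total_kmers = sum(d * n for d, n in kept.items())
    return total_kmers / peak_depth


def implied_depth(total_bases, genome_size):
    """Sequencing depth = total sequenced bases / genome size."""
    return total_bases / genome_size
```

For example, the toy histogram `{1: 5000, 2: 800, 3: 100, 4: 200, 5: 400, 6: 200, 7: 100}` has 5000 k-mer occurrences at depth ≥ 3 and a peak at depth 5, giving an estimated size of 1,000 bp.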
The B. schroederi genome was assembled using a
“correct-then-assemble” strategy. First, NextDenovo (v2.0-beta.1; https://github.com/Nextomics/NextDenovo)
was used to correct the raw reads and assemble a draft genome. The Arrow algorithm was
then used to carry out a second round of correction on this assembly.
NextPolish (v1.0.5) (J. Hu, Fan, Sun, & Liu, 2020) was further used to
polish the genome with the WGS data. This yielded a primary
genome assembly of 639 contigs with an N50 of 1.27 Mb. To anchor
the contigs to chromosomes, Hi-C technology (Lieberman-Aiden et al.,
2009) was used to capture chromosome conformation. A total of 105 Gb
(~400×) of Hi-C sequencing data were generated from a
single Hi-C library constructed as previously described. All
Hi-C reads were first mapped to the primary assembly using BWA
(v0.7.13-r1126) (H. Li & Durbin, 2010) with the parameters “bwa mem -t 16
-k 19 -a -V”. The HiC-Pro pipeline (Servant et al., 2015) was then used to
filter the mapping results, leaving 645 million valid read pairs.
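In broad terms, HiC-Pro keeps as valid pairs those read pairs that link distinct, non-adjacent restriction fragments, and discards pairs that fall on a single fragment or on re-ligated neighbouring fragments. A minimal, simplified sketch of that classification is below, assuming sorted restriction-enzyme cut positions for each sequence and uniquely mapped read positions. The function names and category labels are hypothetical, orientation-based separation of dangling ends from self-circles is omitted, and this is not HiC-Pro's code.

```python
import bisect


def fragment_index(pos, cut_sites):
    """Index of the restriction fragment containing pos.

    cut_sites: sorted cut positions on one sequence. Fragment i spans
    [cut_sites[i-1], cut_sites[i]), with open-ended fragments at both ends.
    """
    return bisect.bisect_right(cut_sites, pos)


def classify_pair(chrom1, pos1, chrom2, pos2, cut_sites_by_chrom):
    """Crude classification of a uniquely mapped Hi-C read pair (hypothetical labels)."""
    if chrom1 != chrom2:
        # Contacts between different sequences are informative for scaffolding.
        return "valid"
    cuts = cut_sites_by_chrom[chrom1]
    f1 = fragment_index(pos1, cuts)
    f2 = fragment_index(pos2, cuts)
    if f1 == f2:
        # Dangling end or self-circle; not distinguished in this sketch.
        return "same_fragment"
    if abs(f1 - f2) == 1:
        # Adjacent fragments simply re-ligated to each other.
        return "religation"
    return "valid"
```

Counting only the `"valid"` pairs over all reads gives a figure analogous to the valid read-pair total reported above.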
Next, Juicer (v1.5.7) was used to process the Hi-C alignments for Hi-C-assisted
assembly: (1) duplicate and near-duplicate read pairs were removed; and (2) read pairs
that aligned to three or more locations were set aside (Durand, Shamim, et al., 2016).
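The two Juicer filtering rules above can be illustrated with a small sketch. It assumes a hypothetical pair record and a hypothetical tolerance `wobble` (in bp) for calling near-duplicates; it is not Juicer's implementation. A real pipeline would work on position-sorted files rather than the quadratic scan used here for clarity.

```python
from collections import namedtuple

# Minimal representation of a mapped Hi-C read pair (hypothetical format).
# n_alignments counts how many locations the pair's reads aligned to.
Pair = namedtuple("Pair", "chr1 pos1 strand1 chr2 pos2 strand2 n_alignments")


def filter_pairs(pairs, wobble=4):
    """Set aside pairs aligning to >= 3 locations, then drop (near-)duplicates.

    A pair is a duplicate of an already kept pair when chromosomes and
    strands match and both positions differ by at most `wobble` bp
    (wobble=0 removes exact duplicates only). Returns (kept, set_aside).
    """
    kept, set_aside = [], []
    for p in pairs:
        if p.n_alignments >= 3:
            set_aside.append(p)
            continue
        is_dup = any(
            p.chr1 == k.chr1 and p.chr2 == k.chr2
            and p.strand1 == k.strand1 and p.strand2 == k.strand2
            and abs(p.pos1 - k.pos1) <= wobble
            and abs(p.pos2 - k.pos2) <= wobble
            for k in kept
        )
        if not is_dup:
            kept.append(p)
    return kept, set_aside
```

Setting `wobble=0` reduces the rule to exact-duplicate removal; a positive tolerance additionally catches near-duplicates whose ends differ by a few bases.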
Then, 3D-DNA, a custom computational pipeline, was applied to
correct misassemblies and to anchor, order, and orient DNA
fragments (Dudchenko et al., 2017). Finally, the files generated by 3D-DNA
were loaded into the visualization software Juicebox Assembly Tools module
(v1.11.08) for manual review and correction (Dudchenko et al., 2018; Durand,
Robinson, et al., 2016). The contigs of B. schroederi were
clustered into 21 groups, which were further ordered and
oriented into pseudochromosomes (Figure S2).
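The contig N50 quoted for the primary assembly (639 contigs, N50 of 1.27 Mb) is the length at which the running total of contig lengths, taken from longest to shortest, first reaches half of the assembly size. A minimal sketch of that calculation, with a hypothetical function name, is:

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sort lengths in descending order and return the length at which the
    cumulative sum first reaches at least half of the total length.
    """
    if not lengths:
        raise ValueError("empty length list")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
```

For lengths 1 to 5 the total is 15, and the running sum 5 + 4 = 9 first passes 7.5, so the N50 is 4. The same function applies to read lengths when summarizing the TGS data.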
Genome completeness was evaluated with BUSCO in genome mode, using the
nematode and eukaryote lineage datasets (Simão, Waterhouse, Ioannidis,
Kriventseva, & Zdobnov, 2015).
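BUSCO reports completeness in a compact notation: complete (C, the sum of single-copy S and duplicated D), fragmented (F), and missing (M) percentages out of the n lineage groups searched, one summary per lineage dataset. The sketch below only reproduces that arithmetic and mirrors the summary format from raw counts; the function name and example counts are hypothetical and are not results for B. schroederi.

```python
def busco_summary(single, duplicated, fragmented, missing):
    """Format BUSCO-style completeness notation from raw counts.

    Complete (C) = single-copy (S) + duplicated (D); F = fragmented,
    M = missing; n = total number of BUSCO groups searched.
    """
    n = single + duplicated + fragmented + missing
    if n == 0:
        raise ValueError("no BUSCO groups counted")

    def pct(count):
        # Percentage of all searched groups, as shown in the summary line.
        return 100 * count / n

    complete = single + duplicated
    return (
        f"C:{pct(complete):.1f}%[S:{pct(single):.1f}%,D:{pct(duplicated):.1f}%],"
        f"F:{pct(fragmented):.1f}%,M:{pct(missing):.1f}%,n:{n}"
    )
```

With hypothetical counts of 900 single-copy, 50 duplicated, 30 fragmented, and 20 missing groups, this yields `C:95.0%[S:90.0%,D:5.0%],F:3.0%,M:2.0%,n:1000`.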