2.3 | Genome assembly and quality control
To obtain a high-quality genome assembly, we first performed a genome survey using the WGS data to estimate key genome characteristics, including genome size, heterozygosity, and repeat content. The genome size of B. schroederi was estimated with GCE (Liu et al., 2013) before assembly. We generated a total of 29.32 Gb (approximately 110×) of raw PacBio long reads. First, we used Blastall (v2.2.26) (Camacho et al., 2009) to compare the raw reads against the NCBI database to confirm that the DNA of the sequenced samples was not contaminated by other species. We then computed the read-length frequency distribution of the third-generation sequencing (TGS) reads to evaluate sequencing quality and to guide the parameter settings for the subsequent assembly.
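As an illustration of the two checks described above, the following minimal Python sketch bins read lengths into a frequency distribution and derives the genome size implied by the reported yield and depth. The function names, the 1 kb bin size, and the toy read lengths are hypothetical and are not taken from the original analysis. Only the 29.32 Gb and 110× figures come from the text, and because the depth is approximate, the resulting size is only a rough consistency check.

```python
from collections import Counter


def length_histogram(lengths, bin_size=1000):
    """Count reads per length bin; each key is the lower edge of a bin."""
    counts = Counter((n // bin_size) * bin_size for n in lengths)
    return dict(sorted(counts.items()))


def implied_genome_size(total_bases, depth):
    """Genome size implied by total sequenced bases and mean depth."""
    return total_bases / depth


# Toy read lengths in bp (hypothetical values, for illustration only).
toy_lengths = [850, 1200, 1900, 5400, 5600, 12000]
hist = length_histogram(toy_lengths)

# Figures reported in the text: 29.32 Gb of reads at roughly 110x depth.
size_mb = implied_genome_size(29.32e9, 110) / 1e6

print(hist)
print(f"Implied genome size: {size_mb:.1f} Mb")
```

In practice the read lengths would be parsed from the sequencing output rather than listed by hand, and the implied size would be compared with the k-mer-based estimate from the genome survey.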
The B. schroederi genome was assembled using a “correct-then-assemble” strategy. First, NextDenovo (v2.0-beta.1; https://github.com/Nextomics/NextDenovo) was used to correct the reads and assemble a draft genome. The Arrow algorithm was then used for a second round of correction of this assembly, and NextPolish (v1.0.5) (J. Hu, Fan, Sun, & Liu, 2020) was used for further polishing with the WGS data. This yielded a primary genome assembly of 639 contigs with an N50 of 1.27 Mb. To anchor the assembled contigs to chromosomes, Hi-C technology (Lieberman-Aiden et al., 2009) was used to capture chromosome conformation. A total of 105 Gb (~400×) of Hi-C sequencing data was generated from a single Hi-C library, which was constructed as previously described. All Hi-C reads were first mapped to the assembled genome using BWA (v0.7.13-r1126) (H. Li & Durbin, 2010) with the parameters “bwa mem -t 16 -k 19 -a -V”. The HiC-Pro pipeline (Servant et al., 2015) was then used to filter the mapping results, leaving 645 million valid read pairs. Next, Juicer (v1.5.7) was used to assist the assembly: (1) duplicates and near-duplicates were removed, and (2) read pairs that aligned to three or more locations were set aside (Durand, Shamim, et al., 2016). The 3D-DNA pipeline was then applied to correct misassemblies and to anchor, order, and orient the DNA fragments (Dudchenko et al., 2017). Finally, the files generated by 3D-DNA were loaded into the Juicebox Assembly Tools module (v1.11.08) for visual review and correction (Dudchenko et al., 2018; Durand, Robinson, et al., 2016). The B. schroederi contigs were successfully clustered into 21 groups, which were further ordered and oriented into pseudochromosomes (Figure S2). Genome completeness was evaluated using BUSCO in genome mode with the nematode and eukaryote lineage datasets (Simão, Waterhouse, Panagiotis, Kriventseva, & Zdobnov, 2015).
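For readers unfamiliar with the contig N50 statistic reported above, the following minimal Python sketch shows the standard definition: the length of the contig at which the cumulative length, taken from the longest contig downward, first reaches half of the total assembly length. The function name and the toy contig lengths are hypothetical. This is not the code used to produce the reported value of 1.27 Mb.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are sorted from longest to shortest. N50 is the length of
    the contig at which the running total first reaches at least half
    of the summed length of all contigs.
    """
    total = sum(lengths)
    running = 0
    for n in sorted(lengths, reverse=True):
        running += n
        if running * 2 >= total:
            return n
    raise ValueError("no contig lengths given")


# Toy contig lengths (hypothetical values, for illustration only).
toy_contigs = [100, 80, 60, 40, 20]
toy_n50 = n50(toy_contigs)
print(toy_n50)
```

With these toy values the total length is 300, and the running total reaches half of it (150) at the second-longest contig, so the N50 is 80.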