2.4 De novo assembly and polishing of the genome
PacBio long subreads were first error-corrected with Canu v1.6 (Koren et al. 2017). The genome was then assembled with WTDBG v1.2.8
using the error-corrected reads. The PacBio subreads were subsequently
mapped back to the raw contigs with Blasr v5.1, and the contigs were further
polished with Arrow v2.1.0 (Chin et al. 2013). Because of the high error
rate of PacBio raw long reads, Illumina short reads were mapped back to
the improved contigs, which were then further polished with Pilon v1.20 (Walker et al. 2014). In addition, we applied GC depth analysis to
evaluate whether any contamination had been introduced during sequencing and
to assess the coverage of the assembly. The analysis showed that the average GC
content of the genome was 33.23% and that the GC depth distribution formed a single-peaked curve
(Additional file 1: Figures S2 and S3). Taken together, the GC depth analysis
and the sequencing depth of the genome indicated that there was no
contamination from other species (Additional file 1: Figure S4).
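
The GC depth analysis described above can be illustrated with a minimal Python sketch. It assumes that per-base read depth is already available as a list of numbers (for example, exported from a read alignment) and pairs the GC content of each non-overlapping window with its mean depth. The function names `gc_percent` and `gc_depth_windows`, the window size, and the toy input are hypothetical and serve illustration only; the software and parameters actually used for this analysis are not stated in this section.

```python
from typing import List, Sequence, Tuple


def gc_percent(seq: str) -> float:
    """Return the GC content (%) of a sequence, ignoring non-ACGT bases such as N."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return 100.0 * gc / total if total else 0.0


def gc_depth_windows(
    seq: str, depth: Sequence[float], window: int = 4
) -> List[Tuple[float, float]]:
    """Pair GC content (%) with mean read depth for non-overlapping windows.

    `window` is an illustrative value; a trailing partial window is dropped.
    """
    if len(seq) != len(depth):
        raise ValueError("sequence and depth must have the same length")
    points = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        mean_depth = sum(depth[start:start + window]) / window
        points.append((gc_percent(chunk), mean_depth))
    return points


# Toy contig and per-base depth (hypothetical values, not data from this study).
toy_contig = "GGCCAATT"
toy_depth = [10, 10, 10, 10, 20, 20, 20, 20]
points = gc_depth_windows(toy_contig, toy_depth, window=4)
```

Plotting the resulting (GC content, depth) pairs as a scatter plot gives a GC depth plot. In such plots, sequence from a contaminating organism would typically appear as a separate cluster with a different GC content or depth, whereas a single cluster is consistent with the absence of contamination.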