2.2 Genome estimation and de novo assembly
First, the Illumina reads were filtered and error-corrected with Fastp (version 0.19.3) (Chen, Zhou, Chen, & Gu, 2018), and the clean reads were used to estimate genomic features. Before estimation, a random subset of Illumina reads was aligned to the NCBI Nucleotide Sequence Database (NT) with BLAST (version 2.2.31) (Altschul, Gish, Miller, Myers, & Lipman, 1990), using an E-value cutoff of 1e-05, to check for contamination. We then used Jellyfish (version 2) (Marçais & Kingsford, 2011) to plot the 21-mer depth distribution (k = 21), from which genome size, heterozygosity, and repeat content were estimated. Genome size was calculated as $G = N_{21\text{-mer}} / D_{21\text{-mer}}$, where $N_{21\text{-mer}}$ is the total number of 21-mers and $D_{21\text{-mer}}$ is the k-mer depth at the main peak. Repeat content was estimated from k-mers whose depth exceeded twice that of the main peak, and heterozygosity from k-mers at about half the main-peak depth.
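The k-mer calculation above can be sketched in Python. This is a minimal illustration under stated assumptions, not the procedure used in the study: it assumes a two-column `depth count` histogram such as `jellyfish histo` prints, and the low-depth error cutoff `min_depth`, the half-peak window of 0.25 to 0.75 times the main-peak depth, and the helper names `parse_histo` and `estimate_genome` are hypothetical choices made for illustration.

```python
from typing import Dict


def parse_histo(text: str) -> Dict[int, int]:
    """Parse histogram text with one 'depth count' pair per line."""
    hist: Dict[int, int] = {}
    for line in text.strip().splitlines():
        depth, count = line.split()
        hist[int(depth)] = int(count)
    return hist


def estimate_genome(hist: Dict[int, int], min_depth: int = 5) -> Dict[str, float]:
    """Rough k-mer based estimates; min_depth is a hypothetical error cutoff."""
    # Drop low-depth k-mers, which mostly come from sequencing errors.
    kept = {d: c for d, c in hist.items() if d >= min_depth}
    # Main peak D: the depth shared by the largest number of distinct k-mers.
    peak = max(kept, key=kept.get)
    # N: total number of k-mers = sum of depth * (distinct k-mers at that depth).
    total = sum(d * c for d, c in kept.items())
    genome_size = total / peak  # G = N / D
    # Share of k-mers deeper than 2 * D (repeat signal).
    repeat = sum(d * c for d, c in kept.items() if d > 2 * peak) / total
    # Share of k-mers near D / 2 (heterozygosity signal); window is hypothetical.
    half = sum(d * c for d, c in kept.items() if 0.25 * peak <= d <= 0.75 * peak) / total
    return {
        "peak_depth": peak,
        "total_kmers": total,
        "genome_size": genome_size,
        "repeat_fraction": repeat,
        "half_peak_fraction": half,
    }
```

Taking the tallest bin as the main peak is a simplification: when heterozygosity is high, the half-depth peak can be taller than the homozygous peak, so the chosen peak should be checked against the plotted distribution.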
The genome was assembled from long single-molecule PacBio reads as follows. First, the clean PacBio reads were error-corrected with Canu (version 1.5) (Koren et al., 2017) using an error-correction coverage of 60. The corrected reads were then passed to SMARTdenovo (version 1.0) (Schmidt, Vogel, & Denton, 2017), which generated contigs with the parameters J = 5000, A = 1000, and r = 0.95. The preliminary assembly was then polished three times with Racon (version 1.32) (Vaser, Sović, Nagarajan, & Šikić, 2017), completing the first round of correction. Because third-generation sequencing reads have a relatively high error rate, the Illumina reads generated for genome estimation were reused for a second round of correction with Pilon (version 1.22) (Walker et al., 2014), which was likewise run three times. All of these tools are widely used in de novo genome assembly and were applied here to assemble the C. fluminea genome.
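Both correction stages chain their rounds, so each round polishes the output of the previous one. The sketch below shows that chaining by building the command lines as data without running them. It is a minimal illustration under stated assumptions: the read mappers (minimap2 for Racon; BWA and SAMtools for Pilon), the file names, the Pilon jar name, and the helpers `polishing_plan` and `run` are all hypothetical, since the text does not say how reads were aligned before polishing. The Canu and SMARTdenovo steps are omitted.

```python
import subprocess
from typing import List, Optional, Tuple

# One step: (argv, file to capture stdout into, or None).
Step = Tuple[List[str], Optional[str]]


def polishing_plan(draft: str, long_reads: str, short_r1: str, short_r2: str,
                   racon_rounds: int = 3, pilon_rounds: int = 3,
                   pilon_jar: str = "pilon-1.22.jar") -> Tuple[List[Step], str]:
    """Hypothetical helper: build the iterative polishing commands.

    Mapper choice (minimap2, BWA, SAMtools) is an assumption, not from the source.
    Returns the step list and the path of the final polished assembly.
    """
    steps: List[Step] = []
    current = draft

    # First correction: long-read polishing with Racon.
    for i in range(1, racon_rounds + 1):
        paf, out = f"racon_r{i}.paf", f"racon_r{i}.fasta"
        # Map PacBio reads (query) to the current contigs (target).
        steps.append((["minimap2", "-x", "map-pb", "-o", paf, current, long_reads], None))
        # racon <sequences> <overlaps> <target>; polished contigs go to stdout.
        steps.append((["racon", long_reads, paf, current], out))
        current = out

    # Second correction: short-read polishing with Pilon.
    for i in range(1, pilon_rounds + 1):
        sam, bam, prefix = f"pilon_r{i}.sam", f"pilon_r{i}.bam", f"pilon_r{i}"
        steps.append((["bwa", "index", current], None))
        steps.append((["bwa", "mem", current, short_r1, short_r2], sam))
        steps.append((["samtools", "sort", "-o", bam, sam], None))
        steps.append((["samtools", "index", bam], None))
        steps.append((["java", "-jar", pilon_jar, "--genome", current,
                       "--frags", bam, "--output", prefix], None))
        current = prefix + ".fasta"  # Pilon writes <prefix>.fasta

    return steps, current


def run(steps: List[Step]) -> None:
    """Execute each step in order, redirecting stdout where requested."""
    for argv, stdout_path in steps:
        if stdout_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_path, "w") as fh:
                subprocess.run(argv, check=True, stdout=fh)
```

Calling `run(steps)` would execute the plan with `subprocess`; keeping the plan as data makes it easy to confirm that each round reads the previous round's output before launching a long job.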