2.2 Genome estimation and de novo assembly
Illumina data were first filtered and corrected with Fastp (version
0.19.3) (Chen, Zhou, Chen, & Gu, 2018), and the clean reads were then
used to estimate the genomic features. Before this, a random subset of
Illumina reads was aligned to the Nucleotide Sequence Database (NT)
with BLAST (version 2.2.31) (Altschul, Gish, Miller, Myers, & Lipman,
1990) at an E-value threshold of 1e-05 to check for contamination. We
plotted the 21-mer (k = 21) depth distribution with Jellyfish (version 2)
(Marçais & Kingsford, 2011) to estimate genome size, heterozygosity, and
repeat content. Genome size was estimated as
$G = N_{21\text{-mer}} / D_{21\text{-mer}}$,
where $N_{21\text{-mer}}$ is the total number of 21-mers and
$D_{21\text{-mer}}$ is the k-mer depth at the main peak. Repeat content
was estimated from k-mers whose depth exceeded twice that of the main
peak, and heterozygosity from the peak at half the main-peak depth.
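The k-mer calculation can be illustrated with a minimal Python sketch. It assumes a histogram in the two-column form written by `jellyfish histo` (k-mer depth, number of distinct k-mers at that depth). The error cutoff `min_depth`, the choice of the tallest peak as the main (homozygous) peak, and the half-depth window used to count heterozygous k-mers are illustrative assumptions, not the settings used in this study; model-fitting tools such as GenomeScope handle overlapping peaks more robustly.

```python
from typing import Dict, NamedTuple


class KmerEstimate(NamedTuple):
    peak_depth: int         # D: depth of the main peak
    genome_size: float      # G = N / D
    repeat_fraction: float  # share of k-mers with depth > 2 * D
    het_fraction: float     # crude share of k-mers near D / 2


def parse_histo(text: str) -> Dict[int, int]:
    """Parse `jellyfish histo` output: one 'depth count' pair per line."""
    hist: Dict[int, int] = {}
    for line in text.splitlines():
        if line.strip():
            depth, count = line.split()
            hist[int(depth)] = int(count)
    return hist


def estimate_genome_features(hist: Dict[int, int], min_depth: int = 5) -> KmerEstimate:
    # Drop the low-depth peak made mostly of sequencing-error k-mers
    # (min_depth is an assumed cutoff).
    usable = {d: c for d, c in hist.items() if d >= min_depth}

    # N: total number of k-mers = sum of depth * distinct k-mers at that depth.
    total = sum(d * c for d, c in usable.items())

    # D: the depth with the most distinct k-mers, assumed to be the homozygous peak.
    peak = max(usable, key=usable.__getitem__)

    # Repeats: k-mers occurring more than twice as often as the main peak.
    repeats = sum(d * c for d, c in usable.items() if d > 2 * peak)

    # Heterozygous k-mers: a crude window around half the main-peak depth.
    het = sum(d * c for d, c in usable.items() if 0.25 * peak < d <= 0.75 * peak)

    return KmerEstimate(peak, total / peak, repeats / total, het / total)


if __name__ == "__main__":
    toy = {1: 1000, 2: 200, 10: 50, 20: 500, 40: 10, 60: 5}
    print(estimate_genome_features(toy))
```

For the toy histogram `{1: 1000, 2: 200, 10: 50, 20: 500, 40: 10, 60: 5}`, the k-mers at or above the cutoff total 11,200 and the main peak lies at depth 20, so G = 11,200 / 20 = 560.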
The genome was assembled from PacBio long single-molecule reads in
three steps. First, the clean PacBio reads were error-corrected with
Canu (version 1.5) (Koren et al., 2017) using an error-correction
coverage of 60. Second, the corrected reads were assembled into contigs
with SMARTdenovo (version 1.0) (Schmidt, Vogel, & Denton, 2017) using
the parameters J = 5000, A = 1000, and r = 0.95. Finally, the
preliminary assembly was polished three times with Racon (version 1.32)
(Vaser, Sović, Nagarajan, & Šikić, 2017), which completed the first
round of correction.
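Each Racon round maps the reads back to the current assembly and passes the resulting overlaps to Racon, whose consensus becomes the target of the next round. The sketch below builds the commands for three such rounds. It assumes that minimap2 (not named in the text) computes the read-to-contig overlaps in PAF format and that the PacBio reads are used for polishing; all file names are hypothetical.

```python
import subprocess
from typing import List, Tuple

# (argv, file that receives the command's stdout)
Step = Tuple[List[str], str]


def racon_plan(reads: str, draft: str, rounds: int = 3, threads: int = 8) -> List[Step]:
    """Build the commands for `rounds` iterations of minimap2 + Racon polishing.

    Output file names (round{i}.paf, racon_round{i}.fasta) are hypothetical.
    """
    steps: List[Step] = []
    target = draft
    for i in range(1, rounds + 1):
        paf = f"round{i}.paf"
        polished = f"racon_round{i}.fasta"
        # Map the long reads to the current contigs with the PacBio preset.
        steps.append((["minimap2", "-x", "map-pb", "-t", str(threads), target, reads], paf))
        # Racon takes <reads> <overlaps> <target contigs> and writes the consensus to stdout.
        steps.append((["racon", "-t", str(threads), reads, paf, target], polished))
        # The next round polishes this round's output.
        target = polished
    return steps


def run(steps: List[Step]) -> None:
    """Execute each step, redirecting its stdout to the listed file."""
    for argv, out_path in steps:
        with open(out_path, "w") as out:
            subprocess.run(argv, stdout=out, check=True)


if __name__ == "__main__":
    # Dry run: print the shell-equivalent commands instead of executing them.
    for argv, out_path in racon_plan("pacbio_reads.fastq", "smartdenovo_contigs.fasta"):
        print(" ".join(argv), ">", out_path)
```

Calling `run(racon_plan(...))` would execute the same commands, provided both tools are on `PATH`. Keeping planning separate from execution makes the round-to-round chaining easy to inspect before anything is run.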
Because third-generation sequencing reads have a relatively high error
rate, the Illumina reads prepared for genome estimation were also used
for a second round of correction with Pilon (version 1.22) (Walker et
al., 2014), again run for three iterations. All of these tools are
well established for genome assembly and polishing and were applied
here to assemble the C. fluminea genome.
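The three Pilon iterations can be sketched in the same way: in each round the Illumina reads are aligned to the current assembly, and Pilon corrects that assembly from the sorted, indexed alignments. This sketch assumes paired-end reads aligned with BWA-MEM and processed with SAMtools, neither of which is named in the text; the file names, the jar path, and the `-Xmx32G` memory setting are hypothetical. The Pilon options used (`--genome`, `--frags`, `--output`) are standard Pilon command-line options.

```python
import subprocess
from typing import List, Optional, Tuple

# (argv, file that receives stdout, or None to leave stdout alone)
Step = Tuple[List[str], Optional[str]]


def pilon_plan(draft: str, reads_1: str, reads_2: str, rounds: int = 3,
               threads: int = 8, pilon_jar: str = "pilon-1.22.jar") -> List[Step]:
    """Build the commands for `rounds` iterations of short-read polishing with Pilon."""
    steps: List[Step] = []
    genome = draft
    for i in range(1, rounds + 1):
        prefix = f"pilon_round{i}"
        sam, bam = f"{prefix}.sam", f"{prefix}.bam"
        # Index the current assembly and align the paired Illumina reads to it.
        steps.append((["bwa", "index", genome], None))
        steps.append((["bwa", "mem", "-t", str(threads), genome, reads_1, reads_2], sam))
        # Pilon expects a coordinate-sorted, indexed BAM.
        steps.append((["samtools", "sort", "-@", str(threads), "-o", bam, sam], None))
        steps.append((["samtools", "index", bam], None))
        # Pilon writes the corrected assembly to <prefix>.fasta.
        steps.append((["java", "-Xmx32G", "-jar", pilon_jar,
                       "--genome", genome, "--frags", bam, "--output", prefix], None))
        # The next round corrects this round's output.
        genome = f"{prefix}.fasta"
    return steps


def run(steps: List[Step]) -> None:
    """Execute each step, redirecting stdout to a file where one is listed."""
    for argv, out_path in steps:
        if out_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(out_path, "w") as out:
                subprocess.run(argv, stdout=out, check=True)


if __name__ == "__main__":
    # Dry run: print the planned commands without executing them.
    for argv, out_path in pilon_plan("racon_round3.fasta", "illumina_R1.fq", "illumina_R2.fq"):
        print(" ".join(argv) + (f" > {out_path}" if out_path else ""))
```

Re-aligning the reads in every round matters because each Pilon pass changes the assembly coordinates, so alignments from the previous round no longer match the new reference.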