3.3 Chromosome construction by Hi-C
More than 571.60 million read pairs (73.20% of the total Hi-C data)
were mapped to the initial genome assembly. Of these, 116.65 million
valid interaction pairs were used for Hi-C scaffolding; invalid
interaction pairs, including dangling-end pairs, re-ligation pairs,
self-circle pairs, and dumped pairs, were filtered out (Supporting
Information Table S1).
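As a quick check on the mapping figures above, the arithmetic can be sketched in Python. The implied total number of read pairs and the valid-pair fraction are derived here from the reported values; they are not stated in the text, and the variable names are illustrative only.

```python
# Reported values, in millions of read pairs
mapped_pairs = 571.60      # pairs mapped to the initial assembly
mapped_fraction = 0.7320   # mapped pairs as a share of all Hi-C pairs
valid_pairs = 116.65       # valid interaction pairs used for scaffolding

# Derived (not reported): approximate total number of Hi-C read pairs
total_pairs = mapped_pairs / mapped_fraction

# Derived (not reported): share of mapped pairs kept as valid pairs
valid_of_mapped = valid_pairs / mapped_pairs

print(f"implied total pairs: about {total_pairs:.1f} million")
print(f"valid pairs as a share of mapped pairs: {valid_of_mapped:.2%}")
```

Because the text reports "more than" 571.60 million mapped pairs, both derived values should be read as approximations.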
The initially assembled contigs were broken and reassembled using
uniquely mapped Hi-C read pairs. Regions that could be restored as
candidate areas had already been corrected. After Hi-C assembly and
manual adjustment, we obtained 4,728 corrected contigs (Table 2)
assigned to 18 pseudochromosomes. The final assembly of the Asian clam
genome was 1.52 Gb in length, with a contig N50 of 521.06 Kb and a
scaffold N50 of 70.62 Mb (Table 2). In total, 1.51 Gb of genomic
sequence, accounting for 99.17% of the total contig sequence and
comprising 4,621 contigs (97.74%), was assigned to the 18 chromosomes
(Figure 3). Additionally, 1.40 Gb (92.81%) of genomic sequence was
anchored with a defined order and orientation in the Hi-C interaction
heat map (Supporting Information Table S2). The scaffolding of the
Asian clam genome was highly efficient (more than 99% of the genomic
sequence and more than 97% of contigs assigned to chromosomes), so the
assembly can be considered a high-quality, chromosome-level genome.
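For readers unfamiliar with the N50 statistic quoted above, the following is a minimal Python sketch of how it is conventionally computed, together with a check of the reported contig anchoring rate. The function name `n50` and the toy length list are hypothetical and are not taken from the study's pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    The N50 is the length L such that sequences of length >= L
    together cover at least half of the total assembled length.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical toy contig lengths (arbitrary units), for illustration only
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
toy_n50 = n50(toy_lengths)

# Contig anchoring rate from the counts reported in the text
anchored_contigs = 4621
total_contigs = 4728
anchoring_rate = anchored_contigs / total_contigs

print(f"toy N50: {toy_n50}")
print(f"contigs assigned to chromosomes: {anchoring_rate:.2%}")
```

The same function applies to contig or scaffold lengths; only the input list changes.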
TABLE 2 Statistics and characteristics of the genome
for Corbicula fluminea