3.3 Chromosome construction by Hi-C
More than 571.60 million read pairs (73.20% of the total Hi-C data) were mapped to the initial genome assembly. A total of 116.65 million valid interaction pairs were used for Hi-C scaffolding; invalid interaction pairs, including dangling-end pairs, re-ligation pairs, self-circle pairs, and dumped pairs, were filtered out (Supporting Information Table S1).
The initially assembled contigs were broken and reassembled using uniquely mapped Hi-C read pairs. Candidate regions that could be restored were corrected during this step. After Hi-C assembly and manual adjustment, we obtained 4,728 corrected contigs (Table 2) assigned to 18 pseudochromosomes. The final assembly of the Asian clam genome was 1.52 Gb in length, with a contig N50 of 521.06 kb and a scaffold N50 of 70.62 Mb (Table 2). A total of 1.51 Gb of genomic sequence, accounting for 99.17% of the total contig sequence, was placed on the 18 chromosomes; these comprised 4,621 contigs (97.74% of all contigs) (Figure 3). Additionally, 1.40 Gb (92.81%) of genomic sequence was anchored with a defined order and orientation in the Hi-C interaction heat map (Supporting Information Table S2). The scaffolding was highly efficient, anchoring more than 99% of the genomic sequence and more than 97% of the contigs, which supports treating the result as a high-quality, chromosome-level genome assembly.
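For readers less familiar with the summary statistics above, the following is a minimal Python sketch of how an N50 value and an anchoring rate are conventionally computed. It is an illustration only, not the pipeline used in this study: the function names `n50` and `percent` and the toy length list are hypothetical, and only the contig counts (4,621 of 4,728) are taken from the text.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together cover at least half of the total assembly length."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("lengths must be non-empty")


def percent(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


# Hypothetical toy contig lengths (arbitrary units), not data from this study.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
toy_n50 = n50(toy_lengths)

# Contig counts reported in the text: 4,621 anchored of 4,728 corrected contigs.
contig_rate = percent(4621, 4728)

print(toy_n50, contig_rate)
```

Applied to the real contig and scaffold length lists, the same `n50` function would yield the contig N50 and scaffold N50 values of the kind reported in Table 2.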
TABLE 2 Statistics and characteristics of the genome for Corbicula fluminea