3.1 Species identification
C. striatipennis Kieffer has previously been misidentified as C. kiiensis Tokunaga or C. strenzkei Fittkau (Lacerda et al., 2014). It is widely distributed in China, where it has usually been recorded under the name C. kiiensis Tokunaga. In the present study, the cytochrome c oxidase subunit I (COI) sequences of our laboratory colony differed from those released by Martin by 0.46%, as quantified from the pairwise genetic distance matrix (Supplementary Table 2). This divergence falls within the range of intraspecific genetic distances reported for most animal species (Hebert et al., 2003). Because the laboratory colony used in the present study cannot be separated from C. striatipennis as identified by Martin on the basis of morphology and DNA barcodes, we follow Martin's identification and refer to it as C. striatipennis Kieffer (Amora et al., 2015).
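The 0.46% COI divergence above comes from a pairwise genetic distance matrix, but the text does not state which distance model was used. As a minimal sketch, assuming two sequences already aligned to the same length, the following computes both an uncorrected p-distance and the Kimura 2-parameter (K2P) distance often used for barcode comparisons. The function names and the 200-bp toy sequences are hypothetical illustrations, not the study's pipeline or data.

```python
"""Pairwise distances between two aligned COI barcode sequences (illustrative sketch)."""
import math

VALID_BASES = set("ACGT")
PURINES = set("AG")
PYRIMIDINES = set("CT")


def comparable_sites(seq1: str, seq2: str) -> list[tuple[str, str]]:
    """Pair up aligned positions, skipping gaps and ambiguity codes in either sequence."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [
        (a, b)
        for a, b in zip(seq1.upper(), seq2.upper())
        if a in VALID_BASES and b in VALID_BASES
    ]
    if not pairs:
        raise ValueError("no comparable sites")
    return pairs


def p_distance(seq1: str, seq2: str) -> float:
    """Uncorrected p-distance: fraction of comparable sites that differ."""
    pairs = comparable_sites(seq1, seq2)
    return sum(a != b for a, b in pairs) / len(pairs)


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura 2-parameter distance, counting transitions (P) and transversions (Q) separately."""
    pairs = comparable_sites(seq1, seq2)
    n = len(pairs)
    transitions = sum(
        1
        for a, b in pairs
        if a != b and ({a, b} <= PURINES or {a, b} <= PYRIMIDINES)
    )
    transversions = sum(1 for a, b in pairs if a != b) - transitions
    p, q = transitions / n, transversions / n
    # d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q)
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


if __name__ == "__main__":
    # Hypothetical 200-bp toy sequences differing by a single A->G transition.
    reference = "ACGT" * 50
    query = "G" + reference[1:]
    print(f"p-distance: {p_distance(reference, query):.2%}")      # p-distance: 0.50%
    print(f"K2P distance: {k2p_distance(reference, query):.2%}")  # K2P distance: 0.50%
```

For divergences this small the two models give nearly identical values; the K2P correction only becomes noticeable as multiple substitutions at the same site become likely.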
3.2 Genome sequencing and characteristics
K-mer analysis estimated the genome size of C. striatipennis at about 170.79 Mb, with a heterozygosity rate of about 1.13% and a repeat content of about 23.63% (Supplementary Figure 1). The species has a karyotype of 2n = 2x = 8. These characteristics indicate that the C. striatipennis genome is highly heterozygous and complex.
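The genome-size estimate above rests on a simple relation: the total number of k-mers divided by the depth of the main (homozygous) k-mer peak approximates the genome length. The text does not name the tool used; model-based programs such as GenomeScope also fit the heterozygous peak and repeat content, which this sketch does not attempt. As a minimal illustration, assuming a two-column "depth count" histogram of the kind produced by k-mer counters such as Jellyfish, and using a hypothetical toy histogram and hypothetical function names:

```python
"""Rough genome-size estimate from a k-mer depth histogram (illustrative sketch)."""


def parse_histogram(lines):
    """Parse two-column 'depth count' lines into a {depth: count} dict."""
    histogram = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            histogram[int(fields[0])] = int(fields[1])
    return histogram


def estimate_genome_size(histogram: dict[int, int], min_depth: int = 5) -> float:
    """Genome size ~= total k-mers / depth of the main (homozygous) peak.

    Depths below min_depth are treated as sequencing errors and ignored.
    The tallest remaining bin is taken as the main peak; this simple rule
    fails when the heterozygous peak at half depth is the taller one.
    """
    solid = {d: n for d, n in histogram.items() if d >= min_depth}
    if not solid:
        raise ValueError("no k-mers above the error cutoff")
    peak_depth = max(solid, key=solid.get)
    total_kmers = sum(depth * count for depth, count in solid.items())
    return total_kmers / peak_depth


if __name__ == "__main__":
    # Hypothetical histogram: error k-mers at depth 1-2, a small
    # heterozygous peak near depth 25 and the main peak at depth 50.
    toy = parse_histogram([
        "1 5000000", "2 1000000", "25 200000",
        "49 800000", "50 1000000", "51 800000",
    ])
    print(f"{estimate_genome_size(toy) / 1e6:.2f} Mb")  # 2.70 Mb
```

The error cutoff matters: low-depth k-mers from sequencing errors are numerous, and including them would inflate the estimate.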
To obtain a high-quality assembly, Oxford Nanopore Technologies (ONT) sequencing data were used to assemble a preliminary draft genome, which was then polished with Illumina data. BUSCO assessment showed that the gene set of the draft assembly was 95.0% complete, indicating that the C. striatipennis draft was sufficiently complete for anchoring to chromosomes (Supplementary Figure 2, Supplementary Table 3). Using 22.92 Gb of clean reads from the Hi-C library, we assembled 78 scaffolds: 4 pseudomolecules representing the 4 chromosomes and 74 unplaced fragments. The 4 pseudochromosomes ranged in length from 20.40 Mb to 60.13 Mb, with a scaffold N50 of 64.51 Mb (Figure 3). In addition, about 179.77 Mb of contig sequence was anchored to the 4 pseudochromosomes, an anchoring rate of 98.86% (Supplementary Table 5). The chromatin interaction data indicate that the Hi-C assembly is of high quality (Figure 2). BUSCO analysis recovered 95.0% (3119/3285), 98.7% (1349/1367) and 98.1% (936/954) of conserved genes in C. striatipennis against the Diptera, Insecta and Metazoa datasets, respectively (Supplementary Figure 3). Compared with other chironomid genome assemblies, the C. striatipennis assembly is of higher quality and more complete (Table 1).
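The assembly metrics reported above (scaffold N50, anchoring rate and BUSCO percentages) follow standard definitions. N50 is the length L such that scaffolds of length L or longer together cover at least half of the total assembly length; by construction, N50 can never exceed the length of the longest scaffold. The sketch below illustrates these definitions only. The function names and scaffold lengths are hypothetical, not the C. striatipennis assembly; the BUSCO counts in the usage line are the Insecta values reported in the text.

```python
"""Assembly summary metrics: N50, anchoring rate and BUSCO percentage (illustrative sketch)."""


def n50(lengths):
    """Length L such that sequences of length >= L cover at least half the total length."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequences")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The cumulative sum always reaches half the total by the last element.
        if running >= half_total:
            return length


def anchoring_rate(anchored_bp, total_bp):
    """Percentage of assembled sequence placed on pseudochromosomes."""
    return 100 * anchored_bp / total_bp


def busco_percent(found, total):
    """Percentage of BUSCO markers recovered, rounded to one decimal place."""
    return round(100 * found / total, 1)


if __name__ == "__main__":
    # Hypothetical scaffold lengths in Mb: four large scaffolds plus two small fragments.
    scaffolds = [60.0, 50.0, 40.0, 20.0, 1.0, 1.0]
    print(f"N50 = {n50(scaffolds)} Mb")  # N50 = 50.0 Mb
    rate = anchoring_rate(sum(scaffolds[:4]), sum(scaffolds))
    print(f"anchoring rate = {rate:.2f}%")  # anchoring rate = 98.84%
    # Insecta BUSCO counts reported in the text.
    print(busco_percent(1349, 1367))  # 98.7
```

Because N50 is a length-weighted median, a few long pseudochromosomes dominate it, which is why chromosome-level assemblies report N50 values close to chromosome length.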
C. striatipennis is the first species in the genus Chironomus with a chromosome-level genome assembly. Its genome size is similar to those of other species in the genus Chironomus (C. riparius, C. tentans and C. tepperi (236 Mb)), but much larger than those of species in the subfamily Orthocladiinae (C. marinus, Belgica antarctica (99 Mb) and P. akamusi) and in the genus Polypedilum (P. vanderplanki and P. pembai (122 Mb)) (Table 1) (Kaiser et al., 2016; Kelley et al., 2014; Sun et al., 2021). These results are also consistent with those of Cornette et al. (2015). Taken together, these comparisons suggest that genomes in the genus Chironomus are generally larger than those in the subfamily Orthocladiinae and the genus Polypedilum.
Table 1. Genome statistics and comparisons among chironomid species whose genomes have been sequenced