3.3 Gene prediction and annotation
A total of 13429 protein-coding genes were obtained through de novo and
homology-based prediction. Genome assembly completeness assessed by
BUSCO was up to 98.7% (Table 1), which suggests that the genome annotation for C. striatipennis is reasonably complete.
Combined with the results of other studies, the gene length and the
average intron length in C. striatipennis, as in other species of the
subfamily Chironominae, were much greater than those in the subfamily
Orthocladiinae (Supplementary Table 8) (Kutsenko et al., 2014; Vicoso &
Bachtrog, 2015). The protein-coding portion of the C. striatipennis
genome was smaller than that of species in the subfamily Orthocladiinae
(Supplementary Table 8). It is worth noting that the introns of C.
striatipennis were concentrated in the 5′ untranslated regions (5′ UTRs).
These introns usually contain transcriptional regulatory elements and
they can reduce gene mutations (Rethmeier et al., 1997; Rose, 2004; Rose
& Beliakoff, 2000). In the future, the role of introns in regulating
gene expression and reducing gene mutations should be further explored.
In addition, 105 ribosomal RNAs, 31 small nuclear RNAs, 34 microRNAs
and 203 transfer RNAs were identified in C. striatipennis (Supplementary Table 9).
Repeated sequences comprised 23.29% of the whole genome of C.
striatipennis, which was much higher than in other known chironomid genomes
(Supplementary Table 8). Interspersed repeats, at up to 78.3%, were a
key component of the repeated sequences. Moreover, the number of
repeated sequences predicted de novo was far greater than that obtained
from the RepBaseMaskerEdition-2018.10.26 database. This suggests that C. striatipennis has many distinctive repeat sequences that are not
represented in that database. Long terminal repeats (LTRs) predicted de novo made up 21.14% of
the repeat sequences of C. striatipennis (Supplementary Tables
10-11) (Cornette et al., 2017; Kaiser et al., 2016; Kelley et al., 2014;
Kutsenko et al., 2014; Shaikhutdinov & Gusev, 2022; Sun et al., 2021;
Vicoso & Bachtrog, 2015).
A total of 12717 (94.70%) predicted genes were functionally annotated,
among which 12245 (91.18%) genes were annotated with DIAMOND against the
Nr database and 8292 (61.75%) genes with DIAMOND against the
SwissProt database. In addition, 5579 (41.54%) genes were annotated through KAAS.
The numbers of genes annotated by InterProScan against the Pfam, GO
and InterPro databases were 1369 (10.19%), 3079 (22.93%) and 12135
(90.36%), respectively (Supplementary Table 12).
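The annotation rates above are each gene count divided by the 13429 predicted protein-coding genes. The following minimal sketch recomputes them from the counts reported in the text; the dictionary keys and the helper name `percent_of_total` are illustrative assumptions, not part of the original analysis pipeline.

```python
# Counts transcribed from the paragraph above; labels and helper names are illustrative.
TOTAL_GENES = 13429

annotated = {
    "any database": 12717,
    "Nr (DIAMOND)": 12245,
    "SwissProt (DIAMOND)": 8292,
    "KAAS": 5579,
    "Pfam (InterProScan)": 1369,
    "GO (InterProScan)": 3079,
    "InterPro (InterProScan)": 12135,
}


def percent_of_total(count, total=TOTAL_GENES):
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100 * count / total, 2)


annotation_rates = {name: percent_of_total(n) for name, n in annotated.items()}

for name, rate in annotation_rates.items():
    print(f"{name}: {rate:.2f}%")
```

Because a gene can be annotated by several databases, the per-database counts overlap and are not expected to sum to the overall total of 12717.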
3.4 Genome evolution
A total of 2446 single-copy gene families shared by C. striatipennis and
10 other dipteran species were used to construct a phylogenetic tree and
estimate divergence times (Supplementary Figure 4), and
species-specific orthogroups were identified in a Venn diagram (Figure
4-b, Supplementary Table 6). Phylogenetic analysis shows that C.
striatipennis is closely related to C. riparius and C.
tentans; together they form a clade located at the base of Chironomidae. The
Chironomidae divergence was estimated to have occurred about 152.9 Mya,
Orthocladiinae diverged from Chironominae about 71.7 Mya, and the divergence
of C. striatipennis occurred about 13.7 to 43.6
Mya (Figure 4-a). This indicates that Chironomus is a
relatively young group within Chironomidae.
Analysis of the expansion and contraction of orthogroup gene families
showed that the Chironomidae clade had 28 expanded and 18
contracted gene families. In C. striatipennis, 105 gene families were expanded, compared with 256 in C. riparius, 99 in C.
tentans, 247 in P. vanderplanki, 53 in C. marinus and 25
in P. akamusi. Meanwhile, 70 gene families were contracted in C.
striatipennis, compared with 80 in C. riparius,
185 in C. tentans, 84 in C. marinus and 14 in P.
akamusi (Supplementary Figure 5). In C. striatipennis, the
expanded gene families were significantly involved in growth,
development and defensive metabolism, such as larval molting, damage
repair and inflammatory immunity (Supplementary Figure 6), while the
contracted gene families were mainly involved in energy metabolism
(Supplementary Figure 7). These findings suggest that the expanded and
contracted gene families may be closely related to the adaptive
evolution in C. striatipennis. This is similar to Belgica
antarctica and P. akamusi, which resist adverse
environments through an expanded repertoire of heat shock proteins (HSPs) (Sun et al., 2021;
Kozeretska et al., 2022). Chironomid midges with the same life history as C. striatipennis may also have expanded similar gene families to adapt
to the external environment (Cornette et al., 2017; Kaiser et al., 2016;
Kelley et al., 2014; Kutsenko et al., 2014; Shaikhutdinov & Gusev,
2022; Sun et al., 2021; Vicoso & Bachtrog, 2015).
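As a rough aid to comparing species, the expansion and contraction counts reported above can be combined into a net change per species (expanded minus contracted). This is a minimal sketch using only the numbers given in the text; the function name `net_changes` is an illustrative assumption, and P. vanderplanki is left out of the result because no contraction count is reported for it here.

```python
# Gene family counts transcribed from the text above (Supplementary Figure 5).
expanded = {
    "C. striatipennis": 105,
    "C. riparius": 256,
    "C. tentans": 99,
    "P. vanderplanki": 247,
    "C. marinus": 53,
    "P. akamusi": 25,
}
contracted = {
    "C. striatipennis": 70,
    "C. riparius": 80,
    "C. tentans": 185,
    "C. marinus": 84,
    "P. akamusi": 14,
}


def net_changes(expanded, contracted):
    """Return expanded minus contracted for species that have both counts."""
    return {
        species: expanded[species] - contracted[species]
        for species in expanded
        if species in contracted
    }


net = net_changes(expanded, contracted)

for species, change in net.items():
    print(f"{species}: {change:+d}")
```

A positive value means more gene families were expanded than contracted in that species; the sign alone says nothing about which functional categories are involved.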
To identify homologous protein sequences in the genomes,
BLASTP and MCScanX were used to analyze the protein sequences of C. striatipennis, P. akamusi and P. vanderplanki and
obtain their collinear gene pairs. There were 7162 collinear gene pairs
between C. striatipennis and P. akamusi, 12438 collinear
gene pairs between C. striatipennis and P. vanderplanki, and
270 collinear gene pairs between C. striatipennis and itself
(Supplementary Table 13). Like flies, chironomid midges have very low
self-collinearity at the chromosome level. However, there is close
correspondence among the chironomid species with chromosome-level
genome assemblies. A possible reason is that chironomid midges may have
discarded some unnecessary genes in the course of evolution.
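Collinear gene pair counts such as those above can be tallied from a collinearity report. The sketch below assumes an MCScanX-style layout in which header and block lines begin with `#` and each pair line is tab-separated as index, first gene, second gene, e-value; that layout, the function name `parse_collinear_pairs`, and the gene and chromosome identifiers in the sample are assumptions for illustration, not output from this study.

```python
def parse_collinear_pairs(lines):
    """Return (gene_a, gene_b) tuples from MCScanX-style collinearity lines.

    Assumed layout: lines starting with '#' are headers or block summaries;
    pair lines are tab-separated as index, gene_a, gene_b, e-value.
    """
    pairs = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3:
            pairs.append((fields[1].strip(), fields[2].strip()))
    return pairs


# Hypothetical sample with two collinear blocks (3 pairs and 2 pairs).
sample = [
    "## Alignment 0: score=150.0 e_value=1e-08 N=3 chrA&chrB plus\n",
    "  0-  0:\tgeneA1\tgeneB1\t  1e-50\n",
    "  0-  1:\tgeneA2\tgeneB2\t  2e-40\n",
    "  0-  2:\tgeneA3\tgeneB3\t  3e-30\n",
    "## Alignment 1: score=100.0 e_value=1e-05 N=2 chrA&chrC minus\n",
    "  1-  0:\tgeneA7\tgeneC9\t  1e-20\n",
    "  1-  1:\tgeneA8\tgeneC8\t  5e-25\n",
]

pairs = parse_collinear_pairs(sample)
print(len(pairs))
```

On a real report, the same function could be applied to the lines of an opened file, and the length of the returned list would correspond to the number of collinear gene pairs for that comparison.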