Ab initio gene prediction and comparative genome
analysis
BRAKER2 (Camacho et al., 2009; Hoff et al., 2016; Hoff et al., 2019;
Lomsadze et al., 2014; Stanke et al., 2006; Stanke et al., 2008)
predicted 16,702 protein-coding genes in the genome assembly of
S. ricini (Table 1). InterProScan (Finn et al., 2017) analysis
showed that the Reverse transcriptase domain (IPR000477) and the
Integrase catalytic core domain (IPR001584) are the two most abundant
domains among S. ricini genes (Fig. S3), which may reflect the large
proportion of the S. ricini genome that is occupied by
retrotransposable elements (SINE, LINE, LTR in Fig. 2).
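A domain ranking of this kind can be derived by counting the number of
distinct genes that carry each InterPro entry. The following is a
minimal sketch, not the procedure used in this study. It assumes the
tab-separated InterProScan output layout in which the first column is
the protein identifier and the 12th and 13th columns hold the InterPro
accession and description. The gene identifiers, the helper names
(make_row, rank_domains) and all rows are hypothetical toy data.

```python
from collections import defaultdict


def make_row(protein_id, ipr_acc, ipr_desc):
    """Build one toy line in the assumed 13-column InterProScan TSV layout."""
    cols = [protein_id, "md5", "300", "Pfam", "PF_toy", "toy signature",
            "1", "100", "1e-10", "T", "01-01-2020", ipr_acc, ipr_desc]
    return "\t".join(cols)


# Hypothetical toy annotations (not data from the S. ricini assembly).
TOY_TSV = "\n".join([
    make_row("gene1", "IPR000477", "Reverse transcriptase domain"),
    make_row("gene1", "IPR000477", "Reverse transcriptase domain"),  # repeated hit
    make_row("gene2", "IPR000477", "Reverse transcriptase domain"),
    make_row("gene3", "IPR000477", "Reverse transcriptase domain"),
    make_row("gene2", "IPR001584", "Integrase catalytic core domain"),
    make_row("gene4", "IPR001584", "Integrase catalytic core domain"),
    make_row("gene5", "IPR000719", "Protein kinase domain"),
    make_row("gene6", "-", "-"),  # hit without an InterPro entry, skipped
])


def rank_domains(tsv_text):
    """Return (accession, description, gene count) sorted by gene count."""
    genes_by_domain = defaultdict(set)
    descriptions = {}
    for line in tsv_text.splitlines():
        cols = line.split("\t")
        if len(cols) < 13 or cols[11] in ("", "-"):
            continue
        # A set ensures each gene is counted once per domain.
        genes_by_domain[cols[11]].add(cols[0])
        descriptions[cols[11]] = cols[12]
    ranked = sorted(genes_by_domain.items(),
                    key=lambda item: (-len(item[1]), item[0]))
    return [(acc, descriptions[acc], len(genes)) for acc, genes in ranked]


for acc, desc, n_genes in rank_domains(TOY_TSV):
    print(acc, desc, n_genes)
```

Counting genes rather than hits avoids inflating domains that occur
several times within a single protein.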
The Circos plot linking single-copy orthologs between B. mori and
S. ricini shows that large-scale chromosomal rearrangements, such as
translocations and chromosome fusions, occurred in the ancestor of
S. ricini (Fig. 3A) (Cheong et al., 2015; Krzywinski et al., 2009).
However, despite these frequent rearrangements, genomic regions with
no links or extremely few links were rarely observed in the plot,
suggesting that almost all regions of the S. ricini and B. mori
genomes correspond to each other and that 'species-specific' regions
barely exist.
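One way to check numerically whether regions with few links exist is to
bin the positions of single-copy orthologs along each chromosome and
flag windows that contain fewer anchors than a threshold. The sketch
below illustrates this idea under stated assumptions. The function name
sparse_windows, the window size, the threshold, the chromosome lengths
and the coordinates are all hypothetical and are not the values
underlying Fig. 3A.

```python
from collections import Counter


def sparse_windows(anchors, chrom_lengths, window=1_000_000, min_links=1):
    """Return (chromosome, window start) for windows with < min_links anchors.

    anchors: iterable of (chromosome, 0-based position) of ortholog genes.
    chrom_lengths: dict mapping chromosome name to its length in bp.
    """
    counts = Counter((chrom, pos // window) for chrom, pos in anchors)
    sparse = []
    for chrom, length in sorted(chrom_lengths.items()):
        n_windows = (length + window - 1) // window  # ceiling division
        for i in range(n_windows):
            if counts[(chrom, i)] < min_links:
                sparse.append((chrom, i * window))
    return sparse


# Hypothetical toy input: two chromosomes and five ortholog anchors.
toy_lengths = {"chr1": 3_500_000, "chr2": 2_000_000}
toy_anchors = [
    ("chr1", 100_000),
    ("chr1", 1_200_000),
    ("chr1", 3_400_000),
    ("chr2", 500_000),
    ("chr2", 1_999_999),
]

print(sparse_windows(toy_anchors, toy_lengths))
```

In this toy example only the third 1-Mb window of chr1 contains no
anchor. The choice of window size strongly affects the result, so any
real analysis would need to justify it.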
The number of orthogroups (OGs) among five lepidopteran species is
shown in Fig. 3B, and a phylogenetic tree based on 3,907 single-copy
orthologs shows the phylogenetic relationships among the five species
(Fig. 3C). Ortholog analysis using OrthoFinder (Emms & Kelly, 2015)
identified 205 S. ricini-specific OGs (Fig. 3B). Of these 205 OGs, 46
are related to retrotransposable elements (Fig. 2), leaving 159
S. ricini-specific OGs that are not related to retrotransposons. Of
these, two OGs (OG0000113 and OG0000131) consist of 33 and 30 chorion
protein genes, respectively. These S. ricini-specific chorion genes
are located in close proximity on chromosome 1 as a gene cluster,
which may underlie their apparently high duplication rate through
tandem duplication or gene conversion. In addition to these 63
S. ricini-specific chorion genes, 17 further chorion genes were found
in this cluster. Table S6 summarizes all 80 chorion genes present in
the S. ricini genome. A phylogenetic analysis of these genes together
with the chorion genes of B. mori, P. xylostella, P. xuthus and
D. plexippus suggests that gene duplication could have resulted in
the diversification of chorion proteins, because the chorion genes
from OG0000113 and OG0000131 fell into distinct clades (Fig. 3D).
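The identification of species-specific OGs and the counts reported
above can be illustrated with a short sketch. It assumes a per-OG gene
count table of the kind OrthoFinder writes (an Orthogroup column, one
column per species and a Total column), and it treats an OG as
species-specific when all of its genes come from a single species,
which is an assumption about the definition used here. The OG names,
species column names and counts in the toy table are hypothetical. Only
the numbers in the final tally are taken from the text.

```python
import csv
import io

# Hypothetical toy gene count table (tab-separated); not data from this study.
TOY_TABLE = (
    "Orthogroup\tS_ricini\tB_mori\tP_xylostella\tP_xuthus\tD_plexippus\tTotal\n"
    "OG_toy1\t3\t2\t1\t1\t1\t8\n"
    "OG_toy2\t5\t0\t0\t0\t0\t5\n"
    "OG_toy3\t0\t4\t0\t0\t0\t4\n"
    "OG_toy4\t2\t0\t0\t0\t0\t2\n"
)


def species_specific(table_text, species):
    """Return OGs whose genes all belong to the given species column."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    hits = []
    for row in reader:
        total = int(row["Total"])
        if total > 0 and int(row[species]) == total:
            hits.append(row["Orthogroup"])
    return hits


print(species_specific(TOY_TABLE, "S_ricini"))

# Tally of the counts reported in the text.
specific_ogs = 205          # S. ricini-specific OGs
retrotransposon_ogs = 46    # of which related to retrotransposable elements
non_te_ogs = specific_ogs - retrotransposon_ogs
specific_chorion = 33 + 30  # genes in OG0000113 and OG0000131
all_chorion = specific_chorion + 17
print(non_te_ogs, specific_chorion, all_chorion)
```

The tally reproduces the 159 non-retrotransposon OGs, the 63
S. ricini-specific chorion genes and the 80 chorion genes in total.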
Chorion proteins constitute the eggshell and protect embryos from the
environment, so they are likely to evolve in ways that reflect
adaptation to the environment (Lecanidou, Rodakis, Eickbush, &
Kafatos, 1986; Papantonis, Swevers, & Iatrou, 2015; Rodakis &
Kafatos, 1982). Based on sequence homology, chorion proteins can be
categorized into two groups (α and β), each of which includes three
subfamilies (Lecanidou et al., 1986; Papantonis et al., 2015). Among
the three subfamilies, high-cysteine (Hc) chorion proteins are
considered to play an important role in embryonic diapause, because
they increase the hardness of the eggshell and thereby allow embryos
to survive diapause during the winter (Rodakis & Kafatos, 1982).
Interestingly, according to the BLAST search and phylogenetic
analysis, Hc chorion protein genes appear to be absent from the
S. ricini genome (Fig. 3D, Table S6 and Table S9). Given that
S. ricini is a non-diapause species, it is highly plausible that
S. ricini lacks Hc chorion genes.