Ab initio gene prediction and comparative genome analysis
BRAKER2 (Camacho et al., 2009; Hoff et al., 2016; Hoff et al., 2019; Lomsadze et al., 2014; Stanke et al., 2006; Stanke et al., 2008) predicted 16,702 protein-coding genes in the genome assembly of S. ricini (Table 1). InterProScan (Finn et al., 2017) analysis showed that the Reverse transcriptase domain (IPR000477) and the Integrase catalytic core domain (IPR001584) are the two most abundant domains among S. ricini genes (Fig. S3), which may reflect the large proportion of the S. ricini genome occupied by retrotransposable elements (SINE, LINE, LTR in Fig. 2).
The Circos plot linking single-copy orthologs between B. mori and S. ricini shows that large-scale chromosomal rearrangements, such as translocations and chromosome fusions, occurred in the ancestor of S. ricini (Fig. 3A) (Cheong et al., 2015; Krzywinski et al., 2009). However, despite these frequent rearrangements, genomic regions with no links or extremely few links were hardly observed in the plot, suggesting that almost the entire S. ricini and B. mori genomes correspond to each other and that 'species-specific' regions barely exist.
The number of orthogroups (OGs) among the 5 Lepidoptera species is shown in Fig. 3B, and a phylogenetic tree based on 3,907 single-copy orthologs shows the genetic relationships among the 5 species (Fig. 3C). Ortholog analysis using OrthoFinder (Emms & Kelly, 2015) identified 205 S. ricini-specific OGs (Fig. 3B). Of these 205 OGs, 46 are related to retrotransposable elements (Fig. 2), leaving 159 S. ricini-specific OGs that are not retrotransposon-related. Of these, two OGs (OG0000113 and OG0000131) consist of 33 and 30 chorion protein genes, respectively. These S. ricini-specific chorion genes are located in close proximity on chromosome 1 as a gene cluster, which may underlie the apparently high duplication rate through tandem duplication or gene conversion. In addition to the above-mentioned 63 S. ricini-specific chorion genes, 17 chorion genes were found in this cluster. Table S6 summarizes all 80 chorion genes present in the S. ricini genome. A phylogenetic analysis of these genes along with the chorion genes of B. mori, P. xylostella, P. xuthus and D. plexippus suggests that gene duplication could have resulted in the diversification of chorion proteins, because chorion genes from OG0000113 and OG0000131 fell into distinct clades (Fig. 3D).
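As a minimal sketch of how species-specific and single-copy orthogroups can be extracted from a per-species gene-count table, the Python example below uses a toy table loosely modeled on the Orthogroups.GeneCount.tsv file produced by OrthoFinder. The orthogroup IDs, species column labels, counts, and function names are hypothetical and chosen for illustration only; the criteria shown (genes present in one species only, or exactly one gene in every species) are an assumption about how such groups are typically defined, not a description of the exact procedure used in this study.

```python
import csv
import io

# Toy gene-count table (tab-separated). All IDs and counts are made up
# for illustration; they are not taken from the study's data.
TOY_COUNTS = (
    "Orthogroup\tB_mori\tS_ricini\tP_xylostella\tTotal\n"
    "OG_A\t1\t1\t1\t3\n"
    "OG_B\t0\t33\t0\t33\n"
    "OG_C\t2\t1\t0\t3\n"
    "OG_D\t0\t30\t0\t30\n"
    "OG_E\t1\t1\t1\t3\n"
)


def load_counts(text):
    """Parse the table into {orthogroup: {species: gene_count}}."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    table = {}
    for row in reader:
        og = row.pop("Orthogroup")
        row.pop("Total", None)  # drop the summary column if present
        table[og] = {species: int(n) for species, n in row.items()}
    return table


def species_specific(table, species):
    """Orthogroups with at least one gene in `species` and none elsewhere."""
    return sorted(
        og
        for og, counts in table.items()
        if counts[species] > 0
        and all(n == 0 for sp, n in counts.items() if sp != species)
    )


def single_copy(table):
    """Orthogroups with exactly one gene in every species."""
    return sorted(
        og for og, counts in table.items()
        if all(n == 1 for n in counts.values())
    )


counts = load_counts(TOY_COUNTS)
specific = species_specific(counts, "S_ricini")
n_specific_genes = sum(counts[og]["S_ricini"] for og in specific)

print(specific)             # orthogroups found only in the toy S_ricini column
print(single_copy(counts))  # orthogroups usable as single-copy markers
print(n_specific_genes)     # total genes in the species-specific orthogroups
```

In this toy table, the two species-specific orthogroups hold 33 and 30 genes, mirroring the 63 S. ricini-specific chorion genes reported above; real input would need the actual species column names from the OrthoFinder run.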
Chorion proteins make up the eggshell and protect embryos from the environment, suggesting that they are likely to evolve in ways that reflect adaptation to the environment (Lecanidou, Rodakis, Eickbush, & Kafatos, 1986; Papantonis, Swevers, & Iatrou, 2015; Rodakis & Kafatos, 1982). Based on sequence homology, chorion proteins can be categorized into two groups (α and β), each of which includes three subfamilies (Lecanidou et al., 1986; Papantonis et al., 2015). Among the three subfamilies, the high-cysteine (Hc) chorion is considered to play an important role in embryonic diapause, because Hc chorion proteins increase the hardness of eggshells, allowing embryos to survive diapause in the winter (Rodakis & Kafatos, 1982). Interestingly, according to the BLAST search and phylogenetic analysis, Hc chorion protein genes appear to be absent from the S. ricini genome (Fig. 3D, Table S6 and Table S9). Given that S. ricini is a non-diapause species, it is highly plausible that S. ricini lacks Hc chorion genes.