Estimation of the TE insertion time
The insertion time (T) was calculated using the adapted formula T = K/(2r) (Li, Guo et al. 2019), where K is the sequence divergence and r is the substitution rate. Sequence divergence was calculated as K = -300/4 × ln(1 - D × 4/300), where D = d/100 and d is the divergence (mutation rate) of each site reported in the ‘.align’ file, one of the RepeatMasker output files.
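As an illustration, the two formulas above can be sketched in a few lines of Python. This is a minimal sketch rather than the pipeline used in the study: the function names, the example divergence value and the example substitution rate are hypothetical, and the expression for K is transcribed exactly as written in the text.

```python
import math


def sequence_divergence(d_percent: float) -> float:
    """Return K from the divergence d (in percent) reported by RepeatMasker.

    Follows the expression in the text: D = d/100 and
    K = -300/4 * ln(1 - D*4/300).
    """
    D = d_percent / 100.0
    return -300.0 / 4.0 * math.log(1.0 - D * 4.0 / 300.0)


def insertion_time(k: float, r: float) -> float:
    """Return T = K / (2r); the time unit is the one in which r is expressed."""
    if r <= 0:
        raise ValueError("substitution rate r must be positive")
    return k / (2.0 * r)


if __name__ == "__main__":
    d = 12.5    # hypothetical divergence (%) taken from a '.align' record
    r = 2.0e-9  # hypothetical substitution rate per site per year
    k = sequence_divergence(d)
    print(f"K = {k:.5f}, T = {insertion_time(k, r):.3e} years")
```

Keeping K and T in separate functions makes it possible to apply the same substitution rate to every TE copy once its divergence has been read from the RepeatMasker output.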
To estimate the nucleotide substitution rate, 4-fold degenerate sites were extracted from the single-copy orthologs of 17 species described in our previous report (Xiao, Wei et al. 2021). Using the known tree topology as input, we ran the phyloFit program in the PHAST package (http://compgen.cshl.edu/phast/) to estimate the branch lengths of the evolutionary tree. The root-to-tip branch length was calculated with the TreeStat tool (http://tree.bio.ed.ac.uk/software/treestat/). The substitution rate was then obtained by dividing the root-to-tip branch length by the divergence time of D. pulex (485 Mya) on the phylogenetic tree in that report (Xiao, Wei et al. 2021).
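The final division can be sketched as follows. This is only an illustration: the function name and the example branch length are hypothetical, and converting 485 Mya to years (so that the rate is per site per year) is an assumption about the intended unit.

```python
def substitution_rate(root_to_tip_length: float,
                      divergence_time_mya: float = 485.0) -> float:
    """Return substitutions per site per year.

    root_to_tip_length: branch length in substitutions per site
        (e.g. as reported by TreeStat for the phyloFit tree).
    divergence_time_mya: divergence time in million years; converted
        to years here, which is an assumed choice of unit.
    """
    if divergence_time_mya <= 0:
        raise ValueError("divergence time must be positive")
    return root_to_tip_length / (divergence_time_mya * 1e6)


if __name__ == "__main__":
    # 0.97 is a hypothetical root-to-tip length, not a value from the study.
    print(f"r = {substitution_rate(0.97):.3e} substitutions/site/year")
```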
Enrichment analysis
GO and KEGG enrichment analyses were carried out using the ClueGO plug-in in Cytoscape. All parameters were left at their defaults except for ‘show only Pathways with pV < 0.1’. Term-term interrelations and functional groups were defined using the kappa score, which is based on the genes shared between terms. The most significant term in each merged group was used as the leading group term. For the separate KEGG enrichment analysis, the ‘Advanced Term/Pathway Selection Option’ parameters were adjusted so that the total number of merged groups was fewer than 40.
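To make the kappa score concrete, the following sketch computes Cohen's kappa for two terms from their gene memberships. It illustrates the statistic only and is not ClueGO's implementation: the function name, the gene identifiers and the choice of gene universe are hypothetical.

```python
def kappa_score(term_a: set, term_b: set, universe: set) -> float:
    """Cohen's kappa for the agreement of two terms over a gene universe."""
    n = len(universe)
    if n == 0:
        raise ValueError("gene universe must not be empty")
    a_genes = term_a & universe
    b_genes = term_b & universe
    both = len(a_genes & b_genes)     # genes annotated to both terms
    only_a = len(a_genes - b_genes)   # genes annotated to term A only
    only_b = len(b_genes - a_genes)   # genes annotated to term B only
    neither = n - both - only_a - only_b
    observed = (both + neither) / n
    # Agreement expected by chance from the marginal frequencies.
    expected = ((both + only_a) * (both + only_b)
                + (only_b + neither) * (only_a + neither)) / (n * n)
    if expected == 1.0:
        return 1.0  # both terms contain all genes or none; agreement is trivial
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    genes = {f"g{i}" for i in range(1, 11)}  # hypothetical universe of 10 genes
    print(kappa_score({"g1", "g2", "g3"}, {"g2", "g3", "g4"}, genes))
```

Terms whose pairwise kappa exceeds a chosen threshold share many genes and can be placed in the same functional group.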