Estimation of the TE insertion time
The insertion time (T) was calculated using the adapted formula
T = K/2r (Li, Guo et al. 2019), where K is the sequence divergence and
r is the substitution rate.
The sequence divergence was calculated as K = -300/4 * ln(1 - D*4/300),
where D = d/100 and d is the per-site mutation rate (divergence)
reported in the file with the ‘.align’ suffix, which is one of the
RepeatMasker output files.
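As an illustration, the two formulas above can be transcribed literally into a short Python sketch. The function names and the example values of d and r are hypothetical and are not taken from the study. The sketch assumes that r is expressed in substitutions per site per year, so that T is returned in years.

```python
import math


def sequence_divergence(d_percent: float) -> float:
    """Return K from the RepeatMasker divergence d, using the formula in the text.

    D = d / 100, then K = -300/4 * ln(1 - D*4/300).
    """
    D = d_percent / 100.0
    return -300.0 / 4.0 * math.log(1.0 - D * 4.0 / 300.0)


def insertion_time(d_percent: float, r: float) -> float:
    """Return T = K / (2r).

    r is assumed to be in substitutions per site per year (an assumption
    of this sketch), which gives T in years.
    """
    K = sequence_divergence(d_percent)
    return K / (2.0 * r)


# Hypothetical example values, for illustration only.
example_K = sequence_divergence(10.0)
example_T = insertion_time(10.0, 1e-8)
```

With these definitions, a copy with no divergence (d = 0) gives K = 0 and therefore T = 0.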
To estimate the nucleotide substitution rate, 4-fold degenerate sites
were extracted from single-copy orthologs of 17 species in our previous
report (Xiao, Wei et al. 2021). Taking the known tree topology as an
input parameter, we then used the phyloFit program in the PHAST package
(http://compgen.cshl.edu/phast/) to estimate the branch lengths of the
input evolutionary tree.
The root-to-tip branch length was calculated by the TreeStat tool
(http://tree.bio.ed.ac.uk/software/treestat/).
The substitution rate was calculated as the root-to-tip branch length
divided by the divergence time (485 Mya) of D. pulex on the
phylogenetic tree reported previously (Xiao, Wei et al. 2021).
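The rate calculation described above amounts to a single division, sketched here in Python. The function name is hypothetical, and the sketch assumes that the branch length is in substitutions per site and that the divergence time is converted from millions of years to years. The branch length in the example is a placeholder, not a value from the study.

```python
def substitution_rate(root_to_tip: float, divergence_time_my: float = 485.0) -> float:
    """Return r as root-to-tip branch length divided by divergence time.

    root_to_tip is assumed to be in substitutions per site, and
    divergence_time_my in millions of years, so r is per site per year.
    """
    years = divergence_time_my * 1e6
    return root_to_tip / years


# Placeholder branch length, for illustration only.
example_rate = substitution_rate(0.485)
```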
Enrichment analysis
GO and KEGG enrichment analyses were carried out using the ClueGO
plug-in in Cytoscape.
All parameters were left at their defaults, except for ‘show only
Pathways with pV< 0.1’.
Term-term interrelations and functional groups were defined from the
genes shared between terms, using the kappa score. The most significant
term in each merged group was used as the leading group term.
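To make the kappa score concrete, the following Python sketch computes Cohen's kappa for the gene membership of two terms. It is a generic illustration of the statistic, not ClueGO's own implementation, which is not described here. The function name, the gene identifiers, and the choice of gene universe are all hypothetical, and both gene sets are assumed to be subsets of the universe.

```python
def kappa_score(genes_a: set, genes_b: set, universe: set) -> float:
    """Cohen's kappa for the agreement of two terms on gene membership."""
    n = len(universe)
    both = len(genes_a & genes_b)        # genes annotated to both terms
    only_a = len(genes_a - genes_b)      # genes annotated to term A only
    only_b = len(genes_b - genes_a)      # genes annotated to term B only
    neither = n - both - only_a - only_b

    observed = (both + neither) / n
    expected = (
        (both + only_a) * (both + only_b)
        + (neither + only_b) * (neither + only_a)
    ) / (n * n)
    if expected == 1.0:
        # Degenerate case: both terms contain all genes or no genes.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical gene identifiers, for illustration only.
universe = {f"g{i}" for i in range(10)}
same = kappa_score({"g0", "g1", "g2"}, {"g0", "g1", "g2"}, universe)
disjoint = kappa_score({"g0", "g1"}, {"g2", "g3"}, universe)
```

Terms with identical gene sets score 1, while terms that share fewer genes than expected by chance score below 0.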
For the separate KEGG enrichment analysis, the ‘Advanced Term/Pathway
Selection Option’ parameter was adjusted so that the total number of
merged groups was less than 40.