Results
Analyses of TE composition in the genomes of 11 fig wasps
In this study, 11 fig wasps including six pollinators and five non-pollinators were chose to annotate and analyze genomic TEs. The basic information was listed in supplementary Table S1. The results showed that the numbers of TE families varied greatly among species. Except for the unknown TE families, there were significantly more TE families in non-pollinators than that in pollinators. The numbers of TE families in pollinators varied from 22 to 32, while in non-pollinators from 37 to 48 (Table S2). Among them, 14 TE families existed in all fig wasps. All non-pollinators had their own unique TE families. Crypton-H, Zator, L1-Tx1, and tRNA-Deu-RTE were specific toSycobiasp.2; Merlin, Novosib, and ERVK were specific to Philotrypesis tridentata ; Ginger, TcMar-ISRm11, and U were specific toApocrypta bakeri ; hAT-Blackjack, Caulimovirus, and Gypsy-Cigr were specific toSycophaga agraensis ; CMC-Chapaev and MIR were specific to Sycophila sp.2. Not all pollinators had specific TE families. Kolobok, RTE-RTE, and Tad1 were unique to Eupristina koningsbergeri ; Crypton-V was unique toCeratosolen fusciceps ; RNA-Deu was unique to Wiebesia pumilae . On the contrary, some TE families were absent in all pollinators and some were absent in partial species. For examples, Crypton-I and hAT-hATm existed in all non-pollinators but was absent in all pollinators; while RTE-X, TcMar, and R1-LOA were absent in Dolichoris vasculosae ,Kradibia gibbosae , andC. fusciceps , respectively. In addition, RTE-BovB family was absent in non-pollinator S. agraensis .
Furthermore, the total lengths and the proportions in genomes of TEs were analyzed. The results showed the TE lengths in pollinators varied from 6.04 Mb to 38.54 Mb, which occupied 2.2% to 12.1% space in genomes. In non-pollinators, the shortest TE length was 24.86 Mb, and the longest was 174.29 Mb. TE contents ranged from 12.5% to 30.4% (Table S2). The significant difference tests showed that the total lengths and the proportions in genome of TEs in non-pollinators were significantly higher than in pollinators (Mann-Whitney U test, P = 0.017*) (Fig. 1c). Although there was no significant difference in genome size between both groups (Mann-Whitney U test, P = 0.792), there was a significant positive correlation between the total TE length and the genome size of each species (Spearman correlation, R2 = 0.645,P = 0.032*) (Fig. 1b).
To clarify which TE family types contributed the most to the genome size in each species, the percentage of each classified TE family in the genomes was calculated. The results showed that more than 50% of the TE contents were derived from one to four types of TEs families (Table S2), indicating that a few types of TE bursts caused a large accumulation of TEs in the genome. The type of TE family with the highest content of each species was not the same. For example, the TE family with the highest content in Platyscapa corneri and E. koningsbergeri was RTE-X, with a size of 5.0 Mb and 4.4 Mb, respectively. In D. vasculosae , it was Jockey with the size of 1.9 Mb. It was CMC-EnSpm with the size as high as 24.1 Mb inSycobia sp.2. The most abundant TE family in other fig wasps was Gypsy, and the sizes varied between 0.8 to 5.6 Mb. The TE families with the highest content account for 22.16% to 53.35% of the classified types in each species. Further comparison showed the length of the highest content TE family varied between 0.7-5.0 Mb in pollinators, while varied between 2.4-20.1 Mb in non-pollinators. There was a significant difference between the two groups (Mann-Whitney U test,P = 0.003**). Therefore, a few types of TE bursts were the factor that caused TE contents in non-pollinators significantly higher than that in pollinators.
Burst events detected by predicting the insertion time of TEs
Plotting the total length of the classified TEs in the genome according to the time, we obtained the TEs insertion time distribution map of each species (Fig. 2). The burst peaks accumulated by different types of TEs in a specific time could be found in each species, and we defined that the highest peak was the major burst peak, and other peaks were minor. It was easy to find that the burst patterns of pollinators and non-pollinators were inconsistent. For instance, for the fig wasps living on the same host fig species,Ficus benjamina , one pollinator E. koningsbergeri and three non-pollinators ofSycobia sp.2, P. tridentata , and Sycophilasp.2, the “n”-shaped peak of pollinator was sharply contrasted to the “m”-shaped peaks of non-pollinators (Fig. S1), which indicated that the peaks in the pollinators were more concentrated, and in the non-pollinators were more scattered. Based on the time axis, the major burst peaks of pollinators were distributed in 13-28 Mya, and non-pollinators were in 6-38 Mya.
We further compared and analyzed the overlapping patterns of the burst peaks in both groups of pollinators and non-pollinators. For three (Sycobia sp.2, P. tridentata , and Sycophilasp.2) of the five non-pollinators, the major burst peaks were distributed in 32-33 Mya, while for five of the six pollinators, they also had the peaks in 32-34 Mya, but only minor peaks. The burst peaks in 32-34 Mya were thus defined as the common TE peaks in fig wasps. In the period of 13-28 Mya where the major burst peaks of pollinators were located, non-pollinators also have minor peaks. Due to the long time span, they were not recognized as shared peaks. In addition, a large number of TEs had been inserted recently (0-2 Mya), but no apparent phenomenon occurred in pollinators.
To investigate the prevalence of the common burst peaks of fig wasps during the 32-34 Mya period, five widespread species (Nasonia vitripennis , Apis mellifera , Drosophila melanogaster ,Acyrthosiphon pisum , and Daphnia pulex ) were selected to analyze the TE burst time. It was found that the major burst peaks of these species were distributed in 0-3 Mya, and only A. mellifera had a minor peak at 34 Mya (Fig. S2). This result suggested that the TE bursts remaining in the genome of these widespread species were dominated by recent events, while the fig wasps retained obvious traces of ancient TE bursts.
Gene functions near the burst TEs
The periods of the common burst peaks (32-34 Mya), the major burst peaks of pollinators (13-28 Mya), and the minor burst peaks of non-pollinators (0-2 Mya) are highly overlapped with the time-series (0.0 to 4.0, 12.5 to 16.5, 20.5 to 24.5, and 31.0 to 35.0 Mya) representing the intervals of the major continental ice sheet growth or decay. In the consideration that the insertion or deletion of TEs may have a strong impact on the expression of nearby genes, we analyzed the functions of the genes within 1 Kb near insertion positions of the TEs in that four time-series. The GO and KEGG function enrichment analyses of these genes were carried out. According to the frequency of the enriched terms in the 11 fig wasps, the top three terms of GO and KEGG in each period were listed (Table S3). In KEGG terms, Circadian entrainment pathway (map 04713) was significantly enriched by multiple species in all four periods; Calcium signaling pathway (map 04020) was enriched in three periods of 20.5 to 24.5, 12.5 to 16.5, and 0.0 to 4.0; Glutamatergic synapse (map 04724) was enriched in the two periods of 31.0 to 35.0 and 0.0 to 4.0 Mya. In GO terms, DNA integration (GO:0015074) and DNA metabolic process (GO:0006259) were significantly enriched by multiple species in the two periods of 31.0 to 35.0 and 0.0 to 4.0 Mya; signal transduction (GO:0007165) and regulation of cellular process (GO:0050794) were enriched in the two periods of 20.5 to 24.5 and 12.5 to 16.5 Mya. These results indicated that the terms enriched in the four periods were highly similar, with the gene functions mainly involved in pathways such as rhythm, signal transduction, and neural activity.
The KEGG enrichment network was constructed for the genes within 1 Kb near the insertion positions of TEs inserted during 31-35 Mya, when the common peaks of all the fig wasp species appeared. The results showed 27 significant enriched terms in 11 fig wasps (Table S4). Among these terms, Circadian entrainment (map 04713) was the most enriched, being enriched in six species, followed by Neuroactive ligand-receptor interaction (map 04080) and Glutamatergic synapse (map 04724) that were enriched in five and four species, respectively. Based on the genes shared among terms, the kappa statistics was further used to connect and merge terms into functional groups, and the group would expand gradually with the increase of the terms they merged. Within each group, the term with the most significant statistical threshold was defined as the leader. The results showed that in seven fig wasp species, Circadian entrainment pathway was the leading group term in the largest functional group of five species and in the third largest functional group of the other two species. In addition, in the other two species, the functional group with Calcium signaling pathway as the leading group term also contained Circadian entrainment pathway (Fig. 3). It can thus be seen that TEs were inserted in the vicinity of a large number of genes related to the Circadian entrainment pathway during 31-35 Mya.