Results
Analyses
of TE composition in the genomes of 11 fig wasps
In
this study, 11 fig wasps including six pollinators and five
non-pollinators were
chose
to annotate and
analyze
genomic TEs. The basic information was listed in
supplementary
Table S1. The results showed that the numbers of TE families
varied
greatly among species.
Except
for the unknown TE families, there were significantly more TE families
in non-pollinators than that in pollinators. The numbers of TE families
in pollinators varied from 22 to 32, while in non-pollinators from 37 to
48
(Table
S2).
Among
them, 14 TE families
existed
in all fig wasps.
All
non-pollinators had their own unique TE
families.
Crypton-H,
Zator,
L1-Tx1, and
tRNA-Deu-RTE
were specific toSycobiasp.2; Merlin, Novosib, and ERVK were specific to Philotrypesis
tridentata ; Ginger, TcMar-ISRm11, and U were specific toApocrypta bakeri ; hAT-Blackjack, Caulimovirus, and Gypsy-Cigr
were specific toSycophaga
agraensis ; CMC-Chapaev and MIR were specific to Sycophila sp.2.
Not all pollinators had specific TE families. Kolobok, RTE-RTE, and Tad1
were unique to Eupristina koningsbergeri ; Crypton-V was unique toCeratosolen fusciceps ;
RNA-Deu
was unique to Wiebesia pumilae . On the contrary, some
TE
families were absent in all pollinators and some were absent in
partial
species.
For
examples, Crypton-I and
hAT-hATm
existed in all non-pollinators but was absent in all pollinators; while
RTE-X, TcMar, and R1-LOA were absent in Dolichoris vasculosae ,Kradibia gibbosae , andC.
fusciceps , respectively.
In
addition, RTE-BovB family was absent in non-pollinator S.
agraensis .
Furthermore,
the total lengths and the proportions in genomes of TEs were
analyzed.
The results showed the TE lengths in pollinators varied from 6.04 Mb to
38.54 Mb, which occupied 2.2% to 12.1% space in genomes.
In
non-pollinators, the shortest TE length was 24.86 Mb, and the longest
was 174.29 Mb.
TE
contents
ranged
from 12.5% to 30.4% (Table S2).
The
significant difference tests showed
that
the
total lengths and the proportions in genome of TEs in non-pollinators
were significantly higher than in
pollinators
(Mann-Whitney U test, P =
0.017*)
(Fig. 1c).
Although
there was no significant difference in genome size between both groups
(Mann-Whitney U test, P = 0.792),
there
was a significant positive correlation between the total TE
length
and
the genome
size
of each species (Spearman correlation, R2 = 0.645,P =
0.032*)
(Fig. 1b).
To
clarify which TE family types contributed the most to the genome size in
each species, the percentage of each classified TE family in the genomes
was calculated. The results showed that more than 50% of the TE
contents were derived from one to four types of TEs families (Table S2),
indicating that a few types of TE bursts caused a large accumulation of
TEs in the genome. The type of TE family with the highest content of
each species was not the same. For example, the TE family with the
highest content in Platyscapa corneri and E.
koningsbergeri was RTE-X, with a size of 5.0 Mb and 4.4 Mb,
respectively. In D. vasculosae , it was Jockey with the size of
1.9 Mb. It was CMC-EnSpm with the size as high as 24.1 Mb inSycobia sp.2. The most abundant TE family in other fig wasps was
Gypsy, and the sizes varied between 0.8 to 5.6 Mb. The TE families with
the highest content account for 22.16% to 53.35% of the classified
types in each species. Further comparison showed the length of the
highest content TE family varied between 0.7-5.0 Mb in pollinators,
while varied between 2.4-20.1 Mb in non-pollinators. There was a
significant difference between the two groups (Mann-Whitney U test,P = 0.003**). Therefore, a few types of TE bursts were the factor
that caused TE contents in non-pollinators significantly higher than
that in pollinators.
Burst
events detected by predicting the insertion time of TEs
Plotting
the total length of the classified TEs in the genome according to the
time, we obtained the TEs insertion time distribution map of each
species (Fig. 2).
The
burst peaks accumulated by different types of TEs in
a
specific time could be found in each species, and we defined that the
highest peak was the major burst peak, and other peaks were minor. It
was easy to find that the burst patterns of pollinators and
non-pollinators were
inconsistent.
For
instance, for the fig wasps living on the same host fig species,Ficus benjamina ,
one
pollinator E. koningsbergeri and three non-pollinators ofSycobia sp.2, P. tridentata , and Sycophilasp.2,
the
“n”-shaped
peak of pollinator was sharply contrasted to the “m”-shaped peaks of
non-pollinators (Fig.
S1),
which indicated that the peaks in the pollinators were more
concentrated, and in the non-pollinators were more scattered.
Based
on the time axis, the major burst peaks of pollinators were
distributed
in 13-28 Mya, and non-pollinators were in 6-38 Mya.
We further
compared
and analyzed the overlapping patterns of the burst peaks in both groups
of pollinators and non-pollinators.
For
three (Sycobia sp.2, P. tridentata , and Sycophilasp.2) of the five non-pollinators, the major burst peaks were
distributed in 32-33 Mya, while for five of the six pollinators, they
also had the peaks in
32-34
Mya, but only minor peaks.
The
burst peaks in 32-34 Mya were thus defined as the common TE peaks in fig
wasps.
In
the period of 13-28 Mya where the major burst peaks of pollinators were
located, non-pollinators also have minor peaks. Due to the long
time
span, they were not recognized as shared peaks. In addition, a large
number of TEs had been inserted recently (0-2 Mya), but no apparent
phenomenon occurred in
pollinators.
To
investigate the prevalence of the common burst peaks of fig wasps during
the 32-34 Mya period, five widespread species
(Nasonia
vitripennis , Apis mellifera , Drosophila melanogaster ,Acyrthosiphon pisum , and Daphnia pulex ) were selected to
analyze the TE burst
time.
It was found that the major burst peaks of these species were
distributed in 0-3 Mya, and only A. mellifera had a minor peak at
34 Mya (Fig. S2).
This
result suggested that the TE bursts remaining in the genome of these
widespread species were dominated by recent events, while the fig wasps
retained
obvious
traces
of ancient TE bursts.
Gene
functions near the burst TEs
The
periods of the common burst peaks
(32-34
Mya), the major burst peaks of pollinators (13-28 Mya), and the minor
burst peaks of non-pollinators (0-2 Mya) are highly overlapped with the
time-series
(0.0
to 4.0, 12.5 to 16.5, 20.5 to 24.5, and 31.0 to 35.0 Mya) representing
the intervals of the major continental ice sheet growth or decay. In the
consideration that the insertion or deletion of TEs may have a strong
impact on the expression of
nearby
genes,
we analyzed the functions of the genes within 1 Kb near insertion
positions of the TEs in that four time-series. The GO and KEGG function
enrichment analyses of these genes were carried out.
According
to the frequency of the enriched terms in the 11 fig wasps, the top
three terms of GO and KEGG in each period were listed (Table S3). In
KEGG
terms, Circadian entrainment pathway (map 04713) was significantly
enriched by multiple species in all four periods; Calcium signaling
pathway (map 04020) was enriched in three periods of 20.5 to 24.5, 12.5
to 16.5, and 0.0 to 4.0; Glutamatergic synapse (map 04724) was enriched
in the two periods of 31.0 to 35.0 and 0.0 to 4.0 Mya. In GO terms, DNA
integration (GO:0015074) and DNA metabolic process (GO:0006259) were
significantly enriched by multiple species in the two periods of 31.0 to
35.0 and 0.0 to 4.0 Mya; signal transduction (GO:0007165) and regulation
of cellular process (GO:0050794) were enriched in the two periods of
20.5 to 24.5 and 12.5 to 16.5 Mya. These results indicated that the
terms enriched in the four periods were highly similar, with the gene
functions mainly involved in pathways such as
rhythm,
signal transduction, and neural activity.
The
KEGG enrichment network was constructed for the genes within 1 Kb near
the insertion positions of TEs inserted during 31-35 Mya, when the
common peaks of all the fig wasp species appeared. The results showed 27
significant enriched terms in 11 fig wasps (Table S4).
Among
these terms, Circadian entrainment (map 04713) was the most enriched,
being enriched in six species, followed by Neuroactive ligand-receptor
interaction (map 04080) and Glutamatergic synapse (map 04724) that were
enriched in five and four species, respectively.
Based
on the genes shared among terms,
the
kappa statistics was further used to connect and merge terms into
functional
groups,
and the group would expand gradually with the increase of the terms they
merged. Within each group,
the
term with the most significant
statistical
threshold was defined as the leader.
The
results showed that in seven fig wasp species,
Circadian
entrainment pathway was the leading group term
in
the largest functional group of five species and in the third largest
functional group of the other two species.
In
addition, in the other two species, the functional group with Calcium
signaling pathway as the leading group term also contained Circadian
entrainment pathway (Fig. 3). It can thus be seen that TEs were inserted
in the vicinity of a large number of genes related to the Circadian
entrainment pathway during 31-35 Mya.