2.6 Gene family analysis
The protein sequences of 11 dipteran species were selected for
phylogenetic analysis: six species of the family Chironomidae
(C. striatipennis, Chironomus riparius, Chironomus tentans,
Clunio marinus, Polypedilum vanderplanki and Propsilocerus
akamusi), three species of the family Culicidae (Anopheles gambiae,
Anopheles sinensis and Culex quinquefasciatus), one species of the family
Drosophilidae (Drosophila melanogaster) and one species of the family
Muscidae (Musca domestica) (Supplementary Tables 4 and 7). For
further analysis, a script was used to extract the longest transcript
of each gene, and OrthoFinder v2.2.6 was used to identify gene family
clusters (Emms & Kelly, 2015, 2019).
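The longest-transcript filter described above can be illustrated with a minimal Python sketch. It is not the script used in this study; the function name, the record layout (gene ID, transcript ID, protein sequence) and the example identifiers are all hypothetical, and parsing of the annotation files is assumed to have been done beforehand.

```python
def longest_isoform_per_gene(records):
    """Return {gene_id: (transcript_id, sequence)} keeping the longest protein.

    `records` is an iterable of (gene_id, transcript_id, protein_sequence)
    tuples; this layout is an assumption made for illustration only.
    """
    best = {}
    for gene_id, transcript_id, seq in records:
        current = best.get(gene_id)
        # On ties, the first transcript seen is kept so the result is deterministic.
        if current is None or len(seq) > len(current[1]):
            best[gene_id] = (transcript_id, seq)
    return best


# Hypothetical toy input: two isoforms of geneA and one of geneB.
example = [
    ("geneA", "geneA.t1", "MKT"),
    ("geneA", "geneA.t2", "MKTLLV"),
    ("geneB", "geneB.t1", "MSS"),
]
selected = longest_isoform_per_gene(example)
```

Retaining one representative protein per gene keeps alternative isoforms from being clustered as if they were separate genes.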
Single-copy gene families identified by OrthoFinder were aligned with
MAFFT v7.407 (Rozewicki et al., 2019) to infer the phylogeny of the
11 species. The protein alignments were
converted into the corresponding coding sequence (CDS) alignments and
trimmed with Gblocks 0.91b (Castresana, 2000). The trimmed alignments were
concatenated into a supergene, which was used as input to IQ-TREE v1.5.5
to construct the phylogenetic tree
(Nguyen et al., 2015). Divergence times were estimated with MCMCTree in
the PAML package (Yang, 1997), using calibration times obtained from
TimeTree. Based on the results of gene family clustering and
the phylogenetic tree, the expansion and contraction of gene families were
inferred. The significance of each expanded or contracted gene family
was evaluated with CAFE v4.2 (De Bie et al., 2006). The KEGG annotation of
gene families was performed using the same method as gene function
annotation. Homologous gene pairs were identified with BLAST
(Kent, 2002), and collinear regions were detected with MCScanX (Wang et al.,
2012).
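The concatenation of trimmed single-copy alignments into a supergene, mentioned above, can be sketched in Python as follows. This is an illustrative assumption about how such a step may be implemented, not the procedure used in this study; the function name, the dictionary-based alignment representation and the species labels are hypothetical.

```python
def concatenate_alignments(alignments, species):
    """Join per-family alignments into one supergene sequence per species.

    `alignments` is a list of {species: aligned_sequence} dicts, one per
    single-copy gene family; `species` fixes the set of taxa that must be
    present in every family. Both structures are assumed for illustration.
    """
    parts = {sp: [] for sp in species}
    for index, aln in enumerate(alignments):
        missing = [sp for sp in species if sp not in aln]
        if missing:
            raise ValueError(f"alignment {index} lacks species: {missing}")
        # All rows of one alignment must have the same number of columns.
        if len({len(aln[sp]) for sp in species}) != 1:
            raise ValueError(f"alignment {index} has rows of unequal length")
        for sp in species:
            parts[sp].append(aln[sp])
    return {sp: "".join(chunks) for sp, chunks in parts.items()}


# Hypothetical toy input: two gene families, two species.
taxa = ["sp1", "sp2"]
families = [
    {"sp1": "ATG---", "sp2": "ATGAAA"},
    {"sp1": "CCC", "sp2": "CC-"},
]
supergene = concatenate_alignments(families, taxa)
```

Checking that every family contains all species and that rows are of equal length keeps the columns of the concatenated matrix homologous across taxa.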