2.7 Phylogenetic and gene family evolutionary analyses
For each gene in every species, only the longest transcript was retained for ortholog analysis. The single-copy orthologous genes shared by the 11 species described above (including C. fluminea) were aligned using MUSCLE (version 3.8.31) (Edgar, 2004). These alignments were concatenated into a nucleotide super-alignment, from which a reference tree topology was inferred using PhyML (version 3.3) (Guindon et al., 2010). Divergence times among species were estimated with the MCMCTree program of the PAML package (version 4.7a) (Yang, 2007) using the approximate likelihood calculation method. Calibration times were obtained from the TimeTree database (http://www.timetree.org/) (Kumar, Stecher, Suleski, & Hedges, 2017).
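The data preparation before alignment and tree building can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it does not run MUSCLE or PhyML. It assumes orthogroups are supplied as a dictionary mapping a hypothetical orthogroup ID to {species: [gene IDs]}, transcripts as (gene_id, transcript_id, sequence) tuples, and per-orthogroup alignments as {species: aligned sequence}. All function names are illustrative. The sketch keeps the longest transcript per gene, selects orthogroups with exactly one gene in every species, and concatenates their alignments into a super-alignment.

```python
def longest_transcripts(transcripts):
    """Keep the longest transcript for each gene.

    transcripts: iterable of (gene_id, transcript_id, sequence) tuples (assumed format).
    Returns {gene_id: (transcript_id, sequence)}.
    """
    best = {}
    for gene_id, tx_id, seq in transcripts:
        if gene_id not in best or len(seq) > len(best[gene_id][1]):
            best[gene_id] = (tx_id, seq)
    return best


def single_copy_orthogroups(orthogroups, species):
    """Return IDs of orthogroups with exactly one gene in every listed species.

    orthogroups: {orthogroup_id: {species: [gene_ids]}} (hypothetical structure).
    """
    return [
        og for og, members in orthogroups.items()
        if all(len(members.get(sp, [])) == 1 for sp in species)
    ]


def concatenate_alignments(alignments, species):
    """Concatenate per-orthogroup alignments into one super-alignment.

    alignments: list of {species: aligned_sequence}, one dict per single-copy
    orthogroup, in a fixed order. Returns {species: concatenated_sequence}.
    """
    parts = {sp: [] for sp in species}
    for aln in alignments:
        # All sequences of one alignment must share the same column count.
        if len({len(aln[sp]) for sp in species}) != 1:
            raise ValueError("sequences within an alignment must have equal length")
        for sp in species:
            parts[sp].append(aln[sp])
    return {sp: "".join(chunks) for sp, chunks in parts.items()}


if __name__ == "__main__":
    tx = [("g1", "t1", "ATG"), ("g1", "t2", "ATGCCC"), ("g2", "t3", "ATGA")]
    print(longest_transcripts(tx))
    alns = [{"A": "AT-G", "B": "ATCG"}, {"A": "CC", "B": "C-"}]
    print(concatenate_alignments(alns, ["A", "B"]))
```

Alignment blocks are joined in the same order for every species, so each column of the super-alignment still corresponds to one homologous site.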
Based on the inferred divergence times and phylogenetic relationships, gene family evolution was analyzed using CAFÉ (version 4.2) (De Bie, Cristianini, Demuth, & Hahn, 2006). Gene family expansions and contractions were identified by comparing family sizes between each species and its inferred ancestor. Genes from the expanded families in C. fluminea were extracted and subjected to GO and KEGG functional enrichment analyses to characterize their functions.
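The two steps in this paragraph can be approximated with the Python sketch below. It is only illustrative and makes two assumptions. First, `classify_changes` compares a species' family sizes with those of its ancestor, taken as already inferred. It does not reproduce CAFÉ's birth-death model or its significance test. Second, `hypergeom_enrichment_p` uses a one-sided hypergeometric test for a single GO term or KEGG pathway, which is a common choice for enrichment analysis. The source does not state which test was used. Function names and input layouts are hypothetical.

```python
from math import comb


def classify_changes(ancestral, extant):
    """Label each gene family as expanded, contracted or unchanged.

    ancestral, extant: {family_id: gene_count} for the ancestral node and the
    species of interest (assumed to be already inferred). A family missing
    from one side is treated as having 0 genes there. This is a plain size
    comparison, not CAFE's likelihood-based test.
    """
    labels = {}
    for fam in sorted(set(ancestral) | set(extant)):
        n_anc, n_ext = ancestral.get(fam, 0), extant.get(fam, 0)
        if n_ext > n_anc:
            labels[fam] = "expanded"
        elif n_ext < n_anc:
            labels[fam] = "contracted"
        else:
            labels[fam] = "unchanged"
    return labels


def hypergeom_enrichment_p(k, n, K, N):
    """One-sided hypergeometric tail probability P(X >= k).

    N: annotated background genes; K: background genes carrying the term;
    n: genes in the test set (e.g. expanded-family genes); k: test genes
    carrying the term.
    """
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


if __name__ == "__main__":
    print(classify_changes({"F1": 2, "F2": 5, "F3": 1}, {"F1": 4, "F2": 3, "F3": 1, "F4": 2}))
    # 2 of 2 test genes carry a term found in 3 of 10 background genes.
    print(hypergeom_enrichment_p(k=2, n=2, K=3, N=10))
```

In a real analysis, p-values for many terms would normally be corrected for multiple testing before a term is called enriched.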