2.3. Sequence Alignment and Phylogenetic Analysis
Nucleotide sequences were edited, assembled and aligned using the Muscle plugin (Edgar, 2004) within the program Codon Code Aligner (v. 5.1.5, Codon Code Corporation), primers sequences used for amplification were excluded from the analysis, COI sequences were translated into amino acid sequences based on the Ascidian mitochondrial code (translation table 13) to further improve sequencing quality and screen for frameshift mutations and stop codons.
Genetic polymorphism analysis was run for each population calculating the number of haplotypes (Nh), haplotype diversity (h) and nucleotide diversity (π) using DnaSP version 5.10.01 (Librado & Rozas, 2009). Sequences of 18S rDNA were phased with the PHASE v2.1.1 algorithm (Stephens & Donnelly, 2003; Stephens, Smith, & Donnelly, 2001) in DnaSP using default parameters. Pair-wise FST among all populations and AMOVA were calculated using ARLEQUIN v. 3.5.2.2 (Excoffier, Laval, & Schneider, 2005). The significance of the variance components and pairwise FST values were assessed by a permutation test with 10,000 replicates. To test Isolation by distance in C. verrucosa populations, Mantel test with 1000 permutations was performed using IBD Macintosh application v. 1.52 (Bohonak, 2002). Scatter plot of geographical distance and genetic distance was performed in R. The genetic distances among populations were expressed as FST pairwise differences. The geographical distances between populations were represented by the shortest coastline distance.
Species delimitation was carried out using the online version of Automatic Barcode Gap Discovery, ABGD (http://wwwabi.snv.jussieu.fr/public/abgd/) using Kimura p-distance. ABGD delimits a “barcode gap” in the distribution of pairwise differences (Puillandre, Lambert, Brouillet, & Achaz, 2012). The haplotype network was created with the Haploviewer application (available at www.cibiv.at/~greg/haploviewer), based on multiple alignments of the sequences and on a neighbour joining tree that was constructed using the software MEGA 7.0.21 (Kumar, Stecher, Li, Knyaz, & Tamura, 2018) using 500 bootstrap replicates and p-distance.
For phylogenetic reconstruction, the best fitting model was determined with the software jModeltest 2.1.9 v20160115, using Phyml v.3.0 with 88 candidate models. The best-fit model for COI was HKY85+G+ I, and for 18S the best-fit model was HKY85+G+I, these substitutions models were applied in Maximum Likelihood (ML) and Bayesian Inference (BI) analysis. ML analysis was run using PhyML 3.0 (Guindon et al., 2010) using 1000 bootstrap replicates for both markers independently. BI analysis was run using sing Markov Chain Monte Carlo (MCMC) simulations in Mr. Bayes 3.2 (Ronquist et al., 2012), sampling every 100 generations, samples of the substitution model parameters were checked whether the likelihoods reaching stationarity, and weather the standard deviation of split frequencies was below 0.05. Mitochondrial COI reached stationarity with a total of 500000 MCMC generations (split=0.04), while 18S with a total of 200000 MCMC generations (split=0.02). The remaining trees sampled were used to infer Bayesian posterior probabilities (BPP) at the nodes and produce the consensus tree.
In order to estimate divergence time since speciation, the BEAST 1.8.0 software package was used to analyse COI sequences (Drummond, Suchard, Xie, & Rambaut, 2012). First, xml files were generated using BEAUti to execute them in BEAST. Data from other marine invertebrates were used as a proxy since due to lack of adequate fossil records, no calibrated mutation rates for ascidians for COI exist in the bibliography. Nydam and Harrison (2011) estimated from data based on other marine invertebrate taxa (crabs, shrimp, urchins), a mutation rate range of 0.016 to 0.026 substitutions per site / million years. Two independent analyses were run, a first one using strict clock model with a substitution rate of 0.016 substitutions per site / million years (107 generations), and a second one at 0.026 substitutions per site / million years (107generations), for both a burn-in of 20 % was applied. The tree prior was set to Yule speciation. The GTR+G substitution model was used. The xml files were then executed in BEAST. Results were analysed using Tracer v1.6.0 to check the convergence to a stationary distribution of parameters.