Statistical analyses
Non–metric multidimensional scaling (nMDS) using the Jaccard coefficient was conducted on both morphological and molecular diatom data sets (here, we ran separate analyses based on species– and genus–level resolution data). We used the R package vegan (Oksanen et al., 2019) to compute the Jaccard dissimilarity matrix and run the nMDS routines. Since the Jaccard index is sensitive to sample size, and since metabarcoding data often produce samples of heterogeneous sizes (Ohlmann et al., 2018), we partitioned the Jaccard dissimilarity matrix into its true turnover component (Baselga, 2010) with the package betapart (Baselga et al., 2018). For the sake of comparison, we also ran these and the following analyses using the spatial turnover component of β-diversity.
In order to explore the overall degree of correspondence between the ordinal results obtained for morphological and molecular data sets, Procrustean rotation analysis and the subsequent PROcrustean randomization TEST (PROTEST) were applied (Gower, 1971, Jackson, 1995). PROTEST generates from a permutation–based procedure a correlation–like statistic referred to as “correlation in a symmetric Procrustean rotation” or \(m_{1-2}\) and a \(p\) value that measures the significance of the concordance established by the Procrustean superimposition approach (Jackson, 1995). In addition, we related the direction of the movement between the base and the end of the procrustean arrow and the length of these arrows to the distribution of the morphology– and molecular–based diatom assemblages to identify the taxa that contributed most to the dissimilarity between ordinations (García-Girón, Fernández-Aláez, Fernández-Aláez, & Luis, 2018a; García-Girón, Fernández-Aláez, & Fernández-Aláez, 2018b).
Each of the resulting dissimilarity matrices (i.e. based on species– and genus–level resolution data separately, and considering the Jaccard coefficient and its true turnover component) was used in a number of statistical tests. First, we used the BIOENV analysis (Clarke and Ainsworth, 1993) to produce “the best” environmental distance matrix and identify associations between environmental features and community composition. In brief, this method is based on standardized environmental variables and tests all combinations of environmental features, providing information as to which combination shows the strongest correlation between compositional variation and environmental distance matrix. Next, we ran a number of Mantel tests and partial Mantel tests (Mantel, 1967; Legendre and Legendre, 2012) to examine the relationships between community dissimilarities and environmental or spatial distances, i.e. we tested for distance decay in community similarity along environmental or spatial gradients. The matrices of spatial distances contained pairwise geographical distances calculated from latitude and longitude, whereas “the best” subset of standardized environmental variables from the BIOENV were used to compute the environmental matrices based on the Euclidean distances between study sites. Mantel tests and BIOENV analysis were based on Spearman’s rank correlation coefficient and 999 permutations for obtaining values.
We used Mantel correlograms (Oden and Sokal, 1986) to test if pairwise dissimilarities were spatially autocorrelated within each distance class. The distance classes were determined by Sturge’s rule (Legendre and Legendre, 2012) and values were based on 199 permutation with Holm correction for multiple testing (Holm, 1979).
Finally, we ran distance–based redundancy analysis (db–RDA) on each community dissimilarity matrix to examine community–environment relationships in more detail (Legendre and Anderson, 1999). This method builds on redundancy analysis, but can be based on any dissimilarity or distance matrix (Legendre and Legendre, 2012). To retain comparability among BIOENV, Mantel tests and db–RDA, we used the environmental variables selected by the BIOENV routine as the predictor variables in the db–RDAs. Using db–RDA, we tested for the amount of variation explained (\(R^{2}\)), overall significance of the ordination solutions, and the marginal significance of each environmental variable included in the model.
All analyses were performed in R version 3.6.0 (R Development Core Team 2018).