Data analysis
Our general modeling framework included site category as a fixed effect (HP source, LP native, old introduction, and recent introduction) and site as a random effect nested within drainage (Fig. 1). We assigned drainages to reflect paired HP to LP translocations. For example, although the El Cedro is a tributary of the Guanapo River, we assigned the two El Cedro sites their own drainage because the translocation occurred within that river. Turure was assigned to its own drainage (as opposed to its HP source site Guanapo) because these two sites are not physically connected. We used this model structure to test for differences among all univariate response variables (stream characteristics, fish traits, gut content) with linear mixed-effects models (R package lmer, Bates et al. 2015), after transforming response variables to meet assumptions of normality, comparing null and fixed-effect models using a likelihood ratio test.
Before testing statistical differences among gut microbial communities, we performed standard pre-processing steps (Table S1). MiSeq lane did not significantly affect the composition of our control samples (PerMANOVA p<0.1). We assessed statistical differences in community structure across site categories using permutational analysis of variance (PerMANOVA) on the unweighted (presence-absence) and weighted (incorporates abundance) Unifrac distance matrices using phyloseq (McMurdie & Holmes 2013) and vegan (Oksanen et al.2017) in R (R Core Team 2017). Permutations were robust to the order of factors, which included gut position, stream treatment, and site, and were constrained by drainage as a random effect. We also tested the effect of site category on posterior gut microbiomes (all sites), in LP sites alone (LP natives and introductions), and within each drainage, adjusting p-values using Bonferroni correction.
We quantified the variance in community composition explained by environmental variables (stream temperature, which correlated to stream pH, r2=0.42, p<0.001), gut morphology, and gut content using distance-based redundancy analysis (function dbRDA in vegan, Legendre and Anderson 1999). DbRDA regresses predictor variables on a sample dissimilarity matrix and tests their significance for explaining community distances. Our model included drainage as a conditional factor. We partitioned the variance each predictor variable explained using multivariate variance partitioning (function varpart, vegan). Compared to the commonly-used envfit function, dbRDA and varpart partition variance from the raw distance matrix, rather than correlating variables with reduced data, which predisposes predictors to explain a high degree of variance. Gut content analysis (which also included gut length measurements) was performed on a subset of fish distinct from the subset for which gut microbiome was sequenced. Therefore, diet was described at the population level rather than paired to microbiome measurements. To account for this, we used population-level means for all predictor variables. This did not affect conclusions. Since the microbiome from HP source populations was so distinct from other site categories, we repeated all variance partitioning and dbRDA without HP source to examine factors controlling mirobiome in LP environments.
We used the distance matrices to calculate mean distance between a source population and its respective introductions (and the LP native site from the corresponding drainage). Betadispersion, or the compositional variation of the microbiome among a group of individuals, was calculated using betadisper in vegan (R Core Team 2017). In addition to betadispersion, we quantified microbiome richness, Shannon-Weiner diversity (Pielou 1975), and Pielou’s Evenness (Pielou 1975) on rarefied data. We assessed differences using the univariate linear mixed effects models described above.