16S amplicon sequencing of gut microbiomes
All extracted microbiome DNA samples were quantified with a Nanodrop ND-1000 spectrophotometer (range 4.2-474.9 ng/µl). All samples were then sent to the QB3 Vincent J. Coates Genomics Sequencing Laboratory at the University of California, Berkeley for automated library preparation and sequencing of 16S rRNA amplicons using an Illumina Mi-Seq v3 (600 cycle). As part of the QB3 library preparation, the Forward ITS1 (ITS1f) – CTTGGTCATTTAGAGGAAGTAA and Reverse ITS1 (ITS2) – GCTGGGTTCTTCATCGATGC primers (Smith and Peay, 2014) were used for DNA metabarcoding markers for fungi (Smith and Peay, 2014). QB3 also used the following 16S rRNA primers for amplification of prokaryotes (archaea and bacteria): Forward 16S v4 (515Fb) – GTGYCAGCMGCCGCGGTAA, and Reverse 16S v4 (806Rb) – GGACTACNVGGGTWTCTAAT (Caporaso et al., 2011; Apprill et al. 2015).
Bioinformatic Analysis /Quantification and Microbial Ecology Assessment of Samples
All 16S rRNA amplicon sequences were processed through QIIME 2.0 (Bolyen et al. 2018) to identify microbe species and estimate abundances. Sequences from all 78 microbiome preps were imported into QIIME v. 2019.10.0. We determined there were no differences between proximal and distal regions of the gut for the San Salvador Island individuals, therefore we concatenated the Crescent Pond and Osprey Lake samples into one file, in which we had 48 samples which included experimental controls and quality controls from the QB3 facility (Table S2). There was no difference between the means of microbe counts in the foregut and the hindgut (paired t-test, P = 0.29).
We used DADA2 (Callahan et al. 2016) for modeling and correcting Illumina-sequenced amplicon errors, removing chimeras, trimming low quality bases, and merging of forward and reverse reads using the following parameters: –p-trunc-len-f 270 –p-trunc-len-r 210. We used the QIIME alignment mafft software to align sequencesalignment mask to filter non-conserved and highly gapped columns from the aligned 16S sequences (Stackebrandt and Goodfellow, 1991). Next, we used qiime phylogeny midpoint-root to root the phylogeny of our 16S amplicon sequences. Finally, we used qiime diversity alpha-rarefaction on all samples and we set the –p-max-depth to 10,000. We removed samples with 5,000 or less from our analyses.
We compared the beta diversity (qiime emperor plot ) of proximal and distal gut microbiomes of the San Salvador samples with a two-tailed paired t -test and found no significant differences between proximal and distal regions of the gut microbiome (P = 0.29). Therefore, we merged the proximal and distal samples for each individual from San Salvador Island, resulting in 48 samples. We also removed oneCualac tessellatus sample because of low read count (129 reads; Figure S1).
We used the classifier Silva 132 99% 515F/806R (silva-132-99-515-806-nb-classifier) for training in identification of taxa from our samples. Afterwards the following files generated in QIIME were used in R (v. 4.0.0) for further statistical analyses: table.qza, rooted-tree.qza, taxonomy.qza, and sample-metadata.tsv. We used the following R packages for further analyses: phyloseq (McMurdie and Holmes 2013) and ggplot2 (Wickham, 2016) with the following functions:distance, plot_bar , plot_ordination , andplot_richness . Before conducting any analyses, we removed the following taxa from our analyses, uncharacterized and Opisthokonta (eukaryotic sequences mainly due to fish 16S amplicons). We estimated alpha diversity by using the plot_richness function and the Chao1 and Shannon’s diversity indices. For beta diversity, we used theplot_ordination function and non-metric multidimensional scaling (NMDS) based on Bray-Curtis distances among samples. Hierarchical clustering was generated with the distance function along withhclust as part of fastcluster (Müllner, 2013) using the average linkage clustering method. The plot_bar function in the phyloseq package was used to visualize relative abundance of taxa. In our taxa plots we removed abundance counts of less than 400 from our analyses. We used ggplot2 to generate all figures (Wickham, 2016). Lastly, we used the Linear discriminant analysis Effect Size (LEfSe version 1.0; Segata et al. 2011) algorithm to identify microbial taxa that were significantly enriched in each of our specialists (scale-eater and molluscivore) in comparison to all other samples. This analysis was used to determine the features (i.e. organisms, clades, operational taxonomic units) to explain differences in assigned metadata categories. We used the nonparametric factorial Kruskal-Wallis rank-sum test to detect taxa with significant differential abundances between specialist samples and all generalist samples (scale-eaters versus generalists + molluscivores, molluscivores versus generalists + scale-eaters). We then used a Wilcoxon test for all pairwise comparisons between taxa within each significantly enriched class to compare to the class level. From the standard and gut length measurements, we used ANCOVA in R (v. 4.0.0) to test whether there was a significant difference among species based on gut length and standard length.
Lastly, we used generalized linear models (GLMs) in R to test the effects of diet (generalist, scale-eater, molluscivore), the fixed effect of location (Osprey Lake, San Salvador Island; Crescent Pond, San Salvador Island; Lake Cunningham, New Providence Island; Fort Fisher, North Carolina; and San Luis Potosí, Mexico), and their interaction on the response variables of principal coordinates axes 1 and 2.