Chuck Merge branch 'master' into compile  over 9 years ago

Commit id: c62aae812fcb41fc6b2941d4bb3bed4e1cab0e1f

deletions | additions      

         

\section{Materials and Methods}  \subsubsection{Experimental Design}  We placed test tube racks in one smaller (185L, control) and 3 larger (370L)  flow-through mesocosms. Each mesocosm had an adjustable flow rate that resulted  in a residence time of approximately 12h. Irregular variation in inflow rate  meant that flow rate varied around that target throughout the day, however,  regular monitoring ensured that the entire volume of each system was flushed  approximately two times per day. To provide a surface for biofilm formation we  attached coverslips to glass slides using nail polish and then attached each  slide to the test tube racks using office-style binder clips. Twice daily 10 ml  of 37 mM KPO$_{4}$ and 1, 5 and 50 ml of 3.7M glucose were added to each of 3  mesocosms to achieve target C:P resource amendments of 10, 100 and 500  respectively. The control mesocosm did not receive any C or P amendments.  \subsubsection{DOC and Chlorophyll Measurements}  To assess the efficacy of the C additions we sampled each mesocosm twice  daily during the first week of the experiment to evaluate dissolved organic  C (DOC) content. After the initiation of the experiment we collected  plankton on filters regularly to evaluate planktonic Chl \textit{a} and  bacterial abundance. Once it was clear that pool size of each community had  been altered (day 8) we filtered plankton onto 0.2 $\mu$m filters and harvested  coverslips to assess bacterial and algal biofilm community composition (16S and 23S  rDNA). In addition all mesoscosms were analyzed for community composition a  second time (day 17) to assess how community composition of both the plankton  and biofilm communities had been altered over time. Control samples were only  analyzed for community composition on day 17.  Samples for dissolved organic C (DOC) analysis were collected in acid  washed 50 mL falcon tubes after filtration through a 0.2 polycarbonate membrane  filter (Millipore GTTP GTTP02500, Sigma Aldrich P9199) attached to a 60 mL  syringe. Syringes and filters were first flushed multiple times with the  control sample to prevent leaching of C from the syringe or the filter  into the sample. Samples were then frozen and analyzed for organic C  content with a Shimadzu 500 TOC analyzer \citep{Wetzel_2000}. Biomass of all  biofilm samples were measured by difference in pre-(without biofilm) and  post-(with biofilm) weighed GF/F filters after oven drying overnight at 60C.  For Chl \textit{a} analysis we collected plankton on GF/F filters (Whatman,  Sigma Aldrich Cat. \# Z242489) by filtering between 500 mL and 1L from the  water column of each mesocosm for each treatment. For biofilm samples, all  biofilm was gently removed from the complete area of each coverslip (3  coverslips for each treatment per sampling event) before being placed in a test  tube for extraction with 90-95\% acetone for ~ 32 hours at -20C and analyzed  immediately after using a Turner 10-AU fluorometer \citep{Wetzel_2000}.  We analyzed bacterial abundance of the planktonic samples using Dapi staining  and direct visualization on a Zeis Axio epifluorescence microscope after the  methods of Porter and Feig (1980). Briefly, 1-3 mL of water was filtered from  three separate water column samples through a 0.2 $\mu$m black polycarbonate  membrane filter and post stained with a combination of Dapi and Citifluor  mountant media (Ted Pella Redding, Ca) to a final concentration of 1$\mu$L  mL-1.   \subsubsection{DNA extraction} For plankton, cells were collected by filtering  between 20 {\textendash} 30 mL of water onto a 0.2 $\mu$m pore-size  polycarbonate filter (Whatman Nucleopore 28417598, Sigma-Aldrich cat\#  WHA110656). For biofilm communities, biomass from the entire coverslip area of  three separate slides was collected and combined in an eppendorf tube by gentle  scrapping the slip surface with an ethanol rinsed and flamed razor blade. DNA  from both the filter and the biofilm was extracted using a Mobio Power Soil DNA  isolation kit (MoBio Cat. \# 12888).   \subsubsection{PCR}  Samples were amplified for pyrosequencing using a forward and reverse fusion  primer. The forward primer was constructed with (5{'}-3{'}) the Roche A  linker, an 8-10bp barcode, and the forward gene specific primer sequence. The  reverse fusion primer was constructed with (5{'}-3{'}) a biotin molecule, the  Roche B linker and the reverse gene specific primer sequence. The gene specific  primer pair for bacterial SSU rRNA genes was 27F/519R \citep{LANED.J.:1991}.  The primer pair p23SrV\_f1/p23SrV\_r1 was used to target 23S rRNA genes on  plastid genomes \citep{Sherwood_2007}. Amplifications were performed in 25 ul  reactions with Qiagen HotStar Taq master mix (Qiagen Inc, Valencia,  California), 1ul of each 5uM primer, and 1ul of template. Reactions were  performed on ABI Veriti thermocyclers (Applied Biosytems, Carlsbad, California)  under the following thermal profile: 95$^{\circ}$C for 5 min, then 35 cycles of  94$^{\circ}$C for 30 sec, 54$^{\circ}$C for 40 sec, 72$^{\circ}$C for 1 min,  followed by one cycle of 72$^{\circ}$C for 10 min and 4$^{\circ}$C hold.  Amplification products were visualized with eGels (Life Technologies, Grand  Island, New York). Products were then pooled equimolar and each pool was  cleaned with Diffinity RapidTip (Diffinity Genomics, West Henrietta, New York),  and size selected using Agencourt AMPure XP (BeckmanCoulter, Indianapolis,  Indiana) following Roche 454 protocols (454 Life Sciences, Branford,  Connecticut). Size selected pools were then quantified and 150 ng of DNA were  hybridized to Dynabeads M-270 (Life Technologies) to create single stranded DNA  following Roche 454 protocols (454 Life Sciences). Single stranded DNA was  diluted and used in emPCR reactions, which were performed and subsequently  enriched. Sequencing followed established manufacture protocols (454 Life  Sciences).   \subsection{Sequence Quality Control and Analysis}  \subsubsection{Quality Control}   The 16S/23S sequence collections were demultiplexed and sequences with sample  barcodes not matching expected barcodes were discarded. We used the maximum  expected error metric \citep{23955772} calculated from sequence quality scores  to cull poor quality sequences from the dataset. Specifically, we discarded any  sequence with a maximum expected error count greater than 1 after truncating to  175 nt. The forward primer and barcode was trimmed from the remaining reads. We  checked that all primer trimmed, error screened and truncated sequences were  derived from the same region of the LSU or SSU rRNA gene (23S and 16S  sequences, respectively) by aligning the reads to Silva LSU or SSU rRNA gene  alignment ({\textquotedblleft}Ref{\textquotedblright} collection, release 115)  with the Mothur \citep{19801464} NAST-algorithm \citep{16845035} aligner and  inspecting the alignment coordinates. Reads falling outside the expected  alignment coordinates were culled from the dataset. Remaining reads were  trimmed to consistent alignment coordinates such that all reads began and ended  at the same position in the SSU rRNA gene and screened for chimeras with UChime  in {\textquotedblleft}denovo{\textquotedblright} mode \citep{21700674} via the  Mothur UChime wrapper.  \subsubsection{Taxonomic annotations} Sequences were taxonomically classified  using the UClust \citep{20709691} based classifier in the QIIME package  \citep{20383131} with the Greengenes database and taxonomic nomenclature  (version "gg\_13\_5" provided by QIIME developers, 97\% OTU  representative sequences and corresponding taxonomic annotations,  \citep{22134646}) for 16S reads or the Silva LSU database (Ref  set, version 115, EMBL taxonomic annotations, \citep{23193283}) for the 23S  reads as reference. We used the default parameters for the algorithm (i.e.  minimum consensus of 51\% at any rank, minimum sequence identity for hits at  90\% and the maximum accepted hits value was set to 3).  \subsubsection{Clustering}  Reads were clustered into OTUs following the UParse pipeline. Specifically  USearch (version 7.0.1001) was used to establish cluster centroids at a 97\%  sequence identity level from the quality controlled data and map quality  controlled reads to the centroids. The initial centroid establishment algorithm  incorporates a quality control step wherein potentially chimeric reads are not  allowed to become cluster seeds. Additionally, we discarded singleton reads  because it is difficult to asses the quality of singleton reads and this  quality control parameter in addition to maximum expected error screening has  proven to be similarly useful if not superior for reducing 454 sequencing error  as {\textquotedblleft}denoising{\textquotedblright} \citep{23955772}. Moreover,  two popular {\textquotedblleft}denoising{\textquotedblright} algorithms have  been shown to add sequencing errors while correcting others sometimes in a  nearly equal ratio \citep{22543370}. Eighty-eight and 98\% of quality  controlled reads could be mapped back to our cluster seeds at a 97\% identity  cutoff for the 16S and 23S sequences, respectively.   \subsubsection{Alpha and Beta diversity analyses}  Alpha diversity calculations were made using PyCogent Python bioinformatics  modules \citep{17708774}. Beta diversity analyses were made using Phyloseq  \citep{24699258} and its dependencies \citep{vegan}. A sparsity threshold of  25\% was used for ordination of both plastid 23S and bacterial 16S libraries.  Additionally, we discarded any OTUs from the 23S data that could not be  annotated as belonging in the Eukaryota. All DNA sequence based results were  visualized using GGPlot2 \citep{Wickham_2009}. Adonis tests and principal  coordinate ordinations were performed using the Bray-Curtis similarity measure  for pairwise library comparisons. Adonis tests employed the default value for  number of permutations (999) ("adonis" function in Vegan R package,  \citet{vegan}). Principal coordinates of OTUs were found by averaging site  principal coordinate values for each OTU with OTU relative abundance values  (within sites) as weights. The principal coordinate OTU weighted averages were  then expanded to match the site-wise variances \citep{vegan}.  \subsubsection{Identifying Enriched OTUs}  We used an RNA-Seq differential expression statistical framework to find OTUs  enriched in the given sample classes (R package DESeq2 developed by  \citet{Love_2014}) (for review of RNA-Seq differential expression statistics  applied to microbiome OTU count data see \citet{24699258}). We use the term  {\textquotedblleft}differential abundance{\textquotedblright} coined by  \citet{24699258} to denote OTUs that have different proportion means across  sample classes. We are particularly interested in two sample classes: 1)  lifestyle (biofilm or planktonic) and, 2) high C (C:P = 500) versus  not high C (C:P = 10, C:P = 100 and C:P = control). A differentially  abundant OTU would have a proportion mean in one class that is  statistically different from its proportion mean in another. This differential abundance  could mark an enrichment of the OTU in either sample class and the direction of  the enrichment is apparent in the sign (positive or negative) of the metric  used to summarize the proportion mean difference. Here we use log2 of the  proportion mean ratio (means are derived from OTU proportions for all samples  in each given class) as our differential abundance metric. It is also important  to note that the DESeq2 R package we are using to calculate the differential  abundance metric {\textquotedblleft}shrinks{\textquotedblright} the metric in  inverse proportion to the information content for each OTU. In this way the  magnitude of the differential abundance metric will be high only for OTUs which  there is a robust signal of differential abundance. Further, the metric can be  used to effectively rank OTUs by magnitude of the sample class affect (i.e.  OTUs with high proportion mean differences but also high within sample class  proportion variance will not produce misleadingly large differential abundance  metric values). The DESeq2 RNA-Seq statistical framework has been shown to  improve power and specificity when identifying differentially abundant OTUs  across sample classes in microbiome experiments \citet{24699258}.  The specific specific DESeq2 \citep{Love_2014} parameters we used were as follows:   All dispersion estimates from DESeq2 were calculated using a local fit for  mean-dispersion. Native DESeq2 independent filtering was disabled in favor of  explicit sparsity filtering. The sparsity thresholds that produced the maximum  number of OTUs with adjusted p-values for differential abundance below a false  discovery rate of 10\% were selected for biofilm versus planktonic sequence  16S/plastid 23S library comparisons. The specific sparsity threshold for  plastid 23S and 16S libraries for biofilm versus plankton comparisons was 10\%  (OTUs found in less than the sparsity threshold of samples were discarded from  the analysis). Cook's distance filtering was also disabled when calculating  p-values with DESeq2. We used the Benjamini-Hochberg method to adjust p-values  for multiple testing \citep{citeulike:1042553}. Identical DESeq2 methods were  used to assess enriched OTUs from relative abundances grouped into high (C:P =  500) or low (C:P < 500 and control) categories.   IPython Notebooks with computational methods used to create all figures and  tables are provided at the following url:  \url{http://nbviewer.ipython.org/github/chuckpr/BvP_manuscript_figures}.           

\section{Materials and Methods}  \subsubsection{Experimental Design}  We placed test tube racks in one smaller (185L, control) and 3 larger (370L) flow-through mesocosms. Each mesocosm had an adjustable flow rate that resulted in a residence time of approximately 12h. Irregular variation in inflow rate meant that flow rate varied around that target throughout the day, however, regular monitoring ensured that the entire volume of each system was flushed approximately two times per day. To provide a surface for biofilm formation we attached coverslips to glass slides using nail polish and then attached each slide to the test tube racks using office-style binder clips. Twice daily 10 ml of 37 mM KPO$_{4}$ and 1, 5 and 50 ml of 3.7M glucose were added to each of 3 mesocosms to achieve target C:P resource amendments of 10, 100 and 500 respectively. The control mesocosm did not receive any C or P amendments.  \subsubsection{DOC and Chlorophyll Measurements}  To assess the efficacy of the carbon additions we sampled each mesocosm twice daily during the first week of the experiment to evaluate dissolved organic carbon (DOC) content. After the initiation of the experiment we collected plankton on filters regularly to evaluate planktonic Chl \textit{a} and bacterial abundance. Once it was clear that pool size of each community had been altered (day 8) we filtered plankton onto 0.2 $\mu$m filters and harvested coverslips to assess bacterial and algal community composition (16S and 23S rDNA). In addition all mesoscosms were analyzed for community composition a second time (day 17) to assess how community composition of both the plankton and biofilm communities had been altered over time. Control samples were only analyzed for community composition on day 17.  Samples for dissolved organic carbon (DOC) analysis were collected in acid washed 50 mL falcon tubes after filtration through a 0.2 polycarbonate membrane filter (Millipore GTTP GTTP02500, Sigma Aldrich P9199) attached to a 60 mL syringe. Syringes and filters were first flushed multiple times with the control sample to prevent leaching of carbon from the syringe or the filter into the sample. Samples were then frozen and analyzed for organic carbon content with a Shimadzu 500 TOC analyzer \cite{Wetzel_2000}. Biomass of all biofilm samples were measured by difference in pre-(without biofilm) and post-(with biofilm) weighed GF/F filters after oven drying overnight at 60C.  For Chl \textit{a} analysis we collected plankton on GF/F filters (Whatman, Sigma Aldrich Cat. # Z242489) by filtering between 500 mL and 1L from the water column of each mesocosm for each treatment. For biofilm samples, all biofilm was gently removed from the complete area of each coverslip (3 coverslips for each treatment per sampling event) before being placed in a test tube for extraction with 90-95\% acetone for ~ 32 hours at -20C and analyzed immediately after using a Turner 10-AU fluorometer \cite{Wetzel_2000}.  We analyzed bacterial abundance of the planktonic samples using Dapi staining and direct visualization on a Zeis Axio epifluorescence microscope after the methods of Porter and Feig (1980). Briefly, 1-3 mL of water was filtered from three separate water column samples through a 0.2 $\mu$m black polycarbonate membrane filter and post stained with a combination of Dapi and Citifluor mountant media (Ted Pella Redding, Ca) to a final concentration of 1$\mu$ ml-1.  \textit{DNA extraction} For plankton, cells were collected by filtering between 20 – 30 mL of water onto a 0.2 $\mu$m pore-size polycarbonate filter (Whatman Nucleopore 28417598, Sigma-Aldrich cat# WHA110656). For biofilm communities, biomass from the entire coverslip area of three separate slides was collected and combined in an eppendorf tube by gentle scrapping the slip surface with an ethanol rinsed and flamed razor blade. DNA from both the filter and the biofilm was extracted using a Mobio Power Soil DNA isolation kit (MoBio Cat. # 12888).   \subsubsection{PCR}  Samples were amplified for pyrosequencing using a forward and reverse fusion primer. The forward primer was constructed with (5’-3’) the Roche A linker, an 8-10bp barcode, and the forward gene specific primer sequence. The reverse fusion primer was constructed with (5’-3’) a biotin molecule, the Roche B linker and the reverse gene specific primer sequence. The gene specific primer pair for bacterial SSU rRNA genes was 27F/519R \cite{LANED.J.:1991}. The primer pair p23SrV\_f1/p23SrV\_r1 was used to target 23S rRNA genes on plastid genomes \cite{Sherwood_2007}. Amplifications were performed in 25 ul reactions with Qiagen HotStar Taq master mix (Qiagen Inc, Valencia, California), 1ul of each 5uM primer, and 1ul of template. Reactions were performed on ABI Veriti thermocyclers (Applied Biosytems, Carlsbad, California) under the following thermal profile: 95$^{\circ}$C for 5 min, then 35 cycles of 94$^{\circ}$C for 30 sec, 54$^{\circ}$C for 40 sec, 72$^{\circ}$C for 1 min, followed by one cycle of 72$^{\circ}$C for 10 min and 4$^{\circ}$C hold.  Amplification products were visualized with eGels (Life Technologies, Grand Island, New York). Products were then pooled equimolar and each pool was cleaned with Diffinity RapidTip (Diffinity Genomics, West Henrietta, New York), and size selected using Agencourt AMPure XP (BeckmanCoulter, Indianapolis, Indiana) following Roche 454 protocols (454 Life Sciences, Branford, Connecticut). Size selected pools were then quantified and 150 ng of DNA were hybridized to Dynabeads M-270 (Life Technologies) to create single stranded DNA following Roche 454 protocols (454 Life Sciences). Single stranded DNA was diluted and used in emPCR reactions, which were performed and subsequently enriched. Sequencing followed established manufacture protocols (454 Life Sciences).           

\subsection{Sequence Quality Control and Analysis}  \subsubsection{Quality Control}   The 16S/23S sequence collections were demultiplexed and sequences with sample  barcodes not matching expected barcodes were discarded. We used the maximum  expected error metric \citep{23955772} calculated from sequence quality scores  to cull poor quality sequences from the dataset. Specifically, we discarded any  sequence with a maximum expected error count greater than 1 after truncating to  175 nt. The forward primer and barcode was trimmed from the remaining reads. We  checked that all primer trimmed, error screened and truncated sequences were  derived from the same region of the LSU or SSU rRNA gene (23S and 16S  sequences, respectively) by aligning the reads to Silva LSU or SSU rRNA gene  alignment ({\textquotedblleft}Ref{\textquotedblright} collection, release 115)  with the Mothur \citep{19801464} NAST-algorithm \citep{16845035} aligner and  inspecting the alignment coordinates. Reads falling outside the expected  alignment coordinates were culled from the dataset. Remaining reads were  trimmed to consistent alignment coordinates such that all reads began and ended  at the same position in the SSU rRNA gene and screened for chimeras with UChime  in {\textquotedblleft}denovo{\textquotedblright} mode \citep{21700674} via the  Mothur UChime wrapper.  \subsubsection{Taxonomic annotations} Sequences were taxonomically classified  using the UClust \citep{20709691} based classifier in the QIIME package  \citep{20383131} with the Greengenes database and taxonomic nomenclature  (version "gg\_13\_5" provided by QIIME developers, 97\% OTU  representative sequences and corresponding taxonomic annotations,  \citep{22134646}) for 16S reads or the Silva LSU database (Ref  set, version 115, EMBL taxonomic annotations, \citep{23193283}) for the 23S  reads as reference. We used the default parameters for the algorithm (i.e.  minimum consensus of 51\% at any rank, minimum sequence identity for hits at  90\% and the maximum accepted hits value was set to 3).  \subsubsection{Clustering}  Reads were clustered into OTUs following the UParse pipeline. Specifically  USearch (version 7.0.1001) was used to establish cluster centroids at a 97\%  sequence identity level from the quality controlled data and map quality  controlled reads to the centroids. The initial centroid establishment algorithm  incorporates a quality control step wherein potentially chimeric reads are not  allowed to become cluster seeds. Additionally, we discarded singleton reads  because it is difficult to asses the quality of singleton reads and this  quality control parameter in addition to maximum expected error screening has  proven to be similarly useful if not superior for reducing 454 sequencing error  as {\textquotedblleft}denoising{\textquotedblright} \citep{23955772}. Moreover,  two popular {\textquotedblleft}denoising{\textquotedblright} algorithms have  been shown to add sequencing errors while correcting others sometimes in a  nearly equal ratio \citep{22543370}. Eighty-eight and 98\% of quality  controlled reads could be mapped back to our cluster seeds at a 97\% identity  cutoff for the 16S and 23S sequences, respectively.   \subsubsection{Alpha and Beta diversity analyses}  Alpha diversity calculations were made using PyCogent Python bioinformatics  modules \citep{17708774}. Beta diversity analyses were made using Phyloseq  \citep{24699258} and its dependencies \citep{vegan}. A sparsity threshold of  25\% was used for ordination of both plastid 23S and bacterial 16S libraries.  Additionally, we discarded any OTUs from the 23S data that could not be  annotated as belonging in the Eukaryota. All DNA sequence based results were  visualized using GGPlot2 \citep{Wickham_2009}. Adonis tests and principal  coordinate ordinations were performed using the Bray-Curtis similarity measure  for pairwise library comparisons. Adonis tests employed the default value for  number of permutations (999) ("adonis" function in Vegan R package,  \citet{vegan}). Principal coordinates of OTUs were found by averaging site  principal coordinate values for each OTU with OTU relative abundance values  (within sites) as weights. The principal coordinate OTU weighted averages were  then expanded to match the site-wise variances \citep{vegan}.  \subsubsection{Identifying Enriched OTUs}  We used an RNA-Seq differential expression statistical framework to find OTUs  enriched in the given sample classes (R package DESeq2 developed by  \citet{Love_2014}) (for review of RNA-Seq differential expression statistics  applied to microbiome OTU count data see \citet{24699258}). We use the term  {\textquotedblleft}differential abundance{\textquotedblright} coined by  \citet{24699258} to denote OTUs that have different proportion means across  sample classes. We are particularly interested in two sample classes: 1)  lifestyle (biofilm or planktonic) and, 2) high C (C:P = 500) versus  not high C (C:P = 10, C:P = 100 and C:P = control). A differentially  abundant OTU would have a proportion mean in one class that is  statistically different from its proportion mean in another. This differential abundance  could mark an enrichment of the OTU in either sample class and the direction of  the enrichment is apparent in the sign (positive or negative) of the metric  used to summarize the proportion mean difference. Here we use log2 of the  proportion mean ratio (means are derived from OTU proportions for all samples  in each given class) as our differential abundance metric. It is also important  to note that the DESeq2 R package we are using to calculate the differential  abundance metric {\textquotedblleft}shrinks{\textquotedblright} the metric in  inverse proportion to the information content for each OTU. In this way the  magnitude of the differential abundance metric will be high only for OTUs which  there is a robust signal of differential abundance. Further, the metric can be  used to effectively rank OTUs by magnitude of the sample class affect (i.e.  OTUs with high proportion mean differences but also high within sample class  proportion variance will not produce misleadingly large differential abundance  metric values). The DESeq2 RNA-Seq statistical framework has been shown to  improve power and specificity when identifying differentially abundant OTUs  across sample classes in microbiome experiments \citet{24699258}.  The specific specific DESeq2 \citep{Love_2014} parameters we used were as follows:   All dispersion estimates from DESeq2 were calculated using a local fit for  mean-dispersion. Native DESeq2 independent filtering was disabled in favor of  explicit sparsity filtering. The sparsity thresholds that produced the maximum  number of OTUs with adjusted p-values for differential abundance below a false  discovery rate of 10\% were selected for biofilm versus planktonic sequence  16S/plastid 23S library comparisons. The specific sparsity threshold for  plastid 23S and 16S libraries for biofilm versus plankton comparisons was 10\%  (OTUs found in less than the sparsity threshold of samples were discarded from  the analysis). Cook's distance filtering was also disabled when calculating  p-values with DESeq2. We used the Benjamini-Hochberg method to adjust p-values  for multiple testing \citep{citeulike:1042553}. Identical DESeq2 methods were  used to assess enriched OTUs from relative abundances grouped into high (C:P =  500) or low (C:P < 500 and control) categories.   IPython Notebooks with computational methods used to create all figures and  tables are provided at the following url:  \url{http://nbviewer.ipython.org/github/chuckpr/BvP_manuscript_figures}.           

\subsection{Sequence Quality Control and Analysis}  \subsubsection{Quality Control}   The 16S sequence collection was demultiplexed and sequences with sample barcodes not matching expected barcodes were discarded. We used the maximum expected error metric \cite{23955772} calculated from sequence quality scores to cull poor quality sequences from the dataset. Specifically, we discarded any sequence with a maximum expected error count greater than 1 after truncating to 175 nt. The forward primer and barcode was trimmed from the remaining reads. We checked that all primer trimmed, error screened and truncated sequences were derived from the same region of the LSU or SSU rRNA gene (23S and 16S sequences, respectively) by aligning the reads to Silva LSU or SSU rRNA gene alignment (“Ref” collection, release 115) with the Mothur \cite{19801464} NAST-algorithm \cite{16845035} aligner and inspecting the alignment coordinates. Reads falling outside the expected alignment coordinates were culled from the dataset. Remaining reads were trimmed to consistent alignment coordinates such that all reads began and ended at the same position in the SSU rRNA gene and screened for chimeras with UChime in “denovo” mode \cite{21700674} via the Mothur UChime wrapper.  \subsubsection{Taxonomic annotations}  Sequences were taxonomically classified using the UClust \cite{20709691} based classifier in the QIIME package \cite{20383131} with the Greengenes database and taxonomic nomenclature (version XXXXX, 97\% OTU representative sequences and corresponding taxonomic annotations, \cite{22134646}) for 16S reads or the Silva LSU database (Ref set, version 115, EMBL taxonomic annotations, \cite{23193283}) for the 23S reads as reference. We used the default parameters for the algorithm (i.e. minimum consensus of 51\% at any rank, minimum sequence identity for hits at 90\% and the maximum accepted hits value was set to 3).  \subsubsection{Clustering}  Reads were clustered into OTUs following the UParse pipeline. Specifically USearch (version 7.0.1001) was used to establish cluster centroids at a 97\% sequence identity level from the quality controlled data and map quality controlled reads to the centroids. The initial centroid establishment algorithm incorporates a quality control step wherein potentially chimeric reads are not allowed to become cluster seeds. Additionally, we discarded singleton reads because it is difficult to asses the quality of singleton reads and this quality control parameter in addition to maximum expected error screening has proven to be similarly useful if not superior for reducing 454 sequencing error as “denoising” \cite{23955772}. Moreover, two popular “denoising” algorithms have been shown to add sequencing errors while correcting others sometimes in a nearly equal ratio \cite{22543370}. Eighty-eight and 98\% of quality controlled reads could be mapped back to our cluster seeds at a 97\% identity cutoff for the 16S and 23S sequences, respectively.   \subsubsection{Alpha and Beta diversity analyses}  Alpha diversity calculations were made using PyCogent Python bioinformatics modules \cite{17708774}. Beta diversity analyses were made using Phyloseq \cite{24699258} and its dependencies \cite{vegan}. Log$_{2}$ fold change of group mean ratios and corresponding null hypothesis based significance values were calculated using DESeq2 \cite{Love_2014}. All dispersion estimates from DESeq2 were calculated using a local fit for mean-dispersion. Native DESeq2 independent filtering was disabled in favor of explicit sparsity filtering. The sparsity thresholds that produced the maximum number of OTUs with adjusted p-values for differential abundance below a false discovery rate of 10\% were selected for biofilm versus planktonic sequence 16S/plastid 23S library comparisons. The specific sparsity threshold for plastid 23S and 16S libraries for biofilm versus plankton comparisons was 10\% (OTUs found in less than the sparsity threshold of samples were discarded from the analysis). Cook's distance filtering was also disabled when calculating p-values with DESeq2. We used the Benjamini-Hochberg method to adjust p-values for multiple testing \cite{citeulike:1042553}. Identical DESeq2 methods were used to assess enriched OTUs from relative abundances grouped into high (C:P = 500) or low (C:P < 500 and control) categories. A sparsity threshold of 25\% was used for ordination of both plastid 23S and bacterial 16S libraries. Additionally, we discarded any OTUs from the 23S data that could not be annotated as belonging in the Eukaryota. All DNA sequence based results were visualized using GGPlot2 \cite{Wickham_2009}.  Adonis tests were performed using the Bray-Curtis similarity measure for pairwise library comparisons with the default value for number of permutations (999) ("adonis" function in Vegan R package, \citet{vegan}). Principal coordinates of OTUs were found by averaging site principal coordinate values for each OTU with OTU relative abundance values (within sites) as weights. The principal coordinate OTU weighted averages were then expanded to match the site-wise variances \cite{vegan}.  \subsubsection{Identifying Enriched OTUs}  We used an RNA-Seq differential expression statistical framework to find OTUs enriched in the given sample classes (R package DESeq2 developed by \citet{Love_2014}) (for review of RNA-Seq differential expression statistics applied to microbiome OTU count data see \citet{24699258}). We use the term “differential abundance” (coined by \citet{24699258}) to denote OTUs that have different proportion means across sample classes. We are particularly interested in two sample classes: 1) environment type (biofilm or planktonic) and, 2) high carbon (C:P = 500) versus not high carbon (C:P = 10, C:P = 100 and C:P = control). A differentially abundant OTU, for instance, would have a proportion mean in one class that is statistically different from its mean in another. This differential abundance could mark an enrichment of the OTU in either sample class and the direction of the enrichment is apparent in the sign (positive or negative) of the metric used to summarize the proportion mean difference. Here we use log2 of the proportion mean ratio (means are derived from OTU proportions for all samples in each given class) as our differential abundance metric. It is also important to note that the DESeq2 R package we are using to calculate the differential abundance metric “shrinks” the metric in inverse proportion to the information content for each OTU. In this way the magnitude of the differential abundance metric will be high only for OTUs which we have strong confidence of true differential abundance and the metric can be used to effectively rank OTUs by magnitude of the sample class affect (i.e. OTUs with high proportion mean differences but also high within sample class proportion variance will not produce misleadingly large differential abundance metric values). The DESeq2 RNA-Seq statistical framework has been shown to improve power and specificity when identifying differentially abundant OTUs across sample classes in microbiome experiments \citet{24699258}.         

Abstract.tex  Introduction.tex  Material and Methods.tex  Sequence Quality Control and Analysis.tex Materials_and_Methods.tex  Sequence_Quality_Control_and_Analysis.tex  Results.tex  figures/conceptualmodel/biofilmsubsidiesFigs06252013.001.jpg  figures/pool size/biofilmsubsidiesFigs07252013.003.jpg