this is for holding javascript data
Chuck Merge branch 'master' into compile
over 9 years ago
Commit id: c62aae812fcb41fc6b2941d4bb3bed4e1cab0e1f
deletions | additions
diff --git a/Material and Methods.tex b/Material and Methods.tex
deleted file mode 100644
index d63aa15..0000000
--- a/Material and Methods.tex
+++ /dev/null
...
\section{Materials and Methods}
\subsubsection{Experimental Design}
We placed test tube racks in one smaller (185L, control) and 3 larger (370L)
flow-through mesocosms. Each mesocosm had an adjustable flow rate that resulted
in a residence time of approximately 12h. Irregular variation in inflow rate
meant that flow rate varied around that target throughout the day, however,
regular monitoring ensured that the entire volume of each system was flushed
approximately two times per day. To provide a surface for biofilm formation we
attached coverslips to glass slides using nail polish and then attached each
slide to the test tube racks using office-style binder clips. Twice daily 10 ml
of 37 mM KPO$_{4}$ and 1, 5 and 50 ml of 3.7M glucose were added to each of 3
mesocosms to achieve target C:P resource amendments of 10, 100 and 500
respectively. The control mesocosm did not receive any C or P amendments.
\subsubsection{DOC and Chlorophyll Measurements}
To assess the efficacy of the C additions we sampled each mesocosm twice
daily during the first week of the experiment to evaluate dissolved organic
C (DOC) content. After the initiation of the experiment we collected
plankton on filters regularly to evaluate planktonic Chl \textit{a} and
bacterial abundance. Once it was clear that pool size of each community had
been altered (day 8) we filtered plankton onto 0.2 $\mu$m filters and harvested
coverslips to assess bacterial and algal biofilm community composition (16S and 23S
rDNA). In addition all mesoscosms were analyzed for community composition a
second time (day 17) to assess how community composition of both the plankton
and biofilm communities had been altered over time. Control samples were only
analyzed for community composition on day 17.
Samples for dissolved organic C (DOC) analysis were collected in acid
washed 50 mL falcon tubes after filtration through a 0.2 polycarbonate membrane
filter (Millipore GTTP GTTP02500, Sigma Aldrich P9199) attached to a 60 mL
syringe. Syringes and filters were first flushed multiple times with the
control sample to prevent leaching of C from the syringe or the filter
into the sample. Samples were then frozen and analyzed for organic C
content with a Shimadzu 500 TOC analyzer \citep{Wetzel_2000}. Biomass of all
biofilm samples were measured by difference in pre-(without biofilm) and
post-(with biofilm) weighed GF/F filters after oven drying overnight at 60C.
For Chl \textit{a} analysis we collected plankton on GF/F filters (Whatman,
Sigma Aldrich Cat. \# Z242489) by filtering between 500 mL and 1L from the
water column of each mesocosm for each treatment. For biofilm samples, all
biofilm was gently removed from the complete area of each coverslip (3
coverslips for each treatment per sampling event) before being placed in a test
tube for extraction with 90-95\% acetone for ~ 32 hours at -20C and analyzed
immediately after using a Turner 10-AU fluorometer \citep{Wetzel_2000}.
We analyzed bacterial abundance of the planktonic samples using Dapi staining
and direct visualization on a Zeis Axio epifluorescence microscope after the
methods of Porter and Feig (1980). Briefly, 1-3 mL of water was filtered from
three separate water column samples through a 0.2 $\mu$m black polycarbonate
membrane filter and post stained with a combination of Dapi and Citifluor
mountant media (Ted Pella Redding, Ca) to a final concentration of 1$\mu$L
mL-1.
\subsubsection{DNA extraction} For plankton, cells were collected by filtering
between 20 {\textendash} 30 mL of water onto a 0.2 $\mu$m pore-size
polycarbonate filter (Whatman Nucleopore 28417598, Sigma-Aldrich cat\#
WHA110656). For biofilm communities, biomass from the entire coverslip area of
three separate slides was collected and combined in an eppendorf tube by gentle
scrapping the slip surface with an ethanol rinsed and flamed razor blade. DNA
from both the filter and the biofilm was extracted using a Mobio Power Soil DNA
isolation kit (MoBio Cat. \# 12888).
\subsubsection{PCR}
Samples were amplified for pyrosequencing using a forward and reverse fusion
primer. The forward primer was constructed with (5{'}-3{'}) the Roche A
linker, an 8-10bp barcode, and the forward gene specific primer sequence. The
reverse fusion primer was constructed with (5{'}-3{'}) a biotin molecule, the
Roche B linker and the reverse gene specific primer sequence. The gene specific
primer pair for bacterial SSU rRNA genes was 27F/519R \citep{LANED.J.:1991}.
The primer pair p23SrV\_f1/p23SrV\_r1 was used to target 23S rRNA genes on
plastid genomes \citep{Sherwood_2007}. Amplifications were performed in 25 ul
reactions with Qiagen HotStar Taq master mix (Qiagen Inc, Valencia,
California), 1ul of each 5uM primer, and 1ul of template. Reactions were
performed on ABI Veriti thermocyclers (Applied Biosytems, Carlsbad, California)
under the following thermal profile: 95$^{\circ}$C for 5 min, then 35 cycles of
94$^{\circ}$C for 30 sec, 54$^{\circ}$C for 40 sec, 72$^{\circ}$C for 1 min,
followed by one cycle of 72$^{\circ}$C for 10 min and 4$^{\circ}$C hold.
Amplification products were visualized with eGels (Life Technologies, Grand
Island, New York). Products were then pooled equimolar and each pool was
cleaned with Diffinity RapidTip (Diffinity Genomics, West Henrietta, New York),
and size selected using Agencourt AMPure XP (BeckmanCoulter, Indianapolis,
Indiana) following Roche 454 protocols (454 Life Sciences, Branford,
Connecticut). Size selected pools were then quantified and 150 ng of DNA were
hybridized to Dynabeads M-270 (Life Technologies) to create single stranded DNA
following Roche 454 protocols (454 Life Sciences). Single stranded DNA was
diluted and used in emPCR reactions, which were performed and subsequently
enriched. Sequencing followed established manufacture protocols (454 Life
Sciences).
\subsection{Sequence Quality Control and Analysis}
\subsubsection{Quality Control}
The 16S/23S sequence collections were demultiplexed and sequences with sample
barcodes not matching expected barcodes were discarded. We used the maximum
expected error metric \citep{23955772} calculated from sequence quality scores
to cull poor quality sequences from the dataset. Specifically, we discarded any
sequence with a maximum expected error count greater than 1 after truncating to
175 nt. The forward primer and barcode was trimmed from the remaining reads. We
checked that all primer trimmed, error screened and truncated sequences were
derived from the same region of the LSU or SSU rRNA gene (23S and 16S
sequences, respectively) by aligning the reads to Silva LSU or SSU rRNA gene
alignment ({\textquotedblleft}Ref{\textquotedblright} collection, release 115)
with the Mothur \citep{19801464} NAST-algorithm \citep{16845035} aligner and
inspecting the alignment coordinates. Reads falling outside the expected
alignment coordinates were culled from the dataset. Remaining reads were
trimmed to consistent alignment coordinates such that all reads began and ended
at the same position in the SSU rRNA gene and screened for chimeras with UChime
in {\textquotedblleft}denovo{\textquotedblright} mode \citep{21700674} via the
Mothur UChime wrapper.
\subsubsection{Taxonomic annotations} Sequences were taxonomically classified
using the UClust \citep{20709691} based classifier in the QIIME package
\citep{20383131} with the Greengenes database and taxonomic nomenclature
(version "gg\_13\_5" provided by QIIME developers, 97\% OTU
representative sequences and corresponding taxonomic annotations,
\citep{22134646}) for 16S reads or the Silva LSU database (Ref
set, version 115, EMBL taxonomic annotations, \citep{23193283}) for the 23S
reads as reference. We used the default parameters for the algorithm (i.e.
minimum consensus of 51\% at any rank, minimum sequence identity for hits at
90\% and the maximum accepted hits value was set to 3).
\subsubsection{Clustering}
Reads were clustered into OTUs following the UParse pipeline. Specifically
USearch (version 7.0.1001) was used to establish cluster centroids at a 97\%
sequence identity level from the quality controlled data and map quality
controlled reads to the centroids. The initial centroid establishment algorithm
incorporates a quality control step wherein potentially chimeric reads are not
allowed to become cluster seeds. Additionally, we discarded singleton reads
because it is difficult to asses the quality of singleton reads and this
quality control parameter in addition to maximum expected error screening has
proven to be similarly useful if not superior for reducing 454 sequencing error
as {\textquotedblleft}denoising{\textquotedblright} \citep{23955772}. Moreover,
two popular {\textquotedblleft}denoising{\textquotedblright} algorithms have
been shown to add sequencing errors while correcting others sometimes in a
nearly equal ratio \citep{22543370}. Eighty-eight and 98\% of quality
controlled reads could be mapped back to our cluster seeds at a 97\% identity
cutoff for the 16S and 23S sequences, respectively.
\subsubsection{Alpha and Beta diversity analyses}
Alpha diversity calculations were made using PyCogent Python bioinformatics
modules \citep{17708774}. Beta diversity analyses were made using Phyloseq
\citep{24699258} and its dependencies \citep{vegan}. A sparsity threshold of
25\% was used for ordination of both plastid 23S and bacterial 16S libraries.
Additionally, we discarded any OTUs from the 23S data that could not be
annotated as belonging in the Eukaryota. All DNA sequence based results were
visualized using GGPlot2 \citep{Wickham_2009}. Adonis tests and principal
coordinate ordinations were performed using the Bray-Curtis similarity measure
for pairwise library comparisons. Adonis tests employed the default value for
number of permutations (999) ("adonis" function in Vegan R package,
\citet{vegan}). Principal coordinates of OTUs were found by averaging site
principal coordinate values for each OTU with OTU relative abundance values
(within sites) as weights. The principal coordinate OTU weighted averages were
then expanded to match the site-wise variances \citep{vegan}.
\subsubsection{Identifying Enriched OTUs}
We used an RNA-Seq differential expression statistical framework to find OTUs
enriched in the given sample classes (R package DESeq2 developed by
\citet{Love_2014}) (for review of RNA-Seq differential expression statistics
applied to microbiome OTU count data see \citet{24699258}). We use the term
{\textquotedblleft}differential abundance{\textquotedblright} coined by
\citet{24699258} to denote OTUs that have different proportion means across
sample classes. We are particularly interested in two sample classes: 1)
lifestyle (biofilm or planktonic) and, 2) high C (C:P = 500) versus
not high C (C:P = 10, C:P = 100 and C:P = control). A differentially
abundant OTU would have a proportion mean in one class that is
statistically different from its proportion mean in another. This differential abundance
could mark an enrichment of the OTU in either sample class and the direction of
the enrichment is apparent in the sign (positive or negative) of the metric
used to summarize the proportion mean difference. Here we use log2 of the
proportion mean ratio (means are derived from OTU proportions for all samples
in each given class) as our differential abundance metric. It is also important
to note that the DESeq2 R package we are using to calculate the differential
abundance metric {\textquotedblleft}shrinks{\textquotedblright} the metric in
inverse proportion to the information content for each OTU. In this way the
magnitude of the differential abundance metric will be high only for OTUs which
there is a robust signal of differential abundance. Further, the metric can be
used to effectively rank OTUs by magnitude of the sample class affect (i.e.
OTUs with high proportion mean differences but also high within sample class
proportion variance will not produce misleadingly large differential abundance
metric values). The DESeq2 RNA-Seq statistical framework has been shown to
improve power and specificity when identifying differentially abundant OTUs
across sample classes in microbiome experiments \citet{24699258}.
The specific specific DESeq2 \citep{Love_2014} parameters we used were as follows:
All dispersion estimates from DESeq2 were calculated using a local fit for
mean-dispersion. Native DESeq2 independent filtering was disabled in favor of
explicit sparsity filtering. The sparsity thresholds that produced the maximum
number of OTUs with adjusted p-values for differential abundance below a false
discovery rate of 10\% were selected for biofilm versus planktonic sequence
16S/plastid 23S library comparisons. The specific sparsity threshold for
plastid 23S and 16S libraries for biofilm versus plankton comparisons was 10\%
(OTUs found in less than the sparsity threshold of samples were discarded from
the analysis). Cook's distance filtering was also disabled when calculating
p-values with DESeq2. We used the Benjamini-Hochberg method to adjust p-values
for multiple testing \citep{citeulike:1042553}. Identical DESeq2 methods were
used to assess enriched OTUs from relative abundances grouped into high (C:P =
500) or low (C:P < 500 and control) categories.
IPython Notebooks with computational methods used to create all figures and
tables are provided at the following url:
\url{http://nbviewer.ipython.org/github/chuckpr/BvP_manuscript_figures}.
diff --git a/Materials_and_Methods.tex b/Materials_and_Methods.tex
new file mode 100644
index 0000000..1aa2dc1
--- /dev/null
+++ b/Materials_and_Methods.tex
...
\section{Materials and Methods}
\subsubsection{Experimental Design}
We placed test tube racks in one smaller (185L, control) and 3 larger (370L) flow-through mesocosms. Each mesocosm had an adjustable flow rate that resulted in a residence time of approximately 12h. Irregular variation in inflow rate meant that flow rate varied around that target throughout the day, however, regular monitoring ensured that the entire volume of each system was flushed approximately two times per day. To provide a surface for biofilm formation we attached coverslips to glass slides using nail polish and then attached each slide to the test tube racks using office-style binder clips. Twice daily 10 ml of 37 mM KPO$_{4}$ and 1, 5 and 50 ml of 3.7M glucose were added to each of 3 mesocosms to achieve target C:P resource amendments of 10, 100 and 500 respectively. The control mesocosm did not receive any C or P amendments.
\subsubsection{DOC and Chlorophyll Measurements}
To assess the efficacy of the carbon additions we sampled each mesocosm twice daily during the first week of the experiment to evaluate dissolved organic carbon (DOC) content. After the initiation of the experiment we collected plankton on filters regularly to evaluate planktonic Chl \textit{a} and bacterial abundance. Once it was clear that pool size of each community had been altered (day 8) we filtered plankton onto 0.2 $\mu$m filters and harvested coverslips to assess bacterial and algal community composition (16S and 23S rDNA). In addition all mesoscosms were analyzed for community composition a second time (day 17) to assess how community composition of both the plankton and biofilm communities had been altered over time. Control samples were only analyzed for community composition on day 17.
Samples for dissolved organic carbon (DOC) analysis were collected in acid washed 50 mL falcon tubes after filtration through a 0.2 polycarbonate membrane filter (Millipore GTTP GTTP02500, Sigma Aldrich P9199) attached to a 60 mL syringe. Syringes and filters were first flushed multiple times with the control sample to prevent leaching of carbon from the syringe or the filter into the sample. Samples were then frozen and analyzed for organic carbon content with a Shimadzu 500 TOC analyzer \cite{Wetzel_2000}. Biomass of all biofilm samples were measured by difference in pre-(without biofilm) and post-(with biofilm) weighed GF/F filters after oven drying overnight at 60C.
For Chl \textit{a} analysis we collected plankton on GF/F filters (Whatman, Sigma Aldrich Cat. # Z242489) by filtering between 500 mL and 1L from the water column of each mesocosm for each treatment. For biofilm samples, all biofilm was gently removed from the complete area of each coverslip (3 coverslips for each treatment per sampling event) before being placed in a test tube for extraction with 90-95\% acetone for ~ 32 hours at -20C and analyzed immediately after using a Turner 10-AU fluorometer \cite{Wetzel_2000}.
We analyzed bacterial abundance of the planktonic samples using Dapi staining and direct visualization on a Zeis Axio epifluorescence microscope after the methods of Porter and Feig (1980). Briefly, 1-3 mL of water was filtered from three separate water column samples through a 0.2 $\mu$m black polycarbonate membrane filter and post stained with a combination of Dapi and Citifluor mountant media (Ted Pella Redding, Ca) to a final concentration of 1$\mu$ ml-1.
\textit{DNA extraction} For plankton, cells were collected by filtering between 20 – 30 mL of water onto a 0.2 $\mu$m pore-size polycarbonate filter (Whatman Nucleopore 28417598, Sigma-Aldrich cat# WHA110656). For biofilm communities, biomass from the entire coverslip area of three separate slides was collected and combined in an eppendorf tube by gentle scrapping the slip surface with an ethanol rinsed and flamed razor blade. DNA from both the filter and the biofilm was extracted using a Mobio Power Soil DNA isolation kit (MoBio Cat. # 12888).
\subsubsection{PCR}
Samples were amplified for pyrosequencing using a forward and reverse fusion primer. The forward primer was constructed with (5’-3’) the Roche A linker, an 8-10bp barcode, and the forward gene specific primer sequence. The reverse fusion primer was constructed with (5’-3’) a biotin molecule, the Roche B linker and the reverse gene specific primer sequence. The gene specific primer pair for bacterial SSU rRNA genes was 27F/519R \cite{LANED.J.:1991}. The primer pair p23SrV\_f1/p23SrV\_r1 was used to target 23S rRNA genes on plastid genomes \cite{Sherwood_2007}. Amplifications were performed in 25 ul reactions with Qiagen HotStar Taq master mix (Qiagen Inc, Valencia, California), 1ul of each 5uM primer, and 1ul of template. Reactions were performed on ABI Veriti thermocyclers (Applied Biosytems, Carlsbad, California) under the following thermal profile: 95$^{\circ}$C for 5 min, then 35 cycles of 94$^{\circ}$C for 30 sec, 54$^{\circ}$C for 40 sec, 72$^{\circ}$C for 1 min, followed by one cycle of 72$^{\circ}$C for 10 min and 4$^{\circ}$C hold.
Amplification products were visualized with eGels (Life Technologies, Grand Island, New York). Products were then pooled equimolar and each pool was cleaned with Diffinity RapidTip (Diffinity Genomics, West Henrietta, New York), and size selected using Agencourt AMPure XP (BeckmanCoulter, Indianapolis, Indiana) following Roche 454 protocols (454 Life Sciences, Branford, Connecticut). Size selected pools were then quantified and 150 ng of DNA were hybridized to Dynabeads M-270 (Life Technologies) to create single stranded DNA following Roche 454 protocols (454 Life Sciences). Single stranded DNA was diluted and used in emPCR reactions, which were performed and subsequently enriched. Sequencing followed established manufacture protocols (454 Life Sciences).
diff --git a/Sequence Quality Control and Analysis.tex b/Sequence Quality Control and Analysis.tex
deleted file mode 100644
index 22a24c7..0000000
--- a/Sequence Quality Control and Analysis.tex
+++ /dev/null
...
\subsection{Sequence Quality Control and Analysis}
\subsubsection{Quality Control}
The 16S/23S sequence collections were demultiplexed and sequences with sample
barcodes not matching expected barcodes were discarded. We used the maximum
expected error metric \citep{23955772} calculated from sequence quality scores
to cull poor quality sequences from the dataset. Specifically, we discarded any
sequence with a maximum expected error count greater than 1 after truncating to
175 nt. The forward primer and barcode was trimmed from the remaining reads. We
checked that all primer trimmed, error screened and truncated sequences were
derived from the same region of the LSU or SSU rRNA gene (23S and 16S
sequences, respectively) by aligning the reads to Silva LSU or SSU rRNA gene
alignment ({\textquotedblleft}Ref{\textquotedblright} collection, release 115)
with the Mothur \citep{19801464} NAST-algorithm \citep{16845035} aligner and
inspecting the alignment coordinates. Reads falling outside the expected
alignment coordinates were culled from the dataset. Remaining reads were
trimmed to consistent alignment coordinates such that all reads began and ended
at the same position in the SSU rRNA gene and screened for chimeras with UChime
in {\textquotedblleft}denovo{\textquotedblright} mode \citep{21700674} via the
Mothur UChime wrapper.
\subsubsection{Taxonomic annotations} Sequences were taxonomically classified
using the UClust \citep{20709691} based classifier in the QIIME package
\citep{20383131} with the Greengenes database and taxonomic nomenclature
(version "gg\_13\_5" provided by QIIME developers, 97\% OTU
representative sequences and corresponding taxonomic annotations,
\citep{22134646}) for 16S reads or the Silva LSU database (Ref
set, version 115, EMBL taxonomic annotations, \citep{23193283}) for the 23S
reads as reference. We used the default parameters for the algorithm (i.e.
minimum consensus of 51\% at any rank, minimum sequence identity for hits at
90\% and the maximum accepted hits value was set to 3).
\subsubsection{Clustering}
Reads were clustered into OTUs following the UParse pipeline. Specifically
USearch (version 7.0.1001) was used to establish cluster centroids at a 97\%
sequence identity level from the quality controlled data and map quality
controlled reads to the centroids. The initial centroid establishment algorithm
incorporates a quality control step wherein potentially chimeric reads are not
allowed to become cluster seeds. Additionally, we discarded singleton reads
because it is difficult to asses the quality of singleton reads and this
quality control parameter in addition to maximum expected error screening has
proven to be similarly useful if not superior for reducing 454 sequencing error
as {\textquotedblleft}denoising{\textquotedblright} \citep{23955772}. Moreover,
two popular {\textquotedblleft}denoising{\textquotedblright} algorithms have
been shown to add sequencing errors while correcting others sometimes in a
nearly equal ratio \citep{22543370}. Eighty-eight and 98\% of quality
controlled reads could be mapped back to our cluster seeds at a 97\% identity
cutoff for the 16S and 23S sequences, respectively.
\subsubsection{Alpha and Beta diversity analyses}
Alpha diversity calculations were made using PyCogent Python bioinformatics
modules \citep{17708774}. Beta diversity analyses were made using Phyloseq
\citep{24699258} and its dependencies \citep{vegan}. A sparsity threshold of
25\% was used for ordination of both plastid 23S and bacterial 16S libraries.
Additionally, we discarded any OTUs from the 23S data that could not be
annotated as belonging in the Eukaryota. All DNA sequence based results were
visualized using GGPlot2 \citep{Wickham_2009}. Adonis tests and principal
coordinate ordinations were performed using the Bray-Curtis similarity measure
for pairwise library comparisons. Adonis tests employed the default value for
number of permutations (999) ("adonis" function in Vegan R package,
\citet{vegan}). Principal coordinates of OTUs were found by averaging site
principal coordinate values for each OTU with OTU relative abundance values
(within sites) as weights. The principal coordinate OTU weighted averages were
then expanded to match the site-wise variances \citep{vegan}.
\subsubsection{Identifying Enriched OTUs}
We used an RNA-Seq differential expression statistical framework to find OTUs
enriched in the given sample classes (R package DESeq2 developed by
\citet{Love_2014}) (for review of RNA-Seq differential expression statistics
applied to microbiome OTU count data see \citet{24699258}). We use the term
{\textquotedblleft}differential abundance{\textquotedblright} coined by
\citet{24699258} to denote OTUs that have different proportion means across
sample classes. We are particularly interested in two sample classes: 1)
lifestyle (biofilm or planktonic) and, 2) high C (C:P = 500) versus
not high C (C:P = 10, C:P = 100 and C:P = control). A differentially
abundant OTU would have a proportion mean in one class that is
statistically different from its proportion mean in another. This differential abundance
could mark an enrichment of the OTU in either sample class and the direction of
the enrichment is apparent in the sign (positive or negative) of the metric
used to summarize the proportion mean difference. Here we use log2 of the
proportion mean ratio (means are derived from OTU proportions for all samples
in each given class) as our differential abundance metric. It is also important
to note that the DESeq2 R package we are using to calculate the differential
abundance metric {\textquotedblleft}shrinks{\textquotedblright} the metric in
inverse proportion to the information content for each OTU. In this way the
magnitude of the differential abundance metric will be high only for OTUs which
there is a robust signal of differential abundance. Further, the metric can be
used to effectively rank OTUs by magnitude of the sample class affect (i.e.
OTUs with high proportion mean differences but also high within sample class
proportion variance will not produce misleadingly large differential abundance
metric values). The DESeq2 RNA-Seq statistical framework has been shown to
improve power and specificity when identifying differentially abundant OTUs
across sample classes in microbiome experiments \citet{24699258}.
The specific specific DESeq2 \citep{Love_2014} parameters we used were as follows:
All dispersion estimates from DESeq2 were calculated using a local fit for
mean-dispersion. Native DESeq2 independent filtering was disabled in favor of
explicit sparsity filtering. The sparsity thresholds that produced the maximum
number of OTUs with adjusted p-values for differential abundance below a false
discovery rate of 10\% were selected for biofilm versus planktonic sequence
16S/plastid 23S library comparisons. The specific sparsity threshold for
plastid 23S and 16S libraries for biofilm versus plankton comparisons was 10\%
(OTUs found in less than the sparsity threshold of samples were discarded from
the analysis). Cook's distance filtering was also disabled when calculating
p-values with DESeq2. We used the Benjamini-Hochberg method to adjust p-values
for multiple testing \citep{citeulike:1042553}. Identical DESeq2 methods were
used to assess enriched OTUs from relative abundances grouped into high (C:P =
500) or low (C:P < 500 and control) categories.
IPython Notebooks with computational methods used to create all figures and
tables are provided at the following url:
\url{http://nbviewer.ipython.org/github/chuckpr/BvP_manuscript_figures}.
diff --git a/Sequence_Quality_Control_and_Analysis.tex b/Sequence_Quality_Control_and_Analysis.tex
new file mode 100644
index 0000000..d8fac2a
--- /dev/null
+++ b/Sequence_Quality_Control_and_Analysis.tex
...
\subsection{Sequence Quality Control and Analysis}
\subsubsection{Quality Control}
The 16S sequence collection was demultiplexed and sequences with sample barcodes not matching expected barcodes were discarded. We used the maximum expected error metric \cite{23955772} calculated from sequence quality scores to cull poor quality sequences from the dataset. Specifically, we discarded any sequence with a maximum expected error count greater than 1 after truncating to 175 nt. The forward primer and barcode was trimmed from the remaining reads. We checked that all primer trimmed, error screened and truncated sequences were derived from the same region of the LSU or SSU rRNA gene (23S and 16S sequences, respectively) by aligning the reads to Silva LSU or SSU rRNA gene alignment (“Ref” collection, release 115) with the Mothur \cite{19801464} NAST-algorithm \cite{16845035} aligner and inspecting the alignment coordinates. Reads falling outside the expected alignment coordinates were culled from the dataset. Remaining reads were trimmed to consistent alignment coordinates such that all reads began and ended at the same position in the SSU rRNA gene and screened for chimeras with UChime in “denovo” mode \cite{21700674} via the Mothur UChime wrapper.
\subsubsection{Taxonomic annotations}
Sequences were taxonomically classified using the UClust \cite{20709691} based classifier in the QIIME package \cite{20383131} with the Greengenes database and taxonomic nomenclature (version XXXXX, 97\% OTU representative sequences and corresponding taxonomic annotations, \cite{22134646}) for 16S reads or the Silva LSU database (Ref set, version 115, EMBL taxonomic annotations, \cite{23193283}) for the 23S reads as reference. We used the default parameters for the algorithm (i.e. minimum consensus of 51\% at any rank, minimum sequence identity for hits at 90\% and the maximum accepted hits value was set to 3).
\subsubsection{Clustering}
Reads were clustered into OTUs following the UParse pipeline. Specifically USearch (version 7.0.1001) was used to establish cluster centroids at a 97\% sequence identity level from the quality controlled data and map quality controlled reads to the centroids. The initial centroid establishment algorithm incorporates a quality control step wherein potentially chimeric reads are not allowed to become cluster seeds. Additionally, we discarded singleton reads because it is difficult to asses the quality of singleton reads and this quality control parameter in addition to maximum expected error screening has proven to be similarly useful if not superior for reducing 454 sequencing error as “denoising” \cite{23955772}. Moreover, two popular “denoising” algorithms have been shown to add sequencing errors while correcting others sometimes in a nearly equal ratio \cite{22543370}. Eighty-eight and 98\% of quality controlled reads could be mapped back to our cluster seeds at a 97\% identity cutoff for the 16S and 23S sequences, respectively.
\subsubsection{Alpha and Beta diversity analyses}
Alpha diversity calculations were made using PyCogent Python bioinformatics modules \cite{17708774}. Beta diversity analyses were made using Phyloseq \cite{24699258} and its dependencies \cite{vegan}. Log$_{2}$ fold change of group mean ratios and corresponding null hypothesis based significance values were calculated using DESeq2 \cite{Love_2014}. All dispersion estimates from DESeq2 were calculated using a local fit for mean-dispersion. Native DESeq2 independent filtering was disabled in favor of explicit sparsity filtering. The sparsity thresholds that produced the maximum number of OTUs with adjusted p-values for differential abundance below a false discovery rate of 10\% were selected for biofilm versus planktonic sequence 16S/plastid 23S library comparisons. The specific sparsity threshold for plastid 23S and 16S libraries for biofilm versus plankton comparisons was 10\% (OTUs found in less than the sparsity threshold of samples were discarded from the analysis). Cook's distance filtering was also disabled when calculating p-values with DESeq2. We used the Benjamini-Hochberg method to adjust p-values for multiple testing \cite{citeulike:1042553}. Identical DESeq2 methods were used to assess enriched OTUs from relative abundances grouped into high (C:P = 500) or low (C:P < 500 and control) categories. A sparsity threshold of 25\% was used for ordination of both plastid 23S and bacterial 16S libraries. Additionally, we discarded any OTUs from the 23S data that could not be annotated as belonging in the Eukaryota. All DNA sequence based results were visualized using GGPlot2 \cite{Wickham_2009}.
Adonis tests were performed using the Bray-Curtis similarity measure for pairwise library comparisons with the default value for number of permutations (999) ("adonis" function in Vegan R package, \citet{vegan}). Principal coordinates of OTUs were found by averaging site principal coordinate values for each OTU with OTU relative abundance values (within sites) as weights. The principal coordinate OTU weighted averages were then expanded to match the site-wise variances \cite{vegan}.
\subsubsection{Identifying Enriched OTUs}
We used an RNA-Seq differential expression statistical framework to find OTUs enriched in the given sample classes (R package DESeq2 developed by \citet{Love_2014}) (for review of RNA-Seq differential expression statistics applied to microbiome OTU count data see \citet{24699258}). We use the term “differential abundance” (coined by \citet{24699258}) to denote OTUs that have different proportion means across sample classes. We are particularly interested in two sample classes: 1) environment type (biofilm or planktonic) and, 2) high carbon (C:P = 500) versus not high carbon (C:P = 10, C:P = 100 and C:P = control). A differentially abundant OTU, for instance, would have a proportion mean in one class that is statistically different from its mean in another. This differential abundance could mark an enrichment of the OTU in either sample class and the direction of the enrichment is apparent in the sign (positive or negative) of the metric used to summarize the proportion mean difference. Here we use log2 of the proportion mean ratio (means are derived from OTU proportions for all samples in each given class) as our differential abundance metric. It is also important to note that the DESeq2 R package we are using to calculate the differential abundance metric “shrinks” the metric in inverse proportion to the information content for each OTU. In this way the magnitude of the differential abundance metric will be high only for OTUs which we have strong confidence of true differential abundance and the metric can be used to effectively rank OTUs by magnitude of the sample class affect (i.e. OTUs with high proportion mean differences but also high within sample class proportion variance will not produce misleadingly large differential abundance metric values). The DESeq2 RNA-Seq statistical framework has been shown to improve power and specificity when identifying differentially abundant OTUs across sample classes in microbiome experiments \citet{24699258}.
diff --git a/layout.md b/layout.md
index e5953db..23bf75e 100644
--- a/layout.md
+++ b/layout.md
...
Abstract.tex
Introduction.tex
Material and Methods.tex
Sequence Quality Control and Analysis.tex Materials_and_Methods.tex
Sequence_Quality_Control_and_Analysis.tex
Results.tex
figures/conceptualmodel/biofilmsubsidiesFigs06252013.001.jpg
figures/pool size/biofilmsubsidiesFigs07252013.003.jpg