Chuck Pepe-Ranney edited Sequence Quality Control and Analysis.tex  almost 10 years ago

Commit id: d6c11f5485d197d2ea1b0b844022c8ab7780016b


Reads were clustered into OTUs following the UParse pipeline. Specifically, USearch (version 7.0.1001) was used to establish cluster centroids at a 97\% sequence identity level from the quality controlled data and to map quality controlled reads to the centroids. The initial centroid establishment algorithm incorporates a quality control step wherein potentially chimeric reads are not allowed to become cluster seeds. Additionally, we discarded singleton reads because it is difficult to assess their quality, and because singleton removal combined with maximum expected error screening has proven to be similarly useful, if not superior, to ``denoising'' for reducing 454 sequencing error \cite{23955772}. Moreover, two popular ``denoising'' algorithms have been shown to add sequencing errors while correcting others, sometimes in a nearly equal ratio \cite{22543370}. Eighty-eight percent and 98\% of quality controlled reads could be mapped back to our cluster seeds at a 97\% identity cutoff for the 16S and 23S sequences, respectively.

\subsubsection{Alpha and Beta diversity analyses}

Alpha diversity calculations were made using PyCogent Python bioinformatics modules \cite{17708774}. Beta diversity analyses were made using phyloseq \cite{24699258} and its dependencies \cite{vegan}. Log$_{2}$ fold changes of group mean ratios and the corresponding null hypothesis based significance values were calculated using DESeq2 \cite{Love_2014}. All dispersion estimates from DESeq2 were calculated using a local fit for mean-dispersion. Native DESeq2 independent filtering was disabled in favor of explicit sparsity filtering. For the biofilm versus planktonic comparisons of the 16S and plastid 23S sequence libraries, we selected the sparsity thresholds that produced the maximum number of OTUs with adjusted p-values for differential abundance below a false discovery rate of 10\%.
The selected sparsity threshold for both the plastid 23S and 16S libraries in the biofilm versus planktonic comparisons was 10\% (OTUs found in a smaller fraction of samples than the sparsity threshold were discarded from the analysis). Cook's distance filtering was also disabled when calculating p-values with DESeq2. We used the Benjamini-Hochberg method to adjust p-values for multiple testing \cite{citeulike:1042553}. A sparsity threshold of 25\% was used for ordination of both plastid 23S and bacterial 16S libraries. Additionally, we discarded any OTUs from the 23S data that could not be annotated as belonging to the Eukaryota. All results were visualized using ggplot2 \cite{Wickham_2009}.
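
The sparsity filter and the Benjamini-Hochberg adjustment described above can be written compactly. What follows is an illustrative sketch and not necessarily the exact computation performed by the software: the symbols $c_{ij}$, $n$, $f_i$, $\tau$, $m$ and $\tilde{p}_{(k)}$ are introduced here only for illustration, and counting an OTU as present in a sample when its count is greater than zero is an assumption. Let $c_{ij}$ be the count of OTU $i$ in sample $j$ for $n$ samples. The fraction of samples in which OTU $i$ is present is
\[
f_i = \frac{1}{n} \sum_{j=1}^{n} \mathbf{1}\left[ c_{ij} > 0 \right],
\]
and OTU $i$ is retained when $f_i \geq \tau$, with $\tau = 0.10$ for the differential abundance comparisons and $\tau = 0.25$ for ordination. Assuming each of the $m$ retained OTUs receives a p-value, and sorting these so that $p_{(1)} \leq \dots \leq p_{(m)}$, the Benjamini-Hochberg adjusted p-value for the OTU of rank $k$ is
\[
\tilde{p}_{(k)} = \min_{l \geq k} \, \min\left( 1, \frac{m \, p_{(l)}}{l} \right),
\]
and an OTU is considered differentially abundant at a false discovery rate of 10\% when $\tilde{p}_{(k)} \leq 0.10$. As a small hypothetical example with $m = 4$ and sorted p-values $0.01$, $0.04$, $0.05$ and $0.20$, the ratios $m \, p_{(l)} / l$ are $0.040$, $0.080$, $0.067$ and $0.200$, so the adjusted p-values are $0.040$, $0.067$, $0.067$ and $0.200$, and the first three OTUs fall below the 10\% threshold.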