Authorea

Jenna M. Lang edited Methods.md over 7 years ago

Commit id: 6acda3e7be34fb62ea7cfb26a4e37684f08da032

##Bioinformatic Analysis

Unless otherwise noted, all microbial community analyses were conducted using the QIIME workflow version 1.8 or R \cite{R}. All Python scripts referred to are components of QIIME \cite{Caporaso_2010}.

###Demultiplex and QC

An in-house script was used to assign sequences to samples using dual-index barcoding. This script, which is available on GitHub (https://github.com/gjospin/Demul_trim_prep), allows for 1 base pair difference per barcode. The paired reads were then aligned and a consensus was computed using FLASH \cite{21903629} \cite{Magoc_2011} with a maximum overlap of 120 bp and a minimum overlap of 70 bp (other parameters were left at their defaults). The custom script automatically demultiplexes the data into fastq files, executes FLASH, and parses its results to reformat the sequences in fasta format with the naming conventions required by QIIME v. 1.8.0 \cite{20383131}.

###OTU assignment and QC

Chimeric sequences were identified using usearch61 as implemented in the identify\_chimeric\_seqs.py script, resulting in the removal of 8760 sequences. The pick\_open\_reference\_otus.py script was used to cluster sequences at 97% similarity to generate OTUs (Operational Taxonomic Units, a proxy for species). Taxonomy was assigned to each OTU by comparing a representative sequence from each cluster to the gg\_13\_8\_otus reference taxonomy provided by the Greengenes Database Consortium (http://greengenes.secondgenome.com) \cite{McDonald_2011}. OTUs that were classified as chloroplasts or mitochondria were removed from further analysis. The number of high-quality sequences remaining per sample ranged from 26831 to 77843 (see Table 1). All subsequent beta diversity analyses (comparisons across samples) were performed with all samples rarefied to 26830 sequences.
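The demultiplexing step tolerates 1 base pair difference per barcode. The sketch below illustrates one way such matching could work, using Hamming distance. It is not the code in the Demul_trim_prep repository: the function names, the example barcodes, and the decision to reject reads that match more than one barcode are all assumptions made for illustration.

```python
def hamming(a, b):
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def match_barcode(observed, known_barcodes, max_mismatches=1):
    """Return the single known barcode within max_mismatches of the
    observed barcode, or None if there is no match or the match is
    ambiguous (hypothetical policy, not taken from the original script)."""
    hits = [
        bc for bc in known_barcodes
        if len(bc) == len(observed) and hamming(observed, bc) <= max_mismatches
    ]
    return hits[0] if len(hits) == 1 else None


# Hypothetical barcodes, for illustration only.
barcodes = ["ACGTACGT", "TTGCAAGC"]
one_off = match_barcode("ACGTACGA", barcodes)   # differs at one position
two_off = match_barcode("ACGTAAGA", barcodes)   # differs at two positions
```

With dual-index barcoding, a check of this kind would be applied to each of the two barcodes of a read pair, and the pair assigned to a sample only if both barcodes resolve.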
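Rarefying every sample to 26830 sequences means randomly subsampling each sample's reads, without replacement, down to that common depth. The following is a minimal sketch of the idea in plain Python. It is not QIIME's implementation, and the function name, the seed handling, and the toy counts are assumptions made for illustration.

```python
import random
from collections import Counter


def rarefy(otu_counts, depth, seed=0):
    """Subsample a sample's OTU counts to a fixed depth without replacement.

    otu_counts maps OTU identifiers to sequence counts for one sample.
    Raises ValueError if the sample holds fewer than `depth` sequences.
    """
    total = sum(otu_counts.values())
    if total < depth:
        raise ValueError("sample has fewer sequences than the requested depth")
    # Expand the counts into one label per sequence, then draw `depth` of them.
    pool = [otu for otu, n in otu_counts.items() for _ in range(n)]
    rng = random.Random(seed)
    return dict(Counter(rng.sample(pool, depth)))


# Toy sample with hypothetical counts (the study used a depth of 26830).
sample = {"OTU_1": 500, "OTU_2": 300, "OTU_3": 200}
rarefied = rarefy(sample, depth=100)
```

Because the draw is without replacement, no OTU can end up with more sequences than it had originally, and all rarefied samples share the same total, which keeps sequencing depth from driving differences between samples in beta diversity comparisons.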