Authorea

Jenna M. Lang edited Methods.md over 8 years ago

Commit id: 19518ad38abebac227148c4f404004a38715311e

\cite{Caporaso_2012}.

##Bioinformatic Analysis

Unless otherwise noted, all microbial community analyses were conducted using the QIIME workflow version 1.8 or R \cite{R}. All Python scripts referred to are components of QIIME \cite{Caporaso_2010}.

###Demultiplex and QC

An in-house script was used to assign sequences to samples, using dual-index barcoding. This script is available on GitHub (https://github.com/gjospin/Demul_trim_prep).

###OTU assignment and QC

Chimeric sequences were identified using usearch61 as implemented in the identify\_chimeric\_seqs.py script, resulting in the removal of 8760 sequences. The pick\_open\_reference\_otus.py script was used to cluster sequences at 97% similarity to generate OTUs (Operational Taxonomic Units, a proxy for species). Taxonomy was assigned to each OTU by comparing a representative sequence from each cluster to the gg\_13\_8\_otus reference taxonomy provided by the Greengenes Database Consortium (http://greengenes.secondgenome.com). OTUs that were classified as chloroplasts or mitochondria were removed from further analysis. The number of high-quality sequences remaining per sample ranged from 26831 to 77843 (see Table 1). All subsequent beta diversity analyses (comparisons across samples) were performed with all samples rarefied to 26830 sequences.

###Comparison of ISS surfaces to analogous surfaces in homes on Earth and to the Human Microbiome Project

The sequences and associated metadata from a 40-home pilot study for the Wildlife of Our Homes Project are available for download from Figshare \cite{885e3742-e0c3-4719-a6a8-dba9930a33ca}. We also obtained 100 random samples from each of 13 body sites from the HMP Data Portal (http://hmpdacc.org/HM16STR/) \cite{Huttenhower_2012}\cite{Gevers_2012}. These two additional datasets were used in a combined analysis with the ISS sequences presented here.
Because the sequences from the three projects differ in length, each dataset was analyzed independently using a closed-reference OTU-picking approach with a 97% similarity cutoff, and the resulting BIOM tables were merged with the merge\_otu\_tables.py script. Shannon diversity and non-metric multidimensional scaling (NMDS) ordinations based on Bray-Curtis and unweighted UniFrac distances were computed and plotted using the phyloseq \cite{McMurdie_2013} and ggplot2 \cite{Wilkinson_2011} packages in R \cite{R}.
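
For readers unfamiliar with the quantities named above, the following minimal Python sketch illustrates rarefaction to a fixed depth, the Shannon index, and Bray-Curtis dissimilarity on toy OTU counts. It is an illustration only and is not the QIIME or phyloseq code used in this study. The function names, the toy samples, the toy depth of 60 reads, and the use of the natural logarithm for the Shannon index are all assumptions made for this example.

```python
import math
import random
from collections import Counter


def rarefy(counts, depth, seed=0):
    """Subsample a {otu: count} mapping to exactly `depth` reads, without replacement."""
    # Expand the counts into one entry per read, then draw `depth` reads at random.
    reads = [otu for otu, n in counts.items() for _ in range(n)]
    if len(reads) < depth:
        raise ValueError("sample has fewer reads than the rarefaction depth")
    return dict(Counter(random.Random(seed).sample(reads, depth)))


def shannon(counts):
    """Shannon index H = -sum(p_i * ln p_i) over OTUs with nonzero counts."""
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values() if n > 0)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: sum(|a_i - b_i|) / sum(a_i + b_i)."""
    otus = set(a) | set(b)
    numerator = sum(abs(a.get(o, 0) - b.get(o, 0)) for o in otus)
    denominator = sum(a.get(o, 0) + b.get(o, 0) for o in otus)
    return numerator / denominator


# Hypothetical toy samples (not data from this study).
sample_a = {"OTU_1": 50, "OTU_2": 30, "OTU_3": 20}
sample_b = {"OTU_1": 10, "OTU_2": 10, "OTU_4": 80}

# Rarefy both samples to the same depth before comparing them.
depth = 60
rare_a = rarefy(sample_a, depth, seed=1)
rare_b = rarefy(sample_b, depth, seed=2)

print("Shannon (a):", shannon(rare_a))
print("Shannon (b):", shannon(rare_b))
print("Bray-Curtis (a vs b):", bray_curtis(rare_a, rare_b))
```

Rarefying every sample to a common depth before computing between-sample distances keeps differences in sequencing effort from being mistaken for differences in community composition, which is the motivation for the fixed depth used in the beta diversity analyses.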