Jonathan A. Eisen edited Methods.md  over 8 years ago

Commit id: 70ba2f047fc5d460f186cbcf51270e5b10682718


### Demultiplexing and QC

An in-house script was used to assign sequences to samples using dual-index barcodes. This script is available on GitHub (https://github.com/gjospin/Demul_trim_prep).

### OTU assignment and QC

Chimeric sequences were identified using usearch61 as implemented in the identify\_chimeric\_seqs.py script, resulting in the removal of 8760 sequences. The pick\_open\_reference\_otus.py script was used to cluster sequences at 97% similarity into OTUs (Operational Taxonomic Units, a proxy for species). Taxonomy was assigned to each OTU by comparing a representative sequence from each cluster to the gg\_13\_8\_otus reference taxonomy provided by the Greengenes Database Consortium (http://greengenes.secondgenome.com) \cite{McDonald_2011}. OTUs classified as chloroplasts or mitochondria were removed from further analysis. The number of high-quality sequences remaining per sample ranged from 26831 to 77843 (see Table 1). All subsequent beta diversity analyses (comparisons across samples) were performed on samples rarefied to 26830 sequences each.

### Comparison of ISS surfaces to analogous surfaces in homes on Earth and to the Human Microbiome Project

The sequences and associated metadata from a 40-home pilot study for the Wildlife of Our Homes Project are available for download from Figshare \cite{885e3742-e0c3-4719-a6a8-dba9930a33ca}. We also obtained 100 samples from each of 13 body sites from the HMP Data Portal (http://hmpdacc.org/HM16STR/) \cite{Huttenhower_2012}\cite{Gevers_2012}. These two additional datasets were analyzed together with the ISS sequences presented here. Because the sequences from the three projects differ in length, each dataset was analyzed independently using a closed-reference OTU-picking approach with a 97% similarity cutoff. This ensured that OTU identifiers referred to the same reference sequences across datasets. The resulting biom tables were then merged with the merge\_otu\_tables.py script.
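Closed-reference picking assigns each read to a fixed reference OTU. As a result, the per-dataset tables share OTU identifiers and can be combined by taking the union of their samples and OTUs. The minimal Python sketch below illustrates that merge on a hypothetical in-memory table format, in which each sample ID maps to its OTU counts. It is not the merge\_otu\_tables.py implementation. The function name `merge_tables`, the zero-filling of absent OTUs, and the choice to reject duplicate sample IDs are all assumptions made for illustration.

```python
from typing import Dict

# Hypothetical table format (not the biom format): sample ID -> {OTU ID -> read count}
OtuTable = Dict[str, Dict[str, int]]


def merge_tables(*tables: OtuTable) -> OtuTable:
    """Merge OTU tables whose OTU IDs come from the same reference set.

    Sketch only, not QIIME's merge_otu_tables.py: samples are combined by
    union, a sample ID seen in more than one table raises ValueError, and
    OTUs absent from a sample are filled in with a count of zero.
    """
    merged: OtuTable = {}
    for table in tables:
        for sample, counts in table.items():
            if sample in merged:
                raise ValueError(f"duplicate sample ID: {sample}")
            merged[sample] = dict(counts)

    # Union of OTU IDs over every sample in every input table.
    all_otus = sorted({otu for counts in merged.values() for otu in counts})
    return {
        sample: {otu: counts.get(otu, 0) for otu in all_otus}
        for sample, counts in merged.items()
    }
```

Because closed-reference OTU IDs are shared, an OTU observed only on the ISS simply receives zero counts in the home and HMP samples after merging, rather than being treated as a different taxon.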
Shannon diversity was computed for each sample. Non-metric multidimensional scaling (NMDS) ordinations were based on Bray-Curtis \cite{Bray_1957} and unweighted UniFrac \cite{Lozupone_2005} distances. All of these were computed and plotted using the phyloseq \cite{McMurdie_2013} and ggplot2 \cite{Wilkinson_2011} packages in R \cite{R}.
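Shannon diversity and Bray-Curtis dissimilarity reduce to short formulas: H' = -Σ p_i ln p_i and BC = Σ|x_i - y_i| / Σ(x_i + y_i). The minimal Python sketch below computes both on a hypothetical per-sample table of OTU counts. It also builds a Bray-Curtis distance matrix of the kind an NMDS ordination takes as input. This is not phyloseq's code. The natural-log base and the function names are assumptions. NMDS itself and unweighted UniFrac, which additionally requires a phylogenetic tree, are not reproduced here.

```python
import math
from typing import Dict, List

# Hypothetical per-sample format: OTU ID -> read count
Counts = Dict[str, int]


def shannon(counts: Counts) -> float:
    """Shannon diversity H' = -sum(p_i * ln p_i), using the natural log."""
    total = sum(counts.values())
    h = 0.0
    for n in counts.values():
        if n > 0:  # zero counts contribute nothing (p * ln p -> 0)
            p = n / total
            h -= p * math.log(p)
    return h


def bray_curtis(a: Counts, b: Counts) -> float:
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / sum(a_i + b_i)."""
    otus = set(a) | set(b)
    num = sum(abs(a.get(o, 0) - b.get(o, 0)) for o in otus)
    den = sum(a.get(o, 0) + b.get(o, 0) for o in otus)
    return num / den if den else 0.0


def distance_matrix(table: Dict[str, Counts]) -> List[List[float]]:
    """Symmetric Bray-Curtis matrix, rows and columns in sorted sample order."""
    samples = sorted(table)
    return [[bray_curtis(table[s], table[t]) for t in samples] for s in samples]
```

For example, two samples with counts {a: 6, b: 4} and {a: 2, b: 8} have a Bray-Curtis dissimilarity of (4 + 4) / (10 + 10) = 0.4. Two samples sharing no OTUs score 1.0.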