Jonathan A. Eisen edited Methods.md  over 8 years ago

Commit id: 4b4339a935c2c1c662f3f3fdd8add9b8ba5c93fb


Chimeric sequences were identified using usearch61 as implemented in the identify\_chimeric\_seqs.py script, resulting in the removal of 8760 sequences. The pick\_open\_reference\_otus.py script was used to cluster sequences at 97% similarity to generate OTUs (Operational Taxonomic Units, a proxy for species). Taxonomy was assigned to each OTU by comparing a representative sequence from each cluster to the gg\_13\_8\_otus reference taxonomy provided by the Greengenes Database Consortium (http://greengenes.secondgenome.com). OTUs that were classified as chloroplasts or mitochondria were removed from further analysis. The number of high-quality sequences remaining per sample ranged from 26831 to 77843 (see Table 1). All subsequent beta diversity analyses (comparisons across samples) were performed with all samples rarefied to 26830 sequences.

###Comparison of ISS surfaces to analogous surfaces in homes on Earth and to the Human Microbiome Project

The sequences and associated metadata from a 40-home pilot study for the Wildlife of Our Homes Project are available for download from Figshare \cite{885e3742-e0c3-4719-a6a8-dba9930a33ca}. We also obtained 100 samples from each of 13 body sites from the HMP Data Portal (http://hmpdacc.org/HM16STR/) \cite{Huttenhower_2012}\cite{Gevers_2012}. These two additional datasets were used in a combined analysis with the ISS sequences presented here. Because the sequences from the three projects differ in length, each dataset was analyzed independently using a closed-reference OTU-picking approach with a 97% similarity cutoff, and the resulting biom tables were merged with the merge\_otu\_tables.py script. Shannon diversity and non-metric multidimensional scaling (NMDS) ordinations based on Bray-Curtis \cite{Bray_1957} and unweighted UniFrac \cite{Lozupone_2005} distances were computed and plotted using the phyloseq \cite{McMurdie_2013} and ggplot2 \cite{Wilkinson_2011} packages in R \cite{R}.
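To make the quantities named above concrete, the following is a minimal Python sketch of rarefaction to a fixed depth, Shannon diversity, and Bray-Curtis dissimilarity. It is illustrative only and is not the QIIME or phyloseq code used in the analysis. The function names (`rarefy`, `shannon`, `bray_curtis`), the dictionary representation of a sample, and the toy counts are all assumptions introduced here, and the natural-log base for Shannon diversity is likewise an assumption (some tools use log base 2).

```python
import math
import random
from collections import Counter


def rarefy(counts, depth, seed=0):
    """Subsample `depth` sequences without replacement from {OTU: count}."""
    total = sum(counts.values())
    if depth > total:
        raise ValueError("depth exceeds the number of sequences in the sample")
    # Expand the counts into one entry per sequence, then draw without replacement.
    pool = [otu for otu, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return dict(Counter(rng.sample(pool, depth)))


def shannon(counts):
    """Shannon diversity H = -sum(p * ln p) over OTUs with nonzero counts."""
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total)
                for n in counts.values() if n > 0)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / (sum(a) + sum(b))."""
    otus = set(a) | set(b)
    numerator = sum(abs(a.get(o, 0) - b.get(o, 0)) for o in otus)
    denominator = sum(a.values()) + sum(b.values())
    return numerator / denominator


# Toy samples (hypothetical counts, not data from this study).
sample_a = {"OTU1": 60, "OTU2": 30, "OTU3": 10}
sample_b = {"OTU1": 20, "OTU3": 50, "OTU4": 80}

# Rarefy both samples to a common depth before comparing them.
depth = 100
rare_a = rarefy(sample_a, depth)
rare_b = rarefy(sample_b, depth)

print("Shannon (a):", shannon(rare_a))
print("Bray-Curtis (a vs b):", bray_curtis(rare_a, rare_b))
```

Rarefying to a common depth before computing distances keeps differences in sequencing effort from being mistaken for differences in community composition, which is why the sketch subsamples both samples first.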
##Comparison to rooms with mechanical ventilation or open windows

We obtained a list of human pathogens compiled by Kembel et al. (2012) from the author. We then used BLAST \cite{2231712} to search a representative sequence from each of the ISS OTUs against the NCBI Reference Sequence (RefSeq) database \cite{Pruitt_2004}. OTUs with 97% similarity to an organism on the list of known pathogens were flagged as "related to a known human pathogen". Phylogenetic diversity (Faith's PD) was calculated using the alpha\_diversity.py script, with samples rarefied to 700 sequences.
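
The pathogen-flagging step can be sketched as a simple filter over BLAST hits. This is a minimal illustration, not the procedure actually run: the function `flag_pathogen_related`, the `(otu_id, organism, percent_identity)` tuple layout, the case-insensitive exact match on organism name, and the reading of "97% similarity" as an identity of at least 97% are all assumptions made here, and the example hits are invented.

```python
def flag_pathogen_related(hits, pathogens, min_identity=97.0):
    """Return the set of OTU ids with a hit to a listed pathogen.

    hits: iterable of (otu_id, organism, percent_identity) tuples.
    pathogens: iterable of organism names from the pathogen list.
    min_identity: assumed threshold; a hit counts if identity >= this value.
    """
    known = {name.lower() for name in pathogens}
    flagged = set()
    for otu_id, organism, pident in hits:
        if pident >= min_identity and organism.lower() in known:
            flagged.add(otu_id)
    return flagged


# Hypothetical hits and pathogen list, for illustration only.
example_hits = [
    ("OTU_1", "Staphylococcus aureus", 99.2),
    ("OTU_2", "Staphylococcus aureus", 95.0),   # below the threshold
    ("OTU_3", "Bacillus subtilis", 99.8),       # not on the list
]
example_pathogens = ["Staphylococcus aureus"]

flagged = flag_pathogen_related(example_hits, example_pathogens)
print(sorted(flagged))
```

In practice the organism name would have to be resolved from the RefSeq subject identifier of each hit, a step this sketch leaves out.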