Authorea

Jonathan A. Eisen edited Introduction.md almost 10 years ago

Commit id: e96b3a24b47d43b4dfc85dc326861f59482812c9

#Introduction

The objective of the present study was to design, test, troubleshoot, and publish a comprehensive workflow for microbial genome sequencing, encompassing everything from culturing new organisms to depositing sequence data, so that even a lab with limited resources and bioinformatics experience can perform it.

In the fall of 2011, with support from the Alfred P. Sloan Foundation, our lab began a project with the goal of having undergraduate research students generate genome sequences for microorganisms isolated from the "built environment". The focus on the built environment arose because this project was part of the larger "microBEnet" (microbiology of the built environment network) project in the lab. This project was initiated because it could serve many purposes, including (1) engaging undergraduates in research on microbiology of the built environment, (2) generating "reference genomes" for microbes that are found in the built environment, (3) providing material to enhance our ability to communicate about microbes in the built environment, and (4) providing a testing ground for the development of material for educational activities on microbiology of the built environment. As part of this project, undergraduate students went through the process of isolating, identifying, sequencing, and assembling microbial genomes, followed by submission to NCBI and publication of each genome \cite{Lo_2013}\cite{Bendiks_2013}\cite{Flanagan_2013}\cite{Diep_2013}\cite{Coil_2013}\cite{Holland_Moritz_2013}.

Through the course of the project we found that, despite the so-called democratization of genome sequencing and the availability of diverse tools that make many of the required steps relatively easy (e.g. kits for library prep, relatively cheap sequencing, bioinformatics pipelines), there were still a significant number of stumbling blocks. In addition to these hurdles, some portions of the project involve choosing among a wide variety of options (e.g. choice of assembly program), which can create a high activation-energy barrier for a lab without a bioinformatician. Each option comes with its own advantages and disadvantages in terms of complexity, expense, computing power, time, and experience required. In this workflow we have chosen one path through these choices, allowing a researcher to go from a swab to a published paper. We used this workflow to process a novel Tatumella sp. isolate and publish the genome \cite{Dunitz_2014}. The data from every step of the workflow, using this Tatumella isolate, are available on Figshare (REF DOI).

The sequencing and de novo assembly of genomes have already yielded enormous scientific insight, revolutionizing a diverse collection of fields, from epidemiology to ecology. Our hope is that this workflow will help make this revolution more accessible to all scientists, as well as present educational opportunities for undergraduate researchers and classes.