Jenna M. Lang edited Introduction.md  over 9 years ago

Commit id: 442dbc25af400f4c54d78cb121d4914bea2d93ff


# Introduction

Thanks to decreases in cost and difficulty, sequencing the genome of a microorganism is becoming a relatively common activity in many research and educational institutions. However, such microbial genome sequencing is still far from routine or simple. The objective of this work was to design, test, troubleshoot, and publish a comprehensive workflow for microbial genome sequencing, encompassing everything from culturing new organisms to depositing sequence data, so that even a lab with limited resources and bioinformatics experience can perform it.

In the fall of 2011, our lab began a project with the goal of having undergraduate students generate genome sequences for microorganisms isolated from the "built environment". The project focused on the built environment because it was part of the larger "microBEnet" (microbiology of the built environment network, www.microbe.net) effort. This project serves many purposes, including (1) engaging undergraduates in research on microbiology of the built environment, (2) generating "reference genomes" for microbes that are found in the built environment, (3) providing material to enhance our ability to communicate about microbes in the built environment, and (4) providing a testing ground for developing material for educational activities on microbiology of the built environment. As part of this project, undergraduate students isolated and classified microbes, sequenced and assembled their genomes, submitted the genome sequences to databases housed by the National Center for Biotechnology Information (NCBI), and published each genome \cite{Lo_2013}\cite{Bendiks_2013}\cite{Flanagan_2013}\cite{Diep_2013}\cite{Coil_2013}\cite{Holland_Moritz_2013}.
Through the course of the project we found that, despite the so-called democratization of genome sequencing and the availability of diverse tools making many of the steps easier (_e.g.,_ kits for library prep, relatively cheap sequencing, bioinformatics pipelines), there were still a significant number of stumbling blocks. Moreover, some portions of the project involve choosing between a wide variety of options (_e.g.,_ choice of assembly program), which can create a large activation energy for a lab without a bioinformatician. Each option comes with its own advantages and disadvantages in terms of complexity, expense, computing power, time, and experience required. In this workflow, we describe an approach to genome sequencing that allows a researcher to go from a swab to a published paper. We used this workflow to process a novel _Tatumella_ sp. isolate and publish the genome \cite{Dunitz_2014}. The data from every step of the workflow, using this _Tatumella_ isolate, are available on Figshare \cite{a4e375db-d538-4413-b23d-26132131aa94}.

The sequencing and _de novo_ assembly of genomes have already yielded enormous scientific insight, revolutionizing a wide range of fields, from epidemiology to ecology. Our hope is that this workflow will help make this revolution more accessible to all scientists, as well as present educational opportunities for undergraduate researchers and classes.

There are several excellent resources that focus on smaller portions of this workflow, usually assembly and/or annotation. Examples include the Computational Genomics Pipeline \cite{Kislyuk_2010} and a "Beginner’s guide to comparative bacterial genome analysis" \cite{Edwards_2013}, both of which start with data on the sequences of individual small fragments from an organism's genome (each DNA sequence generated by a sequencing system is known as a "read"). Clarke et al. (2014) describe a roughly similar pipeline focused on human mitochondrial genomes \cite{Clarke_2014}.