Aaron Darling edited Introduction.md  almost 10 years ago

Commit id: 9b2e61cbcedbaf70da0f9f63aa837ff68a50b441


#Introduction

Thanks to decreases in cost and difficulty, sequencing the genome of a microorganism is becoming a relatively common activity in many research and educational institutions. However, microbial genome sequencing is still far from routine or simple. The objective of the present study was to design, test, troubleshoot, and publish a comprehensive workflow for microbial genome sequencing, encompassing everything from culturing new organisms to depositing sequence data, so that even a lab with limited resources and bioinformatics experience can perform it.

In the fall of 2011 our lab began a project with the goal of having undergraduate students generate genome sequences for microorganisms isolated from the "built environment". The project focused on the built environment because it was part of the larger "microBEnet" (microbiology of the built environment network) effort. This project was initiated because it could serve many purposes, including (1) engaging undergraduates in research on microbiology of the built environment, (2) generating "reference genomes" for microbes that are found in the built environment, (3) providing material to enhance our ability to communicate about microbes in the built environment, and (4) providing a testing ground for the development of material for educational activities on microbiology of the built environment. As part of this project, undergraduate students went through the process of isolating, identifying, sequencing, and assembling microbial genomes, followed by submission to NCBI and publication of each genome \cite{Lo_2013}\cite{Bendiks_2013}\cite{Flanagan_2013}\cite{Diep_2013}\cite{Coil_2013}\cite{Holland_Moritz_2013}. Through the course of the project we found that, despite the so-called democratization of genome sequencing and the availability of many diverse tools for making many of the steps easier (e.g. kits for library prep, relatively cheap sequencing, bioinformatics pipelines), there were still a significant number of stumbling blocks. Moreover, some portions of the project involve choosing between a wide variety of options (e.g. choice of assembly program), which can create a large activation energy for a lab without a bioinformatician. Each option comes with its own advantages and disadvantages in terms of complexity, expense, computing power, time, and experience required. In this workflow we describe an approach to genome sequencing that allows a researcher to go from a swab to a published paper. We used this workflow to process a novel _Tatumella_ sp. isolate and publish the genome \cite{Dunitz_2014}. The data from every step of the workflow, using this _Tatumella_ isolate, are available on Figshare (http://dx.doi.org/10.6084/m9.figshare.1064368).

The sequencing and _de novo_ assembly of genomes has already yielded enormous scientific insight, revolutionizing a wide range of fields, from epidemiology to ecology. Our hope is that this workflow will help make this revolution more accessible to all scientists, as well as present educational opportunities for undergraduate researchers and classes.

There are several excellent resources that focus on smaller portions of this entire process, usually assembly and/or annotation. Examples include the Computational Genomics Pipeline \cite{Kislyuk_2010} and a "Beginner's guide to comparative bacterial genome analysis" \cite{Edwards_2013}, both of which start with already sequenced reads from a known organism.