Authorea

David Coil edited Genome Assembly and Annotation.md almost 10 years ago



## Assembly

Genome assembly typically consists of data cleaning (quality filtering and adaptor removal), error correction, contig assembly, scaffolding, and verification of the resulting scaffolds and contigs. A large array of programs can perform some or most of these steps. They include both commercial and open-source options; some are very user-friendly, while others are extremely difficult to install and use. Good sources for overviews of assemblers and the assembly process include the GAGE project (REF), the GAGE-2 project (REF), and the Assemblathon project (REF). Common assemblers for bacterial genomes include SPAdes (REF), MIRA (REF), SGA (REF), Velvet (REF), CLC (REF), and A5 (REF). For this workflow we recommend the open-source A5 assembly pipeline, which automates all of the steps described above with a single command (REF). A5 is designed to work with raw, demultiplexed Illumina data, and a recent version has been optimized for the longer reads produced by the MiSeq (Coil et al., submitted). Input reads can be paired or unpaired, and paired reads can be supplied either as separate files (forward reads in one file, reverse reads in another) or as a single interleaved file.

### Download/Install A5

Download A5 from
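The input options described in the assembly overview (paired reads supplied either as two separate files or as one interleaved file) can be made concrete with a short Python sketch. A5 accepts either layout, so this conversion is not a required step of the workflow; it only illustrates what "interleaved" means: each forward read is immediately followed by its reverse mate. The sketch assumes standard four-line FASTQ records (no wrapped sequence lines), mates listed in the same order in both files, and mate names that differ at most by a trailing `/1` or `/2`. The functions `read_fastq`, `mate_id`, and `interleave` are hypothetical helpers written for this example, not part of A5.

```python
import io
from itertools import zip_longest
from typing import Iterator, TextIO, Tuple

# A FASTQ record as (header, sequence, separator, qualities), newlines stripped.
Record = Tuple[str, str, str, str]


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield four-line FASTQ records; assumes sequences are not line-wrapped."""
    while True:
        header = handle.readline()
        if not header:  # end of file
            return
        if not header.strip():  # tolerate blank lines between records
            continue
        seq, plus, qual = handle.readline(), handle.readline(), handle.readline()
        if not qual:
            raise ValueError(f"truncated FASTQ record: {header.strip()}")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record: {header.strip()}")
        yield header.rstrip("\n"), seq.rstrip("\n"), plus.rstrip("\n"), qual.rstrip("\n")


def mate_id(header: str) -> str:
    """Read name without the leading '@', any comment, or a /1 or /2 suffix."""
    name = (header[1:].split() or [""])[0]
    return name[:-2] if name.endswith(("/1", "/2")) else name


def interleave(fwd: TextIO, rev: TextIO, out: TextIO) -> int:
    """Write each forward read followed by its reverse mate; return the pair count."""
    pairs = 0
    for r1, r2 in zip_longest(read_fastq(fwd), read_fastq(rev)):
        # zip_longest pads the shorter file with None, exposing unequal read counts.
        if r1 is None or r2 is None:
            raise ValueError("forward and reverse files have different read counts")
        if mate_id(r1[0]) != mate_id(r2[0]):
            raise ValueError(f"mates out of order: {r1[0]} vs {r2[0]}")
        for record in (r1, r2):
            out.write("\n".join(record) + "\n")
        pairs += 1
    return pairs


if __name__ == "__main__":
    # Tiny in-memory example; with real data, pass open() file handles instead.
    fwd = io.StringIO("@r1/1\nACGT\n+\nIIII\n@r2/1\nGGCC\n+\nIIII\n")
    rev = io.StringIO("@r1/2\nTTTT\n+\nIIII\n@r2/2\nAAAA\n+\nIIII\n")
    out = io.StringIO()
    print(interleave(fwd, rev, out), "pairs written")
    print(out.getvalue(), end="")
```

To convert real files, open the forward file, the reverse file, and an output file with `open(...)` and pass the three handles to `interleave`. Streaming the records through generators keeps memory use constant regardless of file size, and the function stops with a `ValueError` rather than silently writing mismatched pairs if the read counts or mate names disagree.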

After downloading and unzipping the program, move it from your Downloads folder to your desktop. Then create a new folder to hold the files generated by the pipeline. A5 is a command-line program; on a Mac you will need to run it from the Terminal (see section II, "Using the Terminal", for an introduction).

### Running A5

Once you have opened the Terminal, navigate to the folder you just created, because A5 writes its output files to whichever directory you are in when you call the program. In this example I created the folder on the desktop and named it a5_ouput, so the command for navigating to the folder is: