Authorea

Jenna M. Lang edited Genome Assembly and Annotation.md almost 10 years ago

Commit id: 6bd6ae501976d2983aa41151e165a24207164fb9


4. scaffolding
5. verification of scaffolds/contigs

Many programs can perform some or most of these steps. They include commercial and open-source options; some are very user friendly, and others are extremely difficult to install or use. Common assemblers for bacterial genomes include SPAdes \cite{Bankevich_2012}, MIRA \cite{Chevreux_2004}, SGA \cite{Simpson_2010}, Velvet \cite{Zerbino_2008}, CLC (http://www.clcbio.com/files/whitepapers/whitepaper-denovo-assembly-4.pdf - **This was the best reference I could find on CLC but I'm not sure how to cite it/if I should use something else**), and A5 \cite{Tritt_2012}. Good sources for overviews of genome assemblers and the assembly process include the GAGE project \cite{Salzberg_2012}, the GAGE-2 project (REF), and the Assemblathon project \cite{Earl_2011}.

For this workflow we recommend the open-source A5 assembly pipeline, which automates all of the steps described above with a single command \cite{Tritt_2012}. A5 is designed to work with raw, demultiplexed Illumina data, and a recent version has been optimized for longer reads from the MiSeq (Coil et al., submitted). Input reads can be paired or unpaired, and paired reads can be in separate files (forward reads in one file, reverse reads in another) or interleaved in a single file. These files should have the .fastq extension. See HERE for a description of the fastq format. You may need assistance from your sequencing center in locating and accessing these files.

You will need one of the following three inputs per genome: 1) a single .fastq file that contains your single reads (if paired-end sequencing was not requested), 2) a single .fastq file that contains both forward and reverse reads (interleaved), or 3) two .fastq files, one with forward reads and one with the corresponding reverse reads.
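
If you are unsure which of the three input layouts you have received, a quick check of the read names can help. The following is a minimal Python sketch, not part of A5. It assumes standard four-line FASTQ records and that mates share a read name, either with a trailing `/1` or `/2` or with the mate number after a space (as in recent Illumina headers). The function names `fastq_records`, `read_name`, `looks_interleaved`, and `describe_input` are hypothetical and exist only for this illustration.

```python
import itertools


def fastq_records(path):
    """Yield (header, sequence, quality) from a four-line-per-record FASTQ file."""
    with open(path) as handle:
        while True:
            block = [line.rstrip("\n") for line in itertools.islice(handle, 4)]
            if not block or block == [""]:
                return  # end of file (possibly after a trailing blank line)
            if len(block) < 4 or not block[0].startswith("@") or not block[2].startswith("+"):
                raise ValueError(f"{path}: malformed FASTQ record near {block[0]!r}")
            if len(block[1]) != len(block[3]):
                raise ValueError(f"{path}: sequence/quality length mismatch in {block[0]!r}")
            yield block[0], block[1], block[3]


def read_name(header):
    """Return the read name without the leading '@' or any mate suffix."""
    fields = header[1:].split()
    name = fields[0] if fields else ""
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def looks_interleaved(path, max_pairs=1000):
    """Guess whether consecutive records in one file are forward/reverse mates."""
    records = itertools.islice(fastq_records(path), 2 * max_pairs)
    names = [read_name(header) for header, _, _ in records]
    if len(names) < 2 or len(names) % 2:
        return False
    return all(a == b for a, b in zip(names[0::2], names[1::2]))


def describe_input(paths, max_pairs=1000):
    """Classify one or two FASTQ files as one of the three layouts described above."""
    if len(paths) == 1:
        if looks_interleaved(paths[0], max_pairs):
            return "interleaved paired-end"
        return "single-end"
    if len(paths) == 2:
        fwd = itertools.islice(fastq_records(paths[0]), max_pairs)
        rev = itertools.islice(fastq_records(paths[1]), max_pairs)
        for f, r in itertools.zip_longest(fwd, rev):
            # A missing mate or a differing name means the files are not in sync.
            if f is None or r is None or read_name(f[0]) != read_name(r[0]):
                raise ValueError("forward and reverse files do not pair up")
        return "separate paired-end files"
    raise ValueError("expected one or two FASTQ files per genome")
```

Calling `describe_input(["reads.fastq"])` or `describe_input(["forward.fastq", "reverse.fastq"])` (file names here are placeholders) returns a short description of the layout. Only the first `max_pairs` read pairs are inspected, so the result is a heuristic; confirm it with your sequencing center if in doubt.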