David Coil edited Genome Assembly and Annotation.md  about 9 years ago

Commit id: 4580467ab90b7992a9982914b294b6325df9b06f

For more on interpreting these numbers, proceed to "Assembly Validation".

###Assembly Validation

There are three components to genome assembly validation. The first is the overall "quality" of the assembly, assessed by examining the stats provided by A5-miseq (discussed below). The second is verification that the organism sequenced is the organism of interest, done by checking the assembled 16S sequence with a BLAST search (see section 7 above). The third is "completeness," which is difficult to measure without a closely related reference genome. Nevertheless, we can estimate how complete the genome is by looking for highly conserved "housekeeping" genes that are found in almost every bacterial genome. To do this, we use a program called PhyloSift \cite{Darling_2014} to assess the presence or absence of 37 housekeeping genes in the assembly (see Section 9.1.5).

###Interpretation of A5-miseq stats

To open the A5-miseq stats file, import it into Excel as a tab-delimited file.
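If Excel is not available, the same tab-delimited file can be read with a few lines of Python. The following is a minimal sketch, not part of the A5-miseq workflow itself: the function names `read_stats` and `print_stats` are illustrative, and the code assumes only that the first row of the file holds column headers separated by tabs. No particular column names are assumed.

```python
import csv
import sys


def read_stats(path):
    """Read a tab-delimited stats file into a list of dicts, one per data row.

    Assumes the first row contains the column headers.
    """
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return list(reader)


def print_stats(rows):
    """Print every column name and value, one per line, for each row."""
    for row in rows:
        for column, value in row.items():
            print(f"{column}: {value}")
        print()


if __name__ == "__main__":
    # Usage (the file name is whatever A5-miseq wrote for your assembly):
    #   python read_stats.py <stats_file>
    if len(sys.argv) > 1:
        print_stats(read_stats(sys.argv[1]))
```

Printing one field per line makes it easy to compare the stats for several assemblies side by side without a spreadsheet.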