David Coil edited Genome Assembly and Annotation.md  almost 10 years ago

Commit id: e9e82b14785003cccbfa78e305ad101ed696eba5


A5-miseq also reports a separate metric of base call quality, "bases >= Q40". Following assembly, A5-miseq realigns the reads to the assembled sequence and estimates the accuracy of the nucleotide called at each site in the assembly. These accuracies are provided as PHRED quality scores (cite PHRED here), which are log-scaled error probabilities (Q = -10 log10 P, where P is the probability that the base call is wrong). For example, a PHRED score of 20 indicates a 99% chance that the correct base was called, while Q30 and Q40 indicate 99.9% and 99.99% chances, respectively. A5-miseq reports the number of assembly bases called with at least Q40.

###Verification of 16S Sequence

Follow the steps described in Section 11, "Making a Phylogenetic Tree", to obtain the full-length 16S sequence and run a BLAST search with it.

PhyloSift:

Navigate to http://phylosift.wordpress.com

Download and unzip the latest version of Phylosift

In the terminal, navigate to the directory containing the unzipped Phylosift

Run

less marker_summary.txt

The DNGNGWU0001-00040 markers represent 37 highly conserved bacterial genes. A missing marker does not show up with a count of zero; it is simply absent from the list, so you must check the list manually. Most of the genes should appear only once. An occasional 2 is fine, but if all or most of the genes appear two or even three times, you have most likely sequenced multiple bacteria together. Also check that there is no 18S rRNA (at the top of the list), which would indicate that your sample has been contaminated with a eukaryote (e.g. yeast).

Important Note: Markers 4, 8 and 38 are no longer included in the Phylosift analysis, so do not be concerned if they are not listed.
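The manual check of the marker list can be tedious, so here is a minimal Python sketch of the same logic. It rests on assumptions not confirmed by this protocol: that each data line of `marker_summary.txt` starts with a marker name and ends with an integer count separated by whitespace, that the conserved markers are named `DNGNGWU` followed by a number, and that an 18S entry contains "18s" in its name. The function name `check_markers` and the sample lines are hypothetical, so compare the results against your own file before relying on them.

```python
import re

# Markers 1-40 minus 4, 8 and 38 (the 37 markers described above).
EXPECTED = set(range(1, 41)) - {4, 8, 38}


def check_markers(lines):
    """Summarise marker counts from marker_summary-style lines.

    Assumes each data line starts with a marker name and ends with an
    integer count, separated by whitespace; other lines are skipped.
    """
    counts = {}
    has_18s = False
    for line in lines:
        fields = line.split()
        if len(fields) < 2 or not fields[-1].isdigit():
            continue  # header, blank line, or unexpected layout
        name, count = fields[0], int(fields[-1])
        if "18s" in name.lower() and count > 0:
            has_18s = True  # possible eukaryotic contamination
        match = re.fullmatch(r"DNGNGWU(\d+)", name)
        if match:
            counts[int(match.group(1))] = count
    # A marker that is absent (or listed with a zero count) is missing.
    missing = sorted(n for n in EXPECTED if counts.get(n, 0) == 0)
    # Markers seen more than once may point to a mixed culture.
    repeated = {n: c for n, c in sorted(counts.items())
                if n in EXPECTED and c > 1}
    return missing, repeated, has_18s


# Made-up example lines, not real PhyloSift output.
sample = ["DNGNGWU00001\t1", "DNGNGWU00002\t2", "DNGNGWU00003\t1"]
missing, repeated, has_18s = check_markers(sample)
print("missing markers:", missing)
print("markers seen more than once:", repeated)
print("18S present:", has_18s)
```

To use it on real output, pass an open file instead of the sample list, for example `check_markers(open("marker_summary.txt"))`. One or two repeated markers is not a concern; a mixed culture is the likely explanation only when most of the markers are repeated.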