David Coil edited Genome Assembly and Annotation.md  almost 10 years ago

Commit id: 4ef341ddf78036385638ed5da74ede75c0b4590f


a5_pipeline.pl read_1.fastq read_2.fastq mygenome

/Users/Madison/Desktop/a5\_miseq\_macOS\_20140113/bin/a5\_pipeline.pl is the pipeline and its location

/Users/Madison/Desktop/a5\_miseq\_macOS\_20140113/example/phiX\_p1.fastq is the first paired-end read file

/Users/Madison/Desktop/a5\_miseq\_macOS\_20140113/example/phiX\_p2.fastq is the second paired-end read file

example\_sequence is the base name used for the output files

Once the program finishes running, you will have a complete assembly located in the folder you created, under the name you specified.

Among the numerous files generated by A5, two are of particular importance: "example\_sequence.contigs.fasta" and "example\_sequence.final.scaffolds.fasta", which contain the contigs and scaffolds respectively. In addition, A5 generates a file containing information about the quality of the assembly called "???????"
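As a quick sanity check on the contigs or scaffolds file, it can help to count the sequences and add up their lengths. The following is a minimal sketch in Python, not part of A5; the function name `summarize_fasta` is hypothetical, and it assumes a plain, uncompressed FASTA file.

```python
import sys


def summarize_fasta(path):
    """Return (number of sequences, total length, longest sequence length)."""
    lengths = []
    current = None  # length of the record being read; None before the first header
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # A header line starts a new record; store the previous one.
                if current is not None:
                    lengths.append(current)
                current = 0
            elif line and current is not None:
                current += len(line)
    if current is not None:
        lengths.append(current)
    return len(lengths), sum(lengths), max(lengths, default=0)


if __name__ == "__main__" and len(sys.argv) > 1:
    count, total, longest = summarize_fasta(sys.argv[1])
    print(f"sequences: {count}  total nt: {total}  longest: {longest}")
```

If the script is saved under a name of your choosing, pass the path to the contigs or scaffolds file as its only argument.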

The raw read and nucleotide counts ("Raw reads"/"Raw nt") and the error-corrected read and nucleotide counts ("EC Reads"/"EC nt") are useful for seeing what percentage of the data was discarded. A very large difference between these numbers (the "Pct" stats) would indicate either poor-quality input data or significant adaptor contamination (for example, from a library with a very short insert size).

Finally, "X\_cov" shows the average coverage across the genome. For Illumina data we recommend that this number be between ~30X and 100X. With much less than 30X coverage, the quality of any given base in the assembly may come into question. Conversely, too much coverage can reduce the quality of the assembly and may require downsampling.

###Verification of 16S Sequence

In the terminal, navigate to the directory containing the unzipped PhyloSift.

Run

./phylosift search contig\_file\_name

For example:

./phylosift search /Users/microBEnet/Desktop/Data-Genomes/Pantoea\_Tatumella/tatumella/tatumella.final.scaffolds.fasta.contigs.fsa

Note: The first time you run PhyloSift, it has to download a marker gene database, so it may take a few minutes.

From the PhyloSift directory, move to the "PS\_temp" directory. Within this directory, PhyloSift has created a directory with the same name as the input file. Move to this new directory, then move to "blastDir".

Open the marker\_summary.txt file in blastDir:

less marker_summary.txt
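The navigation steps above can also be expressed as a single path. The following is a minimal sketch in Python; the function name `marker_summary_path` is hypothetical, and the directory layout is assumed to be exactly the one described in the steps above (it is not checked against PhyloSift itself).

```python
from pathlib import Path


def marker_summary_path(phylosift_dir, input_file):
    """Build the expected location of marker_summary.txt for one input file."""
    # Assumed layout, taken from the steps above:
    #   <PhyloSift directory>/PS_temp/<input file name>/blastDir/marker_summary.txt
    input_name = Path(input_file).name
    return Path(phylosift_dir) / "PS_temp" / input_name / "blastDir" / "marker_summary.txt"


def show_marker_summary(phylosift_dir, input_file):
    """Print the summary file if it exists at the assumed location."""
    path = marker_summary_path(phylosift_dir, input_file)
    if path.is_file():
        print(path.read_text())
    else:
        print(f"No marker summary found at {path}")
```

Only the file name of the input is used for the directory name, so the input can be given with or without its full path.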