David Coil edited Genome Assembly and Annotation.md
almost 10 years ago
Commit id: 4ef341ddf78036385638ed5da74ede75c0b4590f
diff --git a/Genome Assembly and Annotation.md b/Genome Assembly and Annotation.md
index b6c65f2..f6fb485 100644
--- a/Genome Assembly and Annotation.md
+++ b/Genome Assembly and Annotation.md
...
a5_pipeline.pl read_1.fastq read_2.fastq mygenome
/Users/Madison/Desktop/a5\_miseq\_macOS\_20140113/bin/a5\_pipeline.pl is the pipeline and its location
/Users/Madison/Desktop/a5\_miseq\_macOS\_20140113/example/phiX\_p1.fastq is the first paired-end read file
/Users/Madison/Desktop/a5\_miseq\_macOS\_20140113/example/phiX\_p2.fastq is the second paired-end read file
example\_sequence is the name used for the output files
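As a rough sketch, the same invocation could be assembled and launched from Python. The helper names `build_a5_command` and `run_a5` are hypothetical; only the argument order (pipeline script, first read file, second read file, output name) is taken from the command above.

```python
import subprocess


def build_a5_command(pipeline, reads_1, reads_2, output_name):
    """Return the argument list: script, first reads, second reads, output name."""
    return [str(pipeline), str(reads_1), str(reads_2), str(output_name)]


def run_a5(pipeline, reads_1, reads_2, output_name):
    """Launch the pipeline in the current directory; raises if it exits non-zero."""
    command = build_a5_command(pipeline, reads_1, reads_2, output_name)
    subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems when a path contains spaces.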
Once the program finishes running, you will have a complete assembly located in the folder you created, under the name you specified.
Among the numerous files generated by A5, two are of particular importance:
"example\_sequence.contigs.fasta" and
"example\_sequence.final.scaffolds.fasta", which contain the contigs and scaffolds, respectively.
In addition, A5 generates a file containing information about the quality of the assembly called "???????"
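The contigs and scaffolds file names follow directly from the output name, so a small check can confirm that both are present after a run. This is only a sketch: the function names are hypothetical, and the two suffixes are the ones quoted above.

```python
from pathlib import Path


def a5_key_outputs(output_name, folder="."):
    """Return the expected contigs and scaffolds paths for a given output name."""
    folder = Path(folder)
    return {
        "contigs": folder / f"{output_name}.contigs.fasta",
        "scaffolds": folder / f"{output_name}.final.scaffolds.fasta",
    }


def missing_outputs(output_name, folder="."):
    """List which of the two key files are not present in the folder."""
    outputs = a5_key_outputs(output_name, folder)
    return [kind for kind, path in outputs.items() if not path.is_file()]
```

For instance, `missing_outputs("example_sequence")` returns an empty list when both files exist in the current folder.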
...
The raw read and nucleotide counts ("Raw reads"/"Raw nt") and the error-corrected read and nucleotide counts ("EC Reads"/"EC nt") are useful for seeing what percentage of the data was discarded. A very large difference between these numbers (the "Pct" stats) would indicate either poor-quality input data or significant adaptor contamination (for example, from a very short library insert size).
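The percentage comparison can be worked out by hand from the raw and error-corrected counts. The sketch below uses hypothetical function names, and any counts passed in are the reader's own; the text gives no numeric cut-off for a "very large" difference, so none is encoded here.

```python
def percent_retained(raw_count, ec_count):
    """Percentage of raw reads (or nucleotides) kept after error correction."""
    if raw_count <= 0:
        raise ValueError("raw_count must be positive")
    return 100.0 * ec_count / raw_count


def percent_discarded(raw_count, ec_count):
    """Percentage of raw reads (or nucleotides) lost during error correction."""
    return 100.0 - percent_retained(raw_count, ec_count)
```

The same two functions apply to read counts and to nucleotide counts, since both are simple ratios.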
Finally, "X\_cov" shows the average coverage across the genome. For Illumina data we recommend that this number be between ~30X and 100X. With much less than 30X coverage, the quality of any given base in the assembly may come into question. Conversely, too much coverage can reduce the quality of the assembly and require downsampling.
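The coverage recommendation can be expressed as a simple range check. This is a minimal sketch with hypothetical names: the 30X and 100X bounds come from the recommendation above, but treating them as hard cut-offs is a simplification (the lower bound is stated only approximately), and the downsampling fraction assumes that coverage scales linearly with the number of reads kept.

```python
LOW_COVERAGE = 30.0    # lower bound of the recommended range
HIGH_COVERAGE = 100.0  # upper bound of the recommended range


def assess_coverage(x_cov, low=LOW_COVERAGE, high=HIGH_COVERAGE):
    """Classify average coverage as 'low', 'ok', or 'high'."""
    if x_cov < low:
        return "low"
    if x_cov > high:
        return "high"
    return "ok"


def downsample_fraction(x_cov, target=HIGH_COVERAGE):
    """Fraction of reads to keep so coverage drops to roughly the target.

    Assumes coverage is proportional to the number of reads (an approximation).
    """
    if x_cov <= 0:
        raise ValueError("x_cov must be positive")
    return min(1.0, target / x_cov)
```

A result of "high" is a prompt to consider downsampling, not a guarantee that the assembly is poor.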
### Verification of 16S Sequence
...
In the terminal, navigate to the directory containing the unzipped PhyloSift.
Run
./phylosift search contig\_file\_name
For example:
./phylosift search /Users/microBEnet/Desktop/Data-Genomes/Pantoea\_Tatumella/tatumella/tatumella.final.scaffolds.fasta.contigs.fsa
Note: The first time you run PhyloSift it has to download a marker gene database, so it may take a few minutes.
From the PhyloSift directory, move to the "PS\_temp" directory.
Within this directory, PhyloSift has created a directory with the same name as the input file. Move to this new directory, then move to "blastDir".
Open the marker\_summary.txt file in the blastDir:
less marker_summary.txt
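The directory walk described above can also be written as a single path. The following is a sketch under one assumption: that the directory PhyloSift creates inside "PS_temp" is named after the input file's base name (the file name without its leading folders). The function names are hypothetical.

```python
from pathlib import Path


def marker_summary_path(phylosift_dir, input_file):
    """Build the path to marker_summary.txt for a given PhyloSift input file."""
    # Assumption: the run directory is named after the input file's base name.
    run_dir_name = Path(input_file).name
    return (
        Path(phylosift_dir) / "PS_temp" / run_dir_name / "blastDir" / "marker_summary.txt"
    )


def read_marker_summary(phylosift_dir, input_file):
    """Return the text of marker_summary.txt; raises if the file is absent."""
    return marker_summary_path(phylosift_dir, input_file).read_text()
```

If the file is not found at that path, listing the contents of "PS_temp" will show the directory name PhyloSift actually used.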