Jenna M. Lang edited Building a 16S rDNA Phylogenetic Tree.md  almost 10 years ago

Commit id: 3cef7fb299a27ac208f7f251f55c13f4c4a2d8e4


##Obtain an RDP alignment

The goal of this section is to obtain a 16S alignment from RDP that can be used to build a tree. This procedure has the added benefit of providing an independent verification of the taxonomic assignment of your sequence based on the BLAST results.

1. Go to rdp.cme.msu.edu
2. Create an account

5. Assign it a group name (this is what the program will label your sequence/organism). Choose this carefully, since it will be the name on the final tree.
6. Click the "+" next to the sequence to add it to your cart.
7. Click on CLASSIFIER at the top of the page.
8. Click on the "Do Classification With Selected Sequences" button. This will show you a hierarchical view of the classification of your sequence (from Phylum to Genus). You will use this information to navigate to the other sequences that you want to include in the alignment used to build your phylogenetic tree. For example, Figure X shows the hierarchy for the _Tatumella_ 16S sequence.
9. Click on BROWSERS. We recommend opening BROWSERS in a new tab so that you can keep the hierarchy information handy.
10. Click on "Isolates" to select only isolates for further analysis. Then click "Browse".
11. Click on the + sign next to "Archaea outgroup." This will add an Archaeal sequence to your cart, which will be used to root your phylogenetic tree.
12. If using the example sequence provided, click on "Proteobacteria", then "Gammaproteobacteria", then "Enterobacteriales", then "Enterobacteriaceae". This will take you to the genus _Tatumella_, which currently has 69 species in it. If the genus you are working with has too many sequences to analyze easily (for example, _Bacillus_ currently has more than 26,000), one way to reduce this number is to exclude the uncultured taxa in the database. To do this, scroll down to the Data Set Options and click on the "Isolates" button. Click "Refresh" and you will see that there are fewer sequences in the genus. To reduce this number further, click on the "Type" Strain button. As a worst-case scenario, you will need to manually select a subset of organisms to include in your alignment.
13. Click on the + sign next to **genus** Tatumella to add all of those sequences to your cart.
14. Click on "Sequence Cart" and confirm that your uploaded sequence, the outgroup sequence, and all of the other sequences you'd like to include in your tree are displayed.
15. Click on "download," leave the download options as the defaults (fasta, aligned, uncorrected), and then click the appropriate download button. Save the file and then rename it to something informative.

##Clean up the RDP taxon names

The RDP alignment will have taxon names that most downstream software tools will not tolerate because they contain special characters. So, we have written a little Perl script (CleanupRDP.pl) that removes those special characters and replaces them with underscores.
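To make the renaming concrete, here is a minimal Python sketch of this kind of cleanup. It is not the CleanupRDP.pl script. It assumes that "special" means any character other than an ASCII letter, digit, or underscore, and the function names and the example header are hypothetical.

```python
import re

# Assumption: a "special" character is anything other than an ASCII
# letter, digit, or underscore. The real script may use a different rule.
SPECIAL = re.compile(r"[^A-Za-z0-9_]")


def clean_header(line):
    """Return a FASTA header line with special characters replaced by underscores."""
    name = line[1:].strip()  # drop the leading ">" and surrounding whitespace
    return ">" + SPECIAL.sub("_", name)


def clean_fasta(lines):
    """Yield FASTA lines, cleaning headers and leaving sequence lines untouched."""
    for line in lines:
        line = line.rstrip("\n")
        yield clean_header(line) if line.startswith(">") else line


if __name__ == "__main__":
    # Hypothetical header, for illustration only
    example = [">seq1 Genus species; strain X-1", "ACGU--ACGU"]
    for out in clean_fasta(example):
        print(out)
```

Only header lines are rewritten, so the gap characters in the aligned sequences are preserved. This sketch only illustrates the renaming; CleanupRDP.pl remains the tool used in the steps that follow.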
CleanupRDP.pl will be included in the data download from GitHub.

To run CleanupRDP.pl, first move it to your Applications folder. Then, in a Terminal window, navigate to the directory that contains the RDP alignment that you've just downloaded. Then, type:

    $ perl CleanupRDP.pl -i RDP_alignment.fa -o RDP_alignment_clean.fa

##Building the Tree with FastTree

Go to http://www.microbesonline.org/fasttree/#Install
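Before building the tree, it may be worth confirming that the cleaned file is still a usable alignment: every sequence should have the same length, and taxon names should be unique, since replacing characters can in principle make two names identical. The following is a minimal Python sketch of such a check. The function names are hypothetical and it is not part of the protocol's own tooling.

```python
from collections import Counter


def read_fasta(lines):
    """Parse FASTA text lines into a list of (name, sequence) pairs."""
    records, name, chunks = [], None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line)  # sequences may span several lines
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def check_alignment(records):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    lengths = {len(seq) for _, seq in records}
    if len(lengths) > 1:
        problems.append("sequences differ in length: %s" % sorted(lengths))
    counts = Counter(name for name, _ in records)
    dupes = sorted(n for n, c in counts.items() if c > 1)
    if dupes:
        problems.append("duplicate names: %s" % dupes)
    return problems


if __name__ == "__main__":
    # Tiny in-memory example; in practice, pass an open file handle instead
    demo = [">taxon_A", "ACGU--AC", ">taxon_B", "ACGUGGAC"]
    records = read_fasta(demo)
    print(len(records), "sequences;", check_alignment(records) or "no problems found")
```

To check a real file, open it and pass the handle to `read_fasta`. The sequence count it reports can also be compared against what was shown in the RDP Sequence Cart.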