Authorea

Jenna M. Lang added Building a 16S rDNA Phylogenetic Tree.md about 10 years ago

Commit id: 8da80e8202af46a6d8805b74c2928f8b3717d9c5


# Building a 16S rDNA Phylogenetic Tree

There are two points during the workflow where making a 16S phylogenetic tree may be useful. The first is after identification of candidate organisms by Sanger sequencing, and the second is after assembly of the genome. The process is identical in both cases, but the additional length and improved quality of the post-assembly full-length 16S sequence may generate a better tree. The tree can be used for identification of the candidate (e.g., is the candidate found in a single-species clade?), for naming of the candidate (does it fall in a clade containing only members of that species, with no other members of the species found outside that clade?), and for placement of the organism into a phylogenetic context.

The outline of the workflow is to use the Ribosomal Database Project (RDP) to generate an alignment of the sequence with close relatives and an outgroup, followed by cleanup of the RDP headers, tree-building with FastTree, and viewing/analysis of the tree in Dendroscope.

## Obtain the Full-Length 16S Sequence from the Assembly

(Skip this step if you are building the tree using the 16S sequence from Sanger sequencing.)

1. Go to RAST and sign in. On the "Jobs Overview" page, click on view details (under annotation progress) for the microbe you are working with.
2. Click on Browse annotated genome in SEED viewer (at the top of the page).
3. Click on Browse through the features of [organism name].
4. Under Function, search for "ssurna" or "SSU rRNA" (if it doesn't work at first, refresh the page).
5. Find the ssuRNA that is 1400-1800 bp in length (Illumina assemblies often also contain fragments of 16S sequence that are only a few hundred bp long).
6. Click on the Feature ID for that sequence.
7. Click on the Sequences tab (around the middle of the page).
8. Click on Show Fasta.
9. Click on Download Sequences and save as a fasta file. Rename the file to something useful.
Double-check the identity of the sequence with BLAST: http://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastn&PAGE_TYPE=BlastSearch&LINK_LOC=blasthome

## Obtaining an RDP alignment

The goal of this section is to obtain a 16S alignment from RDP that can be used to build a tree. The "art" is determining which sequences to put in the alignment.

- Go to rdp.cme.msu.edu
- Create an account
- Click on my RDP/login
- Upload the fasta file containing your 16S sequence
- Assign it a group name (this is what the program will label your sequence/organism). Choose this carefully, since it will be the name on the final tree.
- Upload the fasta file. This may take some time to process: the status will read P (pending); wait until it reads A (aligned). If it is taking a while, refresh the screen.
- Click the "+" next to the sequence to add it to your cart
- Click on BROWSERS at the top of the page
- Specify isolates as the Source and leave the rest as defaults
- Click Browse

### Browse Sequences

To choose a specific bacterium, click on the blue class/subclass/order button until you see a specific species, and click the plus next to it. If you want to choose an entire class/subclass/order, click on the plus sign next to it.

Note: RDP will not generate trees with more than 200 sequences. However, if you follow our recommendation of using FastTree, this limit does not apply. You can check how many sequences are in your cart by looking to the right of "Hierarchy Browser".

In most cases the tree should be built with sequences from other isolates within the same genus. For a genus with relatively few sequences in RDP, this is fairly straightforward: select the entire genus and move on to choosing an outgroup. For genera with very large numbers of sequences, trimming is required. Since the goal is to discover the placement of a candidate sequence within the genus, the best pool of sequences would contain multiple sequences from each species in that genus. This requires manual selection of sequences in the Browser.
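
The sequence count mentioned in the note above can also be checked from the terminal once a FASTA file has been downloaded from RDP. The following is only a minimal sketch, not part of the original workflow: the function names `count_fasta_records` and `check_rdp_limit` are made up for illustration, and it assumes a plain FASTA file in which every sequence starts with a `>` header line.

```shell
# Count FASTA records by counting header lines (lines starting with ">").
# Function names here are illustrative, not from RDP or FastTree.
count_fasta_records() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # the count is 0, so "|| true" keeps the function's exit status clean.
    grep -c '^>' "$1" || true
}

# Report whether a file is within the 200-sequence limit of RDP's tree builder.
check_rdp_limit() {
    n=$(count_fasta_records "$1")
    if [ "$n" -gt 200 ]; then
        echo "$n sequences: over the RDP Tree Builder limit"
    else
        echo "$n sequences: within the RDP Tree Builder limit"
    fi
}
```

For example, `check_rdp_limit my_alignment.fa` (a hypothetical file name) prints the count together with a note on whether it exceeds 200.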
We recommend avoiding sequences without informative names.

### Select an outgroup

The ideal outgroup is a sequence as closely related to your candidate as possible, but outside the group that you want to examine. We recommend choosing a type sequence from within the same family.

- Click on TREE BUILDER
- Specify your outgroup (make a note of your outgroup because you will need it when viewing the tree in Dendroscope)
- Click on Create tree

In some rare cases the tree generated by RDP will be sufficient; other times it may fail to load, crash, or be too crowded to resolve any detail. We recommend instead downloading the alignment and building the tree in FastTree.

## Building the Tree with FastTree

Download the alignment you just created in RDP:

- Click on download, use the default settings (fasta, aligned, uncorrected)

Download Jenna's script (Add here) to clean up the file.

To run the cleanup script (???Should we explain what this script does???):

    $ perl scriptname -i input_file -o output_file

Example:

    $ perl /Users/microBEnet/Desktop/RDPcleanup.pl -i /Users/microBEnet/Desktop/newTatumella.fa -o TatumellaRAST_RDP.fa

### FastTree

Go to http://www.microbesonline.org/fasttree/#Install and download the FastTree.c program by right-clicking on it and saving the link.

FastTree.c is source code, so it must be compiled with a C compiler such as gcc, which you can download and install with Xcode here: https://developer.apple.com/downloads/index.action?q=xcode

In order to download Xcode you will need to register as a developer with Apple, which takes only a couple of minutes. After downloading Xcode, follow your computer's instructions to install it. Once it is installed, open the program and open Preferences (under the Xcode menu). Click on the Downloads option and install the command line tools.
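
After installing the command line tools, it can be worth confirming from the terminal that gcc is actually usable before trying to compile. The following is a minimal sketch under that assumption; the helper name `have_command` is illustrative. On some macOS versions a placeholder `gcc` may exist even when the tools are not installed, which is why the sketch also tries `gcc --version` rather than only looking for the program.

```shell
# Succeed if the named program can be found on the PATH.
# "have_command" is an illustrative helper name.
have_command() {
    command -v "$1" >/dev/null 2>&1
}

# Look for gcc, then check that it really runs.
if have_command gcc && gcc --version >/dev/null 2>&1; then
    echo "gcc found at: $(command -v gcc)"
else
    echo "gcc not usable: install the Xcode command line tools first"
fi
```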
Once you have successfully downloaded and installed Xcode, open the terminal, navigate to your downloads folder (or wherever you are storing FastTree), and copy and paste the following command:

    $ gcc -O3 -finline-functions -funroll-loops -Wall -o FastTree FastTree.c -lm

To run FastTree, type the following in the terminal (if FastTree is not on your PATH, give the full path to the compiled program, as in the example):

    $ FastTree -gtr -nt alignment.file > tree_file

For example:

    $ /Users/Madison/Downloads/FastTree -gtr -nt /Users/Madison/Desktop/newTatumella > /Users/Madison/Desktop/tree_file_tatumella.tre

The alignment file will be the output file of the cleanup script, and the tree_file output name should include a path to where you want it to be located. The output name should also end in .tre in order to be recognized by Dendroscope.

## Viewing the Tree

Download and install Dendroscope: http://ab.inf.uni-tuebingen.de/software/dendroscope/

You will need to obtain a license here: http://www-ab2.informatik.uni-tuebingen.de/software/dendroscope/register/

Enter the license into Dendroscope and then you can open your tree file to view it (file

Once the tree is visible, the first step is to re-root the tree to the outgroup. Expand the tree by clicking the expansion button (labeled in the above screen shot), then scroll through the tree to locate the outgroup. Click on the beginning of the taxon name to select it, and re-root the tree by going to edit and selecting re-root. We recommend viewing the tree as a phylogram, which can be accomplished by clicking on the phylogram button (labeled in the above screen shot).

From this tree it should be possible to determine the phylogenetic placement of the candidate sequence, and in some cases to give it a name with more certainty than a simple BLAST search. Below are examples of a relatively informative tree and a relatively uninformative tree:

In the tree shown above (genus Brachybacterium), our sample of interest from an assembly is "Brachybacterium muris UCD-AY4" (REF).
It falls within a clade where every named member has the same name, "Brachybacterium muris", and this name does not occur elsewhere on the tree. Hence, we were confident enough to name our sample as that species.

In the tree shown above (genus Microbacterium), our species of interest is Microbacterium sp. str. UCD-TDU (REF). In contrast to the Brachybacterium example, here our species falls within a poorly defined clade containing multiple species. In this case we did not assign a species name to this isolate.
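
Locating the outgroup and the candidate in a large tree is easier when their exact labels are known. As a rough sketch that is not part of the original workflow, the tip labels in the Newick file written by FastTree can be listed from the terminal. The function names `list_tip_labels` and `has_tip` are hypothetical, and the sketch assumes unquoted labels that contain no parentheses, commas, colons, or semicolons.

```shell
# List the tip (leaf) labels in a Newick tree file, one per line.
# Assumes unquoted labels with no "(", ")", ",", ":" or ";" in them.
list_tip_labels() {
    # 1) drop internal-node labels (e.g. support values) that follow ")"
    # 2) put every remaining token on its own line
    # 3) strip branch lengths (":0.0123") and delete empty lines
    sed -e 's/)[^(),:;]*/)/g' "$1" |
        tr '(),;' '\n\n\n\n' |
        sed -e 's/:.*$//' -e '/^[[:space:]]*$/d'
}

# Succeed if the exact label given as $2 is a tip in the tree file $1.
has_tip() {
    list_tip_labels "$1" | grep -Fxq -- "$2"
}
```

For instance, `list_tip_labels tree_file_tatumella.tre | grep -i tatumella` would show every tip whose label mentions the genus, which makes it easier to find the exact name to search for when re-rooting.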