Authorea

Jenna M. Lang edited 16S rDNA Sequencing and Analysis (Organism Identification).md over 9 years ago


# 16S rDNA Sequencing and Analysis (Organism Identification)

Following the second dilution streaking, the organisms need to be identified, or classified. This is done by determining and then analyzing the DNA sequence of the 16S rRNA gene. In this section, we describe how the sequence of this gene is determined and prepared for analysis. The general outline is: DNA extraction, polymerase chain reaction (PCR) amplification of the 16S rRNA gene, and sequencing of the resulting PCR product using a method originally developed by Fred Sanger and now known as Sanger sequencing \cite{Sanger_1977}.

There are several ways to carry out these steps. For example, the PCR reaction needs DNA from the organisms of interest. That DNA can come directly from a liquid culture of the organism (using culture in this way is known as direct PCR). Alternatively, you can isolate the DNA from a liquid culture and use the "clean," purified DNA as the template for the PCR. This adds an extra step to the process, known as DNA extraction (see below). Direct PCR substantially reduces preparation work, but it can give poorer results, both in PCR success and in the quality of the resulting sequence. Even so, we recommend direct PCR when screening a large number of samples, with DNA extraction reserved for any recalcitrant samples. DNA extraction is considerably more work, but it often produces better Sanger sequences, which allow more accurate identification.

## DNA Extraction

There are a number of different options for DNA isolation. The right choice depends on many factors, including available equipment, experience, and cost.
A standard approach in microbiology uses phenol-chloroform extraction followed by ethanol precipitation, and many protocols for this approach can be found in books, articles, and on the internet. A common alternative is a commercially available kit. Kits have several advantages, notably ease of use and the absence of toxic chemicals. Their main disadvantage is a higher cost per sample than other approaches, especially if you are only processing a few samples, since most kits include materials for a minimum of 50 samples. For most projects, we use kits, typically the Promega Wizard Genomic DNA Purification Kit. Follow the protocol or kit instructions provided by the manufacturer, then proceed to "PCR reaction" below.

## Direct PCR (if not extracting DNA)

Centrifuge 1 ml of the overnight culture until the cells form a pellet at the bottom of the tube (about 5 minutes at 10,000 × g). Pour off the liquid on top (the supernatant) and resuspend the pellet in 100 µl of sterile DNase-free water. Incubate the samples at 100 °C for 10 minutes to help lyse the cells. Use the resulting solution as the template in the PCR reaction below.

## PCR reaction

This reaction uses the 27F (AGAGTTTGATCMTGGCTCAG) and 1391R (GACGGGCGGTGTGTRCA) primers, which amplify a near full-length bacterial (and many archaeal) 16S rRNA gene. Our lab uses standard PCR reagents (Qiagen or KAPA), with an annealing temperature of 54 °C and a 90-second extension at 72 °C. Do not forget to include positive (any sample containing bacterial genomic DNA) and negative (_e.g._, replace DNA with water) controls. After PCR is complete, run an agarose gel to confirm three things: the PCR reaction worked, all controls behaved as expected, and the DNA fragments are the correct size (~1,350 bp).
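Both primers contain IUPAC ambiguity codes: the M in 27F stands for A or C, and the R in 1391R stands for A or G. Each primer is therefore really a mixture of two sequences. The short Python sketch below is an illustration only and not part of the lab protocol. It expands a degenerate primer into the concrete sequences it represents, which can help when checking primers against a reference sequence. The `IUPAC` table is the standard nucleotide ambiguity code set. The function name `expand_primer` is hypothetical.

```python
from itertools import product

# Standard IUPAC nucleotide codes: each letter maps to the bases it can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_primer(primer: str) -> list[str]:
    """Return every concrete sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    # product() yields one tuple per combination of per-position choices, in order.
    return ["".join(combo) for combo in product(*choices)]


if __name__ == "__main__":
    primers = {"27F": "AGAGTTTGATCMTGGCTCAG", "1391R": "GACGGGCGGTGTGTRCA"}
    for name, seq in primers.items():
        variants = expand_primer(seq)
        print(f"{name}: {len(variants)} variants")
        for variant in variants:
            print("  " + variant)
```

Because each of these primers contains a single two-way ambiguity, the script lists two variants per primer. A primer with several ambiguous positions expands multiplicatively; for example, two N positions already give 16 variants.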
## Submit Samples for Sequencing

Very few single-researcher labs currently have the capacity to do Sanger sequencing. However, many DNA sequencing facilities, both commercial and academic, offer Sanger sequencing as a service. They will process as little as a single sample, and many also accept large batches, typically arrayed in 96-well plates. You will usually provide both your PCR product and the sequencing primers, which are typically the same primers used for PCR. To get the most data, remember to request both forward (_e.g._, using primer 27F) and reverse (_e.g._, using primer 1391R) reactions for each sample. Each facility will have its own guidelines on DNA and primer concentration. Our lab uses the UC Davis DNA Sequencing Facility (http://dnaseq.ucdavis.edu). If an internet search does not turn up a sequencing facility near you, note that most sequencing centers will let you ship samples to them.

## Sanger Sequence Processing

The end product of Sanger sequencing is a set of sequences (reads) for each sample submitted. Reads from a sequencing facility usually arrive by e-mail and need some pre-processing before they can be analyzed. These steps include quality trimming the reads, reverse complementing the reverse read, aligning the reads, generating a consensus sequence, and converting to FASTA format. There are dozens of different formats for sequence information, and FASTA is one of the simplest. The sequence name is given on one line, after the character '>', and the sequence itself follows on the next lines. Few free software tools can perform all of these steps.
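Two of the processing steps above are simple enough to show directly. One is reverse complementing the reverse read so that both reads are on the same strand. The other is writing a sequence in FASTA format. The Python sketch below illustrates these two steps under simple assumptions. It is hypothetical and is not the code used by any of the tools mentioned in this section. The 60-character line width is a common convention rather than a requirement of the format.

```python
# Complement of each IUPAC code; ambiguity codes map to their complementary
# codes (e.g. R = A/G pairs with Y = C/T), and '-' (gap) is left unchanged.
COMPLEMENT = str.maketrans("ACGTRYMKSWBDHVN-", "TGCAYRKMSWVHDBN-")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse, so a reverse read matches the forward strand."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def to_fasta(name: str, seq: str, width: int = 60) -> str:
    """Format one record: a '>' name line followed by the sequence wrapped at `width`."""
    lines = [f">{name}"]
    lines += [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"


def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {name: sequence}, joining wrapped sequence lines."""
    records: dict[str, list[str]] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif line and name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


if __name__ == "__main__":
    reverse_read = "AAGTCCGAT"  # hypothetical reverse read
    print(to_fasta("sample1_reverse_rc", reverse_complement(reverse_read)), end="")
```

Reverse complementing is why a reverse read has to be processed before it can be aligned with the forward read. Read as-is, it describes the opposite strand in the opposite direction.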
For large numbers of sequences, we recommend an automated pipeline available at the Ribosomal Database Project \cite{Cole_2013}. This pipeline provides only a rough view, since it does not reverse complement or align the reads. It simply quality-trims them and outputs the data in a format that can be fed directly to the BLAST program at NCBI \cite{2231712}. This at least indicates the genus, and sometimes the species, to which each sequence can be classified. We then recommend processing samples of interest with SeqTrace \cite{stucky2012seqtrace}. SeqTrace lets the user view the trace and process the sequences manually, producing a longer, more accurate sequence for analysis. We have also written a script that performs the same steps as SeqTrace automatically, but it does not let you adjust any of the parameters. Whether to use our script (easy, little control) or SeqTrace (more complex, more control) depends on the user and the project.

## RDP Sanger pipeline

The RDP Sanger analysis pipeline can be found at the following URL: (https://rdp.cme.msu.edu/login/pipeline/libSummary). This pipeline accepts one zipped folder containing multiple .abi traces. It cleans and processes the sequences and generates a FASTA file of the processed sequences, which can then be uploaded to BLAST and analyzed. This lets you quickly screen your samples before the more time-consuming SeqTrace analysis, which reverse complements and aligns the reads to generate a consensus sequence.

After signing in to (https://rdp.cme.msu.edu/login/pipeline/libSummary), you will be on the Library Run Summary page. Click the Create New Run tab near the top of the page. Select the appropriate 16S rRNA gene (Archaea or Bacteria, depending on your sample). Then name your library, choose a library name abbreviation, and select any vector. The pipeline assumes cloned PCR fragments, but it works fine whatever you select here. Click the Upload the data without well mapping button at the bottom of the page. On the Data Loader page that follows, choose a zipped folder containing the .abi traces you wish to analyze and click Load Data. To create the zipped folder, put all of the .abi traces you are working with into a folder, right-click it, and select Compress “folder name”. If you downloaded the files as a group from your sequencing facility, they may already be in a zipped folder. When the pipeline is finished, you will be prompted to click a link that opens a new window containing the library run stats. Select the Download Raw Sequence button. Navigate to BLAST (http://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastn&PAGE_TYPE=BlastSearch&LINK_LOC=blasthome) and click the Choose File button underneath the area for the FASTA sequence. Select the file you just downloaded from the library run stats page.
We recommend checking the box to exclude Uncultured/environmental sample sequences, then clicking BLAST. With a large number of FASTA sequences, the search may take a few minutes. When the BLAST search is complete, you can cycle through the sequences you submitted using the pull-down menu to the right of the Results for: heading.

## SeqTrace

_If you are only working with a couple of sequences, we recommend going straight to SeqTrace. With a large batch, it may be easier to screen the samples with the RDP Sanger pipeline above and use SeqTrace only for samples of interest._

Download the program from https://code.google.com/p/seqtrace/downloads/list

After downloading and unpacking the program, SeqTrace is ready to go. Move the unpacked seqtrace-0.9.0 folder to your Applications folder. SeqTrace must be launched from a Terminal window; for a refresher or introduction to the Terminal, see section 2. Open a Terminal window and copy/paste or type:

`/Applications/seqtrace-0.9.0/seqtrace.py`

`sudo cp ~/Downloads/muscle /usr/bin`

### Convert Files from .abi to .fastq

Before running the merge\_sanger\_16s.pl script, you will need to convert your read files from .abi to .fastq. This can be done at http://sequenceconversion.bugaco.com/converter/biology/sequences/. Use the drop-down menus to set the converter to convert .abi files to .fastq, then upload a file and convert it. The converted file will be saved to your Downloads folder under the name sample.fastq. If you are working with many reads, we recommend renaming each file immediately to match the original .abi file name, to avoid confusion.

### Edit and Create a Consensus Sequence

Once all of your files are in fastq format, move them to the Sanger\_seq folder in which you saved the merge\_sanger\_16s.pl script. Use the Terminal to navigate into this folder (using the `cd` command).

Then, to run the script, type:

`perl merge_sanger_16s.pl file1.fastq file2.fastq`

The script will return one of two messages:

1. "Found N conflicting case(s) during merging of X residues"
2. "Not enough data to overlap confidently."

The first message means the merge succeeded, though there may be some conflicting bases. The fewer conflicts, the better: the count indicates how confident the user should be in the result. This is a very crude method. There is no sophisticated algorithm behind the merge, only a simple comparison that keeps the base with the higher quality score. The second message means the sequences were trimmed too much during quality trimming, so the two trimmed reads placed end to end were shorter than the expected fragment length. This indicates poor-quality sequence, and most users should not proceed (others can lower the quality threshold set by the script). The newly merged file will be saved as file1\_merged.fasta and can be uploaded to BLAST for identification.
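The merge\_sanger\_16s.pl script itself is not reproduced here. The Python sketch below is a hypothetical illustration of the behavior described above: decoding quality scores, quality trimming, checking for overlap, and keeping the higher-quality base where the reads conflict. It assumes FASTQ qualities use the common Phred+33 encoding. It also assumes the reverse read has already been reverse-complemented and both reads have already been aligned column by column, with `-` marking positions a read does not cover. The threshold of 20, the `min_overlap` parameter, and all function names are assumptions of this sketch, not settings taken from the script.

```python
def phred33(qual_string: str) -> list[int]:
    """Decode a FASTQ quality string (assumed Phred+33 encoding) into scores."""
    return [ord(ch) - 33 for ch in qual_string]


def trim_ends(seq: str, quals: list[int], threshold: int = 20) -> tuple[str, list[int]]:
    """Drop bases from both ends until a base with quality >= threshold is reached."""
    start, end = 0, len(seq)
    while start < end and quals[start] < threshold:
        start += 1
    while end > start and quals[end - 1] < threshold:
        end -= 1
    return seq[start:end], quals[start:end]


def can_overlap(fwd_len: int, rev_len: int, fragment_len: int, min_overlap: int = 20) -> bool:
    """Trimmed reads can only be merged if together they span the fragment with some overlap."""
    return fwd_len + rev_len >= fragment_len + min_overlap


def merge_aligned(fwd: str, fwd_q: list[int], rev: str, rev_q: list[int]) -> tuple[str, int]:
    """Merge two reads already aligned column by column ('-' = no base at that column).

    Where both reads have a base and they disagree, keep the higher-quality
    base and count the position as a conflict.
    """
    if not (len(fwd) == len(fwd_q) == len(rev) == len(rev_q)):
        raise ValueError("aligned reads and quality lists must have equal length")
    merged, conflicts = [], 0
    for f, fq, r, rq in zip(fwd, fwd_q, rev, rev_q):
        if f == "-" and r == "-":
            continue  # neither read covers this column
        if f == "-":
            merged.append(r)
        elif r == "-":
            merged.append(f)
        else:
            if f != r:
                conflicts += 1
            merged.append(f if fq >= rq else r)
    return "".join(merged), conflicts


if __name__ == "__main__":
    # Hypothetical, already-aligned toy reads (real 16S reads are ~700+ bases).
    seq, n_conflicts = merge_aligned(
        "ACGTAC--", [30, 30, 30, 30, 10, 30, 0, 0],
        "--GTTCGA", [0, 0, 35, 35, 35, 25, 40, 40],
    )
    print(seq, n_conflicts)
```

Choosing a base at each conflicting position from two quality scores alone, without looking at the trace, is what makes this kind of merge crude. SeqTrace, by contrast, lets you inspect the trace at such positions yourself.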