Authorea

Jenna M. Lang edited Sanger Sequence Processing.md about 9 years ago

Commit id: 0a520829d4df7536ede446268d58e41ead4ec67f


Sanger sequencing produces sequences (reads) for each sample submitted. Sanger reads typically arrive from the sequencing facility as .abi files via email, and they require some pre-processing before they can be analyzed. The steps are quality trimming the reads, reverse complementing the reverse read, aligning the reads, generating a consensus sequence, and converting it to FASTA format. Very few free software packages allow the user to perform these steps. In this workflow, when working with a large number of sequences, we recommend using the automated pipeline available at the Ribosomal Database Project \cite{Cole_2013}. This pipeline provides only a rough view because it does not orient or align the reads; it simply quality trims them and outputs the data in a format that can be fed directly to the BLAST program at NCBI \cite{Altschul_1990}. This will at least give an idea of the genus, and sometimes the species, to which each sequence can be assigned. We then recommend processing samples of interest using SeqTrace \cite{stucky2012seqtrace}, which allows the user to see the traces (graphical representations of the reads), process the sequences manually, and get a longer, more accurate sequence for analysis. We have also created a script that performs the same steps as SeqTrace automatically but does not allow you to adjust any of the parameters. The choice between our script (easy, little control) and SeqTrace (more complex, more control) will depend on the user and the project.
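
To make three of the pre-processing steps concrete (quality trimming, reverse complementing, and FASTA conversion), here is a minimal Python sketch. It is not the script mentioned above, and it does not reproduce the algorithms used by the RDP pipeline or SeqTrace. The function names, the Phred cutoff of 20, and the example read are all assumptions made for illustration. The trimming rule is a deliberately simple one: drop bases from each end until one meets the cutoff. Alignment and consensus calling are left out.

```python
# Illustrative sketch only: names, cutoff, and data are hypothetical.

# Complement table for A/C/G/T plus N (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def quality_trim(seq, quals, min_q=20):
    """Trim bases from both ends until a base with quality >= min_q is found.

    A simple end-trimming rule chosen for clarity; real tools typically use
    windowed or cumulative-score methods instead.
    """
    start = 0
    end = len(seq)
    while start < end and quals[start] < min_q:
        start += 1
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return seq[start:end], quals[start:end]


def to_fasta(name, seq, width=60):
    """Format one sequence as a FASTA record with lines wrapped at `width`."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical reverse read with made-up Phred quality scores.
    read = "NNACGGTTCAAN"
    quals = [5, 8, 30, 35, 40, 40, 38, 36, 33, 30, 12, 4]
    trimmed, _ = quality_trim(read, quals)
    print(to_fasta("sample1_reverse_rc", reverse_complement(trimmed)))
```

In practice the bases and quality scores would be read from the .abi file with a dedicated parser rather than typed in by hand, and the trimmed forward read and reverse-complemented reverse read would then be aligned to build the consensus.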