Jennifer Shelton edited introduction.tex  over 8 years ago

Commit id: fe49b7d8c825afdab8e442db43b6f0bad5d90ac7


\subsection{Customizing FASTA files to ensure that information is properly interpreted by downstream tools}
Regardless of whether a FASTA file is technically improperly formatted or its format merely violates a popular convention, it is critical to quality analysis workflows that data are converted into a format that will be correctly interpreted by downstream tools. Formatting issues fall into several categories: actual format errors; formats that are not technically wrong but are non-standard; and formats that cause some tools to throw errors because the tool itself has a bug (in which case we should modify the FASTA file and proceed only if the tool will then correctly import the data and export the desired output). Some format errors indicate a major problem, such as an attempt to use the wrong data format (e.g. the first line is not a FASTA header because it does not begin with a \verb|>| character). These types of errors will subsequently be referred to as fatal. Alternatively, some formatting issues occur commonly without indicating that the FASTA file is corrupt (e.g. improperly wrapped or unwrapped sequence lines, missing final newline characters, or unusual newline characters like \verb|\r|). These issues will be referred to as non-fatal.

Fatal formatting issues should cause processing to stop. Non-fatal formatting issues should be corrected automatically according to the most common resolution for that type of issue. While downstream processing continues, the analyst can double-check the automated decision to reformat non-fatal issues. This way the workflow is not slowed by trivial reformatting steps, and the rarer problems (e.g. a missing final newline caused by an incomplete file transfer) can still be caught.
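
The fatal/non-fatal policy described above can be illustrated with a minimal Python sketch. It is not an existing tool: the names \verb|FatalFastaError| and \verb|normalize_fasta| are hypothetical, the wrap width of 60 characters is an assumed default, and the choice of fixes (converting line endings, adding a final newline, rewrapping sequence lines) is only one plausible reading of ``the most common resolution''. The sketch stops on a missing header and otherwise returns the corrected text together with a list of notes the analyst can review later.

```python
class FatalFastaError(ValueError):
    """Hypothetical exception for problems that should stop processing."""


def normalize_fasta(text, width=60):
    """Return (normalized_text, notes) or raise FatalFastaError.

    `notes` records every non-fatal fix so an analyst can review it
    while downstream processing continues. `width` is an assumed default.
    """
    notes = []

    # Non-fatal: unusual newline characters such as \r or \r\n.
    if "\r" in text:
        notes.append("converted \\r or \\r\\n line endings to \\n")
        text = text.replace("\r\n", "\n").replace("\r", "\n")

    # Non-fatal, but flagged: may be a sign of an incomplete transfer.
    if text and not text.endswith("\n"):
        notes.append("added missing final newline (check for truncation)")

    # Drop blank lines and surrounding whitespace on each line.
    lines = [line.strip() for line in text.split("\n") if line.strip()]

    # Fatal: the first line must be a FASTA header.
    if not lines or not lines[0].startswith(">"):
        raise FatalFastaError("first line is not a FASTA header ('>')")

    # Group sequence lines under their header.
    records = []
    for line in lines:
        if line.startswith(">"):
            records.append((line, []))
        else:
            records[-1][1].append(line)

    # Non-fatal: rewrap each sequence to a fixed width.
    out = []
    rewrapped = False
    for header, chunks in records:
        seq = "".join(chunks)
        wrapped = [seq[i:i + width] for i in range(0, len(seq), width)]
        if wrapped != chunks:
            rewrapped = True
        out.append(header)
        out.extend(wrapped)
    if rewrapped:
        notes.append("rewrapped sequence lines to width %d" % width)

    return "\n".join(out) + "\n", notes
```

A caller would typically catch \verb|FatalFastaError| to halt the workflow, write the returned text to a new file, and log the notes for later review rather than acting on them immediately.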