Jennifer Shelton edited introduction.tex  over 8 years ago

Commit id: a2274dd412e5ad349d1e6c8aaf06de73a3c14777


FASTA file format requirements are minimal \cite{FASTAformat}. Each sequence is preceded by a header (description) line that begins with \verb|>|. Sequence lines can include any standard IUB/IUPAC single-character symbol for nucleic acids or amino acids, including the ambiguity codes that indicate possible residues or bases \cite{comm1970abbreviations}. They can also include \verb|-| to indicate alignment gaps and \verb|*| to indicate stop codons. NCBI recommends wrapping FASTA sequence lines \cite{FASTAformat}. It is also common practice to use the first `word' in a header (i.e., any character string to the left of the first space in the header) as the unique sequence ID. Although these conventions are common, they are not required, which leads to compatibility issues with tools that treat them as required features.

\subsection{Customizing FASTA files to ensure that information is properly interpreted by downstream tools}
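
The conventions described above (a first-word sequence ID and wrapped sequence lines) can be checked or enforced with a short script before a file is passed to a downstream tool. The following Python sketch is illustrative only and is not taken from any existing tool: the function names \verb|parse_fasta|, \verb|duplicate_ids|, and \verb|wrap_record| are hypothetical, and the default line width of 60 characters is an arbitrary assumption rather than a value required by the format.

```python
def parse_fasta(text):
    """Return a list of (seq_id, header, sequence) tuples from FASTA text.

    Assumption: the sequence ID is the first whitespace-delimited word of
    the header, which is a common convention and not a format requirement.
    """
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append(_make_record(header, chunks))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # wrapped sequence lines are joined later
    if header is not None:
        records.append(_make_record(header, chunks))
    return records


def _make_record(header, chunks):
    words = header.split(None, 1)
    seq_id = words[0] if words else ""
    return seq_id, header, "".join(chunks)


def duplicate_ids(records):
    """Return first-word IDs that occur more than once, in order of detection."""
    seen, dups = set(), []
    for seq_id, _, _ in records:
        if seq_id in seen and seq_id not in dups:
            dups.append(seq_id)
        seen.add(seq_id)
    return dups


def wrap_record(header, sequence, width=60):
    """Format one record with sequence lines wrapped at `width` characters."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([">" + header] + lines)
```

For example, calling \verb|duplicate_ids(parse_fasta(text))| on the contents of a file lists any first-word IDs that are not unique, which is one of the conventions that some tools silently assume.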