Jennifer Shelton edited introduction.tex  over 8 years ago

Commit id: cfb54ee322fc6db06200d14282f513f91a3f854f


\subsection{FASTA file format specifications versus recommendations}
The FASTA file format has very few requirements \cite{FASTAformat}. Each sequence is preceded by a header/description line that begins with a \verb|>|. Sequence lines can include any standard IUB/IUPAC single-character symbol for nucleic acids or amino acids, including the ambiguity codes that indicate possible residues or bases \cite{comm1970abbreviations}. They can also include \verb|-| to indicate alignment gaps and \verb|*| to indicate stop codons. NCBI recommends wrapping the sequence lines of FASTA files. It is also common practice to use the first `word' in a header (i.e., any character string to the left of the first space in the header) as the unique sequence id. Although these features are common, they are not required, which leads to format compatibility issues with tools that treat these conventions as required features.
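
To illustrate the distinction between requirement and convention, the following is a minimal Python sketch of a reader that relies only on the required header marker and accepts both wrapped and unwrapped sequence lines. It also reports repeated first `words', since their uniqueness is a convention rather than a guarantee. The function names (\verb|parse_fasta|, \verb|find_duplicate_ids|) and the choice to skip blank lines are assumptions made for this example; they are not part of any specification or existing tool.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str, str]  # (sequence id, description, sequence)


def _make_record(header: str, chunks: List[str]) -> Record:
    """Split a header into its first 'word' (the id) and the remainder."""
    parts = header.split(None, 1)
    seq_id = parts[0] if parts else ""
    description = parts[1] if len(parts) > 1 else ""
    return seq_id, description, "".join(chunks)


def parse_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield one record per '>' header, joining wrapped sequence lines.

    Assumption for this sketch: blank lines are ignored.
    """
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield _make_record(header, chunks)
            header = line[1:]
            chunks = []
        elif header is None:
            raise ValueError("sequence data found before the first header")
        else:
            chunks.append(line)
    if header is not None:
        yield _make_record(header, chunks)


def find_duplicate_ids(records: Iterable[Record]) -> List[str]:
    """Return ids (first header 'words') that occur more than once."""
    seen = set()
    duplicates: List[str] = []
    for seq_id, _description, _sequence in records:
        if seq_id in seen and seq_id not in duplicates:
            duplicates.append(seq_id)
        seen.add(seq_id)
    return duplicates


example = [
    ">seq1 first record",
    "ACGT",      # wrapped sequence, first line
    "NN-A",      # wrapped sequence, second line
    ">seq1 same first word, different description",
    "MKV*",      # unwrapped sequence with a stop codon symbol
]
records = list(parse_fasta(example))
duplicate_ids = find_duplicate_ids(records)
```

In this example both records are valid FASTA, yet they share the id \verb|seq1|, so a tool that assumes unique first `words' would conflate them.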