Predictors
BLAST
BLAST (Basic Local Alignment Search Tool) is a sequence similarity search algorithm for querying large databases of gene sequence data. It is a heuristic: it returns results quickly but, unlike exact methods such as the Smith-Waterman algorithm, does not guarantee an optimal alignment. Its speed and practicality have made it one of the most widely used sequence search tools, and many sequence databases and organizations, e.g. UniProt, NCBI, and EMBL-EBI, provide a BLAST search. The BLAST algorithm was originally developed and published by Altschul et al. in 1990. \cite{Altschul_1990} BLAST is based on the assumption that good alignments contain short stretches of exact matches. BLAST was used in the CAFA2 challenge as a baseline measure.

BLAST requires a query sequence, a database of target sequences, and a minimum match score threshold. The algorithm consists of five steps:

1) Filter out regions of low complexity or repeated sequence, which can produce misleadingly high match scores. These regions are marked with an X and ignored by the BLAST algorithm.

2) Create a sequential list of all k-letter words in the query sequence.

3) Scan the database for matches against every k-letter word in the list, scoring them with a scoring matrix such as BLOSUM. Only matches scoring above the minimum match score threshold are retained.

4) Take the matches from the previous step, extend each word by one letter, and scan the database again.

5) Repeat the extension until a minimal set of matching sequences is obtained.
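
The word-listing, matching, and extension steps can be illustrated with a minimal Python sketch. It is a toy version under stated assumptions, not the NCBI implementation: words must match exactly, scoring is +1 per match and -1 per mismatch instead of a BLOSUM matrix, low-complexity filtering is not performed (words that already contain an X are simply skipped), and extension runs in both directions until the score falls more than `x_drop` below its best value. All function and parameter names (`kmer_words`, `find_seeds`, `extend`, `search`, `x_drop`, `min_score`) are hypothetical and chosen for illustration only.

```python
from collections import defaultdict


def kmer_words(seq, k):
    """List every k-letter word of seq together with its start position."""
    return [(seq[i:i + k], i) for i in range(len(seq) - k + 1)]


def find_seeds(query, target, k):
    """Find exact word matches (a simplification of matrix-based scoring)."""
    index = defaultdict(list)
    for word, pos in kmer_words(target, k):
        index[word].append(pos)
    return [(q_pos, t_pos)
            for word, q_pos in kmer_words(query, k)
            if "X" not in word  # skip words from masked regions
            for t_pos in index.get(word, [])]


def extend(query, target, q_pos, t_pos, k, match=1, mismatch=-1, x_drop=2):
    """Grow a seed one letter at a time in both directions.

    Extension in a direction stops once the running score falls more than
    x_drop below the best score seen; only the best-scoring part is kept.
    """
    def grow(q, t, step):
        score = best = best_len = n = 0
        while 0 <= q < len(query) and 0 <= t < len(target):
            score += match if query[q] == target[t] else mismatch
            n += 1
            if score > best:
                best, best_len = score, n
            elif best - score > x_drop:
                break
            q += step
            t += step
        return best, best_len

    right, r_len = grow(q_pos + k, t_pos + k, 1)
    left, l_len = grow(q_pos - 1, t_pos - 1, -1)
    score = k * match + left + right
    # Return the score and the half-open query interval covered by the hit.
    return score, q_pos - l_len, q_pos + k + r_len


def search(query, database, k=3, min_score=5):
    """Return (name, score, q_start, q_end) for the best hit per target."""
    hits = []
    for name, target in database.items():
        best = None
        for q_pos, t_pos in find_seeds(query, target, k):
            cand = extend(query, target, q_pos, t_pos, k)
            if best is None or cand[0] > best[0]:
                best = cand
        if best is not None and best[0] >= min_score:
            hits.append((name, *best))
    return sorted(hits, key=lambda h: -h[1])


if __name__ == "__main__":
    db = {"a": "GGMKTAYIAKQRGG", "b": "PPPPPPPP", "c": "MKTWWWWW"}
    print(search("MKTAYIAKQR", db))
```

In this toy database, target `a` contains the whole query and is reported, `b` shares no word with the query, and `c` shares only a single three-letter word whose extension fails, so its score stays below `min_score`. Indexing the target words in a dictionary is what makes the scan fast: each query word is looked up directly rather than compared against every position.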