Authorea

Amirali Sharifian edited Many_mechanisms_have_been_proposed__.tex over 8 years ago

Commit id: 7c48636f2f99d25710dae1bdcdad9e7b38803008

deletions | additions

VHD combines two techniques to achieve the goals. First it uses bit-parallel method to store and run the tools. Bit-parallel method is designed to fully utilize the entire width of the processor words to reduce the number of instructions that are needed to process data. Second, we apply our filter on our data to reduce number of computation meanwhile our bit-parallel algorithm is running. Morover to increase our filtering power we consider how A,C,G,T should code to have better filtering. Finally we introduce another technique to predict next data that is going to fetch. \subsection{storage layout} The VHD storage layout is inspired by the bit-sliced method \cite{O_Neil_1997}. In VHD, each sequence break down to fixed-length segments, each of which contains w codes( w is width of processor word). If we code our alphabet with \emph{k} bits then, the \emph{w} k-bit codes in a segment are then transposed into \emph{k} w-bit words. In Figure\ref{fig:fig2} there is an example to how we transpose our data. Data genomic data sets have five character A,C,G,T,N(unknown). We have coded each character with three bits, A(001), C(010), G(011), T(100). Our read in the example is "AACGTTGAAACG" and its length is 12. We assumed our word processor length is equal to 8. With these assumptions we can divide our read into two segments. Sincewe our string's length is 12 in the second segment we will have four free slots, we fill those slots with zero. As a result, before segmenting our code we had 12 characters each of them with length of three. But after segmenting in each segment we have three 8-bit words. With this approach The first word keeps the most significant bits of the codes, second one keeps the middle bits and last one keeps the less significant bits. Morover inside a segment, the k words are physically stored in a continuous memory space.