Authorea

Amirali Sharifian edited subsection_Comparision_algorithm_For_comparing__.tex over 8 years ago

Commit id: d8497155df78ad5cd990e1753da6bc2735edc9a4


Since each base pair is encoded with three bits, we XOR the two strings separately at each of the three levels (one per code bit) and then combine the three partial results with a bitwise OR. The result is a processor word in which each bit tells us whether the corresponding \textit{bp}s of the two strings match (0) or mismatch (1). The next step is to count the mismatches between the two strings, i.e., the number of ones in this word. Because each string has been split into multiple segments, the last step sums the per-segment counts and compares the total with the user-defined threshold \textit{e}. Figure~\ref{fig:fig3} illustrates this procedure.
\begin{figure}
\end{figure}

\subsection{Filtering}
As mentioned in the introduction, the user-defined error threshold is usually only $5\%$ of the read length, and reads are usually 80 to 120~bps long, so the error threshold is no larger than 5 or 6 errors. With such a tight threshold, nearly $98\%$ of candidate mappings turn out to be incorrect. Without filtering, therefore, approximately $98\%$ of our computation would be wasted. The first filtering step is to compute the number of errors for each segment and check it against \emph{e} immediately. Instead of summing the results of all segments and comparing the sum with \emph{e} only in the last step, we check the threshold separately after each segment. This costs extra operations for correct reads. Rather than counting the ones in the result vector once and comparing the count with the error threshold, we count the ones and compare with the threshold once per segment.
However, these additional computations are spent on only about $2\%$ of all reads. For the remaining reads we save instructions, because after each segment we can decide whether the error threshold has already been exceeded. If it has, we stop comparing that read and start comparing the next read with the reference. Given the processor word length (\emph{w}) and the read length (\emph{l}), we can compute the number of segments each read is split into, which bounds the number of per-segment comparison steps for that read: \begin{formula} n_{segments} = \lceil l / w \rceil \end{formula}
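The comparison and early-termination steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The 3-bit codes in \texttt{CODE} are hypothetical, since the text only says that each bp uses three bits. The three levels are modelled as bit planes held in Python integers of \emph{w} bits. The read and the reference window are assumed to have the same length. The helper names (\texttt{n\_segments}, \texttt{to\_segments}, \texttt{mismatch\_word}, \texttt{within\_threshold}) are also hypothetical. The filter keeps a running mismatch total and checks it against \emph{e} after every segment, which is one reading of the per-segment check.

```python
from math import ceil

# Hypothetical 3-bit codes: the text only states that each bp uses three bits.
CODE = {"A": 0b000, "C": 0b001, "G": 0b010, "T": 0b011, "N": 0b100}
LEVELS = 3  # one bit plane ("level") per code bit


def n_segments(read_len, w):
    """Number of w-bp segments needed for a read of length read_len."""
    return ceil(read_len / w)


def to_segments(seq, w):
    """Split seq into w-bp segments, each stored as LEVELS bit planes.

    Bit i of plane k holds bit k of the code of the i-th bp of the segment.
    Python ints are unbounded; w stands in for the processor word length.
    """
    segments = []
    for start in range(0, len(seq), w):
        planes = [0] * LEVELS
        for i, bp in enumerate(seq[start:start + w]):
            code = CODE[bp]
            for k in range(LEVELS):
                planes[k] |= ((code >> k) & 1) << i
        segments.append(planes)
    return segments


def mismatch_word(seg_a, seg_b):
    """XOR each level separately, then OR the results.

    Bit i of the returned word is 1 iff the i-th bps of the two segments differ.
    """
    word = 0
    for plane_a, plane_b in zip(seg_a, seg_b):
        word |= plane_a ^ plane_b
    return word


def within_threshold(read, ref, w, e):
    """Return (accepted, segments_compared) for two equal-length strings.

    The running mismatch count is checked against e after every segment,
    so the comparison of a rejected read stops as soon as e is exceeded.
    """
    segs_read = to_segments(read, w)
    segs_ref = to_segments(ref, w)
    errors = 0
    for done, (sa, sb) in enumerate(zip(segs_read, segs_ref), start=1):
        errors += bin(mismatch_word(sa, sb)).count("1")  # popcount
        if errors > e:
            return False, done
    return True, len(segs_read)


read = "ACGT" * 25                               # l = 100
print(n_segments(len(read), 64))                 # 2
print(within_threshold(read, read, 64, 5))       # (True, 2)
```

For example, with \emph{l}=100 and \emph{w}=64, $n_{segments} = \lceil 100/64 \rceil = 2$. A read whose first segment already contains more than \emph{e} mismatches is rejected after one segment, while a read that passes pays one extra comparison with \emph{e} per segment.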