Authorea

Amirali Sharifian edited subsection_Comparision_algorithm_For_comparing__.tex over 8 years ago

Commit id: d4f165933f50b9d03cf5128ff2f995c463bd5a20

deletions | additions

\end{figure} \subsection{Filtering} As we mentioned in the introduction part user defined error threshold is usually equal to just between $0\%$ and $5\%$ of mappings length and read's length is usually between 80 till 120 bps which means our error threshold won't be bigger than 5 or 6 errors and as a result nearly $98\%$ of our mappings would be incorrect. These numbers show that if we don't filter out our data approximately $98\%$ of our computation would be useless. First step for filtering out our data is to computing number of errors and comparing it with \emph{e} for each segment. Instead of summing up segment's results and do comparision with \emph{e} in the last step we can compare each segment's result with \emph{e} sepratly. Therefore, we are doing more operations for correct reads since instead of counting number of ones in the result vector once and then compare it with the error threshold we are counting number of ones per each segment and then compare it with error thershold. But the point is we are doing these additional computations for just $2\%$ of our total reads and for the rest of the reads we are saving instructions because now per each segment we can decide that whether we have passed error threshold or not and if yes we can stop comparision operation for that read and start comparing next read with our reference. Now base on our word processor's length (\emph{w}) and our read's length (\emph{l}) we can compute number of operations we are doing for each read.

\end{equation} Now number of operations we are doing to comparing reads with a reference would be depend on the size of our word processor. In modern CPUs $w$ is usually 32 or 64 (and if we use SIMD operations our register's length can be 128 or more). For instance if our word processor length is equal to 64 and our read's length is 100 then first segment would contain $64\%$ of our read. In other words we still using big portion of reads to determine should we continue our comparison procedure or should we stop since even in the first $64\%$ number of errors exceed \emph{e}. According to Figure ... using first $64\%$ of each reads can filter out nearly .... of our data with comparing just first segment. segment.\\ But what if we can answer this fact question with even less comparison. For instance with what if we use only using $16\%$ first 16 bps of our read and comparing compare it with the reference. Figure ... shows that number of reads going to filter won't change significantly and the number would stay the same. However, if we use first 16bps we will have more segments and as a result number of operation we should do for rest of our reference reads will increase but since we can answer the question filter out more data it will still have positive effect on our performance.\\ Now if we get same result from comparing 16 first bits in compare with first 32 bits then we can reduce number of bits transferring with just using 16 bits.