A novel method for determining the non-CDS region by using
error-correcting codes
Abstract
Our motivating question is: “Is there any relation between non-coding
regions and useless error-correcting codes?” We focus on CDS and non-CDS
regions rather than exons and introns, because the CDS takes part in
protein synthesis and is contained within the exons. The gene data were
obtained from NCBI \cite{ncbi}. In this study,
we introduce the method Fi-noncds, which determines the non-CDS region
by using error-correcting codes. We found that error-correcting codes
that cannot correct any errors, which we call zero error-correcting
codes, are densely located in non-CDS regions. This result shows that
non-CDS regions (non-coding areas in DNA) match zero error-correcting
codes (useless error-correcting codes). Frame lengths
7 through 14 were tested with the method. The optimal result for the
selected genes (TRAV1-1, TRAV1-2, TRAV2, TRAV7, WRKY33, HY5, GR-RBP2)
is frame length 8 with $n=7$, $k=2$, $dnaNo=1$. Moreover, the optimal
results of the Fi-noncds algorithm match the best sequence length of 8
reported in [Lichtenberg, Jens; Yilmaz, Alper; Welch, Joshua D.; Kurz,
Kyle; Liang, Xiaoyu; Drews, Frank; Ecker, Klaus; Lee, Stephen S.;
Geisler, Matt; Grotewold, Erich; and Welch, Lonnie R., The word
landscape of the non-coding segments of the Arabidopsis thaliana genome,
Bell Labs Tech. J, Volume 10, no 1].
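
To make the notion of a zero error-correcting code concrete, the following minimal Python sketch checks whether a set of equal-length codewords can correct any errors. It relies only on the standard fact that a block code with minimum Hamming distance $d$ corrects $t=\lfloor (d-1)/2 \rfloor$ errors, so a code with $d \le 2$ corrects none. The function names and the example codewords are illustrative assumptions; they are not taken from Fi-noncds, and how the method derives codewords from DNA frames is not shown here.

```python
from itertools import combinations


def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length words differ."""
    if len(a) != len(b):
        raise ValueError("codewords must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_distance(codewords: list[str]) -> int:
    """Minimum pairwise Hamming distance of a code (needs >= 2 codewords)."""
    if len(codewords) < 2:
        raise ValueError("need at least two codewords")
    return min(hamming_distance(a, b) for a, b in combinations(codewords, 2))


def correctable_errors(codewords: list[str]) -> int:
    """Error-correcting capability t = floor((d - 1) / 2)."""
    return (min_distance(codewords) - 1) // 2


def is_zero_error_correcting(codewords: list[str]) -> bool:
    """True when the code cannot correct even a single error (t == 0)."""
    return correctable_errors(codewords) == 0


# Hypothetical codewords over the DNA alphabet, chosen only for illustration.
weak_code = ["AAAA", "AACC", "CCAA"]   # minimum distance 2
strong_code = ["AAAAA", "CCCCC"]       # minimum distance 5

print(is_zero_error_correcting(weak_code))
print(correctable_errors(strong_code))
```

In this sketch, `weak_code` is zero error-correcting because two of its codewords differ in only two positions, whereas `strong_code` can correct up to two errors.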