Identification of Splicing Sites Via Integrated Features and Support Vector Machine
  • Kamran Ullah,
  • Salman Ahmad khan,
  • Tariq Hussain,
  • Muhammad Kabir,
  • Bailin Yang,
  • iqtidar ali
Kamran Ullah
The University of Agriculture Peshawar
Salman Ahmad khan
The University of Agriculture Peshawar
Tariq Hussain
Zhejiang Gongshang University

Corresponding Author:[email protected]

Muhammad Kabir
University of Management and Technology
Bailin Yang
Zhejiang Gongshang University
iqtidar ali
The University of Agriculture Peshawar

Abstract

Gene splicing plays an important role in protein diversity. In eukaryotic gene expression, exon sequences are usually interrupted by introns, so the introns must be removed completely and the remaining exons joined together. Because of the importance of this process in genetic engineering, identifying splicing sites is highly desirable. With conventional experimental approaches this is difficult and, in some situations, impossible. As the number of genome sequences grows exponentially, developing a precise, reliable, and robust computational approach for fast prediction of splicing sites remains a challenge. In the current study, an integrated feature space of Kmer, RevKmer, and Pseudo Trinucleotide Composition (PseTNC) is used to extract features that numerically describe each biological sample. These features were then passed to three classification algorithms: random forest, k-nearest neighbor, and support vector machine (SVM). Evaluated with the jackknife test, the proposed model achieved promising results of 93.92% and 96.39% on datasets S1 and S2, respectively. The identification performance of the current model (TargetSS) is better than that of existing methods. We conclude that the proposed model for splicing site identification will be a useful tool for bioinformatics, computational biology, molecular biology, and drug discovery applications.
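As an illustration of the kind of sequence encoding the abstract names, the following is a minimal sketch of Kmer and RevKmer feature extraction. It is not the authors' implementation: the choice of k = 2, the normalization by the number of windows, and the function names `kmer_features` and `revkmer_features` are all assumptions made for this example. PseTNC is left out because it depends on physicochemical property values that the abstract does not give.

```python
from itertools import product

# Watson-Crick complements for the DNA alphabet.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def kmer_features(seq, k=2):
    """Normalized frequency of every k-mer, in lexicographic order (assumed layout)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = len(seq) - k + 1  # number of sliding windows
    for i in range(total):
        counts[seq[i:i + k]] += 1
    return [counts[m] / total for m in kmers]


def revkmer_features(seq, k=2):
    """Like kmer_features, but a k-mer and its reverse complement share one bin."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Represent each pair by its lexicographically smaller member.
    canonical = sorted({min(m, reverse_complement(m)) for m in kmers})
    counts = dict.fromkeys(canonical, 0)
    total = len(seq) - k + 1
    for i in range(total):
        m = seq[i:i + k]
        counts[min(m, reverse_complement(m))] += 1
    return [counts[m] / total for m in canonical]


# Hypothetical toy sequence, only to show the vector sizes.
sample = "ACGTACGT"
integrated = kmer_features(sample) + revkmer_features(sample)
print(len(kmer_features(sample)), len(revkmer_features(sample)), len(integrated))
```

For k = 2 this yields 16 Kmer values and 10 RevKmer values, since the four palindromic dinucleotides (AT, TA, CG, GC) are their own reverse complements. A concatenated vector of this kind is what would be handed to a classifier such as an SVM.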