3.1 | Comparison of different feature sets
Table 2 compares the predictive performance of the two prediction
systems on the three individual feature sets and on their combination by
a second-level SVM; all models were optimized using MCC as the fitness
function. In our experiments, the individual prediction model using the
sequence-based feature scheme outperforms the other two, and the model
using the micro-environment-based features performs better than the one
using the structure-based feature scheme. The outstanding performance of
the combined model built by the second-level SVM procedure shows that
the additional information is indeed very helpful for understanding and
determining cancer-related factors.
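As a minimal sketch of the MCC fitness function mentioned above, the
following Python code computes MCC, accuracy, and F1-score from binary
predictions. The function names and the label convention (1 for
cancer-related, 0 otherwise) are assumptions made for illustration, and
the optimizer that would call this fitness function is not shown.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (assumed: 1 = cancer-related, 0 = not)."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 0:
            tn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def accuracy(tp, tn, fp, fn):
    """Fraction of correctly classified samples."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def f1_score(tp, tn, fp, fn):
    """Harmonic mean of precision and recall for the positive class."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def fitness(y_true, y_pred):
    """Hypothetical fitness value for model selection: the MCC of the predictions."""
    return mcc(*confusion_counts(y_true, y_pred))
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it
stays informative when the two classes are imbalanced.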
On the other hand, between the two systems, CanSavPrewm performs better
than CanSavPrew on all three individual feature sets and on their
combination. This is because distinct training and prediction models
are built for each specific sub-group according to the wild-type and
mutated amino acid types of the SAV. Our best prediction system,
CanSavPrewm with a two-level SVM that combines sequence-, structure-,
and micro-environment-based features, can distinguish cancer-related
SAVs from non-cancer-related ones, with an accuracy, Matthews
correlation coefficient, and F1-score of 90.88%, 0.77, and 0.83,
respectively. The predictive performance of CanSavPrewm for each wild
type of SAV is shown in Table 3.
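The sub-group strategy described above can be sketched as follows: SAVs
are partitioned by amino acid type, one model is trained per sub-group,
and each query is routed to the model of its own sub-group. This is an
illustration under stated assumptions, not the authors' implementation.
The record layout (keys "wild", "mutant", "features"), the grouping key,
and the fallback label are hypothetical, and a trivial majority-class
model stands in for the SVMs used in the paper.

```python
from collections import Counter, defaultdict


class MajorityModel:
    """Hypothetical stand-in for a per-group classifier (the paper uses SVMs)."""

    def fit(self, X, y):
        # Remember the most frequent training label of this sub-group.
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


def group_key(sav):
    """Assumed grouping key: the wild-type residue.

    Returning (sav["wild"], sav["mutant"]) instead would group by the
    wild-type/mutated pair.
    """
    return sav["wild"]


def train_per_group(savs, labels, key=group_key, make_model=MajorityModel):
    """Train one model per sub-group of SAVs."""
    grouped = defaultdict(lambda: ([], []))
    for sav, label in zip(savs, labels):
        features, targets = grouped[key(sav)]
        features.append(sav["features"])
        targets.append(label)
    return {k: make_model().fit(X, y) for k, (X, y) in grouped.items()}


def predict_per_group(models, savs, key=group_key, default=0):
    """Route each SAV to its sub-group model; unseen groups get `default`."""
    predictions = []
    for sav in savs:
        model = models.get(key(sav))
        if model is None:
            predictions.append(default)
        else:
            predictions.append(model.predict([sav["features"]])[0])
    return predictions
```

Any classifier exposing fit and predict can be passed as make_model, so
the stand-in can be swapped for an SVM without changing the routing
logic.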