3.1 | Comparison of different feature sets

Table 2 compares the predictive performance of the two prediction systems on three different feature sets and on their combination by a second-level SVM; all models were optimized using MCC as the fitness function. In our experiments, the individual prediction model using the sequence-based feature scheme outperforms the other two, and the model using the micro-environment-based features performs better than the one using the structure-based feature scheme. The combined model built by the second-level SVM procedure shows outstanding performance, indicating that the additional information is indeed very helpful for understanding and determining cancer-related factors.
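As an illustration of using MCC as a fitness function, the following minimal sketch scores candidate configurations by MCC and keeps the best one. It is not the optimization procedure used in this work: the names `confusion_counts`, `mcc`, and `best_by_mcc`, the candidate labels `setting_a` and `setting_b`, and the toy predictions are all hypothetical and serve only to show the computation.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = cancer-related, 0 = not)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def mcc(y_true, y_pred):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def best_by_mcc(candidates, y_true):
    """Return (name, score) of the candidate whose predictions give the highest MCC.

    `candidates` maps a hypothetical configuration name to its predicted labels.
    """
    scored = {name: mcc(y_true, preds) for name, preds in candidates.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]


# Toy labels and predictions, invented for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
candidates = {
    "setting_a": [1, 1, 0, 0, 0, 0, 0, 1],
    "setting_b": [1, 1, 1, 0, 0, 0, 1, 1],
}
best_name, best_score = best_by_mcc(candidates, y_true)
print(best_name, round(best_score, 3))
```

Because MCC uses all four cells of the confusion matrix, it is a reasonable fitness choice when the two classes are imbalanced, where accuracy alone can be misleading.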
On the other hand, between the two systems, CanSavPrewm performs better than CanSavPrew on all three individual feature sets and on the combined model. This is because distinct training and prediction models are built for specific sub-groups according to the wild-type and mutated amino acid types of the SAV. Our best prediction system, CanSavPrewm with the two-level SVM combining sequence-, structure-, and micro-environment-based features, can distinguish whether or not SAVs are related to cancer, reaching an accuracy, Matthews correlation coefficient, and F1-score of 90.88%, 0.77, and 0.83, respectively. The predictive performance of CanSavPrewm for each wild-type amino acid of the SAVs is shown in Table 3.
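The idea of building separate models per sub-group can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of either system: the grouping keys (wild-type residue alone versus wild-type/mutant pair) are our reading of the text above, `MajorityModel` is a placeholder standing in for a trained SVM, and the records, field names, and fallback to a global model are hypothetical.

```python
from collections import Counter, defaultdict


class MajorityModel:
    """Placeholder classifier (NOT an SVM): predicts the most common training label."""

    def fit(self, labels):
        self.label = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, _features):
        return self.label


def by_wild(rec):
    """Group key: wild-type amino acid only."""
    return rec["wild"]


def by_wild_and_mutant(rec):
    """Group key: (wild-type, mutated) amino acid pair."""
    return (rec["wild"], rec["mutant"])


def train_subgroup_models(records, key):
    """Train one model per sub-group, plus a global fallback for unseen groups."""
    groups = defaultdict(list)
    for rec in records:
        groups[key(rec)].append(rec["label"])
    models = {group: MajorityModel().fit(labels) for group, labels in groups.items()}
    fallback = MajorityModel().fit([rec["label"] for rec in records])
    return models, fallback


def predict_sav(rec, models, fallback, key):
    """Route a variant to the model of its sub-group, else to the fallback."""
    return models.get(key(rec), fallback).predict(rec["features"])


# Toy records, invented for illustration (label 1 = cancer-related, 0 = not).
records = [
    {"wild": "R", "mutant": "H", "features": [], "label": 1},
    {"wild": "R", "mutant": "H", "features": [], "label": 1},
    {"wild": "R", "mutant": "K", "features": [], "label": 0},
    {"wild": "R", "mutant": "K", "features": [], "label": 0},
    {"wild": "R", "mutant": "K", "features": [], "label": 0},
    {"wild": "G", "mutant": "A", "features": [], "label": 0},
]
query = {"wild": "R", "mutant": "H", "features": []}

w_models, w_fallback = train_subgroup_models(records, by_wild)
wm_models, wm_fallback = train_subgroup_models(records, by_wild_and_mutant)
pred_w = predict_sav(query, w_models, w_fallback, by_wild)
pred_wm = predict_sav(query, wm_models, wm_fallback, by_wild_and_mutant)
print(pred_w, pred_wm)
```

In this toy data the wild-type-only grouping mixes two substitution patterns with different labels, whereas the finer grouping separates them, which illustrates one way that sub-group-specific models could yield better predictions.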