Fig. 1 Ranking of variable importance of RSI in the Random Forest model.
Feature selection and clinical signature development
The clinical variables comprised 32 categories (see Section 2.2 for details). To construct the clinical signature (CLI), we followed the same procedure as in the previous section. First, six machine learning algorithms (Gradient Boosting, Support Vector Machine, AdaBoost, Random Forest, K-Nearest Neighbor, and Neural Network) were evaluated, and the Support Vector Machine model yielded the best prediction results (Table 3).
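The six-model comparison described above can be sketched as follows. This is a minimal illustration and not the study's actual pipeline. It assumes scikit-learn, a synthetic dataset standing in for the 32 clinical variables, a single hold-out split, and default hyperparameters; none of these choices are stated in the text.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: synthetic stand-in for the clinical table (32 variables, binary outcome).
X, y = make_classification(
    n_samples=300, n_features=32, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# The six algorithm families named in the text, with default settings (assumption).
# Scale-sensitive models are wrapped in a standardization pipeline.
models = {
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "Support Vector Machine": make_pipeline(StandardScaler(), SVC(random_state=0)),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "K-Nearest Neighbor": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "Neural Network": make_pipeline(
        StandardScaler(), MLPClassifier(max_iter=1000, random_state=0)
    ),
}

# Fit each model on the training split and record hold-out accuracy.
accuracies = {
    name: model.fit(X_train, y_train).score(X_test, y_test)
    for name, model in models.items()
}
best_model = max(accuracies, key=accuracies.get)

for name, acc in sorted(accuracies.items(), key=lambda item: -item[1]):
    print(f"{name}: {acc:.4f}")
print("Best model on this synthetic split:", best_model)
```

On synthetic data the ranking of models carries no clinical meaning; the sketch only shows how such a comparison can be organized.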
The Support Vector Machine model achieved a prediction accuracy of 0.8627 (95% CI: 0.8078-0.9068). The variable importance ranking of the features in the Support Vector Machine model is shown in Fig. 2. The Support Vector Machine algorithm was then applied to the clinical data of each patient to extract the corresponding CLI.
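A confidence interval for an accuracy such as the one reported above can be obtained from the number of correct predictions and the sample size. The text does not state which interval method was used, so the sketch below assumes an exact binomial (Clopper-Pearson) interval. The helper names and the counts in the usage example are hypothetical and are not taken from the study.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def solve_increasing(f, target, iters=60):
    """Bisection on [0, 1] for f(p) = target, where f is increasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) interval for a proportion k / n."""
    # Lower bound: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else solve_increasing(
        lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2
    )
    # Upper bound: the p at which P(X <= k) equals alpha / 2,
    # i.e. P(X > k) equals 1 - alpha / 2.
    upper = 1.0 if k == n else solve_increasing(
        lambda p: 1 - binom_cdf(k, n, p), 1 - alpha / 2
    )
    return lower, upper


# Hypothetical counts for illustration only: 88 correct out of 100 cases.
correct, total = 88, 100
low, high = clopper_pearson(correct, total)
print(f"accuracy = {correct / total:.4f}, 95% CI: {low:.4f}-{high:.4f}")
```

Bisection on the binomial CDF keeps the sketch dependency-free. For large samples a statistics library's beta quantile function would be the more usual route to the same interval.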
Table 3 Machine learning outcomes of CLI.