2.3.3 Random Forest Model
Analogously, this paper also applies a random forest model. The relationships among the 204 properties calculated by the GFA algorithm were analyzed using the Pearson correlation coefficient matrix, and the results were visualized with the Python package Yellowbrick. Principal component analysis (in both 2D and 3D)32 was used to analyze the relationships between attributes for the 31-composite data set. The data set was preprocessed in the following steps. First, the numerical values of the properties were scaled to the range 0 to 1, and the 54 properties with a variance greater than 0.05 were selected. Next, the data were standardized to a mean of 0 and a variance of 1. Finally, nine properties were obtained using Lasso feature selection. The Pearson correlation coefficient matrix heatmap of these nine properties supported the credibility of the results.
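The preprocessing steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used in this work: the use of scikit-learn, the synthetic descriptor matrix `X`, the target `y`, and the Lasso penalty `alpha=0.1` are all hypothetical choices, so the number of properties retained at each step will differ from the 54 and 9 reported here.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold
from sklearn.linear_model import Lasso
from sklearn.preprocessing import MinMaxScaler, StandardScaler

rng = np.random.default_rng(0)

# Hypothetical stand-in for the real data: 31 samples x 204 properties.
# The first 104 columns are spread evenly; the last 100 are nearly
# constant apart from a single outlier, so they have low variance.
X = rng.uniform(size=(31, 204))
X[:, 104:] *= 0.01
X[0, 104:] = 1.0
# Hypothetical target that depends on three of the properties.
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + X[:, 2] + 0.05 * rng.normal(size=31)

# Step 1: scale every property to the range [0, 1].
X01 = MinMaxScaler().fit_transform(X)

# Step 2: keep only properties whose variance exceeds 0.05.
X_var = VarianceThreshold(threshold=0.05).fit_transform(X01)

# Step 3: standardize to zero mean and unit variance.
X_std = StandardScaler().fit_transform(X_var)

# Step 4: Lasso feature selection; properties with a nonzero
# coefficient are retained (alpha is an assumed value).
lasso = Lasso(alpha=0.1, max_iter=10000).fit(X_std, y)
selected = np.flatnonzero(lasso.coef_)
X_sel = X_std[:, selected]

# Pearson correlation coefficient matrix of the retained properties.
corr = np.atleast_2d(np.corrcoef(X_sel, rowvar=False))

print("after variance filter:", X_var.shape[1])
print("after Lasso selection:", selected.size)
```

The resulting `corr` array is what a heatmap tool would display. In practice the Lasso penalty would be tuned (for example by cross-validation) rather than fixed, since it controls how many properties survive the selection.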