4.4 Model evaluation criteria
Accuracy is the ratio of the number of correctly classified samples to the total number of samples. The higher the accuracy, the better the model performance.
Precision is the proportion of samples predicted as positive by the model that are actually positive, which reflects the model's ability to avoid misclassifying negative samples as positive. The higher the precision, the more reliable the model's positive predictions.
Recall is the proportion of all positive samples in the test set that are correctly identified as positive, which reflects the ability of the classifier to find all positive samples. The higher the recall, the fewer positive samples the model misses.
F1-score is the harmonic mean of precision and recall. The higher the F1-score, the better the model balances precision and recall.
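In terms of confusion-matrix counts, the four metrics above are conventionally written as follows. These are the standard textbook forms, given here on the assumption that they match the definitions intended in this section.

```latex
\[
\begin{aligned}
\text{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN} \\
\text{Precision} &= \frac{TP}{TP + FP} \\
\text{Recall}    &= \frac{TP}{TP + FN} \\
\text{F1-score}  &= \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}
\end{aligned}
\]
```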
In these definitions, TP denotes true positives, that is, positive samples correctly identified as positive. TN denotes true negatives, that is, negative samples correctly identified as negative. FP denotes false positives, that is, negative samples incorrectly identified as positive. FN denotes false negatives, that is, positive samples incorrectly identified as negative.
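The relationship between the four counts and the four metrics can be sketched in a few lines of Python. This is a minimal illustration, not the evaluation code used for the experiments: the function name `classification_metrics`, the `Metrics` container, and the counts in the usage lines are all hypothetical, and returning 0.0 for an undefined ratio is a convention chosen here.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> Metrics:
    """Compute accuracy, precision, recall and F1-score from confusion counts.

    Ratios with a zero denominator are reported as 0.0 (an assumed convention).
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    pr_sum = precision + recall
    f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
    return Metrics(accuracy, precision, recall, f1)


# Hypothetical counts, chosen only to illustrate the calculation.
m = classification_metrics(tp=9, tn=7, fp=1, fn=3)
print(f"accuracy={m.accuracy:.2f} precision={m.precision:.2f} "
      f"recall={m.recall:.2f} f1={m.f1:.2f}")
```

With these made-up counts, precision (9 of 10 predicted positives are correct) is higher than recall (9 of 12 actual positives are found), and the F1-score falls between the two.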
AP (average precision) is the area under the precision-recall (PR) curve of a single class. The higher the AP value, the better the model performs on that class.
The mAP (mean average precision) is the mean of the AP values over all classes and lies in the range [0, 1]. The larger the value, the better the model performance.
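Written out, AP and mAP take the following standard forms. The integral form of AP and the summation limits are assumptions made here so that the index i runs over the n defect classes; in practice the integral is approximated by a sum over the sampled points of the PR curve.

```latex
\[
AP = \int_{0}^{1} P(R)\, dR,
\qquad
mAP = \frac{1}{n} \sum_{i=0}^{n-1} AP_i
\]
```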
Here, n denotes the number of defect types on the surface of the Si3N4 ceramic bearing inner ring, that is, n = 4, and i denotes the index of the defect type, taking integer values in [0, 3].
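As an illustration of how the area under a PR curve can be computed, the sketch below uses all-point interpolation, the scheme popularized by the Pascal VOC benchmark. The text does not state which interpolation was used in the experiments, so this choice is an assumption, and the function names and PR points are hypothetical.

```python
from typing import Sequence


def average_precision(recalls: Sequence[float],
                      precisions: Sequence[float]) -> float:
    """Area under the PR curve with all-point interpolation.

    `recalls` must be sorted in ascending order, and `precisions[k]`
    is the precision measured at `recalls[k]`.
    """
    # Sentinel points close the curve at recall 0 and recall 1.
    r = [0.0] + list(recalls) + [1.0]
    p = [0.0] + list(precisions) + [0.0]
    # Replace each precision by the maximum precision at any higher recall,
    # which makes the curve monotonically non-increasing.
    for k in range(len(p) - 2, -1, -1):
        p[k] = max(p[k], p[k + 1])
    # Sum the rectangle areas between consecutive recall values.
    return sum((r[k + 1] - r[k]) * p[k + 1] for k in range(len(r) - 1))


def mean_average_precision(aps: Sequence[float]) -> float:
    """Mean of the per-class AP values."""
    return sum(aps) / len(aps)


# Hypothetical two-point PR curve: precision 1.0 up to recall 0.5,
# then precision 0.5 up to recall 1.0.
ap = average_precision([0.5, 1.0], [1.0, 0.5])
print(ap)  # 0.75
```

For this toy curve the area is 0.5 × 1.0 + 0.5 × 0.5 = 0.75. A detector that keeps precision at 1.0 over the whole recall range would obtain an AP of 1.0.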
The performance of the improved RetinaNet model was verified on the Si3N4 ceramic bearing inner ring surface defect dataset, and the evaluation results are shown in Table 2. Overall, the model achieves a precision of 98.19%, a recall of 92.59%, an F1-score of 0.95, and a mAP of 91.84%. By class, the AP values are 100% for pit, 79.11% for crack, 96.43% for wear, and 91.82% for snowflake. Pit and snowflake have the highest recall, both at 100.00%, followed by wear at 96.43%, while crack has the lowest recall, at 73.91%. The F1-scores are 1.00 for pit, 0.98 for wear, 0.95 for snowflake, and 0.85 for crack. Pit, crack, and wear all reach a precision of 100.00%, and snowflake has a precision of 90.91%.
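The reported figures can be cross-checked against the definitions with a short script. The sketch below uses only the per-class values quoted in this section. The dictionary layout and the helper name `f1_score` are illustrative choices, not part of the original evaluation code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given as fractions)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Per-class values quoted in the text, in percent.
reported = {
    "pit":       {"ap": 100.00, "precision": 100.00, "recall": 100.00},
    "crack":     {"ap": 79.11,  "precision": 100.00, "recall": 73.91},
    "wear":      {"ap": 96.43,  "precision": 100.00, "recall": 96.43},
    "snowflake": {"ap": 91.82,  "precision": 90.91,  "recall": 100.00},
}

# mAP is the mean of the four per-class AP values.
m_ap = sum(v["ap"] for v in reported.values()) / len(reported)
print(f"mAP = {m_ap:.2f}%")

# F1-score per class, recomputed from precision and recall.
for name, v in reported.items():
    f1 = f1_score(v["precision"] / 100, v["recall"] / 100)
    print(f"{name}: F1 = {f1:.2f}")
```

The recomputed mAP and per-class F1-scores agree with the values given in the text to two decimal places (91.84% for mAP; 1.00, 0.85, 0.98, and 0.95 for pit, crack, wear, and snowflake).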
Table 2 Evaluation results of the improved RetinaNet on the surface defect dataset of the Si3N4 ceramic bearing inner ring