4.4 Model evaluation criteria
Accuracy is the ratio of the number of correctly classified samples to
the total number of samples. The higher the accuracy, the better the
model performance.
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{3}$$
Precision is the proportion of samples predicted as positive by the
model that are actually positive, which reflects the model's ability to
avoid labelling negative samples as positive. The higher the precision,
the fewer false positives the model produces.
$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{4}$$
Recall is the proportion of all positive samples in the test set that
are correctly identified as positive. It represents the ability of the
classifier to find all positive samples.
$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{5}$$
F1-score is the harmonic mean of precision and recall. The higher the
F1-score, the more robust the model is.
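$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{6}$$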
where TP (true positives) is the number of positive samples correctly
identified as positive, TN (true negatives) is the number of negative
samples correctly identified as negative, FP (false positives) is the
number of negative samples incorrectly identified as positive, and FN
(false negatives) is the number of positive samples incorrectly
identified as negative.
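The four metrics above can be sketched directly from the confusion-matrix counts. The following is a minimal illustration, not the authors' evaluation code; the function name `classification_metrics` and the example counts are hypothetical and are not taken from the paper's dataset.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return (accuracy, precision, recall, f1) from confusion-matrix counts.

    Each ratio falls back to 0.0 when its denominator is zero.
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, precision, recall, f1


# Hypothetical counts, for illustration only (not from the paper).
acc, prec, rec, f1 = classification_metrics(tp=8, tn=9, fp=2, fn=1)
print(f"accuracy={acc:.4f} precision={prec:.4f} recall={rec:.4f} f1={f1:.4f}")
```

Because F1 is a harmonic mean, it is pulled toward the smaller of precision and recall, so a model cannot obtain a high F1-score by maximizing only one of the two.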
AP (average precision) is the area under the precision-recall (PR)
curve of a single class. The better the model, the higher the AP value.
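As a rough sketch of how the area under the PR curve can be computed for one class, the example below ranks detections by confidence and accumulates precision and recall. It assumes all-point interpolation (a monotone precision envelope); the paper does not state which interpolation scheme it uses. The function name `average_precision` and the sample detections are hypothetical.

```python
def average_precision(scores, is_tp, num_gt):
    """Area under the precision-recall curve for a single class.

    scores: confidence score of each detection.
    is_tp:  True if the detection matches a ground-truth object.
    num_gt: number of ground-truth objects of this class.
    Assumption: all-point interpolation of the PR curve.
    """
    if num_gt == 0 or not scores:
        return 0.0
    # Rank detections from most to least confident.
    order = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for k in order:
        if is_tp[k]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Precision envelope: make precision non-increasing as recall grows.
    for j in range(len(precisions) - 2, -1, -1):
        precisions[j] = max(precisions[j], precisions[j + 1])
    # Sum rectangle areas over each increase in recall.
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


# Hypothetical detections: four predictions, four ground-truth objects.
ap = average_precision(
    scores=[0.9, 0.8, 0.7, 0.6],
    is_tp=[True, False, True, True],
    num_gt=4,
)
print(f"AP = {ap:.4f}")
```

In this toy case one ground-truth object is never detected, so recall stops at 0.75 and the AP cannot reach 1 even though most ranked detections are correct.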
The mAP is the mean of the per-class AP values and lies in the range
[0, 1]. The larger the value, the better the model performance.
$$\mathrm{mAP} = \frac{1}{n} \sum_{i=0}^{n-1} AP_i \tag{7}$$
where n denotes the number of defect types on the surface of the Si3N4
ceramic bearing inner ring, that is, n = 4, and i denotes the index of
the defect type, taking integer values in [0, 3].
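As a worked check of the mAP definition, the sketch below averages the four per-class AP values reported in this section. The helper name `mean_average_precision` is hypothetical, and the order of the classes is an assumption, since the text does not say which index i belongs to which defect type.

```python
def mean_average_precision(ap_values):
    """Arithmetic mean of per-class AP values."""
    return sum(ap_values) / len(ap_values)


# Per-class AP values (in percent) as reported in this section.
# The class order is an assumption; it does not affect the mean.
ap_per_class = {
    "pit": 100.00,
    "crack": 79.11,
    "wear": 96.43,
    "snowflake": 91.82,
}

map_value = mean_average_precision(list(ap_per_class.values()))
print(f"mAP = {map_value:.2f}%")
```

The result agrees with the reported mAP of 91.84%.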
The performance of the improved RetinaNet model is verified on the
Si3N4 ceramic bearing inner ring surface defect dataset, and the
evaluation results are shown in Table 2. The improved RetinaNet model
performs well on the dataset, with a recall of 92.59%, a precision of
98.19%, an F1-score of 0.95, and a mAP of 91.84%. By class, the AP is
100% for pit, 96.43% for wear, 91.82% for snowflake, and 79.11% for
crack. Pit and snowflake have the highest recall, both at 100.00%,
followed by wear at 96.43%, while crack has the lowest recall at 73.91%.
Pit has the highest F1-score at 1.00, followed by wear at 0.98,
snowflake at 0.95, and crack at 0.85. Pit, crack, and wear all have a
precision of 100.00%, and snowflake has a precision of 90.91%.
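The per-class F1-scores can be cross-checked against the per-class precision and recall with the harmonic-mean definition. The sketch below is only a consistency check on the figures quoted above; the names `f1_score`, `reported`, and `recomputed` are hypothetical.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both given in [0, 1])."""
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0


# (precision %, recall %, reported F1) per class, as quoted in the text.
reported = {
    "pit": (100.00, 100.00, 1.00),
    "crack": (100.00, 73.91, 0.85),
    "wear": (100.00, 96.43, 0.98),
    "snowflake": (90.91, 100.00, 0.95),
}

recomputed = {
    name: f1_score(p / 100, r / 100) for name, (p, r, _) in reported.items()
}

for name, (_, _, f1_reported) in reported.items():
    print(f"{name}: recomputed F1 = {recomputed[name]:.2f}, reported = {f1_reported:.2f}")
```

For crack, the class with the lowest recall, a precision of 100.00% combined with a recall of 73.91% gives an F1-score of about 0.85, which shows how the F1-score is limited by the weaker of the two quantities.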
Tab 2 Evaluation results of the improved RetinaNet on surface defect
dataset of Si3N4 ceramic bearing inner