Fig. 7 Diagram of the improved RetinaNet model framework

FPN addresses the multi-scale problem in object detection by adding simple connections within the network. It substantially improves surface microdefect detection performance with essentially no increase in the computational cost of the original model. In the top-down pathway, the high-level feature maps, which are more abstract and semantically rich but lower in resolution, are upsampled by a factor of 2 using nearest-neighbor interpolation and merged with the lower-level feature maps, which retain the underlying localization information. Predictions are then made at each level. On the resulting feature maps, the model completes the target box category classification and bounding box (bbox) location regression tasks, so that surface defects are recognized and classified as pit, crack, snowflake, or wear.
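The top-down pathway described above can be illustrated with a minimal NumPy sketch. It is not the model's actual implementation. It assumes that the levels are merged by element-wise addition, as in the standard FPN design, and that all levels already share the same channel count, so the 1x1 lateral convolutions and the smoothing convolutions are omitted. The function names `upsample_nearest_2x` and `top_down_merge` and the level names `c3` to `c5` and `p3` to `p5` are hypothetical and chosen only for illustration.

```python
import numpy as np


def upsample_nearest_2x(feature_map):
    """Upsample a (C, H, W) feature map to (C, 2H, 2W) by nearest neighbor."""
    # Repeating every row and every column twice is exactly 2x nearest-neighbor.
    return feature_map.repeat(2, axis=1).repeat(2, axis=2)


def top_down_merge(features):
    """Merge backbone feature maps from the coarsest level down to the finest.

    `features` is ordered fine -> coarse; each level is assumed to have half
    the spatial resolution of the previous one and the same channel count
    (assumption: lateral 1x1 convolutions have already been applied).
    """
    merged = [features[-1]]  # the coarsest level is passed through unchanged
    for lateral in reversed(features[:-1]):
        # Assumed merge rule: element-wise addition of the lateral feature map
        # and the upsampled, semantically stronger map from the level above.
        merged.append(lateral + upsample_nearest_2x(merged[-1]))
    return merged[::-1]  # return in fine -> coarse order


# Toy backbone outputs with 4 channels at three resolutions.
c3 = np.ones((4, 8, 8))
c4 = np.ones((4, 4, 4))
c5 = np.ones((4, 2, 2))

p3, p4, p5 = top_down_merge([c3, c4, c5])
print(p3.shape, p4.shape, p5.shape)
```

In a full detector, each merged level would then feed the classification and bbox regression subnetworks, which is how predictions are produced at every scale.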