The ablation results show that enlarging the receptive field and refining features along the spatial and channel dimensions each improve multi-target tracking accuracy from a different perspective, while the bottom-up fusion in the information aggregation module fully combines low-level and high-level information; this fusion contributes the largest gain of the model.
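The bottom-up fusion described above can be illustrated with a minimal numpy sketch. This is not the paper's actual aggregation module: it assumes, purely for illustration, that a fine low-level map is pushed into the coarser high-level map by 2x average pooling and elementwise addition (in the spirit of PANet-style bottom-up path augmentation); the real module would use learned convolutions.

```python
import numpy as np

def downsample2x(x):
    """2x downsampling of a (C, H, W) map by 2x2 average pooling."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def bottom_up_fuse(low, high):
    """Illustrative bottom-up step: push fine low-level detail into the
    coarser high-level map by downsampling and elementwise addition."""
    return high + downsample2x(low)

low = np.random.rand(8, 16, 16)   # low-level map: fine spatial detail
high = np.random.rand(8, 8, 8)    # high-level map: coarse semantics
print(bottom_up_fuse(low, high).shape)  # (8, 8, 8)
```

The fused map keeps the high-level resolution while mixing in detail from the lower level, which is the intuition behind the accuracy gain reported for the information aggregation module.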
Conclusion: Focus-MOT centers on the extraction and fusion of features at different scales, which preserves more effective feature information and markedly reduces the number of identity (ID) switches during tracking. The proposed Field Enhancement Refinement Module and Information Aggregation Module improve the network's ability to extract key target features and enhance feature extraction under different receptive fields. The method improves tracking when targets are small in scale or overlap one another, and raises both detection and tracking accuracy. The experimental results show that the method achieves strong overall performance, effectively reducing the number of ID switches while maintaining a higher MOTA.
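The spatial- and channel-wise feature refinement summarized above follows the general pattern of sequential channel-then-spatial attention (as in CBAM-style modules). The numpy sketch below is an assumption-laden illustration of that pattern only: the learned gating layers of a real module are replaced by parameter-free sigmoid gates over feature statistics, and the paper's actual module design is not reproduced here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x):
    """Reweight each channel by a gate computed from its global average
    (SE-style squeeze, with the learned excitation layer omitted)."""
    w = sigmoid(x.mean(axis=(1, 2)))   # (C,) one gate per channel
    return x * w[:, None, None]

def spatial_attention(x):
    """Reweight each spatial location by a gate computed from the
    channel-wise mean at that location."""
    w = sigmoid(x.mean(axis=0))        # (H, W) one gate per location
    return x * w[None, :, :]

def refine(x):
    """Sequential channel-then-spatial filtering of a (C, H, W) map."""
    return spatial_attention(channel_attention(x))

feat = np.random.rand(8, 16, 16)
print(refine(feat).shape)  # (8, 16, 16)
```

Because both gates lie in (0, 1), the refinement suppresses uninformative channels and locations rather than amplifying them, which is the filtering effect the ablation attributes to the spatial and channel dimensions.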