The ablation results show that enlarging the receptive field and refining the filtering of features along the spatial and channel dimensions each improve multi-object tracking accuracy from a different perspective. The bottom-up fusion in the information aggregation module, which fully merges low-level information with high-level information, contributes the largest gain.
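The two mechanisms discussed above, attention-based feature filtering and bottom-up fusion of low- and high-level maps, can be sketched as follows. This is a minimal NumPy illustration under assumed shapes, not the paper's actual modules; the function names, sigmoid gating, and 2x2 average-pool downsampling are all simplifying assumptions made here for clarity.

```python
import numpy as np

def channel_attention(x):
    # Hypothetical channel gate: global average pool over the spatial
    # dims, squashed to (0, 1) with a sigmoid, then used to reweight
    # each channel of the (C, H, W) feature map.
    w = 1.0 / (1.0 + np.exp(-x.mean(axis=(1, 2))))
    return x * w[:, None, None]

def spatial_attention(x):
    # Hypothetical spatial gate: mean over channels gives a per-pixel
    # weight in (0, 1) that emphasizes responsive locations.
    w = 1.0 / (1.0 + np.exp(-x.mean(axis=0)))
    return x * w[None, :, :]

def bottom_up_fuse(low, high):
    # Bottom-up fusion sketch: downsample the fine low-level map to the
    # coarse high-level resolution by 2x2 average pooling, then fuse by
    # element-wise addition.
    C, H, W = high.shape
    pooled = low.reshape(C, H, 2, W, 2).mean(axis=(2, 4))
    return pooled + high

low = np.random.rand(8, 16, 16)   # fine, low-level features
high = np.random.rand(8, 8, 8)    # coarse, high-level features
fused = bottom_up_fuse(spatial_attention(channel_attention(low)), high)
# fused has the high-level shape (8, 8, 8)
```

Real implementations would learn the gating and fusion weights with convolutions rather than fixed pooling, but the data flow, filter then merge across scales, is the same.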
Conclusion: Focus-MOT centers on the extraction and fusion of features at different scales, which preserves more effective feature information and markedly reduces the number of ID switches during tracking. The proposed Field Enhancement Refinement Module and Information Aggregation Module improve the network's ability to extract key target features and strengthen feature extraction under different receptive fields. As a result, the model tracks more reliably when targets are small or overlap, and achieves higher detection and tracking accuracy. The experimental results show that the method delivers strong overall performance, reducing the number of ID switches while maintaining a higher MOTA.
References
1. Bewley A, Ge Z, Ott L, et al. Simple online and realtime tracking[C]//2016 IEEE International Conference on Image Processing. Phoenix, 2016: 3464-3468
2. Ren S, He K, Girshick R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[J]. Advances in Neural Information Processing Systems, 2015, 28
3. Wojke N, Bewley A, Paulus D. Simple online and realtime tracking with a deep association metric[C]//2017 IEEE International Conference on Image Processing. Beijing, 2017: 3645-3649
4. Redmon J, Divvala S, Girshick R, et al. You only look once: Unified, real-time object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, 2016: 779-788
5. Zhang Y F, Wang C Y, Wang X G, et al. FairMOT: On the fairness of detection and re-identification in multiple object tracking[J]. International Journal of Computer Vision, 2021, 129(11): 3069-3087
6. Guo S, Wang J Y, Wang X C, et al. Online multiple object tracking with cross-task synergy[C]//2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, 2021: 8132-8141
7. Gao S H, Cheng M M, Zhao K, et al. Res2Net: A new multi-scale backbone architecture[J]. arXiv preprint arXiv:1904.01169
8. Dollár P, Wojek C, Schiele B, et al. Pedestrian detection: A benchmark[C]//2009 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2009: 304-311
9. Zhang S, Benenson R, Schiele B. CityPersons: A diverse dataset for pedestrian detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017: 3213-3221
10. Xiao T, Li S, Wang B, et al. Joint detection and identification feature learning for person search[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017: 3415-3424
11. Zheng L, Zhang H, Sun S, et al. Person re-identification in the wild[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017: 1367-1376