Focus-MOT: Multi-target tracking detection algorithm with fine-grained feature extraction aggregation
Jia Hongyu1, Yang Wenwu2, Zhang Lulu3
1 Dalian Maritime University, No.1 Linghai Road, Dalian, China
2 Dalian Maritime University, No.1 Linghai Road, Dalian, China
3 Dalian Maritime University, No.1 Linghai Road, Dalian, China
Email: yangwenwu@dlmu.edu.cn.
Abstract: This work proposes Focus-MOT, a multi-target tracking and detection algorithm based on refined feature extraction and fusion. Through the designed Field Enhancement Refinement Module and Information Aggregation Module, it effectively reduces the number of target ID switches. The Jointly learns the Detector and Embedding model (JDE) approach has become the mainstream of multi-target tracking and detection because of its fast detection speed. Its Re-ID branch needs both low-dimensional and high-dimensional features to accommodate large and small targets; however, insufficient feature extraction leads to a high number of ID switches (ID_SW). This work therefore aims to extract features at different levels and aggregate them in order to reduce the number of ID switches. Experimental results on the MOT17 dataset show a 2.7% improvement in MOTA and a reduction of 2300 in ID_SW relative to the FairMOT algorithm.
Introduction: Deep learning based multi-target tracking and detection methods can generally be classified into two paradigms: Tracking-By-Detection (the TBD paradigm) and Jointly learns the Detector and Embedding model (the JDE paradigm). The TBD paradigm is represented by Sort with Faster R-CNN as its detector, DeepSort, MOTDT, and similar algorithms [1-3]. Because the TBD paradigm treats target detection and feature vector acquisition as two separate models that do not share features, each part requires its own computation time, and the total time is the sum of the two, which wastes considerable time. In contrast, the JDE paradigm uses a single network to fuse target detection and embedding learning: it extracts Re-ID features while detecting targets and reduces repeated inference computation by sharing features, thus improving the time efficiency of the model while maintaining the same accuracy as the TBD paradigm. Examples include the FairMOT and TADAM algorithms, which improve on the JDE paradigm [4-6].
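The cost difference between the two paradigms can be illustrated with a toy model that only counts feature-extraction passes per frame. This is a minimal sketch under stated assumptions, not the implementation of any cited tracker: the image is a plain list of numbers, boxes are index ranges, and the names `PassCounter`, `extract_features`, `tbd_frame`, and `jde_frame` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PassCounter:
    """Counts how many times a (stand-in) backbone is evaluated."""
    backbone_passes: int = 0


def extract_features(pixels, counter):
    """Stand-in for one forward pass through a feature-extraction network."""
    counter.backbone_passes += 1
    return [float(sum(pixels))]  # hypothetical one-element feature vector


def tbd_frame(image, boxes, counter):
    """TBD: detector and embedding model are separate and share no features."""
    extract_features(image, counter)          # the detector's own pass
    embeddings = []
    for start, end in boxes:                  # one extra pass per detected target
        crop = image[start:end]
        embeddings.append(extract_features(crop, counter))
    return embeddings


def jde_frame(image, boxes, counter):
    """JDE: one shared pass feeds both the detection and the Re-ID branch."""
    shared = extract_features(image, counter)  # computed once, then reused
    return [shared for _ in boxes]             # Re-ID branch reads shared features


image = list(range(10))                # toy "image"
boxes = [(0, 3), (3, 6), (6, 10)]      # toy detections as index ranges
tbd, jde = PassCounter(), PassCounter()
tbd_frame(image, boxes, tbd)
jde_frame(image, boxes, jde)
print(tbd.backbone_passes, jde.backbone_passes)
```

In this toy setting the TBD count grows with the number of targets, while the JDE count stays at one pass per frame, which mirrors the argument about shared features made above.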
The JDE paradigm relies on the features extracted by the backbone network for recognition and tracking, and the quality of this extraction strongly affects detection and tracking accuracy.
Focus-MOT improves the feature extraction and fusion strategy under the single-network multitask model. Adopting the JDE paradigm, it introduces the Field Enhancement Refinement Module and the Information Aggregation Module, which aim to extract features at different levels from the backbone network and aggregate them, in order to reduce the number of ID switches and strike a balance between detection speed and accuracy.
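For readers unfamiliar with the two metrics this work reports, the sketch below shows how ID_SW and MOTA are commonly computed. It assumes the standard CLEAR MOT definition, MOTA = 1 - (FN + FP + ID_SW) / GT, which the text above does not spell out. It also assumes that the matching between ground-truth targets and predicted tracks has already been done for each frame. The function names and the sample numbers are hypothetical and are not results from this work.

```python
def count_id_switches(matches_per_frame):
    """Count ID switches over a sequence.

    matches_per_frame: list of dicts, one per frame, mapping a ground-truth
    target ID to the predicted track ID it was matched to in that frame.
    A switch is counted when a target's matched track ID differs from the
    track ID it was last matched to.
    """
    last_track = {}
    switches = 0
    for matches in matches_per_frame:
        for gt_id, track_id in matches.items():
            if gt_id in last_track and last_track[gt_id] != track_id:
                switches += 1
            last_track[gt_id] = track_id
    return switches


def mota(false_negatives, false_positives, id_switches, num_gt):
    """MOTA under the CLEAR MOT definition (assumed here)."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


# Made-up example: target 2 changes track in frame 2, target 1 in frame 3.
frames = [{1: 10, 2: 20}, {1: 10, 2: 21}, {1: 11, 2: 21}]
id_sw = count_id_switches(frames)
print(id_sw)
print(mota(false_negatives=5, false_positives=3, id_switches=id_sw, num_gt=100))
```

Because ID_SW enters the numerator directly, reducing identity switches raises MOTA even when the detection errors (FN and FP) are unchanged.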