Focus-MOT: Multi-target tracking detection algorithm with fine-grained
feature extraction aggregation
Jia Hongyu1, Yang Wenwu2, Zhang Lulu3
1 Dalian Maritime University, No.1 Linghai Road,
Dalian, China
2 Dalian Maritime University, No.1 Linghai Road,
Dalian, China
3 Dalian Maritime University, No.1 Linghai Road,
Dalian, China
Email: yangwenwu@dlmu.edu.cn.
Abstract: This work proposes Focus-MOT, a multi-target tracking and
detection algorithm based on refined feature extraction and fusion.
Through the designed Field Enhancement Refinement Module and Information
Aggregation Module, it effectively reduces the number of target ID
switches. Methods that jointly learn the detector and embedding model
(JDE) have become the mainstream of multi-target tracking and detection
because of their fast detection speed. Their Re-ID branch needs both
low-dimensional and high-dimensional features to accommodate large and
small targets, but insufficient feature extraction leads to a high
number of ID switches (ID_SW). This work therefore extracts features at
different levels and aggregates them to reduce the number of ID
switches. Experimental results on the MOT17 dataset show a 2.7%
improvement in MOTA and a reduction of 2300 in ID_SW relative to the
FairMOT algorithm.
Introduction: Deep learning based multi-target tracking and
detection methods can generally be classified into Tracking-By-Detection
(the TBD paradigm) and Jointly learns the Detector and Embedding model
(the JDE paradigm). The TBD paradigm is represented by Sort with
Faster R-CNN as its detector, DeepSort, MOTDT, etc. [1-3]. Because the
TBD paradigm treats feature vector acquisition and target detection as
two separate models that do not share features, each part requires its
own computation, and the total time is the sum of the two, which wastes
considerable time. In contrast, the JDE paradigm uses a single network
to fuse target detection and embedding learning: it extracts Re-ID
features during target detection and avoids repeated inference by
sharing features, improving the time efficiency of the model while
maintaining the same accuracy as the TBD paradigm. Examples include
FairMOT and TADAM, which improve on the JDE paradigm [4-6].
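The difference in computation between the two paradigms can be
illustrated with a toy sketch. Everything in it is hypothetical: the
CountingBackbone class and the two head functions are stand-ins chosen
for illustration, not real networks and not the architecture of any
cited tracker. A real TBD pipeline would typically run its Re-ID model
on cropped detections; here both models simply process the whole frame
so that the number of feature extractions can be compared.

```python
# Toy illustration only: the "backbone" and "heads" are stand-ins, not real networks.

class CountingBackbone:
    """Hypothetical feature extractor that records how often it is run."""

    def __init__(self):
        self.calls = 0

    def __call__(self, frame):
        self.calls += 1
        # Pretend feature: the sum of each row of a 2-D list of pixel values.
        return [sum(row) for row in frame]


def detection_head(features):
    # Stand-in detector: indices of rows whose feature is positive.
    return [i for i, f in enumerate(features) if f > 0]


def embedding_head(features, boxes):
    # Stand-in Re-ID head: one 1-D "embedding" per detected box.
    return [[float(features[i])] for i in boxes]


def track_tbd(frame, det_backbone, reid_backbone):
    """TBD-style: the detector and the Re-ID model each extract features."""
    boxes = detection_head(det_backbone(frame))
    embeddings = embedding_head(reid_backbone(frame), boxes)
    return boxes, embeddings


def track_jde(frame, shared_backbone):
    """JDE-style: a single feature extraction feeds both heads."""
    features = shared_backbone(frame)
    boxes = detection_head(features)
    embeddings = embedding_head(features, boxes)
    return boxes, embeddings


frame = [[0, 0], [1, 2], [3, 0]]
det_bb, reid_bb, shared_bb = CountingBackbone(), CountingBackbone(), CountingBackbone()
tbd_out = track_tbd(frame, det_bb, reid_bb)
jde_out = track_jde(frame, shared_bb)
print(det_bb.calls + reid_bb.calls, shared_bb.calls)
```

In this sketch both pipelines return the same boxes and embeddings, but
the TBD-style function runs feature extraction twice per frame while the
JDE-style function runs it once.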
The JDE paradigm relies on the features extracted by the backbone
network for recognition and tracking, and the quality of this
extraction strongly affects detection and tracking accuracy.
Focus-MOT improves the feature extraction and fusion strategy of the
single-network multitask model. Adopting the JDE paradigm, it introduces
the Field Enhancement Refinement Module and the Information Aggregation
Module, which aggregate features extracted at different levels of the
backbone network, in order to reduce the number of ID switches and
balance detection speed and accuracy.
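As a rough illustration of what aggregating features from different
levels can mean, the sketch below upsamples a coarse (deep,
low-resolution) feature map to the size of a fine (shallow,
high-resolution) one and stacks their channels. This is a generic
sketch, not the Field Enhancement Refinement Module or the Information
Aggregation Module: nearest-neighbour upsampling, channel concatenation,
and the function names are all assumptions made for illustration.

```python
def upsample_nearest(channel, factor):
    """Nearest-neighbour upsampling of one 2-D channel by an integer factor."""
    out = []
    for row in channel:
        # Repeat every value `factor` times along the width ...
        wide = [v for v in row for _ in range(factor)]
        # ... and repeat the widened row `factor` times along the height.
        out.extend([list(wide) for _ in range(factor)])
    return out


def aggregate(fine, coarse):
    """Bring the coarse map to the fine map's resolution and stack channels.

    Both maps are lists of 2-D channels. Concatenation is an assumed
    fusion rule, chosen only for illustration.
    """
    factor = len(fine[0]) // len(coarse[0])
    return fine + [upsample_nearest(c, factor) for c in coarse]


fine = [[[1, 2, 3, 4] for _ in range(4)]]   # 1 channel, 4x4 (shallow level)
coarse = [[[10, 20], [30, 40]]]              # 1 channel, 2x2 (deep level)
fused = aggregate(fine, coarse)
print(len(fused), len(fused[1]), len(fused[1][0]))
```

The fused map keeps the fine map's spatial resolution, so small targets
retain detailed features while every location also carries the coarser
context that helps with large targets.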