What are the contributions of the paper: This paper proposes a filtering approach to improve re-identification, which may be used upon pedestrian detection. I think that the paper does not provide sufficient detail to reproduce the results, and its technical contribution is weak; however, the presented results might be of interest to some readers. Therefore, my recommendation is to re-submit after some modifications.
What are the additional ways in which the paper could be improved:
1. I basically think that the technical contribution of the window-based filtering is weak. However, this approach may be adopted in other re-identification methods to improve their performance. My main concern at this point is how agnostic this performance improvement is to different combinations of pedestrian detection and re-identification. If the approach depends heavily on the combination, providing some insight into when this approach works well is essential. Additional experimental results comparing different combinations, together with a discussion of them, are necessary.
We understand the reviewer’s concern. We think that having pedestrian detections from another PD algorithm for a set of frames of the dataset would be an interesting way to confirm how agnostic the performance boost from the window-based classifier is to the algorithms employed. However, we feel there are some parts of the performance boost that we can readily state will always occur, such as:
When the parameter \(d > 1\) (the minimum number of re-identifications required for a window to be provided as output is greater than 1), spurious False Positives and spurious misclassifications are always filtered out, thus improving precision.
When the parameter \(w > 2\) (the minimum width of the window is greater than 2), missed detections or misclassifications that fall between d correct re-identifications are always recovered, since the whole window is provided as output, thus improving recall.
We’ve added this to the end of section 3.3 Window-based Classifier.
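The two effects above can be illustrated with a minimal sketch. It is not the implementation used in the paper: the function name `window_filter`, the sliding-window scheme, and the toy input are assumptions made for illustration, with d read as the minimum number of re-identifications inside a window of width w.

```python
def window_filter(hits, w, d):
    """Hypothetical window-based filter for one queried person.

    hits: one 0/1 entry per frame, 1 when the single-frame classifier
          re-identified the queried person in that frame.
    w:    window width in frames.
    d:    minimum number of re-identifications for a window to be output.
    Returns one bool per frame: True if the frame lies in an accepted window.
    """
    out = [False] * len(hits)
    for start in range(len(hits) - w + 1):
        # Accept the window only if it holds at least d re-identifications.
        if sum(hits[start:start + w]) >= d:
            # The whole window is provided as output, gaps included.
            for k in range(start, start + w):
                out[k] = True
    return out


# Toy sequence (assumed): frame 2 is an isolated spurious hit,
# frames 6 and 8 are correct hits with a missed frame 7 between them.
hits = [0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0]
filtered = window_filter(hits, w=3, d=2)
reported_frames = [i for i, flag in enumerate(filtered) if flag]
```

With w=3 and d=2, the isolated hit at frame 2 is dropped (precision) and frame 7 is reported together with frames 6 and 8 (recall).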
2. The details of the main technical components of this work are not provided. As for false positive class, I don’t see how it is used in the re-id system.
We thank the reviewer for pointing this out. If this matter is not clear, we shall endeavor to make it clear. In figure 2, we assume the Single-frame Classifier is a classifier that will classify input images as one of the persons available in the training set. Therefore, when a classifier that only has classes for people receives an input image of a fire-extinguisher (figure 5), it always fails to classify it correctly (there is no correct class!). In Section 5.2.2 Single frame Re-Identification we cite  as the algorithm employed to enact classification.  describes a multi-class classifier that does not, by default, contain an extra class for unclassifiable inputs. Therefore, when using the “FP Class module”, we are in fact providing the re-identification classifier algorithm with extra training samples for it to train an extra class, a class of detection false positives. To make this clearer, we have edited the 5.4 Scenarios section thusly:
Afterwards, in the FPCLASS scenario, we turn ON the FPclass, thus providing the re-identification algorithm with additional training samples (samples of false positives) to build an extra class (a false positive class), and therefore evaluate our approach to address detection false positives.
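The effect of the extra class can be sketched as follows. This is only an illustration under stated assumptions: a nearest-centroid classifier stands in for the cited re-identification algorithm, and the 2-D features, labels, and helper names (`centroid`, `train`, `classify`) are hypothetical.

```python
import math


def centroid(samples):
    """Mean feature vector of a list of equal-length feature tuples."""
    n = len(samples)
    return tuple(sum(s[i] for s in samples) / n for i in range(len(samples[0])))


def train(gallery):
    """gallery: dict mapping class label -> list of feature tuples."""
    return {label: centroid(samples) for label, samples in gallery.items()}


def classify(model, feature):
    """Return the label whose centroid is closest to the probe feature."""
    return min(model, key=lambda label: math.dist(model[label], feature))


# Hypothetical gallery containing only person classes.
gallery = {
    "person_1": [(0.0, 0.0), (0.2, 0.1)],
    "person_2": [(1.0, 1.0), (0.9, 1.1)],
}
# Hypothetical features of detector false positives (e.g. a fire extinguisher).
fp_samples = [(3.0, 0.0), (3.1, 0.2)]
probe = (2.9, 0.1)  # a new false-positive detection

# FP class OFF: the probe is forced into one of the person classes.
without_fp_class = classify(train(gallery), probe)
# FP class ON: the FP samples are added to the gallery as one extra class.
with_fp_class = classify(train({**gallery, "FP": fp_samples}), probe)
```

Without the extra class the probe necessarily receives a person label; with it, the same probe can be assigned to the false positive class. The detector itself is left untouched in both cases.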
As for window-based classifier, the paper does not mention what its outputs are (and I assume that they are binary classification results, not ranking). The paper definitely needs these details.
We thank the reviewer very much for this comment, since this fact is important and, unfortunately, missing in the submitted version. The window-based classifier was built to provide output to a user who asks the system “where is person X?”, and indeed provides a binary answer to that question for a set of frames. Therefore, we’ve complemented the first paragraph of section 3.3 Window-based Classifier as follows:
Here we describe the window-based classifier, which exploits the temporal continuity of the pedestrians in the video to increase performance. It takes any single-frame classifier that gives a ranked output, filters its output, and provides a binary output that indicates whether a given person is present in a given set of frames.
Particularly what if there are multiple people in a single frame?
Thank you for asking this. To clarify and avoid similar doubts arising among our readers, we’ve added the following to the end of the first paragraph in section 3.3 Window-based Classifier:
This output is independent per person. Since the output consists of the frames that contain person X, a frame that contains both person 1 and person 2, if both are correctly re-identified, will be in the output when the system is asked “Where is person 1?” as well as when it is asked “Where is person 2?”.
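A small sketch of this per-person independence, assuming each frame is summarized by the set of person IDs re-identified in it. The helper `frames_with` and the toy data are hypothetical, and the window filtering step is omitted for brevity.

```python
def frames_with(person, ids_per_frame):
    """Indices of the frames in which `person` was re-identified.

    ids_per_frame: one set of re-identified person IDs per frame (assumed input).
    """
    return [i for i, ids in enumerate(ids_per_frame) if person in ids]


# Frame 1 contains both person 1 and person 2.
ids_per_frame = [{1}, {1, 2}, {2}, set()]

where_is_1 = frames_with(1, ids_per_frame)
where_is_2 = frames_with(2, ids_per_frame)
```

Frame 1 appears in both answers, since each query is evaluated on its own.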
3. Object detectors are often trained with hard negatives, which are false positives detected with an initial (or tentative) detector, as in Dalal’s HOG paper. I think the false positive class in this paper is somehow related to hard negatives. I think some discussion on this point is necessary. For example, the detector can be trained with an augmented dataset that contains the images in Figure 5. Given this, what are the advantages of the proposed false positive class over retraining the detector?
That’s exactly it! The False Positive class is indeed an extra set of training samples that allows the classifier to have a trained class of false positives. The benefit comes from allowing the use of a ’pre-trained-off-the-shelf’ pedestrian detector. The user may not be able to retrain the pedestrian detector, but will certainly be able to include given samples (false positive samples) under a new class in the gallery of the re-identification algorithm. To drive this point home with the readers we’ve enriched the first paragraph of section 3.2 False Positives Class thusly:
Pedestrian detector algorithms usually produce some false positive detections. One way to improve their performance in a given context is to retrain the detector, adding the false positives to the training set as hard negatives for that specific context. However, some false positives may still remain, or the user may wish to use a ’pre-trained-off-the-shelf’ PD algorithm where retraining is not an option. Therefore, another contribution of our work is to adapt the RE-ID module to deal with the FPs produced by the PD.
and added the following sentence in that same section:
Observing that the appearance of the FPs in a given scenario is not completely random, but is worth modeling (see Fig 5), we provide the re-identification classifier with FP samples so that it can train an FP class.
4. Are Eq. (2) and (3) an extension of the original CMC by the authors to evaluate entire re-id systems? Or is there any existing work that uses them? I’m not quite sure how #ord(i) is defined if there are false positives.
Eq. (2) and (3) are the original CMC when there are false positives and missed detections, respectively. We wrote them down to show how the original CMC has all its values reduced when there are false positives (Eq. 2), and to show how the original CMC does not change on average when samples go missing (Eq. 3). ord(i) is defined as the number of correct re-identifications at index i in the ordered list of all matches for a probe sample against all classes in the gallery. Since the re-identification classifier deals with each sample independently, when there are false positives the number of correct re-identifications is not affected. There is simply a greater number of incorrect re-identifications. We have enriched the respective parts of the text as follows:
This means that when there are False Positive (FP) probes, without a FP class, each FP contributes to the denominator of Equation 1 in the CMC calculation, but not to the numerator. Given the definition of ord(i), and since the re-identification classifier deals with each sample independently, when there are false positives the number of correct re-identifications is not affected. There is simply a greater number of incorrect re-identifications, which reduces every value of the CMC by the fraction of FPs relative to the total number of probes. See Equation 2 below, for the mathematical representation of the original CMC equation when there are FPs:
When there are Missed Detections (MDs), if, on average, the missed samples are distributed proportionally to ord(i), then the CMC does not change. See Equation 4 for the mathematical representation of this.
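Both statements can be checked numerically with a short sketch. It assumes the CMC at rank r is the cumulative sum of ord(i) for i up to r divided by the number of probes; this is our reading of the description above, not a reproduction of the paper's equations. The counts and the helper `cmc` are hypothetical.

```python
from fractions import Fraction


def cmc(ord_counts, n_probes):
    """Assumed CMC: cumulative sum of ord(i) divided by the number of probes."""
    curve, running = [], 0
    for count in ord_counts:
        running += count
        curve.append(Fraction(running, n_probes))
    return curve


# Hypothetical ord(i): correct re-identifications at ranks 1, 2, 3.
ord_counts = [6, 2, 2]
n_true = sum(ord_counts)  # 10 true pedestrian probes
n_fp = 5                  # detector false positives, no FP class

baseline = cmc(ord_counts, n_true)

# False positives enlarge the denominator only, so every value shrinks
# by the same factor n_true / (n_true + n_fp).
with_fps = cmc(ord_counts, n_true + n_fp)
scale = Fraction(n_true, n_true + n_fp)

# Missed detections spread proportionally to ord(i): half of each count is lost.
ord_missed = [c // 2 for c in ord_counts]  # [3, 1, 1]
with_mds = cmc(ord_missed, sum(ord_missed))
```

In this toy case every CMC value with FPs equals the baseline value multiplied by 10/15, while the curve with proportionally distributed missed detections is identical to the baseline.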