Part Detectors and CNNs

Abstract. First, we review the part detectors we developed in the re-identification book chapter. Second, we review the structure of CNNs. The connection between the two topics is the following observation: the first convolutional layer of a CNN resembles a bank of spatial filters, while the part detectors are based on histograms of oriented gradients (HOG) features. Is there transferable knowledge between the two approaches? A new type of layer for the CNN? A new feature-extraction scheme for HOG? And what about the subsequent convolutional layers of a CNN?

\label{fig:HOG_Overview}Overview of the HOG feature extraction.

Part Detectors

Calculating the HOG features requires a series of steps, summarized in Fig. \ref{fig:HOG_Overview}. At each step, Dalal and Triggs (Dalal 2005) experimentally show that certain choices yield better results than others, and they call the resulting procedure the default detector (HOG-dd). Like other recent implementations (Felzenszwalb 2010), we largely adopt the same choices, but we also introduce some tweaks.

STEP 1.

Here, we assume the input is an image window of the canonical size for the body part under consideration. As in HOG-dd, we directly compute the gradients with the 1-D masks \([-1,0,1]\). For color images, each RGB channel is processed separately, and each pixel takes the gradient vector with the largest norm among the three channels. While this does not take full advantage of the color information, it is better than discarding it, as Andriluka's detector does.
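This per-channel gradient computation can be sketched as follows (a minimal NumPy illustration; the function name and array layout are our assumptions, not part of the detector's actual code):

```python
import numpy as np

def compute_gradients(image):
    """Centered [-1, 0, 1] gradients per channel; each pixel keeps the
    gradient vector of the channel with the largest norm."""
    img = np.asarray(image, dtype=np.float64)
    if img.ndim == 2:                      # grayscale: treat as one channel
        img = img[:, :, None]
    H, W, C = img.shape
    gx = np.zeros((H, W, C))
    gy = np.zeros((H, W, C))
    # correlation with [-1, 0, 1] horizontally and its transpose vertically
    gx[:, 1:-1, :] = img[:, 2:, :] - img[:, :-2, :]
    gy[1:-1, :, :] = img[2:, :, :] - img[:-2, :, :]
    mag = np.sqrt(gx ** 2 + gy ** 2)
    # per pixel, select the channel whose gradient has the largest norm
    best = np.argmax(mag, axis=2)
    ii, jj = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    return gx[ii, jj, best], gy[ii, jj, best], mag[ii, jj, best]
```

On a horizontal intensity ramp, for instance, the interior pixels all report a horizontal gradient of constant magnitude and zero vertical gradient.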

STEP 2.

Next, we turn each pixel's gradient vector into a histogram by quantizing its orientation into 18 bins. The orientation bins are evenly spaced over the range \(0^{\circ}-180^{\circ}\), so each bin spans \(10^{\circ}\). For pedestrians there is no a priori light/dark relationship between foreground and background (owing to varying clothes and scenes) that would justify the use of "signed" gradients over the range \(0^{\circ}-360^{\circ}\): in other words, we use the contrast-insensitive version.
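The contrast-insensitive quantization described above can be sketched as follows (an illustrative snippet; the exact binning convention, floor into equal-width bins, is our assumption):

```python
import numpy as np

def quantize_orientation(gx, gy, n_bins=18):
    """Map gradient orientations to n_bins contrast-insensitive bins
    spanning 0-180 degrees (10 degrees per bin for n_bins=18)."""
    # fold orientations into [0, 180): opposite directions share a bin
    theta = np.degrees(np.arctan2(gy, gx)) % 180.0
    bins = np.floor(theta / (180.0 / n_bins)).astype(int)
    return np.clip(bins, 0, n_bins - 1)
```

Note that, for example, gradients pointing left and right both fall into bin 0, which is exactly the contrast-insensitive behavior: a dark-on-light edge and a light-on-dark edge of the same orientation vote for the same bin.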