If the input to FERC is video, then the difference between consecutive frames is computed. Frames for which this difference is zero are treated as maximally stable frames. For each stable frame, an aggregated edge-detection output is computed, and after comparing these aggregated sums across all stable frames, the frame with the maximum sum is selected as the input to FERC, since it carries the most detail. The rationale for preferring the image with more edges is that blurry images contain few or no edges.

Once the input image is obtained, a skin-tone detection algorithm \cite{Muhammad_2015} is applied to extract the human body parts from the image. The skin-tone detected image is a binary image and is used as one of the feature vectors for the first layer of the background-removal CNN. The other feature is the Hough-transformed image. If the input image is gray-scale, the skin-tone detection algorithm loses accuracy. To overcome this problem, the second level of the background-removal CNN uses a circles-in-circle filter, which relies on the Hough-transform values for each circle detection \cite{Djekoune_2017}.

As shown in figure 2, for each convolution operation the entire image is divided into overlapping 3x3 matrices, and the corresponding 3x3 filter is convolved over each 3x3 matrix from the image. This sliding dot-product operation is called convolution, hence the name convolutional. During convolution, the dot product of the two 3x3 matrices is computed and stored at the corresponding location, e.g. (1,1), in the output (fig. 2). Once the entire output matrix has been calculated, it is passed to the next layer of the CNN for another round of convolution. The last layer of the face-feature-extracting CNN is a simple perceptron, which optimizes the values of the scale factor and the exponent with respect to the ground truth \cite{Arena_1991}.
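The stable-frame selection described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: frames are assumed to be 2D lists of gray values, the edge operator is a simple gradient-magnitude sum, and the zero-difference test is exposed as a threshold `eps`.

```python
def frame_diff(a, b):
    """Sum of absolute pixel differences between two gray frames."""
    return sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))

def edge_sum(frame):
    """Aggregated edge output: sum of horizontal and vertical gradients."""
    h, w = len(frame), len(frame[0])
    total = 0
    for y in range(h):
        for x in range(w):
            if x + 1 < w:
                total += abs(frame[y][x + 1] - frame[y][x])
            if y + 1 < h:
                total += abs(frame[y + 1][x] - frame[y][x])
    return total

def select_input_frame(frames, eps=0):
    """Among stable frames (difference from the previous frame <= eps),
    return the one with the maximum aggregated edge sum."""
    stable = [frames[i] for i in range(1, len(frames))
              if frame_diff(frames[i - 1], frames[i]) <= eps]
    if not stable:  # assumption: fall back to all frames if none are stable
        stable = frames
    return max(stable, key=edge_sum)
```

A blurry frame yields a small edge sum, so a sharp stable frame wins the selection, matching the rationale given above.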
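To make the skin-tone step concrete, here is one widely used rule-based RGB heuristic for producing the binary skin mask. It is only illustrative: the algorithm actually cited (\cite{Muhammad_2015}) may use different rules or color spaces.

```python
def is_skin(r, g, b):
    """Classic rule-based RGB skin test (illustrative heuristic only;
    the cited skin-tone algorithm may differ)."""
    return (r > 95 and g > 40 and b > 20
            and max(r, g, b) - min(r, g, b) > 15
            and abs(r - g) > 15 and r > g and r > b)

def skin_mask(image):
    """Binary image: 1 where the pixel looks like skin, else 0."""
    return [[1 if is_skin(*px) else 0 for px in row] for row in image]
```

The resulting 0/1 mask is the kind of binary feature map that the first layer of the background-removal CNN consumes.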
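The convolution operation of figure 2 can be sketched directly: slide a 3x3 kernel over the image and store each dot product at the corresponding output location. The sketch below assumes stride 1 with no padding ("valid" mode); the paper does not state these details.

```python
def conv2d_3x3(image, kernel):
    """Slide a 3x3 kernel over a 2D image; each output entry is the dot
    product of the kernel with the 3x3 patch it overlaps (stride 1,
    no padding -- assumptions, not stated in the text)."""
    h, w = len(image), len(image[0])
    out = []
    for y in range(h - 2):
        row = []
        for x in range(w - 2):
            acc = 0
            for i in range(3):
                for j in range(3):
                    acc += image[y + i][x + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out
```

With the identity kernel (1 at the center, 0 elsewhere) the output reproduces the interior pixels of the input, which is a quick sanity check that the dot products land at the right locations.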