
Understanding Feature Engineering in the Context of Convolutional Neural Network Models
  • Peter
National Institute of Standards and Technology (NIST)

Corresponding Author: [email protected]


Abstract

The main objective of this paper is to understand feature engineering in the context of Convolutional Neural Network (CNN) models. A CNN model consists of two types of layers: convolutional layers and fully connected layers. The convolutional layers perform automatic feature engineering, while the fully connected layers perform a nonlinear approximation of the extracted features for a given analysis. Our work is motivated by the lack of guidance for users of CNN models on how to configure the convolutional layers (i.e., the choice of filter types and their parameters) so as to maximize accuracy and reduce the time taken to train the model. The challenges arise from the fact that (a) there is no mathematical framework for CNN models that would allow rigorous optimization, as opposed to an ad-hoc search for optimal solutions, (b) the optimal choice of filters in the convolutional layers is tightly coupled with the choices in the fully connected layers, and (c) feature engineering in the convolutional layers varies with the domain-specific task and the type of training data.

We approached the first challenge by varying the structure of the CNN model and optimizing the corresponding parameters. The second challenge is approached by selecting three types of filters that are used to initialize the convolutional layers of a CNN model. The three filter types are formed by a random process, a wavelet transform with a non-orthogonal matrix (Gabor transform), and a wavelet transform with a real and orthogonal matrix (Haar transform). The random initialization represents the case of no a priori knowledge about the application task and feature engineering. The wavelet transform-based initializations introduce a mathematical model for a convolutional layer and assume that the space of feature engineering can be constrained to data characteristics in the time and frequency domains. The presence or absence of orthogonality of a wavelet transform reflects whether the application task requires not only discrimination but also reconstruction (i.e., an inverse transform is needed). Finally, the third challenge is addressed by considering image classification and image denoising tasks, which have different requirements on feature engineering in a convolutional layer. The experimental results for both tasks are obtained from the same data set containing synthetic images of ellipse and triangle shapes. The two object shapes vary in scale, orientation, location, color, and the amount and type of added noise (denoising task). The novelty of this work lies in incorporating a priori knowledge about the application task into a CNN design and quantifying the corresponding tradeoffs in terms of convergence, number of parameters, and model accuracy. Based on the experimental results, we conclude that an appropriate choice of filters at the convolutional layers results in faster convergence and better predictions than random filters. The significance of this study lies in providing initial recommendations prior to CNN architecture construction for any given analysis.
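
As a rough illustration of the filter-initialization idea described in this abstract, the sketch below contrasts random initialization with a Haar-wavelet-based initialization of a first convolutional layer. It is not the paper's implementation: the use of PyTorch, the layer sizes, and the 2x2 Haar kernel bank are illustrative assumptions.

# Minimal sketch (assumed setup, not the paper's code): initialize the first
# convolutional layer of a CNN either randomly or with Haar wavelet filters.
import torch
import torch.nn as nn

def haar_filters(out_channels: int, in_channels: int) -> torch.Tensor:
    """Build a bank of 2x2 Haar wavelet kernels (LL, LH, HL, HH),
    tiled to the requested number of output channels."""
    ll = torch.tensor([[0.5, 0.5], [0.5, 0.5]])
    lh = torch.tensor([[0.5, 0.5], [-0.5, -0.5]])
    hl = torch.tensor([[0.5, -0.5], [0.5, -0.5]])
    hh = torch.tensor([[0.5, -0.5], [-0.5, 0.5]])
    bank = torch.stack([ll, lh, hl, hh])                      # (4, 2, 2)
    reps = -(-out_channels // 4)                              # ceiling division
    kernels = bank.repeat(reps, 1, 1)[:out_channels]          # (out, 2, 2)
    return kernels.unsqueeze(1).repeat(1, in_channels, 1, 1)  # (out, in, 2, 2)

conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=2, bias=False)

# Random initialization: represents no a priori knowledge about the task.
nn.init.kaiming_normal_(conv.weight)

# Wavelet-based initialization: constrains the feature space to
# time/frequency characteristics of the data (orthogonal Haar filters here).
with torch.no_grad():
    conv.weight.copy_(haar_filters(16, 3))

A Gabor-based initialization would follow the same pattern, with the kernel bank instead built from Gabor functions sampled at several orientations and frequencies.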