Deep learning

Abstract. In the last few years a new approach, called deep learning, has been used with great success. Deep learning usually consists of neural networks with at least three layers, which are used to build hierarchical representations of data. Some deep learning methods are unsupervised and are used to learn better features that are then used to solve a machine learning problem. Other deep learning methods are supervised and can be used directly to solve classification problems. Deep learning systems have been used to obtain state-of-the-art results on different problems, such as object recognition in images or speech recognition, evaluated on standard datasets such as MNIST, CIFAR-10, Pascal VOC and TIMIT.


Deep learning represents a category of machine learning algorithms that differ from previous ones in that they do not learn the desired function directly, but first learn how to process the input data. Traditionally, to recognize objects in images, researchers developed various feature extractors, the more specific to an object the better, and after applying them to an image, they used a classification algorithm such as an SVM or a Random Forest to determine what was in the image. For some objects there are very good feature extractors (for faces, for example), but for less common items there are no good feature extractors, and it would take too long for humans to develop them manually for each new item. Deep learning algorithms do not require this feature extraction step because they learn to do it themselves.

The deep part of the name comes from the fact that instead of having a single layer that receives the input data and outputs the desired result, we have a series of layers that process data received from the previous layer, extracting higher and higher levels of features. Only the last layer is used to obtain the result, after the data has been transformed and compressed.
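This layered processing can be illustrated with a minimal sketch in Python/NumPy. The layer sizes and random weights here are made up for illustration; a real network would learn its weights from data.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    # A common nonlinearity applied between layers.
    return np.maximum(0.0, x)

# Hypothetical layer sizes: a 784-dimensional input (e.g. a 28x28 image)
# is compressed through successively smaller layers down to 10 outputs.
layer_sizes = [784, 256, 64, 10]
weights = [rng.standard_normal((m, n)) * 0.01
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(x):
    # Each hidden layer processes the output of the previous layer,
    # producing a more compact, higher-level representation.
    for W in weights[:-1]:
        x = relu(x @ W)
    # Only the last layer produces the final result.
    return x @ weights[-1]

output = forward(rng.standard_normal(784))
```

Here `output` has one entry per class; the intermediate activations are the learned intermediate representations the text refers to.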

Deep neural networks have been successfully used by various research groups. Dahl et al. from Microsoft presented a paper about using deep neural nets for speech recognition (Dahl). The following year, Le et al. from Google developed a system that, from 10 million YouTube thumbnails, learned by itself to recognize human faces (and 22,000 other categories of objects) (Le).

The aim of this paper is to present some of the main aspects of deep learning that set it apart from traditional machine learning algorithms and that enable it to perform much better in some cases.

The rest of the paper is structured as follows: section \ref{sec:unsupervised} presents the two main unsupervised deep learning approaches, while section \ref{sec:supervised} presents some of the improvements used in the supervised setting.