KL-Autoencoders for manifold learning

Abstract

We propose an efficient nonlinear dimensionality reduction method based on information theory and autoencoders. We extend autoencoders to also learn the manifold of the original data. The method is demonstrated on an important category of Earth science datasets obtained from the popular afternoon constellation system.

Introduction

\label{introduction}

For many reasons, including visualization, data modeling, and elucidating important features in Earth science data, it is worthwhile to project the data onto important subspaces via transformations such as Principal Components Analysis (PCA) (Lee 2007). PCA diagonalizes the covariance of the data, which identifies and ranks dimensions (components) according to their degree of correlation (Gershenfeld 1999). PCA, though robust and widely used, is linear and struggles with nonlinearity and non-normally distributed data, underestimating the true dimensionality of the data. For example, an important class of nonlinearity is exemplified in Figure \ref{fig:1dmanifold} below: observations that live on a lower-dimensional manifold (locally Euclidean space) embedded in the high-dimensional observation space. Nonlinear dimensionality reduction (NDR) is the process of determining a mapping from the high-dimensional observation space to the lower-dimensional representation space. In non-parametric NDR, however, an explicit mapping of individual points between the observation and representation spaces is not required.
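The covariance diagonalization underlying PCA can be sketched in a few lines of NumPy (a generic illustration, not code from this work):

```python
import numpy as np

def pca(X, k):
    """Project data onto its top-k principal components.

    X: (n_samples, n_features) data matrix.
    Returns the k-dimensional projection and the component variances.
    """
    Xc = X - X.mean(axis=0)                # center the data
    cov = Xc.T @ Xc / (len(X) - 1)         # sample covariance matrix
    vals, vecs = np.linalg.eigh(cov)       # eigh: symmetric-matrix eigendecomposition
    order = np.argsort(vals)[::-1]         # rank components by variance
    return Xc @ vecs[:, order[:k]], vals[order[:k]]
```

The eigenvalues rank each component by the variance it explains, which is the "degree of correlation" ranking referred to above.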

Though there are relatively few ways to arrive at a hyperplane in the data, there are many ways to explore nonlinear structure that may underlie observations. PCA can be derived from a matrix encoding the pairwise distances along the manifold. These distances lie along geodesics in the linear case and suggest methods for determining underlying nonlinear structure, namely the preservation or scaling of distances between points. For example, Locally Linear Embedding (LLE) achieves NDR by constructing linear mappings at points throughout the data set based on their nearest neighbors, which readily capture the longer-range nonlinearity (Roweis 2000). Isomap NDR, which is closely related to Multi-dimensional Scaling (MDS) and hence PCA, approximates point-to-point geodesic (along the manifold) distances as the shortest path between the two points achieved by stepping from one datum to another, i.e., a graph distance (Tenenbaum 2000) (Lee 2007). In recent years there has been a great deal of exploration into different ways to use the topological and metric structure of the data to construct NDR mappings.
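The graph-distance approximation to geodesics can be sketched as follows: connect each point to its nearest neighbors and take shortest paths over the resulting graph (the neighborhood size `n_neighbors` is an illustrative choice):

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.sparse.csgraph import shortest_path

def geodesic_distances(X, n_neighbors=5):
    """Approximate geodesic distances as shortest paths on a kNN graph."""
    D = cdist(X, X)                              # pairwise Euclidean distances
    G = np.full_like(D, np.inf)                  # inf marks "no edge"
    idx = np.argsort(D, axis=1)[:, 1:n_neighbors + 1]
    for i, nbrs in enumerate(idx):
        G[i, nbrs] = D[i, nbrs]                  # keep edges to k nearest points only
    return shortest_path(G, directed=False)      # Dijkstra over the neighbor graph
```

On data lying along a curved manifold, these graph distances track the arc length along the manifold rather than the shorter straight-line (chordal) distance through the ambient space.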

Fast methods for estimating the intrinsic dimensionality of data are valuable for initial analyses, for comparing results, and for determining whether further investment in NDR and analysis might be fruitful for a given Earth science dataset. Due to the potential size and computational complexity of these analyses, scalable performance will be a key goal of our NDR method. There are two major classes of NDR: convex methods, such as LLE (Roweis 2000), and nonconvex methods, such as autoencoders (Hinton 2006). Nonconvex methods, in spite of possibly arriving at stochastic local-minimum solutions, are attractive because of their iterative nature, while convex methods, which are often based on eigendecomposition of pairwise similarities between data points, become computationally more expensive as the dataset grows to millions or more data points.

Most NDR methods assume that only local distances in the high-dimensional space are reliable, and attempt to preserve local distances between the observation and representation spaces. The local distances are Euclidean, and the objective functions are a weighted version of the difference in these distances. Here, we propose an information-theoretic criterion, where each local neighborhood has a distribution function that determines how likely a data point is to be a member of that neighborhood. The objective is to minimize the Kullback–Leibler divergence (KL-divergence) between the distributions of the two spaces.
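As a concrete illustration of such a criterion, the sketch below forms Gaussian neighborhood distributions with a fixed bandwidth `sigma` and sums the per-point KL divergences, in the spirit of Stochastic Neighbor Embedding; the distribution family and fixed bandwidth are illustrative assumptions, not necessarily the exact choices of our method:

```python
import numpy as np

def neighborhood_probs(X, sigma=1.0):
    """Gaussian conditional probabilities p(j|i) over each point's neighborhood
    (an SNE-style choice with a single fixed bandwidth, used for illustration)."""
    D2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    P = np.exp(-D2 / (2.0 * sigma ** 2))
    np.fill_diagonal(P, 0.0)                 # a point is not its own neighbor
    return P / P.sum(axis=1, keepdims=True)  # normalize each row to a distribution

def kl_objective(X_high, X_low, eps=1e-12):
    """Sum over points of KL(P_i || Q_i) between the neighborhood distributions
    of the observation space (P) and the representation space (Q)."""
    P = neighborhood_probs(X_high)
    Q = neighborhood_probs(X_low)
    return np.sum(P * np.log((P + eps) / (Q + eps)))
```

The objective is zero only when the low-dimensional neighborhoods reproduce the high-dimensional ones exactly, and otherwise penalizes points whose neighborhood membership probabilities disagree between the two spaces.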

We extend this definition to autoencoders, whose usual goal is to minimize only the reconstruction error between the high- and low-dimensional spaces. Instead of reconstructing only the original data, our proposed autoencoder also attempts to reconstruct each point's neighborhood, and is thus an attempt to also learn the manifold.
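One minimal way to express such a neighborhood-reconstruction objective is sketched below. The use of the neighborhood mean as the auxiliary target and the weight `lam` are our illustrative assumptions, not the definitive formulation of the method:

```python
import numpy as np

def neighborhood_loss(X, encode, decode, neighbors, lam=0.5):
    """Autoencoder loss that also asks the decoder to reproduce each point's
    neighborhood (here summarized by its mean, an illustrative choice).

    X:         (n, d) data matrix.
    encode:    function mapping (n, d) -> (n, k) codes.
    decode:    function mapping (n, k) -> (n, d) reconstructions.
    neighbors: (n, m) integer indices of each point's m neighbors.
    lam:       weight on the neighborhood term (assumed, not from the paper).
    """
    Z = encode(X)
    X_hat = decode(Z)
    self_err = np.mean((X - X_hat) ** 2)        # standard reconstruction error
    nbr_mean = X[neighbors].mean(axis=1)        # summary of each neighborhood
    nbr_err = np.mean((nbr_mean - X_hat) ** 2)  # neighborhood-reconstruction error
    return self_err + lam * nbr_err
```

A plain autoencoder corresponds to `lam = 0`; increasing `lam` forces the learned code to carry information about where a point sits on the manifold, not just the point itself.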

Our paper is organized as follows. Section \ref{background} describes the motivation behind our work. Section \ref{related} discusses existing work that is related to, and a basis for, our proposed method, which is presented in section \ref{methods}. In section \ref{experiments}, we outline the experiments that evaluate our method.

Background

\label{background}

There has been a great surge of interest in NDR techniques in such fields as materials science, facial recognition, drug discovery, and more, including Earth science (Tenenbaum 2000). For example, Laplacian Eigenmaps (a graph-based technique) and non-linear PCA have been applied to land cover and target classification in polarimetric SAR data (Tu 2012). NDR via manifold learning methods, beginning with Isomap, LLE, and Local Tangent Space Alignment, can lead to better results in the analysis of real nonlinear spatial and hyperspectral imagery (Pu 2013) (Sun 2014).

Directly relevant to ‘event’ processing, where an event is an instance of some phenomenon (e.g., tropical cyclone, blizzard, tropopause fold, etc.), we note the Isomap-based NDR analysis of highly nonlinear and high-dimensional Asian monsoon convection data (Hannachi 2013). Indeed, NDR techniques can provide a way to discover similarities in superficially dissimilar objects, dramatically increasing the power of our event queries by allowing a broader range of ‘masks’ to be determined (Floyd 2013). Modest increases in storm detection rates and an 8x speedup in computation time have been observed in the application of manifold learning to radar data (Lu 2012).