This report will explain the article written by Sam T. Roweis and Lawrence K. Saul about the Locally Linear Embedding (LLE) algorithm used for the analysis and visualization of big amounts of data. This document called Nonlinear Dimensionality Reduction by Locally Linear Embedding and published at the American Association for the Advancement of Science in 2000, shows how the use of this algorithm improves the problem of dimensionality reduction mapping the inputs into a coordinate to be applied in nonlinear manifold data.

Sam T. Roweis, associate professor in Department of Computer Science at the University of Toronto since 2006, is a Research Scientist & Consultant for Google. And Lawrence K. Saul, professor in Department of Computer Science and Engineering at the UC San Diego since 2006, is now focused on applications of machine learning to problems in computer systems and security.

High-dimensional data, that is to say all data that requires more than 2D or 3D to be represented,is difficult to be interpreted by a human. To simplify it, one method that can be followed is considering an embedded nonlinear manifold formed by the relevant data, this is the purpose of the nonlinear dimensionality reduction.

Dimensionality reduction can be considered as the process of deriving a set of degrees of freedom to reproduce most part of the variability of a data set. As in the example shown in Fig.3, a set of faces images which the only variation its pose, only one degree of freedom has been altered so the pictures lie on a continuous one-dimensional curve.

In other words, all manifold learning algorithms can be applied to data dimensionality reduction, producing a low-dimensional encoding from a high-dimensional one, data visualization, providing an image of a data set based on the degrees of freedom, and supervised learning preprocessing, to clean sub-sequent supervised training data.

The purpose of Locally Linear Embedding (LLE) and other dimensionality reduction algorithms like Multidimensional Scaling (MDS), Principal Component Analysis (PCA) and Isomap, is to find correlations and connections between different samples of an observed object and plotting them into a drawn that can be visualized. Also it is able to maps the resource data into a single global coordinate system, optimizing it not using local minima and learning form nonlinear manifolds as it is shown in the source article.

Although the work shown by Roweis and Saul names only a few algorithms, there are many other systems based also on manifold learning procedures (Local tangent space alignment, Laplacian eigenmaps,…). The algorithms presented and developed in the article have some main disadvantages with the existing LLE. As we can see in Fig.1 form the original file, mapping high-dimensional inputs into a low-dimensional space with the same number of coordinates as observed variability modes using MDS or Isomap preserve pairwise distance between points, distance calculated with straight lines (MDS) or manifold shortest path (Isomap) while LLE eliminates the need to calculate them for far points.

This is done by characterizing the local geometry of the linear manifold patch with coefficients that reconstruct all data points using their corresponding neighbors. Minimizing the cost function taken into account that each sample \(X_{i}\) is reconstructed using its neighbors we obtain the weights for each case as a matrix, knowing that the sum of all weights of the same row must be equals to 1. So in the final step we can compute the cost function, basing on locally linear reconstruction errors, using the low dimensional vector \(Y_{i}\) obtained from the corresponding \(X_{i}\) that represents the global coordinates of the manifold.