Learning

Large-Scale Machine Learning with Stochastic Gradient Descent

 

I. Introduction



General information 
Author: Léon Bottou
Author Affiliations: NEC Labs America, Princeton, USA
This article appears in the Proceedings of COMPSTAT'2010 [1]
Book Subtitle: 19th International Conference on Computational Statistics, Paris, France, August 22-27, 2010
Pages: 177-186
Date of release: 30 September 2010
Keywords: stochastic gradient descent, efficiency, online learning

Author presentation

As found in the bibliography below, Léon Bottou is: [2]
A researcher "best known for his work in machine learning and data compression". He worked on stochastic gradient descent and is one of the main creators of the DjVu software; he is also the creator of the Lush programming language. He developed "the open source software LaSVM for fast large-scale support vector machine, and stochastic gradient descent software for training linear SVM and Conditional Random Fields". Since March 2015, he has been working with Facebook Artificial Intelligence Research.


PLAN

1) Stochastic gradient algorithm description.
2) Why stochastic gradient algorithms are attractive when the data is abundant.
3) Asymptotic efficiency of estimates obtained after a single pass over the training set.
4) Empirical evidence.

II. Context

 
A central problem nowadays is the computational complexity of learning algorithms, particularly with large datasets and limited processor speed.
As stated in the article, "the capabilities of statistical machine learning methods are limited by the computing time rather than the sample size". In this setting, the paper advocates stochastic gradient algorithms for large-scale machine learning problems (under a maximal computing time constraint).
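To make this setting precise, the paper's framework contrasts the expected risk over the unknown data distribution with the empirical risk over the n available training examples; writing z = (x, y) for an example and \ell for the loss:

E(f) = \int \ell(f(x), y) \, dP(z)                        (expected risk)
E_n(f) = \frac{1}{n} \sum_{i=1}^{n} \ell(f(x_i), y_i)     (empirical risk)

Large-scale learning is the regime where the computing time budget, rather than n, determines how well these risks can be minimized.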

III. Positioning

This paper reviews preexisting algorithms and shows why stochastic gradient algorithms are important for large-scale machine learning.

IV. Contribution

 
The article's contribution is the application of stochastic gradient algorithms to large-scale machine learning problems. First, it shows why stochastic gradient algorithms are attractive when the data is abundant. It then establishes the asymptotic efficiency of estimates obtained after a single pass over the training set. Finally, it provides empirical evidence.
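To make the efficiency claim concrete, the paper builds on the Bottou-Bousquet decomposition of the expected excess error of the delivered solution \tilde{f}_n into three terms:

\mathcal{E} = \mathbb{E}\big[E(\tilde{f}_n) - E(f^*)\big] = \mathcal{E}_{app} + \mathcal{E}_{est} + \mathcal{E}_{opt}

where \mathcal{E}_{app} is the approximation error (from restricting f to a family \mathcal{F}), \mathcal{E}_{est} the estimation error (from minimizing the empirical rather than the expected risk), and \mathcal{E}_{opt} the optimization error (from stopping the optimizer at accuracy \rho). Under a fixed time budget, an optimizer that processes more examples per second can tolerate a larger \mathcal{E}_{opt} and still reach a lower total error, which is why stochastic gradient algorithms win in the large-scale regime.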

V. Experience

This section presents experimental results illustrating the performance of stochastic gradient algorithms on a variety of linear models.
They rely on the following update rules. Batch gradient descent updates the weights w from the average gradient of the per-example loss Q over all n training examples,

w_{t+1} = w_t - \gamma \, \frac{1}{n} \sum_{i=1}^{n} \nabla_w Q(z_i, w_t)

whereas stochastic gradient descent uses a single randomly drawn example z_t at each iteration,

w_{t+1} = w_t - \gamma_t \, \nabla_w Q(z_t, w_t)
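As an illustration only (this is not the article's code), here is a minimal Python sketch of the stochastic update above applied to a linear SVM with hinge loss, in the spirit of the paper's SvmSgd experiments; the learning-rate schedule, the synthetic data, and the function name are assumptions of ours:

import numpy as np

def sgd_linear_svm(X, y, lam=1e-4, epochs=1, gamma0=0.1):
    """One-pass SGD for a linear SVM.

    Minimizes (lam/2)*||w||^2 + (1/n) * sum_i max(0, 1 - y_i * w.x_i)
    with the update w <- w - gamma_t * grad_w Q(z_t, w).
    """
    n, d = X.shape
    w = np.zeros(d)
    t = 0
    for _ in range(epochs):
        for i in np.random.permutation(n):
            # assumed decreasing schedule gamma_t = gamma0 / (1 + gamma0*lam*t)
            gamma = gamma0 / (1.0 + gamma0 * lam * t)
            margin = y[i] * np.dot(w, X[i])
            # subgradient step on the regularized hinge loss of one example
            w *= (1.0 - gamma * lam)
            if margin < 1:
                w += gamma * y[i] * X[i]
            t += 1
    return w

# Toy usage on synthetic, linearly separable data (hypothetical numbers).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = np.sign(X @ rng.normal(size=5))
w = sgd_linear_svm(X, y)
print("training accuracy:", np.mean(np.sign(X @ w) == y))

Note that each stochastic update costs O(d), so one pass costs O(nd), the same as a single batch gradient evaluation; the point is that SGD makes n parameter updates in the time batch gradient descent makes one.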
Using these equations, they obtained the following results.
Here we can conclude from the figure below, as also stated in the article: "The lower half of the plot shows the time requ…