Large-Scale Machine Learning with Stochastic Gradient Descent
Author: Léon Bottou
Author affiliation: NEC Labs America
The article is extracted from the Proceedings of the 19th International Conference on Computational Statistics, Paris, France, August 22-27, 2010.
Released: 30 September 2010
Keywords: stochastic gradient descent, efficiency, online learning
As found in the bibliography below, Léon Bottou is a researcher "best known for his work in machine learning and data compression". He worked on stochastic gradient descent, and he is also one of the main creators of the DjVu software and the creator of the Lush programming language. He developed "the open source software LaSVM for fast large-scale support vector machines, and stochastic gradient descent software for training linear SVM and Conditional Random Fields". Since March 2015 he has been working at Facebook Artificial Intelligence Research.
1) Stochastic gradient algorithm description.
2) Why stochastic gradient algorithms are attractive when the data is abundant.
3) Asymptotic efficiency of estimates obtained after a single pass over the training set.
4) Empirical evidence.
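For reference, the stochastic gradient update behind item 1) replaces the exact gradient of the empirical risk with the gradient measured on a single randomly drawn example z_t (notation as in the paper: loss Q(z, w), decreasing step size gamma_t):

```latex
w_{t+1} = w_t - \gamma_t \, \nabla_w Q(z_t, w_t)
```

whereas ordinary batch gradient descent uses the average gradient over all n examples, \frac{1}{n}\sum_{i=1}^{n} \nabla_w Q(z_i, w_t), at every step, which costs n times more per iteration.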
A central problem nowadays is the computational complexity of learning algorithms, especially with large datasets and limited processor speed.
The context, as mentioned in the article, is that "the capabilities of statistical machine learning methods are limited by the computing time rather than the sample size". In this setting, the paper uses stochastic gradient algorithms for large-scale machine learning problems (under a maximal computing time constraint).
This paper reviews pre-existing algorithms and shows why stochastic gradient algorithms are well suited to large-scale learning. The article's contribution is the use of stochastic gradient algorithms for large-scale machine learning problems. First, it shows why stochastic gradient algorithms are attractive when the data is abundant. It also proves the asymptotic efficiency of estimates obtained after a single pass over the training set. Finally, it provides empirical evidence.
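To make the single-pass idea concrete, here is a minimal sketch of plain stochastic gradient descent, assuming a linear model with squared loss; the names (fit_sgd, gamma0) and the toy data are illustrative, not from the paper, which experiments on linear SVMs and Conditional Random Fields.

```python
import random

def fit_sgd(data, n_features, gamma0=0.5, n_epochs=2):
    """SGD update w <- w - gamma_t * grad Q(z_t, w) with a ~1/t step size."""
    w = [0.0] * n_features
    t = 0
    for _ in range(n_epochs):
        random.shuffle(data)                      # visit examples in random order
        for x, y in data:
            t += 1
            gamma = gamma0 / (1.0 + gamma0 * t)   # decreasing step size
            pred = sum(wi * xi for wi, xi in zip(w, x))
            grad = [2.0 * (pred - y) * xi for xi in x]   # gradient of (pred - y)^2
            w = [wi - gamma * gi for wi, gi in zip(w, grad)]
    return w

# Usage: recover y = 2x from noise-free samples with features in [0, 1].
random.seed(0)
data = [([i / 200.0], 2.0 * (i / 200.0)) for i in range(1, 200)]
w = fit_sgd(data, n_features=1)
```

Each update touches one example, so the per-iteration cost is independent of the dataset size; this is what makes the algorithm attractive when data is abundant.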
This section presents experimental results illustrating the performance of stochastic gradient algorithms on a variety of linear models. They used these equations: