\subsection{Notation}

Throughout this vignette, we will be using the following notation, which largely follows Andrew and Gao (2007). Let $f: \mathbb{R}^n \to \mathbb{R}$ be the objective function to be minimized. We also let the $\lVert \cdot \rVert$ operator denote the $L_2$ norm of a vector, and $\lVert \cdot \rVert_1$ denote the $L_1$ norm. $\mathbf{B_k}$ is the Hessian matrix (or its approximation) of $f$ at $x^k$, and $g^k$ is the gradient of $f$ at the same point. We also let $\mathbf{H_k} = \mathbf{B_k}^{-1}$ be the inverse of the (approximated) Hessian matrix.

\subsection{The L-BFGS Algorithm}

The Limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) algorithm \cite{Liu_Nocedal_1989} is employed for solving high-dimensional minimization problems in scenarios where both the objective function and its gradient can be computed analytically. L-BFGS belongs to the class of quasi-Newton optimization routines, which solve the given minimization problem by computing approximations to the Hessian matrix of the objective function. At each iteration, quasi-Newton algorithms locally model $f$ at the point $x^k$ using a quadratic approximation:
\[
Q(x) = f(x^k) + (x - x^k)^T g^k + \frac{1}{2} (x - x^k)^T \mathbf{B_k} (x - x^k)
\]
A search direction can then be found by computing the vector $x^*$ that minimizes $Q(x)$. Assuming that the Hessian approximation $\mathbf{B_k}$ is positive-definite, this minimizer is obtained in closed form by setting the gradient of $Q$ to zero, giving
\[
x^* = x^k - \mathbf{H_k} g^k,
\]
so that the search direction is $d^k = x^* - x^k = -\mathbf{H_k} g^k$.
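For concreteness, the following sketch recalls how the inverse Hessian approximation is typically maintained; the auxiliary symbols $s_k$, $y_k$, $\rho_k$, and the memory parameter $m$ are introduced here only for illustration and follow the standard conventions of \cite{Liu_Nocedal_1989}. After each step one sets
\[
s_k = x^{k+1} - x^k, \qquad y_k = g^{k+1} - g^k, \qquad \rho_k = \frac{1}{y_k^T s_k},
\]
and applies the BFGS update
\[
\mathbf{H_{k+1}} = (I - \rho_k s_k y_k^T)\, \mathbf{H_k}\, (I - \rho_k y_k s_k^T) + \rho_k s_k s_k^T.
\]
Rather than storing $\mathbf{H_k}$ explicitly, the limited-memory variant retains only the most recent $m$ pairs $\{s_i, y_i\}$ and reconstructs the product $\mathbf{H_k} g^k$ from them, so that the memory cost grows only linearly in the problem dimension.

\subsection{The OWL-QN Algorithm}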