Where \(\mathbf{w}\) is a vector of weights, and \(\mathbf{x}_i\) is our original data point with the addition of an identity element to allow for an intercept term. That is, if \(\mathbf{x}_i = (x_{i1}, x_{i2}, \dots, x_{id})^T \in \mathbb{R}^d\), then our transformed data point is represented as \((1, x_{i1}, x_{i2}, \dots, x_{id})^T \in \mathbb{R}^{d+1}\). The addition of this dimension is used widely, for example in simple linear regression. Our goal is now to find some estimate of the weight vector \(\mathbf{w}\). The traditional approach is to obtain a point estimate through some kind of maximum-likelihood approach. This is equivalent to formulating an objective function from the likelihood of our estimator given the data, and optimising it through any appropriate method. Alternatively and equivalently, we could optimise the log-likelihood. Either way, it turns out that the objective function is strictly convex \cite{rennie2005regularized}. This means we can obtain a reasonable point estimate quite efficiently, for example through the use of stochastic gradient descent.
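
For concreteness: if the model here is the standard binary logistic model \(p(y_i = 1 \mid \mathbf{x}_i) = \sigma(\mathbf{w}^T \mathbf{x}_i)\) with \(\sigma(z) = 1/(1 + e^{-z})\), the log-likelihood to be maximised is \(\ell(\mathbf{w}) = \sum_i y_i \log \sigma(\mathbf{w}^T \mathbf{x}_i) + (1 - y_i) \log\big(1 - \sigma(\mathbf{w}^T \mathbf{x}_i)\big)\). The sketch below illustrates the point-estimation step under that assumption: it prepends the identity element for the intercept and runs plain stochastic gradient descent on the per-example negative log-likelihood. The function names, learning rate, and epoch count are illustrative choices, not part of the original text.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sgd_logistic(X, y, lr=0.1, epochs=100, seed=0):
    """Point-estimate w by SGD on the negative log-likelihood.

    X : (n, d) data matrix; y : (n,) labels in {0, 1}.
    A column of ones (the identity element) is prepended,
    so w[0] plays the role of the intercept term.
    """
    rng = np.random.default_rng(seed)
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    n, d1 = Xb.shape
    w = np.zeros(d1)
    for _ in range(epochs):
        for i in rng.permutation(n):
            # Per-example gradient of the negative log-likelihood:
            # (sigma(w^T x_i) - y_i) * x_i
            grad = (sigmoid(Xb[i] @ w) - y[i]) * Xb[i]
            w -= lr * grad
    return w

# Hypothetical usage on some training set (X_train, y_train):
# w_hat = sgd_logistic(X_train, y_train)
```

Because the objective is strictly convex, any step size schedule satisfying the usual decay conditions drives this iteration toward the unique maximum-likelihood estimate; the fixed step size above is kept only for brevity.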