Where \(\mathbf{w}\) is a vector of weights and \(\mathbf{x}_i\) is our original data point with the addition of an identity element to allow for an intercept term. That is, if \(\mathbf{x}_i = (x_{i1}, x_{i2}, \dots, x_{id})^T \in \mathbb{R}^d\), then our transformed data point is represented as \((1, x_{i1}, x_{i2}, \dots, x_{id})^T \in \mathbb{R}^{d+1}\). This augmentation is widely used, for example in simple linear regression. Our goal now is to find an estimate of the weight vector \(\mathbf{w}\). The traditional approach is to obtain a point estimate through some kind of maximum-likelihood procedure. This amounts to formulating an objective function from the likelihood of our parameters given the data and optimising it with any appropriate method. Alternatively and equivalently, since the logarithm is monotonic, we can optimise the log-likelihood. Either way, it turns out that the objective function is strictly convex \cite{rennie2005regularized}. This means we can obtain a reasonable point estimate quite efficiently, for example through stochastic gradient descent.
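
To make the augmentation concrete, here is a minimal NumPy sketch (the function name `add_intercept` and the toy data are hypothetical, not from the original text):

```python
import numpy as np

def add_intercept(X):
    """Prepend an all-ones column to a data matrix, mapping each row
    (x_{i1}, ..., x_{id}) in R^d to (1, x_{i1}, ..., x_{id}) in R^(d+1),
    so that the first weight w_0 acts as the intercept term."""
    ones = np.ones((X.shape[0], 1))
    return np.hstack([ones, X])

X = np.array([[2.0, 3.0],
              [4.0, 5.0]])   # two points in R^2
print(add_intercept(X))      # each row is now (1, x_{i1}, x_{i2}) in R^3
```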
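
The likelihood itself is not restated here; as a sketch, assuming the logistic model \(p(y_i = 1 \mid \mathbf{x}_i) = \sigma(\mathbf{w}^T \mathbf{x}_i)\) with \(\sigma(z) = 1/(1 + e^{-z})\) that \cite{rennie2005regularized} analyses, the log-likelihood to be maximised would take the form:

\[
\ell(\mathbf{w}) = \sum_{i=1}^{n} \left[ y_i \log \sigma(\mathbf{w}^T \mathbf{x}_i) + (1 - y_i) \log \left( 1 - \sigma(\mathbf{w}^T \mathbf{x}_i) \right) \right]
\]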
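
And a minimal sketch of stochastic gradient descent on the corresponding negative log-likelihood, again assuming the logistic model above (the learning rate, epoch count, and function names are illustrative, not prescribed by the text):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sgd_logistic(X, y, lr=0.1, epochs=100, seed=0):
    """Stochastic gradient descent on the negative log-likelihood of
    logistic regression. X is the (n, d+1) augmented data matrix
    (first column all ones) and y holds labels in {0, 1}. Convexity of
    the objective means SGD with a suitable step size approaches the
    global optimum rather than a merely local one."""
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(n):
            # Per-example gradient of the negative log-likelihood:
            # (sigma(w^T x_i) - y_i) * x_i
            w -= lr * (sigmoid(w @ X[i]) - y[i]) * X[i]
    return w
```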