Xavier Holt edited Where_mathbf_w_T_is__.md  over 8 years ago

Commit id: cf91bb5efe0dcf173ca962c5ad904e589bed0eed

Where \(\mathbf{w^T}\) is a vector of weights, and \(\mathbf{X}\) is our original data point with the addition of an identity element to allow for an intercept term. That is, if \(\mathbf{X}_i = (x_1, x_2, \dots, x_d)^T \in \mathbb{R}^d\) then our transformed data point would be represented as \((1, x_1, x_2, \dots, x_d)^T \in \mathbb{R}^{d+1}\). The addition of this dimension is used widely, for example in simple linear regression. Now our goal is to find some estimate of this weight vector \(\mathbf{w^T}\). The traditional approach is to obtain a point estimate through some kind of maximum-likelihood approach. This is equivalent to formulating an objective function from the likelihood of our estimator given the data, and optimising through any appropriate method. Alternatively and equivalently, we could optimise over the log-likelihood. Either way, it turns out that the objective function is strictly convex \cite{rennie2005regularized}. This means we can get a reasonable point-estimate quite efficiently, for example through the use of stochastic gradient descent.
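
As a concrete illustration, here is a minimal sketch of this point-estimation procedure, assuming a logistic-regression likelihood (the setting of \cite{rennie2005regularized}). It prepends the identity element to each data point so the first weight acts as the intercept, then runs stochastic gradient descent on the per-example negative log-likelihood. The function name `fit_sgd` and the hyperparameters (`lr`, `epochs`) are illustrative choices, not taken from the text.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_sgd(X, y, lr=0.1, epochs=100, seed=0):
    """Estimate w by stochastic gradient descent on the negative
    log-likelihood of a logistic model (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    # Prepend the identity element so w[0] is the intercept term.
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    n, d = Xb.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):
            # Per-example gradient of the negative log-likelihood:
            # (sigmoid(w^T x_i) - y_i) * x_i
            grad = (sigmoid(Xb[i] @ w) - y[i]) * Xb[i]
            w -= lr * grad
    return w

# Toy usage: two Gaussian blobs with labels 0 and 1.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1, 1, (50, 2)), rng.normal(1, 1, (50, 2))])
y = np.concatenate([np.zeros(50), np.ones(50)])
w_hat = fit_sgd(X, y)  # w_hat[0] is the intercept, w_hat[1:] the weights
```

Because the objective is convex, this simple first-order scheme converges to the (essentially unique) maximum-likelihood point estimate rather than getting trapped in a poor local optimum.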