Xavier Holt edited Where_mathbf_w_T_is__.md  over 8 years ago

Commit id: d6b367d111d6f241a7b672260b4af79aa17238f1

Where \(\mathbf{w^T}\) is a vector of weights, and \(\mathbf{x}_i\) is our original data point with a constant element of one prepended to allow for an intercept term. That is, if our original data was of the form \(\tilde{\mathbf{x}}_{i} = (x_{i1}, x_{i2}, \dots, x_{i,d-1})^T \in \mathbb{R}^{d-1}\), then our transformed data point would be represented as \(\mathbf{x}_i := (1, x_{i1}, x_{i2}, \dots, x_{i,d-1})^T \in \mathbb{R}^{d}\). The addition of this dimension is used widely, for example in simple linear regression.

Now our goal is to find some estimate of the weight vector \(\mathbf{w}\). The traditional approach is to obtain a point estimate through some kind of maximum-likelihood procedure. This amounts to formulating an objective function from the likelihood of our estimator given the data and optimising it by any appropriate method; alternatively and equivalently, we could optimise the log-likelihood. Either way, it turns out that the objective function is strictly convex \cite{rennie2005regularized}. This means we can obtain a reasonable point estimate quite efficiently, for example through the use of stochastic gradient descent.
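
To make the augmentation concrete, here is a minimal sketch in Python/NumPy of prepending a constant column of ones to a design matrix; the function name `add_intercept` and the toy data are illustrative and not part of the original text.

```python
import numpy as np

def add_intercept(X_tilde):
    """Prepend a constant 1 to each row so the first weight acts as the intercept.

    X_tilde : (n, d-1) array whose rows are the original points x~_i.
    Returns an (n, d) array whose rows are (1, x_{i1}, ..., x_{i,d-1}).
    """
    n = X_tilde.shape[0]
    return np.hstack([np.ones((n, 1)), X_tilde])

# Toy example: three points in R^2 become augmented points in R^3.
X_tilde = np.array([[0.5, 1.2],
                    [2.0, -0.3],
                    [1.1, 0.7]])
X = add_intercept(X_tilde)
print(X)        # first column is all ones
print(X.shape)  # (3, 3)
```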
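The point-estimation step can be sketched in the same spirit. Assuming the likelihood in question is the logistic one (the cited convexity result is stated for regularized logistic regression), the following illustrates stochastic gradient descent on the negative log-likelihood; the learning rate, epoch count, and function names are assumptions made for the example.

```python
import numpy as np

def nll_grad(w, x_i, y_i):
    """Per-example gradient of the negative log-likelihood, assuming a logistic
    model p(y_i = 1 | x_i) = sigmoid(w^T x_i) with labels y_i in {0, 1}."""
    p = 1.0 / (1.0 + np.exp(-w @ x_i))
    return (p - y_i) * x_i

def sgd_point_estimate(X, y, lr=0.1, epochs=100, seed=0):
    """Stochastic gradient descent on the maximum-likelihood objective.
    X already contains the intercept column, so w has d components."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):       # visit points in random order
            w -= lr * nll_grad(w, X[i], y[i])
    return w

# Usage (with X from the augmentation step and binary labels y):
# w_hat = sgd_point_estimate(X, y)
```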