Xavier Holt edited Where_mathbf_w_T_is__.md  over 8 years ago

Commit id: 1072ef6654da8580bdc0801f219793483c435101


Where \(\mathbf{w^T}\) is a vector of weights, and \(\mathbf{x_i}\) is our original data point with the addition of an identity element to allow for an intercept term. That is, if our original data was of the form \(\tilde{\mathbf{x}}_{i} = (x_{i1}, x_{i2}, \dots, x_{i,d-1})^T \in \mathbb{R}^{d-1}\), then our transformed data point would be represented as \((1, x_{i1}, x_{i2}, \dots, x_{i,d-1})^T := \mathbf{x_i} \in \mathbb{R}^{d}\). The addition of this dimension is used widely, for example in simple linear regression. Our goal, then, is to find some estimate of this weight vector \(\mathbf{w^T}\), given our dataset \(\mathcal{D} = (\mathbf{Y_n, X_n})\). The traditional approach is to obtain a point estimate through some kind of maximum-likelihood procedure. This is equivalent to formulating an objective function from the likelihood of our weights given the data and optimising it through any appropriate method. Alternatively and equivalently, we could optimise the log-likelihood. Either way, it turns out that the objective function is strictly convex \cite{rennie2005regularized}. This means we can obtain a reasonable point estimate quite efficiently, for example through the use of stochastic gradient descent. That is, we represent the problem approximately as: