\begin{align}
\hat{\mathbf{w}} &= \text{argmax}_\mathbf{w} \log \left( L(\mathbf{w} \mid \mathcal{D})\times p(\mathbf{w} \mid \boldsymbol{\sigma} )\right)\\
& = \text{argmax}_\mathbf{w} \log L(\mathbf{w} \mid \mathcal{D}) + \log p(\mathbf{w} \mid \boldsymbol{\sigma})\\
& = \text{argmax}_\mathbf{w} \, \ell(\mathbf{w} \mid \mathcal{D})
- \frac{1}{2}\sum_j \frac{w_j^2}{\sigma_j^2}
\end{align}
where $\ell(\mathbf{w} \mid \mathcal{D}) := \log L(\mathbf{w} \mid \mathcal{D})$ is the log-likelihood and additive constants independent of $\mathbf{w}$ have been dropped.
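To make the final step explicit, here is a short sketch assuming the prior factorises into independent zero-mean Gaussians, $w_j \sim \mathcal{N}(0, \sigma_j^2)$ (an assumption consistent with the notation above):
\begin{align*}
\log p(\mathbf{w} \mid \boldsymbol{\sigma}) &= \sum_j \log\left(\frac{1}{\sqrt{2\pi}\,\sigma_j}\exp\left(-\frac{w_j^2}{2\sigma_j^2}\right)\right)\\
&= -\frac{1}{2}\sum_j \frac{w_j^2}{\sigma_j^2} + \text{const},
\end{align*}
so the quadratic penalty is precisely the log-prior up to a constant that does not affect the $\text{argmax}$.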
As mentioned, this approach has some desirable properties and has been shown to perform well \cite{Zhang_2003,fan2003loss}. The downside is that, while the Gaussian prior favours weights close to zero, it does not particularly favour weights being exactly equal to zero. This is especially relevant to the problem at hand: since we can define an arbitrary number of features, and hence dimensions, it would be beneficial to enforce some sparsity on the weights. It turns out that regularisers which penalise the $\ell^1$ norm of the weights, rather than the $\ell^2$ norm, achieve exactly this. To this end, we make the following observation:
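(As a one-dimensional sketch of why the $\ell^1$ penalty produces exact zeros, consider minimising $\frac{1}{2}(w - z)^2 + \lambda\lvert w\rvert$ for fixed $z \in \mathbb{R}$ and $\lambda > 0$; this scalar problem is only an illustration, not part of the derivation above. The minimiser is the soft-thresholding operator
\begin{equation*}
\hat{w} = \operatorname{sign}(z)\max(\lvert z\rvert - \lambda,\, 0),
\end{equation*}
which is exactly zero whenever $\lvert z\rvert \le \lambda$. By contrast, the $\ell^2$-penalised problem $\frac{1}{2}(w - z)^2 + \frac{\lambda}{2}w^2$ has minimiser $\hat{w} = z/(1+\lambda)$, which shrinks $z$ towards zero but never reaches it for $z \neq 0$.)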