Another advantage of the Bayesian method is that it gives you a full probability distribution over $\theta$, which is much more information than a single maximum likelihood estimate. (Note that the likelihood function is not itself a probability distribution over $\theta$, since it need not integrate to one, but if the prior $P(\theta)$ is flat the posterior is simply proportional to the likelihood.)

\subsubsection{Bayesian model comparison}

Bishop \S 3.4 discusses Bayesian model comparison. In this setting, different models correspond to different functional forms for $P(V|\theta_i)$, with $\theta_i$ the parameters of model $i$ ($M_i$). If we define the data $D = \{ V_1, \ldots, V_N \}$, then
\begin{equation}
  p(M_i | D) \propto p(M_i)\, p(D|M_i)
\end{equation}
where $p(M_i)$ is the prior over the models, which we will assume is flat. The quantity $p(D|M_i)$ is known as the model evidence. It is also sometimes called the marginal likelihood because it can be viewed as the likelihood function over the space of models, in which the parameters of each model have been marginalized out (integrated out). The ratio of the model evidences for two models is known as the Bayes factor. We can even combine multiple models (a mixture distribution):
\begin{equation}
  p(V| D) = \sum_i p(V|D, M_i)\, p(M_i | D) .
\end{equation}
A small numerical sketch of this comparison is given at the end of this section.

\subsection{non-parametric formulation}

If we want to approximate the probability density without specifying a functional form, we can use density estimation \url{http://en.wikipedia.org/wiki/Density_estimation} (also see Bishop \S 2.5), especially kernel density estimation \url{http://en.wikipedia.org/wiki/Kernel_density_estimation}. This is a bit like histogramming with smoothing, but you end up with an analytic form for the density at the end. This can easily be done with scikit-learn \url{http://scikit-learn.org/stable/modules/density.html}.
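As a minimal sketch of how this could look with scikit-learn's \texttt{KernelDensity} estimator: the sample of volumes below is made up purely for illustration, and the Gaussian kernel and bandwidth of 0.5 are arbitrary choices that would normally be tuned, e.g.\ by cross validation.

\begin{verbatim}
import numpy as np
from sklearn.neighbors import KernelDensity

# Hypothetical sample of volumes V_1..V_N (for illustration only).
rng = np.random.default_rng(0)
V = rng.exponential(scale=2.0, size=200)

# Fit a Gaussian kernel density estimate; the bandwidth controls the
# amount of smoothing and would normally be chosen by cross validation.
kde = KernelDensity(kernel="gaussian", bandwidth=0.5)
kde.fit(V.reshape(-1, 1))

# Evaluate the estimated density p(V) on a grid.
# score_samples returns log densities, so exponentiate.
grid = np.linspace(0.0, 10.0, 200).reshape(-1, 1)
density = np.exp(kde.score_samples(grid))
\end{verbatim}

The result is an analytic (if kernel-shaped) estimate of $p(V)$ that can be evaluated at any point, which is exactly the smoothed-histogram picture described above.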
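Returning to the Bayesian model comparison above, here is a minimal numerical sketch of computing model evidences, model posteriors, and the Bayes factor. The two candidate models (an exponential and a gamma form for $P(V|\theta_i)$), the uniform parameter priors, and the data are all assumptions made purely for illustration; the evidence integrals are approximated on a grid over $\theta$.

\begin{verbatim}
import numpy as np
from scipy import stats
from scipy.special import logsumexp

# Hypothetical data D = {V_1, ..., V_N} (for illustration only).
rng = np.random.default_rng(0)
V = rng.exponential(scale=2.0, size=50)

def log_evidence(log_lik, theta_grid):
    """log p(D|M) = log int p(D|theta) p(theta) dtheta, approximated on
    a grid with a flat (uniform) prior over theta_grid."""
    log_prior = -np.log(theta_grid[-1] - theta_grid[0])
    log_joint = np.array([log_lik(t) for t in theta_grid]) + log_prior
    dtheta = theta_grid[1] - theta_grid[0]
    return logsumexp(log_joint) + np.log(dtheta)

theta_grid = np.linspace(0.1, 10.0, 500)

# Model 1: exponential distribution, theta = scale.
logZ1 = log_evidence(lambda t: stats.expon(scale=t).logpdf(V).sum(),
                     theta_grid)
# Model 2: gamma distribution with fixed shape 2, theta = scale.
logZ2 = log_evidence(lambda t: stats.gamma(a=2.0, scale=t).logpdf(V).sum(),
                     theta_grid)

# Flat prior over models: p(M_i|D) is proportional to the evidence.
log_post = np.array([logZ1, logZ2]) - logsumexp([logZ1, logZ2])
print("p(M_i|D) =", np.exp(log_post))
print("Bayes factor M1/M2 =", np.exp(logZ1 - logZ2))
\end{verbatim}

The resulting model posteriors could also be used as the weights $p(M_i|D)$ in the mixture expression for $p(V|D)$ given above.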