Xavier Holt edited Inference_Our_inference_method_is__.md  almost 8 years ago

Commit id: aadc5c369a5fb3537e413b22c6b59e1a5ddd93dd


Our inference method is one of our primary contributions to the state of the art.

## Current Approaches

Underlying all of our timeline models is a probabilistic density. In order to do anything of interest, we have to be able to perform inference in this space. The catch is that, unlike with simpler models, the integral representing our density is intractable. In general, the way this is handled in nonparametric Bayesian models can largely be divided into two camps: sampling and expectation-maximisation (EM) approaches.

1. The first approach makes use of the fact that while integrating over the whole measure space is difficult, plugging in values and retrieving an (un-normalised) probability is not. We can exploit this property by performing an intelligent random walk over the surface of the density. The idea is that if we walk for long enough, we'll obtain a reasonable representation of the surface. This is the basis for a family of inference methods called Markov-Chain Monte-Carlo (MCMC) sampling (see the sketch after this list).

2. In broad strokes, the second approach defines a family of densities capable of approximating the generating distribution of any given dataset. An iterative optimisation process then finds the distribution in this family that best matches our data (the objective is written out after this list).
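As a concrete illustration of the first camp, the following is a minimal sketch of a random-walk Metropolis-Hastings sampler. The names `metropolis_hastings` and `log_unnorm` are hypothetical; the target density here is a stand-in for our model's un-normalised log-density, not the TLG model itself.

```python
# Minimal sketch of the random-walk idea behind MCMC sampling
# (Metropolis-Hastings with a symmetric Gaussian proposal).
import numpy as np

def metropolis_hastings(log_unnorm, x0, n_steps=10_000, step_size=0.5, seed=0):
    """Random-walk Metropolis-Hastings over an un-normalised log-density."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    log_p = log_unnorm(x)
    samples = []
    for _ in range(n_steps):
        # Propose a nearby point and accept it with probability
        # min(1, p(proposal) / p(current)). Only the ratio is needed,
        # so the normalising constant never has to be computed.
        proposal = x + step_size * rng.standard_normal(x.shape)
        log_p_prop = log_unnorm(proposal)
        if np.log(rng.uniform()) < log_p_prop - log_p:
            x, log_p = proposal, log_p_prop
        samples.append(x.copy())
    return np.array(samples)

# Example: sample from an (un-normalised) standard Gaussian in 2D.
draws = metropolis_hastings(lambda x: -0.5 * np.sum(x ** 2), x0=np.zeros(2))
print(draws.mean(axis=0), draws.std(axis=0))  # should be roughly 0 and 1
```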
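For the second camp, the standard variational objective (written here with generic symbols rather than the specific TLG posterior) selects the member of a tractable family $\mathcal{Q}$ that minimises the KL divergence to the true posterior, or equivalently maximises the evidence lower bound (ELBO):

$$
q^{*} = \operatorname*{arg\,min}_{q \in \mathcal{Q}} \mathrm{KL}\!\left( q(\theta) \,\|\, p(\theta \mid x) \right)
      = \operatorname*{arg\,max}_{q \in \mathcal{Q}} \; \mathbb{E}_{q}\!\left[ \log p(x, \theta) \right] - \mathbb{E}_{q}\!\left[ \log q(\theta) \right].
$$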
## Contributions

All current Bayesian TLG formulations use MCMC-based inference methods \cite{Ahmed2011, Hong:2011du, Wang2013, Ahmed:2012vh, Li2013}. They are simple to implement and an intuitive way to explore an unknown density. On the other hand, they tend to scale poorly with both dataset size and dimensionality \cite{wainwright2008graphical, Grimmer_2010}. NLP problems in general involve large amounts of sparse, high-dimensional data. Furthermore, TLG is a summarisation task, and the value of summarisation grows with the size of the underlying data. Because of this, exploring additional inference methods is an important goal for further research.

Variational inference (a type of EM algorithm) has been shown to generate models of similar quality to MCMC methods several orders of magnitude faster \cite{wainwright2008graphical, Grimmer_2010}, yet no implementation of TLG to date uses this form of inference. For the sake of comparison, we can look to the work of Wang et al. and Bryant et al. \cite{Wang2011, Bryant2012}, who analyse variational inference for hierarchical Dirichlet processes, the foundation of our nonparametric topic models. Wang et al. apply variational methods to a dataset of over 400,000 articles \cite{Wang2011}; in contrast, the largest TLG model with MCMC inference to date used a dataset of 10,000 articles \cite{Wang2013}. Variational methods are highly specific to the underlying model, and developing a new variational inference formulation is an involved task. Nevertheless, they have the potential to massively increase the scalability of TLG models, and as such we develop our framework below.

## Inference

Derived update rules: