# Contributions

Timeline generation is a tool with incredible potential: it provides a broad and general framework for intelligently summarising heterogeneous data. However, as it stands, all current TLG models fall short in two areas: evaluation and inference. This is the gap in the literature we seek to fill, and the methodologies we develop to address it make up our two primary contributions to the state-of-the-art.

1. We provide a scientifically rigorous framework for evaluating the quality of a timeline. No current approach in the literature is satisfactory. This is concerning, as the evaluative process is in some sense the fundamental pillar upon which our models rest: any conclusion we draw about our models depends on the correctness of our evaluation framework. Yet, we argue below that all current methods fall short in one of several ways. We use these shortcomings to motivate the development of our evaluation pipeline, and argue that our approach balances cost and correctness.

2. We also present a novel method for performing inference on timeline models. No current implementation uses more than a few thousand articles. This is unsatisfactory considering the wealth of information available to us. The primary reason for this is the inference bottleneck: current methods all use some form of Gibbs sampling, and these methods (especially simpler implementations thereof) are known to scale poorly. We seek to improve performance in this area by leveraging an alternative approach: variational Bayesian inference. This family of techniques has been shown to outclass Gibbs methods in scalability while maintaining comparable performance \cite{Grimmer_2010}; Wang et al. \cite{Wang2011}, for example, achieved excellent performance on a dataset of 400,000 articles, an order of magnitude larger than any sampling-based inference on the TLG problem. The catch is that, unlike the generalisability and ease of understanding that characterise sampling-based approaches, applying variational methods to a new domain is an involved, technical endeavour (the sketch after this list illustrates the contrast on a toy model).
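
To make the Gibbs-versus-variational contrast concrete, here is a minimal sketch — not drawn from any TLG implementation — comparing Gibbs sampling with coordinate-ascent variational inference (CAVI) on a toy one-dimensional Gaussian mixture with unit observation variance and component-mean priors of N(0, sigma2). All names and parameter choices (`K`, `sigma2`, the iteration counts) are illustrative assumptions, not values from the literature cited above.

```python
# Toy contrast of the two inference strategies discussed above.
# Gibbs: draw cluster assignments and means from their conditionals.
# CAVI: iterate closed-form updates on a factorised approximation.
import numpy as np

rng = np.random.default_rng(0)
K, sigma2 = 3, 10.0                       # illustrative choices
true_mu = np.array([-4.0, 0.0, 5.0])
x = np.concatenate([rng.normal(m, 1.0, 200) for m in true_mu])

def gibbs(x, iters=200):
    mu = rng.normal(0, 1, K)
    for _ in range(iters):
        # Sample each assignment c_i | mu; one draw per datum per sweep.
        logp = -0.5 * (x[:, None] - mu[None, :]) ** 2
        p = np.exp(logp - logp.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        c = np.array([rng.choice(K, p=pi) for pi in p])
        # Sample mu_k | c from its conjugate Gaussian conditional.
        for k in range(K):
            nk = (c == k).sum()
            var = 1.0 / (1.0 / sigma2 + nk)
            mu[k] = rng.normal(var * x[c == k].sum(), np.sqrt(var))
    return mu

def cavi(x, iters=50):
    m = rng.normal(0, 1, K)               # variational means of q(mu_k)
    s2 = np.ones(K)                       # variational variances of q(mu_k)
    for _ in range(iters):
        # Update q(c_i): phi_ik proportional to exp(E_q[log N(x_i; mu_k, 1)]),
        # dropping terms constant across k; E_q[mu_k^2] = s2_k + m_k^2.
        logphi = x[:, None] * m[None, :] - 0.5 * (s2 + m ** 2)[None, :]
        phi = np.exp(logphi - logphi.max(axis=1, keepdims=True))
        phi /= phi.sum(axis=1, keepdims=True)
        # Update q(mu_k): deterministic, fully vectorised, no sampling loop.
        denom = 1.0 / sigma2 + phi.sum(axis=0)
        m = (phi * x[:, None]).sum(axis=0) / denom
        s2 = 1.0 / denom
    return m

print("Gibbs mu estimates:", np.sort(gibbs(x)))
print("CAVI  mu estimates:", np.sort(cavi(x)))
```

Note how the CAVI updates reduce to deterministic matrix operations: this is what lets variational schemes exploit batched linear algebra at scale, whereas the per-datum draws inside the Gibbs sweep dominate its cost. The flip side, as argued above, is that the update equations had to be derived by hand for this specific model, while the Gibbs sampler follows mechanically from the conditionals.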