How to Train Your AI PA: A Novel Approach to Timeline Evaluation and Inference


We address two problems for timeline models: how timelines are evaluated and how inference is performed. We develop a new evaluation framework and argue for its theoretical soundness. We also improve on the state of the art in model inference, making the first attempt in the literature to apply variational inference methods to the timeline generation problem. In doing so we generate timelines of competitive quality with speed-ups of an order of magnitude.


We live in a time when information is more readily available, and in greater quantities, than ever before. The question is how we can make the best use of it. One recent method for summarising massive amounts of data and presenting it accessibly is timeline generation.

Timeline generation (TLG) represents a large amount of temporally dependent information concisely. It is query driven: we retrieve a corpus of text linked to some entity, event or other term, then select a number of its constituent sentences, timestamp them and return them as output (Fig 1). The canonical TLG model makes this selection by fitting a topic model over the corpus, which is then used to cluster the articles into stories. The most relevant of these stories are selected and summarised through some form of sentence selection. TLG can therefore be seen as a generalisation of the multi-document summarisation task, with added temporal dependency and structure.
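To make the shape of this pipeline concrete (cluster, rank stories, select sentences, order by time), the following is a minimal sketch under heavy simplifying assumptions. It is not the canonical model itself: the topic model is replaced by a deliberately crude stand-in that assigns each sentence to its most corpus-frequent content word, story relevance is assumed to be story size, and sentence selection is assumed to be word overlap with the rest of the story. All names (`Sentence`, `cluster_into_stories`, `summarise`, `generate_timeline`, `STOP`) are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date

# Hypothetical, minimal stop-word list used only for this sketch.
STOP = {"the", "a", "an", "of", "in", "on", "to", "and", "is", "was", "for", "at", "as", "with", "by"}


@dataclass
class Sentence:
    text: str
    when: date  # timestamp attached to the sentence


def tokens(text):
    """Lower-case content words with surrounding punctuation stripped."""
    words = (w.strip(".,;:!?'\"").lower() for w in text.split())
    return [w for w in words if w and w not in STOP]


def cluster_into_stories(sentences, query):
    """Stand-in for the topic model (an assumption, not LDA or similar):
    each sentence joins the story named by its most corpus-frequent word,
    ignoring the query term itself."""
    corpus_counts = Counter(w for s in sentences for w in set(tokens(s.text)))
    stories = {}
    for s in sentences:
        words = [w for w in tokens(s.text) if w != query]
        if not words:
            continue
        # Ties on frequency are broken alphabetically for determinism.
        topic = max(words, key=lambda w: (corpus_counts[w], w))
        stories.setdefault(topic, []).append(s)
    return stories


def summarise(story):
    """Sentence selection: pick the sentence whose words are most shared
    across the story (ties broken by text for determinism)."""
    story_counts = Counter(w for s in story for w in set(tokens(s.text)))
    return max(story, key=lambda s: (sum(story_counts[w] for w in set(tokens(s.text))), s.text))


def generate_timeline(sentences, query, k=2):
    """Return (date, text) pairs for the k most relevant stories, in time order."""
    stories = cluster_into_stories(sentences, query.lower())
    # Assumed relevance measure: larger stories first, earlier stories on ties.
    top = sorted(stories.values(), key=lambda st: (-len(st), min(s.when for s in st)))[:k]
    picked = [summarise(st) for st in top]
    return sorted((s.when, s.text) for s in picked)


if __name__ == "__main__":
    corpus = [
        Sentence("Obama wins election", date(2008, 11, 4)),
        Sentence("Obama election victory celebrated", date(2008, 11, 5)),
        Sentence("Obama signs healthcare bill", date(2010, 3, 23)),
        Sentence("Healthcare reform passes Congress", date(2010, 3, 21)),
        Sentence("Obama visits Hawaii", date(2011, 1, 2)),
    ]
    for when, text in generate_timeline(corpus, "Obama", k=2):
        print(when.isoformat(), text)
```

On this toy corpus the sketch forms an "election" story and a "healthcare" story, keeps both, and returns one sentence from each in date order. Swapping the stand-in clustering for a genuine topic model, and the overlap score for a proper summariser, would leave the surrounding pipeline unchanged.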

In this paper, we first outline the canonical timeline generation model, surveying the domains where it has been applied as well as its statistical foundation. From this survey we identify two fundamental issues with current implementations: the process by which timelines are evaluated, and how inference is performed. We offer novel solutions to both. We provide a rigorous framework for evaluating the quality of a timeline, and we develop a novel method for performing inference on timeline models. We use the former to evaluate the latter, and present our full methodological design and results. Finally, we include two appendices. The first outlines the development and evaluation of our baseline model, which is integral to the evaluative process but nevertheless distinct from the rest of the paper. The second covers the ROUGE metric in more detail.

Fig 1: Example of a timeline for the entity 'Obama'.


Timeline generation has been applied in a range of domains.

Single-document variations of the TLG model have been applied to Wikipedia and Wikinews articles (Bauer 2015, Minard 2015). Because only one document is involved, these variants focus on summarisation and sentence selection rather than clustering. The resulting timelines took the form of timestamped subsets of the document's sentences. A similar approach has been applied to Twitter feeds: Hong et al. applied a single-document version of the TLG model to a user's tweet history (Hong 2011).

The most frequent application of TLG models is to a corpus of news articles (Chieu 2004, Hong 2011, Yan 2011, Yan 2011a, Allan 2001, Swan 2000,