ROUGH DRAFT authorea.com/109159

# Introduction

We first introduce the problem and examine several domains where it has been examined. We then address the two primary research questions that arise when comparing different methods: how best to model and evaluate timelines. In each of these two areas we organise the key papers thematically. The contributions of each paper are synthesised into a general discussion about the pros and cons of each approach.

## The Problem

Timeline generation (TLG) is a way of representing a large amount of temporally dependent information concisely. It is query driven; we retrieve a corpus of text linked to some entity, event or other term. The canonical TLG model clusters these articles into topics or stories, selects the most important of these clusters and returns timestamped summaries. It can be seen as a generalisation of the multi-document summarisation task, where we have introduced temporal dependency and structure.

Clustering: Our model handles both current and historical articles. Several domains where the TLG model is desirable (e.g. news) are characterised by a large historical catalog and high frequency. Our clustering model has to be scaleable and ideally functional in a streaming context.

Topic models: Incorporating a topic structure into our model gives us an intuitive and understandable document representation. Looking at the topic distribution also lets us prioritise certain kinds of sentences or stories.

Timelines: TLG differs from regular multi-document summarisation in its temporal dependence. As such, we value diversity in both the kind of content selected as well as when it was published.

Summarisation: While clustering and topic modeling are useful structuring tools, we must still compress the data for human readability.

## Applications

Timeline generation has been applied in a range of domains. Several approaches focus on specific sub-problems in TLG. Althoff et al. timelines entities in the Freebase repository (Althoff 2015). This knowledge base is comprised of subject and objects linked by predicates; 'Robert Downey Junior' might be linked by 'Starred In' to 'The Avengers. The TLG model seeks to group and select a subset of these subject-predicates that summarise our query entity. Working from knowledge-base instead of a corpus of text provides useful structure. We are spared the task of preprocessing, entity linking, consolidation and data validation on our text corpora. Of course, this relies on the existence of a knowledge base of timestamped facts about our query entity. This is an issue if we want to be able to perform queries on lesser known entities. A similar approach was used by Ahmed et al. on a corpus of academic papers (Ahmed 2012).

Single-document variations of the TLG model have been applied to Wikipedia and Wikinews articles (Bauer 2015, Minard 2015). The single-document nature of the task is characterised by a focus on the summarisation and selection tasks over clustering. Timelines took the form of timestamped subsets of the document sentences. A similar approach has been applied to Twitter feeds; Hong et al. applied a single-document version of the TLG model to a user's tweet-history (Hong 2011).

The most frequent application of TLG models is to a corpus of news articles (Chieu 2004, Hong 2011, Yan 2011, Yan 2011a, Allan 2001, Swan 2000, Ahmed 2011). A document retrieval service is given a query term; a person, event, company or generic term. The body of documents retrieved form the input for the TLG task. The task then is to group and summarise a potentially large body of articles and provide a compact representation of the query term.

# Research Methods in Timeline Modelling

Several approaches have been used to model timelines and the TLG problem. They can be roughly divided into optimisation based approaches and probabilstic approaches. Our focus and the majority of our analysis is on the latter. They are the more prevalent approach and have several highly desirable attributes in terms of post-hoc querying and representation.

## Optimisation Based Approaches

Optimisation based approaches develop a function that measures sentence importance. This has to account for relevancy and diversity -- both in terms of content and temporality. Once this function has been formulated, inference occurs through maximisisation.

### Simple Optimisation

Several approaches formulate TLG as a simple optimisation problem. Althoff et al. and Chieu et al. both define objective functions on the sentence level which include signals of relevance and diversity (Althoff 2015, Chieu 2004). Numeric methods are used to find the sentences maximising this objective. This approach forgoes the topic and clustering structure present in other methodologies. The advantage is that model construction and timeline generation is relatively simple. Without any additional structure on the data it can be hard to determine a valid and representative objective function. This structure can also be useful in and of itself, discussed further in the subsequent sections.

### PageRank Framework

In a similar manner some academics have applied a PageRank-esque framework to TLG (Yan 2011, Yan 2011a, Tran 2015). These models operate through maximising an objective function, but are characterised by an underlying graph structure. Sentences are linked to one another and ranked based on the 'votes' or 'preferences' they receive from their neighbours. Often inter- and intra- document sentences are represented and weighted differently to improve performance (Wan 2007). Temporal dependence is induced through the particulars of graph construction; Yan et al. for example models this component through intra- and inter- date edges (Yan 2011, Yan 2011a). The PageRank approach is more structured than a simple optimisation approach, and comes at the cost of additional model building complexity. The tradeoff is that the graph-structure is argued as being an intuitive method for representing the problem (Wan 2007).

## Simple Probabilistic Models

The earliest attempts at approaching the TLG problem were simple probablistic models. Allan et al.'s approach was a first attempt at generalising sentence selection to include temporal dependence (Allan 2001). The model ranked documents by defining a density function on a sentence level measuring the probability it was "useful and novel". As a simplifying assumption, usefulness and novelness were assumed independent, that is $$P\left(\text{useful\landnovel}\right) = P\left(\text{useful}\right)P\left(\text{novel}\right)$$. The features were modeled through simple counting statistics and sentence similarities -- a 'novel' document was dissimilar to documents appearing earlier. Inference involved emitting all sentences with $$P\left(\text{useful\landnovel}\right) > \theta$$ for some parameter $$\theta$$.

The advantage of this model is that we immediately retrieve some measure of uncertainty. Optimisation based approaches could emit uninformative sentences that are maximum by their cost function. A well implemented probability based approach indicates not just the most desirable sentence but also some interpretable measure of ipso-factor relevance.

## Bayesian Models

Several authors model TLG as a Bayesian probability network (Ahmed 2011, Hong 2011, Wang 2013, Ahmed 2012, Li 2014). In all cases, a hierarchal topic model is placed over the documents. There have been both parametric and non-parametric approaches. The models also differ in how they handle temporality and clustering.

### Topic Models

Topics are represented as a distribution over the words in the corpus. A document in turn is represented as a mixture of these topics. In the hierarchal model, topics are modeled in a tree-structure. The closer two topics in the tree, the more similar their word-distributions. Topics closer to the root are broad, and descendants in the tree are sub-topics. For example, the story 'Robert Downey Junior stars in the Avengers' might be modeled