2016HCT Prelim.


Personal news and content curation is an exciting NLP application. Systems providing this service are often characterised by a collaborative approach that combines human and machine intelligence. As the scope of the problem increases, however, so too does the importance of automation. To this end, we propose a novel method for scoring news articles and other related content. It is natural to view this problem within a learning-to-rank framework. The training phase of our model first applies a pairwise transform, which recasts the problem from ranking a whole corpus into many individual pairwise comparisons (is article a better than article b?). This transformed set is then used to determine the optimal weights of a logistic regression model. These weights can then be applied directly to score the untransformed test set. We also perform a comprehensive review and selection process over a large range of candidate features. Our final features involve measures of centrality, informativeness, complexity and within-group similarity.
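The pairwise training procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the model's actual implementation: each article is assumed to be a fixed-length numeric feature vector with a graded relevance label, both orderings of every pair with distinct labels are kept (so the two classes are balanced), and logistic regression is fitted by plain batch gradient descent with a small L2 penalty. The function names and toy data are hypothetical.

```python
import numpy as np


def pairwise_transform(X, y):
    """Turn per-article features X (n x d) and graded labels y (n,) into
    difference vectors X[i] - X[j], labelled 1 if article i should rank above j."""
    diffs, labels = [], []
    for i in range(len(y)):
        for j in range(len(y)):
            # Skip self-pairs and ties: they carry no ordering information.
            if i != j and y[i] != y[j]:
                diffs.append(X[i] - X[j])
                labels.append(1.0 if y[i] > y[j] else 0.0)
    return np.array(diffs), np.array(labels)


def fit_logistic(Xp, yp, lr=0.1, epochs=2000, l2=1e-3):
    """Batch gradient descent on the L2-regularised logistic loss, no intercept."""
    w = np.zeros(Xp.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-Xp @ w))          # P(first article ranks higher)
        grad = Xp.T @ (p - yp) / len(yp) + l2 * w  # mean log-loss gradient + penalty
        w -= lr * grad
    return w


def score(X, w):
    """Score untransformed articles; sort by descending score to rank them."""
    return X @ w


# Hypothetical toy data: relevance rises with feature 0 and falls with feature 1.
X_train = np.array([[0.0, 1.0], [1.0, 0.8], [2.0, 0.5], [3.0, 0.1]])
y_train = np.array([0, 1, 2, 3])

Xp, yp = pairwise_transform(X_train, y_train)
w = fit_logistic(Xp, yp)

X_test = np.array([[2.5, 0.2], [0.5, 0.9]])
print(score(X_test, w))  # higher score = ranked earlier
```

Dropping the intercept is a deliberate choice here: swapping the two articles in a pair negates the difference vector, so the predicted probability becomes 1 − p, and the learned weights can be applied to a single article's features as a score.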

News Ranking/Scoring

(Phelan 2009): Using Twitter to Recommend Real-Time Topical News

  • In this short paper we will consider the problem of identifying niche topical news stories. Current recommender systems are limited in their ability to identify such stories because, typically, they rely on a critical mass of user consumption before such stories can be recognised.
  • In this paper, we consider a novel alternative to conventional recommendation approaches by harnessing a popular microblogging service such as Twitter.

(Lin 2008): Emotion Classification of Online News Articles from the Reader’s Perspective

  • In this paper, we automatically classify documents into reader-emotion categories (e.g. useful, happy, heartwarming).

(Tatar 2012): Ranking news articles based on popularity prediction

  • In this paper we address the problem of predicting the popularity of news articles based on user comments.
  • Our results indicate that prediction methods improve the ranking performance and we observed that for our dataset a simple linear predictor is best.
  • In this paper we consider the number of comments as an implicit evaluator of the interest generated by an article.
  • A common characteristic of online content is that it suffers from a decay of interest over time; depending on the type of content, this decay may be steep or gradual.
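Why prediction can improve ranking over raw comment counts can be sketched as follows. Articles observed at different ages are not directly comparable, since a young article has had less time to accumulate comments. This hypothetical sketch (an assumption for illustration, not the predictor used by Tatar et al.) fits, for each age in hours, a coefficient alpha such that final count ≈ alpha × count at that age, by least squares through the origin on fully observed historical articles, and then ranks current articles by predicted final count.

```python
from collections import defaultdict


def fit_age_coefficients(history):
    """history: list of (cumulative_counts, final_count) pairs, where
    cumulative_counts[a] is the comment count a hours after publication.
    Returns {age: alpha} with final_count ≈ alpha * cumulative_counts[age]."""
    num, den = defaultdict(float), defaultdict(float)
    for counts, final in history:
        for age, c in enumerate(counts):
            num[age] += c * final
            den[age] += c * c
    # Least-squares slope through the origin for each age with non-zero data.
    return {age: num[age] / den[age] for age in num if den[age] > 0}


def rank_current(articles, alpha):
    """articles: list of (article_id, age_in_hours, comments_so_far).
    Returns ids ordered by predicted final comment count, highest first."""
    predicted = {aid: alpha[age] * count for aid, age, count in articles}
    return sorted(predicted, key=predicted.get, reverse=True)


history = [([2, 5, 8], 10), ([1, 3, 4], 5)]  # hypothetical toy data
alpha = fit_age_coefficients(history)
current = [("A", 0, 3), ("B", 2, 8)]
print(rank_current(current, alpha))
```

Here B has more comments so far, but A is much younger; the per-age coefficients predict a final count of 15 for A and 10 for B, so A is ranked first even though a ranking by raw counts would put B first.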

(Liu 2007): Algorithm for Ranking News

  • Based on an examination of the properties of news articles, the ranking function combines semantic relevancy, freshness, citation count and degree of authority into one model, and a notion of extended relevance is proposed.
  • To measure semantic relevancy, the traditional vector space model is modified to take time into account.
  • Set similarity metric.
  • Hard set authority score.
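A ranking function that combines these four signals can be sketched as a weighted sum. This is a hypothetical illustration, not Liu's formulation: relevancy is plain bag-of-words cosine similarity, time enters through a separate exponential freshness term (whereas the paper folds time into the vector model itself), citation counts are squashed with log1p, authority is assumed to be pre-normalised to [0, 1], and the weights and half-life are arbitrary placeholders.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def freshness(age_hours, half_life=24.0):
    """Exponential decay: 1.0 when just published, 0.5 after one half-life."""
    return 0.5 ** (age_hours / half_life)


def combined_score(query, doc_tokens, age_hours, citations, authority,
                   weights=(0.4, 0.2, 0.2, 0.2)):
    """Weighted sum of relevancy, freshness, citation count and source authority.
    log1p keeps a few highly cited items from dominating the score."""
    w_rel, w_fresh, w_cite, w_auth = weights
    rel = cosine(Counter(query), Counter(doc_tokens))
    return (w_rel * rel + w_fresh * freshness(age_hours)
            + w_cite * math.log1p(citations) + w_auth * authority)


query = ["election", "result"]
s1 = combined_score(query, ["election", "result", "announced"],
                    age_hours=2, citations=5, authority=0.8)
s2 = combined_score(query, ["football", "result"],
                    age_hours=48, citations=1, authority=0.3)
print(s1, s2)
```

The first article is more relevant, fresher, more cited and from a more authoritative source, so it scores higher under any positive weights.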

(Del Corso 2005): Ranking a Stream of News

  • The proposed ranking algorithm ranks news information, finding the most authoritative news sources and identifying the most interesting events in the different categories to which a news article belongs.
  • The complexity of our algorithm is linear in the number of pieces of news still under consideration at the time of a new posting. This allows a continuous online ranking process.
  • Our ranking scheme depends on two parameters: ρ, accounting for the decay rate of freshness of news articles, and β, which gives the amount of a source’s rank we want to transfer to each posted piece of news.
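The roles of ρ and β, and why each posting costs linear time, can be sketched with a simplified streaming ranker. This is a hypothetical sketch in the spirit of the notes above, not Del Corso et al.'s actual update equations: each live item's rank decays as exp(−ρ·Δt), a source's rank is taken to be the total rank of its live items, a new item starts at 1 plus β times its source's rank, and the `min_rank` cut-off for dropping stale items is an invented parameter.

```python
import math


class NewsStream:
    """Simplified streaming ranker (hypothetical): rho is the freshness decay rate,
    beta the share of a source's rank transferred to each new post."""

    def __init__(self, rho=0.1, beta=0.5, min_rank=1e-3):
        self.rho, self.beta, self.min_rank = rho, beta, min_rank
        self.news = {}          # news_id -> [source, rank]
        self.last_time = 0.0

    def _decay(self, now):
        # One pass over the live items (linear time); items whose rank falls
        # below min_rank are no longer under consideration.
        factor = math.exp(-self.rho * (now - self.last_time))
        for nid in list(self.news):
            self.news[nid][1] *= factor
            if self.news[nid][1] < self.min_rank:
                del self.news[nid]
        self.last_time = now

    def source_rank(self, source):
        # A source's rank is the total rank of its live news items.
        return sum(r for s, r in self.news.values() if s == source)

    def post(self, news_id, source, now):
        self._decay(now)
        # New item: base rank 1 plus a beta share of its source's current rank.
        self.news[news_id] = [source, 1.0 + self.beta * self.source_rank(source)]

    def top(self, k):
        return sorted(self.news, key=lambda n: self.news[n][1], reverse=True)[:k]


s = NewsStream(rho=0.1, beta=0.5)
s.post("a", "reuters", now=0)
s.post("b", "blog", now=1)
s.post("c", "reuters", now=2)
print(s.top(3))
```

Item c outranks b because it inherits part of the rank its source earned through a, while a itself has decayed the longest; a larger ρ makes this decay steeper, and a larger β rewards established sources more.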