Xavier Holt edited Abstract_Personal_news_and_content__.md  over 8 years ago

Commit id: 4bb421dfab38170dbf3e488e1917536f82aa8622

deletions | additions      

       

# Abstract  Personal news and content curation is an exciting NLP application. Systems providing this service are often characterised by a collaborative approach that combines human and machine intelligence. As the scope of the problem increases however, so too does the importance of automation. To this end we propose a novel method for scoring news articles and other related content. It is natural to frame view  this problem in a learning-to-rank framework. The training phase of our model first makes use of a pairwise transform. This alters the problem from the ranking of a whole corpus to many individual pairwise comparisons (is article 'a' better than article 'b'). This transformed set is then used to determine the optimal weights in a logistic regression model. These can then be used directly to classify the non-transformed test set. We also perform a comprehensive review and selection process on a large range of candidate features. Our final features involve measures of centrality, informativeness, complexity and within-group similarity.