Personal news and content curation is an exciting NLP application. Systems providing this service are often characterised by a collaborative approach that combines human and machine intelligence. As the scope of the problem increases, however, so too does the importance of automation. To this end, we propose a novel method for scoring news articles and other related content. It is natural to view this problem in a learning-to-rank framework. The training phase of our model first applies a pairwise transform, which changes the problem from ranking a whole corpus to making many individual pairwise comparisons (is article 'a' better than article 'b'?). The transformed set is then used to determine the optimal weights of a logistic regression model. These weights can then be used directly to classify the non-transformed test set. We also perform a comprehensive review and selection process on a large range of candidate features. Our final features involve measures of centrality, informativeness, complexity and within-group similarity.
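The pairwise transform and the logistic regression step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`pairwise_transform`, `fit_logistic`, `score`), the toy feature values, the learning rate and the number of epochs are all assumptions made for illustration, and plain batch gradient descent stands in for whatever optimiser the paper actually uses.

```python
import math
from itertools import combinations


def pairwise_transform(X, y):
    """Turn (feature vector, relevance) rows into difference vectors with binary labels."""
    diffs, labels = [], []
    for i, j in combinations(range(len(X)), 2):
        if y[i] == y[j]:
            continue  # ties carry no preference information
        d = [a - b for a, b in zip(X[i], X[j])]
        label = 1 if y[i] > y[j] else 0
        # Flip every second pair so that both classes are represented.
        if len(diffs) % 2 == 1:
            d, label = [-v for v in d], 1 - label
        diffs.append(d)
        labels.append(label)
    return diffs, labels


def fit_logistic(X, y, lr=0.5, epochs=500):
    """Batch gradient descent for logistic regression without an intercept."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, t in zip(X, y):
            z = sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))
            for k, xk in enumerate(x):
                grad[k] += (p - t) * xk
        w = [wi - lr * g / len(X) for wi, g in zip(w, grad)]
    return w


def score(w, x):
    """Score one non-transformed item with the learned weights."""
    return sum(wi * xi for wi, xi in zip(w, x))


# Hypothetical toy data: two features per article, higher relevance is better.
X = [[0.9, 0.3], [0.7, 0.5], [0.4, 0.2], [0.1, 0.4]]
y = [3, 2, 1, 0]

pairs, pair_labels = pairwise_transform(X, y)
weights = fit_logistic(pairs, pair_labels)

# The weights learned on pairs are applied to the original items.
ranking = sorted(range(len(X)), key=lambda i: score(weights, X[i]), reverse=True)
print(ranking)
```

Because the model has no intercept, the learned weight vector acts as a linear scoring function: if it predicts that article 'a' beats article 'b' on the difference vector, it also assigns 'a' the higher individual score, which is why the weights can be reused on the non-transformed items.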
describing a model for recognizing these phenomena in social media, such as “tweets”
five data sets retrieved from Twitter taking advantage of user-generated tags, such as “#humor” and “#irony”
irony detection [44,45,10,35], satire detection, and sarcasm detection [43,18]
decision tree + frequency-weighted term vector
when considering the whole set of features, humor reaches up to 93% accuracy (Table 3), whereas irony markedly improves its score, reaching up to 90% in its best result for binary classification
multi-class problem was 80% accuracy for both
the role played by the last feature (emotional scenarios) on the classifications is significant. Considering the three categories (activation, imagery, pleasantness)