Xavier Holt edited The_backbone_of_our_baseline__.md  almost 8 years ago

Commit id: f7308829736afbc290ef6ea6c9383a4da26671e4

## Evaluation

The backbone of our baseline model is a binary classifier. We analyse the performance of this classifier in isolation: by showing that the atomic elements of the baseline model perform well, we argue that the model as a whole is a reasonable benchmark.

### Dataset and Experimental Design

We take one thousand briefs from August 2015, five hundred from November 2015 and five hundred from December 2015. These constitute our training, tuning and test sets respectively. We constructed a family of classifiers on the training set under different hyperparameter configurations, compared them by their performance on the tuning set, and evaluated the strongest model on the test set. This test-set score is the one presented in all results below.
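
As a concrete illustration of this selection protocol, the sketch below fits a few candidate configurations on the training split, picks a winner on the tuning split, and scores only that winner on the test split. Python with scikit-learn is an assumption (the draft names no tooling), and the synthetic data stands in for the real briefs.

```python
# A minimal sketch of the train/tune/test protocol described above.
# scikit-learn and the synthetic data are assumptions for illustration;
# the real feature matrices would be built from the monthly briefs.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, y_train = X[:1000], y[:1000]        # stand-in for the Aug 2015 briefs
X_tune, y_tune = X[1000:1500], y[1000:1500]  # stand-in for Nov 2015
X_test, y_test = X[1500:], y[1500:]          # stand-in for Dec 2015

# Candidate hyperparameter configurations (illustrative values only).
candidates = [
    LogisticRegression(penalty="l1", C=1.0, solver="liblinear"),
    LogisticRegression(penalty="l2", C=0.1),
    RandomForestClassifier(max_depth=8, n_estimators=200, random_state=0),
    RandomForestClassifier(max_depth=16, n_estimators=200, random_state=0),
]

def tune_auc(model):
    """Fit on the training set and return AUC on the tuning set."""
    model.fit(X_train, y_train)
    return roc_auc_score(y_tune, model.predict_proba(X_tune)[:, 1])

# Select on the tuning set; only the winner is ever scored on the test set.
best = max(candidates, key=tune_auc)
best.fit(X_train, y_train)
test_auc = roc_auc_score(y_test, best.predict_proba(X_test)[:, 1])
print(f"test AUC = {test_auc:.3f}")
```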

Our experimental parameters were the set of features used and the type of classifier. We tested a range of feature-subset configurations (indicated below) and compared a logistic-regression (logReg) model against one based on random forests (rF). The hyperparameters of the logReg model were the penalty metric (\(\ell^1\), \(\ell^2\) or mixed norms) and the regularisation parameter; for the rF model we optimised over maximum tree depth.

### Results

We see that our best AUC score of `0.84` was achieved by an rF model trained on the full set of features **(Fig. ?)**. We include the full ROC curve for this configuration **(Fig. ?)**.
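
The ROC curve for the winning configuration could be produced along the following lines. This continues the hypothetical sketch above (reusing `best`, `X_test` and `y_test`) and assumes matplotlib for plotting.

```python
# Sketch of the ROC curve for the selected model (continues the example above).
import matplotlib.pyplot as plt
from sklearn.metrics import roc_auc_score, roc_curve

scores = best.predict_proba(X_test)[:, 1]
fpr, tpr, _ = roc_curve(y_test, scores)

plt.plot(fpr, tpr, label=f"rF, all features (AUC = {roc_auc_score(y_test, scores):.2f})")
plt.plot([0, 1], [0, 1], linestyle="--", label="chance")
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```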

We note some limitations of this binary formulation:

* Combining articles which independently are likely to be useful doesn't give us any guarantee about the overall quality/coverage of the set.
* We don't directly model the diversity of the selected articles.
* We don't account for the temporal aspect.
* BUT: the binary version of the model is a useful component in a more structured model.