Xavier Holt edited The_Baseline_Model_as_a__.md  almost 8 years ago

Commit id: b2e2ee9138c0b33c5e28f2a7ef8061c94f47b426


### Results

Our best AUC score of `0.84` was obtained by an rF model trained on the full set of features **(Fig. ?)**. We include the full ROC curve for this configuration **(Fig. ?)**.

The rF models uniformly outperformed their logReg counterparts. They were also particularly good at consolidating the different features: in contrast to the logReg model, adding a feature to the rF model never decreased performance. The logReg model made particularly poor use of the 'freshness/recency' feature, which was noisy and contained several large outliers. Because rF models split on feature thresholds, they are largely insensitive to the magnitude of extreme values, so this finding is unsurprising **(Fig. ?)**.

* Performance is reasonable, but the model is very simplistic.
* Combining articles that are each independently likely to be useful gives no guarantee about overall quality or coverage.
* We don’t directly model the diversity of articles.
* We don’t account for the temporal aspect.
* BUT: the binary version of the model is a useful component in a more structured model.
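
For readers unfamiliar with the metric reported above, the following is a minimal sketch of how an ROC curve and its AUC can be computed from binary labels and model scores. It is not the evaluation code used for these results: the function names `roc_points` and `auc`, and the toy labels and scores, are illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def roc_points(labels: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) points, one per distinct score."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")

    # Sort by descending score: lowering the threshold admits items in this order.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])

    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every item tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy data (hypothetical): 1 = useful article, 0 = not useful.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(round(auc(roc_points(toy_labels, toy_scores)), 3))  # 0.889
```

An AUC of 0.5 corresponds to a ranking no better than chance and 1.0 to a perfect ranking, which is the scale on which the `0.84` above should be read.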