The Framework

We address these issues with the following general evaluative framework (Fig. 3):

  1. Before evaluation begins, clearly indicate how:
    • candidate entities are chosen,
    • gold-standards are to be generated,
    • articles are to be linked to entities, and
    • crowd-voting is to be employed.
  2. Choose a sufficient number (>20) of entities that are representative of your domain, and split them into training and test sets (see the first sketch after this list).
  3. As parameter selection generally involves comparing a very large number of models, this stage is not well suited to crowd evaluation. Within the training phase we therefore use whichever automatic methods are available to us. All exploratory analysis and model selection is to be done strictly on the training set.
  4. The result of the above process should be a handful of models of interest that we seek to compare. Once we have these, we use them to generate system timelines for each of the test entities.
  5. Begin the final testing phase. This can involve automatic methods where appropriate, but if we seek to demonstrate the overall expressiveness and clarity of our timelines, we should always include a human evaluation step (see the second sketch after this list).
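
A minimal sketch of steps 2-3 is given below: the candidate entities are shuffled and split into training and test sets, and candidate models are then ranked on the training split using an automatic metric only. The names here (entities, models, automatic_score, the 50/50 split) are illustrative placeholders rather than part of the framework itself.

```python
# Sketch of steps 2-3: entity split, then automatic model selection on the
# training split only. All names and defaults are illustrative assumptions.
import random


def split_entities(entities, test_fraction=0.5, seed=0):
    """Shuffle the candidate entities and split them into train/test sets."""
    rng = random.Random(seed)
    shuffled = list(entities)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def select_models(models, train_entities, automatic_score, keep=3):
    """Rank candidate models by mean automatic score over the training
    entities and keep only a handful for the final comparison."""
    ranked = sorted(
        models,
        key=lambda m: sum(automatic_score(m, e) for e in train_entities)
        / len(train_entities),
        reverse=True,
    )
    return ranked[:keep]
```
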
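The second sketch illustrates steps 4-5 under the same caveat: a system timeline is generated per test entity for each shortlisted model, and automatic scores are then collected alongside human (crowd) ratings. generate_timeline, automatic_score and crowd_rating are hypothetical callables standing in for whichever systems and evaluation interfaces are actually used.

```python
# Sketch of steps 4-5: timeline generation for the test entities, followed by
# paired automatic and human evaluation. All callables are placeholders.
def evaluate_on_test(models, test_entities, generate_timeline,
                     automatic_score, crowd_rating):
    """Pair automatic scores with human ratings for every shortlisted model.

    `models` maps a model name to whatever object generates timelines; the
    remaining arguments are placeholder callables for the systems under test
    and the evaluation interfaces.
    """
    results = {}
    for name, model in models.items():
        timelines = {e: generate_timeline(model, e) for e in test_entities}
        results[name] = {
            "automatic": {e: automatic_score(t) for e, t in timelines.items()},
            "human": {e: crowd_rating(t) for e, t in timelines.items()},
        }
    return results
```

Keeping the automatic and human scores side by side in the output makes it straightforward to report both, as step 5 requires, without re-running the systems.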