A Naive Bayes classifier.

As the proportion of data used for training increases, the Kappa statistic tends to increase as well.
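The Kappa statistic measures how much the classifier's agreement with the true labels exceeds what chance alone would produce. A minimal sketch, assuming scikit-learn (the text does not name a library) and a small made-up label set:

```python
# Cohen's kappa: agreement between predictions and true labels,
# corrected for chance agreement. The labels below are illustrative.
from sklearn.metrics import cohen_kappa_score

y_true = [0, 1, 1, 0, 1, 0, 1, 1]
y_pred = [0, 1, 1, 0, 0, 0, 1, 1]  # one disagreement

kappa = cohen_kappa_score(y_true, y_pred)
print(f"Kappa: {kappa:.2f}")  # observed agreement 0.875, chance 0.5 -> 0.75
```

A kappa of 1 means perfect agreement; 0 means no better than chance.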

Note the separate section for the results of cross-validation. Cross-validation splits the data into sections (folds) for training and testing. In 10-fold cross-validation, the data is split into 10 folds: nine are used to train and one to test, and the process is repeated so that each fold serves as the test set once. Overall, this gives a more reliable estimate of a classifier's performance. However, as a dataset grows, cross-validation may become infeasible because of the repeated training and testing it requires. If you use 100% of the dataset for training and evaluate on that same data, you risk overfitting: the classifier fits that particular dataset and is less able to handle new, unseen data.
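The procedure above can be sketched in a few lines. This assumes scikit-learn and its bundled Iris dataset, neither of which is named in the text:

```python
# 10-fold cross-validation of a Naive Bayes classifier.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)

# cv=10 splits the data into 10 folds: train on 9, test on the
# remaining 1, repeating so each fold is the test set exactly once.
scores = cross_val_score(GaussianNB(), X, y, cv=10)

print(f"Per-fold accuracy: {scores.round(2)}")
print(f"Mean accuracy: {scores.mean():.2f}")
```

The mean of the per-fold scores is the cross-validated estimate of performance; its spread across folds hints at how sensitive the classifier is to the particular training split.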