
CV Scores in Classification Learning

In binary classification, the contingency table summarizes all of the learner’s training performance. It counts the number of samples that fall into each of the four groups that arise when the model’s predicted labels are compared with the observed labels.

Let \(\hat{f}\) be the model fit to the data with a CV procedure. Every sample has a target value \(y\) and a predicted outcome \(\hat{y}\), and since each of these two variables takes one of two values, there are only four possible outcomes. We can sort the model’s predicted outcomes \(\hat{y}\) into the positive (\(\hat{P}\)) or negative (\(\hat{N}\)) category, and likewise the actual targets into the positive (\(P\)) or negative (\(N\)) category.

To assess the performance of the classification algorithm and choose the best model, we must decide how the CV procedure will compare two different models. The idea is to quantify the mismatch between the target and the predicted value. In this setting, loss functions are also known as scores, measures or utility functions, and they are built by counting how many times an algorithm misclassifies instances and where the misclassification happens. To visualize this, a confusion table with the following format is drafted:

                          Predicted \(\hat{P}\) \((0)\)   Predicted \(\hat{N}\) \((1)\)
    Actual \(P\) \((0)\)            \(TP\)                          \(FN\)
    Actual \(N\) \((1)\)            \(FP\)                          \(TN\)

In the confusion table, cell values count the number of instances that fall into each of the four possible outcomes, and scores are constructed from these values. The focus will be on measuring the \(FP\) and \(FN\) volumes.
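
As a minimal sketch of how these counts can be tallied (assuming NumPy arrays of binary labels and, following the table above, the positive class encoded as \(0\); the function name is illustrative, not from the source):

    import numpy as np

    def confusion_counts(y_true, y_pred, positive=0):
        """Tally the four confusion-table cells for binary labels.

        The positive class is assumed to be encoded as 0, matching the
        P (0) / N (1) convention used in the table above.
        """
        y_true = np.asarray(y_true)
        y_pred = np.asarray(y_pred)
        tp = int(np.sum((y_true == positive) & (y_pred == positive)))  # true positives
        fn = int(np.sum((y_true == positive) & (y_pred != positive)))  # false negatives
        fp = int(np.sum((y_true != positive) & (y_pred == positive)))  # false positives
        tn = int(np.sum((y_true != positive) & (y_pred != positive)))  # true negatives
        return tp, fn, fp, tn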

Some of the most commonly used metrics include the following (a computational sketch is given after the list):

  • True Positive Rate (Recall): \(\frac{TP}{P} = \frac{TP}{TP + FN}\)
    This rate measures the percentage of real positive values captured by the algorithm. A high recall indicates that a large proportion of the real positive labels were classified as positive.

  • Positive Predictive Value (Precision): \(\frac{TP}{\hat{P}} = \frac{TP}{TP + FP}\)
    This rate measures the reliability of the algorithm’s positive predictions: a high precision indicates that most of the instances predicted as positive are actually positive.

  • True Negative Rate (Specificity): \(\frac{TN}{N} = \frac{TN}{TN + FP}\)
    This rate measures the percentage of real negative values captured by the algorithm.

  • False Positive Rate (Fall-Out): \(FPR = 1 - SPC = \frac{FP}{N} = \frac{FP}{FP + TN}\)
    This rate measures the percentage of real negative values misclassified as positive by the algorithm.

  • Accuracy: \(\frac{TP + TN}{P + N} = \frac{TP + TN}{TP + FN + TN + FP}\)
    This rate measures the percentage of all instances, positive and negative, that the algorithm classifies correctly.

  • F1 Score: \(F1 = 2\,\frac{1}{\frac{1}{recall} + \frac{1}{precision}} = \frac{2\,TP}{2\,TP + FP + FN}\)
    This is the harmonic mean of the recall and the precision. Its advantage is that it weights both scores equally. Its values lie in the \([0, 1]\) interval and are ordered in the sense that a perfect classifier has an \(F1\) score of 1.