
In \ref{figure-biasVariance}, the authors fit a regression model to data to illustrate the interaction between bias and variance. The model's complexity grows as more features are used, since more parameters must be fit during the optimization procedure; this complexity can be measured by the model's degrees of freedom.

At first the bias of the model is high on both the training and test sets, so we expect a high generalization error. As complexity increases, the overall error decreases; this is expected because the model learns to fit the data better. The expected prediction error (\(EPE\)), estimated by averaging the prediction errors over the test set, also decreases as the model's complexity increases. However, once the model starts to overfit the data, the test error begins to rise while the training error keeps decreasing. This situation matters for our estimate of the prediction error because it indicates that the model has lost predictive power due to an increase in variance.
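The following is a minimal sketch (not the authors' code) of the behaviour described above, using synthetic data and polynomial features as a stand-in for increasing the number of features: the training error keeps falling as complexity grows, while the test error eventually rises once the model overfits.
\begin{verbatim}
# Sketch: train/test error as model complexity (number of features) grows.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)

# Synthetic data (assumed for illustration only): a smooth signal plus noise.
x = rng.uniform(-3, 3, size=200).reshape(-1, 1)
y = np.sin(x).ravel() + rng.normal(scale=0.3, size=200)

# Simple train/test split.
x_train, x_test = x[:100], x[100:]
y_train, y_test = y[:100], y[100:]

for degree in range(1, 15):
    # More polynomial features -> more parameters -> higher complexity.
    poly = PolynomialFeatures(degree=degree)
    X_train = poly.fit_transform(x_train)
    X_test = poly.transform(x_test)

    model = LinearRegression().fit(X_train, y_train)

    train_err = mean_squared_error(y_train, model.predict(X_train))
    test_err = mean_squared_error(y_test, model.predict(X_test))
    print(f"degree={degree:2d}  train MSE={train_err:.3f}  "
          f"test MSE={test_err:.3f}")
\end{verbatim}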

In this situation, a common heuristic for selecting the best model is to stop increasing the model's complexity once the \(EPE\) stops decreasing, as sketched below.
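A small sketch of this stopping heuristic, under the assumption that we have already estimated the \(EPE\) on the test set for each candidate complexity (e.g.\ from a sweep like the one above); the helper name and the example values are hypothetical.
\begin{verbatim}
# Sketch: grow complexity until the estimated EPE stops decreasing,
# then keep the last complexity that still improved it.
def select_complexity(epe_by_complexity):
    """epe_by_complexity: EPE estimates indexed by increasing complexity."""
    best = 0
    for k in range(1, len(epe_by_complexity)):
        if epe_by_complexity[k] >= epe_by_complexity[best]:
            break  # EPE stopped decreasing: stop increasing complexity
        best = k
    return best  # index of the selected complexity

# Hypothetical EPE estimates from a complexity sweep:
epe = [1.90, 0.85, 0.42, 0.38, 0.39, 0.55, 1.10]
print("selected complexity index:", select_complexity(epe))  # -> 3
\end{verbatim}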