Cross-validation techniques
To assess the predictive performance of any given combination of bibliometric indicators in practice, it is necessary to determine how well the classification results generalise to an independent (i.e.\ previously unseen) data set. For this purpose, cross-validation techniques are commonly employed to provide an indication of model validity for out-of-sample predictions. This is accomplished by repeatedly training on one subset of the original data and generating test predictions on the complementary, held-out subset, with the average number of misclassified observations used to rank each predictor grouping. In doing so, cross-validation helps to mitigate the risk of over-fitting models built on limited sample sizes, and it equally provides a means to identify the most suitable predictor groupings for model building, based on their robustness to misclassification. Cross-validation techniques are generally grouped into exhaustive and non-exhaustive categories, as summarised in Table \ref{table:cross-validation_techniques}.
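As a concrete illustration of the ranking procedure described above, the following is a minimal sketch assuming the scikit-learn library; the data, the predictor groupings, and their column indices are placeholders chosen for illustration, not the indicators examined in this study.

\begin{verbatim}
# Rank hypothetical bibliometric predictor groupings by their mean
# cross-validated misclassification rate. X and y are placeholder
# data standing in for an indicator matrix and class labels.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))      # placeholder indicator matrix
y = rng.integers(0, 2, size=200)   # placeholder binary labels

groupings = {                      # hypothetical indicator subsets
    "citations_only": [0, 1],
    "citations_plus_altmetrics": [0, 1, 2, 3],
    "all_indicators": [0, 1, 2, 3, 4, 5],
}

for name, cols in groupings.items():
    # 10-fold CV: train on nine folds, test on the held-out fold,
    # and repeat so every observation is tested exactly once.
    scores = cross_val_score(LogisticRegression(max_iter=1000),
                             X[:, cols], y, cv=10)
    # Mean misclassification rate = 1 - mean accuracy over folds.
    print(f"{name}: misclassification = {1 - scores.mean():.3f}")
\end{verbatim}

The 10-fold scheme used here is a representative non-exhaustive technique; an exhaustive alternative such as leave-one-out can be obtained by instead passing cv=LeaveOneOut() from sklearn.model_selection.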