Model comparison
After selecting the best model for each approach, the standard 30-year average model (STA) was compared against each time-matched approach (T01, T05, T10). A repeated random k-fold cross-validation (Berrar 2019) was used to obtain 1000 model iterations (4 folds × 250 repeats), using the feature classes and regularization multipliers of the best model per approach. At each iteration, the same training and validation folds were used to compare the standard approach pairwise against each of the three temporal resolutions (T01, T05, T10), using the AUC, CBI, and OR metrics separately. The 10th percentile training omission rate computed on the fully withheld data (OR-W) was also used for model evaluation. As described above, these withheld data were outside the temporal range used for model training (i.e., outside the 1971 to 2000 period). Nevertheless, for simplicity, environmental values from the standard 30-year average were assigned to the withheld occurrences to ensure consistency in interpreting and comparing all geographic predictions.
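The resampling scheme described above can be illustrated with a minimal sketch. It is not the code used in this study: the function name `repeated_kfold_indices`, the seed, and the example sample size are assumptions made for illustration. It shows only how 4 folds × 250 repeats yield 1000 train/validation splits, each of which would be shared by the STA model and the time-matched model being compared.

```python
import random


def repeated_kfold_indices(n_samples, k=4, repeats=250, seed=0):
    """Yield (repeat, fold, train_idx, valid_idx) for repeated random k-fold CV.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    rng = random.Random(seed)
    indices = list(range(n_samples))
    for r in range(repeats):
        shuffled = indices[:]
        rng.shuffle(shuffled)  # new random partition at every repeat
        # Deal the shuffled indices into k folds of near-equal size.
        folds = [shuffled[i::k] for i in range(k)]
        for f in range(k):
            valid = sorted(folds[f])
            train = sorted(i for g in range(k) if g != f for i in folds[g])
            yield r, f, train, valid


# Assumed example size of 100 occurrences; the real data set differs.
splits = list(repeated_kfold_indices(n_samples=100))
# Each element of `splits` would be used to fit and score both models
# of a pair (e.g. STA and T01) on identical folds.
```

Reusing the same splits for both models of a pair is what makes the per-iteration metric differences paired, which the subsequent significance test relies on.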
Significant differences in each metric were identified using a correlated t-test (Bouckaert 2003, Nadeau and Bengio 2003). This test corrects for the violation of the standard t-test assumption of independence between iterations, which does not hold in the present analysis because the occurrences are shared between the k folds. A Bonferroni adjustment was used to correct for multiple comparisons within each of the three validation metrics (AUC, CBI, OR) and the one evaluation metric (OR-W); that is, each metric constituted a family. Each family consisted of three correlated t-tests, one for each time-matched approach (T01, T05, T10) compared against the 30-year standard (STA). Finally, the geographic predictions were visually inspected to evaluate ecological plausibility.
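As an illustration of the testing procedure, the sketch below computes the corrected resampled t statistic in the form given by Nadeau and Bengio (2003), in which the variance term 1/J is inflated by the ratio of validation to training sample size (1/(k − 1) for k-fold splits), and then applies a Bonferroni adjustment across a family of three tests. The function names, the synthetic differences, and the use of a normal approximation for the p-value (reasonable only because the degrees of freedom, J − 1 = 999, are large) are assumptions of this sketch and do not reproduce the analysis code of this study.

```python
import math
from statistics import NormalDist, mean, variance


def corrected_resampled_t(diffs, k):
    """Corrected resampled t statistic for paired per-iteration differences.

    diffs: metric differences (e.g. AUC of T01 minus AUC of STA), one per
           iteration of the repeated k-fold cross-validation.
    k:     number of folds, so n_valid / n_train = 1 / (k - 1).
    Returns (t statistic, degrees of freedom).
    """
    j = len(diffs)
    s2 = variance(diffs)  # sample variance (denominator j - 1)
    correction = 1.0 / j + 1.0 / (k - 1)
    return mean(diffs) / math.sqrt(correction * s2), j - 1


def two_sided_p_normal_approx(t_stat):
    """ASSUMPTION: normal approximation to the t distribution (large df)."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t_stat)))


def bonferroni(p_values):
    """Bonferroni-adjusted p-values for one family of tests."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Synthetic placeholder differences (NOT results from this study):
# one list of 1000 values per time-matched approach, for a single metric.
family = {
    "T01": [0.010, 0.030, -0.010, 0.020] * 250,
    "T05": [0.005, -0.005, 0.010, -0.010] * 250,
    "T10": [0.000, 0.002, -0.002, 0.001] * 250,
}
raw_p = [
    two_sided_p_normal_approx(corrected_resampled_t(d, k=4)[0])
    for d in family.values()
]
adjusted_p = dict(zip(family, bonferroni(raw_p)))
```

Because the correction term 1/(k − 1) does not shrink as repeats are added, the corrected statistic is always smaller in magnitude than the naive paired t statistic computed from the same differences, which guards against overstating significance when folds overlap.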