Model comparison
After selecting the best model for each approach, the standard 30-year
average model (STA) was compared against each time-matched approach
(T01, T05, T10). A repeated random k-fold cross-validation
(Berrar 2019) was used to obtain 1000 model iterations (4 folds × 250
repeats) using the same feature classes and regularization multipliers
of the best model per approach. At each iteration, the same training and
validation folds were used to compare the standard approach pairwise
against each of the three temporal resolutions (T01, T05, T10), using
the AUC, CBI, and OR metrics (each separately). The 10th percentile
training omission rate computed on the fully withheld data (OR-W)
was also used for model evaluation. As described above, these
withheld data were outside the temporal range used for model training
(i.e., outside the 1971 to 2000 period). Nevertheless, for simplicity,
environmental values from the standard 30-year average were assigned to
the withheld occurrences to ensure consistency in interpreting and
comparing all geographic predictions.
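
As a rough illustration of the resampling scheme described above, the following Python sketch generates the 4 folds × 250 repeats and computes a 10th percentile training omission rate. The function names, the seed, the sample size of 100, and the nearest-rank percentile rule are assumptions made for this example only; the software and percentile definition actually used in the study may differ.

```python
import random


def repeated_kfold(n_samples, n_folds=4, n_repeats=250, seed=0):
    """Yield (repeat, fold, train_idx, valid_idx) for repeated random k-fold CV."""
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)  # a new random partition for every repeat
        folds = [idx[f::n_folds] for f in range(n_folds)]
        for f, valid in enumerate(folds):
            # training set = all occurrences not in the validation fold
            train = [i for g, fold in enumerate(folds) if g != f for i in fold]
            yield repeat, f, train, valid


def omission_rate_10pct(train_scores, eval_scores):
    """Fraction of evaluation occurrences predicted below the threshold that
    omits the lowest 10% of training occurrences.

    Assumption: the threshold is a nearest-rank 10th percentile.
    """
    ranked = sorted(train_scores)
    rank = max(1, (len(ranked) + 9) // 10)  # ceil(n / 10) in integer arithmetic
    threshold = ranked[rank - 1]
    return sum(s < threshold for s in eval_scores) / len(eval_scores)


# Materialize the splits once so that every approach (STA, T01, T05, T10)
# can be fitted and scored on exactly the same folds.
splits = list(repeated_kfold(n_samples=100))  # 100 occurrences is hypothetical
print(len(splits))  # 1000
```

Storing the splits in a list, rather than regenerating them per approach, is what makes the later comparisons paired: each of the 1000 iterations contributes one metric value per approach on identical data.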
Significant differences in each metric were identified using a
correlated t-test (Bouckaert 2003, Nadeau and Bengio 2003). This test
accounts for the violation of the standard t-test assumption of
independence between iterations, which does not hold in the present
analysis because occurrences are shared among the k folds.
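
The correction can be sketched as follows, assuming the variance-corrected form commonly attributed to Nadeau and Bengio (2003) for repeated k-fold designs, in which the variance of the paired differences is inflated by the ratio of validation to training set size, here 1/(k − 1). The function name and the toy differences are hypothetical, and the exact variant applied in the study is not specified in the text.

```python
import math
import statistics


def correlated_t_statistic(diffs, n_folds):
    """Corrected resampled t statistic for paired metric differences.

    diffs   : one difference per CV iteration (e.g. AUC_T01 - AUC_STA)
    n_folds : k; the validation/training size ratio is taken as 1 / (k - 1)
    Returns (t, degrees_of_freedom).
    """
    n = len(diffs)
    mean_d = statistics.fmean(diffs)
    var_d = statistics.variance(diffs)  # sample variance (n - 1 denominator)
    if var_d == 0:
        raise ValueError("all differences are identical; t is undefined")
    # A standard paired t-test would use 1 / n only; the extra term accounts
    # for the overlap between training sets across iterations.
    correction = 1.0 / n + 1.0 / (n_folds - 1)
    t = mean_d / math.sqrt(correction * var_d)
    return t, n - 1


# Toy differences, invented purely to show the call.
t_stat, dof = correlated_t_statistic([1.0, 2.0, 3.0, 4.0], n_folds=4)
```

A two-sided p-value then follows from the Student t distribution with the returned degrees of freedom, for instance via `2 * scipy.stats.t.sf(abs(t_stat), dof)` if SciPy is available. Because the correction term does not shrink as repeats are added, the statistic is more conservative than an uncorrected paired t-test on the same 1000 differences.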
A Bonferroni adjustment was used to correct for multiple comparisons
within each of the three validation metrics (AUC, CBI, OR) and the one
evaluation metric (OR-W); i.e., each metric constituted a family. Each family
consisted of three correlated t-tests, each obtained by comparing one
time-matched approach (T01, T05, or T10) against the 30-year standard
(STA). Finally, the geographic predictions were visually inspected to
evaluate ecological plausibility.
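
The per-family Bonferroni adjustment described above amounts to multiplying each p-value by the number of tests in its family (three here) and capping the result at 1. The sketch below is a minimal illustration; the function name is hypothetical and the p-values are placeholders, not results from the study.

```python
def bonferroni(p_values):
    """Bonferroni-adjust the p-values of one family (dict: comparison -> p)."""
    m = len(p_values)  # number of tests in the family
    return {name: min(1.0, p * m) for name, p in p_values.items()}


# One family per metric; each holds the three comparisons against STA.
# Placeholder p-values for illustration only.
families = {
    "AUC": {"T01": 0.01, "T05": 0.04, "T10": 0.50},
    "OR-W": {"T01": 0.20, "T05": 0.02, "T10": 0.001},
}
adjusted = {metric: bonferroni(pvals) for metric, pvals in families.items()}
```

Adjusting within each metric rather than across all twelve tests keeps the correction factor at three, which matches the definition of a family given in the text.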