3.1. Reliability assessment of MAGT and ALT
The simulation results were compared with the in situ observations using cross-validation. A comparison of the five results (Figure 2) reveals no significant bias between the simulated values and the available borehole data on the QTP, but the RMSE and R² of the ensemble method indicate that it was more reliable than the other four methods. For all five models, the difference between the measured and simulated MAGT was less than 1°C at most sites. Among these models, the ensemble method performed best: the simulation error was < 1°C at 80 sites, which account for 95% of the total sites, and the simulated and observed MAGT were strongly positively correlated (R² = 0.73, p < 0.001). Overall, the ensemble method (Figure 2(e)) displayed the highest accuracy among the models in simulating the MAGT. For this reason, the ensemble model was selected to simulate the present MAGT and its future trends.
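The validation statistics used above can be illustrated with a minimal sketch. It assumes that bias is the mean of simulated minus observed values, that RMSE is the root mean square of the same differences, and that R² is the squared Pearson correlation between observed and simulated values; the text does not give these formulas, so the definitions, the function name validation_stats, and the tolerance parameter are illustrative assumptions rather than the procedure actually used in this study.

```python
import math


def validation_stats(observed, simulated, tolerance=1.0):
    """Return bias, RMSE, R^2 and the share of sites within `tolerance`.

    Assumed definitions (not stated in the text):
      bias = mean(simulated - observed)
      rmse = sqrt(mean((simulated - observed)^2))
      r2   = squared Pearson correlation of observed and simulated
    """
    if len(observed) != len(simulated) or len(observed) < 2:
        raise ValueError("need two equally long series with at least 2 sites")

    n = len(observed)
    errors = [s - o for o, s in zip(observed, simulated)]

    bias = sum(errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    mean_o = sum(observed) / n
    mean_s = sum(simulated) / n
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(observed, simulated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_s = sum((s - mean_s) ** 2 for s in simulated)
    if var_o == 0 or var_s == 0:
        raise ValueError("R^2 is undefined for a constant series")
    r2 = (cov * cov) / (var_o * var_s)

    # Fraction of sites whose absolute error is below the tolerance (e.g. 1 degC)
    within = sum(1 for e in errors if abs(e) < tolerance) / n

    return {"bias": bias, "rmse": rmse, "r2": r2, "within_tolerance": within}


# Synthetic MAGT values in degC, for illustration only (not data from this study)
obs = [-1.0, -2.0, -3.0, 0.0]
sim = [-0.5, -2.5, -3.0, 1.5]
print(validation_stats(obs, sim))
```

With the default tolerance of 1.0, the within_tolerance entry corresponds to the share of sites with an error of < 1°C reported above.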
Similarly, the simulated ALT results were compared with the in situ observations using the same statistical method. For ALT, the best fit was obtained with RF (Figure 3(d)), which had the highest R² (0.51) and the lowest RMSE (0.69 m). Although the GLM method exhibited a smaller bias, the difference between the two methods was not large. Overall, the validation results of the five methods did not differ significantly. A further comparison of Figures 2 and 3 shows that the fitting accuracy was better for MAGT than for ALT, with R² values of 0.73 and 0.51, respectively, for the corresponding best-fitting results. This is because the spatial heterogeneity of the ALT is larger than that of the MAGT on the QTP, and the ALT can fluctuate greatly over short periods as the climate changes (Cao et al., 2017).
We calculated the error distribution separately for five typical regions (Table 1). Overall, the distribution of RMSE and bias on the QTP was relatively uniform, with the exception of the RMSE in the AEJIR. A possible reason is that there are relatively few observation sites in the northern part of the study area, so the simulation accuracy is highly sensitive to individual sites and regional representativeness is poor. In addition, permafrost along the G109 Highway is strongly affected by human activities, and there are more observation sites in this region. Compared with the error statistics for the entire QTP, the RMSE of MAGT in the G109IR was relatively small, while the RMSE of ALT was relatively large. This suggests that MAGT is less affected by human activities, while ALT is more affected by disturbance and displays great spatial heterogeneity. In terms of bias, the largest value occurred in the GZIR. The reason is that the GZIR is located in the transition zone between permafrost and seasonally frozen ground, which affects the accuracy of the results to some extent.
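The regional breakdown, and the sensitivity of RMSE to individual sites where observations are sparse, can be sketched as follows. The function name regional_errors, the record layout (region, observed, simulated), the region labels, and all values are hypothetical and chosen only for illustration; bias and RMSE are again assumed to be computed from simulated minus observed values.

```python
import math
from collections import defaultdict


def regional_errors(records):
    """Group (region, observed, simulated) records and return per-region
    site count, bias and RMSE (assumed definitions, simulated - observed)."""
    grouped = defaultdict(list)
    for region, observed, simulated in records:
        grouped[region].append(simulated - observed)

    stats = {}
    for region, errors in grouped.items():
        n = len(errors)
        stats[region] = {
            "n": n,
            "bias": sum(errors) / n,
            "rmse": math.sqrt(sum(e * e for e in errors) / n),
        }
    return stats


# Synthetic example (not data from this study): both hypothetical regions
# contain one site with a large error of 1.8, but the sparse region has
# only two sites while the dense region has eight.
records = [("sparse_region", -1.0, -0.8), ("sparse_region", -2.0, -0.2)]
records += [("dense_region", -1.0, -0.8)] * 7
records += [("dense_region", -2.0, -0.2)]

for region, s in regional_errors(records).items():
    print(region, s["n"], round(s["bias"], 2), round(s["rmse"], 2))
```

In this synthetic case the same single-site error produces a much larger RMSE in the region with fewer sites, which is the kind of single-point sensitivity suggested above for the sparsely observed northern region.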