2.2.4 Maxent model
The maximum entropy model is derived from the maximum entropy principle, proposed by Jaynes in 1957. According to this principle, when only partial constraints on an unknown distribution are known, one should select the probability distribution that satisfies those constraints while maximizing entropy. Among all distributions consistent with the known constraints, the maximum-entropy distribution encodes no assumptions beyond what the constraints themselves impose; it is therefore regarded as the least biased, and in that sense the best, of all candidate probability models. In constructing the maximum entropy model, it is crucial to assess the significance of each input factor in the environmental layers (Kumar 2012).

The jackknife method, proposed by Quenouille in 1949, is a resampling technique originally developed to reduce estimation bias. It is widely used for hypothesis testing and for computing confidence intervals, and it also serves to correct bias in statistical sampling and to test data more rigorously on statistical grounds (Miller 1974). Here, the jackknife is used to analyze how each variable factor affects the predictive accuracy of the model and to mitigate overfitting caused by redundant variables: each variable is evaluated independently, both on its own and by removing it and considering the role of the remaining variables. The contribution rate indicates the extent to which an environmental factor contributes to the model, with higher values indicating a greater contribution; the cumulative contribution rate is the running sum of the individual contribution rates. The replacement (permutation) importance measures the reduction in AUC (Area Under the Curve) when the values of a randomly selected environmental factor are permuted among the training sample points; a larger reduction indicates that the model relies more heavily on that environmental factor, signifying its greater importance (Préau et al. 2018).

The ROC (Receiver Operating Characteristic) curve is used to evaluate the performance of a model in an assessment test. It plots the False Positive Rate (FPR) on the horizontal axis, the proportion of actual counterexamples that are incorrectly predicted as positive, against the True Positive Rate (TPR) on the vertical axis, the proportion of actual positive examples that are correctly predicted as positive. The Area Under the Curve (AUC) of the ROC curve is a commonly used metric for the accuracy of the system being tested. Unlike threshold-dependent evaluation metrics, the AUC value does not depend on the choice of classification threshold, which makes it more desirable. Because inspecting the curve alone cannot accurately assess the effectiveness of a classifier, the AUC value is used instead: it equals the probability that the classifier ranks a randomly chosen positive example above a randomly chosen counterexample. For a model that performs no worse than random guessing, the AUC value ranges from 0.5 to 1, with values closer to 1 indicating a more perfect test.
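To make the principle concrete, the following is a minimal numerical sketch, not the Maxent software itself: over the outcomes 1 through 6, with only the mean known as a partial constraint (a toy setup in the spirit of Jaynes's dice example), it finds the distribution that maximizes Shannon entropy using SciPy. The support, the constraint value of 4.5, and all names are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative discrete problem: outcomes 1..6, known mean 4.5.
support = np.arange(1, 7)

def neg_entropy(p):
    # Negative Shannon entropy; minimizing it maximizes entropy.
    p = np.clip(p, 1e-12, 1.0)
    return np.sum(p * np.log(p))

constraints = [
    {"type": "eq", "fun": lambda p: p.sum() - 1.0},      # probabilities sum to 1
    {"type": "eq", "fun": lambda p: support @ p - 4.5},  # the known partial constraint
]
bounds = [(0.0, 1.0)] * len(support)
p0 = np.full(len(support), 1.0 / len(support))           # start from the uniform distribution

result = minimize(neg_entropy, p0, bounds=bounds, constraints=constraints)
print(result.x)  # the maximum-entropy distribution consistent with the constraints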
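The leave-one-variable-out logic of the jackknife test and the permutation-based replacement importance can both be sketched as below. Since MaxEnt is not available in scikit-learn, LogisticRegression (a closely related model) is used as a stand-in, and the data, factor names, and helper function are all hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

def fit_auc(X_tr, y_tr, X_te, y_te):
    """Fit the stand-in model and return its test AUC."""
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])

# Hypothetical environmental variables (columns of X) and presence labels (y).
X, y = make_classification(n_samples=500, n_features=4, random_state=0)
names = ["bio1", "bio12", "elevation", "slope"]  # illustrative factor names
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Jackknife test: evaluate each variable alone and the model without it.
full_auc = fit_auc(X_tr, y_tr, X_te, y_te)
for j, name in enumerate(names):
    auc_alone = fit_auc(X_tr[:, [j]], y_tr, X_te[:, [j]], y_te)
    auc_without = fit_auc(np.delete(X_tr, j, 1), y_tr, np.delete(X_te, j, 1), y_te)
    print(f"{name}: alone = {auc_alone:.3f}, without = {auc_without:.3f}, full = {full_auc:.3f}")

# Replacement (permutation) importance: AUC drop after shuffling one variable.
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
base_auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
rng = np.random.default_rng(0)
for j, name in enumerate(names):
    X_perm = X_te.copy()
    X_perm[:, j] = rng.permutation(X_perm[:, j])  # randomly replace one factor's values
    auc = roc_auc_score(y_te, model.predict_proba(X_perm)[:, 1])
    # A larger drop means the model relies more on this factor.
    print(f"{name}: AUC drop = {base_auc - auc:.3f}")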
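As a small illustration of the ROC and AUC computations (with made-up labels and scores), scikit-learn's roc_curve and roc_auc_score can be used; the final lines check the probabilistic interpretation of the AUC directly by pairwise comparison.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Hypothetical test labels (1 = positive example, 0 = counterexample) and model scores.
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
scores = np.array([0.10, 0.40, 0.35, 0.80, 0.20, 0.70, 0.50, 0.90])

fpr, tpr, thresholds = roc_curve(y_true, scores)  # FPR (x-axis) and TPR (y-axis) per threshold
auc = roc_auc_score(y_true, scores)               # threshold-independent accuracy measure

# AUC equals the probability that a random positive outranks a random counterexample.
pos, neg = scores[y_true == 1], scores[y_true == 0]
pairwise = (pos[:, None] > neg[None, :]).mean()
print(f"AUC = {auc:.3f}, pairwise probability = {pairwise:.3f}")  # both 0.875 here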
The specific relationship between the AUC value and model accuracy is as follows: an AUC value below 0.6 indicates poor accuracy; between 0.6 and 0.7, fair accuracy; between 0.7 and 0.8, moderate accuracy; between 0.8 and 0.9, high accuracy; and above 0.9, extremely high accuracy. In other words, the closer the AUC value is to 1, the more accurate the model and the more reliable its prediction results.
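For reference, this grading can be written as a simple lookup; the function below is an illustrative helper, not part of the Maxent software.

```python
def auc_rating(auc: float) -> str:
    """Map an AUC value to the qualitative accuracy classes listed above."""
    if auc < 0.6:
        return "poor"
    elif auc < 0.7:
        return "fair"
    elif auc < 0.8:
        return "moderate"
    elif auc < 0.9:
        return "high"
    else:
        return "extremely high"

print(auc_rating(0.85))  # -> "high"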