2.2.4 Maxent model
The maximum entropy (MaxEnt) model is derived from the maximum entropy principle, proposed by Jaynes in 1957. According to this principle, when only partial constraints on an unknown distribution are known, one should select the probability distribution that satisfies those constraints while maximizing entropy. Among all distributions consistent with the known constraints, the maximum entropy distribution makes the fewest additional assumptions, which is why the maximum entropy model is regarded as the least biased choice among all feasible probability models. In constructing the maximum entropy model, it is crucial to assess the significance of each input factor of the environmental layers (Kumar, 2012).
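Formally, the principle can be stated as a constrained optimization problem; the following is a minimal sketch using generic feature functions f_i (the notation is illustrative and not taken from the original text):

\[
\max_{p}\; H(p) = -\sum_{x} p(x)\,\ln p(x)
\quad \text{subject to} \quad
\sum_{x} p(x)\,f_i(x) = \tilde{E}[f_i], \qquad \sum_{x} p(x) = 1,
\]

whose solution takes the exponential (Gibbs) form

\[
p(x) = \frac{1}{Z(\lambda)} \exp\!\Big(\sum_{i} \lambda_i f_i(x)\Big),
\]

where the \(\lambda_i\) are Lagrange multipliers and \(Z(\lambda)\) is the normalizing constant.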
The jackknife method was introduced by Quenouille in 1949 as a resampling technique for reducing estimation bias. It is widely used for hypothesis testing and for calculating confidence intervals. In species distribution modelling it allows the impact of each variable factor on the predictive accuracy of the model to be analyzed, providing insight into the informativeness of those variables. The jackknife also serves specific functions such as correcting bias in statistical sampling and supporting more rigorous testing of the data on statistical principles (Miller, 1974).
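The variable jackknife used in MaxEnt refits the model with each factor excluded in turn; the underlying resampling idea is the classic leave-one-out jackknife, sketched below in Python for a generic estimator (the function and variable names are illustrative assumptions, not part of the original text):

import numpy as np

def jackknife_bias_corrected(data, estimator):
    """Jackknife bias correction: recompute the estimator on each
    leave-one-out sample and combine the replicates."""
    n = len(data)
    theta_full = estimator(data)                      # estimate on the full sample
    # Leave-one-out replicates: drop observation i, re-estimate.
    theta_loo = np.array([estimator(np.delete(data, i)) for i in range(n)])
    bias = (n - 1) * (theta_loo.mean() - theta_full)  # jackknife bias estimate
    se = np.sqrt((n - 1) / n * ((theta_loo - theta_loo.mean()) ** 2).sum())
    return theta_full - bias, se                      # corrected estimate, std. error

# Example: bias-correct the plug-in variance estimator on synthetic data.
rng = np.random.default_rng(0)
sample = rng.normal(size=30)
est, se = jackknife_bias_corrected(sample, lambda x: x.var())
print(f"bias-corrected variance: {est:.3f} +/- {se:.3f}")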
To mitigate overfitting caused by the variable factors, the jackknife method is employed to evaluate the contribution and importance of every variable in the model. An independent importance analysis is run for each variable, and the model is also refit with that variable removed so as to gauge the role of the remaining variables. The contribution rate indicates the extent to which an environmental factor contributes to the model, with higher values indicating a greater contribution; the cumulative contribution rate is the running sum of these contribution rates. The permutation importance measures the reduction in AUC (Area Under the Curve) when the values of an environmental factor are randomly permuted among the training sample points; higher values mean the model relies more heavily on that factor, signifying its greater importance (Préau et al., 2018).
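The permutation importance calculation can be sketched in a few lines of Python, assuming a fitted scikit-learn-style classifier `model`, feature matrix `X`, and labels `y` (all illustrative names, not from the original); MaxEnt itself reports an analogous quantity internally, and this sketch only mirrors the idea:

import numpy as np
from sklearn.metrics import roc_auc_score

def permutation_importance_auc(model, X, y, seed=0):
    """Drop in AUC when each feature column is randomly permuted.
    Larger drops indicate factors the model relies on more heavily."""
    rng = np.random.default_rng(seed)
    baseline = roc_auc_score(y, model.predict_proba(X)[:, 1])
    drops = {}
    for j in range(X.shape[1]):
        X_perm = X.copy()
        X_perm[:, j] = rng.permutation(X_perm[:, j])  # shuffle one environmental factor
        permuted_auc = roc_auc_score(y, model.predict_proba(X_perm)[:, 1])
        drops[j] = baseline - permuted_auc            # AUC reduction for factor j
    return drops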
The ROC (Receiver Operating Characteristic) curve is used to evaluate the performance of a model in an assessment test. It plots the False Positive Rate (FPR) on the horizontal axis, which is the proportion of negative examples incorrectly predicted as positive, against the True Positive Rate (TPR) on the vertical axis, which is the proportion of positive examples correctly predicted as positive. The Area Under the Curve (AUC) of the ROC curve is a commonly used metric for measuring the accuracy of the model being tested. Unlike threshold-dependent evaluation metrics, the AUC value does not depend on the choice of classification threshold, which makes it a more desirable summary measure.
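In the usual confusion-matrix notation (TP, FP, TN, FN; standard symbols, not taken from the original), the two rates are

\[
\mathrm{TPR} = \frac{TP}{TP + FN}, \qquad \mathrm{FPR} = \frac{FP}{FP + TN},
\]

and the AUC is the area swept out by the point \((\mathrm{FPR}, \mathrm{TPR})\) as the decision threshold varies:

\[
\mathrm{AUC} = \int_{0}^{1} \mathrm{TPR}\;\mathrm{d}(\mathrm{FPR}).
\]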
Visual inspection of the curve alone cannot accurately assess the effectiveness of a classifier, so the AUC value is used instead: it equals the probability that the classifier ranks a randomly chosen positive example above a randomly chosen negative example. In practice, AUC values range from 0.5 (discrimination no better than random) to 1 (perfect discrimination), with values closer to 1 indicating a better test. The correspondence between the AUC value and model accuracy is as follows: AUC < 0.6 indicates poor accuracy; 0.6 to 0.7, fair accuracy; 0.7 to 0.8, moderate accuracy; 0.8 to 0.9, high accuracy; and AUC > 0.9, extremely high accuracy. In other words, the closer the AUC value is to 1, the higher the model accuracy and the more reliable the prediction results.
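For completeness, the AUC of a fitted model can be computed directly from its predicted scores; the following is a minimal Python sketch using scikit-learn, with the arrays serving as illustrative placeholders rather than real data:

import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# y_true: 1 = presence, 0 = background/absence; y_score: predicted suitability.
y_true = np.array([1, 1, 0, 1, 0, 0, 1, 0])
y_score = np.array([0.9, 0.8, 0.35, 0.6, 0.4, 0.2, 0.7, 0.55])

fpr, tpr, thresholds = roc_curve(y_true, y_score)  # points on the ROC curve
auc = roc_auc_score(y_true, y_score)               # area under that curve
print(f"AUC = {auc:.3f}")  # values above 0.9 would indicate extremely high accuracy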