salima - Authorea

Bankruptcy prediction is one of the most important research topics in accounting and finance. The rapid growth of data science, artificial intelligence, and machine learning has led researchers to develop increasingly accurate bankruptcy prediction models. Recent studies show that ensemble methods outperform traditional machine learning models in predicting corporate failure, especially on highly imbalanced datasets. However, these techniques are black boxes: their results are difficult to interpret, and they assign firms to classes without any explanation. To address this, we propose an accurate and interpretable classification model that generates a set of prediction rules as output. In this paper, a semi-supervised Tri-eXtreme Gradient Boosting (Tri-XGBoost) approach is proposed. In this approach, three different XGBoost boosters (gbtree, gblinear, and dart) serve as the weak classifiers and are combined with sampling methods, namely Borderline-SMOTE (BLSmote) and random under-sampling (RUS), to balance the class distribution of the datasets. In addition, XGBoost is used to select the most important features, which increases predictive accuracy. Finally, the result is presented in the form of “IF-THEN” rules so that both applicants and experts can understand the model. The proposed model is validated on the imbalanced Polish bankruptcy datasets. The experimental results confirm the performance of the proposed method compared with existing methods, with AUC, G-mean, and F1-score ranging from 91% to 97%.
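To illustrate why G-mean and F1-score are reported for imbalanced data, the following minimal sketch computes both from a confusion matrix using their standard definitions. It is not the authors' code: the function names (`confusion_counts`, `g_mean`, `f1_score`, `accuracy`) and the toy labels are hypothetical and chosen only for illustration, and AUC is omitted because it requires predicted scores rather than hard labels.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn); `positive` marks the bankrupt (minority) class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def accuracy(y_true, y_pred):
    """Fraction of correctly classified samples."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + tn + fn)


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity (recall on bankrupt) and specificity."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall on the bankrupt class."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data (illustrative only): 5 bankrupt firms among 100.
y_true = [1] * 5 + [0] * 95

# A trivial model that labels every firm as healthy.
all_healthy = [0] * 100

# A model that finds 4 of the 5 bankrupt firms and raises 5 false alarms.
better = [1, 1, 1, 1, 0] + [1] * 5 + [0] * 90

for name, y_pred in [("all healthy", all_healthy), ("better", better)]:
    print(name, accuracy(y_true, y_pred), g_mean(y_true, y_pred), f1_score(y_true, y_pred))
```

On this toy data the trivial model reaches 95% accuracy while both its G-mean and F1-score are zero, which is why accuracy alone is a poor measure on highly imbalanced bankruptcy data.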