loading page

Decision Tree Post-Pruning Without Loss Of Accuracy using the SAT-PP algorithm with An Empirical Evaluation on Oncology Data
  • Teddy Lazebnik,
  • Svetlana Bunimovich-Mendrazitsky
Teddy Lazebnik
Ariel University

Corresponding Author:[email protected]

Author Profile
Svetlana Bunimovich-Mendrazitsky
Ariel University
Author Profile

Abstract

Decision tree (DT) is one of the most popular and efficient techniques in data mining. Specifically, in the clinical domain, DTs have been wildly used thanks to relatively easy interpretation and efficient computation time. However, some DT models may produce a large tree size structure which is difficult to understand and often leads to misclassification of data in the testing process. Therefore, a DT model which is a simple tree with high accuracy is the desired goal. Post pruning (PP) algorithms have been introduced to reduce the complexity of the tree structure with a minor decrease in the accuracy of classification. We propose a new Boolean satisfiability (SAT) based PP algorithm (Namely, SAT-PP algorithm) which reduces the tree size while preserving the accuracy of the unpruned tree, since in medical-related tasks, decreasing the model’s performance is something we emphatically try to avoid. Namely, in the case of medical-related tasks, one may prefer an unpruned DT model to a pruned DT model with worse performance. To evaluate the proposed algorithm and other PP algorithms, we compare the performance in terms of the model query response time and accuracy of classification using three oncology data sets. The SAT-PP DT model obtained the same accuracy and F1 score as the DT model without PP while significantly reducing computation time (6.8%).