Comparing SMOTE-based ML methods on an unbalanced dataset: A case study
on date and pistachio fruits
Abstract
The use of artificial intelligence (AI) techniques in agriculture has
reached a critical point, and classification studies utilizing machine
learning (ML) methods for agricultural products are being conducted
intensively. However, these studies often struggle to build balanced
datasets, and class imbalance can significantly degrade the performance of
proposed methods and lead to misleading results. In this study, we address this
challenge by creating a dataset comprising data from seven species of date
palm (Phoenix dactylifera L.) and two species of pistachio fruit (Pistacia
vera L.). Our objective is to evaluate the performance of ML methods on
this dataset and explore the effectiveness of the Synthetic Minority
Over-Sampling Technique (SMOTE) for improving classification accuracy.
Initially, we performed classification on the original dataset using
popular ML methods. Among these methods, the POLY-SVM model exhibited
the best performance, with an accuracy of 92.95%. However, we
observed limitations in the classification results due to the unbalanced
distribution of data among different classes. To address this issue, we
applied SMOTE over-sampling, which effectively balanced the data
distribution. The POLY-SVM model, when trained on the over-sampled
dataset, achieved a significantly improved classification accuracy of
98.64%. Furthermore, our proposed model
demonstrated enhanced sub-classification performance, particularly for
fruit species. In conclusion, this study highlights the challenges that
unbalanced data distributions pose for ML-based classification of
agricultural products, and shows that SMOTE over-sampling can mitigate
them and improve classification accuracy.
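
To illustrate the over-sampling step described above, the following is a minimal sketch of the core SMOTE idea: each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its k nearest minority-class neighbours. It is not the implementation used in this study. The function name smote_oversample, the parameter values, and the toy two-feature data are all hypothetical and chosen only for illustration.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    Illustrative sketch only; names and defaults are assumptions.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority-class neighbours of the chosen sample (itself excluded)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Hypothetical toy data: 12 majority samples versus 4 minority samples.
majority = [(float(i), float(i % 3)) for i in range(12)]
minority = [(1.0, 5.0), (1.5, 5.5), (2.0, 5.0), (1.2, 6.0)]

# Generate enough synthetic minority samples to match the majority class size.
new_points = smote_oversample(minority, n_new=len(majority) - len(minority), k=3)
balanced_minority = minority + new_points
```

Because every synthetic point is a convex combination of two real minority samples, it stays within the region already spanned by that class. In practice a maintained library implementation would normally be preferred over a hand-written one.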