Comparing SMOTE-based-ML Methods on the unbalanced dataset: A case study on a date and pistachio fruits
  • Fatih Bal,
  • Fatih Kayaalp
Fatih Bal
Kirklareli Universitesi

Corresponding Author: [email protected]

Fatih Kayaalp
Duzce Universitesi

Abstract

The use of artificial intelligence (AI) techniques in agriculture has reached a critical point, and classification studies that apply machine learning (ML) methods to agricultural products are being conducted intensively. However, these studies often struggle to build balanced datasets, and the resulting class imbalance can significantly degrade the performance of the proposed methods and produce misleading results. In this study, we address this challenge by creating a dataset comprising data from 7 species of date palm (Phoenix dactylifera L.) and 2 species of pistachio (Pistacia vera L.). Our objective is to evaluate the performance of ML methods on this dataset and to assess how effectively the Synthetic Minority Over-Sampling Technique (SMOTE) improves classification accuracy. We first classified the original dataset using popular ML methods. Among these, the POLY-SVM model performed best, with an accuracy of 92.95%. However, the unbalanced distribution of data among the classes limited the classification results. To address this, we applied SMOTE over-sampling, which balanced the class distribution. Trained on the over-sampled dataset, the POLY-SVM model achieved a significantly improved classification accuracy of 98.64%, and the proposed model also showed better sub-classification performance, particularly for fruit species. These results highlight the challenge that unbalanced data distribution poses for ML-based classification of agricultural products and show that SMOTE over-sampling can address it and improve classification accuracy.
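The over-sampling step described in the abstract can be illustrated with a minimal sketch of the SMOTE interpolation idea. This is not the authors' implementation: the function name `smote_oversample`, the neighbour count `k`, the random seed, and the toy two-feature data are all assumptions made for illustration. Each synthetic sample is placed at a random position on the line segment between a minority-class sample and one of its nearest minority-class neighbours.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points built from minority-class feature vectors.

    Hypothetical helper for illustration only; not the code used in the study.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of its k nearest minority neighbours (Euclidean distance).
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        # Interpolate between the base point and the chosen neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Made-up toy data: 6 majority samples and 3 minority samples, two features each.
majority = [[1.0, 1.2], [0.9, 1.0], [1.1, 0.8], [1.3, 1.1], [0.8, 0.9], [1.0, 1.0]]
minority = [[3.0, 3.1], [3.2, 2.9], [2.9, 3.3]]

# Generate enough synthetic minority samples to match the majority class size.
new_points = smote_oversample(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

After this step both classes contain the same number of samples, and a classifier such as an SVM with a polynomial kernel could then be trained on the balanced data. In practice a maintained library implementation (for example, the `SMOTE` class in imbalanced-learn) would normally be preferred over a hand-written version like this one.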