The results compare five supervised data mining algorithms using the WEKA (Waikato Environment for Knowledge Analysis) 3.8.4 data mining software. The algorithms evaluated are Naïve Bayes, Bayes Network, K-Nearest Neighbor (KNN), J48, and Random Tree. The algorithms are compared on error rate, processing time, precision, and accuracy. This analysis applies each of these predictors to the evaluation metrics in order to determine which is the most accurate predictor [7].
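As a minimal sketch of such a comparison, the snippet below uses the WEKA Java API to run 10-fold cross-validation for each of the five classifiers and report accuracy, error rate, weighted precision, and elapsed time. The dataset file name and random seed are illustrative assumptions, not values taken from the study.

```java
import java.util.Random;
import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.BayesNet;
import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.lazy.IBk;
import weka.classifiers.trees.J48;
import weka.classifiers.trees.RandomTree;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class CompareClassifiers {
    public static void main(String[] args) throws Exception {
        // Hypothetical dataset; the study's actual data file is not specified here.
        Instances data = DataSource.read("dataset.arff");
        data.setClassIndex(data.numAttributes() - 1);

        Classifier[] classifiers = {
            new NaiveBayes(), new BayesNet(), new IBk(), new J48(), new RandomTree()
        };
        for (Classifier cls : classifiers) {
            long start = System.currentTimeMillis();
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(cls, data, 10, new Random(1));
            long elapsed = System.currentTimeMillis() - start;
            System.out.printf("%s: accuracy=%.2f%% error=%.4f precision=%.4f time=%dms%n",
                cls.getClass().getSimpleName(), eval.pctCorrect(),
                eval.errorRate(), eval.weightedPrecision(), elapsed);
        }
    }
}
```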
A Bayesian classifier is founded on the notion that the function of a (natural) class is to predict the values of features for members of that class, and it is a probabilistic graphical model that represents information about a set of random variables [8]. In this study, the Bayes Network and Naïve Bayes algorithms are used for the predictive modeling.
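To make the underlying model concrete, the Naïve Bayes classifier assigns an instance with feature values x_1, ..., x_n to the class with the highest posterior probability, under the simplifying assumption that features are conditionally independent given the class. This is the standard formulation of the method, stated here for reference rather than quoted from [8]:

\[ \hat{y} \;=\; \operatorname*{arg\,max}_{c \in \mathcal{C}} \; P(c) \prod_{i=1}^{n} P(x_i \mid c) \]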
Decision trees evaluate an instance of data by traversing a tree, beginning at the root and progressing to a leaf, where the prediction is made. The tree is constructed by greedily choosing the best split point at each node and repeating the process until the tree reaches a fixed depth. After the tree is developed, it is pruned in order to boost the capacity of the model to generalize to new data [9]. The J48 algorithm is used to classify various applications and to obtain correct classification results. Random Forest is an ensemble learning algorithm that constructs a multitude of decision trees at training time and outputs the predicted class; the Random Tree algorithm compared in this study builds a single such randomized tree [10]. A short sketch of the tree workflow follows.
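The sketch below shows this workflow with the WEKA API, building a pruned J48 tree and a Random Tree and then classifying an instance. The file name and pruning parameters are illustrative defaults, not values reported by the study.

```java
import weka.classifiers.trees.J48;
import weka.classifiers.trees.RandomTree;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class TreeExample {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("dataset.arff"); // hypothetical file
        data.setClassIndex(data.numAttributes() - 1);

        // J48 grows the tree greedily, then prunes it; the confidence
        // factor controls how aggressively subtrees are pruned back.
        J48 j48 = new J48();
        j48.setConfidenceFactor(0.25f); // WEKA's default pruning confidence
        j48.setMinNumObj(2);            // minimum instances per leaf
        j48.buildClassifier(data);

        // RandomTree builds a single tree, considering a random
        // subset of attributes at each split.
        RandomTree rt = new RandomTree();
        rt.buildClassifier(data);

        // Predict the class of the first instance by walking root to leaf.
        Instance first = data.instance(0);
        System.out.println("J48 prediction: "
            + data.classAttribute().value((int) j48.classifyInstance(first)));
        System.out.println("RandomTree prediction: "
            + data.classAttribute().value((int) rt.classifyInstance(first)));
    }
}
```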
The K-Nearest Neighbors algorithm operates by storing the full training dataset and querying it for the most related training patterns when making a prediction [11]. In k-NN regression, the algorithm is used to predict continuous variables: it returns a weighted average of the k closest neighbors, weighted inversely by their distance. A suitable value of k can be selected by minimizing the Root Mean Squared Error, which is achieved with the aid of cross-validation. The key downside of KNN is that it becomes increasingly slower as the data volume expands, making it an inefficient choice in conditions where predictions need to be made quickly [12].
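As a sketch of these ideas, WEKA's IBk implementation of KNN supports both inverse-distance weighting and hold-one-out cross-validation for choosing k. The dataset file and the upper bound on k below are illustrative assumptions.

```java
import weka.classifiers.lazy.IBk;
import weka.core.Instances;
import weka.core.SelectedTag;
import weka.core.converters.ConverterUtils.DataSource;

public class KnnExample {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("dataset.arff"); // hypothetical file
        data.setClassIndex(data.numAttributes() - 1);

        IBk knn = new IBk();
        knn.setKNN(10); // upper bound on k (illustrative choice)
        // Weight neighbors by the inverse of their distance.
        knn.setDistanceWeighting(
            new SelectedTag(IBk.WEIGHT_INVERSE, IBk.TAGS_WEIGHTING));
        // Use hold-one-out cross-validation to pick the best k <= 10.
        knn.setCrossValidate(true);

        // "Training" only stores the dataset; the real work happens at
        // prediction time, which is why KNN slows down as data grows.
        knn.buildClassifier(data);
        double pred = knn.classifyInstance(data.instance(0));
        System.out.println("Predicted class: "
            + data.classAttribute().value((int) pred));
    }
}
```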