Authorea

Robert Neil Leong edited section_Conclusion_and_Recommendation_In__.tex about 8 years ago

Commit id: 2ed4ac36d843266489e9be6d1736c9073ed02b62

\section{Conclusion and Recommendation}
In summary, we used the k-Means algorithm to group the pixels into healthy and infected clusters. The clusters were then labeled and used to train an SVM classifier that automatically determines which clusters contain infected pixels and which contain healthy pixels. The objective of our research is to measure the size of the physical defect caused by a disease, so as to obtain a quantitative measure of the rate at which the disease spreads.

Wang et al. \cite{wang2013detection} used histograms of image patches as features to train a linear SVM to detect regions that are most likely to contain defects. They were able to surround the defects with a bounding box, but a bounding box is not an accurate measure of defect size because the defects are not rectangular. Jhuria's work \cite{jhuria2013image}, on the other hand, used a clustering approach to segment and measure the defects. However, users had to manually label each cluster as either part of the defect or part of the healthy fruit. Lopez \cite{lopez2010automatic} used a multivariate image analysis approach to detect fruit defects. That approach achieved an accuracy of about 91\%, slightly higher than our accuracy of 89\%, but its use of a $T^2$ threshold to decide whether pixels are healthy or defective may not be suitable for cacao pods, because the color of a cacao pod is uneven and changes as the pod grows. That is, unlike oranges and mandarins, cacao pods may have red and green portions, or orange and purple portions, which may be mistaken for defects because of the mismatched colors.

Based on the criteria described in the previous sections, a larger $k$ generally gives better results for both k-Means image segmentation and SVM classification.
This is because increasing the number of clusters reduces the variance within each cluster, which lowers the chance of mixing healthy and infected pixels in the same cluster. However, more clusters also mean additional computational complexity and longer processing times. k-Means alone has a computational complexity of $O(n^{dk+1}\log n)$ \cite{inaba1994applications}, where $n$ is the number of samples, $k$ is the number of clusters, and $d$ is the number of dimensions. This implies that the complexity increases exponentially as the number of clusters increases. Our experiments show that using 4 clusters gives a good balance between segmentation performance and computational complexity. As shown in Figure \ref{fig:accuracyxk}, the accuracy starts to plateau after 4 clusters, i.e., increasing the number of clusters further would give only a small increase in accuracy while exponentially increasing the complexity.
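
As a rough illustration of this growth, and taking the bound quoted above at face value, adding a single cluster multiplies the bound by a factor of $n^{d}$:
\[
\frac{n^{d(k+1)+1}\log n}{n^{dk+1}\log n} = n^{d}.
\]
If we assume, purely for illustration, that each pixel is described by $d = 3$ color channels and that an image contains $n = 10^{4}$ pixels (both values are assumptions, not figures from our experiments), then moving from $k = 4$ to $k = 5$ would scale the bound by $n^{3} = 10^{12}$. This is an upper bound rather than a measured running time, so it should be read only as an indication of how quickly the cost can grow with $k$.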