Measuring the Infestation Level

The next step was to segment the infected and non-infected parts of the pods. We adopted Dubey’s \cite{dubey2013infected} approach of using the K-Means algorithm for simple and straightforward image segmentation. Following their work, we converted the image from the RGB color space to the L*a*b* color space. The Commission Internationale d’Eclairage (CIE) designed the L*a*b* color space to match how humans perceive differences in color and luminance \cite{szeliski2010computer}, which makes it a good color space for computing distances. It is composed of a luminosity or lightness dimension (L*) and two chromaticity or color dimensions (a* and b*). Isolating the color information in two dimensions (in L*a*b*) makes clustering computationally more efficient than having the color information spread across three dimensions (in RGB) \cite{dubey2013infected}.
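As an illustration of the color space conversion, the following is a minimal sketch and not necessarily the implementation used in this work. It assumes 8-bit sRGB input and the D65 reference white, neither of which is specified above, and the function name \texttt{rgb\_to\_lab} is hypothetical.

```python
import numpy as np

# Assumption: 8-bit sRGB input and the D65 reference white.
_RGB_TO_XYZ = np.array([
    [0.4124564, 0.3575761, 0.1804375],
    [0.2126729, 0.7151522, 0.0721750],
    [0.0193339, 0.1191920, 0.9503041],
])
_WHITE_D65 = np.array([0.95047, 1.00000, 1.08883])


def rgb_to_lab(rgb):
    """Convert an (..., 3) array of 8-bit sRGB values to CIE L*a*b*."""
    c = np.asarray(rgb, dtype=float) / 255.0
    # Undo the sRGB gamma curve to obtain linear RGB.
    linear = np.where(c <= 0.04045, c / 12.92, ((c + 0.055) / 1.055) ** 2.4)
    # Linear RGB -> XYZ, normalized by the reference white.
    xyz = linear @ _RGB_TO_XYZ.T / _WHITE_D65
    # Piecewise cube-root function from the CIE L*a*b* definition.
    delta = 6.0 / 29.0
    f = np.where(xyz > delta ** 3, np.cbrt(xyz), xyz / (3 * delta ** 2) + 4.0 / 29.0)
    L = 116.0 * f[..., 1] - 16.0          # lightness
    a = 500.0 * (f[..., 0] - f[..., 1])   # green-red chromaticity
    b = 200.0 * (f[..., 1] - f[..., 2])   # blue-yellow chromaticity
    return np.stack([L, a, b], axis=-1)
```

Only the last two channels (a* and b*) of the result would be passed on to the clustering step.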

The pixels are then clustered in the a*b* space using the K-Means algorithm. The algorithm starts by randomly selecting \(k\) pixels as the initial centroids of the clusters, where \(k\) is a user-defined parameter that sets the number of clusters to be formed. Each centroid represents a cluster and is used to determine which pixels belong to it. The rest of the algorithm is iterative and proceeds as follows: (Step 1) Assign each pixel to the cluster whose centroid is closest to it. We used the Euclidean distance as the measure of similarity: the smaller the distance, the more similar the pixels are. (Step 2) Compute the new centroid of each cluster by taking the mean of all the pixels within that cluster. (Step 3) Repeat Steps 1 and 2 until the clusters no longer change.
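The three steps can be sketched as follows. This is a minimal illustration under stated assumptions rather than the implementation used in this work: the \texttt{seed} and \texttt{max\_iter} parameters and the handling of empty clusters are our own additions for reproducibility and safety.

```python
import numpy as np


def kmeans(points, k, seed=0, max_iter=100):
    """Cluster an (n, d) array of points into k clusters."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialization: pick k distinct pixels at random as the initial centroids.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    labels = None
    for _ in range(max_iter):  # max_iter is a safety cap (assumption)
        # Step 1: assign every pixel to the closest centroid (Euclidean distance).
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # Step 3: stop once the assignments no longer change.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 2: move each centroid to the mean of its assigned pixels.
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # an empty cluster keeps its old centroid
                centroids[j] = members.mean(axis=0)
    return labels, centroids
```

For an image, \texttt{points} would be the \((n, 2)\) array of a* and b* values of its \(n\) pixels.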

After clustering, the image is segmented based on the clusters formed, i.e., each cluster of pixels forms a separate image. The general idea is that the infected parts of the fruit are similar in color and will therefore tend to fall in a separate cluster from the healthy parts.
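One possible way to form the per-cluster images is sketched below. It assumes that pixels outside a cluster are set to black, which is not specified above, and the function name \texttt{split\_by\_cluster} is hypothetical.

```python
import numpy as np


def split_by_cluster(image, labels, k):
    """Return k copies of image, each showing only the pixels of one cluster."""
    image = np.asarray(image)
    # One cluster label per pixel, arranged in the image's height x width grid.
    label_map = np.asarray(labels).reshape(image.shape[:2])
    segments = []
    for j in range(k):
        segment = np.zeros_like(image)  # pixels outside the cluster stay black
        mask = label_map == j
        segment[mask] = image[mask]
        segments.append(segment)
    return segments
```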

Because the clustering algorithm cannot by itself identify which pixels are infected and which are not, the formed clusters were first labeled manually. Automating the labeling process was then treated as a classification problem: we used a Support Vector Machine (SVM) to distinguish infected clusters from healthy clusters.

A Support Vector Machine is a supervised machine learning algorithm used for classification and regression tasks. Training a classifier requires a set of features that represent the data points and discriminate between their classes. In this case, the data points are the clusters of pixels obtained from the K-Means algorithm. We observed that humans rely on color to distinguish the infected part of the fruit. It is therefore reasonable to choose color as the main feature for the classifier.

After the clusters were labeled, they were transformed into feature vectors for training the classifier. We tested two sets of feature vectors: one composed of the average values of red, green, and blue in the RGB color space, and the other composed of the average values of a* and b* in the L*a*b* color space, both taken over all the pixels within the cluster. These are essentially the centroids of the clusters represented in two different color spaces.
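Computing either set of feature vectors amounts to averaging the pixel values per cluster. The following minimal sketch assumes that every cluster contains at least one pixel; the function name \texttt{cluster\_features} is hypothetical.

```python
import numpy as np


def cluster_features(values, labels, k):
    """Average the (n, c) pixel values over each cluster, giving a (k, c) array."""
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    # Assumption: no cluster is empty, otherwise its mean is undefined.
    return np.stack([values[labels == j].mean(axis=0) for j in range(k)])
```

Passing the \((n, 3)\) RGB values of the pixels yields the first feature set, and passing their \((n, 2)\) a*b* values yields the second.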

Lastly, the infestation level \(I\) assigned to the fruit is the ratio of the area of the disease \(A_{d}\) to the area of the fruit \(A_{f}\), divided by 2 and expressed as a percentage (see Equation \ref{eq:infLevel}). The areas of the disease and of the fruit are estimated by the number of infected pixels and the number of pixels of the cacao pod, respectively. The adjustment factor \(\frac{1}{2}\) accounts for the fact that an image captures only one side of the cacao pod.

\begin{equation} \label{eq:infLevel}
I=\frac{A_{d}}{2A_{f}}\times 100\%
\end{equation}
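The computation of \(I\) from pixel counts can be sketched as follows. The function name and the input check are our own additions, and the numbers in the usage note are purely illustrative.

```python
def infestation_level(infected_pixels, fruit_pixels):
    """Infestation level I in percent: A_d / (2 * A_f) * 100."""
    if fruit_pixels <= 0:
        raise ValueError("fruit_pixels must be positive")
    # The factor 2 accounts for only one side of the pod being visible.
    return infected_pixels / (2 * fruit_pixels) * 100.0
```

For example, a hypothetical pod covering 1000 pixels, of which 250 are classified as infected, would be assigned \(I = 12.5\%\).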

The infestation level can then be mapped to the severity index of Alvindia et al. \cite{alvindia2015revisiting}, or to a different index depending on its intended use.