Segmenting the Infected Part of the Fruit

We adopted Dubey’s \cite{dubey2013infected} approach of using the K-Means algorithm for simple and straightforward image segmentation. Following their work, we converted the image from the RGB color space to the L*a*b* color space. The Commission Internationale d’Eclairage (CIE) designed the L*a*b* color space to match how humans perceive differences in color and luminance \cite{szeliski2010computer}, which makes it a good color space for computing distances. It is composed of a luminosity or lightness dimension (L*) and two chromaticity or color dimensions (a* and b*). Isolating the color information in two dimensions (in L*a*b*) makes clustering computationally more efficient than having the color information spread across three dimensions (in RGB) \cite{dubey2013infected}.
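The color space conversion described above can be sketched for a single pixel as follows. This is a minimal illustration, not the code used in this work: it assumes 8-bit sRGB input and the D65 reference white, neither of which is stated in the text, and the function name `srgb_to_lab` is hypothetical.

```python
def srgb_to_lab(r, g, b):
    """Convert one 8-bit sRGB pixel to CIE L*a*b* (assumes D65 reference white)."""

    # 1. Undo the sRGB gamma curve to obtain linear RGB in [0, 1].
    def to_linear(c):
        c = c / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = to_linear(r), to_linear(g), to_linear(b)

    # 2. Linear RGB -> CIE XYZ (sRGB primaries, D65 white).
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl

    # 3. Normalize by the white point and apply the CIE nonlinearity.
    xn, yn, zn = 0.95047, 1.0, 1.08883
    delta = 6.0 / 29.0

    def f(t):
        if t > delta ** 3:
            return t ** (1.0 / 3.0)
        return t / (3.0 * delta ** 2) + 4.0 / 29.0

    fx, fy, fz = f(x / xn), f(y / yn), f(z / zn)

    lightness = 116.0 * fy - 16.0   # L*: lightness
    a_star = 500.0 * (fx - fy)      # a*: green-red axis
    b_star = 200.0 * (fy - fz)      # b*: blue-yellow axis
    return lightness, a_star, b_star
```

Only the last two components, (a*, b*), would be passed on to the clustering step; L* is discarded so that lightness differences do not influence the distances.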

The pixels are then clustered in the a*b* space using the K-Means algorithm. The K-Means algorithm starts by randomly selecting \(k\) pixels as the initial centroids of the clusters, where \(k\) is a user-defined parameter that sets the number of clusters to be formed. Each centroid represents a cluster and is used to determine which pixels belong to it. The rest of the algorithm is an iterative process and proceeds as follows: (Step 1) Assign each pixel to the cluster whose centroid is closest to it. We used the Euclidean distance as the measure of similarity: the smaller the distance, the more similar the pixels are. (Step 2) Compute the new centroid of each cluster by taking the mean of all the pixels within that cluster. (Step 3) Repeat Steps 1 and 2 until the clusters no longer change.
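The three steps can be sketched in plain Python as shown below. This is an illustrative sketch under stated assumptions rather than the implementation used here: the function name `kmeans_ab`, the fixed random seed, the `max_iter` safety cap, and the handling of empty clusters (the old centroid is kept) are all choices made for the example.

```python
import random


def kmeans_ab(points, k, seed=0, max_iter=100):
    """Cluster (a*, b*) pairs with K-Means; returns (labels, centroids).

    `points` is a list of (a, b) tuples, one per pixel.
    """
    rng = random.Random(seed)
    # Initialization: pick k pixels at random as the starting centroids.
    centroids = rng.sample(points, k)
    labels = None

    for _ in range(max_iter):
        # Step 1: assign each pixel to the closest centroid
        # (squared Euclidean distance gives the same ordering).
        new_labels = [
            min(
                range(k),
                key=lambda j: (p[0] - centroids[j][0]) ** 2
                + (p[1] - centroids[j][1]) ** 2,
            )
            for p in points
        ]

        # Step 3: stop once the assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels

        # Step 2: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[j] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )

    return labels, centroids
```

The returned `labels` list has one cluster index per pixel, in the same order as `points`, so it can be used directly to split the image into per-cluster segments.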

After clustering, the image is segmented based on the clusters formed, i.e., each cluster of pixels forms a separate image. The general idea is that the pixels of the infected part of the fruit are similar in color and will therefore tend to fall in a separate cluster from the healthy part of the fruit.
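One simple way to form a separate image per cluster is to keep the pixels that belong to the cluster and blank out the rest. The sketch below assumes the image is a flat list of RGB tuples with a matching list of cluster labels; the function name `split_by_cluster` and the black background are illustrative choices, not details taken from this work.

```python
def split_by_cluster(pixels, labels, k, background=(0, 0, 0)):
    """Return k images, one per cluster, as flat lists of RGB tuples.

    In image j, pixels labeled j keep their color and all others
    are replaced by `background`.
    """
    return [
        [px if lab == j else background for px, lab in zip(pixels, labels)]
        for j in range(k)
    ]


# Usage: four pixels, two clusters.
pixels = [(200, 30, 30), (60, 40, 20), (210, 35, 28), (55, 42, 25)]
labels = [0, 1, 0, 1]
segments = split_by_cluster(pixels, labels, k=2)
```

Each entry of `segments` can then be shown to a reviewer, who decides whether that cluster corresponds to infected or healthy tissue.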

Afterwards, because the clustering algorithm cannot by itself identify which pixels are infected and which are healthy, the clusters were labeled manually through visual review. The automation problem was then treated as a classification problem, in which we used a Support Vector Machine (SVM) to distinguish infected pixels from healthy pixels.

A Support Vector Machine is a supervised machine learning algorithm used for classification and regression tasks. Training a classifier requires a set of features that represent the data points and discriminate between their classes. In this case, the data points are the clusters of pixels obtained from the K-Means algorithm. We observed that humans rely on color to distinguish the infected part of the fruit. Therefore, it is logical to choose color as the main feature for the classifier.

After labeling, the clusters were transformed into feature vectors for training the classifier. Each feature vector is composed of the average red, green, and blue values (\(\mu_{r}\), \(\mu_{g}\), \(\mu_{b}\) in the RGB color space) taken over all the pixels within the cluster.
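The feature vector for one cluster can be computed as in the following sketch. It assumes the cluster is given as a list of RGB tuples containing only the pixels that belong to it (no background pixels); the function name `mean_rgb_feature` is hypothetical.

```python
def mean_rgb_feature(cluster_pixels):
    """Return (mu_r, mu_g, mu_b) averaged over the pixels of one cluster."""
    n = len(cluster_pixels)
    if n == 0:
        raise ValueError("cluster has no pixels")
    mu_r = sum(p[0] for p in cluster_pixels) / n
    mu_g = sum(p[1] for p in cluster_pixels) / n
    mu_b = sum(p[2] for p in cluster_pixels) / n
    return (mu_r, mu_g, mu_b)


# Usage: one small cluster of reddish-brown pixels.
feature = mean_rgb_feature([(120, 60, 30), (130, 70, 40), (110, 50, 20)])
```

Each labeled cluster then contributes one such three-component vector, paired with its manual label, to the training set of the classifier.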