We trained a Support Vector Machine (SVM) classifier with a quadratic kernel for each value of $k$. Figure \ref{fig:scatterSVM} shows a scatter plot of the cluster centroids in a*b* space, where the colors indicate whether a cluster is infected or healthy and the 'x' symbols mark misclassified points. We chose a quadratic kernel because the scatter plot shows that the clusters cannot be easily separated by a linear function.

Table \ref{tab:svmTableRGB} shows the performance of the SVM classifier using the mean values of the channels in RGB color space as features. The values were computed from a 10-fold cross-validation, i.e., the SVM classifier was trained and evaluated 10 times, with each iteration using a randomly selected 90\% of the data points for training and the remaining 10\% for testing. At $k=2$, the classifier had a low accuracy because poor segmentation of the images during the clustering step degraded the quality of its training data. Increasing $k$ to 3 and 4 significantly improved the results, reaching accuracies of up to 82.2\% and 85.8\%, respectively. We also examined how much the classifier would improve with more clusters, but even after doubling the value of $k$ the accuracy increased by only 1.3\%.

Similarly, Table \ref{tab:svmTableLAB} shows the performance of the SVM classifier using the mean values of a* and b* in L*a*b* color space as features. At all values of $k$, the L*a*b* color space achieved an accuracy equal to or better than that of the RGB color space. We also trained a model with $k$ set to 8 to observe the effect of further increasing the number of clusters. Interestingly, the accuracy went down at $k=8$. This is because the number of samples and their granularity increase with the number of clusters; consequently, more points lie near the decision boundary, where the model is most likely to misclassify. Similar trends across values of $k$ were also observed for sensitivity and specificity in both color spaces.

Figure \ref{fig:accuracyxk} plots the accuracy as a function of $k$ for both color spaces. The trend implies that increasing $k$ further would have minimal effect on the accuracy of the classifier, and that beyond a certain point, particularly after $k=4$, increasing $k$ is expected to reduce the accuracy of the SVM. Another notable observation in this figure is that the SVM trained on the L*a*b* color space consistently produces more accurate classifications than the one trained on the RGB color space.
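
For reference, the evaluation protocol described above can be approximated with a short script. The sketch below is illustrative only and is not the code used in this work: it assumes scikit-learn, substitutes synthetic placeholder features and labels for the actual cluster-centroid data, and uses a degree-2 polynomial kernel to stand in for the quadratic kernel.

\begin{verbatim}
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Placeholder features: one row per cluster centroid, e.g. [mean a*, mean b*]
# (or the three RGB channel means for the RGB variant).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
# Placeholder labels (1 = infected, 0 = healthy) with a quadratic boundary.
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0).astype(int)

clf = SVC(kernel="poly", degree=2)  # degree-2 polynomial, i.e. quadratic kernel
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")
print(f"10-fold CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
\end{verbatim}

Each of the 10 folds holds out 10\% of the points for testing and trains on the remaining 90\%, matching the cross-validation procedure described above.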