We implemented a standard averaged perceptron model, as documented in the assignment specification.
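For readers without the assignment specification to hand, the following is a minimal sketch of a binary averaged perceptron. It is not our exact implementation: the function names `train_averaged_perceptron` and `predict`, the inclusion of a bias term, and the use of labels in {-1, +1} are assumptions made purely for illustration.

```python
def train_averaged_perceptron(X, y, n_passes=1):
    """Binary averaged perceptron sketch; labels in y must be -1 or +1."""
    n_features = len(X[0])
    w = [0.0] * n_features       # current weight vector
    b = 0.0                      # current bias
    w_sum = [0.0] * n_features   # running sum of weights after every example
    b_sum = 0.0
    n_steps = 0
    for _ in range(n_passes):
        for x, label in zip(X, y):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if label * score <= 0:
                # Misclassified (or on the boundary): standard perceptron update.
                w = [wi + label * xi for wi, xi in zip(w, x)]
                b += label
            # Accumulate after every example, whether or not we updated.
            w_sum = [s + wi for s, wi in zip(w_sum, w)]
            b_sum += b
            n_steps += 1
    # The averaged model is the mean of all intermediate weight vectors.
    return [s / n_steps for s in w_sum], b_sum / n_steps


def predict(w, b, x):
    """Return +1 or -1 for a single feature vector x."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1
```

Averaging the intermediate weight vectors, rather than returning the final one, reduces the influence of updates made late in training. The `n_passes` argument corresponds to the number of passes through the training data.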

We also repeated several of the experiments used throughout this analysis with a multinomial Naive Bayes classifier. Because we were encouraged to use the Perceptron model and space is limited, the results of this comparison are not included. It is worth noting that the Naive Bayes model seemed to perform no worse than the Perceptron in terms of accuracy, and often beat it. Furthermore, in terms of model run-time, Naive Bayes was very competitive.

Validation Dataset

We validated our model on the Internet Advertisement Data Set. With a single pass through the training data, we achieved a 10-fold average accuracy of 0.949. This is reasonably competitive with other benchmarks on this dataset (see http://archive.ics.uci.edu/ml/datasets/Internet+Advertisements for benchmarks), so we are satisfied with our implementation. The best accuracy (0.951) was found by grid-searching over the number of training iterations; the best value was three passes.
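The validation procedure described above (k-fold average accuracy, followed by a grid search over the number of passes) can be sketched as follows. This is an illustrative outline rather than our actual code: `k_fold_indices`, `cross_val_accuracy` and `grid_search_passes` are hypothetical helper names, the `fit` and `predict` callables stand in for any classifier, and the candidate grid is an assumption. The sketch also assumes the data have already been shuffled and that there are at least `k` examples.

```python
def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_val_accuracy(fit, predict, X, y, k=10):
    """Mean held-out accuracy over k folds.

    fit(X_train, y_train) -> model; predict(model, x) -> label.
    """
    fold_accs = []
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        fold_accs.append(correct / len(test_idx))
    return sum(fold_accs) / k


def grid_search_passes(fit_with_passes, predict, X, y,
                       candidates=(1, 2, 3, 5, 10), k=10):
    """Pick the number of passes with the best cross-validated accuracy.

    fit_with_passes(X_train, y_train, n_passes) -> model.
    """
    scores = {}
    for p in candidates:
        fit = lambda Xt, yt, p=p: fit_with_passes(Xt, yt, p)
        scores[p] = cross_val_accuracy(fit, predict, X, y, k)
    best = max(scores, key=scores.get)
    return best, scores
```

Because the same folds are used both to choose the number of passes and to report accuracy, the grid-searched figure is likely to be slightly optimistic compared with the accuracy on truly unseen data.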

\[\begin{aligned} a &= b \\ &= c\end{aligned}\]

Wikipedia Text Classification


All 'mean' values are the average model accuracy across all ten folds. This is equivalent to using a micro-averaged F1-measure. Our experiments can be divided into bag-of-words approaches, semantic approaches and combined approaches. Bag-of-words (BOW) approaches include both the standard BOW model and the TF-IDF transformation. Our semantic approaches attempted to use POS taggers and polarity/objectivity measures to predict text class. The combined approaches took two or more different feature sets and combined them by changing the weighting of each feature set and applying \(\chi^2\) feature selection/reduction.
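The equivalence between accuracy and micro-averaged F1 holds when every example has exactly one true label and one predicted label: each error then counts once as a false positive (for the predicted class) and once as a false negative (for the true class), so micro-precision, micro-recall and accuracy coincide. The short sketch below checks this numerically; the function names and toy labels are illustrative assumptions, not part of our experiments.

```python
def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP/FP/FN counts over all classes first."""
    pairs = list(zip(y_true, y_pred))
    tp = fp = fn = 0
    for c in set(y_true) | set(y_pred):
        tp += sum(t == c and p == c for t, p in pairs)
        fp += sum(t != c and p == c for t, p in pairs)
        fn += sum(t == c and p != c for t, p in pairs)
    return 2 * tp / (2 * tp + fp + fn)


# Toy single-label example: two of the four predictions are correct.
y_true = ["a", "b", "c", "a"]
y_pred = ["a", "c", "c", "b"]
assert accuracy(y_true, y_pred) == micro_f1(y_true, y_pred) == 0.5
```

Macro-averaged F1, by contrast, averages per-class F1 scores and would generally differ from accuracy when class sizes are imbalanced.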