Feature exploration

Aside from bag-of-words textual analysis, we considered other possible features of bullying messages. Bag-of-words analysis is limited to the text of the tweet, and some signals may not show up in that text at all. For example, the time at which a tweet is posted might help classify messages.
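As a point of reference, the bag-of-words representation reduces a tweet to its token counts. A minimal sketch (a plain whitespace tokenizer; the actual tokenizer used in the study may differ):

```python
from collections import Counter

def bag_of_words(tweet):
    """Lower-case the tweet and count token frequencies."""
    return Counter(tweet.lower().split())

# Example: '@bob' is treated like any other token in this model.
features = bag_of_words("You are SO annoying @bob you are the worst")
```

Anything outside the text itself, such as posting time, is invisible to this representation.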

@-mention

Because bullying can be surmised to be directed at a person, the initial dataset consists of tweets that contain an '@-mention'. An @-mention serves as a tool to notify a specific user.

In the bag-of-words model these mentions are treated like any other word. To test whether the presence of an @-mention improves the accuracy of the classifier, classification was repeated on the same dataset with all @-mentions removed. This is not entirely representative; see the Recommendation section for thoughts on how to handle this. Comparing classification with and without @-mentions in the bag of words showed no appreciable difference.
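Stripping @-mentions before building the bag of words can be done with a simple regular expression. A sketch, assuming mentions follow the usual '@' plus word-characters pattern:

```python
import re

# Matches '@' followed by one or more word characters, e.g. '@alice'.
MENTION_RE = re.compile(r"@\w+")

def strip_mentions(tweet):
    """Remove all @-mentions and collapse the leftover whitespace."""
    return " ".join(MENTION_RE.sub("", tweet).split())

strip_mentions("@alice stop it nobody likes you @bob")
```

The same preprocessing is applied to both training and test tweets so the two classifier runs differ only in the presence of the mention tokens.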

Time of tweet

The time of day a tweet is posted was also examined as a possible indicator of a bullying message. For example, bullying messages might be posted after school more often than non-bullying messages. For the 1200 manually classified messages, the posting time and the user's timezone were extracted and combined to obtain the local time of posting. (A statistical test suggests the difference is significant, but we are not confident that the test's assumptions hold.) A complication is that the messages were collected over a single consecutive 48-hour period, so any spike might stem from an event starting at that specific time. This could be addressed by randomly sampling messages over a longer period of time.
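Deriving the local posting time amounts to shifting the UTC timestamp by the user's timezone offset. A minimal sketch, assuming the offset is available in seconds (as in the `utc_offset` field Twitter's user objects historically exposed):

```python
from datetime import datetime, timedelta, timezone

def local_posting_time(created_at_utc, utc_offset_seconds):
    """Shift a UTC timestamp by the user's timezone offset (in seconds)."""
    tz = timezone(timedelta(seconds=utc_offset_seconds))
    return created_at_utc.astimezone(tz)

# A tweet posted at 21:30 UTC by a user five hours behind UTC:
posted = datetime(2013, 4, 2, 21, 30, tzinfo=timezone.utc)
local = local_posting_time(posted, -5 * 3600)  # 16:30 local time
```

Binning these local times by hour of day gives the distribution compared between bullying and non-bullying messages.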

Additionally, some 15 percent of the messages were deleted or otherwise inaccessible by the time the time data was collected. We expected these to contain a high proportion of bullying messages, deleted either by Twitter for breaking its rules or by users regretting their comments. However, this turned out not to be the case: the proportion of bullying to non-bullying messages is lower than in the original dataset. Had the proportion been higher, deletion could have been added as an additional feature: after some weeks the classifier could 'double check' messages it had earlier classified as bullying and, if they were deleted, increase its confidence that those particular messages were bullying. There is slight statistical evidence that the deletion of a message indicates a non-bullying message, but we did not implement this because of the low expected gain relative to the cost of implementation.
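The 'double check' idea described above can be sketched as a rescoring pass. This is only an illustration of the unimplemented scheme: `is_deleted` is a hypothetical lookup (in practice an API call checking whether the tweet is still accessible), and the `boost` factor is an arbitrary placeholder, not a tuned value:

```python
def recheck_bullying_scores(flagged, is_deleted, boost=1.15):
    """Raise the confidence of flagged messages that have since been deleted.

    `flagged` maps tweet id -> bullying confidence in [0, 1].
    `is_deleted` is a hypothetical lookup returning True when the
    tweet is no longer accessible.
    """
    return {
        tid: min(1.0, score * boost) if is_deleted(tid) else score
        for tid, score in flagged.items()
    }

# Example: tweet 101 was deleted since classification, tweet 102 was not.
scores = {101: 0.6, 102: 0.8}
recheck_bullying_scores(scores, is_deleted=lambda tid: tid == 101)
```

Given the observed evidence that deletion slightly indicates a non-bullying message, such a boost would in fact point the wrong way on this dataset, which is consistent with the decision not to implement it.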