SALAD: An Exploration of Split Active Learning based Unsupervised Network Data Stream Anomaly Detection using Autoencoders

Christopher Nixon; Mohamed Sedky; Mohamed Hassan

doi:10.36227/techrxiv.14896773.v1

Christopher Nixon, Mohamed Sedky, Mohamed Hassan, Justin Champion

Abstract

Machine-learning-based intrusion detection systems monitor network data streams for cyber attacks. Challenges in this space include detecting unknown attacks, adapting to changes in the data stream such as shifts in underlying behaviour, the human cost of labeling data to retrain the machine learning model, and the processing and memory constraints of a real-time data stream. Failure to manage these factors could result in missed attacks, degraded detection performance, unnecessary expense, or delayed detection. This research evaluated autoencoders, a type of feed-forward neural network, as online anomaly detectors for network data streams. The autoencoder method was combined with an active learning strategy to further reduce labeling cost and speed up training and adaptation, resulting in the proposed Split Active Learning Anomaly Detector (SALAD) method. The proposed method was evaluated on the NSL-KDD, KDD Cup 1999, and UNSW-NB15 data sets using the scikit-multiflow framework. Results demonstrated that a novel Adaptive Anomaly Threshold method, combined with a split active learning strategy, offered superior anomaly detection performance with a labeling budget of just 20%, significantly reducing the human expertise required to annotate the network data. Processing times of the autoencoder anomaly detector were significantly lower than those of traditional online learning methods, allowing for greatly improved responsiveness to attacks occurring in real time. Future research areas include unsupervised threshold methods, multi-label classification, sample annotation, and hybrid intrusion detection.
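
To make the idea of thresholding anomaly scores under a labeling budget concrete, the following is a minimal sketch only. It is not the paper's Adaptive Anomaly Threshold or its split active learning strategy, whose details are not given in this abstract. It assumes the anomaly scores (for example, autoencoder reconstruction errors) are already computed, uses a running mean plus k standard deviations as a stand-in threshold, and queries a label only while the fraction of queried samples stays within the budget. The function name, the `oracle` callback, and the parameter `k` are all hypothetical.

```python
def budgeted_threshold_detector(scores, oracle, budget=0.2, k=3.0):
    """Flag scores above a running mean + k*std threshold (illustrative only).

    scores: iterable of anomaly scores, e.g. reconstruction errors (assumed given).
    oracle: hypothetical callback, oracle(i) -> True if sample i is an attack.
    budget: maximum fraction of samples for which a label may be requested.
    """
    n, mean, m2 = 0, 0.0, 0.0   # Welford running statistics of "normal" scores
    queried = 0
    flags = []
    for i, s in enumerate(scores):
        std = (m2 / n) ** 0.5 if n > 1 else 0.0
        flagged = n > 1 and s > mean + k * std   # no flags during warm-up
        flags.append(flagged)

        # Without a label, trust the detector's own decision.
        treat_as_normal = not flagged
        # Request a label only if doing so keeps queried / seen <= budget.
        if queried + 1 <= budget * (i + 1):
            queried += 1
            treat_as_normal = not oracle(i)

        # Only samples believed to be normal update the threshold statistics.
        if treat_as_normal:
            n += 1
            delta = s - mean
            mean += delta / n
            m2 += delta * (s - mean)
    return flags, queried


if __name__ == "__main__":
    demo = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 1.0, 1.1, 0.9, 10.0]
    print(budgeted_threshold_detector(demo, lambda i: demo[i] > 5.0))
```

In this sketch a queried label overrides the detector's decision, so a sample that was flagged but labeled normal still updates the statistics. This is one simple way for the threshold to follow a change in underlying behaviour.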