SALAD: An Exploration of Split Active Learning based Unsupervised
Network Data Stream Anomaly Detection using Autoencoders
Abstract
Machine learning based intrusion detection systems monitor network data
streams for cyber attacks. Challenges in this space include detection of
unknown attacks, adaptation to changes in the data stream such as
changes in underlying behaviour, the human cost of labeling data to
retrain the machine learning model, and the processing and memory
constraints of a real-time data stream. Failure to manage these
factors could result in missed attacks, degraded detection
performance, unnecessary expense, or delayed detection times.
This research evaluated autoencoders, a type of feed-forward neural
network, as online anomaly detectors for network data streams. The
autoencoder method was combined with an active learning strategy to
further reduce labeling cost and shorten training and adaptation times,
resulting in a proposed Split Active Learning Anomaly Detector (SALAD)
method. The proposed method was evaluated with the NSL-KDD, KDD Cup
1999, and UNSW-NB15 data sets, using the scikit-multiflow framework.
Results demonstrated that a novel Adaptive Anomaly Threshold method,
combined with a split active learning strategy, offered superior anomaly
detection performance with a labeling budget of just 20%, significantly
reducing the human expertise required to annotate the network data.
Processing times of the autoencoder anomaly detector method were
demonstrated to be significantly lower than those of traditional online
learning
methods, allowing for greatly improved responsiveness to attacks
occurring in real time. Future research areas include unsupervised
threshold methods, multi-label classification, sample annotation, and
hybrid intrusion detection.
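
To make the general idea concrete, the following is a minimal sketch of an online autoencoder anomaly detector that requests labels under a fixed budget. It is an illustration under stated assumptions, not the SALAD implementation: the names TinyAutoencoder, make_stream, and run_stream are hypothetical, the data are synthetic rather than drawn from the evaluated data sets, labels are requested by plain random sampling rather than the split strategy, and the threshold rule (running mean plus k standard deviations of the reconstruction errors of samples labeled normal) is a simple stand-in for the Adaptive Anomaly Threshold method. It uses NumPy only and does not depend on scikit-multiflow.

```python
import numpy as np


class TinyAutoencoder:
    """Single-hidden-layer autoencoder trained one sample at a time (SGD)."""

    def __init__(self, n_features, n_hidden, lr=0.01, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (n_features, n_hidden))
        self.W2 = rng.normal(0.0, 0.1, (n_hidden, n_features))
        self.lr = lr

    def score(self, x):
        """Mean squared reconstruction error, used as the anomaly score."""
        recon = np.tanh(x @ self.W1) @ self.W2
        return float(np.mean((recon - x) ** 2))

    def partial_fit(self, x):
        """One gradient step on 0.5 * ||reconstruction - x||^2."""
        h = np.tanh(x @ self.W1)
        err = h @ self.W2 - x
        grad_W2 = np.outer(h, err)
        grad_W1 = np.outer(x, (self.W2 @ err) * (1.0 - h ** 2))
        self.W2 -= self.lr * grad_W2
        self.W1 -= self.lr * grad_W1


def make_stream(n=2000, n_features=5, anomaly_rate=0.05, seed=1):
    """Synthetic stream (assumption): anomalies are shifted far from normal."""
    rng = np.random.default_rng(seed)
    y = (rng.random(n) < anomaly_rate).astype(int)
    X = rng.normal(0.0, 0.1, (n, n_features))
    X[y == 1] += 5.0
    return X, y


def run_stream(X, y, budget=0.2, k=3.0, seed=0):
    """Predict each sample first, then request its label with probability `budget`."""
    rng = np.random.default_rng(seed)
    ae = TinyAutoencoder(X.shape[1], n_hidden=3, seed=seed)
    n_norm, mean, m2 = 0, 0.0, 0.0      # Welford running statistics
    threshold = np.inf                  # nothing is flagged before the first label
    preds = np.zeros(len(X), dtype=int)
    labels_used = 0
    for i, x in enumerate(X):
        s = ae.score(x)
        preds[i] = int(s > threshold)
        if rng.random() < budget:       # random sampling, not the split strategy
            labels_used += 1
            if y[i] == 0:               # train only on samples labeled normal
                ae.partial_fit(x)
                n_norm += 1
                delta = s - mean
                mean += delta / n_norm
                m2 += delta * (s - mean)
                threshold = mean + k * (m2 / n_norm) ** 0.5
    return preds, labels_used
```

Calling run_stream(*make_stream()) returns the per-sample predictions and the number of labels requested. Training only on samples labeled normal keeps the reconstruction error of attacks high, which is the property a reconstruction-based detector relies on.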