Hyper-heuristic approaches for data stream-based intrusion detection in the Internet of Things

Bibliographic Details
Main Author: Hadi, Ahmed Adnan
Format: Thesis
Language: English
Published: 2022
Online Access: http://psasir.upm.edu.my/id/eprint/113138/1/113138.pdf
Description
Summary: Detecting cyber-security attacks remains a challenging task because the attacks keep evolving. Existing stream data learning models that work with limited labelling also have many limitations. Most importantly, they have a limited capability to adapt to the evolving nature of the data generated from network traffic, a phenomenon known as concept drift. Hence, an algorithm must handle dynamic updates of its internal parameters or otherwise counter the concept drift. The existing literature relies on either offline-trained models or incremental learning models. The former suffer from partially or fully outdated knowledge after a drift occurs, and the latter are constrained by the model's pre-defined hyper-parameters. Thus, neural network-based semi-supervised stream data learning is inadequate for capturing changes in the distribution and characteristics of the various classes of data while avoiding the effect of outdated knowledge stored in the neural network (NN).

Therefore, we propose an approach that integrates an NN, a meta-heuristic based on an evolutionary genetic algorithm (GA), and core online-offline clustering (Core). The system trains the NN on previously labelled data, and the NN's knowledge is used to calculate the error of the core online-offline clustering block. Genetic optimisation selects the core clustering parameters that minimise this error. In this way, old knowledge is preserved dynamically to overcome concept drift. Nevertheless, the various components embedded in hyper-heuristic models raise concerns about the model's efficiency and about whether its performance is free of over-fitting and under-fitting. Therefore, the core classifier in the hyper-heuristic Intrusion Detection System (IDS) is extended to a parallel-structure NN. This gives more control over reaching optimal learning without falling into sub-optimality caused by over-fitting or under-fitting.

In addition, existing concept drift-adaptive solutions are not feature drift-aware, that is, they do not exploit the fact that many of the original features are non-relevant. Here, memory consumption is reduced by a feature selection algorithm that excludes non-relevant features and preserves the relevant ones. The algorithm is based on variable-length particle swarm optimisation (PSO). Variable-length search is used because it is effective in high-dimensional search spaces and reduces the number of candidate features. This is done by sorting the features by relevance and then segmenting the space into parts. The algorithms were examined on two real datasets, namely NSL-KDD and Landsat. The experimental results showed that the accuracy of the algorithm on the NSL-KDD dataset was 99.72%, with a memory reduction of 10%. Furthermore, this was accomplished with only 25 neurons, a 75% reduction in the number of neurons. This addresses the trade-off between effectiveness and efficiency, which is needed in IoT networks. The decrease in memory also helped achieve better accuracy together with greater memory efficiency.
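
The interaction between the genetic optimisation and the clustering block described in the summary can be illustrated with a minimal Python sketch. It is not the thesis implementation, and everything in it is an assumption made for illustration: a simple one-dimensional leader clustering stands in for the core online-offline clustering, a fixed threshold rule stands in for the trained NN that supplies reference labels, and the names `leader_clusters`, `clustering_error`, `evolve_radius`, and the `cluster_cost` penalty are hypothetical. The sketch only shows the general idea of a GA choosing a clustering parameter so that the clustering error, measured against the reference labels, is minimised.

```python
import random


def leader_clusters(points, radius):
    """Assign each point to the first centre within `radius`, else open a new cluster."""
    centres, assignment = [], []
    for x in points:
        for idx, centre in enumerate(centres):
            if abs(x - centre) <= radius:
                assignment.append(idx)
                break
        else:
            centres.append(x)
            assignment.append(len(centres) - 1)
    return centres, assignment


def clustering_error(points, ref_labels, radius, cluster_cost=0.01):
    """Share of points whose cluster majority label disagrees with the reference
    label, plus a small per-cluster penalty (an assumed stand-in for memory cost)."""
    centres, assignment = leader_clusters(points, radius)
    votes = [{} for _ in centres]
    for idx, label in zip(assignment, ref_labels):
        votes[idx][label] = votes[idx].get(label, 0) + 1
    majority = [max(v, key=v.get) for v in votes]
    wrong = sum(majority[idx] != label for idx, label in zip(assignment, ref_labels))
    return wrong / len(points) + cluster_cost * len(centres)


def evolve_radius(points, ref_labels, rng, pop_size=12, generations=20,
                  low=0.01, high=3.0):
    """Tiny GA over one real-valued gene: the clustering radius."""
    def fitness(radius):
        return clustering_error(points, ref_labels, radius)

    population = [rng.uniform(low, high) for _ in range(pop_size)]
    for _ in range(generations):
        children = [min(population, key=fitness)]  # elitism: keep the best radius
        while len(children) < pop_size:
            # Tournament selection of two parents.
            a = min(rng.sample(population, 3), key=fitness)
            b = min(rng.sample(population, 3), key=fitness)
            w = rng.random()
            child = w * a + (1 - w) * b      # blend crossover
            child += rng.gauss(0.0, 0.1)     # Gaussian mutation
            children.append(min(max(child, low), high))
        population = children
    best = min(population, key=fitness)
    return best, fitness(best)


rng = random.Random(0)
# Toy stream: two classes centred at 0.0 and 2.0.
points = [rng.gauss(0.0, 0.3) for _ in range(50)] + [rng.gauss(2.0, 0.3) for _ in range(50)]
# Stand-in for the trained NN: a fixed threshold provides the reference labels.
ref_labels = [0 if x < 1.0 else 1 for x in points]

best_radius, best_error = evolve_radius(points, ref_labels, rng)
print(f"selected radius: {best_radius:.3f}, clustering error: {best_error:.3f}")
```

In this sketch the per-cluster penalty keeps the search from shrinking the radius until every point forms its own cluster. How the thesis defines the clustering error and which parameters the GA tunes are not stated in the summary.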