Evaluation of machine learning techniques for imbalanced data in IDS
A network Intrusion Detection System (IDS) is an automated system that detects malicious traffic, and it plays a critical role in a network. In recent years, machine learning algorithms have been developed and used to detect network intrusions. Most standard machine learning algorithms often give h...
Main Author: | Mokaramian, Shahram |
---|---|
Format: | Thesis |
Language: | English |
Published: | 2013 |
Subjects: | TK7885-7895 Computer engineer. Computer hardware |
Online Access: | http://eprints.utm.my/id/eprint/37080/5/ShahramMokaramianMFSKSM2013.pdf |
id |
my-utm-ep.37080 |
---|---|
record_format |
uketd_dc |
spelling |
my-utm-ep.37080 2017-06-29T07:03:41Z Evaluation of machine learning techniques for imbalanced data in IDS 2013-08 Mokaramian, Shahram TK7885-7895 Computer engineer. Computer hardware
2013-08 Thesis http://eprints.utm.my/id/eprint/37080/ http://eprints.utm.my/id/eprint/37080/5/ShahramMokaramianMFSKSM2013.pdf application/pdf en public http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:70060?site_name=Restricted Repository masters Universiti Teknologi Malaysia, Faculty of Computing Faculty of Computing |
institution |
Universiti Teknologi Malaysia |
collection |
UTM Institutional Repository |
language |
English |
topic |
TK7885-7895 Computer engineer. Computer hardware |
description |
A network Intrusion Detection System (IDS) is an automated system that detects malicious traffic, and it plays a critical role in a network. In recent years, machine learning algorithms have been developed and used to detect network intrusions. Most standard machine learning algorithms give high overall accuracy, but they favor the majority class when dealing with imbalanced data. IDS data are highly imbalanced, and most machine learning algorithms perform poorly at detecting the R2L and U2R classes, both of which are malicious attacks. A resampling technique is therefore required to balance the data. The purpose of this study is to investigate the performance of three machine learning algorithms, Support Vector Machine (SVM), Decision Tree (DT) and Fuzzy Classifier (FC), on imbalanced IDS data and again after the data were rebalanced using the Synthetic Minority Over-sampling TEchnique (SMOTE). The performance of the three machine learning algorithms was evaluated with the new rebalanced data. The benchmark DARPA KDDCup 1999 IDS dataset was used. SMOTE was applied with two imbalance ratios, 1:4 and 1:1. Comparing the results before and after resampling showed that FC performs better with an imbalance ratio of 1:1. The accuracy of FC with balanced data was 99.19% for Normal traffic, 99.35% for Denial of Service attacks, 99.51% for Probe attacks, 99.67% for Remote to Local attacks and 99.41% for User to Root attacks. In addition, the data with an imbalance ratio of 1:1 gave better results on all classes for all three machine learning algorithms. |
format |
Thesis |
qualification_level |
Master's degree |
author |
Mokaramian, Shahram |
title |
Evaluation of machine learning techniques for imbalanced data in IDS |
granting_institution |
Universiti Teknologi Malaysia, Faculty of Computing |
granting_department |
Faculty of Computing |
publishDate |
2013 |
url |
http://eprints.utm.my/id/eprint/37080/5/ShahramMokaramianMFSKSM2013.pdf |
_version_ |
1747816498869567488 |
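
The description above mentions rebalancing the data with SMOTE at imbalance ratios of 1:4 and 1:1. The following is a minimal sketch of that idea, not the thesis implementation: it assumes a two-class setting (one majority class, one minority class), Euclidean distance on numeric features, and a simplified variant that picks the base sample at random for each synthetic point. The function names `smote` and `rebalance`, the parameter `k`, and the toy data are all hypothetical and chosen only for illustration.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (simplified SMOTE)."""
    if n_new > 0 and len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def rebalance(majority, minority, ratio, k=5, seed=0):
    """Oversample `minority` until minority:majority reaches `ratio`
    (ratio=0.25 for 1:4, ratio=1.0 for 1:1). Assumed helper, not from the thesis."""
    target = math.ceil(ratio * len(majority))
    n_new = max(0, target - len(minority))
    return minority + smote(minority, n_new, k=k, seed=seed)


# Toy data: 40 majority samples and 4 minority samples (hypothetical values).
majority = [[float(i), float(i % 7)] for i in range(40)]
minority = [[100.0, 100.0], [101.0, 100.0], [100.0, 101.0], [101.0, 101.0]]

one_to_four = rebalance(majority, minority, ratio=0.25)
one_to_one = rebalance(majority, minority, ratio=1.0)
print(len(one_to_four), len(one_to_one))  # prints: 10 40
```

Because each synthetic point is a convex combination of two existing minority samples, the new points stay inside the region the minority class already occupies. For a five-class problem such as the one described here, one plausible extension is to apply the same oversampling to each rare class in turn; that choice is an assumption, as the record does not say how the ratios were applied per class.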