Classification of cross site scripting web pages using machine learning techniques
There are many web application threats, such as SQL injection and Cross Site Scripting. According to the OWASP 2013 security report, Cross Site Scripting came in third place. Cross Site Scripting is an attack that targets web applications which lack security countermeasures against untrusted data that is...
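Main Author: | Al-Aswer, Faisal Saleh Nasser |
---|---|
Format: | Thesis |
Language: | English |
Published: | 2017 |
Subjects: | QA75 Electronic computers. Computer science |
Online Access: | http://eprints.utm.my/id/eprint/78565/1/FaisalSalehNasserMFC2017.pdf |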
id |
my-utm-ep.78565 |
---|---|
record_format |
uketd_dc |
spelling |
my-utm-ep.78565 2018-08-29T07:31:57Z Classification of cross site scripting web pages using machine learning techniques 2017-01 Al-Aswer, Faisal Saleh Nasser QA75 Electronic computers. Computer science 2017-01 Thesis http://eprints.utm.my/id/eprint/78565/ http://eprints.utm.my/id/eprint/78565/1/FaisalSalehNasserMFC2017.pdf application/pdf en public http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:109762 masters Universiti Teknologi Malaysia, Faculty of Computing Faculty of Computing |
institution |
Universiti Teknologi Malaysia |
collection |
UTM Institutional Repository |
language |
English |
topic |
QA75 Electronic computers Computer science |
spellingShingle |
QA75 Electronic computers Computer science Al-Aswer, Faisal Saleh Nasser Classification of cross site scripting web pages using machine learning techniques |
description |
Web applications face many threats, such as SQL injection and Cross Site Scripting. According to the OWASP 2013 security report, Cross Site Scripting ranked third. Cross Site Scripting is an attack that targets web applications lacking security countermeasures against untrusted data provided by the user; it exploits these applications because they apply no input validation or output sanitization. A few previous works used machine learning to detect cross site scripting attacks by classifying web pages into two classes, malicious or benign. These works used too many features, many of which are irrelevant or noisy: they add little to accuracy while increasing complexity and degrading the performance of the classification process. They also used URL features, which are considered unnecessary because the URL is only the entry point of the attack and cannot activate it; all kinds of cross site scripting are activated and run inside the HTML source code. This study focuses on implementing feature selection through Information Gain (IG) to select the most significant features, leading to better performance and less execution time. The selected features were used to classify the datasets with three different classifiers to test their performance. The features in this study were taken from previous works; however, with IG feature selection, 14 features were selected as the most significant. Using these features gave an accuracy of 95.78%, compared with 93.11% when using all features. Recall also improved, from 88% with all features to 92.33% with only the 14 selected features. |
format |
Thesis |
qualification_level |
Master's degree |
author |
Al-Aswer, Faisal Saleh Nasser |
author_facet |
Al-Aswer, Faisal Saleh Nasser |
author_sort |
Al-Aswer, Faisal Saleh Nasser |
title |
Classification of cross site scripting web pages using machine learning techniques |
title_short |
Classification of cross site scripting web pages using machine learning techniques |
title_full |
Classification of cross site scripting web pages using machine learning techniques |
title_fullStr |
Classification of cross site scripting web pages using machine learning techniques |
title_full_unstemmed |
Classification of cross site scripting web pages using machine learning techniques |
title_sort |
classification of cross site scripting web pages using machine learning techniques |
granting_institution |
Universiti Teknologi Malaysia, Faculty of Computing |
granting_department |
Faculty of Computing |
publishDate |
2017 |
url |
http://eprints.utm.my/id/eprint/78565/1/FaisalSalehNasserMFC2017.pdf |
_version_ |
1747818016530235392 |
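
The abstract describes ranking features by Information Gain (IG) and keeping only the most significant ones. The record does not include the thesis's feature list, dataset, or tooling, so the following is only a minimal sketch of the general technique, assuming discrete (here binary) features. The feature names, the toy rows, and the helper functions `entropy`, `information_gain`, and `select_top_k` are all hypothetical and chosen for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature_values, labels):
    """IG = H(labels) - sum over feature values v of P(v) * H(labels | v)."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def select_top_k(rows, labels, feature_names, k):
    """Rank features by information gain and keep the k highest."""
    scores = {
        name: information_gain([row[i] for row in rows], labels)
        for i, name in enumerate(feature_names)
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Hypothetical toy data: each row is one web page, each column one binary
# HTML-derived feature. These are NOT the features or data from the thesis.
FEATURES = ["has_script_tag", "has_onerror_attr", "has_title_tag"]
ROWS = [
    (1, 1, 1),
    (1, 0, 1),
    (1, 1, 0),
    (0, 0, 1),
    (0, 0, 0),
    (0, 1, 1),
]
LABELS = ["malicious", "malicious", "malicious", "benign", "benign", "benign"]

selected, scores = select_top_k(ROWS, LABELS, FEATURES, k=2)
print(selected)
```

In this toy data the first feature separates the two classes perfectly and so receives the highest score, while the last feature is independent of the label and scores approximately zero. In the study itself, the same kind of ranking would be applied to the full feature set before training the classifiers, with the cutoff set at 14 features.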