Classification of breast cancer disease using bagging fuzzy-id3 algorithm based on fuzzydbd

Classification is a data mining technique used to classify varied data types according to a specific criterion. One of the most powerful machine learning methods to handle classification problems is the decision tree. There are various decision tree algorithms, but the most commonly used are Iterati...

Bibliographic Details
Main Author: Nur Farahaina, Idris
Format: Thesis
Language: English
Published: 2022
Subjects:
Online Access: http://umpir.ump.edu.my/id/eprint/37640/1/ir.Classification%20of%20breast%20cancer%20disease%20using%20bagging%20fuzzy-id3%20algorithm%20based%20on%20fuzzydbd.pdf
id my-ump-ir.37640
record_format uketd_dc
spelling my-ump-ir.37640 2023-09-15T08:10:42Z Classification of breast cancer disease using bagging fuzzy-id3 algorithm based on fuzzydbd 2022-02 Nur Farahaina, Idris Q Science (General) QA75 Electronic computers. Computer science 2022-02 Thesis http://umpir.ump.edu.my/id/eprint/37640/ http://umpir.ump.edu.my/id/eprint/37640/1/ir.Classification%20of%20breast%20cancer%20disease%20using%20bagging%20fuzzy-id3%20algorithm%20based%20on%20fuzzydbd.pdf pdf en public masters Universiti Malaysia Pahang Faculty of Computing Mohd Arfian, Ismail
institution Universiti Malaysia Pahang Al-Sultan Abdullah
collection UMPSA Institutional Repository
language English
advisor Mohd Arfian, Ismail
topic Q Science (General)
description Classification is a data mining technique that assigns data of varied types to classes according to a specific criterion. The decision tree is one of the most powerful machine learning methods for classification problems. Of the various decision tree algorithms, the most commonly used are Iterative Dichotomiser 3 (ID3), CART, and C4.5. Among these three, ID3 has the most advantages, especially in processing time, because it builds a shallow tree the fastest. However, although decision trees are widely used for classification, they suffer from high variance and overfitting, which lead to poor generalisation. Combining fuzzy sets with the ID3 algorithm handles the data more efficiently because it brings together the advantages of both fuzzy sets and decision trees. In the proposed FID3-DBD algorithm, continuous and discrete (integer) attributes are expressed as linguistic values of fuzzy sets, and the FUZZYDBD method sets the fuzzy sets’ parameters. Before tree induction, each input value is replaced with the linguistic label of the fuzzy set with which it has the highest compatibility. The proposed technique addresses the classic ID3 algorithm’s inability to classify continuous-valued attributes and, at the same time, increases classification accuracy. The bagging method was then applied to the FID3-DBD algorithm (yielding BFID3-DBD) to overcome overfitting and high variance in decision trees. Four breast cancer datasets were used to evaluate classification accuracy: the Wisconsin Breast Cancer (Original) dataset, the WDBC (Diagnostic) dataset, the Breast Cancer Coimbra dataset, and the Mammographic Mass dataset. All four datasets were obtained from the UCI machine learning repository. This study aims to address the classic ID3 algorithm’s inability to classify continuous data well and to overcome the high variance and overfitting issues.
The research methodology consists of four fundamental steps: literature review, data collection, experiment implementation, and report writing. The FID3-DBD algorithm achieved classification accuracies of 94.362% on the Wisconsin Breast Cancer (Original) dataset, 94.358% on the WDBC (Diagnostic) dataset, 81.119% on the Mammographic Mass dataset, and 64.224% on the Coimbra dataset. The BFID3-DBD algorithm achieved classification accuracies of 96.003% on the Wisconsin Breast Cancer (Original) dataset, 95.273% on the WDBC (Diagnostic) dataset, 81.590% on the Mammographic Mass dataset, and 68.966% on the Coimbra dataset. The study verified that the FID3-DBD algorithm can classify continuous data and that the BFID3-DBD algorithm overcame the overfitting issue, reduced variance, and increased classification accuracy on test data.
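The pipeline described in the abstract (fuzzify attributes, replace each value with its best-matching linguistic label, induce an ID3 tree, then bag several trees) can be sketched roughly as follows. This is a minimal illustration, not the thesis implementation: the triangular membership functions, the hand-picked parameters in FUZZY_SETS, and every function name are assumptions made for the example, and the FUZZYDBD parameter-setting method itself is not reproduced because the record does not describe it.

```python
import math
import random
from collections import Counter

def triangular(x, a, b, c):
    """Membership of x in a triangular fuzzy set with feet a, c and peak b."""
    if x == b:
        return 1.0
    if x <= a or x >= c:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# ASSUMPTION: hand-picked fuzzy sets for an attribute scaled 1..10.
# The thesis sets such parameters with FUZZYDBD; that step is not shown here.
FUZZY_SETS = {"low": (1, 1, 5.5), "medium": (1, 5.5, 10), "high": (5.5, 10, 10)}

def to_label(x, fuzzy_sets=FUZZY_SETS):
    """Replace a numeric value with the label of its most compatible fuzzy set."""
    return max(fuzzy_sets, key=lambda name: triangular(x, *fuzzy_sets[name]))

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def id3(rows, labels, attrs):
    """Classic ID3 on categorical rows (dicts of attribute -> linguistic label).

    Returns a class label (leaf) or a tuple (attribute, branches, default).
    """
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority

    def gain(attr):
        remainder = 0.0
        for value in set(r[attr] for r in rows):
            subset = [l for r, l in zip(rows, labels) if r[attr] == value]
            remainder += len(subset) / len(labels) * entropy(subset)
        return entropy(labels) - remainder

    best = max(attrs, key=gain)
    rest = [a for a in attrs if a != best]
    branches = {}
    for value in set(r[best] for r in rows):
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        branches[value] = id3([rows[i] for i in idx], [labels[i] for i in idx], rest)
    return (best, branches, majority)

def predict(tree, row):
    """Walk the tree; fall back to the node's majority class on unseen values."""
    while isinstance(tree, tuple):
        attr, branches, default = tree
        if row[attr] not in branches:
            return default
        tree = branches[row[attr]]
    return tree

def bagged_fit(rows, labels, attrs, n_trees=15, seed=0):
    """Train one ID3 tree per bootstrap sample (sampling with replacement)."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        trees.append(id3([rows[i] for i in idx], [labels[i] for i in idx], attrs))
    return trees

def bagged_predict(trees, row):
    """Majority vote over the individual tree predictions."""
    return Counter(predict(t, row) for t in trees).most_common(1)[0][0]

if __name__ == "__main__":
    # Toy data invented for the demo; not taken from any of the four datasets.
    raw = [1, 2, 3, 2, 1, 8, 9, 10, 9, 8]
    classes = ["benign"] * 5 + ["malignant"] * 5
    rows = [{"size": to_label(v)} for v in raw]
    trees = bagged_fit(rows, classes, ["size"])
    print(bagged_predict(trees, {"size": to_label(9)}))
```

Fuzzifying before induction is what lets plain ID3, which only splits on categorical values, handle numeric attributes; the bootstrap-and-vote step is the standard bagging recipe for reducing the variance of a single tree.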
format Thesis
qualification_level Master's degree
author Nur Farahaina, Idris
title Classification of breast cancer disease using bagging fuzzy-id3 algorithm based on fuzzydbd
granting_institution Universiti Malaysia Pahang
granting_department Faculty of Computing
publishDate 2022
url http://umpir.ump.edu.my/id/eprint/37640/1/ir.Classification%20of%20breast%20cancer%20disease%20using%20bagging%20fuzzy-id3%20algorithm%20based%20on%20fuzzydbd.pdf
_version_ 1783732271687139328