Rough set approach for categorical data clustering

A few rough-set-based techniques for categorical data clustering exist to group objects with similar characteristics. However, their performance is an issue because of low accuracy, high computational complexity and low cluster purity. This work proposes a new technique called Maximum Dependen...

Bibliographic Details
Main Author:	Herawan, Tutut
Format:	Thesis
Language:	English
Published:	2010
Subjects:	QA Mathematics QA71-90 Instruments and machines
Online Access:	http://eprints.uthm.edu.my/3609/1/24p%20TUTUT%20HERAWAN.pdf http://eprints.uthm.edu.my/3609/2/TUTUT%20HERAWAN%20COPYRIGHT%20DECLARATION.pdf http://eprints.uthm.edu.my/3609/3/TUTUT%20HERAWAN%20WATERMARK.pdf

id	my-uthm-ep.3609
record_format	uketd_dc
spelling	my-uthm-ep.3609 2022-02-03T01:53:46Z 2010-03 Thesis http://eprints.uthm.edu.my/3609/ http://eprints.uthm.edu.my/3609/1/24p%20TUTUT%20HERAWAN.pdf text en public http://eprints.uthm.edu.my/3609/2/TUTUT%20HERAWAN%20COPYRIGHT%20DECLARATION.pdf text en staffonly http://eprints.uthm.edu.my/3609/3/TUTUT%20HERAWAN%20WATERMARK.pdf text en validuser phd doctoral Universiti Tun Hussein Onn Malaysia Fakulti Sains Komputer dan Teknologi Maklumat
institution	Universiti Tun Hussein Onn Malaysia
collection	UTHM Institutional Repository
language	English
topic	QA Mathematics QA71-90 Instruments and machines
description	A few rough-set-based techniques for categorical data clustering exist to group objects with similar characteristics. However, their performance is an issue because of low accuracy, high computational complexity and low cluster purity. This work proposes a new technique called Maximum Dependency Attributes (MDA) to address these issues. The proposed technique is based on rough set theory and takes into account the dependency of attributes in an information system. Its main contribution is a way to classify objects in categorical datasets that performs better than the baseline techniques. The algorithm of the proposed technique is implemented in MATLAB® version 7.6.0.324 (R2008a) and executed sequentially on an Intel Core 2 Duo processor, with 1 gigabyte of main memory, under Windows XP Professional SP3. Results from experiments on four small datasets and thirteen UCI benchmark datasets for selecting a clustering attribute show that the proposed MDA technique is efficient in terms of accuracy and computational complexity compared with the BC, TR and MMR techniques. For cluster purity, the results on the Soybean and Zoo datasets show that the MDA technique improves purity by up to 17% and 9%, respectively. The experimental result on supplier chain management clustering also demonstrates how the MDA technique can contribute to a practical system, improving computational complexity and cluster purity by up to 90% and 23%, respectively.
format	Thesis
qualification_name	Doctor of Philosophy (PhD.)
qualification_level	Doctorate
author	Herawan, Tutut
title	Rough set approach for categorical data clustering
granting_institution	Universiti Tun Hussein Onn Malaysia
granting_department	Fakulti Sains Komputer dan Teknologi Maklumat
publishDate	2010
url	http://eprints.uthm.edu.my/3609/1/24p%20TUTUT%20HERAWAN.pdf http://eprints.uthm.edu.my/3609/2/TUTUT%20HERAWAN%20COPYRIGHT%20DECLARATION.pdf http://eprints.uthm.edu.my/3609/3/TUTUT%20HERAWAN%20WATERMARK.pdf
_version_	1747831036049358848
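
The description above says that MDA selects a clustering attribute using the dependency of attributes in an information system. As a rough illustration only, the sketch below computes the standard rough set dependency degree between single attributes (the fraction of objects whose equivalence class under one attribute falls entirely inside a class of the other) and then picks the attribute with the largest such degree. This is one plausible reading of "maximum dependency", not the thesis's actual algorithm; the function names, the selection rule, and the toy dataset are all assumptions made for this example.

```python
from collections import defaultdict


def partition(rows, attr):
    """Group row indices into equivalence classes by their value of `attr`."""
    classes = defaultdict(set)
    for i, row in enumerate(rows):
        classes[row[attr]].add(i)
    return list(classes.values())


def dependency_degree(rows, target, given):
    """Degree to which `target` depends on `given`.

    Counts the objects whose `given`-class lies wholly inside some
    `target`-class (the positive region) and divides by the number of objects.
    """
    target_classes = partition(rows, target)
    positive = 0
    for g in partition(rows, given):
        if any(g <= t for t in target_classes):
            positive += len(g)
    return positive / len(rows)


def max_dependency_attribute(rows, attrs):
    """Assumed selection rule: for each attribute take its highest dependency
    degree on any other attribute, then return the attribute with the largest
    of those values, together with all the per-attribute maxima."""
    best = {
        a: max(dependency_degree(rows, a, b) for b in attrs if b != a)
        for a in attrs
    }
    return max(attrs, key=lambda a: best[a]), best


# Hypothetical toy information system (not taken from the thesis).
rows = [
    {"color": "red", "size": "small", "shape": "round"},
    {"color": "red", "size": "small", "shape": "square"},
    {"color": "blue", "size": "large", "shape": "round"},
    {"color": "blue", "size": "large", "shape": "square"},
    {"color": "green", "size": "large", "shape": "round"},
]

selected, degrees = max_dependency_attribute(rows, ["color", "size", "shape"])
print(selected, degrees)
```

In this toy table every colour occurs with exactly one size, so "size" depends fully on "color" and is the attribute selected; the objects would then be split into clusters by their value of that attribute.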

Similar Items