Improved clustering using robust and classical principal component

The k-means algorithm is a popular data clustering algorithm. It aims to partition n observations into k clusters such that each observation belongs to the cluster with the nearest mean, which serves as the prototype of the cluster. Finding the appropriate number of clusters for a given data set...

Bibliographic Details
Main Author: Hassn, Ahmed Kadom
Format: Thesis
Language: English
Published: 2017
Subjects: Algorithms
Online Access: http://psasir.upm.edu.my/id/eprint/70922/1/FS%202017%2047%20UPM.pdf
id my-upm-ir.70922
record_format uketd_dc
spelling my-upm-ir.70922 2022-07-07T03:07:15Z Improved clustering using robust and classical principal component 2017-06 Hassn, Ahmed Kadom Algorithms 2017-06 Thesis http://psasir.upm.edu.my/id/eprint/70922/ http://psasir.upm.edu.my/id/eprint/70922/1/FS%202017%2047%20UPM.pdf text en public masters Universiti Putra Malaysia Algorithms Fitrianto, Anwar
institution Universiti Putra Malaysia
collection PSAS Institutional Repository
language English
advisor Fitrianto, Anwar
topic Algorithms


description The k-means algorithm is a popular data clustering algorithm. It aims to partition n observations into k clusters such that each observation belongs to the cluster with the nearest mean, which serves as the prototype of the cluster. Finding the appropriate number of clusters for a given data set is generally a trial-and-error process, made more difficult by the subjective nature of deciding what constitutes ‘correct’ clustering. When the dimension of the data is large, the k-means clustering algorithm is often difficult to apply because it requires a great deal of computation time. To remedy this problem, we propose integrating principal component analysis (PCA), which is useful for reducing the dimensionality of a dataset, with the k-means clustering algorithm. We call the proposed method k-means by principal components (pc1). In this study, the kernels created by the k-means method are replaced with kernels created by the PCA method, which reduces the dimensionality of the data. The results show that k-means by PCA is faster and more efficient than the classical k-means algorithm. Both the classical k-means algorithm and k-means by PCA are very sensitive to the presence of outliers. Hence, k-means by robust PCA is developed to address the problem of outliers in the dataset. The findings indicate that, in the absence of outliers, k-means by PCA and k-means by robust PCA perform equally well. However, k-means by robust PCA is not much affected by outliers, in contrast to k-means by classical PCA.
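The general idea in the description, reducing the data with PCA before running k-means, can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not say how the "kernels" are constructed, so this example simply projects each observation onto the first principal component (pc1) and runs Lloyd's k-means on the resulting one-dimensional scores. All function names, the power-iteration approach, the quantile-based starting centres, and the synthetic data are assumptions made for illustration. The robust variant is not sketched, because the abstract does not name the robust PCA estimator.

```python
import math
import random


def first_principal_component(rows, iters=200, seed=0):
    """Return (mean, unit direction) of the leading principal component,
    estimated by power iteration on the sample covariance matrix."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in range(d)]  # random starting direction
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # degenerate data with zero variance
            break
        v = [x / norm for x in w]
    return mean, v


def project(rows, mean, direction):
    """Score of each observation along the given direction."""
    d = len(mean)
    return [sum((r[j] - mean[j]) * direction[j] for j in range(d)) for r in rows]


def kmeans_1d(scores, k, iters=100):
    """Lloyd's algorithm on 1-D scores; centres start at evenly spaced quantiles
    (an assumption of this sketch, chosen to keep the example deterministic)."""
    ordered = sorted(scores)
    centres = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(scores)
    for _ in range(iters):
        # Assignment step: nearest centre for each score.
        labels = [min(range(k), key=lambda c: abs(s - centres[c])) for s in scores]
        # Update step: each centre becomes the mean of its members.
        new = []
        for c in range(k):
            members = [s for s, lab in zip(scores, labels) if lab == c]
            new.append(sum(members) / len(members) if members else centres[c])
        if new == centres:
            break
        centres = new
    return labels, centres


def kmeans_by_pc1(rows, k):
    """Hypothetical helper: cluster observations by their pc1 scores."""
    mean, direction = first_principal_component(rows)
    labels, _ = kmeans_1d(project(rows, mean, direction), k)
    return labels


# Synthetic example: two well-separated groups in three dimensions.
rng = random.Random(0)
group_a = [[rng.gauss(0, 0.5) for _ in range(3)] for _ in range(20)]
group_b = [[rng.gauss(5, 0.5) for _ in range(3)] for _ in range(20)]
labels = kmeans_by_pc1(group_a + group_b, k=2)
```

Clustering on a single score per observation is what makes the reduced problem cheaper than k-means on the full-dimensional data. In practice a numerical library would replace the hand-written covariance and power-iteration code.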
format Thesis
qualification_level Master's degree
author Hassn, Ahmed Kadom
title Improved clustering using robust and classical principal component
granting_institution Universiti Putra Malaysia
publishDate 2017
url http://psasir.upm.edu.my/id/eprint/70922/1/FS%202017%2047%20UPM.pdf
_version_ 1747812937331900416