An improved hierarchical clustering combination approach for software modularization
Software modularization plays an important role in the software maintenance phase. Modularization is the breaking down of a software system into sub-systems so that the most similar entities (e.g., classes or functions) are collected in clusters to obtain a modular architecture. To check the accuracy of coll...
Main Author: | Naseem, Rashid |
---|---|
Format: | Thesis |
Language: | English |
Published: | 2017 |
Subjects: | QA Mathematics |
Online Access: | http://eprints.uthm.edu.my/7816/2/24p%20RASHID%20NASEEM.pdf http://eprints.uthm.edu.my/7816/1/RASHID%20NASEEM%20COPYRIGHT%20DECLARATION.pdf http://eprints.uthm.edu.my/7816/3/RASHID%20NASEEM%20WATERMARK.pdf |
id |
my-uthm-ep.7816 |
---|---|
record_format |
uketd_dc |
spelling |
my-uthm-ep.7816 2022-10-12T02:21:21Z An improved hierarchical clustering combination approach for software modularization 2017-01 Naseem, Rashid QA Mathematics 2017-01 Thesis http://eprints.uthm.edu.my/7816/ http://eprints.uthm.edu.my/7816/2/24p%20RASHID%20NASEEM.pdf text en public http://eprints.uthm.edu.my/7816/1/RASHID%20NASEEM%20COPYRIGHT%20DECLARATION.pdf text en staffonly http://eprints.uthm.edu.my/7816/3/RASHID%20NASEEM%20WATERMARK.pdf text en validuser phd doctoral Universiti Tun Hussein Onn Malaysia Fakulti Sains Komputer dan Teknologi Maklumat |
institution |
Universiti Tun Hussein Onn Malaysia |
collection |
UTHM Institutional Repository |
language |
English |
topic |
QA Mathematics |
description |
Software modularization plays an important role in the software maintenance phase. Modularization is the breaking down of a software system into sub-systems so that the most similar entities (e.g., classes or functions) are collected in clusters to obtain a modular architecture. To check the accuracy of the collected clusters, authoritativeness is calculated, which measures the correspondence between the collected clusters and a software decomposition prepared by a human expert. Different techniques have been proposed in the literature to improve authoritativeness. Among them, agglomerative hierarchical clusterings (AHCs) are preferred because an AHC produces a tree-like structure, called a dendrogram, that resembles the internal tree structure of software systems. AHC uses similarity measures to find association values between entities and forms clusters of similar entities. This research addresses the strengths and weaknesses of existing similarity measures (i.e., Jaccard (JC), JaccardNM (JNM), and Russell&Rao (RR)). For example, the JC measure produces a large number of clusters (NoC) and a large number of arbitrary decisions (AD). A large NoC is considered better for improving authoritativeness, but a large AD deteriorates it. To overcome this trade-off, new combined binary similarity measures are proposed. To further improve authoritativeness, this research explores the idea of hierarchical clustering combination (HCC) for software modularization, which is based on combining the results (dendrograms) of individual AHCs (IAHCs). This research proposes an improved HCC approach in which the dendrograms are represented in a 4+N dimensional Euclidean space (4+NDES), where 4 is the number of features, which can be extended to N. The proposed binary similarity measures and the 4+NDES-based HCC approach are tested on several test software systems. Experimental results revealed an improvement of 13.5% to 63.5% in authoritativeness compared with existing approaches. Thus, the combined measures and 4+NDES-HCC have shown better potential for use in software modularization. |
format |
Thesis |
qualification_name |
Doctor of Philosophy (PhD.) |
qualification_level |
Doctorate |
author |
Naseem, Rashid |
title |
An improved hierarchical clustering combination approach for software modularization |
title_sort |
improved hierarchical clustering combination approach for software modularization |
granting_institution |
Universiti Tun Hussein Onn Malaysia |
granting_department |
Fakulti Sains Komputer dan Teknologi Maklumat |
publishDate |
2017 |
url |
http://eprints.uthm.edu.my/7816/2/24p%20RASHID%20NASEEM.pdf http://eprints.uthm.edu.my/7816/1/RASHID%20NASEEM%20COPYRIGHT%20DECLARATION.pdf http://eprints.uthm.edu.my/7816/3/RASHID%20NASEEM%20WATERMARK.pdf |
_version_ |
1747831196134408192 |
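
The abstract in this record mentions the Jaccard (JC) and Russell&Rao (RR) binary similarity measures and the notion of arbitrary decisions (AD). The sketch below is not taken from the thesis; it uses the standard textbook definitions of the two measures, JC = a / (a + b + c) and RR = a / (a + b + c + d), where a counts features present in both entities, b and c count features present in only one, and d counts features absent from both. It assumes that an arbitrary decision arises whenever more than one pair of entities shares the highest similarity. The entity names and feature vectors are hypothetical, and JaccardNM is left out because the record does not define it.

```python
from itertools import combinations


def contingency(x, y):
    """Count a (1-1), b (1-0), c (0-1) and d (0-0) positions of two binary vectors."""
    a = sum(1 for p, q in zip(x, y) if p == 1 and q == 1)
    b = sum(1 for p, q in zip(x, y) if p == 1 and q == 0)
    c = sum(1 for p, q in zip(x, y) if p == 0 and q == 1)
    d = sum(1 for p, q in zip(x, y) if p == 0 and q == 0)
    return a, b, c, d


def jaccard(x, y):
    """Jaccard similarity: ignores features absent from both entities."""
    a, b, c, _ = contingency(x, y)
    denom = a + b + c
    return a / denom if denom else 0.0


def russell_rao(x, y):
    """Russell&Rao similarity: shared features over the total number of features."""
    a, b, c, d = contingency(x, y)
    total = a + b + c + d
    return a / total if total else 0.0


def most_similar_pairs(entities, measure):
    """Return the highest similarity and every pair that reaches it.

    More than one pair means a clustering step would have to pick one
    arbitrarily (assumed meaning of "arbitrary decision").
    """
    scores = {
        (m, n): measure(entities[m], entities[n])
        for m, n in combinations(sorted(entities), 2)
    }
    best = max(scores.values())
    ties = [pair for pair, score in scores.items() if score == best]
    return best, ties


# Hypothetical entities described by four binary features each.
entities = {
    "A": [1, 1, 1, 0],
    "B": [1, 1, 1, 0],
    "C": [0, 0, 0, 1],
    "D": [0, 0, 0, 1],
}

for name, measure in (("JC", jaccard), ("RR", russell_rao)):
    best, ties = most_similar_pairs(entities, measure)
    print(name, best, ties)
```

On this toy input, JC rates the pairs (A, B) and (C, D) equally, so the first merge would be an arbitrary choice, while RR separates them because it also takes the total number of features into account. This only illustrates the kind of trade-off the abstract describes; it does not reproduce the combined measures proposed in the thesis.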