A hybrid model for discovering significant patterns in data mining
Significant pattern mining is one of the most important research areas and a major concern in data mining. Significant patterns are very useful because they can reveal a new dimension of knowledge in certain domain applications. There are three categories of significant patterns, named frequent p...
Main Author: | Abdullah, Zailani |
---|---|
Format: | Thesis |
Language: | English |
Published: | 2012 |
Subjects: | QA Mathematics; QA76 Computer software |
Online Access: | http://eprints.uthm.edu.my/2207/1/24p%20ZAILANI%20ABDULLAH.pdf http://eprints.uthm.edu.my/2207/2/ZAILANI%20ABDULLAH%20COPYRIGHT%20DECLARATION.pdf http://eprints.uthm.edu.my/2207/3/ZAILANI%20ABDULLAH%20WATERMARK.pdf |
id |
my-uthm-ep.2207 |
---|---|
record_format |
uketd_dc |
spelling |
my-uthm-ep.2207 2021-10-31T04:06:28Z A hybrid model for discovering significant patterns in data mining 2012-07 Abdullah, Zailani QA Mathematics QA76 Computer software 2012-07 Thesis http://eprints.uthm.edu.my/2207/ http://eprints.uthm.edu.my/2207/1/24p%20ZAILANI%20ABDULLAH.pdf text en public http://eprints.uthm.edu.my/2207/2/ZAILANI%20ABDULLAH%20COPYRIGHT%20DECLARATION.pdf text en staffonly http://eprints.uthm.edu.my/2207/3/ZAILANI%20ABDULLAH%20WATERMARK.pdf text en validuser phd masters Universiti Tun Hussein Onn Malaysia Fakulti Sains Komputer dan Teknologi Maklumat |
institution |
Universiti Tun Hussein Onn Malaysia |
collection |
UTHM Institutional Repository |
language |
English |
topic |
QA Mathematics QA76 Computer software |
description |
Significant pattern mining is one of the most important research areas and a major
concern in data mining. Significant patterns are very useful because they can reveal a
new dimension of knowledge in certain domain applications. There are three
categories of significant patterns: frequent patterns, least patterns and
significant least patterns. Typically, these patterns are derived from the absolute
frequent patterns or are mixed with the least patterns. In market-basket analysis,
frequent patterns are considered significant patterns and have already made many
contributions. The Frequent Pattern Tree (FP-Tree) is one of the best-known data
structures for handling batched frequent patterns, but it must rely on the original
database. For detecting exceptional occurrences or events that have high implications,
such as unanticipated substances that cause air pollution, unexpected degree programs
selected by students or unpredictable motorcycle models preferred by customers, the
least patterns are more meaningful than the frequent ones. However, for this
category of patterns, generating a standard tree data structure may trigger
memory overflow because the minimum support threshold must be lowered.
Furthermore, the classical support-confidence measure has many limitations:
choosing the right support-confidence values is tricky, the support-confidence
combination can lead to misleading interpretations, and the measure is not scalable
enough to deal with significant least patterns. Therefore, to overcome these drawbacks,
this thesis proposes a Hybrid Model for Discovering Significant Patterns (Hy-DSP),
which combines the Efficient Frequent Pattern Mining Model (EFP-M2), the Efficient
Least Pattern Mining Model (ELP-M2) and the Significant Least Pattern
Mining Model (SLP-M2). The proposed model was developed using the latest .NET
framework with C# as the programming language. Experiments with UCI datasets
showed that Hy-DSP, which consists of DOSTrieIT and LP-Growth*,
outperformed the benchmarked CanTree and FP-Growth by up to 4.13 times (75.78%)
and 10.37 times (90.31%), respectively, thus verifying its efficiency. In addition, the
number of patterns produced by the models is also lower than that produced by the
standard measures. |
format |
Thesis |
qualification_name |
Doctor of Philosophy (PhD.) |
qualification_level |
Master's degree |
author |
Abdullah, Zailani |
title |
A hybrid model for discovering significant patterns in data mining |
title_sort |
hybrid model for discovering significant patterns in data mining |
granting_institution |
Universiti Tun Hussein Onn Malaysia |
granting_department |
Fakulti Sains Komputer dan Teknologi Maklumat |
publishDate |
2012 |
url |
http://eprints.uthm.edu.my/2207/1/24p%20ZAILANI%20ABDULLAH.pdf http://eprints.uthm.edu.my/2207/2/ZAILANI%20ABDULLAH%20COPYRIGHT%20DECLARATION.pdf http://eprints.uthm.edu.my/2207/3/ZAILANI%20ABDULLAH%20WATERMARK.pdf |
_version_ |
1747830924765036544 |
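
The description in this record refers to the classical support-confidence measure and to the need to lower the minimum support threshold before least patterns become visible. As a rough illustration only, the following Python sketch computes support and confidence on a small, made-up transaction set. It is not the thesis's Hy-DSP, DOSTrieIT or LP-Growth* implementation (which the record says was written in C#). The function names, the toy data and the simplification that a "least" item is any item whose support falls below the threshold are all assumptions introduced here.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent)."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0
    both = frozenset(antecedent) | frozenset(consequent)
    return support(both, transactions) / base


def split_by_support(transactions, min_support):
    """Split single items into (frequent, least) lists.

    Assumption for illustration: "least" simply means below min_support.
    """
    items = sorted(set().union(*transactions))
    frequent = [i for i in items if support({i}, transactions) >= min_support]
    least = [i for i in items if support({i}, transactions) < min_support]
    return frequent, least


# Hypothetical toy market-basket data, not taken from the thesis.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"bread", "milk"}),
    frozenset({"bread", "caviar"}),
    frozenset({"milk", "caviar", "champagne"}),
]

frequent, least = split_by_support(transactions, min_support=0.5)
rare_rule_conf = confidence({"champagne"}, {"caviar"}, transactions)
```

With a threshold of 0.5, only `bread` and `milk` count as frequent, so the rule from `champagne` to `caviar` is never examined even though its confidence on this toy data is high. Lowering the threshold exposes such rules but also enlarges the candidate set, which is the trade-off the description points to.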