Designing an efficient hybrid algorithm using unifying process to mine frequent itemset

Bibliographic Details
Main Author: Ahmad Shah (Author)
Format: Thesis
Language: English
Published: Kuala Lumpur : Kulliyyah of Engineering, International Islamic University Malaysia, 2016
Online Access: The first 24 pages of the thesis can be viewed online. Members can view the full text at the specified PCs in the library.
Description
Summary: Current advances in technology inexorably lead to a flood of data. Ever more data is generated by banking, telecommunications, scientific experiments, biology, high-energy physics, the web, and other sources. Data mining is the process of extracting useful information from this flood of data, which helps in making profitable future decisions in these fields. Frequent itemset mining is a central research area and an important step in finding association rules. The time and space required to generate frequent itemsets are of the utmost importance. Algorithms that mine frequent itemsets effectively help in finding association rules and in many other data mining tasks. In this study, we perform an in-depth analysis of different frequent itemset mining algorithms and discuss their strengths and weaknesses. There are many algorithms for mining frequent itemsets, for example the baseline algorithm Apriori, which started a new era in data mining and made the concepts of frequent itemsets and association rules possible. Others are variations of the same algorithm, applied to different datasets, with improvements in memory use and execution time. We evaluate the performance of these algorithms on transactional databases with respect to execution time and memory consumption at different support values. For this evaluation we use two standard datasets: RETAIL, which is anonymized retail market basket data, and T10I4D100K. We also design an efficient hybrid algorithm that uses a unifying process to combine two algorithms (Improved Apriori and FP-Growth), achieving better execution time and memory use than either Improved Apriori or FP-Growth alone. Results indicate that the proposed hybrid algorithm, albeit more complex, consumes less memory and executes faster. For example, the hybrid algorithm achieves a 50% reduction in execution time relative to the benchmarks on the RETAIL dataset, and a 40% reduction on the T10I4D100K dataset with repetitive itemsets. In terms of memory consumption, the proposed hybrid algorithm uses fewer resources at low support values, but as the support value increases its memory use becomes comparable to that of the benchmark algorithms.
Physical Description: xiii, 52 leaves : illustrations ; 30 cm.
Bibliography: Includes bibliographical references (leaves 50-52).
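
Illustrative note: the summary refers to frequent itemset mining with the baseline Apriori algorithm and to support values. The sketch below shows the textbook form of Apriori on a toy set of transactions, to make those terms concrete. It is not the thesis's Improved Apriori, FP-Growth, or hybrid algorithm; the function name `apriori`, the use of an absolute support count, and the sample transactions are all assumptions made for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Textbook Apriori (illustrative, not the thesis's method).

    Returns a dict mapping each frequent itemset (a frozenset) to its
    support count. min_support is an absolute count of transactions.
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count how many transactions contain each single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        candidates = set()
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset of a candidate must be frequent.
            if all(frozenset(sub) in current
                   for sub in combinations(union, k - 1)):
                candidates.add(union)
        # Count support of each surviving candidate with one database scan.
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical market basket data, not taken from RETAIL or T10I4D100K.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter"},
]
result = apriori(baskets, min_support=3)
```

With a minimum support count of 3, each single item and each pair is frequent in this toy data, while the three-item set appears in only two baskets and is dropped. Raising the support value shrinks the set of candidates, which is one reason execution time and memory use are usually reported across several support values.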