A new approach in infrequent itemset mining based on Eclat algorithm

Bibliographic Details
Main Author: Julaily Aida Jusoh (Author)
Format: Thesis Book
Language: English
Description
Summary: Big data grows exponentially as a bulk of heterogeneous data. Processing and analysing it, and automatically transforming the processed data into useful knowledge, requires advanced technology. Data mining has excellent potential for discovering the hidden knowledge within databases. This hidden knowledge can lead to association rules, which may disclose useful patterns. Two significant kinds of pattern can be found: frequent and infrequent. Most previous infrequent mining techniques deal with the horizontal data format. Nevertheless, the current and emerging trend finds researchers dealing with the vertical data format. One example of a vertical rule mining algorithm is Equivalence Class Transformation (Eclat). The Eclat algorithm comprises four variants, tidset, diffset, sortdiffset and postdiffset, which have so far been employed only for frequent itemset mining. This research introduces a new version of the Eclat algorithm for obtaining infrequent itemsets. In the first phase, the Eclat algorithm and its variants are slightly modified; the result is named R-Eclat, where R refers to rare. Like the original Eclat algorithm, R-Eclat comprises four variants: r-tidset, r-diffset, r-sortdiffset and r-postdiffset. In the second phase, R-Eclat and its variants are executed by serial processing, but the mining process is time-consuming. To obtain faster processing and lower memory usage, R-Eclat is complemented with a parallel programming approach. In the third phase, a new parallel R-Eclat, named PR-Eclat, is proposed to overcome the limitations of serial processing in speeding up the running time. The experimental results indicate that, in infrequent itemset mining, PR-Eclat outperforms R-Eclat by an average of 54% in execution time and reduces memory usage by an average of 60%.
This research addresses large databases in a single data format only. Future work is recommended to address multiple databases and multiple data formats such as images, audio and video.
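As an illustration of the vertical data format and the tidset and diffset ideas named in the summary, the following is a minimal Python sketch. It is not the thesis's R-Eclat or PR-Eclat: it enumerates candidate itemsets by brute force rather than by Eclat's equivalence-class search, and the function names, the `max_support` threshold and the rule "infrequent means support above zero and at most `max_support`" are assumptions made here for illustration only.

```python
from itertools import combinations


def to_vertical(transactions):
    """Convert horizontal data (one item set per transaction)
    into vertical data (item -> set of transaction ids, the tidset)."""
    vertical = {}
    for tid, items in enumerate(transactions):
        for item in items:
            vertical.setdefault(item, set()).add(tid)
    return vertical


def infrequent_itemsets(transactions, max_support, max_size=2):
    """Return {itemset: support} for itemsets that occur at least once
    but in no more than max_support transactions (assumed definition).

    Support is the size of the intersection of the items' tidsets.
    Candidates are enumerated by brute force, not by Eclat's
    prefix-based depth-first search.
    """
    vertical = to_vertical(transactions)
    result = {}
    for size in range(1, max_size + 1):
        for itemset in combinations(sorted(vertical), size):
            tids = set.intersection(*(vertical[i] for i in itemset))
            if 0 < len(tids) <= max_support:
                result[itemset] = len(tids)
    return result


def support_via_diffset(vertical, x, y):
    """Support of {x, y} from a diffset: the tids that contain x
    but not y, subtracted from the support of x."""
    diffset = vertical[x] - vertical[y]
    return len(vertical[x]) - len(diffset)


transactions = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"d"}]
vertical = to_vertical(transactions)
rare = infrequent_itemsets(transactions, max_support=1)
```

With these four transactions, `rare` contains the single item `d` and the pair `(b, c)`, each with support 1, and `support_via_diffset(vertical, "a", "b")` gives 2, the same value as intersecting the two tidsets directly.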
Physical Description: xxvi, 190 leaves: colour illustrations; 30 cm.
Bibliography: Includes bibliographical references (leaves 165-175)