Enhanced approach for non-negative matrix factorization (NMF) based summarization using conditional random fields (CRF) segmentation /
Main Author: | |
---|---|
Format: | Thesis |
Language: | English |
Published: | Kuala Lumpur : Kulliyyah of Information and Communication Technology, International Islamic University Malaysia, 2016 |
Subjects: | |
Online Access: | http://studentrepo.iium.edu.my/handle/123456789/5362 |
Summary: | Automatic Text Summarization (ATS) is the complex task of having a computer generate a summary of one or more documents that is smaller in size while preserving the information content. Since ATS appears to be a good candidate for addressing the information overload problem, it has attracted considerable attention recently. This thesis attempts an enhanced approach to ATS, addressing the feature extraction problem that prevails in existing ATS approaches that use an algebraic reduction method, namely Nonnegative Matrix Factorization (NMF). The most vital role of any extractive ATS system is identifying the most important sentences in the given text. This is possible only when the correct semantics or features of the sentences are identified properly. When NMF is applied to ATS, information from the input sentences cannot be transformed into features precisely, since NMF has no intrinsic domain knowledge of the input source to be summarized. Thus the main issue with existing NMF-based ATS is proper feature extraction from the source text. Moreover, as NMF is basically an approximation algorithm and is not intended for feature extraction, better performance can be achieved only after proper enhancement or tuning of NMF when it is applied to ATS. Hence this work proposes an enhanced, supervised, domain-based extractive approach to ATS using NMF to resolve the problem with the existing approach. The two important parameters that serve as input to the NMF process are initialization and the sparseness measure. These inputs are vital to the output produced by NMF, yet they have not been considered in the existing literature on NMF applied to ATS. In existing NMF-based ATS, the initial seeds of the W and H matrices are set to random values or zeroes without considering the features of the source text. To address this issue, the proposed approach applies Conditional Random Fields (CRF) so that the initial seeds of W and H are constructed from the features available in the source text to achieve better performance. The other parameter, the sparseness of the W and H matrices, which keeps only a few elements of W and H active so that features are extracted more accurately, is not used in existing NMF-based ATS. The proposed work aims to achieve better performance by using a sparse representation of the W and H matrices. Hence this work proposes an extended approach that can enhance the performance of NMF when applied to ATS by treating the initialization and sparseness parameters of NMF. It also aims to study the impact this has on the quality of the generated summary. The proposed methodology is tested across two domains, namely legal and scientific documents. Experimental results show that the proposed method, when given proper initialization seeds for the W and H matrices, can produce better performance. For the sparseness treatment, however, the results clearly show that tuning the sparseness did not give better performance. ROUGE (Recall-Oriented Understudy for Gisting Evaluation) metrics are used to evaluate the proposed method. |
---|---|
Physical Description: | xxiii, 245 leaves : colour illustrations ; 30cm. |
Bibliography: | Includes bibliographical references (leaves 195-211). |
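
The summary above turns on two NMF inputs: the initial seeds of W and H, and the sparseness of those matrices. The following is a minimal illustrative sketch of both ideas, not the thesis's implementation. It assumes a tiny hypothetical term-by-sentence matrix `A`, a hypothetical list `segments` standing in for the output of a CRF segmenter (no CRF is trained here), the standard multiplicative update rules for NMF, Hoyer's sparseness measure, and a simple feature-weighted sentence score chosen for illustration. All function and variable names are assumptions.

```python
import math

def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def nmf(A, W, H, iters=200, eps=1e-9):
    """Multiplicative updates for A ~ W H, starting from the given seeds."""
    W = [row[:] for row in W]
    H = [row[:] for row in H]
    for _ in range(iters):
        Wt = transpose(W)
        num, den = matmul(Wt, A), matmul(matmul(Wt, W), H)
        H = [[h * n / (d + eps) for h, n, d in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        Ht = transpose(H)
        num, den = matmul(A, Ht), matmul(W, matmul(H, Ht))
        W = [[w * n / (d + eps) for w, n, d in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
    return W, H

def frobenius_error(A, W, H):
    WH = matmul(W, H)
    return math.sqrt(sum((a - b) ** 2 for ra, rb in zip(A, WH) for a, b in zip(ra, rb)))

def hoyer_sparseness(v):
    """Hoyer's measure: 0 for a uniform vector, 1 for a single non-zero entry.
    Assumes len(v) > 1 and v is not all zeros."""
    n = len(v)
    l1 = sum(abs(x) for x in v)
    l2 = math.sqrt(sum(x * x for x in v))
    return (math.sqrt(n) - l1 / l2) / (math.sqrt(n) - 1)

def sentence_scores(H):
    """Illustrative score: each feature row is weighted by its share of H's total mass."""
    total = sum(sum(row) for row in H)
    weights = [sum(row) / total for row in H]
    return [sum(w * row[j] for w, row in zip(weights, H)) for j in range(len(H[0]))]

# Hypothetical data: 4 terms (rows) x 4 sentences (columns).
A = [[2.0, 1.0, 0.0, 0.0],
     [1.0, 2.0, 0.0, 0.0],
     [0.0, 0.0, 3.0, 1.0],
     [0.0, 0.0, 1.0, 1.0]]
segments = [0, 0, 1, 1]  # hypothetical segment label per sentence (stand-in for CRF output)
r = 2                    # number of features, one per segment in this sketch

# Seed H from the labels. Entries are kept strictly positive because an exact zero
# can never become non-zero under multiplicative updates.
H0 = [[1.0 if segments[j] == k else 0.1 for j in range(len(segments))] for k in range(r)]

# Seed each column of W with the mean term vector of the sentences in that segment.
W0 = []
for i in range(len(A)):
    row = []
    for k in range(r):
        cols = [j for j, s in enumerate(segments) if s == k]
        row.append(sum(A[i][j] for j in cols) / len(cols) + 0.01)
    W0.append(row)

err0 = frobenius_error(A, W0, H0)
W, H = nmf(A, W0, H0)
err1 = frobenius_error(A, W, H)
scores = sentence_scores(H)
row_sparseness = [hoyer_sparseness(row) for row in H]
```

In this sketch, `err0` and `err1` give the reconstruction error before and after the updates, `scores` ranks the sentences for extraction, and `row_sparseness` reports how concentrated each feature row of H is. Replacing `W0` and `H0` with random positive values gives the unseeded baseline the summary describes.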