A semantic conceptualization on tagged bag-of concepts to improve accuracy for sentiment Analysis
Sentiment can be expressed implicitly or explicitly in a text. The main challenge in sentiment analysis (SA) is identifying hidden sentiments. This challenge is worsened by the misclassification of opinion words, neglect of context information, and poor handling of short texts. This study addr...
Main Author: | Mehanna, Yassin Samir Hassan |
---|---|
Format: | Thesis |
Language: | eng |
Published: | 2023 |
Subjects: | QA299.6-433 Analysis |
Online Access: | https://etd.uum.edu.my/10934/1/Depositpermission-900068.pdf https://etd.uum.edu.my/10934/2/s9000068_01.pdf |
id |
my-uum-etd.10934 |
---|---|
record_format |
uketd_dc |
institution |
Universiti Utara Malaysia |
collection |
UUM ETD |
language |
eng |
advisor |
Mahmuddin, Massudi |
topic |
QA299.6-433 Analysis |
description |
Sentiment can be expressed implicitly or explicitly in a text. The main challenge in sentiment analysis (SA) is identifying hidden sentiments. This challenge is worsened by the misclassification of opinion words, neglect of context information, and poor handling of short texts. This study addresses the limitations of bag-of-words (BoW) and bag-of-concepts (BoC) text representations in contextual and conceptual semantic methods. A semantic conceptualization method using Tagged BoC (TBoC) for SA is proposed to detect the correct sentiment towards the actual target, considering all affective and conceptual information conveyed in a text, with a special focus on short text. TBoC is an approach that analyses and decomposes text to uncover latent sentiments while preserving all relations and vital information to boost SA accuracy. In addition, the most efficient lexicons and pre-processing techniques for improving SA accuracy are investigated. This study comprises four phases: a) data collection and pre-processing, b) concept extraction from text data using the conceptualization method, c) document deconstruction into TBoC using Long Short-Term Memory, Convolutional Neural Network, Latent Dirichlet Allocation, rule-based, and customized algorithms, and d) sentiment classification on multiple benchmark datasets. A comparative study with state-of-the-art SA methods was also conducted to evaluate the proposed approach using general-purpose and domain-specific sentiment lexicons at multiple SA levels, including document, aspect, category, and topic levels. The TBoC technique with a domain-specific sentiment lexicon performed well and outperformed other state-of-the-art methods. Accuracy improved by 2%, 3%, and 6% compared to Naïve Bayes, Neural Networks, and Support Vector Machine, respectively, for aspect-level SA.
Using TBoC within the semantic conceptualization method is highly capable of concept extraction while preserving information on context, interrelations, and latent feelings. The study thus contributes knowledge to SA and to lexicon-based and hybrid approaches. |
format |
Thesis |
qualification_name |
other |
qualification_level |
Doctorate |
author |
Mehanna, Yassin Samir Hassan |
title |
A semantic conceptualization on tagged bag-of concepts to improve accuracy for sentiment Analysis |
granting_institution |
Universiti Utara Malaysia |
granting_department |
Awang Had Salleh Graduate School of Arts & Sciences |
publishDate |
2023 |
url |
https://etd.uum.edu.my/10934/1/Depositpermission-900068.pdf https://etd.uum.edu.my/10934/2/s9000068_01.pdf |