Sentiment analysis by fusing text and location features of geo-tagged tweets

Sentiment analysis analyses text input and determines whether the sentiment is negative, neutral, or positive. Sentiment analysis is vital for social development. An organization can use the feedback of product reviews or the comments about a particular service to improve the quality of life of the...

Bibliographic Details
Main Author: Lim, Wei Lun
Format: Thesis
Published: 2021
id my-mmu-ep.11101
record_format uketd_dc
spelling my-mmu-ep.11101 2023-04-17T07:04:11Z Sentiment analysis by fusing text and location features of geo-tagged tweets 2021-07 Lim, Wei Lun QA76.75-76.765 Computer software
2021-07 Thesis http://shdl.mmu.edu.my/11101/ http://erep.mmu.edu.my/ masters Multimedia University Faculty of Computing and Informatics EREP ID: 10278
institution Multimedia University
collection MMU Institutional Repository
topic QA76.75-76.765 Computer software
spellingShingle QA76.75-76.765 Computer software
Lim, Wei Lun
Sentiment analysis by fusing text and location features of geo-tagged tweets
description Sentiment analysis analyses text input and determines whether the sentiment is negative, neutral, or positive. Sentiment analysis is vital for social development. An organization can use the feedback from product reviews or comments about a particular service to improve the quality of life of the communities it serves. People also like to share their opinions with others, even with total strangers. Social media is a common platform that people use to post their thoughts. Some do so to get positive feedback through likes and follows from other people, rewards that boost their self-esteem; others do so just for social interaction. Researchers have also been working on text sentiment analysis to study people’s emotions and gain insights for better decision-making. Twitter sentiment analysis provides valuable feedback on public emotion about events or products. Current Twitter sentiment research has focused on obtaining sentiment features from the vectorized lexical and syntactic features of tweets, without considering additional context from other attributes of a tweet. Location is an important factor affecting people’s emotions that has been neglected. Sometimes it is not the products that make people want to complain. Rather, it is the combination of bad experiences at a location that makes them feel uncomfortable, and they then express their unpleasant feelings on social media. For example, customers may complain that a restaurant’s environment is not pleasant to dine in because they are disturbed by noise from a car repair shop nearby.
This work investigated how vectorized location information could be combined with word embeddings to produce a hybrid representation, which resulted in an improvement on a tweet sentiment classification task. The location information of the geo-tagged tweets provided further context that was useful for sentiment classification. The tweets investigated included a set of geo-tagged tweets. The word embeddings of these tweets were combined with the tweets’ vectorized location features or ego network measurements to form a sentiment feature set. This feature set was fed into a Convolutional Neural Network (CNN), a Bidirectional Recurrent Neural Network (BRNN), a Bidirectional Long Short-Term Memory Network (BiLSTM), and a Transformer to train on and predict sentiment classification labels. The performance of this hybrid representation was compared with that of a baseline model built using a GloVe model.
The experiments showed that incorporating vectorized location information improved the accuracy of Twitter sentiment classification with CNN and BRNN in the binary classification task, and with CNN in the multiclass classification task. The Transformer showed inconsistent results in both binary and multiclass classification tasks, indicating that the feature fusion is not suitable for use with the Transformer. To investigate how location information added to text affects model performance, SHapley Additive exPlanations (SHAP) was used to check feature importance in the CNN. CNN was chosen because it showed improved accuracy in both binary and multiclass classification tasks. The values generated by SHAP showed that location information affects the model by changing the order of feature importance.
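The abstract does not say how the word embeddings and the vectorized location features were fused. The sketch below shows one plausible reading, in which the same location vector is appended to every token embedding before the sequence is passed to a model. Everything in it is an assumption made for illustration and is not taken from the thesis: the toy embedding table `TOY_EMBEDDINGS`, the helper names `vectorize_location` and `fuse`, the unit-sphere encoding of latitude and longitude, and the example coordinates.

```python
import math

# Toy 4-dimensional "word embeddings" (made-up values, for illustration only).
# Real work would load pretrained vectors such as GloVe instead.
TOY_EMBEDDINGS = {
    "noisy": [0.9, -0.2, 0.1, 0.0],
    "restaurant": [0.1, 0.8, -0.3, 0.2],
    "great": [-0.7, 0.5, 0.6, 0.1],
}
UNKNOWN = [0.0, 0.0, 0.0, 0.0]  # fallback for out-of-vocabulary tokens


def vectorize_location(lat, lon):
    """Map latitude/longitude in degrees to a 3-d unit vector.

    This encoding is a hypothetical choice; the thesis may vectorize
    location differently.
    """
    lat_r, lon_r = math.radians(lat), math.radians(lon)
    return [
        math.cos(lat_r) * math.cos(lon_r),
        math.cos(lat_r) * math.sin(lon_r),
        math.sin(lat_r),
    ]


def fuse(tokens, lat, lon):
    """Append the same location vector to every token embedding."""
    loc = vectorize_location(lat, lon)
    return [TOY_EMBEDDINGS.get(tok, UNKNOWN) + loc for tok in tokens]


# Example: a two-token tweet tagged with arbitrary coordinates.
fused = fuse(["noisy", "restaurant"], lat=3.139, lon=101.687)
print(len(fused), len(fused[0]))  # prints "2 7": 2 tokens, 4 + 3 features each
```

Under these assumptions, each tweet becomes a sequence of 7-dimensional vectors that a CNN or recurrent model could consume in place of the plain word embeddings. Concatenating at the sentence level instead (one pooled text vector plus one location vector) would be an equally reasonable alternative.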
format Thesis
qualification_level Master's degree
author Lim, Wei Lun
author_facet Lim, Wei Lun
author_sort Lim, Wei Lun
title Sentiment analysis by fusing text and location features of geo-tagged tweets
title_short Sentiment analysis by fusing text and location features of geo-tagged tweets
title_full Sentiment analysis by fusing text and location features of geo-tagged tweets
title_fullStr Sentiment analysis by fusing text and location features of geo-tagged tweets
title_full_unstemmed Sentiment analysis by fusing text and location features of geo-tagged tweets
title_sort sentiment analysis by fusing text and location features of geo-tagged tweets
granting_institution Multimedia University
granting_department Faculty of Computing and Informatics
publishDate 2021
_version_ 1776101395485687808