Fake review annotation model and classification through reviewers' writing style


Bibliographic Details
Main Author: Shojaee, Somayeh
Format: Thesis
Language:English
Published: 2019
Subjects:
Online Access:http://psasir.upm.edu.my/id/eprint/90777/1/FSKTM%202020%203%20IR.pdf
id my-upm-ir.90777
record_format uketd_dc
spelling my-upm-ir.907772021-09-27T03:38:45Z Fake review annotation model and classification through reviewers' writing style 2019-09 Shojaee, Somayeh In the last decade, online product reviews have become the main source of information in customers' decision-making and businesses' purchasing processes. Unfortunately, fraudsters produce untruthful reviews intentionally, for profit or publicity. Their activities mislead organizations into reshaping their businesses, prevent customers from making the best decisions, and keep opinion mining techniques from reaching accurate conclusions. One of the big challenges of spam review detection is the lack of a labeled gold-standard real-life product review dataset. Manually labeling product reviews as fake or real is one approach to this problem. However, recognizing whether a review is fake or real is very difficult from its content alone, because spammers can easily craft a fake review that reads just like any real review. To address this problem, we improve inter-annotator agreement in the manual labeling approach by proposing a model to annotate product reviews as fake or real. This is the first contribution of this research study. The proposed annotation model is designed, implemented, and accessible online. Our crawled reviews were labeled by three annotators who were trained and paid to complete the labeling through our system. A spamicity score was calculated for each review, and a label was assigned to every review based on that score. Fleiss' kappa for the three annotators is 0.89, which indicates "almost perfect agreement" between them. The labeled real-life product review dataset is the second contribution of this study.
To test the accuracy of our model, we also re-labeled a portion of the available Yelp.com dataset through our system and measured the disagreement with the actual labels assigned by Yelp.com's filtering system. We found that only 7% of the reviews were labeled differently. The other open problem in fake product review classification is the lack of feature sets that are independent of historic knowledge. Most feature-based fake review detection techniques apply only to a specific product domain or require historic knowledge to extract their features. To address this problem, this study presents a set of domain- and historic-knowledge-independent features, namely writing style and readability, which can be applied to almost any review hosting site. This feature set is the third contribution of this study. Writing style here refers to linguistic aspects that distinguish fake from real reviewers. Fake reviewers try hard to write a review that sounds genuine, which consequently affects the writing style and readability of their fake reviews. The method detects reviewers' writing style before spamming can hurt a product or a business. Evaluating our features on the only available crowdsourced labeled gold-standard dataset yields an accuracy of 90.7%, and on our proposed dataset an accuracy of 98.9%, suggesting significant differences between fake and real reviews in writing style and readability. Computer networks - Security measures Security systems 2019-09 Thesis http://psasir.upm.edu.my/id/eprint/90777/ http://psasir.upm.edu.my/id/eprint/90777/1/FSKTM%202020%203%20IR.pdf text en public doctoral Universiti Putra Malaysia Computer networks - Security measures Security systems Azmi Murad, Masrah Azrifah
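As an illustration of the agreement statistic reported above, the following is a minimal sketch of the standard Fleiss' kappa computation for multiple annotators. The ratings matrix below is hypothetical example data, not the thesis dataset; the thesis reports a kappa of 0.89 over its own three-annotator labels.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a ratings matrix.

    counts[i][j] = number of raters who assigned item i to category j.
    Every row must sum to the same number of raters n.
    """
    N = len(counts)        # number of items (reviews)
    n = sum(counts[0])     # raters per item (three annotators here)
    k = len(counts[0])     # number of categories (fake / real)

    # Per-item observed agreement P_i
    P = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts]
    P_bar = sum(P) / N

    # Chance agreement P_e from overall category proportions
    p = [sum(row[j] for row in counts) / (N * n) for j in range(k)]
    P_e = sum(pj * pj for pj in p)

    return (P_bar - P_e) / (1 - P_e)

# Hypothetical labels from three annotators over five reviews
# (columns: [fake, real]); illustrative only, not the thesis data.
ratings = [[3, 0], [0, 3], [3, 0], [2, 1], [0, 3]]
print(round(fleiss_kappa(ratings), 3))  # prints 0.732
```

A kappa above 0.81 is conventionally read as "almost perfect agreement", which is the band the thesis's 0.89 falls into.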
institution Universiti Putra Malaysia
collection PSAS Institutional Repository
language English
advisor Azmi Murad, Masrah Azrifah
topic Computer networks - Security measures
Security systems

spellingShingle Computer networks - Security measures
Security systems

Shojaee, Somayeh
Fake review annotation model and classification through reviewers' writing style
description In the last decade, online product reviews have become the main source of information in customers' decision-making and businesses' purchasing processes. Unfortunately, fraudsters produce untruthful reviews intentionally, for profit or publicity. Their activities mislead organizations into reshaping their businesses, prevent customers from making the best decisions, and keep opinion mining techniques from reaching accurate conclusions. One of the big challenges of spam review detection is the lack of a labeled gold-standard real-life product review dataset. Manually labeling product reviews as fake or real is one approach to this problem. However, recognizing whether a review is fake or real is very difficult from its content alone, because spammers can easily craft a fake review that reads just like any real review. To address this problem, we improve inter-annotator agreement in the manual labeling approach by proposing a model to annotate product reviews as fake or real. This is the first contribution of this research study. The proposed annotation model is designed, implemented, and accessible online. Our crawled reviews were labeled by three annotators who were trained and paid to complete the labeling through our system. A spamicity score was calculated for each review, and a label was assigned to every review based on that score. Fleiss' kappa for the three annotators is 0.89, which indicates "almost perfect agreement" between them. The labeled real-life product review dataset is the second contribution of this study. To test the accuracy of our model, we also re-labeled a portion of the available Yelp.com dataset through our system and measured the disagreement with the actual labels assigned by Yelp.com's filtering system. We found that only 7% of the reviews were labeled differently.
The other open problem in fake product review classification is the lack of feature sets that are independent of historic knowledge. Most feature-based fake review detection techniques apply only to a specific product domain or require historic knowledge to extract their features. To address this problem, this study presents a set of domain- and historic-knowledge-independent features, namely writing style and readability, which can be applied to almost any review hosting site. This feature set is the third contribution of this study. Writing style here refers to linguistic aspects that distinguish fake from real reviewers. Fake reviewers try hard to write a review that sounds genuine, which consequently affects the writing style and readability of their fake reviews. The method detects reviewers' writing style before spamming can hurt a product or a business. Evaluating our features on the only available crowdsourced labeled gold-standard dataset yields an accuracy of 90.7%, and on our proposed dataset an accuracy of 98.9%, suggesting significant differences between fake and real reviews in writing style and readability.
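Readability features of the kind described above can be computed with standard readability formulas. The following is a minimal sketch using the classic Flesch Reading Ease formula; the crude vowel-group syllable counter and the sample sentence are illustrative assumptions, not the thesis's implementation or data.

```python
import re

def flesch_reading_ease(text):
    """Flesch Reading Ease: higher scores mean easier-to-read text.

    Uses a crude vowel-group syllable counter, so scores are approximate.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)

    def syllables(word):
        # Count runs of vowels as syllables; at least one per word.
        return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

    total_syllables = sum(syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (total_syllables / len(words)))

score = flesch_reading_ease("This product works well. I would buy it again.")
print(round(score, 2))  # prints 98.87 (short words, short sentences)
```

A review written in an unusually simple or unusually elaborate register relative to genuine reviews would shift such a score, which is the intuition behind using readability as a domain-independent signal.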
format Thesis
qualification_level Doctorate
author Shojaee, Somayeh
author_facet Shojaee, Somayeh
author_sort Shojaee, Somayeh
title Fake review annotation model and classification through reviewers' writing style
title_short Fake review annotation model and classification through reviewers' writing style
title_full Fake review annotation model and classification through reviewers' writing style
title_fullStr Fake review annotation model and classification through reviewers' writing style
title_full_unstemmed Fake review annotation model and classification through reviewers' writing style
title_sort fake review annotation model and classification through reviewers' writing style
granting_institution Universiti Putra Malaysia
publishDate 2019
url http://psasir.upm.edu.my/id/eprint/90777/1/FSKTM%202020%203%20IR.pdf
_version_ 1747813657396379648