Fake review annotation model and classification through reviewers' writing style
In the last decade, online product reviews have become the main source of information in customers' decision-making and businesses' purchasing processes. Unfortunately, fraudsters produce untruthful reviews, written intentionally for profit or publicity. Their activities deceive potential organizations, customers, and opinion mining techniques...
Main Author: | Shojaee, Somayeh |
---|---|
Format: | Thesis |
Language: | English |
Published: | 2019 |
Subjects: | Computer networks - Security measures; Security systems |
Online Access: | http://psasir.upm.edu.my/id/eprint/90777/1/FSKTM%202020%203%20IR.pdf |
id |
my-upm-ir.90777 |
record_format |
uketd_dc |
spelling |
my-upm-ir.90777 2021-09-27T03:38:45Z Fake review annotation model and classification through reviewers' writing style 2019-09 Shojaee, Somayeh Computer networks - Security measures; Security systems Thesis http://psasir.upm.edu.my/id/eprint/90777/ http://psasir.upm.edu.my/id/eprint/90777/1/FSKTM%202020%203%20IR.pdf text en public doctoral Universiti Putra Malaysia Azmi Murad, Masrah Azrifah |
institution |
Universiti Putra Malaysia |
collection |
PSAS Institutional Repository |
language |
English |
advisor |
Azmi Murad, Masrah Azrifah |
topic |
Computer networks - Security measures; Security systems |
spellingShingle |
Computer networks - Security measures; Security systems Shojaee, Somayeh Fake review annotation model and classification through reviewers' writing style |
description |
In the last decade, online product reviews have become the main source of information in customers' decision-making and businesses' purchasing processes. Unfortunately, fraudsters produce untruthful reviews, written intentionally for profit or publicity. Their activities mislead organizations into reshaping their businesses, prevent customers from making the best decisions, and keep opinion mining techniques from reaching accurate conclusions.

One of the big challenges in spam review detection is the lack of an available labeled gold-standard real-life product review dataset. Manually labeling product reviews as fake or real is one approach to this problem. However, recognizing whether a review is fake or real by reading its content alone is very difficult, because spammers can easily craft a fake review that reads just like any real one.

To address this problem, we improve inter-annotator agreement in the manual labeling approach by proposing a model for annotating product reviews as fake or real. This is the first contribution of this research. The proposed annotation model is designed, implemented, and made accessible online. Our crawled reviews were labeled by three annotators who were trained and paid to complete the labeling through our system. A spamicity score was calculated for each review, and a label was assigned to every review based on its spamicity score. Fleiss's kappa for the three annotators is 0.89, which indicates "almost perfect agreement" between them (see the kappa sketch below).

The labeled real-life product review dataset is the second contribution of this study. To test the accuracy of our model, we also re-labeled a portion of the available Yelp.com dataset through our system and calculated the disagreement with the actual labels assigned by Yelp.com's filtering system. We found that only 7% of the reviews were labeled differently.

The other open problem in fake product review classification is the lack of feature sets that are independent of historic knowledge. Most feature-based fake review detection techniques are applicable only to a specific product domain, or require historic knowledge to extract their features. To address this problem, this study presents a set of domain- and historic-knowledge-independent features, namely writing style and readability, which can be applied to almost any review hosting site (see the feature sketch below). This feature set is the third contribution of this study. Writing style here refers to linguistic aspects that distinguish fake reviewers from real ones. Fake reviewers try hard to write reviews that sound genuine, which consequently affects both their writing style and the readability of their fake reviews. The method can therefore detect a reviewer's writing style before spamming hurts a product or a business. Evaluating our features yields an accuracy of 90.7% on the only available crowdsourced labeled gold-standard dataset and 98.9% on our proposed dataset, suggesting significant differences between fake and real reviews in writing style and readability. |
format |
Thesis |
qualification_level |
Doctorate |
author |
Shojaee, Somayeh |
author_facet |
Shojaee, Somayeh |
author_sort |
Shojaee, Somayeh |
title |
Fake review annotation model and classification through reviewers' writing style |
title_short |
Fake review annotation model and classification through reviewers' writing style |
title_full |
Fake review annotation model and classification through reviewers' writing style |
title_fullStr |
Fake review annotation model and classification through reviewers' writing style |
title_full_unstemmed |
Fake review annotation model and classification through reviewers' writing style |
title_sort |
fake review annotation model and classification through reviewers' writing style |
granting_institution |
Universiti Putra Malaysia |
publishDate |
2019 |
url |
http://psasir.upm.edu.my/id/eprint/90777/1/FSKTM%202020%203%20IR.pdf |
_version_ |
1747813657396379648 |