Bar chart plagiarism detection

Plagiarism can be considered an electronic crime and a form of intellectual theft, and it has become one of the educational challenges facing research institutions. Charts such as line and bar charts are one way to represent quantitative information in infographic form. T...

Bibliographic Details
Main Author: Mohammed Salih, Mohammed Mumtaz
Format: Thesis
Language: English
Published: 2013
Subjects: TK Electrical engineering. Electronics Nuclear engineering
Online Access: http://eprints.utm.my/id/eprint/39164/1/MohammedMumtazMohammedSalihMFSKSM2013.pdf
id my-utm-ep.39164
record_format uketd_dc
institution Universiti Teknologi Malaysia
collection UTM Institutional Repository
language English
topic TK Electrical engineering. Electronics Nuclear engineering
description Plagiarism can be considered an electronic crime and a form of intellectual theft, and it has become one of the educational challenges facing research institutions. Charts such as line and bar charts are one way to represent quantitative information in infographic form. Extracting the features of a bar chart is an essential step in recovering the data from its image. Some techniques presented by researchers, such as the Hough Transform and learning-based methods, focus on the graphical part rather than the text itself. In this study, ten features of bar chart images are used to detect plagiarism and to measure the proportion of similarity between charts. Some of these features can be extracted directly by OCR, while others require relating the text part to the graphic part in order to extract data such as the real value of each bar in the image. The new technique introduced in this research extracts three values for each bar, namely the Start, End and Exact values, based on the horizontal and vertical lines of the bar chart image. In addition, the word 2-gram and Euclidean distance methods are used to detect plagiarism. Experimental results show that the system can detect plagiarism for ten possible patterns of bar chart plagiarism. The performance of the system is evaluated in terms of overlapping features, precision and recall. The results show that the system can detect not only copied and pasted bar data, but also restructured and summarized image captions, as well as modifications to the data of bar chart images, such as swapping bars, changing colors and changing the scale of the chart.
format Thesis
qualification_level Master's degree
author Mohammed Salih, Mohammed Mumtaz
title Bar chart plagiarism detection
granting_institution Universiti Teknologi Malaysia, Faculty of Computer Science and Information Systems
granting_department Faculty of Computer Science and Information Systems
publishDate 2013
url http://eprints.utm.my/id/eprint/39164/1/MohammedMumtazMohammedSalihMFSKSM2013.pdf
_version_ 1747816543913246720
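
The description names word 2-gram matching and Euclidean distance as the comparison methods but gives no formulas. The sketch below is only an illustration of those two general techniques, not the thesis's implementation. The function names, the use of Jaccard overlap to score the 2-grams, and the sorted comparison for swapped bars are all assumptions made for this example.

```python
import math


def word_bigrams(text):
    """Return the set of adjacent word pairs (word 2-grams) in a caption."""
    words = text.lower().split()
    return set(zip(words, words[1:]))


def bigram_similarity(a, b):
    """Score two captions by the Jaccard overlap of their word 2-grams.

    Jaccard overlap is an assumption; the record does not say how the
    2-grams are scored.
    """
    ga, gb = word_bigrams(a), word_bigrams(b)
    union = ga | gb
    if not union:
        # Captions too short to form a 2-gram: fall back to exact match.
        return 1.0 if a.lower().split() == b.lower().split() else 0.0
    return len(ga & gb) / len(union)


def euclidean_distance(xs, ys):
    """Euclidean distance between two equal-length lists of bar values."""
    if len(xs) != len(ys):
        raise ValueError("charts must have the same number of bars")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)))


def sorted_distance(xs, ys):
    """Distance after sorting, so a pure reordering of bars scores 0.

    Hypothetical helper: one possible way to catch swapped bars.
    """
    return euclidean_distance(sorted(xs), sorted(ys))
```

Under these assumptions, a chart whose bars were only reordered has a non-zero `euclidean_distance` from the original but a `sorted_distance` of zero, while a caption with the same words in a different order can still score low on `bigram_similarity`.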