Document plagiarism detection algorithm using semantic networks

Bibliographic Details
Main Author: Ahmed Muftah, Ahmed Jabr
Format: Thesis
Language: English
Published: 2009
Online Access: http://eprints.utm.my/id/eprint/11433/6/AhmedJabrAhmedMFSKSM2009.pdf
Description
Summary: The vast increase in documents available on the World Wide Web (WWW) and the ease of access to these documents have led to a serious problem of using others’ work without giving credit. Although many methods have been developed to detect some forms of plagiarism, such as changing the structure of sentences or replacing a few words with their synonyms, it is often hard to reveal plagiarism when the copied sentences are deliberately modified. This project proposes an algorithm for plagiarism detection over the Web using semantic networks. The corpus of this study contains 610 documents downloaded from the Web, 10 of which were selected as the sources of 20 manually plagiarized documents. The algorithm was compared with an N-gram representation, and the results show that an appropriate semantic representation of sentences derived from WordNet’s relations outperforms N-grams with different similarity measures in detecting the plagiarized sentences. They also show that a proposed method based on extracting named entities and common nouns is in general capable of retrieving the source documents from the Web using a search engine API when sentences are moderately plagiarized.
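
For readers unfamiliar with the N-gram baseline mentioned in the summary, the following is a minimal sketch of how two sentences might be compared by their word N-grams. This record does not state the N-gram size or the similarity measures used in the thesis, so the word trigrams, the Jaccard coefficient, and the function names below are all assumptions chosen for illustration, not the author's implementation.

```python
def word_ngrams(sentence, n=3):
    """Return the set of word n-grams in a sentence (lowercased, whitespace-split).

    Assumption: simple whitespace tokenization; the thesis may tokenize differently.
    """
    tokens = sentence.lower().split()
    if not tokens:
        return set()
    if len(tokens) < n:
        # Sentence shorter than n words: treat the whole sentence as one gram.
        return {tuple(tokens)}
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a, b):
    """Jaccard coefficient of two sets: |A & B| / |A | B| (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def ngram_similarity(s1, s2, n=3):
    """Similarity of two sentences based on their shared word n-grams."""
    return jaccard(word_ngrams(s1, n), word_ngrams(s2, n))


if __name__ == "__main__":
    original = "the quick brown fox jumps"
    modified = "the fast brown fox jumps"  # one word swapped for a synonym
    print(ngram_similarity(original, original))  # identical sentences
    print(ngram_similarity(original, modified))  # a single substitution
```

In this sketch, replacing one word changes every trigram that contains it, so the score drops sharply even though the meaning is unchanged. This illustrates why a purely surface-level representation can miss deliberately modified sentences, which is the weakness the semantic representation described in the summary is meant to address.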