Enhanced normalization approach to address stop-word complexity in compound-word schema labels

An extensive review of existing research in schema matching reveals the importance of semantics in this field. Both the structural and the semantic aspects of schema matching have been studied for many years, and there are strong references ava...

Bibliographic Details
Main Author:	Hossain, Jafreen
Format:	Thesis
Language:	English
Published:	2014
Subjects:	Data integration (Computer science)
Online Access:	http://psasir.upm.edu.my/id/eprint/60506/1/FSKTM%202014%2026IR.pdf

id	my-upm-ir.60506
record_format	uketd_dc
institution	Universiti Putra Malaysia
collection	PSAS Institutional Repository
language	English
topic	Data integration (Computer science)
description	An extensive review of existing research in schema matching reveals the importance of semantics in this field. Both the structural and the semantic aspects of schema matching have been studied for many years, and strong references are available for both. However, an in-depth analysis of the available approaches suggests there is further scope for improvement in semantic schema matching. Normalization and lexical annotation methods using WordNet have been proposed in several studies. However, their results show comparatively poor accuracy because of the stop-words present in schema labels. Most previous studies ignored stop-words, which led to false negatives. This research proposes NORMSTOP (NORMalizer of schemata having STOP-words), an improved schema normalization approach that addresses the complexity of stop-words (e.g. 'by', 'at', 'and', 'or') in Compound Word (CW) schema labels. NORMSTOP isolates these labels during the preprocessing stage and resets the base form to a relevant WordNet term or an annotatable compound noun, using a combination of WordNet features such as Attributes, Derivationally Related Forms, and LexNames. When tested on the same real dataset used to evaluate the earlier approach, NORMS (NORMalizer of Schemata), NORMSTOP shows up to 13% improvement in annotation recall. This improvement brings the overall schema matching process one step closer to perfect accuracy; without it, a gap in expected accuracy remains, especially in today's databases, where stop-words are abundant.
format	Thesis
qualification_level	Master's degree
author	Hossain, Jafreen
title	Enhanced normalization approach to address stop-word complexity in compound-word schema labels
granting_institution	Universiti Putra Malaysia
publishDate	2014
url	http://psasir.upm.edu.my/id/eprint/60506/1/FSKTM%202014%2026IR.pdf
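The preprocessing step described in the abstract, isolating Compound Word (CW) schema labels that contain stop-words before WordNet annotation, can be sketched as follows. This is a minimal illustration under stated assumptions, not NORMSTOP itself: the stop-word list, the label-splitting rules, and the function names `split_label` and `isolate_stopword_labels` are hypothetical. The later step of resetting the base form through WordNet Attributes, Derivationally Related Forms, and LexNames is not shown.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Hypothetical stop-word list for illustration only; the thesis's actual list is not reproduced here.
STOP_WORDS = frozenset({"by", "at", "and", "or", "of", "in", "for", "to", "per"})

# Zero-width boundaries: lowercase/digit -> uppercase ("orderedBy" -> "ordered By"), and the
# end of an uppercase run before a capitalized word ("IDNumber" -> "ID Number").
_CAMEL_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])")


def split_label(label: str) -> List[str]:
    """Split a camelCase, snake_case, or kebab-case schema label into lowercase tokens."""
    spaced = _CAMEL_BOUNDARY.sub(" ", label)
    # Whitespace, underscores, hyphens, and dots all act as token separators.
    return [tok.lower() for tok in re.split(r"[\s_.\-]+", spaced) if tok]


def isolate_stopword_labels(
    labels: Iterable[str],
) -> Tuple[List[str], Dict[str, List[str]]]:
    """Separate compound-word labels that contain a stop-word from all other labels.

    Returns (plain_labels, flagged), where flagged maps each isolated label to its
    tokens so a later (not shown) step can decide how to re-form its base form.
    """
    plain: List[str] = []
    flagged: Dict[str, List[str]] = {}
    for label in labels:
        tokens = split_label(label)
        # Only multi-token (compound-word) labels are isolated; single tokens pass through.
        if len(tokens) > 1 and any(tok in STOP_WORDS for tok in tokens):
            flagged[label] = tokens
        else:
            plain.append(label)
    return plain, flagged


if __name__ == "__main__":
    plain, flagged = isolate_stopword_labels(
        ["orderedBy", "ship_date", "price-and-tax", "CustomerIDNumber"]
    )
    print(plain)    # ['ship_date', 'CustomerIDNumber']
    print(flagged)  # {'orderedBy': ['ordered', 'by'], 'price-and-tax': ['price', 'and', 'tax']}
```

Keeping the original label as the dictionary key lets the isolated labels be merged back with the untouched ones once a WordNet-based normalization step has chosen a new base form for each.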
