Enhanced ontological query expansion model using bigram combinations and combined similarity measure for improving information retrieval

Bibliographic Details
Main Author: Raza, Muhammad Ahsan
Format: Thesis
Language: English
Published: 2020
Online Access: http://umpir.ump.edu.my/id/eprint/30381/1/Enhanced%20ontological%20query%20expansion%20model%20using%20bigram.pdf
Description
Summary: The amount of information on the Web has been growing and changing rapidly over time, so effective information retrieval is increasingly important. Most Information Retrieval (IR) systems, including well-known search engines such as Google, Bing and Yahoo, search huge amounts of Web data by matching keywords in the query against keywords in the documents. A problem with such simple keyword-matching IR systems is vocabulary mismatch: the searcher’s query terms may not match the terms used in the corpus. IR systems cannot always expect a user to type exactly the keywords that appear in the corpus in order to obtain relevant documents. Several approaches have been proposed to address this issue, among them query expansion, in which the original query keywords are used to expand the search query with additional relevant terms. In recent years, ontology-based query expansion (OQE) has emerged as an advanced query expansion model that expands a search query semantically using an ontology knowledge base. However, existing OQE models share common problems: (i) the inherent ambiguity of natural language search queries; (ii) term-based expansion of unstructured search queries, which treats query terms individually rather than considering multiple query terms together; and (iii) the expansion of search queries with irrelevant terms. The main objective of this research is to improve the existing OQE model by proposing an enhanced ontological query expansion model (EOQE) for effective information retrieval. The EOQE model semantically expands unstructured natural language search queries in order to retrieve relevant documents in the computer science discipline. It overcomes the limitations of the existing OQE model in three main steps. First, the query refinement step performs linguistic processing of the search query and generates valid search terms.
Second, the enhanced ontology-based expansion step disambiguates the search query and generates additional expansion concepts using a bigram combinations technique. Third, the query formulation step filters irrelevant terms out of the set of expansion concepts using a combined similarity measure technique. The EOQE model was evaluated by comparing the retrieval results of queries expanded with EOQE against those of the original, unexpanded queries (the baseline model). On a standard Vector Space Model (VSM) IR system, the EOQE model showed a 32% improvement in mean average precision over the baseline model and achieved recall above 90% for most search queries. On the Google search system, the EOQE model also achieved a 17% increase in P@20 and a 15% increase in P@40 over the baseline model. Furthermore, the EOQE model performed competitively in terms of precision, recall, average precision and mean average precision against EOQE model variants that use a single similarity measure. The main contributions of this research are a model that semantically expands unstructured and ambiguous natural language queries using bigram combinations and a strong combined similarity measure. These contributions make it possible to exploit multiple query terms together in the OQE procedure, rather than individual query terms, and to formulate expanded queries with more relevant semantic concepts.
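The summary does not spell out how the bigram combinations are formed. The following Python sketch shows one plausible reading, assuming that "bigram combinations" means every unordered pair of refined query terms (kept in query order) and that each pair is looked up as a concept label in the ontology. The stopword list, the toy `ONTOLOGY` dictionary and the functions `refine`, `bigram_combinations` and `expand` are hypothetical illustrations, not the thesis's implementation.

```python
from itertools import combinations

# Hypothetical stopword list; a real linguistic processing step would use a
# fuller list and possibly stemming or lemmatization.
STOPWORDS = {"a", "an", "the", "of", "in", "for", "and", "to", "on", "with"}

# Hypothetical toy ontology: concept label -> related expansion concepts.
ONTOLOGY = {
    "neural network": ["deep learning", "perceptron"],
    "image classification": ["computer vision", "object recognition"],
    "network": ["computer network", "graph"],  # ambiguous single term
}


def refine(query: str) -> list[str]:
    """Lowercase, split on whitespace and drop stopwords."""
    return [t for t in query.lower().split() if t not in STOPWORDS]


def bigram_combinations(terms: list[str]) -> list[str]:
    """Every unordered pair of terms, each pair kept in query order."""
    return [f"{a} {b}" for a, b in combinations(terms, 2)]


def expand(query: str) -> list[str]:
    """Collect ontology concepts for every bigram combination that matches a concept label."""
    concepts: list[str] = []
    for bigram in bigram_combinations(refine(query)):
        for concept in ONTOLOGY.get(bigram, []):
            if concept not in concepts:  # avoid duplicate expansion concepts
                concepts.append(concept)
    return concepts


if __name__ == "__main__":
    print(expand("neural network for image classification"))
    # ['deep learning', 'perceptron', 'computer vision', 'object recognition']
```

In this example the ambiguous single term "network" is never looked up on its own; only pairs such as "neural network" and "image classification" match, which illustrates how considering query terms together can help disambiguate the query.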
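The summary also does not name the individual measures that the combined similarity measure merges. As a sketch under stated assumptions, the code below combines two classic taxonomy-based measures, a path-length similarity and a Wu-Palmer-style depth similarity, as a weighted average over a hypothetical toy is-a hierarchy, and keeps only candidate expansion concepts whose combined score reaches a threshold. The `TAXONOMY`, the equal weighting and the 0.5 threshold are illustrative choices, not values from the thesis.

```python
# Hypothetical toy is-a taxonomy: concept -> parent (None marks the root).
TAXONOMY = {
    "computer science": None,
    "artificial intelligence": "computer science",
    "machine learning": "artificial intelligence",
    "deep learning": "machine learning",
    "neural network": "machine learning",
    "computer vision": "artificial intelligence",
    "networking": "computer science",
    "computer network": "networking",
}


def ancestors(concept: str) -> list[str]:
    """The concept followed by its ancestors up to the root."""
    chain = []
    node = concept
    while node is not None:
        chain.append(node)
        node = TAXONOMY[node]
    return chain


def depth(concept: str) -> int:
    """Depth counted from 1 at the root."""
    return len(ancestors(concept))


def lowest_common_subsumer(a: str, b: str) -> str:
    """Deepest concept that is an ancestor of (or equal to) both a and b."""
    b_chain = set(ancestors(b))
    for node in ancestors(a):
        if node in b_chain:
            return node
    raise ValueError(f"{a!r} and {b!r} share no ancestor")


def path_similarity(a: str, b: str) -> float:
    """1 / (1 + number of edges on the is-a path between a and b through their LCS)."""
    lcs_depth = depth(lowest_common_subsumer(a, b))
    edges = (depth(a) - lcs_depth) + (depth(b) - lcs_depth)
    return 1.0 / (1.0 + edges)


def wu_palmer_similarity(a: str, b: str) -> float:
    """2 * depth(LCS) / (depth(a) + depth(b))."""
    return 2.0 * depth(lowest_common_subsumer(a, b)) / (depth(a) + depth(b))


def combined_similarity(a: str, b: str, weight: float = 0.5) -> float:
    """Weighted average of the two measures; `weight` applies to path similarity."""
    return weight * path_similarity(a, b) + (1.0 - weight) * wu_palmer_similarity(a, b)


def filter_expansions(query_concept: str, candidates: list[str], threshold: float = 0.5) -> list[str]:
    """Keep only candidates whose combined similarity to the query concept reaches the threshold."""
    return [c for c in candidates if combined_similarity(query_concept, c) >= threshold]


if __name__ == "__main__":
    candidates = ["deep learning", "computer vision", "computer network", "machine learning"]
    print(filter_expansions("neural network", candidates))
    # ['deep learning', 'machine learning']
```

With the query concept "neural network", "computer vision" (combined score about 0.41) and "computer network" (about 0.23) fall below the threshold and are dropped. Averaging two measures is one simple way to make the filter less sensitive to the quirks of any single measure.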
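The evaluation reports precision at a cutoff (P@20, P@40), recall, average precision (AP) and mean average precision (MAP). The sketch below uses the standard, non-interpolated textbook definitions of these metrics for a ranked result list; the thesis may differ in details such as tie handling, and the document identifiers and relevance judgments in the example are made up.

```python
from statistics import mean


def precision_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """P@k: fraction of the top-k results that are relevant (divides by k even if fewer are returned)."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def recall(ranked: list[str], relevant: set[str]) -> float:
    """Fraction of all relevant documents that appear anywhere in the ranking."""
    return len(set(ranked) & relevant) / len(relevant) if relevant else 0.0


def average_precision(ranked: list[str], relevant: set[str]) -> float:
    """Sum of P@i at each rank i holding a relevant document, divided by the
    total number of relevant documents (assumes no duplicate documents in `ranked`)."""
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs: list[tuple[list[str], set[str]]]) -> float:
    """MAP: mean of the per-query average precision values."""
    return mean(average_precision(ranked, relevant) for ranked, relevant in runs)


if __name__ == "__main__":
    ranked = ["d1", "d2", "d3", "d4", "d5"]
    relevant = {"d1", "d3", "d6"}
    print(precision_at_k(ranked, relevant, 5))  # 0.4
    print(recall(ranked, relevant))
    print(average_precision(ranked, relevant))
    print(mean_average_precision([(ranked, relevant), (["d2", "d1"], {"d1"})]))
```

Because AP divides by the total number of relevant documents, a relevant document that is never retrieved (such as "d6" above) lowers AP as well as recall.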