Evaluating the retrieval performance model of Addaall stemmer for Arabic news of al-Jazeera
Main Author: | |
---|---|
Format: | Thesis |
Language: | English |
Published: | Kuala Lumpur : Kulliyyah of Information & Communication Technology, International Islamic University Malaysia, 2012 |
Subjects: | |
Online Access: | http://studentrepo.iium.edu.my/handle/123456789/5367 |
Summary: | Although stemming improves the effectiveness of information retrieval, it has some limitations and shortcomings. Chief among them are that it can reduce unrelated words to the same stem and can fail to reduce related words to a common stem. In addition, most stemmers, such as light stemmers, are heuristic and fall short of a full understanding of the language's morphology. This lays the groundwork for further research on search engines that use stemming, such as Addaall, with the aim of developing the most effective approach for Arabic IR. This research investigated the retrieval performance of the Addaall stemmer. Addaall is a web-based Arabic search engine that uses a morphological analyzer and generator to construct different indices based on both the root and the stem of a word. The study evaluates Addaall's prefix and suffix removal search (PSRS) and root search (RS) for retrieving Arabic news documents in a semi-laboratory setting. The theoretical assumption is that semantic linguistic search can improve recall and precision for both root and stem searches in a semi-laboratory setting. The research compared PSRS, RS and exact search (ES), and explored the main obstacles to indexing and retrieving Arabic information under different indexing strategies (stem-based and root-based). Queries were constructed from Al-Jazeera news from 2002 to 2007 and submitted to two search engines: Addaall and the Al-Jazeera search engine. Retrieved documents were judged relevant if they contained the search term in a correct, meaningful and unambiguous sense. Stratified and random sampling were carried out to allow statistical significance testing. The findings demonstrated that the precision of PSRS was significantly higher than that of ES and RS, whereas the recall of RS was significantly higher than that of ES and PSRS. Additionally, the research indicated that the Addaall stemmer improved both recall and precision compared with no stemming, which is attributed to Addaall's use of linguistic semantic search at different levels of its morphological analysis. On the other hand, the causes of retrieval failure were related to roots, stemming, Arabic diacritics and indexing. Hence, this research underscores the need for continued research on these factors in order to propose appropriate strategies and solutions. Overall, the findings indicate that, for a web collection in a semi-laboratory environment, removing prefixes and suffixes without attempting to remove infixes or find the root enhanced recall and precision. |
---|---|
Item Description: | Abstracts in English and Arabic. "A thesis submitted in fulfilment of the requirement for the degree of Doctor of Philosophy Library and Information Science."--On t.p. |
Physical Description: | xvi, 271 leaves : illustrations ; 30 cm. |
Bibliography: | Includes bibliographical references (leaves 227-236). |
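The summary contrasts exact search (ES) with prefix and suffix removal search (PSRS) and scores both by precision and recall. The sketch below illustrates that kind of comparison on an invented four-document collection; it is not the thesis data or Addaall's implementation. It assumes a simple light stemmer whose affix lists (the hypothetical `PREFIXES` and `SUFFIXES`) are loosely modeled on published Arabic light stemmers, and it deliberately does not remove infixes, extract roots or handle diacritics.

```python
from typing import Callable

# Hypothetical affix lists, loosely modeled on published Arabic light stemmers.
# They are NOT Addaall's morphological analyzer. Longer prefixes are listed first.
PREFIXES = ("وال", "بال", "كال", "فال", "لل", "ال", "و")
SUFFIXES = ("ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي")


def light_stem(word: str, min_len: int = 2) -> str:
    """Strip at most one prefix and one suffix; infixes and roots are left untouched."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= min_len:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_len:
            word = word[: -len(suffix)]
            break
    return word


def exact(word: str) -> str:
    """Exact search (ES): tokens are compared unchanged."""
    return word


def search(query: str, docs: dict[str, str], normalize: Callable[[str], str]) -> set[str]:
    """Return ids of documents containing a token whose normalized form equals the query's."""
    target = normalize(query)
    return {doc_id for doc_id, text in docs.items()
            if target in {normalize(token) for token in text.split()}}


def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Precision = hits / retrieved, recall = hits / relevant (0.0 when a set is empty)."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Invented toy collection (not the thesis data); the query is معلم ("teacher").
DOCS = {
    "d1": "المعلمون في المدرسة",
    "d2": "معلم جديد",
    "d3": "للمعلمين اجتماع",
    "d4": "العلم نور",
}
RELEVANT = {"d1", "d2", "d3"}

if __name__ == "__main__":
    for name, normalize in (("ES", exact), ("PSRS-like", light_stem)):
        hits = search("معلم", DOCS, normalize)
        p, r = precision_recall(hits, RELEVANT)
        print(f"{name}: retrieved={sorted(hits)} precision={p:.2f} recall={r:.2f}")
```

On this toy collection, exact matching retrieves only d2, while the light stemmer also conflates المعلمون and للمعلمين with معلم and so retrieves d1 and d3 as well. A root-based search would go a step further and also match العلم ("knowledge"), which shares the root ع-ل-م; here that would add the non-relevant d4 and lower precision, one way in which root conflation can trade precision for recall.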