Proper noun detection using regex algorithm and rules for Malay named entity recognition

Bibliographic Details
Main Author: Farid Morsidi
Format: Thesis
Language: English
Published: 2018
Online Access: https://ir.upsi.edu.my/detailsg.php?det=5380
Description
Summary: This study aimed to develop a Malay proper noun detection method to cluster and classify named entity categories, particularly the major classes of person, location, organization, and miscellaneous, for a Malay newspaper corpus. A Regular Expression pattern identification (regex) algorithm and rules were introduced in this study to overcome the limitations of dictionaries and gazetteers. Two visualization techniques, Decision Tree and Term Document Matrix, were used to evaluate the efficiency of the method. The method obtained 74% accuracy during the generation of the decision tree. The term document matrix visualization reached maximum values of 9.8007403, 9.8718517, and 9.9890683 for the Astro Awani, Berita Harian, and Bernama datasets, respectively. In conclusion, the regex algorithm could indicate the presence of Malay proper nouns, making it an appropriate method for an extraction tool to cluster and classify Malay proper nouns. The study implies that the use of the Malay proper noun detection method can increase the effectiveness of named entity recognition and is beneficial for improving document retrieval for the Malay language.
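
The summary does not reproduce the thesis's actual patterns or rules, so the following is only a minimal sketch of what regex-plus-rules proper noun detection might look like. It assumes that candidates are runs of capitalized tokens and that the first token of a run hints at the class. The cue-word sets, the pattern, and the function names are all hypothetical illustrations and are not taken from the thesis.

```python
import re

# Hypothetical cue words for illustration only; the thesis's real rule set
# is not given in this record.
PERSON_TITLES = {"Datuk", "Encik", "Puan", "Tun"}
LOCATION_CUES = {"Kampung", "Jalan", "Bandar", "Kuala"}
ORG_CUES = {"Universiti", "Syarikat", "Kementerian", "Bank"}

# Assumed candidate pattern: one or more consecutive capitalized tokens.
CANDIDATE = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b")


def classify(candidate):
    """Assign a class from the first token of the candidate (assumed rule)."""
    first = candidate.split()[0]
    if first in PERSON_TITLES:
        return "PERSON"
    if first in LOCATION_CUES:
        return "LOCATION"
    if first in ORG_CUES:
        return "ORGANIZATION"
    return "MISCELLANEOUS"


def detect_proper_nouns(sentence):
    """Return (text, class) pairs for capitalized runs found in the sentence."""
    results = []
    for match in CANDIDATE.finditer(sentence):
        text = match.group()
        # A lone capitalized word at the very start of the sentence is
        # ambiguous (ordinary sentence capitalization), so it is skipped.
        if match.start() == 0 and " " not in text:
            continue
        results.append((text, classify(text)))
    return results


sentence = "Pada pagi ini, Datuk Ahmad Ismail melawat Universiti Malaya di Kuala Lumpur."
entities = detect_proper_nouns(sentence)
```

A sketch like this avoids a full dictionary or gazetteer lookup, which is the limitation the summary says the regex approach is meant to address, though it would still miss lowercase or mixed-case names and would need a far richer rule set on real newspaper text.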