Proper noun detection using regex algorithm and rules for Malay named entity recognition


Bibliographic Details
Main Author: Farid Morsidi
Format: Thesis
Language: English
Published: 2018
Online Access: https://ir.upsi.edu.my/detailsg.php?det=5380
Description
Summary: This study aimed to develop a Malay proper noun detection method to cluster and classify named entity categories, particularly the major classes of person, location, organization, and miscellaneous, for a Malay newspaper corpus. A regular expression (regex) pattern identification algorithm and rules were introduced to overcome the limitations of dictionaries and gazetteers. Two visualization techniques, namely Decision Tree and Term Document Matrix, were used to evaluate the efficiency of the method. The method achieved 74% accuracy during decision tree generation. Visualization of the term document matrix achieved maximum values of 9.8007403, 9.8718517, and 9.9890683 for the Astro Awani, Berita Harian, and Bernama datasets respectively. In conclusion, the regex algorithm could indicate the presence of Malay proper nouns, making it an appropriate method for an extraction tool to cluster and classify Malay proper nouns. The study implies that the Malay proper noun detection method can increase the effectiveness of named entity recognition and is beneficial for improving document retrieval for the Malay language.
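
To illustrate the general idea of regex-and-rule proper noun detection described in the summary, here is a minimal Python sketch. It is not the thesis's method: the abstract does not give the actual patterns or rules, so the regular expression, the cue-word lists, and the function names `classify` and `detect_proper_nouns` are all assumptions made for illustration only.

```python
import re

# Hypothetical cue words (assumptions for illustration only); the actual
# rules used in the thesis are not stated in the abstract.
PERSON_CUES = {"Datuk", "Encik", "Puan", "Tun", "Dr"}
LOCATION_CUES = {"Kampung", "Bandar", "Jalan", "Pulau", "Kuala"}
ORGANIZATION_CUES = {"Universiti", "Syarikat", "Kementerian", "Bank"}

# A run of one or more capitalized words, e.g. "Universiti Malaya".
CAPITALIZED_RUN = re.compile(r"\b[A-Z][A-Za-z]*(?:\s+[A-Z][A-Za-z]*)*")


def classify(candidate):
    """Assign a class from the first word of the candidate phrase."""
    first_word = candidate.split()[0]
    if first_word in PERSON_CUES:
        return "person"
    if first_word in LOCATION_CUES:
        return "location"
    if first_word in ORGANIZATION_CUES:
        return "organization"
    return "miscellaneous"


def detect_proper_nouns(sentence):
    """Return (phrase, class) pairs for capitalized runs in a sentence."""
    results = []
    for match in CAPITALIZED_RUN.finditer(sentence):
        phrase = match.group()
        # Rule: a single capitalized word at the very start of the sentence
        # is treated as ordinary sentence-initial capitalization and skipped.
        if match.start() == 0 and " " not in phrase:
            continue
        results.append((phrase, classify(phrase)))
    return results


sentence = ("Menurut laporan itu, Datuk Ahmad melawat "
            "Universiti Malaya di Kuala Lumpur.")
for phrase, label in detect_proper_nouns(sentence):
    print(phrase, "->", label)
```

A pattern-based approach like this needs no full dictionary or gazetteer of names, which is the limitation the summary says the regex algorithm was meant to address. The trade-off is that classification depends entirely on how well the hand-written cue rules cover the corpus.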