An Arabic hadith text classification model using convolutional neural network and support vector machine / Mohd Irwan Mazlin
Much work has addressed the problem of text classification, but little research covers Arabic text classification because of the difficulties of Arabic morphology and the scarcity of public datasets. In order to construct the dataset, the dataset is vali...
Main Author: | Mazlin, Mohd Irwan |
---|---|
Format: | Thesis |
Language: | English |
Published: | 2022 |
Subjects: | Hadith literature. Traditions. Sunna; Information organization |
Online Access: | https://ir.uitm.edu.my/id/eprint/75386/1/75386.pdf |
id |
my-uitm-ir.75386 |
---|---|
record_format |
uketd_dc |
institution |
Universiti Teknologi MARA |
collection |
UiTM Institutional Repository |
language |
English |
advisor |
Mohamed Rawi, Mohd Izani |
topic |
Hadith literature. Traditions. Sunna; Information organization |
description |
Much work has addressed the problem of text classification, but little research covers Arabic text classification because of the difficulties of Arabic morphology and the scarcity of public datasets. The dataset constructed for this study was validated by an expert, a lecturer at Universiti Sains Islam Malaysia, in order to maintain the authenticity of the hadith content. Convolutional neural networks (CNN) and support vector machines (SVM) are two different algorithms applied to text classification: a CNN is good at extracting features from the input, and an SVM is good at the classification task. This study introduces hadith text classification using a CNN combined with an SVM (CNN-SVM). Six experiments are designed to evaluate the study: the model with different stemming techniques; a comparison of three algorithms; an analysis of the confusion matrices of the three algorithms; the model with different SVM kernels; the model on unseen data; and the precision, recall, F1-measure and accuracy of the model and its parameters. First, the performance of the different models is analysed to find which gives the higher accuracy. CNN-SVM shows a promising result with 92% accuracy, while CNN alone and SVM alone give lower accuracy than the proposed model, at 82% and 74% respectively. Second, parameter tuning is conducted to find the best parameters for CNN-SVM. Third, the models (CNN-SVM, CNN and SVM) are checked on unseen data; the CNN-SVM model predicts all of the unseen data correctly. Fourth, the model is tested with different stemming techniques, and the model without stemming gives the higher accuracy, 92%. Lastly, different SVM kernels are tested to investigate the model's performance. Details of the other experiments are given in chapter five, Result and Discussion. The CNN-SVM model shows potential, as it performs better than the other models. However, the study has a limitation: the dataset does not cover all categories. It involves only three classes: prayer, fasting and zakat. The model therefore cannot predict correctly when given a hadith outside the selected classes. Performance might improve if the model learns from more data and from more specific topics of hadith in Arabic. For future work, it is recommended to extend the dataset so that the model can predict classes in more detail, and to combine the model with an optimization algorithm to improve its performance. |
format |
Thesis |
qualification_level |
Master's degree |
author |
Mazlin, Mohd Irwan |
title |
An Arabic hadith text classification model using convolutional neural network and support vector machine / Mohd Irwan Mazlin |
granting_institution |
Universiti Teknologi MARA (UiTM) |
granting_department |
Faculty of Computer and Mathematical Sciences |
publishDate |
2022 |
url |
https://ir.uitm.edu.my/id/eprint/75386/1/75386.pdf |
_version_ |
1804889690966327296 |
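
The abstract reports precision, recall, F1-measure and accuracy derived from confusion matrices. As a minimal sketch of how such figures are computed, the Python below uses an invented 3×3 confusion matrix over the three classes named in the abstract. The counts, the names `per_class_metrics` and `accuracy`, and the convention that rows are true classes and columns are predicted classes are all illustrative assumptions, not the thesis's data or code.

```python
from typing import Dict, List


def per_class_metrics(
    confusion: List[List[int]], labels: List[str]
) -> Dict[str, Dict[str, float]]:
    """Precision, recall and F1 per class.

    Assumed convention: confusion[i][j] counts items whose true class is
    labels[i] and whose predicted class is labels[j].
    """
    n = len(labels)
    report: Dict[str, Dict[str, float]] = {}
    for i, label in enumerate(labels):
        tp = confusion[i][i]
        fn = sum(confusion[i]) - tp                       # rest of the row
        fp = sum(confusion[r][i] for r in range(n)) - tp  # rest of the column
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    return report


def accuracy(confusion: List[List[int]]) -> float:
    """Share of all items that lie on the diagonal (correct predictions)."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total if total else 0.0


# Hypothetical counts for illustration only; NOT results from the thesis.
LABELS = ["prayer", "fasting", "zakat"]
CONFUSION = [
    [9, 1, 0],   # true prayer
    [1, 8, 1],   # true fasting
    [0, 0, 10],  # true zakat
]

for name, scores in per_class_metrics(CONFUSION, LABELS).items():
    print(name, {k: round(v, 3) for k, v in scores.items()})
print("accuracy", round(accuracy(CONFUSION), 3))
```

Per-class scores are shown alongside overall accuracy because, with only three classes, one weak class can hide behind a high aggregate figure.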