Offline Arabic character recognition using genetic approach

Bibliographic Details
Main Author: Aljuaid, Hanan Abdulrahman
Format: Thesis
Language: English
Published: 2010
Online Access: http://eprints.utm.my/id/eprint/16567/7/HananAbdulrahmanAljuaidMFSKSM2010.pdf
Description
Summary: Many optical character recognition (OCR) techniques and tools have been developed for a plurality of languages. A successful OCR system improves interaction between humans and computers in many applications, such as digitising and recognising written content. In Arabic OCR, handwriting recognition is challenging because Arabic letters are cursive and change shape depending on their position within a word. OCR systems have reached nearly perfect recognition of printed Arabic text, but recognition of handwritten text is still in its infancy and needs to be greatly improved. This study therefore proposes an approach to recognising Arabic characters based on genetic algorithms (GA). The approach has two separate stages: feature extraction, and GA-based character recognition. In the feature extraction stage, six features are detected for each character and represented as a feature vector of six integers. These feature vectors are then used in the next stage, where three genetic operators, namely selection, crossover and mutation, are implemented to search for the similar vectors with the best fitness value in order to recognise the character. The data used in this study were collected from different sources and stored in a database. They consist of 12,500 printed words in 50 paragraphs and 15,000 words handwritten by 100 different writers, male and female, aged 5 to 60 years. Pre-processing operations include segmenting paragraphs into lines, lines into words and words into characters, detecting the skeleton, and determining the baseline and other horizontal zones. The experimental results show that the proposed method achieves a promising recognition accuracy of 90.46% for printed text and handwritten characters.
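
The recognition stage described in the summary can be illustrated with a minimal sketch. It is not the thesis implementation: the summary does not name the six features, the fitness function, or the operator variants, so the reference vectors, the labels, the fitness (negative Manhattan distance to the query), tournament selection, single-point crossover, and elitism are all assumptions made for illustration.

```python
import random

# Hypothetical reference database: 6-integer feature vector -> character label.
# The values and labels are invented for illustration only.
REFERENCES = {
    (1, 0, 0, 5, 0, 1): "alif",
    (4, 1, 1, 0, 3, 0): "ba",
    (2, 3, 0, 2, 1, 4): "jim",
}


def fitness(candidate, target):
    """Assumed fitness: negative Manhattan distance (0 is a perfect match)."""
    return -sum(abs(c - t) for c, t in zip(candidate, target))


def select(population, query, rng, k=3):
    """Tournament selection: the fittest of k randomly drawn individuals."""
    return max(rng.sample(population, k), key=lambda ind: fitness(ind, query))


def crossover(a, b, rng):
    """Single-point crossover of two parent vectors."""
    point = rng.randint(1, len(a) - 1)
    return a[:point] + b[point:]


def mutate(individual, rng, rate=0.1):
    """Shift each gene by +1 or -1 with probability `rate`."""
    return tuple(
        gene + rng.choice((-1, 1)) if rng.random() < rate else gene
        for gene in individual
    )


def recognise(query, references, generations=50, pop_size=30, seed=0):
    """Evolve vectors towards `query`, then return (label, best vector)."""
    rng = random.Random(seed)
    vectors = list(references)
    # Seed the population with copies of the stored reference vectors.
    population = [vectors[i % len(vectors)] for i in range(pop_size)]
    best = max(population, key=lambda ind: fitness(ind, query))
    for _ in range(generations):
        children = [
            mutate(
                crossover(select(population, query, rng),
                          select(population, query, rng), rng),
                rng,
            )
            for _ in range(pop_size - 1)
        ]
        # Elitism: carry the best individual over unchanged.
        population = [best] + children
        best = max(population, key=lambda ind: fitness(ind, query))
    # Map the fittest vector back to the closest stored reference.
    nearest = max(vectors, key=lambda ref: fitness(ref, best))
    return references[nearest], best


# Usage: a query that differs from the stored "alif" vector in one feature.
label, best_vector = recognise((1, 0, 0, 5, 1, 1), REFERENCES)
```

Seeding the population from the stored vectors and keeping the best individual each generation means the result is never worse than a direct nearest-vector lookup. A real system would replace the toy database with feature vectors extracted from the segmented, skeletonised characters.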