Development of stuttered speech reconstruction system

Bibliographic Details
Main Author: Ajibola, Alim Sabur (Author)
Format: Thesis
Language:English
Published: Kuala Lumpur : Kulliyyah of Engineering, International Islamic University Malaysia, 2017
Subjects:
Online Access:The first 24 pages of the thesis are available online; members can view the full text at the specified PCs in the library.
Description
Summary:Speech in adults is an action characterized by the production of about 14 different sounds per second through the coordinated action of about 100 muscles innervated by spinal and cranial nerves. Only 5-10% of the human population have a completely normal form of oral communication with respect to the numerous speech features of a healthy voice. Stuttering is an unintentional disruption of the normal flow of speech by dysfluencies, which include repeated, prolonged, and blocked or stalled pronunciation at the phoneme or syllable level. The aim of this research is to design and develop a speech reconstruction system for stuttered speech. Autoregressive (AR) and Autoregressive Moving Average (ARMA) models were used to model stuttered speech. The generally poor performance of both linear models motivated the use of a nonlinear model, the simplest of which is the Nonlinear Autoregressive (NAR) neural network. The Akaike Information Criterion (AIC) over the first 60 orders of the AR models was used to evaluate all the samples; the model with the lowest AIC value is the one that best fits the signal being modeled. No relationship was observed between the types of stuttering present in each sample and the generated models. The NAR neural network models generally had the lowest Mean Square Error (MSE) values, ranging from 10^-6 to 10^-14. LPC reconstruction and an autoencoder neural network were used for the speech reconstruction, and the effect of white noise masking on the reconstructed speech was also evaluated. The MSE between the original speech and the reconstructed speech without noise masking was zero for all the speech samples, indicating that the reconstruction was perfect, with excellent speech quality and a mirror reflection of the original speech. The MSE between the original speech and the reconstruction produced by the autoencoder neural network was between 10^-2 and 10^-5.
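The AIC-based order selection described in the summary can be sketched as follows. This is a minimal illustration, assuming a plain least-squares AR fit and the common AIC form N·ln(σ²) + 2p; it is not the thesis's actual implementation.

```python
import numpy as np

def fit_ar(x, p):
    """Least-squares fit of an AR(p) model.

    Returns the coefficients a_1..a_p (a_k multiplies x[n-k]) and the
    residual variance of the one-step prediction error.
    """
    N = len(x)
    # Lagged design matrix: column k holds x shifted by lag k+1
    X = np.column_stack([x[p - k - 1 : N - k - 1] for k in range(p)])
    y = x[p:]
    a, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ a
    return a, np.var(resid)

def aic_best_order(x, max_order=60):
    """Pick the AR order in 1..max_order minimising the AIC.

    The abstract evaluates the first 60 orders and keeps the model with
    the lowest AIC value.
    """
    N = len(x)
    aics = []
    for p in range(1, max_order + 1):
        _, s2 = fit_ar(x, p)
        aics.append(N * np.log(s2) + 2 * p)
    return int(np.argmin(aics)) + 1, aics
```

On a real speech frame one would apply this per windowed segment; here any 1-D NumPy array stands in for the signal.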
The MSE of the LPC reconstruction algorithm thus showed perfect reconstruction quality, while that of the autoencoder neural network was far from perfect. Automatic speaker recognition (ASR) systems were further used to evaluate the reconstructed speech, with confusion matrices generated in each case. The multilayer perceptron (MLP) and the recurrent neural network (RNN) were used, each with 25, 65, and 215 hidden nodes. The reconstructed speech without noise masking achieved near-perfect speaker recognition when line spectral frequency (LSF) features were combined with an MLP using 215 hidden nodes. Offline software has also been developed to implement the disordered speech reconstruction system.
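The zero-MSE LPC result is what one would expect when the full prediction residual is retained at synthesis time, since resynthesis then inverts the analysis exactly. A minimal sketch of that analysis-synthesis loop, assuming a simple least-squares LPC fit rather than the thesis's exact procedure:

```python
import numpy as np

def lpc_coeffs(x, p):
    """Least-squares estimate of LPC coefficients a_1..a_p."""
    N = len(x)
    X = np.column_stack([x[p - k - 1 : N - k - 1] for k in range(p)])
    a, *_ = np.linalg.lstsq(X, x[p:], rcond=None)
    return a

def lpc_reconstruct(x, p):
    """Analyse the signal, keep the residual, and resynthesise.

    Because the complete residual is kept, synthesis undoes the analysis
    up to floating-point error, mirroring the zero-MSE LPC reconstruction
    reported in the abstract.
    """
    a = lpc_coeffs(x, p)
    N = len(x)
    e = np.zeros(N)                 # one-step prediction residual
    for n in range(p, N):
        e[n] = x[n] - a @ x[n - p:n][::-1]
    y = np.zeros(N)
    y[:p] = x[:p]                   # seed synthesis with the first p samples
    for n in range(p, N):
        y[n] = a @ y[n - p:n][::-1] + e[n]
    return y
```

A practical coder would quantise the residual and coefficients, which is where reconstruction error would actually enter; this sketch keeps both exact to show the lossless limiting case.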
Physical Description:xxi, 180 leaves : illustrations ; 30cm.
Bibliography:Includes bibliographical references (leaves 163-174).