Multi-speaker frequency warping vocal tract length normalization for speaker independent speech recognition

One of the important issues in speaker-independent speech recognition systems is compensating for speaker variability. Speaker variability is usually related to physical differences in vocal tract length. Variation in vocal tract length can be compensated for using Vocal Tract Length Normalization...

Bibliographic Details
Main Author:	Wong, Jensen Jing Lung
Format:	Thesis
Language:	English
Published:	2014
Subjects:	BF Psychology
Online Access:	http://eprints.utm.my/id/eprint/48537/1/JensenWongJingLungMFC2014.pdf

id	my-utm-ep.48537
record_format	uketd_dc
spelling	my-utm-ep.48537 2017-08-14T00:22:55Z 2014 Thesis http://eprints.utm.my/id/eprint/48537/ http://eprints.utm.my/id/eprint/48537/1/JensenWongJingLungMFC2014.pdf application/pdf en public http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:83876?queryType=vitalDismax&query=Multi-speaker+frequency+warping+vocal+tract+length+normalization+for+speaker+independent+speech+recognition&public=true masters Universiti Teknologi Malaysia, Faculty of Computing Faculty of Computing
institution	Universiti Teknologi Malaysia
collection	UTM Institutional Repository
language	English
topic	BF Psychology
spellingShingle	BF Psychology Wong, Jensen Jing Lung Multi-speaker frequency warping vocal tract length normalization for speaker independent speech recognition
description	One of the important issues in speaker-independent speech recognition systems is compensating for speaker variability. Speaker variability is usually related to physical differences in vocal tract length. Vocal tract length variation can be compensated for using Vocal Tract Length Normalization (VTLN), a method known to normalize speech utterances via speaker-specific frequency warping. However, this approach requires a repeated search for the optimal warping value for each speaker, which increases computational cost. This work proposes an alternative approach to finding the optimal warping factor in VTLN via multi-speaker frequency warping, in which a single optimum warping factor is used for all speakers. The proposed multi-speaker frequency warping VTLN is evaluated under different experimental setups covering the language model, phoneme categorization, and warping values, with the warping values chosen by trial and error. The data used in this work is the large-vocabulary TIMIT dataset, and the Hidden Markov Model Toolkit (HTK) is used for classification. The results show that the proposed approach achieves up to 1.0% higher phoneme accuracy than the baseline. Its performance is on par with the speaker-specific warping approach, with the added advantage of lower computational cost.
format	Thesis
qualification_level	Master's degree
author	Wong, Jensen Jing Lung
author_facet	Wong, Jensen Jing Lung
author_sort	Wong, Jensen Jing Lung
title	Multi-speaker frequency warping vocal tract length normalization for speaker independent speech recognition
title_short	Multi-speaker frequency warping vocal tract length normalization for speaker independent speech recognition
title_full	Multi-speaker frequency warping vocal tract length normalization for speaker independent speech recognition
title_fullStr	Multi-speaker frequency warping vocal tract length normalization for speaker independent speech recognition
title_full_unstemmed	Multi-speaker frequency warping vocal tract length normalization for speaker independent speech recognition
title_sort	multi-speaker frequency warping vocal tract length normalization for speaker independent speech recognition
granting_institution	Universiti Teknologi Malaysia, Faculty of Computing
granting_department	Faculty of Computing
publishDate	2014
url	http://eprints.utm.my/id/eprint/48537/1/JensenWongJingLungMFC2014.pdf
_version_	1747817414602522624
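
The contrast drawn in the description, between one warping factor per speaker and a single factor shared by all speakers, can be illustrated with a minimal sketch. This is not the thesis's implementation: the piecewise-linear warp, the 8000 Hz upper frequency, the 0.85 break-point ratio, and the names warp_frequency, pick_per_speaker_alpha, pick_global_alpha, and score are all assumptions introduced here for illustration. The score argument stands for whatever quality measure the caller supplies, for example a recognition accuracy obtained with a given warping factor.

```python
from typing import Callable, Dict, Hashable, Sequence


def warp_frequency(f: float, alpha: float,
                   f_max: float = 8000.0, knee: float = 0.85) -> float:
    """Piecewise-linear frequency warp (an illustrative choice, not HTK's exact rule).

    Frequencies below the break point are scaled by alpha; above it, the warp
    is interpolated linearly so that f_max still maps to f_max.
    """
    # Break point chosen so that alpha * f0 never exceeds knee * f_max.
    f0 = knee * f_max / max(alpha, 1.0)
    if f <= f0:
        return alpha * f
    # Straight line through (f0, alpha * f0) and (f_max, f_max).
    slope = (f_max - alpha * f0) / (f_max - f0)
    return alpha * f0 + slope * (f - f0)


def pick_per_speaker_alpha(candidates: Sequence[float],
                           speakers: Sequence[Hashable],
                           score: Callable[[Hashable, float], float]
                           ) -> Dict[Hashable, float]:
    """Speaker-specific search: one best factor per speaker."""
    return {s: max(candidates, key=lambda a: score(s, a)) for s in speakers}


def pick_global_alpha(candidates: Sequence[float],
                      speakers: Sequence[Hashable],
                      score: Callable[[Hashable, float], float]) -> float:
    """Multi-speaker search: one factor maximizing the summed score."""
    return max(candidates, key=lambda a: sum(score(s, a) for s in speakers))
```

The per-speaker routine returns a separate factor for every speaker, whereas the multi-speaker routine returns one value that is then applied to all speech, which is the simplification the description refers to.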

Similar Items