Multi-speaker frequency warping vocal tract length normalization for speaker independent speech recognition

One of the important issues in speaker-independent speech recognition systems is compensating for speaker variability. Speaker variability is usually related to physical differences in vocal tract length. Variation in vocal tract length can be compensated for using Vocal Tract Length Normalization...

Bibliographic Details
Main Author: Wong, Jensen Jing Lung
Format: Thesis
Language: English
Published: 2014
Subjects: BF Psychology
Online Access: http://eprints.utm.my/id/eprint/48537/1/JensenWongJingLungMFC2014.pdf
id my-utm-ep.48537
record_format uketd_dc
spelling my-utm-ep.48537 2017-08-14T00:22:55Z Multi-speaker frequency warping vocal tract length normalization for speaker independent speech recognition 2014 Wong, Jensen Jing Lung BF Psychology
2014 Thesis http://eprints.utm.my/id/eprint/48537/ http://eprints.utm.my/id/eprint/48537/1/JensenWongJingLungMFC2014.pdf application/pdf en public http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:83876?queryType=vitalDismax&query=Multi-speaker+frequency+warping+vocal+tract+length+normalization+for+speaker+independent+speech+recognition&public=true masters Universiti Teknologi Malaysia, Faculty of Computing Faculty of Computing
institution Universiti Teknologi Malaysia
collection UTM Institutional Repository
language English
topic BF Psychology
description One of the important issues in speaker-independent speech recognition systems is compensating for speaker variability. Speaker variability is usually related to physical differences in vocal tract length. Variation in vocal tract length can be compensated for using the Vocal Tract Length Normalization (VTLN) method, which normalizes speech utterances via speaker-specific frequency warping. However, this approach requires repeating the search for an optimal warping value for each speaker, which increases computational cost. This work proposes an alternative approach to finding the optimal warping factor in VTLN via multi-speaker frequency warping, in which only one optimum warping factor value is used for all speakers. The proposed multi-speaker frequency warping VTLN is evaluated under different experimental setups covering the language model, phoneme categorization and warping values, chosen by trial and error. The data used in this work is the large-vocabulary TIMIT dataset, and the Hidden Markov Model Toolkit (HTK) is used for classification. The results show that the proposed approach achieves up to 1.0% higher phoneme accuracy than the baseline. Its performance is on par with the speaker-specific warping approach, with the added advantage of lower computational cost.
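The idea of a single warping factor shared by all speakers can be illustrated with a minimal Python sketch. It is not taken from the thesis: the piecewise-linear warping function, the 8000 Hz upper frequency (the Nyquist frequency for 16 kHz audio), the 0.85 cutoff ratio, the candidate grid, and the names `warp_frequency`, `pick_global_alpha` and `toy_score` are all assumptions chosen for illustration. In a real experiment the score would be the phoneme accuracy obtained from a recognizer, not a toy function.

```python
from typing import Callable, Iterable


def warp_frequency(f: float, alpha: float, f_max: float = 8000.0,
                   cut_ratio: float = 0.85) -> float:
    """Piecewise-linear warp of frequency f (Hz) by factor alpha.

    Below the cutoff the axis is scaled by alpha; above it, a second
    linear segment maps f_max onto itself so the bandwidth is preserved.
    Assumes alpha * cut_ratio < 1 (illustrative choice, not from the thesis).
    """
    f_cut = cut_ratio * f_max
    if f <= f_cut:
        return alpha * f
    slope = (f_max - alpha * f_cut) / (f_max - f_cut)
    return alpha * f_cut + slope * (f - f_cut)


def pick_global_alpha(candidates: Iterable[float],
                      score: Callable[[float], float]) -> float:
    """Return the single candidate factor with the highest score."""
    return max(candidates, key=score)


def toy_score(alpha: float) -> float:
    # Hypothetical stand-in for phoneme accuracy; peaks at alpha = 1.04.
    return -(alpha - 1.04) ** 2


# Hypothetical trial grid of warping factors: 0.88, 0.92, ..., 1.12.
candidates = [0.88 + 0.04 * i for i in range(7)]
best_alpha = pick_global_alpha(candidates, toy_score)
```

A speaker-specific variant would call the same search once per speaker with a per-speaker score, yielding one factor per speaker; the sketch above runs the search once and applies `best_alpha` to every utterance.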
format Thesis
qualification_level Master's degree
author Wong, Jensen Jing Lung
title Multi-speaker frequency warping vocal tract length normalization for speaker independent speech recognition
granting_institution Universiti Teknologi Malaysia, Faculty of Computing
granting_department Faculty of Computing
publishDate 2014
url http://eprints.utm.my/id/eprint/48537/1/JensenWongJingLungMFC2014.pdf
_version_ 1747817414602522624