Multi-speaker frequency warping vocal tract length normalization for speaker independent speech recognition
One of the important issues in speaker-independent speech recognition systems is compensating for speaker variability. Speaker variability is usually related to physical differences in vocal tract length. Variation in vocal tract length can be compensated for using Vocal Tract Length Normalization...
Main Author: | Wong, Jensen Jing Lung |
---|---|
Format: | Thesis |
Language: | English |
Published: | 2014 |
Subjects: | BF Psychology |
Online Access: | http://eprints.utm.my/id/eprint/48537/1/JensenWongJingLungMFC2014.pdf |
id |
my-utm-ep.48537 |
---|---|
record_format |
uketd_dc |
spelling |
my-utm-ep.485372017-08-14T00:22:55Z Multi-speaker frequency warping vocal tract length normalization for speaker independent speech recognition 2014 Wong, Jensen Jing Lung BF Psychology One of the important issues in speaker-independent speech recognition systems is compensating for speaker variability. Speaker variability is usually related to physical differences in vocal tract length. Variation in vocal tract length can be compensated for using the Vocal Tract Length Normalization (VTLN) method, which is known to normalize speech utterances via speaker-specific frequency warping. However, this approach requires a repeated search for the optimal warping value for each speaker, which increases computational cost. This work proposes an alternative approach to finding the optimal warping factor in VTLN via multi-speaker frequency warping, in which only one optimum warping factor value is used for all speakers. The proposed multi-speaker frequency warping VTLN is evaluated under different experimental setups for the language model, phoneme categorization and warping values, chosen through trial and error. The data used in this work is the large-vocabulary TIMIT dataset, and the Hidden Markov Model Toolkit (HTK) is used for classification. The results show that the proposed approach achieves up to 1.0% higher phoneme accuracy than the baseline. Its performance is on par with the speaker-specific warping approach, with the added advantage of lower computational cost. 
2014 Thesis http://eprints.utm.my/id/eprint/48537/ http://eprints.utm.my/id/eprint/48537/1/JensenWongJingLungMFC2014.pdf application/pdf en public http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:83876?queryType=vitalDismax&query=Multi-speaker+frequency+warping+vocal+tract+length+normalization+for+speaker+independent+speech+recognition&public=true masters Universiti Teknologi Malaysia, Faculty of Computing Faculty of Computing |
institution |
Universiti Teknologi Malaysia |
collection |
UTM Institutional Repository |
language |
English |
topic |
BF Psychology |
description |
One of the important issues in speaker-independent speech recognition systems is compensating for speaker variability. Speaker variability is usually related to physical differences in vocal tract length. Variation in vocal tract length can be compensated for using the Vocal Tract Length Normalization (VTLN) method, which is known to normalize speech utterances via speaker-specific frequency warping. However, this approach requires a repeated search for the optimal warping value for each speaker, which increases computational cost. This work proposes an alternative approach to finding the optimal warping factor in VTLN via multi-speaker frequency warping, in which only one optimum warping factor value is used for all speakers. The proposed multi-speaker frequency warping VTLN is evaluated under different experimental setups for the language model, phoneme categorization and warping values, chosen through trial and error. The data used in this work is the large-vocabulary TIMIT dataset, and the Hidden Markov Model Toolkit (HTK) is used for classification. The results show that the proposed approach achieves up to 1.0% higher phoneme accuracy than the baseline. Its performance is on par with the speaker-specific warping approach, with the added advantage of lower computational cost. |
format |
Thesis |
qualification_level |
Master's degree |
author |
Wong, Jensen Jing Lung |
title |
Multi-speaker frequency warping vocal tract length normalization for speaker independent speech recognition |
granting_institution |
Universiti Teknologi Malaysia, Faculty of Computing |
granting_department |
Faculty of Computing |
publishDate |
2014 |
url |
http://eprints.utm.my/id/eprint/48537/1/JensenWongJingLungMFC2014.pdf |
_version_ |
1747817414602522624 |
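
The idea summarized in the description, one warping factor shared by all speakers and chosen by trial and error, can be illustrated with a minimal sketch. This is not the thesis's implementation and does not use HTK. The piecewise-linear warp, the `knee` parameter, the 8000 Hz upper limit, and the names `warp_frequency`, `best_shared_alpha` and `accuracy_fn` are all assumptions made for illustration. `accuracy_fn` stands in for a full recognition run over every speaker's data.

```python
from typing import Callable, Iterable, Tuple


def warp_frequency(f: float, alpha: float,
                   f_max: float = 8000.0, knee: float = 0.85) -> float:
    """Piecewise-linear frequency warp (illustrative form, an assumption).

    Frequencies below a break point are scaled by alpha; above it, a straight
    line is used so that f_max still maps to f_max.
    """
    if not 0.0 <= f <= f_max:
        raise ValueError("f must lie in [0, f_max]")
    # Break point chosen so that alpha * f0 never exceeds knee * f_max.
    f0 = knee * f_max / max(alpha, 1.0)
    if f <= f0:
        return alpha * f
    # Line from (f0, alpha * f0) to (f_max, f_max).
    slope = (f_max - alpha * f0) / (f_max - f0)
    return alpha * f0 + slope * (f - f0)


def best_shared_alpha(candidates: Iterable[float],
                      accuracy_fn: Callable[[float], float]) -> Tuple[float, float]:
    """Try each candidate once for ALL speakers and keep the most accurate.

    accuracy_fn is a hypothetical callback returning phoneme accuracy when
    every utterance is warped with the same alpha.
    """
    return max(((a, accuracy_fn(a)) for a in candidates),
               key=lambda pair: pair[1])


if __name__ == "__main__":
    # Toy stand-in for a recognizer: accuracy peaks at alpha = 1.04.
    toy_accuracy = lambda a: 70.0 - 100.0 * (a - 1.04) ** 2
    alpha, acc = best_shared_alpha([0.96, 1.00, 1.04, 1.08], toy_accuracy)
    print(alpha, warp_frequency(1000.0, alpha))
```

Under these assumptions, a shared factor needs one evaluation per candidate value, whereas a speaker-specific search repeats the candidate sweep for each speaker, which is the cost difference the description refers to.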