| Student: | 林秉正 Lin, Ping-Cheng |
|---|---|
| Thesis title: | 使用適應性區間模型於語者說話速度之調整 Adaptive Duration Modeling for Speaker Adaptation of Speaking Rate |
| Advisor: | 簡仁宗 Chien, Jen-Tzung |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering |
| Publication year: | 2002 |
| Graduation academic year: | 90 (ROC calendar) |
| Language: | Chinese |
| Pages: | 70 |
| Chinese keywords: | speech recognition, speaker adaptation, speaking rate, duration model |
| English keywords: | duration model, speech recognition, speaker adaptation, speaking rate |
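Variation in speaking rate is one of the key factors affecting speech recognition performance: a speaking-rate mismatch between training and test data lowers recognition accuracy. Different speakers talk at different rates, and even a single speaker reading the same sentence may produce speech with different characteristics, especially a different speaking rate, depending on mood or health.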
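We describe speaking-rate characteristics with duration models. The Viterbi algorithm segments the speech frames into the appropriate HMM states, and the number of frames collected by each state is used to train one duration model per state. During recognition, the hidden Markov model (HMM) is extended to include the duration model, so the duration-model parameters are scored together with the HMM parameters.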
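The state-level duration modeling described above can be sketched as follows. This is a minimal illustration, not the thesis's implementation. It assumes three things:

* The Viterbi alignment (one state id per frame) is already available.
* Each state's duration follows a Gaussian distribution measured in frames.
* The duration score is added, with a weight, to the HMM path log-likelihood.

The function names and the `var_floor` and `weight` parameters are hypothetical.

```python
import math
from collections import defaultdict
from itertools import groupby


def state_durations(alignment):
    """Collapse a frame-level Viterbi alignment (one state id per frame)
    into (state, duration_in_frames) segments.
    Assumes state ids are unique per position in the model sequence."""
    return [(state, len(list(run))) for state, run in groupby(alignment)]


def train_duration_models(alignments, var_floor=1.0):
    """Fit a Gaussian (mean, variance) to each state's segment durations.
    The Gaussian form and the variance floor are illustrative assumptions."""
    samples = defaultdict(list)
    for alignment in alignments:
        for state, dur in state_durations(alignment):
            samples[state].append(dur)
    models = {}
    for state, durs in samples.items():
        mean = sum(durs) / len(durs)
        var = sum((d - mean) ** 2 for d in durs) / len(durs)
        models[state] = (mean, max(var, var_floor))  # floor avoids zero variance
    return models


def duration_log_prob(dur, model):
    """Gaussian log-density of a segment duration under (mean, variance)."""
    mean, var = model
    return -0.5 * (math.log(2 * math.pi * var) + (dur - mean) ** 2 / var)


def rescore(acoustic_log_lik, alignment, models, weight=1.0):
    """Add weighted duration log-probabilities to an HMM path score."""
    duration_score = sum(
        duration_log_prob(d, models[s]) for s, d in state_durations(alignment)
    )
    return acoustic_log_lik + weight * duration_score
```

A path whose state durations are far from the trained means, such as fast speech scored against normal-rate models, receives a lower combined score. That is the mismatch the adaptation step tries to remove.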
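This thesis proposes adapting the duration models with maximum a posteriori (MAP) estimation. The goal of adaptation is to let the system adjust the duration-model parameters from a small amount of adaptation data and thereby raise the recognition rate. At the model level, MAP combines the prior with the adaptation data to adapt to the speaker's speaking rate, which effectively improves system performance. The experimental results show that the adapted duration models describe speech at different rates effectively, so the system maintains a reasonable recognition level even on fast test speech.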
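The MAP adaptation of a duration model can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's exact formulation:

* Each state's duration model is a Gaussian (mean, variance) in frames.
* A conjugate prior centred on the speaker-independent mean is placed on the mean only.
* The variance is kept unchanged.
* The prior weight `tau` and all function names are hypothetical.

The adapted mean is the prior-weighted average (tau * mu0 + sum of observed durations) / (tau + n).

```python
def map_adapt_duration(prior_model, adapt_durations, tau=10.0):
    """MAP estimate of one state's duration mean.

    prior_model: (mean, variance) of the speaker-independent duration model
    adapt_durations: segment durations (frames) observed in adaptation data
    tau: prior weight, i.e. how many samples the prior is "worth" (assumed)
    """
    prior_mean, prior_var = prior_model
    n = len(adapt_durations)
    if n == 0:
        return prior_model  # no adaptation data: keep the prior model
    adapted_mean = (tau * prior_mean + sum(adapt_durations)) / (tau + n)
    return (adapted_mean, prior_var)  # variance left unadapted (assumption)


def map_adapt_models(si_models, adapt_samples, tau=10.0):
    """Adapt every state; states absent from adapt_samples keep their SI model."""
    return {
        state: map_adapt_duration(model, adapt_samples.get(state, []), tau)
        for state, model in si_models.items()
    }
```

Worked example: take `tau=4` and a state whose speaker-independent mean is 8 frames. If the adaptation segments last 5, 5, 6 and 4 frames, the mean moves to (4*8 + 20)/(4 + 4) = 6.5 frames. As more adaptation data arrives, the estimate approaches the speaker's own mean. With little data, it stays close to the prior.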
Speaking rate is one source of mismatch between training and testing environments. Even when the same user speaks the same utterance, the speech signal, and especially the speaking rate, changes with emotion or other factors. Most speech recognizers perform worse when the speaking rate is faster or slower than normal.
Speaker adaptation is an important technique for improving speech recognition performance. MAP adaptation combines prior probabilities with a small amount of adaptation data to adapt model parameters. A duration model is well suited to describing speaking-rate properties. The recognizer estimates both the HMM and the duration-model parameters during training. In the adaptation phase, we apply MAP estimation to adapt the HMM and duration-model parameters together.
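The same prior-plus-data interpolation applies to the HMM's Gaussian mean vectors, where each adaptation frame is weighted by its state occupancy probability. This sketch makes the following assumptions:

* The occupancies gamma_t have already been computed, for example by the forward-backward algorithm.
* The prior weight `tau` is hypothetical.
* Only the mean is updated. This is a common simplification and not necessarily what the thesis does for the acoustic parameters.

```python
from typing import Sequence


def map_adapt_mean(
    prior_mean: Sequence[float],
    frames: Sequence[Sequence[float]],
    occupancy: Sequence[float],
    tau: float = 10.0,
) -> list:
    """MAP update of one Gaussian mean vector:
    mu_hat = (tau * mu0 + sum_t gamma_t * o_t) / (tau + sum_t gamma_t)

    frames: adaptation feature vectors o_t
    occupancy: gamma_t, posterior probability that frame t belongs to this Gaussian
    tau: prior weight (assumed value)
    """
    if len(frames) != len(occupancy):
        raise ValueError("one occupancy value per frame is required")
    total = sum(occupancy)
    dim = len(prior_mean)
    # Occupancy-weighted sum of the adaptation frames, one entry per dimension.
    weighted = [sum(g * o[d] for g, o in zip(occupancy, frames)) for d in range(dim)]
    return [(tau * prior_mean[d] + weighted[d]) / (tau + total) for d in range(dim)]
```

With zero total occupancy, the function returns the prior mean unchanged. A Gaussian that the adaptation data never visits therefore keeps its speaker-independent value, which is the usual behavior of MAP when adaptation data is sparse.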
This thesis presents a new method for adapting model duration parameters. The MAP adaptation technique used here addresses the problem of varying speaking rate. In the experiments, adapting the duration-model parameters significantly improves recognition performance. The adapted models are also more robust when recognizing utterances spoken at a fast rate.