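| Graduate student: | 吳柏樹 Wu, Bo-Shu |
|---|---|
| Thesis title: | 鑑別性事前資訊應用於強健性語音辨識 Robust Speech Recognition Using Discriminative Prior Statistics |
| Advisor: | 簡仁宗 Chien, Jen-Tzung |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering |
| Year of publication: | 2006 |
| Academic year of graduation: | 94 |
| Language: | Chinese |
| Number of pages: | 86 |
| Chinese keywords: | 事前資訊 (prior information), 貝氏預測分類器 (Bayesian predictive classifier), 模型調適 (model adaptation), 強健性語音辨識 (robust speech recognition), 鑑別性訓練 (discriminative training) |
| English keywords: | Uncertainty, Robust Speech Recognition, Discriminative Training, Bayesian Predictive Classification, Prior Information, Model Adaptation |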
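In conventional speech recognition systems, the mismatch between the training and testing environments is the primary cause of degraded recognition accuracy. Many solutions have been proposed in the literature. On the model side, the robust Bayesian predictive classification (BPC) rule is built by introducing the uncertainty of the model parameters. Alternatively, adaptation methods adjust the model to the testing environment, for example maximum a posteriori (MAP) adaptation and maximum likelihood linear regression (MLLR) adaptation, or minimum classification error linear regression (MCELR) adaptation, which further takes the discriminability of the speech models into account. Among these, the BPC rule incorporates the uncertainty of the model parameters into the decision rule to make the decision robust. Parameter uncertainty reflects the variability of the noise environment and the acoustics, and it can be represented by a prior density. Traditionally, Bayesian learning provides the mechanism for estimating and updating this prior information.
To achieve both robustness and discriminability in the decision rule, this thesis proposes discriminative training and updating of the acoustic models and their prior models within the BPC framework. We use the minimum classification error (MCE) discriminative criterion to estimate the hyperparameters of the model parameters, and we propose two updating methods. The first directly updates the prior statistics of the mean vectors of the hidden Markov models (HMMs). The second considers linear regression adaptation and updates the prior information of the regression matrices under the MCE criterion. Evaluation experiments based mainly on the AURORA 2 noisy speech database show that the updated prior densities improve the discriminability of BPC.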
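The effect of folding parameter uncertainty into the decision rule can be illustrated with a toy example. The sketch below is not the thesis's formulation: it assumes a one-dimensional Gaussian class model whose mean carries a Gaussian prior, in which case the predictive density is again Gaussian with the two variances added. The function names, class labels, and numeric values are all hypothetical.

```python
import math

def log_gaussian(x, mean, var):
    """Log density of a univariate Gaussian N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def plug_in_score(x, prior_mean, obs_var):
    """Plug-in score: treat the prior mean as if it were the exact model mean."""
    return log_gaussian(x, prior_mean, obs_var)

def predictive_score(x, prior_mean, prior_var, obs_var):
    """Predictive score: integrate the likelihood over mean ~ N(prior_mean, prior_var).

    For a Gaussian likelihood with a Gaussian prior on its mean, the integral
    is a Gaussian centred at prior_mean with variance obs_var + prior_var.
    """
    return log_gaussian(x, prior_mean, obs_var + prior_var)

def classify(x, classes, use_prior=True):
    """Return the label whose score is highest for observation x."""
    def score(c):
        if use_prior:
            return predictive_score(x, c["prior_mean"], c["prior_var"], c["obs_var"])
        return plug_in_score(x, c["prior_mean"], c["obs_var"])
    return max(classes, key=lambda label: score(classes[label]))

# Hypothetical classes: "a" has a very uncertain mean, "b" a well-known one.
classes = {
    "a": {"prior_mean": 0.0, "prior_var": 4.0, "obs_var": 1.0},
    "b": {"prior_mean": 3.0, "prior_var": 0.1, "obs_var": 1.0},
}

plug_in_label = classify(7.0, classes, use_prior=False)
predictive_label = classify(7.0, classes, use_prior=True)
print(plug_in_label, predictive_label)
```

With these made-up numbers the two rules disagree on the observation 7.0: the plug-in rule picks the class with the nearest mean, while the predictive rule favours the class whose mean is uncertain enough to plausibly explain a distant observation. Which hyperparameter values are appropriate is exactly what the prior estimation has to decide.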
Robustness is a crucial issue for speech recognition because a mismatch between training and testing environments always exists in real-world applications. This mismatch degrades system performance substantially. Many approaches have been proposed in the literature to deal with the robustness issue. One is to consider the uncertainty of model parameters in acoustic modeling, which leads to the so-called Bayesian predictive classification (BPC). Alternatively, we can adapt acoustic model parameters to testing environments via distribution estimation approaches, e.g., maximum a posteriori (MAP) adaptation and maximum likelihood linear regression (MLLR) adaptation, or via discriminative estimation approaches, e.g., minimum classification error linear regression (MCELR). Discriminative approaches estimate model parameters that yield larger class separation and fewer classification errors. Taking model uncertainties into account makes it possible to build a robust decision rule. These uncertainties represent the variations of the noise environment and the acoustic signal.
To achieve both decision robustness and discriminability, we develop discriminative prior information for the BPC decision rule. We use the minimum classification error (MCE) criterion to estimate the prior statistics that represent the randomness of the system parameters. This discriminative prior estimation is realized for both a direct adaptation framework, which adapts the hyperparameters of hidden Markov model (HMM) parameters directly, and an indirect adaptation framework, which adapts HMM parameters indirectly using linear regression matrices. In experiments on noisy speech recognition, we find that discriminative BPC and linear regression BPC perform better than MAP, MLLR, and MCELR adaptation and the traditional BPC approach under different noise conditions.
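For readers unfamiliar with the MCE criterion, the following minimal sketch shows the commonly used form of the loss: a misclassification measure that compares the correct class score against a soft maximum over the competing scores, passed through a sigmoid. It is an illustration under assumptions, not the thesis's implementation; the names `misclassification_measure` and `mce_loss`, the parameters `eta`, `gamma`, and `theta`, and the example scores are all hypothetical.

```python
import math

def misclassification_measure(scores, correct, eta=1.0):
    """d = -g_correct + (1/eta) * log( mean over rivals of exp(eta * g_rival) ).

    scores maps each class label to its discriminant (log) score.
    d < 0 means the correct class wins; d > 0 means a rival wins.
    """
    rivals = [g for label, g in scores.items() if label != correct]
    top = max(rivals)  # subtract the maximum for numerical stability
    mean_exp = sum(math.exp(eta * (g - top)) for g in rivals) / len(rivals)
    soft_max = top + math.log(mean_exp) / eta
    return soft_max - scores[correct]

def mce_loss(scores, correct, eta=1.0, gamma=1.0, theta=0.0):
    """Sigmoid-smoothed 0/1 loss: close to 0 when correct, close to 1 when wrong."""
    d = misclassification_measure(scores, correct, eta)
    return 1.0 / (1.0 + math.exp(-gamma * d + theta))

# Hypothetical log scores for a two-class problem.
scores = {"a": -1.0, "b": -3.0}
loss_if_a = mce_loss(scores, "a")  # correct class has the higher score
loss_if_b = mce_loss(scores, "b")  # correct class has the lower score
print(loss_if_a, loss_if_b)
```

Because the loss is differentiable in the scores, any parameters the scores depend on can in principle be tuned by gradient descent. In the setting described above those parameters would be the prior hyperparameters, although the specific update equations are not given in this abstract.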