| Student: | 黃柏景 Huang, Po-Ching |
|---|---|
| Thesis Title: | 最大化邊界隱藏式馬可夫模型之證據架構應用於語音辨識 (Evidence Framework for Large Margin Hidden Markov Model Based Speech Recognition) |
| Advisor: | 簡仁宗 Chien, Jen-Tzung |
| Degree: | Master |
| Department: | Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science |
| Year of Publication: | 2008 |
| Academic Year of Graduation: | 96 |
| Language: | Chinese |
| Number of Pages: | 77 |
| Keywords (Chinese): | 最大化邊界 (large margin), 證據架構 (evidence framework) |
| Keywords (English): | large margin, evidence framework |
In this thesis, we apply the Bayesian evidence framework to the parameter estimation of the novel large margin hidden Markov model (LM-HMM) for speech recognition. Our main goal is to improve classification performance by strengthening the generalization capability of the conventional large margin model. Traditionally, a large margin classifier is built by jointly maximizing the margin and minimizing the training error, but such a classifier is not guaranteed to perform well on unseen test data. We therefore propose an evidence framework, grounded in Bayesian inference, for training the large margin model, in which the posterior probability of the training data is obtained by marginalizing over the large margin model parameters. The core idea of the method is to sequentially update the model parameters through appropriately selected hyperparameters. In addition, this study employs the expectation-maximization (EM) algorithm to estimate the maximum a posteriori parameters and the maximum evidence parameters within the Bayesian evidence framework. In the experiments, we train the acoustic model parameters on the TIMIT corpus and compare the proposed method with other approaches in terms of model generalization and recognition performance.
The Bayesian evidence framework is presented in this thesis for speech recognition based on the state-of-the-art large margin hidden Markov model (LM-HMM). Our aim is to improve speech recognition performance by improving the generalization of the LM-HMM. Traditionally, a large margin classifier is built by jointly maximizing the margin and minimizing the training error, but the trained LM-HMM is not guaranteed to give good predictions on unseen test speech. For this reason, we develop a Bayesian approach to LM-HMM training in which the posterior distribution of the training data is calculated by marginalizing over the LM-HMM parameters. With an appropriate choice of LM-HMM hyperparameters, the proposed evidential LM-HMM (ELM-HMM) is established. The expectation-maximization (EM) algorithm is applied within the Bayesian evidence framework to find the maximum a posteriori parameters and the maximum evidence parameters of the LM-HMM. In experiments on the TIMIT speech database, the proposed large margin HMM achieves good model generalization and speech recognition performance.
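For readers unfamiliar with the evidence framework referred to in the abstract, the following is a minimal sketch of generic MacKay-style evidence maximization (type-II maximum likelihood), written in LaTeX for illustration only. The symbols are generic placeholders (model parameters \(\boldsymbol{\lambda}\), hyperparameters \(\alpha\), training data \(D\)) and do not reproduce the thesis's exact LM-HMM objective, where the likelihood term is coupled with the large margin criterion.

```latex
% Generic two-level Bayesian evidence framework (illustrative sketch only).

% Level 1: posterior over model parameters given fixed hyperparameters.
\begin{equation}
  p(\boldsymbol{\lambda} \mid D, \alpha)
  = \frac{p(D \mid \boldsymbol{\lambda})\, p(\boldsymbol{\lambda} \mid \alpha)}
         {p(D \mid \alpha)}
\end{equation}

% Level 2: the evidence is the normalizer of level 1, obtained by
% marginalizing out the model parameters.
\begin{equation}
  p(D \mid \alpha)
  = \int p(D \mid \boldsymbol{\lambda})\, p(\boldsymbol{\lambda} \mid \alpha)\,
    d\boldsymbol{\lambda}
\end{equation}

% Hyperparameters are chosen to maximize the evidence, and the model
% parameters are then estimated in the maximum a posteriori sense under
% the selected hyperparameters (e.g. with EM-style alternating updates).
\begin{align}
  \hat{\alpha} &= \arg\max_{\alpha}\; p(D \mid \alpha), \\
  \hat{\boldsymbol{\lambda}} &= \arg\max_{\boldsymbol{\lambda}}\;
    p(D \mid \boldsymbol{\lambda})\, p(\boldsymbol{\lambda} \mid \hat{\alpha}).
\end{align}
```

In the thesis, the alternation between the hyperparameter update and the parameter update is what yields the sequential refinement of the LM-HMM described in the abstract; the sketch above only conveys the general structure of that two-level inference.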