
Student: Weng, Yu-Chien (翁毓謙)
Title: Discriminative Bayesian Classification for Large Vocabulary Continuous Speech Recognition (鑑別性貝氏分類法則應用於大詞彙連續語音辨識)
Advisor: Chien, Jen-Tzung (簡仁宗)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of publication: 2005
Academic year of graduation: 93 (2004-2005)
Language: Chinese
Number of pages: 93
Chinese keywords: speech recognition (語音辨識), Bayesian classification rule (貝氏分類法則)
English keywords: Bayesian classification, speech recognition
  • In this thesis, starting from Bayesian decision theory, we consider both the uncertainty of speech models and the discriminability between them, and design a speech recognition algorithm based on minimum Bayes risk (MBR) classification. In the MBR decision rule, the loss function should be designed with class discriminability in mind so as to improve recognition performance. We propose a loss function based on the Bayes factor, in which a hypothesis test of the target word candidate against its strongest competing candidate is performed to compute the corresponding loss. To make the test robust, predictive prior/posterior distributions are introduced when computing the Bayes factor, so that the resulting classifier is both robust and discriminative. We implement this loss function and the MBR decision rule in a word-graph-based search algorithm: in a first pass, the word candidates and associated state information needed for the loss computation are recorded; then, in the word-graph rescoring stage, the loss value of each candidate is computed and MBR decoding is carried out. In large vocabulary continuous speech recognition experiments on broadcast news data, we evaluate and compare the recognition results of conventional word-graph rescoring and MBR decoding.
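In standard notation, the minimum-Bayes-risk rule and the Bayes factor described in the abstract can be written as follows (the symbols here are generic textbook notation, not taken verbatim from the thesis):

```latex
% MBR decoding: choose the candidate with minimum expected loss,
% where P(w' | X) is the word posterior estimated from the word graph.
\hat{w} = \arg\min_{w} \sum_{w'} \ell(w, w')\, P(w' \mid X)

% Bayes factor of the target model \lambda_w against a competing model
% \lambda_{w'}, computed from predictive distributions; the loss
% \ell(w, w') is designed to grow as B(w, w') shrinks, i.e. as the
% competitor becomes more plausible than the target.
B(w, w') = \frac{p(X \mid \lambda_w)}{p(X \mid \lambda_{w'})}
```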

    In this thesis, we deal with the issues of model uncertainty and model discriminability when building a Bayesian classification rule for large vocabulary continuous speech recognition. In conventional Bayesian classification, we optimize the criterion of minimum Bayes risk (MBR) where the zero-one loss function is considered. The resulting maximum a posteriori (MAP) classification rule has been applied in many speech recognition systems. To improve the discriminability of a pattern classifier, it is important to design a discriminative loss function where input speech classified to different models is properly penalized. In this study, we develop a Bayes factor based loss function. This loss/penalty function is established by performing a hypothesis test of the input speech corresponding to a target model against a competing model. The predictive distributions of the target and competing models are computed to determine the Bayes factors. In general, the new classification rule is discriminative and robust since the competing model and parameter uncertainty are considered in the loss function. We also realize the proposed discriminative Bayesian classification in a word-graph-based search algorithm. From the estimated word candidates and corresponding states, we calculate the loss functions and use them for word graph rescoring of individual word candidates. In the evaluation of broadcast news transcription using the MATBN database, we show the superiority of the proposed classification compared to MAP classification.
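As a rough illustration of MBR rescoring over a candidate list, the following sketch uses invented log-scores and a simple sigmoid-style penalty in place of the thesis's predictive Bayes factor computation; it is not the thesis's implementation, only the shape of the decision rule:

```python
import math

# Hypothetical 3-candidate list with decoder log-scores (acoustic plus
# language model); a real system would read these from a word graph.
candidates = {"A": -10.0, "B": -10.5, "C": -12.0}

def posteriors(log_scores):
    """Softmax over log-scores, standing in for word-graph posteriors P(w|X)."""
    m = max(log_scores.values())
    exps = {w: math.exp(s - m) for w, s in log_scores.items()}
    z = sum(exps.values())
    return {w: e / z for w, e in exps.items()}

def loss(w, competitor, log_scores):
    """Bayes-factor-style loss: zero for a correct decision; otherwise a
    penalty that grows as p(X|w)/p(X|competitor) shrinks."""
    if w == competitor:
        return 0.0
    bayes_factor = math.exp(log_scores[w] - log_scores[competitor])
    return 1.0 / (1.0 + bayes_factor)

def mbr_decode(log_scores):
    """Choose the candidate minimizing expected loss under the posteriors."""
    post = posteriors(log_scores)
    risks = {
        w: sum(loss(w, c, log_scores) * post[c] for c in log_scores)
        for w in log_scores
    }
    best = min(risks, key=risks.get)
    return best, risks

best, risks = mbr_decode(candidates)
print(best)  # -> A (the minimum-risk candidate)
```

With a zero-one loss this reduces to MAP decoding; the margin-sensitive penalty is what lets a strongly contested top candidate be overtaken during rescoring.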

    Table of Contents
    Chapter 1  Introduction  1
      1.1 Motivation  1
      1.2 Speech problems and characteristics  2
      1.3 Methods and main results  4
      1.4 Outline  5
    Chapter 2  Minimum Bayes Risk Classification  7
      2.1 Decision rules for speech recognition  7
      2.2 Realization of minimum Bayes risk decisions  8
      2.3 Segmental minimum Bayes risk classification framework  11
      2.4 Optimal Bayes decision rule  13
      2.5 Robust decision rules  14
      2.6 Summary  15
    Chapter 3  Search Techniques for Large Vocabulary Continuous Speech Recognition  17
      3.1 Optimal Bayes decision rule  17
      3.2 Lexical tree  19
      3.3 One-pass search algorithm  20
      3.4 Word graph  31
      3.5 Word graph pruning and rescoring  38
    Chapter 4  Discriminative Bayesian Classification  41
      4.1 Loss functions  41
      4.2 Hypothesis testing  44
      4.3 Bayes factors  45
      4.4 Bayesian predictive classification  47
      4.5 Bayes-factor-based loss function  50
      4.6 Applying the discriminative classification loss to word graph rescoring  53
    Chapter 5  Experimental Setup and Recognition System Architecture  58
      5.1 Mandarin large vocabulary continuous speech recognition system  59
      5.2 Training and test corpora  63
      5.3 Incorporating the classification loss into word graph rescoring  65
      5.4 Weighting among acoustic model, language model, and classification loss scores  68
      5.5 System demonstration  71
    Chapter 6  Conclusions and Future Work  74
      6.1 Conclusions  74
      6.2 Future work  75
    Chapter 7  References  77

    [1] X. Aubert and H. Ney, “Large vocabulary continuous speech recognition using word graphs,” in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, pp. 49-52, 1995.
    [2] L. Bahl, P. Brown, P. de Souza and R. Mercer, “Maximum mutual information estimation of hidden Markov model parameters for speech recognition,” in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, vol. 11, pp. 49-52, 1986.
    [3] P. C. Chang and B.-H. Juang, “Discriminative training of dynamic programming based speech recognizers,” IEEE Trans. on Speech and Audio Processing, Vol. 1, No. 2, pp. 135-143, 1993.
    [4] J. L. Gauvain and Chin-Hui Lee, “Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains,” IEEE Trans. on Speech and Audio Processing, vol. 2, pp. 291-298, 1994.
    [5] V. Goel and W. Byrne. “LVCSR rescoring with modified loss function: a decision theoretic perspective,” in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, vol. 1, pp. 425-428, 1998.
    [6] P. S. Gopalakrishnan, L. R. Bahl and R. L. Mercer, “A tree search strategy for large vocabulary continuous speech recognition,” in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, vol. 1, pp. 572-575, 1995.
    [7] R. Haeb-Umbach and H. Ney, “Improvements in time-synchronous beam search for 10000-word continuous speech recognition,” IEEE Trans. on Speech and Audio Processing, vol. 2, pp. 353-356, 1994.
    [8] X. D. Huang, Y. Ariki and M. A. Jack, Hidden Markov Models for Speech Recognition, Edinburgh University Press, 1990.
    [9] Qiang Huo and Chin-Hui Lee, “Robust speech recognition based on adaptive classification and decision strategies,” Speech Communication, vol. 34, pp. 175-194, 2001.
    [10] Qiang Huo and Chin-Hui Lee, “A Bayesian predictive classification approach to robust speech recognition,” IEEE Trans. Speech And Audio Processing, vol.8, no.2, 2000.
    [11] Qiang Huo, Hui Jiang and Chin-Hui Lee, “A Bayesian predictive classification approach to robust speech recognition,” in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, Munich, Germany, pp. II- 1547-1550, 1997.
    [12] Qiang Huo and Chin-Hui Lee, “On-line adaptive learning of the continuous density hidden Markov model based on approximate recursive Bayes estimate,” IEEE Trans. on Speech and Audio Processing, vol. 5, no. 2, 1997.
    [13] F. Jelinek, “A fast sequential decoding algorithm using a stack,” IBM Journal of Research Development, 13: 675-685, 1969.
    [14] Hui Jiang, Keikichi Hirose and Qiang Huo, “Robust speech recognition based on a Bayesian prediction approach,” IEEE Trans. on Speech And Audio Processing, vol. 7, no. 4, 1999.
    [15] Hui Jiang and Li Deng, “A robust compensation strategy for extraneous acoustic variations in spontaneous speech recognition,” IEEE Trans. on Speech and Audio Processing, vol. 10, no. 1, January 2002.
    [16] Hui Jiang and Li Deng, “A Bayesian approach to the verification problem: applications to speaker verification,” IEEE Trans. on Speech and Audio Processing, vol. 9, no. 8, pp. 874-884, 2001.
    [17] Hui Jiang, K. Hirose and Qiang Huo, “Improving Viterbi Bayesian predictive classification via sequential Bayesian learning in robust speech Recognition,” Speech Communication, vol. 28, no. 4, pp. 313-326, 1999.
    [18] B.-H. Juang, W. Hou and C.-H. Lee, “Minimum classification error rate methods for speech recognition,” IEEE Trans. on Speech and Audio Processing, vol. 5, no. 3, pp. 257-265, 1997.
    [19] J. Kaiser, B. Horvat and Z. Kačič, “A novel loss function for the overall risk criterion based discriminative training of HMM models,” in Int. Conf. on Spoken Language Processing, 2000.
    [20] C. J. Leggetter and P. C. Woodland, “Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models,” Computer Speech and Language, vol. 9, pp. 171-185, 1995.
    [21] N. Merhav and Chin-Hui Lee, “A minimax classification approach with application to robust speech recognition,” IEEE Trans. on Speech and Audio Processing, vol. 1, no. 1, pp. 90-100, 1993.
    [22] Hermann Ney, Dieter Mergel, Andreas Noll, and Annedore Paeseler, “Data driven search organization for continuous speech recognition,” IEEE Trans. on Signal Processing, vol. 40, no. 2, 1992.
    [23] Hermann Ney, “Search strategies for large-vocabulary continuous speech recognition,” in Speech Recognition and Coding: New Advances and Trends, pp. 210-225, 1993.
    [24] H. Ney, “The use of a one-stage dynamic programming algorithm for connected word recognition,” IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-32, pp. 263-271, 1984.
    [25] H. Ney, R. Haeb-Umbach, B.-H. Tran, and M. Oerder, “Improvements in beam search for 10000-word continuous speech recognition,” in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, pp. 13-16, 1992.
    [26] H. Ney, D. Mergel, A. Noll, and A. Paeseler, “Data-driven organization of the dynamic programming beam search for continuous speech recognition,” IEEE Trans. Signal Processing, vol. 40, pp. 272-281, 1992.
    [27] H. Ney and S. Ortmanns, “Extensions to the word graph method for large vocabulary continuous speech recognition,” in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, vol. 3, pp. 1787-1790, 1997.
    [28] Hermann Ney and Stefan Ortmanns, “Progress in dynamic programming search for LVCSR,” Proceedings of the IEEE, vol. 88, no. 8, pp. 1224-1240, 2000.
    [29] M. Oerder and H. Ney, “Word graphs: an efficient interface between continuous speech recognition and language understanding,” in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, pp. 119-122, 1993.
    [30] S. Ortmanns, H. Ney, and X. Aubert, “A word graph algorithm for large vocabulary continuous speech recognition,” Computer Speech and Language, vol. 11, pp. 43-72, 1997.
    [31] S. Ortmanns, H. Ney, F. Seide, and I. Lindam, “A comparison of time conditioned and word conditioned search techniques for large vocabulary speech recognition,” in Int. Conf. Spoken Language Processing, pp. 2091-2094, 1996.
    [32] L. R. Rabiner and B.H. Juang, Fundamentals of Speech Recognition, Prentice Hall, 1993.
    [33] D. B. Paul, “An efficient A* stack decoder algorithm for continuous speech recognition with a stochastic language model,” in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, vol. 1, pp. 25-28, 1992.
    [34] D. B. Paul, “Algorithms for an optimal A* search and linearizing the search in the stack decoder,” in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing, pp. 693-696, 1991.
    [35] R. Schwartz and Y. L. Chow, “The N-Best algorithm: an efficient and exact procedure for finding the N most likely sentence hypotheses,” in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, vol. 1, pp. 81-84, 1990.
    [36] R. Schwartz and S. Austin, “A comparison of several approximate algorithms for finding multiple (N-Best) sentence hypotheses,” in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, vol. 1, pp. 701-704, 1991.
    [37] V. Steinbiss, H. Ney, R. Haeb-Umbach, B.-H. Tran, U. Essen, and R. Kneser et al., “The Philips research system for large-vocabulary continuous-speech recognition,” in Proc. European Conf. Speech Communication and Technology, pp. 2125-2128, 1993.
    [38] V. Steinbiss, B.-H. Tran, and H. Ney, “Improvements in beam search,” in Int. Conf. Spoken Language Processing, pp. 1355-1358, 1994.
    [39] A. Stolcke, Y. König and M. Weintraub, “Explicit word error rate minimization in N-best list rescoring,” in Proc. European Conference on Speech Communication and Technology, Rhodes, Greece, pp. 163-166, 1997.
    [40] F. Wessel, R. Schlüter and Hermann Ney, “Explicit word error minimization using word hypothesis posterior probabilities,” in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, vol. 1, pp. 33-36, 2001.

    Full text available on campus: 2006-08-04
    Full text available off campus: 2006-08-04