
Author: Chao, Tzu-Hsien (趙子賢)
Thesis title: LVCSR Search Algorithm Using Reliable Change Point Detection (具信賴性斷點偵測之語音搜尋演算法)
Advisor: Chien, Jen-Tzung (簡仁宗)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication: 2006
Academic year of graduation: 94 (ROC calendar)
Language: Chinese
Number of pages: 95
Keywords: Run Test, Vibration, Non-parametric Statistics, Change Point Detection, Large Vocabulary Continuous Speech Recognition (LVCSR)
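  • Automatic speech recognition (ASR) based on dynamic programming and hidden Markov models (HMMs) is the core technology of current large vocabulary continuous speech recognition (LVCSR) systems. However, dynamic programming search over continuous speech signals faces many difficulties, including unstable change point detection caused by environmental noise and by co-articulation in continuous speech. Providing fast and reliable change point detection while improving recognition accuracy has therefore become a key research topic. Traditionally, using the likelihood ratio as a confidence measure can improve the reliability of change point detection. In spontaneous speech, however, co-articulation is so severe that the likelihood ratio becomes ambiguous and vibrates at the boundaries between different acoustic units. This distorts the alignment between speech data and acoustic models, makes the training of speech parameters imprecise, and degrades recognition results. Although change point detection methods for speech exist in the literature, they usually require an empirical threshold for the detection decision and do not directly address the vibration problem at unit boundaries.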

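    In this thesis, we propose using the run test, a statistical test of whether a sequence of values is random, as the basis for change point detection. The run test can effectively examine the randomness of the vibrating states and find the best change point. Combining the run test with the theory of likelihood ratio testing between acoustic models, we develop a novel search algorithm for speech recognition. In the experiments, we use the TDT2 broadcast news corpus to evaluate the performance of this method on Mandarin LVCSR.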

    State-of-the-art automatic speech recognition (ASR) systems are built on dynamic programming and hidden Markov models (HMMs). Several issues are crucial to achieving good ASR performance. Among them, reliably detecting change points in continuous speech under strong co-articulation and in distorted environments plays a critical role. In the literature, likelihood ratio (LR) based confidence measures were developed to improve detection performance. The LR criterion can be used to accept or reject the alignment between speech frames and acoustic models or units. For spontaneous speech, however, the probabilistic scores in some intervals vibrate and become ambiguous, which makes alignment unreliable during the search process in large vocabulary continuous speech recognition (LVCSR). Earlier methods detected change points at the HMM state level, but they require an empirically specified detection threshold and do not directly address the vibration problem at the boundaries of speech units.
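As a rough illustration of the LR criterion mentioned above, the sketch below scores one speech frame against two competing acoustic models and accepts the alignment when the log-likelihood ratio reaches a threshold. It is not taken from the thesis: a single diagonal-covariance Gaussian stands in for each HMM state density, and the names `gaussian_loglik`, `log_likelihood_ratio`, `accept_alignment`, and the default `threshold` are all hypothetical.

```python
import math

def gaussian_loglik(frame, mean, var):
    """Log-likelihood of a feature vector under a diagonal Gaussian.

    Assumption: one Gaussian per acoustic unit; real systems typically
    use mixtures inside HMM states.
    """
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(frame, mean, var)
    )

def log_likelihood_ratio(frame, current, alternative):
    """Log LR between the currently aligned model and a competing model.

    Each model is a (mean, var) pair of equal-length sequences.
    """
    return gaussian_loglik(frame, *current) - gaussian_loglik(frame, *alternative)

def accept_alignment(frame, current, alternative, threshold=0.0):
    """Accept the frame-to-model alignment if the log LR reaches the threshold.

    The threshold is a hypothetical, empirically chosen value.
    """
    return log_likelihood_ratio(frame, current, alternative) >= threshold

# Two hypothetical one-dimensional unit models.
unit_a = ([0.0], [1.0])
unit_b = ([2.0], [1.0])
print(log_likelihood_ratio([0.0], unit_a, unit_b))
print(accept_alignment([2.0], unit_a, unit_b))
```

A frame near a unit boundary gives a log LR close to zero, so the sign of the decision can flip from frame to frame. That flipping is the kind of vibration the abstract describes, and the fixed `threshold` is the sort of empirical setting it criticizes.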

    In this thesis, we present a run test approach that tests the randomness of the decision states derived from the probabilistic scores of the observed speech sequence. A non-parametric statistic is calculated and used to determine the optimal change point, the one with the best randomness for the states before and after it. By combining this principle with the LR criterion, we sequentially detect change points and build an LVCSR search algorithm. In the experiments, we implement and evaluate the approach on the TDT2 Mandarin broadcast news corpus.
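To make the run test concrete, here is a minimal sketch of the standard one-sample Wald-Wolfowitz runs test with its normal approximation, applied to a binary sequence of decision states. The abstract does not say how the thesis binarizes the scores or how it selects the change point from the statistic, so the function name `runs_test` and the example sequences (which could be read as the sign of a per-frame log-likelihood ratio) are assumptions for illustration only.

```python
import math

def runs_test(states):
    """One-sample Wald-Wolfowitz runs test on a binary sequence.

    Returns (runs, z): the number of runs and the z statistic under
    the normal approximation. A strongly negative z means too few runs
    (long stable stretches); a strongly positive z means too many runs
    (rapid alternation).
    """
    bits = [bool(s) for s in states]
    n1 = sum(bits)                 # count of True states
    n2 = len(bits) - n1            # count of False states
    if n1 == 0 or n2 == 0:
        raise ValueError("the sequence must contain both states")
    # A new run starts wherever two neighbouring states differ.
    runs = 1 + sum(1 for a, b in zip(bits, bits[1:]) if a != b)
    n = n1 + n2
    mean = 2.0 * n1 * n2 / n + 1.0
    var = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n * n * (n - 1.0))
    if var <= 0.0:
        raise ValueError("the sequence is too short for the normal approximation")
    return runs, (runs - mean) / math.sqrt(var)

# Hypothetical decision states around a boundary between two units.
clean_switch = [1, 1, 1, 1, 0, 0, 0, 0]   # one clear change point
vibrating = [1, 0, 1, 0, 1, 0, 1, 0]      # states flip on every frame
print(runs_test(clean_switch))
print(runs_test(vibrating))
```

For these two sequences the statistic is roughly -2.29 for the clean switch (2 runs) and +2.29 for the vibrating one (8 runs), so the test separates the two patterns without a hand-tuned score threshold. For sequences this short the normal approximation is crude; exact tables of the run distribution would be the safer choice.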

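    Abstract (Chinese)
    Abstract
    Acknowledgements
    Table of Contents
    List of Tables
    List of Figures
    List of Symbols
    Chapter 1 Introduction
        1.1 Research Motivation
        1.2 Speech Problems and Characteristics
        1.3 Method and Main Results
        1.4 Chapter Outline
    Chapter 2 Related Work
        2.1 Search Techniques for Large Vocabulary Continuous Speech Recognition
            2.1.1 One-Pass Search Algorithm
        2.2 Word Graph
        2.3 Word Graph Pruning and Rescoring
        2.4 Change Point Detection
        2.5 Model Selection Algorithms
            2.5.1 Bayesian Information Criterion (BIC)
            2.5.2 Minimum Description Length (MDL)
    Chapter 3 Reliable Change Point Detection
        3.1 Vibration Phenomenon
        3.2 Likelihood Ratio Test
        3.3 Run Test
        3.4 Incorporation into Large Vocabulary Continuous Speech Recognition
        3.5 Combined Algorithm
        3.6 Hierarchical Change Point Detection
        3.7 Comparison with BIC and MDL
    Chapter 4 Experiments
        4.1 Mandarin Large Vocabulary Continuous Speech Recognition System
        4.2 Description of the Experimental Corpus
        4.3 Experimental Setup and Feature Extraction
        4.4 Experimental Results
        4.5 System Demonstration
    Chapter 5 Conclusions and Future Work
        5.1 Conclusions
        5.2 Future Work
    References
    Appendix
    Author Biography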


    Full-text availability: on campus, immediately; off campus, from 2006-09-06.