
Author: 王奕凱 (Wang, Yii-Kai)
Thesis title: 聚集事後機率線性迴歸調適法應用於語音辨識 (Aggregate A Posteriori Linear Regression for Speech Recognition)
Advisor: 簡仁宗 (Chien, Jen-Tzung)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering
Year of publication: 2004
Academic year of graduation: 92 (ROC calendar)
Language: Chinese
Number of pages: 97
Keywords: Hidden Markov Model, Speech Recognition, discriminant training, speaker adaptation
This thesis proposes a discriminative adaptation algorithm for linear transformation matrices based on the aggregate a posteriori probability. Discriminative training has traditionally yielded more accurate model parameters than estimation under the maximum likelihood criterion. It considers not only the likelihood of an observation under its own speech class but also the likelihood ratio between that class and its competing classes, so the estimated parameters improve the discrimination among classes. Its drawback is a longer training time, because the parameters can generally be estimated only iteratively by gradient descent.

Among model adaptation algorithms based on linear transformation matrices, the earliest is maximum likelihood linear regression (MLLR). Because one transformation matrix adapts all speech model parameters assigned to the same regression class, MLLR outperforms maximum a posteriori (MAP) adaptation when the adaptation data are too sparse to adapt every model parameter. To make the estimation of the transformation matrix more robust, a prior distribution over the matrix was later introduced. This led to maximum a posteriori linear regression (MAPLR) and to quasi-Bayes linear regression (QBLR), the latter of which supports online adaptation. More recently, motivated by the strong performance of discriminative training, the transformation matrix has been estimated under a discriminative criterion, giving minimum classification error linear regression (MCELR).

We argue that if the prior distribution of the regression matrix is also taken into account when the minimum classification error criterion is used for linear regression adaptation, the robustness of the Bayesian approach can be combined with the discrimination of the minimum classification error criterion to estimate a better transformation matrix for speaker adaptation. Using the relation between the aggregate a posteriori probability and discriminative training, together with suitable simplifying conditions, we obtain a closed-form solution for the parameter update, which accelerates discriminative parameter estimation. We also compare the proposed algorithm with MAPLR from a theoretical standpoint.

In the experiments, the TCC300 corpus is used to train the speaker-independent (SI) acoustic models and to estimate the parameters of the prior distribution of the regression matrix. For adaptation and testing, we use TV broadcast news speech recorded by the Public Television Service Foundation (PTS). By varying the amount of adaptation data, we evaluate the robustness of the transformation matrix estimation and compare the adaptation performance with that of other regression matrix adaptation algorithms, including MLLR, MAPLR, QBLR and MCELR.
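
The parameter sharing that the abstract attributes to MLLR can be illustrated with a small sketch. It assumes the usual MLLR mean update, in which an adapted mean is A·mu + b, written as W·[1, mu] with W = [b A]. The matrix values, state names, and function names below are hypothetical and chosen only for illustration; the code does not estimate W and is not taken from the thesis.

```python
# Illustrative sketch of an MLLR-style mean update (not the thesis code).
# All names and numbers here are hypothetical.

def transform_mean(W, mean):
    """Return W applied to the extended mean [1, mu], i.e. A*mu + b with W = [b A]."""
    extended = [1.0] + list(mean)  # extended mean vector xi = [1, mu]
    return [sum(w * x for w, x in zip(row, extended)) for row in W]

def adapt_regression_class(W, means):
    """Apply one shared regression matrix to every Gaussian mean in a class."""
    return {name: transform_mean(W, mu) for name, mu in means.items()}

# One 2 x 3 regression matrix W = [b A] shared by a whole regression class.
W = [
    [0.5, 1.0, 0.0],   # bias 0.5, then the first row of A
    [-1.0, 0.0, 2.0],  # bias -1.0, then the second row of A
]

# Two Gaussian means tied to the same class. Both are updated by W,
# even if adaptation data were observed for only one of them.
means = {"state_a": [1.0, 2.0], "state_b": [3.0, -1.0]}

adapted = adapt_regression_class(W, means)
print(adapted)
```

Because every mean in the class goes through the same W, a matrix estimated from a few utterances still moves parameters that never appeared in the adaptation data. That is the situation in which the abstract says MLLR is preferable to MAP adaptation.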

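Table of Contents
List of Figures
List of Tables
List of Symbols
Chapter 1 Introduction
  1.1 Foreword
  1.2 Research Motivation
  1.3 Chapter Overview
Chapter 2 Introduction to Discriminative Training
  2.1 Foreword
  2.2 Minimum Classification Error Training Criterion
    2.2.1 String-Form (Sentence-level) MCE Estimation Criterion
  2.3 Maximum Mutual Information Training Criterion
    2.3.1 Minimum Classification Error and Maximum Mutual Information
  2.4 Generalized Minimum Error Rate (GMER)
  2.5 Relation Between Minimum Classification Error and the Aggregate A Posteriori Criterion
Chapter 3 Introduction to Linear Regression Speaker Adaptation
  3.1 Foreword
  3.2 Maximum Likelihood Linear Regression (MLLR)
  3.2 Maximum A Posteriori Linear Regression (MAPLR)
    3.2.1 Prior Probability
    3.2.2 Maximum A Posteriori Parameter Estimation
    3.2.3 Hyperparameter Estimation
  3.3 Quasi-Bayes Linear Regression (QBLR)
  3.4 Minimum Classification Error Linear Regression (MCELR)
Chapter 4 Applying Generalized Minimum Error Rate to Model Adaptation
  4.1 Foreword
  4.2 Generalized Minimum Error Rate Linear Regression (GMERLR)
  4.3 Relation and Comparison Between Maximum A Posteriori Linear Regression and the Aggregate A Posteriori Probability
  4.4 Parameter Estimation for Minimum Error Rate Linear Regression
Chapter 5 Experiments
  5.1 Experimental Setup
    5.1.1 Speech Corpora
    5.1.2 Experimental Procedure
    5.1.3 Experimental Results
  5.2 Demonstration System
Chapter 6 Conclusions and Future Work
References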


Full-text availability: on campus, open immediately; off campus, open from 2004-08-10.