
Graduate student: Lin, Chun-Yu (林俊郁)
Thesis title: Using Prior Model and Environment Adaptation for Stochastic Vector Mapping-Based Noisy Speech Recognition (應用事前模型與環境調適於隨機向量映射為基礎之噪音語音辨識)
Advisor: Wu, Chung-Hsien (吳宗憲)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering
Year of publication: 2006
Academic year of graduation: 94 (ROC calendar)
Language: Chinese
Number of pages: 62
Keywords (Chinese, translated): noise, speech recognition, environment
Keywords (English): speech recognition, noisy environment, noisy speech, recognition, speech, noise
    Current speech recognition systems, when well trained, reach recognition rates above 95%. In environments that are not quiet, however, the recognition rate drops noticeably and the system becomes inconvenient to use. This thesis therefore aims to design an automatic speech recognition (ASR) system that tolerates noisy environments.
    The proposed approach has three phases: a training phase, a model adaptation phase, and a test phase.
    In the training phase, three prior acoustic models are first built: a noise model, a clean speech model, and a noisy speech model. Noise is then estimated from the training data to prepare the environment models and the noise normalization used in later steps. The maximum a posteriori (MAP) criterion is used to adjust the trained models.
    The processing of noise features covers how the noise is analyzed and how noise-corrupted speech is handled. First, sequential expectation-maximization (Sequential-EM) is applied to estimate the noise and to collect noise normalization information. Next, environment-compensated minimum classification error (MCE) training, based on stochastic vector mapping, is used for acoustic model compensation and speech enhancement of the noisy speech. Finally, MAP estimation is used for model adaptation; adapting on the test data helps make the prior noise model more complete.
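    The MAP adaptation step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard MAP re-estimate of a single Gaussian mean under a conjugate prior, in which the prior mean and the adaptation data are interpolated according to how much data the component has seen. The function name `map_adapt_mean`, the prior weight `tau`, and the sample values are hypothetical and are not taken from the thesis.

```python
from typing import List, Sequence


def map_adapt_mean(prior_mean: Sequence[float],
                   frames: Sequence[Sequence[float]],
                   posteriors: Sequence[float],
                   tau: float = 10.0) -> List[float]:
    """MAP re-estimate of one Gaussian mean (illustrative sketch only).

    prior_mean : mean of the prior model for this Gaussian component
    frames     : adaptation feature vectors, one per frame
    posteriors : occupation probability of this component for each frame
    tau        : prior weight (assumed value); larger tau trusts the prior more
    """
    if len(frames) != len(posteriors):
        raise ValueError("frames and posteriors must have the same length")
    if tau <= 0:
        raise ValueError("tau must be positive")

    dim = len(prior_mean)
    occupancy = sum(posteriors)      # soft count of frames for this component
    weighted_sum = [0.0] * dim       # posterior-weighted sum of the frames
    for frame, gamma in zip(frames, posteriors):
        for d in range(dim):
            weighted_sum[d] += gamma * frame[d]

    # Interpolate between the prior mean and the data according to the counts.
    return [(tau * prior_mean[d] + weighted_sum[d]) / (tau + occupancy)
            for d in range(dim)]


if __name__ == "__main__":
    adapted = map_adapt_mean(prior_mean=[0.0, 0.0],
                             frames=[[2.0, 4.0]] * 10,
                             posteriors=[1.0] * 10,
                             tau=10.0)
    print(adapted)
```

    With no adaptation frames the function returns the prior mean unchanged, and as the occupancy grows the estimate moves toward the data average, which is the behavior that makes MAP adaptation useful when only a small amount of test-environment data is available.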

    Abstract (Chinese)
    Abstract (English)
    Acknowledgments
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1 Introduction
      1.1 Research Motivation and Objectives
      1.2 Overview of the Approach
      1.3 Chapter Outline
    Chapter 2 Speech Recognition in Noisy Environments and Previous Work
      2.1 Noise and Speech Recognition
      2.2 Noise-Resistant Features and Similarity
        2.2.1 Acoustic Representation and Similarity
        2.2.2 Linear Discriminant Analysis (LDA)
      2.3 Speech Enhancement
      2.4 Noise Model Compensation
      2.5 Analysis and Discussion
    Chapter 3 Stochastic Vector Mapping and the Prior Model
      3.1 Basic Mathematical Concepts of Noisy Speech
      3.2 Stochastic Vector Mapping
      3.3 Prior Model
    Chapter 4 Processing of Noise Features
      4.1 Noise Estimation and Noise Normalization Using Sequential EM
      4.2 Minimum Classification Error for the Environment-Compensated Model
      4.3 Environment Model Adaptation Using Maximum a Posteriori
        4.3.1 Choice of Prior Density
        4.3.2 MAP Estimation of Gaussian Mixtures
    Chapter 5 Experiments
      5.1 Experimental Corpora
        5.1.1 AURORA 2
        5.1.2 MATBN
      5.2 Experimental Procedure
      5.3 Experimental Results
        5.3.1 AURORA 2 Results
        5.3.2 MATBN Results
    Chapter 6 Conclusions and Future Work
    References


    Full-text availability: on campus, open immediately; off campus, open from 2006-01-19