| Graduate student: | 林俊郁 Lin, Chun-Yu |
|---|---|
| Thesis title: | 應用事前模型與環境調適於隨機向量映射為基礎之噪音語音辨識 Using Prior Model and Environment Adaptation for Stochastic Vector Mapping-Based Noisy Speech Recognition |
| Advisor: | 吳宗憲 Wu, Chung-Hsien |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering |
| Year of publication: | 2006 |
| Academic year of graduation: | 94 |
| Language: | Chinese |
| Number of pages: | 62 |
| Chinese keywords: | 噪音 (noise), 語音辨識 (speech recognition), 環境 (environment) |
| Foreign-language keywords: | speech recognition, noisy environment, noisy speech, recognition, speech, noise |
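With good training, current speech recognition systems can reach quite high recognition rates (above 95%). When they are used in an environment that is not very quiet, however, the recognition rate drops to some degree, which makes them troublesome to use. We therefore aim to design an automatic speech recognition system that tolerates noisy environments.
This thesis comprises three procedures: a training procedure, a model adaptation procedure, and a testing procedure.
In the training procedure, we first build three acoustic models: one for noisy speech, one for clean speech, and one for the noisy environment. We then estimate the noise in the training corpus, in preparation for the environment model and the noise normalization used later. We also use the concept of maximum a posteriori probability to adjust our trained models.
Noise feature processing covers the methods we apply to analyze noise and the way we handle clean speech that has been corrupted by noise. First, we apply sequential expectation maximization (Sequential-EM) to estimate the noise and to collect noise normalization information. Next, we use minimum classification error (MCE) training together with the idea of stochastic vector mapping to perform acoustic model compensation and speech enhancement on noisy speech. Finally, we use maximum a posteriori (MAP) estimation for model adaptation; adapting the model with the test data helps make our prior noise model more complete.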
Recent speech recognition systems have reached very high recognition rates (more than 95%). However, the recognition rate may degrade significantly in a noisy environment, which makes such systems inconvenient to use. We therefore aim to design an ASR system that tolerates noisy environments.
This thesis comprises three phases: a training phase, an adaptation phase, and a test phase.
In the training phase, we first build three prior models: the noise model, the clean speech model, and the noisy speech model. We then estimate the noise from the training data, which later steps use for the environment models and for noise normalization.
For noisy speech features, this thesis describes how we analyze the noise and how we cope with noisy speech. For noise analysis, we apply Sequential-EM to estimate the noise and to collect noise normalization information. We then use environment-compensated minimum classification error (MCE) training to perform model compensation and speech enhancement. Finally, we apply maximum a posteriori (MAP) estimation, using the test data for model adaptation, to make the prior model more complete.
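The MAP adaptation step can be illustrated with the standard MAP update for a single Gaussian mean, which interpolates between the prior mean and the adaptation data. This is a minimal sketch under stated assumptions, not the implementation used in the thesis: the function name `map_adapt_mean`, the prior weight `tau`, and its default value are hypothetical, and the abstract does not say which model parameters are adapted.

```python
from typing import List, Optional, Sequence


def map_adapt_mean(
    prior_mean: Sequence[float],
    frames: Sequence[Sequence[float]],
    tau: float = 10.0,
    posteriors: Optional[Sequence[float]] = None,
) -> List[float]:
    """MAP re-estimate of one Gaussian mean vector (illustrative only).

    prior_mean : mean taken from the prior (training-time) model
    frames     : adaptation feature vectors, one per frame
    tau        : prior weight (hypothetical default); larger values
                 keep the estimate closer to the prior mean
    posteriors : per-frame occupation probabilities for this Gaussian;
                 defaults to 1.0 per frame (single-Gaussian case)
    """
    if tau <= 0:
        raise ValueError("tau must be positive")
    if posteriors is None:
        posteriors = [1.0] * len(frames)
    if len(posteriors) != len(frames):
        raise ValueError("need one posterior per frame")

    dim = len(prior_mean)
    weighted_sum = [0.0] * dim  # sum over frames of gamma_t * x_t
    occupancy = 0.0             # sum over frames of gamma_t
    for gamma, x in zip(posteriors, frames):
        if len(x) != dim:
            raise ValueError("frame dimension mismatch")
        occupancy += gamma
        for d in range(dim):
            weighted_sum[d] += gamma * x[d]

    # With little data the result stays near the prior mean; with much
    # data it approaches the weighted sample mean of the frames.
    return [
        (tau * prior_mean[d] + weighted_sum[d]) / (tau + occupancy)
        for d in range(dim)
    ]
```

With no adaptation frames the function returns the prior mean unchanged, and as the number of frames grows the estimate moves toward the sample mean, which is the behavior that lets test data refine a prior model.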