| Graduate student: | 蔡明翰 Tsai, Ming-Han |
|---|---|
| Thesis title: | 應用訊號偏倚移除法於雜訊環境下語音辨識 Using Signal Bias Removal Method for Speech Recognition under Noisy Environment |
| Advisor: | 吳植森 Wu, Chih-Sen |
| Degree: | 碩士 Master |
| Department: | 管理學院 - 資訊管理研究所 Institute of Information Management |
| Year of publication: | 2006 |
| Academic year of graduation: | 94 |
| Language: | Chinese |
| Number of pages: | 63 |
| Keywords (Chinese): | 語音辨識 (speech recognition), 隱藏式馬可夫模型 (hidden Markov model), 訊號偏倚移除 (signal bias removal), 雜訊環境 (noisy environment) |
| Keywords (English): | additive noise, Speech Recognition, Signal Bias Removal, Hidden Markov Model |
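In recent years, speech recognition technology has been applied in many fields. From a management perspective, speech strengthens the channel of communication between people and computers, so speech-based human-computer interfaces have become one of the most important current research topics. A speech recognition system usually applies parameters trained in a quiet environment to a real environment. If the real environment is also quiet, the recognition rate is good; in a noisy environment, however, the mismatch between the noisy speech and the model parameters causes the recognition rate to drop markedly. The main purpose of this study is to apply Signal Bias Removal when recognizing noisy speech: the bias introduced by the noise is estimated and then removed from the speech parameters, reducing the mismatch between the test speech parameters and the original speech models and thereby raising recognition accuracy.
This study draws on machine learning theory and statistical methods. Clean speech data first undergo digital signal preprocessing, such as digitization of the waveform, framing, endpoint detection, pre-emphasis, windowing, and mel cepstrum computation. The resulting feature parameters are used to train clean-speech models with hidden Markov models. For test speech, the hidden Markov model parameters are first adjusted for each type of noisy speech to reduce the mismatch and achieve better recognition results.
Several types of noise were selected for the Signal Bias Removal experiments: white noise, babble noise, in-car noise, high-pass channel noise, and factory noise, each tested at different signal-to-noise ratios (SNRs). Overall, the method did improve recognition accuracy. Because Signal Bias Removal does not require retraining the speech models, it is also efficient to run compared with methods that need to retrain or adapt the models.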
In recent years, speech recognition technology has been applied in many areas, particularly in management, where speech strengthens communication between people and computers. Speech-based human-computer interfaces have therefore become one of the most important research topics. Speech recognition systems usually use parameters trained on noise-free samples, so in a noisy environment additive noise degrades recognition results. To solve this problem, extra processing is required to improve recognition accuracy. In this study we use the Signal Bias Removal (SBR) method to estimate the bias introduced by noise and then remove it from the speech parameters, minimizing its undesirable effects.
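The estimate-then-subtract idea can be illustrated with a minimal sketch. It assumes a simplified codebook-based variant of bias removal: each frame is matched to its nearest clean-speech centroid, and the bias is the average residual, refined over a few iterations. The function names, the `codebook` argument, and the iteration count are illustrative assumptions, not details taken from the thesis.

```python
import numpy as np

def estimate_bias(features, codebook, n_iter=5):
    """Estimate one additive bias vector for an utterance.

    features: (T, D) array of feature vectors from the test utterance.
    codebook: (K, D) array of centroids assumed to come from clean speech.
    """
    features = np.asarray(features, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    bias = np.zeros(features.shape[1])
    for _ in range(n_iter):
        compensated = features - bias
        # Squared Euclidean distance from every frame to every centroid: (T, K).
        diffs = compensated[:, None, :] - codebook[None, :, :]
        dists = (diffs ** 2).sum(axis=2)
        nearest = codebook[dists.argmin(axis=1)]
        # The bias is the average offset between the frames and their centroids.
        bias = (features - nearest).mean(axis=0)
    return bias

def remove_bias(features, codebook, n_iter=5):
    """Subtract the estimated bias from every frame of the utterance."""
    features = np.asarray(features, dtype=float)
    return features - estimate_bias(features, codebook, n_iter)
```

Because the codebook is fixed, a loop of this kind runs only on the test utterance and leaves the trained models untouched.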
This study also draws on machine learning and statistical methods. Clean speech data pass through a series of feature-analysis steps: digitization, framing, endpoint detection, pre-emphasis, Hamming windowing, and mel-frequency cepstral coefficient (MFCC) extraction. The resulting feature parameters are used to train Hidden Markov Models. During the recognition phase, an estimate of the bias is computed for each test utterance and subtracted from it. This reduces the mismatch and yields better recognition accuracy.
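Three of the front-end steps named above (pre-emphasis, framing, and Hamming windowing) can be sketched as follows. The coefficient 0.97 and the 400-sample frame with a 160-sample hop (25 ms and 10 ms at 16 kHz) are common defaults assumed here for illustration; the abstract does not state the values used in the thesis, and endpoint detection and MFCC computation are omitted.

```python
import numpy as np

def preemphasize(signal, coeff=0.97):
    """Apply the first-order high-pass filter y[n] = x[n] - coeff * x[n-1]."""
    signal = np.asarray(signal, dtype=float)
    return np.append(signal[0], signal[1:] - coeff * signal[:-1])

def frame_signal(signal, frame_len, hop_len):
    """Split a 1-D signal into overlapping frames, dropping any partial tail."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    # Row i holds the sample indices of frame i.
    idx = np.arange(frame_len)[None, :] + hop_len * np.arange(n_frames)[:, None]
    return signal[idx]

def windowed_frames(signal, frame_len=400, hop_len=160, coeff=0.97):
    """Pre-emphasize, frame, and taper each frame with a Hamming window."""
    frames = frame_signal(preemphasize(signal, coeff), frame_len, hop_len)
    return frames * np.hamming(frame_len)
```

Each row of the result is one windowed frame, ready for a spectral transform such as the one that precedes mel filtering.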
For the Signal Bias Removal experiments, we chose several types of noise (white noise, babble noise, car noise, high-frequency noise, and factory noise) at different signal-to-noise ratios (SNRs). The experimental results show an improvement in recognition accuracy. One advantage of the SBR method over other methods is that it can be employed during the testing phase alone.
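Test material at a chosen SNR is commonly produced by scaling a noise recording before adding it to clean speech. The sketch below shows that standard construction; whether the thesis prepared its noisy speech this way is an assumption, and `mix_at_snr` is a hypothetical helper name.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech so that the mixture has the requested SNR in dB."""
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)[:len(speech)]
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # Choose the gain so that speech_power / (gain**2 * noise_power)
    # equals 10 ** (snr_db / 10).
    gain = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + gain * noise
```

Calling the function with several `snr_db` values on the same clean utterance gives a matched set of test conditions for each noise type.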