
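Graduate student: 蔡明翰
Tsai, Ming-Han
Thesis title: 應用訊號偏倚移除法於雜訊環境下語音辨識
Using Signal Bias Removal Method for Speech Recognition under Noisy Environment
Advisor: 吳植森
Wu, Chih-Sen
Degree: Master
Department: College of Management - Institute of Information Management
Year of publication: 2006
Academic year of graduation: 94 (ROC calendar)
Language: Chinese
Number of pages: 63
Chinese keywords: 語音辨識 (speech recognition)、隱藏式馬可夫模型 (hidden Markov model)、訊號偏倚移除 (signal bias removal)、雜訊環境 (noisy environment)
English keywords: additive noise, Speech Recognition, Signal Bias Removal, Hidden Markov Model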
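    In recent years, speech recognition technology has been applied in many areas. From a management perspective, speech strengthens the channel of communication between people and computers, so speech-based human-computer interfaces have become one of the most important current research topics. Speech recognition systems usually apply parameters trained in a quiet environment to real environments. If the real environment is also quiet, recognition accuracy is good; in a noisy environment, however, the mismatch between the noisy speech and the model parameters causes recognition accuracy to drop markedly. The main purpose of this study is to examine how Signal Bias Removal can be applied when recognizing noisy speech: the bias introduced by the noise is estimated and then removed from the speech parameters, which reduces the mismatch between the test speech parameters and the original speech models and thereby raises recognition accuracy.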

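    This study draws on machine learning theory and statistical methods. Clean speech data first undergo digital signal pre-processing, such as digitization of the sound wave, framing, endpoint detection, pre-emphasis, windowing, and Mel-frequency cepstral analysis. The resulting feature parameters are used to train Hidden Markov Models of clean speech. For test speech, the Hidden Markov Model parameters are first adjusted for each type of noisy speech to reduce the mismatch and achieve better recognition results.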
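    Three of the pre-processing steps named above (pre-emphasis, framing, and windowing) can be sketched in a few lines. This is only an illustration under assumed settings: the record does not give the thesis's actual parameters, so the pre-emphasis coefficient 0.97, the frame length 256, the hop 128, the choice of a Hamming window, and all function names are hypothetical.

```python
import math


def pre_emphasize(signal, alpha=0.97):
    """Apply y[n] = x[n] - alpha * x[n-1]; alpha=0.97 is an assumed, commonly used value."""
    if not signal:
        return []
    return [signal[0]] + [signal[n] - alpha * signal[n - 1]
                          for n in range(1, len(signal))]


def split_frames(signal, frame_len, hop):
    """Cut the signal into overlapping frames; samples that do not fill a last frame are dropped."""
    return [signal[start:start + frame_len]
            for start in range(0, len(signal) - frame_len + 1, hop)]


def hamming(n_points):
    """Return the symmetric Hamming window w[n] = 0.54 - 0.46 * cos(2*pi*n / (N - 1))."""
    if n_points == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_points - 1))
            for n in range(n_points)]


def windowed_frames(signal, frame_len=256, hop=128, alpha=0.97):
    """Pre-emphasize, frame, and window a signal (frame_len and hop are assumed values)."""
    window = hamming(frame_len)
    emphasized = pre_emphasize(signal, alpha)
    return [[x * w for x, w in zip(frame, window)]
            for frame in split_frames(emphasized, frame_len, hop)]
```

    Each windowed frame would then go on to cepstral analysis; endpoint detection and the Mel-frequency cepstrum itself are left out of this sketch.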

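    For the signal bias removal experiments, several types of noise were selected: white noise, babble noise, in-car noise, high-pass channel noise, and factory noise, each tested at different signal-to-noise ratios (SNR). Overall, the method did improve recognition accuracy. Because signal bias removal does not require retraining the speech models, it is also reasonably efficient to run compared with methods that must retrain or adjust the models.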

    In recent years, speech recognition technology has been applied in many settings, including management communication. Speech can make communication between people and computers more effective, so speech recognition interfaces are one of the most important research areas. Speech recognition systems usually use parameters trained on noise-free samples. In a noisy environment, additive noise therefore degrades recognition results, and extra processing is required to restore accuracy. In this study we use the Signal Bias Removal (SBR) method to estimate the bias introduced by noise and then remove it from the speech parameters to minimize these undesirable effects.

    In this study, we also use machine learning and statistical methods for speech recognition. Clean speech data pass through a series of feature-analysis steps: digitization, framing, endpoint detection, pre-emphasis, Hamming windowing, and MFCC extraction. We then obtain speech model parameters by training Hidden Markov Models on the resulting features. During the recognition phase, an estimate of the bias is computed for each test utterance and subtracted from it. This approach minimizes the mismatch and yields better recognition accuracy.
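    The per-utterance bias estimate and subtraction can be illustrated with a codebook-based sketch. It is not taken from the thesis: it assumes a small codebook of clean-speech feature centroids is available, models the bias as one additive vector per utterance, and alternates between assigning each frame to its nearest codeword and averaging the residuals. The function names and the iteration count are hypothetical.

```python
def nearest_codeword(codebook, vec):
    """Return the codebook entry with the smallest squared Euclidean distance to vec."""
    return min(codebook,
               key=lambda code: sum((v - c) ** 2 for v, c in zip(vec, code)))


def estimate_bias(frames, codebook, iterations=3):
    """Estimate one additive bias vector for an utterance.

    Each pass removes the current bias estimate, assigns every frame to its
    nearest clean-speech codeword, and re-estimates the bias as the mean
    difference between the observed frames and their assigned codewords.
    """
    dim = len(frames[0])
    bias = [0.0] * dim
    for _ in range(iterations):
        residual_sum = [0.0] * dim
        for frame in frames:
            compensated = [x - b for x, b in zip(frame, bias)]
            code = nearest_codeword(codebook, compensated)
            for d in range(dim):
                residual_sum[d] += frame[d] - code[d]
        bias = [s / len(frames) for s in residual_sum]
    return bias


def remove_bias(frames, bias):
    """Subtract the estimated bias from every feature frame of the utterance."""
    return [[x - b for x, b in zip(frame, bias)] for frame in frames]
```

    Because only the test features change, the clean-speech models stay untouched, which is the property the abstract highlights.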

    In the Signal Bias Removal experiments, we chose several noise types (white noise, babble noise, car noise, high-frequency noise, and factory noise) and several signal-to-noise ratios (SNR). The experimental results show improved recognition accuracy. One advantage of the SBR method over other methods is that it can be employed during the testing phase alone.
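    Testing at a chosen SNR usually means scaling a noise recording before adding it to the clean speech. The sketch below shows one common way to do that, using SNR in dB = 10 log10(P_speech / P_noise) with average power over the utterance. The record does not say how the thesis prepared its noisy data, so this procedure and the function names are assumptions.

```python
import math


def average_power(samples):
    """Mean squared amplitude of a sample sequence."""
    return sum(v * v for v in samples) / len(samples)


def mix_at_snr(speech, noise, snr_db):
    """Scale noise so that speech-to-noise power equals snr_db, then add it to speech.

    Returns (mixed, scaled_noise). The noise must be at least as long as the speech.
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    p_speech = average_power(speech)
    p_noise = average_power(noise)
    if p_noise == 0:
        raise ValueError("noise has zero power")
    # Solve p_speech / (gain**2 * p_noise) = 10 ** (snr_db / 10) for gain.
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    scaled = [gain * n for n in noise]
    mixed = [s + n for s, n in zip(speech, scaled)]
    return mixed, scaled


def measured_snr_db(speech, noise):
    """Return the SNR in dB between a speech signal and a noise signal."""
    return 10 * math.log10(average_power(speech) / average_power(noise))
```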

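    Table of Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
        Section 1 Research Background and Motivation
        Section 2 Research Objectives
        Section 3 Research Scope and Limitations
        Section 4 Research Method and Framework
    Chapter 2 Literature Review
        Section 1 Basic Processing in Speech Recognition
        Section 2 Pre-processing and Feature Extraction of Speech Signals
        Section 3 Basic Model Architectures of Different Recognition Methods
        Section 4 Building the Hidden Markov Model
        Section 5 Methods for Achieving Robustness in Noisy Environments
        Section 6 Summary
    Chapter 3 Research Method
        Section 1 Research Framework and Steps
        Section 2 Preparing the Speech Data
        Section 3 Building the Speech Recognition System
        Section 4 Experimental Results and Discussion
    Chapter 4 Empirical Analysis
        Section 1 Data Sources
        Section 2 Experimental Environment and System Architecture
        Section 3 Verification and Comparison of the Recognition Performance of Signal Bias Removal
    Chapter 5 Conclusions and Suggestions
        Section 1 Research Conclusions
        Section 2 Future Research
    References
        I. Chinese
        II. English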


    Full text availability: on campus, public from 2011-07-20; off campus, public from 2011-07-20.