
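Graduate student: 黃尹鐶 (Huang, Yin-Huan)
Thesis title: 應用於遠場麥克風陣列之改良型廣義旁瓣消除器語音增強演算法
Speech Enhancement Algorithm Based on Modified Generalized Sidelobe Canceller for Far-field Microphone Array Application
Advisor: 邱瀝毅 (Chiou, Lih-Yih)
Co-advisor: 雷曉方 (Lei, Sheau-Fang)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of publication: 2019
Academic year of graduation: 107
Language: Chinese
Number of pages: 103
Keywords (Chinese, translated): Microphone Array, Speech Enhancement, Beamforming, Adaptive Filter, Noise Suppression, Speech Leakage
Keywords (English): Microphone Array, Speech Enhancement, Adaptive Filter, Speech Leakage, Noise Reduction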
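  • Environmental noise is one of the main factors that degrade speech recognition in human-machine interaction and communication systems. This thesis proposes a speech enhancement algorithm based on a modified generalized sidelobe canceller for far-field microphone arrays. Using the spatial placement of multiple microphones, the algorithm computes the time differences of arrival of the speech at each microphone and applies time compensation to each channel, so that the speech signal is time aligned across channels. The channels are then averaged to obtain the delay-and-sum beamforming output, which serves as the initially enhanced speech signal. In parallel, the time-compensated channel signals are passed through a blocking matrix to obtain an initial noise estimate, and an adaptive filter suppresses the speech leaked by the blocking matrix, giving a more accurate noise estimate. This noise estimate is then passed through a second adaptive filter, which further suppresses the noise remaining in the delay-and-sum beamforming output and produces the final enhanced speech signal. The thesis also modifies the update equation of the adaptive filter so that the converged filter does not drift away from its optimal coefficients when speech is present. Experiments on real recordings show that the proposed algorithm yields considerable improvements in three objective measures: signal-to-noise ratio (SNR), coherence and speech intelligibility index (CSII), and perceptual evaluation of speech quality (PESQ). The output speech therefore has good intelligibility and quality, so human-machine interaction devices can recognize the user's commands more effectively.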
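The front end described in this abstract (time compensation, delay-and-sum averaging, and a blocking matrix) can be illustrated with a minimal sketch. It assumes the per-channel delays are already known and are whole numbers of samples, and it uses adjacent-channel subtraction as the blocking matrix. Both are common textbook simplifications rather than details confirmed by the record, and the function names are hypothetical.

```python
def time_align(channels, delays):
    """Advance each channel by its (assumed known, integer) sample delay."""
    n = min(len(ch) - d for ch, d in zip(channels, delays))
    return [ch[d:d + n] for ch, d in zip(channels, delays)]


def delay_and_sum(aligned):
    """Average the time-aligned channels sample by sample."""
    m = len(aligned)
    return [sum(samples) / m for samples in zip(*aligned)]


def blocking_matrix(aligned):
    """Subtract adjacent aligned channels (an assumed blocking matrix).

    Perfectly aligned speech cancels in each difference, so the outputs
    act as noise references.
    """
    return [
        [a - b for a, b in zip(aligned[i], aligned[i + 1])]
        for i in range(len(aligned) - 1)
    ]


# Toy example: the same speech reaches three microphones 0, 1 and 2 samples late.
channels = [
    [1.0, 2.0, 3.0, 4.0, 0.0, 0.0],
    [0.0, 1.0, 2.0, 3.0, 4.0, 0.0],
    [0.0, 0.0, 1.0, 2.0, 3.0, 4.0],
]
aligned = time_align(channels, [0, 1, 2])
beamformed = delay_and_sum(aligned)
noise_refs = blocking_matrix(aligned)
```

In this noise-free toy case the beamformer output equals the speech and the blocking-matrix outputs are all zero. With real recordings the alignment is imperfect, so some speech leaks into the noise references, which is the speech leakage problem the abstract mentions.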

    By placing a number of microphones in space, an array can collect spatial samples of a propagating wave. Using the spatial information in the microphone array signals, we aim to extract the speech arriving along the desired path from the noisy signal by applying a spatial filter. The conventional generalized sidelobe canceller (CGSC) is composed of a delay-and-sum beamformer, a blocking matrix, and a multi-channel adaptive filter. It performs well with only a few microphones. However, the algorithm has problems such as speech leakage. Another problem is that, after the CGSC adaptive filter has converged, its coefficients deviate from the optimal coefficients when speech is present, which causes excess mean square error. This thesis proposes a modified generalized sidelobe canceller speech enhancement algorithm. It modifies the normalized least mean squares (NLMS) update equation so that the excess mean square error is suppressed effectively, and after convergence it performs better than the conventional update equation.
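For orientation, here is a minimal sketch of the conventional NLMS update used as a single-channel noise canceller. The thesis's modified update equation is not given in this record, so it is not reproduced here. The function name, tap count, step size `mu`, and regularizer `eps` are illustrative assumptions.

```python
import random


def nlms_cancel(primary, reference, taps=4, mu=0.5, eps=1e-8):
    """Conventional NLMS: subtract the filtered reference from the primary.

    In a GSC, `primary` would be the beamformer output and `reference` a
    blocking-matrix output; the error signal is the enhanced output.
    """
    w = [0.0] * taps      # filter coefficients
    buf = [0.0] * taps    # recent reference samples, newest first
    out = []
    for d, x in zip(primary, reference):
        buf = [x] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))    # noise estimate
        e = d - y                                     # enhanced sample
        norm = eps + sum(b * b for b in buf)          # input power
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        out.append(e)
    return out, w


# Noise-only toy case: the primary is a scaled copy of the reference noise.
rng = random.Random(0)
reference = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
primary = [0.8 * x for x in reference]
enhanced, weights = nlms_cancel(primary, reference)
```

Because the error `e` drives every update, any speech present in `primary` also perturbs the coefficients after convergence. That is the deviation from the optimal coefficients that the abstract describes.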
    We record noisy microphone array signals with four different kinds of noise and compare the results with the double affine projection generalized sidelobe canceller (DAP-GSC) and the CGSC. The signal-to-noise ratio (SNR) of the proposed algorithm is higher than that of DAP-GSC and CGSC; that is, its enhanced speech contains less noise. The perceptual evaluation of speech quality (PESQ) and the coherence and speech intelligibility index (CSII) are also important measures, and the proposed algorithm outperforms the other algorithms on both. This means that its enhanced speech has better quality and can be recognized clearly by a person or a machine. Hence, it is worth applying to communication systems and human-machine interaction devices.
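As a rough illustration of the SNR measure mentioned here, the sketch below computes a global SNR in decibels against a clean reference signal. The exact SNR definition used in the thesis is not stated in this record, so this formula and the function name are assumptions.

```python
import math


def snr_db(clean, enhanced):
    """Global SNR in dB, treating (enhanced - clean) as residual noise."""
    signal_power = sum(s * s for s in clean)
    noise_power = sum((e - s) ** 2 for s, e in zip(clean, enhanced))
    return 10.0 * math.log10(signal_power / noise_power)


# Toy example: a constant residual of 0.1 on a unit-amplitude signal.
clean = [1.0, -1.0, 1.0, -1.0]
enhanced = [s + 0.1 for s in clean]
snr = snr_db(clean, enhanced)  # close to 20 dB
```

A higher value means less residual noise relative to the speech. PESQ and CSII need perceptual models and are not sketched here.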

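    Chinese Abstract
    Extended Abstract
    Acknowledgements
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
      1.1. Overview of the Human Auditory System
        1.1.1. Auditory Masking Effect
      1.2. Introduction to Noise Environments
        1.2.1. Complex Coherence Function
        1.2.2. Classification of Noise Environments
      1.3. Far-field Human-Machine Interaction Systems
        1.3.1. Human-Machine Interaction Systems
        1.3.2. Far-field Condition
      1.4. Research Motivation and Objectives
      1.5. Thesis Organization
    Chapter 2 Literature Review
      2.1. Single-Channel Speech Enhancement Algorithms
        2.1.1. Spectral Subtraction
        2.1.2. Minima Controlled Recursive Averaging
      2.2. Adaptive Filters
        2.2.1. Wiener Filter
        2.2.2. Least Mean Squares
        2.2.3. Normalized Least Mean Squares
      2.3. Microphone Array Beamforming Algorithms
        2.3.1. Delay-and-Sum Beamforming
        2.3.2. Microphone Array Post-Filtering
        2.3.3. Generalized Sidelobe Canceller
      2.4. Objective Measures for Speech Enhancement Algorithms
        2.4.1. Signal-to-Noise Ratio
        2.4.2. Coherence and Speech Intelligibility Index
        2.4.3. Perceptual Evaluation of Speech Quality
    Chapter 3 Modified Generalized Sidelobe Canceller Speech Enhancement Algorithm for Far-field Microphone Arrays
      3.1. Introduction to Microphone Arrays
        3.1.1. Microphone Array Placement
        3.1.2. Microphone Properties
      3.2. Microphone Array Environment Assumptions
        3.2.1. Placement of Speech and Environmental Noise Sources
        3.2.2. Speech and Noise Characteristics at the Microphone Array
      3.3. Speech Enhancement Algorithm Architecture
        3.3.1. Goals of the Speech Enhancement Algorithm
        3.3.2. Architecture of the Speech Enhancement Algorithm
      3.4. Suppressing Speech Leakage
      3.5. Modified Adaptive Filter Update Equation
      3.6. Summary of the Microphone Array Speech Enhancement Algorithm
    Chapter 4 Results, Analysis, and Comparison of the Microphone Array Speech Enhancement Algorithm
      4.1. Performance Analysis and Comparison on Real Recordings
        4.1.1. Recording Environment Setup
        4.1.2. Purpose and Items of the Algorithm Comparison
        4.1.3. Speech Power Spectrum Estimation
        4.1.4. Effect of the Number of Microphones
        4.1.5. Analysis and Comparison of the Speech Leakage Problem
        4.1.6. Adaptive Filter Update Equation
      4.2. Objective Evaluation of the Speech Enhancement Algorithms
        4.2.1. Signal-to-Noise Ratio: Performance Analysis and Comparison (Experiment 1)
        4.2.2. Coherence and Speech Intelligibility Index: Performance Analysis and Comparison (Experiment 1)
        4.2.3. Perceptual Evaluation of Speech Quality: Performance Analysis and Comparison (Experiment 1)
        4.2.4. Signal-to-Noise Ratio: Performance Analysis and Comparison (Experiment 2)
        4.2.5. Coherence and Speech Intelligibility Index: Performance Analysis and Comparison (Experiment 2)
        4.2.6. Perceptual Evaluation of Speech Quality: Performance Analysis and Comparison (Experiment 2)
      4.3. Comparison of Computational Complexity
      4.4. Overall Comparison and Conclusions
    Chapter 5 Conclusions and Future Work
    References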

