
Graduate student: 陳益正 (Chen, I-Cheng)
Thesis title (Chinese): 使用強健性時間延遲與訊號子空間方法於麥克風陣列語音加強
Thesis title (English): Robust Time Delay and Signal Subspace Approaches to Microphone Array Speech Enhancement
Advisor: 簡仁宗 (Chien, Jen-Tzung)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication: 2003
Academic year of graduation: 91 (ROC calendar)
Language: Chinese
Number of pages: 85
Keywords: microphone array, speech enhancement
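  • This thesis proposes a microphone array speech enhancement method for car noise environments. The aim is to suppress interference in the frequency bands where the noise is concentrated, reducing the effect of colored ambient noise on the speech signal. Speech is collected with a microphone array. Because the speech signal reaches each microphone with a different delay, we use a robust time delay estimation method that locates the speaker without assuming that the speech wave is a plane wave. The spatial geometry then gives the delay of the speech signal at each microphone, and delay-and-sum beamforming, an array signal processing technique, enhances the speech. In addition, an adaptive signal subspace method serves as a post-processing stage that suppresses noise in the dominant noise bands. Experimental results show that the proposed method effectively improves both the signal-to-noise ratio and the speech recognition accuracy of noisy speech, and spectrograms of the enhanced speech show that it reduces the noise level in the dominant noise bands. The demonstration system we developed achieves a good recognition rate on noisy speech and fairly accurate speaker localization.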
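    The delay computation and beamforming steps described above can be sketched as follows. This is only an illustrative sketch, not the thesis's implementation: it assumes the speaker position is already known (the thesis estimates it with its own localization algorithm, which is not reproduced here), rounds delays to whole samples, and uses a nominal speed of sound. The names `arrival_delays`, `delay_and_sum`, and `SPEED_OF_SOUND` are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed nominal value, not taken from the thesis


def arrival_delays(source, mics, fs):
    """Delay in samples at each microphone, relative to the closest one.

    Uses the exact source-to-microphone distances (spherical wavefront),
    so no plane-wave (far-field) assumption is involved.
    """
    dists = [math.dist(source, m) for m in mics]
    nearest = min(dists)
    return [round((d - nearest) / SPEED_OF_SOUND * fs) for d in dists]


def delay_and_sum(channels, delays):
    """Advance each channel by its delay, then average the aligned samples."""
    n = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[t + d] for ch, d in zip(channels, delays)) / len(channels)
        for t in range(n)
    ]


# Hypothetical geometry: a 4-element linear array (metres) and a nearby talker.
mics = [(-0.15, 0.0), (-0.05, 0.0), (0.05, 0.0), (0.15, 0.0)]
delays = arrival_delays(source=(0.3, 0.5), mics=mics, fs=16000)
```

    After alignment the speech adds coherently across channels while uncorrelated noise partially averages out, which is why the beamformer output is then passed to a post-processing stage for the remaining colored noise.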

    In this thesis, we propose a microphone array speech processing approach for enhancing speech that is corrupted by colored noise and recorded in car environments. Our objective is to suppress car noise interference by reducing noise in the frequency band where the car noise is concentrated. First, we use a linear microphone array to acquire noisy car speech data. A robust time delay estimator is proposed for delay-and-sum beamforming speech enhancement. The estimator finds the time delays through a speaker localization algorithm that does not assume a plane wave for the speech signal. The speaker location is estimated from two angles measured with microphone pairs. After delay-and-sum beamforming, an adaptive signal subspace method is applied to further reduce the colored noise interference in the car speech signal. Experiments show that the proposed method significantly improves the signal-to-noise ratio (SNR) and the speech recognition rate. The improvement is also clearly visible in the spectrogram of the enhanced speech. In the developed demonstration system, we also obtain good speaker localization performance.
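    The abstract does not state how the SNR was measured. As a point of reference, one common definition compares the energy of a clean reference signal with the energy of the residual noise. The sketch below assumes that a time-aligned clean reference is available, and the function name `snr_db` is hypothetical.

```python
import math


def snr_db(clean, observed):
    """SNR in dB of `observed`, measured against the clean reference signal."""
    signal_power = sum(s * s for s in clean)
    noise_power = sum((x - s) ** 2 for s, x in zip(clean, observed))
    return 10.0 * math.log10(signal_power / noise_power)
```

    Under this definition, the improvement from enhancement is the difference between `snr_db(clean, enhanced)` and `snr_db(clean, noisy)`.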

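    Chinese Abstract
    English Abstract
    Acknowledgments
    List of Figures
    List of Tables
    Chapter 1 Introduction
        1.1 Preface
        1.2 Research Motivation and Objectives
        1.3 Overview of the Research Method
        1.4 Chapter Outline
    Chapter 2 Background
        2.1 Introduction to Telematics
        2.2 Introduction to Microphone Arrays in Speech Recognition
        2.3 Literature Review of Microphone Array Speech Enhancement
        2.4 Time Delay Estimation Methods for Microphone Arrays
    Chapter 3 System Architecture
        3.1 System Overview
        3.2 Robust Time Delay Estimation
        3.3 Adaptive Signal Subspace Method for Speech Enhancement in Car Noise Environments
    Chapter 4 Experiments
        4.1 Experimental Setup
        4.2 Microphone Array Corpus
        4.3 Experimental Results
        4.4 Discussion of the Experiments
        4.5 System Demonstration
    Conclusions and Suggestions
        5.1 Conclusions
        5.2 Suggestions
    References
    Appendix 1: Derivation of the Optimal Filter for the Signal Subspace Method
    Appendix 2: Speech Signals Before and After Noise Interference
    Appendix 3: List of Names in the Audio Files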


    Full text availability: on campus, available immediately; off campus, available from 2003-07-14