
Graduate student: 杜秉鴻
Tu, Bing-Hong
Thesis title: 基於修改型功率頻譜密度差應用於雙麥克風行動裝置之語音增強演算法
Speech Enhancement Algorithm Based on Modified Power Spectral Density Difference Applied to Dual Microphone Smart Handheld Devices
Advisor: 雷曉方
Lei, Sheau-Fang
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of publication: 2018
Academic year of graduation: 106
Language: Chinese
Number of pages: 100
Keywords (Chinese): 智慧行動裝置, 雙麥克風, 語音增強, 噪音抑制, 功率階層差, 語音出現機率
Keywords (English): Smart Handheld Devices, Dual Microphones, Speech Enhancement, Noise Reduction, Power Level Difference, Speech Present Probability
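  • This thesis proposes a speech enhancement algorithm based on the dual-microphone characteristics of mobile devices and a modified power spectral density difference. Exploiting the placement of the two microphones on a smart handheld device, it gathers statistics of the Modified Power Level Difference (MPLD) between the power densities of the two microphones, computes the speech presence probability from the distribution of the MPLD data, and then uses that probability to estimate the noise power spectral density dynamically. It also estimates the speech power spectral density: voice activity detection (VAD) is carried out with some operations on the previously computed speech presence probability. If speech is judged absent from the current frame, the previously estimated noise power spectral density is subtracted from the power spectral density of the noisy input signal. If speech is judged present, the estimate follows one of three cases: (1) the speech energy holds steady, (2) the speech energy rises or falls sharply, or (3) the speech energy rises or falls gradually. Finally, the speech presence probability is incorporated into a modified sigmoid function so that the gain function suppresses noise more effectively while preserving the speech signal. On real recordings made in noise environments common for mobile devices, the proposed framework yields considerable improvements in the objective measures signal-to-noise ratio (SNR), coherence and speech intelligibility index (CSII), and perceptual evaluation of speech quality (PESQ). This indicates that the output speech is clean, with good intelligibility and quality, so that speech from a smart handheld device used in a noisy environment can be understood by the listener at the other end.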

    This thesis presents a speech enhancement algorithm based on the modified power level difference (MPLD) for smart handheld devices with dual microphones. Exploiting the placement of the two microphones on a smart handheld device, we compute the MPLD from the power spectral densities of the two microphone signals. Instead of calculating the speech presence probability from the original parameter, we calculate it from the distribution of the gathered MPLD data. We then use the speech presence probability as a dynamic parameter for estimating the noise power spectral density. We also propose a speech power spectral density estimator. First, voice activity detection (VAD), based on the speech presence probability and some simple calculations, determines whether speech exists in the current frame. If speech is absent, we use the subtraction method to improve accuracy. Otherwise, we estimate the speech power spectral density directly according to the speech condition. Finally, we propose a modified sigmoid function as the gain function to suppress noise and preserve speech effectively.
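As a rough illustration of the processing chain described above, the following Python sketch runs one frame through a power-level-difference feature, a speech presence probability, a noise PSD update, and a gain. The abstract does not give the thesis's formulas, so everything here is an assumption: the normalized difference stands in for the MPLD, a plain logistic curve (with hypothetical `slope` and `midpoint` parameters) stands in for the distribution-based probability, the noise update uses a generic probability-weighted recursive average, and a floored Wiener-style gain replaces the modified sigmoid gain.

```python
import math


def normalized_pld(psd_primary, psd_secondary, eps=1e-12):
    """Per-bin normalized power level difference, roughly in [-1, 1].

    Assumed stand-in for the MPLD; the thesis's exact definition is not
    given in the abstract.
    """
    return [(a - b) / (a + b + eps) for a, b in zip(psd_primary, psd_secondary)]


def speech_presence_probability(pld, slope=10.0, midpoint=0.3):
    """Map each PLD value to a probability with a logistic curve.

    slope and midpoint are hypothetical tuning values, not from the thesis.
    """
    return [1.0 / (1.0 + math.exp(-slope * (d - midpoint))) for d in pld]


def update_noise_psd(noise_prev, psd_primary, spp, alpha=0.9):
    """Probability-weighted recursive average of the noise PSD.

    Where speech is likely (p near 1) the old estimate is kept; where
    speech is unlikely the estimate moves toward the observed PSD.
    """
    updated = []
    for n_prev, y, p in zip(noise_prev, psd_primary, spp):
        alpha_p = alpha + (1.0 - alpha) * p
        updated.append(alpha_p * n_prev + (1.0 - alpha_p) * y)
    return updated


def wiener_style_gain(psd_primary, noise_psd, floor=0.05, eps=1e-12):
    """Floored Wiener-style gain from a subtraction-based speech PSD estimate.

    Used here in place of the modified sigmoid gain, which the abstract
    names but does not specify.
    """
    gains = []
    for y, n in zip(psd_primary, noise_psd):
        speech = max(y - n, 0.0)  # subtract the noise estimate, clip at zero
        gains.append(max(speech / (speech + n + eps), floor))
    return gains
```

With a speech-dominated bin (primary PSD 10, secondary PSD 1) and a noise-only bin (2 and 2), the probability comes out close to 1 for the first bin and close to 0 for the second, so only the second bin's noise estimate moves noticeably toward the observed PSD.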
    We recorded noisy signals on smart handheld devices in different noisy environments and compared the results of the algorithms. The proposed algorithm outperforms the PLR algorithm in signal-to-noise ratio (SNR) and perceptual evaluation of speech quality (PESQ); that is, its output contains less noise and has higher quality. The coherence and speech intelligibility index (CSII) is the most important index. The proposed algorithm also achieves a better CSII than the PLR algorithm, so the person on the other end can hear the speech clearly. Furthermore, the complexity of the proposed algorithm is lower than that of the PLR algorithm, which makes it well suited to smart handheld devices.
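For readers unfamiliar with the SNR measure mentioned above, here is a minimal sketch of one common time-domain definition: the energy of a clean reference divided by the energy of the residual error, in decibels. The exact definition used in the thesis is not stated in the abstract, so this form and the function name `snr_db` are assumptions.

```python
import math


def snr_db(clean, enhanced):
    """SNR in dB between a clean reference and an enhanced signal.

    Assumed definition: 10 * log10(sum(s^2) / sum((s - s_hat)^2)).
    """
    if len(clean) != len(enhanced):
        raise ValueError("signals must have the same length")
    signal_power = sum(s * s for s in clean)
    if signal_power <= 0.0:
        raise ValueError("clean reference has no energy")
    error_power = sum((s - e) ** 2 for s, e in zip(clean, enhanced))
    if error_power == 0.0:
        return math.inf  # enhanced signal matches the reference exactly
    return 10.0 * math.log10(signal_power / error_power)
```

For example, an enhanced signal whose samples are all 10% below the clean reference has an error energy of 1% of the signal energy, which corresponds to 20 dB.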

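    Chinese Abstract
    Extended Abstract
    Acknowledgments
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
        1.1. Introduction to the Human Auditory System
            1.1.1. Physiology of the Human Ear
            1.1.2. Auditory Masking Effect
        1.2. Introduction to Noise
        1.3. Voice over Internet Protocol Calls
        1.4. Research Motivation and Objectives
        1.5. Thesis Organization
    Chapter 2 Literature Review
        2.1. Review of Speech Enhancement Algorithms
            2.1.1. Spectral Subtraction
            2.1.2. Minima Controlled Recursive Averaging
            2.1.3. Fast Dual-Microphone Noise Reduction Algorithm Based on Power Level Ratio
            2.1.4. Dual-Microphone Noise Reduction Algorithm Based on Modified Power Level Difference
        2.2. Objective Measures for Speech Enhancement Algorithms
            2.2.1. Signal-to-Noise Ratio
            2.2.2. Coherence and Speech Intelligibility Index
            2.2.3. Perceptual Evaluation of Speech Quality
    Chapter 3 Speech Enhancement Algorithm Based on Dual-Microphone Characteristics of Mobile Devices and Modified Power Spectral Density Difference
        3.1. Introduction to Mobile Device Microphones
            3.1.1. Placement of the Two Microphones
            3.1.2. Properties of the Microphones
        3.2. Mobile Device Environment Assumptions
            3.2.1. Placement of Speech and Ambient Noise Sources
            3.2.2. Dual-Microphone Speech and Noise Characteristics
        3.3. Overview of the Speech Enhancement Algorithm Architecture
            3.3.1. Goals of the Speech Enhancement Algorithm
            3.3.2. Architecture of the Speech Enhancement Algorithm
        3.4. Windowing of the Time-Domain Signal and Fast Fourier Transform
            3.4.1. Window Function and Fourier Transform
        3.5. Power Spectral Density Analysis and Speech Presence Probability
            3.5.1. Power Spectral Density Calculation
            3.5.2. Speech Presence Probability Calculation
        3.6. Speech and Noise Power Spectral Density Estimation
            3.6.1. Noise Power Spectral Density Estimation
            3.6.2. Speech Power Spectral Density Estimation
        3.7. Modified Sigmoid Function and Gain Function
            3.7.1. Sigmoid Function
            3.7.2. Modified Sigmoid Function
        3.8. Summary of the Proposed Dual-Microphone Speech Enhancement Algorithm
    Chapter 4 Results, Analysis, and Comparison of Dual-Microphone Speech Enhancement Algorithms
        4.1. Performance Analysis and Comparison on Real Recordings
            4.1.1. Simulated Recording Environment Setup
            4.1.2. Purpose and Items of the Algorithm Comparison
            4.1.3. Analysis and Comparison of Features and Speech Presence Probability
            4.1.4. Analysis and Comparison of Noise Power Spectral Density
            4.1.5. Analysis and Comparison of Speech Power Spectral Density
            4.1.6. Analysis and Comparison of Gain Functions
            4.1.7. Analysis and Comparison of Signal-to-Noise Ratio Performance
            4.1.8. Analysis and Comparison of Coherence and Speech Intelligibility Index Performance
            4.1.9. Analysis and Comparison of Perceptual Evaluation of Speech Quality Performance
            4.1.10. Summary and Analysis of the Algorithm Comparison
        4.2. Comparison of Computational Complexity
        4.3. Overall Comparison and Conclusions
    Chapter 5 Conclusions and Future Work
    References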


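    On-campus access: available from 2023-07-01
    Off-campus access: not available
    The electronic thesis has not been authorized for public release; for the print copy, consult the library catalog.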