
Student: 黃聖彣 (Huang, Sheng-Wen)
Thesis title: Speech Enhancement Algorithm Based on Power Spectral Density Ratio Applied to Smart Handheld Devices
Original (Chinese) title: 基於功率頻譜密度比應用於智慧型行動裝置之語音增強演算法
Advisor: 雷曉方 (Lei, Sheau-Fang)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of publication: 2017
Academic year of graduation: 105 (ROC calendar, i.e., 2016-17)
Language: Chinese
Pages: 92
Keywords: Smart Handheld Devices, Dual Microphones, Speech Enhancement, Noise Reduction (Noise Suppression), Power Spectral Density Ratio
    This thesis proposes a speech enhancement algorithm for smart handheld devices with two microphones, based on the ratio of the microphones' power spectral densities (PSDs). By examining how the two microphones are positioned on a smart handheld device, the algorithm collects statistics of the ratio between their PSDs and derives a speech presence probability (SPP) from it. Instead of estimating the noise PSD with fixed parameters, it adjusts the estimation parameters dynamically according to the speech presence probability, which yields a more accurate noise PSD estimate. The thesis further proposes a speech PSD estimator that divides the signal into four cases (1. noise, 2. speech onset, 3. speech, 4. speech offset) and uses different parameters in each case. Finally, the Wiener filter is modified to incorporate the speech presence probability, so that its gain function suppresses noise more effectively while preserving the speech signal.

    Noisy speech was recorded on a smart handheld device in several noise environments that mobile devices commonly encounter, and the proposed algorithm was compared with other algorithms. After processing, it achieves considerable gains in three objective measures: signal-to-noise ratio (SNR), coherence and speech intelligibility index (CSII), and perceptual evaluation of speech quality (PESQ). Its SNR and PESQ scores are higher than those of the other algorithms, meaning that its output contains less noise and has higher quality. CSII is the most important of these indices because it reflects how well the listener at the other end can understand the speech; here too the proposed algorithm outperforms the others, so speech captured by a smart handheld device in a noisy environment remains intelligible to the far-end listener. The algorithm also has low computational complexity, which makes it well suited to smart handheld devices.
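The abstract describes the front end only in words. As a rough illustration, the sketch below processes a single frequency bin: each microphone's PSD is smoothed recursively, the primary-to-secondary PSD ratio is mapped to a speech presence probability, and that probability sets the noise-PSD smoothing factor. The Gaussian-CDF mapping, the MCRA-style rule for the dynamic factor, and all parameter values (`ALPHA_PSD`, `ALPHA_NOISE_MIN`, `RATIO_MEAN_DB`, `RATIO_STD_DB`) are assumptions for illustration, not the thesis's formulas or settings; microphone 1 is assumed to be the one closest to the talker's mouth.

```python
import math

# Hypothetical parameters chosen for illustration; not the thesis's values.
ALPHA_PSD = 0.8         # smoothing factor for the two microphone PSDs
ALPHA_NOISE_MIN = 0.85  # noise smoothing factor when speech is surely absent
RATIO_MEAN_DB = 6.0     # assumed PSD ratio (dB) at which p = 0.5
RATIO_STD_DB = 3.0      # assumed spread of the PSD ratio (dB)
EPS = 1e-12             # guards against division by zero and log of zero


def smooth(prev, power, alpha):
    """First-order recursive average: alpha * prev + (1 - alpha) * power."""
    return alpha * prev + (1.0 - alpha) * power


def speech_presence_probability(phi_11, phi_22):
    """Map the primary/secondary PSD ratio to a probability in [0, 1].

    Assumption: a Gaussian CDF of the ratio in dB. Near-field speech makes
    the primary microphone louder (large ratio), while diffuse noise reaches
    both microphones at similar levels (ratio near 0 dB).
    """
    ratio_db = 10.0 * math.log10((phi_11 + EPS) / (phi_22 + EPS))
    z = (ratio_db - RATIO_MEAN_DB) / RATIO_STD_DB
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def update_noise_psd(phi_n, phi_11, p):
    """MCRA-style update: the more likely speech is, the less the noise PSD moves."""
    alpha_n = ALPHA_NOISE_MIN + (1.0 - ALPHA_NOISE_MIN) * p
    return smooth(phi_n, phi_11, alpha_n)


def process_bin(x1_powers, x2_powers):
    """Run the frame-by-frame recursion for a single frequency bin.

    x1_powers, x2_powers: |X1(k, l)|^2 and |X2(k, l)|^2 over frames l.
    Returns the per-frame speech presence probabilities and noise PSD estimates.
    """
    phi_11 = x1_powers[0]
    phi_22 = x2_powers[0]
    phi_n = x1_powers[0]  # assume the first frame contains noise only
    probs, noise = [], []
    for p1, p2 in zip(x1_powers, x2_powers):
        phi_11 = smooth(phi_11, p1, ALPHA_PSD)
        phi_22 = smooth(phi_22, p2, ALPHA_PSD)
        p = speech_presence_probability(phi_11, phi_22)
        phi_n = update_noise_psd(phi_n, phi_11, p)
        probs.append(p)
        noise.append(phi_n)
    return probs, noise
```

For example, `process_bin([1.0] * 20 + [10.0] * 20, [1.0] * 40)` models 20 noise-only frames with equal power at both microphones, followed by 20 frames in which the primary microphone receives ten times more power. The probability stays near 0.02 in the first part and rises to about 0.9 in the second, while the noise estimate grows far more slowly than the primary-microphone PSD.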
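The four-case speech PSD estimator and the probability-weighted Wiener gain can be sketched in the same hedged way. Here the four cases are derived from whether the speech presence probability of the previous and current frames exceeds a threshold, each case has its own smoothing factor, and the Wiener gain ξ/(1+ξ) is blended with a gain floor using an OM-LSA-style weighting G = G_W^p · G_min^(1-p). The threshold (`P_THRESHOLD`), the smoothing factors (`ALPHA_BY_STATE`), the floor (`G_MIN`) and the blending rule are hypothetical stand-ins; the thesis's own case definitions and modified Wiener filter may differ.

```python
# Hypothetical parameters chosen for illustration; not the thesis's values.
P_THRESHOLD = 0.5   # speech probability above which a frame counts as speech
ALPHA_BY_STATE = {  # state-dependent smoothing factors for the speech PSD
    "noise": 0.98,
    "onset": 0.2,   # react quickly when speech starts
    "speech": 0.7,
    "offset": 0.9,  # decay slowly so word endings are not clipped
}
G_MIN = 0.1         # gain floor (-20 dB) applied where speech is absent
EPS = 1e-12         # guards against division by zero


def classify_state(p_prev, p_curr):
    """Assign one of the four cases from two consecutive speech probabilities."""
    was_speech = p_prev >= P_THRESHOLD
    is_speech = p_curr >= P_THRESHOLD
    if not was_speech and not is_speech:
        return "noise"
    if not was_speech and is_speech:
        return "onset"
    if was_speech and is_speech:
        return "speech"
    return "offset"


def update_speech_psd(phi_s_prev, phi_y, phi_n, state):
    """Smooth the power-subtraction estimate max(phi_y - phi_n, 0) per state."""
    alpha = ALPHA_BY_STATE[state]
    instantaneous = max(phi_y - phi_n, 0.0)
    return alpha * phi_s_prev + (1.0 - alpha) * instantaneous


def modified_wiener_gain(phi_s, phi_n, p):
    """Wiener gain xi / (1 + xi), pulled toward G_MIN as p falls.

    Assumption: G = G_W**p * G_MIN**(1 - p), so p = 1 gives the plain Wiener
    gain (floored at G_MIN) and p = 0 gives G_MIN.
    """
    xi = phi_s / (phi_n + EPS)  # a priori SNR estimate from the two PSDs
    g_wiener = max(xi / (1.0 + xi), G_MIN)
    return (g_wiener ** p) * (G_MIN ** (1.0 - p))
```

In this sketch a bin that is probably noise receives a gain close to `G_MIN`, and a bin that is probably speech receives the ordinary Wiener gain. The small smoothing factor at onset lets the speech PSD follow the start of a word quickly, while the large factor at offset keeps its tail from being cut off.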

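    Chinese Abstract
    Extended Abstract
    Acknowledgements
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
      1.1 Overview of the Auditory System
        1.1.1 Physiology of the Ear
        1.1.2 Auditory Masking
      1.2 Overview of Noise
      1.3 Voice over Internet Protocol (VoIP) Calls
      1.4 Research Motivation and Objectives
      1.5 Thesis Organization
    Chapter 2 Literature Review and Analysis
      2.1 Review of Speech Enhancement Algorithms
        2.1.1 Spectral Subtraction
        2.1.2 Minima Controlled Recursive Averaging
        2.1.3 Ideal Binary Time-Frequency Mask
        2.1.4 Noise Reduction for Dual-Microphone Mobile Phones Based on Power Spectral Density Differences
      2.2 Signal-to-Noise Ratio Index
      2.3 Coherence and Speech Intelligibility Index
      2.4 Perceptual Evaluation of Speech Quality Index
    Chapter 3 A Speech Enhancement Algorithm Exploiting the Dual-Microphone Characteristics and Power Spectral Density of Mobile Devices
      3.1 Overview of Mobile Device Microphones
        3.1.1 Microphone Placement
        3.1.2 Microphone Properties
      3.2 Assumptions about the Mobile Device Environment
        3.2.1 Placement of Speech and Noise Sources
        3.2.2 Speech and Noise Characteristics at the Two Microphones
      3.3 Overview of the Speech Enhancement Algorithm Architecture
        3.3.1 Objectives of the Speech Enhancement Algorithm
        3.3.2 Speech Enhancement Algorithm Architecture
      3.4 Time-Domain Delay, Windowed Segmentation, and Fast Fourier Transform
        3.4.1 Dual-Microphone Time-Domain Delay
        3.4.2 Window Functions and Fourier Transform
      3.5 Power Spectral Density Analysis and Speech Presence Probability
        3.5.1 Power Spectral Density Analysis
        3.5.2 Speech Presence Probability
      3.6 Speech and Noise Power Spectral Density Estimation
        3.6.1 Noise Power Spectral Density Estimation
        3.6.2 Speech Power Spectral Density Estimation
      3.7 Improved Wiener Filter and Gain Function
        3.7.1 Improved Wiener Filter
        3.7.2 Gain Function
        3.7.3 Comparison of the Wiener Filter and the Improved Wiener Filter
      3.8 Summary of the Proposed Dual-Microphone Speech Enhancement Algorithm
    Chapter 4 Analysis and Comparison of Dual-Microphone Speech Enhancement Results
      4.1 Performance Analysis and Comparison on Real Recordings
        4.1.1 Recording Environment Setup
        4.1.2 Algorithm Parameter Design and Purpose of the Comparison
        4.1.3 Analysis and Comparison of Noise Power Spectral Density
        4.1.4 Analysis and Comparison of Signal-to-Noise Ratio Performance
        4.1.5 Analysis and Comparison of Coherence and Speech Intelligibility Index Performance
        4.1.6 Analysis and Comparison of Perceptual Evaluation of Speech Quality
        4.1.7 Summary of Performance Analysis and Comparison
      4.2 Comparison of Computational Complexity
      4.3 Summary of Algorithm Analysis and Results
    Chapter 5 Conclusions and Future Work
    References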
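The outline refers to an SNR index (Sections 2.2 and 4.1.4), but this record does not give its formula. A common definition, assumed here, compares the processed output with a time-aligned clean reference: SNR = 10·log10(Σ s² / Σ (s - ŝ)²). The function name `snr_db` is hypothetical, and the thesis may use a segmental or otherwise different variant.

```python
import math


def snr_db(clean, processed):
    """Global SNR (dB) of a processed signal against its clean reference.

    Assumes both sequences are sample-aligned and of equal length.
    """
    if len(clean) != len(processed):
        raise ValueError("signals must have the same length")
    signal_energy = sum(s * s for s in clean)
    error_energy = sum((s - y) ** 2 for s, y in zip(clean, processed))
    if error_energy == 0.0:
        return math.inf  # a perfect reconstruction has unbounded SNR
    return 10.0 * math.log10(signal_energy / error_energy)
```

For instance, `snr_db([1, 1, 1, 1], [1.1, 0.9, 1.1, 0.9])` gives about 20 dB, since the error energy is one hundredth of the signal energy.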

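    Full-text access: available on campus from 2022-07-13; not available off campus.
    The electronic thesis has not been authorized for public release; for the print copy, check the library catalog.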