| Graduate student: | 黃聖彣 Huang, Sheng-Wen |
|---|---|
| Thesis title: | 基於功率頻譜密度比應用於智慧型行動裝置之語音增強演算法 Speech Enhancement Algorithm Based on Power Spectral Density Ratio Applied to Smart Handheld Devices |
| Advisor: | 雷曉方 Lei, Sheau-Fang |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Electrical Engineering (電機資訊學院 - 電機工程學系) |
| Year of publication: | 2017 |
| Academic year of graduation: | 105 (ROC calendar) |
| Language: | Chinese |
| Pages: | 92 |
| Keywords (Chinese): | 智慧行動裝置, 雙麥克風, 語音增強, 噪音抑制, 功率頻譜密度比 |
| Keywords (English): | Smart Handheld Devices, Dual Microphones, Speech Enhancement, Noise Reduction, Power Spectral Density Ratio |
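This thesis proposes a speech enhancement algorithm that exploits the characteristics of the dual microphones on mobile devices and their power spectral densities. By examining how the two microphones are positioned on a smart mobile device and collecting statistics on the ratio of their energy densities, we propose a speech presence probability algorithm. In contrast to noise power spectral density estimation that uses only fixed parameters, the algorithm estimates the noise power spectral density with parameters that vary dynamically according to the speech presence probability. It further proposes a speech power spectral density estimation that divides the signal into four cases: (1) noise only, (2) speech onset, (3) speech, and (4) speech offset, and uses different parameters for each case. Finally, the Wiener filter is improved by incorporating the speech presence probability, so that the gain function suppresses noise more effectively while preserving the speech signal. Real recordings were made in noise environments commonly encountered with mobile devices. After processing with the proposed framework, the objective measures signal-to-noise ratio (SNR), coherence and speech intelligibility index (CSII), and perceptual evaluation of speech quality (PESQ) improve considerably. This indicates that the framework outputs a clean speech signal with good intelligibility and quality, so that speech sent from a smart mobile device in a noisy environment can be understood by the listener at the other end.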
This thesis presents a speech enhancement algorithm based on the power spectral density ratio for smart handheld devices with dual microphones. By examining how the microphones are positioned on a smart handheld device, we collect statistics on the ratio of the two microphones' power spectral densities and propose a speech presence probability algorithm. Instead of estimating the noise power spectral density with fixed parameters, we use dynamic parameters, which makes the noise power spectral density estimate more accurate. We further propose a speech power spectral density estimation and improve the Wiener filter so that it suppresses noise effectively while preserving the speech signal.
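The chain described here (power spectral density ratio, speech presence probability, dynamically smoothed noise estimate, probability-weighted Wiener gain) can be illustrated with a minimal sketch for a single frequency bin. Everything specific in it is an assumption made for illustration and is not taken from the thesis: the logistic mapping from the ratio to a probability, the function names, and the values of `midpoint_db`, `slope`, `alpha_min`, and `gain_floor`. The speech power spectral density here is a simple subtraction placeholder, not the thesis's speech power spectral density estimation.

```python
import math

EPS = 1e-12  # guards against division by zero and log of zero


def speech_presence_probability(psd_primary, psd_secondary,
                                midpoint_db=3.0, slope=1.0):
    """Map the primary/secondary PSD ratio (in dB) to a value in (0, 1).

    The logistic curve and its parameters are assumptions for illustration.
    """
    ratio_db = 10.0 * math.log10((psd_primary + EPS) / (psd_secondary + EPS))
    z = slope * (ratio_db - midpoint_db)
    z = max(-50.0, min(50.0, z))  # clamp so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(-z))


def update_noise_psd(prev_noise_psd, noisy_psd, spp, alpha_min=0.8):
    """Recursive averaging whose smoothing factor depends on the SPP.

    alpha approaches 1 (estimate frozen) when speech is likely and
    alpha_min (fast tracking) when the bin is probably noise only.
    """
    alpha = alpha_min + (1.0 - alpha_min) * spp
    return alpha * prev_noise_psd + (1.0 - alpha) * noisy_psd


def wiener_gain(speech_psd, noise_psd, spp, gain_floor=0.1):
    """Blend the Wiener gain with a fixed floor according to the SPP."""
    g = speech_psd / (speech_psd + noise_psd + EPS)
    return spp * g + (1.0 - spp) * gain_floor


# One frequency bin of one frame, with made-up PSD values.
noisy_primary, noisy_secondary, prev_noise = 10.0, 1.0, 1.0
spp = speech_presence_probability(noisy_primary, noisy_secondary)
noise = update_noise_psd(prev_noise, noisy_primary, spp)
speech = max(noisy_primary - noise, 0.0)  # placeholder speech PSD estimate
gain = wiener_gain(speech, noise, spp)
print(f"spp={spp:.3f} noise_psd={noise:.3f} gain={gain:.3f}")
```

In this sketch a large ratio between the two microphones pushes the probability toward 1, which freezes the noise estimate and lets the Wiener gain pass the bin; a ratio near 0 dB does the opposite and applies the gain floor.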
We recorded noisy signals on smart handheld devices in several noisy environments and compared the results of our algorithm with those of other algorithms. Our algorithm achieves better signal-to-noise ratio (SNR) and perceptual evaluation of speech quality (PESQ) scores than the other algorithms; that is, its output contains less noise and has higher quality. The coherence and speech intelligibility index (CSII) is the most important measure. Our algorithm also outperforms the other algorithms on CSII, so the person at the other end can hear the speech clearly. In addition, our algorithm has low complexity, which makes it well suited to smart handheld devices.
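As a reminder of what the SNR figure measures, the following minimal sketch computes a global SNR in decibels from a reference signal and a processed signal. It assumes a time-aligned clean reference is available and treats the sample-wise difference as residual noise; the function name `snr_db` and the sample values are hypothetical, and the thesis may compute SNR on its recordings differently.

```python
import math


def snr_db(clean, processed):
    """Global SNR in dB, treating (processed - clean) as residual noise."""
    signal_energy = sum(s * s for s in clean)
    noise_energy = sum((p - s) ** 2 for s, p in zip(clean, processed))
    if noise_energy == 0.0:
        return math.inf  # identical signals: no residual noise
    return 10.0 * math.log10(signal_energy / noise_energy)


# Made-up samples: the enhanced signal deviates less from the reference.
clean = [1.0, -1.0, 1.0, -1.0]
noisy = [1.5, -0.5, 1.5, -0.5]
enhanced = [1.1, -0.9, 1.1, -0.9]
improvement = snr_db(clean, enhanced) - snr_db(clean, noisy)
print(f"SNR improvement: {improvement:.2f} dB")
```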
On campus: publicly available from 2022-07-13