| Graduate student: | Tu, Bing-Hong (杜秉鴻) |
|---|---|
| Thesis title: | Speech Enhancement Algorithm Based on Modified Power Spectral Density Difference Applied to Dual Microphone Smart Handheld Devices (基於修改型功率頻譜密度差應用於雙麥克風行動裝置之語音增強演算法) |
| Advisor: | Lei, Sheau-Fang (雷曉方) |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Electrical Engineering |
| Year of publication: | 2018 |
| Academic year of graduation: | 106 |
| Language: | Chinese |
| Number of pages: | 100 |
| Keywords (translated from Chinese): | smart mobile devices, dual microphones, speech enhancement, noise suppression, power level difference, speech presence probability |
| Keywords (English): | Smart Handheld Devices, Dual Microphones, Speech Enhancement, Noise Reduction, Power Level Difference, Speech Present Probability |
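This thesis proposes a speech enhancement algorithm based on the characteristics of the dual microphones on mobile devices and on a modified power spectral density difference. Exploiting the placement of the two microphones on a smart handheld device, it gathers statistics of the modified power level difference (MPLD) between the power spectral densities of the two microphones, computes the speech presence probability from the distribution of the MPLD data, and then uses the speech presence probability to estimate the noise power spectral density dynamically. It also estimates the speech power spectral density. Voice activity detection (VAD) is performed with a few simple operations on the previously computed speech presence probability. If speech is judged absent in the current frame, the previously estimated noise power spectral density is subtracted from the power spectral density of the noisy input signal. If speech is judged present, the estimate follows one of three cases: (1) the speech energy holds steady, (2) the speech energy rises or falls sharply, or (3) the speech energy rises or falls gradually. Finally, a modified sigmoid function that incorporates the speech presence probability is used so that the gain function suppresses noise more effectively while preserving the speech signal. Recordings were made in noise environments commonly encountered with mobile devices. After processing with the proposed algorithm, the objective measures signal-to-noise ratio (SNR), coherence and speech intelligibility index (CSII), and perceptual evaluation of speech quality (PESQ) all improve considerably. This indicates that the output speech is clean, intelligible, and of good quality, so that speech sent from a smart handheld device in a noisy environment can be understood by the listener at the other end.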
This thesis presents a speech enhancement algorithm based on the modified power level difference (MPLD) for smart handheld devices with dual microphones. Exploiting the positions of the two microphones on such a device, we gather the MPLD from the power spectral densities of the two microphones. Instead of calculating the speech presence probability from the original parameter, we calculate it from the distribution of the gathered MPLD data. We then use the speech presence probability as a dynamic parameter to estimate the noise power spectral density. We also propose a method for estimating the speech power spectral density. First, voice activity detection (VAD), based on the speech presence probability and a few simple calculations, determines whether speech exists in the current frame. If speech does not exist, we use the subtraction method to improve accuracy. Otherwise, we estimate the speech power spectral density directly according to the speech condition. Finally, we propose a modified sigmoid function as the gain function, which suppresses the noise signal and preserves the speech signal effectively.
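The per-frame flow described above can be illustrated with a minimal Python sketch. It is not the thesis's formulation: the abstract gives no equations, so the normalized power level difference, the logistic mapping to a speech presence probability, the soft-decision recursive noise update, and the Wiener-style gain are all common textbook stand-ins, and the parameters `slope`, `midpoint`, `alpha`, and `floor` are hypothetical values chosen only for illustration.

```python
import math

EPS = 1e-12  # guards against division by zero


def normalized_pld(psd_primary, psd_secondary):
    """Per-bin normalized power level difference, in [-1, 1] (assumed form)."""
    return [(p1 - p2) / (p1 + p2 + EPS)
            for p1, p2 in zip(psd_primary, psd_secondary)]


def speech_presence_probability(pld, slope=10.0, midpoint=0.3):
    """Logistic mapping from PLD to a probability in (0, 1).

    slope and midpoint are hypothetical; the thesis derives its mapping
    from measured MPLD statistics instead.
    """
    return [1.0 / (1.0 + math.exp(-slope * (d - midpoint))) for d in pld]


def update_noise_psd(noise_prev, psd_primary, spp, alpha=0.9):
    """Recursive averaging whose smoothing factor grows with the SPP,
    so the noise estimate is nearly frozen where speech is likely."""
    updated = []
    for n_prev, p1, p in zip(noise_prev, psd_primary, spp):
        a = alpha + (1.0 - alpha) * p
        updated.append(a * n_prev + (1.0 - a) * p1)
    return updated


def gain(psd_primary, noise_psd, spp, floor=0.05):
    """Wiener-style gain weighted by the SPP, with a lower bound (assumed)."""
    gains = []
    for p1, n, p in zip(psd_primary, noise_psd, spp):
        speech = max(p1 - n, 0.0)  # spectral subtraction, clipped at zero
        wiener = speech / (speech + n + EPS)
        gains.append(max(p * wiener, floor))
    return gains


# Two frequency bins: the first carries speech at the primary microphone,
# the second carries only diffuse noise seen equally by both microphones.
primary = [4.0, 1.0]
secondary = [1.0, 1.0]
pld = normalized_pld(primary, secondary)
spp = speech_presence_probability(pld)
noise = update_noise_psd([1.0, 1.0], primary, spp)
g = gain(primary, noise, spp)
```

In this toy frame the speech-dominated bin keeps most of its energy while the noise-only bin is attenuated down to the gain floor, which is the qualitative behavior the abstract describes.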
We recorded noisy signals on smart handheld devices in different noise-corrupted environments and compared the results of the algorithms. The proposed algorithm outperforms the PLR algorithm in both signal-to-noise ratio (SNR) and perceptual evaluation of speech quality (PESQ); that is, its output contains less noise and has higher quality. The coherence and speech intelligibility index (CSII) is the most important index. The proposed algorithm also achieves a better CSII than the PLR algorithm, so the person on the other end can hear the speech clearly. Furthermore, the complexity of the proposed algorithm is lower than that of the PLR algorithm, which makes it well suited to smart handheld devices.
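As a small illustration of the first objective measure mentioned above, the following Python sketch computes a time-domain SNR in decibels. It assumes that a clean reference signal aligned sample by sample with the processed output is available. The abstract does not state how SNR was defined or measured for the real recordings, so this is a generic definition rather than the thesis's procedure, and `snr_db` is a name chosen here for illustration.

```python
import math


def snr_db(clean, processed):
    """SNR in dB, treating (clean - processed) as the residual noise."""
    signal_power = sum(s * s for s in clean)
    noise_power = sum((s - y) ** 2 for s, y in zip(clean, processed))
    if noise_power == 0.0:
        return math.inf   # processed output equals the reference
    if signal_power == 0.0:
        return -math.inf  # silent reference, only residual noise remains
    return 10.0 * math.log10(signal_power / noise_power)


# A residual with 1/100 of the signal power corresponds to about 20 dB.
clean = [1.0, -1.0, 1.0, -1.0]
processed = [1.1, -0.9, 1.1, -0.9]
value = snr_db(clean, processed)
```

PESQ and CSII rely on perceptual and band-importance models and are normally computed with dedicated reference implementations rather than a few lines of code.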
On-campus access: available from 2023-07-01