
Graduate student: 邱酩仁 (Chiu, Ming-Jen)
Thesis title: 一種語音增強方法應用於四旋翼無人機之音訊錄音
A Speech Enhancement Method for Audio Recording on Quadcopter
Advisor: 黃悅民 (Huang, Yueh-Min)
Degree: Master
Department: College of Engineering - Department of Engineering Science
Year of publication: 2016
Academic year of graduation: 104 (ROC calendar)
Language: Chinese
Number of pages: 51
Keywords (Chinese): 語音增強, 四旋翼無人機, 維納濾波器, 先驗訊噪比, 頻率權重區段訊噪比
Keywords (English): Speech Enhancement, Quadcopter, Wiener Filter, a priori SNR, Frequency-weight Segmental SNR
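  • A speech enhancement system aims to keep the noise component of the output target audio as small as possible, either to help extract features of the target signal or to reduce environmental noise that noticeably degrades it. The quadcopter is currently the most widely used type of drone, but most applications stop at carrying a camera for imaging, such as observing a scene through aerial video. Combining video with speech on a drone requires dealing with the environmental noise produced in flight; otherwise the target audio becomes muffled and unrecognizable.
    This study proposes a speech enhancement system built on a Wiener filter driven by the a priori signal-to-noise ratio (SNR). It removes additive stationary noise to improve the intelligibility of target audio recorded while the drone is flying. Processing first splits the signal into frames and applies the fast Fourier transform (FFT) to move each frame to the frequency domain. A noise model is then estimated and combined with the magnitude spectrum of the speech signal to obtain the a priori SNR, from which the filter's transfer function is derived and compensated. Finally, the inverse FFT returns each frame to the time domain, and the frames are overlap-added to reconstruct the final speech signal with the noise removed. Because the compensation coefficients differ from one noise environment to another, experiments are carried out to find the best coefficients for the noise a drone generates in flight.
    Besides running the algorithm in MATLAB, this study also implements it on a Raspberry Pi 3 board. The compensation coefficients are therefore chosen not only for the quality of the output speech signal but also with the Raspberry Pi 3 embedded environment in mind, giving the most suitable set of coefficients for this system. The results are verified with the frequency-weighted segmental SNR (fwSNRseg): the system works effectively, producing clear and recognizable speech, when the fwSNRseg of the input signal is above -10 dB.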
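    The processing chain described in this abstract (framing, FFT, noise estimation, a priori SNR, Wiener gain, inverse FFT, overlap-add) can be sketched roughly as follows. This is a minimal, dependency-free illustration and not the thesis code: it assumes a decision-directed a priori SNR estimate, a noise spectrum averaged over a few leading noise-only frames, and the plain gain ξ/(1+ξ). The names `wiener_enhance`, `frame_len`, `noise_frames` and `alpha` are hypothetical, and the thesis's compensation coefficients (λ, β1, β2) are not reproduced because their definitions are not given in this record.

```python
import cmath
import math


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalised); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def wiener_enhance(noisy, frame_len=256, noise_frames=4, alpha=0.98):
    """Denoise a list of samples with a decision-directed Wiener filter.

    Assumption: the first `noise_frames` frames contain noise only, and the
    noise is additive and stationary.
    """
    if len(noisy) < frame_len:
        raise ValueError("signal is shorter than one frame")
    hop = frame_len // 2
    # Periodic Hann window: at 50% overlap the shifted windows sum to 1.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_len)
              for i in range(frame_len)]
    starts = range(0, len(noisy) - frame_len + 1, hop)
    spectra = [fft([noisy[s + i] * window[i] for i in range(frame_len)])
               for s in starts]

    # Noise power spectrum averaged over the leading frames.
    lead = spectra[:noise_frames]
    noise_psd = [sum(abs(spec[k]) ** 2 for spec in lead) / len(lead) + 1e-12
                 for k in range(frame_len)]

    out = [0.0] * len(noisy)
    prev_clean_power = [0.0] * frame_len
    for s, spec in zip(starts, spectra):
        enhanced = []
        for k in range(frame_len):
            post_snr = abs(spec[k]) ** 2 / noise_psd[k]
            # Decision-directed estimate of the a priori SNR.
            prior_snr = (alpha * prev_clean_power[k] / noise_psd[k]
                         + (1.0 - alpha) * max(post_snr - 1.0, 0.0))
            gain = prior_snr / (1.0 + prior_snr)  # Wiener gain
            enhanced.append(gain * spec[k])
            prev_clean_power[k] = abs(enhanced[k]) ** 2
        frame = fft(enhanced, inverse=True)
        # Overlap-add the inverse-transformed frame (1/N normalisation here).
        for i in range(frame_len):
            out[s + i] += frame[i].real / frame_len
    return out
```

    Calling `wiener_enhance(samples)` on a list of floats returns a list of the same length. On real recordings an optimised FFT library would normally replace the hand-written transform, and the noise estimate would need to track the actual rotor noise rather than rely on a noise-only lead-in.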

    The purpose of speech enhancement is to have the system output the target signal with as little noise as possible, which helps in recognizing the target signal, for example in speech recognition. The quadcopter is currently the most popular kind of drone, but most quadcopter applications only use a camera to capture video without audio. The noise produced while the quadcopter is flying must be reduced; otherwise the target signal may be hard to recognize.
    This thesis proposes a speech enhancement system that uses a Wiener filter based on the a priori signal-to-noise ratio (SNR) to reduce additive stationary noise, helping to recognize the target audio signal while the quadcopter is flying. First, the signal is decomposed into frames and the Fast Fourier Transform (FFT) is applied. Then, the coefficient of the transfer function is computed from the a priori SNR and compensated. Finally, the inverse FFT converts the spectrum back to the time domain and the signal is reconstructed to obtain the noise-free signal.
    In this thesis, the algorithm is not only simulated in MATLAB but also implemented on a Raspberry Pi 3. Both the quality of the output signal and the constraints of an embedded environment such as the Raspberry Pi 3 must be considered when choosing the ideal compensation coefficient for this speech enhancement system.
    Finally, the results are verified using the frequency-weighted segmental SNR (fwSNRseg). The speech enhancement system in this thesis produces an effective output signal when the fwSNRseg of the input signal is higher than -10 dB.
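    To make the evaluation metric concrete, the following is a simplified sketch of a frequency-weighted segmental SNR. It is an assumption-laden illustration, not the thesis's measurement code: it weights each DFT bin by the clean magnitude raised to `gamma` (0.2 is a commonly used value), works per bin without critical-band grouping, and applies no clamping of per-band SNR values. The articulation-index weighting listed in the thesis's tables is not reproduced here, and the names `fw_snr_seg` and `magnitude_spectrum` are hypothetical.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..N/2 of a real frame (plain O(N^2) DFT)."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def fw_snr_seg(clean, processed, frame_len=64, gamma=0.2, eps=1e-10):
    """Simplified frequency-weighted segmental SNR in dB, averaged over frames."""
    hop = frame_len // 2
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_len)
              for i in range(frame_len)]
    length = min(len(clean), len(processed))
    scores = []
    for s in range(0, length - frame_len + 1, hop):
        ref = magnitude_spectrum(
            [clean[s + i] * window[i] for i in range(frame_len)])
        deg = magnitude_spectrum(
            [processed[s + i] * window[i] for i in range(frame_len)])
        weights = [r ** gamma for r in ref]
        total = sum(weights)
        if total <= eps:
            continue  # skip frames where the clean signal is silent
        # Per-bin SNR: clean power over the squared magnitude difference.
        snr_db = [10.0 * math.log10((r * r + eps) / ((r - d) ** 2 + eps))
                  for r, d in zip(ref, deg)]
        scores.append(sum(w * v for w, v in zip(weights, snr_db)) / total)
    if not scores:
        raise ValueError("no usable frames")
    return sum(scores) / len(scores)
```

    A higher value means the processed signal is spectrally closer to the clean reference, so comparing `fw_snr_seg(clean, noisy)` with `fw_snr_seg(clean, enhanced)` gives a rough indication of how much an enhancement step helped.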

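    Abstract (Chinese)
    Extended Abstract
    Acknowledgements
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1. Introduction
      1-1 Research Motivation and Background
      1-2 Research Objectives
      1-3 Chapter Organization
    Chapter 2. Literature Review
      2-1 Fast Fourier Transform (FFT)
      2-2 Evaluation Criteria for Signal Processing
      2-3 Speech Denoising Methods
      2-4 Speech Signal Processing Flow
    Chapter 3. System Design
      3-1 Overall System Flow
      3-2 Noise Cancellation Formulas and Procedure
      3-3 Verification of the Speech Enhancement Method in the Drone Flight Environment
    Chapter 4. Simulation Experiments and Real-Environment Experiments
      4-1 Signal Processing Evaluation Criteria Used in This System
      4-2 Simulation Experiments
        4-2-1 Experimental Environment
        4-2-2 Results and Analysis
      4-3 Implementation of the Algorithm on a Drone System
        4-3-1 Implementation Environment
        4-3-2 Implementation Results and Analysis
    Chapter 5. Conclusions and Future Work
      5-1 Conclusions
      5-2 Future Work
    References

    Table 2-1. Comparison of signal processing evaluation criteria
    Table 3-1. Comparison of energy amplitude values
    Table 4-1. Articulation index
    Table 4-2. Specifications of the self-built drone
    Table 4-3. Output comparison for input signals with different fwSNRseg (simulated in MATLAB)
    Table 4-4. Specifications of the self-assembled drone
    Table 4-5. Raspberry Pi 3 specifications
    Table 4-6. asound.rc syntax
    Table 4-7. Output comparison for input signals with different fwSNRseg

    Figure 2-1. Illustration of noise waveform cancellation [10]
    Figure 2-2. Illustration of the filter transfer function
    Figure 2-3. Illustration of a microphone array [18]
    Figure 2-4. System flowchart proposed by Steven Boll [13]
    Figure 2-5. Illustration of signal overlap-add [13]
    Figure 3-1. Overall system flowchart
    Figure 3-2. Wiener filter flowchart
    Figure 3-3. Time-frequency plot of the input signal
    Figure 3-4. Time-frequency plot after the inverse FFT without a window function
    Figure 3-5. Time-frequency plot after the inverse FFT with a window function
    Figure 3-6. Partial time-frequency plot after the inverse FFT without a window function
    Figure 3-7. Noise from the rotating quadcopter motors and its spectrum
    Figure 3-8. Human voice signal and its spectrum
    Figure 4-1. Illustration of segmental SNR
    Figure 4-2. Distribution of the articulation index
    Figure 4-3. Self-built drone
    Figure 4-4. Frequency-weighted segmental SNR values
    Figure 4-5. Original studio singing signal
    Figure 4-6. Studio singing mixed with noise
    Figure 4-7. Comparison of the enhanced signal and the original signal (λ=2, β1=1, β2=1)
    Figure 4-8. Comparison of the enhanced signal and the original signal (λ=16.9, β1=1, β2=1)
    Figure 4-9. Time-frequency plot of the original studio singing
    Figure 4-10. Time-frequency plot of studio singing mixed with noise
    Figure 4-11. Time-frequency plot after speech enhancement (λ=2, β1=1, β2=1)
    Figure 4-12. Signal plot after speech enhancement (λ=16.9, β1=1, β2=1)
    Figure 4-13. Overall comparison of original and output signals (-20 dB to 5 dB)
    Figure 4-14. Implementation on the self-assembled drone
    Figure 4-15. Raspberry Pi audio input configuration
    Figure 4-16. Python library installation commands
    Figure 4-17. Flight-environment noise of the self-assembled quadcopter and its spectrum
    Figure 4-18. Overall comparison of speech enhancement system input and output (-20 dB to 0 dB)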

    [1] "無人機專題報導" [Drone special report], Mms.digitimes.com, 2016. [Online]. Available: http://mms.digitimes.com/tw/indepth/2015_drone/index.html. [Accessed: 28- Apr- 2016].
    [2] "Discrete Fourier transform", Wikipedia, 2016. [Online]. Available: https://en.wikipedia.org/wiki/Discrete_Fourier_transform. [Accessed: 28- Apr- 2016].
    [3] "Cooley–Tukey FFT algorithm", Wikipedia, 2016. [Online]. Available: https://en.wikipedia.org/wiki/Cooley%E2%80%93Tukey_FFT_algorithm. [Accessed: 28- Apr- 2016].
    [4] "Signal-to-noise ratio", Wikipedia, 2016. [Online]. Available: https://en.wikipedia.org/wiki/Signal-to-noise_ratio. [Accessed: 28- Apr- 2016].
    [5] Y. Ephraim and D. Malah, "Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 32, pp. 1109-1121, 1984.
    [6] N. Bassiou, C. Kotropoulos, and I. Pitas, "Greek folk music denoising under a symmetric α-stable noise assumption," in Heterogeneous Networking for Quality, Reliability, Security and Robustness (QShine), 2014 10th International Conference on, 2014, pp. 18-23.
    [7] S. M. Kuo and D. R. Morgan, "Active noise control: a tutorial review," Proceedings of the IEEE, vol. 87, pp. 943-973, 1999.
    [8] S. Rangachari and P. C. Loizou, "A noise-estimation algorithm for highly non-stationary environments," Speech Communication, vol. 48, pp. 220-231, 2006.
    [9] J. Kang, D. G. Lee, and D. Choi, "Convolutive Noise Filtering in Power Analysis on Smartcards Using the Cepstrum," in 2009 Fourth International Conference on Embedded and Multimedia Computing, 2009, pp. 1-4.
    [10] H. Gether, "Active noise cancellation: Trends, concepts, and technical challenges", EDN, 2013. [Online]. Available: http://www.edn.com/design/consumer/4422370/Active-noise-cancellation--Trends--concepts--and-technical-challenges. [Accessed: 28- Apr- 2016].
    [11] P. Lueg, "Process of silencing sound oscillations," U.S. Patent 2043416, June 9, 1936.
    [12] C. Schremmer, T. Haenselmann, and F. Bomers, "A wavelet based audio denoiser," in Proc. IEEE International Conference on Multimedia and Expo, 2001, pp. 145-148.
    [13] S. Boll, "Suppression of acoustic noise in speech using spectral subtraction," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 27, pp. 113-120, 1979.
    [14] M. Berouti, R. Schwartz, and J. Makhoul, "Enhancement of speech corrupted by acoustic noise," in Acoustics, Speech, and Signal Processing, IEEE International Conference on ICASSP '79., 1979, pp. 208-211.
    [15] J. S. Lim and A. V. Oppenheim, "Enhancement and bandwidth compression of noisy speech," Proceedings of the IEEE, vol. 67, pp. 1586-1604, 1979.
    [16] C. Plapous, C. Marro, and P. Scalart, "Improved Signal-to-Noise Ratio Estimation for Speech Enhancement," IEEE Transactions on Audio, Speech, and Language Processing, vol. 14, pp. 2098-2108, 2006.
    [17] G. Yu, S. Mallat, and E. Bacry, "Audio Denoising by Time-Frequency Block Thresholding," IEEE Transactions on Signal Processing, vol. 56, pp. 1830-1839, 2008.
    [18] S. Doclo and M. Moonen, "GSVD-based optimal filtering for single and multimicrophone speech enhancement," IEEE Transactions on Signal Processing, vol. 50, pp. 2230-2244, 2002.
    [19] R. L. Bouquin and G. Faucon, "Using the coherence function for noise reduction," IEE Proceedings I - Communications, Speech and Vision, vol. 139, pp. 276-280, 1992.
    [20] M. Baker and D. Logue, "A comparison of three noise reduction procedures applied to bird vocal signals", J Field Ornithology, vol. 78, no. 3, pp. 240-253, 2007.
    [21] "Asoundrc - AlsaProject", Alsa-project.org, 2016. [Online]. Available: http://www.alsa-project.org/main/index.php/Asoundrc. [Accessed: 15- May- 2016].
    [22] "ALSA project - the C library reference: PCM (digital audio) plugins", Alsa-project.org, 2016. [Online]. Available: http://www.alsa-project.org/alsa-doc/alsa-lib/pcm_plugins.html. [Accessed: 16- May- 2016].
    [23] "Advanced Linux Sound Architecture - ArchWiki", Wiki.archlinux.org, 2016. [Online]. Available: https://wiki.archlinux.org/index.php/Advanced_Linux_Sound_Architecture. [Accessed: 16- May- 2016].
    [24] I. Cohen and B. Berdugo, "Speech enhancement for non-stationary noise environments," Signal Processing, vol. 81, pp. 2403-2418, 2001.
    [25] I. Cohen, "Relaxed statistical model for speech enhancement and a priori SNR estimation", IEEE Transactions on Speech and Audio Processing, vol. 13, no. 5, pp. 870-881, 2005.
    [26] S. Quackenbush, T. Barnwell and M. Clements, Objective measures of speech quality. Englewood Cliffs, N.J.: Prentice Hall, 1988.
    [27] J. Tribolet, P. Noll, B. McDermott, and R. Crochiere, "A study of complexity and quality of speech waveform coders," in Acoustics, Speech, and Signal Processing, IEEE International Conference on ICASSP '78., 1978, pp. 586-590.
    [28] K. Taira and K. Kondo, "Estimation of binaural intelligibility using the frequency-weighted segmental SNR of stereo channel signals," in 2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2015, pp. 101-104.
    [29] "Articulation Index - DiracDelta Science & Engineering Encyclopedia", Diracdelta.co.uk, 2016. [Online]. Available: http://www.diracdelta.co.uk/science/source/a/r/articulation%20index/source.html#.V3YnQfl965u. [Accessed: 01- Jul- 2016].
    [30] "Raspberry Pi", Wikipedia, 2016. [Online]. Available: https://en.wikipedia.org/wiki/Raspberry_Pi. [Accessed: 06- Jul- 2016].
