| Graduate student | 邱酩仁 Chiu, Ming-Jen |
|---|---|
| Thesis title | 一種語音增強方法應用於四旋翼無人機之音訊錄音 (A Speech Enhancement Method for Audio Recording on Quadcopter) |
| Advisor | 黃悅民 Huang, Yueh-Min |
| Degree | Master |
| Department | College of Engineering, Department of Engineering Science |
| Year of publication | 2016 |
| Academic year of graduation | 104 (ROC academic year, i.e., 2015-2016) |
| Language | Chinese |
| Pages | 51 |
| Keywords | Speech Enhancement, Quadcopter, Wiener Filter, a priori SNR, Frequency-weighted Segmental SNR |
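The goal of a speech enhancement system is to keep the noise component of the target audio as small as possible at the output. This helps extract features of the target audio, or reduces environmental noise that noticeably degrades it. Quadcopters are currently the most widely used type of drone, but most applications only carry a camera for imaging, for example aerial video recording to observe a scene. Combining video and speech on a drone requires dealing with the noise the drone produces in flight; otherwise the target audio becomes muddy and unintelligible.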
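This study proposes a speech enhancement system based on a Wiener filter driven by the a priori signal-to-noise ratio (SNR). The system removes additive stationary noise to improve the intelligibility of target audio recorded while a drone is flying. Processing proceeds as follows: the signal is split into frames; each frame is transformed to the frequency domain with the fast Fourier transform (FFT); a noise model is estimated and combined with the magnitude spectrum of the speech to obtain the a priori SNR; the filter's transfer function is computed from it and then compensated; finally, the inverse FFT returns each frame to the time domain, and the frames are overlap-added to reconstruct the enhanced (denoised) speech signal. Because the best compensation coefficient differs between noise environments, experiments are conducted to find the optimal coefficient for the noise produced by a flying drone.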
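The core step, turning a noise estimate into an a priori SNR and then into a Wiener gain, can be sketched per frame in plain Python. This sketch assumes the decision-directed a priori SNR estimator of Ephraim and Malah with the commonly used smoothing factor α = 0.98 and an assumed -25 dB floor; the thesis's exact estimator and its compensation coefficient are not defined in this abstract, so compensation is not modelled. `wiener_gains` is a hypothetical helper name.

```python
from typing import List, Sequence, Tuple


def wiener_gains(
    noisy_power: Sequence[float],       # |Y(k,l)|^2 of the current frame
    noise_power: Sequence[float],       # lambda_D(k): estimated noise power per bin
    prev_clean_power: Sequence[float],  # |S_hat(k,l-1)|^2 from the previous frame
    alpha: float = 0.98,                # decision-directed smoothing factor
    xi_min: float = 10 ** (-25 / 10),   # assumed floor on the a priori SNR (-25 dB)
) -> Tuple[List[float], List[float]]:
    """Return per-bin Wiener gains and the clean-power estimate for one frame.

    Sketch only: the thesis's compensation step is not modelled here.
    """
    gains: List[float] = []
    clean_power: List[float] = []
    for y2, d2, s2_prev in zip(noisy_power, noise_power, prev_clean_power):
        d2 = max(d2, 1e-12)                  # guard against a zero noise estimate
        gamma = y2 / d2                      # a posteriori SNR
        xi = alpha * s2_prev / d2 + (1.0 - alpha) * max(gamma - 1.0, 0.0)
        xi = max(xi, xi_min)                 # decision-directed a priori SNR, floored
        g = xi / (1.0 + xi)                  # Wiener gain H = xi / (1 + xi)
        gains.append(g)
        clean_power.append(g * g * y2)       # |S_hat|^2 = G^2 |Y|^2, reused next frame
    return gains, clean_power
```

The returned `clean_power` is passed back as `prev_clean_power` for the next frame; starting the first frame from a zero vector makes the estimate begin conservatively (strong suppression until speech energy has been observed).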
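Besides running the algorithm in MATLAB, this study also implements it on a Raspberry Pi 3 development board. The compensation coefficient was therefore chosen based on both the quality of the output speech and the constraints of the Raspberry Pi 3 embedded environment, yielding the most suitable coefficient for this system. Finally, the results are verified with the frequency-weighted segmental SNR (fwSNRseg): the system effectively produces clear, intelligible speech when the fwSNRseg of the input signal is above -10 dB.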
The purpose of speech enhancement is to make the system output the target signal with as little noise as possible, which helps downstream tasks such as speech recognition. The quadcopter is currently the most popular type of drone, but quadcopter applications mostly use a camera to capture video without audio. Noise must be reduced while the quadcopter is flying; otherwise the target signal may be hard to recognize.
This thesis proposes a speech enhancement system that uses a Wiener filter based on the a priori signal-to-noise ratio (SNR) to reduce additive stationary noise, making the target audio recorded during quadcopter flight easier to recognize. First, the signal is decomposed into frames and each frame is transformed with the Fast Fourier Transform (FFT). Then, the coefficients of the transfer function are computed from the a priori SNR and compensated. Finally, the inverse FFT converts each spectrum back to the time domain, and the frames are overlap-added to reconstruct the denoised signal.
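The analysis-synthesis path described above (framing, FFT, per-bin gain, inverse FFT, overlap-add) can be sketched in plain Python. Assumptions not taken from the thesis: a periodic Hann analysis window with 50% overlap (whose shifted copies sum to one, so unit gain reconstructs the input), a 512-sample default frame length, a textbook radix-2 Cooley-Tukey FFT, and a pluggable `gain_fn` (hypothetical name) that maps one frame's power spectrum to per-bin gains. This is not the thesis's MATLAB or Raspberry Pi 3 code.

```python
import cmath
import math
from typing import Callable, List, Sequence


def fft(x: Sequence[complex]) -> List[complex]:
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(spec: Sequence[complex]) -> List[complex]:
    """Inverse FFT via ifft(X) = conj(fft(conj(X))) / N."""
    n = len(spec)
    return [v.conjugate() / n for v in fft([v.conjugate() for v in spec])]


def enhance(
    signal: Sequence[float],
    gain_fn: Callable[[List[float]], Sequence[float]],  # bins 0..N/2 -> gains
    frame_len: int = 512,
) -> List[float]:
    """Frame, FFT, apply per-bin gains, inverse FFT, and overlap-add."""
    if frame_len < 2 or frame_len & (frame_len - 1):
        raise ValueError("frame_len must be a power of two")
    hop = frame_len // 2
    half = frame_len // 2 + 1
    # Periodic Hann window: copies shifted by N/2 sum to exactly 1 (COLA).
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    # Pad so every original sample is covered by two overlapping frames.
    padded = [0.0] * hop + list(signal) + [0.0] * frame_len
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - frame_len + 1, hop):
        spec = fft([padded[start + n] * window[n] for n in range(frame_len)])
        gains = gain_fn([abs(v) ** 2 for v in spec[:half]])
        for k in range(frame_len):
            # Mirror gains onto negative frequencies so the output stays real.
            spec[k] *= gains[k] if k < half else gains[frame_len - k]
        for n, v in enumerate(ifft(spec)):
            out[start + n] += v.real
    return out[hop:hop + len(signal)]
```

To plug in a decision-directed Wiener gain, which needs the previous frame's clean-speech estimate, `gain_fn` can be a closure that stores that state between calls.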
In this thesis, the algorithm is not only simulated in MATLAB but also implemented on a Raspberry Pi 3. Choosing the ideal compensation coefficient for this speech enhancement system therefore requires considering both the quality of the output signal and the constraints of an embedded environment such as the Raspberry Pi 3.
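The embedded constraint can be made concrete: in a streaming implementation, each hop of new samples must be processed before the next hop arrives. The helpers below are a sketch under assumed parameters (the sample rate and hop size are not given in this abstract); `hop_budget_ms`, `real_time_factor`, and the `process_hop` callable are hypothetical names. At 16 kHz with a 256-sample hop, for example, the budget is 1000 × 256 / 16000 = 16 ms per hop.

```python
import time
from typing import Callable


def hop_budget_ms(sample_rate: int, hop: int) -> float:
    """Wall-clock time (ms) available to process one hop in a streaming loop."""
    return 1000.0 * hop / sample_rate


def real_time_factor(
    process_hop: Callable[[], None],  # hypothetical: one iteration of the enhancement loop
    sample_rate: int,
    hop: int,
    repeats: int = 50,
) -> float:
    """Average processing time per hop divided by the hop duration.

    A value below 1.0 means the loop keeps up with the incoming audio.
    """
    t0 = time.perf_counter()
    for _ in range(repeats):
        process_hop()
    elapsed_ms = 1000.0 * (time.perf_counter() - t0) / repeats
    return elapsed_ms / hop_budget_ms(sample_rate, hop)
```

The factor is only meaningful when measured on the target board itself; a configuration that runs comfortably on a desktop can exceed 1.0 on an embedded CPU.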
Finally, the results are verified with the frequency-weighted segmental SNR (fwSNRseg). The proposed system outputs an effective signal when the fwSNRseg of the input signal is higher than -10 dB.
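The verification metric can be sketched as follows, assuming the commonly used definition of the frequency-weighted segmental SNR: for frame m and band j, the clean band magnitude X(j,m) and the processed magnitude X̂(j,m) give a per-band SNR of 10·log10(X² / (X - X̂)²), clipped to a range (here [-10, 35] dB) and weighted by W = X^γ with γ = 0.2; the weighted values are averaged within each frame and then across frames. The band decomposition (critical bands in most toolboxes) is assumed to be done beforehand, and `fw_snr_seg` is a hypothetical name. Toolboxes differ in band layout, weighting, and normalization, so this sketch's numbers are not directly comparable to the thesis's -10 dB figure.

```python
import math
from typing import Sequence


def fw_snr_seg(
    clean_bands: Sequence[Sequence[float]],      # X(j, m): clean band magnitudes per frame
    processed_bands: Sequence[Sequence[float]],  # X_hat(j, m): noisy or enhanced magnitudes
    gamma: float = 0.2,                          # weight exponent, W = X^gamma
    lo: float = -10.0,                           # per-band SNR clipping floor (dB)
    hi: float = 35.0,                            # per-band SNR clipping ceiling (dB)
) -> float:
    """Frequency-weighted segmental SNR in dB (sketch of the common definition)."""
    if not clean_bands or len(clean_bands) != len(processed_bands):
        raise ValueError("need the same, non-zero number of frames")
    eps = 1e-12
    frame_scores = []
    for clean, proc in zip(clean_bands, processed_bands):
        num = den = 0.0
        for x, x_hat in zip(clean, proc):
            x = max(x, eps)
            w = x ** gamma                          # band weight from the clean spectrum
            d = x - x_hat
            err = max(d * d, eps)                   # squared band error, floored
            snr = 10.0 * math.log10(x * x / err)    # per-band SNR in dB
            num += w * min(max(snr, lo), hi)        # clip to [lo, hi], then weight
            den += w
        frame_scores.append(num / den)              # weighted mean within the frame
    return sum(frame_scores) / len(frame_scores)    # mean across frames
```

Because every per-band value is clipped at `lo`, the score cannot fall below `lo`; the clipping range is therefore exposed as parameters rather than fixed, since the abstract does not state which variant was used.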