| Author: | 顏姿宜 Yen, Zih-Yi |
|---|---|
| Title: | 音源定位及分離之陪伴機器人 Sound Source Localization and Separation for a Companion Robot |
| Advisor: | 周榮華 Chou, Jung-Hua |
| Degree: | Master |
| Department: | Department of Engineering Science, College of Engineering |
| Year of publication: | 2019 |
| Graduation academic year: | 107 |
| Language: | Chinese |
| Pages: | 57 |
| Keywords (Chinese): | 陪伴機器人、音源定位、音訊分離、麥克風陣列 |
| Keywords (English): | companion robot, sound source localization, sound source separation, microphone array |
The information obtained from sound source localization and separation can give a robot additional capabilities. Taking companion robots and social robots as examples, the former can, when serving elderly or young users, identify the individual commands of the people around it and carry out the corresponding functions, while the latter can avoid talking at cross purposes in social settings.
This thesis addresses these needs of a companion robot. Sound is captured with a microphone array; the MUSIC (MUltiple SIgnal Classification) algorithm first localizes multiple sound sources, the localization result is then used to pick out the appropriate source information, and the GCC-NMF (Generalized Cross Correlation - Non-Negative Matrix Factorization) algorithm completes the audio separation. The final goal is to present the direction and audio content of each source for subsequent analysis.
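The localization stage rests on MUSIC's noise-subspace scan: eigendecompose the array covariance, take the eigenvectors of the smallest eigenvalues as the noise subspace, and search for the steering vector most orthogonal to it. As a rough illustration only (not the thesis implementation; the uniform linear geometry, mic spacing, source frequency, and angle below are invented for the sketch), a narrowband MUSIC estimate for a 4-microphone array can be written as:

```python
import numpy as np

c = 343.0          # speed of sound (m/s)
f = 1000.0         # narrowband source frequency (Hz), assumed
d = 0.04           # mic spacing (m), below half a wavelength
M = 4              # number of microphones
true_angle = 30.0  # degrees from broadside, for the simulation

def steering(theta_deg):
    """Steering vector of the ULA for a far-field plane wave from theta."""
    theta = np.deg2rad(theta_deg)
    delays = d * np.arange(M) * np.sin(theta) / c
    return np.exp(-2j * np.pi * f * delays)

# Simulate snapshots of one source plus sensor noise.
rng = np.random.default_rng(0)
N = 500
s = rng.standard_normal(N) + 1j * rng.standard_normal(N)
X = np.outer(steering(true_angle), s)
X += 0.1 * (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N)))

# Covariance matrix and its noise subspace (smallest eigenvectors).
R = X @ X.conj().T / N
w, V = np.linalg.eigh(R)   # eigenvalues in ascending order
En = V[:, : M - 1]         # noise subspace, assuming one source

# Scan the MUSIC pseudo-spectrum over candidate angles.
angles = np.arange(-90, 91)
p = [1.0 / np.linalg.norm(En.conj().T @ steering(a)) ** 2 for a in angles]
est = int(angles[int(np.argmax(p))])
print(est)  # should be close to 30
```

The same subspace scan generalizes to circular arrays such as the one used in the thesis by changing the steering-vector model.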
The experimental environment was an indoor space with 45-55 dB of background noise. Sounds played by a mobile phone and a Bluetooth speaker, together with human voices, served as test signals, with the volume controlled at 65-75 dB (the level of normal speech). Since the companion robot is assumed to operate in a small household and the chosen microphone array has four microphones, this thesis focuses on localizing and separating no more than three sources.
According to the experimental results, the system can localize sources within 1.5 m at an angular error of roughly ±3 degrees, with a computation time of about 0.45 s, a large reduction compared with conventional beamforming. Adding a microphone-channel selection step before separation clearly improves the separation quality, whether judged from chart analysis or from actually listening to the audio files.
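The "GCC" half of GCC-NMF builds on PHAT-weighted generalized cross-correlation between two channels, which whitens the cross-spectrum so only phase (i.e., delay) information remains. A generic GCC-PHAT delay estimator, sketched here with made-up signals and sampling rate (this is not the thesis code), looks like:

```python
import numpy as np

def gcc_phat(x, y, fs):
    """Estimate the delay of y relative to x via PHAT-weighted correlation."""
    n = len(x) + len(y)            # zero-pad to avoid circular wrap-around
    X = np.fft.rfft(x, n)
    Y = np.fft.rfft(y, n)
    G = Y * np.conj(X)
    G /= np.abs(G) + 1e-12         # PHAT weighting: keep only the phase
    cc = np.fft.irfft(G, n)
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs              # positive: y lags x

fs = 16000
rng = np.random.default_rng(1)
s = rng.standard_normal(4096)
y = np.concatenate((np.zeros(5), s))[: len(s)]  # s delayed by 5 samples
tau = gcc_phat(s, y, fs)
print(tau * fs)  # delay in samples, should be close to 5
```

GCC-NMF then clusters NMF dictionary atoms by which candidate delay maximizes this correlation, assigning each atom to a source.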
Recently, most companion robots have been designed to interact with people through vision and sound. In this thesis, the author adds a sound source recognition system, based on a microphone array, to an existing facial-expression recognition robot. The system consists of two parts: sound source localization and sound source separation. The former is achieved with the MUSIC (MUltiple SIgnal Classification) algorithm, which estimates the angle of each sound source, whereas the latter uses the GCC-NMF (Generalized Cross Correlation - Non-Negative Matrix Factorization) algorithm to separate the different sources. To improve the separation accuracy after localization, the author selects appropriate microphone channels according to the sound directionality before separation, which enhances the separation results.
Since the companion robot is intended to serve small families, the main goal of this study is to handle two to three sound sources with background noise levels in the range of about 45 to 55 dB. The results show that the MUSIC algorithm estimates the target sources accurately and needs less computation time than conventional methods such as beamforming. As for separation, whether judged by listening to the audio files directly or by spectrogram analysis, the channel-selection step significantly improved the results.
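The abstract does not spell out the channel-selection rule, so the following is purely a hypothetical illustration of the idea: given DOA estimates for two sources and a square 4-microphone array (geometry and radius invented here), choose the microphone pair whose axis maximizes the TDOA difference between the two sources, so that the downstream GCC-based separation sees them as far apart as possible.

```python
import numpy as np

# Hypothetical 4-mic square array, 3 cm radius (not the thesis hardware).
mic_angles = np.deg2rad([0, 90, 180, 270])
mics = 0.03 * np.c_[np.cos(mic_angles), np.sin(mic_angles)]

def tdoa(pair, theta_deg, c=343.0):
    """Far-field TDOA at the given mic pair for a source at theta."""
    t = np.deg2rad(theta_deg)
    u = np.array([np.cos(t), np.sin(t)])  # unit vector toward the source
    i, j = pair
    return (mics[i] - mics[j]) @ u / c

def best_pair(theta1, theta2):
    """Pick the pair with the largest TDOA gap between the two sources."""
    pairs = [(i, j) for i in range(4) for j in range(i + 1, 4)]
    return max(pairs, key=lambda p: abs(tdoa(p, theta1) - tdoa(p, theta2)))

pair = best_pair(30.0, 150.0)
print(pair)  # the pair on the x-axis separates these two sources best
```

For sources at 30° and 150°, the mics at 0° and 180° lie along the direction in which the two steering vectors differ most, so that pair is selected.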
[1] The robot invasion arrived at CES 2019 — and it was cuter than we expected. https://www.digitaltrends.com/home/cutest-companion-robots-ces-2019/. Accessed June 2019.
[2] The robots of CES 2019. https://www.androidauthority.com/new-robots-942422/. Accessed June 2019.
[3] Zoetic AI Kiki. https://www.zoeticai.com/. Accessed June 2019.
[4] GROOVE X LOVOT. https://lovot.life/en/. Accessed June 2019.
[5] Stanley Black & Decker Pria. https://www.okpria.com/. Accessed June 2019.
[6] Okuno, H. G., & Nakadai, K. (2015, April). Robot audition: Its rise and perspectives. In 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5610-5614, IEEE.
[7] EARS. https://robot-ears.eu/. Accessed June 2019.
[8] Nakadai, K., Okuno, H. G., Takahashi, T., Nakamura, K., Mizumoto, T., Yoshida, T., ... & Ince, G. (2011, September). Introduction to open source robot audition software hark. In The 29th annual conference of the robotics society of Japan. Robotics Society of Japan.
[9] Elkachouchi, H., & Elsalam Mofeed, M. A. (2005, March). Direction-of-arrival methods (DOA) and time difference of arrival (TDOA) position location technique. In Proceedings of the Twenty-Second National Radio Science Conference, 2005. NRSC 2005, pp. 173-182, IEEE.
[10] Tuma, J., Janecka, P., Vala, M., & Richter, L. (2012, May). Sound source localization. In Proceedings of the 13th International Carpathian Control Conference (ICCC), pp. 740-743, IEEE.
[11] Kim, S., On, B., Im, S., & Kim, S. (2017, February). Performance comparison of FFT-based and GCC-PHAT time delay estimation schemes for target azimuth angle estimation in a passive SONAR array. In 2017 IEEE Underwater Technology (UT), pp. 1-4, IEEE.
[12] Yue, X., Qu, G., Liu, B., & Liu, A. (2018, September). Detection sound source direction in 3D space using convolutional neural networks. In 2018 First International Conference on Artificial Intelligence for Industries (AI4I), pp. 81-84, IEEE.
[13] 劉子維. (2019). Design and implementation of a sound source localization mechanism using a microphone array (in Chinese). Degree thesis, Department of Engineering Science, National Cheng Kung University.
[14] Baig, N. A., & Malik, M. B. (2013). Comparison of direction of arrival (DOA) estimation techniques for closely spaced targets. International Journal of Future Computer and Communication, 2(6), 654.
[15] Lavate, T. B., Kokate, V. K., & Sapkal, A. M. (2010, April). Performance analysis of MUSIC and ESPRIT DOA estimation algorithms for adaptive array smart antenna in mobile communication. In 2010 Second International Conference on Computer and Network Technology, pp. 308-311, IEEE.
[16] Boll, S. (1979). Suppression of acoustic noise in speech using spectral subtraction. IEEE Transactions on Acoustics, Speech, and Signal Processing, 27(2), pp. 113-120.
[17] Ephraim, Y., & Malah, D. (1984). Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator. IEEE Transactions on Acoustics, Speech, and Signal Processing, 32(6), pp. 1109-1121.
[18] Vuvuzela sound denoising algorithm. https://www.mathworks.com/matlabcentral/fileexchange/27912-vuvuzela-sound-denoising-algorithm. Accessed June 2019.
[19] Huang, P. S., Kim, M., Hasegawa-Johnson, M., & Smaragdis, P. (2014, October). Singing-voice separation from monaural recordings using deep recurrent neural networks. In ISMIR, pp. 477-482.
[20] Aytar, Y., Vondrick, C., & Torralba, A. (2016). Soundnet: Learning sound representations from unlabeled video. In Advances in Neural Information Processing Systems, pp. 892-900, NIPS 2016.
[21] SoundNet. https://www.youtube.com/watch?v=yJCjVvIY4dU. Accessed June 2019.
[22] Schmidt, R. (1986). Multiple emitter location and signal parameter estimation. IEEE Transactions on Antennas and Propagation, 34(3), pp. 276-280.
[23] Tang, H. (2014). DOA estimation based on MUSIC algorithm. [Online]. Available: https://pdfs.semanticscholar.org/5ff7/806b44e60d41c21429e1ad2755d72bba41d7.pdf. Accessed June 2019.
[24] Wood, S. U., Rouat, J., Dupont, S., & Pironkov, G. (2017). Blind speech separation and enhancement with GCC-NMF. IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), 25(4), pp. 745-755.
[25] SiSEC dataset. [Online]. Available: https://sisec.inria.fr/. Accessed June 2019.
[26] Emiya, V., Vincent, E., Harlander, N., & Hohmann, V. (2011). Subjective and objective quality assessment of audio source separation. IEEE Transactions on Audio, Speech, and Language Processing, 19(7), pp. 2046-2057.
[27] Vincent, E., Gribonval, R., & Févotte, C. (2006). Performance measurement in blind audio source separation. IEEE Transactions on Audio, Speech, and Language Processing, 14(4), pp. 1462-1469.
[28] Blandin, C., Ozerov, A., & Vincent, E. (2012). Multi-source TDOA estimation in reverberant audio using angular spectra and clustering. Signal Processing, 92(8), pp. 1950-1960.
[29] Duong, N. Q., Vincent, E., & Gribonval, R. (2010). Under-determined reverberant audio source separation using a full-rank spatial covariance model. IEEE Transactions on Audio, Speech, and Language Processing, 18(7), pp. 1830-1840.
[30] 孫佾微. (2018). A companion robot that recognizes emotions from facial expressions (in Chinese). Degree thesis, Department of Engineering Science, National Cheng Kung University.
[31] Seeed Studio - Seeed Wiki - ReSpeaker Mic Array v2.0. http://wiki.seeedstudio.com/ReSpeaker_Mic_Array_v2.0/. Accessed June 2019.
[32] TJC (淘晶馳) - HMI touch LCD display module. http://www.tjc1688.com/Product/Txilie/. Accessed June 2019.
[33] Logitech C525 webcam. https://www.logitech.com/zh-tw/product/hd-webcam-c525#specification-tabular. Accessed June 2019.
[34] Xiaomi portable Bluetooth speaker. https://www.mi.com/tw/littleaudio/. Accessed June 2019.
[35] Mouser Electronics - MP34DT01TR-M microphone. https://reurl.cc/rqWeZ. Accessed June 2019.
[36] Scheibler, R., Bezzam, E., & Dokmanić, I. (2018, April). Pyroomacoustics: A python package for audio room simulation and array processing algorithms. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 351-355, IEEE.
[37] DiBiase, J. H. (2000). A high-accuracy, low-latency technique for talker localization in reverberant environments using microphone arrays. PhD thesis, Engineering, Brown University, Providence RI, USA.
[38] Yoon, Y. S., Kaplan, L. M., & McClellan, J. H. (2006). TOPS: New DOA estimator for wideband signals. IEEE Transactions on Signal processing, 54(6), pp. 1977-1989.
[39] GitHub - Sean Wood - GCC-NMF. https://github.com/seanwood/gcc-nmf. Accessed June 2019.
[40] 陳旻甄. (2018). A companion robot for the elderly (in Chinese). Degree thesis, Department of Engineering Science, National Cheng Kung University.