| Author: | 洪慈欣 (Hung, Tzu-Hsin) |
|---|---|
| Title: | 居家服務機器人之基於類神經網路語音辨識系統 (Neural Network Based Speech Recognition System for Home Service Robot) |
| Advisor: | 李祖聖 (Li, Tzuu-Hseng S.) |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science, Department of Electrical Engineering |
| Year of publication: | 2010 |
| Graduation academic year: | 98 (ROC calendar; 2009–2010) |
| Language: | English |
| Pages: | 85 |
| Keywords (Chinese): | 類神經網路、語音辨識 |
| Keywords (English): | Neural Network, Speech Recognition |
| Views / Downloads: | 86 / 4 |
This thesis presents a neural-network-based speech recognition system for a home service robot. To let the robot interact with different people, a speaker-independent system is developed by combining the hidden Markov model (HMM) with a back-propagation neural network (BPNN). Both cepstrum coefficients derived through the fast Fourier transform (FFT) and Mel-frequency cepstrum coefficients are used as speech feature parameters; the advantage of these two feature types is that they capture distinctive characteristics of the speech signal, such as timbre and pitch. To improve recognition performance, both a modified model-training scheme and a modified identification scheme based on the BPNN are proposed. The network contains three layers, and the likelihood probabilities computed by Gaussian mixture models (GMM) serve as its input units. In addition, to provide a reliable service, a verification function is adopted to reduce the false-acceptance rate of the recognizer. Experimental results demonstrate that the robot can successfully identify the commands of an unknown speaker in the RoboCup@Home competition. Finally, the efficiency and feasibility of the proposed system are verified through practical experiments.
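The abstract names two kinds of feature parameters: cepstrum coefficients obtained through the FFT and Mel-frequency cepstrum coefficients. The sketch below illustrates, in plain NumPy, how each can be computed for one windowed frame of audio. It is a minimal illustration, not the thesis implementation: the sample rate, filter count, and coefficient count are assumed values chosen for the example.

```python
# Minimal sketch of the two feature types named in the abstract, assuming
# framed and windowed 16 kHz audio. Frame length, filter count, and
# coefficient count are illustrative assumptions, not values from the thesis.
import numpy as np

def fft_cepstrum(frame, n_coeffs=12):
    """Real cepstrum of one windowed frame: IFFT of the log magnitude spectrum."""
    spectrum = np.abs(np.fft.rfft(frame))
    log_spec = np.log(spectrum + 1e-10)      # guard against log(0)
    cepstrum = np.fft.irfft(log_spec)
    return cepstrum[1:n_coeffs + 1]          # keep low-quefrency coefficients

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced equally on the Mel scale."""
    def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(0.0, hz_to_mel(sample_rate / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sample_rate).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fb[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)  # rising edge
        fb[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)  # falling edge
    return fb

def mfcc(frame, sample_rate=16000, n_filters=26, n_coeffs=12):
    """MFCC of one windowed frame: Mel filterbank energies -> log -> DCT."""
    n_fft = len(frame)
    power = np.abs(np.fft.rfft(frame)) ** 2
    energies = mel_filterbank(n_filters, n_fft, sample_rate) @ power
    log_e = np.log(energies + 1e-10)
    # Type-II DCT decorrelates the log filterbank energies
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(1, n_coeffs + 1), n + 0.5) / n_filters)
    return dct @ log_e
```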
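The recognition stage described in the abstract feeds GMM likelihood scores into a three-layer back-propagation network and verifies the result before acting on it. The following is a minimal sketch of that pipeline under the same caveat: the class names, network sizes, score normalization, and rejection threshold are hypothetical, and the HMM decoding used in the thesis is reduced here to per-command GMM scoring for brevity.

```python
# Minimal sketch (not the author's code) of the hybrid recognizer described
# above: per-command GMMs score an utterance, the score vector drives a small
# three-layer back-propagation network, and a threshold verifies the result.
import numpy as np

class DiagonalGMM:
    """Gaussian mixture with diagonal covariances; one model per command word."""
    def __init__(self, weights, means, variances):
        self.w = np.asarray(weights)      # (K,) mixture weights
        self.mu = np.asarray(means)       # (K, D) component means
        self.var = np.asarray(variances)  # (K, D) diagonal covariances

    def log_likelihood(self, frames):
        """Average per-frame log-likelihood of an utterance (frames: (T, D))."""
        diff = frames[:, None, :] - self.mu[None, :, :]                    # (T, K, D)
        exponent = -0.5 * np.sum(diff ** 2 / self.var, axis=2)             # (T, K)
        log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * self.var), axis=1)   # (K,)
        log_comp = np.log(self.w) + log_norm + exponent                    # (T, K)
        return np.logaddexp.reduce(log_comp, axis=1).mean()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class ThreeLayerBPNN:
    """Input -> hidden -> output network trained by plain back-propagation.
    Bias terms are omitted for brevity."""
    def __init__(self, n_in, n_hidden, n_out, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (n_in, n_hidden))
        self.W2 = rng.normal(0.0, 0.1, (n_hidden, n_out))
        self.lr = lr

    def forward(self, x):
        self.h = sigmoid(x @ self.W1)
        self.y = sigmoid(self.h @ self.W2)
        return self.y

    def train_step(self, x, target):
        """One gradient step on the squared error for a single example."""
        y = self.forward(x)
        d_out = (y - target) * y * (1.0 - y)                   # output delta
        d_hid = (d_out @ self.W2.T) * self.h * (1.0 - self.h)  # hidden delta
        self.W2 -= self.lr * np.outer(self.h, d_out)
        self.W1 -= self.lr * np.outer(x, d_hid)

def recognize(frames, gmms, net, threshold=0.6):
    """Score an utterance with every command GMM, classify, then verify."""
    scores = np.array([g.log_likelihood(frames) for g in gmms])
    scores = (scores - scores.mean()) / (scores.std() + 1e-8)  # normalize inputs
    y = net.forward(scores)
    best = int(np.argmax(y))
    # Verification step: reject low-confidence outputs rather than act on them
    return best if y[best] >= threshold else None
```

Feeding model likelihoods, rather than raw feature frames, into the network keeps the input dimension equal to the number of command words, so the verification threshold can act directly on the network's class outputs.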