| Graduate student: | 何冠宏 He, Guan-Hong |
|---|---|
| Thesis title: | 基於加強式跨字參考模板之語者獨立孤立詞語音辨識之低成本嵌入式系統設計 Speaker-Independent Isolated Word Recognition Based on Enhanced Cross-Words Reference Templates for Low Cost Embedded System Design |
| Advisor: | 王駿發 Wang, Jhing-Fa |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Electrical Engineering |
| Year of publication: | 2012 |
| Academic year of graduation: | 100 |
| Language: | English |
| Number of pages: | 47 |
| Keywords (Chinese): | 參考模板、動態時間校準、語者獨立、孤立詞語音辨識 |
| Keywords (English): | reference templates, dynamic time warping, speaker independent, isolated word recognition |
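This thesis proposes novel enhanced cross-words reference templates and applies them to a speaker-independent isolated word speech recognition system. An enhanced cross-words reference template is generated from a group of templates. Its generation has two main steps: dynamic time warping (DTW) matching and arithmetic averaging. Because the templates differ in length, they cannot be averaged directly; before averaging, every template must be brought to the same length. We therefore adopt dynamic time warping to solve this problem. Once the matching is complete, the averaging can be carried out to produce an enhanced cross-words reference template.
The experimental results of the software implementation show that the proposed system, using linear prediction cepstral coefficients (LPCCs) as features, achieves a recognition rate of up to 98.83% with 30 command phrases. The recognition rates obtained with cross-words reference templates and with conventional reference templates are 97.58% and 93.58%, respectively. The recognition rate with enhanced cross-words reference templates is clearly higher than with the other two kinds of templates. In addition, the experimental results of the hardware implementation show that the average recognition rate is above 90% in both the inside test and the outside test. These results confirm the effectiveness of the proposed idea.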
In this study, novel enhanced cross-words reference templates (ECWRTs) are proposed and applied to speaker-independent isolated word recognition. An ECWRT is a reference template generated from a set of templates. ECWRT generation consists of two main procedures: DTW matching and averaging. Because the templates differ in length, they cannot be averaged directly. Dynamic time warping (DTW) is used to solve this problem: after DTW matching, the matched frames of the templates are averaged to form the ECWRT. The experimental results of the software implementation show that the proposed system, using linear prediction cepstral coefficients (LPCCs) as features, achieves an average recognition rate of 98.83% on a 30-word vocabulary. This rate is higher than the 97.58% and 93.58% obtained with cross-words reference templates (CWRTs) and conventional reference templates, respectively. Moreover, the experimental results of the hardware implementation indicate that the average recognition rates for both the inside and outside tests are higher than 90%. These results demonstrate the effectiveness of the proposed idea.
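The two steps named in the abstract, DTW matching followed by averaging of the matched frames, can be illustrated with a minimal Python sketch. This is not the thesis's implementation. The function names (`dtw_path`, `average_templates`, `recognize`), the Euclidean frame distance, the choice of the first template as the alignment reference, and the rule that every matched frame counts equally in the average are all assumptions made for illustration. The abstract does not specify these details.

```python
from math import sqrt


def frame_distance(a, b):
    """Euclidean distance between two feature frames (assumed metric)."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_path(ref, other):
    """Align `other` to `ref` with basic DTW.

    Returns (total_cost, path), where path is a list of (i, j) pairs
    matching frame i of `ref` to frame j of `other`.
    """
    n, m = len(ref), len(other)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(ref[i - 1], other[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the last cell to recover the warping path.
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(steps)
    path.reverse()
    return cost[n][m], path


def average_templates(templates, ref_index=0):
    """Average several templates of different lengths into one template.

    Assumption: templates[ref_index] fixes the output length, and every
    frame that DTW matches to a reference frame contributes equally.
    """
    ref = templates[ref_index]
    dim = len(ref[0])
    sums = [list(frame) for frame in ref]
    counts = [1] * len(ref)
    for k, tpl in enumerate(templates):
        if k == ref_index:
            continue
        _, path = dtw_path(ref, tpl)
        for i, j in path:
            for d in range(dim):
                sums[i][d] += tpl[j][d]
            counts[i] += 1
    return [[v / counts[i] for v in sums[i]] for i in range(len(ref))]


def recognize(query, references):
    """Return the label whose reference template has the lowest DTW cost."""
    return min(references, key=lambda label: dtw_path(references[label], query)[0])


if __name__ == "__main__":
    # Toy one-dimensional "feature" sequences standing in for LPCC frames.
    short = [[0.0], [2.0], [4.0]]
    longer = [[0.0], [0.0], [2.0], [4.0]]
    merged = average_templates([short, longer])
    print(merged)
    print(recognize([[0.0], [2.0], [2.0], [4.0]], {"a": merged, "b": [[5.0], [5.0], [5.0]]}))
```

In this sketch the averaged template keeps the length of the chosen reference template, which is one simple way to make templates of different lengths comparable before averaging. A real system would use multi-dimensional LPCC frames, and an embedded target would likely need fixed-point arithmetic and a constrained warping window.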