| Graduate Student: | 何佳霖 Ho, Chia-Lin |
|---|---|
| Thesis Title: | 基於超低緩衝器架構之低成本語者獨立語音辨識晶片設計 (A Low Cost Chip Design for Speaker-Independent ASR System Using Ultra-Low Buffer Method) |
| Advisor: | 王駿發 Wang, Jhing-Fa |
| Degree: | Master |
| Department: | Department of Electrical Engineering, College of Electrical Engineering and Computer Science |
| Year of Publication: | 2013 |
| Academic Year of Graduation: | 101 |
| Language: | English |
| Pages: | 72 |
| Keywords (Chinese): | 語者獨立、動態時間校準、語音辨識、晶片設計 |
| Keywords (English): | speaker independent, dynamic time warping, speech recognition, chip design |
In recent years, consumers have placed ever greater emphasis on the convenience and entertainment value of products, so intelligent human-machine interfaces play an important role and have driven research and development on speech recognition systems. To meet the demand for a low-cost, speaker-independent speech recognition chip, this study proposes an ultra-low buffer method that realizes the speech feature-extraction computations within limited memory resources, saving 96.48% of the memory needed for the autocorrelation computation and improving the response time of the autocorrelation computation by a factor of 256. In addition, the linear prediction cepstral coefficient (LPCC) feature-extraction procedure is restructured and its circuits are optimized, simultaneously reducing the circuit area, the computational load, and the critical path. Finally, the low-complexity, high-accuracy dynamic time warping (DTW) algorithm is adopted as the recognition method, so that a chip-level hardware design meeting the low-cost requirement is realized efficiently.
The chip was taped out on the TSMC 90 nm process shuttle (TN90MSG-102A) provided by the Chip Implementation Center (CIC) and Taiwan Semiconductor Manufacturing Company (TSMC). The chip area is 1.16 × 1.16 mm² in a 48-pin package, the gate count is about 43,609, the power dissipation is 1.006 mW, the operating frequency is 10 MHz, and the sampling rate is 4 kHz.
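The abstract quantifies the ultra-low buffer method only by its gains (96.48% less autocorrelation memory, a 256-fold faster per-frame response); the buffering scheme itself is described in the thesis body, not here. As a rough, software-level illustration of the general idea of computing frame autocorrelation and LPCC features without buffering a whole frame, the sketch below accumulates the autocorrelation lags on the fly from a short delay line and then applies the textbook Durbin and LPC-to-cepstrum recursions. The function names, the delay-line scheme, and the parameter values are assumptions for illustration, not the circuit designed in the thesis.

```python
import numpy as np

def stream_autocorr(frame, order=10):
    """Accumulate autocorrelation lags r[0..order] sample by sample,
    keeping only a short (order + 1)-sample delay line instead of
    buffering the whole frame before the computation starts."""
    delay = np.zeros(order + 1)           # delay[k] holds x[n-k]
    r = np.zeros(order + 1)               # running sums r[k] = sum_n x[n] * x[n-k]
    for n, x in enumerate(frame):
        delay = np.roll(delay, 1)         # shift the delay line by one sample
        delay[0] = x
        valid = min(n, order) + 1         # lags that already have a delayed sample
        r[:valid] += x * delay[:valid]    # update every lag with the incoming sample
    return r

def levinson_durbin(r, order=10):
    """Durbin's recursion: autocorrelation lags -> LPC coefficients
    a[1..order] of the all-pole model 1 / (1 - sum_i a_i z^-i)."""
    a = np.zeros(order + 1)
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - np.dot(a[1:i], r[i - 1:0:-1])) / err
        a_prev = a.copy()
        a[i] = k
        a[1:i] = a_prev[1:i] - k * a_prev[i - 1:0:-1]
        err *= 1.0 - k * k
    return a[1:]                          # a_1 .. a_order

def lpc_to_cepstrum(a, n_ceps=12):
    """Standard LPC-to-cepstrum recursion giving the LPCC vector."""
    p = len(a)
    c = np.zeros(n_ceps + 1)
    for n in range(1, n_ceps + 1):
        c[n] = a[n - 1] if n <= p else 0.0
        for k in range(1, n):
            if n - k <= p:
                c[n] += (k / n) * c[k] * a[n - k - 1]
    return c[1:]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.standard_normal(256)      # one 256-sample frame
    r = stream_autocorr(frame)
    # cross-check against the buffered definition r[k] = sum_n x[n] * x[n-k]
    ref = np.array([np.dot(frame[k:], frame[:len(frame) - k]) for k in range(11)])
    assert np.allclose(r, ref)
    lpcc = lpc_to_cepstrum(levinson_durbin(r))
    print(lpcc)
```

For a 256-sample frame and an 11-tap delay line, a scheme of this kind removes roughly 96% of the sample buffer, in the same range as the 96.48% figure quoted above, although the exact memory accounting in the thesis may differ.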
In this study, to achieve a low-cost chip design for a speaker-independent automatic speech recognition (ASR) system, a novel ultra-low buffer method is proposed to realize the feature-extraction operations within limited memory resources. In total, 96.48% of the memory required for the autocorrelation computation is saved, and the response time of the autocorrelation computation for one frame is improved by a factor of 256. In addition, the improved linear prediction cepstral coefficient (LPCC) feature-extraction procedure and its optimized circuits reduce the dedicated hardware area, the computational load, and the critical path. Finally, a low-complexity, high-accuracy dynamic time warping (DTW) classifier is adopted as the recognition stage of the ASR system, so that the low-cost chip design is implemented efficiently.
The design has been taped out in TSMC's 90 nm process through the Chip Implementation Center (CIC). The chip area is 1.16 × 1.16 mm² in a 48-pin package, the gate count is 43,609, and the power dissipation is 1.006 mW. The operating frequency is 10 MHz and the sampling rate is 4 kHz.
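For the recognition stage, the abstract names DTW as the classifier. A minimal, generic software version of DTW-based isolated-word matching is sketched below for reference; the squared-Euclidean local cost, the 3-way recurrence, and the `recognize` helper are standard textbook choices assumed here, not the hardware datapath of the thesis.

```python
import numpy as np

def dtw_distance(test, ref):
    """Classic DTW distance between two feature sequences
    (frames x coefficients): squared-Euclidean local cost and the
    standard 3-way recurrence over the accumulated-cost matrix."""
    n, m = len(test), len(ref)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.sum((test[i - 1] - ref[j - 1]) ** 2)
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def recognize(test_feats, templates):
    """Return the vocabulary word whose reference template warps to
    the test utterance with the smallest DTW distance."""
    return min(templates, key=lambda word: dtw_distance(test_feats, templates[word]))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    templates = {w: rng.standard_normal((40, 12)) for w in ("start", "stop", "left")}
    test = templates["stop"] + 0.05 * rng.standard_normal((40, 12))
    print(recognize(test, templates))     # expected: stop
```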