
Author: Ho, Chia-Lin (何佳霖)
Title: A Low Cost Chip Design for Speaker-Independent ASR System Using Ultra-Low Buffer Method (基於超低緩衝器架構之低成本語者獨立語音辨識晶片設計)
Advisor: Wang, Jhing-Fa (王駿發)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2013
Academic Year of Graduation: 101
Language: English
Number of Pages: 72
Keywords: speaker-independent, dynamic time warping, speech recognition, chip design
Chinese Abstract:
    In recent years, consumers have placed ever-greater emphasis on the convenience and entertainment value of their products, so intelligent human-machine interfaces play an important role and have driven research and development of speech recognition systems. To meet the low-cost requirement of a speaker-independent speech recognition chip, this study proposes an ultra-low buffer method that realizes the speech feature extraction computations within limited memory resources, saving 96.48% of the memory required for the autocorrelation computation and improving the response time of the autocorrelation computation by a factor of 256. In addition, the linear prediction cepstral coefficient (LPCC) feature extraction procedure is refined and its circuits are optimized, simultaneously reducing the circuit area, the amount of computation, and the critical path. Finally, the low-complexity, high-accuracy dynamic time warping (DTW) algorithm is adopted as the recognition method, so that a chip hardware design meeting the low-cost requirement is realized effectively.
    The chip was taped out on the TSMC 90 nm process shuttle (TN90MSG-102A) provided through the Chip Implementation Center (CIC) and Taiwan Semiconductor Manufacturing Company (TSMC). The chip area is 1.16 × 1.16 mm² in a 48-pin package; the gate count is about 43,609, the power consumption is 1.006 mW, the operating frequency is 10 MHz, and the sampling rate is 4 kHz.
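    For orientation only, the following is a minimal software sketch in C of the per-frame autocorrelation that an LPCC front end computes. The frame length, number of lags, and fixed-point word widths are illustrative assumptions; the sketch says nothing about how the proposed ultra-low buffer method schedules or buffers these operations in hardware.

```c
#include <stdint.h>

/* Illustrative parameters only; the frame length and LPC order used in
 * the thesis are not stated in this record. */
#define FRAME_LEN 256   /* samples per frame (assumed) */
#define LPC_ORDER 10    /* number of autocorrelation lags (assumed) */

/* Compute autocorrelation coefficients r[0..LPC_ORDER] of one speech
 * frame: r[k] = sum_{n=k}^{N-1} x[n] * x[n-k].
 * This is a plain software reference of the quantity the autocorrelation
 * part of the feature extractor produces, not the proposed hardware
 * architecture. */
void autocorrelation(const int16_t x[FRAME_LEN], int64_t r[LPC_ORDER + 1])
{
    for (int k = 0; k <= LPC_ORDER; ++k) {
        int64_t acc = 0;
        for (int n = k; n < FRAME_LEN; ++n)
            acc += (int64_t)x[n] * x[n - k];
        r[k] = acc;
    }
}
```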

Abstract:
    In this study, to achieve a low-cost chip design for a speaker-independent automatic speech recognition (ASR) system, a novel ultra-low buffer method is proposed to realize the feature extraction operation within limited memory resources. In total, 96.48% of the memory required for the autocorrelation computation is saved, and the response time of the autocorrelation computation for one frame is improved by a factor of 256. In addition, the improved linear prediction cepstral coefficients (LPCC) feature extraction procedure and its optimized circuits reduce the dedicated hardware area, the computational requirements, and the critical path. Finally, the low-complexity, high-accuracy dynamic time warping (DTW) classifier is adopted as the recognition part of this ASR system to implement the low-cost chip design efficiently.
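    As a point of reference for the DTW classifier mentioned above, below is a minimal C sketch of the standard cumulative-distance recurrence D(i, j) = d(i, j) + min{D(i-1, j), D(i, j-1), D(i-1, j-1)}. The feature dimension, local distance measure, and the absence of path constraints are assumptions here, not details taken from the thesis.

```c
#include <float.h>
#include <stdlib.h>

#define FEAT_DIM 10  /* LPCC coefficients per frame (assumed) */

/* Squared Euclidean distance between two feature vectors. */
static double local_dist(const double *a, const double *b)
{
    double s = 0.0;
    for (int k = 0; k < FEAT_DIM; ++k) {
        double d = a[k] - b[k];
        s += d * d;
    }
    return s;
}

/* Unconstrained DTW between a test template (n frames) and a reference
 * template (m frames), computed row by row with two buffers.
 * Returns the accumulated warped distance; the reference template with
 * the smallest distance would give the recognized word. */
double dtw(const double test[][FEAT_DIM], int n,
           const double ref[][FEAT_DIM], int m)
{
    double *prev = malloc((m + 1) * sizeof *prev);
    double *curr = malloc((m + 1) * sizeof *curr);
    for (int j = 0; j <= m; ++j) prev[j] = DBL_MAX;
    prev[0] = 0.0;

    for (int i = 1; i <= n; ++i) {
        curr[0] = DBL_MAX;
        for (int j = 1; j <= m; ++j) {
            double best = prev[j - 1];                  /* diagonal   */
            if (prev[j] < best) best = prev[j];         /* vertical   */
            if (curr[j - 1] < best) best = curr[j - 1]; /* horizontal */
            curr[j] = local_dist(test[i - 1], ref[j - 1]) + best;
        }
        double *tmp = prev; prev = curr; curr = tmp;
    }
    double result = prev[m];
    free(prev);
    free(curr);
    return result;
}
```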
    The design has been taped out in TSMC's 90 nm process via the Chip Implementation Center (CIC). The chip area is 1.16 × 1.16 mm² in a 48-pin package, the gate count is 43,609, and the power dissipation is 1.006 mW. The operating frequency is 10 MHz and the sampling rate is 4 kHz.

Table of Contents:
Chinese Abstract
Abstract
Acknowledgements
Content
Table List
Figure List
Chapter 1 Introduction
  1.1 Background
  1.2 Related Works
  1.3 Motivation
  1.4 Research Contributions
  1.5 Organization
Chapter 2 Automatic Speech Recognition System
  2.1 System Overview
  2.2 Preprocessing
    2.2.1 Voice Activity Detection
    2.2.2 Automatic Gain Control
    2.2.3 Framing
  2.3 Feature Extraction
    2.3.1 Linear Predictive Coefficients
    2.3.2 Linear Prediction Cepstral Coefficients
  2.4 Dynamic Time Warping
Chapter 3 Chip Design for Automatic Speech Recognition System
  3.1 Architecture Overview
  3.2 Voice Activity Detection Part
  3.3 Autocorrelation Part: The Ultra-Low Buffer Method
    3.3.1 Conventional Double-Buffer Method for Autocorrelation Extraction
    3.3.2 Proposed Ultra-Low Buffer Method for Autocorrelation Extraction
    3.3.3 Comparison between Conventional and Proposed Architecture
  3.4 Linear Prediction Cepstral Coefficients Part
  3.5 Dynamic Time Warping Part
  3.6 Debug Interface Part
    3.6.1 Debug Interface Architecture
    3.6.2 System Observing Registers
    3.6.3 LPCC Debug Mode
    3.6.4 DTW Debug Mode
Chapter 4 Tape Out and Simulation Results
  4.1 Cell-Based Design Flow
  4.2 Specification of the Proposed Chip Design
  4.3 Simulation Environment
    4.3.1 Tools Summary and Simulation Environment Setting
    4.3.2 Training Database
  4.4 Simulation Results
    4.4.1 Simulation Waveform
    4.4.2 Average Recognition Rate
    4.4.3 Simulation Results Overview
  4.5 Layout
  4.6 Measurement Considerations
    4.6.1 Design for Testing: Scan Chain
    4.6.2 BIST
    4.6.3 Pin Sharing Strategy
    4.6.4 Chip Functional Test Illustration
  4.7 Pin Configuration and Bonding Diagram
    4.7.1 Pin Configuration
    4.7.2 Bonding Diagram
  4.8 Tape Out Procedure
    4.8.1 Tape Out Information
    4.8.2 CIC Review Conference Result
    4.8.3 Chip Product
  4.9 Testing
    4.9.1 Testing Flow
    4.9.2 Smart Test Overview
    4.9.3 Testing Results
Chapter 5 Conclusions and Future Works
  5.1 Conclusions
  5.2 Future Works
References

Full-text availability: on campus, available from 2016-08-14; off campus, not available.
The electronic thesis has not been authorized for public release; please consult the library catalog for the printed copy.