| Graduate student: | 吳國吉 Wu, Guo-Ji |
|---|---|
| Thesis title: | 具二元對分分裂法之低成本語者及語音辨識系統晶片設計 Low Cost Chip Design for Automatic Speaker and Speech Recognition System Using Binary Halved Clustering Method |
| Advisor: | 王駿發 Wang, Jhing-Fa |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Electrical Engineering |
| Year of publication: | 2014 |
| Academic year of graduation: | 102 |
| Language: | English |
| Number of pages: | 66 |
| Chinese keywords: | 語者辨識、動態時間校準、語音辨識、晶片設計 |
| English keywords: | speaker recognition, dynamic time warping, speech recognition, chip design |
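This thesis proposes a low-cost, fast-trainable chip design for an automatic speaker and speech recognition system; low cost and fast training make such a system practical and affordable in real-world applications. The design consists of four parts: a feature extraction module, a speaker model training module, a speaker recognition module, and a speech recognition module.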
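The feature extraction module uses Linear Prediction Cepstral Coefficients (LPCC) as the features of the speaker's voice. The speech recognition part uses dynamic time warping to identify the target speech. For speaker model training, the Binary Halved Clustering Method is proposed to generate speaker models; the regularity of binary-halved splitting lowers the computational complexity, which saves 52% of the chip area, reduces response time by 68%, and reaches a recognition rate of 90%, effectively realizing a chip hardware design that meets low-cost requirements.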
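As a rough illustration of the LPCC features mentioned above, the sketch below converts linear-prediction coefficients into cepstral coefficients using the standard textbook recursion. It is not taken from the thesis's hardware design: the function name `lpc_to_cepstrum`, the predictor sign convention H(z) = 1 / (1 - Σ a_k z^-k), and the number of cepstral coefficients are assumptions made for this example.

```python
def lpc_to_cepstrum(a, n_ceps):
    """Convert LPC predictor coefficients to cepstral coefficients.

    Assumed convention (hypothetical, for illustration only):
    H(z) = 1 / (1 - sum_{k=1..p} a_k z^-k), with a[k-1] holding a_k.
    Returns [c_1, ..., c_n_ceps]; the gain term c_0 is omitted.
    """
    p = len(a)
    c = [0.0] * (n_ceps + 1)  # c[0] is unused
    for n in range(1, n_ceps + 1):
        # Direct term a_n exists only while n is within the LPC order.
        acc = a[n - 1] if n <= p else 0.0
        # Recursive term: sum of (k/n) * c_k * a_{n-k} over valid k.
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]


# Single-pole check: for a_1 = 0.5 the cepstrum is 0.5**n / n.
coeffs = lpc_to_cepstrum([0.5], 3)
```

For a single-pole model the recursion reduces to c_n = a^n / n, which gives a quick sanity check on the indexing.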
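The chip was implemented and taped out on the 90 nm process shuttle (TN90GUTM-103B) provided by the Chip Implementation Center (CIC) and Taiwan Semiconductor Manufacturing Company (TSMC). The chip area is 1.47 × 1.47 mm², packaged with 84 pins; the gate count is about 395,000, the power consumption is 8.74 mW, the operating frequency is 50 MHz, and the sampling rate is 16 kHz.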
This study proposes a low-cost, fast-trainable chip design for an automatic speaker-speech recognition (ASSR) system. The proposed system has four parts: a feature extraction module, a speaker model training module, a speaker recognition module, and a speech recognition module.
Linear predictive cepstral coefficients (LPCC) are adopted in the proposed feature extraction module. The speech recognition module uses dynamic time warping (DTW) to classify the target speech. The novel binary halved clustering (BHC) method uses binary-halved splitting to generate speaker models with low computational complexity. Compared with conventional works, simulation results indicate that the proposed hardware accelerator achieves 52% lower cost (chip area), 68% shorter response time, and an ASSR accuracy of 90%. The ASSR system thus efficiently realizes a low-cost chip design.
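To make the DTW step concrete, here is a minimal software sketch of classic dynamic time warping used as a template matcher. It only illustrates the general technique; the thesis's hardware architecture, local path constraints, and distance measure are not given in this abstract, so the Euclidean frame distance and the names `dtw_distance` and `recognize` are assumptions for the example.

```python
import math


def dtw_distance(ref, test):
    """Accumulated DTW cost between two sequences of feature vectors.

    Assumptions (not from the thesis): Euclidean frame distance and the
    basic step pattern (insertion, deletion, match) with no windowing.
    """
    n, m = len(ref), len(test)
    inf = float("inf")
    # D[i][j] = best cost aligning the first i ref frames with the first j test frames.
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(ref[i - 1], test[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def recognize(templates, utterance):
    """Return the label of the template with the smallest DTW cost."""
    return min(templates, key=lambda label: dtw_distance(templates[label], utterance))


# Toy one-dimensional "features"; real inputs would be LPCC frames.
templates = {"yes": [[0.0], [1.0]], "no": [[5.0], [5.0]]}
best = recognize(templates, [[0.0], [0.0], [1.0]])
```

The full cost matrix is kept here for readability; a memory-constrained implementation would typically keep only the previous row.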
The design has been taped out in TSMC's 90 nm process. The chip area is 1.47 × 1.47 mm², the package has 84 pins, the gate count is 395K, and the power dissipation is 8.74 mW. The operating frequency is 50 MHz, and the sampling rate is 16 kHz.
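A few derived figures follow directly from the numbers quoted above. The short calculation below is only back-of-the-envelope arithmetic on the stated specifications; the variable names are illustrative, and the per-cycle and per-sample energies are averages that assume the quoted power applies at the quoted clock and sampling rates.

```python
# Figures quoted in the abstract.
CLOCK_HZ = 50e6     # operating frequency
SAMPLE_HZ = 16e3    # audio sampling rate
POWER_W = 8.74e-3   # power dissipation
SIDE_MM = 1.47      # die edge length

# Clock cycles available per incoming audio sample.
cycles_per_sample = CLOCK_HZ / SAMPLE_HZ

# Die area in square millimetres.
die_area_mm2 = SIDE_MM ** 2

# Average energy per clock cycle (picojoules) and per sample (nanojoules).
energy_per_cycle_pj = POWER_W / CLOCK_HZ * 1e12
energy_per_sample_nj = POWER_W / SAMPLE_HZ * 1e9

print(f"{cycles_per_sample:.0f} cycles per sample")
print(f"{die_area_mm2:.4f} mm^2 die area")
print(f"{energy_per_cycle_pj:.1f} pJ per cycle, {energy_per_sample_nj:.2f} nJ per sample")
```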
On-campus access: publicly available from 2019-08-29