Author: Wu, Yi-Jhong (吳奕仲)
Thesis title: A High Accuracy Isolated Word Recognition System with Two-Stage Out-of-Vocabulary Detection Based on Correlational Weight Analysis
Advisor: Wang, Jhing-Fa (王駿發)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of publication: 2014
Academic year of graduation: 102 (ROC calendar)
Language: English
Number of pages: 59
Keywords: speaker independent, dynamic time warping, speech recognition, embedded system design
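  • Abstract (translated from the Chinese): Building on the cross-word reference template, this thesis proposes CWCWRT, a novel and more robust template for automatic speech recognition systems with out-of-vocabulary (OOV) word rejection. The proposed template training method uses correlational weight analysis to strengthen the template's robustness to speaker variation. The correlational weights address two problems: excluding outlier utterances during training, and erasing speech features produced by abnormal pronunciation. Two methods are proposed to solve these problems: correlation analysis and weight adjustment. For OOV rejection, the thesis uses two-stage OOV detection. The algorithm consists of a count stage, which estimates the distribution of output scores, and a threshold stage, which computes the confidence score of the input feature vector. Recognition experiments were run on two platforms, software and the GPCE063A embedded platform, each with inside and outside tests. The average inside-test recognition rate is 98.2% in software simulation and 93.97% on the embedded platform; the outside-test rates on the two platforms are 95.48% and 90.65%, respectively. The baselines are the cross-word reference template (CWRT) and ECWRT, and the results show that CWCWRT is more robust and more accurate. The OOV rejection rate in the experiments is 79%.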

    This thesis proposes CWCWRT, a robust reference template for template-based automatic speech recognition (ASR) with out-of-vocabulary (OOV) detection. CWCWRT builds on the previously proposed enhanced cross-word reference template (ECWRT) and uses a correlational weight adjustment method to improve robustness against speech variation. The work addresses two issues: rejecting outliers in the training set, and eliminating speech feature coefficients that arise from unwanted (abnormally pronounced) utterances. Accordingly, two main steps are investigated: correlational analysis and weight adjustment. OOV words are rejected by a two-stage OOV detector made up of a count stage, which estimates the distribution of output scores, and a threshold stage, which evaluates the confidence score of the input feature vector. Experiments were conducted on two platforms, a PC and the GPCE063A embedded platform, each with an inside test and an outside test. The average recognition rate for the inside test is 98.20% in PC simulation and 93.97% on the embedded platform; for the outside test, the rates are 95.48% and 90.65%, respectively. Compared with two baselines, the cross-word reference template (CWRT) and ECWRT, the proposed CWCWRT gives higher robustness and accuracy. The OOV rejection rate is 79%.
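
    To make the pipeline in the abstract more concrete, the following is a minimal sketch of template-based isolated word recognition with dynamic time warping (DTW) and a simple score-threshold rejection of unknown words. It is a generic illustration only, not the thesis's CWCWRT training procedure or its two-stage OOV detector. The function names (`dtw_distance`, `fit_threshold`, `recognize`), the `mean + k * stdev` threshold rule, the value of `k`, and the toy one-dimensional "features" are all assumptions introduced here.

```python
import math
from statistics import mean, stdev


def dtw_distance(a, b):
    """Length-normalised DTW cost between two sequences of feature vectors."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean frame distance
            cost[i][j] = d + min(cost[i - 1][j],      # stretch b
                                 cost[i][j - 1],      # stretch a
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m] / (n + m)


def fit_threshold(templates, training, k=3.0):
    """Assumed rule: reject scores above mean + k * stdev of in-vocabulary scores."""
    scores = [dtw_distance(x, templates[label]) for label, x in training]
    return mean(scores) + k * stdev(scores)


def recognize(x, templates, threshold):
    """Return the best-matching word, or None if the score suggests an unknown word."""
    label, score = min(((w, dtw_distance(x, t)) for w, t in templates.items()),
                       key=lambda pair: pair[1])
    return label if score <= threshold else None


# Toy data: one reference template per word, 1-D "feature vectors" per frame.
templates = {
    "yes": [[0.0], [1.0], [2.0]],
    "no": [[5.0], [5.0], [4.0]],
}
training = [
    ("yes", [[0.0], [1.0], [1.0], [2.0]]),
    ("yes", [[0.1], [1.1], [2.1]]),
    ("no", [[5.0], [4.9], [4.0]]),
    ("no", [[5.0], [5.0], [5.0], [4.0]]),
]
threshold = fit_threshold(templates, training)

in_vocab = recognize([[0.0], [0.9], [2.0]], templates, threshold)
unknown = recognize([[10.0], [10.0], [10.0]], templates, threshold)
```

    In a real system the frames would be LPCC feature vectors, as listed in the thesis outline, and the threshold would be tuned on held-out data; the single global threshold here is only the simplest possible stand-in for a confidence test.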

    Chinese Abstract
    Abstract
    Acknowledgements
    Content
    Table List
    Figure List
    Chapter 1 Introduction
      1.1 Background
      1.2 Related Works
      1.3 Motivation
      1.4 Objectives
      1.5 Organization
    Chapter 2 Automatic Speech Recognition System Overview
      2.1 System Overview
      2.2 Preprocessing
        2.2.1 Voice Activity Detection
        2.2.2 Automatic Gain Control
        2.2.3 Framing
      2.3 LPCC Feature Extraction
        2.3.1 Linear Predictive Coefficients
        2.3.2 Linear Prediction Cepstral Coefficients
      2.4 Previous Works of Reference Templates
        2.4.1 CWRT (cross word reference template)
        2.4.2 ECWRT (enhanced cross word reference template)
      2.5 Dynamic Time Warping
      2.6 Out-of-Vocabulary Detection
    Chapter 3 CW-CWRT: Correlational Weight Cross-Words Reference Template and Two-Stage OOV Detection
      3.1 Overview of Templates Training procedure
      3.2 Correlational Weight Analysis
        3.2.1 Correlational Analyzing
        3.2.2 Weight Adjusting
      3.3 Two-Stage OOV Detection
        3.3.1 Count Stage
        3.3.2 Threshold Stage
    Chapter 4 Experimental Results
      4.1 Introduction to Training Database
      4.2 Experimental Results of Software Implementation
      4.3 Experimental Results of Speech Recognition Robustness
      4.4 Experimental Results of Hardware Implementation
        4.4.1 Performance Evaluation of Proposed System on GPCE063A
        4.4.2 Average Recognition Rates
      4.5 Experimental Results of OOV Detection
        4.5.1 Performance Evaluation of Proposed System on Software
        4.5.2 Performance Evaluation of Proposed System on GPCE063A
    Chapter 5 Conclusions and Future Works
      5.1 Conclusions
      5.2 Future Works
    References

    Full-text availability: on campus, public from 2019-08-29; off campus, public from 2019-08-29