
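Graduate student: Chen, You-Zen (陳宥任)
Thesis title: Design and Implementation of Robust Mandarin and English Speech Recognition System with Multiple Acoustic Models for Real World Application (適合實際應用的多語音模型之中英語音辨識系統設計與實現)
Advisor: Wang, Jhing-Fa (王駿發)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of publication: 2011
Academic year of graduation: 99 (ROC calendar)
Language: English
Number of pages: 47
Keywords (Chinese, translated): voice activity detection, dynamic time warping, HMM, confirmation mechanism, wakeup mechanism
Keywords (English): VAD, DTW, HMM, wakeup mechanism, confirmation mechanism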
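  • To bring technology closer to the most natural ways people interact with it, the science and technology of speech signal processing are continually researched and refined. Speech recognition is one of the most widely used parts of speech signal processing, yet even today many issues remain to be addressed, such as robustness, multilingual speech, and practicality. In this thesis, building on current speech processing techniques, we propose and implement several methods that improve the practicality of speech recognition, in the hope of making daily life more convenient.
    In today's speech recognition applications, such as voice-enabled toys and voice control, the main task is to collect a high-quality speech corpus for training robust and efficient acoustic models, and thereby support task-oriented applications. In real life, however, many factors lower the recognition rate and keep it from reaching very high levels; in environments full of assorted noise in particular, current speech recognition technology is close to unusable in practice.
    In this thesis we improve the robustness and practicality of speech recognition from two directions: technique and system architecture. On the technical side, we propose an improved voice activity detection technique based on the harmonic information of sound, a discriminative dynamic time warping (DTW) technique based on MFCC and LPCC, and HMM acoustic model training for Mandarin and English. On the system side, we propose and design robust wakeup and confirmation mechanisms: the user wakes the speech recognition system before entering a command and, after the system returns a recognition result, confirms or corrects it, which improves the correctness and robustness of the overall speech system.
    Our experimental results show that the proposed speech recognition system clearly raises the task achievement rate, making speech recognition more practical, feasible, and dependable in daily life.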
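    The wakeup and confirmation flow described above can be pictured as a small state machine. The sketch below is only an illustration under stated assumptions: the abstract does not give the actual control logic, so the three states, the `CommandSession` class, and the default phrases ("wake up", "yes", "no") are all hypothetical, and each phrase is assumed to arrive already recognized as text.

```python
from enum import Enum


class State(Enum):
    SLEEPING = "sleeping"      # waiting for the wake phrase
    LISTENING = "listening"    # waiting for a command
    CONFIRMING = "confirming"  # waiting for the user to accept or reject


class CommandSession:
    """Hypothetical wakeup/confirmation loop; not the thesis implementation."""

    def __init__(self, wake_word="wake up", yes_word="yes", no_word="no"):
        self.wake_word = wake_word
        self.yes_word = yes_word
        self.no_word = no_word
        self.state = State.SLEEPING
        self.pending = None   # command awaiting confirmation
        self.executed = []    # commands the user has confirmed

    def hear(self, phrase):
        """Feed one recognized phrase and return a prompt for the user."""
        if self.state is State.SLEEPING:
            # Everything except the wake phrase is ignored while asleep.
            if phrase == self.wake_word:
                self.state = State.LISTENING
                return "listening"
            return "ignored"
        if self.state is State.LISTENING:
            # Hold the command until the user confirms it.
            self.pending = phrase
            self.state = State.CONFIRMING
            return f"did you say '{phrase}'?"
        # State.CONFIRMING
        if phrase == self.yes_word:
            self.executed.append(self.pending)
            self.pending = None
            self.state = State.SLEEPING
            return "done"
        if phrase == self.no_word:
            # Rejected: drop the result and let the user say it again.
            self.pending = None
            self.state = State.LISTENING
            return "please repeat"
        return f"please answer {self.yes_word} or {self.no_word}"
```

    In this sketch a misrecognized command costs the user one extra "no" and a repeat instead of triggering a wrong action, which is the kind of trade-off the task achievement rate is meant to capture.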

    Technology always comes from human nature. Speech recognition applications are increasingly popular in daily life, yet they still leave great room for improvement. Our continuing goal is to improve speech recognition technology and bring more convenience to people.
    Currently, speech recognition accuracy rarely reaches a level at which users would accept the technology as a regular input device. When speech recognition systems move from laboratory demonstrations to real-world applications, even the best ASR systems available today still encounter serious difficulties that can make the whole system impractical in real life. Take a voice dialing system as an example: if the system fails to receive the spoken dialing command correctly, it may dial the wrong person, which wastes time and annoys the user.
    We think that a speech recognition system for real-life applications can be improved in two aspects: technique and system architecture. On the technical side, to improve recognition, we present a real-time Voice Activity Detection (VAD) algorithm using harmonic information, a discriminative dynamic time warping (DTW) method using MFCC and LPCC, and multiple acoustic model training for HMM-based Mandarin and English recognition using HTK. On the system architecture side, to improve the task achievement rate, we present mechanisms for system robustness: a wakeup mechanism that lets the user wake up the system, and a confirmation mechanism that lets the user confirm or revise the speech command.
    The experimental results indicate that the proposed system achieves a good task achievement rate, which makes the speech recognition system practical and reliable in some real-life applications.
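    For readers unfamiliar with DTW-based template matching, a minimal sketch may help. It shows only the textbook DTW recurrence with a Euclidean frame distance, not the discriminative MFCC/LPCC variant the thesis proposes, which the abstract does not specify. The function names and the idea of passing each utterance as a list of feature frames are assumptions made for illustration.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two feature frames of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a, seq_b):
    """Accumulated cost of the cheapest alignment between two frame sequences."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = best cost of aligning the first i frames of seq_a
    # with the first j frames of seq_b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(seq_a[i - 1], seq_b[j - 1])
            # Extend the cheapest of the three allowed predecessor cells.
            cost[i][j] = d + min(cost[i - 1][j],      # seq_b frame repeated
                                 cost[i][j - 1],      # seq_a frame repeated
                                 cost[i - 1][j - 1])  # both sequences advance
    return cost[n][m]


def recognize(utterance, templates):
    """Return the label of the stored template closest to the utterance."""
    return min(templates, key=lambda label: dtw_distance(utterance, templates[label]))
```

    In a speaker-dependent setup, `templates` would map each command label to frames recorded from the same user, and `recognize` picks the label whose template aligns most cheaply with the new utterance, even when the two differ in speaking rate.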

    CHAPTER 1 INTRODUCTION
    1.1 BACKGROUND AND MOTIVATION
    1.2 STRATEGY AND FRAMEWORK OF THE PROPOSED SYSTEM
    1.3 THESIS ORGANIZATION
    CHAPTER 2 RELATED WORKS
    2.1 VOICE ACTIVITY DETECTION FOR ROBUST SPEECH RECOGNITION
    2.2 MULTILINGUAL SPEECH RECOGNITION SYSTEM
    2.3 DTW FOR SPEAKER DEPENDENT SPEECH RECOGNITION
    CHAPTER 3 IMPROVED VAD USING HARMONIC INFORMATION FOR ROBUST SPEECH RECOGNITION
    3.1 THE PROPOSED VAD ALGORITHM
    3.2 FEATURE EXTRACTION USING FREQUENCY ANALYSIS
    CHAPTER 4 ROBUST MANDARIN-ENGLISH SPEECH RECOGNITION SYSTEM FOR REAL-WORLD APPLICATIONS
    4.1 ACOUSTIC MODELS TRAINING WITH MODIFIED PARAMETERS FOR MANDARIN AND ENGLISH
    4.1.1 The Technology of Speech Recognition: Hidden Markov Model (HMM)
    4.1.2 Training Speaker Independent Acoustic Model via HTK Tool
    4.2 CONFIDENCE MEASURE USING N-BEST FOR HMM-BASED SPEECH RECOGNITION
    4.3 DISCRIMINATIVE DTW USING MFCC AND LPCC FOR SPEAKER DEPENDENT SPEECH RECOGNITION
    4.3.1 Principal of Dynamic Time Warping Algorithm
    4.3.2 Improvement of DTW Algorithm
    4.4 THE MECHANISM OF WAKEUP, CONFIRMATION AND SYSTEM INTERACTIVE MODULES
    CHAPTER 5 EXPERIMENTS
    5.1 SIMULATION OF THE PROPOSED SYSTEM
    5.2 VAD FOR ROBUST SPEECH RECOGNITION
    5.3 WAKEUP MODULE AND CONFIRMATION MODULE
    5.4 MANDARIN AND ENGLISH SPEECH RECOGNITION
    CHAPTER 6 CONCLUSIONS AND FUTURE WORKS
    REFERENCES


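    On-campus access: available from 2016-09-06
    Off-campus access: not available
    The electronic thesis has not been authorized for public release; for the print copy, please consult the library catalog.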