
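Graduate student: Fan, Tso-Yi (范佐毅)
Thesis title: A Design for Personal Dictionary Inquiry System based on Taiwanese Accented English Speech Recognition (基於台灣腔英語語音辨識之個人字典語音查詢系統設計)
Advisor: Wang, Jhing-Fa (王駿發)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of publication: 2006
Academic year of graduation: 94 (ROC calendar)
Language: English
Number of pages: 42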
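Keywords (translated from Chinese): Taiwanese-accented English, speech recognition, hidden Markov models, personal dictionary speech inquiry
Keywords (English): Hidden Markov models, Personal dictionary speech inquiry system, English speech recognition, Taiwanese accent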
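      Traditional speech recognition techniques can be divided into speaker-dependent and speaker-independent approaches, each of which extracts relevant speech attributes to build recognition models. Analyzing the occurrence of these speech features and attributes further reveals whether a particular speech sound has occurred (that is, whether an event has occurred), which yields a sequence of events that assists speech recognition. However, most speaker-dependent recognition cannot handle speech signals whose pronunciation is affected by accent, such as mispronunciations, so conventional recognizers do not work well for speakers with a particular accent. Speaker-independent recognition, on the other hand, must collect a large amount of speech data to build models that eliminate the influence of accent. This data requires manual labeling, which is time-consuming and labor-intensive, is often difficult to collect adequately, and adds complexity to the recognition process.
      In view of the growing demand for speech recognition on personal handheld devices, this thesis develops a personalized speech recognition technique suited to the pronunciation of Taiwanese speakers, and studies attribute and event detection for Taiwanese-accented English speech. To this end, the thesis first designs, on a PDA platform, an ASR-based Dictionary Inquiry System whose recognition core is the dynamic time warping (DTW) algorithm, and uses this system to identify the English letters that Taiwanese speakers tend to mispronounce. For these mispronounced letters, we use the Taiwanese-accented English corpus English Across Taiwan (EAT) to build speaker-independent hidden Markov models (HMMs). The thesis therefore builds speaker-independent recognition models only for these mispronounced letters and handles the remaining letters with speaker-dependent DTW recognition, in order to achieve better personalized English speech recognition performance.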
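      The speaker-dependent recognition core named above, dynamic time warping, can be illustrated with a minimal sketch. It is not the thesis implementation: the feature representation (lists of float vectors), the Euclidean frame distance, the unconstrained warping path, and the names `dtw_distance` and `recognize` are all assumptions made for illustration.

```python
import math

def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of feature vectors.

    Assumption: each frame is a list of floats and frames are compared
    with Euclidean distance; the thesis does not specify either choice.
    """
    n, m = len(a), len(b)
    # cost[i][j] = best accumulated cost aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            # Allowed steps: insertion, deletion, or diagonal match.
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[n][m]

def recognize(utterance, templates):
    """Return the label of the stored template closest to the utterance.

    `templates` is a hypothetical dict mapping a letter label to one
    reference feature sequence recorded by the same speaker.
    """
    return min(templates,
               key=lambda label: dtw_distance(utterance, templates[label]))
```

      Because the templates come from the user's own recordings, a matcher of this kind needs no transcribed training corpus, which is the property the abstract contrasts with HMM training.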

      Automatic speech recognition (ASR) systems fall into two categories: user-dependent and user-independent. Both have improved greatly in recent years, but speaker variability still affects ASR performance significantly. Among the factors contributing to this variability, gender and accent are the most important. Speakers with heavy accents are known to make more pronunciation errors relative to the standard pronunciation, and speakers from the same accent region tend to show similar mispronunciations. Most systems rely on hidden Markov models (HMMs) to address this kind of problem. However, HMM training is time-consuming and requires a large amount of data. Furthermore, HMM training is a supervised procedure that requires transcriptions. These transcriptions are either labeled manually or obtained from a speaker-independent model, in which case alignment errors will degrade recognition performance.
      Therefore, in this thesis, we design a dynamic time warping (DTW) speech recognizer for an ASR-based dictionary inquiry system via speech on a PDA, and use it to find the mispronunciations in Taiwanese-accented English alphabet utterances. To reduce the errors caused by the similar mispronunciation tendencies of speakers from the same accent region, an HMM-based English alphabet recognizer recognizes alphabet speech using accent-independent hidden Markov models trained on the English Across Taiwan (EAT) corpus. With a keyword spotting method, the HMM-based recognizer returns results for the specific accent-dependent letters, which are treated as keywords. The unrecognized speech segments are then passed to the DTW-based English alphabet recognizer. With the proposed method, we obtain better performance for personalized Taiwanese-accented English speech recognition.
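      The two-stage flow described in this abstract (HMM keyword spotting first, DTW for whatever remains unrecognized) can be sketched as a simple dispatch loop. This is an illustrative sketch only: `recognize_letters`, `spot_keyword`, and `dtw_recognize` are hypothetical names, the two recognizers are passed in as plain callables, and the convention that the spotter returns `None` for an unrecognized segment is an assumption rather than something the abstract states.

```python
from typing import Callable, List, Optional, Sequence

Segment = Sequence[Sequence[float]]  # one segmented utterance: frames of features

def recognize_letters(
    segments: List[Segment],
    spot_keyword: Callable[[Segment], Optional[str]],
    dtw_recognize: Callable[[Segment], str],
) -> List[str]:
    """Label each segment, preferring the HMM keyword spotter.

    spot_keyword: hypothetical wrapper around the accent-independent HMM
        recognizer; returns a letter for accent-dependent keywords and
        None when the segment is not recognized as a keyword.
    dtw_recognize: hypothetical wrapper around the speaker-dependent
        DTW recognizer used for all remaining segments.
    """
    results = []
    for seg in segments:
        label = spot_keyword(seg)
        if label is None:
            # Not one of the HMM-modeled letters: fall back to DTW.
            label = dtw_recognize(seg)
        results.append(label)
    return results
```

      Passing the recognizers in as callables keeps the sketch independent of any particular HMM toolkit or DTW implementation; in practice the segmentation step and the acceptance criterion of the keyword spotter would determine which branch each segment takes.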

    Table of contents
    Abstract (Chinese)
    Abstract
    Acknowledgements
    Contents
    Table List
    Figure List
    CHAPTER 1 INTRODUCTION
        1.1 Background
        1.2 Motivation
        1.3 Organization of Thesis
    CHAPTER 2 ANALYSIS FOR TAIWANESE ACCENTED ENGLISH SPEECH
        2.1 Corpus of English Across Taiwan (EAT)
        2.2 Hidden Markov Model Toolkit (HTK)
        2.3 Analysis for Taiwanese accent-dependent and accent-independent alphabet speech
    CHAPTER 3 THE PROPOSED PERSONAL DICTIONARY INQUIRY SYSTEM
        3.1 Framework of the Proposed System
        3.2 Speech Segmentation for speech input
        3.3 Taiwanese accented English Alphabet recognition
            3.3.1 HMM-based English Alphabet Recognizer
            3.3.2 DTW-based English Alphabet Recognizer
        3.4 Personal Dictionary Inquiry System (PDIS) via Speech on PDA
    CHAPTER 4 EXPERIMENTAL RESULTS
    CHAPTER 5 CONCLUSIONS AND FUTURE WORKS
    REFERENCES
    Author's biography


    Full-text availability: on campus, public from 2008-08-21
    Off campus, public from 2008-08-21