
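Graduate student: 李奇峰 (Li, Chi-Feng)
Thesis title: A Design of a Mandarin and English Mixed-language Speaker Independent Speech Recognition Embedded System (中英混語辭彙不特定語者語音辨識器嵌入式系統設計研究)
Advisor: 王駿發 (Wang, Jhing-Fa)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of publication: 2008
Academic year of graduation: 96 (ROC calendar)
Language: English
Number of pages: 46
Keywords (Chinese, translated): Taiwanese-accented English corpus, speech recognition
Keywords (English): Voice Activity Detection, Mandarin and English Mixed-language, Hidden Markov Models, English Across Taiwan, Speech Recognition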
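  • With the trend toward globalization, cultural exchange, business activity, and online information all involve multilingual environments and a wide range of applications, and mixed-lingual speech often appears in meeting records and everyday conversation. Developing automatic speech recognition for mixed Mandarin and English speech that suits the particular accent of local speakers, and deploying it on portable embedded systems, would therefore benefit both user-friendly digital living and next-generation automatic speech recognition technology.
    This study builds on existing Mandarin recognition technology and extends it to bilingual recognition. To develop a bilingual speech recognition system and overcome the differing characteristics of the two languages, a multilingual recognition unit set is adopted, and an effective mixed-language recognition model is built at the front end of the existing recognition system. The mixed-language automatic speech recognition architecture in this study has three main parts: 1) definition and selection of the multilingual recognition unit set; 2) analysis of mixed-language speech attributes and model construction; 3) mixed-language vocabulary recognition. In addition, to develop personalized speech recognition suited to local pronunciation, the study uses the Taiwanese-accented English corpus English Across Taiwan (EAT) to build speaker-independent mixed-language acoustic models, reaching a vocabulary recognition rate of 70 to 80 percent. Finally, the system is implemented on an embedded device such as a handheld PDA.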

    As global communication and multiethnic societies grow, the demand for multilingual capability increases. A single utterance is sometimes spoken in two or more languages, which is known as mixed-language speech. This work therefore proposes a mixed-language, speaker-independent speech recognition embedded system. The proposed system is intended to serve a broad range of speakers and to contribute to next-generation automatic speech recognition.
    The conventional approach to multilingual speech recognition is to use a multilingual phone set. Multilingual phones are generally created by merging acoustically similar phones across the target languages, with the aim of obtaining a minimal phone set that covers all the sounds in all of the target languages. In this work, the International Phonetic Alphabet (IPA) representation is adopted for phonetic unit modeling. To address accent, the English Across Taiwan (EAT) corpus is also used to construct speaker-independent acoustic models. The experimental results show that the proposed system achieves 70 to 80% lexicon recognition accuracy. Finally, the mixed-language speaker-independent speech recognition embedded system is implemented on PDAs.
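    The phone-merging idea described in the abstract can be illustrated with a minimal sketch. It assumes two toy phone inventories (hypothetical, and far smaller than any real phone set) and simply groups phones that map to the same IPA symbol. It is not the thesis's method, which the table of contents suggests also involves knowledge-based, computational, and confusion-matrix measures.

```python
from collections import defaultdict

# Hypothetical toy inventories: language-specific phone label -> IPA symbol.
# Illustrative only; these are not the phone sets used in the thesis.
MANDARIN = {"m": "m", "n": "n", "s": "s", "sh": "ʂ", "x": "ɕ"}
ENGLISH = {"M": "m", "N": "n", "S": "s", "SH": "ʃ", "TH": "θ"}


def merge_phone_sets(inventories):
    """Group phones from every language under their shared IPA symbol."""
    units = defaultdict(list)
    for lang, inventory in inventories.items():
        for label, ipa in inventory.items():
            units[ipa].append((lang, label))
    return dict(units)


merged = merge_phone_sets({"zh": MANDARIN, "en": ENGLISH})

# IPA units that are backed by phones from more than one language.
shared = sorted(
    ipa for ipa, members in merged.items()
    if len({lang for lang, _ in members}) > 1
)

print(len(MANDARIN) + len(ENGLISH), "language-specific phones")
print(len(merged), "merged units; shared:", shared)
```

    In this toy case the ten language-specific phones collapse to seven units, because three symbols are common to both inventories. Acoustically similar but non-identical sounds, such as the two "sh"-like fricatives, stay separate here; merging those would require an acoustic similarity measure.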

    ABSTRACT(CHINESE)........IV
    ABSTRACT(ENGLISH)........V
    ACKNOWLEDGMENTS........VI
    CONTENTS........VII
    TABLES CAPTIONS........IX
    FIGURES CAPTIONS........X
    CHAPTER 1 INTRODUCTION........1
    1.1 BACKGROUND AND MOTIVATION........1
    1.2 OBJECTIVE OF PROPOSED WORK........2
    1.3 ORGANIZATION OF THESIS........2
    CHAPTER 2........3
    2.1 TRAINING AND TESTING CORPORA........3
    2.1.1 Corpora of English Across Taiwan (EAT)........3
    2.1.2 Corpora of Mandarin speech data Across Taiwan (MAT-400)........5
    2.2 MONOLINGUAL SPEECH RECOGNITION TECHNOLOGIES........7
    2.2.1 Chinese Speech Recognition........7
    2.2.2 English Speech Recognition........7
    CHAPTER 3 MIXED-LANGUAGE PHONE SET CONSTRUCTION AND ACOUSTIC MODELING........8
    3.1 MIXED-LANGUAGE ACOUSTIC MODELING........8
    3.1.1 Knowledge-Based Methods........9
    3.1.2 Computational Methods........10
    3.2 ACOUSTIC MODEL MEASUREMENT BASED ON CONFUSION MATRIX........11
    3.2.1 Measures Based on Confusion Matrix........11
    3.2.2 Phoneme Model HMMs Estimation of a Confusion Matrix........12
    3.3 MIXED-LANGUAGE PHONE SET........15
    CHAPTER 4 MANDARIN AND ENGLISH MIXED-LANGUAGE SPEECH RECOGNITION (MLSR) SYSTEM........18
    4.1 FRAMEWORK OF THE PROPOSED SYSTEM........19
    4.2 FEATURE EXTRACTION AND FRONT-END SIGNAL PROCESSING........20
    4.2.1 Features and transforms........21
    4.2.2 Silence Removal........23
    4.2.3 Voice Activity Detection........23
    4.3 TRAINING PHASE OF THE PROPOSED SYSTEM........24
    4.3.1 Hidden Markov Models........24
    4.3.2 HTK - HMM training tool........26
    4.4 RECOGNITION PHASE OF THE PROPOSED SYSTEM........29
    4.4.1 Tree Lexicon (A Pronunciation Dictionary)........29
    4.4.2 The Task Grammar........31
    4.4.3 Viterbi Beam Search........33
    4.4.4 Recognition Process........33
    CHAPTER 5 IMPLEMENTATION AND EXPERIMENTAL RESULTS........35
    5.1 IMPLEMENTATION........36
    5.1.1 Configuration of Embedded System (Acer N300)........36
    5.1.2 Embedded Visual C++ 4.0 (EVC 4.0)........37
    5.1.3 Port Speech Recognition to Embedded System (WM 5.0)........38
    5.1.4 System Interface........38
    5.2 EXPERIMENTS........39
    5.2.1 Experimental Setup........39
    5.2.2 Experimental Results........41
    CHAPTER 6 CONCLUSIONS AND FUTURE WORKS........43
    REFERENCES........44
    AUTHOR'S BIOGRAPHICAL NOTES........46


    Full text availability: on campus, public from 2008-10-09
    Off campus: public from 2012-10-09