
Graduate student: Shen, Shao-Chuan (沈紹全)
Thesis title: Taiwanese-Chinese Mixed Language Speech Recognition based on Multi-model Approach (Chinese title: 基於多模型臺國語及混合語言之語音辨識)
Advisor: Lu, Wen-Hsiang (盧文祥)
Degree: Master's
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of publication: 2020
Academic year of graduation: 108 (ROC academic year, i.e., 2019-2020)
Language: English
Number of pages: 32
Keywords: Taiwanese-Chinese speech recognition, mixed speech corpus training, multi-model
    Min Nan (Taiwanese) is currently the second most widely spoken language in Taiwan. As Taiwan faces an aging society, demand for Taiwanese speech recognition has grown, yet no mature Taiwanese-Chinese speech recognition system is available on the market.
    In this thesis, we build on the open-source toolkit Kaldi to develop a Taiwanese-Chinese mixed-language speech recognition system suited to the Taiwanese environment. We explore two approaches: (1) training a single model on a mixed Taiwanese-Chinese speech corpus, and (2) merging the outputs of separate Taiwanese and Chinese recognizers in a multi-model setup. We compare the two approaches experimentally and then improve on the better one to obtain a suitable Taiwanese-Chinese mixed speech recognition system.
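This record does not describe how the multi-model approach merges the recognizers' outputs (the table of contents lists its own sequence scoring, word scoring, syllable confusion set, and error correction steps). Purely as an illustration, here is a minimal sketch of one simple way to combine a Taiwanese and a Chinese recognizer: sentence-level selection of the hypothesis with the highest average word confidence. The `Hypothesis` structure, the per-word confidence values, and the selection rule are all hypothetical assumptions, not the author's method.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Hypothesis:
    """Best decoding result of one recognizer for one utterance (hypothetical structure)."""
    model: str                # e.g. "taiwanese" or "chinese"
    words: list[str]          # decoded word sequence
    confidences: list[float]  # assumed per-word confidence in [0, 1], one per word


def mean_confidence(h: Hypothesis) -> float:
    # An empty hypothesis gets the lowest possible score.
    return mean(h.confidences) if h.confidences else 0.0


def select_hypothesis(hyps: list[Hypothesis]) -> Hypothesis:
    """Sentence-level merge: keep the hypothesis whose words are, on average, most confident.

    On a tie, max() keeps the first hypothesis in the list.
    """
    if not hyps:
        raise ValueError("need at least one hypothesis")
    return max(hyps, key=mean_confidence)


if __name__ == "__main__":
    # Hypothetical outputs of two recognizers decoding the same utterance.
    tw = Hypothesis("taiwanese", ["li2", "ho2"], [0.9, 0.8])
    zh = Hypothesis("chinese", ["你", "好"], [0.6, 0.7])
    best = select_hypothesis([tw, zh])
    print(best.model, " ".join(best.words))
```

Whole-sentence selection cannot handle code-switching inside one utterance; a word-level merge (aligning the two hypotheses and choosing per segment) would be needed for that, which is presumably why the thesis outline includes word scoring and error correction stages.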

    Table of Contents
    Chapter 1 Introduction
        1.1 Background
        1.2 Motivation
        1.3 Overview
    Chapter 2 Related Work
        2.1 Taiwanese Roman Spelling Phonetic Symbol and Zhuyin
        2.2 GMM-HMM Acoustic Model
        2.3 Multilingual Automatic Speech Recognition
    Chapter 3 Methodology
        3.1 System Framework
            3.1.1 Taiwanese-Chinese Mixed Model
            3.1.2 Taiwanese-Chinese 2 Models
            3.1.3 Taiwanese-Chinese 3 Models
        3.2 Speech Recognition System
            3.2.1 Acoustic Model
            3.2.2 Language Model
        3.3 Preprocessing
            3.3.1 Label System
            3.3.2 Training Data of Acoustic Model
        3.4 Combining Sequence
            3.4.1 Sequence Scoring
            3.4.2 Word Scoring
            3.4.3 Syllable Confusion Set
            3.4.4 Error Correction
            3.4.5 Error Correction in TC3M
    Chapter 4 Experiment
        4.1 Experiment Preparation
            4.1.1 Evaluation Metrics
            4.1.2 Testing Data
        4.2 TCMM Framework Experiment
            4.2.1 Experiment Result
            4.2.2 Error Analysis
        4.3 TC2M Framework Experiment
            4.3.1 Experiment Result
            4.3.2 Error Analysis
        4.4 TC3M Framework Experiment
            4.4.1 Experiment Result
            4.4.2 Error Analysis
    Chapter 5 Conclusion and Future Work
    References

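    Full text: available on campus from 2025-08-14; not available off campus.
    The electronic thesis has not yet been authorized for public release; for the print copy, please consult the library catalog.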