
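Graduate student: 高威震 (Kao, Wei-Jhen)
Thesis title: 台語語音辨識錯誤偵測及修正框架 (Errors Detection and Correction Framework for Taiwanese Speech Recognition)
Advisor: 楊中平 (Young, Chung-Ping)
Co-advisor: 盧文祥 (Lu, Wen-Hsiang)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering
Year of publication: 2019
Academic year of graduation: 107 (ROC calendar, i.e. 2018-2019)
Language: English
Number of pages: 35
Keywords (Chinese): 台語 (Taiwanese), 語音辨識 (speech recognition), 錯誤偵測 (error detection), 錯誤矯正 (error correction), 混淆集合 (confusion set)
Keywords (English): Taiwanese, Speech Recognition, Error Detection, Error Correction, Confusion set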
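  • Abstract (Chinese, translated): Taiwanese has long been the second most widely used language in Taiwan, and its influence on Taiwanese culture and society is evident. In recent years, with the wave of artificial intelligence, all kinds of speech recognition applications have appeared in our lives, but there is not yet a Taiwanese speech recognition system mature enough to be used in commercial settings.
    In this thesis, we first use the open-source Kaldi toolkit to develop a Taiwanese speech recognition system, then run error detection and correction experiments on the recognition results for the test data; the experimental baseline is the word error rate of those recognition results. For error detection, we first use a sentence classifier built with a random forest to judge whether a recognized sentence needs correction. If it does, a word classifier built with linear discriminant analysis judges which words in the output need to be corrected. Finally, for error correction, we use confusion sets and n-best results to build candidate sentences, score the candidate hypotheses with pointwise mutual information, and select the corrected result. Our experimental results show that the word error rate of the test data was reduced from 21.88% to 18.12%.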
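The baseline metric named in the abstract, word error rate (WER), is conventionally the word-level edit distance between a reference transcript and the recognizer's hypothesis, divided by the number of reference words. The sketch below is a minimal illustration under that standard definition; the function names `word_error_rate` and `relative_reduction` and the toy strings are hypothetical and are not taken from the thesis.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(before, after):
    """Fraction of the original error rate that was removed."""
    return (before - after) / before


# Toy example (hypothetical strings): one substitution plus one deletion
# against a four-word reference.
print(word_error_rate("a b c d", "a x c"))
# Figures reported in the abstract: 21.88% before, 18.12% after correction.
print(relative_reduction(21.88, 18.12))
```

With the reported figures, the absolute drop is 3.76 percentage points, which is roughly a 17% relative reduction in word error rate.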

    Taiwanese has long been the second most widely used language in Taiwan, and its influence on Taiwanese culture and society is evident. In recent years, with the wave of artificial intelligence, many speech recognition applications have appeared in daily life, but there is still no Taiwanese speech recognition system mature enough for practical commercial use.
    In this thesis, we first use the open-source Kaldi toolkit to develop a Taiwanese speech recognition system, then perform error detection and correction experiments on the recognition results of the test data. The baseline for the correction experiment is the word error rate of the test-data recognition results. For error detection, we first use a sentence classifier trained with Random Forest to determine whether a recognition result is erroneous; if it is, a word classifier trained with Linear Discriminant Analysis determines whether each word of the hypothesis needs to be corrected. For error correction, we use the confusion set and the n-best list to build candidate lists for the words judged to be errors, then rerank the candidates with pointwise mutual information. Our experimental results show that the word error rate of the test data dropped from 21.88% to 18.12%.
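The reranking step can be illustrated with a small sketch. The abstract does not specify how pointwise mutual information is estimated or combined, so everything here is an assumption for illustration: PMI is estimated from sentence-level co-occurrence counts in a toy corpus, a candidate is scored by summing PMI over adjacent word pairs, unseen pairs contribute zero, and the names `train_pmi`, `pmi`, `score`, and `rerank` are hypothetical. The definition used is PMI(x, y) = log( p(x, y) / (p(x) p(y)) ).

```python
import math
from collections import Counter
from itertools import combinations


def train_pmi(sentences):
    """Count words and unordered word pairs that co-occur in a sentence."""
    word_counts, pair_counts = Counter(), Counter()
    for sentence in sentences:
        words = set(sentence.split())
        word_counts.update(words)
        pair_counts.update(frozenset(p) for p in combinations(sorted(words), 2))
    return word_counts, pair_counts, len(sentences)


def pmi(x, y, word_counts, pair_counts, n):
    """PMI(x, y) = log(p(x, y) / (p(x) * p(y))) over sentence-level counts."""
    joint = pair_counts[frozenset((x, y))]
    if joint == 0:
        return 0.0  # assumption: unseen (or identical) pairs contribute nothing
    return math.log((joint / n) / ((word_counts[x] / n) * (word_counts[y] / n)))


def score(candidate, model):
    """Assumed scoring rule: sum of PMI over adjacent word pairs."""
    words = candidate.split()
    return sum(pmi(a, b, *model) for a, b in zip(words, words[1:]))


def rerank(candidates, model):
    """Return the candidate with the highest PMI score."""
    return max(candidates, key=lambda c: score(c, model))


# Toy corpus and candidates (hypothetical; English stands in for Taiwanese).
corpus = ["drink hot tea", "drink cold tea", "hot tea",
          "read a book", "read the book"]
model = train_pmi(corpus)
# Candidates as they might come from a confusion set or an n-best list.
print(rerank(["drink hot key", "drink hot tea"], model))
```

In this toy setup the candidate whose word pairs co-occur in the corpus scores higher, which is the intuition behind using PMI to choose among confusion-set and n-best alternatives.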

    Abstract
    摘要 (Chinese abstract)
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
        1.1 Background
        1.2 Motivation
        1.3 Overview
    Chapter 2 Related Work
        2.1 Taiwanese Phonetics Essential to a Speech Recognition System
        2.2 Error Detection
        2.3 Error Correction
    Chapter 3 Methodology
        3.1 System Framework
        3.2 Speech Recognition System
            3.2.1 Language Model
            3.2.2 Recognition Model
        3.3 Preprocessing
            3.3.1 Dataset Preparation
            3.3.2 Dataset Labeling
        3.4 Error Detection Framework
            3.4.1 Utterance-level Features
            3.4.2 Word-level Features
            3.4.3 Utterance/Word Classifier Model
                3.4.3.1 Support Vector Machine
                3.4.3.2 Random Forest
                3.4.3.3 Linear Discriminant Analysis
            3.4.4 Feature Selection
        3.5 Error Correction Framework
            3.5.1 Utterance Error Type Classification
            3.5.2 Confusing Set Construction
                3.5.2.1 Training Data Confusion Set
                3.5.2.2 Lexicon Confusion Set
                3.5.2.3 Creation of Candidate List
            3.5.3 Mutual Information
            3.5.4 Confidence score of utterance
            3.5.5 Error Correction Procedure
    Chapter 4 Experiments
        4.1 Error Detection Framework Experiment
            4.1.1 Evaluation Metrics
            4.1.2 Classifier Model Experiment
            4.1.3 Recursive Feature Elimination with Cross Validation
            4.1.4 Utterance/Word Error Detection
                Data Set
        4.2 Error Correction Framework Experiment
            4.2.1 Evaluation Metrics
            4.2.2 Experiment result
            4.2.3 Error Analysis
    Chapter 5 Conclusion and Future Work
    Reference


    Full text availability: on campus, public from 2024-08-29
    off campus, public from 2024-08-29