| Author: | 林鴻佑 Lin, Hung-Yu |
|---|---|
| Thesis Title: | Multi-feature Based ASR Error Detection and Correction System for a Domain-Specific Task |
| Advisor: | 盧文祥 Lu, Wen-Hsiang |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering |
| Publication Year: | 2023 |
| Graduation Academic Year: | 111 |
| Language: | English |
| Pages: | 46 |
| Keywords: | Error Detection, Error Correction, Semantic Resolution, Context Information, Confusion Set |
With the popularization of products such as voice assistants and smart speakers, automatic speech recognition (ASR) has become an indispensable core technology. Although technology giants such as Google, Amazon, and Apple invest enormous amounts of money, manpower, and data in pursuit of ASR systems with accuracy approaching 100%, such a system cannot exist in practice, especially in domain-specific scenarios: these scenarios involve many proper nouns and conventional colloquial usages, so recognition accuracy inevitably drops. This thesis therefore focuses on developing an ASR post-correction system for domain-specific tasks. We collect domain-specific data to build a statistical model that quantifies the relationships between words, and construct a semantic resolution module for the target domain. Furthermore, we analyze phoneme-to-phoneme similarities in past recognition records and build phoneme confusion sets. Finally, we propose an ASR error detection and correction model based on multiple features (phoneme, word, and semantic information) to improve the semantic plausibility of recognition results and provide users with more accurate output.
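The abstract does not detail how the phoneme confusion sets are built from past recognition records. A minimal sketch of one plausible construction (all function names and the toy records below are hypothetical, not taken from the thesis) is to edit-distance-align each reference phoneme sequence against the recognizer's hypothesis and count substitution pairs:

```python
from collections import defaultdict

def align(ref, hyp):
    """Levenshtein alignment of reference vs. hypothesis phoneme sequences.
    Returns (ref_phoneme, hyp_phoneme) pairs; None marks an insertion/deletion."""
    n, m = len(ref), len(hyp)
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # match / substitution
    # backtrace the optimal alignment path
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        cost = 1 if i == 0 or j == 0 or ref[i - 1] != hyp[j - 1] else 0
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + cost:
            pairs.append((ref[i - 1], hyp[j - 1])); i -= 1; j -= 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            pairs.append((ref[i - 1], None)); i -= 1
        else:
            pairs.append((None, hyp[j - 1])); j -= 1
    return list(reversed(pairs))

def build_confusion_sets(records, min_count=1):
    """records: (ref_phonemes, hyp_phonemes) pairs from past ASR logs.
    Returns {phoneme: {confused_phoneme: count}}, substitutions only."""
    counts = defaultdict(lambda: defaultdict(int))
    for ref, hyp in records:
        for r, h in align(ref, hyp):
            if r is not None and h is not None and r != h:
                counts[r][h] += 1
    return {p: dict(c) for p, c in counts.items()
            if sum(c.values()) >= min_count}

# Toy logs in which the retroflex "zh" is repeatedly recognized as "z"
records = [(["zh", "ang"], ["z", "ang"]), (["zh", "u"], ["z", "u"])]
print(build_confusion_sets(records))  # {'zh': {'z': 2}}
```

Normalizing each inner count dictionary would turn these raw counts into the substitution probabilities a correction model can score candidates with; insertion and deletion statistics could be accumulated the same way from the `None`-marked pairs.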
On-campus access: publicly available from 2028-07-31