
Author: Xiao, Yu-Lin (蕭郁霖)
Thesis title: Error Correction in Chinese-English Mixed Medical Speech Recognition Using a Thematic Lexical Association Fine-Tuning Model (主題式詞彙關聯語言特徵微調模型中英混合醫療語音辨識錯誤矯正系統)
Advisor: Young, Chung-Ping (楊中平)
Co-advisor: Lu, Wen-Hsiang (盧文祥)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication: 2025
Academic year of graduation: 114 (ROC calendar)
Language: English
Number of pages: 63
Keywords (Chinese): 中英混合語音辨識、錯誤矯正、主題式詞彙關聯、醫療語料
Keywords (English): Chinese-English ASR, error correction, thematic lexical association, medical corpus
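  • In recent years, demand for medical speech recognition has grown rapidly, but existing ASR systems often misrecognize speech in clinical settings when they encounter Chinese-English mixed contexts, varied specialist terminology, and pronunciation differences. Drug names, medical terms, and nurses' non-standard pronunciations in particular cause the recognized text to drift from the intended meaning. To address these problems, this study proposes a speech recognition error correction framework built around thematic lexical association.
    The framework builds a nursing-specific dictionary and a set of rules for nurses' non-standard English pronunciations, and establishes a mapping table between the English CMU ARPAbet encoding and the CTLR phonetic notation for Mandarin, Taiwanese, and Hakka developed by Mi2S (a laboratory in the Department of Computer Science and Information Engineering, National Cheng Kung University). The study also collects real recordings from nursing staff and combines them with speech generated by Mi2S and Google TTS as ASR training data, to strengthen the model's robustness across speakers. The system integrates six detection modules: phoneme confusion, syllable errors, word similarity, N-best differences, bigram part-of-speech errors, and thematic collocations. It uses weighted scoring and a threshold mechanism to decide on and apply corrections.
    Experimental results show that introducing the pronunciation rules reduces the drug-name WER from 74.4% to 24.4%, and that the error correction framework reduces the CER from 16.7% to 4.9%, outperforming commercial and open-source systems such as Whisper-Large-v3-turbo and Google Speech-To-Text. These results indicate that the proposed thematic lexical association error correction system can effectively improve medical speech recognition accuracy, reduce the burden of correcting clinical documents, and lay a foundation for future medical voice assistants and smart medical record systems.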

    In recent years, the demand for medical speech recognition has grown rapidly. However, existing automatic speech recognition (ASR) systems often struggle in clinical settings involving Chinese–English code-switching, diverse terminology, and non-standard nurse pronunciations. These challenges frequently lead to recognition errors in drug names and medical terms. To address this issue, this study proposes a thematic lexical association framework for ASR error correction in the nursing domain.
    The framework builds a nursing-specific lexicon, defines rules for nurses' non-standard English pronunciations, and establishes a phonetic mapping between CMU ARPAbet for English words and the CTLR system developed by the Mi2S Laboratory, NCKU. Real nurse recordings were collected and augmented with synthetic speech from Mi2S and Google TTS to improve robustness across speakers. Six detection modules (phonetic confusion, syllabic error, word similarity, N-best difference, POS bigram error, and thematic collocation) are combined through a weighted scoring and threshold mechanism that decides when to apply a correction.
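    The weighted scoring and threshold mechanism can be pictured with a minimal sketch. The abstract names the six modules but gives neither their weights nor the threshold, so the weights, the threshold value, the score range of 0 to 1, and the function names below are all illustrative assumptions, not the thesis's actual configuration.

```python
# Hypothetical weights for the six detection modules named in the abstract.
# The real weights are tuned in the thesis and are not given in this record.
WEIGHTS = {
    "phonetic_confusion": 0.25,
    "syllabic_error": 0.15,
    "word_similarity": 0.20,
    "n_best_difference": 0.15,
    "pos_bigram_error": 0.10,
    "thematic_collocation": 0.15,
}
THRESHOLD = 0.5  # assumed decision threshold


def weighted_score(module_scores: dict[str, float]) -> float:
    """Combine per-module scores (assumed to lie in [0, 1]) into a weighted sum.

    Modules that did not fire are treated as contributing 0.
    """
    return sum(w * module_scores.get(name, 0.0) for name, w in WEIGHTS.items())


def is_suspect(module_scores: dict[str, float], threshold: float = THRESHOLD) -> bool:
    """Flag a word as a likely recognition error when the score reaches the threshold."""
    return weighted_score(module_scores) >= threshold


# Three modules fire strongly on a word, so it is flagged for correction.
flagged = is_suspect(
    {"phonetic_confusion": 1.0, "word_similarity": 1.0, "n_best_difference": 1.0}
)
```

    In this sketch a single weak signal, such as a POS bigram mismatch alone, stays below the threshold, while agreement among several modules pushes a word over it.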
    Experiments show that applying pronunciation rules reduced the WER for drug names from 74.4% to 24.4%, while the full framework lowered the CER from 16.7% to 4.9%, outperforming commercial and open-source systems such as Whisper-Large-v3-turbo and Google Speech-to-Text. The results demonstrate that the proposed method effectively improves medical ASR accuracy and reduces clinical documentation effort.
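    For readers unfamiliar with the two metrics, the sketch below shows how WER and CER are conventionally computed from edit distance, and how the reported figures translate into relative reductions. It is a generic illustration, not the thesis's evaluation script; the function names are hypothetical, and treating every character of mixed Chinese-English text as one unit for CER is an assumption.

```python
def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences, using a rolling DP row."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref, hyp) -> float:
    """Edit distance normalized by the reference length."""
    if len(ref) == 0:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def wer(ref: str, hyp: str) -> float:
    """Word error rate over whitespace-separated tokens."""
    return error_rate(ref.split(), hyp.split())


def cer(ref: str, hyp: str) -> float:
    """Character error rate; each character counts as one unit (an assumption)."""
    return error_rate(list(ref), list(hyp))


def relative_reduction(before: float, after: float) -> float:
    """Fraction of the original error rate that was removed."""
    return (before - after) / before


# Figures reported in the abstract, expressed as relative reductions.
drug_wer_gain = relative_reduction(74.4, 24.4)
cer_gain = relative_reduction(16.7, 4.9)
```

    Applied to the reported numbers, this arithmetic gives a relative reduction of roughly 67% for drug-name WER and roughly 71% for CER.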

    Abstract (in Chinese)
    Abstract
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
        1.1 Background
        1.2 Motivation
        1.3 Goal
        1.4 Contribution
    Chapter 2 Related Works
        2.1 Speech Recognition Model
        2.2 Phoneme-based ASR Error Correction
        2.3 Non-phoneme-based ASR Error Correction
    Chapter 3 Methodology
        3.1 Correction Strategy and Data Augmentation
        3.2 System Architecture
        3.3 Corpus Collection
        3.4 ASR Model Introduction
            3.4.1 ASR Model
            3.4.2 CTLR Encoding
            3.4.3 N-Best Implementation
        3.5 Thematic Error Detection
            3.5.1 Phonetic Confusion Error Detection
            3.5.2 Syllabic Error Detection
            3.5.3 Word Similarity Detection
            3.5.4 N-Best Error Detection
            3.5.5 Bigram POS Error Detection
            3.5.6 Thematic Collocation Error Detection
        3.6 Thematic Error Correction
    Chapter 4 Experiment
        4.1 Dataset
        4.2 Evaluation Method
        4.3 Recognition of Non-standard Nurse English Pronunciations
            4.3.1 Experimental Results
            4.3.2 Positive Example
            4.3.3 Error Analysis
        4.4 Error Correction
            4.4.1 Weighted Word Similarity Optimization
            4.4.2 N-Best Parameter Beam–Temp Tuning
            4.4.3 Error Correction Weight Optimization
            4.4.4 Evaluation of the Error Correction
    Chapter 5 Conclusion
    Reference
    Appendix A: CTLR Phonetic System (with CMU Mapping)


    Full-text availability: on campus, open immediately; off campus, open immediately