| Graduate student: | 林立成 Sarwono, David |
|---|---|
| Thesis title: | 應用音節群組之加權型核心特徵矩陣於語音辨識替代錯誤修正 A Syllable Cluster Based Weighted Kernel Feature Matrix for ASR Substitution Error Correction |
| Advisor: | 吳宗憲 Wu, Chung-Hsien |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering |
| Year of publication: | 2012 |
| Academic year of graduation: | 100 (ROC calendar) |
| Language: | Chinese |
| Number of pages: | 53 |
| Keywords: | Context Dependent Syllable, Automatic Speech Recognizer, Error Correction, Natural Language Processing |
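Abstract (translated from the Chinese)
A Syllable Cluster Based Weighted Kernel Feature Matrix for ASR Substitution Error Correction
林立成 (David Sarwono)* 吳宗憲 (Chung-Hsien Wu)**
Department of Computer Science and Information Engineering, National Cheng Kung University
In recent years, the maturing of speech recognition technology has driven rapid growth in its applications. However, under the influence of variables such as complex acoustic environments, its recognition rate remains limited. The resulting recognition errors reduce the practical value of speech recognition technology.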
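This thesis therefore takes a post-processing approach based on natural language processing and proposes a syllable cluster based weighted kernel feature matrix built on Context Dependent Syllable Clusters (CDSC). The proposed context dependent syllables are clustered by their articulatory attributes, and a kernel feature matrix is combined with a syllable confusion matrix to obtain the syllable cluster based weighted kernel feature matrix. This matrix is used to find the most similar syllables and thereby improve the recognition rate of the speech recognizer's output.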
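The abstract does not give the formulas behind the weighted matrix. As a rough sketch only, one way to combine the two ingredients it names is to score each candidate syllable by a kernel similarity over articulatory attributes multiplied by a confusion weight. The Gaussian kernel, the attribute vectors, the confusion values, and all function names below are assumptions made for illustration, not the thesis's definitions.

```python
import math

# Hypothetical binary articulatory attribute vectors (illustrative values only).
ATTRIBUTES = {
    "ba": [1, 0, 0, 1],
    "pa": [1, 0, 1, 1],
    "ma": [1, 1, 0, 1],
    "sa": [0, 0, 1, 1],
}

# Hypothetical confusion weights keyed by (recognized syllable, candidate syllable).
CONFUSION = {
    ("ba", "pa"): 0.20,
    ("ba", "ma"): 0.10,
    ("ba", "sa"): 0.01,
}


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel between two attribute vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def weighted_similarity(recognized, candidate, sigma=1.0):
    """Kernel similarity scaled by the confusion weight (assumed combination rule)."""
    kernel = gaussian_kernel(ATTRIBUTES[recognized], ATTRIBUTES[candidate], sigma)
    weight = CONFUSION.get((recognized, candidate), 0.0)
    return kernel * weight


def rank_candidates(recognized, top_n=2):
    """Return the top_n candidate syllables with their scores, best first."""
    scores = [
        (cand, weighted_similarity(recognized, cand))
        for cand in ATTRIBUTES
        if cand != recognized
    ]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:top_n]


if __name__ == "__main__":
    print(rank_candidates("ba"))
```

With these toy values, `rank_candidates("ba")` places `pa` ahead of `ma`: both are equally close to `ba` in the attribute space, but `pa` carries the larger confusion weight.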
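Applying the proposed syllable cluster based weighted kernel feature matrix to correct the erroneous output of the speech recognizer reduces the word error rate from 48.50% to 45.31% and the syllable error rate from 15.37% to 10.31%. The experimental results show that the proposed method can effectively correct substitution errors in the speech recognizer's output.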
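For readers unfamiliar with these metrics, the sketch below assumes the usual edit-distance definition of an error rate (substitutions, deletions, and insertions divided by the reference length); the abstract does not state the exact scoring procedure. It also works out the relative reductions implied by the reported figures. The function names are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def error_rate(ref, hyp):
    """Edit distance normalized by reference length (tokens are words or syllables)."""
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(before, after):
    """Fraction of the original error rate that was removed."""
    return (before - after) / before


if __name__ == "__main__":
    # One substitution in a four-token reference.
    print(error_rate(["a", "b", "c", "d"], ["a", "x", "c", "d"]))
    # Relative reductions implied by the reported error rates.
    print(round(relative_reduction(48.50, 45.31) * 100, 2))
    print(round(relative_reduction(15.37, 10.31) * 100, 2))
```

Under this definition, the reported numbers correspond to relative reductions of roughly 6.6% in word error rate and 32.9% in syllable error rate.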
Keywords: Context Dependent Syllable, Automatic Speech Recognizer, Error Correction, Natural Language Processing
* Author ** Advisor
Abstract
A Syllable Cluster Based Weighted Kernel Feature Matrix for ASR Substitution Error Correction
David Sarwono * Chung-Hsien Wu**
Institute of Computer Science and Information Engineering,
National Cheng Kung University, Tainan, Taiwan, R.O.C.
In recent years, Automatic Speech Recognition (ASR) has become one of the fastest-growing technologies in engineering science and research. However, the performance of ASR is still restricted in adverse environments. Errors in ASR output degrade the performance of speech applications, so techniques for correcting these errors benefit any application that relies on ASR output.
In this study, a Syllable Cluster Based Weighted Kernel Feature Matrix based on Context Dependent Syllable Clusters (CDSC) is proposed for generating correction candidates. For candidate selection in the second stage, an n-gram language model is used to determine the final corrected sentence, thereby improving the recognition rate of the speech recognition output.
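The second stage can be pictured as choosing, from the candidates at each position, the sequence that the language model scores highest. The sketch below is an illustration under assumptions, not the thesis's implementation: it uses a bigram model with made-up log-probabilities over pinyin-like tokens and a Viterbi-style search. The token inventory, probability values, and function names are all hypothetical.

```python
import math

# Hypothetical bigram probabilities (illustrative values only).
BIGRAM_LOGP = {
    ("<s>", "jin"): math.log(0.4),
    ("<s>", "jing"): math.log(0.1),
    ("jin", "tian"): math.log(0.5),
    ("jing", "tian"): math.log(0.05),
    ("jin", "dian"): math.log(0.01),
    ("jing", "dian"): math.log(0.3),
}
UNSEEN_LOGP = math.log(1e-4)  # assumed floor for unseen bigrams


def bigram_logp(prev, cur):
    """Log-probability of cur following prev, with a floor for unseen pairs."""
    return BIGRAM_LOGP.get((prev, cur), UNSEEN_LOGP)


def best_sequence(candidates):
    """Viterbi-style search over per-position candidate lists.

    candidates: list of lists; candidates[i] holds the options at position i.
    Returns the highest-scoring token sequence under the bigram model.
    """
    # best maps the last token to (score, path) of the best path ending there.
    best = {"<s>": (0.0, [])}
    for options in candidates:
        nxt = {}
        for cur in options:
            score, path = max(
                ((s + bigram_logp(prev, cur), p) for prev, (s, p) in best.items()),
                key=lambda item: item[0],
            )
            nxt[cur] = (score, path + [cur])
        best = nxt
    return max(best.values(), key=lambda item: item[0])[1]


if __name__ == "__main__":
    # The recognizer output "jing dian" competes with alternatives at each position.
    print(best_sequence([["jing", "jin"], ["tian", "dian"]]))
```

With these toy probabilities the search selects `jin tian`, since its combined bigram score is higher than that of any other path through the two candidate lists.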
Experiments show that the proposed method reduces the Word Error Rate from 48.50% to 45.31% and the Syllable Error Rate from 15.37% to 10.31% compared with the uncorrected speech recognition output.
Keywords: Context Dependent Syllable, Automatic Speech Recognizer, Error Correction, Natural Language Processing
* The Author ** The Advisor