| Graduate Student: | 楊晏婷 Yang, Yan-Ting |
|---|---|
| Thesis Title: | 應用發音特徵於語言轉換之語音辨認中音素集之建立 Phone Set Construction based on Articulatory Features for Code-Switching Speech Recognition |
| Advisor: | 吳宗憲 Wu, Chung-Hsien |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering |
| Year of Publication: | 2011 |
| Academic Year of Graduation: | 99 |
| Language: | Chinese |
| Number of Pages: | 60 |
| Chinese Keywords: | 多語語音辨識、語言轉換、發音特徵 |
| English Keywords: | Multilingual speech recognition, code-switching, articulatory feature |
In recent years, globalization has increased the demand for multilingual automatic speech recognizers that can handle code-switching. Because the acoustic model set adopted by a multilingual recognizer has a decisive influence on recognition accuracy, constructing a robust acoustic model set for multilingual speech recognition is an important research topic. However, it is difficult to collect speech in every language spoken by the speakers of a particular country for acoustic model training; code-switching corpora in particular often suffer from data sparseness. Data sparseness makes the similarity estimates between acoustic models unreliable when a data-driven phone set construction method is used. This study therefore introduces articulatory features (AFs) into an HMM-based speech recognizer to alleviate the data sparseness problem when computing distances between acoustic models. The similarity between Mandarin and English acoustic models is estimated from articulatory features and acoustic characteristics, and similar acoustic models are merged to build a Mandarin-English bilingual speech recognizer. For language modeling, a translation-based language model is adopted so that the system can recognize a wider range of code-switched utterances. In the experiments, the proposed method is compared with recently published methods, and the results show that the phone set constructed by the proposed method considerably improves recognition of code-switching speech.
Due to globalization, the demand for automatic speech recognition (ASR) systems that can handle code-switched speech increases with time. Acoustic model set construction plays an important role in multilingual and code-switching speech recognition. However, it is hard to collect sufficient code-switched or multilingual utterances, so the data sparseness problem is usually confronted in multilingual ASR construction and model training. Data sparseness also degrades the reliability of the similarity estimates among different phones in data-driven acoustic model set construction. This study integrates articulatory features into the similarity measure used to estimate the similarities between different models and merges similar phones into the same phone unit. Furthermore, a machine translation-based language model is used to improve the ability to recognize code-switched utterances. The experimental results show that the proposed model set construction method outperforms other state-of-the-art methods.
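As a rough illustration of the phone-merging idea described in the abstract, the Python sketch below scores cross-lingual phone pairs with a weighted combination of an articulatory-feature distance and a symmetrized KL divergence between simplified single-Gaussian stand-ins for the acoustic models, then merges pairs whose combined distance falls below a threshold. The phone labels, AF vectors, Gaussian parameters, weighting, and threshold are all toy assumptions for illustration, not the configuration or exact similarity measure used in the thesis.

```python
# Minimal sketch of AF-assisted cross-lingual phone merging.
# All phone labels, feature values, weights, and thresholds below are
# hypothetical toy data, not the thesis's actual configuration.
import numpy as np

# Hypothetical AF vectors (e.g., place/manner/voicing flags)
# for a few Mandarin (zh) and English (en) phones.
af_vectors = {
    "zh_p": np.array([1, 0, 0, 0, 1]),   # bilabial, plosive, unvoiced, ...
    "en_p": np.array([1, 0, 0, 0, 1]),
    "zh_f": np.array([0, 1, 0, 0, 1]),   # labiodental, fricative, unvoiced
    "en_f": np.array([0, 1, 0, 0, 1]),
    "en_v": np.array([0, 1, 0, 1, 1]),   # voiced counterpart of /f/
}

# Hypothetical single-Gaussian (mean, variance) stand-ins for each phone's
# acoustic model; a real system would use HMM state distributions.
acoustic_models = {
    "zh_p": (np.array([0.10, 0.20]), np.array([1.0, 1.0])),
    "en_p": (np.array([0.12, 0.18]), np.array([1.0, 1.1])),
    "zh_f": (np.array([0.90, -0.30]), np.array([0.8, 1.2])),
    "en_f": (np.array([0.85, -0.25]), np.array([0.9, 1.1])),
    "en_v": (np.array([1.40, -1.00]), np.array([1.0, 1.0])),
}

def af_distance(a, b):
    """Fraction of articulatory features on which two phones disagree."""
    return np.mean(a != b)

def sym_kl_gaussian(m1, v1, m2, v2):
    """Symmetrized KL divergence between two diagonal Gaussians."""
    kl12 = 0.5 * np.sum(np.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)
    kl21 = 0.5 * np.sum(np.log(v1 / v2) + (v2 + (m2 - m1) ** 2) / v1 - 1.0)
    return 0.5 * (kl12 + kl21)

def combined_distance(p, q, af_weight=0.5):
    """Interpolate AF distance and acoustic distance (weight is illustrative)."""
    m1, v1 = acoustic_models[p]
    m2, v2 = acoustic_models[q]
    return (af_weight * af_distance(af_vectors[p], af_vectors[q])
            + (1.0 - af_weight) * sym_kl_gaussian(m1, v1, m2, v2))

def merge_phones(threshold=0.2):
    """Merge Mandarin/English phone pairs whose combined distance is small."""
    zh = [p for p in af_vectors if p.startswith("zh_")]
    en = [p for p in af_vectors if p.startswith("en_")]
    return [(p, q) for p in zh for q in en
            if combined_distance(p, q) < threshold]

if __name__ == "__main__":
    # With the toy data above: [('zh_p', 'en_p'), ('zh_f', 'en_f')]
    print(merge_phones())
```

The point of the sketch is only that an articulatory-feature term can keep sparsely trained phone pairs from being merged on unreliable acoustic distances alone; in the actual system the acoustic distances would come from HMM state distributions and the AF values from trained detectors.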
On-campus access: available from 2016-09-02.