| Graduate student: | 許純珊 Hsu, Chun-Shan |
|---|---|
| Thesis title: | 應用次語音單元與發音特徵模型於差分貝氏資訊準則為基礎之語言識別 Delta-BIC-Based Language Identification Using Senone and Articulatory Eigenvoice Models |
| Advisor: | 吳宗憲 Wu, Chung-Hsien |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering |
| Year of publication: | 2012 |
| Academic year of graduation: | 100 |
| Language: | English |
| Number of pages: | 49 |
| Chinese keywords: | 語言識別 (language identification), 語音辨識 (speech recognition), 特徵模型 (eigenvoice model), 差分貝氏資訊準則 (delta BIC) |
| English keywords: | language identification, speech recognition, eigenvoice model, delta BIC |
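As society becomes more international, demand is growing for multilingual speech recognition systems that can handle code-switching. In such systems, how accurately the switching points are detected and the segments of each language are separated strongly affects recognition performance, so front-end language identification is a key factor in recognizing code-switched utterances.
This thesis proposes a language identification system for code-switching detection based on the delta Bayesian Information Criterion (delta-BIC), in which every syllable boundary produced by the speech recognizer is treated as a candidate language-switching point. Eigenvoice models are built for multiple languages and for speech units at different levels (for example, senones and articulatory attributes) to estimate the similarity of speech characteristics. During identification, delta-BIC uses an analysis window of n syllables to analyze the language similarity of the windows before and after a candidate boundary. In addition to the conventional Euclidean distance, a direction (angle) measure based on the vector inner product is used to compute the similarity between the test utterance and the multilingual eigenvoice models; this similarity serves as the switching score of each candidate boundary and strengthens the conventional delta-BIC. Finally, a dynamic programming algorithm searches for the best language sequence of the code-switched utterance.
In the experiments, the proposed method is compared with approaches from recent years. The results show a considerable improvement in language identification on code-switching speech data, reaching an accuracy of 71.69%.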
As society becomes more international, demand for code-switching speech recognition systems keeps growing. Language identification plays an important role in code-switching speech recognition because incorrectly detecting the language of a speech segment seriously degrades recognition performance.
In this thesis, a new paradigm for code-switching language identification (LID) based on the delta Bayesian Information Criterion (delta-BIC) is proposed. Syllable boundaries obtained from the speech recognizer are regarded as potential code-switching boundaries. Senone and articulatory features are employed to construct eigenvoice models of the different languages for similarity estimation. The conventional Euclidean distance and an inner-product-based direction measure are integrated into a similarity score between the input speech and each eigenvoice model.
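
The idea of combining a distance measure with a direction measure can be illustrated with a minimal sketch. The abstract does not give the actual combination rule, so everything here is an assumption: each language model is reduced to a single reference vector, the Euclidean distance is mapped to a similarity by `1 / (1 + distance)`, and the hypothetical weight `alpha` blends it with the cosine of the angle between the two vectors.

```python
import numpy as np

def similarity(x, model_vector, alpha=0.5):
    """Blend a distance-based and a direction-based similarity.

    x, model_vector: 1-D feature vectors of equal length.
    alpha: hypothetical weight of the distance term (not from the thesis).
    """
    x = np.asarray(x, dtype=float)
    m = np.asarray(model_vector, dtype=float)

    # Distance term: 1 when the vectors coincide, toward 0 as they move apart.
    distance_sim = 1.0 / (1.0 + np.linalg.norm(x - m))

    # Direction term: cosine of the angle, computed from the inner product.
    # The lower bound on the denominator avoids division by zero.
    denom = max(np.linalg.norm(x) * np.linalg.norm(m), 1e-12)
    direction_sim = float(np.dot(x, m)) / denom

    return alpha * distance_sim + (1.0 - alpha) * direction_sim

# Pick the language whose (hypothetical) reference vector is most similar.
models = {"Mandarin": [1.0, 0.2], "English": [0.1, 1.0]}
segment = [0.9, 0.3]
best = max(models, key=lambda lang: similarity(segment, models[lang]))
```

The direction term ignores vector length, so it can separate two models that lie at similar distances from the input but point in different directions, which is one plausible reason for using both measures.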
Delta-BIC with an analysis window of n syllables is then employed to output a score for each potential boundary. Finally, a dynamic programming algorithm is employed to search for the best language sequence on the output of speech recognition.
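
The two steps in this paragraph can be sketched as follows. This is not the thesis's formulation: `delta_bic` is the textbook Gaussian delta-BIC used in audio segmentation, where the thesis instead scores boundaries with eigenvoice similarities, and `best_language_sequence` is a generic Viterbi-style search. The function names, the `penalty_weight` and `switch_weight` parameters, and the rule that a boundary score is added whenever the language label changes are all assumptions made for illustration.

```python
import numpy as np

def delta_bic(left, right, penalty_weight=1.0):
    """Textbook Gaussian delta-BIC for one candidate boundary.

    left, right: arrays of shape (frames, dims) for the windows before and
    after the boundary. Positive values favor two separate models, i.e. a
    change at the boundary.
    """
    both = np.vstack([left, right])
    n, d = both.shape

    def logdet_cov(x):
        # A small ridge keeps the covariance invertible for short windows.
        cov = np.cov(x, rowvar=False) + 1e-6 * np.eye(d)
        return np.linalg.slogdet(cov)[1]

    gain = 0.5 * (n * logdet_cov(both)
                  - len(left) * logdet_cov(left)
                  - len(right) * logdet_cov(right))
    n_params = d + d * (d + 1) / 2  # mean plus full covariance
    return gain - 0.5 * penalty_weight * n_params * np.log(n)

def best_language_sequence(unit_scores, switch_scores, switch_weight=1.0):
    """Viterbi-style search over one language label per syllable.

    unit_scores: (T, L) array, higher means syllable t resembles language l.
    switch_scores: length T-1, higher means boundary t|t+1 looks like a switch
    (for example a delta-BIC value).
    """
    unit_scores = np.asarray(unit_scores, dtype=float)
    T, L = unit_scores.shape
    best = unit_scores[0].copy()
    back = np.zeros((T, L), dtype=int)
    for t in range(1, T):
        # trans[i, j]: staying in a language adds 0, switching adds the
        # weighted boundary score (a penalty when that score is negative).
        trans = switch_weight * switch_scores[t - 1] * (1 - np.eye(L))
        cand = best[:, None] + trans
        back[t] = cand.argmax(axis=0)
        best = cand.max(axis=0) + unit_scores[t]
    # Trace the best path backwards from the best final label.
    path = [int(best.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

Because a negative boundary score penalizes every label change at that boundary, an isolated syllable that weakly favors another language tends to be absorbed into its neighbors instead of producing a spurious switch.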
The proposed approach was evaluated on the Chinese-English COde-switching Speech database (CECOS), and the results show that the proposed language identification system achieves an accuracy of 71.69%, outperforming previously proposed systems.
On campus: publicly available from 2017-08-28