| Graduate Student: | 蔡如意 Tsai, Ru-yi |
|---|---|
| Thesis Title: | 語言模型之連續性表示法於語音辨識之應用 (Continuous Lexical Representation of Language Model for Speech Recognition) |
| Advisor: | 簡仁宗 Chien, Jen-Tzung |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering |
| Year of Publication: | 2008 |
| Academic Year of Graduation: | 96 |
| Language: | Chinese |
| Number of Pages: | 74 |
| Chinese Keywords: | latent topic language model, continuous language model, statistical n-gram model |
| English Keywords: | latent semantic analysis (LSA), topic-based probability model |
The statistical n-gram model is one of the most widely used language models. However, the lack of long-distance information, data sparseness, and the mismatch between training and testing conditions all severely degrade its performance. Under the n-gram assumption, dependencies between words whose distance exceeds the defined window size are ignored, so the n-gram model has difficulty capturing long-distance linguistic information. Moreover, because training data are limited, data sparseness assigns zero probability to unseen words, which restricts the generalization ability of the n-gram model. This problem becomes even more severe when long-distance dependencies or high-order n-grams are considered. This thesis focuses on building a continuous representation of the language model to alleviate these problems. A conventional n-gram model obtains probabilities by matching word sequences in a discrete lexical space, so events that never appear in the training corpus have no corresponding probability and are assigned a probability of zero. In contrast, we use latent topic information to map the discrete lexical space into a low-dimensional continuous space: a word sequence is represented as a continuous vector of topic posterior probabilities, an optimal projection matrix is estimated by the least-squares method, and language model probabilities are then computed in this space. Unseen words can thus be assigned probabilities estimated from similar words in the training corpus, which alleviates the data sparseness problem, and model parameters can also be adapted more effectively in this continuous topic space. Furthermore, the topic model improves the model's ability to capture long-distance information. For each of the three major problems faced by n-gram models, this thesis also examines the solutions proposed in the literature and their relation to the proposed continuous topic language model. In the experiments, we implement the proposed language model on the Wall Street Journal large-vocabulary continuous speech corpus and analyze language model perplexity and speech recognition error rate.
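As a rough formal sketch of the mapping just described (the notation below is illustrative and not taken from the thesis itself): let $\mathbf{x}_h \in \mathbb{R}^{|V|}$ denote the discrete word-space representation of a history $h$ (e.g. its word counts), and let its continuous representation be the vector of topic posteriors

$$\mathbf{t}_h = \big[\, P(z_1 \mid h),\ \dots,\ P(z_K \mid h) \,\big]^{\top}.$$

Stacking the training histories column-wise into matrices $\mathbf{X}$ and $\mathbf{T}$, a least-squares projection from the word space to the $K$-dimensional topic space is

$$\hat{\mathbf{W}} = \arg\min_{\mathbf{W}} \sum_{h} \big\|\, \mathbf{t}_h - \mathbf{W}\mathbf{x}_h \,\big\|^2 = \mathbf{T}\mathbf{X}^{\top}\big(\mathbf{X}\mathbf{X}^{\top}\big)^{-1},$$

assuming $\mathbf{X}\mathbf{X}^{\top}$ is invertible. An unseen history $h'$ can then still be mapped to $\hat{\mathbf{W}}\mathbf{x}_{h'}$, and its probabilities estimated from nearby training histories in the topic space.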
Statistical n-gram language models suffer from three weaknesses: limited capacity for long-distance information, data sparseness, and domain mismatch. This study presents a continuous topic language model to improve the robustness of probability prediction. A continuous representation of word sequences can effectively alleviate the data sparseness problem of the n-gram language model, in which words are treated as discrete variables and unseen events are prone to occur. This problem becomes increasingly severe when extracting long-distance regularities for high-order n-gram models. Rather than working in the discrete word space, we construct a continuous space in which word sequences are represented with latent topic information embedded. The continuous vector is formed by the topic posterior probabilities, and the least-squares projection matrix from the discrete word space to the continuous topic space is accordingly estimated. Unseen words can be predicted through this new continuous representation of the language model. Performing language model adaptation in the continuous topic space also increases the robustness of the model: the word distribution of a history unseen in the adaptation data can be estimated by considering its neighboring histories in the continuous topic space. In addition, the topic framework makes it feasible to exploit long-distance regularities. In the experiments, we implement the proposed method on the Wall Street Journal corpus and obtain significant performance improvement over the conventional latent topic language model.
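The least-squares projection and neighbor-based smoothing described above can be sketched numerically. The snippet below is only a toy illustration under assumed shapes, with random stand-ins for the topic posteriors; it is not the thesis implementation, and names such as `predict_word_dist` are hypothetical.

```python
# Toy sketch: map word-count vectors of n-gram histories into a low-dimensional
# "topic" space with a least-squares projection, then score an unseen history by
# borrowing the word distributions of its nearest training histories in that space.
# All variable names and the synthetic data below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
V, K, N = 50, 5, 200        # vocabulary size, number of topics, training histories

# X: discrete word-space representation of each training history (counts), shape (N, V)
X = rng.poisson(0.3, size=(N, V)).astype(float)

# T: continuous representation of each history, i.e. topic posteriors P(z_k | h);
# here faked with a random stochastic matrix in place of a trained topic model.
T = rng.dirichlet(np.ones(K), size=N)                     # shape (N, K)

# P_w: observed word distribution P(w | h) for each training history, shape (N, V)
P_w = (X + 1e-3) / (X + 1e-3).sum(axis=1, keepdims=True)

# Least-squares projection W from word space to topic space: T ≈ X @ W
W, *_ = np.linalg.lstsq(X, T, rcond=None)                 # shape (V, K)

def predict_word_dist(history_counts, n_neighbors=10):
    """Estimate P(w | h) for a (possibly unseen) history by projecting it into
    the topic space and averaging the word distributions of nearby histories."""
    t = history_counts @ W                                 # continuous representation
    dists = np.linalg.norm(T - t, axis=1)                  # distance to training histories
    idx = np.argsort(dists)[:n_neighbors]
    weights = 1.0 / (dists[idx] + 1e-6)
    return (weights[:, None] * P_w[idx]).sum(axis=0) / weights.sum()

# An unseen history: a word-count vector never observed during training.
unseen = np.zeros(V)
unseen[[3, 17, 42]] = 1.0
p = predict_word_dist(unseen)
print(p.shape, round(p.sum(), 6))                          # (50,) 1.0
```

Because the unseen history is scored through its neighbors rather than by exact lexical matching, it receives a nonzero, smoothed probability estimate, which is the behavior the abstract attributes to the continuous topic representation.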