
Author: Lee, Kuen-Feng (李昆峯)
Title: Construction of Document Model and Language Model Using Bayesian Sparse Learning (貝氏稀疏學習法於文件模型及語言模型之建立)
Advisor: Chien, Jen-Tzung (簡仁宗)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2011
Graduation Academic Year: 99 (ROC calendar)
Language: Chinese
Number of Pages: 86
Keywords: Machine Learning, Natural Language Processing, Speech Recognition, Bayesian Learning, Sparse Features, Feature Selection, Topic Model, Spike-and-Slab Distribution, Language Model, Dirichlet Process, Pitman-Yor Process, Chinese Restaurant Process, Indian Buffet Process
Chinese Abstract:
    With the rapid advance of information technology, digital data has grown explosively. How to process such a huge amount of data effectively and extract the important information has become a crucial research issue in natural language processing, information retrieval, machine learning, speech recognition, document modeling, language modeling, and related fields.
    This thesis focuses on two research threads: the topic model (TM), which has been highly successful as a document model, and the language model (LM), which plays a critical role in large vocabulary continuous speech recognition (LVCSR) systems. For each of them, we propose a Bayesian sparse learning approach and develop a new modeling method.
    First, we propose a document model based on Bayesian sparse topics, improving the latent Dirichlet allocation (LDA) topic model, currently the most popular topic model. Through Bayesian learning with a spike-and-slab model, the proposed method selects sparse and representative words from the training documents while still fully expressing the latent topics of different words, and estimates a sparse topic model (sTM). Unlike the traditional topic model, the proposed sTM not only improves model performance and precisely represents the topics to which different words belong, but also significantly improves model sparsity and reduces memory cost.
    In addition, we propose a Bayesian sparse language model, which addresses the smoothing of n-gram language models for the data sparseness problem. We perform Bayesian nonparametric learning based on the hierarchical Pitman-Yor process, and use the hierarchical Chinese restaurant process and the Indian buffet process to introduce latent topics, so that semantic information is incorporated into the smoothed n-gram language model and sparse topics are extracted to handle the issues of model regularization and model selection in large-span n-gram language modeling. We derive a Gibbs sampling algorithm for model inference and realize an infinite language model. The proposed method achieves good results in evaluations of model perplexity and of word error rate for large vocabulary continuous speech recognition.
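The Indian buffet process mentioned above has a simple sequential generative description: each "customer" (e.g., a document) tends to reuse popular latent features and occasionally opens new ones, so the number of features is unbounded while each customer's selection stays sparse. Below is a minimal sketch of the standard one-parameter IBP sampler; the function name sample_ibp, the NumPy usage, and the parameter values are illustrative assumptions and are not taken from the thesis.

```python
import numpy as np

def sample_ibp(num_customers, alpha, seed=None):
    """Draw a binary feature matrix Z from a one-parameter Indian buffet process.

    Customer i (0-indexed) takes existing dish k with probability m_k / (i + 1),
    where m_k is how many earlier customers already took it, and then opens
    Poisson(alpha / (i + 1)) brand-new dishes.
    """
    rng = np.random.default_rng(seed)
    dish_counts = []          # dish_counts[k] = number of customers who took dish k
    rows = []                 # one binary selection row per customer
    for i in range(num_customers):
        row = [int(rng.random() < m / (i + 1)) for m in dish_counts]
        for k, taken in enumerate(row):
            dish_counts[k] += taken
        new_dishes = rng.poisson(alpha / (i + 1))
        row.extend([1] * new_dishes)
        dish_counts.extend([1] * new_dishes)
        rows.append(row)
    Z = np.zeros((num_customers, len(dish_counts)), dtype=int)
    for i, row in enumerate(rows):
        Z[i, :len(row)] = row
    return Z

# Example: 10 documents sharing an unbounded but sparsely used set of latent features.
print(sample_ibp(10, alpha=2.0, seed=0))
```

Each row of the returned matrix is sparse, yet the number of columns is determined by the data rather than fixed in advance, which is exactly the property the abstract exploits for sparse topic extraction.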

English Abstract:
    Due to the rapid development of information technology, the amount of digital data has grown explosively. How to deal with such a huge amount of data effectively and efficiently, and extract the information relevant to users, has become a very important issue in many research areas, including natural language processing, information retrieval, machine learning, speech recognition, document modeling, language modeling, and other related fields.
    In this thesis, we investigate the topic model, which has been successfully developed as a document model, and explore the language model, which plays an important role in large vocabulary continuous speech recognition systems. We address Bayesian sparse learning approaches to building the document model and the language model.
    First of all, we improve the state-of-the-art topic model based on latent Dirichlet allocation (LDA) and establish a new document model using Bayesian sparse topics. We propose Bayesian learning of a spike-and-slab model and apply it to select sparse and representative words from the training data to construct a sparse topic model (sTM), under the assumption that the number of latent topics is sufficient to represent all of the words in the training data. Compared with the traditional topic model (TM), the proposed sTM not only improves system performance by finding appropriate topics for different words, but also reinforces model sparsity and reduces memory cost.
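As a rough illustration of the spike-and-slab idea behind the sparse topic model, the following sketch builds topic-word distributions in which a Bernoulli "spike" decides whether a word is selected for a topic and a symmetric Dirichlet "slab" spreads probability mass over the selected words only. This is a toy generative sketch: the function name, the Bernoulli probability pi, and the Dirichlet slab are assumptions for illustration and do not reproduce the parameterization or inference procedure of the proposed sTM.

```python
import numpy as np

def sample_sparse_topics(num_topics, vocab_size, pi=0.1, slab_alpha=1.0, seed=None):
    """Toy spike-and-slab construction of sparse topic-word distributions.

    For each topic, a Bernoulli(pi) "spike" indicator decides which vocabulary
    words the topic may use; a symmetric Dirichlet "slab" then distributes the
    topic's probability mass over only those selected words.
    """
    rng = np.random.default_rng(seed)
    phi = np.zeros((num_topics, vocab_size))
    for k in range(num_topics):
        mask = rng.random(vocab_size) < pi        # spike: which words survive
        if not mask.any():                        # keep at least one word per topic
            mask[rng.integers(vocab_size)] = True
        phi[k, mask] = rng.dirichlet(np.full(mask.sum(), slab_alpha))   # slab
    return phi

phi = sample_sparse_topics(num_topics=5, vocab_size=50, pi=0.15, seed=0)
print("nonzero words per topic:", (phi > 0).sum(axis=1))
```

The printout shows that each topic places probability on only a small subset of the vocabulary, which is the source of the sparsity and memory savings described above.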
    On the other hand, we propose a Bayesian sparse language model as a new solution to smoothed n-gram language modeling, where the data sparseness problem is tackled. We present Bayesian nonparametric learning based on the hierarchical Pitman-Yor process and use the hierarchical Chinese restaurant process and the Indian buffet process to draw the aspects and topics used to smooth the n-gram language model. The proposed method effectively extracts latent semantic information from the training data so that the issues of model regularization and model selection are handled for large-span language modeling. We develop a Gibbs sampling algorithm for model inference and implement the proposed language model. Experiments on model perplexity and on word error rate for large vocabulary continuous speech recognition demonstrate the effectiveness of the proposed language model.
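To make the Pitman-Yor smoothing concrete, the sketch below applies the hierarchical Pitman-Yor predictive rule to a bigram model under the common simplification that every observed word type in a context occupies exactly one table, the setting in which the model reduces to interpolated Kneser-Ney smoothing rather than the Gibbs-sampled seating arrangements and topic extensions developed in the thesis. The class name, the fixed discount and strength values, and the uniform base distribution are illustrative assumptions.

```python
from collections import defaultdict

class SimpleHPYBigram:
    """Simplified hierarchical Pitman-Yor bigram model.

    Uses the hierarchical Pitman-Yor predictive rule, approximating each
    observed word type in a context as sitting at exactly one table, so no
    seating arrangement needs to be sampled.
    """

    def __init__(self, discount=0.75, strength=1.0):
        self.d = discount                    # Pitman-Yor discount parameter
        self.theta = strength                # Pitman-Yor strength parameter
        self.bigram = defaultdict(lambda: defaultdict(int))   # c(u, w)
        self.unigram = defaultdict(int)      # continuation counts at the parent
        self.total_unigram = 0

    def observe(self, u, w):
        if self.bigram[u][w] == 0:           # first time w follows context u:
            self.unigram[w] += 1             # send one "customer" to the parent restaurant
            self.total_unigram += 1
        self.bigram[u][w] += 1

    def prob(self, u, w, vocab_size):
        # Parent (unigram) restaurant, backed off to a uniform base measure 1/V.
        p_uni = (max(self.unigram.get(w, 0) - self.d, 0.0)
                 + (self.theta + self.d * len(self.unigram)) / vocab_size) \
                / (self.theta + self.total_unigram)
        counts = self.bigram.get(u, {})
        c_u = sum(counts.values())           # customers in restaurant u
        t_u = len(counts)                    # tables ~ distinct continuations of u
        if c_u == 0:
            return p_uni
        return (max(counts.get(w, 0) - self.d, 0.0)
                + (self.theta + self.d * t_u) * p_uni) / (self.theta + c_u)

lm = SimpleHPYBigram()
for u, w in [("the", "cat"), ("the", "dog"), ("a", "cat")]:
    lm.observe(u, w)
print(lm.prob("the", "cat", vocab_size=10_000))
```

The discounted count in the numerator and the back-off term weighted by the strength and discount parameters are what give the Pitman-Yor family its power-law smoothing behavior; the full model in the thesis additionally samples the table assignments with Gibbs sampling and enriches the back-off with latent topics.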

Table of Contents:
    Abstract (Chinese)
    Abstract (English)
    Acknowledgements
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1  Introduction
      1.1  Research Background and Motivation
      1.2  Research Objectives and Methods
      1.3  Chapter Overview
    Chapter 2  Literature Review
      2.1  Topic Models
        2.1.1  Probabilistic Latent Semantic Analysis Topic Model
        2.1.2  Latent Dirichlet Allocation Topic Model
      2.2  Language Models
        2.2.1  Data Sparseness Problem
        2.2.2  Prior Distributions and Processes
        2.2.3  Hierarchical Pitman-Yor Language Model
        2.2.4  Problem of Insufficient Long-Distance Information
    Chapter 3  Bayesian Sparse Topic Model
      3.1  Sparse Model Construction Methods
        3.1.1  Continuous Probability Density Approach
        3.1.2  Spike-and-Slab Distribution Approach
      3.2  Sparse Latent Dirichlet Allocation Topic Model
        3.2.1  Model Description
        3.2.2  Model Construction
        3.2.3  Model Inference
    Chapter 4  Bayesian Sparse Language Model
      4.1  Hierarchical Topic Language Model
        4.1.1  Model Description
        4.1.2  Model Construction
        4.1.3  Model Inference
      4.2  Bayesian Sparse Language Model
        4.2.1  Indian Buffet Process
        4.2.2  Model Construction
        4.2.3  Model Inference
    Chapter 5  Experimental Results
      5.1  Bayesian Sparse Topic Model
        5.1.1  Corpus and Initial Settings
        5.1.2  Evaluation Metrics
        5.1.3  Experimental Results
        5.1.4  Data Analysis
      5.2  Bayesian Sparse Language Model
        5.2.1  Corpus and Related Settings
        5.2.2  Evaluation Metrics
        5.2.3  Experimental Results
    Chapter 6  Conclusions and Future Research Directions
      6.1  Conclusions
      6.2  Future Research Directions
    References
    Appendix 1  Newton-Raphson Algorithm
    Appendix 2  Variational Inference for the Bayesian Sparse Topic Model


Full text released: on campus 2021-08-01; off campus 2021-08-01