| Graduate Student: | 張智翔 Chang, Jihn-Shiang |
|---|---|
| Thesis Title: | 建立粒子學習法於無限隱藏式馬可夫模型 (Particle Learning for Infinite Hidden Markov Models) |
| Advisor: | 簡仁宗 Chien, Jen-Tzung |
| Degree: | Master |
| Department: | Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science |
| Year of Publication: | 2011 |
| Graduation Academic Year: | 99 (ROC calendar) |
| Language: | Chinese |
| Number of Pages: | 91 |
| Chinese Keywords: | 粒子學習、隱藏式馬可夫模型、貝氏非參數、狄氏程序 |
| English Keywords: | Particle Learning, Hidden Markov Models, Bayesian Nonparametric, Dirichlet Process |
With the development and flourishing of the Internet, the amount of data online is growing rapidly. People need external tools to grasp, within a short time, the meaning hidden in the data they retrieve and to quickly and accurately identify the data that truly matches their needs. How to let users rapidly browse important and useful information through natural language processing techniques is therefore an urgent research topic. Conventional hidden Markov models (HMMs) can effectively learn from time-series training data to build classification and recognition systems. Their applications are extremely broad, and they play a pivotal role in research on speech recognition and natural language processing.
Among research on hidden Markov models, methods built on Bayesian nonparametric theory have become a popular topic in recent years. The main purpose of such methods is to solve the model selection problem, namely letting the data themselves automatically determine the number of Markov model states and the model complexity. The infinite hidden Markov model is an effective solution to this model selection problem. However, to cope with the lengthy training time caused by large amounts of data, we investigate an online learning algorithm for infinite hidden Markov models, which incrementally learns an HMM with appropriate model complexity from a continuous stream of training data. The online learning strategy is realized through particle learning. Particle learning performs model inference by drawing particles, i.e., samples, of the model parameters; each particle is drawn by combining the corresponding sample of the model parameters from the previous learning epoch with the new data observed in the current epoch, so as to sample the model parameters of the current epoch. The implementation introduces the Dirichlet process constructed through stick-breaking and the Pólya urn, computes the posterior probability with a Markov chain Monte Carlo (MCMC) algorithm to determine the weight of each particle or sample, and finally approximates the probability density function with the differently weighted samples, thereby achieving online learning. As the data stream keeps entering the system, the number of HMM states and the model complexity are updated incrementally according to the newly arrived data, which greatly improves learning efficiency and system applicability. In addition, the model resamples its parameters in response to changes in the data stream, so that it effectively realizes the capability of self-adaptation.
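To make the building blocks mentioned above more concrete, the following Python sketch illustrates the stick-breaking construction of Dirichlet process weights and the Pólya urn (Chinese restaurant) view of state assignments. It is a minimal illustration under assumed settings (concentration parameter `alpha`, truncation level `K`), not the infinite-HMM implementation developed in the thesis.

```python
# Minimal sketch of the stick-breaking construction GEM(alpha) and the
# Polya-urn view of the Dirichlet process; alpha and K are illustrative choices.
import numpy as np

def stick_breaking_weights(alpha: float, K: int, rng: np.random.Generator) -> np.ndarray:
    """Draw a truncated vector of stick-breaking weights pi ~ GEM(alpha)."""
    betas = rng.beta(1.0, alpha, size=K)                      # length of each broken piece
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas)[:-1]))
    return betas * remaining                                   # pi_k = beta_k * prod_{j<k}(1 - beta_j)

def polya_urn_assignments(n: int, alpha: float, rng: np.random.Generator) -> np.ndarray:
    """Polya-urn sampling: each draw joins an existing state with probability
    proportional to its count, or opens a new state with probability
    proportional to alpha, so the number of states grows with the data."""
    z = np.zeros(n, dtype=int)
    counts = []                                                # occupancy of each state so far
    for i in range(n):
        probs = np.array(counts + [alpha], dtype=float)
        probs /= probs.sum()
        k = rng.choice(len(probs), p=probs)
        if k == len(counts):
            counts.append(1)                                   # a new state is created
        else:
            counts[k] += 1
        z[i] = k
    return z

rng = np.random.default_rng(0)
print(stick_breaking_weights(alpha=2.0, K=10, rng=rng))
print(polya_urn_assignments(n=20, alpha=2.0, rng=rng))
```

In both views the number of occupied states is not fixed in advance; the Pólya urn sampler opens a new state with probability proportional to alpha, which is the property that lets the infinite HMM grow its state space with the data.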
In this thesis, we implement particle learning for infinite hidden Markov models and use the HMM states to model the ordering relations of letters and the topic dependencies of words. From the letter and word data collected from documents, the numbers of HMM states at these two levels are determined automatically, where the hidden meanings of the states represent the ordering of letters and the topic information of words. We therefore use particle learning to estimate the state transition probability matrix of the infinite HMM and thereby obtain the ordering relations of letters and the dependencies between topics. This novel model enables online learning from data streams: it resolves the lengthy training caused by large amounts of data, and it can adapt the model parameters as the data stream changes so as to track the characteristics of the newest data environment. In the experimental evaluation, we implement particle learning for infinite HMMs and verify the effectiveness of online learning on two datasets, the well-known novel Alice in Wonderland and DUC 2005.
With the rapid development and flourishing of computer networks, the huge amount of information on the Internet makes it increasingly important to conduct information extraction via natural language processing approaches. The information should be adaptively extracted for different users. Accordingly, research topics in machine learning and natural language processing have been impacting the community of information technology. Conventionally, hidden Markov models (HMMs) are powerful for representing the dynamics of time-series signals through Markov chains and have been widely developed for a broad range of applications including speech recognition, information extraction and many others.
In recent years, Bayesian nonparametric learning of HMMs has attracted many research interests, and infinite HMMs were constructed accordingly. The goal of this new trend is to relax the assumption of a fixed number of HMM states, or equivalently a fixed model complexity, and to build HMMs with regularization and generalization by tackling the issue of model selection. The data themselves can determine the number of states in infinite HMMs. However, estimating the infinite HMM parameters from batch training data is time-consuming. In this study, we investigate online learning of infinite HMMs where the model complexity is continuously updated from newly observed data, so that learning efficiency is improved accordingly. Particle learning is adopted to conduct a sequential Monte Carlo algorithm for infinite HMMs. Through particle learning and inference, many particles or samples of model parameters are drawn and used to calculate the predictive posterior probability and to determine the weights of the particles. These weights of the sample points are used to approximate the posterior probability density function. In the online learning procedure, each particle is drawn by combining the particle sampled in the previous learning epoch with the information from the new data. The whole model is constructed from the Dirichlet process via the stick-breaking and Pólya urn constructions. With this online learning capability, system flexibility is increased, and the adaptation of HMM parameters to new domains is performed to assure system performance.
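As a rough illustration of the reweight-resample-propagate cycle that the particle learning procedure above relies on, the sketch below runs a sequential importance resampling loop on a toy one-dimensional Gaussian model. The emission model and the `predictive_likelihood` function are illustrative assumptions standing in for the infinite-HMM posterior updates derived in the thesis.

```python
# Toy sequential importance resampling loop illustrating the particle learning idea:
# each particle carries a parameter hypothesis, its weight is updated by the
# predictive likelihood of the newly observed data, and the particle set is
# resampled in proportion to the weights. The Gaussian model is an assumption.
import numpy as np

rng = np.random.default_rng(1)
num_particles = 100

# Each particle holds a guess of an emission mean; real particles would hold
# infinite-HMM sufficient statistics (state counts, transition counts, etc.).
particles = rng.normal(0.0, 5.0, size=num_particles)
weights = np.full(num_particles, 1.0 / num_particles)

def predictive_likelihood(theta: float, y: float, sigma: float = 1.0) -> float:
    """Assumed Gaussian predictive density p(y | theta) for illustration."""
    return np.exp(-0.5 * ((y - theta) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

for y in rng.normal(2.0, 1.0, size=50):             # stream of new observations
    # 1) reweight: multiply each particle weight by the predictive likelihood
    weights *= np.array([predictive_likelihood(th, y) for th in particles])
    weights /= weights.sum()
    # 2) resample particles in proportion to their weights
    idx = rng.choice(num_particles, size=num_particles, p=weights)
    particles = particles[idx]
    weights = np.full(num_particles, 1.0 / num_particles)
    # 3) propagate: in particle learning, parameters are redrawn from their
    #    conditional posterior given each particle's sufficient statistics;
    #    here a small jitter is used as a stand-in.
    particles += rng.normal(0.0, 0.1, size=num_particles)

print("posterior mean estimate:", particles.mean())
```

In the actual particle learning scheme, step 3 redraws the parameters from their conditional posterior given each particle's sufficient statistics rather than adding jitter, which is what keeps the parameter estimates consistent as new data arrive.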
In this thesis, we implement particle learning of infinite HMMs for document modeling. The HMM states are constructed to represent the ordering relations of letters and the topic dependencies between words. Basically, we establish two layers of infinite HMMs where the numbers of states are automatically determined. Particle learning is realized to estimate the state transition probabilities, or equivalently to learn the dependencies of topics, in an online fashion. We accordingly improve system performance by avoiding batch training and lengthy training time and by adapting model parameters to fit new domain knowledge. The experimental evaluation on the text of “Alice in Wonderland” and the DUC 2005 dataset has shown the effectiveness of online document modeling using the proposed method.