| Graduate Student: | 張瑛蘭 Chang, Ying-Lan |
|---|---|
| Thesis Title: | 近似貝氏主題模型於資訊檢索之研究 (Approximate Bayesian Topic Models for Information Retrieval) |
| Advisor: | 李同益 Lee, Tong-Yee |
| Co-advisor: | 簡仁宗 Chien, Jen-Tzung |
| Degree: | Doctoral |
| Department: | College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering |
| Year of Publication: | 2013 |
| Graduation Academic Year: | 101 (2012-2013) |
| Language: | English |
| Number of Pages: | 182 |
| Keywords: | topic model, Bayesian inference, nonparametric Bayesian, language model, information retrieval, automatic summarization, document model |
Document models and language models are core technologies in information retrieval. Following approximate Bayesian inference, this dissertation develops a series of topic models based on latent Dirichlet allocation (LDA), together with the corresponding document models and language models, through recursive Bayesian learning, sparse Bayesian learning, variational Bayesian estimation, Bayesian nonparametric learning and sampling methods. These methods can be widely applied to different information systems, including document retrieval, document classification, document representation, document summarization and speech recognition.
For information retrieval applications, a query typically contains only a few keywords. A query model can expand and extend these keywords and, together with a trained document model, perform efficient relevance feedback. In this dissertation we propose a new relevance feedback mechanism that uses maximum a posteriori estimation and the relevant documents from the first-pass retrieval to expand and extend the query model, and that adjusts the weights of the document models automatically through unsupervised learning. In information retrieval experiments, the proposed Bayesian relevance feedback effectively improves retrieval accuracy.
For document classification, we further improve the document model by adapting the LDA model to a new domain through model adaptation, so as to raise the classification accuracy in that domain. The method adopts recursive Bayesian learning: exploiting the conjugate prior distribution, it collects adaptation documents and accumulates the posterior statistics, yielding an adaptive LDA. Beyond adapting the model to the new domain, the newly injected documents also improve the reliability of the model estimates and the classification accuracy.
For document representation, we are concerned with over-training of the LDA model and with how many topics are needed to represent the document data properly. In conventional LDA the number of topics must be specified in advance, so determining it automatically from the collected data is very important: too many topics make the estimated document model unreliable, while too few topics cannot represent the documents precisely. This dissertation uses sparse Bayesian learning to build a sparse LDA in which a spike-and-slab model effectively extracts the relevant topics for representing the individual words of a document, while feature selection improves the computational efficiency and reduces the memory requirement of the document model. The model parameters are estimated with variational Bayesian expectation-maximization (VB-EM) inference, and the document representation performance is effectively improved.
For document summarization, we propose two approaches that effectively distill compact information from a massive document collection or select representative summary sentences. Automatic summarization finds the important thematic sentences in the documents, ranks them by their relevance to the documents, and selects the top-ranked sentences as the summary, so the key to summarization performance is an accurate sentence model. The first approach develops a sentence-based LDA model: the main idea is to treat each sentence as a document and extend the document-based LDA to a sentence-based model, whose parameters are obtained by VB-EM inference. The second approach proposes a hierarchical theme and topic model (H2TM), in which nonparametric Bayesian learning through a nested Chinese restaurant process (nCRP) builds a hierarchical model of the sentences. The structure of the theme and topic model is determined automatically from the collected training documents; each sentence is allocated to its corresponding tree node or theme node, and each document selects multiple tree paths to represent its sentences. Within each node, the sentences and their words form word-level topics through a hierarchical Dirichlet process (HDP), so the model selection problem of determining the numbers of themes and topics is resolved, and thematic sentences can be selected from the tree structure. Both approaches are evaluated on the DUC corpora and effectively improve the summarization performance of LDA.
For speech recognition, we develop a hierarchical Dirichlet and Pitman-Yor language model, using nonparametric Bayesian learning to solve the problems of insufficient long-distance information and sparse training data in language model training. The method builds topic-dependent and smoothed language models through a hierarchical Pitman-Yor process, whose backoff mechanism alleviates the insufficient training corpus problem, and introduces a hierarchical Dirichlet process (HDP) to combine different topic-dependent language models and resolve the lack of long-distance information. Through a novel Chinese restaurant scenario and Gibbs sampling, a robust topic-based language model is drawn. Experimental results on test corpora and speech verify the effectiveness of the method.
Document models and language models are the core technologies for building information retrieval systems. This dissertation presents a series of approximate Bayesian inference algorithms, including maximum a posteriori, recursive Bayesian, sparse Bayesian, variational Bayesian, nonparametric Bayesian and sampling methods, and develops different solutions to topic-based document and language models based on latent Dirichlet allocation (LDA). These solutions are applied to a wide range of applications, including document retrieval, document classification, document representation, document summarization and speech recognition.
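All of the models below build on the LDA likelihood of a document w = (w_1, ..., w_N). As a reference point, in standard LDA notation (α for the Dirichlet prior, β for the topic-word probabilities, θ for the document's topic mixture, K for the number of topics; these symbols are not defined in the abstract itself) the marginal likelihood is

```latex
p(\mathbf{w}\mid\alpha,\beta)
  = \int \mathrm{Dir}(\theta\mid\alpha)
    \prod_{n=1}^{N}\sum_{k=1}^{K}\theta_{k}\,\beta_{k,w_n}\;d\theta
```

The intractability of this integral is what motivates the approximate inference methods (variational Bayes, recursive Bayes, sampling) studied in the dissertation.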
Document retrieval aims to use a few query words to retrieve the set of documents that are relevant to a user's interests. In this study, we develop a new query model using relevance feedback, where the query words are augmented with additional words extracted from the first-pass retrieval. Maximum a posteriori estimation is developed to find the extended query words and to conduct an unsupervised reweighting scheme for the relevant documents in the query model. This method effectively improves the precision and recall rates of a document retrieval system.
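A minimal sketch of how MAP-style relevance feedback can be realized in the language-modeling framework is given below; the factorization and the interpolation weight λ are illustrative assumptions, not necessarily the exact formulation used in the dissertation.

```latex
\hat{\theta}_Q
  = \arg\max_{\theta}\; p(F\mid\theta)\,p(\theta\mid Q),
\qquad
p(w\mid\hat{\theta}_Q)\;\approx\;(1-\lambda)\,p(w\mid\theta_Q)+\lambda\,p(w\mid\theta_F)
```

Here Q is the original query, F the set of relevant documents from the first-pass retrieval, θ_Q and θ_F the query and feedback language models, and λ a feedback weight that an unsupervised reweighting scheme would estimate from the data.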
For the application of document classification, we adapt the LDA-based document model to a new domain and aim to improve the classification performance in that domain. This dissertation proposes a recursive Bayesian learning algorithm to tackle this problem based on the adaptive LDA (ALDA). Using ALDA, the adaptation documents are used to reproduce the posterior distribution by combining the likelihood function from the adaptation data with the accumulated conjugate prior from the training data. Model robustness in the new domain is accordingly improved for document classification.
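The recursive Bayesian update behind this kind of adaptation can be sketched with a conjugate Dirichlet prior; the hyperparameter update shown here (adding expected topic counts from each adaptation batch) is a generic illustration, and the exact sufficient statistics used by ALDA are given in the dissertation.

```latex
p(\theta\mid D_1,\dots,D_n)\;\propto\;p(D_n\mid\theta)\,p(\theta\mid D_1,\dots,D_{n-1}),
\qquad
\alpha_k^{(n)} = \alpha_k^{(n-1)} + \hat{c}_k(D_n)
```

The posterior Dir(θ | α^(n)) after adaptation batch D_n becomes the prior for batch D_{n+1}, so the knowledge accumulated from the training data is never discarded as new-domain documents arrive.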
More generally, we are concerned with the model regularization issue in LDA-based document representation. In implementing a topic model, a number of topics that is too large fails to properly reflect the domain knowledge in the training and test data, and the reliability of the estimated model becomes doubtful. However, assigning too few topics limits the representation capability. Automatic selection of the number of topics thus becomes a critical issue in topic modeling. This dissertation presents a sparse Bayesian learning method to deal with model selection for LDA. We propose the sparse LDA (sLDA), in which a spike-and-slab distribution is introduced to autonomously extract the relevant topics that represent individual words. A feature selection scheme is introduced so that the computation cost and memory requirement are reduced. The sLDA is inferred by the variational Bayesian expectation-maximization (VB-EM) algorithm.
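A spike-and-slab prior in this setting can be sketched as a binary relevance indicator gating each topic; the symbols b_{wk} and π_k are illustrative, and the precise place where the indicator enters the sLDA generative process is specified in the dissertation.

```latex
b_{wk}\sim\mathrm{Bernoulli}(\pi_k),
\qquad
p(\phi_{wk}\mid b_{wk}) = (1-b_{wk})\,\delta_0(\phi_{wk}) + b_{wk}\,p_{\mathrm{slab}}(\phi_{wk})
```

The spike δ_0 switches a topic off for a given word while the broad slab distribution keeps it active, so only the relevant topics survive and the effective model size is learned from the data.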
In addition, we propose two approaches to document summarization, which aims to summarize compact information from a large set of documents or to select the thematic sentences from a ranked list of representative sentences. The key issue in document summarization is the construction of the sentence model. In the first approach, we establish the sentence-based LDA, where each sentence is treated as a document, or a group of data, in the LDA learning procedure. This model is inferred with the VB-EM algorithm. In the second approach, we improve the sentence-based LDA by relaxing the constraints of a fixed number of topics and of independence among the estimated topics. Nonparametric Bayesian learning through a nested Chinese restaurant process (nCRP) compound hierarchical Dirichlet process (HDP) is performed to construct the hierarchical theme and topic model (H2TM). We follow the nCRP and represent the sentences of a document collection with a hierarchical theme model. Structural learning is conducted for flexible modeling with infinite tree nodes, branches and layers. The sentences in a document are represented by the themes allocated in a selected subtree, and the words of the sentences in a tree node are treated as grouped data. The sets of grouped data in different nodes are then represented by topic models sampled from an HDP. Using this H2TM, the two constraints of sentence-based LDA are tackled, and the summarization performance is improved in the evaluation on the DUC corpora.
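As a usage illustration, once per-sentence topic proportions have been inferred (by sentence-based LDA or H2TM), extractive summarization reduces to ranking sentences against the document-level topic mixture. The ranking criterion below (negative KL divergence) and all variable names are assumptions for this sketch, not the dissertation's exact scoring function.

```python
import numpy as np

def rank_sentences(sentence_topics: np.ndarray, doc_topics: np.ndarray, m: int = 3):
    """Rank sentences by how well their topic mixture matches the document's.

    sentence_topics: (num_sentences, K) topic proportions per sentence
    doc_topics:      (K,) topic proportions of the whole document
    Returns the indices of the m highest-scoring sentences.
    """
    eps = 1e-12
    p = doc_topics + eps
    scores = []
    for q in sentence_topics + eps:
        kl = np.sum(p * np.log(p / q))   # divergence of sentence model from document model
        scores.append(-kl)               # smaller divergence => higher score
    order = np.argsort(scores)[::-1]
    return order[:m]

# Example with random (purely illustrative) topic proportions
rng = np.random.default_rng(0)
S = rng.dirichlet(np.ones(8), size=20)   # 20 sentences, 8 topics
d = rng.dirichlet(np.ones(8))            # document-level mixture
print(rank_sentences(S, d, m=3))
```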
For the application of speech recognition, we develop the hierarchical Dirichlet and Pitman-Yor language model (HDPY-LM) to resolve the issues of insufficient long-distance information and sparse training data in the estimation of statistical n-gram models. An HDP compound hierarchical Pitman-Yor (HPY) process is developed to sample the topic-based language model, where the HDP handles the sampling of latent topics and the HPY process draws the smoothed n-gram probabilities from the backoff (n-1)-gram probabilities. The two issues in language modeling are accordingly resolved through nonparametric Bayesian learning, where the backoff scheme and the number of topics are infinitely extensible. We design a new Chinese restaurant scenario to perform Gibbs sampling for inference of the topic-based language model. The HDPY-LM is shown to be effective for speech recognition.
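For concreteness, the standard hierarchical Pitman-Yor predictive rule that supplies smoothed n-gram probabilities from backoff probabilities can be written as below; how this component is coupled with the HDP topic part of HDPY-LM is detailed in the dissertation.

```latex
p(w\mid\mathbf{u})
  = \frac{c_{\mathbf{u}w}-d_{|\mathbf{u}|}\,t_{\mathbf{u}w}}{\theta_{|\mathbf{u}|}+c_{\mathbf{u}\cdot}}
  + \frac{\theta_{|\mathbf{u}|}+d_{|\mathbf{u}|}\,t_{\mathbf{u}\cdot}}{\theta_{|\mathbf{u}|}+c_{\mathbf{u}\cdot}}\,
    p\bigl(w\mid\pi(\mathbf{u})\bigr)
```

Here u is the (n-1)-word context, π(u) the backoff context with the earliest word dropped, c and t the customer and table counts of the Chinese restaurant representation, and d, θ the discount and strength parameters; applying the rule recursively down to the unigram level yields the backoff behavior described above.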
On-campus access: available from 2016-07-23.