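| Graduate student: | 辛韋呈 Hsin, Wei-Cheng |
|---|---|
| Thesis title: | 多相依性隱含狄利克雷分佈 Multi-dependent Latent Dirichlet Allocation |
| Advisor: | 黃仁暐 Huang, Jen-Wei |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Institute of Computer & Communication Engineering |
| Year of publication: | 2016 |
| Academic year of graduation: | 105 |
| Language: | English |
| Number of pages: | 64 |
| Chinese keywords: | 主題模型 (topic model), 狄利克雷分佈 (Dirichlet distribution), 相依性 (dependency) |
| English keywords: | Topic Model, LDA, generative model |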
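Latent Dirichlet Allocation (LDA) is a very popular research subject among topic models, because by modifying its internal parameters and dependencies it can be applied to different problems, such as sentiment analysis, public opinion analysis, and even image analysis. Owing to this high flexibility, related research has focused on proposing new dependencies to obtain better results.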
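However, the distributions and characteristics of real-world data are highly diverse, and a model with a single dependency may overlook the characteristics of individual data items. In addition, differences between data sources may degrade the analysis results. We therefore propose Multi-dependent Latent Dirichlet Allocation, which uses multiple dependencies to match the characteristics of the data: regardless of how the raw data differ, it can select the most suitable result from the set of dependencies. We also incorporate the dependencies of previously proposed models to examine whether Multi-dependent Latent Dirichlet Allocation can achieve better results. We further expect that even for a special data source whose data all follow one particular dependency and perfectly fit one particular model, Multi-dependent Latent Dirichlet Allocation can still obtain the same results as that model. Our experimental results show that Multi-dependent Latent Dirichlet Allocation can indeed improve the performance of different models.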
Latent Dirichlet Allocation (LDA) is a popular subject of topic model research because it is flexible enough to address many different problems. By varying its core dependencies, it can be applied to many tasks, such as emotion detection, information systems, or image clustering. Recent work has focused on proposing novel dependencies that fit particular datasets more closely. However, real-world data is too diverse and abundant to be fitted with a single dependency. A single-dependency model can only capture the overall characteristics of a dataset and thus ignores small details in the data. We therefore propose Multi-dependent Latent Dirichlet Allocation (MD-LDA), which incorporates various dependencies into one model. For each piece of data, MD-LDA selects the best-fitting dependency from the dependency set and thereby obtains the most suitable dependencies for the dataset as a whole. We also incorporate the dependencies of several previous works into MD-LDA as a basis for comparison. In our experiments, MD-LDA exhibits the best performance in various cases and improves on the other models under consideration.
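The idea of choosing, for each piece of data, the best-fitting dependency from a set can be illustrated with a small sketch. This is not the inference procedure of MD-LDA, which the abstract does not describe. It only shows per-document selection among candidates by log-likelihood. The candidate names, the word probabilities, and the helper functions `log_likelihood` and `pick_dependency` are all hypothetical.

```python
import math

# Hypothetical stand-ins for "dependencies": each candidate is reduced to a
# word distribution. Real dependencies in a topic model are far richer.
DEPENDENCIES = {
    "sentiment": {"good": 0.4, "bad": 0.4, "flu": 0.1, "fever": 0.1},
    "health": {"good": 0.1, "bad": 0.1, "flu": 0.4, "fever": 0.4},
}


def log_likelihood(words, word_probs, floor=1e-6):
    """Sum the log-probabilities of the words under one candidate.

    Words missing from the candidate get a small floor probability
    (an assumption made here to avoid log(0)).
    """
    return sum(math.log(word_probs.get(w, floor)) for w in words)


def pick_dependency(words, dependencies=DEPENDENCIES):
    """Return the name of the candidate that scores the document highest."""
    return max(
        dependencies,
        key=lambda name: log_likelihood(words, dependencies[name]),
    )


# Each document is scored against every candidate and keeps its own best fit.
docs = [["good", "good", "bad"], ["flu", "fever", "bad"]]
choices = [pick_dependency(doc) for doc in docs]
print(choices)
```

In this toy setting the two documents end up with different candidates, which is the behaviour a single-dependency model cannot express.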