| 研究生: |
張洋瑞 Chang, Yang-Jui |
|---|---|
| 論文名稱: |
利用雙分修訂圖之維基百科編輯預測 Editing Prediction in Wikipedia by Bipartite Revision Graph |
| 指導教授: |
高宏宇
Kao, Hung-Yu |
| 學位類別: |
碩士 Master |
| 系所名稱: |
電機資訊學院 - 資訊工程學系 Department of Computer Science and Information Engineering |
| 論文出版年: | 2013 |
| 畢業學年度: | 101 |
| 語文別: | 英文 |
| 論文頁數: | 47 |
| 中文關鍵詞: | 連結預測 、維基百科 、編輯網路 、雙分圖 、社群偵測 |
| 外文關鍵詞: | Link prediction, Wikipedia, Edit network, Bipartite, Community detection |
| 相關次數: | 點閱:108 下載:1 |
| 分享至: |
| 查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報 |
預測連結問題目標是推測社群網路成員間過去未曾出現的互動。典型的預測連結方法考慮的是點與點間的局部資訊,例如兩點間的相似度或最接近的鄰居等結構性相似度。然而在一個重要的網路類型-雙分圖中,可以產生連結的兩個點間並沒有一個直接的相似度判准。在雙分圖中分屬不同分部的兩個點並不會擁有共同的鄰居,因此考慮局部資訊的預測方法勢必要針對雙分圖進行調整。而當提及預測社群網路中的連結時,除了局部的相似度,利用社群資訊來協助預測模型也是很正常的做法。本論文將連結預測的問題定位在維基百科的雙分編輯圖上,並進一步調查這個編輯圖上的社群資訊。由於維基百科是受成員維護之線上社群最為成功的例子之一,我們相信擷取其內的社群資訊並用以輔助連結預測,可以令擁有內容創作的類似領域得益。也因為在雙分圖上利用社群資訊去輔助預測連結模型的相關問題尚未被定位清楚,因此我們設計並整合了兩個針對雙分圖的預測模型:第一,針對局部資訊做調整的監督式學習預測模型;第二,利用社群資訊的社群知覺預測模型。我們在經由維基百科建構的資料集上的實驗顯示,相較於一般的K最近鄰居法,我們的方法在F1-measure可以達到11%的預測效能改進。除此之外,我們更對編輯網路進行社群結構的調查,並提供另一個檢視維基百科社群的方法。
Link prediction problem aims to infer the new interactions among members in the social network that have not occurred in the past. Classical approaches for link prediction usually use local information, which considers the similarity of two nodes or the structural information such as the immediate neighborhood. However, in a bipartite graph, there is no straightforward similarity measure between two linking nodes. Since two nodes belong to different types in a bipartite graph will not have any common neighbors either, the local model needs to be adjusted to apply on a bipartite relation. In addition to the local information of similarity, when referring to link prediction in a social network, it is naturally to employ the community information to improve the prediction methods. In this paper, we address the link prediction problem on the bipartite edit graph of Wikipedia. Furthermore, we examine the structure of community in this edit graph. As Wikipedia is one of the most successful member-maintained online communities, extract the community information and solve the bipartite link prediction problem for it can help shed light on the process of content creation. Besides, to the best of our knowledge, the problem of using the community information in bipartite for predicting the link occurrence has not been clearly addressed. Hence we designed and integrated two bipartite-specific approaches to predict the link occurrence: First, the supervised learning approach which is built around the adjusted features of local model. Second, the community-awareness approach which utilizes the community information. Experiments conducted on the collection of Wikipedia show that in terms of F1-measure, our approaches can get an 11% improvement over the general methods based on the K-Nearest Neighbor. Apart from this, we also investigated the structure of communities in the edit network and presented a different viewpoint to examine the communities of Wikipedia.
REFERENCES
[1]L. A. Adamic and E. Adar, "Friends and neighbors on the Web," Social Networks, vol. 25, pp. 211 - 230, 2003.
[2]A. L. Barabasi, H. Jeong, Z. Neda, E. Ravasz, A. Schubert, and T. Vicsek, "Evolution of the social network of scientific collaborations," Physica A: Statistical Mechanics and its Applications, vol. 311, pp. 590-614, 2002.
[3]G. Beenen, K. Ling, X. Wang, K. Chang, D. Frankowski, P. Resnick, and R. E. Kraut, "Using social psychology to motivate contributions to online communities," in Proceedings of the 2004 ACM conference on Computer supported cooperative work Chicago, Illinois, USA: ACM, 2004, pp. 212-221.
[4]N. Benchettara, R. Kanawati, and C. Rouveirol, "Supervised Machine Learning Applied to Link Prediction in Bipartite Social Networks," in Proceedings of the 2010 International Conference on Advances in Social Networks Analysis and Mining: IEEE Computer Society, 2010, pp. 326-330.
[5]C.-c. Chang and C.-J. Lin, "LIBSVM: a Library for Support Vector Machines," 2001.
[6]U. Essen and V. Steinbiss, "Cooccurrence smoothing for stochastic language modeling," in Proceedings of the 1992 IEEE international conference on Acoustics, speech and signal processing - Volume 1 San Francisco, California: IEEE Computer Society, 1992, pp. 161-164.
[7]M. A. Hasan, V. Chaoji, S. Salem, and M. Zaki, "Link Prediction Using Supervised Learning," in In Proc. of SDM 06 workshop on Link Analysis, Counterterrorism and Security, 2006.
[8]Z. Huang, X. Li, and H. Chen, "Link prediction approach to collaborative filtering," in Proceedings of the 5th ACM/IEEE-CS joint conference on Digital libraries Denver, CO, USA: ACM, 2005, pp. 141-142.
[9]T. Iba, K. Nemoto, B. Peters, and P. A. Gloor, "Analyzing the Creative Editing Behavior of Wikipedia Editors: Through Dynamic Social Network Analysis," Procedia - Social and Behavioral Sciences, vol. 2, pp. 6441 - 6456, 2010.
[10]e. a. J. Leskovec, "SNAP website," 2012.
[11]S. Jianbo and J. Malik, "Normalized cuts and image segmentation," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 22, pp. 888-905, 2000.
[12]J. Kamps and M. Koolen, "Is Wikipedia link structure different?," in Proceedings of the Second ACM International Conference on Web Search and Data Mining Barcelona, Spain: ACM, 2009, pp. 232-241.
[13]R. Kannan, S. Vempala, and A. Vetta, "On Clusterings: Good, Bad and Spectral," 2000.
[14]G. Karypis and V. Kumar, "A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs," SIAM J. Sci. Comput., vol. 20, pp. 359-392, 1998.
[15]L. Katz, "A new status index derived from sociometric analysis," Psychometrika, vol. 18, pp. 39-43, March 1953.
[16]A. Kittur and R. E. Kraut, "Beyond Wikipedia: coordination and conflict in online production groups," in Proceedings of the 2010 ACM conference on Computer supported cooperative work Savannah, Georgia, USA: ACM, 2010, pp. 215-224.
[17]J. M. Kleinberg, "The small-world phenomenon: an algorithm perspective.," in STOC: ACM, 2000, pp. 163-170.
[18]J. Kunegis, E. W. D. Luca, and S. Albayrak, "The link prediction problem in bipartite networks," in Proceedings of the Computational intelligence for knowledge-based systems design, and 13th international conference on Information processing and management of uncertainty Dortmund, Germany: Springer-Verlag, 2010, pp. 380-389.
[19]L. Lee, "Measures of distributional similarity," in Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics College Park, Maryland: Association for Computational Linguistics, 1999, pp. 25-32.
[20]J. Leskovec, L. Backstrom, R. Kumar, and A. Tomkins, "Microscopic evolution of social networks," in Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining Las Vegas, Nevada, USA: ACM, 2008, pp. 462-470.
[21]J. Leskovec, K. J. Lang, A. Dasgupta, and M. W. Mahoney, "Community Structure in Large Networks: Natural Cluster Sizes and the Absence of Large Well-Defined Clusters," CoRR, vol. abs/0810.1355, 2008.
[22]J. Leskovec, K. J. Lang, A. Dasgupta, and M. W. Mahoney, "Statistical properties of community structure in large social and information networks," in Proceedings of the 17th international conference on World Wide Web Beijing, China: ACM, 2008, pp. 695-704.
[23]D. Liben-Nowell and J. Kleinberg, "The link prediction problem for social networks," in Proceedings of the twelfth international conference on Information and knowledge management New Orleans, LA, USA: ACM, 2003, pp. 556-559.
[24]W. Liu and L. Lu, "Link Prediction Based on Local Random Walk," Jan 2010.
[25]M. Mitzenmacher, "A Brief History of Generative Models for Power Law and Lognormal Distributions " Internet Mathematics, vol. 1, pp. 226-251, 2004.
[26]M. E. J. Newman, "The structure of scientific collaboration networks," Proceedings of the National Academy of Sciences of the United States of America, vol. 98, pp. 404--409, 2001.
[27]M. E. J. Newman, "Clustering and preferential attachment in growing networks," Physical Review E (Statistical, Nonlinear, and Soft Matter Physics), vol. 64, p. 025102, August 2001.
[28]G. Salton and M. J. McGill, Introduction to Modern Information Retrieval: McGraw-Hill, Inc., 1986.
[29]G. Siganos and U. C. Riverside, "A simple conceptual model for the internet topology," in in Global Internet, 2001, pp. 1667-1671.
[30]J. G. K. J. L. Snell, "Finite Markov Chains," Springer-Verlag, 1976.
[31]S. Soundarajan and J. Hopcroft, "Using community information to improve the precision of link prediction methods," in Proceedings of the 21st international conference companion on World Wide Web Lyon, France: ACM, 2012, pp. 607-608.
[32]T. Tylenda, R. Angelova, and S. Bedathur, "Towards time-aware link prediction in evolving social networks," in Proceedings of the 3rd Workshop on Social Network Mining and Analysis Paris, France: ACM, 2009, pp. 1-10.
[33]J. Voss, "Measuring Wikipedia," 2005.
[34]C. T. Zahn, "Graph-Theoretical Methods for Detecting and Describing Gestalt Clusters," Computers, IEEE Transactions on, vol. C-20, pp. 68-86, 1971.