
Graduate student: 陳松奇
Chen, Sung-Chi
Thesis title: 虛擬世界實體人物搜尋方法發展
Development of a Method of Physical People Searching in Virtual World
Advisor: 陳裕民
Chen, Yu-Min
Co-advisor: 陳宗義
Chen, Tsung-Yi
Degree: Master
Department: 電機資訊學院 - 製造資訊與系統研究所
College of Electrical Engineering and Computer Science - Institute of Manufacturing Information and Systems
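Year of publication: 2013
Academic year of graduation: 102 (ROC calendar)
Language: Chinese
Number of pages: 58
Chinese keywords: 虛擬世界, 網路人物搜尋, 屬性萃取, 關聯法則, 分群演算法
English keywords: Virtual World, Online Social Networks, Web People Search, Attribute Extraction, Association Rule, Clustering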
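  • With the growth of the Internet, users carry out all kinds of activities in the virtual world, so that people are linked by many relationships that span the virtual and the physical, forming a complex social network in which the two are interwoven. People may want to learn about others for work, for study, or out of curiosity. Enterprises likewise hope to analyze customer preferences and make recommendations in order to raise customer loyalty. The information that customers provide, however, is mostly basic; if more information about a customer, such as a personal homepage or social circle, could be gathered from the web, preference analysis would benefit. Yet because information on the web is fragmentary and personal names are ambiguous, users searching for a person are easily confused by people who share the same name and find it hard to extract information about that person. This study proposes a method for searching for physical people in the virtual world. Using information that people have made public on the web, it automatically obtains information about physical people and distinguishes people with the same name, quickly integrating and disambiguating the information on the Internet about people who share a name, so that users can search for and identify people quickly. The main research items are: (1) design of a person profile model; (2) design of a method for searching for physical people in the virtual world; (3) development of the techniques for the virtual-world physical people search scheme.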

    With the development of the Internet, users engage in a variety of activities in the virtual world, producing many relationships between people across the virtual and physical worlds and forming a far-reaching, complex social network in which the virtual and the physical intertwine. People may want to find information about others for work, study, or curiosity. Enterprises also hope to analyze customer preferences and give recommendations, with a view to enhancing customer loyalty. However, the information provided by customers is mostly basic; if an enterprise can gather more information via the Internet, such as a customer's personal home page or social circle, it will help analyze customer preferences. Because information on the Internet is fragmentary and people who share the same name are easily confused with one another, it is hard to identify a person and to extract information about that person. Therefore, this study aims to develop a Physical People Searching Method in Virtual World that extracts information about people from the Internet and disambiguates different people who share the same name. It can serve as a main step in constructing a social network. The research topics include: (1) a Personal Profile Model; (2) a Physical People Searching Model in Virtual World; (3) a Physical People Searching Method in Virtual World.
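The abstract describes distinguishing same-name people from attributes extracted from web pages, but this record does not give the formulas the thesis uses. The following is only a minimal sketch of the general idea, under stated assumptions: the attribute names, the weights in `WEIGHTS`, the exact-match comparison, the threshold, and the single-link grouping are all hypothetical choices made for illustration, not the author's method.

```python
from itertools import combinations

# Hypothetical attribute weights (assumption; the record does not list
# the values or the attributes used in the thesis).
WEIGHTS = {"email": 0.5, "organization": 0.3, "location": 0.2}


def profile_similarity(a, b, weights=WEIGHTS):
    """Sum the weights of attributes present in both profiles with equal values."""
    score = 0.0
    for attr, weight in weights.items():
        if attr in a and attr in b and a[attr] == b[attr]:
            score += weight
    return score


def cluster_profiles(profiles, threshold=0.4):
    """Group profile indices with single-link merging via union-find.

    Two profiles end up in the same group when a chain of pairwise
    similarities at or above the threshold connects them.
    """
    parent = list(range(len(profiles)))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(profiles)), 2):
        if profile_similarity(profiles[i], profiles[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(profiles)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Each returned group is a list of indices into `profiles` that the sketch treats as one physical person. Exact matching is deliberately crude; a real system would need a per-attribute value similarity and tuned weights, which this sketch does not attempt to reproduce.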

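    Table of Contents
    Abstract (Chinese)
    Abstract (English)
    Acknowledgements
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
        1.1 Research Background
        1.2 Research Motivation
        1.3 Research Objectives
        1.4 Analysis of Research Problems
        1.5 Research Items
        1.6 Research Steps
        1.7 Thesis Organization
    Chapter 2 Literature Review
        2.1 Social Networking Services
        2.2 Information Extraction
            2.2.1 Person Attribute Extraction
            2.2.2 Named Entity Recognition
        2.3 Same-Name Person Identification
            2.3.1 Alias Extraction
            2.3.2 Personal Name Disambiguation
        2.4 Data Mining
            2.4.1 Association Rules
            2.4.2 Clustering
    Chapter 3 Scheme and Model Design
        3.1 Design of the Virtual-World Physical People Search Scheme
        3.2 Design of the Person Profile Model
    Chapter 4 Development of Virtual-World Physical People Search Techniques
        4.1 Person-Related Content Retrieval
            4.1.1 Web Page Content Preprocessing
            4.1.2 Web Page Text Feature Extraction
        4.2 Person Attribute Extraction
            4.2.1 Dictionary-Based Extraction
            4.2.2 Attribute-Format Extraction
            4.2.3 Attribute-Context Extraction
            4.2.4 Page-Structure-Position Extraction
            4.2.5 Named-Entity-Recognition Extraction
        4.3 Attribute Similarity Computation
            4.3.1 Attribute Weight Update
            4.3.2 Attribute Value Similarity Computation
        4.4 Same-Name Person Identification
            4.4.1 Person Similarity Computation
            4.4.2 Same-Name Person Clustering
    Chapter 5 System Implementation and Validation
        5.1 Validation Procedure for the Virtual-World Physical People Search Method
            5.1.1 Testing-Phase Validation
            5.1.2 Application-Phase Validation
        5.2 Validation of the Person Attribute Extraction Methods
        5.3 Discussion of Attribute Weight Results
        5.4 Validation of the Same-Name Person Identification Method
            5.4.1 Evaluation Criteria
            5.4.2 Same-Name Person Identification Results
            5.4.3 Method Comparison
        5.5 Application-Phase Validation Results
    Chapter 6 Conclusions and Future Directions
        6.1 Conclusions
        6.2 Research Limitations
        6.3 Future Directions
    References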


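    Full text: on campus, open from 2018-11-27
    off campus, not open
    The electronic thesis has not been authorized for public release; for the print copy, please consult the library catalogue.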