
Graduate student: 巫佳錄
Wu, Chia-lu
Thesis title: 利用搜尋結果片斷建構階層式使用者搜尋目的以改善網路搜尋效能
Construct Hierarchical User Search Goals by Using Search Result Snippets to Improve Web Search Performance
Advisor: 盧文祥
Lu, Wen-Hsiang
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication: 2008
Academic year of graduation: 96
Language: Chinese
Number of pages: 71
Keywords (Chinese): 支援向量機, 使用者搜尋目的, 多重分類, 網路搜尋, 語義相似度
Keywords (English): Support Vector Machines, User Search Goals, Semantic Similarity, Web Search, Multi-class Classification
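  • The invention of the Internet has brought people many conveniences. However, the data on the Internet keeps growing richer, and users habitually submit queries of no more than three terms, so search engines return more search result snippets than users can browse one by one. We believe that a user who sends a query to a search engine has an implicit, latent search goal in mind, and that this goal falls into one of three types: resource-seeking, informational, and navigational. In this thesis, we therefore extract text labels that match the user's search goal from the search result snippets returned by the Google search engine and present them to the user, in order to speed up the user's search.
    In the process, this thesis uses SVM, a technique widely regarded as the most effective for classification, to classify the snippets. From the snippets of each class it automatically detects text labels that match the user's search goal and improves their semantic relevance, so that user search goals with higher semantic relevance receive higher ranking scores. It then applies the proposed "Hierarchical User Search Goal Model" to classify the user search goals of each class in a finer and more comprehensive way, and finally applies the proposed "User-Search-Goal-Based Search Model (USGBSM)" to improve search performance.
    The main contribution of this thesis is a finer and more comprehensive classification of the three types of user search goals, together with a search process that takes into account the query terms, the user search goals, and their categories, so that search result snippets more relevant to the user's search goal receive better ranking scores. Users can therefore find the snippet they want to click more quickly, which improves search performance.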

    The invention of the Internet has brought great convenience to people, and the web holds more and more useful and diverse data. However, users usually submit queries of no more than three words, so search engines return so many search result snippets that users must spend a long time browsing them one by one. We believe that users have latent search goals in mind when they submit queries to search engines, and that these goals fall into three classes: resource-seeking, informational, and navigational. In this thesis, we extract text labels that match the user search goal from the search result snippets returned by Google and present them to the user, with the aim of improving search performance.
    We use SVM, a widely used classification technique, to classify the snippets. From each class of snippets we automatically detect text labels that match user search goals and improve their semantic relevance, so that user search goals with higher semantic relevance are ranked higher. We then classify each type of user search goal in greater depth with our proposed Hierarchical User Search Goal Model. Finally, we improve search performance with our proposed User-Search-Goal-Based Search Model (USGBSM).
    The major contribution of this thesis is a finer classification of the three classes of user search goals, together with a search process that takes into account factors such as the query terms, the user search goals, and their categories. Snippets that are more relevant to the user search goal are ranked higher, so users can find the snippet they want to click more quickly, which improves search performance.
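
    The goal-aware ranking idea in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's USGBSM: the cue-word lists, the `Snippet` type, and the functions `predict_goal` and `rerank` are all hypothetical, and a simple keyword count stands in for the SVM classifier. The sketch only shows snippets whose predicted goal class matches the user's goal being moved ahead of the rest.

```python
from dataclasses import dataclass

GOALS = ("resource-seeking", "informational", "navigational")

# Hypothetical cue words; a toy stand-in for a trained SVM classifier.
CUES = {
    "resource-seeking": {"download", "mp3", "free", "torrent"},
    "informational": {"how", "what", "guide", "history"},
    "navigational": {"official", "homepage", "login", "site"},
}


@dataclass
class Snippet:
    rank: int  # original position in the engine's result list (1 = top)
    text: str


def predict_goal(snippet: Snippet) -> str:
    """Pick the goal whose cue words occur most often in the snippet."""
    words = snippet.text.lower().split()
    counts = {g: sum(w in CUES[g] for w in words) for g in GOALS}
    # On ties, max() returns the first goal in GOALS order.
    return max(GOALS, key=lambda g: counts[g])


def rerank(snippets, goal):
    """Put snippets matching the goal first; keep original order within each group."""
    return sorted(snippets, key=lambda s: (predict_goal(s) != goal, s.rank))


results = [
    Snippet(1, "Python history and what the language is used for"),
    Snippet(2, "Python official homepage"),
    Snippet(3, "Free download of Python installers"),
]
print([s.rank for s in rerank(results, "resource-seeking")])  # prints [3, 1, 2]
```

    Sorting on a (mismatch, original rank) key keeps the engine's order as the tie-breaker, so the sketch changes the ranking only where the predicted goal class differs from the requested one.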

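    Abstract (Chinese) iv
    Abstract vi
    Acknowledgements viii
    Table of Contents ix
    List of Figures xi
    List of Tables xii
    Chapter 1 Introduction 1
    1.1 Research Motivation 1
    1.2 Problem Description 2
    1.3 Thesis Organization 4
    Chapter 2 Related Work 5
    2.1 Improving Web Search Performance 5
    2.1.1 Link Structure Algorithms 5
    2.1.2 Personalized Web Search 5
    2.1.3 Search Result Clustering 6
    2.1.4 Using Anchor Text Information 6
    2.2 Research on User Search Goals 7
    2.2.1 User Search Goal Classification 7
    2.2.2 Methods for Detecting User Search Goals 7
    2.3 Improving the Semantic Relatedness of Word Pairs 9
    2.3.1 Using Web Search Engines 9
    2.3.2 Using Taxonomic Knowledge 10
    Chapter 3 Methodology 11
    3.1 Observations and Ideas 11
    3.1.1 The Search Result Snippet Classification Problem 11
    3.1.2 Observations on User Search Goal Diversity 11
    3.1.3 Observations on the Semantic Suitability of User Search Goals 14
    3.1.4 Ideas for Improving Search Performance 15
    3.2 Search Result Snippet Classification 16
    3.2.1 Classification Mechanism 16
    3.2.2 Feature Selection 17
    3.3 Hierarchical User Search Goal Model 21
    3.3.1 Category-Based User Search Goal Model 22
    3.3.2 Identification of Resource-seeking Search Goal 22
    3.3.3 Identification of Informational Search Goal 25
    3.3.4 Identification of Navigational Search Goal 27
    3.3.5 Domain-Based Search Goal Validation Model 28
    3.4 Subcategory-Based User Search Goal Model 29
    3.4.1 Resource-seeking Search Goal Classification 29
    3.4.2 Informational Search Goal Classification 34
    3.4.3 Navigational Search Goal Classification 35
    3.5 User-Search-Goal-Based Search Model 37
    3.5.1 Category Model 38
    3.5.2 Search Model 39
    Chapter 4 Experiments 43
    4.1 Experimental Datasets 43
    4.2 Search Result Snippet Classification Performance 43
    4.3 User Search Goal Suitability 53
    4.4 User Search Goal Classification Performance 57
    4.5 Evaluating Search Performance 60
    4.6 System Overview 63
    Chapter 5 Conclusions and Future Work 65
    5.1 Conclusions 65
    5.2 Future Work 66
    References 67
    Appendix 70
    High-Frequency Queries in the Training Dataset 70
    High-Frequency Queries in the Test Dataset 71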


    On campus: available immediately
    Off campus: available from 2008-09-03