| Graduate student: | 何寬禹 He, Kuan-yu |
|---|---|
| Thesis title: | 自動偵測隱含使用者目的以改善網路搜尋 Automatically Identifying Latent User Goals to Improve Web Search |
| Advisor: | 盧文祥 Lu, Wen-hsiang |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering |
| Year of publication: | 2007 |
| Academic year of graduation: | 95 (ROC calendar) |
| Language: | English |
| Pages: | 61 |
| Chinese keywords: | 使用者需求、使用者行為、使用者目的、網路搜尋 |
| English keywords: | User Goals, User Behavior, User Needs, Web Search |
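With the rapid expansion of the Internet, the volume of information has grown explosively, and users often face information overload. Search engine users around the world therefore expect search engines to return accurate information that meets their needs quickly. However, it appears difficult for today's search engines to understand a user's goal from the short queries that users habitually issue.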
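Since search engines first appeared, their search mechanisms have evolved considerably. Keyword matching, the traditional information retrieval technique used in the earliest engines, was followed by link-analysis algorithms that consider the link structure between Web pages and by search mechanisms that exploit users' past click-through records. Each brought a wave of improvement over the search technology of its time.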
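As resources on the Internet become more diverse, user needs diversify along with them. Users therefore demand more of search engines, which serve as the window for quickly reaching the Web's rich resources. Clearly, the search mechanisms developed in the past find it increasingly hard to satisfy these increasingly diverse needs.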
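In the most recent wave of improvements to Web search, mechanisms based on user goals have become the mainstream. Only by understanding why users search the Web, that is, what their goals are, can Web search be improved in a revolutionary way. Existing research on user goals, however, either analyzes only the characteristics of user goals or proposes automatic mechanisms for judging the category of the goal that may lie behind an issued query. It cannot effectively identify the possible user goals themselves and thereby improve Web search.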
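In this thesis, we improve on a previous method that used syntactic structure (verb-noun pairs), and we use search-result snippets to identify the latent user goals they contain. Unlike the previous method, we use hint verbs obtained through supervised learning and bootstrapping, and we take URL and title information into account, to classify search-result snippets into three categories: resource-seeking, informational, and navigational. We also propose three different goal-identification models that use the three categories of snippets to identify three different categories of broad latent user goals. In addition, we propose a unified user goal model that combines the three categories of user goals according to the characteristics of the query itself, and we use the unified user goals to re-rank search results and so improve Web search. Overall, the main contribution of this thesis is that resource-seeking user goals, which are the most diverse and the hardest to identify, are identified effectively in the form of verb-noun combinations.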
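Experimental results show that the proposed method for identifying resource-seeking user goals outperforms the previously proposed method. We also identify the informational and navigational latent user goals that the previous method ignored, with good performance. The results further show that search results re-ranked according to the identified user goals better satisfy users' search needs.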
With the rapid expansion of the Web, users often suffer from information overload. Users therefore expect search engines to return quickly the exact results they want. However, it seems difficult for most existing search engines to understand exactly the user needs behind diverse short queries that carry limited information.
Search mechanisms have evolved considerably since the advent of search engines. Keyword matching, the classical technique in information retrieval, was followed by link-structure algorithms, which consider the link structure between Web pages, and by related search techniques that employ users' past click-through data. Each brought a wave of improvement to the search engines of its time.
As the resources on the Web become more and more heterogeneous, the needs of users also become more and more varied. Because search engines serve as a window through which users quickly obtain resources on the Web, users demand more of them day by day.
In the current wave of improving Web search, search mechanisms based on user goals have become mainstream. Only an accurate understanding of user needs, that is, of why users search the Web, can bring a revolutionary improvement to Web search. However, current research on user goals either simply discusses the characteristics of user goals or proposes an automatic approach to judge the category of the possible user goal behind an issued query. It does not identify the possible user goals themselves in order to further improve Web search.
In this thesis, we propose an enhanced approach that uses search-result snippets to identify latent user goals. It improves on our previous method, which employs syntactic structures (verb-object pairs) to discover a variety of latent user goals. The new approach employs supervised-learning and bootstrapping techniques to learn hint verbs, and considers URL and title information, to classify snippets into three major categories: resource-seeking, informational, and navigational. We also propose three different methods to identify three different categories of diverse latent user goals from the classified snippets. In addition, we propose a unified user goal model that unifies the three categories of user goals, and we employ our proposed search model, which uses the unified user goals, to re-rank search results. The most valuable contribution of this thesis is that our method can identify heterogeneous resource-seeking goals, which are very difficult to identify, in the form of verb-object pairs.
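The pipeline described above (classify snippets into three categories, then re-rank by goal) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the abstract gives no model details, so the hint-verb list, the URL rule, the category weights, and the names `Snippet`, `classify_snippet`, and `rerank` are all hypothetical. In particular, the thesis learns its hint verbs through supervised learning and bootstrapping, whereas the sketch hard-codes a few, and the example data uses reserved example domains.

```python
import re
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical hint verbs for illustration; the thesis learns its own from data.
RESOURCE_HINT_VERBS = {"download", "buy", "watch", "listen", "play"}


@dataclass
class Snippet:
    title: str
    url: str
    text: str


def classify_snippet(snippet: Snippet, query: str) -> str:
    """Assign one of the three goal categories using toy rules (assumed)."""
    parsed = urlparse(snippet.url)
    host = parsed.netloc.lower()
    words = set(re.findall(r"[a-z]+", f"{snippet.title} {snippet.text}".lower()))
    query_key = re.sub(r"[^a-z0-9]", "", query.lower())
    # Assumed navigational cue: the query names the host and the URL is a site root.
    if query_key and query_key in host and parsed.path in ("", "/"):
        return "navigational"
    # Assumed resource-seeking cue: a hint verb occurs in the title or snippet text.
    if words & RESOURCE_HINT_VERBS:
        return "resource-seeking"
    return "informational"


def rerank(snippets, query, goal_weights):
    """Stable sort: snippets in higher-weighted categories move to the front."""
    return sorted(
        snippets,
        key=lambda s: goal_weights[classify_snippet(s, query)],
        reverse=True,
    )


# Made-up search results for the query "skype".
results = [
    Snippet("Skype review", "http://example.com/reviews/skype", "An overview of Skype features."),
    Snippet("Skype", "http://skype.example/", "Official site."),
    Snippet("Get Skype", "http://example.org/get", "Download Skype for free."),
]
# Hypothetical weights standing in for a unified goal model's output for this query.
weights = {"resource-seeking": 0.6, "navigational": 0.3, "informational": 0.1}
for s in rerank(results, "skype", weights):
    print(classify_snippet(s, "skype"), s.url)
```

With these assumed weights the resource-seeking snippet is ranked first, followed by the navigational and then the informational one. In the thesis the weighting depends on the characteristics of the query rather than on fixed constants.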
The experimental results show that our new method of resource-seeking goal identification performs better than our previous method, and that we also identify two other categories of latent user goals (informational and navigational) that our previous work ignored. The results also show that search results re-ranked according to the identified user goals better satisfy users' search needs.