
Author: Tsai, Kun-Yu (蔡昆育)
Thesis title: Generating Complex Task Names with Sub-Task Goals to Improve Web Search by Utilizing Multiple Web Resources (Chinese title: 利用多樣化網路資源產生複雜任務名稱與其子任務目的以改善網路搜尋)
Advisor: Lu, Wen-Hsiang (盧文祥)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering
Year of publication: 2014
Academic year of graduation: 102 (ROC calendar)
Language: English
Number of pages: 127
Keywords: Complex task, Sub-task Goal, Web Search, Multiple Web Resources

    Conventional search engines usually treat a search query as corresponding to a single simple task. However, with the explosive growth of web usage in recent years, more and more queries are driven by complex tasks, each consisting of multiple sub-tasks. To accomplish a complex task, users usually have to issue a series of queries. For example, the complex task “travel to Beijing” may involve several sub-task goals, including “book flights,” “reserve hotel,” and “find maps.” Understanding complex tasks allows a search engine to predict these sub-task goals and help users accomplish all of them efficiently.
    In this work, we propose a topic-event-based complex task model (TECTM) to address this problem. TECTM consists of three main stages. The first, task-coherence clustering, groups queries that belong to the same complex task. The second, sub-task goal identification, identifies the sub-task goals of a complex task from the queries clustered into that task. The third, task name generation, uses the identified sub-task goals to generate a name for the complex task. To improve the performance of TECTM, we exploit multiple web resources: query logs, clicked pages, community question answering (CQA) services, search engine results pages (SERPs), and microblogs. In addition, we develop an application, the complex-task-based search engine (CTSE), which builds on TECTM to provide search results integrated across sub-task goals. Experimental results show that TECTM effectively generates complex task names together with their identified sub-task goals. Furthermore, CTSE provides a more suitable ranking of search results, helping users accomplish their complex tasks with less effort.
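    The three stages of the model can be illustrated with a deliberately simplified Python sketch. This is not TECTM: the thesis combines features drawn from several web resources, whereas this toy version assumes naive whitespace tokenization, single-link clustering on Jaccard word overlap (the 0.3 threshold is an arbitrary choice), takes the words shared by every query in a cluster as its topic, treats the remaining words of each query as a candidate sub-task goal, and uses the topic words as the task name. All function names are hypothetical.

```python
def tokens(query):
    """Lower-case whitespace tokenization (a deliberately naive assumption)."""
    return set(query.lower().split())


def jaccard(a, b):
    """Word-overlap similarity between two token sets."""
    return len(a & b) / len(a | b) if a | b else 0.0


def cluster_queries(queries, threshold=0.3):
    """Stage 1 stand-in (task-coherence clustering): single-link clustering.
    Two queries share a cluster if a chain of query pairs with
    Jaccard >= threshold connects them (tracked with union-find)."""
    parent = list(range(len(queries)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    toks = [tokens(q) for q in queries]
    for i in range(len(queries)):
        for j in range(i + 1, len(queries)):
            if jaccard(toks[i], toks[j]) >= threshold:
                parent[find(i)] = find(j)

    # Group queries by their root, keeping first-seen order.
    groups = {}
    for i, q in enumerate(queries):
        groups.setdefault(find(i), []).append(q)
    return list(groups.values())


def split_topic_and_goals(cluster):
    """Stage 2 stand-in (sub-task goal identification): words shared by every
    query form the topic; what remains of each query is a candidate goal."""
    topic = set.intersection(*(tokens(q) for q in cluster))
    goals = []
    for q in cluster:
        rest = [w for w in q.lower().split() if w not in topic]
        if rest:
            goals.append(" ".join(rest))
    return topic, goals


def name_task(cluster, topic):
    """Stage 3 stand-in (task name generation): the topic words, in the
    order they appear in the first query of the cluster."""
    return " ".join(w for w in cluster[0].lower().split() if w in topic)


if __name__ == "__main__":
    queries = ["beijing flights", "beijing hotel", "beijing map",
               "python tutorial", "python install"]
    for cluster in cluster_queries(queries):
        topic, goals = split_topic_and_goals(cluster)
        print(name_task(cluster, topic), goals)
    # prints:
    # beijing ['flights', 'hotel', 'map']
    # python ['tutorial', 'install']
```

    Word overlap alone can only recover the shared word “beijing,” not a descriptive name such as “travel to Beijing”; this gap illustrates why additional evidence, such as the web resources listed above, is useful for task name generation.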

    Abstract (Chinese)
    Abstract
    Acknowledgements
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
        1.1 Background
        1.2 Problem
        1.3 Motivation
        1.4 Method
        1.5 Challenges
        1.6 Organization of this Dissertation
    Chapter 2 Related Work
        2.1 Understanding the Search Goals behind Search Queries
        2.2 Investigating Simple Search Tasks
        2.3 Analyzing and Modeling Complex Tasks
        2.4 Improving Task-oriented Search-Result Ranking
        2.5 Utilizing Multiple Web Resources
    Chapter 3 Method
        3.1 Framework of Topic-Event-based Complex Task Model
        3.2 Web Resources
            3.2.1 Query Log
            3.2.2 Clicked Pages
            3.2.3 CQA
            3.2.4 SERP
            3.2.5 Microblogs
            3.2.6 The Distribution of Task Names and Sub-task Goals in Web Resources
        3.3 Task-coherence Clustering
            3.3.1 Candidate Topic Extraction
            3.3.2 Task-coherence Clustering Algorithm
            3.3.3 Similarity Measures and Clustering Features
        3.4 Sub-task Goal Identification
            3.4.1 Sub-task Goal Extraction
            3.4.2 Features for Identifying Sub-task Goal
            3.4.3 Sub-task Goal Grouping
        3.5 Task Name Generation
            3.5.1 Task-related Information Retrieval
            3.5.2 Task Name Determination
            3.5.3 Features for Task Name Determination
    Chapter 4 Experiments
        4.1 Dataset
        4.2 Experiment of Task-coherence Clustering
            4.2.1 Dataset for Task-coherence Clustering
            4.2.2 Method Comparison
            4.2.3 Evaluation Metrics
            4.2.4 Parameter Selection
            4.2.5 Results of Task-coherence Clustering
            4.2.6 Examples of Clustering Results
        4.3 Experiment of Sub-task Goal Identification
            4.3.1 Dataset for Sub-task Goal Identification
            4.3.2 Method Comparison
            4.3.3 Evaluation Metrics
            4.3.4 Parameter Selection
            4.3.5 Results of Sub-task Goal Identification
            4.3.6 Examples of Sub-task Goals
        4.4 Experiment of Task Name Generation
            4.4.1 Dataset for Task Name Generation
            4.4.2 Method Comparison
            4.4.3 Evaluation Metrics
            4.4.4 Parameter Selection
            4.4.5 Results of Task Name Generation
            4.4.6 Examples of Task Names
        4.5 Discussion
            4.5.1 Point of View with Web Resources
            4.5.2 Discussion of Task-coherence Clustering
            4.5.3 Discussion of Sub-task Goal Identification
            4.5.4 Discussion of Task Name Generation
    Chapter 5 Application: Complex-Task-based Search Engine
        5.1 Query-Task Prediction
        5.2 Complex-Task-based Search
        5.3 Evaluation of Complex-Task-based Search Engine
        5.4 Examples of Complex-Task-based Search Results
    Chapter 6 Conclusions and Future Works
        6.1 Conclusions
        6.2 Future Works
    References


    Full-text access: available immediately, both on campus and off campus.