National Cheng Kung University Electronic Theses and Dissertations System

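Graduate student:	Lin, Chih-Lu (林致祿)
Thesis title:	Query Result Distillation by Hierarchical Clustering and Result Aggregation on Multiple Search Engines (利用多搜尋結果進行階層分群之查詢結果萃取之研究)
Advisor:	Kao, Hung-Yu (高宏宇)
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication:	2006
Academic year of graduation:	94 (ROC calendar)
Language:	English
Number of pages:	44
Keywords (Chinese):	Clustering (分群), Search engine (搜尋引擎), Chinese search environment (中文搜尋環境)
Keywords (English):	User goal, Search engine, Clustering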

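With the rapid growth of the Internet in recent years, the amount of Web resources we can reach has grown as well, but problems have come with it, such as the lack of an effective way to find useful resources. Many effective solutions to this problem exist, and among them search engines and their related technologies have developed most vigorously, yet some problems remain to be solved. Two are addressed in this thesis: (1) for a short query, it is hard for a search engine to understand the user's goal, and therefore hard to point the user to the resources actually wanted; (2) a search engine is like a library without borders: although Web pages are indexed, once the index grows too large a good clustering method is still needed to group the large number of results by topic, so that the search engine is more convenient for users. This thesis therefore continues earlier research on clustering. By analyzing search results, it seeks new usable features and a clustering method that works in a Chinese search environment, generating readable cluster names for users in advance and making search engines more convenient to use. In addition, the thesis studies whether a given query is suitable for clustering at all, in order to avoid unnecessary processing that would impose an extra reading burden on users.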

With the rapid development of the network environment in recent years, more and more Web resources have become available. However, problems have followed, e.g., the lack of an effective method for finding Web resources. The birth of search engines addressed this problem, but other problems and issues remain to be resolved.
Two examples are discussed in this thesis: (1) for a short query, it is difficult for a search engine to understand the user's goal in the Web search. As a result, it is difficult for search engines to provide Web resources related to that goal. (2) There is no effective method for helping users find the information they need among search engines' enormous indexes. Therefore, this thesis focuses on continuing and improving previous work on clustering, and also studies how to decide in advance whether a query should be clustered at all, in order to avoid additional overhead for both search engines and users.

CHINESE ABSTRACT
ABSTRACT
ACKNOWLEDGEMENTS
TABLE LISTING
FIGURE LISTING
1.	INTRODUCTION
2.	MOTIVATION
3.	ISSUES IN THIS PROBLEM
　3.1	THE CLUSTER NAME
　3.2	THE EVALUATION METHOD
　3.3	USING DIVERSE RESULTS OF DIFFERENT SEARCH ENGINES
　3.4	CHINESE SEARCHING RESULTS
4.	RELATED WORK
　4.1	OVERALL DESCRIPTION
　4.2	DESCRIPTION OF THE PRESENT WEB SEARCH ENVIRONMENT
　4.3	OVERALL INTRODUCTION OF PREVIOUS METHODS
　　4.3.1	Traditional methods
　　4.3.2	Extended version of traditional methods
　　4.3.3	Suffix Tree Clustering (STC)
　　4.3.4	Clustering under the network environment or present search engine
　4.4	DESCRIPTION OF ZHENG’S METHOD [16]
　　4.4.1	Search Result Fetching
　　4.4.2	Document Parsing and Phrase Property Calculation
　　4.4.3	SVR brief description
5.	EXPERIMENT
　5.1	PRE-CLUSTERING ANALYSIS
　　5.1.1	Method description
　　5.1.2	Real Cases
　5.2	OUR PROPOSED METHOD
　　5.2.1	Description
　　5.2.2	Data set
　　5.2.3	Description of experiment’s property
　　5.2.4	URL structure
　　5.2.5	About filtering method of Chinese search
　　5.2.6	Combining Results of Multiple Search Engines
　　5.2.7	Experiment Result and Discussion
　5.3	OUR RESEARCH ON HIERARCHICAL CLUSTER
　　5.3.1	Description
　　5.3.2	Dataset and Algorithm
　　5.3.3	Evaluation
6.	CONCLUSION AND FUTURE WORK
7.	REFERENCES

[1] D. Beeferman and A. Berger. Agglomerative clustering of a search engine query log. In Proceedings of ACM SIGKDD ’00, 2000.
[2] D. R. Cutting, D. R. Karger, and J. O. Pedersen. Constant Interaction-Time Scatter/Gather Browsing of Very Large Document Collections. In Proceedings of the 16th Annual International ACM/SIGIR Conference on Research and Development in Information Retrieval (SIGIR'93), pages 125-135, Pittsburgh, PA, 1993.
[3] C. C. Chang, C. J. Lin. LIBSVM: A library for support vector machines, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/papers/libsvm.ps.gz
[4] N. Eiron and K. S. McCurley. Analysis of anchor text for Web search. In Proceedings of ACM SIGIR ’03, 2003.
[5] M. A. Hearst, J. O. Pedersen. Reexamining the Cluster Hypothesis: Scatter/Gather on Retrieval Results. In Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'96), Zurich, June 1996.
[6] I. Kang and G. Kim. Query type classification for web document retrieval. In Proceedings of ACM SIGIR’03, 2003.
[7] R. Kraft and J. Zien. Mining anchor text for query refinement. In Proceedings of the Thirteenth Int’l. World Wide Web Conf., 2004.
[8] D. Lawrie, W. B. Croft. Finding Topic Words for Hierarchical Summarization. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'01), pages 349-357, 2001.
[9] B. Lent, R. Agrawal, R. Srikant. Discovering Trends in Text Databases. In Proceedings of the 3rd Int'l Conference on Knowledge Discovery in Databases and Data Mining (KDD'97), Newport Beach, California, August 1997.
[10] A. V. Leouski, W. B. Croft. An Evaluation of Techniques for Clustering Search Results. Technical Report IR-76, Department of Computer Science, University of Massachusetts, Amherst, 1996.
[11] A. Leuski and J. Allan. Improving Interactive Retrieval by Combining Ranked List and Clustering. In Proceedings of RIAO, College de France, pp. 665-681, 2000.
[12] B. Liu, C. W. Chin, and H. T. Ng. Mining Topic-Specific Concepts and Definitions on the Web. In Proceedings of the Twelfth International World Wide Web Conference (WWW'03), Budapest, Hungary, 2003.
[13] U. Lee, Z. Liu, J. H. Cho. Automatic identification of user goals in Web search. In Proceedings of the 14th International Conference on World Wide Web, Chiba, Japan, May 10-14, 2005.
[14] D. E. Rose and D. Levinson. Understanding user goals in Web search. In Proceedings of the Thirteenth Int’l. World Wide Web Conf., 2004.
[15] C. Silverstein, M. Henzinger, H. Marais, and M. Moricz. Analysis of a very large Web search engine query log. SIGIR Forum, 33(1):6-12, 1999.
[16] H. J. Zheng, Q. C. He, Z. Chen, W. Y. Ma, J. Ma. Learning to cluster Web search results. In Proceedings of SIGIR ’04, pages 210–217, 2004.
[17] O. Zamir, O. Etzioni. Web Document Clustering: A Feasibility Demonstration. In Proceedings of the 21st International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'98), pages 46-54, 1998.
[18] O. Zamir, O. Etzioni. Grouper: A Dynamic Clustering Interface to Web Search Results. In Proceedings of the Eighth International World Wide Web Conference (WWW8), Toronto, Canada, May 1999.
[19] Google, http://www.google.com
[20] Yahoo, http://tw.yahoo.com
[21] Vivisimo, http://vivisimo.com
[22] MSN search, http://www.msn.com.tw

Made public: 2006-08-29