
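Graduate student: 卓漢鵬
Cho, Han-peng
Thesis title: 利用搜尋目的類型改善搜尋結果的呈現
Improving the Display of Search Result Using Search Goal Type
Advisor: 盧文祥
Lu, Wen-Hsiang
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication: 2008
Academic year of graduation: 96 (ROC calendar)
Language: Chinese
Number of pages: 52
Keywords (Chinese): 網頁, 搜尋目的, 搜尋結果
Keywords (English): search result, web pages, search goal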
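  • The rapid growth of the web has produced an enormous number of pages with highly varied content; Google, the well-known search engine, announced in 2005 that it had collected more than eight billion web pages. Search engines have therefore become increasingly important to web users. However, search engines currently focus only on improving the ranking of result pages and overlook the fact that the way search results are presented also strongly influences which results users click. Today, search engines present every result as a text snippet, on the assumption that users only want textual information. We find, however, that users have different implicit goals for different page content. For example, a user may want to browse the links on certain pages, or to download a specific resource from a page, rather than only look for textual information. This thesis therefore classifies the pages in search results by the type of the user's search goal, and processes and presents each class differently, so that what is presented is what the user needs. To this end, the thesis proposes a page type identification model that identifies the type of a page according to the user's search goal, followed by a page content display model that extracts the content to be presented for each page type.
    Experimental results show that the proposed page type identification model reaches an accuracy of about 80%, and that block identification using the technique of Song et al. (2004) reaches an accuracy of 85%. The extractors achieve an average precision of about 70%; that is, on average about 70% of the content we present is information the user wants to see.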
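    The extractor figure above is a precision: the share of extracted items that are actually wanted. As a rough illustration only, the sketch below computes precision per page and averages it. The function names and the sample judgments are hypothetical, and the thesis's actual evaluation procedure is not described in this record.

```python
def precision(extracted, relevant):
    """Fraction of extracted items that are also judged relevant."""
    extracted = set(extracted)
    if not extracted:
        return 0.0
    return len(extracted & set(relevant)) / len(extracted)


def mean_precision(per_page):
    """Average precision over (extracted, relevant) pairs, one pair per page."""
    scores = [precision(e, r) for e, r in per_page]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical judgments for two result pages (not data from the thesis).
pages = [
    (["a", "b", "c", "d"], ["a", "b", "c"]),  # 3 of 4 extracted items are relevant
    (["x", "y"], ["x"]),                      # 1 of 2 extracted items is relevant
]
print(mean_precision(pages))  # 0.625
```

    An average of this kind says what fraction of displayed content is useful on average; individual pages can fall below it.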

    The rapid growth of the web has made web pages both more numerous and more complex. Google, the well-known search engine, announced in 2005 that it had collected more than 8 billion pages. Search engines have therefore become more important to web users. However, search engines concentrate on improving the ranking of pages in search results and overlook the fact that the way results are displayed also affects which results users click. Most search engines display every result as a textual snippet, on the assumption that users only want textual information. We find, however, that users have different purposes for different web pages. For example, a user may want to follow the links on a page or download a particular resource, not only read textual information. We therefore classify the web pages in search results according to user search goal type, and apply a different processing method and display approach to each type. To do so, we propose a page type identification model that identifies the type of a web page according to the user's search goal, and a page content display model that extracts the content to be displayed in the search result for each type of web page.
    The experimental results show that the recall of the page type identification model is about 80% and the recall of block identification is about 85%. The average precision of the extractors is about 70%; in other words, on average about 70% of the content displayed in the search results is useful to users.
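    The two-stage approach in the abstract (identify the page type, then extract type-specific content to display) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the rule-based `identify_page_type` is a toy stand-in for the proposed identification model, and the type names, the file-extension list, and the thresholds are all hypothetical.

```python
from html.parser import HTMLParser

# Assumed list of downloadable file types; not taken from the thesis.
RESOURCE_EXTENSIONS = (".zip", ".pdf", ".exe", ".mp3")


class PageScanner(HTMLParser):
    """Collects hyperlinks and text chunks from an HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        # Note: this also picks up script/style text; acceptable for a sketch.
        if data.strip():
            self.text.append(data.strip())


def scan(html):
    scanner = PageScanner()
    scanner.feed(html)
    scanner.close()
    return scanner


def identify_page_type(page):
    """Toy stand-in for the page type identification model."""
    if any(link.lower().endswith(RESOURCE_EXTENSIONS) for link in page.links):
        return "resource"
    words = sum(len(chunk.split()) for chunk in page.text)
    if page.links and words / len(page.links) < 5:  # link-heavy page
        return "links"
    return "information"


def extract_display_content(html):
    """Pick the content to show in the search result, by page type."""
    page = scan(html)
    page_type = identify_page_type(page)
    if page_type == "resource":
        content = [l for l in page.links if l.lower().endswith(RESOURCE_EXTENSIONS)]
    elif page_type == "links":
        content = page.links
    else:
        content = [" ".join(page.text)[:200]]  # plain text snippet
    return page_type, content


print(extract_display_content('<p>Get the tool: <a href="tool.zip">download</a></p>'))
# ('resource', ['tool.zip'])
```

    The point of the dispatch is that a link-oriented or download-oriented page is shown with its links or resources, while only information-oriented pages fall back to a text snippet.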

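    Abstract
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1 Introduction
      1.1 Motivation
      1.2 Research Method
      1.3 Thesis Organization
    Chapter 2 Related Work and Literature
      2.1 Presentation of Search Results
      2.2 Finding Important Blocks in Web Pages
      2.3 Searcher Goals
      2.4 Web Page Content Summarization
      2.5 Clustering of Search Result Pages
    Chapter 3 Improving the Presentation of Search Results
      3.1 System Architecture
      3.2 Method for Improving the Presentation of Search Results
      3.3 Page Type Identification Model
      3.4 Page Block Segmentation and Identification
      3.5 Page Content Display Model
        3.5.1 Related Link Extractor
        3.5.2 Resource Extractor
        3.5.3 Information Extractor
    Chapter 4 Experiments
      4.1 Experimental Data
      4.2 Evaluation Methods
        4.2.1 Evaluation of the Page Type Identification Model
        4.2.2 Evaluation of Block Identification
        4.2.3 Evaluation of the Extractors
      4.3 Experimental Results
        4.3.1 Page Type Identification Model
        4.3.2 Block Segmentation and Identification
        4.3.3 Related Link Extractor
        4.3.4 Resource Extractor
        4.3.5 Information Extractor
    Chapter 5 Conclusions and Future Work
      5.1 Conclusions
      5.2 Future Work
    References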

    [1] Amitay, E., Paris, C. 2000. Automatically Summarising Web Sites - Is There A Way Around It? In Proceedings of the Conference on Information and Knowledge Management.
    [2] Broder, A. 2002. A taxonomy of web search. In Proceedings of the 25th annual international ACM SIGIR conference.

    [3] Cai, D., Yu, S.P., Wen, J.R. and Ma, W.Y. 2003. VIPS: A vision-based page segmentation algorithm. In Proceedings of the MSR-TR

    [4] Chang, C.C. and Lin, C.J. 2001. LIBSVM: a library for support vector machines. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm

    [5] Delort, J.Y., Bouchon-Meunier, B., Rifqi, M. 2003. Enhanced web document summarization using hyperlinks. In Proceedings of the 14th Conference on Hypertext and Hypermedia.

    [6] Dziadosz, S., and Chandrasekar, R. 2002. Do thumbnail previews help users make better relevance decisions about web search results? In Proceedings of the 25th annual international ACM SIGIR conference.

    [7] He, K.Y., Chang, Y.S. and Lu, W.H. 2007. Improving Identification of Latent User Goals through Search-Result Snippet Classification. In Proceedings of the Web Intelligence

    [8] Hsu, C.W., Chang, C.C. and Lin, C.J. 2003. A practical guide to support vector classification.

    [9] Kang, I.H. and Kim, G.C. 2003. Query type classification for web document retrieval. In Proceedings of the 26th Annual International ACM SIGIR conference on research and development in Information Retrieval.

    [10] Kao, H.Y., Ho, J.M. and Chen, M.S. 2004. DOMISA: DOM-Based Information Space Adsorption of Web Information Hierarchy Mining. In Proceedings of the 4th SIAM Intern'l Conference on Data Mining.

    [11] Lafferty, J., McCallum, A., and Pereira, F. 2001. Conditional random fields: Probabilistic models for segmenting and labelling sequence data. In Proceedings of the ICML.

    [12] Lee, U., Liu, Z. and Cho, J. 2005. Automatic identification of user goals in Web search. In Proceedings of the 14th international conference on World Wide Web.

    [13] Lin, S.H. and Ho, J.M. 2002. Discovering Informative Content Blocks from Web Documents. In Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, Edmonton.

    [14] Paek, T., Dumais, S.T., and Logan, R. 2004. WaveLens: a new view onto Internet search results. In Proceedings of the CHI.

    [15] Rose, D.E. and Levinson, D. 2004. Understanding user goals in web search. In Proceedings of the 13th international conference on World Wide Web.

    [16] Shen, D., Chen, Z., Yang, Q., Zeng, H., Zhang, B., Lu, Y., Ma, W. 2004. Web-page classification through summarization. In Proceedings of the 27th ACM International Conference of Information Retrieval.

    [17] Song, R., Liu, H., Wen, J. R., Ma, W. Y. 2004. In Proceedings of the 13th international conference on World Wide Web.

    [18] Sun, J.T., Shen, D., Zeng, H.J., Yang, Q., Lu, Y., and Chen, Z. 2005. Web-page summarization using clickthrough data. In Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.

    [19] Tombros, A. and Sanderson, M. 1998. Advantages of query biased summaries in information retrieval. In Proceedings of the 21st ACM SIGIR conference.

    [20] Vapnik, V.N. 1995. The Nature of Statistical Learning Theory. Springer-Verlag, Berlin Heidelberg.

    [21] Woodruff, A., Faulring, A., Rosenholtz, R., Morrison, J., and Pirolli, P. 2001. Using Thumbnails to Search the Web. In Proceedings of the CHI.

    Full-text access on campus: available immediately
    Full-text access off campus: available from 2008-08-15