| 研究生: |
林振祺 Lin, Chen-chi |
|---|---|
| 論文名稱: |
利用網頁連結路徑改善自然語言問句的搜尋 Improve Natural Language Question Search Using Page Link Path |
| 指導教授: |
盧文祥
Lu, Wen-Hsiang |
| 學位類別: |
碩士 Master |
| 系所名稱: |
電機資訊學院 - 資訊工程學系 Department of Computer Science and Information Engineering |
| 論文出版年: | 2008 |
| 畢業學年度: | 96 |
| 語文別: | 中文 |
| 論文頁數: | 64 |
| 中文關鍵詞: | 搜尋 、自然語言 |
| 外文關鍵詞: | Search, Natural Language |
| 相關次數: | 點閱:56 下載:1 |
| 分享至: |
| 查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報 |
目前廣為大部分人所使用的搜尋引擎皆是以單一網頁的內容、標題為索引的單位,這樣的索引方式會忽略了單一網站中,網頁和網頁之間可能的順序及階層關係。且對於自然語言問句的處理上,傳統的搜尋引擎目前也沒有有效的方法回傳使用者真正想要的搜尋結果。針對以上兩個問題,首先,本論文利用一個有效樹狀結構索引的演算法GRIPP,將每一個網站視為一棵樹,將網站階層結構所對應的樹的所有節點及連結結構(link structure)索引,降低搜尋整個樹狀結構的時間複雜度。而根據我們的觀察,問句中可能隱含有entity-feature的結構,所以我們對於自然語言問句的處理方式,會先將原問句做簡單的前處理,再根據前處理後的結果去找出可能相關的網頁,然而這些相關的網頁之間的連結似乎可構成和問題相似的結構,所以本論文提出Page Link Set Table Construction Algorithm找出這些相關網頁的網頁連結集合(page link set),另外也提出Top-down Based Page Link Path Search Algorithm及Bottom-Up Based Page Link Path Search Algorithm兩種不同的方法建構出所有可能的候選路徑(candidate path),而這些候選路徑就是和自然語言問句匹配且可能含有問句答案的搜尋結果,最後本論文利用問句隱含語意結構匹配模型,引入問句中可能的entity-feature結構,去計算每條候選路徑為問句的相關網頁的機率,而機率值愈高的候選路徑極有可能是我們想要的搜尋結果。而根據我們所實驗的結果,首先,本論文利用網站的多層及跨層連結結構找出和自然語言問句匹配的搜尋結果正確率,會比以單一網頁的title及anchor text為索引單位的方法還來得高。另外,和其他同樣處理自然語言問句隱含結構的方法比較,本論文所提出的方法不僅在速度上也不錯的改進,在正確率及其他評估方法都比其他一起比較的方法還要高出許多。
Traditional search engines take single page as the unit to index its titles and contents. However, these indexing methods may loss the order and hierarchical structure relationship between web pages. Traditional search engines also have no effective solutions to correctly return the result that user really want in natural language question search. To solve these two problems, we employ an effective algorithm GRIPP to reduce the time complexity in searching the whole website. We regard each website as a tree and indexing the link structure between nodes of this tree. According to our observations, user’s questions may contain implicit semantic entity-feature relations. Therefore, we pre-process original natural language question and find relevant web pages according to the pre-processing results. The link structures between these web pages seem are similar to the structure of original natural language question. In this paper, We proposed Page Link Set Table Construction Algorithm to find page link set of these related pages, and then exploited Top-Down Based Page Link Path Search Algorithm and Bottom-Up Based Page Link Path Search Algorithm to find these candidate path which map to the implicit structure of natural language question and involve the answers. Finally, we proposed implicit structure matching model (ISMM) to calculate the probability of relevant web pages, lead-in the possible entity-feature relationships, of each candidate path. The bigger probability value maybe more closely to the result we want. According to our experiments, we can see that our proposed method which find those web pages based on multi-level link structure of web site has higher accuracy than traditional search engines which use single-page indexing. Besides, comparing with other methods which handle the implicit structure of natural language question, our method significantly outperforms these methods much in accuracy and other evaluation metrics.
Andrew Nierman and H. V. Jagadish, Evaluating Structural Similarity in XML Documents, WebDB,2002.
Antonio Badia, Mehmed Kantardzic, Graph Building as a Mining Activity:
Finding Links in the Small, LinkKDD’05.
David Fernandes, Edleno S. de Moura, Berthier Ribeiro-Neto, Altigran S. da Silva, Marcos André Gonçalves, Computing Block Importance for Searching on Web Sites, CIKM’07,2007.
Dengyong Zhou, Jason Weston, Arthur Gretton,Olivier Bousquet, and
Bernhard Sch¨olkopf, Ranking on Data Manifolds, NIPS 2003, Vancouver, Canada(2004).
Dragomir R. Radev, Hong Qil, Zhiping Zhengl, Sasha Blair-Goldensohrt, Zhu
Zhangl, Weiguo Fan, John Prager, Mining the Web for Answers to Natural
Language Questions, CIKM’01,2001.
Dragomir R.Radev, Hong Qi, Harris Wu, Weiguo Fan, EvaluatingWeb-based Question Answering Systems.
G.-R. Xue, Q. Yang, H.-J. Zeng, Y. Yu and Z. Chen, Exploiting the Hierarchical Structure for Link Analysis, SIGIR’05,2005.
Gui-Rong Xue, Hua-Jun Zeng, Zheng Chen, Wei-Ying Ma, Hong-Jiang Zhang, and Chao-Jun Lu, Implicit Link Analysis for Small Web Search, SIGIR’03, 2003.
Gunter Neumann and Feiyu Xu, Mining Answers in German Web Pages, WI’03,2003.
Gu Xu, Wei-Ying Ma, Building implicit links from content for forum search, SIGIR’06,2006.
Haixun Wang, Hao He, Jun Yang, Philip S. Yu, Jeffrey Xu Yu, Dual Labeling: Answering Graph Reachability Queries in Constant Time, ICDE’06,2006.
Jennifer Chu-Carroll, John Prager, Yael Ravin and Christian Cesar, A Hybrid Approach to Natural Language Web Search, Language Processing (EMNLP), Philadelphia, July 2002, pp. 180-187.
Jin Zhou, Chen Ding, Dimitrios Androutsos, Improving web site search using web server logs, Proceedings of the 2006 conference of the Center for Advanced Studies on Collaborative research, October 16-19, 2006, Toronto, Ontario, Canada.
Jure Leskovec, Susan Dumais, Eric Horvitz, Web Projections: Learning from Contextual Subgraphs of the Web, WWW’2007,2007.
Marc Najork, Hugo Zaragoza, Michael Taylor, HITS on the Web: How does it
Compare? SIGIR ’07,2007.
Marius Pa¸sca, Lightweight Web-Based Fact Repositories for Textual
Question Answering, CIKM’07,2007.
Matthew W. Bilotti, Paul Ogilvie, Jamie Callan and Eric Nyberg, Structured Retrieval for Question Answering, SIGIR’07,2007.
Monique V. Vieira, Bruno M. Fonseca, Rodrigo Damazio, Paulo B. Golgher, Davi de Castro Reis, Berthier Ribeiro-Neto, Efficient Search Ranking in Social Networks, CIKM’07,2007.
Norbert Martínez-Bazan, Victor Muntés-Mulero, Sergio Gómez-Villamor, Jordi Nin, Mario-A. Sánchez-Martínez, Josep-L. Larriba-Pey, DEX: High-Performance Exploration on Large Graphs for Information Retrieval, CIKM’07,2007.
Proceedings of the Conference on Empirical Methods in Natural
Guang Feng, Tie-Yan Liu, Ying Wang, Ying Bao, Zhiming Ma, Xu-Dong Zhang, and Wei-Ying Ma, AggregateRank: Bringing Order to Web Sites, SIGIR’06,2006.
Satoshi Morinaga, Hiroki Arimura, Takahiro Ikeda, Key Semantics Extraction by Dependency Tree Mining, KDD’05, 2005.
Shivani Agarwal, Ranking on Graph Data, Proceedings of the 23 rd International Conference on Machine Learning, Pittsburgh, PA, 2006.
Silke Trißl, Ulf Leser, Fast and Practical Indexing and Querying of Very
Large Graphs, SIGMOD’07,2007.
Srini Narayanan, Sanda Harabagiu, Question Answering Based on Semantic
Structures, Proceedings of the 20th international conference on
Computational Linguistics.
Xian Zhang, Yu Hao, Xiaoyan Zhu, Ming Li, Information Distance from a
Question to an Answer, KDD’07,2007.