| Graduate student: | 林高弘 Lin, Kao-Hung |
|---|---|
| Thesis title: | 使用網站結構學習問句結構之方法改進自然語言搜尋 Learning Question Structure based on Website Link Structure to Improve Natural Language Search |
| Advisor: | 盧文祥 Lu, Wen-Hsiang |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering |
| Year of publication: | 2006 |
| Academic year of graduation: | 94 (ROC calendar) |
| Language: | Chinese |
| Number of pages: | 50 |
| Chinese keywords: | 自然語言搜尋 (natural language search), 結構化索引 (structural indexing) |
| English keywords: | Structural Indexing, Natural Language Search, Tri-link |
Conventional search engines index each web page (or document) as an independent unit, so they cannot record the sequential and structural relationships between pages. They also lack an effective way to handle questions posed in natural language. To address these two problems, this thesis first proposes a structural indexing method that records the link structure between pages. Second, it proposes a Tri-link Mapping Model (TMM) that extracts the structure implicit in a user's natural language question, maps that structure onto the Tri-link structures of a website, and computes the probability of each mapping. The thesis also proposes a heuristic method that divides a question into three equal parts, treats those parts as the question structure, and computes its similarity to a Tri-link in two ways: Position Dependent Similarity (PDS) and Position Independent Similarity (PIS). Both measures are variants of the conventional Vector Space Model. The experiments compare structural indexing with conventional (non-structural) indexing on user questions. The results show that, for natural language questions, structural indexing achieves higher search precision than conventional indexing.
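The heuristic three-part split and the two similarity measures can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything below is an assumption made for illustration: a Tri-link is taken to be three linked pages, each represented by a list of terms; PDS is taken to be the average cosine similarity between question part i and page i; PIS is taken to be the average of each question part's best cosine match over all three pages. The function names and the sample data are hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term lists, using raw term frequencies."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * \
        math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def split_into_three(tokens):
    """Heuristic question structure: three consecutive, near-equal parts."""
    n = len(tokens)
    cuts = [i * n // 3 for i in range(4)]
    return [tokens[cuts[i]:cuts[i + 1]] for i in range(3)]


def pds(parts, tri_link):
    """Assumed position-dependent score: part i is compared only with page i."""
    return sum(cosine(q, p) for q, p in zip(parts, tri_link)) / 3


def pis(parts, tri_link):
    """Assumed position-independent score: each part keeps its best match
    over all three pages, regardless of position."""
    return sum(max(cosine(q, p) for p in tri_link) for q in parts) / 3


# Hypothetical question and Tri-link (one term list per linked page).
question = ["ncku", "library", "opening", "hours", "on", "sunday"]
tri_link = [["ncku", "library"], ["opening", "hours"], ["sunday", "holiday"]]

parts = split_into_three(question)
print(pds(parts, tri_link), pis(parts, tri_link))
print(pds(parts, tri_link[::-1]), pis(parts, tri_link[::-1]))
```

Under these assumptions, reversing the order of the pages in the Tri-link lowers the position-dependent score but leaves the position-independent score unchanged, which is the distinction the two names suggest.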