
Student: Lin, Kao-Hung (林高弘)
Thesis title: Learning Question Structure based on Website Link Structure to Improve Natural Language Search (使用網站結構學習問句結構之方法改進自然語言搜尋)
Advisor: Lu, Wen-Hsiang (盧文祥)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of publication: 2006
Academic year of graduation: 94 (ROC calendar)
Language: Chinese
Number of pages: 50
Keywords: Structural Indexing, Natural Language Search, Tri-link
    Conventional search engines index each web page (or document) as an independent unit, so they cannot record the sequential, structural relationships between pages. They also lack an effective way to handle questions phrased in natural language. To address these two problems, we first propose a structural indexing method that records the link structure between web pages. Second, we propose a Tri-link Mapping Model (TMM) that extracts the structure implicitly embedded in a user's natural language question, uses it to map the question onto the Tri-link structures of the relevant website, and computes the probability that the question corresponds to each Tri-link. We also propose a heuristic method that extracts a question structure by dividing the question into three equal parts, and computes its similarity to a Tri-link with two measures derived from the conventional Vector Space Model: Position-Dependent Similarity (PDS) and Position-Independent Similarity (PIS). Our experiments compare structural indexing with conventional (non-structural) indexing on user questions, and the results show that structural indexing achieves higher precision than conventional indexing for natural language search.
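The abstract describes the heuristic pipeline only at a high level, so the following is a hypothetical Python sketch, not the thesis's implementation. It assumes that a Tri-link is a directed path of three distinct pages (a → b → c) in the site's link graph, that each page is represented by a bag of terms, and that the question is already segmented into terms. It interprets PDS as the mean cosine similarity between question part i and Tri-link page i, and PIS as the cosine similarity between the pooled question terms and the pooled Tri-link terms. The toy data (`LINKS`, `PAGE_TERMS`, `QUESTION`) and every function name are invented for illustration, and the probabilistic Tri-link Mapping Model is not modeled.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy website: page -> pages it links to (not from the thesis).
LINKS = {
    "home": ["admissions", "research"],
    "admissions": ["graduate"],
    "research": ["labs"],
    "graduate": [],
    "labs": [],
}
# Hypothetical bag of terms per page (e.g. title or anchor text), already tokenized.
PAGE_TERMS = {
    "home": ["department", "home"],
    "admissions": ["admissions"],
    "research": ["research"],
    "graduate": ["graduate", "program", "deadline"],
    "labs": ["lab", "list"],
}
# Hypothetical question, already segmented into terms.
QUESTION = ["department", "admissions", "graduate", "deadline"]


def cosine(a: Counter, b: Counter) -> float:
    """Vector Space Model cosine similarity over raw term frequencies."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_trilinks(links: dict) -> list:
    """Assumed structural index: every directed path a -> b -> c over three distinct pages."""
    trilinks = []
    for a, children in links.items():
        for b in children:
            for c in links.get(b, []):
                if len({a, b, c}) == 3:
                    trilinks.append((a, b, c))
    return trilinks


def node_vectors(trilink: tuple, page_terms: dict) -> list:
    """One term-frequency vector per page in the Tri-link, kept in path order."""
    return [Counter(page_terms[p]) for p in trilink]


def split_in_three(tokens: list) -> list:
    """Heuristic question structure: three contiguous, nearly equal parts."""
    n = len(tokens)
    cuts = [(i * n) // 3 for i in range(4)]
    return [tokens[cuts[i]:cuts[i + 1]] for i in range(3)]


def pds(parts: list, vectors: list) -> float:
    """Assumed PDS: question part i is compared only with Tri-link page i."""
    return sum(cosine(Counter(p), v) for p, v in zip(parts, vectors)) / 3


def pis(parts: list, vectors: list) -> float:
    """Assumed PIS: pool all terms, ignoring part and page order."""
    question = Counter(t for p in parts for t in p)
    trilink = sum(vectors, Counter())
    return cosine(question, trilink)


def rank(tokens: list, links: dict, page_terms: dict, measure) -> list:
    """Score every Tri-link against the question with the given measure, best first."""
    parts = split_in_three(tokens)
    scored = [(measure(parts, node_vectors(t, page_terms)), t) for t in build_trilinks(links)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    for measure in (pds, pis):
        score, trilink = rank(QUESTION, LINKS, PAGE_TERMS, measure)[0]
        print(measure.__name__, trilink, round(score, 4))
```

On this toy data both measures rank the Tri-link home → admissions → graduate first. Reversing the question's word order drops that Tri-link's PDS score to 0 but leaves its PIS score unchanged, which is the position sensitivity the two measures are meant to contrast.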

Abstract (Chinese)
Abstract (English)
Acknowledgements
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Preface
1.2 Research Motivation
1.3 Proposed Approach
1.4 Thesis Organization
Chapter 2 Literature Review and Related Work
2.1 Related Work on Structural Matching
2.2 Web Link Structure Analysis and Hierarchical Ranking
2.3 Methods for Building and Mining Web Structure
2.4 Natural Language Search and Question Answering Systems
Chapter 3 Structural Indexing and Similarity Computation
3.1 Structural Indexing Method
3.2 Question Structure Extraction and Similarity Computation
3.2.1 Heuristic Question Structure Extraction and Computation
3.2.1.1 Position-Independent Similarity (PIS)
3.2.1.2 Position-Dependent Similarity (PDS)
3.2.2 Learning Question Structure from Website Structure and Computing Similarity
3.2.2.1 Tri-link Mapping Model
3.3 Chapter Summary
Chapter 4 Experimental Design and Result Analysis
4.1 Collection and Analysis of Experimental Data
4.2 Experimental Method and Precision Measurement
4.2.1 Experimental Method
4.2.2 Precision Measurement
4.3 Analysis and Discussion of Experimental Results
4.3.1 Precision Analysis for Natural Language Questions
4.3.2 Effect of the Number of Question Structure Levels on TMM Precision
4.4 Discussion and Chapter Summary
Chapter 5 Conclusions and Future Work
5.1 Conclusions
5.2 Future Work
References


    Full-text availability: public on campus and off campus from 2007-08-04.