| Graduate Student: | 江柏勳 Jiang, Bo-Xun |
|---|---|
| Thesis Title: | 基於自然語言處理技術之網路文件問答系統 (NLP-based Question Answering System with Application on Web Documents) |
| Advisor: | 蔣榮先 Chiang, Jung-Hsien |
| Degree: | Master |
| Department: | 電機資訊學院 - 資訊工程學系 (College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering) |
| Year of Publication: | 2005 |
| Graduation Academic Year: | 93 |
| Language: | Chinese |
| Number of Pages: | 53 |
| Chinese Keywords: | 搜尋引擎 (Search Engine), 查詢建構 (Query Formulation), 自然語言 (Natural Language), 問答系統 (Question Answering System) |
| Foreign-language Keywords: | Search Engine, Query Formulator, Question Answering System, Natural Language Processing, NLP |
To improve the accuracy of its answers, a natural language question answering system must analyze the intention behind each question (intention analysis), that is, understand what the question is asking. In this thesis, natural language processing techniques are used to analyze the lexical structure and the semantics of a question in order to determine its intention. Since the interrogative words of English questions usually carry semantic information about the expected answer type, for example questions beginning with "when" ask for answers such as times and dates, while "where" questions concern places and locations, this thesis defines the question intention as the answer type the question seeks and classifies questions according to their expected answer types.
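As a rough illustration of classifying questions by their expected answer type, the sketch below maps English interrogative words to coarse answer-type labels; the label names and rules are illustrative assumptions, not the actual categories or classifier used in the thesis.

```python
# Minimal sketch: guess the expected answer type from the leading words
# of an English question. The labels and rules are assumptions for
# demonstration only.

def classify_answer_type(question: str) -> str:
    """Return a coarse answer-type label for an English question."""
    q = question.strip().lower()
    if q.startswith(("how many", "how much")):
        return "QUANTITY"
    if q.startswith("when"):
        return "DATE_TIME"            # times and dates
    if q.startswith("where"):
        return "LOCATION"             # places and locations
    if q.startswith("who"):
        return "PERSON"
    if q.startswith(("what", "which")):
        return "ENTITY_OR_DEFINITION"
    return "OTHER"

if __name__ == "__main__":
    for q in ["When did World War II end?",
              "Where is the Louvre located?",
              "Who wrote Hamlet?"]:
        print(q, "->", classify_answer_type(q))
```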
The purpose of classifying questions is mainly to allow the system to construct the best processing strategy for each answer type in its later stages. Because the system uses a web search engine as its document retrieval tool, the answer type of a question is used to select the corresponding template from the system's query template library, transforming the question into the query most likely to retrieve documents that contain the answer. In addition, answers of different types appear in different positions within the lexical structure of document sentences, so a separate answer extraction procedure is applied for each answer type to increase answer accuracy.
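To illustrate the idea of template-based query formulation, the following sketch selects query templates by answer type and fills them with the question's content words; the QUERY_TEMPLATES table, the template strings, and the simple wh-word stripping are assumptions for demonstration, not the thesis's actual query template library.

```python
import re

# Hypothetical query templates keyed by answer type; "{focus}" is replaced
# by the content words of the question, and quoted phrases bias the search
# engine toward documents that state the answer explicitly.
QUERY_TEMPLATES = {
    "DATE_TIME": ['{focus} "was founded in"', '{focus} "took place on"'],
    "LOCATION":  ['{focus} "is located in"', '{focus} "is situated in"'],
    "PERSON":    ['{focus} "was written by"', '{focus} "was invented by"'],
}

def formulate_queries(question: str, answer_type: str) -> list[str]:
    """Turn a question into candidate search-engine queries for its answer type."""
    # Drop the leading interrogative word and auxiliary verb, keeping the focus words.
    focus = re.sub(
        r"^(when|where|who|what|which|how)\s+(is|are|was|were|did|does|do)\s+",
        "",
        question.strip().rstrip("?"),
        flags=re.IGNORECASE,
    )
    templates = QUERY_TEMPLATES.get(answer_type, ["{focus}"])
    return [t.format(focus=focus) for t in templates]

if __name__ == "__main__":
    print(formulate_queries("When was the university founded?", "DATE_TIME"))
    # ['the university founded "was founded in"', 'the university founded "took place on"']
```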
To raise the accuracy of a natural language question answering system, it is imperative to perform intention analysis on the inquiries, that is, to understand the content of the questions. In this thesis, we analyze both the syntactic structure and the semantic interpretation of the questions through natural language processing techniques to determine their intentions. In English, interrogative sentences usually contain semantic information indicating the type of answer expected. For instance, sentences beginning with the word “when” anticipate a date or time as the answer, while a place or location is expected for sentences starting with the word “where”. Accordingly, we categorize questions based on their corresponding answer types.
The main intent of such categorization is to support the construction of the best processing strategy for each answer type in the later stages of the system. When a question is submitted to our system, it first identifies the corresponding answer type of the query and then rephrases the question into a form that boosts the probability that documents retrieved from public search engines contain the expected answer. Furthermore, since answers of different types can appear in different positions within a sentence, the answer extraction process is carried out separately for each answer type in order to increase precision.
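As a rough sketch of answer extraction that differs by answer type, the fragment below applies a date regular expression for time questions and a capitalized-phrase heuristic after locative prepositions for location questions; both patterns are illustrative assumptions, not the extraction rules used in the thesis.

```python
import re

# A simple date pattern: optional day, an English month name, and a four-digit
# year, or a bare four-digit year. Illustrative only.
DATE_PATTERN = re.compile(
    r"\b(?:\d{1,2}\s+)?(?:January|February|March|April|May|June|July|"
    r"August|September|October|November|December)\s+\d{4}\b|\b\d{4}\b"
)

# A capitalized phrase following a locative preposition is taken as a
# location candidate. Illustrative only.
LOCATION_PATTERN = re.compile(r"\b(?:in|at|near)\s+([A-Z][a-zA-Z]+(?:\s+[A-Z][a-zA-Z]+)*)")

def extract_candidates(snippet: str, answer_type: str) -> list[str]:
    """Pull candidate answers out of a retrieved snippet according to the answer type."""
    if answer_type == "DATE_TIME":
        return DATE_PATTERN.findall(snippet)
    if answer_type == "LOCATION":
        return LOCATION_PATTERN.findall(snippet)
    return []

if __name__ == "__main__":
    s = "The Louvre is located in Paris and opened to the public on 10 August 1793."
    print(extract_candidates(s, "LOCATION"))   # ['Paris']
    print(extract_candidates(s, "DATE_TIME"))  # ['10 August 1793']
```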