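| Graduate student: | 吳子華 Ng, Chi-Wa |
|---|---|
| Thesis title: | 建置多項問題類型中文問答系統之研究 Multi-Question Type for Chinese Question Answering |
| Advisor: | 王惠嘉 Wang, Hei-Chia |
| Degree: | 碩士 Master |
| Department: | College of Management (管理學院) - Institute of Information Management (資訊管理研究所) |
| Year of publication: | 2010 |
| Academic year of graduation: | 98 (ROC calendar) |
| Language: | Chinese |
| Number of pages: | 57 |
| Chinese keywords: | 問答系統 (question answering system), 資訊檢索 (information retrieval), 答案排序 (answer ranking) |
| Foreign-language keywords: | Question Answering, Information Retrieval, Information Ranking |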
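With advances in technology, the volume of information on the Internet has grown, driving the development of Information Retrieval (IR). The best-known IR application is the search engine, which helps people find the information they need on the Web. Search engines still have shortcomings, however: their results contain too much unimportant information, and they cannot be queried in natural language. Because search engines cannot solve these problems, question answering (QA) systems have been developed.
Question answering currently receives considerable attention at major conferences such as TREC. However, most existing QA systems are factoid systems, which mainly handle six question types: PERSON, LOCATION, organization (ORG), TIME, NUMBER, and ARTIFACT. Questions of these types are usually answered with a single noun or numeral, so answer extraction methods generally revolve around processing answer terms. More complex questions, such as “how” and “definition” questions, may require a sentence or a paragraph of a document as the answer; for such questions, traditional term-level processing is of little help.
To serve users who ask different types of questions, this thesis studies a comprehensive QA system that covers both simple factoid questions and the more complex HOW and DEFINITION question types. Answers to complex questions are mainly sentences that describe a method, a reason, or a definition, which term-extraction methods cannot handle effectively. This study therefore combines classification, co-occurrence, and patterns to handle the different question types and extract answers. For the HOW and DEFINITION question types, it proposes two answer extraction methods, Okapi BM25_scoqat and NQPS, which rank the answers for these two types respectively.
The evaluation examines how well the proposed methods handle each question type through four experiments. Finally, the thesis presents conclusions and directions for future work.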
In recent years, the growing volume of information on the Internet has driven the development of Information Retrieval (IR). The search engine is the best-known IR application, and people use it to find information conveniently. However, it has some defects: search results include much irrelevant information, and queries cannot be posed as natural-language sentences. Because of these problems, many researchers have turned to the development of question answering.
Since the 1990s, many conferences, such as TREC, have focused on this development. However, most of the work focuses on factoid questions. The factoid type generally includes six fine-grained types: PERSON, LOCATION, ORGANIZATION, TIME, NUMBER, and ARTIFACT. For these types, the question answering system responds to the user with a few terms, so most answer ranking methods focus on term extraction. For complex question types (“HOW” and “DEFINITIONAL”), users need complete sentences as answers, and typical term-based answer ranking is ineffective for extracting them.
In this paper, we propose a Chinese question answering system that supports different question types (“factoid”, “HOW”, and “DEFINITION”). We use several methods (classification, co-occurrence, and patterns) to process answers. We use a complex-features approach, Okapi BM25_scoqat, and NQPS for “factoid”, “how”, and “definition” questions respectively. The results show that the proposed approach improves the performance of answer extraction.
We conduct four experiments covering the different answering methods. Finally, we present a discussion and future work.
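The abstract names Okapi BM25_scoqat as the ranking method for “how” questions but does not define it. As a rough illustration of sentence-level answer ranking, the sketch below implements plain Okapi BM25 only; it is not the thesis's BM25_scoqat variant. The function name `bm25_rank`, the parameter defaults `k1=1.2` and `b=0.75`, the non-negative IDF form, and the assumption that the question and candidate sentences are already word-segmented are all illustrative choices, not details taken from the thesis.

```python
import math
from collections import Counter


def bm25_rank(query_terms, sentences, k1=1.2, b=0.75):
    """Rank tokenized candidate sentences against a tokenized question.

    Plain Okapi BM25 (hypothetical helper, not the thesis's BM25_scoqat).
    Returns (score, sentence_index) pairs, highest score first.
    """
    if not sentences:
        return []
    n = len(sentences)
    avg_len = sum(len(sent) for sent in sentences) / n

    # Document frequency: number of sentences containing each term.
    df = Counter()
    for sent in sentences:
        df.update(set(sent))

    scored = []
    for idx, sent in enumerate(sentences):
        tf = Counter(sent)
        score = 0.0
        for term in set(query_terms):
            if tf[term] == 0:
                continue
            # Assumed IDF form with +1 inside the log, so it is never negative.
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Length normalization: long sentences need more matches to score well.
            length_norm = k1 * (1.0 - b + b * len(sent) / avg_len)
            score += idf * tf[term] * (k1 + 1.0) / (tf[term] + length_norm)
        scored.append((score, idx))

    # Sort by descending score; ties keep the original sentence order.
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


if __name__ == "__main__":
    # Made-up, pre-segmented example data.
    question = ["如何", "安裝", "軟體"]
    candidates = [
        ["下載", "安裝", "檔", "後", "執行", "即可", "安裝", "軟體"],
        ["今天", "天氣", "很", "好"],
        ["軟體", "公司", "位於", "台北"],
    ]
    for score, idx in bm25_rank(question, candidates):
        print(round(score, 3), "".join(candidates[idx]))
```

Because the scorer works on whole sentences rather than single terms, it fits the abstract's point that “how” questions need sentence-level answers. A real Chinese system would still need a word segmenter in front of it.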
On campus: publicly available from 2013-06-29