
Graduate student: 黃仲謙
Huang, Jhong-Cian
Thesis title: 基於問答語句與自由對話分析之口語對話系統
Based on Question Answer Pairs and Free Talk Analysis for Spoken Dialogue System
Advisor: 王駿發
Wang, Jhing-Fa
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of publication: 2017
Academic year of graduation: 105 (ROC calendar)
Language: English
Number of pages: 54
Keywords (Chinese): 句子相似度, 自動問答系統, 資訊檢索, 社區問答, 自由對話
Keywords (English): Sentence similarity, Automatic question answering system, Information retrieval, Community Question Answering website (CQA), Free talk
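  • This study proposes a dialogue system built on an improved, word-order-aware sentence similarity measure and fast text matching. Sentence similarity is used to score and retrieve the matching sentence; when the database has no adequate match, the system switches to Free Talk, which draws on community question answering (CQA) to produce a reply. Input speech is converted to text by ASR and segmented with the CKIP word segmentation system. The human-machine interface filters out command sentences and passes the remaining general sentences to the dialogue system, where they are vectorized with a preprocessed bag-of-words model. Every sentence in the database has already been preprocessed, vectorized, and placed in a vector space model, so the input vector is compared against that model by sentence similarity. This thesis proposes an improved hybrid similarity: when the similarity exceeds a user-defined hybrid-similarity threshold, a word-order edit distance is computed to check whether the sentences share a word-order relationship, and the sentence with the highest similarity is selected as the output reply. To make the dialogue system more engaging, we add image-based emotion recognition, which identifies the mood in each dialogue round, finds the mood that occurs in the most rounds, and then holds an emotion conversation drawn from a mood corpus. When the database is insufficient, the system turns to Free Talk to handle sentences of any type. A meta search engine serves as the medium for finding answers: the first layer is data from the Google search engine and the second layer is the CQA site Baidu Zhidao. This two-layer search retrieves relevant material, and an answer confidence is computed for it from the number of agrees, the rank, the question similarity, and the keyword density; keywords are expanded with a Chinese synonym forest (同義詞林) to make keyword matching more robust. The selected answer is then synthesized to speech and output.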
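The hybrid similarity described in the abstract above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's actual formula: sentences are assumed to be already segmented into word lists (for example by CKIP), the base score is a plain cosine similarity over bag-of-words counts, and the function names, the `threshold` of 0.5, and the `order_weight` of 0.3 are all hypothetical.

```python
from collections import Counter
from math import sqrt


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between two bag-of-words count vectors."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def word_edit_distance(tokens_a, tokens_b):
    """Levenshtein distance computed over words rather than characters."""
    prev = list(range(len(tokens_b) + 1))
    for i, word_a in enumerate(tokens_a, start=1):
        curr = [i]
        for j, word_b in enumerate(tokens_b, start=1):
            cost = 0 if word_a == word_b else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def hybrid_similarity(tokens_a, tokens_b, threshold=0.5, order_weight=0.3):
    """Blend in a word-order score only when the base similarity passes the threshold.

    The threshold and weight are illustrative assumptions, not values from the thesis.
    """
    base = cosine_similarity(tokens_a, tokens_b)
    if base <= threshold:
        return base
    longest = max(len(tokens_a), len(tokens_b))
    order_sim = 1.0 - word_edit_distance(tokens_a, tokens_b) / longest
    return (1.0 - order_weight) * base + order_weight * order_sim


def best_answer(query_tokens, qa_pairs, threshold=0.5):
    """Return (answer, score); answer is None when no stored question scores above the threshold."""
    best_score, best_reply = 0.0, None
    for question_tokens, answer in qa_pairs:
        score = hybrid_similarity(query_tokens, question_tokens, threshold)
        if score > best_score:
            best_score, best_reply = score, answer
    if best_score <= threshold:
        return None, best_score  # caller would hand the sentence to Free Talk
    return best_reply, best_score
```

With this sketch, two sentences that contain the same words in a different order get a cosine score of 1 but a lower hybrid score, which is the effect the word-order step is meant to capture. A `None` answer from `best_answer` stands in for the hand-off to Free Talk.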

    This thesis presents a question-answering-based system for spoken dialogue. Its main parts use sentence similarity to compute a score and find the corresponding sentence. When the corpus is insufficient, a user-generated answer is obtained from Free Talk. In our system, the ASR transcription is processed by the Chinese Knowledge and Information Processing (CKIP) Chinese word segmentation system. Command sentences are then filtered out through the human-machine interface, the remaining normal sentences are passed to the dialogue system, and those sentences are vectorized with a preprocessed bag-of-words model. All sentences in the corpus have already been preprocessed and vectorized in a vector space model, so the input sentence is scored against that model by sentence similarity. This thesis presents a hybrid sentence similarity: when the original sentence similarity exceeds the word-order threshold, an additional edit distance is computed to supplement the original similarity. The answer with the highest similarity is selected as the response. To make the system more enjoyable, we adopt image recognition to record the emotion in each round and respond with an emotion conversation according to the recorded emotions. When the similarity is below the threshold, the sentence is passed to Free Talk, which handles any type of sentence. This thesis adopts the concept of a meta search engine as the medium for searching answers. The first level is the Google search engine and the second level is Baidu's Community Question Answering (CQA) website. This two-level method retrieves relevant information, and an answer confidence is calculated for the candidate answers from the agree count, rank, sentence similarity, and keyword density; for keyword density, a Chinese synonym forest is used for synonym expansion. Finally, the answer is output to the human-machine interface.
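The answer-confidence step named in the abstract combines four signals: agree count, rank, sentence similarity, and keyword density with synonym expansion. The record does not give the formula, so the sketch below assumes a simple equal-weight linear combination of normalized features. The `Candidate` fields, the weights, and the small `synonyms` dictionary standing in for the Chinese synonym forest are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text_tokens: list    # segmented answer text
    agrees: int          # agree (upvote) count on the CQA site
    rank: int            # 1-based position in the search results
    question_sim: float  # similarity of the retrieved question to the user's, in [0, 1]


def expand_keywords(keywords, synonyms):
    """Add every listed synonym of each keyword (stand-in for a synonym-forest lookup)."""
    expanded = set(keywords)
    for word in keywords:
        expanded.update(synonyms.get(word, ()))
    return expanded


def keyword_density(tokens, keywords):
    """Fraction of answer tokens that are (expanded) keywords."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in keywords) / len(tokens)


def answer_confidence(cand, keywords, max_agrees, n_results,
                      weights=(0.25, 0.25, 0.25, 0.25)):
    """Weighted sum of four features scaled to [0, 1]; the weights are assumed."""
    agree_score = cand.agrees / max_agrees if max_agrees else 0.0
    rank_score = 1.0 - (cand.rank - 1) / n_results if n_results else 0.0
    features = (agree_score, rank_score, cand.question_sim,
                keyword_density(cand.text_tokens, keywords))
    return sum(w * f for w, f in zip(weights, features))


def pick_answer(candidates, keywords, synonyms):
    """Return the candidate with the highest confidence, or None if there are none."""
    expanded = expand_keywords(keywords, synonyms)
    max_agrees = max((c.agrees for c in candidates), default=0)
    return max(
        candidates,
        key=lambda c: answer_confidence(c, expanded, max_agrees, len(candidates)),
        default=None,
    )
```

Scaling each feature to the range 0 to 1 before weighting keeps a large agree count from swamping the other three signals. A real system would tune the weights on held-out question-answer data.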

    Chinese Abstract
    Abstract
    Acknowledgments
    Content
    Table List
    Figure List
    Chapter 1 Introduction
      1.1 Background
      1.2 Motivation
      1.3 Objectives
      1.4 Organization
    Chapter 2 Related Work
      2.1 Overview of Question Answering System
      2.2 Overview of Sentence Similarity Measure and Preprocessing
        2.2.1 Symbolic Sentence Similarity based on Word Set
        2.2.2 Symbolic Sentence Similarity based on Edit Distance
        2.2.3 Semantic Sentence Similarity based on WordNet
        2.2.4 Structural Sentence Similarity based on Dependency Relationship
      2.3 Community Question Answering Website of IR
      2.4 Overview of Answer Validation
    Chapter 3 Based on QA pair and Free talk system
      3.1 System Overview
      3.2 Pre-Processing
        3.2.1 Frame Overview
        3.2.2 Segmentation & Feature Extraction
        3.2.3 Word Weighting
        3.2.4 Vectorization
      3.3 Closed-domain QA system and Emotion Conversation
        3.3.1 Frame Overview
        3.3.2 Vector Space Model
        3.3.3 Emotion Confidence
        3.3.4 Hybrid Similarity
      3.4 Free Talk system
        3.4.1 Framework Overview
        3.4.2 Metasearch Engine
        3.4.3 Passage Retrieval
        3.4.4 Syntactic Analysis
        3.4.5 Answer Confidence
        3.4.6 Answer Extraction
    Chapter 4 Experimental Results
      4.1 Experiment for Closed-domain QA system
        4.1.1 Corpus
        4.1.2 Evaluation methods
        4.1.3 Experimental Results
      4.2 Experiment for Free Talk
    Chapter 5 Conclusions and Future Works
      5.1 Conclusions
      5.2 Future Works
    Reference


    Full text availability: on campus, open access from 2022-08-04
    off campus, open access from 2022-08-04