| Author: | 尹崇珂 Yin, Chung-Ko |
|---|---|
| Thesis title: | 應用語者向量及上下文追蹤於收話者選擇與深度強化學習之多人對話系統 Addressee Selection and Deep RL-based Dialog Act Selection with Speaker Embedding and Context Tracking for Multi-Party Conversational Systems |
| Advisor: | 吳宗憲 Wu, Chung-Hsien |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering |
| Year of publication: | 2018 |
| Academic year of graduation (ROC calendar): | 106 |
| Language: | English |
| Number of pages: | 56 |
| Keywords (Chinese): | 對話系統、多人對話系統、語者向量、回應時機選擇、收話者選擇、強化學習 |
| Keywords (English): | Multi-party Conversational System, Speaker Embedding, Response Timing Decision, Addressee Selection, Reinforcement Learning |
This thesis develops a multi-party conversational system and studies four of its components: context tracking, response timing decision, addressee selection, and response selection/generation. During a conversation, the system listens to the users' utterances, records each user's historical information, and detects user intent. A dual encoder model encodes each input utterance into a turn-level vector and tracks both the context of each individual user and the context of the whole dialog, so that the system can respond to the appropriate addressee(s) at an appropriate time. To the best of our knowledge, this thesis is the first to apply a reinforcement-learning-based dialog policy to a multi-party conversational system. Because the observation state considered by the dialog policy differs with the selected addressee, the system chooses its action according to that addressee. Once the dialog act is decided, a Transformer model generates a response template, and the semantic slots in the template are filled with values drawn from the user's historical information recorded during the conversation to produce the system response.
This thesis collected a multi-party chat dataset (named MHMC) consisting of 331 multi-party dialogs about travel in Taiwan. For response timing decision and addressee selection, the proposed method achieves an accuracy of 81.71%, outperforming SI-RNN, a method from prior work on multi-party conversational systems; the thesis also analyzes these results. For the dialog policy, when the observation state includes the speaker embedding of the addressee, the reward is 0.42 and the average number of turns is 71.13. Without the speaker embedding, the reward is 0.39 and the average number of turns is 41.32, which the thesis reports as the weaker result.
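The final step described in the abstract, placing values from a user's recorded history into the semantic slots of a generated template, can be illustrated with a minimal sketch. The `<slot>` marker syntax, the function name `fill_template`, and the example slot names are assumptions made for illustration only; the abstract does not specify how the thesis represents templates or user history.

```python
import re
from typing import Dict


def fill_template(template: str, user_history: Dict[str, str]) -> str:
    """Fill <slot> markers in a response template from a user's history.

    Hypothetical helper: the <slot> syntax and the dictionary layout are
    assumptions for this sketch, not the representation used in the thesis.
    """
    def substitute(match):
        slot_name = match.group(1)
        # Leave the marker untouched when no value was recorded for the slot.
        return user_history.get(slot_name, match.group(0))

    return re.sub(r"<(\w+)>", substitute, template)


# Illustrative template and history for one addressee (made-up values).
template = "How was the weather in <city> when you visited <attraction>?"
history = {"city": "Tainan", "attraction": "Anping Fort"}
print(fill_template(template, history))
```

Leaving unknown slots unfilled, rather than substituting a default, keeps missing information visible so that a later stage could decide whether to ask the user for it; this is a design choice of the sketch, not a claim about the thesis.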
On-campus access: available from 2020-08-31