Graduate student: | 張忱芯 Chang, Chen-Hsin |
---|---|
Thesis title: | 考量參與者角色於多人對話中回應時機與收話者之選擇 Response Timing Decision and Addressee Selection Considering Participant Roles in Multiparty Conversations |
Advisor: | 吳宗憲 Wu, Chung-Hsien |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering |
Year of publication: | 2019 |
Academic year of graduation: | 107 |
Language: | English |
Number of pages: | 55 |
Keywords (Chinese): | 多人對話系統 (multi-party dialogue system), 回應時機決策 (response timing decision), 收話人選擇 (addressee selection) |
Keywords (English): | multi-party conversation system, response timing decision, addressee selection, participant role, task role, social role, conversation role |
Dialogue systems are a popular topic in artificial intelligence, and research on them has expanded from one-to-one dialogue to multi-party dialogue. Because a multi-party dialogue system involves more participants than a one-to-one system, more interaction issues need to be explored. The system proposed in this thesis addresses how different participant roles influence interaction in multi-party dialogue and consists of two parts: response timing decision and addressee selection.
The corpus collected for this thesis consists of multi-party dialogues on medical consultation for the elderly, with 131 dialogues and 1622 turns in total. Each dialogue involves two users and one system. The users have different participant roles: in terms of task roles, one plays the elderly patient and the other a companion, while the system acts as a medical expert who provides professional knowledge when needed. In terms of social roles, the users differ in age and family relationship. Age is divided into five ranges, and family relationship is categorized into four relationships and eight roles.
This thesis mainly examines how three kinds of participant roles (conversation roles, task roles, and social roles) influence the response timing decision and addressee selection. The response timing decision considers task roles and social roles. We regard the user intent of each utterance as important historical information and therefore use Google's BERT, applied as a single-sentence classification task, for intent detection. Each user's social role is encoded as that user's initial vector. To account for task roles, we use three Gated Recurrent Units (GRUs), one per task role, to update each participant's history information. A final GRU then combines this history with the current user utterance to decide whether to respond in this turn, and its last hidden layer is used as the response timing decision embedding. For addressee selection, we improve an existing model based on the concept of conversation roles by also using the response timing decision embedding as interaction behavior information when selecting the addressee. Finally, we use reinforcement learning to select the system act and a decision tree to select an appropriate output template, completing a multi-party dialogue system that can be applied to medical consultation for the elderly.
The experimental results show that using social role embedding and task role modeling in the response timing decision increased accuracy by 10% compared with a system that uses neither. With the response timing decision information, addressee selection accuracy improved by 3.7% over existing methods in five-fold cross-validation experiments.
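The role-specific history update described above can be illustrated with a small sketch. This is not the thesis implementation: the class names `GRUCell` and `ResponseTimingSketch`, the role labels, the vector sizes, the omission of bias terms, the zero initial vector for the system, and the random untrained weights are all assumptions made for illustration. The intent vectors stand in for the output of the BERT intent classifier.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class GRUCell:
    """Minimal GRU cell on plain lists (random weights, no biases, untrained)."""

    def __init__(self, input_size, hidden_size, rng):
        cols = input_size + hidden_size

        def matrix():
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                    for _ in range(hidden_size)]

        self.w_update, self.w_reset, self.w_cand = matrix(), matrix(), matrix()

    @staticmethod
    def _affine(weights, vec):
        return [sum(w * v for w, v in zip(row, vec)) for row in weights]

    def step(self, x, h):
        x, h = list(x), list(h)
        z = [sigmoid(a) for a in self._affine(self.w_update, x + h)]  # update gate
        r = [sigmoid(a) for a in self._affine(self.w_reset, x + h)]   # reset gate
        gated = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a) for a in self._affine(self.w_cand, x + gated)]
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

TASK_ROLES = ("elder", "companion", "system")  # assumed labels for the three task roles

class ResponseTimingSketch:
    def __init__(self, intent_size, hidden_size, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        # One GRU per task role keeps that participant's history.
        self.role_grus = {r: GRUCell(intent_size, hidden_size, rng) for r in TASK_ROLES}
        # A final GRU reads the current utterance plus all three histories.
        joint_size = intent_size + hidden_size * len(TASK_ROLES)
        self.decision_gru = GRUCell(joint_size, hidden_size, rng)
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(hidden_size)]

    def run(self, turns, social_role_vectors):
        """turns: list of (task_role, intent_vector) pairs in dialogue order."""
        zeros = [0.0] * self.hidden_size
        # Histories start from the social-role encoding (zeros if none is given).
        histories = {r: list(social_role_vectors.get(r, zeros)) for r in TASK_ROLES}
        state = list(zeros)
        outputs = []
        for role, intent_vec in turns:
            histories[role] = self.role_grus[role].step(intent_vec, histories[role])
            joint = list(intent_vec)
            for r in TASK_ROLES:
                joint += histories[r]
            state = self.decision_gru.step(joint, state)
            respond_prob = sigmoid(sum(w * h for w, h in zip(self.w_out, state)))
            # `state` plays the part of the response timing decision embedding.
            outputs.append((respond_prob, state))
        return outputs

model = ResponseTimingSketch(intent_size=3, hidden_size=4)
turns = [("elder", [1.0, 0.0, 0.0]), ("companion", [0.0, 1.0, 0.0])]
social = {"elder": [0.1] * 4, "companion": [0.2] * 4}
for prob, embedding in model.run(turns, social):
    print(round(prob, 3), len(embedding))
```

In a trained system the per-turn probability would be thresholded to decide whether the system speaks, and the accompanying hidden state would be passed on to the addressee selection model.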
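For readers unfamiliar with five-fold cross-validation, the following minimal sketch shows one way such splits could be formed. The function name `five_fold_splits`, the fixed seed, and the choice to split at the dialogue level (131 dialogues) are assumptions for illustration; the abstract does not state how the folds were built.

```python
import random

def five_fold_splits(n_items, n_folds=5, seed=0):
    """Yield (train_indices, test_indices) pairs; each item is tested exactly once."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        test = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test

# Assumed setup: one index per dialogue in the 131-dialogue corpus.
for train, test in five_fold_splits(131):
    print(len(train), len(test))
```

The reported accuracy would then be the average over the five held-out folds.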