
Student: Chen, Kuan-Yu (陳寬祐)
Thesis Title: Applying Reinforcement Learning and Multi-decoders for Stage Transition in an Emotional Support Dialogue System
Advisor: Wu, Chung-Hsien (吳宗憲)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of Publication: 2023
Graduation Academic Year: 111 (2022-2023)
Language: English
Pages: 93
Keywords: Emotional Support Skills, Empathetic Responses, Response Strategies, Long-term Dialogue System
    In recent years, human-machine interactive dialogue systems have been widely adopted for their versatility and effectiveness. However, to make a system's responses better suited to people experiencing low moods, "Emotional Support" skills must be incorporated. With such a dialogue system, a person feeling down can find someone to talk to even when no one else is around.
    This thesis introduces a novel Emotional Support dialogue system that generates exploration, comforting, and suggestion sentences. Through long-term conversations, it identifies the problems a user is facing, expresses appropriate emotions and empathy, and offers advice to help the user overcome those problems. Designed specifically to address the mental health challenges of modern society, the system is an indispensable tool.
    This study aims to integrate Emotional Support skills into a dialogue system. Using DialoGPT, we developed three dedicated decoders: Explore, Comfort, and Suggest. In the explore stage, a problem detection model identifies the user's problem, and this information conditions our exploration model to generate questions relevant to the user's experience. In the comfort stage, an emotion detection model identifies the user's emotion and conditions the system's responses accordingly, ensuring that appropriate emotional responses are generated. In the suggest stage, data extracted from an external knowledge base is used to train a user intent prediction model, which conditions the suggestion decoder to produce more helpful responses.
    In addition, we pretrain the decision model with a recurrent stage probability estimation method and further train it within a reinforcement learning framework, using self-play dialogues between our system and a user model. This lets us simulate conversations with real humans and select the appropriate stage of our Emotional Support dialogue system at the right moment to generate a response. Our goal is to reduce users' emotional distress and provide a better overall user experience.
    Finally, our experimental results show that, compared with the baseline system, our system achieves significant improvements on the BLEU, Rouge-L, and Distinct metrics. On average, it gains 0.87 in BLEU, 1.85 in Rouge-L, 0.69 in Distinct-1, and 2.26 in Distinct-2. These results highlight the effectiveness of our approach: conditional generation combined with recurrent stage probability estimation and a reinforcement learning framework enables the system to accurately determine the best stage for generating a response, yielding the observed performance improvements.

    The application of human-machine interactive dialogue systems has grown in recent years due to their versatility and effectiveness in completing various tasks. However, to improve the responses generated by these systems and make them more suitable for individuals experiencing depression, it is crucial to incorporate "Emotional Support" skills. By using Emotional Support dialogue systems, individuals experiencing depression or anxiety can have someone to talk to even when no one is available.
    This thesis introduces a novel Emotional Support dialogue system that generates exploration, comforting, and suggestion sentences through long-term conversations, aiming to identify users' struggles, express appropriate emotions and empathy, and provide advice that helps users overcome the problems they are facing. The system is an essential tool for addressing the mental health challenges of modern society.
    To incorporate Emotional Support skills into our system, we developed specific decoders using DialoGPT for three distinct stages: Explore, Comfort, and Suggest. During the explore stage, a problem detection model is utilized to identify the user's issues, and this information is conditioned into our exploration decoder to generate on-topic questions and further explore the user's experiences. For the comfort stage, an emotion detection model is utilized to condition the comfort decoder, ensuring that appropriate emotional responses are generated. In the suggest stage, a user intent generation model, trained on data extracted from an external knowledge base, provides conditions for the suggestion decoder, enabling it to generate more helpful and diverse responses.
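The thesis does not spell out the input format here, but a common way to condition a DialoGPT-style decoder is to prepend the detected label to the flattened dialogue history. The sketch below illustrates this; the special tokens ([PROBLEM], [EMOTION], [INTENT]) and the exact layout are illustrative assumptions, not the thesis's actual format.

```python
# Minimal sketch of condition-prefixed input formatting for a
# stage-specific decoder. Tokens and layout are hypothetical.

EOS = "<|endoftext|>"  # DialoGPT's turn separator


def build_decoder_input(stage: str, condition: str, history: list[str]) -> str:
    """Prepend the stage's condition label to the flattened dialogue
    history so the decoder can generate a condition-aware response."""
    tag = {"explore": "[PROBLEM]",
           "comfort": "[EMOTION]",
           "suggest": "[INTENT]"}[stage]
    context = EOS.join(history) + EOS
    return f"{tag} {condition} {EOS} {context}"


history = ["I lost my job last week.",
           "I'm sorry to hear that. How are you holding up?"]
prompt = build_decoder_input("comfort", "sadness", history)
print(prompt)
```

In practice the tagged string would be tokenized and fed to the corresponding decoder; keeping one tag vocabulary per stage lets each decoder learn its own conditioning signal.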
    Furthermore, we pretrained our decision model with a recurrent stage probability estimation method and trained it further within a reinforcement learning framework, using self-play between our system and a user model. This allowed us to simulate conversing with a real human being and determine the most appropriate moment to use each stage to generate responses. Our goal is to reduce the user's emotional distress and provide a better overall user experience.
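The table of contents names REINFORCE as the policy-gradient algorithm used for the decision model. The toy sketch below shows a REINFORCE update for a three-way stage policy; the softmax-over-linear policy, the mock state features, and the mock reward (which simply favors the comfort stage) are illustrative assumptions, not the thesis's reward design.

```python
# Toy REINFORCE update for a 3-way stage policy (explore/comfort/suggest).
# Policy form, features, reward, and learning rate are all illustrative.
import math
import random

random.seed(0)
STAGES = ["explore", "comfort", "suggest"]
theta = [[0.0] * 3 for _ in STAGES]  # one weight row per stage


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def policy(state):
    logits = [sum(w * x for w, x in zip(row, state)) for row in theta]
    return softmax(logits)


def reinforce_step(state, lr=0.1):
    """Sample a stage, observe a (mock) return G, and apply the
    REINFORCE update theta += lr * G * grad log pi(a|s)."""
    probs = policy(state)
    action = random.choices(range(len(STAGES)), weights=probs)[0]
    G = 1.0 if action == 1 else -0.2  # mock reward: pretend "comfort" fits
    for a in range(len(STAGES)):
        indicator = 1.0 if a == action else 0.0
        for j, x in enumerate(state):
            theta[a][j] += lr * G * (indicator - probs[a]) * x
    return action, G


state = [1.0, 0.5, -0.3]  # mock dialogue-state features
for _ in range(200):
    reinforce_step(state)
best = max(range(len(STAGES)), key=lambda a: policy(state)[a])
print(STAGES[best])
```

In the thesis's setting the episode return would come from self-play rollouts against the user model rather than from a fixed per-action reward.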
    Our experimental results demonstrated significant improvements in BLEU, Rouge-L, and Distinct metrics compared to the baseline: on average, an increase of 0.87 in BLEU score, 1.85 in Rouge-L, 0.69 in Distinct-1, and 2.26 in Distinct-2. These findings highlight the effectiveness of our approach, which combines conditional generation with recurrent stage probability estimation and a reinforcement learning framework, enabling the system to accurately determine the optimal stage for generating responses and ultimately resulting in improved performance.
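Of the reported metrics, Distinct-n is the simplest to state precisely: the ratio of unique n-grams to total n-grams across all generated responses (higher means more diverse output). A minimal implementation, with whitespace tokenization assumed for illustration:

```python
def distinct_n(responses: list[str], n: int) -> float:
    """Ratio of unique n-grams to total n-grams, pooled over all
    responses (Distinct-1, Distinct-2, ...)."""
    ngrams = []
    for resp in responses:
        tokens = resp.split()
        ngrams.extend(tuple(tokens[i:i + n])
                      for i in range(len(tokens) - n + 1))
    if not ngrams:
        return 0.0
    return len(set(ngrams)) / len(ngrams)


responses = ["i am sorry to hear that", "i am here for you"]
print(round(distinct_n(responses, 1), 3))  # 9 unique / 11 total -> 0.818
print(round(distinct_n(responses, 2), 3))  # 8 unique / 9 total  -> 0.889
```

A system that repeats stock phrases scores low because repeated n-grams inflate the denominator without adding to the unique set.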

    Abstract (Chinese)
    Abstract
    Acknowledgements
    Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
      1.1 Background
      1.2 Motivation
      1.3 Literature Review
        1.3.1 Natural Language Generation
        1.3.2 Task-Oriented Dialogue Systems
        1.3.3 Chit-Chat Dialogue Systems
        1.3.4 Text Categorization
      1.4 Problems
      1.5 Proposed Methods
    Chapter 2 System Framework
      2.1 Model Basic Architecture
        2.1.1 Transformer
        2.1.2 BERT
        2.1.3 GPT2
        2.1.4 GRU
      2.2 Conditional Models Training
        2.2.1 Problem and Emotion Detection Model
        2.2.2 Intent Generation Model
      2.3 Conditional Generation Models Training
        2.3.1 Explore Decoder
        2.3.2 Comfort Decoder
        2.3.3 Suggest Decoder
      2.4 Next Stage Decision Model Training
        2.4.1 Pretrain with Recurrent Stage Probability Estimation
        2.4.2 RL with REINFORCE Algorithm and Self-Play
        2.4.3 Rewards
    Chapter 3 Dataset
      3.1 Emotional Support Conversations (ESConv)
      3.2 Go-Emotions
      3.3 Stanford Sentiment Treebank (SST)
      3.4 Augmented Problem Dataset
      3.5 Extracted Intent Dataset
    Chapter 4 Experiments
      4.1 Evaluation Metrics
        4.1.1 BLEU score
        4.1.2 Rouge-L
        4.1.3 Distinct-n
        4.1.4 Human Subjective Evaluation
      4.2 Experimental Results and Discussion
        4.2.1 Problem Detection Model
        4.2.2 Emotion Detection Model
        4.2.3 Intent Generation Model
        4.2.4 Conditional Generation Models
        4.2.5 Baseline Systems
        4.2.6 Our Integrated System using Next Stage Decision Model
        4.2.7 Dialogue Examples
    Chapter 5 Conclusion and Future Work
    References


    Full text available on campus: 2025-08-31
    Full text available off campus: 2025-08-31