Graduate student: 魏本憲 (Wei, Ben-Hsien)
Thesis title: Spoken Dialogue System Mimicking Clinical Interview for Bipolar Disorder Assessment (口語對話系統模擬臨床訪談於躁鬱症疾患評估)
Advisor: 吳宗憲 (Wu, Chung-Hsien)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Graduate Program of Artificial Intelligence
Year of publication: 2023
Academic year of graduation: 111 (ROC calendar)
Language: English
Number of pages: 69
Keywords: Bipolar disorder, Data augmentation, Intent detection, Speech detection, HY scoring, Dialogue generation

    Bipolar disorder is a common mental illness with a very high recurrence rate. Patients need regular clinical interviews to track their physical and psychological condition, which is scored with the Hamilton Depression Rating Scale (HAMD) and the Young Mania Rating Scale (YMRS), jointly abbreviated HY. However, this assessment process requires a large amount of medical manpower and time. To reduce the demand on medical resources, this thesis builds a spoken dialogue system that simulates a clinical interview for bipolar disorder assessment.
    Because the dataset was collected from actual bipolar disorder patients, it is small and its classes are imbalanced. We therefore augment it with several automatic methods. We also define HY intents and scoring methods so that scoring can be performed more effectively. In the proposed system, a BERT-based intent detection model determines whether the patient is answering an HY question or engaging in chit-chat. In parallel, an MLP-based speech detection model estimates the patient's speech rate and the degree of fluctuation in angry mood.
    Using the outputs of the text and speech detection models, a BERT-based scoring model predicts all scores of the HY scales. After scoring, a DialoGPT-based generation model produces the next HY question or chit-chat sentence from the interview history and the predicted HY scores.
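The per-turn flow described above (intent detection, speech-based cues, HY scoring, then response generation) can be pictured with a minimal Python sketch. This is not the thesis implementation: every name here (`Turn`, `InterviewState`, `run_turn`, the `stub_*` functions, and the `HAMD_insomnia` intent label) is hypothetical, the stubs are toy rules standing in for the BERT, MLP, and DialoGPT models, and the choice to update scores only on non-chit-chat turns is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

CHIT_CHAT = "chit-chat"  # hypothetical label for non-assessment turns


@dataclass
class Turn:
    text: str           # transcribed patient utterance
    speech_rate: float  # assumed output of the speech detection model
    anger_level: float  # assumed output of the speech detection model


@dataclass
class InterviewState:
    history: List[str] = field(default_factory=list)
    scores: Dict[str, float] = field(default_factory=dict)


def run_turn(
    state: InterviewState,
    turn: Turn,
    detect_intent: Callable[[str], str],
    predict_scores: Callable[[List[str], str, Turn], Dict[str, float]],
    generate_reply: Callable[[List[str], Dict[str, float]], str],
) -> str:
    """Process one patient turn and return the system's next utterance."""
    state.history.append(turn.text)
    intent = detect_intent(turn.text)
    # Assumption: scores are only updated when the patient answers an HY question.
    if intent != CHIT_CHAT:
        state.scores.update(predict_scores(state.history, intent, turn))
    # The reply is conditioned on the interview history and the current scores.
    reply = generate_reply(state.history, state.scores)
    state.history.append(reply)
    return reply


# Toy stand-ins so the sketch runs; trained models would replace these.
def stub_detect_intent(text: str) -> str:
    return "HAMD_insomnia" if "sleep" in text.lower() else CHIT_CHAT


def stub_predict_scores(history: List[str], intent: str, turn: Turn) -> Dict[str, float]:
    return {intent: 2.0 if turn.speech_rate < 3.0 else 1.0}


def stub_generate_reply(history: List[str], scores: Dict[str, float]) -> str:
    return "How has your appetite been?" if scores else "Tell me about your week."


if __name__ == "__main__":
    state = InterviewState()
    turn = Turn("I can't sleep at night.", speech_rate=2.5, anger_level=0.1)
    print(run_turn(state, turn, stub_detect_intent, stub_predict_scores, stub_generate_reply))
    print(state.scores)
```

Passing the three models in as callables keeps the control flow separate from the models themselves, so each stub can be swapped for a trained component without changing `run_turn`.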
    After data augmentation, conditioning on information from the text and speech detection models improves performance on both the HY scoring and dialogue generation tasks. The MSE loss on the HY scoring task decreases by 25%, and the BLEU and ROUGE scores on the dialogue generation task increase by an average of 6% and 32%, respectively.
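For readers unfamiliar with how such figures are derived, the following Python sketch shows mean squared error and a relative change between a baseline and an improved model. The abstract does not state whether its percentages are relative or absolute; the sketch assumes relative change. The function names and all numbers are toy values chosen for illustration, not results from the thesis.

```python
from typing import Sequence


def mse(pred: Sequence[float], gold: Sequence[float]) -> float:
    """Mean squared error between predicted and reference scores."""
    if len(pred) != len(gold) or len(pred) == 0:
        raise ValueError("pred and gold must be non-empty and of equal length")
    return sum((p - g) ** 2 for p, g in zip(pred, gold)) / len(pred)


def relative_change(baseline: float, new: float) -> float:
    """Signed relative change of `new` with respect to `baseline`."""
    return (new - baseline) / baseline


# Toy item scores (illustrative only, not data from the thesis).
gold = [2, 0, 3, 1]
baseline_pred = [4, 2, 1, 3]
improved_pred = [4, 2, 5, 1]

baseline_mse = mse(baseline_pred, gold)
improved_mse = mse(improved_pred, gold)
print(baseline_mse, improved_mse, relative_change(baseline_mse, improved_mse))
```

A negative relative change means the loss went down; for BLEU and ROUGE, where higher is better, the same function would return a positive value for an improvement.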

    Abstract (in Chinese)
    Abstract
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
        1.1 Background
        1.2 Motivation
        1.3 Problems and approaches
        1.4 Literature review
            1.4.1 Bipolar disorder
            1.4.2 Medical dialogue
            1.4.3 Bipolar disorder assessment
    Chapter 2 Bipolar disorder analysis
        2.1 Participate selection criteria
        2.2 Markov state transition
        2.3 Dataset contents and scores
        2.4 HAMD and YMRS assessments
            2.4.1 HAMD
            2.4.2 YMRS
        2.5 HAMD and YMRS intent definition
    Chapter 3 Methods
        3.1 Data augmentation
            3.1.1 HAMD and YMRS assessment dialogue augmentation
            3.1.2 Chit-chat dialogue augmentation
            3.1.3 Bipolar Disorder dialogue composition
        3.2 Intent detection model
        3.3 Hybrid speech rate and angry mood level detection model
        3.4 HAMD and YMRS scoring model
        3.5 HAMD and YMRS intent keyword extraction
        3.6 HAMD and YMRS intent-to-keyword mapping
        3.7 Dialogue generation model
    Chapter 4 Dataset
        4.1 HAMD and YMRS assessment statistics
            4.1.1 HAMD assessment statistics
            4.1.2 YMRS assessment statistics
        4.2 Augmented HAMD and YMRS assessment statistics
            4.2.1 Augmented HAMD assessment statistics
            4.2.2 Augmented YMRS assessment statistics
        4.3 Dataset statistics
            4.3.1 Dataset list
            4.3.2 Dataset number
            4.3.3 Augmented bipolar disorder dataset detail statistics
            4.3.4 Other datasets detail statistics
    Chapter 5 Experiments
        5.1 Experiment setup
        5.2 Evaluation metrics
        5.3 Experiment results
            5.3.1 Intent detection
            5.3.2 Speech rate detection
            5.3.3 Angry mood level detection
            5.3.4 HAMD and YMRS scoring
            5.3.5 HAMD and YMRS intent keyword extraction
            5.3.6 Dialogue generation
            5.3.7 Dialogue example
    Chapter 6 Conclusion
    Reference

    Full-text access: on campus from 2025-08-31; off campus from 2025-08-31.